Date Posted
Today
Remote Work Level
Hybrid Remote
Location
Hybrid Remote in Tempe, AZ
Job Schedule
Full-Time
Salary
We're sorry, the employer did not include salary information for this job.
Categories
IT, System Administrator, Tech Support, Product Manager, Project Manager, Software Engineer
About the Role
Title: Senior Site Reliability Engineer
Location: Tempe, AZ, United States
Job Description:
It's fun to work in a company where people truly believe in what they are doing. At Dutch Bros Coffee, we are more than just a coffee company. We are a fun-loving, mind-blowing company that makes a difference one cup at a time.
Position Overview:
As a Senior Site Reliability Engineer, you are a technical leader who combines software engineering principles with systems operations chops to design, build, and maintain infrastructure automations and to develop tools that improve systems reliability, handle incidents, and reduce manual operational interventions across a diverse multi-cloud enterprise. You will blend your software and systems engineering skills and focus on proactive problem-solving, automation, and continuous improvement by defining and then achieving measurable goals. You will collaborate closely with production teams, product managers, IT Ops, DevOps, and fellow developers to support the delivery of large-scale solutions that maintain high uptime and deliver excellent user experiences. You will participate in decision-making on improving reliability programs across the enterprise. Additionally, you will provide technical guidance and mentorship to junior team members, contributing to the overall growth and success of the platform engineering team.
Job Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent work experience), required
- 6 or more years of relevant experience, with a strong software engineering background (Python, Java, Golang) and systems administration expertise
- Deep knowledge of operating systems (especially Linux), networking, and system administration tools
- Experience with cloud platforms (AWS, Azure) and cloud-native technologies
- Proficiency with automation and DevOps tools (Terraform, Ansible, Jenkins, etc.)
- Ability to design and implement robust monitoring systems and analyze system metrics
- Excellent problem-solving and analytical skills with proven experience in identifying root causes and developing effective solutions
- Familiarity with microservices architecture and container orchestration
- Experience collecting and analyzing data in order to help teams draw conclusions and gain insights from their systems
- Excellent communication skills with the ability to collaborate effectively with engineers, developers, and business stakeholders
Location Requirement:
This role is located in Tempe, Arizona. This position is required to be in office 4 days per week (Mon-Thurs); Fridays are optional remote work days.
Key Result Areas (KRAs):
Develop software and implement processes and tools that provide continuous improvements in systems reliability and availability:
- Ensure systems are consistently accessible to users while minimizing downtime
- Optimize systems throughput by continually and effectively analyzing and addressing latencies, traffic patterns, errors, and saturation metrics
- Anticipate future demand and ensure infrastructure can scale to peak loads effectively
- Use automation tools and processes to streamline repetitive tasks in SDLCs and increase operational efficiencies
- Quantify acceptable levels of downtime and errors in systems in support of their service level objectives
- Align and help drive execution of the Platform Development team's strategies
Lead the implementation of observability, availability, and monitoring tools and practices:
- Implement monitoring systems for tracking key metrics
- Configure real-time alerting, system performance analytics, and issue detection
- Support incident response efforts by providing metrics, insights, and diagnostics, and contribute to the remediation and restoration of services
- Conduct thorough post-incident reviews to identify root causes, learn from failures, and implement preventative measures
- Other duties as assigned
Skills:
- Advanced Experience with Observability Platforms and Practices
- Advanced Experience with Automated Testing Tools
- Strong Proficiency in Programming Languages
- Strong Systems Administration Skills
- Strong Analytical and Problem-Solving Skills
- Proficiency With DevOps and Automation Tools and Practices
- Performance Optimization
- Leadership and Mentoring
Physical Requirements:
- In-Office Environment: Must be able to work in a busy, crowded, and loud office with frequent distractions and interruptions
- Must be able to collaborate in-person with occasional impromptu in-person meetings
- Office Conditions: Adaptability to typical office conditions, which may include exposure to air conditioning, heating, artificial lighting, and varying noise levels
- Mobility: Ability to sit, stand, reach, twist, stretch, and work at a desk for long stretches. Must be able to occasionally move or lift office items up to 25 pounds
- Hearing Requirements: Hearing must be sufficient or correctable to ensure clear understanding of spoken information, including participating in virtual meetings and phone calls. Use of hearing aids or other assistive devices is acceptable if needed.
- Reading and Writing Proficiency: Ability to read and write in English is essential for processing documents, drafting reports, and following up on necessary actions. Proficiency in written communication is required to handle job-related tasks effectively.
- Vision Requirements: Vision must be adequate or correctable to perform essential job duties, such as reading documents on a computer screen and using other visual tools. Use of corrective lenses or other measures to meet visual requirements is expected if needed.
- Technology Proficiency: Must be proficient in operating a computer and other office productivity tools such as printers, scanners, and collaboration software.
- Effective Communication: Must possess strong verbal and written communication skills to interact effectively with team members, clients, and other stakeholders via email, video conferencing, and other in-office communication tools.
Compensation:
DOE
If you like wild growth and working in a unique and fun environment, surrounded by a positive community, you'll enjoy your career with us!