Senior Site Reliability Engineer at Mozilla
Location: Remote US, Remote Canada, Portland
Mozilla is a category of one: a global technology not-for-profit super-powered by a worldwide community of volunteers, with a mission to keep the internet a healthy public resource for all. By building great products, creating innovative technologies, and engaging people to take action, we create an outsized impact in the world. We always place people ahead of profit.
Mozilla wants you to help fight for an Internet that’s open and accessible to everyone. We fulfill that mission as both a corporation and a non-profit organization, blending technology with advocacy, policy and education.
Site Reliability Engineering treats operations as a software problem. In SRE, we flip between the fine-grained detail of application debugging and the big picture of capacity across a range of systems with a user population measured in hundreds of millions. We are responsible for our products in production. We drive reliability and performance by mastering the full depth of the stack. We see no reason for system downtime. You will have the opportunity to take on complex problems of scale while using your expertise in coding, algorithms, complexity analysis and large-scale system design. And your career will take big steps forward working with some of the best developers in the industry.
Your independence, curiosity, and willingness to try new things will be an asset, not a liability.
Sound exciting? Send us a brief cover letter and resume highlighting how you fit the following:
- Design, develop, document and deliver software to improve the availability, scalability, latency and efficiency of Mozilla’s services and infrastructure.
- Solve problems relating to critical services and build automation to prevent their recurrence, with the goal of automating the response to all non-exceptional service conditions.
- Engage in service capacity analysis, demand forecasting, software performance analysis and system tuning.
- Provide occasional after-hours and weekend support as part of an on-call rotation for critical Mozilla services.
- BS degree in Computer Science or a related technical field, or 5 years of prior relevant experience.
- Experience with Python, preferably in a web and/or infrastructure automation setting
- Experience in designing, analyzing and running large-scale distributed systems
- Experience hosting and solving problems with public-facing services securely in AWS or GCP
- Experience designing and delivering deployment automation
- Experience automating infrastructure with tools such as Terraform, Ansible, Chef or Puppet
- 1+ years working remotely with distributed teams
- Experience with Kubernetes or an eagerness to learn Kubernetes as a platform
- Familiarity with Linux container engines like Docker
- Systematic problem-solving approach, coupled with a strong sense of ownership and drive. You’re willing to dive into a problem at any level, including application code, database, networking, and content caching, to identify performance or availability issues.
Mozilla exists to build the Internet as a public resource accessible to all because we believe that open and free is better than closed and controlled. When you work at Mozilla, you give yourself a chance to make a difference in the lives of Web users everywhere. And you give us a chance to make a difference in your life every single day. Join us to work on the Web as the platform and help create more opportunity and innovation for everyone online.
We are an equal opportunity employer and value diversity. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.