Site Reliability Engineer at mParticle


Title: Site Reliability Engineer

Location: Remote

  • At mParticle, we are passionate about building software that empowers our customers to make the most of their data.
  • We count on our operations team and site reliability engineers (SREs) to keep our platform at peak performance and high availability, processing over 1 trillion events a month in near real-time, with no interruptions.
  • We are growing and expanding our customer deployments, and we are currently seeking an experienced SRE to join our operations team: someone who can bring fresh ideas, demonstrate a unique and informed viewpoint, and enjoy collaborating with a cross-functional team to develop real-world solutions and positive user experiences at every interaction.
  • As a Site Reliability Engineer, you will be part developer, part operations, all continuous integration and delivery expert; you will be integral to the design, setup, automation, and maintenance of our entire integration and delivery pipeline.
  • The ideal candidate should have a deep software development background married with effective communication skills to promote collaboration with developers, support engineers, customers, and senior management.
  • They will work closely with development squads, our client-facing teams, and customers, as well as other engineers and developers, gathering requirements, architecting, and constantly delivering quality improvements to our platform.

As an mParticle SRE, you will…

  • Be part of the PagerDuty rotation, responding to platform incidents and providing support for other engineers who are responding to customer issues
  • Use your daily interactions with the platform and your experience and skills to constantly improve our environment and ensure that issues do not recur
  • Maintain and augment our monitoring systems so that they alert on symptoms rather than on underlying causes
  • Be proactive and take ownership in identifying, raising, and resolving issues or deficiencies you see anywhere in our environment
  • Produce and improve internal documentation and SOPs where they are missing or lacking quality or details
  • Live-debug applications and issues, and identify, resolve, or own the resolution of functionality and performance deficiencies
  • Identify performance issues with production applications and their configuration, and suggest or implement fixes
  • Automate yourself out of a job
  • Contribute to our scale goals

You will be perfect for this role if you…

  • Have a bachelor’s degree in computer science or another highly technical or scientific discipline
  • Are able to program (structured and OO) in one or more high-level languages, preferably Python and either C#, Java, or Go
  • Comfortably own the Linux shell
  • Have a proactive approach to spotting problems, areas for improvement, and performance bottlenecks
  • Have coding experience beyond simple scripts
  • Are experienced in debugging and performance tuning applications
  • Have an eye for edge cases, behaviors, and creative solutions
  • Are experienced with configuration management
  • Have an unstoppable urge to fix what is broken
  • Efficiently balance speed/iteration and quality
  • Are experienced with Terraform and Ansible

As an SRE, we expect you to…

  • Fluently follow existing best practices for maintaining the health of supported applications and the platform, and for writing and testing code
  • Make impactful decisions about your technical contributions
  • Understand how our production systems work
  • Handle vague scope or identify improvements in small areas
  • Manage your work with little-to-no supervision
  • Actively collaborate with others through technical documentation
  • Troubleshoot and contribute to the resolution of moderate to complex production problems, and write post-mortems on them
  • Write SOPs for issues encountered and common tasks
  • Automate repetitive tasks using purpose-written code or commercially available tools
  • Detect inefficient common operational patterns and processes
  • Design and implement monitoring solutions for common or critical problems

As a technical resource and expert, you should be able to…

  • Troubleshoot and resolve issues of medium complexity
  • Be a core resource in troubleshooting and resolving complex issues; have a deep understanding of the mParticle pipeline and be able to assist in troubleshooting medium to complex platform issues
  • Write quality, clean, and maintainable code, following company best practices with minimal guidance
  • Develop sufficient domain understanding to sanity check and ensure the quality of your own output, as well as review that of other team members
  • Write custom code of medium to high complexity in at least 2 languages
  • Be the responsible/SME engineer for 2 or more internally-maintained supporting infrastructure components
  • Proactively research and keep up to date on the patterns, advancements, and evolutions of tools and technologies used in the mParticle pipeline
  • Identify problematic patterns in the mParticle applications, processes, and tools, and suggest and implement resolution options
  • Make small design decisions independently, making appropriate tradeoffs between simplicity and performance
  • Follow existing patterns to create new instances of projects, features, or architecture
  • Create novel architectures of small components within your area of expertise. This includes diagramming the architecture, assessing the trade-offs made and patterns applied, and assessing the effort for the change and its approximate timeline
  • Understand the flow control of nearly any system, including those outside of your area of expertise, even if you cannot necessarily suggest improvements to systems outside of your area
  • Properly sense when to engage Security for a review of a potential change
  • Understand techniques used to troubleshoot and fix production bugs and issues
  • Develop solutions/code that reduce future operational burden (e.g. by adding appropriate self-healing, high levels of alerting/monitoring/logging, reducing alert noise, etc.)
  • Ensure that infrastructure resources are not wasted by consistently following provided best practices and rightsizing instances
  • Contribute to the build and release tooling and infrastructure

You should also be able to…

  • Be successful when working on a large feature or improvement of vague scope
  • Identify and push forward new features or enhancements that improve the functioning of a system or feature
  • Identify problems and contribute well-scoped solutions to the team’s roadmap
  • Focus your work on what is most valuable for the team
  • Make and communicate accurate time estimates for your own work, potentially spanning multiple sprints
  • Manage projects that span multiple groups of stakeholders
  • Act as an effective facilitator for team meetings
  • Consistently communicate technical decisions through high-quality design docs, tech talks, and wiki contributions
  • Create documentation, including team onboarding materials, and train others

Lastly, as part of mParticle and our Engineering organization, you should…

  • Participate in, own, and improve mParticle technical recruiting, onboarding, and branding
  • Act as a brand ambassador for mParticle Engineering
  • Drive the cultural direction of mParticle operations
  • Encourage people to be the best they can
