Manager of Data Reliability Engineering at Everbridge
SaaS Operations
About the role:
Are you motivated by an incredible sense of purpose in doing work that helps keep people safe and businesses running every day, with results that regularly make headlines? Are you passionate about innovating on the industry’s cutting edge to develop solid architecture principles, operability guidelines, progressive scaling methodologies, and other sophisticated techniques to reliably operate critical technology infrastructure at scale? Do you have an insatiable appetite for eliminating inefficiency, automating away toil, and preventing problems before they occur? If so, this position is a perfect opportunity for you to join the Everbridge Data Reliability Engineering team in a hands-on Manager role driving the architecture, design, implementation, strategy, and operability of our global platforms.
About the team:
As the hands-on Manager of the Everbridge Data Reliability Engineering team, you are responsible for ensuring the overall service quality and availability of Everbridge’s data solutions. The technology platforms that we support automate the international delivery of critical information to help keep people safe and businesses running. We are a 24x7x365 distributed team that can do our jobs anytime, from anywhere on the planet with an Internet connection. Our holistic understanding of various technologies allows us to effectively maintain a heterogeneous blend of worldwide public and private cloud services where lives and livelihoods are at stake in the event of failures.
What you’ll do:
- Build and grow our data reliability engineering team, helping them achieve their roadmap objectives and inspiring them to achieve their full potential.
- Direct daily operations of the team, carrying out priorities, monitoring and analyzing operational effectiveness metrics, and ensuring problems are identified and solved quickly and efficiently.
- Own operational availability, security, performance, scalability, monitoring, instrumentation, integrity, and overall service reliability of Everbridge’s data tier.
- Collaborate across Agile teams with Architects, Developers, Quality, Security, and other Operations engineers on designing and implementing highly reliable data solutions.
- Embrace Database Reliability Engineering principles of automation, proactivity, cross-functional collaboration, objective decision making, and fast+safe failing to continually improve our technology and culture.
- Enhance our infrastructure, tooling, and processes to extend operability as a self-service function for other groups in the engineering value stream.
- Establish SLAs with business partners, and support infrastructure and data pipeline systems so that they operate within those SLAs.
- Participate in a rotating on-call schedule to troubleshoot and resolve production escalations from our 24x7x365 NOC.
- Have fun while we work hard to make a difference.
What you’ll bring:
- 5+ years’ hands-on experience in database and site reliability, database administration, DevOps or SaaS technical operations
- 3+ years’ experience leading data engineering organizations
- 3+ years’ experience with relational and NoSQL database administration (PostgreSQL, MySQL, MongoDB preferred)
- 3+ years’ experience with automation and orchestration tools (Salt and Terraform preferred)
- 2+ years’ experience with writing code in at least one programming language (e.g. Python, Perl, Java, Ruby, Go)
- 2+ years’ experience working with data streaming and indexing (Kafka, Elastic preferred)
- 1+ years’ experience using Git for practical configuration data and code management
- Data modeling, schema design and review
- Data integrity validation, error recovery, backup and restoration
- US Citizenship or Green Card and ability to pass a Federal drug screening
Familiarity in any of the following technology areas is a plus:
- Automation framework orchestration, configuration management, and software-defined infrastructure management techniques (SaltStack preferred; others such as Puppet, Chef, or Ansible also acceptable)
- Infrastructure/application monitoring and alerting solutions (Datadog, Elastic BELK/X-Pack, Prometheus, Nagios, Cacti, Graphite/Grafana, InfluxDB, OpenTSDB, Splunk, Graylog, etc.)
At Everbridge, we have a mission that matters – to keep people safe and businesses running during critical events. Our “Bridgers” join Everbridge to make a positive impact on the world through their work. The core of our company culture is built around making a difference. Our people are dedicated to solving problems during difficult times and challenging situations as our software was built to save lives.
We are a rapidly growing organization transforming the field of critical event management and need passionate, committed and determined individuals to help us carry out our mission. Our environment is dynamic, and our culture is constantly evolving and expanding in order to provide the best employee experience.
Everbridge is an Equal Opportunity/Affirmative Action Employer. All qualified Applicants will receive consideration for employment without regard to race, creed, color, religion, or sex including sexual orientation and gender identity, national origin, disability, protected Veteran Status, or any other characteristic protected by applicable federal, state, or local law.