Site Reliability Engineer – Cloud Services at Aspect
Location: Remote, US
- Aspect employs a team of passionate individuals who are changing the face of customer engagement.
- Over our 40-year history we have empowered employees by creating an inspired community that values customer obsession, unlocked communications and relentless innovation.
- Our ability to think big has enabled us to continually evolve and lead the market, and to stay on the forefront with exciting technologies including cloud, mobile and artificial intelligence.
GENERAL SCOPE & SUMMARY
- The Site Reliability Engineer is responsible for automation and infrastructure buildout: deploying Kubernetes clusters and migrating workloads from traditional VMs into containers in a PCI-compliant environment.
- This involves working to get applications into a CI/CD platform, ensuring uptime, and identifying service level objectives.
- This position works closely with the R&D team to build and run large-scale, massively distributed, fault-tolerant systems through automation, and to evolve those systems by pushing for changes that improve reliability and velocity.
PRIMARY ROLE & RESPONSIBILITIES
- Support the implementation and design of containerized systems utilizing CI/CD and Kubernetes
- Deploy and administer Kubernetes clusters in multi-cloud environments
- Configure Linux systems and software packages
- Collaborate with the team to capture procedural knowledge of what is to be automated
- Automate operational tasks and assist in the transition to service ownership models
- Collaborate across project teams to simplify and improve software lifecycle processes
- Manage and maintain infrastructure as code
- Provision and configure cloud assets using scripts, APIs, CLIs and management consoles
- Conduct Production Readiness Reviews (PRRs) to determine reliability of systems before release
- Participate in postmortem reviews
QUALIFICATIONS
- At least 3 years of recent, practical Kubernetes experience
- Expertise in designing, analyzing and troubleshooting large-scale distributed systems
- Ability to debug and optimize scripts and to automate routine tasks
- A systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive
- Experience with AWS, Azure and other cloud platforms
- Experience with Cassandra, Argo CD, ChartMuseum and ChatOps
- Experience with Terraform, Ansible, Vault and Jenkins administration
- Experience with Linux operating systems and network engineering
- Experience with monitoring systems such as Datadog and Prometheus
- Experience with Continuous Delivery
- Ability to define Service Level Indicators (SLIs) and Service Level Objectives (SLOs)
- Bachelor’s degree or equivalent level of experience
Aspect is an equal opportunity/affirmative action employer with a strong commitment to diversity. In that spirit, we are particularly interested in receiving applications from a broad spectrum of people, including women, minorities, individuals with disabilities, veterans or any other legally protected group.