Zendesk

Staff Machine Learning Engineer

  • Date Posted:

    8/20/2025

  • Remote Work Level:

    Hybrid Remote

  • Location:

    Hybrid Remote in Lisbon, Portugal

  • Job Type:

    Employee

  • Job Schedule:

    Full-Time

  • Career Level:

    Experienced

  • Travel Required:

    Not specified

  • Education Level:

    Not specified

  • Salary:

    Not specified

  • Categories:

    Economics, Math, Data Science, Software Engineer, Python, Ruby on Rails

About the Role

  • Remote Type:

    In Office

  • Location:

    Lisbon, Portugal

  • Time Type:

    Full-Time

  • Job Requisition ID:

    R31623

Job Description

 

Zendesk’s people have one goal in mind: to make Customer Experience better. Our products help more than 125,000 global brands (Airbnb, Uber, JetBrains, and Slack, among others) keep their billions of customers happy, every day.

 

Our team helps Customer Experience teams achieve their best by intelligently automating repetitive work, so they can shift their focus to more sophisticated problems. We use the latest advances in Machine Learning and AI to support that mission, and we're passionate about empowering our customers.

 

As a Staff ML Engineer, you will be a technical leader who shapes the vision and execution of ML/AI products at scale. You'll drive architecture and design decisions, influence the broader engineering strategy, lead complex projects that span teams, and mentor the next generation of technical talent. Your work will impact millions of users and set standards for ML engineering across Zendesk.

 

What you get to do every day

  • Architect, design, and deliver ML-powered systems and features (e.g., intent detection, sentiment/language analysis, intelligent agent routing, chatbots) at a global scale with reliability, efficiency, and maintainability as core principles.

  • Design, build, and optimize scalable, reliable ML pipelines for processing large volumes of structured and unstructured text data (including real-time customer conversations).

  • Collaborate with ML Scientists and Product teams to productionize new models and LLM-powered services, and to experiment with emerging AI technologies in the context of intelligent triage.

  • Lead large-scale, high-impact initiatives: define technical roadmaps, validate trade-offs, deliver on timelines, and ensure excellence across the entire ML lifecycle.

  • Develop and evolve MLOps processes (CI/CD, model versioning, monitoring, and observability), ensuring efficient model deployment and high system reliability.

  • Mentor and support junior engineers; share knowledge of model development, deployment, and best practices.

  • Drive technical reviews, provide guidance on complex system design, and rapidly resolve critical production issues involving ML models, pipelines, and infrastructure.

  • Act as a mentor, coach, and multiplier: elevate technical expertise within and beyond your immediate team by sharing knowledge, leading by example, and championing continuous learning and growth.

  • Represent Zendesk in technical forums and contribute to the broader ML engineering community.

 

Key challenges / use cases

  • How do we enrich customer service conversations with accurate language detection, intent recognition, and real-time sentiment analysis, to enable proactive customer engagement and optimal routing?

  • How can we automate as much of each customer service interaction as possible, from process automation to agent assistance and knowledge-base-backed chatbots?

  • How do we optimize routing at scale, matching tickets or chats to the most appropriate agent or team in real time across multiple languages and regions?

  • How do we automate large-scale A/B testing and model evaluation (online and offline) to continually iterate and improve ML-driven triage and agent-assist tools?

  • How do we extend our retrieval and information extraction platforms to support new conversational AI use cases?

  • How do we efficiently serve and monitor large ML/LLM models in a high-throughput, low-latency production environment?

  • How do we combine signals from conversation context, customer history, and external data to improve prediction and decision accuracy across our ML services?

  • How do we ensure fairness, explainability, and compliance in ML-driven customer interactions?

  • And many more!

What you bring to the role

  • Proven track record as a software engineer, with a strong focus on Python-based development.

  • Advanced proficiency with scalable data processing frameworks (e.g., Spark, AWS Batch, Airflow), distributed databases, and designing reliable data models for heterogeneous datasets.

  • Demonstrated technical leadership: ability to define system vision, steer architecture, make trade-off decisions, and coordinate complex projects across teams.

  • Experience with MLOps: CI/CD for ML, monitoring, model registries, automated retraining, and rollback.

  • Familiarity with cloud environments (AWS preferred, but GCP/Azure experience valued), and microservices architectures (Kubernetes, Docker).

  • Track record of integrating modern NLP/LLM stacks and libraries (HuggingFace, OpenAI, etc.) into large-scale, customer-facing products (a minimal illustrative sketch follows this list).

  • A self-managed and dedicated approach with the ability to work independently.

  • Exceptional problem-solving skills, technical judgment, and the ability to drive innovation in ambiguous, fast-evolving business contexts.

  • Strong mentorship, communication, and cross-team collaboration skills; ability to uplevel peers and shape culture standards within Engineering.

  • Commitment to staying current with advances in ML/AI, sharing knowledge, and driving state-of-the-art solutions organization-wide.

  • Ability to mentor, review code, and drive technical excellence within a multi-disciplinary team.
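
As a concrete flavor of the NLP/LLM integration mentioned above, here is a minimal, hypothetical sketch (not Zendesk's production code) that scores a customer message with the HuggingFace transformers pipeline; the model checkpoint is an arbitrary public one chosen purely for illustration:

    # Illustrative sketch only; not Zendesk's actual code or models.
    # Assumes `pip install transformers torch`; the checkpoint below is a
    # public sentiment model chosen for demonstration purposes.
    from transformers import pipeline

    classifier = pipeline(
        "sentiment-analysis",
        model="distilbert-base-uncased-finetuned-sst-2-english",
    )

    # Score a customer message as it might flow through intelligent triage.
    result = classifier("My order arrived late and nobody has replied to my ticket.")
    print(result)  # e.g. [{'label': 'NEGATIVE', 'score': 0.99}]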

 

What our tech stack looks like

  • Our code is written in Python and Ruby.

  • Our servers live in AWS.

  • Our machine learning models rely on PyTorch.

  • Our ML pipelines use AWS Batch and Metaflow (a minimal sketch follows this list).

  • Our data is stored in S3, RDS MySQL, Redis, Elasticsearch, Snowflake, and Aurora.

  • Our services are deployed to Kubernetes using Docker, and use Kafka for stream-processing.
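
To make the pipeline bullet above concrete, here is a minimal, hypothetical Metaflow flow of the kind described (illustrative only: the flow name, data path, and resource numbers are assumptions, and the @batch step presumes AWS Batch is configured):

    # Illustrative sketch only; not Zendesk's actual pipeline code.
    # Assumes `pip install metaflow`, with AWS Batch configured for @batch steps.
    from metaflow import FlowSpec, batch, step

    class TriageTrainingFlow(FlowSpec):

        @step
        def start(self):
            # Placeholder input location; a real flow would load conversation data.
            self.data_path = "s3://example-bucket/conversations/"
            self.next(self.train)

        @batch(cpu=4, memory=16000)  # dispatch this step to AWS Batch
        @step
        def train(self):
            # A real step would train a PyTorch model here; stubbed for brevity.
            self.accuracy = 0.0
            self.next(self.end)

        @step
        def end(self):
            print(f"Training finished, accuracy={self.accuracy}")

    if __name__ == "__main__":
        TriageTrainingFlow()

Run locally with python triage_flow.py run; Metaflow versions each run's artifacts automatically, which dovetails with the MLOps practices (model versioning, monitoring, observability) listed earlier.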

Hybrid: In this role, our hybrid experience is designed at the team level to give you a rich onsite experience packed with connection, collaboration, learning, and celebration, while also giving you the flexibility to work remotely for part of the week. You will be expected to attend our local office for part of the week; the specific in-office schedule will be determined by the hiring manager.
