Introduction to the team
We create and deliver an aligned, dedicated marketing strategy to fuel each Expedia Group brand's success. Since our travelers interact with us through our brands, we have a brand focus in our marketing, while leveraging the scale and efficiency we've built in functional expertise.
At Expedia Group, our Growth Marketing team is redefining how data-driven marketing meets AI-powered, agentic automation. We drive performance through best-in-class, scalable machine learning, deploying production-ready multimodal LLMs, GenAI, and agentic NLP architectures. Our agentic systems power everything from personalized travel discovery to real-time, cross-platform campaign execution at massive scale. Explore Trip Matching on Instagram for a glimpse of our production chatbot, which transforms inspiring reels and posts into hotel recommendations.
As an ML Scientist III, your mission is to engineer and scale production AI workflows, agents, and multimodal LLM pipelines, leveraging both in-house and third-party systems. You'll develop agentic solutions with advanced orchestration tools and techniques like LangGraph, Langfuse, RAG, and the latest frameworks, owning end-to-end delivery and optimizations for inference and GenAI systems.
In this role, you will:
- Develop and deploy complex agentic and multimodal AI workflows at production scale, architecting memory-enabled, context-aware agents that drive dynamic automation, personalization, real-time monitoring and content generation.
- Design and orchestrate advanced LLM and RAG-driven agent solutions using modular frameworks including LangGraph, Langfuse, CrewAI, AutoGen, and other emerging agentic orchestration tools, enabling adaptive, stateful, and highly interactive workflow graphs.
- Integrate and optimize complete multimodal GenAI pipelines: text-to-image, image-to-video, text-to-video, and text-to-voice, leveraging and extending in-house and third-party models.
- Fine-tune (parameter-efficient tuning, LoRA, QLoRA, etc.) and evaluate (perplexity, accuracy, creativity metrics) both open-source LLMs (Llama, MPT, Phi-4, etc.) and commercial LLMs (OpenAI GPT-4, Gemini, Claude).
- Pioneer LLM-based simulation and evaluation environments (e.g., LLM-as-a-Judge): leverage the latest LLMs, multi-agent systems, and generative tools to simulate user behavior, accelerate experimental cycles, and rigorously evaluate new features or workflows in both synthetic and production online shopping and travel environments.
- Simulate A/B testing via LLM simulations: build A/B test agents capable of simulating user engagement, behavioral outcomes, and marketing scenarios to complement and replace large-scale online testing, enabling rapid iteration, robust pre-deployment validation, and deeper insight into agentic/LLM-driven feature impact.
- Continuously adopt and iterate on the latest frameworks, staying current with the state of the art in agentic orchestration, memory models, multimodal GenAI, LLM pipelines, and LLM operations (RAG, memory, tracing, evaluation).
- Collaborate deeply with engineering, marketing, and product teams to translate evolving business objectives into scalable and observable AI-driven solutions.
Experience and qualifications:
- 7+ years of experience delivering robust, scalable AI/GenAI solutions at enterprise scale with a Bachelor's degree (4-5+ years of relevant experience with a Master's/PhD in CS, EE, Stats).
- Agentic frameworks experience working with platforms such as LangGraph, Langfuse, RAG, CrewAI, AutoGen, and comparable agent workflow frameworks.
- Ability to design complex agent graphs with memory, tool calling, and multi-step reasoning.
- Multimodal and agentic AI experience with both commercial and open-source models (VLMs, CLIP, Phi-4, DINO, etc.), advanced LLM pipeline design, and shipping both inference and generative AI features to users.
- Hands-on experience with fundamental neural network models (CNNs, Transformers, BERT, VAEs, multimodal models, etc.).
- Proven expertise in architecting persistent, hybrid, and context-managed memory for agentic workflows.
- Strong in Python (PyTorch, HuggingFace, etc.), MLOps, distributed systems (PySpark, Databricks, Airflow), and scalable API engineering.
- Deep experience with tracing, logging, and observability stacks (Langfuse, PromptLayer, Weights & Biases, etc.), including prompt monitoring and model evaluation in live production.
- Ability to translate complex technical systems into actionable solutions for cross-functional teams, leveraging multimodal LLMs and GenAI models.