Talent.com
Principal AI/ML Operations Engineer (Pleasanton)
BlackLine • Pleasanton, CA, United States
Job type: Full-time

It's fun to work in a company where people truly believe in what they're doing!

At BlackLine, we're committed to bringing passion and customer focus to the business of enterprise applications.

Since being founded in 2001, BlackLine has become a leading provider of cloud software that automates and controls the entire financial close process. Our vision is to modernize the finance and accounting function to enable greater operational effectiveness and agility, and we are committed to delivering innovative solutions and services to empower accounting and finance leaders around the world to achieve Modern Finance.

As a best-in-class SaaS company, we understand that bringing in new ideas and innovative technology is mission critical. At BlackLine we are always working with new, cutting-edge technology that encourages our teams to learn something new and expand their creativity and technical skillset in ways that will accelerate their careers.

Work, Play and Grow at BlackLine!

The Principal AI/ML Operations Engineer leads the architecture, automation, and operationalization of both machine learning and AI systems at scale. This role defines the strategy and technical standards for ML-Ops and AIOps across the organization, ensuring models and agents are evaluated, deployed, governed, and monitored with reliability, efficiency, and compliance. The candidate will collaborate across AI, data, and product engineering teams to drive best practices for serving, observability, automated retraining, evaluation flywheels, and operational guardrails for AI systems in production.

You'll Get To:

Leadership and Strategy

  • Define enterprise-level standards and reference architectures for ML-Ops and AIOps systems.
  • Partner with data science, security, and product teams to set evaluation and governance standards (Guardrails, Bias, Drift, Latency SLAs).
  • Mentor senior engineers and drive design reviews for ML pipelines, model registries, and agentic runtime environments.
  • Lead incident response and reliability strategies for ML/AI systems.

AI System Deployment and Integration:

  • Lead the deployment of AI models and systems in various environments.
  • Collaborate with development teams to integrate AI solutions into existing workflows and applications.
  • Ensure seamless integration with different platforms and technologies.
  • Define and manage MCP Registry for agentic component onboarding, lifecycle versioning, and dependency governance.
  • Build CI/CD pipelines automating LLM agent deployment, policy validation, and prompt evaluation workflows.
  • Develop and operationalize experimentation frameworks for agent evaluations, scenario regression, and performance analytics.
  • Implement logging, metering, and auditing for agent behavior, function calls, and compliance alignment.
  • Create scalable observability systems tracking conversation outcomes, factual accuracy, latency, escalation patterns, and safety events.
  • Architect end-to-end guardrails for AI agents including prompt injection protection, identity-aware routing, and tool usage authorization.
  • Collaborate cross-functionally to standardize authentication, authorization, and session governance for multi-agent runtimes.

Model Deployment and Integration:

  • Architect and standardize model registries and feature stores to support version tracking, lineage, and reproducibility across environments.
  • Lead the deployment of machine learning models into production environments, ensuring scalability, reliability, and efficiency.
  • Collaborate with software engineers to integrate machine learning models into existing applications and systems.
  • Implement and maintain APIs for model inference.

Infrastructure and Environment Management:

  • Design and manage training infrastructure including distributed training orchestration, GPU/TPU resource allocation, and automatic scaling.
  • Implement CI/CD for model workflows using pipelines integrated with model validation, bias checks, and rollback automation.
  • Build standardized experimentation frameworks for reproducible training, tuning, and deployment cycles (MLflow, W&B, Kubeflow).
  • Manage and optimize the infrastructure required for machine learning operations in the cloud.
  • Work closely with other teams to ensure the availability, security, and performance of machine learning systems.

Monitoring and Maintenance:

  • Implement robust monitoring solutions for deployed machine learning models to detect issues and ensure performance.
  • Collaborate with data scientists and engineers to address and resolve model performance and data quality issues.
  • Conduct regular system maintenance, updates, and optimizations to ensure optimal performance of machine learning solutions.

Automation and Orchestration:

  • Develop and maintain automation scripts and tools for managing machine learning workflows.
  • Implement orchestration systems to streamline the end-to-end machine learning lifecycle, from data preparation to model deployment.

Collaboration with Data Science Teams:

  • Collaborate with data scientists to understand model requirements and constraints for deployment.
  • Facilitate the transition of machine learning models from research to production, ensuring scalability and efficiency.

Performance Optimization:

  • Identify and implement optimizations to enhance the performance and efficiency of machine learning models in production.
  • Conduct performance analysis and implement improvements based on resource utilization metrics.

Security and Compliance:

  • Implement security measures to protect machine learning systems and data.
  • Ensure compliance with regulatory requirements and industry standards related to machine learning and data privacy.
  • Integrate audit controls, metadata storage, and lineage tracking across ML and AI workflows.
  • Ensure complete monitoring and feedback loops including event logs, evaluations, and automated retraining triggers.
  • Enforce secure deployment patterns with Infrastructure-as-Code and cloud-native secrets management.
  • Define SLAs, error budgets, and compliance reporting mechanisms for ML and AI systems.

What You'll Bring:

Education and Experience:

  • Bachelor's or Master's degree in Computer Science, Machine Learning, Data Science, or a related field.
  • 10+ years in ML infrastructure, DevOps, and software system architecture; 4+ years leading MLOps or AIOps platforms.

Technical Skills:

  • Strong programming skills in languages such as Python, Java, or Scala.
  • Expertise in ML frameworks (TensorFlow, PyTorch, scikit-learn) and orchestration tools (Airflow, Kubeflow, Vertex AI, MLflow).
  • Proven experience operating production pipelines for ML and LLM-based systems across cloud ecosystems (GCP, AWS, Azure).
  • Deep familiarity with LangChain, LangGraph, ADK, or similar frameworks for managing agentic system runtimes.
  • Strong competencies in CI/CD, IaC, and DevSecOps pipelines integrating testing, compliance, and deployment automation.
  • Hands-on experience with observability stacks (Prometheus, Grafana, New Relic) for model and agent performance tracking.
  • Understanding of governance frameworks for Responsible AI, auditability, and cost metering across training and inference workloads.
  • Proficiency in containerization technologies (e.g., Docker, Kubernetes).

Operations and Infrastructure:

  • Proficient in scripting languages (e.g., Bash, Python) for automation.
  • Experience with workflow orchestration tools (e.g., Apache Airflow).
  • Expertise in managing and optimizing cloud-based infrastructure.
  • Familiarity with DevOps practices and tools for automated deployment.
  • Understanding of network configurations and security protocols.

Problem-solving and Critical Thinking:

  • Ability to define problems, collect and analyze data, and propose innovative solutions. Strong critical thinking skills to evaluate models, identify limitations, and

Adaptability and Learning Agility:

  • Comfortable working in a fast-paced, rapidly evolving environment. Proactive in staying up to date with the latest trends, techniques, and technologies in AI/data science.

Thrive at BlackLine Because You Are Joining:

  • A technology-based company with a sense of adventure and a vision for the future. Every door at BlackLine is open. Just bring your brains, your problem-solving skills, and be part of a winning team at the world's most trusted name in Finance Automation!
  • A culture that is kind, open, and accepting. It's a place where people can embrace what makes them unique, and the mix of cultural backgrounds and varying interests cultivates diverse thought and perspectives.
  • A culture where BlackLiners' continued growth and learning are empowered. BlackLine offers a wide variety of professional development seminars and inclusive affinity groups to celebrate and support our diversity.
  • BlackLine is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to
