Talent.com
Data Engineer
Institute of Foundation Models • Sunnyvale, CA, US
30 days ago
Job type
  • Full-time

Job Description

About the Institute of Foundation Models

We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.

As part of our team, you’ll have the opportunity to work on the core of cutting-edge foundation model training, alongside world-class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will participate in the development of groundbreaking AI solutions that have the potential to reshape entire industries. Strategic and innovative problem-solving skills will be instrumental in establishing MBZUAI as a global hub for high-performance computing in deep learning, driving impactful discoveries that inspire the next generation of AI pioneers.

The Role

As a Data Engineer specializing in Natural Language Processing (NLP) and large-scale data processing, you will quickly and effectively gather, curate, and prepare high-quality datasets to support cutting-edge NLP research. Your role will be instrumental in enabling researchers by delivering essential data through efficient and scalable engineering practices, including web crawling, LLM-generated content refinement, and robust data pipelines, primarily leveraging Python and related technologies.

Key Responsibilities

  • Rapidly collect, curate, and preprocess datasets based on detailed specifications provided by NLP researchers, delivering data within tight timelines.
  • Develop and maintain efficient web crawling solutions, APIs, and automated workflows to continuously improve data collection processes.
  • Refine and evaluate outputs from Large Language Models (LLMs) to generate structured datasets suitable for model training and benchmarking.
  • Implement scalable data pipelines, ensuring efficient data processing, storage, retrieval, and distribution to research teams.
  • Collaborate closely with researchers and engineers to ensure collected data meets specified quality and relevance criteria.
  • Document data collection methodologies, dataset characteristics, and pipeline architecture clearly and effectively.
  • Engage with peer teams and participate in technical reviews to uphold best practices and data quality standards.
  • Represent MBZUAI at industry and research forums, showcasing technical capabilities in large-scale data processing and AI data infrastructure.

Academic Qualifications

  • Bachelor's degree in Computer Science, Data Science, Engineering, or a related technical field required.
  • Master’s degree, PhD, or equivalent experience in Computer Science, Data Engineering, or a related technical field preferred.

Professional Experience - Required

  • Extensive experience in data engineering, data processing, and automation using Python.
  • Demonstrated proficiency in designing and deploying web crawling solutions, automated data extraction, and processing pipelines.
  • Strong understanding of data structures, algorithms, databases, SQL, and performance optimization.
  • Experience working with cloud infrastructure and distributed data processing frameworks (e.g., AWS, Spark, Kafka, Kubernetes).
  • Excellent problem-solving abilities, attention to detail, and the capability to rapidly address technical challenges.
  • Strong communication and collaboration skills with cross-functional teams.

Professional Experience - Preferred

  • Proven track record of supporting NLP or AI research teams with rapid and reliable data delivery.
  • Experience working with large language models, including evaluation, efficient inference, and prompt engineering.
  • Experience with refining outputs from large-scale AI models, such as LLM-generated data.
  • Contributions to open-source projects, coding competitions, or high visibility in coding communities (e.g., GitHub, Stack Overflow).
  • Familiarity with the latest advancements in NLP data processing and large language model technologies.

Visa Sponsorship

This position is eligible for visa sponsorship.

Benefits Include

  • Comprehensive medical, dental, and vision benefits
  • Bonus
  • 401K Plan
  • Generous paid time off, sick leave and holidays
  • Paid Parental Leave
  • Employee Assistance Program
  • Life insurance and disability