Talent.com
Data Scientist (Machine Learning & Pipeline Engineering)
Kalamata Capital, LLC. • San Francisco, CA, US
Job type: Full-time

Job Description

About Us

Kalamata Capital Group is a forward-thinking financial technology company committed to leveraging data-driven intelligence to support small business growth. We are seeking a highly skilled Data Scientist to develop predictive models, perform robust exploratory data analysis, and build scalable data pipelines that power key business decisions across the organization.

Summary

The ideal candidate is an experienced data scientist with deep technical expertise in machine learning, data engineering workflows, and statistical modeling. This role will work closely with engineering, product, and analytics teams to design, validate, and deploy ML solutions that improve decision-making efficiency. Strong proficiency in Pandas, PySpark, and MongoDB is essential, along with the ability to write clean, reproducible, production-ready code. The successful candidate will be equally comfortable communicating complex analytical insights to non-technical stakeholders.

Key Responsibilities

Exploratory Analysis & Data Profiling: Conduct EDA on large, complex datasets using Pandas and PySpark; assess data quality and structure.

Model Development: Build, tune, and evaluate supervised and unsupervised machine learning models (e.g., tree-based methods, regressions, boosting algorithms).

Pipeline Engineering: Design and implement reliable, maintainable machine learning pipelines and preprocessing workflows for production environments.

Data Management: Query and integrate MongoDB datasets; design efficient schemas and aggregation pipelines that support analytical and operational workloads.

Visualization: Create intuitive visualizations using seaborn, plotly, and matplotlib to support model diagnostics and business storytelling.

Reproducible Code: Write clean, modular, well-documented Python code (PEP 8 compliant); maintain version control using Git.

Model Explainability: Apply model interpretation tools such as SHAP and LIME to evaluate feature impact and improve transparency.

Cross-Functional Collaboration: Partner with engineering, analytics, and product teams to translate business needs into actionable model-driven solutions.

Documentation: Produce clear technical memos, reports, and model documentation for internal stakeholders.

Required Skills & Qualifications

Education & Experience:

  • M.S. in Computer Science, Machine Learning, Computational Biology, or a related quantitative field plus 3+ years of relevant experience, or an equivalent combination of education and applied work.
  • Strong foundation in Linear Algebra, Probability, and Statistics.

Technical Expertise:

  • Advanced proficiency with Pandas and PySpark for data cleaning, reshaping, merges, feature engineering, and workflow optimization.
  • Strong experience with MongoDB, including querying, indexing, and aggregation pipelines.
  • Deep knowledge of supervised/unsupervised ML techniques and tools (scikit-learn, XGBoost).
  • Solid understanding of optimization, regularization, loss functions, and evaluation metrics (AUC, precision, recall, RMSE).

Core Skills:

  • Experience delivering end-to-end ML projects (data ingestion → modeling → evaluation → optional deployment).
  • Ability to write clean, reproducible code and maintain organized notebooks/scripts.
  • Excellent communication skills with the ability to translate analysis into business insights.
  • Ability to relocate to the New York metro area.

Preferred (Bonus) Skills

  • Experience with AWS tools (Glue, S3, DMS).
  • Familiarity with deep learning frameworks (PyTorch, TensorFlow).
  • Experience deploying models using FastAPI, Flask, AWS, or GCP.
  • SQL, data warehousing, or data versioning experience.
  • Software engineering best practices (testing, CI/CD, code review).
  • Link to GitHub, GitLab, or portfolio of analytical/ML code.

Flexible work-from-home options available.
