Job Description Summary:
• We are seeking a Machine Learning Ops (MLOps) Engineer to build and support a scalable, highly available, and robust Machine Learning (ML)/Deep Learning (DL) platform for the client's Data & Analytics organization. The platform uses ML/DL frameworks, High-Performance Computing (HPC) machines, and data science tools, products, and services, both in the cloud and on-premises.
• This role will expose you to cutting-edge ML/DL technologies. The ideal candidate is driven, focused, and enthusiastic about learning and implementing new technologies.
Responsibilities:
• Build, install, configure, manage, and scale a state-of-the-art machine learning platform in the cloud (Azure preferred) and on-premises, powering the client's Data & Analytics products and solutions.
• Work with data scientists, architects, DevOps engineers, and vendors to implement scalable ML/DL solutions in the cloud and on-premises to solve complex problems.
• Create and maintain ML/DL pipelines and overall ML/DL workflow orchestration, including but not limited to data collection, preparation, transformation, analysis, experimentation, training, validation, serving, and monitoring.
• Implement ML/DL solutions that address performance, scalability, and the governance and traceability of machine learning models.
• Iterate quickly on the latest technologies, products, and frameworks, and research the latest developments in ML/DL frameworks, tools, and services.
Qualifications:
• 4+ years of experience delivering DevOps and MLOps in a production/enterprise setting.
• Bachelor's degree required; Master's degree in Computer Science or Data Science preferred.
• Excellent written and oral communication and presentation skills.
• Experience in a technical role involving platform and infrastructure operations.
• System administration experience with Unix or Linux systems.
• Container-based deployment experience using Docker and Kubernetes.
• Proficient with the machine learning modelling lifecycle and comfortable addressing both the functional and technical aspects of model delivery.
• Experience managing and deploying large distributed systems such as Spark, Dask, and H2O, as well as heterogeneous platform components.
• Experienced with programming languages such as Python or R, and comfortable with the statistical foundations of the most commonly used ML algorithms.
• Experienced with machine learning frameworks such as scikit-learn, Keras, Theano, TensorFlow, and Spark MLlib.
• Preferred: hands-on experience with IBM Watson Machine Learning or related systems.
• Preferred: hands-on experience with HPC (NVIDIA, CUDA).
• Preferred: experience with configuration management tools such as Ansible and Puppet.
• Preferred: experience with monitoring and performance analysis of machine learning platforms using tools such as Grafana and Zabbix.