Reliability Engineer, Mechanical Systems, NA

Vantage Data Centers • Santa Clara, California, United States

Job type: Full-time

About Vantage Data Centers

Vantage Data Centers powers, cools, protects and connects the technology of the world's well-known hyperscalers, cloud providers and large enterprises. Developing and operating across North America, EMEA and Asia Pacific, Vantage has evolved data center design in innovative ways to deliver dramatic gains in reliability, efficiency and sustainability in flexible environments that can scale as quickly as the market demands.

Reliability Engineering Department

The Reliability Engineering team is responsible for the overall operating health of critical systems across Vantage's global facilities. For each of the major systems (Electrical, Mechanical, and Controls), the team is responsible for ensuring success in the commissioning stages of new construction, evaluating and improving the reliability and performance of existing critical infrastructure, sustaining equipment operational availability through maintenance program design, providing ongoing technical support to the Site Operations teams, and providing systems reliability and maintainability feedback to the Design Engineering teams for future design considerations.

Position Overview

This role can be based at our data center campus in Quincy, WA or Santa Clara, CA. This is a hybrid role, with 3 days in the office and 2 days working from home.

The Reliability Engineer for Mechanical Systems is responsible for the overall operational health of critical cooling systems. This responsibility includes oversight of facility acceptance testing, mechanical system maintenance planning, and Facility Operations technical support. Requirements for this position include full familiarity with mechanical system arrangements, equipment types, and system automation and control components. The Reliability Engineer understands component failures, the extended effects of those failures on the larger cooling system, and the appropriate mitigating actions. Also required are an understanding of cooling system theory, an ability to read drawings and schematics, and an ability to identify likely failure points in a design. The Reliability Engineer acts as an advisor for Site Operations to consult as needed and has enough experience to train technicians to troubleshoot effectively. The Reliability Engineer regularly interfaces with the Vantage Design Engineering team, the Automation Systems Group, and Facility Operations teams.

Essential Job Functions

System Validation and Testing :

Ensure newly built systems meet design intent and perform without issue during the facility acceptance testing phases of new construction.

Review commissioning plans and ensure the thoroughness of startup testing.

Attend factory witness testing to verify and validate equipment functionality.

Maintenance and Reliability :

Design maintenance programs to enhance equipment operability and efficiency while minimizing life cycle costs.

Design maintenance programs to minimize maintenance complexity and reduce maintenance downtime.

Understand potential equipment failures and provide full technical support to Facility Operations teams in the event of a critical system failure.

Perform system component upgrades as required to ensure reliability and combat obsolescence.

Validate maintenance performance by analyzing trends, operational history, and maintenance data.

Design and Construction Collaboration :

Provide systems reliability and maintainability feedback to the Design Engineering teams for future design considerations.

Work with Design Engineering and Construction teams to ensure the reliability and maintainability of new and modified installations.

Review construction equipment submittals, identify potential deficiencies, and evaluate maintenance feasibility.

Risk Management and Analysis :

Develop risk management plans that will anticipate reliability-related risks that could adversely impact plant operation.

Perform Root-Cause Failure Analysis and facilitate corrective action.

Conduct Critical System Availability and Capacity Analysis, Critical Spare Parts Analysis, Equipment Life Cycle Costing, and End of Useful Life Analysis.

Support and Training :

Provide at-request guidance and technical support to Facility Operators.

Ensure Site Teams are proficient in system operation and maintenance techniques.

Performance and Asset Management :

Work with Site Operations to perform analyses of Asset Utilization, Overall Equipment Efficiency, and Remaining Useful Life.

Design appropriate site response procedures based on potential critical system failures.

Perform Reliability, Availability, and Maintainability Analysis to improve system performance.

Additional Duties :

Handle additional duties as assigned by Management.

Job Requirements

Education :

Bachelor's degree in Electrical Engineering, Mechanical Engineering, or a related field, or equivalent field experience, preferred.

Experience :

2-3 years of experience in critical facility operations and maintenance.

Skills :

Strong organizational and project management skills.

Excellent communication and interpersonal abilities.

Strong attention to detail and accuracy.

Ability to multitask and prioritize effectively in a fast-paced environment.

Ability to work both independently and as part of a team.

Travel required is expected to be up to 25% but may increase over time as the business evolves.

Physical Demands and Special Requirements

The physical demands described here are representative of those that must be met by an employee to successfully perform the essential functions of this job. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.

While performing the duties of this job, the employee is occasionally required to stand; walk; sit; use hands to handle or feel objects; reach with hands and arms; climb stairs; balance; stoop or kneel; talk and hear. The employee must occasionally lift and/or move up to 25 pounds.

Additional Details

Salary Range : $110,000-$120,000 Base + Bonus (this range is based on national market data and may vary in other locations).

This position is eligible for company benefits including but not limited to medical, dental, and vision coverage, life and AD&D, short and long-term disability coverage, paid time off, employee assistance, participation in a 401k program that includes company match, and many other additional voluntary benefits.

Compensation for the role will depend on a number of factors, including your qualifications, skills, competencies, and experience and may fall outside of the range shown.

We operate with No Ego and No Arrogance. We work to build each other up and support one another, appreciating each other's strengths and respecting each other's weaknesses. We find joy in our work and each other, actively seeking opportunities to inject fun into what we do. Our hard and efficient work is rewarded with an above market total compensation package. We offer a comprehensive suite of health and welfare, retirement, and paid leave benefits exceeding local expectations.

Throughout the year, the advantage of being part of the Vantage team is evident with an array of benefits, recognition, training and development, and the knowledge that your contribution adds value to the company and our community.

Don't meet all the requirements? Please still apply if you think you are the right person for the position. We are always keen to speak to people who connect with our mission and values.

Vantage Data Centers is an Equal Opportunity Employer

Vantage Data Centers does not accept unsolicited resumes from search firm agencies. Fees will not be paid in the event a candidate submitted by a recruiter without an agreement in place is hired; such resumes will be deemed the sole property of Vantage Data Centers.

We'll be accepting applications for at least one week from the date this role is posted. If you're interested, we encourage you to apply soon. We're excited to find the right person and will keep the role open until we do!
