Description :
Our organization is looking for a Site Reliability Engineer to join its SRE & Application Support organization, specifically supporting the Finance LOB. The organization is standing up a new environment using Apache Airflow and Apache Flink to accommodate new regulatory and compliance requirements that call for real-time reporting of various financial health metrics.
Apache Airflow will be used for job scheduling, while Flink will be used for ETL. The team is looking for an SRE to work in tandem with the development team to create and update logging standards that streamline dashboard creation and ensure the usability of the logging repository.
Python, SRE, Airflow, Apache, Jenkins, Git, load balancing, containerization, Flask
Additional Skills & Qualifications :
Product Development
- Enable creation and updating of logging standards to streamline dashboard creation and ensure usability of logging repository.
- Drive monitoring requirements to ensure business-service level visibility for all support teams.
- Provide guidance to software engineers on design patterns that are resistant to failure.
- Communicate effectively with Development and Operations teams to align on requirements, and drive the SDLC requirements, capabilities, and limitations pertinent to delivering highly resilient applications.
Automation
- Evaluate and implement orchestration, automation, and tooling solutions so that consistent processes and repetitive tasks are performed with greater accuracy and fewer defects.
- Build, implement, and advise on recovery tooling that adheres to enterprise standards and/or frameworks.
- Introduce new and impactful technologies to the production support toolchain that minimize friction for production releases and support, and that help diagnose and recover from production incidents more quickly.
Operational Readiness
- Own availability, proactive monitoring and alerting, capacity planning, and performance (reducing latency and increasing efficiency), including testing, for technical platforms.
- Partner with appropriate supporting teams to ensure operational readiness throughout the application lifecycle.
Production Support
- Ensure application data flows are accurate and up to date, with the objective of increasing the knowledge base of all support teams and driving reliability.
- Facilitate the resolution of non-application issues such as third-party upstream issues and infrastructure, storage, database, network, and file transfer problems.
- Participate in architectural decisions to ensure software transaction flows are appropriately supported and designed.
- Serve as an IT infrastructure Subject Matter Expert (SME) and work with Development teams to build to standards that drive the hi
Experience Level :
Expert Level
About TEKsystems :
We're partners in transformation. We help clients activate ideas and solutions to take advantage of a new world of opportunity. We are a team of 80,000 strong, working with over 6,000 clients, including 80% of the Fortune 500, across North America, Europe and Asia. As an industry leader in Full-Stack Technology Services, Talent Services, and real-world application, we work with progressive leaders to drive change. That's the power of true partnership. TEKsystems is an Allegis Group company.
The company is an equal opportunity employer and will consider all applications without regard to race, sex, age, color, religion, national origin, veteran status, disability, sexual orientation, gender identity, genetic information, or any other characteristic protected by law.