SRE
Job Description
Level 2
Company Profile
We are a leading global financial services firm providing a wide range of investment banking, securities, wealth management and investment management services. With offices in more than 41 countries, the Firm’s employees serve clients worldwide including corporations, governments, institutions and individuals.
Reliability & Production Engineering:
Resiliency Engineering is a production-oriented discipline focused on improving service availability, latency, scalability, performance, and efficiency for technology products in the Company. Our core infrastructure processes hundreds of millions of transactions, and we serve assets of more than a trillion dollars daily. This role is responsible for the design and implementation of the platform and its corresponding frameworks, applications, and gameday exercises for testing critical applications at scale. If this scale resonates with you, come join us.
Job Profile:
Systems Reliability Engineering (SRE) is a discipline focused on improving system service availability, observability, scalability, performance, and resilience across our Company by applying sound software engineering principles and adopting the latest technology and tooling.
We are growing SRE capabilities within our Reliability & Production Engineering (RPE) organization as part of the transformation of the Company’s Technology.
About You:
• You are interested in distributed systems and in working with highly scalable and reliable services.
• You like to work in a fast-moving environment and aren’t afraid to change things to make them better.
• You enjoy new technological challenges and solving hard problems.
• You believe a team working well together is smarter than the single smartest person on that team.
• You have grit, drive, and a deep sense of ownership.
Responsibilities:
• Working closely with engineering/development teams to design, build, and maintain systems.
• Troubleshooting issues across the entire technology stack: hardware, software, application, and network.
• Identifying and driving opportunities to improve automation for our platforms; scoping and creating automation for deployment, management, and visibility of our services.
• Proactively identifying and addressing systems reliability risks.
• Working alongside existing global and regional team members on a follow-the-sun basis.
• Representing the RPE organization in design reviews and operational readiness exercises for new and existing services.
Qualifications – Skill Set:
• Demonstrated ability to troubleshoot problems and debug to identify root cause.
• Hands-on experience with enterprise tools such as AppDynamics, Grafana, Splunk, and Dynatrace.
• Experience with Ansible, GitHub, or other automation, configuration, or release management tools.
• Automation experience using scripting languages such as Python, Bash, or Perl is particularly valued. Experience with one higher-level language is desired.
• Awareness of, and ability to reason about, modern software and systems architectures, including load balancing, databases, queueing, caching, distributed systems failure modes, microservices, and cloud.
• Practical experience running large-scale systems is an advantage.
• Ability to contribute to system design and architecture, backed by strong database knowledge.
Qualifications/Criteria:
• Background in Computer Science/Engineering or similar field.
We are an equal opportunities employer. We work to provide a supportive and inclusive environment where all individuals can maximize their full potential. Our skilled and creative workforce is comprised of individuals drawn from a broad cross section of the global communities in which we operate and who reflect a variety of backgrounds, talents, perspectives and experiences. Our strong commitment to a culture of inclusion is evident through our constant focus on recruiting, developing and advancing individuals based on their skills and talents.
Experience: Intermediate, with 2 to 5 years
Top 3 Must-Haves:
1. Strong experience with Python and/or shell scripting
2. Strong experience with databases (DB2 knowledge is a plus)
3. Strong communication skills. The consultant will work with business users on a day-to-day basis.
Top 2 Nice-to-Haves:
1. Good knowledge of Grafana and Prometheus
2. Good experience with debugging