Site Reliability Engineer

February 24, 2025
Apply Now

Apply for this job

Upload CV (doc, docx, pdf)

Job Description

Location: Montreal (day 1 onboarding onsite – in-office presence required 3x/week)

Customer IAM – Site Reliability Engineer (SRE)

We are seeking a highly motivated and skilled individual to join our global Customer IAM Reliability Engineering team. This role is part of the Identity and Access Management (IAM) department under the Cyber, Data, Risk, and Resilience (CDRR) division. The Site Reliability Engineer is part of a cross-functional DevOps squad, collaborating with Software Engineers, Product Owners, and others located across the globe. The team supports the Company's customer-facing Authentication & Authorization services as well as Credential Management and Login applications.

The ideal candidate will have a strong foundation in the tools listed below and prior experience troubleshooting Java Spring Boot applications running in a Linux environment. This position requires technical expertise, a proactive approach to problem-solving, and intermediate Python programming skills. Whether you are an aspiring junior or an experienced professional, we welcome your application to join our innovative and collaborative team.

Ideal Candidate Qualifications:
• Experience in Identity and Access Management (IAM)
• Previous experience in IT Operations, Reliability Engineering, or DevOps
• Exposure to Agile / DevOps environments
• Knowledge of Site Reliability Engineering (SRE) principles and methodology
• ITIL, CISSP or similar certification (optional)
• Strong knowledge of TCP/IP
• Working knowledge of SAML, OIDC, and OAuth
• Experience with Java and Spring Boot Framework
• Knowledge of industry-standard observability products (such as Grafana, Prometheus, and Splunk)
• Champion of Infrastructure as Code (IaC)
Responsibilities:
• Provide first-line support during large-scale outages, including post-mortems, pre-mortems, and problem management, with data-driven strategies and a code-first approach to problem solving.
• Prepare and execute change management activities, often automating and creating tools where necessary.
• Collaborate with partner enterprise technology teams and provide support to stakeholders and our L2 operations team.
• Contribute to performance and training assessments of team members.
• Improve the stability of our platforms by identifying and implementing alert automation and self-healing functions where possible.
• Performance and scalability: ensure systems can scale seamlessly to handle increased load, and monitor the performance of our applications and infrastructure using our service level objectives (SLOs) and service level indicators (SLIs).
• Participate in on-call rotations (weekday and weekend cycles).
• Share responsibilities and knowledge across the team, engaging with our community and stakeholders to gather feedback to improve our systems.
• Address security and compliance issues, ensuring that our systems meet industry standards and implement best practices for security and data protection.
Technical Skills:
• Knowledge of enterprise security standards and concepts
• Understanding of general enterprise infrastructure concepts and troubleshooting, including network, storage, web infrastructure, middleware, etc.
• Basic knowledge of operating system administration on Windows (Active Directory) and Red Hat Linux platforms (RHEL 7+)
• Proficiency in at least one scripting language such as PowerShell, Python, or Bash/Shell
• Functional knowledge of C/C++ and Java can be advantageous for some tooling.
• Experience working within large enterprise architectures.
• Familiarity with the Software Development Life Cycle (SDLC) and development environment tooling (GitHub, Jenkins, Visual Studio Code, etc.)
• Familiarity with visualization, planning, and incident management tools such as Splunk, Grafana, ServiceNow, Jira, Bitbucket, PagerDuty, and Power BI
• Strong interest in automation, zero-downtime deployments, and using code to solve operational issues.
Recommended:
• Foundational knowledge of authentication protocols in the broader IAM domain, such as OpenID Connect, SAML, Kerberos, and RADIUS, and of multifactor authentication solutions like RSA SecurID, Cisco Duo Security, FIDO, etc.
Soft Skills:
• Excellent written and oral English communication skills; capable of writing documentation, making presentations, and positively interacting with colleagues and customers
• Independent problem-solving attitude, highly motivated, and self-directed
• Comfortable working within an operations and support team with end-user interaction and periodic on-call responsibilities
• Advocate of SRE principles
• Good organizational skills