Cloud SRE Specialist
Job Description
Level 3
An experienced engineer is required to work on public cloud projects in a global financial organization, with opportunities to work on both Azure and AWS. The role is focused on improving operational stability and reliability.
This role requires the successful candidate to have:
– Prior SRE experience or the willingness to specialize in the field
– Experience in transforming IT Service Delivery to a DevOps culture
– Strong infrastructure, development and DevOps skills
– Proven track record of service ownership, accountability and building trusted relationships
– Experience in Change and Incident Management concepts
Primary responsibilities:
– Work in a globally distributed team to provide innovative and robust public Cloud solutions
– Collaborate with vendors to develop and deploy Cloud services to meet customer expectations
– Collaborate with IT Security to ensure necessary controls to Cloud services are deployed and tested
– Design, optimize and document the operational aspects of the Cloud platform
– Develop Infrastructure as Code to automate cloud deployments
– Develop automation workflows in CI/CD pipelines that adhere to the change management process
– Facilitate tabletop exercises for incident management processes and chaos engineering
– Evaluate and implement emerging DevOps tools
– Troubleshoot complex on-premises and cloud environment issues
– Build and integrate observability into cloud platforms and solutions
– Highlight and reduce toil with automation, architecture improvements, and process improvements
Required Skills:
– Experience with Infrastructure as Code
– Experience with CI/CD pipelines
– Sound knowledge of server infrastructure and cloud computing
– Good knowledge of security (SAML, OAuth, OpenID, Kerberos, policies, entitlements, etc.)
– Experience with architecting and maintaining high availability production systems
– Strong development skills in Python
– Experience in software installation, configuration and patching
– Hands-on experience in playbook and infrastructure automation (Ansible, Terraform)
– Implementing open-source observability tools (Prometheus, Grafana, or OpenTelemetry)
– Experience with Agile and DevOps methodologies
– Developing monitoring architecture and implementing monitoring agents, dashboards, escalations and alerts
– Ability to communicate technical issues and ideas to colleagues and customers with clarity
– Experience creating technical architecture documentation
Desired Skills:
– Knowledge of security controls for the public cloud (encryption of data in motion and at rest, and key management)
– Hands-on experience with Azure and/or AWS design and implementation
– Knowledge of Linux and Windows containers
– Experience with open-source cloud and configuration management tools (Terraform)
– Bachelor's degree in a related field