Site Reliability Engineer
Job Description – Site Reliability Engineer
Equinix is the world’s digital infrastructure company, operating 210 data centers across the globe and providing interconnections to all the key clouds and networks. Businesses need one place to simplify and bring together fragmented, complex infrastructure that spans private and public cloud environments. Our global platform allows customers to place infrastructure wherever they need it and connect it to everything they need to succeed.
At Equinix, we help the world’s digital leaders scale with agility, speed the launch of digital services, deliver world-class experiences, and transform people’s lives. Our culture is based on collaboration and the growth and development of our teams.
We hire hardworking people who thrive on solving challenging problems and give them opportunities to hone new skills and try new approaches as we grow our product portfolio with new software and network architecture solutions. We embrace diversity in thought and contribution and are committed to providing an equitable work environment that is foundational to our core values as a company and vital to our success.
Responsibilities of Site Reliability Engineer:
As part of this team, you will:
- Be responsible for the design and implementation of new strategies in an Agile environment to optimize all aspects of the CI, release, and deployment processes using the latest container and virtualization technologies (Docker, Kubernetes, Ansible, AWS ECS, etc.).
- Provide DevOps architecture implementation and operational support.
- Design, implement, and extend automation tools for infrastructure, application, and container management.
- Monitor production, staging, test and development environments for a myriad of applications in an agile and dynamic organization.
- Strive to improve the stability, security, efficiency, scalability, and availability of production systems by applying software engineering practices and by implementing application monitors and alerts.
- Determine future capacity needs and investigate new products and/or features.
- Produce, update, and/or endorse Site Reliability Engineering standards, guidelines, and procedures.
Qualifications:
- Excellent troubleshooting skills with the ability to learn new technologies in complex distributed systems.
- A successful Engineer will take steps on their own to isolate issues and resolve the root cause through investigative analysis.
- The Engineer should be a focused, independent problem-solver capable of deftly handling multiple competing priorities and delivering solutions in a timely manner.
- BS in Computer Science or related field.
- 6 to 8 years of relevant industry experience overall.
- At least 4 years of relevant experience (e.g., SRE or DevOps).
- Experience working as a Site Reliability or DevOps Engineer for highly available systems.
- Experience working in the AWS public cloud is a must.
- Experience leveraging open technologies such as Docker and Kubernetes is a must.
- Experience managing and administering at least one CI/CD tool (Jenkins, Argo, GitHub Actions, etc.).
- Experience working with Git.
- Experience with at least one IaC framework (Terraform, Ansible, etc.).
- Basic knowledge of server virtualization and storage.
- Experience in automating SRE tasks.
- Experience working with Agile Scrum teams.
- Openness to learning new tools and technologies.
- Excellent oral and written communication in English.
Good to Have:
- GitHub administration experience.
- Experience with any of the following monitoring tools: AppDynamics, Datadog, Nagios, Dynatrace, New Relic, or Prometheus.
Please mention the Mamo Pracuj portal when submitting your application.