Job Overview
We are seeking an experienced Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of our systems. The ideal candidate has a strong background in both software development and operations, with a focus on automation and proactive problem-solving. As an SRE, you will play a critical role in maintaining our infrastructure and keeping service delivery uninterrupted.
Key Responsibilities
- Implement and manage infrastructure as code using Terraform and Pulumi.
- Design and implement automated deployment pipelines using Spinnaker.
- Monitor system performance and identify potential bottlenecks.
- Participate in incident response and root cause analysis.
- Collaborate with development teams to improve system design and reliability.
Required Skills
- Proficiency in Terraform and Pulumi for infrastructure management.
- Experience with Spinnaker for continuous delivery.
- Strong understanding of cloud computing principles.
- Excellent problem-solving and troubleshooting skills.