SRE Job Description
We are seeking a skilled and motivated Site Reliability Engineer (SRE) to join our team. As an SRE, you will be responsible for the reliability, scalability, and performance of our systems and services. You will work closely with development and operations teams to identify and resolve issues, automate processes, and improve overall system stability.
Responsibilities
- Monitor system performance and identify potential issues.
- Develop and implement automation tools and processes.
- Troubleshoot and resolve system outages and performance bottlenecks.
- Collaborate with development teams to ensure code quality and reliability.
- Participate in on-call rotations and respond to incidents.
Required Skills
- Experience with monitoring and alerting tools.
- Strong understanding of Linux/Unix systems.
- Proficiency in scripting languages such as Python or Bash.
- Experience with cloud platforms such as AWS, Azure, or GCP.