TalentAQ

SRE Lead

Engineering | Full Time | 10+ years | Overland Park, Kansas

Required Skills
26 skills

Jira
Teams
Confluence
Kubernetes
Docker
AWS
Azure
GCP
Kafka
Cassandra
PagerDuty
xMatters
Splunk
AppDynamics
Grafana
Prometheus
Terraform
Ansible
Python
Bash
ShellScript
ServiceNow
ELK Stack
Elasticsearch
Logstash
Kibana

Job Description

Key Responsibilities and Qualifications

Experience: Minimum of 10 years of experience in a relevant area.
Team Leadership: Strong ability to mentor and manage teams using collaboration platforms such as Jira, Teams, and Confluence. Excellent communication and collaboration skills.
System Design and Architecture: Expertise in designing scalable, reliable systems using Kubernetes, Docker, and cloud services (AWS, Azure, GCP). Experience with Kafka, Cassandra, and other infrastructure tools. Familiarity with middleware technologies such as Kafka, with APIs, and with microservices architecture.
Incident Management: Proficiency in managing incidents with tools such as PagerDuty and xMatters, and in conducting effective post-mortems.
Monitoring and Analytics: Experience with monitoring tools such as Splunk, AppDynamics, Grafana, and Prometheus for proactive issue detection.
Automation: Skilled in using automation tools such as Terraform and Ansible, and scripting languages (Python, Bash, shell script), to streamline workflows.
Capacity Planning: Familiarity with performance analysis and forecasting tools to ensure infrastructure scalability.
SLA/SLO Management: Defining and tracking reliability goals using SRE best practices and tools such as ServiceNow.
Continuous Improvement: Ability to assess system reliability with tools such as the ELK Stack (Elasticsearch, Logstash, Kibana) and to implement enhancements.
