Job Title: Site Reliability Engineer (SRE)
Experience: 12 - 15+ Years
- Strong hands-on experience with Grafana (Must Have)
- Experience in Incident Management (Must Have)
We are looking for a skilled Site Reliability Engineer (SRE) with strong experience in monitoring, incident management, and cloud-native environments. The ideal candidate should have hands-on expertise in Grafana, Kubernetes, and AppDynamics, a solid understanding of production support, and basic Java knowledge.
Key Responsibilities:
- Monitor system performance, availability, and reliability using tools like Grafana and AppDynamics
- Manage and respond to production incidents, ensuring quick resolution and minimal downtime
- Implement and improve incident management processes, including RCA (Root Cause Analysis)
- Work with development and DevOps teams to ensure system reliability and scalability
- Deploy, manage, and troubleshoot containerized applications in Kubernetes environments
- Automate operational tasks and improve system efficiency
- Analyze logs, metrics, and traces to identify and resolve performance bottlenecks
- Participate in on-call rotations and support critical production systems
Skills and Qualifications:
- Strong hands-on experience with Grafana (Must Have)
- Experience in Incident Management (Must Have)
- Hands-on experience with AppDynamics
- Solid experience in Kubernetes
- Basic knowledge of Java for debugging and understanding application logs
- Familiarity with monitoring tools, alerting, and observability practices
- Experience with Linux/Unix environments
- Understanding of CI/CD pipelines and DevOps practices
- Experience with cloud platforms (AWS/Azure/Google Cloud Platform)
- Knowledge of scripting (Shell/Python)
- Exposure to microservices architecture
Job ID: 523507395
Originally Posted on: 6/3/2026