Ferns N Petals - Senior Site Reliability Engineer - Cloud Infrastructure
Job Description
We are seeking an experienced SRE Manager to lead and scale our Site Reliability Engineering function. This role will be responsible for ensuring system reliability, scalability, and performance, while driving automation and operational excellence across the platform.
The ideal candidate will have a strong background in DevOps, cloud infrastructure, and distributed systems, along with proven experience in team leadership and incident management.
Key Responsibilities
Team Leadership & Management:
- Lead, mentor, and manage a team of SRE/DevOps engineers.
- Foster a culture of ownership, reliability, and continuous improvement.
- Drive team performance, skill development, and best practices adoption.
- Define and implement SRE frameworks including SLIs, SLOs, and error budgets.
- Ensure high levels of system availability, scalability, and performance.
- Continuously monitor and optimize system health and uptime.
- Drive automation initiatives across infrastructure, deployment, and operations.
- Own and optimize CI/CD pipelines and release management processes.
- Promote infrastructure as code (IaC) and DevOps best practices.
- Lead incident response processes, ensuring timely resolution of production issues.
- Conduct root cause analysis (RCA) and implement preventive measures.
- Establish processes to minimize downtime and improve resilience.
- Design and implement monitoring, logging, and observability frameworks.
- Utilize tools to proactively detect and resolve system anomalies.
- Manage and optimize cloud infrastructure across AWS, Azure, or GCP.
- Ensure efficient resource utilization, cost optimization, and scalability.
- Implement disaster recovery and business continuity plans.
- Collaborate with engineering, product, and operations teams to improve system reliability.
- Align infrastructure and reliability strategies with business goals.
Requirements
- 7+ years of experience in SRE / DevOps roles.
- 3+ years of experience in team management or leadership roles.
- Hands-on experience with cloud platforms (AWS, Azure, or GCP).
- Strong experience with CI/CD tools such as Jenkins, GitLab CI, or similar.
- Expertise in containerization and orchestration (Docker, Kubernetes).
- Proficiency in scripting languages such as Python or Bash.
- Experience with Infrastructure as Code (Terraform, CloudFormation).
- Strong knowledge of monitoring and observability tools (Prometheus, Grafana, ELK).
- Experience working with microservices architecture.
Preferred Qualifications
- Relevant cloud certifications.
- Exposure to high-scale or e-commerce platforms.
- Knowledge of chaos engineering practices.
Key Competencies
- Leadership and team management
- Problem-solving and analytical thinking
- Ownership and accountability
- Stakeholder management and communication
- Continuous improvement mindset
Skills
Python, DevOps, Distributed Systems, Kubernetes, Microservices Architecture, Scripting Languages, Cloud
Important Dates & Deadlines
Application Deadline: 23 Jun 26, 07:28 PM IST

