Site Reliability Engineer

IT / Software Development & Related
102+ Applicants
Posted: 11 months ago
0-1 years
Bengaluru / Bangalore, Karnataka
Work from Office

Job Description

As a Site Reliability Engineer (SRE), you will take a multifaceted approach to ensuring technical excellence and operational efficiency within the infrastructure domain. Specializing in reliability, resilience and system performance, you will take a lead role in championing the principles of Site Reliability Engineering. By strategically integrating automation, monitoring and incident response, you will facilitate the evolution from traditional operations to a more customer-focused and agile approach. Emphasizing shared responsibility and a commitment to continuous improvement, you will cultivate a collaborative culture that enables organizations to meet and exceed their reliability and business objectives.

Job responsibilities

- You will be responsible for understanding requirements or SRE goals in depth from both tech and business perspectives

- You will provide solutions to improve reliability, including identifying and implementing mechanisms and architectures that enable fault tolerance and faster median time to respond and median time to detect

- You will be responsible for enhancing the incident management process, including the development of an incident prioritization matrix, triage, communication, mitigation, post-mortem analysis and implementation of corrective actions

- You will manage client stakeholder expectations and queries during production incidents, providing detailed technical analysis of issues and remediation plans for mitigation and future prevention, and act as the interface for C-level executives if and when needed

- You will liaise with client engineering teams and build trust and productive relationships with senior client stakeholders and team leads to influence them in making better decisions

- You will be responsible for identifying opportunities to enhance system performance and reliability in alignment with business SLAs, SLOs, KPIs and objectives, and for providing guidance and assistance to SRE teams in implementing the identified improvements

- As an SRE expert, you will collaborate with Thoughtworks application development leads and solution architects, recommending changes in system design and adopting best practices for improved reliability from day one

- You will oversee and mentor other SREs on the team, contributing to their growth and development

Job qualifications

Technical Skills

- You can program in one or more high-level languages such as Python, Golang, Shell scripting, Ruby or Java

- You are familiar with DevOps and GitOps practices, driving the integration of observability automation into CI/CD pipelines, e.g. GitLab, Jenkins, CircleCI or equivalent

- You have in-depth knowledge of configuration management and Infrastructure as Code (IaC) tools such as Terraform, Ansible, ARM and CloudFormation for provisioning and managing infrastructure

- You have expertise in observability, logging, tracing and monitoring tools such as Grafana (Loki and Tempo), Prometheus, Graylog, Jaeger, Zipkin, ELK stack or equivalent

- You have a strong understanding of container-based architecture and hands-on experience with orchestration tools such as Kubernetes, AWS EKS, Docker Swarm, Nomad, etc.

- You have in-depth experience in application and infrastructure performance tuning and scaling to handle heavy loads under different scenarios, e.g. periodic traffic load and tsunami patterns

- You have a good understanding of essential concepts such as quality gates encompassing SLI/SLO/SLA, chaos engineering, golden signals, blameless postmortem methodologies, synthetic monitoring, distributed tracing, end-user monitoring and performance testing

- You have experience with network load balancing, security tech stacks, Transport Layer Security (TLS) and certificate management, and an understanding of standard networking protocols and configurations

Professional Skills

- You have strong communication and articulation skills, and are proficient in English

- You are able to convey resolutions to audiences with varying degrees of technical/business proficiency and bring them to consensus

- You have excellent problem-solving and analytical skills, with a focus on continuous improvement

- You have good listening and presentation skills

- You solve challenging problems and difficult-to-debug issues with a never-give-up attitude

- You can collaborate with cross-functional engineering teams to conduct capacity planning and scalability assessments, and design solutions for handling current and future growth

- You have the ability to work under pressure, with composure, during production incidents

- You understand requirements provided by the client on both technical and business aspects, and can break them down for successful implementation

- You're willing to be part of a rotation- and need-based, 24x7 available team

Skills

C, Python, DevOps, Golang, Java, Kubernetes, Shell Scripting, Testing

About Company

Bounteous is a global experience agency that helps Fortune 500 brands create and implement digital strategies. They specialize in areas like eCommerce, digital marketing, and customer experience.

Important dates & deadlines

Application Deadline

05 Jul 25, 04:19 PM IST
