Benchmark
When It Matters®
Senior Software Engineer, Site Reliability
Location: United States
Posted: 28 days ago
Salary: Not specified
Bachelor Degree, 5 yrs exp, English, AWS, Cloud, Grafana, Prometheus, Python, Terraform
Job Description
• Contribute to the design, development, and delivery of features that enhance system reliability and scalability.
• Define, measure, and improve SLIs, SLOs, and error budgets in collaboration with engineering teams.
• Participate in building a culture of reliability through knowledge sharing, documentation, and process improvements.
• Implement and improve observability tooling and practices to monitor the health and performance of production systems.
• Participate in incident management, including on-call rotations, root cause analysis, and postmortem reviews.
• Lead smaller initiatives or components of larger projects, ensuring technical quality and operational readiness.
• Collaborate with software engineering, security, and product teams to ensure resilient and secure system design.
• Mentor junior engineers, sharing expertise in SRE principles and AWS best practices.
• Contribute to automation efforts to reduce toil and improve efficiency of operational processes.
Job Requirements
- 5+ years of experience in Site Reliability Engineering, DevOps, or Software Engineering with a focus on production operations.
- Strong knowledge of AWS cloud services and cloud-native architectures.
- Proficiency in scripting or programming languages (e.g., Python, Bash).
- Experience with observability tools (e.g., CloudWatch, Datadog, Prometheus, Grafana).
- Familiarity with infrastructure-as-code tools (e.g., Terraform, CloudFormation) and CI/CD pipelines.
- Strong problem-solving skills and ability to work cross-functionally.
- Some experience mentoring or coaching junior engineers.
Benefits
- Health insurance
- Retirement plans
- Paid time off
- Flexible work arrangements
- Professional development