Manager of Site Reliability Engineering

Full Time, Remote

Location: United States

Posted: 19 hours ago

Salary: Not specified

Job Description

This description is a summary of our understanding of the job description.

Role Description

We’re looking for a Manager of Site Reliability Engineering (SRE) who is passionate about building resilient systems and leading teams that keep critical services running smoothly. In this role, you’ll guide a team responsible for the reliability, performance, and operational health of our production environments.

You’ll partner closely with engineering leaders to ensure our systems remain secure, scalable, and available for the organizations and communities who depend on them.

As the Manager of Site Reliability Engineering, you will lead a team responsible for the operational reliability of Daxko’s production platforms. Your work will focus on creating stable, high-performing systems while empowering your team to continuously improve how we operate and support our products.

- Lead and support a team responsible for the reliability and performance of production systems, which includes:
  - Setting clear performance expectations and goals for team members
  - Providing ongoing coaching and real-time feedback
  - Ensuring team members have the training and resources they need to succeed
  - Coordinating on-call rotations and operational coverage
  - Supporting the team during critical incidents and outages
  - Managing team staffing, including hiring and headcount planning
- Prioritize and coordinate work across operational initiatives, deployments, upgrades, and infrastructure improvements
- Ensure high levels of system uptime, data integrity, and operational stability
- Partner with Engineering Leads to align platform operations with product development needs
- Maintain business continuity across all production assets
- Monitor system health, performance, and capacity to proactively identify and resolve issues
- Serve as a technical escalation point for complex infrastructure or platform challenges
- Provide regular reporting on system availability, response times, and capacity trends
- Ensure operations meet security, compliance, and regulatory requirements
- Support and coordinate the team’s on-call rotation and incident response processes
- Continuously improve operational practices through automation, tooling, and monitoring

Qualifications

- Bachelor’s degree in a technical discipline or equivalent professional experience
- 3–5 years of experience leading or managing globally distributed engineering teams
- 3–5 years of experience in a Site Reliability Engineering or similar infrastructure-focused role

Requirements

- Strong analytical and problem-solving skills
- Clear communication and collaboration skills
- Experience leading teams in fast-moving technical environments
- The ability to balance multiple priorities and make thoughtful decisions under pressure
- Strong organizational and time management skills
- A customer-focused mindset and commitment to system reliability

Preferred Experience

- Experience serving as a technical lead on infrastructure or platform teams
- Experience with modern observability and monitoring tools, such as OpenTelemetry, Instana, LogicMonitor, PagerDuty, or OpsGenie
- Experience with infrastructure and automation tooling such as GitLab CI, Jenkins, Chef, Terraform, Elasticsearch, Kubernetes, or Rancher
- Scripting experience in Ruby, Python, or Bash
- Familiarity with SOC, PCI, or GDPR compliance standards
- Experience working with issue tracking and collaboration tools such as the Atlassian suite
- Experience supporting or developing applications built with Java, PHP, or Node
- Experience automating operational processes and repetitive tasks
