GitLab
Build software faster. The One DevOps Platform enables your entire org to collaborate around your code. We're hiring.
Senior Site Reliability Engineer, Database Excellence
Location
California, New York
Posted
1 day ago
Salary
$124.3K - $266.4K / year
Bachelor Degree, English, Ansible, Chef, Kubernetes, PostgreSQL, Puppet, Ruby, SQL, Terraform, Go
Job Description
• Automate operational tasks across all environments, from package updates and configuration changes to provisioning of user-facing services, so manual effort becomes the exception, not the rule.
• Design and maintain PostgreSQL database infrastructure components that allow GitLab.com to scale reliably while supporting hundreds of thousands of concurrent users.
• Respond to production incidents and platform emergencies, working with peer SREs to diagnose and resolve database-related issues quickly and thoroughly.
• Build observability systems that monitor database health, predict capacity needs based on usage patterns, and alert on early symptoms before they become outages.
• Develop and ship database performance solutions in collaboration with product and engineering teams, including query optimization, migration reviews, and infrastructure recommendations.
• Create self-service tools and automation, using Terraform, Ansible, Chef, and GitLab ChatOps, that empower engineering teams to manage their own database interactions safely.
• Document decisions, learnings, and operational procedures so that knowledge turns into repeatable actions and, eventually, into automation.
• Participate in regularly scheduled on-call rotations, including off-hours and weekends when necessary, to keep GitLab.com operational.
Job Requirements
- Hands-on experience running PostgreSQL in high-growth, large production environments, including both self-managed infrastructure and database-as-a-service platforms.
- Expertise with infrastructure automation and configuration management tools such as Ansible, Terraform, Chef, or Puppet to automate operational tasks and drive system reliability.
- Solid understanding of SQL, PL/pgSQL, data modeling, and data structure design; ability to analyze PostgreSQL internals to troubleshoot and optimize systems.
- Experience working in large-scale, distributed SaaS production environments where you've managed reliability, performance, and scalability challenges.
- Strong written communication skills and commitment to documentation; you thrive in remote, asynchronous environments and share knowledge effectively across your team.
- Proactive, hands-on approach where you identify issues, take ownership of solutions, and contribute improvements to infrastructure and code.
- Ability to mentor junior team members, develop deep expertise in your domain areas, and share that knowledge to help others grow.
- Backend engineering experience with languages such as Ruby or Go, and/or familiarity with OLAP databases such as ClickHouse.
- Familiarity with Kubernetes and operators for managing database infrastructure and stateful services in containerized environments.
Benefits
- Benefits to support your health, finances, and well-being
- Flexible Paid Time Off
- Team Member Resource Groups
- Equity Compensation & Employee Stock Purchase Plan
- Growth and Development Fund
- Parental leave
- Home office support