WorkOS
Your app, Enterprise Ready.
Database Reliability Engineer
Location
United States
Posted
8 days ago
Salary
$175K - $275K / year
Bachelor Degree, 5 yrs exp, English, Ansible, AWS, Chef, Cloud, DynamoDB, Grafana, PostgreSQL, Prometheus, Python, Ruby, SQL, Terraform, Go
Job Description
• Own the reliability, performance, and scalability of WorkOS's PostgreSQL infrastructure.
• Analyze and implement best practices for our database clusters, including replication, connection pooling, high availability, and disaster recovery.
• Build and maintain observability for database metrics (query performance, replication lag, connection saturation, storage growth) and ensure we meet our database SLOs.
• Provide database expertise to product engineering teams through migration reviews, query optimization guidance, and schema design consultation.
• Develop automation and self-service tooling that enables engineers to safely interact with databases without bottlenecking on the DBRE team.
• Participate in on-call rotations and lead incident response for database-related production issues, performing root cause analysis and implementing permanent fixes.
• Plan and manage database capacity, forecasting growth and ensuring our infrastructure can handle increased workloads.
• Collaborate with SREs to roll out infrastructure changes to production environments, with a focus on minimizing risk to the data layer.
• Document operational procedures, runbooks, and architectural decisions so that lessons learned become repeatable procedures and, eventually, automation.
• Drive improvements to backup and recovery strategies, regularly testing and validating disaster recovery procedures.
Job Requirements
- 5+ years of experience running PostgreSQL in production at scale, with strong knowledge of internals (WAL, MVCC, vacuum tuning, query planner, indexing, replication).
- Solid software engineering skills. You write production-quality code, not just scripts. Experience with Python, Go, Ruby, or similar languages.
- Experience with infrastructure-as-code and configuration management (Terraform, Ansible, Chef, or similar).
- Strong SQL skills and the ability to review and optimize complex queries for high-throughput, low-latency environments.
- Experience with database high-availability patterns: streaming replication, connection pooling (PgBouncer), failover automation (Patroni or similar).
- Familiarity with cloud database services on AWS (RDS, Aurora, DynamoDB, ElastiCache) or equivalent platforms.
- Experience with monitoring and observability tools (Datadog, Prometheus, Grafana, or similar) applied to database workloads.
- Comfort with on-call responsibilities and a track record of effective incident response.
- Strong written and verbal communication skills. You document your work and share context proactively.
- A proactive, ownership-driven mindset. When you see something broken, you fix it. When you see a pattern of toil, you automate it.
Benefits
- Competitive pay
- Substantial equity grants
- Health insurance (medical, dental, and vision) for you and your family
- 401(k) matching
- Wellness and fitness monthly allowances
- PTO + paid holidays + unlimited sick leave
- Autonomy and flexibility with remote work