WorkOS

Your app, Enterprise Ready.

Database Reliability Engineer

Full Time | Remote | Team 51-200 | Since 2019

Location

United States

Posted

8 days ago

Salary

$175K - $275K / year

Bachelor Degree, 5 yrs exp, English, Ansible, AWS, Chef, Cloud, DynamoDB, Grafana, PostgreSQL, Prometheus, Python, Ruby, SQL, Terraform, Go

Job Description

  • Own the reliability, performance, and scalability of WorkOS's PostgreSQL infrastructure.
  • Analyze and implement best practices for our database clusters, including replication, connection pooling, high availability, and disaster recovery.
  • Build and maintain observability for database metrics (query performance, replication lag, connection saturation, storage growth) and ensure we meet our database SLOs.
  • Provide database expertise to product engineering teams through migration reviews, query optimization guidance, and schema design consultation.
  • Develop automation and self-service tooling that enables engineers to safely interact with databases without bottlenecking on the DBRE team.
  • Participate in on-call rotations and lead incident response for database-related production issues, performing root cause analysis and implementing permanent fixes.
  • Plan and manage database capacity, forecasting growth and ensuring our infrastructure can handle increased workloads.
  • Collaborate with SREs to roll out infrastructure changes to production environments, with a focus on minimizing risk to the data layer.
  • Document operational procedures, runbooks, and architectural decisions so learnings become repeatable actions and eventually automation.
  • Drive improvements to backup and recovery strategies, regularly testing and validating disaster recovery procedures.

Job Requirements

  • 5+ years of experience running PostgreSQL in production at scale, with strong knowledge of internals (WAL, MVCC, vacuum tuning, query planner, indexing, replication).
  • Solid software engineering skills. You write production-quality code, not just scripts. Experience with Python, Go, Ruby, or similar languages.
  • Experience with infrastructure-as-code and configuration management (Terraform, Ansible, Chef, or similar).
  • Strong SQL skills and the ability to review and optimize complex queries for high-throughput, low-latency environments.
  • Experience with database high-availability patterns: streaming replication, connection pooling (PgBouncer), failover automation (Patroni or similar).
  • Familiarity with cloud database services on AWS (RDS, Aurora, DynamoDB, ElastiCache) or equivalent platforms.
  • Experience with monitoring and observability tools (Datadog, Prometheus, Grafana, or similar) applied to database workloads.
  • Comfort with on-call responsibilities and a track record of effective incident response.
  • Strong written and verbal communication skills. You document your work and share context proactively.
  • A proactive, ownership-driven mindset. When you see something broken, you fix it. When you see a pattern of toil, you automate it.

Benefits

  • Competitive pay
  • Substantial equity grants
  • Health insurance (medical, dental, and vision) for you and your family
  • 401k matching
  • Wellness and fitness monthly allowances
  • PTO + paid holidays + unlimited sick leave
  • Autonomy and flexibility with remote work
