AI Agent Engineer & Enablement Lead

Contract, Remote

Location

United States

Posted

10 hours ago

Salary

Not specified

Job Description

Please note: This is an independent contractor role. The benefits described below are applicable to full-time employees only.

Three-month contract to start.

100% remote; the work must be performed in the United States.

Cover letters are welcomed, appreciated, and reviewed (by a human).

Finally, before you apply, make sure you have the following experience. A human will review your application, but we will pass on it if you lack:

Hands-on experience building AI-enabled applications (LLM apps, tool-using agents, or workflow automation)
Strong prompt engineering skills: ability to write system prompts, skill definitions, and eval rubrics that produce consistent, high-quality agent behavior.
Strong testing and ops discipline: unit/integration tests, monitoring/logging, and incident response.
Demonstrated ability to teach and coach — whether through mentoring, workshops, pair programming, or documentation. You should enjoy making others more capable, not just shipping your own work.
4+ years of software engineering experience (backend, integrations, automation, platform).

About PrescriberPoint:

The brainchild of David Ricks, CEO of Lilly, and incubated by Boston Consulting Group’s Digital Ventures, PrescriberPoint is a Series A healthtech whose mission is to help Health Care Professionals reduce the time, tension, and anxieties they and their patients encounter during the prescribing process. We are funded to date by Lilly, Pfizer, Adobe, and MasterCard.

Why this role exists

We're rolling out AI agents that do real work across the organization — offloading administrative and operational tasks in Sales, Marketing, Customer Support, and Ops. We've already built a plugin marketplace with 28 agent plugins, 100+ skills, custom CLI tooling, and an eval framework. We need someone who can build new agents, harden what exists, and coach the rest of the team to build their own.

This is not an R&D sandbox. You will be measured by what ships, reliability in production, adoption by the team, and — critically — whether others can build and maintain agents without you.

What you'll own

You will own agent outcomes in production and team capability end-to-end:

Workflow discovery → agent design → build → test → deploy → monitor → iterate
Tool integrations (CRM, helpdesk, BI, docs, comms) via lightweight CLI tools that agents invoke as primitives
Quality + safety standards that prevent trust-breaking failures
Production operations: evals, logging/traceability, dashboards, incident response, and regression prevention
A repeatable agent factory (templates, shared skills, reusable connectors, scaffolding tools) that increases throughput without sacrificing quality
Team enablement: coaching staff across all functions to discover, spec, build, and maintain their own agents

What you'll do

Find the wedge + ship

Shadow functional teams, map workflows, and identify the highest-leverage admin tasks to automate.
Turn those into a tight sequence of releases: MVP → v1 → v2.
Translate business workflows into agent specifications through collaborative discovery with non-technical stakeholders.

Build real agents (not demos)

Implement agents using Claude Code's plugin architecture: agent identity files, SKILL.md skill definitions, subagent orchestration, and tool-use patterns.
Write clear, structured prompts (system prompts, skill instructions, eval rubrics) that produce reliable, repeatable agent behavior.
Build agents that run both:

Attended mode (human-in-the-loop approvals, confidence cues)
Autonomous mode (policy-based execution, safe escalation, auditable actions)

Engineer the integrations and runtime

Build and maintain lightweight Python/Typer CLI tools that serve as the connective tissue between agents and business systems (CRM, ticketing, BI/warehouse, knowledge base, email/calendar).
Design clean tool interfaces that are both human-usable at the terminal and agent-friendly via tool-use declarations.
Write and maintain production code in Python and/or TypeScript.
Design for reliability: idempotency, retries/backoff, rate limiting, timeouts, and graceful degradation.

Own quality + operability

Define and implement evals: golden-set test cases, regression suites, fixture-based grounding checks, and launch checklists using Promptfoo or similar frameworks.
Write eval rubrics and assertion layers that catch hallucination, format violations, and instruction drift.
Debug prompt-level issues — not just code bugs, but behavioral regressions in agent output.
Implement observability: structured logs, traces, tool-call auditing, failure clustering, and per-agent health dashboards.
Triage production issues, run postmortems, and prevent repeat failures through tests and guardrails.

Coach the team to build their own agents

Run hands-on workshops that take non-technical staff from "I have a repetitive task" to "I have a working agent."
Pair with team members across functions to co-build agents — not just build for them.
Create and maintain playbooks, templates, and guardrails that lower the bar so anyone on the team can ship an agent safely.
Establish patterns and conventions that make the agent ecosystem self-service over time.
Communicate agent capabilities and limitations honestly — no vapor, no overpromising.

Drive adoption

Deliver workflow-native entry points (Slack commands, CRM buttons, ticket macros, internal UI).
Document runbooks and "how to trust this" guidance based on real capability.
Measure adoption and iterate based on usage data, not assumptions.

Engineering background we expect

This role requires strong software engineering fundamentals:

Experience building and shipping backend systems / web services
Comfort with APIs, auth (OAuth/service accounts), permissions/RBAC, and secrets management
Understanding of system design tradeoffs: latency/cost, scalability, reliability, and failure modes
Comfortable with Docker and containerized deployments (for CLI tools and supporting infra)
Experience with CI/CD pipelines and production deployment workflows

Required qualifications

4+ years professional software engineering experience (backend, integrations, automation, platform).
Production coding experience in Python and/or TypeScript.
Hands-on experience building AI-enabled applications (LLM apps, tool-using agents, or workflow automation) with a focus on reliability and evaluation.
Strong prompt engineering skills: ability to write system prompts, skill definitions, and eval rubrics that produce consistent, high-quality agent behavior.
Strong testing and ops discipline: unit/integration tests, monitoring/logging, and incident response.
Demonstrated ability to teach and coach — whether through mentoring, workshops, pair programming, or documentation. You should enjoy making others more capable, not just shipping your own work.

Preferred qualifications

Experience with Claude Code (plugin authoring, skill design, subagent orchestration) or deep familiarity with Anthropic's tool-use patterns.
Experience building evaluation pipelines for LLM/agent quality (task success, groundedness, hallucination rate, context faithfulness).
Familiarity with Promptfoo or similar eval frameworks for output-quality testing.
Experience building and maintaining CLI tools (Python/Typer, Click, or similar) as integration primitives.
Experience integrating with CRM/helpdesk/BI systems (e.g., HubSpot, Zendesk, Snowflake, Google Workspace APIs).
Experience in regulated environments (healthcare/pharma) with auditability, data minimization, and access controls.
Docker experience for containerizing CLI tools and supporting services.

What success looks like (in 90 days)

You've shipped 2–3 new agents to the plugin marketplace with full eval coverage and monitoring.
You've coached at least 2 non-engineering team members through building their own agent or skill — and they can maintain it independently.
You've expanded eval coverage significantly (from baseline to 6+ plugins covered).
You've documented the "how to build an agent" playbook that any team member can follow end-to-end.
You've established yourself as the go-to person for agent quality, and the team trusts the agents you've shipped.

This role is not for you if…

You prefer prototypes to production ownership.
You don't want to write code, debug integrations, and own reliability post-launch.
You avoid accountability for business impact and adoption.
You need fully defined requirements and a roadmap handed to you.
You'd rather build frameworks than solve business problems.
You can't explain technical concepts to non-technical people.
You think "agent" means a chatbot with a system prompt.

So, why (on earth!) would you want to leave what you’re doing and join us?

We have a really good shot at improving the lives and careers of millions of HCPs, patients, and their families (even pets!)
We hire adults with a Trust-first/It's All Life philosophy
We have some great benefits for a firm at our stage: 401(k) w/matching, all kinds of insurance (including matching HSA and pets!), commute from your kitchen, Open PTO (which leaders use!), remote stipend, yearly education budget, and working with some of the smartest, yet humblest and most respectful, people in the business
We’re (objectively) way better looking than our competitors :-)

Beliefs:

PrescriberPoint is an equal opportunity employer that is committed to inclusion and diversity. We take affirmative action to ensure equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, status as a qualified individual with a disability, veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.

Additionally, we participate in the E-Verify program as required by applicable law. Learn more about E-Verify here.

Last, PrescriberPoint is a drug-free workplace committed to maintaining a safe workplace free from unlawful drugs and alcohol and complies with all applicable laws, including the Federal Drug-Free Workplace Act. Team members are prohibited from reporting to work or performing their duties with any unlawful drugs or alcohol in their system. They are also prohibited from using, possessing, manufacturing, selling, trading, distributing, dispensing or making arrangements or offering to distribute unlawful drugs or alcohol while at work or performing work duties. Any violation of the Company’s drug-free workplace policy may result in disciplinary action, up to and including disqualification from employment or termination, unless otherwise allowed by law.
