Technical Writeup · Roster MCP Server · Companion to the PRD and Tool Specs

MCP Server & OAuth — Implementation Writeup

The infrastructure half of the Roster MCP project: the remote MCP server's behavior (transport, response envelope, pagination, errors), the OAuth 2.1 authorization service, the credential bridge into Roster's existing token model, hosting/deployment on Cloudflare Workers, and observability. The tool catalog itself lives in tool-specs.html.

Author: Jeff Poulton · Created: 2026-06-10 · Status: Draft · PRD: prd.html · Tool specs: tool-specs.html

1 Architecture & Topology

One Cloudflare Worker at a single hostname serves both the OAuth authorization server and the MCP server. It is a normal client of the existing v2 API (api.getroster.com) — no database access, no shard awareness — so it can be disabled at any time without touching the portal.

mcp.getroster.com  (single Cloudflare Worker, custom domain on Roster's existing CF account)
├── GET  /.well-known/oauth-authorization-server   RFC 8414 metadata (advertises S256)
├── GET  /.well-known/oauth-protected-resource     RFC 9728 metadata
├── GET  /authorize                                consent UI (login + brand picker)
├── POST /token                                    form-encoded; code+PKCE, refresh rotation
├── POST /register                                 Dynamic Client Registration (RFC 7591)
└── POST /mcp                                      MCP Streamable HTTP (2025-11-25 spec)

One hostname keeps the resource-server and AS metadata trivially consistent — same-host is the pattern Linear and Sentry shipped. DNS: one CNAME/Worker custom domain; TLS automatic via Cloudflare.
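As a rough illustration of why a single hostname keeps the two discovery documents consistent, the sketch below derives both from one base URL. Field names follow RFC 8414 and RFC 9728; the function names, the listed grant and response types, and the choice of `/mcp` as the resource identifier are assumptions, not Roster's confirmed configuration. In the Cloudflare design these documents would normally be produced by the OAuth library rather than hand-built.

```typescript
// Sketch only: builds the two discovery documents served from one hostname.
// Field names come from RFC 8414 and RFC 9728; everything else is illustrative.

interface AuthorizationServerMetadata {
  issuer: string;
  authorization_endpoint: string;
  token_endpoint: string;
  registration_endpoint: string;
  response_types_supported: string[];
  grant_types_supported: string[];
  code_challenge_methods_supported: string[];
}

interface ProtectedResourceMetadata {
  resource: string;
  authorization_servers: string[];
}

function authorizationServerMetadata(base: string): AuthorizationServerMetadata {
  return {
    issuer: base,
    authorization_endpoint: `${base}/authorize`,
    token_endpoint: `${base}/token`,
    registration_endpoint: `${base}/register`,
    response_types_supported: ["code"], // assumed
    grant_types_supported: ["authorization_code", "refresh_token"], // assumed
    code_challenge_methods_supported: ["S256"], // S256 only, no "plain"
  };
}

function protectedResourceMetadata(base: string): ProtectedResourceMetadata {
  return {
    resource: `${base}/mcp`, // assumed resource identifier
    authorization_servers: [base], // same host as the issuer above
  };
}

const base = "https://mcp.getroster.com";
const asMeta = authorizationServerMetadata(base);
const prMeta = protectedResourceMetadata(base);
```

Because both documents take the same `base`, the issuer and the advertised authorization server cannot drift apart.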

Verified against current vendor docs 2026-06-10

Sources: workers-oauth-provider README (latest release v0.7.2, 2026-06-04 — actively maintained), Cloudflare MCP authorization guide, and the McpAgent API docs. All library facts in this document reflect those docs as of 2026-06-10.

2 MCP Server Behavior

2.1 Transport & protocol requirements (from the PRD)

2.2 Response envelope (every tool)

{
  "brand": { "name": "Acme Outdoor", "domain": "acme" },   // from the grant — always present
  "portal_source": {                                        // report/dashboard tools only
    "surface": "Sales Attribution report",
    "url": "https://app.getroster.com/reports/sales-attribution",
    "date_range": { "start": "2026-05-10", "end": "2026-06-09" }
  },
  "data": { /* tool-specific payload — see tool-specs.html */ },
  "pagination": { "cursor": "opaque-base64", "has_more": true, "total_records": 1234 },
  "truncated": false      // true + guidance text when result was capped
}
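One way to keep every tool on this shape is a shared type plus a small builder. The sketch below mirrors the fields shown above; `buildEnvelope` is a hypothetical helper, and treating `portal_source` and `pagination` as optional is an assumption (the example only marks `portal_source` as report/dashboard-only). Where the guidance text for a truncated result lives is not specified, so it is left out.

```typescript
// Sketch only: types mirror the envelope example; buildEnvelope is a
// hypothetical helper, not part of any Roster or MCP library.

interface Brand { name: string; domain: string }
interface DateRange { start: string; end: string }
interface PortalSource { surface: string; url: string; date_range: DateRange }
interface Pagination { cursor: string; has_more: boolean; total_records: number }

interface Envelope<T> {
  brand: Brand;                 // from the grant, always present
  portal_source?: PortalSource; // report/dashboard tools only
  data: T;                      // tool-specific payload
  pagination?: Pagination;      // assumed optional for non-list tools
  truncated: boolean;
}

function buildEnvelope<T>(
  brand: Brand,
  data: T,
  extras: { portal_source?: PortalSource; pagination?: Pagination; truncated?: boolean } = {},
): Envelope<T> {
  return {
    brand,
    // Optional sections are omitted entirely rather than emitted as null.
    ...(extras.portal_source ? { portal_source: extras.portal_source } : {}),
    data,
    ...(extras.pagination ? { pagination: extras.pagination } : {}),
    truncated: extras.truncated ?? false,
  };
}

const envelope = buildEnvelope(
  { name: "Acme Outdoor", domain: "acme" },
  { rows: [] as unknown[] },
);
```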

2.3 Dates & timezone (PRD Decision 5)

There is no brand-timezone field anywhere in Roster (verified — Global and shard schemas). The platform pattern is UTC storage with client-side resolution, and the MCP follows it: Claude (which knows the user's timezone) resolves relative ranges to explicit ISO dates before calling; every tool description instructs it to do so. Omitted date params default to UTC last-30-days, stated in portal_source.date_range. Where the upstream accepts DATETIMEOFFSET (program metrics), the offset passes through. Nothing is stored at consent; no server-side timezone state.
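The default described above could be sketched as follows. This assumes the window ends on the current UTC date and starts 30 days earlier, and that a partially specified range is treated as omitted; neither detail is confirmed by the PRD, and `resolveDateRange` is an illustrative name.

```typescript
// Sketch only: default an omitted date range to the last 30 days in UTC.
// Assumption: the window ends on the current UTC date and starts 30 days earlier.

interface DateRange { start: string; end: string }

const DAY_MS = 24 * 60 * 60 * 1000;

function toIsoDate(d: Date): string {
  return d.toISOString().slice(0, 10); // YYYY-MM-DD, always UTC
}

function resolveDateRange(
  start: string | undefined,
  end: string | undefined,
  now: Date = new Date(),
): DateRange {
  if (start !== undefined && end !== undefined) {
    return { start, end }; // explicit ISO dates from the caller pass through
  }
  // Assumption: if either bound is missing, fall back to the full default window.
  const endDate = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()));
  const startDate = new Date(endDate.getTime() - 30 * DAY_MS);
  return { start: toIsoDate(startDate), end: toIsoDate(endDate) };
}

const range = resolveDateRange(undefined, undefined, new Date("2026-06-09T15:30:00Z"));
// range is { start: "2026-05-10", end: "2026-06-09" }
```

The resolved range is what a tool would echo back in portal_source.date_range, so the caller can always see which window was actually used.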

2.4 Error mapping

Upstream | MCP tool error (structured, isError: true)
401 / token invalid | "Connection expired or revoked — reconnect the Roster connector." (also triggers OAuth re-auth via 401 on the MCP HTTP layer when the MCP token is bad)
403 | "This brand's plan does not include this feature."
429 | "Roster rate limit reached — wait a moment and try again." (retryable; the MCP service must not auto-retry-storm)
5xx / timeout | "Roster API error — try again; if it persists, narrow the date range."
Validation (success:false w/ message) | Pass through the upstream message, prefixed with the failing parameter where known.
Date range > 366 days on report tools | Rejected MCP-side before calling upstream: "Date range exceeds 366 days — split the request."

Never surface raw upstream stack traces. Upstream report stored procedures run with 300s command timeouts, the same as Claude's 300s tool ceiling, so there is no headroom for a slow report; surface timeouts as narrow-the-range guidance.
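The mapping above can be sketched as a pure function over the upstream outcome. The message strings are taken from the table; the type names, the `parameter: message` prefix format, the fallback for unlisted statuses, and measuring the range as end minus start in days are all assumptions. The result shape (a text content item plus `isError: true`) follows the MCP tool-result convention.

```typescript
// Sketch only: maps an upstream outcome to an MCP tool error result.
// Messages come from the error-mapping table; names are illustrative.

interface ToolErrorResult {
  isError: true;
  content: { type: "text"; text: string }[];
}

type UpstreamFailure =
  | { kind: "http"; status: number }
  | { kind: "timeout" }
  | { kind: "validation"; message: string; parameter?: string };

const GENERIC = "Roster API error — try again; if it persists, narrow the date range.";

function toolError(text: string): ToolErrorResult {
  return { isError: true, content: [{ type: "text", text }] };
}

function mapUpstreamFailure(f: UpstreamFailure): ToolErrorResult {
  if (f.kind === "validation") {
    // Pass the upstream message through, prefixed with the parameter if known.
    const prefix = f.parameter ? `${f.parameter}: ` : "";
    return toolError(prefix + f.message);
  }
  if (f.kind === "timeout" || f.status >= 500) {
    return toolError(GENERIC);
  }
  switch (f.status) {
    case 401:
      return toolError("Connection expired or revoked — reconnect the Roster connector.");
    case 403:
      return toolError("This brand's plan does not include this feature.");
    case 429:
      return toolError("Roster rate limit reached — wait a moment and try again.");
    default:
      return toolError(GENERIC); // assumed fallback for statuses the table does not list
  }
}

// Runs before any upstream call; returns null when the range is acceptable.
function checkDateRange(start: string, end: string): ToolErrorResult | null {
  const days = (Date.parse(end) - Date.parse(start)) / (24 * 60 * 60 * 1000);
  return days > 366 ? toolError("Date range exceeds 366 days — split the request.") : null;
}
```

Keeping the function pure means no raw upstream body or stack trace can reach the tool result, since only the fixed strings and the upstream validation message are ever returned.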

3 Roster Token Model & Credential Bridge

A Roster login is user-level; API tokens are brand-level. The OAuth grant bridges the two: one brand per grant, chosen at consent, with a long-lived private ApiSession token held server-side and never exposed to the client. Everything below is verified against source (api-brand-portal, db-global) and the production Global DB (read-only, 2026-06-09).

3.1 Verified token facts

3.2 Minting the bridged token

Replicate UserAccessTokenService.CreateAccessSessionToken (UserAccessTokenService.cs:70-200): get/create ApiClient for the brand user → ensure subscription rate-limit item ≥100 → get/create shard AccessRight (AccessId=3) → ApiSessionService.CreateToken (JWT; claims: accessToUserId, sessionUserId, rightId, accessId) → insert Global ApiSession with 30-year expiry. Build one new internal-only bridge endpoint in api-brand-portal that runs this flow (and its reverse, ExpireAccessToken) so the OAuth service never reimplements token logic. Authenticate Worker→Roster with a shared service secret + WAF rules.
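From the Worker's side, the bridge call might look like the sketch below. The endpoint path, the header that carries the shared service secret, and the request body are all hypothetical; the source only specifies an internal-only endpoint authenticated by a shared secret. The body reuses two of the JWT claim names listed above (`accessToUserId`, `sessionUserId`) on the assumption that `rightId` and `accessId` are resolved by Roster.

```typescript
// Sketch only: shapes the Worker -> Roster call to the internal bridge endpoint.
// BRIDGE_PATH, the header name, and the body fields are hypothetical.

interface BridgeRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

const BRIDGE_PATH = "/internal/mcp-bridge/token"; // hypothetical path

function buildMintRequest(
  apiBase: string,
  serviceSecret: string,
  accessToUserId: number, // the brand user the token grants access to
  sessionUserId: number,  // the logged-in user who gave consent
): BridgeRequest {
  return {
    url: `${apiBase}${BRIDGE_PATH}`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "X-Service-Secret": serviceSecret, // hypothetical header name
      },
      // rightId and accessId are assumed to be resolved server-side by the bridge.
      body: JSON.stringify({ accessToUserId, sessionUserId }),
    },
  };
}

const mintRequest = buildMintRequest("https://api.getroster.com", "example-secret", 42, 7);
// In the Worker this would be sent with: await fetch(mintRequest.url, mintRequest.init)
```

Keeping the request builder separate from the network call makes it easy to unit-test without a live Roster API.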

3.3 Flagging MCP-issued tokens

This flag (plus the grant store) backs the portal's new "Connected apps" section: app name, brand, who authorized, created date, last-used date, revoke action.

4 OAuth Service — Build Plan

4.1 Libraries (verified against docs 2026-06-10)

Concern | Library | Notes
OAuth 2.1 AS (DCR, PKCE, token issuance, refresh rotation, grant storage) | @cloudflare/workers-oauth-provider v0.7.2 (2026-06-04) | The library Linear/Sentry/Intercom/Stripe launched on. Wraps the whole AS surface: new OAuthProvider({ apiRoute: "/mcp", apiHandler, defaultHandler, authorizeEndpoint, tokenEndpoint, clientRegistrationEndpoint }); this exact pattern is Cloudflare's documented MCP-authorization recipe. accessTokenTTL default 3600s (matches the PRD's ≤1h); refreshTokenTTL default 30d. Refresh rotation: each use issues a new token and invalidates the older of at most two concurrently-valid refresh tokens (a deliberate retry-tolerance design that satisfies Claude's rotation requirement but is not strict single-use). Grants/clients/codes in Workers KV (binding OAUTH_KV); per-grant props (where the bridged Roster token lives) are "end-to-end encrypted… with the secret token as key material — impossible to derive from storage unless a valid token is provided" (README). RFC 8414 + 9728 metadata, RFC 7591 DCR, and PKCE (disable allowPlainPKCE for S256-only) are built in; CIMD has already shipped behind clientIdMetadataDocumentEnabled, so the PRD's "fast-follow" is a config flag.
MCP server + transport | Cloudflare agents McpAgent (wraps @modelcontextprotocol/sdk, TypeScript) | McpAgent.serve("/mcp") "handles Streamable HTTP transport automatically" (docs): keepalive past the ~5-min edge idle-stream watchdog, Last-Event-ID stream recovery. Each client session is a Durable Object with hibernation enabled by default (sleeps when idle, so near-zero idle cost). Plugs in as the apiHandler of workers-oauth-provider; the validated grant's props (brand id, bridged token) arrive typed via the agent's third generic param and are read as this.props inside every tool handler.
Consent UI | Portal-hosted (PRD D-10): new /connect/claude route in web-app-brand-portal | The Worker never renders login or handles credentials. The portal route reuses the existing login (password + social SSO + lockout), shows consent + brand picker, and hands back a one-time connect ticket. This is the same external-IdP shape the Cloudflare docs show for Stytch/Auth0, with the Brand Portal as the IdP. Supersedes the Worker-rendered consent page in earlier drafts; decided 2026-06-11 after verifying SSO-required brands reject password auth (UserService.AuthenticateUser, isSSOError).
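The option values implied by the table can be collected in one place. The sketch below keeps them as a plain object so it stands alone; in the Worker they would be passed to `new OAuthProvider({ ...oauthOptions, apiHandler, defaultHandler })`. Option names are the ones quoted in the table; the endpoint paths come from the topology in section 1, and expressing `refreshTokenTTL` in seconds (like `accessTokenTTL`) is an assumption to confirm against the library README.

```typescript
// Sketch only: option values implied by the library table, kept as a plain
// object so the snippet runs without the library installed.

const oauthOptions = {
  apiRoute: "/mcp",
  authorizeEndpoint: "/authorize",
  tokenEndpoint: "/token",
  clientRegistrationEndpoint: "/register",
  accessTokenTTL: 3600,                   // seconds; library default, within the PRD's 1h cap
  refreshTokenTTL: 30 * 24 * 60 * 60,     // 30 days, assumed to be expressed in seconds
  allowPlainPKCE: false,                  // S256 only
  clientIdMetadataDocumentEnabled: false, // CIMD stays off until the fast-follow
};
```

Setting the defaults explicitly, rather than relying on the library's, makes the PRD constraints visible in code review and guards against a default changing in a later release.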

4.2 Consent flow — concrete sequence

4.3 Storage & secrets

Item | Where
OAuth clients (DCR), grants, auth codes, refresh-token families | Workers KV, binding OAUTH_KV (managed by workers-oauth-provider)
Bridged Roster ApiSession token + brand context | Encrypted props inside the grant record (library-encrypted)
Brand allowlist (Phase-1 gate) | KV key, editable via wrangler/admin script
Secrets: bridge-endpoint service secret, ARCHBEE_API_KEY, cookie-signing key | Worker secrets (wrangler secret put)
Tool-call + auth event logs | Workers Analytics Engine (per-tool metrics for the success KPIs) + Logpush → existing log sink

No relational DB required on the MCP side. The only Roster-side data-model change is the ApiSession source flag (§3.3).
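The Phase-1 allowlist gate could be as small as the check below. It assumes the KV value is a JSON array of brand domains (for example `["acme"]`) and that a missing or malformed value should deny access; the actual key format is not specified, and `isBrandAllowed` is an illustrative name.

```typescript
// Sketch only: Phase-1 allowlist gate. Assumes the KV value is a JSON array of
// brand domains; the real key format is not specified in this writeup.

function isBrandAllowed(allowlistJson: string | null, brandDomain: string): boolean {
  if (allowlistJson === null) return false; // missing key: fail closed
  let parsed: unknown;
  try {
    parsed = JSON.parse(allowlistJson);
  } catch {
    return false; // malformed value: fail closed
  }
  return Array.isArray(parsed) && parsed.includes(brandDomain);
}

// In the Worker, the first argument would be the string read from the KV key.
const allowed = isBrandAllowed('["acme", "summit"]', "acme");
```

Failing closed means a bad edit to the KV key locks brands out rather than opening the connector to everyone.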

5 Revocation, Environments & Deploys

5.1 Revocation — both directions

5.2 Environments, CI, limits

6 Alternative: .NET House Stack

ASP.NET Core service on Azure App Service + OpenIddict (or Duende IdentityServer, licensed) for the AS + the ModelContextProtocol C# SDK for the MCP server; grant store in Azure SQL/Table Storage; same bridge/consent design. Pros: one runtime, existing Azure pipelines, in-house C# depth. Cons: assembling DCR + refresh rotation + resource metadata from primitives (~weeks of auth-surface work the CF library gives for free), and DIY on the Streamable-HTTP session plumbing. Recommendation: Cloudflare for Phase 1; this design ports to .NET without changing any external contract if eng prefers later.

7 Observability