The infrastructure half of the Roster MCP project: the remote MCP server's behavior (transport, response envelope, pagination, errors), the OAuth 2.1 authorization service, the credential bridge into Roster's existing token model, hosting/deployment on Cloudflare Workers, and observability. The tool catalog itself lives in tool-specs.html.
One Cloudflare Worker at a single hostname serves both the OAuth authorization server and the MCP
server. It is a normal client of the existing v2 API (api.getroster.com) — no database
access, no shard awareness — so it can be disabled at any time without touching the portal.
mcp.getroster.com (single Cloudflare Worker, custom domain on Roster's existing CF account)
├── GET  /.well-known/oauth-authorization-server   RFC 8414 metadata (advertises S256)
├── GET  /.well-known/oauth-protected-resource     RFC 9728 metadata
├── GET  /authorize                                consent UI (login + brand picker)
├── POST /token                                    form-encoded; code+PKCE, refresh rotation
├── POST /register                                 Dynamic Client Registration (RFC 7591)
└── POST /mcp                                      MCP Streamable HTTP (2025-11-25 spec)
One hostname keeps the resource-server and AS metadata trivially consistent — same-host is the pattern Linear and Sentry shipped. DNS: one CNAME/Worker custom domain; TLS automatic via Cloudflare.
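As an illustration of what "trivially consistent" means here, the sketch below derives both metadata documents and the 401 challenge from a single origin. In the design above the library serves these documents itself, so this is not production code: the function names are hypothetical, the `resource` value is an assumption, and only the field names come from RFC 8414 and RFC 9728.

```typescript
// Illustrative sketch only. Function names are hypothetical; field names follow
// RFC 8414 (authorization server) and RFC 9728 (protected resource).
function authServerMetadata(serverOrigin: string) {
  return {
    issuer: serverOrigin,
    authorization_endpoint: `${serverOrigin}/authorize`,
    token_endpoint: `${serverOrigin}/token`,
    registration_endpoint: `${serverOrigin}/register`,
    response_types_supported: ["code"],
    code_challenge_methods_supported: ["S256"], // plain PKCE is not advertised
  };
}

function protectedResourceMetadata(serverOrigin: string) {
  return {
    resource: `${serverOrigin}/mcp`, // assumption: the MCP endpoint is the resource identifier
    authorization_servers: [serverOrigin], // same host as the issuer above
  };
}

// Value for the WWW-Authenticate header on a 401 from POST /mcp
// (the resource_metadata parameter is defined by RFC 9728).
function challengeHeader(serverOrigin: string): string {
  return `Bearer resource_metadata="${serverOrigin}/.well-known/oauth-protected-resource"`;
}
```

Because every URL is built from the same origin, the `authorization_servers` entry and the `issuer` cannot drift apart, which is the practical benefit of the single-hostname layout.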
Sources: workers-oauth-provider README (latest release v0.7.2, 2026-06-04 — actively maintained), Cloudflare MCP authorization guide, and the McpAgent API docs. All library facts in this document reflect those docs as of 2026-06-10.
- Transport: Streamable HTTP at POST /mcp per the 2025-11-25 MCP spec revision; no SSE-only legacy transport. Publicly reachable over HTTPS, no IP allowlist.
- Unauthenticated requests: 401 + WWW-Authenticate pointing at the RFC 9728 protected-resource metadata.
- Tool annotations: every tool declares a title, an LLM-audience description, readOnlyHint: true, and openWorldHint: false. No tool may perform a write in Phase 1.

Response envelope:
{
"brand": { "name": "Acme Outdoor", "domain": "acme" }, // from the grant — always present
"portal_source": { // report/dashboard tools only
"surface": "Sales Attribution report",
"url": "https://app.getroster.com/reports/sales-attribution",
"date_range": { "start": "2026-05-10", "end": "2026-06-09" }
},
"data": { /* tool-specific payload — see tool-specs.html */ },
"pagination": { "cursor": "opaque-base64", "has_more": true, "total_records": 1234 },
"truncated": false // true + guidance text when result was capped
}
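One way to keep every tool on this envelope is a shared type plus a small builder. The sketch below is an assumption about how that might look: the type and function names are hypothetical, and `guidance` is an invented field name for the guidance text, which the envelope example does not name.

```typescript
interface Brand { name: string; domain: string }

interface PortalSource {
  surface: string;
  url: string;
  date_range: { start: string; end: string };
}

interface Pagination { cursor: string | null; has_more: boolean; total_records: number }

interface ToolEnvelope<T> {
  brand: Brand;                 // from the grant, always present
  portal_source?: PortalSource; // report/dashboard tools only
  data: T;                      // tool-specific payload
  pagination?: Pagination;
  truncated: boolean;
  guidance?: string;            // hypothetical field name for the guidance text
}

// Builds an envelope; passing guidance text marks the result as truncated.
function buildEnvelope<T>(
  brand: Brand,
  data: T,
  extras: { portal_source?: PortalSource; pagination?: Pagination; guidance?: string } = {},
): ToolEnvelope<T> {
  const envelope: ToolEnvelope<T> = { brand, data, truncated: extras.guidance !== undefined };
  if (extras.portal_source) envelope.portal_source = extras.portal_source;
  if (extras.pagination) envelope.pagination = extras.pagination;
  if (extras.guidance !== undefined) envelope.guidance = extras.guidance;
  return envelope;
}
```

Tying `truncated` to the presence of guidance text means a tool cannot report a capped result without also telling the model what to do about it.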
- Encode {pageIndex, pageSize, filters-hash} into an opaque cursor. Upstream pagination is index-based (1-based pageIndex/pageSize); never expose raw page indexes to the model.
- Default page_size 50, max 200: well under the upstream max of 10,000 and Claude's ~150K-char tool-result limit. When capped, set truncated: true with guidance text ("narrow the date range or filters").
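A minimal sketch of the cursor scheme described above, assuming base64-wrapped JSON as the encoding. All names are hypothetical, and computing the filters hash is out of scope here. Note that base64 only makes the cursor opaque to the model; it is not signed or encrypted.

```typescript
interface CursorState {
  pageIndex: number;   // 1-based, matching the upstream API
  pageSize: number;
  filtersHash: string; // hash of the tool's filter params (hashing not shown)
}

const DEFAULT_PAGE_SIZE = 50;
const MAX_PAGE_SIZE = 200;

// Falls back to the default for missing or invalid input, and caps at the max.
function clampPageSize(requested?: number): number {
  if (requested === undefined || !Number.isInteger(requested) || requested < 1) {
    return DEFAULT_PAGE_SIZE;
  }
  return Math.min(requested, MAX_PAGE_SIZE);
}

function encodeCursor(state: CursorState): string {
  return btoa(JSON.stringify(state));
}

// Rejects malformed cursors and cursors issued for a different filter set.
function decodeCursor(cursor: string, expectedFiltersHash: string): CursorState {
  let parsed: unknown;
  try {
    parsed = JSON.parse(atob(cursor));
  } catch (_err) {
    throw new Error("Invalid cursor");
  }
  const { pageIndex, pageSize, filtersHash } = (parsed ?? {}) as Partial<CursorState>;
  if (
    typeof pageIndex !== "number" || !Number.isInteger(pageIndex) || pageIndex < 1 ||
    typeof pageSize !== "number" || !Number.isInteger(pageSize) ||
    pageSize < 1 || pageSize > MAX_PAGE_SIZE ||
    typeof filtersHash !== "string"
  ) {
    throw new Error("Invalid cursor");
  }
  if (filtersHash !== expectedFiltersHash) {
    throw new Error("Cursor does not match the current filters");
  }
  return { pageIndex, pageSize, filtersHash };
}

// Returns the cursor for the following page, or null when this page is the last.
function nextCursor(state: CursorState, totalRecords: number): string | null {
  const hasMore = state.pageIndex * state.pageSize < totalRecords;
  return hasMore ? encodeCursor({ ...state, pageIndex: state.pageIndex + 1 }) : null;
}
```

Binding the cursor to the filters hash prevents a cursor from one query being replayed against a different filter set, which would otherwise return a misleading page.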
There is no brand-timezone field anywhere in Roster (verified — Global and shard schemas). The
platform pattern is UTC storage with client-side resolution, and the MCP follows it: Claude
(which knows the user's timezone) resolves relative ranges to explicit ISO dates before calling;
every tool description instructs it to do so. Omitted date params default to UTC last-30-days, stated
in portal_source.date_range. Where the upstream accepts DATETIMEOFFSET (program
metrics), the offset passes through. Nothing is stored at consent; no server-side timezone state.
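The default window can be sketched as follows. This assumes the window ends on the current UTC date and starts 30 days earlier, which matches the date_range in the envelope example; it also assumes that supplying only one of the two dates falls back to the default. The function names are hypothetical.

```typescript
interface DateRange { start: string; end: string } // ISO dates (YYYY-MM-DD), UTC

const DAY_MS = 24 * 60 * 60 * 1000;

function toIsoDate(d: Date): string {
  return d.toISOString().slice(0, 10);
}

// Explicit dates pass through untouched; otherwise default to UTC last-30-days.
function resolveDateRange(start?: string, end?: string, now: Date = new Date()): DateRange {
  if (start !== undefined && end !== undefined) {
    return { start, end };
  }
  // Midnight UTC of the current day, independent of the server's local timezone.
  const todayUtc = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
  return {
    start: toIsoDate(new Date(todayUtc - 30 * DAY_MS)),
    end: toIsoDate(new Date(todayUtc)),
  };
}
```

The resolved range is what a tool would echo back in portal_source.date_range, so the model can see which window was actually applied.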
| Upstream | MCP tool error (structured, isError: true) |
|---|---|
| 401 / token invalid | "Connection expired or revoked — reconnect the Roster connector." (also triggers OAuth re-auth via 401 on the MCP HTTP layer when the MCP token is bad) |
| 403 | "This brand's plan does not include this feature." |
| 429 | "Roster rate limit reached — wait a moment and try again." (retryable; the MCP service must not auto-retry-storm) |
| 5xx / timeout | "Roster API error — try again; if it persists, narrow the date range." |
| Validation (success:false w/ message) | Pass through the upstream message, prefixed with the failing parameter where known. |
| Date range > 366 days on report tools | Rejected MCP-side before calling upstream: "Date range exceeds 366 days — split the request." |
Never surface raw upstream stack traces. Upstream report stored procedures run with 300s command timeouts, which leaves no headroom under Claude's 300s tool ceiling, so surface timeouts as narrow-the-range guidance.
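The mapping table can be sketched as a pair of pure functions. The message strings are copied from the table; the function names, the result type, the fallback for unlisted statuses, and the wording of the invalid-range message are assumptions. The result shape (an `isError` flag plus text content) follows the MCP tool-result convention.

```typescript
interface ToolErrorResult {
  isError: true;
  content: { type: "text"; text: string }[];
}

function toolError(text: string): ToolErrorResult {
  return { isError: true, content: [{ type: "text", text }] };
}

// Message strings are copied from the mapping table above.
function mapUpstreamFailure(status: number | "timeout"): ToolErrorResult {
  if (status === 401) return toolError("Connection expired or revoked — reconnect the Roster connector.");
  if (status === 403) return toolError("This brand's plan does not include this feature.");
  if (status === 429) return toolError("Roster rate limit reached — wait a moment and try again.");
  // 5xx and timeouts. Assumption: any other unexpected status gets the same
  // generic message, so raw upstream bodies are never surfaced.
  return toolError("Roster API error — try again; if it persists, narrow the date range.");
}

const MAX_RANGE_DAYS = 366;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// MCP-side pre-check for report tools. Returns an error, or null when the range is fine.
// Assumption: the span is measured as end minus start in whole UTC days.
function checkDateRange(start: string, end: string): ToolErrorResult | null {
  const startMs = Date.parse(`${start}T00:00:00Z`);
  const endMs = Date.parse(`${end}T00:00:00Z`);
  if (Number.isNaN(startMs) || Number.isNaN(endMs) || endMs < startMs) {
    return toolError("Invalid date range: use ISO dates with start on or before end."); // hypothetical wording
  }
  const days = (endMs - startMs) / MS_PER_DAY;
  return days > MAX_RANGE_DAYS
    ? toolError("Date range exceeds 366 days — split the request.")
    : null;
}
```

Running `checkDateRange` before any upstream call keeps oversized requests from ever reaching the 300s report queries.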
A Roster login is user-level; API tokens are brand-level. The OAuth grant bridges the two: one brand
per grant, chosen at consent, with a long-lived private ApiSession token held server-side
and never exposed to the client. Everything below is verified against source
(api-brand-portal, db-global) and the production Global DB (read-only, 2026-06-09).
- Token lifetime: CLIENT_API_TOKEN_EXPIRE_YEARS = 30 (UserAccessTokenService.cs:32); confirmed live: the newest AccessTypeId=3 rows in Global ApiSession are 288-char JWTs with ExpireDate exactly 30 years out. The bridge mints standard tokens, no special handling.
- Rate limiting: RateLimitService.VerifyRateLimiting keys its in-memory counter by access token (RateLimitService.cs:27-82); the limit value (default 100/interval, subscription item 149) is per brand. A bridged MCP token gets its own pool automatically and cannot starve a customer's existing integration token. Caveat: the counter is in-process (MemoryCache), so effective limits scale with app instances (pre-existing behavior).
- Per-request authorization: ApiSession lookup (indexed on AccessToken) → JWT claim validation → brand scoping via AccessToUserId (AuthorizationFilter.cs:58-133).
Replicate UserAccessTokenService.CreateAccessSessionToken (UserAccessTokenService.cs:70-200):
get/create ApiClient for the brand user → ensure subscription rate-limit item ≥100 →
get/create shard AccessRight (AccessId=3) → ApiSessionService.CreateToken
(JWT; claims: accessToUserId, sessionUserId, rightId, accessId) → insert Global
ApiSession with 30-year expiry. Build one new internal-only bridge endpoint in
api-brand-portal that runs this flow (and its reverse, ExpireAccessToken) so the
OAuth service never reimplements token logic. Authenticate Worker→Roster with a shared service secret
+ WAF rules.
- SourceTypeId column on Global ApiSession (+ lookup value API_SESSION_SOURCE_MCP): zero impact on AuthorizationFilter/rate-limit code paths; trivially queryable for the portal's Connected-Apps list and usage metrics.
- AccessTypeId (e.g. ACCESS_TO_MCP_API): cleaner separation, but every Open API endpoint validates against an accessIds list, so it must be added everywhere ACCESS_TO_CLIENT_API is accepted, which is more invasive and makes it easy to miss a path.

This flag (plus the grant store) backs the portal's new "Connected apps" section: app name, brand, who authorized, created date, last-used date, revoke action.
| Concern | Library | Notes |
|---|---|---|
| OAuth 2.1 AS (DCR, PKCE, token issuance, refresh rotation, grant storage) | @cloudflare/workers-oauth-provider v0.7.2 (2026-06-04) | The library Linear/Sentry/Intercom/Stripe launched on. Wraps the whole AS surface: new OAuthProvider({ apiRoute: "/mcp", apiHandler, defaultHandler, authorizeEndpoint, tokenEndpoint, clientRegistrationEndpoint }). This exact pattern is Cloudflare's documented MCP-authorization recipe. accessTokenTTL default 3600s (matches the PRD's ≤1h); refreshTokenTTL default 30d. Refresh rotation: each use issues a new token and invalidates the older of at most two concurrently-valid refresh tokens (deliberate retry-tolerance design; satisfies Claude's rotation requirement, but not strict single-use). Grants/clients/codes in Workers KV (binding OAUTH_KV); per-grant props (where the bridged Roster token lives) are "end-to-end encrypted… with the secret token as key material — impossible to derive from storage unless a valid token is provided" (README). RFC 8414 + 9728 metadata, RFC 7591 DCR, PKCE (disable allowPlainPKCE for S256-only), and CIMD already shipped behind clientIdMetadataDocumentEnabled, so the PRD's "fast-follow" is a config flag. |
| MCP server + transport | Cloudflare agents McpAgent (wraps @modelcontextprotocol/sdk, TypeScript) | McpAgent.serve("/mcp") "handles Streamable HTTP transport automatically" (docs): keepalive past the ~5-min edge idle-stream watchdog, Last-Event-ID stream recovery. Each client session is a Durable Object with hibernation enabled by default (sleeps when idle, near-zero idle cost). Plugs in as the apiHandler of workers-oauth-provider; the validated grant's props (brand id, bridged token) arrive typed via the agent's third generic param and are read as this.props inside every tool handler. |
| Consent UI | Portal-hosted (PRD D-10): new /connect/claude route in web-app-brand-portal | The Worker never renders login or handles credentials. The portal route reuses the existing login (password + social SSO + lockout), shows consent + brand picker, and hands back a one-time connect ticket. This is the same external-IdP shape the Cloudflare docs show for Stytch/Auth0, with the Brand Portal as the IdP. Supersedes the Worker-rendered consent page in earlier drafts; decided 2026-06-11 after verifying SSO-required brands reject password auth (UserService.AuthenticateUser → isSSOError). |
1. Claude calls POST /mcp unauthenticated → 401 + WWW-Authenticate → discovers metadata → POST /register (DCR) → browser to GET /authorize?client_id…&code_challenge…&redirect_uri=https://claude.ai/api/mcp/auth_callback. Redirect allowlist: claude.ai callback + loopback http://localhost:* (Claude Code) + whatever DCR registers (https or loopback only).
2. The Worker redirects the browser to app.getroster.com/connect/claude?request_id=…, a new route in web-app-brand-portal. The route uses the existing portal login: password, social SSO (USER_SETTING_SSO_REQUIREMENT brands work), and the 10-failures/30-min lockout, all already built; users with a live portal session skip login entirely. Brand enumeration comes from the SPA's own session data (sessionUser.systemUserAccess).
3. On consent, an App.WebApi endpoint (running under the consenting user's standard auth, so brand rights are validated at issue time) mints a one-time connect ticket (single-use, ~60s TTL, bound to {sessionUserId, brandUserId, request_id}) and redirects back to the Worker's callback with it.
4. The Worker redeems the ticket at the bridge endpoint, which mints an ApiSession token with the consenting user's authority. The Worker re-checks the brand allowlist fail-closed (Phase-1 rollout gate, KV config → friendly "not enabled for your account yet"), then stores the token in the grant's encrypted props along with {brand_user_id, brand_name, brand_domain, authorized_by, granted_at} (this snapshot also feeds the get_connection_info tool), and the api_session_id in the grant's unencrypted metadata, because props can't be decrypted without a client token and the idle sweep (§5.1) needs the id to expire the session.
5. POST /token (PKCE verified) → access token (1h) + rotating refresh token. Every subsequent POST /mcp call: library validates the token, decrypts props, hands the tool layer the bridged Roster credential. The Roster token never leaves the Worker.
6. Denial or cancellation returns an access_denied redirect; no partial grants. All auth endpoints must respond <10s (Claude hard limit). Workers cold start is ~ms; the long pole is the upstream portal-login call.

| Item | Where |
|---|---|
| OAuth clients (DCR), grants, auth codes, refresh-token families | Workers KV, binding OAUTH_KV (managed by workers-oauth-provider) |
| Bridged Roster ApiSession token + brand context | Encrypted props inside the grant record (library-encrypted) |
| Brand allowlist (Phase-1 gate) | KV key, editable via wrangler/admin script |
| Secrets: bridge-endpoint service secret, ARCHBEE_API_KEY, cookie-signing key | Worker secrets (wrangler secret put) |
| Tool-call + auth event logs | Workers Analytics Engine (per-tool metrics for the success KPIs) + Logpush → existing log sink |
No relational DB required on the MCP side. The only Roster-side data-model change is the ApiSession source flag (§3.3).
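The redirect rule from the authorization flow (https or loopback only for DCR-registered URIs) could be enforced at registration time roughly as below. This is a sketch under assumptions: the function names are hypothetical, treating 127.0.0.1 and [::1] as loopback alongside localhost is an assumption, and the OAuth library may already apply checks of its own.

```typescript
// Assumption: these three hosts count as loopback.
const LOOPBACK_HOSTS = new Set(["localhost", "127.0.0.1", "[::1]"]);

// True for https URLs and for http URLs on a loopback host (any port).
function isAllowedRedirectUri(uri: string): boolean {
  let url: URL;
  try {
    url = new URL(uri);
  } catch (_err) {
    return false; // not an absolute URL
  }
  if (url.hash !== "") return false; // OAuth redirect URIs must not carry a fragment
  if (url.protocol === "https:") return true;
  return url.protocol === "http:" && LOOPBACK_HOSTS.has(url.hostname);
}

// Rejects a registration request if any of its redirect URIs is disallowed.
function validateRedirectUris(redirectUris: string[]): string[] {
  if (redirectUris.length === 0) {
    throw new Error("At least one redirect_uri is required");
  }
  const rejected = redirectUris.filter((uri) => !isAllowedRedirectUri(uri));
  if (rejected.length > 0) {
    throw new Error(`redirect_uri not allowed: ${rejected.join(", ")}`);
  }
  return redirectUris;
}
```

Comparing the parsed hostname rather than the raw string means a host such as localhost.evil.example is rejected even though it starts with "localhost".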
- Portal revoke: the portal expires the grant's ApiSession rows (SourceTypeId filter) and calls a Worker admin route (service-secret auth) to delete the grant + refresh family. Next Claude call → 401 → re-auth prompt.
- Idle sweep: the Worker records a last-used timestamp on every /mcp call (also feeds the Connected-apps "last used" column), and a daily Cron Trigger deletes grants with no token activity for 35 days (past the 30-day refresh TTL, so provably dead) and expires their bridged ApiSession rows via the bridge endpoint (api_session_id from grant metadata). An RFC 7009 revocation endpoint stays live in case Claude ever starts revoking. Residual risk accepted: a disconnected grant's unused, server-held token survives ≤35 days; portal revoke is the immediate cutoff.
- Kill switch: disable /authorize (env flag) + bulk-expire MCP-flagged ApiSession rows; the Worker is fully decoupled from portal serving.
- Environments: staging.mcp.getroster.com (wrangler env staging, pointed at TEST API/DB) and prod. Staging is what the integration/E2E suites and MCP Inspector run against.
- Deploy: wrangler deploy per env; secrets from repo environments. Rollback = redeploy previous version (Workers keeps versions).
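The selection logic of the daily idle sweep can be sketched as a pure function that the Cron Trigger handler would call before doing any I/O. The record shape and all names here are hypothetical; only the 35-day threshold and the use of api_session_id from grant metadata come from the text above.

```typescript
interface GrantSummary {
  grantId: string;
  apiSessionId: string; // api_session_id from the grant's unencrypted metadata
  lastUsedAt: number;   // epoch ms, updated on each /mcp call
}

interface SweepPlan {
  expireApiSessionIds: string[]; // to expire via the bridge endpoint
  deleteGrantIds: string[];      // to delete from the grant store
}

const IDLE_CUTOFF_DAYS = 35; // past the 30-day refresh TTL
const ONE_DAY_MS = 24 * 60 * 60 * 1000;

// Picks grants whose last activity is strictly older than the cutoff.
function planIdleSweep(grants: GrantSummary[], nowMs: number): SweepPlan {
  const cutoff = nowMs - IDLE_CUTOFF_DAYS * ONE_DAY_MS;
  const idle = grants.filter((grant) => grant.lastUsedAt < cutoff);
  return {
    expireApiSessionIds: idle.map((grant) => grant.apiSessionId),
    deleteGrantIds: idle.map((grant) => grant.grantId),
  };
}
```

When executing the plan, expiring the bridged session before deleting the grant seems the safer order: if the bridge call fails, the grant and its api_session_id are still there for the next run.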
Alternative considered: an ASP.NET Core service on Azure App Service + OpenIddict (or Duende IdentityServer,
licensed) for the AS + the ModelContextProtocol C# SDK for the MCP server; grant store in
Azure SQL/Table Storage; same bridge/consent design. Pros: one runtime, existing Azure pipelines,
in-house C# depth. Cons: assembling DCR + refresh rotation + resource metadata from primitives
(~weeks of auth-surface work the CF library gives for free), and DIY on the Streamable-HTTP session
plumbing. Recommendation: Cloudflare for Phase 1; this design ports to .NET without
changing any external contract if eng prefers later.
- Per-tool-call event: {brand_user_id, tool_name, params_shape (keys only, no PII values), latency_ms, upstream_status, result_rows, truncated}, emitted from the MCP service; no Roster-side work.
- Success KPIs: brands connected = distinct brand_user_id with ≥1 grant; engagement = distinct brands with ≥1 tool call in trailing 7 days.
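A minimal sketch of the event shape and the engagement count, under assumptions: the helper names are hypothetical, brand_user_id is assumed to be numeric, and each call is assumed to carry a timestamp (the event fields listed above do not include one).

```typescript
interface ToolCallEvent {
  brand_user_id: number;
  tool_name: string;
  params_shape: string[]; // parameter names only, never values
  latency_ms: number;
  upstream_status: number;
  result_rows: number;
  truncated: boolean;
}

// Reduces tool params to their sorted key names so no values (possible PII) are logged.
function paramsShape(params: Record<string, unknown>): string[] {
  return Object.keys(params).sort();
}

interface TimedCall {
  brand_user_id: number;
  at: number; // epoch ms; assumed to come from the log sink's event timestamp
}

// Engagement: distinct brands with at least one tool call in the trailing 7 days.
function engagedBrands(calls: TimedCall[], nowMs: number): number {
  const windowStart = nowMs - 7 * 24 * 60 * 60 * 1000;
  const brands = new Set<number>();
  for (const call of calls) {
    if (call.at >= windowStart && call.at <= nowMs) {
      brands.add(call.brand_user_id);
    }
  }
  return brands.size;
}
```

For example, `paramsShape({ start_date: "2026-05-10", email: "person@example.com" })` records only the two key names, so the email address never reaches the log.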