urlcap API

API reference

urlcap is a power tool for developers: a fast HTTP service that lets you craft and replay requests with low-level control, and exposes purpose-built endpoints — like a TOTP code generator — over a clean, versioned REST API. Everything here lives under https://urlcap.com/api/v1.

Introduction

The API is organised around predictable resource URLs, accepts standard HTTP, and returns JSON for everything under /api/v1. It uses conventional HTTP verbs and status codes, and every response carries a requestId you can quote when contacting support.

You authenticate with an API key sent in a request header. There are no SDKs to install — any HTTP client works, and a machine-readable description is published at /api/v1/openapi.yaml (OpenAPI 3.1) so you can generate a client if you'd like.

Status: urlcap is in active development. The TOTP endpoint described below is live today. New endpoints for fully custom HTTP requests are rolling out — watch the changelog.

Quickstart

Make your first call in under a minute. You'll need an API key — sign up for a free account, then create one on the API keys page.

1. Export your key: export URLCAP_KEY="api_…"
2. Call an endpoint with the Authorization: Bearer $URLCAP_KEY header.
3. Read the JSON response; every call includes a requestId.

Generate a TOTP code

curl -G https://urlcap.com/api/v1/totp \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "uri=otpauth://totp/Acme:alice@acme.io?secret=JBSWY3DPEHPK3PXP&period=30&digits=6"

200 OK

{
  "version": "1",
  "requestId": "9f1c0b7a-3e2d-4a51-9b88-2f6c1e7d4a02",
  "data": {
    "code": "492039",
    "digits": 6,
    "period": 30,
    "algorithm": "SHA1",
    "expiresIn": 14
  }
}

Base URL & conventions

All versioned endpoints share a common base:

Base URL

https://urlcap.com/api/v1

Requests and responses under /api/v1 use JSON with UTF-8 encoding.
Parameters may be sent as query-string values or, for POST, as application/x-www-form-urlencoded body fields. Always URL-encode values that contain reserved characters.
Successful responses use 200 (asynchronous submissions such as capture_full return 202 Accepted). Client errors use 4xx; server errors use 5xx.
Every response includes a top-level version field and, on success, a requestId.

The original, pre-versioned endpoint at https://urlcap.com/auth remains available and returns a plain-text code — see Legacy /auth.

Authentication

urlcap authenticates requests with a per-user API key. Create and manage keys at /account/api-keys — keys look like api_… and are shown once at creation. Send the key in the Authorization header as a Bearer token. This is the canonical header and the one tied to your account's monthly quota. Treat the key like a password — never embed one in client-side code or commit it to a repository.

Authenticated request (recommended)

curl https://urlcap.com/api/v1/totp \
  -H "Authorization: Bearer api_yDb7pwx671A7_r48mVkr09FbU9U47U0A" \
  --data-urlencode "uri=otpauth://totp/Acme:alice@acme.io?secret=JBSWY3DPEHPK3PXP"

The legacy X-API-Key header is also accepted and will authenticate both new api_… keys and the older UUID-style keys issued before the per-user system existed. It is kept for backwards compatibility; prefer Bearer for new integrations.

Authenticated request (legacy header — still works)

curl https://urlcap.com/api/v1/totp \
  -H "X-API-Key: api_yDb7pwx671A7_r48mVkr09FbU9U47U0A" \
  --data-urlencode "uri=otpauth://totp/Acme:alice@acme.io?secret=JBSWY3DPEHPK3PXP"

Watch out for the anonymous tier. On the allow-listed endpoints (see Anonymous free trial), requests with no recognised key (missing header, typo in the header name, mangled key value) fall through to a small anonymous trial of 5 requests per rolling 24 h, rather than 401-ing. If a key seems to "stop working" after a handful of requests, double-check that the header is actually reaching the server and the key is the exact api_… string you copied at creation.

401 Unauthorized

{
  "version": "1",
  "error": {
    "type": "unauthorized",
    "message": "Missing or invalid API key. Send 'Authorization: Bearer api_…' (or the legacy 'X-API-Key' header)."
  }
}

Making requests

Endpoints under /api/v1 accept GET for read-style calls. Where an endpoint also accepts POST, parameters go in an application/x-www-form-urlencoded body, which is handy when a value (such as an otpauth:// URI) is long or contains characters that are awkward in a URL.

A note on +: in a query string, + decodes to a space. urlcap restores + in the uri parameter so secrets and labels survive round-tripping, but the most robust approach is to always percent-encode (curl --data-urlencode, encodeURIComponent, urllib.parse.quote, …).
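As a minimal sketch of the percent-encoding advice above, the snippet below builds a TOTP request URL in JavaScript. The helper name buildTotpUrl and the sample otpauth URI (which has a literal + in its label) are illustrative assumptions, not part of the urlcap API.

```javascript
// Build a TOTP request URL with a fully percent-encoded `uri` parameter.
// buildTotpUrl is a hypothetical helper name used only for this example.
function buildTotpUrl(otpauthUri) {
  const base = "https://urlcap.com/api/v1/totp";
  // encodeURIComponent escapes "+", "&", "?", "=", ":", "/" and "@", so the
  // whole otpauth URI travels as a single query-string value.
  return base + "?uri=" + encodeURIComponent(otpauthUri);
}

// Made-up URI: the "+" in the label is the character that would otherwise
// decode to a space.
const example = "otpauth://totp/Acme:alice+test@acme.io?secret=JBSWY3DPEHPK3PXP&digits=6";
const requestUrl = buildTotpUrl(example);
console.log(requestUrl);
```

Decoding the query string again (for instance with new URL(requestUrl).searchParams.get("uri")) should give back the original URI, + included.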

POST with a form body

curl -X POST https://urlcap.com/api/v1/totp \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "uri=otpauth://totp/Acme:alice@acme.io?secret=JBSWY3DPEHPK3PXP&algorithm=SHA256&digits=8&period=30"

Errors

urlcap uses standard HTTP status codes. 2xx means success. 4xx means the request was rejected (a missing parameter, a bad key, a malformed URI). 5xx means something went wrong on our side — these are rare and safe to retry with backoff.

Error responses under /api/v1 have a consistent shape:

Error envelope

{
  "version": "1",
  "error": {
    "type": "invalid_uri",
    "message": "Could not parse the supplied otpauth:// URI: ..."
  }
}

Status	`error.type`	When it happens
400	invalid_request	A required parameter is missing or empty.
400	invalid_uri	The `uri` value is not a parseable `otpauth://` URI.
401	unauthorized	The `Authorization` (or legacy `X-API-Key`) header is missing, malformed, unknown, or expired.
404	not_found	No endpoint matches the requested path under `/api/v1`.
5xx	internal_error	An unexpected error on our side. Retry with exponential backoff.
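One way a client might surface the error envelope is to turn it into a typed exception. The sketch below assumes only the envelope shape shown above (error.type and error.message); the UrlcapError class and toUrlcapError helper are hypothetical names, and the "unknown" fallback for bodies without an envelope is a choice made for this example.

```javascript
// A small error type carrying the HTTP status and the envelope's error.type.
class UrlcapError extends Error {
  constructor(status, type, message) {
    super(message);
    this.name = "UrlcapError";
    this.status = status;
    this.type = type;
  }
}

// Convert an HTTP status plus a parsed JSON body into a UrlcapError.
// Falls back to "unknown" when the body does not carry an error envelope
// (for example, an HTML page from an intermediate proxy).
function toUrlcapError(status, body) {
  const err = body && typeof body === "object" ? body.error : undefined;
  return new UrlcapError(
    status,
    err && err.type ? err.type : "unknown",
    err && err.message ? err.message : "HTTP " + status
  );
}

const sampleError = toUrlcapError(400, {
  version: "1",
  error: { type: "invalid_uri", message: "Could not parse the supplied otpauth:// URI" },
});
console.log(sampleError.status, sampleError.type, sampleError.message);
```

Callers can then branch on error.type rather than matching message text, which is more likely to stay stable.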

Rate limits

urlcap enforces two independent limits:

Monthly quota — the request budget published for your plan on the pricing page. The Business plan's "Unlimited" tier has no monthly cap, but it is not uncapped throughput — see below.
Fair-use per-second rate limits — per-API-key and per-IP burst caps scaled to your plan, to protect the platform from abuse and noisy neighbours. They apply to every plan, including unlimited tiers. They also apply to the no-key free trial (5 requests per 24 hours per IP / UA / fingerprint).

When you exceed either limit the API responds with 429 Too Many Requests; back off and retry. Detailed per-plan QPS numbers and the accompanying response headers will be published alongside the public launch. Until then, build assuming generous-but-finite throughput and add retry-with-backoff for 429 and 5xx.
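The retry-with-backoff advice could be sketched as follows. The base delay (500 ms), cap (30 s) and attempt count (5) are assumptions chosen for illustration, since no per-plan numbers are published yet; doRequest is a hypothetical caller-supplied function that performs one HTTP call and resolves to an object with a status field.

```javascript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt, baseMs = 500, capMs = 30000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// 429 and 5xx are the statuses the documentation describes as worth retrying.
function isRetryable(status) {
  return status === 429 || (status >= 500 && status <= 599);
}

// Call doRequest up to maxAttempts times, sleeping between retryable failures.
// `sleep` is injectable so the loop can be exercised without real delays.
async function withRetry(
  doRequest,
  maxAttempts = 5,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))
) {
  let res;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    res = await doRequest();
    if (!isRetryable(res.status)) return res;
    if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
  }
  return res; // still failing after maxAttempts; the caller decides what to do
}
```

A production client would probably add random jitter to the delay so that many clients do not retry in lockstep; that is left out here to keep the schedule easy to read.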

Anonymous free trial

A small allow-list of endpoints can be hit without an API key, with a strict per-day budget — useful for demos, tinkering, and tutorials. After the budget is spent the endpoint returns 402 anon_limit_reached with a sign-up link.

Allow-listed endpoints: /api/v1/capture, /totp, /is_bot, /ip/contains, /ip/lookup.
Budget: 5 requests per rolling 24h, counted independently on three identity dimensions — IP, User-Agent, and the optional X-Client-Fingerprint header set by the in-browser try-it widget. If any dimension hits the cap, the request is blocked.
Response headers on allowed calls: X-RateLimit-Limit / X-RateLimit-Remaining.
Block response: {"error":{"type":"anon_limit_reached","signup_url":"/register","limit":5,"window":"24h"}}.

Heavy or stateful endpoints (/extract, /datasets, /schedules) are not on the allow-list and continue to require a valid API key.
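A client can recognise the anonymous block response explicitly, which also helps diagnose a key that is not reaching the server. The sketch below relies only on the block response shape shown above; parseAnonBlock is a hypothetical helper name, and resolving signup_url against https://urlcap.com is an assumption based on the relative path in the example.

```javascript
// Recognise the anonymous-trial block response and extract its fields.
// Returns null for any other status/body combination.
function parseAnonBlock(status, body, origin = "https://urlcap.com") {
  const err = body && body.error;
  if (status !== 402 || !err || err.type !== "anon_limit_reached") return null;
  return {
    limit: err.limit,
    window: err.window,
    // signup_url is relative in the documented example, so resolve it.
    signupUrl: err.signup_url ? new URL(err.signup_url, origin).toString() : null,
  };
}

const blocked = parseAnonBlock(402, {
  error: { type: "anon_limit_reached", signup_url: "/register", limit: 5, window: "24h" },
});
console.log(blocked);
```

If this helper ever returns a non-null value for a request you believed was authenticated, the key header most likely never reached the server.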

Versioning

The API is versioned in the URL path. The current version is v1: https://urlcap.com/api/v1. Backwards-incompatible changes — removing a field, changing a type, renaming a parameter — ship under a new path segment (/api/v2); v1 keeps working. Additive changes (new optional parameters, new fields in a response, new endpoints) can appear within v1, so write clients that tolerate unknown fields.

The legacy endpoint at https://urlcap.com/auth predates the versioning scheme. It is frozen: it will keep its current plain-text behaviour indefinitely, but new functionality only lands under /api/v{n}.

Pricing

urlcap is a paid API with usage-based pricing — you pay for the requests you make, with a free tier to build and prototype on. Sign up for a free account and create your first API key on the API keys page; paid tiers (Developer / Startup / Business) are managed from Billing.

The capture object

A successful call to the capture endpoint returns an envelope whose data field describes the request urlcap sent and the response it got back — parsed the way a browser's network inspector would show it. The shape is:

data

{
  "request": {
    "url": "https://example.com/path?q=1", "method": "GET", "httpVersion": "HTTP/1.1",
    "scheme": "https", "host": "example.com", "port": 443, "path": "/path", "query": "q=1",
    "queryParameters": [ { "name": "q", "value": "1" } ],
    "headers": [ { "name": "User-Agent", "value": "..." }, { "name": "Accept", "value": "*/*" } ],
    "followRedirects": false, "body": "", "bodyEncoding": "UTF-8", "technology": "reactor-netty"
  },
  "response": {
    "status": 200, "statusText": "OK", "httpVersion": "HTTP/1.1",
    "headers": [ { "name": "Date", "value": "..." }, { "name": "Content-Type", "value": "text/html" } ],
    "cookies": [ { "name": "sid", "value": "abc", "path": "/", "domain": ".example.com", "secure": true, "httpOnly": true, "sameSite": "Lax" } ],
    "contentType": "text/html", "charset": "utf-8", "contentLength": 1256,
    "bodyBytes": 1256, "bodyTruncated": false, "body": "...",
    "bodyEncoding": "UTF-8",
    "timings": {
      "totalMs":       84,
      "dnsMs":         12,
      "connectMs":     35,
      "requestSendMs":  1,
      "ttfbMs":        28,
      "bodyMs":         7,
      "resolvedIp":   "203.0.113.42"
    }
  }
}

request.headers — the headers urlcap actually sent, in order. If you supply none, it adds a default User-Agent and Accept; if you supply any, it sends exactly what you give. The runtime may add Host / Content-Length on top.
response.headers — every response header in the exact order received (duplicates preserved).
response.cookies — each Set-Cookie header parsed into name/value plus attributes.
response.body — the response body decoded with bodyEncoding. Bodies over ~1 MB are truncated in the response (bodyTruncated = true); the full body is still recorded server-side.
response.timings — wall-clock totalMs plus a phase breakdown: dnsMs (DNS resolution + scheduling), connectMs (TCP + TLS), requestSendMs, ttfbMs (time-to-first-byte), bodyMs (body download), and resolvedIp (the A/AAAA we landed on). Phase fields are absent when their hook didn't fire — e.g. a pooled keep-alive reuse skips DNS / connect, and followRedirects=true captures only the first leg's timings.
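Because headers arrive as an ordered array with duplicates preserved, and timing phases may be absent, a client needs slightly more care than with a plain object. The helpers below are a sketch under those documented assumptions; headerValues, presentPhases and the sample data are illustrative names and values, not part of the API.

```javascript
// Return every value of a header, in wire order, matching the name
// case-insensitively (HTTP header names are case-insensitive).
function headerValues(headers, name) {
  const wanted = name.toLowerCase();
  return headers
    .filter((h) => h.name.toLowerCase() === wanted)
    .map((h) => h.value);
}

// List the timing phases that are actually present. Absent phases are
// skipped rather than treated as zero.
const PHASES = ["dnsMs", "connectMs", "requestSendMs", "ttfbMs", "bodyMs"];
function presentPhases(timings) {
  return PHASES.filter((phase) => typeof timings[phase] === "number");
}

// Made-up response fragment: two Set-Cookie headers and a pooled
// connection that skipped the DNS and connect phases.
const sampleResponse = {
  headers: [
    { name: "Set-Cookie", value: "a=1" },
    { name: "Content-Type", value: "text/html" },
    { name: "set-cookie", value: "b=2" },
  ],
  timings: { totalMs: 40, ttfbMs: 30, bodyMs: 10 },
};
console.log(headerValues(sampleResponse.headers, "set-cookie"));
console.log(presentPhases(sampleResponse.timings));
```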

Capture a request

GET, POST /api/v1/capture

Sends an HTTP request to a target URL and returns its response as a capture object. Use GET with a url query parameter for a quick fetch, or POST a JSON body for full control — including the exact order of request headers.

GET — quick fetch

GET /api/v1/capture

curl -G https://urlcap.com/api/v1/capture \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "url=https://example.com/path?q=1"

Optional query parameters: method, followRedirects (true/false), timeoutMs.

POST — full control

Send Content-Type: application/json with a body of the following shape (only url is required):

url (string, required)

The target URL (http or https). Query string and port are honoured.

method (string)

HTTP method — GET (default), POST, PUT, DELETE, HEAD, OPTIONS, PATCH.

headers (array of {name, value})

Request headers, written to the wire in this exact order. Supplying any disables the default User-Agent/Accept.

body (string)

Request body. Defaults to empty.

bodyEncoding (string)

Charset used to encode the request body and decode the response body. Defaults to UTF-8.

followRedirects (boolean)

Follow 3xx responses. Defaults to false — by default you see the redirect itself.

timeoutMs (integer)

Per-request timeout in milliseconds. Defaults to 10000; clamped to 1000–30000.

webBotAuth (boolean)

Optional. Default false. Sign the outbound request with urlcap's Web Bot Auth signature (Ed25519). Adds Signature-Agent, Signature-Input and Signature headers. Target sites verify against the JWKS at /.well-known/http-message-signatures-directory.

proxy (object {host, port, user, password, type, pool})

Route the request through a proxy. Optional. Provide either explicit endpoint details (host + port, optional user/password, optional type = HTTP | HTTPS | SOCKS4 | SOCKS5 — default HTTP), or set pool to draw a random proxy from a urlcap-managed pool. When pool is set the explicit fields are overridden and not required. Pool values: "system" (random from the urlcap system pool), "account" (random from your account's own pool), "system-country-XX" (random system proxy in ISO country XX — e.g. "system-country-MT"), "account-country-XX" (same but from your account's pool). Errors: malformed selector → 400; valid selector with no matching proxy → 502 proxy_pool_empty.

waitMs (integer)

Optional. Default 0. Post-fetch wait in milliseconds (0..60000). In the default HTTP transport this is a literal sleep after the response lands — useful for pacing tests against a target. In useRealBrowser mode it's a post-load settle window: the headless browser idles for this long after the navigation completes so async XHR / lazy modules / dynamic chunks have time to fire before the snapshot is taken.

output ("json" | "har")

Optional. Default "json" — returns the standard urlcap envelope with data.request + data.response. Set to "har" to receive a bare HAR 1.2 document instead. With useRealBrowser:true the HAR carries one entry per sub-request the browser made (main document, scripts, XHR, images, fonts, third-party tracker hits, …); without it the HAR has the single roundtrip we performed. Drop the response straight into Chrome DevTools "Import HAR file…" for a waterfall, or feed to any HAR consumer.

useRealBrowser (boolean)

Optional. Default false. Use a real headless browser to load the URL instead of the standard HTTP transport. Best paired with output:"har" to receive a multi-entry HAR, the same shape your desktop browser's DevTools produces. The main document's body is included; per-asset bodies are not. Allow extra wall-clock time: the browser navigation alone typically takes several seconds, on top of any waitMs you set.
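The proxy pool selectors described for the proxy parameter follow a small grammar, so it can be convenient to build them rather than concatenate strings by hand. The sketch below is one possible helper; poolSelector is a hypothetical name, and the assumption that XX is a two-letter code written in upper case comes only from the examples ("system-country-MT") and may not match the server's exact validation.

```javascript
// Build a pool selector: "system", "account", "system-country-XX" or
// "account-country-XX".
function poolSelector(scope, country) {
  if (scope !== "system" && scope !== "account") {
    throw new Error('scope must be "system" or "account"');
  }
  if (country === undefined) return scope;
  // Assumption: XX is a two-letter country code, upper-cased as in the docs' examples.
  if (!/^[A-Za-z]{2}$/.test(country)) {
    throw new Error("country must be a two-letter code");
  }
  return scope + "-country-" + country.toUpperCase();
}

// Example capture body that draws a random system proxy in Malta.
const captureBody = {
  url: "https://example.com/",
  proxy: { pool: poolSelector("system", "mt") },
};
console.log(JSON.stringify(captureBody));
```

Validating on the client avoids a round trip that would end in a 400 for a malformed selector; a valid selector can still come back as 502 proxy_pool_empty if no proxy matches.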

Headers

Authorization (header, required)

Send as Bearer api_…. The legacy X-API-Key header is also accepted.

Returns

A 200 response whose data is a capture object, plus a requestId. Note that 200 means urlcap reached the target — the target's own status is in data.response.status. On error: the error envelope with 400 (invalid_request) for a bad/missing URL, 401 (unauthorized), or 502 (upstream_error) when the target couldn't be reached.

curl -X POST https://urlcap.com/api/v1/capture \
  -H "Authorization: Bearer $URLCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "url": "https://httpbin.org/post?x=1",
        "method": "POST",
        "headers": [
          { "name": "User-Agent", "value": "my-app/1.0" },
          { "name": "X-Trace-Id", "value": "abc-123" },
          { "name": "Content-Type", "value": "application/json" }
        ],
        "body": "{\"hello\":\"world\"}",
        "followRedirects": false,
        "timeoutMs": 10000
      }'

const res = await fetch("https://urlcap.com/api/v1/capture", {
  method: "POST",
  headers: { "Authorization": "Bearer " + process.env.URLCAP_KEY, "Content-Type": "application/json" },
  body: JSON.stringify({
    url: "https://example.com",
    headers: [
      { name: "User-Agent", value: "my-app/1.0" },
      { name: "Accept", value: "text/html" }
    ],
    followRedirects: true
  })
});
const { data } = await res.json();
console.log(data.response.status, data.response.headers.map(h => h.name));

User-Agent profiles: identify as Chrome / Firefox / Safari / …

GET /api/v1/user_agent_profiles

By default /capture and /extract identify as urlcap/1.0 (+https://urlcap.com/bot). Two knobs let callers identify as something specific:

userAgent — a raw UA string. Wins over the profile.
userAgentProfile — a key into the catalogue. For /extract this also selects the HtmlUnit BrowserVersion (Chrome / Firefox / Edge) so JS-fingerprinted targets see a coherent (engine, UA) pairing.

The catalogue is operator-managed in the user_agent_profiles MySQL table. Hit this endpoint to discover the available keys:

GET /api/v1/user_agent_profiles

curl -s https://urlcap.com/api/v1/user_agent_profiles \
  -H "Authorization: Bearer $URLCAP_KEY"

200 OK (truncated)

{
  "version": "1",
  "data": {
    "profiles": [
      {
        "key": "chrome-latest-mac",
        "description": "Chrome 131 on macOS",
        "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
        "browserEngine": "chrome",
        "extraHeaders": {
          "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
          "Accept-Language": "en-US,en;q=0.9",
          "sec-ch-ua": "\"Chromium\";v=\"131\", \"Not_A Brand\";v=\"24\", \"Google Chrome\";v=\"131\"",
          "sec-ch-ua-mobile": "?0",
          "sec-ch-ua-platform": "\"macOS\""
        }
      },
      { "key": "firefox-latest-win",  "...": "..." },
      { "key": "edge-latest-win",     "...": "..." },
      { "key": "safari-latest-mac",   "...": "..." },
      { "key": "googlebot",           "...": "..." },
      { "key": "urlcap",              "...": "..." }
    ]
  }
}

POST /api/v1/capture — using a profile

curl -X POST https://urlcap.com/api/v1/capture \
  -H "Authorization: Bearer $URLCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://example.com", "userAgentProfile": "chrome-latest-mac" }'

Resolution order on /capture: explicit headers[] wins (you're in full-control mode and we don't second-guess) → userAgent → userAgentProfile → the system default. On /extract, only the profile selects the JS engine — if you supply userAgent alone the engine stays Chromium.
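To make the /capture resolution order concrete, here is a client-side model of it. This is a sketch of the documented precedence only, not urlcap's server code: resolveUserAgent is a hypothetical name, profiles is assumed to be the data.profiles array from the catalogue endpoint, and falling back to the system default for an unknown profile key is an assumption, since that case is not documented.

```javascript
const SYSTEM_DEFAULT_UA = "urlcap/1.0 (+https://urlcap.com/bot)";

// Predict which User-Agent a /capture request would carry.
function resolveUserAgent(request, profiles) {
  // 1. Explicit headers[]: full-control mode, nothing is added or overridden.
  if (Array.isArray(request.headers) && request.headers.length > 0) {
    const ua = request.headers.find((h) => h.name.toLowerCase() === "user-agent");
    return ua ? ua.value : null; // null: no User-Agent header is sent at all
  }
  // 2. A raw userAgent string wins over the profile.
  if (request.userAgent) return request.userAgent;
  // 3. A profile key looked up in the catalogue.
  const profile = profiles.find((p) => p.key === request.userAgentProfile);
  if (profile) return profile.userAgent;
  // 4. System default (assumed fallback for an unknown key).
  return SYSTEM_DEFAULT_UA;
}

// Shortened stand-in for the catalogue response.
const catalogue = [{ key: "chrome-latest-mac", userAgent: "Mozilla/5.0 (example Chrome UA)" }];
console.log(resolveUserAgent({ userAgentProfile: "chrome-latest-mac" }, catalogue));
```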

Coherence caveat: the safari-latest-mac profile ships a Safari UA on the Chromium engine because HtmlUnit doesn't include a Safari engine, so targets that fingerprint navigator.vendor / navigator.platform will detect the mismatch. The googlebot profile only sets the UA string: urlcap is not Googlebot, and reverse-DNS / published-CIDR checks on the target side will reject the request. Set webBotAuth to true for cryptographic urlcap attribution.

Capture (async) — long captures without timeouts

POST /api/v1/capture_full is the async sibling of /capture. Identical body, identical result shape. The difference: POST returns 202 Accepted + a taskId immediately, and the work runs in the background. You poll GET /api/v1/capture_full/{taskId} for status and the result.

Use it when: your synchronous /capture call could take more than ~10 seconds — typically when you set useRealBrowser:true, a large waitMs, or hit a target close to the 30 s timeoutMs ceiling. Long synchronous captures can be cut off by intermediate proxies / load balancers returning 504 Gateway Timeout; the async form sidesteps that entirely because the HTTP submission is fast (typically <200 ms) and the poll endpoint is quick to respond regardless of how long the underlying capture takes.

Submit

curl -sS -X POST 'https://urlcap.com/api/v1/capture_full' \
  -H 'Authorization: Bearer api_…' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://target.example/",
    "useRealBrowser": true,
    "output": "har",
    "waitMs": 10000,
    "proxy": { "pool": "system-country-GB" }
  }'

Response (202):

{
  "version": "1",
  "requestId": "...",
  "taskId":    "d4eca6fa-922b-430c-ba17-1d03b7a3ad69",
  "status":    "pending",
  "statusUrl": "/api/v1/capture_full/d4eca6fa-922b-430c-ba17-1d03b7a3ad69"
}

Poll

curl -sS 'https://urlcap.com/api/v1/capture_full/d4eca6fa-922b-430c-ba17-1d03b7a3ad69' \
  -H 'Authorization: Bearer api_…'

While running, the response includes a progress block with the same shape as /extract's — phase, currentUrl, elapsedMs, etc. When finished, the result field carries exactly what synchronous /capture would have returned: the standard {version, requestId, data:{request, response}} envelope, or a bare HAR 1.2 document when the submission set output:"har".
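The submit-then-poll flow might be wrapped in a small loop like the one below. Treat it as a sketch: only the "pending" and "succeeded" status values appear in this reference, so the "running" entry in IN_PROGRESS is an assumed name, and the 2-second interval and 60-poll budget are arbitrary illustrative defaults. pollCaptureFull and resolveStatusUrl are hypothetical helper names.

```javascript
const ORIGIN = "https://urlcap.com";
// "pending" is documented; "running" is an assumed in-progress status name.
const IN_PROGRESS = new Set(["pending", "running"]);

// The 202 response carries a relative statusUrl; make it absolute.
function resolveStatusUrl(statusUrl) {
  return new URL(statusUrl, ORIGIN).toString();
}

// Poll until the task leaves the in-progress states, then return the body.
// fetchImpl is injectable so the loop can be exercised with a stub.
async function pollCaptureFull(
  statusUrl,
  apiKey,
  { intervalMs = 2000, maxPolls = 60, fetchImpl = fetch } = {}
) {
  const url = resolveStatusUrl(statusUrl);
  for (let i = 0; i < maxPolls; i++) {
    const res = await fetchImpl(url, {
      headers: { Authorization: "Bearer " + apiKey },
    });
    const body = await res.json();
    if (!IN_PROGRESS.has(body.status)) return body; // finished, one way or another
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("capture_full task did not finish within " + maxPolls + " polls");
}
```

The caller would pass the statusUrl from the 202 response and then inspect the returned body's status and result fields.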

List your recent capture_full tasks

GET /api/v1/capture_full returns the 25 newest tasks for your key with status + timestamps (no result body — fetch the individual {taskId} for that).

Limits

Each submission counts as one API call against your plan; the eventual result fetch is free.
The worker pool is bounded: under heavy load you may get 503 service_busy on submit. Retry with backoff.
Result size cap: ~8 MB. Very large HARs may need to be split or fetched with a different shape.

Screen capture — full-page or viewport screenshot

POST /api/v1/screen_capture drives a real headless browser to navigate the URL, optionally waits for lazy content to settle, then returns a PNG or JPEG of the rendered page. Useful when an extract or capture only gives you the DOM and you actually want the pixels — visual diffs, social previews, evidence captures, SPAs that paint canvases that no DOM extraction can reach.

Request body

POST /api/v1/screen_capture
Authorization: Bearer api_…
Content-Type: application/json

{
  "url":      "https://example.com/",
  "format":   "png",        // optional: "png" (default) or "jpeg"
  "fullPage": true,         // optional: true = full scrolled page (default); false = viewport only
  "waitMs":   0,            // optional: settle window in ms after page load (0..60000)
  "output":   "binary"      // optional: "binary" (default — image bytes) or "json" (base64 envelope)
}

url (string)

Required. Absolute http/https URL.

format ("png" | "jpeg")

Optional. Default "png". Use "jpeg" when you want smaller file sizes and don't need transparency.

fullPage (boolean)

Optional. Default true. true → captures the entire scrolled page height. false → captures only the viewport (whatever fits at the browser's current size). For pages that fit on screen the two are identical.

waitMs (integer)

Optional. Default 0. Settle window in milliseconds (0..60000) — the browser waits this long after the page finishes loading so async content (lazy images, XHR-painted widgets, JS-rendered charts) is captured rather than missed. SPAs typically need 3–10 s.

output ("binary" | "json")

Optional. Default "binary" — the image bytes are returned directly with Content-Type: image/png or image/jpeg. Switch to "json" to wrap the image base64-encoded inside the standard urlcap JSON envelope — useful when piping through systems that can't handle binary bodies (e.g. some webhook receivers).

Response — binary mode (default)

The body is the image. The HTTP response carries a few X-Urlcap-* headers for observability without polluting the image stream:

Header	Meaning
Content-Type	`image/png` or `image/jpeg` matching `format`.
X-Urlcap-Elapsed-Ms	Total wall-clock time the screenshot took (load + wait + capture), in milliseconds.
X-Urlcap-Full-Page	`true` or `false` — confirms what you got.

Response — JSON envelope (`output:"json"`)

{
  "version": "1",
  "requestId": "...",
  "data": {
    "format": "png",
    "fullPage": false,
    "waitMsApplied": 0,
    "totalElapsedMs": 877,
    "contentType": "image/png",
    "sizeBytes": 21460,
    "base64": "iVBORw0KGgoAAAANSUhEUgAA..."
  }
}
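When you request the JSON envelope, the image has to be decoded from base64 before it can be saved or displayed. The Node.js sketch below shows one way to do that; decodeScreenshot is a hypothetical helper, the sample envelope is a tiny stand-in (its base64 is just the 8-byte PNG file signature, not a real screenshot), and treating sizeBytes as the decoded size is an assumption.

```javascript
// Decode the base64 image from a screen_capture JSON envelope.
function decodeScreenshot(envelope) {
  const { base64, sizeBytes, contentType } = envelope.data;
  const bytes = Buffer.from(base64, "base64");
  return {
    bytes,
    extension: contentType === "image/jpeg" ? "jpg" : "png",
    // Assumption: sizeBytes reports the decoded image size, not the base64 length.
    sizeMatches: bytes.length === sizeBytes,
  };
}

// Stand-in envelope for illustration only.
const sample = {
  version: "1",
  data: { format: "png", contentType: "image/png", sizeBytes: 8, base64: "iVBORw0KGgo=" },
};
const shot = decodeScreenshot(sample);
console.log(shot.extension, shot.bytes.length, shot.sizeMatches);
```

The resulting bytes can then be written to disk, for example with Node's fs.writeFileSync, using the returned extension for the file name.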

Worked examples

# Save a full-page PNG directly to disk
curl -sS -X POST 'https://urlcap.com/api/v1/screen_capture' \
  -H 'Authorization: Bearer api_…' \
  -H 'Content-Type: application/json' \
  -d '{"url":"https://example.com/","fullPage":true,"format":"png"}' \
  -o example.png

# SPA — give it 8 s to bootstrap then snapshot viewport JPEG
curl -sS -X POST 'https://urlcap.com/api/v1/screen_capture' \
  -H 'Authorization: Bearer api_…' \
  -H 'Content-Type: application/json' \
  -d '{"url":"https://my-spa.example/","waitMs":8000,"fullPage":false,"format":"jpeg"}' \
  -o spa-snapshot.jpg

# Embed in a JSON response (e.g. for a webhook pipeline that can't handle binary)
curl -sS -X POST 'https://urlcap.com/api/v1/screen_capture' \
  -H 'Authorization: Bearer api_…' \
  -H 'Content-Type: application/json' \
  -d '{"url":"https://example.com/","output":"json"}'

Limits & notes

Browser viewport is fixed at the fleet's default (1920×~993 today). Custom widths/heights are not exposed.
Each call counts as one API request against your plan.
Expect a total wall-clock time of roughly page load + waitMs + a small overhead. Heavy SPAs routinely take 8–20 s.
Real-browser capacity is shared across the fleet — a small burst may queue briefly during peak load.

Extract — navigation & information retrieval

The extract tool runs a small "recipe" against a web page using a headless browser engine: it loads a URL, optionally performs a sequence of actions (typing into fields, clicking, waiting, navigating), then pulls out the data you describe with CSS or XPath selectors — optionally walking through paginated results. It also handles JSON responses (a REST API, say): point it at a URL that returns JSON and pull out values with JSONPath. Because a job can take a while, it runs asynchronously: you submit a job model, get a taskId back, and poll for completion.

The underlying browser engine is an implementation detail and may change; the job model and the result shape below are the contract.

The job model

A job is a JSON object:

job model

{
  "search_id": "my-job-1",                       // optional: a correlation id you choose; echoed back in the result
  "url": "https://example.com/search",           // required: the page to load
  "content": "json",                             // optional: "json" forces the response to be processed as JSON (see below)
  "actions": [ /* steps performed before extraction — see below */ ],
  "extractors": [ /* what to pull from the final page — see below */ ],
  "pagination": { /* optional: walk through multiple pages — see below */ },
  "includeHeaders": false,                       // optional: include the final response headers in result.http (see below)
  "includeCookies": false,                       // optional: include final-response Set-Cookie + the post-navigation cookie jar
  "includeAllHttpRequests": false,               // optional: include every HTTP request the engine made (implies the two above)
  "loadCss": false,                              // optional: fetch and parse CSS (off by default for speed); populates result.http.assets[]
  "loadImages": false,                           // optional: download image binaries; populates result.http.assets[]
  "waitMs": 0,                                   // optional: settle window after page load (0..60000) — wait for async XHR / lazy modules
  "proxy": { "pool": "system-country-MT" }       // optional: route every request through a proxy (explicit host/port OR a pool selector)
}

url (string, required)

The http/https page to load first.

search_id (string)

An optional identifier you choose; returned in the result as search_id so you can correlate jobs.

content (string)

Set to json to process the response body as JSON regardless of its Content-Type. If omitted, the engine auto-detects: an HTML response is treated as a page, anything else (application/json, text/plain, …) as JSON. See JSON content.

actions (array)

An ordered list of actions performed on the page before extraction. Optional. (Ignored for JSON content.)

extractors (array)

The extractors run against the final page. Their results become the top-level fields of the result.

pagination (object)

If present, pagination visits multiple pages and runs its own per-page extractors on each. (Ignored for JSON content.)

webBotAuth (boolean)

Optional. Default false. Sign every outbound request the headless browser makes (main document, scripts, XHR, …) with urlcap's Web Bot Auth signature (Ed25519). Target sites verify against the JWKS at /.well-known/http-message-signatures-directory.

includeHeaders (boolean)

Optional. Default false. Include the final landing-page response headers in result.http.finalResponse.headers. The "final" response is the settled page after JS, actions and pagination have run — not the initial load. See HTTP debug.

includeCookies (boolean)

Optional. Default false. Include the final-response Set-Cookie list in result.http.finalResponse.cookies AND the complete post-navigation cookie jar in result.http.cookieJar. The jar contains every cookie the browser accumulated during the run — including ones set by intermediate redirects or by JS. Cookies frequently carry session/auth tokens, so they are off by default.

includeAllHttpRequests (boolean)

Optional. Default false. Include every HTTP request the engine made — main document, scripts, XHR/AJAX, asset hops — in result.http.requests[], each entry carrying url / method / status / request & response headers / parsed Set-Cookie. Implies includeHeaders and includeCookies. Capped at 200 entries; if the cap fires, result.http.requestsTruncated is true and requestsTotalCount reports the real total. Header values are truncated at 4 kB each.

loadCss (boolean)

Optional. Default false. Tell the engine to fetch and parse CSS — off by default for speed and bandwidth. Also makes the engine fetch resources referenced by CSS (e.g. fonts via @font-face). Populates result.http.assets[]. See HTTP debug.

loadImages (boolean)

Optional. Default false. Tell the engine to download image binaries — <img>, <picture>, and (when loadCss is also on) CSS background images. Populates result.http.assets[].

waitMs (integer)

Optional. Default 0. Post-load settle window in milliseconds (bounded 0..60000). After the initial navigation the engine waits this long for async JS, lazy XHR, dynamic chunks to fire before running actions or extractors. Critical for SPAs whose data fetches happen after the load event — e.g. an Angular/React page that pulls data via XHR a few seconds after first paint.

proxy (object {host, port, user, password, type, pool})

Optional. Route every request the engine makes during this job (main document, scripts, XHR, asset fetches) through a proxy. One proxy per extract job — resolved at submission time, then used for the entire run. When the chosen pool entry supports it, every sub-request within the job shares the same exit IP (session affinity) so server-side state — cookies, anti-bot tokens, A/B bucketing — stays consistent. Provide either explicit endpoint details (host, port, optional user/password, optional type = HTTP|HTTPS|SOCKS4|SOCKS5), or set pool to draw a proxy from a urlcap-managed pool. Pool values: "system", "account", "system-country-XX", "account-country-XX" (XX = ISO-3166 country code, e.g. MT, GB). When pool is set, the explicit fields are overridden and not required. Errors: malformed selector → 400; pool resolves to nothing → 502 proxy_pool_empty.

useRealBrowser (boolean)

Optional. Default false. Route the initial navigation through a real headless browser (full JavaScript execution, modern Chrome semantics) instead of the default engine. Extractors run against the post-JS rendered DOM the real browser produced — useful for SPAs whose content paints after the load event and where the default engine struggles with modern JS. Limitations in this initial cut: inside actions[] only click actions are dispatched to the real browser (other types log a warning and are skipped); pagination strategies still run on the default engine. Click selectors must be unique on the page — if a selector matches more than one element the click fails and the error message lists every match, so you can refine.

output ("json" | "har")

Optional. Default "json" (the standard extract result). Set to "har" to additionally attach a HAR 1.2 document at result.har covering every sub-request the headless browser made — main document + every script / XHR / fetch / image / font / tracker hit, with full request & response headers and status. Requires useRealBrowser:true; silently ignored otherwise (the default engine doesn't drive a request capture pipeline). Extractor output is still produced alongside the HAR, so clients get both the structured fields they asked for AND the wire-level capture in one shot. Drop result.har into Chrome DevTools "Import HAR file…" for a waterfall.
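Several of the job-model options above interact in ways that fail quietly (for example, output:"har" without useRealBrowser:true is silently ignored). A client-side lint pass, sketched below, can flag these before submission. lintExtractJob is a hypothetical helper, it covers only rules stated in this reference, and the "wait" action type in the usage example is a made-up name used purely to trigger the warning. How the server treats an out-of-range waitMs is not documented, so the sketch simply warns.

```javascript
// Return a list of human-readable warnings for an extract job model.
function lintExtractJob(job) {
  const warnings = [];
  if (typeof job.url !== "string" || !/^https?:\/\//.test(job.url)) {
    warnings.push("url is required and must be an http/https URL");
  }
  if (
    job.waitMs !== undefined &&
    (!Number.isInteger(job.waitMs) || job.waitMs < 0 || job.waitMs > 60000)
  ) {
    warnings.push("waitMs must be an integer in 0..60000");
  }
  if (job.output === "har" && job.useRealBrowser !== true) {
    warnings.push('output:"har" is silently ignored without useRealBrowser:true');
  }
  if (job.useRealBrowser === true) {
    // Only click actions are dispatched to the real browser in the initial cut.
    for (const action of job.actions || []) {
      if (action.type !== "click") {
        warnings.push('action type "' + action.type + '" is skipped in useRealBrowser mode');
      }
    }
  }
  return warnings;
}

// "wait" is a hypothetical non-click action type, used only to show the warning.
console.log(lintExtractJob({
  url: "https://target.example/spa",
  useRealBrowser: true,
  actions: [{ type: "wait", ms: 1000 }],
}));
```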

Worked example — SPA + cookie banner + HAR

Visit an SPA, click the cookie consent button, extract the page title, and return a full HAR of every sub-request the browser made:

POST /api/v1/extract
Authorization: Bearer api_…
Content-Type: application/json

{
  "url":            "https://target.example/spa",
  "useRealBrowser": true,
  "output":         "har",
  "waitMs":         8000,
  "actions": [
    { "type": "click", "selector": "button:has-text(\"Aceptar\")", "ms": 3000 }
  ],
  "extractors": [
    { "name": "title", "selector": "title", "type": "text" }
  ]
}

After polling to "succeeded", the response carries both the extractor value AND the HAR document side-by-side:

{
  "status": "succeeded",
  "elapsedMs": 25413,
  "result": {
    "title": "Geoportal Registradores",
    "har": {
      "log": {
        "version": "1.2",
        "creator": { "name": "urlcap", "version": "1.1.0" },
        "pages":   [ { "id": "page_1", "startedDateTime": "…", "title": "…" } ],
        "entries": [ /* one per sub-request — 137 in this real run */ ]
      }
    }
  }
}

The HAR captures the click-triggered traffic too: if the cookie banner click loaded analytics scripts or fetched a fresh CSRF token, those requests appear in result.har.log.entries alongside the initial page load.

Playwright selectors: :has-text(...) is a Playwright-specific pseudo-class that matches by visible text, which is handy for buttons whose text is more stable than their ids.
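To get a quick feel for a capture without opening DevTools, you can tally the entries locally. The sketch below relies only on standard HAR 1.2 fields (log.entries[].request.url and log.entries[].response.status); summarise_har is a hypothetical helper name, not part of the urlcap API, and the sample data is hand-made.

```python
from collections import Counter
from typing import Dict
from urllib.parse import urlsplit


def summarise_har(har: Dict) -> Dict:
    """Tally HAR entries by status code and by host (standard HAR 1.2 fields only)."""
    entries = har.get("log", {}).get("entries", [])
    by_status = Counter(e["response"]["status"] for e in entries)
    by_host = Counter(urlsplit(e["request"]["url"]).hostname for e in entries)
    return {
        "total": len(entries),
        "byStatus": dict(by_status),
        "byHost": dict(by_host),
    }


# Tiny hand-made HAR fragment for illustration (not real capture data).
sample_har = {
    "log": {
        "version": "1.2",
        "entries": [
            {"request": {"url": "https://target.example/spa"}, "response": {"status": 200}},
            {"request": {"url": "https://target.example/app.js"}, "response": {"status": 200}},
            {"request": {"url": "https://cdn.example/t.gif"}, "response": {"status": 404}},
        ],
    }
}
print(summarise_har(sample_har))
```

In practice you would pass result["har"] from a succeeded task instead of sample_har.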

Selectors

Every selector field accepts a CSS selector by default, or an XPath expression with an xpath: prefix. You may also write css: explicitly. When the response is JSON content, selector is a JSONPath expression instead (the jsonpath: prefix is accepted but optional there):

selector syntax

"#numResultados"                              // CSS (no prefix)
"css:.results a.title"                        // CSS (explicit)
"xpath://a[starts-with(@href,'item.php?')]"   // XPath
"$.store.book[*].title"                       // JSONPath  (only for JSON content)
"jsonpath:$..price"                           // JSONPath  (explicit)

Actions

Each entry in actions is performed in order before the extractors run. An action object has a type and the fields that type needs:

`type`	Other fields	Effect
fill	selector, value	Sets the value of the matched input/textarea (or the `value` attribute otherwise).
select	selector, value	Selects the option with that value in the matched `<select>`.
click	selector	Clicks the matched element, then waits briefly for background JavaScript.
wait	ms	Waits `ms` milliseconds (default 1000) for background JavaScript to settle.
navigate	url	Loads a different URL.

actions example

"actions": [
  { "type": "fill",   "selector": "#q",        "value": "widgets" },
  { "type": "select", "selector": "#category", "value": "hardware" },
  { "type": "click",  "selector": "css:button[type=submit]" },
  { "type": "wait",   "ms": 2000 }
]

Extractors

Each entry in extractors produces one top-level field in the result, named by its name. The type decides what is produced:

text · string

The text content of the first element matching selector.

attr · string

The value of the attr attribute on the first matching element.

list · array of strings

The text (or, if attr is given, the attribute) of every matching element.

items · array of objects

One object per matching element; each object's keys come from the extractor's fields, evaluated relative to that element.

A fields entry (used by items) has name, selector, optional attr, and type (text or attr).

extractors example

"extractors": [
  { "name": "total",   "selector": "#numResults", "type": "text" },
  { "name": "results", "selector": "css:.result", "type": "items",
    "fields": [
      { "name": "title", "selector": "a.title", "type": "text" },
      { "name": "href",  "selector": "a.title", "type": "attr", "attr": "href" },
      { "name": "price", "selector": ".price",  "type": "text" }
    ]
  }
]

Every object inside a list/items array is automatically stamped with result_global_id (a counter across the whole job), result_relative_id (a counter within its page), and result_page (the 0-based page index it came from). (Stamping applies to HTML pages only — JSON results are returned verbatim.)
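The stamping rule can be pictured with a small local simulation. This is a sketch of the documented behaviour, not the engine's code: stamp_rows is a hypothetical name, and the assumption that both counters start at 1 is taken from the "result with items" example in this reference, where the first row has result_global_id 1 on result_page 0.

```python
from typing import Dict, List


def stamp_rows(pages: List[List[Dict]]) -> List[Dict]:
    """Add the three bookkeeping ids to every row, page by page."""
    stamped = []
    global_id = 0
    for page_index, rows in enumerate(pages):              # result_page is 0-based
        for relative_id, row in enumerate(rows, start=1):  # restarts on every page
            global_id += 1                                 # runs across the whole job
            stamped.append({
                "result_global_id": global_id,
                "result_relative_id": relative_id,
                "result_page": page_index,
                **row,
            })
    return stamped


rows = stamp_rows([[{"title": "Widget A"}, {"title": "Widget B"}], [{"title": "Widget C"}]])
for row in rows:
    print(row)
```

The global counter is the one to use as a stable row key when you merge pages; the relative counter tells you the position within a page.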

For JSON content the same extractor shapes apply, but selector is a JSONPath expression and the types are value (the default), list and items — see below.

Pagination

If pagination is present, the job visits multiple pages and runs per_page_extractors on each; the per-page results appear under a pages array in the result. Three strategies:

strategy · string

sequential (default) — repeatedly click next_selector on the same page; or link_tour — visit every distinct pagination link found by link_selector exactly once (for AJAX paginators whose whole bar re-renders each page); or visit_each — resolve link_selector on the initial page to a list of URLs, then navigate to each one (full page load) and run per_page_extractors on it. Use this for the "list page + detail pages" pattern: search results, product index → product pages, author index → author bios.

next_selector · string

The "next page" element to click (sequential strategy).

link_selector · string

For link_tour — selects every pagination link. For visit_each — selects all the link elements whose href attributes should be followed; URLs are absolutised against the initial page and deduplicated automatically.

max_pages · integer

Hard cap on pages visited. Default 10. For visit_each set this to your expected list size (or higher) — links beyond the cap are dropped.

wait_ms · integer

Milliseconds to wait after each page transition for background JavaScript (sequential / link_tour) or between detail-page visits (visit_each — pacing knob against rate-limited targets). Default 1000. When paired with wait_ms_max this is the lower bound of a random range.

wait_ms_max · integer

Optional. When greater than wait_ms, the actual wait between page transitions is uniformly random within [wait_ms, wait_ms_max] on each iteration. Useful for pacing crawls so the engine doesn't look like a fixed-cadence bot. Default 0 (disabled — falls back to fixed wait_ms).

per_page_timeout_ms · integer

visit_each only. Per-iteration timeout in milliseconds — bounds the whole load+extract for one detail URL. On overrun the page lands in pages[] with error: "per_page_timeout exceeded (NNN ms)" and the walk continues to the next URL. Default 60000 (60 s); capped at 600 000 (10 min). Set to 0 to disable (NOT recommended — one wedged page on a bad target will otherwise stall the whole walk).

stop_when_missing · boolean

If true (default), stop quietly when the next-page element is gone; if false, the job fails. (sequential only.)

per_page_extractors · array

Extractors (same shape as above) run on every visited page. For visit_each, each result also carries _url (the URL that was visited) so you can map rows back to their source. Per-URL failures are non-fatal: the failing page lands in pages[] with an error field instead of the extractor values, and the job keeps running.
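To make the visit_each parameters concrete, here is a minimal local sketch of how link harvesting (absolutise, deduplicate, cap at max_pages) and the wait_ms / wait_ms_max pacing rule could be modelled. plan_visits and pick_wait_ms are hypothetical names; the engine's real deduplication rules (for example how it treats fragments or trailing slashes) are not documented here, so exact string equality is an assumption.

```python
import random
from typing import List, Optional
from urllib.parse import urljoin


def plan_visits(base_url: str, hrefs: List[str], max_pages: int = 10) -> List[str]:
    """Absolutise hrefs against the initial page, drop duplicates, apply the cap."""
    seen = set()
    urls = []
    for href in hrefs:
        absolute = urljoin(base_url, href)
        if absolute not in seen:        # assumption: duplicates are exact string matches
            seen.add(absolute)
            urls.append(absolute)
    return urls[:max_pages]             # links beyond the cap are dropped


def pick_wait_ms(wait_ms: int = 1000, wait_ms_max: int = 0,
                 rng: Optional[random.Random] = None) -> int:
    """Fixed wait, or a uniform draw from [wait_ms, wait_ms_max] when the max is larger."""
    if wait_ms_max > wait_ms:
        return (rng or random).randint(wait_ms, wait_ms_max)   # inclusive on both ends
    return wait_ms


hrefs = ["/p/1", "p/2", "https://shop.example/p/1", "/p/3"]
print(plan_visits("https://shop.example/catalog?page=1", hrefs, max_pages=2))
print(pick_wait_ms(200, 900))
```

Running plan_visits over the links you expect on the list page is a cheap way to check that max_pages is high enough before you submit a long walk.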

Long-running jobs — what protects you

Two layers of safety for big visit_each walks:

Per-page timeout (per_page_timeout_ms, default 60 s) — each iteration is bounded; an unresponsive page becomes one row with an error, the walk continues.
Job-wide watchdog — a background sweeper checks every running extract / capture task; if its progress.updatedAt hasn't moved in 5 minutes (configurable server-side), the worker is interrupted and the task is marked failed with a watchdog: … message. Polling clients see a terminal state instead of a forever-running orphan. The per-page timeout is the first line of defense; the watchdog is the safety net for when the per-page bound itself misbehaves.

Example — list page + 1200 detail pages (`visit_each`)

POST /api/v1/extract
Authorization: Bearer api_…
Content-Type: application/json

{
  "url": "https://shop.example/catalog?page=1",
  "proxy": { "pool": "system" },                 // optional — every detail-page visit shares this exit IP
  "pagination": {
    "strategy":        "visit_each",
    "link_selector":   "a.product-card",
    "max_pages":       1200,
    "wait_ms":         200,
    "per_page_extractors": [
      { "name": "title", "selector": "h1.product-title",   "type": "text" },
      { "name": "price", "selector": "span.price.current", "type": "text" },
      { "name": "sku",   "selector": "[data-sku]",         "type": "attr", "attr": "data-sku" }
    ]
  }
}

One async task does the whole walk. The result's pages array has one row per detail-page visit (with _url + your extractor values, or error for failures). The optional proxy block routes every sub-request through one proxy for the whole job — use "pool": "system" for any urlcap-managed proxy, or "system-country-XX" to pin an exit country (e.g. "system-country-GB"). Within the job, every detail-page visit shares the same exit IP — useful when the target ties cookies, anti-bot state or A/B bucketing to source IP.
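Once a walk like this finishes, you will usually want to separate the rows that carry data from the rows that carry an error. The sketch below assumes the documented pages[] shape; split_pages is a hypothetical helper, the sample values are made up, and it is an assumption that failed rows also carry _url (the code tolerates its absence).

```python
from typing import Dict, List, Tuple


def split_pages(result: Dict) -> Tuple[List[Dict], List[Dict]]:
    """Separate rows that carry extractor values from rows that carry an error."""
    ok, failed = [], []
    for row in result.get("pages", []):
        (failed if "error" in row else ok).append(row)
    return ok, failed


# Hand-written sample in the documented shape (values are made up).
sample_result = {
    "pages": [
        {"_url": "https://shop.example/p/1", "title": "Widget A", "price": "9.99", "sku": "W-A"},
        {"_url": "https://shop.example/p/2", "error": "per_page_timeout exceeded (60000 ms)"},
    ]
}
ok, failed = split_pages(sample_result)
retry_urls = [row.get("_url") for row in failed if row.get("_url")]
print(len(ok), "ok;", len(failed), "failed; retry:", retry_urls)
```

The retry list can be fed into a follow-up job, so one slow detail page does not force you to repeat the whole walk.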

JSON content

When the page you load returns JSON — a REST API endpoint, for example — the extract tool parses the response body and runs your extractors with JSONPath instead of CSS/XPath. This happens automatically when the response isn't HTML (application/json, text/plain, anything that isn't text/html); to force it (e.g. an API that mislabels its Content-Type as text/html), set "content": "json" in the job model.

In JSON mode each extractor's selector is a JSONPath expression (the jsonpath: prefix is accepted but optional; an expression that doesn't start with $ gets one prepended, so store.book[0].title works too). actions and pagination don't apply and are ignored. The type decides what each extractor produces:

value · any

The matched value — a JSON object, array, string, number or boolean (or null if nothing matched). A path that selects more than one node (uses .., [*], a filter or a slice) yields an array of all matches. This is the default when type is omitted.

list · array

Always an array: every match if the path is multi-valued, or the single matched value wrapped in a one-element array if it's not — an empty array when nothing matched.

items · array of objects

The path selects a set of nodes; each becomes one object whose keys come from the extractor's fields, evaluated relative to that node — a field's selector is a JSONPath where $ is the node (an empty selector means the node itself).

The attr type (and a field's attr) is for HTML only; using it on JSON content fails the job.

JSON job model + result

// for a URL returning:  { "page": 1, "total": 128,
//                         "results": [ { "id": 1, "name": "Widget A", "price": 9.99 },
//                                      { "id": 2, "name": "Widget B", "price": 12 } ] }
{
  "url": "https://api.example.com/products?q=widget",
  "content": "json",                                  // optional — auto-detected for application/json
  "extractors": [
    { "name": "total",  "selector": "$.total" },
    { "name": "names",  "selector": "$.results[*].name",          "type": "list" },
    { "name": "cheap",  "selector": "$.results[?(@.price < 10)]", "type": "items",
      "fields": [
        { "name": "id",    "selector": "$.id" },
        { "name": "name",  "selector": "$.name" },
        { "name": "price", "selector": "$.price" }
      ]
    }
  ]
}

// → result:
{
  "total": 128,
  "names": [ "Widget A", "Widget B" ],
  "cheap": [ { "id": 1, "name": "Widget A", "price": 9.99 } ]
}
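The selector prefixes and the value / list rules described in this section can be summarised in a few lines of client-side code. This is an illustrative model only: classify_selector and shape are hypothetical names, the server's parser may differ in edge cases, and joining a bare path to the root with "$." is an assumption based on the store.book[0].title example.

```python
from typing import Any, List, Tuple


def classify_selector(selector: str, json_content: bool = False) -> Tuple[str, str]:
    """Return (kind, expression) following the documented prefix rules."""
    for prefix in ("css", "xpath", "jsonpath"):
        if selector.startswith(prefix + ":"):
            kind, expr = prefix, selector[len(prefix) + 1:]
            break
    else:
        # No prefix: CSS for HTML pages, JSONPath for JSON content.
        kind, expr = ("jsonpath" if json_content else "css"), selector
    if kind == "jsonpath" and not expr.startswith("$"):
        expr = "$." + expr          # assumption: bare paths hang off the root
    return kind, expr


def shape(matches: List[Any], multi_valued: bool, type_: str = "value") -> Any:
    """Turn raw JSONPath matches into the documented value / list result."""
    if type_ == "list":
        return list(matches)                      # always an array, possibly empty
    if type_ == "value":
        if multi_valued:                          # path uses .., [*], a filter or a slice
            return list(matches)
        return matches[0] if matches else None    # single node, or null
    raise ValueError("unsupported type: " + type_)


print(classify_selector("store.book[0].title", json_content=True))
print(shape([128], multi_valued=False))
print(shape([128], multi_valued=False, type_="list"))
```

The items type is left out of the sketch because it needs a real JSONPath evaluator to resolve each field relative to its node.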

HTTP debug — headers, cookies, request chain

By default the result only carries the data your extractors produced. Three opt-in flags on the job model (includeHeaders, includeCookies, includeAllHttpRequests) surface what the headless browser actually saw on the wire, which is useful for debugging selectors that aren't matching, verifying that auth cookies survived a redirect, or learning what sub-requests a JS-heavy page actually fires. All three are off by default and incur no overhead when off. The asset flags loadCss and loadImages (see "Default asset behaviour") additionally add an assets[] list to result.http.

Flag	Adds to `result.http`
includeHeaders	`finalResponse.headers[]` — every response header of the settled landing page (after JS, `actions`, `pagination`).
includeCookies	`finalResponse.cookies[]` (parsed `Set-Cookie` from the final response) + `cookieJar[]` (every cookie the browser holds at the end of the run, including ones set by redirects or JS).
includeAllHttpRequests	`requests[]` — one entry per HTTP request the engine made (main document, scripts, XHR/AJAX, asset hops), each with `url` / `method` / `status` / `requestHeaders` / `responseHeaders` / parsed `Set-Cookie`. Implies the two flags above.
loadCss / loadImages	`assets[]` — compact per-asset list (`url` / `type` / `status` / `sizeBytes`) for every sub-request the engine made. `type` is one of `html` / `css` / `javascript` / `image` / `font` / `media` / `json` / `xhr` / `other` — classified by Content-Type first, then URL extension. Either flag enables the list.

Default asset behaviour

The headless engine fetches the main HTML document and any JavaScript (<script src=…>, XHR / fetch(), server-sent events) — JS is on because most modern data-extraction targets render their content with it. CSS, images, fonts and media are off by default to keep extractions fast and cheap. Turn them on per-job with loadCss and loadImages. There is no separate font or media flag: fonts ride on CSS via @font-face (so they load when loadCss is on), and <video>/<audio> only load when JS plays them. A common reason to turn assets on is bot-detection — some anti-bot tools score sessions on whether the client fetched the site's CSS and favicon, and a JS-only fetch fingerprint can trip them.

Example — submit with includeHeaders and includeCookies:

request body

{
  "url": "https://example.com/",
  "extractors": [ { "name": "title", "type": "text", "selector": "h1" } ],
  "includeHeaders": true,
  "includeCookies": true
}

The resulting result gains an http object alongside the extractor outputs:

result (shape)

{
  "search_id": "…",
  "title": "Example Domain",
  "http": {
    "finalResponse": {
      "url": "https://example.com/",
      "status": 200,
      "statusText": "OK",
      "headers": [
        { "name": "Content-Type", "value": "text/html" },
        { "name": "Server",       "value": "cloudflare" }
        /* … */
      ],
      "cookies": [ /* parsed Set-Cookie from the final response */ ]
    },
    "cookieJar": [
      {
        "name": "sid", "value": "abc123",
        "domain": "example.com", "path": "/",
        "expires": "2026-12-31T23:59:59Z",
        "secure": true, "httpOnly": true, "sameSite": "Lax"
      }
    ]
  }
}

With includeAllHttpRequests, result.http.requests[] appears too:

result.http.requests[]

[
  {
    "url": "https://example.com/",
    "method": "GET",
    "status": 200,
    "statusText": "OK",
    "requestHeaders":  [ { "name": "User-Agent",   "value": "…" } ],
    "responseHeaders": [ { "name": "Content-Type", "value": "text/html" } ],
    "cookies": [ /* parsed Set-Cookie from this hop */ ]
  }
  /* …one per HTTP request the engine made (capped at 200) */
]

The requests[] array is capped at 200 entries. If the cap fires, result.http.requestsTruncated is true and requestsTotalCount reports the actual total. Individual header values are truncated at 4 kB each (long Set-Cookie / CSP headers get a …[truncated N chars] suffix). These bounds keep one JS-heavy extract from monopolising the task-result column.

With loadCss and/or loadImages, result.http.assets[] appears — one entry per sub-request, in load order:

result.http.assets[]

[
  { "url": "https://example.com/",            "type": "html",       "status": 200, "sizeBytes": 1256 },
  { "url": "https://example.com/styles.css",  "type": "css",        "status": 200, "sizeBytes": 8421 },
  { "url": "https://example.com/app.js",      "type": "javascript", "status": 200, "sizeBytes": 51280 },
  { "url": "https://example.com/logo.svg",    "type": "image",      "status": 200, "sizeBytes": 4112 },
  { "url": "https://example.com/font.woff2",  "type": "font",       "status": 200, "sizeBytes": 32114 }
]

sizeBytes is the response's declared Content-Length. For chunked or streamed responses without a declared size, sizeBytes is null — fall back to the full result.http.requests[] entry if you need response-header detail for that hop.
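A short sketch of how you might total assets[] by type while coping with null sizes. summarise_assets is a hypothetical helper; the first three sample rows mirror the listing above, and the fourth (a streamed script with no declared size) is made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def summarise_assets(assets: List[Dict]) -> Dict[str, Dict[str, int]]:
    """Per type: request count, known byte total, and entries with no declared size."""
    summary = defaultdict(lambda: {"count": 0, "bytes": 0, "unknownSize": 0})
    for asset in assets:
        bucket = summary[asset["type"]]
        bucket["count"] += 1
        if asset.get("sizeBytes") is None:      # chunked / streamed response
            bucket["unknownSize"] += 1
        else:
            bucket["bytes"] += asset["sizeBytes"]
    return dict(summary)


sample_assets = [
    {"url": "https://example.com/",           "type": "html",       "status": 200, "sizeBytes": 1256},
    {"url": "https://example.com/styles.css", "type": "css",        "status": 200, "sizeBytes": 8421},
    {"url": "https://example.com/app.js",     "type": "javascript", "status": 200, "sizeBytes": 51280},
    {"url": "https://example.com/stream.js",  "type": "javascript", "status": 200, "sizeBytes": None},
]
for kind, stats in summarise_assets(sample_assets).items():
    print(kind, stats)
```

Keeping unknownSize separate avoids silently under-reporting the byte total when some responses were streamed.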

Be careful with cookies in shared logs. The cookie jar regularly contains session identifiers, CSRF tokens and Cloudflare anti-bot cookies. If you persist extract results to a shared system, treat http.cookieJar[] and per-hop cookies[] as secret.
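One way to follow this advice is to mask cookie values before a result leaves your process. The sketch below is an assumption-laden example rather than a urlcap feature: redact_cookies is a hypothetical helper, it assumes per-hop cookies[] entries have the same name/value shape as cookieJar[], and it also masks raw Cookie and Set-Cookie header values.

```python
import copy
from typing import Dict


def redact_cookies(result: Dict, mask: str = "[redacted]") -> Dict:
    """Return a deep copy of an extract result with every cookie value masked."""
    safe = copy.deepcopy(result)
    http = safe.get("http", {})
    final = http.get("finalResponse", {})
    hops = http.get("requests", [])

    # Parsed cookies: the jar, the final response, and every hop.
    cookie_lists = [http.get("cookieJar", []), final.get("cookies", [])]
    cookie_lists += [hop.get("cookies", []) for hop in hops]
    for cookies in cookie_lists:
        for cookie in cookies:
            if "value" in cookie:
                cookie["value"] = mask

    # Raw headers can repeat the same secrets.
    header_lists = [final.get("headers", [])]
    for hop in hops:
        header_lists += [hop.get("requestHeaders", []), hop.get("responseHeaders", [])]
    for headers in header_lists:
        for header in headers:
            if header.get("name", "").lower() in ("cookie", "set-cookie"):
                header["value"] = mask
    return safe
```

The function copies rather than mutates, so you can keep the original result in memory for the extraction logic and pass only the redacted copy to logging or storage.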

POST · Submit a job

/api/v1/extract

Send the job model as a JSON body. The job is queued and you get a taskId immediately (status 202). Poll GET /api/v1/extract/{taskId} for progress.

Authorization · header · required

Send as Bearer api_…. The legacy X-API-Key header is also accepted.

POST /api/v1/extract

curl -X POST https://urlcap.com/api/v1/extract \
  -H "Authorization: Bearer $URLCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "search_id": "demo-1",
        "url": "https://example.com",
        "extractors": [ { "name": "heading", "selector": "css:h1", "type": "text" } ]
      }'

202 Accepted

{
  "version": "1",
  "taskId": "2b9a0c3e-7d11-4f44-9a8c-2c1d4e5f6a7b",
  "status": "pending",
  "statusUrl": "/api/v1/extract/2b9a0c3e-7d11-4f44-9a8c-2c1d4e5f6a7b"
}

If too many jobs are queued you get 503 (service_busy) — retry shortly. 400 (invalid_request) means the model couldn't be parsed or is missing url.

GET · Task status

/api/v1/extract/{taskId}

Returns the task's current state. status is one of pending, running, succeeded, failed. When succeeded, a result object is included; when failed, an error object. httpRequestCount is how many HTTP requests the engine has performed for this job so far. elapsedMs is the wall-clock time since the job started (live while running, frozen at finish).

Live progress while running. For long jobs — especially visit_each walks over 100+ URLs — the response carries a progress object you can poll to see what the engine is doing right now:

"progress": {
  "phase":            "visiting",
  "currentUrl":       "https://target.example/detail/487",
  "lastCompletedUrl": "https://target.example/detail/486",
  "list":   { "total": 1059, "index": 487, "completed": 486, "failed": 3 },
  "lastError":        "FailingHttpStatusCodeException: 503",
  "updatedAt":        "2026-06-24T15:42:11.000Z"
}

Fields:

phase — high-level stage: loading (initial fetch) · harvesting (extracting URL list, visit_each only) · visiting (walking the list) · paginating (next-button or AJAX clicks) · extracting (running top-level extractors) · finalising (right before result is stored).
currentUrl — the URL being processed RIGHT NOW; null between pages.
lastCompletedUrl — the most recently finished sub-page.
list.{total, index, completed, failed} — counters for list-walking strategies. index is 1-based; total is the harvested URL count capped at max_pages.
lastError — most recent per-page failure message (handy for diagnosing why failed ticked up without killing the job).
updatedAt — server timestamp of the last progress flush. Compare against now to detect a wedged worker.

The progress field is absent on tasks that are still pending and on older tasks that ran before this feature shipped. The engine flushes progress at human-meaningful events (page load, list-walk iteration), not on every sub-asset fetch — so you can poll at 1–5 second intervals without missing anything important.
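Comparing updatedAt against the current time is easy to get wrong around time zones, so here is a minimal client-side sketch. looks_wedged and parse_ts are hypothetical names, and the 5-minute default simply mirrors the server-side watchdog interval mentioned earlier; pick whatever threshold suits your jobs.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


def parse_ts(value: str) -> datetime:
    """Parse an ISO-8601 UTC timestamp such as 2026-06-24T15:42:11.000Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def looks_wedged(task: Dict, now: Optional[datetime] = None,
                 threshold: timedelta = timedelta(minutes=5)) -> bool:
    """True when a running task has not flushed progress within the threshold."""
    progress = task.get("progress")
    if task.get("status") != "running" or not progress:
        return False      # pending or finished tasks, or tasks without progress
    now = now or datetime.now(timezone.utc)
    return now - parse_ts(progress["updatedAt"]) > threshold


task = {"status": "running", "progress": {"updatedAt": "2026-06-24T15:42:11.000Z"}}
print(looks_wedged(task, now=parse_ts("2026-06-24T15:48:00.000Z")))
```

Both timestamps are timezone-aware, so the subtraction stays correct regardless of the local zone of the machine doing the polling.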

You only see your own tasks; an unknown id (or one belonging to another key) returns 404. GET /api/v1/extract (no id) lists your recent tasks.

GET /api/v1/extract/{taskId}

curl https://urlcap.com/api/v1/extract/2b9a0c3e-7d11-4f44-9a8c-2c1d4e5f6a7b \
  -H "Authorization: Bearer $URLCAP_KEY"

200 OK — succeeded

{
  "version": "1",
  "taskId": "2b9a0c3e-7d11-4f44-9a8c-2c1d4e5f6a7b",
  "status": "succeeded",
  "url": "https://example.com",
  "httpRequestCount": 1,
  "createdAt": "2026-05-11T13:00:00.000Z",
  "startedAt": "2026-05-11T13:00:00.100Z",
  "finishedAt": "2026-05-11T13:00:01.900Z",
  "result": {
    "search_id": "demo-1",
    "heading": "Example Domain"
  }
}

For a job with an items extractor, the result looks like:

result with items

"result": {
  "search_id": "demo-1",
  "total": "128 results",
  "results": [
    { "result_global_id": 1, "result_relative_id": 1, "result_page": 0, "title": "Widget A", "href": "/item?id=1", "price": "9.99" },
    { "result_global_id": 2, "result_relative_id": 2, "result_page": 0, "title": "Widget B", "href": "/item?id=2", "price": "12.00" }
  ]
}
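Putting the submit and status endpoints together, a client typically polls until the task reaches succeeded or failed. The sketch below keeps the polling logic separate from the HTTP call so it can be exercised without network access; wait_for_task and make_fetcher are hypothetical names, the interval and timeout defaults are arbitrary choices, and make_fetcher assumes the third-party requests package is installed.

```python
import time
from typing import Callable, Dict

TERMINAL = {"succeeded", "failed"}


def wait_for_task(fetch_status: Callable[[], Dict], interval_s: float = 2.0,
                  timeout_s: float = 600.0,
                  sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll until the task reaches a terminal status or the local timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch_status()
        if task.get("status") in TERMINAL:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError("task still " + str(task.get("status")))
        sleep(interval_s)


def make_fetcher(task_id: str, api_key: str) -> Callable[[], Dict]:
    """Build a fetch_status callable for GET /api/v1/extract/{taskId}."""
    import requests  # third-party; imported lazily so the polling logic has no dependency

    def fetch() -> Dict:
        resp = requests.get(
            "https://urlcap.com/api/v1/extract/" + task_id,
            headers={"Authorization": "Bearer " + api_key},
            timeout=10,
        )
        resp.raise_for_status()
        return resp.json()

    return fetch
```

Typical use would be task = wait_for_task(make_fetcher(task_id, os.environ["URLCAP_KEY"])), followed by a check of task["status"] before reading task["result"] or task["error"].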

IP & CIDR

Work with IPv4 and IPv6 addresses and CIDR ranges: check whether an address falls inside a range, keep a list of named ranges (allow/block lists, ASN or geo blocks, your own networks, …) and ask which of them contain a given address.

Stored ranges live in an optimised table: every address is kept as a 16-byte value (IPv4 is stored IPv4-mapped, ::ffff:a.b.c.d, so v4 and v6 share one comparable key space), each range as its first and last address plus a prefix length, and a B-tree index on those bounds turns "which ranges contain this address?" into an index range scan.
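The storage scheme can be illustrated with Python's standard ipaddress module. This is a sketch of the idea (one 16-byte key space, ranges as first/last bounds), not the service's actual schema or code; to_key, range_bounds and key_in_range are hypothetical names.

```python
import ipaddress
from typing import Tuple


def to_key(address: str) -> bytes:
    """16-byte sortable key; IPv4 is stored as IPv4-mapped IPv6 (::ffff:a.b.c.d)."""
    ip = ipaddress.ip_address(address)
    if ip.version == 4:
        return b"\x00" * 10 + b"\xff\xff" + ip.packed
    return ip.packed


def range_bounds(cidr: str) -> Tuple[bytes, bytes, int]:
    """First key, last key and prefix length for a CIDR (host bits are cleared)."""
    net = ipaddress.ip_network(cidr, strict=False)
    return to_key(str(net.network_address)), to_key(str(net.broadcast_address)), net.prefixlen


def key_in_range(address: str, cidr: str) -> bool:
    """Containment as a plain byte comparison on the 16-byte keys."""
    first, last, _ = range_bounds(cidr)
    return first <= to_key(address) <= last


print(to_key("10.20.30.40").hex())
print(key_in_range("10.20.30.40", "10.0.0.0/8"))
```

Because bytes compare lexicographically, the same first <= key <= last test is what a B-tree index over the two bound columns can answer with a range scan.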

GET, POST · Is an address in a CIDR?

/api/v1/ip/contains

A pure calculation: does ip fall within cidr? (Different address families ⇒ false.) A single host can be written with or without its full-length prefix (/32 for IPv4, /128 for IPv6).

ip · string · required

A single IPv4 or IPv6 address.

cidr · string

A CIDR (e.g. 10.0.0.0/8) or a single address. Required unless you pass cidrs.

cidrs · array of strings

POST only — check the address against several ranges at once; the response then has a results array.

GET /api/v1/ip/contains

curl -G https://urlcap.com/api/v1/ip/contains \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "ip=10.20.30.40" \
  --data-urlencode "cidr=10.0.0.0/8"

200 OK

{
  "version": "1",
  "requestId": "…",
  "data": {
    "ip": "10.20.30.40",
    "cidr": "10.0.0.0/8",
    "contains": true,
    "range": {
      "cidr": "10.0.0.0/8",
      "family": 4,
      "prefixLength": 8,
      "networkAddress": "10.0.0.0",
      "lastAddress": "10.255.255.255"
    }
  }
}

Batch form: POST /api/v1/ip/contains with { "ip": "10.20.30.40", "cidrs": ["10.0.0.0/8", "192.168.0.0/16", "2001:db8::/32"] } → data.results is [ { "cidr": "10.0.0.0/8", "contains": true }, … ].

GET, POST · Which stored ranges contain an address?

/api/v1/ip/lookup

Looks up every range in your stored set (see below) that contains ip, most-specific first.

ip · string · required

A single IPv4 or IPv6 address (query parameter, or JSON { "ip": "…" }).

GET /api/v1/ip/lookup

curl -G https://urlcap.com/api/v1/ip/lookup \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "ip=10.20.30.40"

200 OK

{
  "version": "1",
  "requestId": "…",
  "data": {
    "ip": "10.20.30.40",
    "family": 4,
    "matchCount": 2,
    "matches": [
      { "id": 7, "cidr": "10.20.0.0/16", "family": 4, "prefixLength": 16, "label": "office-lan" },
      { "id": 3, "cidr": "10.0.0.0/8",   "family": 4, "prefixLength": 8,  "label": "rfc1918" }
    ]
  }
}
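For small lists, or for unit tests that should not call the API, the "most-specific first" ordering can be reproduced locally. The sketch below is a linear scan with Python's ipaddress module, not the indexed lookup the service performs; lookup is a hypothetical name and the stored rows reuse labels from the examples in this reference.

```python
import ipaddress
from typing import Dict, List


def lookup(ip: str, stored: List[Dict]) -> List[Dict]:
    """Ranges from `stored` that contain `ip`, longest prefix (most specific) first."""
    address = ipaddress.ip_address(ip)
    matches = []
    for row in stored:
        network = ipaddress.ip_network(row["cidr"], strict=False)
        # Ranges from the other address family never match.
        if network.version == address.version and address in network:
            matches.append((network.prefixlen, row))
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in matches]


stored = [
    {"id": 3,  "cidr": "10.0.0.0/8",    "label": "rfc1918"},
    {"id": 7,  "cidr": "10.20.0.0/16",  "label": "office-lan"},
    {"id": 12, "cidr": "2001:db8::/32", "label": "documentation-prefix"},
]
print([row["label"] for row in lookup("10.20.30.40", stored)])
```

Sorting by prefix length in descending order puts the narrowest range first, which is usually the one whose label you want to act on.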

GET, POST, DELETE · Manage stored ranges

/api/v1/ip/ranges · /api/v1/ip/ranges/{id}

GET /api/v1/ip/ranges — list your stored ranges (newest first): id, cidr, family, prefixLength, label, createdAt.
POST /api/v1/ip/ranges with { "cidr": "10.20.0.0/16", "label": "office-lan" } — adds the range (the cidr is canonicalised on the way in). If that CIDR is already stored its label is updated. Returns 201 with the row.
DELETE /api/v1/ip/ranges/{id} — removes a stored range. 404 if there's no such id.

POST /api/v1/ip/ranges

curl -X POST https://urlcap.com/api/v1/ip/ranges \
  -H "Authorization: Bearer $URLCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "cidr": "2001:db8::/32", "label": "documentation-prefix" }'

201 Created

{
  "version": "1",
  "requestId": "…",
  "data": {
    "id": 12,
    "cidr": "2001:db8::/32",
    "family": 6,
    "prefixLength": 32,
    "networkAddress": "2001:db8::",
    "lastAddress": "2001:db8:ffff:ffff:ffff:ffff:ffff:ffff",
    "label": "documentation-prefix"
  }
}

GET · Full IP intelligence

/api/v1/ip/intelligence

One request, every signal urlcap has on an IP. Composes the static lookups (CIDR membership, GeoIP, reverse DNS, bot registry, trust list) with the behavioural data we've aggregated from ingested traffic: which JA4 fingerprints the IP has used, which user agents, what sites it hits, the status-code mix it gets back, vulnerability-probe hits, and whether other customers have voted on its JA4s via edge_action.

Use it as the investigation pane for a single IP — e.g., "tell me everything about the IP that just showed up in my logs." For inline blocklist hot paths, stay on /api/v1/ip/contains and /api/v1/ja4/intelligence — they're optimised for membership checks; this one composes ~7 sub-queries per call.

ip · string · required

An IPv4 or IPv6 address. CIDRs are not accepted here.

window_days · int 1..7 · default 7

Sliding window for the behavioural snapshot. Capped at 7 because that's the request_events TTL. Static sections (geo, rDNS, bot registry, trust list, bot observations) ignore this.

GET /api/v1/ip/intelligence?ip=216.73.217.19

curl -s "https://urlcap.com/api/v1/ip/intelligence?ip=216.73.217.19" \
  -H "Authorization: Bearer $URLCAP_KEY"

200 OK

{
  "version": "1",
  "data": {
    "ip": "216.73.217.19",
    "family": 4,
    "windowDays": 7,

    "geo": {
      "countryCode": "US", "countryName": "United States",
      "city": "Columbus", "asn": 16509,
      "asnOrganization": "Amazon.com, Inc."
    },
    "reverseDns": { "names": [], "ttlSeconds": 300, "error": "no PTR record" },
    "isBot": {
      "matched": true,
      "matches": [
        { "botGroup": "Claude bot", "botGroupId": 10, "cidr": "216.73.216.0/22" }
      ]
    },
    "trustList": { "trusted": false, "byUsers": 0 },

    "botObservations": [
      { "botGroup": "Claude bot", "source": "cidr_match",
        "observations": 439677, "firstSeen": "…", "lastSeen": "…" }
    ],

    "behaviour": {
      "totalRequests": 29179,
      "distinctJa4s": 1, "distinctUserAgents": 1, "distinctSites": 1,
      "firstSeen": "…", "lastSeen": "…",
      "statusMix": { "count2xx": 28912, "count3xx": 267, "count4xx": 0,
                     "count5xx": 0, "count403": 0, "count444": 0 },
      "assetRatio": 0.0,
      "vulnProbeHits": 0, "vulnProbesUnique": 0,
      "blockRatio": 0.0
    },

    "ja4s": [
      { "ja4": "t13d1011h2_61a7ad8aa9b6_867a6ff6dde3",
        "ja4Hash": "7561205223071800741",
        "requestCount": 29179,
        "classification": "known_bot",
        "botGroup": "OAI-SearchBot" }
    ],

    "userAgents": [
      { "userAgent": "Mozilla/5.0 AppleWebKit/537.36 (…; compatible; Claude-SearchBot/1.0; +searchbot@anthropic.com)",
        "requestCount": 29179 }
    ],

    "crossCustomerAction": null
  }
}

Reading the response

isBot.matches — every published-CIDR registry hit. Operator-grade attribution: Googlebot, Bingbot, GPTBot, ClaudeBot, etc. CIDRs refreshed daily from operators' own JSON.
botObservations — every {bot_group, IP} attribution we've made internally. source tells you how: cidr_match (registry), ua_match (UA self-identification), vuln_match (≥3 vuln-probe paths), manual (admin override).
trustList.byUsers — how many distinct urlcap accounts have whitelisted this IP. "12 customers trust this IP" is a strong "do not block" vote.
behaviour.statusMix — what response codes this IP gets back across all sites. High 403/444 ratio means edges are already blocking it.
behaviour.assetRatio — fraction of requests that fetched images/CSS/JS/fonts. Browsers ≈ 0.5-0.9; bots ≈ 0.
ja4s[].classification — per fingerprint: known_bot (attributed in bot_observed_ja4s), candidate (pending review, includes score), or unknown.
crossCustomerAction — when other customers have submitted edge_action outcomes for any JA4 this IP has used, the headcounts surface here. Highest-confidence label we publish.

Heads up: the per-IP attribution from botObservations and the per-JA4 attribution from ja4s[].botGroup can disagree. In the example above, the IP belongs to Anthropic (Claude bot CIDR), while the JA4 fingerprint is currently attributed to OAI-SearchBot because Anthropic and OpenAI's HTTP clients ship the same TLS library and produce identical fingerprints. That's not a bug — it's the kind of cross-axis fact this endpoint exists to surface. Decide policy based on which axis matters more for your case.
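As a starting point for your own policy code, the sketch below pulls a few of these signals out of a data object and flags the IP-versus-JA4 disagreement instead of trying to resolve it. triage is a hypothetical helper, the output keys are made up for this example, and treating count403 + count444 as the "already blocked" share is an interpretation of the field descriptions above.

```python
from typing import Dict


def triage(data: Dict) -> Dict:
    """Derive a few quick signals from an /ip/intelligence data object."""
    behaviour = data.get("behaviour", {})
    mix = behaviour.get("statusMix", {})
    total = behaviour.get("totalRequests", 0)
    blocked = mix.get("count403", 0) + mix.get("count444", 0)

    ip_groups = {o["botGroup"] for o in data.get("botObservations", [])}
    ja4_groups = {j["botGroup"] for j in data.get("ja4s", []) if j.get("botGroup")}

    return {
        "blockedShare": blocked / total if total else 0.0,
        "trustedBy": data.get("trustList", {}).get("byUsers", 0),
        "ipBotGroups": sorted(ip_groups),
        "ja4BotGroups": sorted(ja4_groups),
        # The two axes can legitimately disagree; surface it for the caller.
        "attributionDisagrees": bool(ip_groups and ja4_groups and ip_groups != ja4_groups),
    }


# Trimmed copy of the example response above.
sample = {
    "trustList": {"trusted": False, "byUsers": 0},
    "botObservations": [{"botGroup": "Claude bot", "source": "cidr_match"}],
    "behaviour": {"totalRequests": 29179,
                  "statusMix": {"count2xx": 28912, "count3xx": 267, "count4xx": 0,
                                "count5xx": 0, "count403": 0, "count444": 0}},
    "ja4s": [{"ja4": "t13d1011h2_61a7ad8aa9b6_867a6ff6dde3", "botGroup": "OAI-SearchBot"}],
}
print(triage(sample))
```

For the example IP this reports a blocked share of 0.0 and a disagreement between the two attribution axes, which matches the discussion above.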

The TOTP code object

A successful call to the TOTP endpoint returns an envelope containing a data object that represents a freshly computed time-based one-time password and the parameters used to compute it.

code · string

The current one-time password, as a zero-padded decimal string of length digits.

digits · integer

Number of digits in code (commonly 6, taken from the URI or its default).

period · integer

Length of the time step in seconds (commonly 30).

algorithm · string

HMAC hash used: SHA1, SHA256, or SHA512.

expiresIn · integer

Seconds remaining until code rotates. Use it to render a countdown; refetch when it reaches zero.

data

{
  "code": "492039",
  "digits": 6,
  "period": 30,
  "algorithm": "SHA1",
  "expiresIn": 14
}
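If you want to cross-check a returned code locally, the fields map directly onto the standard TOTP algorithm (RFC 6238, built on the HOTP truncation from RFC 4226). The sketch below uses only the Python standard library; it is a reference computation under the assumption that the service follows the RFC, not a description of the server's implementation, and the expiresIn convention (1 to period, never 0) is a guess.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Dict, Optional


def totp(secret_b32: str, now: Optional[float] = None, digits: int = 6,
         period: int = 30, algorithm: str = "SHA1") -> Dict:
    """Compute a TOTP code (RFC 6238) and return it in the documented data shape."""
    now = time.time() if now is None else now
    # Base32 secrets are often written without padding; restore it before decoding.
    padded = secret_b32.upper() + "=" * (-len(secret_b32) % 8)
    key = base64.b32decode(padded)
    counter = int(now // period)                      # number of elapsed time steps
    digest = hmac.new(key, struct.pack(">Q", counter),
                      getattr(hashlib, algorithm.lower())).digest()
    offset = digest[-1] & 0x0F                        # dynamic truncation (RFC 4226)
    binary = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return {
        "code": str(binary % 10 ** digits).zfill(digits),
        "digits": digits,
        "period": period,
        "algorithm": algorithm,
        "expiresIn": period - int(now % period),      # assumption: counts down from period to 1
    }


print(totp("JBSWY3DPEHPK3PXP"))
```

The function can be checked against the published RFC 6238 test vectors (ASCII secret 12345678901234567890, 8 digits, SHA1), which is a useful sanity test before comparing with the API.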

GET, POST · Generate a TOTP code

/api/v1/totp

Related guide

Automating TOTP codes for staging and QA workflows — Playwright + Python examples, secret-handling rules.

Computes the current TOTP code for an otpauth:// URI — the same string you'd scan into an authenticator app. The URI carries the shared secret and the algorithm/digits/period; nothing is stored server-side. Accepts GET (query string) or POST (application/x-www-form-urlencoded).

How secrets are handled

The otpauth:// URI (and its embedded shared secret) is processed in memory for the duration of one request and is never persisted.
The uri parameter is redacted from request logs and excluded from analytics.
Transport is TLS 1.2+ only; cleartext requests are refused.
The endpoint is intended for automated testing and internal automation against systems you own — not for storing or generating codes for your personal 2FA accounts.
Full posture: /security.

Parameters

uri · string · required

An otpauth://totp/... URI containing at least a secret parameter (Base32). May also include algorithm, digits, and period. Always percent-encode this value.
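The reason percent-encoding matters is that the otpauth URI contains its own ? and & characters. The short demonstration below uses only urllib.parse from the standard library and shows how a generic query-string parser sees the encoded and the raw form; how the urlcap server reacts to the raw form is not documented here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

uri = "otpauth://totp/Acme:alice@acme.io?secret=JBSWY3DPEHPK3PXP&period=30&digits=6"

# Correct: the whole otpauth URI travels as ONE query parameter.
good_url = "https://urlcap.com/api/v1/totp?" + urlencode({"uri": uri})

# Incorrect: pasted raw, the inner "&period=..." and "&digits=..." split off
# and become separate top-level parameters.
bad_url = "https://urlcap.com/api/v1/totp?uri=" + uri

good = parse_qs(urlsplit(good_url).query)
bad = parse_qs(urlsplit(bad_url).query)

print(good["uri"][0] == uri)   # the URI survives the round trip intact
print(sorted(bad))             # parameter names a parser sees for the raw form
print(bad["uri"][0])           # the truncated URI, missing period and digits
```

curl's --data-urlencode, JavaScript's encodeURIComponent and the params argument of requests.get all perform this encoding for you, which is why the examples in this reference use them.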

Headers

Authorization · header · required

Send as Bearer api_…. The legacy X-API-Key header is also accepted.

Returns

A 200 response whose data field is a TOTP code object, plus a requestId. On error, the standard error envelope with status 400 or 401.

curl -G https://urlcap.com/api/v1/totp \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "uri=otpauth://totp/Acme:alice@acme.io?secret=JBSWY3DPEHPK3PXP&period=30&digits=6"

const uri = "otpauth://totp/Acme:alice@acme.io?secret=JBSWY3DPEHPK3PXP&period=30&digits=6";
const res = await fetch(
  `https://urlcap.com/api/v1/totp?uri=${encodeURIComponent(uri)}`,
  { headers: { "Authorization": "Bearer " + process.env.URLCAP_KEY } }
);
if (!res.ok) throw new Error(`urlcap: ${res.status}`);
const { data, requestId } = await res.json();
console.log(`${data.code} (expires in ${data.expiresIn}s) — request ${requestId}`);

import os

import requests

uri = "otpauth://totp/Acme:alice@acme.io?secret=JBSWY3DPEHPK3PXP&period=30&digits=6"
resp = requests.get(
    "https://urlcap.com/api/v1/totp",
    params={"uri": uri},
    headers={"Authorization": f"Bearer {os.environ['URLCAP_KEY']}"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json()["data"]["code"])

Response · 200 OK

{
  "version": "1",
  "requestId": "9f1c0b7a-3e2d-4a51-9b88-2f6c1e7d4a02",
  "data": {
    "code": "492039",
    "digits": 6,
    "period": 30,
    "algorithm": "SHA1",
    "expiresIn": 14
  }
}

Scheduled tasks

Related guide

Schedule recurring HTTP checks without running a cron server

Run a task at a future time — once, or repeatedly on a cron schedule — without keeping a connection open. A scheduled task is either a capture (kind: "capture") or an extract job (kind: "extract"). Every execution stores its full result, which you can fetch later.

The base path is /api/v1/schedules. Everything here uses the standard JSON envelope and your API key.

The schedule object

schedule

{
  "id": "0f4c…-uuid",
  "kind": "capture",                   // "capture" | "extract"
  "name": "prod health check",         // optional label
  "cron": "*/15 * * * *",              // recurring; null for a one-shot
  "runAt": null,                       // one-shot ISO-8601 time; null for recurring
  "timezone": "Europe/Madrid",         // the cron is evaluated in this zone (default UTC)
  "status": "active",                  // active | paused | done | disabled
  "nextRunAt": "2026-06-01T07:15:00Z",
  "lastRunAt": "2026-06-01T07:00:00Z",
  "runs": 12,
  "maxRuns": null,                     // stop after this many runs; null = unlimited
  "until": null,                       // stop after this ISO-8601 time; null = no end
  "createdAt": "2026-05-12T10:00:00Z",
  "capture": { "url": "https://example.com/health", "method": "GET" }   // present for kind "capture";
                                                                        // an "extract" key holds the job model for kind "extract"
}

Cron syntax. The classic 5-field crontab form — minute hour day-of-month month day-of-week — with ranges (1-5), lists (1,15), steps (*/15) and names (MON, JAN). An optional 6th leading field adds seconds. The macros @hourly @daily @weekly @monthly @yearly work too. (The Quartz-only L/W/# do not.) A job whose time was missed while the service was down runs once on the next poll, then resumes at its next future occurrence.

Extract schedules. An extract task runs through the (asynchronous) extract engine; the scheduler waits for it to finish and stores the engine's result in the run row (no httpStatus — that's a capture-only field). It also shows up in your extract task list.

post Create a schedule

/api/v1/schedules

Send either a cron expression (recurring) or a runAt timestamp (one-shot), not both, plus exactly one of a capture object (same fields as the capture endpoint's JSON body) or an extract object (the extract job model). Which one you send determines the kind.

curl — schedule a capture

curl -X POST https://urlcap.com/api/v1/schedules \
  -H "Authorization: Bearer api_…" -H "Content-Type: application/json" \
  -d '{
    "name": "prod health check",
    "cron": "*/15 * * * *",
    "timezone": "Europe/Madrid",
    "maxRuns": 96,
    "capture": { "url": "https://example.com/health", "method": "GET" }
  }'

A one-shot capture instead:

json

{ "runAt": "2026-06-01T09:00:00Z", "capture": { "url": "https://example.com/report" } }

Or schedule an extract job — pass an extract object holding the job model:

json

{
  "name": "daily price scrape",
  "cron": "0 6 * * *",
  "extract": {
    "url": "https://example.com/products",
    "extractors": [ { "name": "prices", "selector": ".price", "type": "list" } ]
  }
}

Body fields

capture / extract — one is required. capture: same fields as the capture endpoint's JSON body (at minimum a url). extract: the extract job model (at minimum a url).
cron — a cron expression (see above). Mutually exclusive with runAt.
runAt — an ISO-8601 timestamp (e.g. 2026-06-01T09:00:00Z) for a single run. Mutually exclusive with cron.
timezone — IANA zone the cron is evaluated in. Default UTC.
name — optional label.
maxRuns — optional; stop after this many executions.
until — optional ISO-8601 timestamp; stop after this time.

Returns 201 with data.schedule set to a schedule object. An invalid cron expression or timezone, a missing task object, or sending both capture and extract returns 400.
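The mutual-exclusion rules above are easy to get wrong in client code, so here is a minimal sketch of a request-body builder that enforces them before anything is sent. build_schedule_body is a hypothetical helper, not part of urlcap; it only mirrors the field names listed under Body fields.

```python
from typing import Optional


def build_schedule_body(*, cron: Optional[str] = None, run_at: Optional[str] = None,
                        capture: Optional[dict] = None, extract: Optional[dict] = None,
                        timezone: Optional[str] = None, name: Optional[str] = None,
                        max_runs: Optional[int] = None, until: Optional[str] = None) -> dict:
    """Build the JSON body for POST /api/v1/schedules, failing early on bad combinations."""
    if (cron is None) == (run_at is None):
        raise ValueError("send exactly one of cron or runAt")
    if (capture is None) == (extract is None):
        raise ValueError("send exactly one of capture or extract")
    task = capture if capture is not None else extract
    if "url" not in task:
        raise ValueError("the task object needs at least a url")
    body = {
        "name": name,
        "cron": cron,
        "runAt": run_at,
        "timezone": timezone,
        "maxRuns": max_runs,
        "until": until,
        "capture": capture,
        "extract": extract,
    }
    # Leave out anything the caller did not set.
    return {key: value for key, value in body.items() if value is not None}


if __name__ == "__main__":
    print(build_schedule_body(
        name="prod health check",
        cron="*/15 * * * *",
        timezone="Europe/Madrid",
        max_runs=96,
        capture={"url": "https://example.com/health", "method": "GET"},
    ))
```

The server remains the authority on what is valid (cron syntax and timezone names are not checked here); the helper only catches the combinations the documentation says return 400.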

get post del List, inspect & manage

GET /api/v1/schedules — your schedules (data.schedules: an array of schedule objects).
GET /api/v1/schedules/{id} — one schedule (data.schedule).
POST /api/v1/schedules/{id} with { "action": "pause" | "resume" | "run-now" } — pause stops future runs; resume re-arms it; run-now makes it fire on the next poll (within ~20s).
DELETE /api/v1/schedules/{id} — cancel the schedule. Its run history is kept.

get Run history & results

GET /api/v1/schedules/{id}/runs — the executions, newest first (?limit=N). Each: runNo, scheduledFor, startedAt, finishedAt, status (running/ok/error), httpStatus, error.
GET /api/v1/schedules/{id}/runs/{runNo} — one execution including its full result. For a capture task that's the same JSON the capture endpoint returns ({ version, requestId, data: { request, response } }); for an extract task it's the engine's extract result (the same shape as GET /extract/{taskId}'s result).

curl

curl https://urlcap.com/api/v1/schedules/0f4c…/runs/3 \
  -H "Authorization: Bearer api_…"

Datasets

Named, de-duplicated collections of items of a single type — either IP / CIDR ranges (canonical CIDRs, as in IP & CIDR; a single host becomes a /32 or /128) or URLs (absolute http / https URLs). A dataset is yours; the API only ever shows you your own.

With history: true, every replace operation on a dataset's items first copies the items it drops into the dataset's history, stamped with their removal date. This is useful for tracking a set as it evolves.

Plan caps

free — up to 1 dataset, up to 1,000 items each, history not allowed.
developer — up to 10 datasets, up to 100,000 items, history allowed.
startup — up to 50 datasets, up to 1,000,000 items, history allowed.
business — unlimited datasets & items, history allowed.

The dataset object

id · uuid

Stable identifier.

name · string

Unique among your datasets. Auto-assigned dataset-… if omitted on create.

type · string

ip (IP/CIDR) or url.

history · boolean

When true, a replace keeps the dropped items (see History).

size · integer

Current item count (present on single-dataset responses).

createdAt · timestamp

When the dataset was created.

get post List & create

/api/v1/datasets · /api/v1/datasets/{id}

GET /api/v1/datasets — data.datasets is an array of dataset objects (newest first).
POST /api/v1/datasets with { "type": "ip" | "url", "name"?: "…", "history"?: true, "items"?: [ … ] } — creates a dataset and (optionally) seeds it. 201 with the created object.
GET /api/v1/datasets/{id} — one dataset (with its current size).
DELETE /api/v1/datasets/{id} — deletes the dataset and its items (and history).

POST /api/v1/datasets

curl -X POST https://urlcap.com/api/v1/datasets \
  -H "Authorization: Bearer $URLCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "name": "office-ranges", "type": "ip", "history": true,
        "items": ["10.0.0.0/8", "192.168.1.0/24", "2001:db8::/32"] }'

201 Created

{
  "version": "1",
  "requestId": "…",
  "data": {
    "dataset": {
      "id": "9d811aa8-dbe6-4c48-a811-29f49dd9f49c",
      "name": "office-ranges",
      "type": "ip",
      "itemType": 1,
      "history": true,
      "size": 3,
      "createdAt": "2026-05-13T07:00:00Z",
      "updatedAt": "2026-05-13T07:00:00Z"
    }
  }
}

Bad input — unknown type, duplicate name, plan limit reached, invalid item value, or a name starting with the reserved internal: prefix — returns 400 invalid_request.

get post put del Items: add, replace, remove

/api/v1/datasets/{id}/items

GET — paged list (?limit=N, default 1000, max 5000; ?after=ID for the next page). Returns items and nextAfter.
POST with { "items": [ … ] } — adds items, de-duplicated against what's already there. Returns { added, size }.
PUT with { "items": [ … ] } — replaces the whole set. On a history: true dataset, the dropped items are first written to history. Returns { size }.
DELETE with { "items": [ … ] } — removes the listed items. Returns { removed, size }.

Each item value is canonicalised on the way in: an IP/CIDR is reduced to its canonical form (host bits cleared, single hosts become /32 or /128); a URL must be absolute and http / https. The dataset cannot contain two items with the same canonical value.
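To preview how an item is likely to be canonicalised before you send it, the sketch below approximates the rules just described with Python's standard library. canonical_ip_item and check_url_item are hypothetical names, and the server's exact output (for example IPv6 text compression or URL normalisation) may differ from what ipaddress and urllib produce.

```python
import ipaddress
from urllib.parse import urlsplit


def canonical_ip_item(value):
    """Clear host bits; a bare address becomes a /32 (IPv4) or /128 (IPv6)."""
    return str(ipaddress.ip_network(value.strip(), strict=False))


def check_url_item(value):
    """Accept only absolute http / https URLs; returns the trimmed value unchanged."""
    url = value.strip()
    parts = urlsplit(url)
    if parts.scheme.lower() not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {value!r}")
    return url


def dedupe_ip_items(values):
    """Canonicalise and drop duplicates, keeping first-seen order."""
    return list(dict.fromkeys(canonical_ip_item(v) for v in values))


if __name__ == "__main__":
    print(dedupe_ip_items(["192.168.1.77/24", "192.168.1.0/24", "198.51.100.7"]))
```

Running the de-duplication client-side is optional, since the API de-duplicates on its side, but it makes the added count in the response easier to predict.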

POST /api/v1/datasets/{id}/items

curl -X POST https://urlcap.com/api/v1/datasets/9d81…/items \
  -H "Authorization: Bearer $URLCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "items": ["203.0.113.0/24", "198.51.100.7"] }'

200 OK

{
  "version": "1",
  "requestId": "…",
  "data": { "datasetId": "9d81…", "added": 2, "size": 5 }
}
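Reading a large dataset means following nextAfter across pages. The sketch below shows one way to do that with the standard library; iter_items and make_fetcher are hypothetical helpers, and it assumes that items and nextAfter sit under data and that nextAfter is null (or absent) on the last page, which this page does not spell out.

```python
import json
import urllib.parse
import urllib.request


def iter_items(fetch_page):
    """Yield every item, calling fetch_page(after) until no nextAfter is returned."""
    after = None
    while True:
        data = fetch_page(after)
        yield from data.get("items", [])
        after = data.get("nextAfter")
        if after is None:  # assumed end-of-list signal
            break


def make_fetcher(dataset_id, api_key, limit=1000):
    """Return a fetch_page function bound to one dataset."""
    base = f"https://urlcap.com/api/v1/datasets/{dataset_id}/items"

    def fetch_page(after):
        params = {"limit": limit}
        if after is not None:
            params["after"] = after
        request = urllib.request.Request(
            f"{base}?{urllib.parse.urlencode(params)}",
            headers={"Authorization": f"Bearer {api_key}"},
        )
        with urllib.request.urlopen(request, timeout=10) as response:
            return json.load(response)["data"]

    return fetch_page
```

Usage would look like `for item in iter_items(make_fetcher(dataset_id, api_key)): ...`. Keeping the HTTP call behind fetch_page makes the paging loop easy to test without a network.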

get History

/api/v1/datasets/{id}/history

For datasets created with history: true, the items previously dropped by a replace are kept, each stamped with the date it was removed. Newest removal first; ?limit=N (default 1000, max 5000). Always empty for non-history datasets.

200 OK

{
  "version": "1",
  "requestId": "…",
  "data": {
    "datasetId": "9d81…",
    "count": 2,
    "history": [
      { "id": 41, "value": "10.0.0.0/8",  "addedAt": "2026-05-10T…", "removedAt": "2026-05-13T07:36:18.07Z" },
      { "id": 40, "value": "192.168.1.0/24", "addedAt": "2026-05-10T…", "removedAt": "2026-05-13T07:36:18.07Z" }
    ]
  }
}

get post Membership check

/api/v1/datasets/{id}/contains

Normalises value to the dataset's type and reports whether that exact item is in the dataset (contains). For IP datasets, if value is a single address (not a CIDR), the response also includes covered — whether that address falls inside some CIDR stored in the dataset (a fast B-tree range lookup over the dataset's 16-byte bounds).

GET /api/v1/datasets/{id}/contains

curl -G https://urlcap.com/api/v1/datasets/9d81…/contains \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "value=10.20.30.40"

200 OK

{
  "version": "1",
  "requestId": "…",
  "data": {
    "datasetId": "9d81…",
    "type": "ip",
    "value": "10.20.30.40",
    "normalizedValue": "10.20.30.40/32",
    "contains": false,
    "covered": true
  }
}
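The difference between contains and covered is easiest to see in code. The sketch below reproduces the response above with a plain linear scan; it is an illustration of the semantics only, not the B-tree lookup the service uses, and check_membership is a hypothetical name. Whether covered also counts an exact /32 or /128 entry is an assumption here.

```python
import ipaddress


def check_membership(stored, value):
    """Report exact membership (contains) and, for single addresses, range coverage (covered)."""
    networks = {ipaddress.ip_network(item, strict=False) for item in stored}
    target = ipaddress.ip_network(value, strict=False)
    result = {
        "normalizedValue": str(target),
        "contains": target in networks,  # exact canonical match only
    }
    if "/" not in value:
        # A single address: does any stored CIDR of the same family include it?
        address = target.network_address
        result["covered"] = any(
            network.version == address.version and address in network
            for network in networks
        )
    return result


if __name__ == "__main__":
    dataset = ["10.0.0.0/8", "192.168.1.0/24", "2001:db8::/32"]
    print(check_membership(dataset, "10.20.30.40"))
```

With the office-ranges items from the create example, 10.20.30.40 is not stored as an item of its own (contains is false) but falls inside 10.0.0.0/8 (covered is true).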

get post Is an IP a known bot?

/api/v1/is_bot

Related guide

How to detect Googlebot, Bingbot, GPTBot and other AI crawlers from server logs

Tells you whether an IPv4 or IPv6 address belongs to a well-known web crawler (Googlebot, Bingbot, Yandex, DuckDuckBot & DuckAssistBot, Applebot, GPTBot & ChatGPT-User & OAI-SearchBot, ClaudeBot, PerplexityBot & Perplexity-User, AhrefsBot, Amazonbot, CCBot, …) and, if so, which one. On every match the response includes the bot's search engine, bot group, the matching CIDR, and the bot's categories (SEARCH_INDEXING, AI_TRAINING, AI_SEARCH_OR_ANSWERING, USER_INITIATED_FETCHING, SEO_ANALYTICS, WEB_DATASET_ARCHIVING, COMMERCIAL_PLATFORM, SOCIAL_PREVIEW, …) with per-link confidence.

Backed by an in-memory index: every CIDR each bot publishes is preloaded into a sorted array of 16-byte bounds with side-tables of bot / search-engine / category metadata. Lookups never hit the database (sub-millisecond per IP) and the index is rebuilt daily from the bots' published sources. The index also remembers CIDRs that have since been retired (dropped by a later refresh), so you can ask "was this IP a known bot on a specific date?" or "has this IP ever been a known bot?".

ip · string

A single IPv4 or IPv6 address (query string or JSON body). Provide either ip or ips.

ips · array of strings

Batch of up to 200 addresses. As a query parameter: comma-separated (?ips=a,b,c). As a JSON body: { "ips": [ … ] }.

date · string

Optional. ISO-8601 point-in-time (YYYY-MM-DD is treated as that day's start in UTC; or YYYY-MM-DDTHH:MM:SSZ for an exact moment). Runs an as-of-date lookup — returns the CIDR-bot mappings that were active at this moment, automatically including retired records that had been live then.

historical · boolean

Optional. Default false. When true, the lookup also considers retired CIDR records — every CIDR that has ever been in any bot's published list, regardless of whether it's still current. Useful to answer "has this IP ever been a known bot?". Ignored if date is supplied.

reverseDns · boolean

Optional. Default false. When true, attaches a reverseDns object (names, ttlSeconds, cached) to each per-IP result with the PTR names for the address. Results are cached in-process for the upstream record's TTL (clamped 30s–1h; negative answers 5 minutes). For the FCrDNS forward-confirm check, see /reverse_dns.

Each match always carries addedAt (when the CIDR first appeared in the bot's list), removedAt (null while still current), and active (true iff the CIDR was live at the query's time — now by default, or at date when supplied). With historical=true you'll see active: false matches that report when the CIDR was retired.

Every match's botGroup.honoursRobots reports four booleans — robots, crawlDelay, allow, sitemap — for whether the bot operator publicly commits to that aspect of robots.txt and has no documented violations. null means we haven't researched it; false means at least one credible report of the bot ignoring that rule (e.g. Googlebot's crawlDelay is false since Google explicitly ignores Crawl-delay).

GET /api/v1/is_bot

curl -G https://urlcap.com/api/v1/is_bot \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "ip=66.249.66.1"

200 OK

{
  "version": "1",
  "requestId": "…",
  "data": {
    "ip": "66.249.66.1",
    "family": 4,
    "isBot": true,
    "matchCount": 1,
    "matches": [
      {
        "cidr": "66.249.66.0/27",
        "active": true,
        "addedAt": "2026-05-13T07:39:30Z",
        "removedAt": null,
        "botGroup":     { "id": 4, "description": "Common crawlers" },
        "searchEngine": { "id": 1, "description": "Google" },
        "categories":   [ { "code": "SEARCH_INDEXING", "confidence": "high" } ]
      }
    ],
    "cache": { "entries": 36670, "bots": 20, "builtAt": "2026-05-13T07:39:38Z" }
  }
}

Batch form returns data.results, one entry per IP, with data.count and data.matchCount at the top level. An unparseable address returns isBot: false with family: null and a descriptive error field — the whole call still succeeds.
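When you have more than 200 addresses (say, a day of server logs), the batch form needs chunking on the client. The sketch below splits a list into JSON bodies of at most 200 IPs and sorts the per-IP results into three buckets. is_bot_batches and summarise are hypothetical helpers; the only response fields it relies on are isBot and error, as described above.

```python
def is_bot_batches(ips, batch_size=200):
    """Yield JSON bodies for /api/v1/is_bot, each with at most batch_size unique IPs."""
    unique = list(dict.fromkeys(ips))  # drop repeats, keep first-seen order
    for start in range(0, len(unique), batch_size):
        yield {"ips": unique[start:start + batch_size]}


def summarise(results):
    """Split data.results entries into bots, non-bots and unparseable addresses."""
    summary = {"bots": [], "other": [], "errors": []}
    for entry in results:
        if entry.get("error"):
            summary["errors"].append(entry)
        elif entry.get("isBot"):
            summary["bots"].append(entry)
        else:
            summary["other"].append(entry)
    return summary


if __name__ == "__main__":
    bodies = list(is_bot_batches(f"192.0.2.{n % 250}" for n in range(600)))
    print([len(body["ips"]) for body in bodies])
```

Checking the error field first matters because an unparseable address also reports isBot: false, and you probably do not want to treat it as confirmed human traffic.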

GET /api/v1/is_bot?ip=…&historical=true

# Was this IP ever a known bot?
curl -G https://urlcap.com/api/v1/is_bot \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "ip=66.249.64.5" \
  --data-urlencode "historical=true"

# As of a specific date — includes retired CIDRs that were live then.
curl -G https://urlcap.com/api/v1/is_bot \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "ip=66.249.64.5" \
  --data-urlencode "date=2025-04-01"

data.asOf and data.historical are echoed back when those parameters are supplied, and cache.entries includes retired CIDRs alongside currently-active ones (so the number is larger than the daily-active count).

get post Reverse DNS (PTR)

/api/v1/reverse_dns

Resolves an IPv4 or IPv6 address to the names returned by its PTR records (in-addr.arpa for v4, ip6.arpa for v6). With forwardConfirm=true, each PTR name is re-resolved to A/AAAA and we report whether the original IP is in the answer — the FCrDNS check Googlebot, Bingbot and friends recommend to verify that a crawler is who its PTR claims it is.

Results are cached in-process for the minimum TTL the upstream resolver returned, clamped to 30s..1h; negative answers (NXDOMAIN / no record / lookup error) are cached for 5 minutes. Every result reports ttlSeconds remaining and cached.

ip · string

A single IPv4 or IPv6 address (query string or JSON body). Provide either ip or ips.

ips · array of strings

Batch of up to 50 addresses. As a query parameter: comma-separated (?ips=a,b,c). As a JSON body: { "ips": [ … ] }.

forwardConfirm · boolean

Optional. Default false. When true, each PTR name is forward-resolved (A + AAAA) and the response reports forwardConfirmed (true iff at least one PTR resolves back to the original IP) plus per-name forwardChecks with addresses, TTL and the matched flag.

GET /api/v1/reverse_dns

curl -G https://urlcap.com/api/v1/reverse_dns \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "ip=66.249.66.1"

200 OK

{
  "version": "1",
  "requestId": "…",
  "data": {
    "ip": "66.249.66.1",
    "family": 4,
    "names": ["crawl-66-249-66-1.googlebot.com"],
    "ttlSeconds": 3600,
    "cached": false
  }
}

Batch form returns data.results with data.count at the top level. An unparseable address returns an entry with a descriptive error field — the whole call still succeeds.

GET /api/v1/reverse_dns?ip=…&forwardConfirm=true

# FCrDNS: is this really Googlebot?
curl -G https://urlcap.com/api/v1/reverse_dns \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "ip=66.249.66.1" \
  --data-urlencode "forwardConfirm=true"

200 OK

{
  "version": "1",
  "requestId": "…",
  "data": {
    "ip": "66.249.66.1",
    "family": 4,
    "names": ["crawl-66-249-66-1.googlebot.com"],
    "ttlSeconds": 3600,
    "cached": true,
    "forwardConfirmed": true,
    "forwardChecks": [
      {
        "name": "crawl-66-249-66-1.googlebot.com",
        "matched": true,
        "ttlSeconds": 300,
        "cached": false,
        "addresses": ["66.249.66.1"]
      }
    ],
    "forwardConfirm": true
  }
}

IPv6 works the same way — the lookup uses ip6.arpa automatically. Single-IP queries can also be made through /is_bot?reverseDns=true if you want the PTR result alongside the bot match.
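For readers who want to see what the FCrDNS check amounts to, here is a minimal local sketch using the system resolver. It is not how urlcap implements the endpoint (there is no caching or TTL handling), and ptr_names, forward_addresses and forward_confirm are hypothetical names. The resolvers are passed in as parameters so the logic can be exercised without network access.

```python
import ipaddress
import socket


def ptr_names(ip):
    """PTR names for an address via the system resolver; empty list on failure."""
    try:
        name, aliases, _ = socket.gethostbyaddr(ip)
    except OSError:
        return []
    return [name, *aliases]


def forward_addresses(name):
    """A/AAAA addresses for a name via the system resolver; empty list on failure."""
    try:
        infos = socket.getaddrinfo(name, None)
    except OSError:
        return []
    # Strip any IPv6 scope id ("%eth0") before returning.
    return sorted({info[4][0].split("%")[0] for info in infos})


def forward_confirm(ip, ptr_lookup=ptr_names, forward_lookup=forward_addresses):
    """Re-resolve each PTR name and report whether the original IP comes back."""
    target = ipaddress.ip_address(ip)
    checks = []
    for name in ptr_lookup(ip):
        addresses = forward_lookup(name)
        matched = any(ipaddress.ip_address(a) == target for a in addresses)
        checks.append({"name": name, "addresses": addresses, "matched": matched})
    return {
        "ip": ip,
        "names": [check["name"] for check in checks],
        "forwardConfirmed": any(check["matched"] for check in checks),
        "forwardChecks": checks,
    }
```

A forged PTR record fails this check because the attacker controls the reverse zone for their own IP but not the forward zone of the domain they are impersonating.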

get Fetch robots.txt

/api/v1/robots

Pulls /robots.txt from a site, parses it per RFC 9309, and returns the user-agent groups, sitemaps and any unknown directives. Fetched bodies are TTL-cached in-process for 1 hour (1 minute on transport errors / 5xx); every response reports cached and ageSeconds.

Per the RFC: a 4xx (except 429) means "no rules apply" — reported as effect: "no_rules_unrestricted"; a 5xx or 429 means "disallow everything" — effect: "restricted_by_error".
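The status-to-effect mapping above can be sketched as a small function, which is handy if you want to apply the same rule to robots.txt responses you fetched yourself. robots_effect is a hypothetical name; the two effect strings are the ones quoted above, and returning None for every other status is an assumption, since this page does not name an effect for the normal case.

```python
def robots_effect(status):
    """Map a robots.txt HTTP status to the effect urlcap reports for error cases."""
    if status == 429 or 500 <= status <= 599:
        return "restricted_by_error"      # treat as "disallow everything"
    if 400 <= status <= 499:
        return "no_rules_unrestricted"    # treat as "no rules apply"
    return None  # assumption: parsed rules apply, no special effect


if __name__ == "__main__":
    for code in (200, 404, 429, 503):
        print(code, robots_effect(code))
```

Note that 429 is checked before the general 4xx branch, because it is the one 4xx status that restricts rather than unrestricts.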

site · string

A hostname (example.com) or any URL — the scheme/path are normalised away.

GET /api/v1/robots

curl -G https://urlcap.com/api/v1/robots \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "site=example.com"

200 OK

{
  "version": "1",
  "requestId": "…",
  "data": {
    "site": "example.com",
    "status": 200,
    "contentSha256": "…",
    "sizeBytes": 412,
    "cached": false,
    "ageSeconds": 0,
    "body": "User-agent: *\nDisallow: /search\n",
    "groups": [
      { "userAgents": ["*"], "rules": [{ "type": "disallow", "pattern": "/search" }] }
    ],
    "sitemaps": [],
    "extensions": {}
  }
}

get URL allow / deny check

/api/v1/robots/check

Decides whether a URL is allowed for a given user-agent. Longest-match wins; on a tie, Allow beats Disallow. If site is omitted, it's derived from url.

site · string

Optional. Derived from url if omitted.

url · string

The URL or path to check.

userAgent · string

The bot token to match against (e.g. Googlebot). Case-insensitive substring match.

GET /api/v1/robots/check

curl -G https://urlcap.com/api/v1/robots/check \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "url=https://example.com/private/foo" \
  --data-urlencode "userAgent=Googlebot"

200 OK

{
  "version": "1",
  "requestId": "…",
  "data": {
    "site": "example.com",
    "url": "https://example.com/private/foo",
    "userAgent": "Googlebot",
    "allowed": false,
    "reason": "disallow '/private/' matched",
    "matchedRule": { "type": "disallow", "pattern": "/private/" },
    "matchedGroupUserAgents": ["Googlebot"],
    "robotsStatus": 200,
    "robotsCached": true,
    "robotsAgeSeconds": 142
  }
}
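The longest-match rule with the Allow-wins tie-break can be sketched in a few lines. This is an illustration under simplifying assumptions, not urlcap's matcher: it takes the rules of one already-selected user-agent group (in the same shape as the groups[].rules array returned by /api/v1/robots), supports the * and $ wildcards from RFC 9309, and ignores percent-encoding normalisation. is_allowed is a hypothetical name.

```python
import re


def _matches(pattern, path):
    """RFC 9309 style match: '*' is any run of characters, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in core.split("*"))
    if anchored:
        return re.fullmatch(regex, path) is not None
    return re.match(regex, path) is not None


def is_allowed(rules, path):
    """Return (allowed, matched_rule); the longest pattern wins, Allow beats Disallow on a tie."""
    best_key, best_rule = None, None
    for rule in rules:
        pattern = rule["pattern"]
        if not pattern or not _matches(pattern, path):
            continue  # an empty pattern matches nothing
        key = (len(pattern), rule["type"] == "allow")
        if best_key is None or key > best_key:
            best_key, best_rule = key, rule
    if best_rule is None:
        return True, None  # no rule matched: allowed by default
    return best_rule["type"] == "allow", best_rule


if __name__ == "__main__":
    rules = [
        {"type": "disallow", "pattern": "/private/"},
        {"type": "allow", "pattern": "/private/public/"},
    ]
    print(is_allowed(rules, "/private/foo"))
    print(is_allowed(rules, "/private/public/page"))
```

Comparing the tuple (pattern length, is-allow) encodes both rules at once: length decides first, and True sorts above False when lengths are equal.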

get post del Watch robots.txt for changes

/api/v1/robots/watch

Registers a per-user watch on a site's /robots.txt. A background job sweeps every 15 minutes; each watch is re-fetched no sooner than its frequencyMinutes (default 60, clamped to 15..1440). Snapshots are stored only when the content hash changes — full body + previous hash kept on each. If webhookUrl is set, every change triggers an HMAC-SHA256-signed POST.

Watches require a per-user API key — the legacy X-API-Key can't create them. Free-trial calls return 401.

site · string

Hostname. Scheme/path are stripped. Unique per user.

webhookUrl · string (optional)

Where to POST change notifications. http(s)://….

webhookSecret · string (optional)

HMAC-SHA256 signing key. Auto-generated if webhookUrl is set and you don't supply one — returned on the create response and the GET-one endpoint, never on list.

frequencyMinutes · integer (optional)

Minimum interval between fetches for this watch. Default 60, clamped 15..1440.

POST /api/v1/robots/watch

curl -X POST https://urlcap.com/api/v1/robots/watch \
  -H "Authorization: Bearer $URLCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{"site":"example.com","webhookUrl":"https://hooks.example.com/robots","frequencyMinutes":30}'

201 Created

{
  "version": "1",
  "requestId": "…",
  "data": {
    "watch": {
      "id": "1f1b…-…-…",
      "site": "example.com",
      "webhookUrl": "https://hooks.example.com/robots",
      "webhookSecret": "f3a2…64-hex-chars…",   // returned on create and by GET /api/v1/robots/watch/{id}; never on list
      "frequencyMinutes": 30,
      "active": true,
      "createdAt": "2026-05-16T13:00:00Z"
    }
  }
}

On every detected change, urlcap POSTs JSON to webhookUrl with an HMAC-SHA256 signature in X-urlcap-Signature: sha256=hex computed over the request body with your watch's webhookSecret. Verify it before trusting the payload.

Webhook delivery (POST to your webhookUrl)

POST /robots HTTP/1.1
Content-Type: application/json
User-Agent: urlcap-robots-webhook/1.0
X-urlcap-Event: robots.changed
X-urlcap-Timestamp: 1778938200
X-urlcap-Signature: sha256=e3b0c44298…

{
  "type": "robots.changed",
  "watchId": "1f1b…",
  "site": "example.com",
  "snapshotId": 42,
  "fetchedAt": "2026-05-16T13:30:00Z",
  "httpStatus": 200,
  "contentSha256": "…",
  "previousSha256": "…",
  "sizeBytes": 412,
  "body": "User-agent: *\nDisallow: /\n"
}
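On the receiving side, verification is a constant-time HMAC comparison. The sketch below assumes the signature is computed over the raw request body bytes exactly as received, and that the key is the UTF-8 bytes of the webhookSecret string (rather than its hex-decoded form); this page does not say which, so confirm that against a real delivery. expected_signature and verify_webhook are hypothetical names.

```python
import hashlib
import hmac


def expected_signature(body, secret):
    """Return the 'sha256=<hex>' value we expect in X-urlcap-Signature for this body."""
    # Assumption: the key is the secret string's UTF-8 bytes.
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def verify_webhook(body, signature_header, secret):
    """Constant-time comparison of the received header against the expected value."""
    expected = expected_signature(body, secret).encode("ascii")
    received = signature_header.strip().encode("utf-8")
    return hmac.compare_digest(expected, received)


if __name__ == "__main__":
    raw = b'{"type":"robots.changed","site":"example.com"}'
    header = expected_signature(raw, "example-secret")
    print(verify_webhook(raw, header, "example-secret"))
```

Verify against the bytes you read off the socket, before any JSON parsing or re-serialisation, since re-encoding the payload can change whitespace and break the signature.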

The other operations are:

GET /api/v1/robots/watch — list your watches
GET /api/v1/robots/watch/{id} — fetch one (includes webhookSecret)
DELETE /api/v1/robots/watch/{id} — remove a watch (cascades to its snapshots)
GET /api/v1/robots/watch/{id}/history?limit=&changedOnly=&includeBody= — list snapshots
POST /api/v1/robots/watch/{id}/poll — force a poll right now, bypassing the per-watch frequency throttle

post Verify a Web Bot Auth signature

/api/v1/web_bot_auth/verify

Decides whether an inbound HTTP request's RFC 9421 HTTP Message Signature is valid against the operator's published Ed25519 key directory — the cryptographic identity check the Web Bot Auth draft proposes as a successor to relying on IP ranges and reverse DNS for "is this really Googlebot?".

You hand urlcap the inbound request's method, url and the headers the bot sent. The verifier parses Signature-Input + Signature, fetches the JWKS-style directory at Signature-Agent (TTL-cached 1 h on success, 1 min on errors), rebuilds the canonical signature base from the covered components, and verifies with Ed25519 using the key whose kid matches the keyid parameter.

Failures (expired signature, missing keyid, directory unreachable, signature mismatch, unsupported algorithm, missing tag="web-bot-auth") come back as verified:false with a reason — never as HTTP errors, so you can feed the answer straight into a policy without a try/catch.

method · string

The bot's request method (e.g. GET).

url · string

The full URL the bot requested. Used to derive @method, @authority, @target-uri, @path, @query, @scheme in the signature base.

headers · object

Header name → value, case-insensitive. Must include at least Signature, Signature-Input and Signature-Agent; any other header the signature covers (named in Signature-Input's inner-list) must also be present.

label · string

Optional. When a request carries multiple signatures (e.g. sig1=…, sig2=…), pick a specific label. Default: first.

allowExpired · boolean

Optional. Default false. Set true to skip the expires check (useful for forensic analysis of a stored request).

requireWebBotAuthTag · boolean

Optional. Default true. The draft requires (MUST) that Web Bot Auth signatures carry tag="web-bot-auth"; set this to false only when verifying a non-WBA RFC 9421 signature.

POST /api/v1/web_bot_auth/verify

curl -X POST https://urlcap.com/api/v1/web_bot_auth/verify \
  -H "Authorization: Bearer $URLCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "method": "GET",
    "url":    "https://example.com/foo",
    "headers": {
      "Signature-Agent": "\"https://example.com/.well-known/http-message-signatures-directory\"",
      "Signature-Input": "sig1=(\"@authority\" \"signature-agent\" \"@method\" \"@target-uri\");created=1778936400;expires=1778936460;keyid=\"abc\";alg=\"ed25519\";tag=\"web-bot-auth\"",
      "Signature":       "sig1=:base64-signature-here:"
    }
  }'

200 OK

{
  "version": "1",
  "requestId": "…",
  "data": {
    "verified": true,
    "label": "sig1",
    "keyid": "abc",
    "algorithm": "ed25519",
    "signatureAgent": "https://example.com/.well-known/http-message-signatures-directory",
    "tag": "web-bot-auth",
    "createdAt": "2026-05-16T13:00:00Z",
    "expiresAt": "2026-05-16T13:01:00Z",
    "expired": false,
    "coveredComponents": ["@authority", "signature-agent", "@method", "@target-uri"],
    "directory": {
      "url": "https://example.com/.well-known/http-message-signatures-directory",
      "httpStatus": 200,
      "keyCount": 3,
      "rawKeyCount": 3,
      "cached": false,
      "ageSeconds": 0,
      "kids": ["abc", "def", "ghi"]
    }
  }
}
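In practice you will be building the verify request from an inbound request your server received. The sketch below picks out the three mandatory signature headers plus any other header named in Signature-Input's covered components, and leaves everything else (cookies, for instance) out of the payload. It uses a simplified regular-expression parse of the inner list rather than a full RFC 8941 structured-field parser, and covered_components and build_verify_payload are hypothetical names.

```python
import re

REQUIRED = ("signature", "signature-input", "signature-agent")


def covered_components(signature_input, label=None):
    """Return the covered component names for the given label (default: first label)."""
    for match in re.finditer(r'([A-Za-z*][A-Za-z0-9_.*-]*)=\(([^)]*)\)', signature_input):
        if label is None or match.group(1) == label:
            # Only quoted items at the start or after whitespace are component names;
            # quoted parameter values such as ;name="x" are skipped.
            return re.findall(r'(?:^|\s)"([^"]+)"', match.group(2))
    raise ValueError("no matching signature label in Signature-Input")


def build_verify_payload(method, url, inbound_headers, label=None):
    """Build the JSON body for POST /api/v1/web_bot_auth/verify."""
    lowered = {name.lower(): value for name, value in inbound_headers.items()}
    needed = set(REQUIRED)
    if "signature-input" in lowered:
        for component in covered_components(lowered["signature-input"], label):
            if not component.startswith("@"):  # derived components come from method/url
                needed.add(component.lower())
    absent = sorted(name for name in needed if name not in lowered)
    if absent:
        raise ValueError("missing headers: " + ", ".join(absent))
    payload = {
        "method": method,
        "url": url,
        "headers": {name: lowered[name] for name in sorted(needed)},
    }
    if label is not None:
        payload["label"] = label
    return payload
```

Sending only the covered headers keeps unrelated request data away from a third-party service while still giving the verifier everything it needs to rebuild the signature base.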

Algorithm support is Ed25519 only (the draft's MUST). The signature is required to cover @authority and the signature-agent header — both are checked before any directory fetch happens, so an attacker swapping in a friendly directory can't short-circuit the binding to the original request.

Pairs naturally with /is_bot and /reverse_dns?forwardConfirm=true: is_bot says "this IP is in Google's CIDR list," reverse_dns says "the rDNS points back to it," and web_bot_auth says "the bot proved it with a signature only Google could have made." Combining all three gives the strongest identity check of the options described here.

Identifying as urlcap: signed capture & extract

The other direction: when you make a capture or extract request, set webBotAuth: true and urlcap signs every outbound HTTP request it makes on your behalf with our own Ed25519 key. Sites that allow known crawlers but block unknown bots can then choose to allow urlcap-attributed traffic — and verify that what claims to be urlcap really is.

Our public keys are served at https://urlcap.com/.well-known/http-message-signatures-directory as a JWKS-style JSON document. Each signed request carries a Signature-Agent header pointing at that URL plus standard Signature-Input / Signature headers per RFC 9421.

Capture with a signed outbound

curl -X POST https://urlcap.com/api/v1/capture \
  -H "Authorization: Bearer $URLCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://example.com/", "webBotAuth": true }'

The response's data.request.headers shows the three signature headers that went on the wire, so you can verify the integration end-to-end by feeding them back into /api/v1/web_bot_auth/verify. Signature lifetime is 60 seconds.

get post List & create sites

/api/v1/sites

A site is the multi-tenancy unit for ingest. Every event you ship via /events or /outcomes is scoped to one site, and every {site, ingest token, hostname} triple has to line up or the row is rejected. Customers usually create one site for their whole edge and add every public hostname to it; large multi-product orgs sometimes split per product.

Auth: your urlcap API key (the same one used for capture / TOTP / is_bot). Distinct from ingest tokens, which are per-site and used only for the ingest channel.

List your sites

curl -s https://urlcap.com/api/v1/sites \
  -H "Authorization: Bearer $URLCAP_KEY"

Create a site

curl -s https://urlcap.com/api/v1/sites \
  -X POST \
  -H "Authorization: Bearer $URLCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "datocapital"}'

201 Created

{
  "version": "1",
  "requestId": "…",
  "data": {
    "id": 47,
    "publicId": "Xy9KqZ7mNvB2",
    "name": "datocapital"
  }
}

Save the publicId — that's what you pass as {site_id} in every downstream URL. The numeric id still works for backward compatibility, but new integrations should use publicId because it's randomly generated and doesn't disclose how many sites exist on urlcap.

get post Hostnames per site

/api/v1/sites/{site_id}/domains

Each event ingested under a site must have a host field that's already attached to that site. Submit them once up front; /events will reject any rows whose host isn't registered with "host '…' not registered for site_id=…".

Hostnames are unique across the whole urlcap database: a hostname belongs to exactly one site. Attempting to add one that's already attached elsewhere returns 409.

hostname · string · required

FQDN. Lowercased, then stored without further normalisation. Wildcards are not supported, so attach each hostname individually.

kind · string · default doc

What this hostname serves. One of doc (HTML pages — the normal case), cdn (assets / images / CSS / JS on a separate domain), api (JSON endpoints), or other. Used by the discovery scan to de-prioritise candidate JA4s that only ever touch CDN-kind hostnames — those are usually image crawlers (Pinterest, archive.org) you don't want to block. Invalid values silently fall back to doc.

List hostnames

curl -s https://urlcap.com/api/v1/sites/Xy9KqZ7mNvB2/domains \
  -H "Authorization: Bearer $URLCAP_KEY"

Add a doc hostname

curl -s https://urlcap.com/api/v1/sites/Xy9KqZ7mNvB2/domains \
  -X POST \
  -H "Authorization: Bearer $URLCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{"hostname": "www.datocapital.com"}'

Add a CDN hostname

curl -s https://urlcap.com/api/v1/sites/Xy9KqZ7mNvB2/domains \
  -X POST \
  -H "Authorization: Bearer $URLCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{"hostname": "cdn.datocapital.com", "kind": "cdn"}'

get post Ingest tokens per site

/api/v1/sites/{site_id}/ingest_keys

Per-site bearer tokens (ingest_<32hex>) that authenticate /events and /outcomes calls. Distinct from the urlcap API key you use everywhere else — ingest tokens are scoped only to the ingest channel, and you can revoke / rotate them independently.

Storage: only the SHA-256 of the token sits in the database. The cleartext is returned exactly once, on creation. Save it at that point; if you lose it, mint a new one and revoke the old.

List existing ingest tokens (prefixes + status only)

curl -s https://urlcap.com/api/v1/sites/3/ingest_keys \
  -H "Authorization: Bearer $URLCAP_KEY"

Mint a new token

curl -s https://urlcap.com/api/v1/sites/3/ingest_keys \
  -X POST \
  -H "Authorization: Bearer $URLCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{"label": "datocapital-prod-2026-05"}'

201 Created — token shown once, never returned again

{
  "version": "1",
  "requestId": "…",
  "data": {
    "id": 3,
    "siteId": 3,
    "label": "datocapital-prod-2026-05",
    "token": "ingest_c7e57f1d9d43866bba19bf95d65c9457",
    "warning": "Save this token now; it is not stored in plaintext and cannot be retrieved later."
  }
}

To rotate: mint a new one with the same label suffix, update your edge config, then revoke the old one (direct DB update for now — admin UI coming). Multiple active tokens per site are fine; we recommend one per environment (prod, staging, ci).

Ingest channel — events & outcomes

The ingest channel is how a site streams its own traffic into urlcap and gets back per-fingerprint intelligence — JA4 / IP profiles, bot likelihood, and (with the outcomes endpoint below) high-confidence "real human" signals like JS challenge pass rate, registered-user observation, and completed purchases. Two complementary endpoints:

POST /api/v1/ingest/{site_id}/events — one NDJSON line per request. The patched nginx ships these natively.
POST /api/v1/ingest/{site_id}/outcomes — asynchronous verdicts tied to the request_id of an earlier event: challenge passed, user authenticated, order completed.

Auth: site ingest tokens

Both ingest endpoints authenticate with a per-site bearer token (separate from your urlcap API key). Mint one with your urlcap API key:

POST /api/v1/sites/{site_id}/ingest_keys

curl -X POST https://urlcap.com/api/v1/sites/42/ingest_keys \
  -H "Authorization: Bearer $URLCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{"label": "my-prod-site-2026-05"}'

The response includes token: "ingest_…" — shown once at creation, stored only as a SHA-256 hash thereafter. Store it in your edge config.

The `request_id` binding

Every event you send carries a request_id (the patched nginx emits $request_id). Outcomes you post later refer back to that id, and urlcap resolves it to the JA4/IP fingerprint server-side at ingest time — so the binding persists even after the raw event ages out of the 7-day window.

post Send request events

/api/v1/ingest/{site_id}/events

NDJSON batch of request observations. One row per request. Used as the raw input for every downstream signal — JA4 reqs / unique IPs / asset ratio / classifier / intelligence. Caps: up to 1,000 events / 4 MB per batch. Per-row errors never fail the batch.

The patched nginx ships these natively and the NginxBotLogTail background component picks them up from /var/log/nginx/bot-access.ndjson when sites.local_log_path is set. The HTTP endpoint is the alternative for clients that prefer to push.

ts · ISO-8601 string or epoch ms

Event time. Defaults to receive time if absent.

request_id · string

Opaque per-request id (nginx $request_id). The binding key for later outcomes.

ja4 · string

JA4 fingerprint of the TLS hello (e.g. t13d1516h2_8daaf6152771_b0da82dd1658). Empty = no TLS handshake; the row is skipped.

ja4_hash · UInt64

Numeric form of the JA4 fingerprint. The patched nginx ships this. If omitted, urlcap derives it from the JA4 string.

ip, host, path · string

Mandatory. host must be in the site's registered hostnames or the row is rejected.

method, status, http_version, user_agent · string / int

Optional; absent fields default to empty/0.

ja3_hash, asn, country, accept_language, referer_host, has_cookie, has_referer · various

Optional enrichment hints. asn / country are looked up from MaxMind if absent.

POST /api/v1/ingest/42/events

curl -X POST https://urlcap.com/api/v1/ingest/42/events \
  -H "Authorization: Bearer $INGEST_TOKEN" \
  -H "Content-Type: application/x-ndjson" \
  --data-binary '{"ts":"2026-05-21T08:11:30Z","request_id":"abc123…","ja4":"t13d1516h2_8daaf6152771_b0da82dd1658","ja4_hash":12345,"ip":"1.2.3.4","host":"shop.example.com","path":"/products","method":"GET","status":200,"user_agent":"Mozilla/5.0…"}
{"ts":"2026-05-21T08:11:31Z","request_id":"def456…","ja4":"t13d1516h2_8daaf6152771_b0da82dd1658","ja4_hash":12345,"ip":"1.2.3.5","host":"shop.example.com","path":"/products/sku-7","method":"GET","status":200}'

202 Accepted

{
  "version": "1",
  "requestId": "…",
  "data": { "siteId": 42, "accepted": 2, "rejected": 0, "errors": [] }
}
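The batch caps above (up to 1,000 events / 4 MB per batch) can be respected on the client with a small chunking helper. This is a minimal sketch, not urlcap's own client: `ndjson_batches` is a hypothetical name, and reading "4 MB" as 4,000,000 bytes is a conservative assumption.

```python
import json
from typing import Iterable, Iterator

MAX_ROWS = 1000          # documented cap: up to 1,000 events per batch
MAX_BYTES = 4_000_000    # assumption: "4 MB" read conservatively as 4,000,000 bytes


def ndjson_batches(events: Iterable[dict],
                   max_rows: int = MAX_ROWS,
                   max_bytes: int = MAX_BYTES) -> Iterator[bytes]:
    """Yield NDJSON request bodies that stay within the per-batch caps.

    A single row larger than max_bytes is still yielded on its own,
    so the caller should expect the server to reject it.
    """
    lines: list[bytes] = []
    size = 0
    for event in events:
        line = json.dumps(event, separators=(",", ":")).encode("utf-8")
        # +1 accounts for the newline joining this row to the previous one.
        extra = len(line) + (1 if lines else 0)
        if lines and (len(lines) >= max_rows or size + extra > max_bytes):
            yield b"\n".join(lines)
            lines, size = [], 0
            extra = len(line)
        lines.append(line)
        size += extra
    if lines:
        yield b"\n".join(lines)
```

Each yielded body can be sent as one POST with Content-Type: application/x-ndjson, as in the curl example above.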

post Send outcomes (challenges, auth, purchases)

/api/v1/ingest/{site_id}/outcomes

NDJSON batch of asynchronous verdicts — the strongest "is this a real human?" signals in the system. Each outcome refers back to the request_id of an event you already sent; urlcap resolves it to a JA4 at ingest time so the binding survives even after the raw event ages out.

Four canonical kinds today, each driving one cluster of fields in ja4_intelligence_latest:

kind	verdict values	meta keys (canonical)	drives
`js_challenge`	`passed` | `failed` | `abandoned`	`challenge`, `reason`	`js_challenge_pass_rate`
`auth`	`authenticated` | `signup` | `anonymous`	`user_id_hash` (SHA-256 of your user id)	`auth_observations`, `distinct_users`
`purchase`	`completed` | `refunded` | `disputed`	`order_id`, `amount_cents`, `currency`	`purchases`, `total_purchase_cents`, `last_purchase_at`
`edge_action`	`blocked` | `allowed` | `challenged` | `rate_limited`	`rule`, `rule_id` (your blocklist label)	`cross_customer_action.sites_blocking` on bot-ja4s

request_id · string · required

The same value sent on the original event. Binds this verdict to a specific JA4/IP fingerprint.

kind · string · required

js_challenge, auth, purchase, edge_action, or any other label you want to track.

verdict · string

Outcome status; see canonical values above.

ts · ISO-8601 string or epoch ms

Optional. Defaults to receive time.

score · float 0..1

Optional. For continuous-score signals like reCAPTCHA v3 or fraud scores.

meta · object

Kind-specific extras. Stored as raw JSON; downstream queries pull individual keys with JSONExtractString(meta,'...'). See the kinds table above for canonical keys.

ja4_hash · UInt64

Optional. If you already know it, ship it inline and we skip the server-side lookup against request_events.

JS challenge example

Send when your edge issues a JS challenge (Turnstile, hCaptcha, your own proof-of-work) and gets a verdict back.

POST /api/v1/ingest/42/outcomes

curl -X POST https://urlcap.com/api/v1/ingest/42/outcomes \
  -H "Authorization: Bearer $INGEST_TOKEN" \
  -H "Content-Type: application/x-ndjson" \
  --data-binary '{"request_id":"abc123","kind":"js_challenge","verdict":"passed","score":0.92,"meta":{"challenge":"turnstile_v0"}}
{"request_id":"def456","kind":"js_challenge","verdict":"failed","meta":{"reason":"no-js-execution"}}'

Auth example — is this a registered user?

Send when a request you previously logged was authenticated — login session active, signup completed, password reset confirmed. The user_id_hash should be a hash of your internal user id, not the raw value — we only need to count distinct users, never identify them.

POST /api/v1/ingest/42/outcomes

curl -X POST https://urlcap.com/api/v1/ingest/42/outcomes \
  -H "Authorization: Bearer $INGEST_TOKEN" \
  -H "Content-Type: application/x-ndjson" \
  --data-binary '{"request_id":"abc123","kind":"auth","verdict":"authenticated","meta":{"user_id_hash":"u_sha256:7a3f…"}}
{"request_id":"def456","kind":"auth","verdict":"signup","meta":{"user_id_hash":"u_sha256:e9c1…"}}'
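One way to produce a user_id_hash is sketched below. It assumes a hex SHA-256 digest and reuses the u_sha256: prefix from the example above; the function name and the optional site_salt parameter are hypothetical additions for illustration, not part of the documented API.

```python
import hashlib


def user_id_hash(user_id: str, site_salt: str = "") -> str:
    """Return a stable, non-reversible label for an internal user id.

    site_salt is an optional, hypothetical extra: a per-site secret that
    makes the hash harder to guess from a known user id.
    """
    digest = hashlib.sha256((site_salt + user_id).encode("utf-8")).hexdigest()
    return f"u_sha256:{digest}"
```

The same user id must always map to the same hash, otherwise distinct_users will over-count.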

Purchase example — has this fingerprint converted?

The strongest "real human, valuable visitor" signal. Send from your order-confirmation webhook. Use the request_id of the request that closed the order (the checkout-complete POST, not the first product view).

POST /api/v1/ingest/42/outcomes

curl -X POST https://urlcap.com/api/v1/ingest/42/outcomes \
  -H "Authorization: Bearer $INGEST_TOKEN" \
  -H "Content-Type: application/x-ndjson" \
  --data-binary '{"request_id":"abc123","kind":"purchase","verdict":"completed","meta":{"order_id":"o_9876","amount_cents":4999,"currency":"USD"}}
{"request_id":"def456","kind":"purchase","verdict":"refunded","meta":{"order_id":"o_9876"}}'

202 Accepted

{
  "version": "1",
  "requestId": "…",
  "data": { "siteId": 42, "accepted": 2, "rejected": 0, "errors": [] }
}

A single request_id can carry multiple outcomes — one page-view that triggered a challenge, then authenticated, then converted is three rows with the same id. They aggregate independently into the matching counters.

get Read JA4 intelligence

/api/v1/ja4/intelligence

Returns the rolled-up profile for one JA4 fingerprint over a window (7 / 30 / 90 days). Authenticated with your urlcap API key — not the site ingest token.

site_id · UInt64 · required

Your site.

ja4_hash · UInt64 · required

The fingerprint to look up.

window_days · int · required

7, 30, or 90 (must match a configured window in intelligence.compute.windows_days).

GET /api/v1/ja4/intelligence

curl -G https://urlcap.com/api/v1/ja4/intelligence \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "site_id=42" \
  --data-urlencode "ja4_hash=17888951274072987679" \
  --data-urlencode "window_days=7"

200 OK

{
  "version": "1",
  "requestId": "…",
  "data": {
    "siteId": 42, "ja4Hash": "17888951274072987679", "windowDays": 7,
    "ja4": "t13d3613h2_018971650b2c_23cd79a6e20d",
    "reqs": 10, "unique_ips": 2, "unique_uas": 1, "unique_paths": 6,

    "likely_os_name": "OS X",         "likely_os_confidence": 1.0,
    "likely_agent_name": "Chrome",    "likely_agent_confidence": 1.0,
    "likely_device_class": "Desktop", "likely_device_confidence": 1.0,
    "ja4_ua_consistency": 1.0,

    "ua_diversity_score": 0.1,
    "ip_diversity_score": 0.2,
    "suspicious_score":   0.0,

    "js_challenge_attempts": 1, "js_challenge_passes": 1, "js_challenge_pass_rate": 1.0,
    "auth_observations": 1,     "distinct_users": 1,
    "purchases": 1, "total_purchase_cents": 4999, "last_purchase_at": "2026-05-21T07:12:02Z"
  }
}

Field meanings:

ja4_ua_consistency: 1.0 means this JA4 always claims the same (agent, os) tuple; lower values mean the JA4 is observed claiming mismatched UAs, i.e. a spoofed UA on the same TLS library.
js_challenge_pass_rate: null when the fingerprint was never challenged; 0.0 means challenged but never passing (the strongest pure-bot signal).
distinct_users: ≥ 1 means at least one registered user was observed on this fingerprint.
purchases: > 0 is the highest-confidence "real human, valuable visitor" signal.
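The field meanings above can be turned into a small reader for the data object of the response. This is an illustrative sketch only: `read_human_signals` and its note strings are hypothetical, and it simply restates the documented interpretations.

```python
from typing import Optional


def read_human_signals(profile: dict) -> list[str]:
    """Summarise the human/bot hints in a /ja4/intelligence data object."""
    notes: list[str] = []
    rate: Optional[float] = profile.get("js_challenge_pass_rate")
    if rate is None:
        notes.append("never challenged")
    elif rate == 0.0:
        notes.append("challenged but never passes (strongest pure-bot signal)")
    if profile.get("distinct_users", 0) >= 1:
        notes.append("registered user observed")
    if profile.get("purchases", 0) > 0:
        notes.append("purchase observed (highest-confidence human signal)")
    if profile.get("ja4_ua_consistency", 1.0) < 1.0:
        notes.append("mismatched UAs observed on this JA4")
    return notes
```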

get Trailing-hour JA4 signals

/api/v1/ja4/signals

Cloudflare-style "what does this JA4 look like right now?" snapshot. Returns the latest 1-hour rollup with 10 ratios + 4 ranks + 2 quantiles. Recomputed every minute by an internal job; 404 if the fingerprint hasn't been seen on this site within the trailing hour. Authenticated with your urlcap API key.

site_id · UInt64 · required

Your site.

ja4_hash · UInt64 · required

JA4 hash as an unsigned decimal string.

GET /api/v1/ja4/signals

curl -G https://urlcap.com/api/v1/ja4/signals \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "site_id=3" \
  --data-urlencode "ja4_hash=13366807129412944815"

200 OK

{
  "version": "1",
  "requestId": "…",
  "data": {
    "siteId": 3, "ja4Hash": "13366807129412944815",
    "calculatedAt": "2026-05-25T09:14:00Z",
    "ja4": "t13d1516h2_8daaf6152771_b1ff8ab2d16f",
    "reqs_1h": 14528,
    "h2h3_ratio_1h": 0.97, "browser_ratio_1h": 0.99,
    "cache_ratio_1h": 0.42, "heuristic_ratio_1h": 0.0,
    "unique_ips_1h": 3211, "unique_uas_1h": 14, "unique_paths_1h": 882,
    "reqs_rank_1h": 4, "reqs_quantile_1h": 0.97,
    "ips_rank_1h":  5, "ips_quantile_1h":  0.96,
    "uas_rank_1h": 18, "paths_rank_1h": 6
  }
}

*_ratio_1h are 0..1 shares of the trailing-hour request volume. *_rank_1h is per-site rank (1 = highest) and *_quantile_1h the corresponding quantile — a rank-1 JA4 will sit near 1.0. Use this for hot-path decisions; for stable long-window classification use /ja4/intelligence.

get JA4 25-metric snapshot

/api/v1/ja4/metrics

The "prioritised 25" metric snapshot for one JA4 — computed on-demand from the 5-minute / 1-hour / 1-day aggregates (no precompute job). Three blocks in one response: a trailing-hour rollup for the JA4, an optional IP+JA4 sub-block when ip= is supplied, and a top-10 JA4×UA breakdown over 24h with each UA's share of the JA4's volume.

site_id · UInt64 · required

Your site.

ja4_hash · UInt64 · required

JA4 hash.

ip · string · optional

Narrow the snapshot to one IP. Adds the ip_ja4_1h block and an is_new_24h flag.

GET /api/v1/ja4/metrics

curl -G https://urlcap.com/api/v1/ja4/metrics \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "site_id=3" \
  --data-urlencode "ja4_hash=13366807129412944815" \
  --data-urlencode "ip=203.0.113.5"

200 OK

{
  "version": "1",
  "requestId": "…",
  "data": {
    "siteId": 3, "ja4Hash": "13366807129412944815",
    "ja4_1h": {
      "req_count": 14528, "unique_ips": 3211, "unique_uas": 14, "unique_paths": 882, "unique_hosts": 1,
      "h2h3_ratio": 0.97, "browser_ua_ratio": 0.99,
      "error_ratio": 0.018, "s403_ratio": 0.0, "s404_ratio": 0.012, "s429_ratio": 0.0
    },
    "ip_ja4_1h": {
      "ip": "203.0.113.5", "req_count": 41, "unique_uas": 1, "unique_paths": 18, "unique_hosts": 1,
      "browser_ua_ratio": 1.0, "library_ua_ratio": 0.0, "h2h3_ratio": 1.0,
      "error_ratio": 0.0, "s404_ratio": 0.0,
      "first_seen": "2026-05-25T08:42:00Z", "last_seen": "2026-05-25T09:14:21Z",
      "is_new_24h": true
    },
    "ja4_ua_24h_top": [
      { "ua_hash_128": "8d9c…", "req_count": 218341, "unique_ips": 5621, "unique_asns": 412,
        "error_ratio": 0.02, "share_of_ja4": 0.71 },
      { "ua_hash_128": "ab12…", "req_count":  41203, "unique_ips":  331, "unique_asns":  18,
        "error_ratio": 0.01, "share_of_ja4": 0.13 }
    ]
  }
}

library_ua_ratio on the IP block counts UAs that Yauaa classifies as Special or Robot; high values are a non-browser client tell. share_of_ja4 in the UA breakdown is each UA's share of the JA4's total 24h volume, so the shares sum to at most 1.0 across the top-N; a single UA > 0.95 means "one client owns this JA4."

get JA4 profile breakdown

/api/v1/ja4/profile

Top-N values of one profile dimension for a JA4 over the last N days. Useful for "show me every agent_name ever observed on this JA4" or "which countries does this fingerprint actually come from." Each row returns the request count plus HLL-merged distinct ips/uas/paths.

site_id · UInt64 · required

Your site.

ja4_hash · UInt64 · required

JA4 hash.

dim · enum · required

One of os_name, os_class, agent_name, agent_class, device_class, device_brand, country, asn, http_version.

days · int 1..730 · default 90

Sliding window.

limit · int 1..500 · default 20

Top-N cap.

GET /api/v1/ja4/profile

curl -G https://urlcap.com/api/v1/ja4/profile \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "site_id=3" \
  --data-urlencode "ja4_hash=13366807129412944815" \
  --data-urlencode "dim=country" \
  --data-urlencode "days=30" \
  --data-urlencode "limit=10"

200 OK

{
  "version": "1",
  "requestId": "…",
  "data": {
    "siteId": 3, "ja4Hash": "13366807129412944815",
    "dim": "country", "days": 30,
    "values": [
      { "value": "US", "reqs": 482931, "unique_ips":  9214, "unique_uas":  211, "unique_paths": 5410 },
      { "value": "DE", "reqs": 121034, "unique_ips":  1632, "unique_uas":   88, "unique_paths": 2231 },
      { "value": "JP", "reqs":  88412, "unique_ips":   941, "unique_uas":   42, "unique_paths": 1844 }
    ]
  }
}

A JA4 returning many distinct agent_name values with even shares is a strong UA-spoofing tell — pair with ja4_ua_consistency. Same trick for country or asn to spot scrapers behind residential-proxy networks.
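"Even shares" can be quantified in several ways. The sketch below uses normalised Shannon entropy over the reqs column of the values array, which is a swapped-in technique chosen for illustration and not something urlcap documents; `share_evenness` is a hypothetical name.

```python
import math


def share_evenness(rows: list[dict]) -> float:
    """Normalised Shannon entropy of request shares.

    0.0 means a single value dominates; 1.0 means perfectly even shares.
    """
    counts = [r["reqs"] for r in rows if r["reqs"] > 0]
    total = sum(counts)
    if len(counts) < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts)
    return entropy / math.log(len(counts))
```

A value close to 1.0 for dim=agent_name matches the spoofing pattern described above; where to draw the line is a judgment call for your own traffic.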

get Per-IP rollup

/api/v1/ip/profile

Per-IP behavioural summary on a single site over the last N days. Returns request count plus distinct JA4s / UAs / paths / hosts; an IP showing many of each is a proxy / NAT tell. For cross-site investigation including geo, PTR, bot-CIDR membership and every bot attribution, use the richer /api/v1/ip/intelligence.

site_id · UInt64 · required

Your site.

ip · string · required

IPv4 or IPv6 address.

days · int 1..730 · default 30

Sliding window.

GET /api/v1/ip/profile

curl -G https://urlcap.com/api/v1/ip/profile \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "site_id=3" \
  --data-urlencode "ip=203.0.113.5" \
  --data-urlencode "days=30"

200 OK

{
  "version": "1",
  "requestId": "…",
  "data": {
    "siteId": 3, "ip": "203.0.113.5", "days": 30,
    "reqs": 18421,
    "unique_ja4s": 9, "unique_uas": 31, "unique_paths": 482, "unique_hosts": 3,
    "heuristic_reqs": 411, "challenge_reqs": 0,
    "first_seen": "2026-04-28T14:21:08Z", "last_seen": "2026-05-25T09:14:21Z"
  }
}

Heuristics: unique_ja4s >= 3 with unique_uas >= 10 is proxy-shaped; unique_ja4s = 1 with high reqs and low unique_paths is a single headless client. 404 if the IP hasn't sent any traffic to this site in the window.
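The two heuristics above might be applied to the data object like this. The proxy thresholds come from the text; the cut-offs for "high reqs" and "low unique_paths" are not specified, so high_reqs and low_paths below are placeholder assumptions to tune, and `ip_shape` is a hypothetical name.

```python
def ip_shape(profile: dict, high_reqs: int = 1000, low_paths: int = 5) -> str:
    """Label a /ip/profile data object using the documented heuristics."""
    if profile["unique_ja4s"] >= 3 and profile["unique_uas"] >= 10:
        return "proxy-shaped"
    if (profile["unique_ja4s"] == 1
            and profile["reqs"] >= high_reqs          # assumed cut-off for "high reqs"
            and profile["unique_paths"] <= low_paths):  # assumed cut-off for "low unique_paths"
        return "single-headless-client"
    return "unclassified"
```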

get List blockable JA4s per site

/api/v1/sites/{site_id}/bot-ja4s

Returns every JA4 the discovery system has flagged on this site in the last window_days days, labelled by classification. Pull this list and feed the JA4 strings into your edge blocklist (nginx $ja4 map, Cloudflare WAF rule, etc.). Browser-shaped fingerprints are excluded because they're not blockable.

window_days · int 1..90 · default 7

Sliding window for the per-site JA4 enumeration. Bounded by request_events' 7-day TTL on the upper end.

include · csv · default known,candidate

Filter to one classification. known = attributed to a bot_group (Bingbot, GPTBot, …); candidate = pending admin review.

limit · int 1..1000 · default 200

Cap on returned rows.

GET /api/v1/sites/3/bot-ja4s

curl -s "https://urlcap.com/api/v1/sites/3/bot-ja4s?window_days=7&limit=200" \
  -H "Authorization: Bearer $URLCAP_KEY"

200 OK

{
  "version": "1",
  "data": {
    "siteId": 3,
    "windowDays": 7,
    "items": [
      {
        "ja4": "t13d181300_e8a523a41297_69f017ebb96f",
        "ja4_hash": "13366807129412944815",
        "classification": "known_bot",
        "bot_group": "Googlebot",
        "bot_group_id": 4,
        "reqs": 2885, "ips": 162, "active_days": 1,
        "asset_ratio": 0.0,
        "first_seen": "2026-05-21T19:35:00Z",
        "last_seen":  "2026-05-21T22:01:38Z"
      },
      {
        "ja4": "t13d311100_e8f1e7e78f70_b6426fc6f187",
        "ja4_hash": "2034759142565420012",
        "classification": "candidate",
        "score": 0.76,
        "candidate_id": 1222,
        "reqs": 11317, "ips": 4953, "active_days": 1,
        "asset_ratio": 0.0,
        "first_seen": "2026-05-21T20:33:22Z",
        "last_seen":  "2026-05-21T22:01:38Z",

        "cross_customer_action": {
          "sites_blocking":    12,
          "sites_allowing":     0,
          "sites_challenging":  3
        },
        "block_likely_on_this_site": true,
        "block_likely_ratio": 0.997,

        "reqs_per_minute_peak":  21334,
        "reqs_per_minute_mean":   293,
        "active_minutes":        2289,
        "burstiness":            4.28
      }
    ]
  }
}

Rate & burstiness fields

These fields surface a per-minute traffic-shape signal from urlcap's internal ja4_agg_1m rollup. They are useful for catching high-volume bursty JA4s that hide under a moderate asset_ratio but spike to thousands of req/min during their active windows. This is the no-man's-land case where score >= 0.70 and asset_ratio falls between 0.05 and 0.50, so neither Tier 1 nor Tier 2 fires.

reqs_per_minute_peak — max requests in any 1-minute bucket for this JA4 over the window.
reqs_per_minute_mean — average across active minutes only (silent buckets between bursts are excluded so a bursty bot's mean reads its real when-active rate, not a diluted overall average).
active_minutes — count of 1-min buckets with reqs > 0.
burstiness — coefficient of variation (stddev / mean of per-minute counts). 0 = perfectly uniform; 1 = Poisson-like; > 2 = attack-shape; > 5 = textbook scheduled-burst pattern.

Practical use: add a Tier 1 boost condition like burstiness > 2.0 OR reqs_per_minute_peak > 1000 to catch attack-shape JA4s your other rules would miss.
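To make the definitions concrete, here is a sketch that derives the four fields from a list of per-minute request counts and applies the boost condition. It assumes burstiness is computed over active minutes with a population standard deviation; the text does not say which variant urlcap uses. Function names are hypothetical.

```python
import statistics


def rate_shape(per_minute_counts: list[int]) -> dict:
    """Compute peak, active-minute mean, active minutes and burstiness."""
    active = [c for c in per_minute_counts if c > 0]  # silent buckets are excluded
    if not active:
        return {"reqs_per_minute_peak": 0, "reqs_per_minute_mean": 0.0,
                "active_minutes": 0, "burstiness": 0.0}
    mean = statistics.fmean(active)
    stddev = statistics.pstdev(active)  # assumption: population stddev
    return {
        "reqs_per_minute_peak": max(active),
        "reqs_per_minute_mean": mean,
        "active_minutes": len(active),
        "burstiness": stddev / mean,     # coefficient of variation
    }


def tier1_boost(shape: dict) -> bool:
    """The boost condition suggested above."""
    return shape["burstiness"] > 2.0 or shape["reqs_per_minute_peak"] > 1000
```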

The two policy axes

Each row carries enough information to drive a per-operator decision:

bot_group is the operator name, and your main policy lever. Most customers keep Googlebot + Bingbot for search referral traffic and block GPTBot / ClaudeBot / CCBot, which crawl for AI training without returning traffic.
score is the confidence on unattributed candidates — start with score >= 0.7 and tighten as you watch analytics for collateral damage.
cross_customer_action.sites_blocking >= 3 is a strong "other sites block this too" vote, regardless of score.
block_likely_on_this_site is what we can already tell from your own status codes — useful as a sanity check, not a recommendation.
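A per-row decision along these axes could look like the sketch below. The keep and block sets mirror the example policy in the list above, and the thresholds are the suggested starting points; the function name, the return labels and the ordering of the checks are assumptions to adapt to your own policy.

```python
KEEP_GROUPS = {"Googlebot", "Bingbot"}            # kept for search referral traffic
BLOCK_GROUPS = {"GPTBot", "ClaudeBot", "CCBot"}   # AI-training crawlers


def decide(row: dict, min_score: float = 0.7, min_sites_blocking: int = 3) -> str:
    """Return "allow", "block" or "review" for one bot-ja4s item."""
    group = row.get("bot_group")
    if group in KEEP_GROUPS:
        return "allow"
    if group in BLOCK_GROUPS:
        return "block"
    votes = row.get("cross_customer_action", {}).get("sites_blocking", 0)
    if votes >= min_sites_blocking:
        return "block"  # strong cross-site vote, regardless of score
    if row.get("classification") == "candidate" and row.get("score", 0.0) >= min_score:
        return "block"
    return "review"
```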

Closing the loop: report your edge actions

When your edge takes a decision on a JA4, post an edge_action outcome. That populates the cross_customer_action field on every other site's bot-ja4s response — your blocks become a signal for everyone else, and theirs become a signal for you.

POST /api/v1/ingest/Xy9KqZ7mNvB2/outcomes

curl -X POST https://urlcap.com/api/v1/ingest/Xy9KqZ7mNvB2/outcomes \
  -H "Authorization: Bearer $INGEST_TOKEN" \
  -H "Content-Type: application/x-ndjson" \
  --data-binary '{"request_id":"abc123","kind":"edge_action","verdict":"blocked","meta":{"rule":"urlcap-blocklist","rule_id":"v1"}}'

Send one outcome per edge decision. Auth is the site's ingest token (the same one used for /events), not your urlcap API key. request_id lets us auto-resolve the JA4 server-side from the original event; you don't have to ship the fingerprint inline.

post URL monitors — up/down checks with alerts

/api/v1/monitors

Schedule a recurring /capture or /extract run and urlcap will alert you when the target changes state (up → down or down → up). Same primitives as UptimeRobot — plus full headless-browser checks, JSON-API validation, and the User-Agent personas from /user_agent_profiles.

Plan limits

Free: 1 monitor, minimum 300 s.
Developer: 25 monitors, minimum 60 s.
Startup: 100 monitors, minimum 30 s.
Business: unlimited, minimum 30 s.

Create a monitor

The spec object is shipped verbatim to the chosen engine, so anything you can do via /capture or /extract works here too — custom headers, POST bodies, JSON-content extractors, navigation actions, Web Bot Auth signing.

POST /api/v1/monitors

curl -X POST https://urlcap.com/api/v1/monitors \
  -H "Authorization: Bearer $URLCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "prod API healthcheck",
    "kind": "capture",
    "spec": { "url": "https://api.example.com/healthz" },
    "intervalSeconds": 60,
    "expectedStatus": 200,
    "userAgentProfile": "chrome-latest-mac",
    "alertWebhookUrl": "https://example.com/webhooks/urlcap-monitor",
    "alertEmail": "oncall@example.com",
    "alertFailureThreshold": 2
  }'

201 Created

{
  "version": "1",
  "requestId": "…",
  "data": {
    "publicId": "Xy9KqZ7mNvB2",
    "name": "prod API healthcheck",
    "kind": "capture",
    "spec": { "url": "https://api.example.com/healthz" },
    "intervalSeconds": 60,
    "expectedStatus": 200,
    "userAgentProfile": "chrome-latest-mac",
    "alertWebhookUrl": "https://example.com/webhooks/urlcap-monitor",
    "alertWebhookSecret": "Tw3qHk…48charsBase62…M7",
    "alertWebhookSecretWarning": "Save this secret now — it signs outbound webhook payloads (X-urlcap-Signature: sha256=…) and is not retrievable later.",
    "alertEmail": "oncall@example.com",
    "alertFailureThreshold": 2,
    "paused": false,
    "currentState": "unknown",
    "createdAt": "2026-05-26T12:00:00Z"
  }
}

alertWebhookSecret is returned only on this create call. Subsequent reads omit it — you'll see alertWebhookSecretSet: true instead. The secret signs every outbound webhook with HMAC-SHA256 in X-urlcap-Signature; your verifier compares its own HMAC of the raw body to the header.
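A verifier might look like the following sketch. It assumes the signature is a lowercase hex digest of the raw request body alone, keyed with the secret as UTF-8 bytes; the encoding is not stated here, so confirm it against a real delivery. `verify_signature` is a hypothetical helper name.

```python
import hashlib
import hmac


def verify_signature(secret: str, raw_body: bytes, header_value: str) -> bool:
    """Check an X-urlcap-Signature header of the form "sha256=<HMAC>"."""
    digest = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    expected = f"sha256={digest}"  # assumption: hex-encoded digest
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), header_value.encode("utf-8"))
```

Verify against the bytes exactly as received, before any JSON parsing or re-serialisation, since re-encoding the payload can change the body and invalidate the signature.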

Pass rule (v1)

Status-code only. If expectedStatus is set, the check passes only when the response status matches exactly. If it's absent, any 2xx is a pass. Richer assertions (body-contains, JSONPath predicates) are on the roadmap.
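The rule is small enough to restate in code. This sketch mirrors the two cases described above; `check_passes` is a hypothetical name used only for illustration.

```python
from typing import Optional


def check_passes(status: int, expected_status: Optional[int] = None) -> bool:
    """v1 pass rule: exact match when expectedStatus is set, else any 2xx."""
    if expected_status is not None:
        return status == expected_status
    return 200 <= status <= 299
```

Note that with expectedStatus set to 200, a 204 response counts as a failure.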

Alerts

Alerts fire only on state transitions, not on every failing check. To debounce flapping targets, the state machine requires alertFailureThreshold consecutive failures before it flips a monitor to down. Both alertWebhookUrl and alertEmail are optional; set neither and the monitor still records check history but won't notify.
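The debounce behaviour can be modelled as a tiny state machine. This is a sketch of the described behaviour, not urlcap's implementation: the class name is hypothetical, and it assumes a single passing check is enough to return to up and that the first move out of the initial unknown state does not alert.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MonitorState:
    failure_threshold: int          # corresponds to alertFailureThreshold
    state: str = "unknown"
    consecutive_failures: int = 0

    def record(self, passed: bool) -> Optional[str]:
        """Record one check; return the new state if an alert should fire."""
        if passed:
            self.consecutive_failures = 0
            target = "up"
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures < self.failure_threshold:
                return None  # not enough consecutive failures yet
            target = "down"
        if target == self.state:
            return None      # no transition, no alert
        previous, self.state = self.state, target
        # Assumption: only up -> down and down -> up transitions alert.
        return target if previous != "unknown" else None
```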

Webhook payload

{
  "event": "monitor.state_changed",
  "monitorPublicId": "Xy9KqZ7mNvB2",
  "monitorName": "prod API healthcheck",
  "newState": "down",
  "changedAt": "2026-05-26T12:34:56Z",
  "latestCheck": { "httpStatus": 503, "latencyMs": 421, "passed": false, "error": null }
}
// Headers: X-urlcap-Event, X-urlcap-Monitor, X-urlcap-Timestamp, X-urlcap-Signature: sha256=<HMAC>

Inspect a monitor

GET /api/v1/monitors/{publicId}

curl https://urlcap.com/api/v1/monitors/Xy9KqZ7mNvB2 -H "Authorization: Bearer $URLCAP_KEY"
curl "https://urlcap.com/api/v1/monitors/Xy9KqZ7mNvB2/checks?limit=20" -H "Authorization: Bearer $URLCAP_KEY"
curl "https://urlcap.com/api/v1/monitors/Xy9KqZ7mNvB2/uptime?days=30" -H "Authorization: Bearer $URLCAP_KEY"

Phase-level timings

Capture monitors record a phase breakdown on every check: dnsMs, connectMs (TCP + TLS), ttfbMs (time-to-first-byte), bodyMs (body download), plus resolvedIp (which A/AAAA the socket actually used). Phase fields are absent when their hook didn't fire — pooled keep-alive reuse skips DNS / connect, and followRedirects=true only captures the first leg. Same fields show up in data.response.timings on bare /capture too. Extract monitors don't have these (the HtmlUnit engine doesn't surface phase timing).

Lifecycle

PATCH /api/v1/monitors/{publicId} — whole-spec replace (partial updates not supported in v1).
POST /api/v1/monitors/{publicId}/pause — scheduler skips it; state is preserved.
POST /api/v1/monitors/{publicId}/resume — reverse the above.
DELETE /api/v1/monitors/{publicId} — hard-delete. Check history is removed by the daily sweeper.

Check history (monitor_checks rows) is kept for 30 days. A daily internal job at 03:05 UTC sweeps anything older.

get Candidate IPs per site

/api/v1/sites/{id}/bot-ip-candidates

Live feed of IPs exhibiting abusive behaviour on your site under five composite signals. The IP equivalent of the JA4 candidate queue — small list by construction (score floor), recomputed on every call. Use it to populate an iptables / ipset / Cloudflare IP-list at the edge for IP-based blocking.

Scoring (weights sum to 1.0)

40% block_ratio — share of this IP's requests on the site that returned 4xx or 444.
20% volume — log10(1+reqs)/4, saturates at ~10,000 requests.
15% path breadth — distinct_paths/100, saturates at 100 paths.
15% JA4 churn — distinct_ja4s/3, saturates at 3 JA4s (anti-fingerprinting tell).
10% vuln-probe hits — distinct vuln-probe paths hit, saturates at 3.
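The weights and saturation points above can be combined as in this sketch. The vuln-probe term is assumed to be hits/3 capped at 1.0, by analogy with the JA4 churn term; the server's exact normalisation may differ, so treat the result as an approximation rather than a reproduction of the score field. `candidate_score` is a hypothetical name.

```python
import math


def candidate_score(block_ratio: float, reqs: int, distinct_paths: int,
                    distinct_ja4s: int, vuln_probe_hits: int) -> float:
    """Weighted sum of the five composite signals, each clamped to 0..1."""
    volume = min(1.0, math.log10(1 + reqs) / 4)   # saturates at ~10,000 requests
    breadth = min(1.0, distinct_paths / 100)      # saturates at 100 paths
    churn = min(1.0, distinct_ja4s / 3)           # saturates at 3 JA4s
    probes = min(1.0, vuln_probe_hits / 3)        # assumption: linear up to 3 hits
    return (0.40 * block_ratio + 0.20 * volume + 0.15 * breadth
            + 0.15 * churn + 0.10 * probes)
```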

Hard exclusions (no scoring, row dropped)

IP is on any user's trust list — signal modulation is global.
IP is already attributed in bot_observed_ips — no point re-discovering known bots. Use /bot-traffic for those.
reqs < 20 — insufficient evidence on this site within the window.

Request

window_days · int 1..30 · default 7

Sliding window for per-IP aggregation.

min_score · float 0..1 · default 0.50

Score floor. Default keeps the list small by construction; raise to ~0.70 for only the most obvious abusers, lower to ~0.30 for a wider net.

limit · int 1..1000 · default 200

Hard cap. Default + score floor together = "small list."

format · enum · default json

json returns the operator-rich shape below. txt returns one IP per line. cidr returns each IP as /32 (or /128). Both text formats use text/plain — easy to feed straight into ipset.

GET /api/v1/sites/{id}/bot-ip-candidates

curl -G https://urlcap.com/api/v1/sites/2DrxGfsYW0jv/bot-ip-candidates \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "window_days=7" \
  --data-urlencode "min_score=0.50"

200 OK

{
  "version": "1",
  "requestId": "…",
  "data": {
    "siteId": 3, "windowDays": 7, "minScore": 0.5, "excludedClassified": 4954,
    "candidates": [
      {
        "ip": "203.0.113.5", "asn": 9009, "country": "VN", "score": 0.78,
        "components": {
          "block_ratio": 0.92, "reqs": 1850,
          "distinct_paths": 1452, "distinct_ja4s": 4, "vuln_probe_hits": 12
        },
        "first_seen": "2026-05-22 03:48:52.000",
        "last_seen":  "2026-05-22 04:03:07.000"
      }
    ]
  }
}

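For one-shot ipset feeding, prefix each address with an add command, because ipset restore reads commands rather than bare IPs: curl -s … &format=txt | sed 's/^/add <set> /' | ipset restore -exist (where <set> is the name of an existing set). For Cloudflare IP-list import: … &format=cidr | cf-cli ip-list update ....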
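Because `ipset restore` reads `add <set> <entry>` commands rather than bare addresses, the txt output needs a small transformation first. The sketch below does that and routes IPv4 and IPv6 to separate sets, since a hash:ip set holds a single address family. The set names are placeholders and both sets are assumed to exist already.

```python
import ipaddress


def ipset_restore_script(txt_body: str,
                         v4_set: str = "urlcap-bots4",
                         v6_set: str = "urlcap-bots6") -> str:
    """Turn a one-IP-per-line body into input for `ipset restore -exist`."""
    lines = []
    for raw in txt_body.splitlines():
        raw = raw.strip()
        if not raw:
            continue
        addr = ipaddress.ip_address(raw)  # raises ValueError on malformed input
        target = v4_set if addr.version == 4 else v6_set
        lines.append(f"add {target} {addr}")
    return "\n".join(lines) + "\n" if lines else ""
```

Validating each line before it reaches the firewall also guards against an error page or truncated response being piped straight into ipset.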

get URL traffic + blocks summary

/api/v1/sites/{id}/url-stats?host=&path=&days=N

Quick "is this URL healthy?" report for a specific page on your site. One call returns totals, status-code breakdown, per-day chart, and up to the 25 most recent non-2xx requests with their IP / ASN / JA4. Use it to answer "any rejections on URL X today?".

GET /api/v1/sites/{id}/url-stats

curl -G https://urlcap.com/api/v1/sites/2DrxGfsYW0jv/url-stats \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "host=en.example.com" \
  --data-urlencode "path=/terms-of-service.html" \
  --data-urlencode "days=7"

200 OK (truncated)

{
  "version": "1",
  "data": {
    "siteId": 3, "host": "en.example.com", "path": "/terms-of-service.html",
    "windowDays": 7,
    "total": 351, "ok_2xx": 341, "redirect_3xx": 2,
    "rejected_4xx": 8, "error_5xx": 0,
    "unique_ips": 122, "unique_ja4": 55,
    "by_status": [
      { "status": 200, "count": 341, "unique_ips": 122 },
      { "status": 405, "count": 4,   "unique_ips": 1 },
      { "status": 444, "count": 4,   "unique_ips": 1 }
    ],
    "by_day": [ /* one row per UTC day */ ],
    "rejected_sample": [
      { "ts": "2026-05-26 04:02:44.000", "status": 444, "ip": "::ffff:45.33.69.206",
        "asn": 63949, "method": "GET", "ja4": "" }
    ]
  }
}

get Per-bot accessibility check

/api/v1/sites/{id}/bot-traffic?bot_group=&days=N

Did urlcap-discovered bot traffic land successfully on your site? Given a bot_group name (substring-match on bot_groups.description), this returns its recent visits, status breakdown, per-day chart, and a sample of recent requests with non-2xx surfaced first. Use it to answer "are we blocking Google?" in one call.

Bot identification matches against every JA4 the discovery system has attributed to the bot_group via bot_observed_ja4s — catches both CIDR- and UA-matched traffic without enumerating IP lists. Optional host and path parameters narrow the lookup to a single page.

GET /api/v1/sites/{id}/bot-traffic

curl -G https://urlcap.com/api/v1/sites/2DrxGfsYW0jv/bot-traffic \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "bot_group=Googlebot" \
  --data-urlencode "days=1"

200 OK (truncated)

{
  "version": "1",
  "data": {
    "siteId": 3, "botGroupFilter": "Googlebot", "windowDays": 1,
    "matchedBotGroups": [ { "botGroupId": 4, "description": "Googlebot" } ],
    "ja4HashesUsedForFilter": 12,
    "total": 98831, "ok_2xx": 96603, "redirect_3xx": 1010,
    "rejected_4xx": 969, "error_5xx": 0,
    "by_status": [
      { "status": 200, "count": 96603 },
      { "status": 404, "count": 955 },
      { "status": 403, "count": 14 }
    ],
    "by_day": [ /* one row per UTC day */ ],
    "recent_sample": [
      { "ts": "...", "host": "en.example.com", "path": "/images/consumo.png", "status": 403, "ip": "..." }
    ]
  }
}

Substring matching: bot_group=Google catches both Googlebot and User-triggered fetchers (Google) together; pass the exact full description to narrow to one bot_group.

get Legacy — `/auth`

/auth

The original TOTP endpoint, kept for backwards compatibility. It takes the same uri query parameter and the same X-API-Key header, but responds with the bare code as text/html — no JSON envelope, no metadata. Prefer /api/v1/totp for new integrations; this endpoint will not change.

Legacy request

curl -G https://urlcap.com/auth \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "uri=otpauth://totp/Acme:alice@acme.io?secret=JBSWY3DPEHPK3PXP"

492039

If the key is invalid or the URI can't be parsed, the legacy endpoint responds with 404 Not Found and an empty body.

Need an API key, or want to talk through a use case? Email info@urlcap.com. Track changes in the changelog.
