urlcap

urlcap API

API reference

urlcap is a power tool for developers: a fast HTTP service that lets you craft and replay requests with low-level control, and exposes purpose-built endpoints — like a TOTP code generator — over a clean, versioned REST API. Everything here lives under https://urlcap.com/api/v1.

Introduction

The API is organised around predictable resource URLs, accepts standard HTTP, and returns JSON for everything under /api/v1. It uses conventional HTTP verbs and status codes, and every response carries a requestId you can quote when contacting support.

You authenticate with an API key sent in a request header. There are no SDKs to install — any HTTP client works, and a machine-readable description is published at /api/v1/openapi.yaml (OpenAPI 3.1) so you can generate a client if you'd like.

Status: urlcap is in active development. The TOTP endpoint described below is live today. New endpoints for fully custom HTTP requests are rolling out — watch the changelog.

Quickstart

Make your first call in under a minute. You'll need an API key — sign up for a free account, then create one on the API keys page.

  1. Export your key: export URLCAP_KEY="cb7c07df-…"
  2. Call an endpoint with the X-API-Key header.
  3. Read the JSON response; every call includes a requestId.
Generate a TOTP code
curl -G https://urlcap.com/api/v1/totp \
  -H "X-API-Key: $URLCAP_KEY" \
  --data-urlencode "uri=otpauth://totp/Acme:alice@acme.io?secret=JBSWY3DPEHPK3PXP&period=30&digits=6"
200 OK
{
  "version": "1",
  "requestId": "9f1c0b7a-3e2d-4a51-9b88-2f6c1e7d4a02",
  "data": {
    "code": "492039",
    "digits": 6,
    "period": 30,
    "algorithm": "SHA1",
    "expiresIn": 14
  }
}

Base URL & conventions

All versioned endpoints share a common base:

Base URL
https://urlcap.com/api/v1
  • Requests and responses under /api/v1 use JSON with UTF-8 encoding.
  • Parameters may be sent as query-string values or, for POST, as application/x-www-form-urlencoded body fields. Always URL-encode values that contain reserved characters.
  • Successful responses use 200. Client errors use 4xx; server errors use 5xx.
  • Every response includes a top-level version field and, on success, a requestId.

The original, pre-versioned endpoint at https://urlcap.com/auth remains available and returns a plain-text code — see Legacy /auth.

Authentication

urlcap authenticates requests with an API key. Send it in the X-API-Key header on every request. Keys are UUID-shaped strings; treat them like passwords — never embed one in client-side code or commit it to a repository.

Authenticated request
curl https://urlcap.com/api/v1/totp \
  -H "X-API-Key: cb7c07df-588e-4ef8-ae42-458fe1e90fd0" \
  --data-urlencode "uri=otpauth://totp/Acme:alice@acme.io?secret=JBSWY3DPEHPK3PXP"

Requests with a missing, malformed, or unknown key are rejected with 401 Unauthorized. Expired keys are treated the same way. Authentication failures are recorded against the attempted key for auditing.

401 Unauthorized
{
  "version": "1",
  "error": {
    "type": "unauthorized",
    "message": "Missing or invalid X-API-Key header."
  }
}

Making requests

Endpoints under /api/v1 accept GET for read-style calls. Where an endpoint also accepts POST, parameters go in an application/x-www-form-urlencoded body, which is handy when a value (such as an otpauth:// URI) is long or contains characters that are awkward in a URL.

A note on +: in a query string, + decodes to a space. urlcap restores + in the uri parameter so secrets and labels survive round-tripping, but the most robust approach is to always percent-encode (curl --data-urlencode, encodeURIComponent, urllib.parse.quote, …).

POST with a form body
curl -X POST https://urlcap.com/api/v1/totp \
  -H "X-API-Key: $URLCAP_KEY" \
  --data-urlencode "uri=otpauth://totp/Acme:alice@acme.io?secret=JBSWY3DPEHPK3PXP&algorithm=SHA256&digits=8&period=30"
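
The note on + above can be checked locally. The following is a minimal Python sketch using only the standard library; the plus-addressed label in the URI is a made-up example, not a value from this reference.

```python
from urllib.parse import parse_qs, quote, urlencode

# Hypothetical otpauth URI whose label contains "+" (a plus-addressed email).
uri = "otpauth://totp/Acme:alice+ops@acme.io?secret=JBSWY3DPEHPK3PXP"

# quote(..., safe="") percent-encodes every reserved character, including "+".
encoded_value = quote(uri, safe="")

# urlencode() builds a whole query string or form body and also emits "%2B" for "+".
query = urlencode({"uri": uri})

# Decoding the query string gives back the original value, "+" included.
round_tripped = parse_qs(query)["uri"][0]

print(query)
print(round_tripped == uri)
```

Either form can be appended to the endpoint URL for GET or sent as the body of a form-encoded POST.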

Errors

urlcap uses standard HTTP status codes. 2xx means success. 4xx means the request was rejected (a missing parameter, a bad key, a malformed URI). 5xx means something went wrong on our side — these are rare and safe to retry with backoff.

Error responses under /api/v1 have a consistent shape:

Error envelope
{
  "version": "1",
  "error": {
    "type": "invalid_uri",
    "message": "Could not parse the supplied otpauth:// URI: ..."
  }
}
Status   error.type        When it happens
400      invalid_request   A required parameter is missing or empty.
400      invalid_uri       The uri value is not a parseable otpauth:// URI.
401      unauthorized      The X-API-Key header is missing, malformed, unknown, or expired.
404      not_found         No endpoint matches the requested path under /api/v1.
5xx      internal_error    An unexpected error on our side. Retry with exponential backoff.
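
One way a client might turn the error envelope into an exception is sketched below. UrlcapError and unwrap are hypothetical names chosen for illustration; the sketch assumes the success payload sits under data, as in the TOTP example, which is not true of every endpoint (the extract endpoints return taskId at the top level).

```python
import json


class UrlcapError(Exception):
    """Hypothetical exception carrying the fields of the error envelope."""

    def __init__(self, status, error_type, message):
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.error_type = error_type
        self.message = message


def unwrap(status, body_text):
    """Return the data field of a success envelope, or raise UrlcapError."""
    payload = json.loads(body_text)
    if 200 <= status < 300:
        return payload.get("data")
    error = payload.get("error", {})
    raise UrlcapError(status, error.get("type", "unknown"), error.get("message", ""))


# The invalid_uri envelope from this section, fed through the helper.
sample = ('{"version": "1", "error": {"type": "invalid_uri", '
          '"message": "Could not parse the supplied otpauth:// URI: ..."}}')
try:
    unwrap(400, sample)
except UrlcapError as exc:
    print(exc.status, exc.error_type)
```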

Rate limits

urlcap enforces two independent limits:

  • Monthly quota — the request budget published for your plan on the pricing page. The Business plan's "Unlimited" tier has no monthly cap, but it is not uncapped throughput — see below.
  • Fair-use per-second rate limits — per-API-key and per-IP burst caps scaled to your plan, to protect the platform from abuse and noisy neighbours. They apply to every plan, including unlimited tiers. They also apply to the no-key free trial (5 requests per 24 hours per IP / UA / fingerprint).

When you exceed either limit the API responds with 429 Too Many Requests; back off and retry. Detailed per-plan QPS numbers and the accompanying response headers will be published alongside the public launch. Until then, build assuming generous-but-finite throughput and add retry-with-backoff for 429 and 5xx.
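
A retry-with-backoff wrapper could look roughly like this. It is a sketch, not a recommended policy: the attempt count, delays, and the use of full jitter are assumptions, and send stands for whatever function performs the HTTP call in your client.

```python
import random
import time


def is_retryable(status):
    """429 and any 5xx are the statuses this reference says to retry."""
    return status == 429 or 500 <= status < 600


def call_with_backoff(send, max_attempts=5, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is any zero-argument callable returning (status, body).
    """
    for attempt in range(max_attempts):
        status, body = send()
        if not is_retryable(status) or attempt == max_attempts - 1:
            return status, body
        # Exponential backoff with full jitter: wait between 0 and base * 2^attempt.
        delay = min(max_delay, base_delay * (2 ** attempt))
        sleep(random.uniform(0, delay))


# Simulated responses stand in for real HTTP calls; sleeping is disabled for the demo.
responses = iter([(503, "busy"), (429, "slow down"), (200, "ok")])
print(call_with_backoff(lambda: next(responses), sleep=lambda _: None))
```

Passing sleep as a parameter keeps the wrapper testable without real waiting.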

Anonymous free trial

A small allow-list of endpoints can be hit without an API key, with a strict per-day budget — useful for demos, tinkering, and tutorials. After the budget is spent the endpoint returns 402 anon_limit_reached with a sign-up link.

  • Allow-listed endpoints: /api/v1/capture, /totp, /is_bot, /ip/contains, /ip/lookup.
  • Budget: 5 requests per rolling 24h, counted independently on three identity dimensions — IP, User-Agent, and the optional X-Client-Fingerprint header set by the in-browser try-it widget. If any dimension hits the cap, the request is blocked.
  • Response headers on allowed calls: X-RateLimit-Limit / X-RateLimit-Remaining.
  • Block response: {"error":{"type":"anon_limit_reached","signup_url":"/register","limit":5,"window":"24h"}}.

Heavy or stateful endpoints (/extract, /datasets, /schedules) are not on the allow-list and continue to require a valid API key.
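
To make the "any dimension hits the cap" rule concrete, here is an illustrative model of a three-dimension rolling budget. It is not urlcap's implementation; the class name, the in-memory storage, and the choice not to count blocked requests are all assumptions.

```python
from collections import defaultdict, deque

LIMIT = 5
WINDOW_SECONDS = 24 * 60 * 60


class AnonBudget:
    """Rolling-window counter kept separately per identity dimension."""

    def __init__(self):
        self._hits = defaultdict(deque)  # (dimension, value) -> request timestamps

    def allow(self, now, ip, user_agent, fingerprint=None):
        keys = [("ip", ip), ("ua", user_agent)]
        if fingerprint is not None:  # X-Client-Fingerprint is optional
            keys.append(("fp", fingerprint))
        for key in keys:
            hits = self._hits[key]
            while hits and now - hits[0] >= WINDOW_SECONDS:
                hits.popleft()  # forget requests older than the window
        # Blocked as soon as any single dimension has used up its budget.
        if any(len(self._hits[key]) >= LIMIT for key in keys):
            return False
        for key in keys:
            self._hits[key].append(now)
        return True


budget = AnonBudget()
# Same IP with a different User-Agent each time: the IP dimension still fills up.
results = [budget.allow(now=i, ip="203.0.113.9", user_agent=f"ua-{i}") for i in range(6)]
print(results)
```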

Versioning

The API is versioned in the URL path. The current version is v1: https://urlcap.com/api/v1. Backwards-incompatible changes — removing a field, changing a type, renaming a parameter — ship under a new path segment (/api/v2); v1 keeps working. Additive changes (new optional parameters, new fields in a response, new endpoints) can appear within v1, so write clients that tolerate unknown fields.

The legacy endpoint at https://urlcap.com/auth predates the versioning scheme. It is frozen: it will keep its current plain-text behaviour indefinitely, but new functionality only lands under /api/v{n}.

Pricing

urlcap is a paid API with usage-based pricing — you pay for the requests you make, with a free tier to build and prototype on. Sign up for a free account and create your first API key on the API keys page; paid tiers (Developer / Startup / Business) are managed from Billing.

The capture object

A successful call to the capture endpoint returns an envelope whose data field describes the request urlcap sent and the response it got back — parsed the way a browser's network inspector would show it. The shape is:

data
{
  "request": {
    "url": "https://example.com/path?q=1", "method": "GET", "httpVersion": "HTTP/1.1",
    "scheme": "https", "host": "example.com", "port": 443, "path": "/path", "query": "q=1",
    "queryParameters": [ { "name": "q", "value": "1" } ],
    "headers": [ { "name": "User-Agent", "value": "..." }, { "name": "Accept", "value": "*/*" } ],
    "followRedirects": false, "body": "", "bodyEncoding": "UTF-8", "technology": "reactor-netty"
  },
  "response": {
    "status": 200, "statusText": "OK", "httpVersion": "HTTP/1.1",
    "headers": [ { "name": "Date", "value": "..." }, { "name": "Content-Type", "value": "text/html" } ],
    "cookies": [ { "name": "sid", "value": "abc", "path": "/", "domain": ".example.com", "secure": true, "httpOnly": true, "sameSite": "Lax" } ],
    "contentType": "text/html", "charset": "utf-8", "contentLength": 1256,
    "bodyBytes": 1256, "bodyTruncated": false, "body": "...",
    "bodyEncoding": "UTF-8",
    "timings": {
      "totalMs":       84,
      "dnsMs":         12,
      "connectMs":     35,
      "requestSendMs":  1,
      "ttfbMs":        28,
      "bodyMs":         7,
      "resolvedIp":   "203.0.113.42"
    }
  }
}
  • request.headers — the headers urlcap actually sent, in order. If you supply none, it adds a default User-Agent and Accept; if you supply any, it sends exactly what you give. The runtime may add Host / Content-Length on top.
  • response.headers — every response header in the exact order received (duplicates preserved).
  • response.cookies — each Set-Cookie header parsed into name/value plus attributes.
  • response.body — the response body decoded with bodyEncoding. Bodies over ~1 MB are truncated in the response (bodyTruncated = true); the full body is still recorded server-side.
  • response.timings — wall-clock totalMs plus a phase breakdown: dnsMs (DNS resolution + scheduling), connectMs (TCP + TLS), requestSendMs, ttfbMs (time-to-first-byte), bodyMs (body download), and resolvedIp (the A/AAAA we landed on). Phase fields are absent when their hook didn't fire — e.g. a pooled keep-alive reuse skips DNS / connect, and followRedirects=true captures only the first leg's timings.
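
Because phase fields can be absent, code that reads timings should not assume every key exists. A small sketch follows; phase_breakdown is a hypothetical helper and the second sample uses invented numbers.

```python
PHASES = ("dnsMs", "connectMs", "requestSendMs", "ttfbMs", "bodyMs")


def phase_breakdown(timings):
    """Return (phases that are present, ms of totalMs not covered by any phase)."""
    present = {name: timings[name] for name in PHASES if name in timings}
    unaccounted = timings["totalMs"] - sum(present.values())
    return present, unaccounted


# Timings from the capture object above.
full = {"totalMs": 84, "dnsMs": 12, "connectMs": 35, "requestSendMs": 1,
        "ttfbMs": 28, "bodyMs": 7, "resolvedIp": "203.0.113.42"}
print(phase_breakdown(full))

# A pooled keep-alive connection (hypothetical numbers): DNS and connect are absent.
reused = {"totalMs": 40, "requestSendMs": 1, "ttfbMs": 30, "bodyMs": 8}
print(phase_breakdown(reused))
```

The phases need not add up to totalMs exactly, so the remainder is reported rather than treated as an error.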

Capture a request (GET, POST)

/api/v1/capture

Sends an HTTP request to a target URL and returns its response as a capture object. Use GET with a url query parameter for a quick fetch, or POST a JSON body for full control — including the exact order of request headers.

GET — quick fetch

GET /api/v1/capture
curl -G https://urlcap.com/api/v1/capture \
  -H "X-API-Key: $URLCAP_KEY" \
  --data-urlencode "url=https://example.com/path?q=1"

Optional query parameters: method, followRedirects (true/false), timeoutMs.

POST — full control

Send Content-Type: application/json with a body of the following shape (only url is required):

url · string · required
The target URL (http or https). Query string and port are honoured.
method · string
HTTP method: GET (default), POST, PUT, DELETE, HEAD, OPTIONS, PATCH.
headers · array of {name, value}
Request headers, written to the wire in this exact order. Supplying any disables the default User-Agent/Accept.
body · string
Request body. Defaults to empty.
bodyEncoding · string
Charset used to encode the request body and decode the response body. Defaults to UTF-8.
followRedirects · boolean
Follow 3xx responses. Defaults to false, so by default you see the redirect itself.
timeoutMs · integer
Per-request timeout in milliseconds. Defaults to 10000; clamped to 1000–30000.
webBotAuth · boolean
Optional. Default false. Sign the outbound request with urlcap's Web Bot Auth signature (Ed25519). Adds Signature-Agent, Signature-Input and Signature headers. Target sites verify against the JWKS at /.well-known/http-message-signatures-directory.
proxy · {host, port, user, password}
Route the request through an HTTP proxy. Optional.

Headers

X-API-Key · string · required
Your API key.

Returns

A 200 response whose data is a capture object, plus a requestId. Note that 200 means urlcap reached the target — the target's own status is in data.response.status. On error: the error envelope with 400 (invalid_request) for a bad/missing URL, 401 (unauthorized), or 502 (upstream_error) when the target couldn't be reached.

curl -X POST https://urlcap.com/api/v1/capture \
  -H "X-API-Key: $URLCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "url": "https://httpbin.org/post?x=1",
        "method": "POST",
        "headers": [
          { "name": "User-Agent", "value": "my-app/1.0" },
          { "name": "X-Trace-Id", "value": "abc-123" },
          { "name": "Content-Type", "value": "application/json" }
        ],
        "body": "{\"hello\":\"world\"}",
        "followRedirects": false,
        "timeoutMs": 10000
      }'
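
The same JSON body can be assembled programmatically. The sketch below is only an illustration: build_capture_body is a hypothetical helper, and clamping timeoutMs on the client simply mirrors the documented server-side range.

```python
import json


def build_capture_body(url, method="GET", headers=(), body="", timeout_ms=10000):
    """Assemble the JSON body for POST /api/v1/capture."""
    payload = {
        "url": url,
        "method": method,
        "body": body,
        # Mirror the documented 1000-30000 clamp so the value sent is the value used.
        "timeoutMs": max(1000, min(30000, timeout_ms)),
    }
    if headers:
        # A list of {name, value} objects keeps the caller's order and allows
        # duplicate names, which a plain name-to-value mapping could not.
        payload["headers"] = [{"name": name, "value": value} for name, value in headers]
    # With no headers key, the default User-Agent and Accept stay in effect.
    return json.dumps(payload)


print(build_capture_body(
    "https://httpbin.org/post?x=1",
    method="POST",
    headers=[("User-Agent", "my-app/1.0"), ("X-Trace-Id", "abc-123")],
    body='{"hello":"world"}',
))
```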

User-Agent profiles (GET): identify as Chrome / Firefox / Safari / …

/api/v1/user_agent_profiles

By default /capture and /extract identify as urlcap/1.0 (+https://urlcap.com/bot). Two knobs let callers identify as something specific:

  • userAgent — a raw UA string. Wins over the profile.
  • userAgentProfile — a key into the catalogue. For /extract this also selects the HtmlUnit BrowserVersion (Chrome / Firefox / Edge) so JS-fingerprinted targets see a coherent (engine, UA) pairing.

The catalogue is operator-managed in the user_agent_profiles MySQL table. Hit this endpoint to discover the available keys:

GET /api/v1/user_agent_profiles
curl -s https://urlcap.com/api/v1/user_agent_profiles \
  -H "X-API-Key: $URLCAP_KEY"
200 OK (truncated)
{
  "version": "1",
  "data": {
    "profiles": [
      {
        "key": "chrome-latest-mac",
        "description": "Chrome 131 on macOS",
        "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
        "browserEngine": "chrome",
        "extraHeaders": {
          "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
          "Accept-Language": "en-US,en;q=0.9",
          "sec-ch-ua": "\"Chromium\";v=\"131\", \"Not_A Brand\";v=\"24\", \"Google Chrome\";v=\"131\"",
          "sec-ch-ua-mobile": "?0",
          "sec-ch-ua-platform": "\"macOS\""
        }
      },
      { "key": "firefox-latest-win",  "...": "..." },
      { "key": "edge-latest-win",     "...": "..." },
      { "key": "safari-latest-mac",   "...": "..." },
      { "key": "googlebot",           "...": "..." },
      { "key": "urlcap",              "...": "..." }
    ]
  }
}
POST /api/v1/capture — using a profile
curl -X POST https://urlcap.com/api/v1/capture \
  -H "X-API-Key: $URLCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://example.com", "userAgentProfile": "chrome-latest-mac" }'

Resolution order on /capture: explicit headers[] wins (you're in full-control mode and we don't second-guess) → userAgent → userAgentProfile → the system default. On /extract, only the profile selects the JS engine; if you supply userAgent alone the engine stays Chromium.

Coherence caveat: the safari-latest-mac profile ships a Safari UA on the Chromium engine because HtmlUnit doesn't include a Safari engine, so targets that fingerprint navigator.vendor / navigator.platform will detect the mismatch. The googlebot profile only sets the UA string: urlcap is not Googlebot, and reverse-DNS / published-CIDR checks on the target side will reject the request. Use webBotAuth=true for cryptographic urlcap attribution.
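
The /capture resolution order can be modelled in a few lines. This is one reading of the rule, not the service's code: resolve_user_agent is a hypothetical name, and falling back to the default for an unknown profile key is an assumption (the API may reject it instead).

```python
DEFAULT_UA = "urlcap/1.0 (+https://urlcap.com/bot)"


def resolve_user_agent(request, profiles):
    """Pick the User-Agent for a /capture request following the stated order."""
    headers = request.get("headers") or []
    if headers:
        # Full-control mode: only what the caller listed is sent, so the
        # User-Agent is whatever appears in headers[] (possibly nothing).
        for header in headers:
            if header["name"].lower() == "user-agent":
                return header["value"]
        return None
    if request.get("userAgent"):
        return request["userAgent"]  # a raw UA string wins over the profile
    profile = profiles.get(request.get("userAgentProfile"))
    if profile:
        return profile["userAgent"]
    return DEFAULT_UA  # assumption: unknown or missing profile falls back to the default


# Hypothetical one-entry catalogue with a shortened UA string.
catalogue = {"chrome-latest-mac": {"userAgent": "Mozilla/5.0 (example)"}}
print(resolve_user_agent({"userAgentProfile": "chrome-latest-mac"}, catalogue))
```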

Extract — navigation & information retrieval

The extract tool runs a small "recipe" against a web page using a headless browser engine: it loads a URL, optionally performs a sequence of actions (typing into fields, clicking, waiting, navigating), then pulls out the data you describe with CSS or XPath selectors — optionally walking through paginated results. It also handles JSON responses (a REST API, say): point it at a URL that returns JSON and pull out values with JSONPath. Because a job can take a while, it runs asynchronously: you submit a job model, get a taskId back, and poll for completion.

The underlying browser engine is an implementation detail and may change; the job model and the result shape below are the contract.

The job model

A job is a JSON object:

job model
{
  "search_id": "my-job-1",                       // optional: a correlation id you choose; echoed back in the result
  "url": "https://example.com/search",           // required: the page to load
  "content": "json",                             // optional: "json" forces the response to be processed as JSON (see below)
  "actions": [ /* steps performed before extraction — see below */ ],
  "extractors": [ /* what to pull from the final page — see below */ ],
  "pagination": { /* optional: walk through multiple pages — see below */ }
}
url · string · required
The http/https page to load first.
search_id · string
An optional identifier you choose; returned in the result as search_id so you can correlate jobs.
content · string
Set to json to process the response body as JSON regardless of its Content-Type. If omitted, the engine auto-detects: an HTML response is treated as a page, anything else (application/json, text/plain, …) as JSON. See JSON content.
actions · array
An ordered list of actions performed on the page before extraction. Optional. (Ignored for JSON content.)
extractors · array
The extractors run against the final page. Their results become the top-level fields of the result.
pagination · object
If present, pagination visits multiple pages and runs its own per-page extractors on each. (Ignored for JSON content.)
webBotAuth · boolean
Optional. Default false. Sign every outbound request the headless browser makes (main document, scripts, XHR, …) with urlcap's Web Bot Auth signature (Ed25519). Target sites verify against the JWKS at /.well-known/http-message-signatures-directory.

Selectors

Every selector field accepts a CSS selector by default, or an XPath expression with an xpath: prefix. You may also write css: explicitly. When the response is JSON content, selector is a JSONPath expression instead (the jsonpath: prefix is accepted but optional there):

selector syntax
"#numResultados"                              // CSS (no prefix)
"css:.results a.title"                        // CSS (explicit)
"xpath://a[starts-with(@href,'item.php?')]"   // XPath
"$.store.book[*].title"                       // JSONPath  (only for JSON content)
"jsonpath:$..price"                           // JSONPath  (explicit)

Actions

Each entry in actions is performed in order before the extractors run. An action object has a type and the fields that type needs:

type      Other fields      Effect
fill      selector, value   Sets the value of the matched input/textarea (or the value attribute otherwise).
select    selector, value   Selects the option with that value in the matched <select>.
click     selector          Clicks the matched element, then waits briefly for background JavaScript.
wait      ms                Waits ms milliseconds (default 1000) for background JavaScript to settle.
navigate  url               Loads a different URL.
actions example
"actions": [
  { "type": "fill",   "selector": "#q",        "value": "widgets" },
  { "type": "select", "selector": "#category", "value": "hardware" },
  { "type": "click",  "selector": "css:button[type=submit]" },
  { "type": "wait",   "ms": 2000 }
]

Extractors

Each entry in extractors produces one top-level field in the result, named by its name. The type decides what is produced:

text · string
The text content of the first element matching selector.
attr · string
The value of the attr attribute on the first matching element.
list · array of strings
The text (or, if attr is given, the attribute) of every matching element.
items · array of objects
One object per matching element; each object's keys come from the extractor's fields, evaluated relative to that element.

A fields entry (used by items) has name, selector, optional attr, and type (text or attr).

extractors example
"extractors": [
  { "name": "total",   "selector": "#numResults", "type": "text" },
  { "name": "results", "selector": "css:.result", "type": "items",
    "fields": [
      { "name": "title", "selector": "a.title", "type": "text" },
      { "name": "href",  "selector": "a.title", "type": "attr", "attr": "href" },
      { "name": "price", "selector": ".price",  "type": "text" }
    ]
  }
]

Every object inside a list/items array is automatically stamped with result_global_id (a counter across the whole job), result_relative_id (a counter within its page), and result_page (the 0-based page index it came from). (Stamping applies to HTML pages only — JSON results are returned verbatim.)
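
The stamping rule can be pictured with a short sketch. stamp is a hypothetical function; starting both counters at 1 is inferred from the sample result in this reference rather than stated as a rule.

```python
def stamp(pages):
    """Add the three bookkeeping fields to every object, page by page."""
    stamped, global_id = [], 0
    for page_index, items in enumerate(pages):
        for relative_id, item in enumerate(items, start=1):
            global_id += 1
            stamped.append({
                "result_global_id": global_id,      # counter across the whole job
                "result_relative_id": relative_id,  # counter within this page
                "result_page": page_index,          # 0-based page index
                **item,
            })
    return stamped


# Two pages of made-up results: two items on page 0, one on page 1.
for row in stamp([[{"title": "Widget A"}, {"title": "Widget B"}], [{"title": "Widget C"}]]):
    print(row)
```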

For JSON content the same extractor shapes apply, but selector is a JSONPath expression and the types are value (the default), list and items — see below.

Pagination

If pagination is present, the job visits multiple pages and runs per_page_extractors on each; the per-page results appear under a pages array in the result. There are two strategies:

strategy · string
sequential (default): repeatedly click next_selector. link_tour: visit every distinct pagination link found by link_selector exactly once (handles AJAX paginators whose whole bar re-renders each page).
next_selector · string
The "next page" element to click (sequential strategy).
link_selector · string
Matches all pagination links (link_tour strategy).
max_pages · integer
Hard cap on pages visited. Default 10.
wait_ms · integer
Milliseconds to wait after each page transition for background JavaScript. Default 1000.
stop_when_missing · boolean
If true (default), stop quietly when the next-page element is gone; if false, the job fails.
per_page_extractors · array
Extractors (same shape as above) run on every visited page.
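
How max_pages and stop_when_missing interact in the sequential strategy might be pictured as follows. This is a toy model under assumptions: pages are plain values, click_next stands in for clicking next_selector, and counting the first page toward max_pages is a guess.

```python
def walk_sequential(first_page, click_next, max_pages=10, stop_when_missing=True):
    """Collect pages by following the next element until a stop condition."""
    pages = [first_page]
    while len(pages) < max_pages:
        next_page = click_next(pages[-1])  # None when next_selector matches nothing
        if next_page is None:
            if stop_when_missing:
                break  # quiet stop: the paginator simply ran out
            raise RuntimeError("next-page element not found")
        pages.append(next_page)
    return pages


# A fake three-page site: pages are numbered 1..3 and page 3 has no next link.
def fake_next(page):
    return page + 1 if page < 3 else None


print(walk_sequential(1, fake_next))
print(walk_sequential(1, fake_next, max_pages=2))
```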

JSON content

When the page you load returns JSON — a REST API endpoint, for example — the extract tool parses the response body and runs your extractors with JSONPath instead of CSS/XPath. This happens automatically when the response isn't HTML (application/json, text/plain, anything that isn't text/html); to force it (e.g. an API that mislabels its Content-Type as text/html), set "content": "json" in the job model.

In JSON mode each extractor's selector is a JSONPath expression (the jsonpath: prefix is accepted but optional; an expression that doesn't start with $ gets one prepended, so store.book[0].title works too). actions and pagination don't apply and are ignored. The type decides what each extractor produces:

value · any
The matched value: a JSON object, array, string, number or boolean (or null if nothing matched). A path that selects more than one node (uses .., [*], a filter or a slice) yields an array of all matches. This is the default when type is omitted.
list · array
Always an array: every match if the path is multi-valued, or the single matched value wrapped in a one-element array if it's not; an empty array when nothing matched.
items · array of objects
The path selects a set of nodes; each becomes one object whose keys come from the extractor's fields, evaluated relative to that node. A field's selector is a JSONPath where $ is the node (an empty selector means the node itself).

The attr type (and a field's attr) is for HTML only; using it on JSON content fails the job.

JSON job model + result
// for a URL returning:  { "page": 1, "total": 128,
//                         "results": [ { "id": 1, "name": "Widget A", "price": 9.99 },
//                                      { "id": 2, "name": "Widget B", "price": 12 } ] }
{
  "url": "https://api.example.com/products?q=widget",
  "content": "json",                                  // optional — auto-detected for application/json
  "extractors": [
    { "name": "total",  "selector": "$.total" },
    { "name": "names",  "selector": "$.results[*].name",          "type": "list" },
    { "name": "cheap",  "selector": "$.results[?(@.price < 10)]", "type": "items",
      "fields": [
        { "name": "id",    "selector": "$.id" },
        { "name": "name",  "selector": "$.name" },
        { "name": "price", "selector": "$.price" }
      ]
    }
  ]
}

// → result:
{
  "total": 128,
  "names": [ "Widget A", "Widget B" ],
  "cheap": [ { "id": 1, "name": "Widget A", "price": 9.99 } ]
}
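
For readers less familiar with JSONPath, the three extractors in that job correspond to the following plain-Python expressions over the same sample response. This is an equivalent written by hand for comparison, not how the extract tool evaluates selectors.

```python
response = {
    "page": 1, "total": 128,
    "results": [
        {"id": 1, "name": "Widget A", "price": 9.99},
        {"id": 2, "name": "Widget B", "price": 12},
    ],
}

result = {
    # "$.total" with the default type (value): the matched value itself.
    "total": response["total"],
    # "$.results[*].name" with type list: always an array of matches.
    "names": [row["name"] for row in response["results"]],
    # "$.results[?(@.price < 10)]" with type items: one object per matching
    # node, built from the fields evaluated relative to that node.
    "cheap": [
        {"id": row["id"], "name": row["name"], "price": row["price"]}
        for row in response["results"] if row["price"] < 10
    ],
}
print(result)
```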

Submit a job (POST)

/api/v1/extract

Send the job model as a JSON body. The job is queued and you get a taskId immediately (status 202). Poll GET /api/v1/extract/{taskId} for progress.

X-API-Key · header · required
Your API key.
POST /api/v1/extract
curl -X POST https://urlcap.com/api/v1/extract \
  -H "X-API-Key: $URLCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "search_id": "demo-1",
        "url": "https://example.com",
        "extractors": [ { "name": "heading", "selector": "css:h1", "type": "text" } ]
      }'
202 Accepted
{
  "version": "1",
  "taskId": "2b9a0c3e-7d11-4f44-9a8c-2c1d4e5f6a7b",
  "status": "pending",
  "statusUrl": "/api/v1/extract/2b9a0c3e-7d11-4f44-9a8c-2c1d4e5f6a7b"
}

If too many jobs are queued you get 503 (service_busy) — retry shortly. 400 (invalid_request) means the model couldn't be parsed or is missing url.

Task status (GET)

/api/v1/extract/{taskId}

Returns the task's current state. status is one of pending, running, succeeded, failed. When succeeded, a result object is included; when failed, an error object. httpRequestCount is how many HTTP requests the engine has performed for this job so far.

You only see your own tasks; an unknown id (or one belonging to another key) returns 404. GET /api/v1/extract (no id) lists your recent tasks.

GET /api/v1/extract/{taskId}
curl https://urlcap.com/api/v1/extract/2b9a0c3e-7d11-4f44-9a8c-2c1d4e5f6a7b \
  -H "X-API-Key: $URLCAP_KEY"
200 OK — succeeded
{
  "version": "1",
  "taskId": "2b9a0c3e-7d11-4f44-9a8c-2c1d4e5f6a7b",
  "status": "succeeded",
  "url": "https://example.com",
  "httpRequestCount": 1,
  "createdAt": "2026-05-11T13:00:00.000Z",
  "startedAt": "2026-05-11T13:00:00.100Z",
  "finishedAt": "2026-05-11T13:00:01.900Z",
  "result": {
    "search_id": "demo-1",
    "heading": "Example Domain"
  }
}

For a job with an items extractor, the result looks like:

result with items
"result": {
  "search_id": "demo-1",
  "total": "128 results",
  "results": [
    { "result_global_id": 1, "result_relative_id": 1, "result_page": 0, "title": "Widget A", "href": "/item?id=1", "price": "9.99" },
    { "result_global_id": 2, "result_relative_id": 2, "result_page": 0, "title": "Widget B", "href": "/item?id=2", "price": "12.00" }
  ]
}
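
A submit-then-poll client needs a loop along these lines. The sketch is transport-agnostic: fetch_status is a placeholder for a function that performs GET /api/v1/extract/{taskId} and returns the parsed JSON, and the interval and timeout defaults are arbitrary choices.

```python
import time


def wait_for_task(fetch_status, interval=1.0, timeout=60.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll until the task reaches succeeded or failed, or the timeout passes."""
    deadline = clock() + timeout
    while True:
        task = fetch_status()
        if task["status"] in ("succeeded", "failed"):
            return task
        if clock() >= deadline:
            raise TimeoutError(f"task still {task['status']} after {timeout}s")
        sleep(interval)


# Simulated status responses stand in for real polling; sleeping is disabled.
states = iter([
    {"status": "pending"},
    {"status": "running"},
    {"status": "succeeded", "result": {"search_id": "demo-1", "heading": "Example Domain"}},
])
task = wait_for_task(lambda: next(states), sleep=lambda _: None)
print(task["status"], task["result"])
```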

IP & CIDR

Work with IPv4 and IPv6 addresses and CIDR ranges: check whether an address falls inside a range, keep a list of named ranges (allow/block lists, ASN or geo blocks, your own networks, …) and ask which of them contain a given address.

Stored ranges live in an optimised table: every address is kept as a 16-byte value (IPv4 is stored IPv4-mapped, ::ffff:a.b.c.d, so v4 and v6 share one comparable key space), each range as its first and last address plus a prefix length, and a B-tree index on those bounds turns "which ranges contain this address?" into an index range scan.
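
The key layout described above can be reproduced with Python's ipaddress module. This is an illustrative sketch of the idea (16-byte keys, first and last bounds), not the actual table code; to_key and range_bounds are hypothetical names.

```python
import ipaddress


def to_key(address):
    """Return the 16-byte key for an address; IPv4 becomes ::ffff:a.b.c.d."""
    ip = ipaddress.ip_address(address)
    if ip.version == 4:
        ip = ipaddress.IPv6Address("::ffff:" + str(ip))
    return ip.packed


def range_bounds(cidr):
    """Return (first_key, last_key, prefix_length) for a CIDR."""
    net = ipaddress.ip_network(cidr, strict=False)
    return to_key(str(net[0])), to_key(str(net[-1])), net.prefixlen


first, last, prefix = range_bounds("10.0.0.0/8")
# Fixed-width byte strings compare in the same order as the numbers they encode,
# so a containment check is just two comparisons against the bounds.
print(first <= to_key("10.20.30.40") <= last)
print(first <= to_key("2001:db8::1") <= last)
```

A database index over the two bound columns can answer the same comparison for many ranges at once.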

Is an address in a CIDR? (GET, POST)

/api/v1/ip/contains

A pure calculation: does ip fall within cidr? (Different address families ⇒ false.) A single host can be written with or without the /32 (IPv4) or /128 (IPv6) suffix.

ip · string · required
A single IPv4 or IPv6 address.
cidr · string
A CIDR (e.g. 10.0.0.0/8) or a single address. Required unless you pass cidrs.
cidrs · array of strings
POST only. Check the address against several ranges at once; the response then has a results array.
GET /api/v1/ip/contains
curl -G https://urlcap.com/api/v1/ip/contains \
  -H "X-API-Key: $URLCAP_KEY" \
  --data-urlencode "ip=10.20.30.40" \
  --data-urlencode "cidr=10.0.0.0/8"
200 OK
{
  "version": "1",
  "requestId": "…",
  "data": {
    "ip": "10.20.30.40",
    "cidr": "10.0.0.0/8",
    "contains": true,
    "range": {
      "cidr": "10.0.0.0/8",
      "family": 4,
      "prefixLength": 8,
      "networkAddress": "10.0.0.0",
      "lastAddress": "10.255.255.255"
    }
  }
}

Batch form: POST /api/v1/ip/contains with { "ip": "10.20.30.40", "cidrs": ["10.0.0.0/8", "192.168.0.0/16", "2001:db8::/32"] }. In that case data.results is [ { "cidr": "10.0.0.0/8", "contains": true }, … ].
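
The same membership check can be run offline with the standard library, which is handy for validating inputs before calling the API. contains and contains_many are hypothetical helpers that mirror the single and batch forms.

```python
import ipaddress


def contains(ip, cidr):
    """True when ip falls inside cidr; different address families give False."""
    address = ipaddress.ip_address(ip)
    # strict=False accepts host bits; a bare address becomes a /32 or /128 network.
    network = ipaddress.ip_network(cidr, strict=False)
    return address.version == network.version and address in network


def contains_many(ip, cidrs):
    """Batch form: one result per range, in the order given."""
    return [{"cidr": cidr, "contains": contains(ip, cidr)} for cidr in cidrs]


print(contains("10.20.30.40", "10.0.0.0/8"))
print(contains_many("10.20.30.40", ["10.0.0.0/8", "192.168.0.0/16", "2001:db8::/32"]))
```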

Which stored ranges contain an address? (GET, POST)

/api/v1/ip/lookup

Looks up every range in your stored set (see below) that contains ip, most-specific first.

ip · string · required
A single IPv4 or IPv6 address (query parameter, or JSON { "ip": "…" }).
GET /api/v1/ip/lookup
curl -G https://urlcap.com/api/v1/ip/lookup \
  -H "X-API-Key: $URLCAP_KEY" \
  --data-urlencode "ip=10.20.30.40"
200 OK
{
  "version": "1",
  "requestId": "…",
  "data": {
    "ip": "10.20.30.40",
    "family": 4,
    "matchCount": 2,
    "matches": [
      { "id": 7, "cidr": "10.20.0.0/16", "family": 4, "prefixLength": 16, "label": "office-lan" },
      { "id": 3, "cidr": "10.0.0.0/8",   "family": 4, "prefixLength": 8,  "label": "rfc1918" }
    ]
  }
}
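
"Most-specific first" means the longest prefix comes first. A local model of the lookup, assuming an in-memory list instead of the stored table and including one invented extra row, could be:

```python
import ipaddress

stored = [
    {"id": 3, "cidr": "10.0.0.0/8", "label": "rfc1918"},
    {"id": 7, "cidr": "10.20.0.0/16", "label": "office-lan"},
    {"id": 9, "cidr": "192.168.0.0/16", "label": "home"},  # hypothetical extra row
]


def lookup(ip, ranges):
    """Return every range containing ip, longest prefix (narrowest range) first."""
    address = ipaddress.ip_address(ip)
    matches = []
    for row in ranges:
        network = ipaddress.ip_network(row["cidr"])
        if network.version == address.version and address in network:
            matches.append({**row, "family": network.version,
                            "prefixLength": network.prefixlen})
    return sorted(matches, key=lambda m: m["prefixLength"], reverse=True)


for match in lookup("10.20.30.40", stored):
    print(match)
```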

Manage stored ranges (GET, POST, DELETE)

/api/v1/ip/ranges  ·  /api/v1/ip/ranges/{id}

  • GET /api/v1/ip/ranges — list your stored ranges (newest first): id, cidr, family, prefixLength, label, createdAt.
  • POST /api/v1/ip/ranges with { "cidr": "10.20.0.0/16", "label": "office-lan" } — adds the range (the cidr is canonicalised on the way in). If that CIDR is already stored its label is updated. Returns 201 with the row.
  • DELETE /api/v1/ip/ranges/{id} — removes a stored range. 404 if there's no such id.
POST /api/v1/ip/ranges
curl -X POST https://urlcap.com/api/v1/ip/ranges \
  -H "X-API-Key: $URLCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "cidr": "2001:db8::/32", "label": "documentation-prefix" }'
201 Created
{
  "version": "1",
  "requestId": "…",
  "data": {
    "id": 12,
    "cidr": "2001:db8::/32",
    "family": 6,
    "prefixLength": 32,
    "networkAddress": "2001:db8::",
    "lastAddress": "2001:db8:ffff:ffff:ffff:ffff:ffff:ffff",
    "label": "documentation-prefix"
  }
}

Full IP intelligence (GET)

/api/v1/ip/intelligence

One request, every signal urlcap has on an IP. Composes the static lookups (CIDR membership, GeoIP, reverse DNS, bot registry, trust list) with the behavioural data we've aggregated from ingested traffic: which JA4 fingerprints the IP has used, which user agents, what sites it hits, the status-code mix it gets back, vulnerability-probe hits, and whether other customers have voted on its JA4s via edge_action.

Use it as the investigation pane for a single IP — e.g., "tell me everything about the IP that just showed up in my logs." For inline blocklist hot paths, stay on /api/v1/ip/contains and /api/v1/ja4/intelligence — they're optimised for membership checks; this one composes ~7 sub-queries per call.

ip · string · required
An IPv4 or IPv6 address. CIDRs are not accepted here.
window_days · int 1..7 · default 7
Sliding window for the behavioural snapshot. Capped at 7 because that's the request_events TTL. Static sections (geo, rDNS, bot registry, trust list, bot observations) ignore this.
GET /api/v1/ip/intelligence?ip=216.73.217.19
curl -s "https://urlcap.com/api/v1/ip/intelligence?ip=216.73.217.19" \
  -H "X-API-Key: $URLCAP_KEY"
200 OK
{
  "version": "1",
  "data": {
    "ip": "216.73.217.19",
    "family": 4,
    "windowDays": 7,

    "geo": {
      "countryCode": "US", "countryName": "United States",
      "city": "Columbus", "asn": 16509,
      "asnOrganization": "Amazon.com, Inc."
    },
    "reverseDns": { "names": [], "ttlSeconds": 300, "error": "no PTR record" },
    "isBot": {
      "matched": true,
      "matches": [
        { "botGroup": "Claude bot", "botGroupId": 10, "cidr": "216.73.216.0/22" }
      ]
    },
    "trustList": { "trusted": false, "byUsers": 0 },

    "botObservations": [
      { "botGroup": "Claude bot", "source": "cidr_match",
        "observations": 439677, "firstSeen": "…", "lastSeen": "…" }
    ],

    "behaviour": {
      "totalRequests": 29179,
      "distinctJa4s": 1, "distinctUserAgents": 1, "distinctSites": 1,
      "firstSeen": "…", "lastSeen": "…",
      "statusMix": { "count2xx": 28912, "count3xx": 267, "count4xx": 0,
                     "count5xx": 0, "count403": 0, "count444": 0 },
      "assetRatio": 0.0,
      "vulnProbeHits": 0, "vulnProbesUnique": 0,
      "blockRatio": 0.0
    },

    "ja4s": [
      { "ja4": "t13d1011h2_61a7ad8aa9b6_867a6ff6dde3",
        "ja4Hash": "7561205223071800741",
        "requestCount": 29179,
        "classification": "known_bot",
        "botGroup": "OAI-SearchBot" }
    ],

    "userAgents": [
      { "userAgent": "Mozilla/5.0 AppleWebKit/537.36 (…; compatible; Claude-SearchBot/1.0; +searchbot@anthropic.com)",
        "requestCount": 29179 }
    ],

    "crossCustomerAction": null
  }
}

Reading the response

  • isBot.matches — every published-CIDR registry hit. Operator-grade attribution: Googlebot, Bingbot, GPTBot, ClaudeBot, etc. CIDRs refreshed daily from operators' own JSON.
  • botObservations — every {bot_group, IP} attribution we've made internally. source tells you how: cidr_match (registry), ua_match (UA self-identification), vuln_match (≥3 vuln-probe paths), manual (admin override).
  • trustList.byUsers — how many distinct urlcap accounts have whitelisted this IP. "12 customers trust this IP" is a strong "do not block" vote.
  • behaviour.statusMix — what response codes this IP gets back across all sites. High 403/444 ratio means edges are already blocking it.
  • behaviour.assetRatio — fraction of requests that fetched images/CSS/JS/fonts. Browsers ≈ 0.5-0.9; bots ≈ 0.
  • ja4s[].classification — per fingerprint: known_bot (attributed in bot_observed_ja4s), candidate (pending review, includes score), or unknown.
  • crossCustomerAction — when other customers have submitted edge_action outcomes for any JA4 this IP has used, the headcounts surface here. Highest-confidence label we publish.

Heads up: the per-IP attribution from botObservations and the per-JA4 attribution from ja4s[].botGroup can disagree. In the example above, the IP belongs to Anthropic (Claude bot CIDR), while the JA4 fingerprint is currently attributed to OAI-SearchBot because Anthropic and OpenAI's HTTP clients ship the same TLS library and produce identical fingerprints. That's not a bug — it's the kind of cross-axis fact this endpoint exists to surface. Decide policy based on which axis matters more for your case.
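A minimal sketch of how a client might detect that kind of cross-axis disagreement before applying policy. It works on the data object of the response shown above; the function name attribution_conflicts is hypothetical and not part of the API.

```python
from typing import Any


def attribution_conflicts(data: dict[str, Any]) -> list[tuple[str, str, list[str]]]:
    """Return (ja4, ja4BotGroup, ipBotGroups) for every fingerprint whose
    bot group is not among the groups attributed to the IP itself."""
    # Per-IP axis: registry hits plus internal observations.
    ip_groups = {m["botGroup"] for m in data.get("isBot", {}).get("matches", [])}
    ip_groups |= {o["botGroup"] for o in data.get("botObservations", [])}

    conflicts = []
    for entry in data.get("ja4s", []):
        group = entry.get("botGroup")
        # Only flag fingerprints that carry an attribution differing from the IP's.
        if group and ip_groups and group not in ip_groups:
            conflicts.append((entry["ja4"], group, sorted(ip_groups)))
    return conflicts
```

An empty list means both axes agree (or one of them has no attribution at all); a non-empty list is the cue to decide which axis your policy should follow.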

The TOTP code object

A successful call to the TOTP endpoint returns an envelope containing a data object that represents a freshly computed time-based one-time password and the parameters used to compute it.

code · string
The current one-time password, as a zero-padded decimal string of length digits.
digits · integer
Number of digits in code (commonly 6, taken from the URI or its default).
period · integer
Length of the time step in seconds (commonly 30).
algorithm · string
HMAC hash used: SHA1, SHA256, or SHA512.
expiresIn · integer
Seconds remaining until code rotates. Use it to render a countdown; refetch when it reaches zero.
data
{
  "code": "492039",
  "digits": 6,
  "period": 30,
  "algorithm": "SHA1",
  "expiresIn": 14
}

get post  Generate a TOTP code

/api/v1/totp


Computes the current TOTP code for an otpauth:// URI — the same string you'd scan into an authenticator app. The URI carries the shared secret and the algorithm/digits/period; nothing is stored server-side. Accepts GET (query string) or POST (application/x-www-form-urlencoded).

How secrets are handled

  • The otpauth:// URI (and its embedded shared secret) is processed in memory for the duration of one request and is never persisted.
  • The uri parameter is redacted from request logs and excluded from analytics.
  • Transport is TLS 1.2+ only; cleartext requests are refused.
  • The endpoint is intended for automated testing and internal automation against systems you own — not for storing or generating codes for your personal 2FA accounts.
  • Full posture: /security.

Parameters

uri · string · required
An otpauth://totp/... URI containing at least a secret parameter (Base32). May also include algorithm, digits, and period. Always percent-encode this value.

Headers

X-API-Key · string · required
Your API key.

Returns

A 200 response whose data field is a TOTP code object, plus a requestId. On error, the standard error envelope with status 400 or 401.

curl -G https://urlcap.com/api/v1/totp \
  -H "X-API-Key: $URLCAP_KEY" \
  --data-urlencode "uri=otpauth://totp/Acme:alice@acme.io?secret=JBSWY3DPEHPK3PXP&period=30&digits=6"
Response · 200 OK
{
  "version": "1",
  "requestId": "9f1c0b7a-3e2d-4a51-9b88-2f6c1e7d4a02",
  "data": {
    "code": "492039",
    "digits": 6,
    "period": 30,
    "algorithm": "SHA1",
    "expiresIn": 14
  }
}
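To cross-check a response in a test suite, the same code can be computed locally. The sketch below follows the standard TOTP algorithm (RFC 6238) and returns a dict shaped like the TOTP code object. It is an independent illustration, not urlcap's implementation: the function name totp_from_uri is hypothetical, and the defaults (SHA1, 6 digits, 30 seconds) and the expiresIn arithmetic are assumptions based on the fields documented above.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional
from urllib.parse import parse_qs, urlparse

_HASHES = {"SHA1": hashlib.sha1, "SHA256": hashlib.sha256, "SHA512": hashlib.sha512}


def totp_from_uri(uri: str, now: Optional[float] = None) -> dict:
    """Compute the current TOTP code for an otpauth://totp/... URI."""
    parsed = urlparse(uri)
    if parsed.scheme != "otpauth" or parsed.netloc != "totp":
        raise ValueError("expected an otpauth://totp/... URI")
    params = {k: v[0] for k, v in parse_qs(parsed.query).items()}

    # Base32 secrets are often written without padding; restore it before decoding.
    secret = params["secret"].upper()
    key = base64.b32decode(secret + "=" * (-len(secret) % 8))

    digits = int(params.get("digits", 6))
    period = int(params.get("period", 30))
    algorithm = params.get("algorithm", "SHA1").upper()

    seconds = int(time.time() if now is None else now)
    counter = seconds // period  # number of whole time steps since the epoch

    mac = hmac.new(key, struct.pack(">Q", counter), _HASHES[algorithm]).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low nibble of the last byte
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    code = str(value % 10 ** digits).zfill(digits)

    return {
        "code": code,
        "digits": digits,
        "period": period,
        "algorithm": algorithm,
        "expiresIn": period - seconds % period,  # assumed definition of expiresIn
    }
```

Passing a fixed now makes the function deterministic, which is convenient for comparing against a recorded API response.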

Scheduled tasks

Run a task at a future time — once, or repeatedly on a cron schedule — without keeping a connection open. A scheduled task is either a capture (kind: "capture") or an extract job (kind: "extract"). Every execution stores its full result, which you can fetch later.

The base path is /api/v1/schedules. Everything here uses the standard JSON envelope and your API key.

The schedule object

schedule
{
  "id": "0f4c…-uuid",
  "kind": "capture",                   // "capture" | "extract"
  "name": "prod health check",         // optional label
  "cron": "*/15 * * * *",              // recurring; null for a one-shot
  "runAt": null,                       // one-shot ISO-8601 time; null for recurring
  "timezone": "Europe/Madrid",         // the cron is evaluated in this zone (default UTC)
  "status": "active",                  // active | paused | done | disabled
  "nextRunAt": "2026-06-01T07:15:00Z",
  "lastRunAt": "2026-06-01T07:00:00Z",
  "runs": 12,
  "maxRuns": null,                     // stop after this many runs; null = unlimited
  "until": null,                       // stop after this ISO-8601 time; null = no end
  "createdAt": "2026-05-12T10:00:00Z",
  "capture": { "url": "https://example.com/health", "method": "GET" }   // present for kind "capture";
                                                                        // an "extract" key holds the job model for kind "extract"
}

Cron syntax. The classic 5-field crontab form — minute hour day-of-month month day-of-week — with ranges (1-5), lists (1,15), steps (*/15) and names (MON, JAN). An optional 6th leading field adds seconds. The macros @hourly @daily @weekly @monthly @yearly work too. (The Quartz-only L/W/# do not.) A job whose time was missed while the service was down runs once on the next poll, then resumes at its next future occurrence.
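As an illustration of how ranges, lists and steps combine within one field, here is a minimal sketch that expands a single numeric cron field into the values it selects. It is not the scheduler's parser: the function name expand_cron_field is hypothetical, and names (MON, JAN), macros and the optional seconds field are deliberately left out.

```python
def expand_cron_field(field: str, low: int, high: int) -> list[int]:
    """Expand one numeric cron field (e.g. "*/15", "1-5", "1,15") into
    the sorted list of values it matches within [low, high]."""
    values: set[int] = set()
    for part in field.split(","):          # lists: "1,15"
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if step < 1:
            raise ValueError(f"invalid step in {part!r}")
        if rng == "*":                      # wildcard covers the whole range
            start, end = low, high
        elif "-" in rng:                    # ranges: "1-5"
            first, last = rng.split("-", 1)
            start, end = int(first), int(last)
        else:                               # a single value
            start = end = int(rng)
        if not (low <= start <= end <= high):
            raise ValueError(f"{part!r} is outside {low}..{high}")
        values.update(range(start, end + 1, step))
    return sorted(values)
```

For the minute field of "*/15 * * * *", calling expand_cron_field("*/15", 0, 59) yields the four quarter-hour marks.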

Extract schedules. An extract task runs through the (asynchronous) extract engine; the scheduler waits for it to finish and stores the engine's result in the run row (no httpStatus — that's a capture-only field). It also shows up in your extract task list.

post  Create a schedule

/api/v1/schedules

Send either a cron expression (recurring) or a runAt timestamp (one-shot) — not both — plus exactly one of a capture object (same shape as the capture object) or an extract object (the extract job model). Which one you send determines the kind.

curl — schedule a capture
curl -X POST https://urlcap.com/api/v1/schedules \
  -H "Authorization: Bearer api_…" -H "Content-Type: application/json" \
  -d '{
    "name": "prod health check",
    "cron": "*/15 * * * *",
    "timezone": "Europe/Madrid",
    "maxRuns": 96,
    "capture": { "url": "https://example.com/health", "method": "GET" }
  }'

A one-shot capture instead:

json
{ "runAt": "2026-06-01T09:00:00Z", "capture": { "url": "https://example.com/report" } }

Or schedule an extract job — pass an extract object holding the job model:

json
{
  "name": "daily price scrape",
  "cron": "0 6 * * *",
  "extract": {
    "url": "https://example.com/products",
    "extractors": [ { "name": "prices", "selector": ".price", "type": "list" } ]
  }
}

Body fields

  • capture / extract (one is required). capture: same fields as the capture endpoint's JSON body (at minimum a url). extract: the extract job model (at minimum a url).
  • cron — a cron expression (see above). Mutually exclusive with runAt.
  • runAt — an ISO-8601 timestamp (e.g. 2026-06-01T09:00:00Z) for a single run. Mutually exclusive with cron.
  • timezone — IANA zone the cron is evaluated in. Default UTC.
  • name — optional label.
  • maxRuns — optional; stop after this many executions.
  • until — optional ISO-8601 timestamp; stop after this time.

Returns 201 with data.schedule set to a schedule object. An invalid cron expression or timezone, or a body that contains neither or both of capture and extract, returns 400.
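The exclusivity rules are easy to check client-side before sending the request. The following is a hedged sketch of such a pre-flight builder; the helper name build_schedule_body is hypothetical, and it only checks what the body-field list states (exactly one of cron / runAt, exactly one of capture / extract, a url in the task). Cron and timezone validity are still left to the server.

```python
from typing import Optional

_OPTIONAL_FIELDS = ("name", "timezone", "maxRuns", "until")


def build_schedule_body(*, cron: Optional[str] = None, run_at: Optional[str] = None,
                        capture: Optional[dict] = None, extract: Optional[dict] = None,
                        **optional) -> dict:
    """Build a JSON body for POST /api/v1/schedules, raising ValueError
    for combinations the documentation says are rejected with 400."""
    if (cron is None) == (run_at is None):
        raise ValueError("send exactly one of cron or runAt")
    if (capture is None) == (extract is None):
        raise ValueError("send exactly one of capture or extract")
    unknown = set(optional) - set(_OPTIONAL_FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")

    kind = "capture" if capture is not None else "extract"
    task = capture if capture is not None else extract
    if "url" not in task:
        raise ValueError(f"{kind} needs at least a url")

    body = {"cron": cron} if cron is not None else {"runAt": run_at}
    body[kind] = task
    # Copy only the optional fields the caller actually set.
    body.update({k: v for k, v in optional.items() if v is not None})
    return body
```

Serialise the result with json.dumps and send it with Content-Type: application/json, as in the curl example.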

get post del  List, inspect & manage

  • GET /api/v1/schedules — your schedules (data.schedules: an array of schedule objects).
  • GET /api/v1/schedules/{id} — one schedule (data.schedule).
  • POST /api/v1/schedules/{id} with { "action": "pause" | "resume" | "run-now" } — pause stops future runs; resume re-arms it; run-now makes it fire on the next poll (within ~20s).
  • DELETE /api/v1/schedules/{id} — cancel the schedule. Its run history is kept.

get  Run history & results

  • GET /api/v1/schedules/{id}/runs — the executions, newest first (?limit=N). Each: runNo, scheduledFor, startedAt, finishedAt, status (running/ok/error), httpStatus, error.
  • GET /api/v1/schedules/{id}/runs/{runNo} — one execution including its full result. For a capture task that's the same JSON the capture endpoint returns ({ version, requestId, data: { request, response } }); for an extract task it's the engine's extract result (the same shape as GET /extract/{taskId}'s result).
curl
curl https://urlcap.com/api/v1/schedules/0f4c…/runs/3 \
  -H "Authorization: Bearer api_…"

Datasets

Named, de-duplicated collections of items of a single type — either IP / CIDR ranges (canonical CIDRs, as in IP & CIDR; a single host becomes a /32 or /128) or URLs (absolute http / https URLs). A dataset is yours; the API only ever shows you your own.

With history: true, every replace items operation first copies the items it drops into the dataset's history with their removal date — useful for tracking a set as it evolves.

Plan caps

  • free — up to 1 dataset, up to 1,000 items each, history not allowed.
  • developer — up to 10 datasets, up to 100,000 items, history allowed.
  • startup — up to 50 datasets, up to 1,000,000 items, history allowed.
  • business — unlimited datasets & items, history allowed.

The dataset object

id · uuid
Stable identifier.
name · string
Unique among your datasets. Auto-assigned dataset-… if omitted on create.
type · string
ip (IP/CIDR) or url.
history · boolean
When true, a replace keeps the dropped items (see History).
size · integer
Current item count (present on single-dataset responses).
createdAt · timestamp
When the dataset was created.

get post  List & create

/api/v1/datasets  ·  /api/v1/datasets/{id}

  • GET /api/v1/datasets returns data.datasets, an array of dataset objects (newest first).
  • POST /api/v1/datasets with { "type": "ip" | "url", "name"?: "…", "history"?: true, "items"?: [ … ] } — creates a dataset and (optionally) seeds it. 201 with the created object.
  • GET /api/v1/datasets/{id} — one dataset (with its current size).
  • DELETE /api/v1/datasets/{id} — deletes the dataset and its items (and history).
POST /api/v1/datasets
curl -X POST https://urlcap.com/api/v1/datasets \
  -H "Authorization: Bearer $URLCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "name": "office-ranges", "type": "ip", "history": true,
        "items": ["10.0.0.0/8", "192.168.1.0/24", "2001:db8::/32"] }'
201 Created
{
  "version": "1",
  "requestId": "…",
  "data": {
    "dataset": {
      "id": "9d811aa8-dbe6-4c48-a811-29f49dd9f49c",
      "name": "office-ranges",
      "type": "ip",
      "itemType": 1,
      "history": true,
      "size": 3,
      "createdAt": "2026-05-13T07:00:00Z",
      "updatedAt": "2026-05-13T07:00:00Z"
    }
  }
}

Bad input — unknown type, duplicate name, plan limit reached, invalid item value, or a name starting with the reserved internal: prefix — returns 400 invalid_request.

get post put del  Items: add, replace, remove

/api/v1/datasets/{id}/items

  • GET — paged list (?limit=N, default 1000, max 5000; ?after=ID for the next page). Returns items and nextAfter.
  • POST with { "items": [ … ] } — adds items, de-duplicated against what's already there. Returns { added, size }.
  • PUT with { "items": [ … ] } — replaces the whole set. On a history: true dataset, the dropped items are first written to history. Returns { size }.
  • DELETE with { "items": [ … ] } — removes the listed items. Returns { removed, size }.
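A minimal sketch of walking the paged GET with limit and after. The HTTP call is injected as fetch_page so the loop stays independent of any client library; that callable, the function name iter_items, and the assumption that nextAfter is absent or null on the last page are all illustrative rather than documented behaviour.

```python
from typing import Any, Callable, Iterator


def iter_items(fetch_page: Callable[[dict], dict], limit: int = 1000) -> Iterator[Any]:
    """Yield every item of a dataset by following nextAfter.

    fetch_page receives the query parameters and must return the
    response's data object (a dict with "items" and "nextAfter")."""
    params: dict = {"limit": min(max(limit, 1), 5000)}  # documented maximum is 5000
    while True:
        data = fetch_page(params)
        yield from data.get("items", [])
        next_after = data.get("nextAfter")
        if next_after is None:   # assumed end-of-list signal
            break
        params = {**params, "after": next_after}
```

In practice fetch_page would issue GET /api/v1/datasets/{id}/items with those parameters and return the parsed data field.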

Each item value is canonicalised on the way in: an IP/CIDR is reduced to its canonical form (host bits cleared, single hosts become /32 or /128); a URL must be absolute and http / https. The dataset cannot contain two items with the same canonical value.
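The canonicalisation described here can be approximated locally with Python's standard library, which is handy for predicting which submitted values will collapse into one item. This is a sketch of the stated rules only; the helper names are hypothetical and the server may apply further normalisation that this page does not describe.

```python
import ipaddress
from urllib.parse import urlsplit


def canonical_ip_item(value: str) -> str:
    """Canonical CIDR: host bits cleared, single hosts become /32 or /128."""
    return str(ipaddress.ip_network(value.strip(), strict=False))


def is_acceptable_url(value: str) -> bool:
    """True for absolute http / https URLs (a scheme and a host are present)."""
    parts = urlsplit(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def dedupe_ip_items(values: list[str]) -> list[str]:
    """Canonicalise and drop duplicates, keeping first-seen order."""
    return list(dict.fromkeys(canonical_ip_item(v) for v in values))
```

For example, "10.1.2.3/8" and "10.0.0.0/8" canonicalise to the same value, so submitting both would store a single item.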

POST /api/v1/datasets/{id}/items
curl -X POST https://urlcap.com/api/v1/datasets/9d81…/items \
  -H "Authorization: Bearer $URLCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "items": ["203.0.113.0/24", "198.51.100.7"] }'
200 OK
{
  "version": "1",
  "data": { "datasetId": "9d81…", "added": 2, "size": 5 }
}

get  History

/api/v1/datasets/{id}/history

For datasets created with history: true, the items previously dropped by a replace are kept, each stamped with the date it was removed. Newest removal first; ?limit=N (default 1000, max 5000). Always empty for non-history datasets.

200 OK
{
  "version": "1",
  "data": {
    "datasetId": "9d81…",
    "count": 2,
    "history": [
      { "id": 41, "value": "10.0.0.0/8",  "addedAt": "2026-05-10T…", "removedAt": "2026-05-13T07:36:18.07Z" },
      { "id": 40, "value": "192.168.1.0/24", "addedAt": "2026-05-10T…", "removedAt": "2026-05-13T07:36:18.07Z" }
    ]
  }
}

get post  Membership check

/api/v1/datasets/{id}/contains

Normalises value to the dataset's type and reports whether that exact item is in the dataset (contains). For IP datasets, if value is a single address (not a CIDR), the response also includes covered — whether that address falls inside some CIDR stored in the dataset (a fast B-tree range lookup over the dataset's 16-byte bounds).

GET /api/v1/datasets/{id}/contains
curl -G https://urlcap.com/api/v1/datasets/9d81…/contains \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "value=10.20.30.40"
200 OK
{
  "version": "1",
  "data": {
    "datasetId": "9d81…",
    "type": "ip",
    "value": "10.20.30.40",
    "normalizedValue": "10.20.30.40/32",
    "contains": false,
    "covered": true
  }
}
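The difference between contains and covered can be reproduced with a few lines of standard-library Python. The sketch below uses a linear scan rather than the B-tree range lookup the service describes, so it illustrates the semantics only; the function name membership is hypothetical.

```python
import ipaddress


def membership(items: list[str], value: str) -> dict:
    """Mimic the contains / covered fields for an IP dataset."""
    stored = {ipaddress.ip_network(i, strict=False) for i in items}
    net = ipaddress.ip_network(value, strict=False)
    result = {"normalizedValue": str(net), "contains": net in stored}

    # covered is only reported for a single address, not for a CIDR.
    if "/" not in value:
        addr = net.network_address
        result["covered"] = any(
            addr.version == s.version and addr in s for s in stored
        )
    return result
```

With the dataset from the earlier examples, 10.20.30.40 is not stored as an exact item but falls inside 10.0.0.0/8, which matches the response above.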

get post  Is an IP a known bot?

/api/v1/is_bot

Tells you whether an IPv4 or IPv6 address belongs to a well-known web crawler (Googlebot, Bingbot, Yandex, DuckDuckBot & DuckAssistBot, Applebot, GPTBot & ChatGPT-User & OAI-SearchBot, ClaudeBot, PerplexityBot & Perplexity-User, AhrefsBot, Amazonbot, CCBot, …) and, if so, which one. On every match the response includes the bot's search engine, bot group, the matching CIDR, and the bot's categories (SEARCH_INDEXING, AI_TRAINING, AI_SEARCH_OR_ANSWERING, USER_INITIATED_FETCHING, SEO_ANALYTICS, WEB_DATASET_ARCHIVING, COMMERCIAL_PLATFORM, SOCIAL_PREVIEW, …) with per-link confidence.

Backed by an in-memory index: every CIDR each bot publishes is preloaded into a sorted array of 16-byte bounds with side-tables of bot / search-engine / category metadata. Lookups never hit the database (sub-millisecond per IP) and the index is rebuilt daily from the bots' published sources. The index also remembers CIDRs that have since been retired (dropped by a later refresh), so you can ask "was this IP a known bot on a specific date?" or "has this IP ever been a known bot?".
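To make the "sorted array of 16-byte bounds" idea concrete, here is a small sketch of that kind of index using binary search. It is an illustration under simplifying assumptions, not the service's code: IPv4 addresses are mapped into the IPv4-mapped IPv6 range to get 16-byte keys, ranges are assumed not to overlap, and the class name CidrIndex is hypothetical.

```python
import bisect
import ipaddress


def to_key(address: str) -> bytes:
    """16-byte sort key; IPv4 is placed in the ::ffff:0:0/96 mapped range."""
    ip = ipaddress.ip_address(address)
    if ip.version == 4:
        return b"\x00" * 10 + b"\xff\xff" + ip.packed
    return ip.packed


class CidrIndex:
    def __init__(self, entries: list[tuple[str, str]]):
        # Each row: (lower bound, upper bound, cidr, bot label), sorted by lower bound.
        rows = []
        for cidr, bot in entries:
            net = ipaddress.ip_network(cidr)
            rows.append((to_key(str(net.network_address)),
                         to_key(str(net.broadcast_address)), cidr, bot))
        rows.sort()
        self._rows = rows
        self._starts = [row[0] for row in rows]

    def lookup(self, address: str):
        """Return (cidr, bot) for the range containing address, else None."""
        key = to_key(address)
        i = bisect.bisect_right(self._starts, key) - 1  # last range starting <= key
        if i >= 0 and key <= self._rows[i][1]:
            return self._rows[i][2], self._rows[i][3]
        return None
```

Each lookup is a single binary search over the lower bounds followed by one upper-bound comparison, which is why this layout needs no database round-trip.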

ip · string
A single IPv4 or IPv6 address (query string or JSON body). Provide either ip or ips.
ips · array of strings
Batch of up to 200 addresses. As a query parameter: comma-separated (?ips=a,b,c). As a JSON body: { "ips": [ … ] }.
date · string
Optional. ISO-8601 point-in-time (YYYY-MM-DD is treated as that day's start in UTC; or YYYY-MM-DDTHH:MM:SSZ for an exact moment). Runs an as-of-date lookup — returns the CIDR-bot mappings that were active at this moment, automatically including retired records that had been live then.
historical · boolean
Optional. Default false. When true, the lookup also considers retired CIDR records — every CIDR that has ever been in any bot's published list, regardless of whether it's still current. Useful to answer "has this IP ever been a known bot?". Ignored if date is supplied.
reverseDns · boolean
Optional. Default false. When true, attaches a reverseDns object (names, ttlSeconds, cached) to each per-IP result with the PTR names for the address. Results are cached in-process for the upstream record's TTL (clamped 30s–1h; negative answers 5 minutes). For the FCrDNS forward-confirm check, see /reverse_dns.

Each match always carries addedAt (when the CIDR first appeared in the bot's list), removedAt (null while still current), and active (true iff the CIDR was live at the query's time — now by default, or at date when supplied). With historical=true you'll see active: false matches that report when the CIDR was retired.
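One plausible reading of the active flag, expressed as code. This is a sketch under assumptions: the boundary handling (a CIDR counts as live from addedAt up to, but not including, removedAt) is a guess, and the helper names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_ts(value: str) -> datetime:
    """Parse YYYY-MM-DD (start of day, UTC) or YYYY-MM-DDTHH:MM:SSZ."""
    if len(value) == 10:
        value += "T00:00:00Z"
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_active(added_at: str, removed_at: Optional[str], as_of: Optional[str] = None) -> bool:
    """True if the CIDR was live at as_of (defaults to the current time)."""
    moment = parse_ts(as_of) if as_of else datetime.now(timezone.utc)
    if parse_ts(added_at) > moment:
        return False  # not yet published at that moment
    return removed_at is None or parse_ts(removed_at) > moment
```

Under this reading, a match with removedAt set can still be active for a date query that falls before the retirement.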

Every match's botGroup.honoursRobots reports four booleans — robots, crawlDelay, allow, sitemap — for whether the bot operator publicly commits to that aspect of robots.txt and has no documented violations. null means we haven't researched it; false means at least one credible report of the bot ignoring that rule (e.g. Googlebot's crawlDelay is false since Google explicitly ignores Crawl-delay).

GET /api/v1/is_bot
curl -G https://urlcap.com/api/v1/is_bot \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "ip=66.249.66.1"
200 OK
{
  "version": "1",
  "requestId": "…",
  "data": {
    "ip": "66.249.66.1",
    "family": 4,
    "isBot": true,
    "matchCount": 1,
    "matches": [
      {
        "cidr": "66.249.66.0/27",
        "active": true,
        "addedAt": "2026-05-13T07:39:30Z",
        "removedAt": null,
        "botGroup":     { "id": 4, "description": "Common crawlers" },
        "searchEngine": { "id": 1, "description": "Google" },
        "categories":   [ { "code": "SEARCH_INDEXING", "confidence": "high" } ]
      }
    ],
    "cache": { "entries": 36670, "bots": 20, "builtAt": "2026-05-13T07:39:38Z" }
  }
}

Batch form returns data.results, one entry per IP, with data.count and data.matchCount at the top level. An unparseable address returns isBot: false with family: null and a descriptive error field — the whole call still succeeds.

GET /api/v1/is_bot?ip=…&historical=true
# Was this IP ever a known bot?
curl -G https://urlcap.com/api/v1/is_bot \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "ip=66.249.64.5" \
  --data-urlencode "historical=true"

# As of a specific date — includes retired CIDRs that were live then.
curl -G https://urlcap.com/api/v1/is_bot \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "ip=66.249.64.5" \
  --data-urlencode "date=2025-04-01"

data.asOf and data.historical are echoed back when those parameters are supplied, and cache.entries includes retired CIDRs alongside currently-active ones (so the number is larger than the daily-active count).

get post  Reverse DNS (PTR)

/api/v1/reverse_dns

Resolves an IPv4 or IPv6 address to the names returned by its PTR records (in-addr.arpa for v4, ip6.arpa for v6). With forwardConfirm=true, each PTR name is re-resolved to A/AAAA and we report whether the original IP is in the answer — the FCrDNS check Googlebot, Bingbot and friends recommend to verify that a crawler is who its PTR claims it is.

Results are cached in-process for the minimum TTL the upstream resolver returned, clamped to 30s..1h; negative answers (NXDOMAIN / no record / lookup error) are cached for 5 minutes. Every result reports ttlSeconds remaining and cached.
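The caching rule reduces to a small clamp. A sketch, assuming the upstream TTLs arrive as a list of seconds and an empty list stands for a negative answer; the function name cache_ttl_seconds is hypothetical.

```python
def cache_ttl_seconds(record_ttls: list[int]) -> int:
    """How long to cache a PTR result, per the rules described above."""
    if not record_ttls:          # NXDOMAIN / no record / lookup error
        return 300               # negative answers: 5 minutes
    # Minimum upstream TTL, clamped to 30 seconds .. 1 hour.
    return max(30, min(3600, min(record_ttls)))
```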

ip · string
A single IPv4 or IPv6 address (query string or JSON body). Provide either ip or ips.
ips · array of strings
Batch of up to 50 addresses. As a query parameter: comma-separated (?ips=a,b,c). As a JSON body: { "ips": [ … ] }.
forwardConfirm · boolean
Optional. Default false. When true, each PTR name is forward-resolved (A + AAAA) and the response reports forwardConfirmed (true iff at least one PTR resolves back to the original IP) plus per-name forwardChecks with addresses, TTL and the matched flag.
GET /api/v1/reverse_dns
curl -G https://urlcap.com/api/v1/reverse_dns \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "ip=66.249.66.1"
200 OK
{
  "version": "1",
  "requestId": "…",
  "data": {
    "ip": "66.249.66.1",
    "family": 4,
    "names": ["crawl-66-249-66-1.googlebot.com"],
    "ttlSeconds": 3600,
    "cached": false
  }
}

Batch form returns data.results with data.count at the top level. An unparseable address returns an entry with a descriptive error field — the whole call still succeeds.

GET /api/v1/reverse_dns?ip=…&forwardConfirm=true
# FCrDNS: is this really Googlebot?
curl -G https://urlcap.com/api/v1/reverse_dns \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "ip=66.249.66.1" \
  --data-urlencode "forwardConfirm=true"
200 OK
{
  "version": "1",
  "requestId": "…",
  "data": {
    "ip": "66.249.66.1",
    "family": 4,
    "names": ["crawl-66-249-66-1.googlebot.com"],
    "ttlSeconds": 3600,
    "cached": true,
    "forwardConfirmed": true,
    "forwardChecks": [
      {
        "name": "crawl-66-249-66-1.googlebot.com",
        "matched": true,
        "ttlSeconds": 300,
        "cached": false,
        "addresses": ["66.249.66.1"]
      }
    ],
    "forwardConfirm": true
  }
}

IPv6 works the same way — the lookup uses ip6.arpa automatically. Single-IP queries can also be made through /is_bot?reverseDns=true if you want the PTR result alongside the bot match.
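For reference, the forward-confirm logic itself is short. The sketch below performs the same two-step check locally with the system resolver; it is an illustration, not the endpoint's implementation. The resolver functions are injectable so the logic can be exercised without network access, and all function names are hypothetical.

```python
import ipaddress
import socket


def system_ptr_lookup(ip: str) -> list[str]:
    """PTR names for ip via the system resolver (empty list on failure)."""
    try:
        name, aliases, _ = socket.gethostbyaddr(ip)
    except OSError:
        return []
    return [name, *aliases]


def system_forward_lookup(name: str) -> list[str]:
    """A/AAAA addresses for name via the system resolver."""
    try:
        infos = socket.getaddrinfo(name, None)
    except OSError:
        return []
    # Drop any IPv6 zone suffix such as "%eth0" before comparing.
    return sorted({info[4][0].split("%")[0] for info in infos})


def forward_confirm(ip: str, ptr_lookup=system_ptr_lookup,
                    forward_lookup=system_forward_lookup) -> dict:
    """FCrDNS: does at least one PTR name resolve back to the original IP?"""
    target = ipaddress.ip_address(ip)
    checks = []
    for name in ptr_lookup(ip):
        addresses = forward_lookup(name)
        matched = any(ipaddress.ip_address(a) == target for a in addresses)
        checks.append({"name": name, "matched": matched, "addresses": addresses})
    return {
        "ip": ip,
        "names": [c["name"] for c in checks],
        "forwardConfirmed": any(c["matched"] for c in checks),
        "forwardChecks": checks,
    }
```

Unlike the endpoint, this sketch does no caching and reports no TTLs, so it is better suited to spot checks than to hot paths.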

get  Fetch robots.txt

/api/v1/robots

Pulls /robots.txt from a site, parses it per RFC 9309, and returns the user-agent groups, sitemaps and any unknown directives. Fetched bodies are TTL-cached in-process for 1 hour (1 minute on transport errors / 5xx); every response reports cached and ageSeconds.

Per the RFC: a 4xx (except 429) means "no rules apply" — reported as effect: "no_rules_unrestricted"; a 5xx or 429 means "disallow everything" — effect: "restricted_by_error".

site · string
A hostname (example.com) or any URL — the scheme/path are normalised away.
GET /api/v1/robots
curl -G https://urlcap.com/api/v1/robots \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "site=example.com"
200 OK
{
  "version": "1",
  "requestId": "…",
  "data": {
    "site": "example.com",
    "status": 200,
    "contentSha256": "…",
    "sizeBytes": 412,
    "cached": false,
    "ageSeconds": 0,
    "body": "User-agent: *\nDisallow: /search\n",
    "groups": [
      { "userAgents": ["*"], "rules": [{ "type": "disallow", "pattern": "/search" }] }
    ],
    "sitemaps": [],
    "extensions": {}
  }
}
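The status-code rule for /api/v1/robots can be written as a small mapping. The two effect strings come from the description above; the values returned for other statuses ("parsed" and "unknown") are placeholders invented for this sketch, since this page does not name them.

```python
def robots_effect(status: int) -> str:
    """Map the HTTP status of a robots.txt fetch to its effect."""
    if status == 429 or 500 <= status <= 599:
        return "restricted_by_error"      # treat as "disallow everything"
    if 400 <= status <= 499:
        return "no_rules_unrestricted"    # treat as "no rules apply"
    if 200 <= status <= 299:
        return "parsed"                   # placeholder: rules come from the body
    return "unknown"                      # placeholder for anything else
```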

get  URL allow / deny check

/api/v1/robots/check

Decides whether a URL is allowed for a given user-agent. Longest-match wins; on a tie, Allow beats Disallow. If site is omitted, it's derived from url.

site · string
Optional. Derived from url if omitted.
url · string
The URL or path to check.
userAgent · string
The bot token to match against (e.g. Googlebot). Case-insensitive substring match.
GET /api/v1/robots/check
curl -G https://urlcap.com/api/v1/robots/check \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "url=https://example.com/private/foo" \
  --data-urlencode "userAgent=Googlebot"
200 OK
{
  "version": "1",
  "requestId": "…",
  "data": {
    "site": "example.com",
    "url": "https://example.com/private/foo",
    "userAgent": "Googlebot",
    "allowed": false,
    "reason": "disallow '/private/' matched",
    "matchedRule": { "type": "disallow", "pattern": "/private/" },
    "matchedGroupUserAgents": ["Googlebot"],
    "robotsStatus": 200,
    "robotsCached": true,
    "robotsAgeSeconds": 142
  }
}
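The precedence rule used by /api/v1/robots/check (longest match wins; on a tie, Allow beats Disallow) can be sketched as follows. This simplified version treats patterns as plain path prefixes and ignores the * and $ wildcards, so it illustrates the tie-breaking only; the function name is_allowed is hypothetical, and the rule dicts follow the shape of the groups[].rules entries shown earlier.

```python
from typing import Optional


def is_allowed(rules: list[dict], path: str) -> tuple[bool, Optional[dict]]:
    """Return (allowed, matched_rule) for path against one group's rules."""
    best: Optional[dict] = None
    for rule in rules:
        pattern = rule["pattern"]
        # An empty pattern matches nothing; non-matching prefixes are skipped.
        if not pattern or not path.startswith(pattern):
            continue
        longer = best is None or len(pattern) > len(best["pattern"])
        tie_for_allow = (
            best is not None
            and len(pattern) == len(best["pattern"])
            and rule["type"] == "allow"
            and best["type"] == "disallow"
        )
        if longer or tie_for_allow:
            best = rule
    # No matching rule means the path is allowed.
    return (best is None or best["type"] == "allow"), best
```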

get post del  Watch robots.txt for changes

/api/v1/robots/watch

Registers a per-user watch on a site's /robots.txt. A background job sweeps every 15 minutes; each watch is re-fetched no sooner than its frequencyMinutes (default 60, clamped to 15..1440). Snapshots are stored only when the content hash changes — full body + previous hash kept on each. If webhookUrl is set, every change triggers an HMAC-SHA256-signed POST.

Watches require a per-user API key — the legacy X-API-Key can't create them. Free-trial calls return 401.

site · string
Hostname. Scheme/path are stripped. Unique per user.
webhookUrl · string (optional)
Where to POST change notifications. http(s)://….
webhookSecret · string (optional)
HMAC-SHA256 signing key. Auto-generated if webhookUrl is set and you don't supply one — returned on the create response and the GET-one endpoint, never on list.
frequencyMinutes · integer (optional)
Minimum interval between fetches for this watch. Default 60, clamped 15..1440.
POST /api/v1/robots/watch
curl -X POST https://urlcap.com/api/v1/robots/watch \
  -H "Authorization: Bearer $URLCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{"site":"example.com","webhookUrl":"https://hooks.example.com/robots","frequencyMinutes":30}'
201 Created
{
  "version": "1",
  "requestId": "…",
  "data": {
    "watch": {
      "id": "1f1b…-…-…",
      "site": "example.com",
      "webhookUrl": "https://hooks.example.com/robots",
      "webhookSecret": "f3a2…64-hex-chars…",   // also returned by GET /api/v1/robots/watch/{id}; never on list
      "frequencyMinutes": 30,
      "active": true,
      "createdAt": "2026-05-16T13:00:00Z"
    }
  }
}

On every detected change, urlcap POSTs JSON to webhookUrl with an HMAC-SHA256 signature in the X-urlcap-Signature header, formatted as sha256=<hex digest> and computed over the raw request body with your watch's webhookSecret. Verify it before trusting the payload.

Webhook delivery (POST to your webhookUrl)
POST /robots HTTP/1.1
Content-Type: application/json
User-Agent: urlcap-robots-webhook/1.0
X-urlcap-Event: robots.changed
X-urlcap-Timestamp: 1778938200
X-urlcap-Signature: sha256=e3b0c44298…

{
  "type": "robots.changed",
  "watchId": "1f1b…",
  "site": "example.com",
  "snapshotId": 42,
  "fetchedAt": "2026-05-16T13:30:00Z",
  "httpStatus": 200,
  "contentSha256": "…",
  "previousSha256": "…",
  "sizeBytes": 412,
  "body": "User-agent: *\nDisallow: /\n"
}
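A minimal sketch of verifying such a delivery on the receiving side. It assumes the HMAC key is the webhookSecret string exactly as returned by the API, encoded as UTF-8 (rather than hex-decoded first); check that against a real delivery before relying on it. The function name verify_signature is hypothetical.

```python
import hashlib
import hmac


def verify_signature(secret: str, raw_body: bytes, header_value: str) -> bool:
    """Check an X-urlcap-Signature header of the form sha256=<hex digest>."""
    scheme, _, received = header_value.partition("=")
    if scheme.strip() != "sha256" or not received:
        return False
    # Assumption: the secret string is used as the key as-is.
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), received.strip().lower().encode())
```

Compute the HMAC over the exact bytes received, before any JSON parsing or re-serialisation, since whitespace changes alter the digest.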

The other operations are:

  • GET   /api/v1/robots/watch — list your watches
  • GET   /api/v1/robots/watch/{id} — fetch one (includes webhookSecret)
  • DELETE /api/v1/robots/watch/{id} — remove a watch (cascades to its snapshots)
  • GET   /api/v1/robots/watch/{id}/history?limit=&changedOnly=&includeBody= — list snapshots
  • POST  /api/v1/robots/watch/{id}/poll — force a poll right now, bypassing the per-watch frequency throttle

post  Verify a Web Bot Auth signature

/api/v1/web_bot_auth/verify

Decides whether an inbound HTTP request's RFC 9421 HTTP Message Signature is valid against the operator's published Ed25519 key directory — the cryptographic identity check the Web Bot Auth draft proposes as a successor to relying on IP ranges and reverse DNS for "is this really Googlebot?".

You hand urlcap the inbound request's method, url and the headers the bot sent. The verifier parses Signature-Input + Signature, fetches the JWKS-style directory at Signature-Agent (TTL-cached 1 h on success, 1 min on errors), rebuilds the canonical signature base from the covered components, and verifies with Ed25519 using the key whose kid matches the keyid parameter.

Failures (expired signature, missing keyid, directory unreachable, signature mismatch, unsupported algorithm, missing tag="web-bot-auth") come back as verified:false with a reason — never as HTTP errors, so you can feed the answer straight into a policy without a try/catch.

method · string
The bot's request method (e.g. GET).
url · string
The full URL the bot requested. Used to derive @method, @authority, @target-uri, @path, @query, @scheme in the signature base.
headers · object
Header name → value, case-insensitive. Must include at least Signature, Signature-Input and Signature-Agent; any other header the signature covers (named in Signature-Input's inner-list) must also be present.
label · string
Optional. When a request carries multiple signatures (e.g. sig1=…, sig2=…), pick a specific label. Default: first.
allowExpired · boolean
Optional. Default false. Set true to skip the expires check (useful for forensic analysis of a stored request).
requireWebBotAuthTag · boolean
Optional. Default true. The draft requires (MUST) that Web Bot Auth signatures carry tag="web-bot-auth"; set this to false only when verifying a non-WBA RFC 9421 signature.
POST /api/v1/web_bot_auth/verify
curl -X POST https://urlcap.com/api/v1/web_bot_auth/verify \
  -H "Authorization: Bearer $URLCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "method": "GET",
    "url":    "https://example.com/foo",
    "headers": {
      "Signature-Agent": "\"https://example.com/.well-known/http-message-signatures-directory\"",
      "Signature-Input": "sig1=(\"@authority\" \"signature-agent\" \"@method\" \"@target-uri\");created=1747500000;expires=1747500060;keyid=\"abc\";alg=\"ed25519\";tag=\"web-bot-auth\"",
      "Signature":       "sig1=:base64-signature-here:"
    }
  }'
200 OK
{
  "version": "1",
  "requestId": "…",
  "data": {
    "verified": true,
    "label": "sig1",
    "keyid": "abc",
    "algorithm": "ed25519",
    "signatureAgent": "https://example.com/.well-known/http-message-signatures-directory",
    "tag": "web-bot-auth",
    "createdAt": "2025-05-17T16:40:00Z",
    "expiresAt": "2025-05-17T16:41:00Z",
    "expired": false,
    "coveredComponents": ["@authority", "signature-agent", "@method", "@target-uri"],
    "directory": {
      "url": "https://example.com/.well-known/http-message-signatures-directory",
      "httpStatus": 200,
      "keyCount": 3,
      "rawKeyCount": 3,
      "cached": false,
      "ageSeconds": 0,
      "kids": ["abc", "def", "ghi"]
    }
  }
}

Algorithm support is Ed25519 only (the draft's MUST). The signature is required to cover @authority and the signature-agent header — both are checked before any directory fetch happens, so an attacker swapping in a friendly directory can't short-circuit the binding to the original request.
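The pre-fetch checks can be illustrated with a small parser for the Signature-Input header. This is a deliberately naive sketch: it handles the common label=("a" "b");key=value shape shown in the example, ignores per-component parameters, and is not a full RFC 8941 structured-field parser. The function names and the reason strings are hypothetical.

```python
import re
from typing import Optional

# label=( inner list );param;param ... (one match per signature label)
_MEMBER = re.compile(r'\s*([a-z*][a-z0-9_.*-]*)=\(([^)]*)\)((?:;[^;,]+)*)')
_REQUIRED = {"@authority", "signature-agent"}


def parse_signature_input(value: str) -> dict:
    """Return {label: {"components": [...], "params": {...}}}."""
    members = {}
    for match in _MEMBER.finditer(value):
        label, inner, raw_params = match.groups()
        params = {}
        for item in filter(None, raw_params.split(";")):
            key, _, val = item.partition("=")
            params[key.strip()] = val.strip().strip('"')
        members[label] = {
            "components": re.findall(r'"([^"]*)"', inner),
            "params": params,
        }
    return members


def precheck(value: str, label: Optional[str] = None,
             require_tag: bool = True) -> tuple[bool, str]:
    """Checks that need no directory fetch: coverage, keyid and tag."""
    members = parse_signature_input(value)
    if not members:
        return False, "no signature found"
    chosen = label or next(iter(members))   # default: first label
    if chosen not in members:
        return False, f"label {chosen!r} not present"
    entry = members[chosen]
    missing = _REQUIRED - set(entry["components"])
    if missing:
        return False, f"signature does not cover {sorted(missing)}"
    if not entry["params"].get("keyid"):
        return False, "missing keyid"
    if require_tag and entry["params"].get("tag") != "web-bot-auth":
        return False, 'missing tag="web-bot-auth"'
    return True, "ok"
```

Running these checks first means a malformed or under-covering signature is rejected without contacting the key directory at all.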

Pairs naturally with /is_bot and /reverse_dns?forwardConfirm=true: is_bot says "this IP is in Google's CIDR list," reverse_dns says "the rDNS points back to it," and web_bot_auth says "the bot proved it with a signature only Google could have made." All three together is the gold-standard identity check.

Identifying as urlcap: signed capture & extract

The other direction: when you make a capture or extract request, set webBotAuth: true and urlcap signs every outbound HTTP request it makes on your behalf with our own Ed25519 key. Sites that allow known crawlers but block unknown bots can then choose to allow urlcap-attributed traffic — and verify that what claims to be urlcap really is.

Our public keys are served at https://urlcap.com/.well-known/http-message-signatures-directory as a JWKS-style JSON document. Each signed request carries a Signature-Agent header pointing at that URL plus standard Signature-Input / Signature headers per RFC 9421.

Capture with a signed outbound
curl -X POST https://urlcap.com/api/v1/capture \
  -H "Authorization: Bearer $URLCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://example.com/", "webBotAuth": true }'

The response's data.request.headers shows the three signature headers that went on the wire, so you can verify the integration end-to-end by feeding them back into /api/v1/web_bot_auth/verify. Signature lifetime is 60 seconds.

get post  List & create sites

/api/v1/sites

A site is the multi-tenancy unit for ingest. Every event you ship via /events or /outcomes is scoped to one site, and every {site, ingest token, hostname} triple has to line up or the row is rejected. Customers usually create one site for their whole edge and add every public hostname to it; large multi-product orgs sometimes split per product.

Auth: your urlcap API key (the same one used for capture / TOTP / is_bot). Distinct from ingest tokens, which are per-site and used only for the ingest channel.

List your sites
curl -s https://urlcap.com/api/v1/sites \
  -H "Authorization: Bearer $URLCAP_KEY"
Create a site
curl -s https://urlcap.com/api/v1/sites \
  -X POST \
  -H "Authorization: Bearer $URLCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "datocapital"}'
201 Created
{
  "version": "1",
  "data": {
    "id": 47,
    "publicId": "Xy9KqZ7mNvB2",
    "name": "datocapital"
  }
}

Save the publicId — that's what you pass as {site_id} in every downstream URL. The numeric id still works for backward compatibility, but new integrations should use publicId because it's randomly generated and doesn't disclose how many sites exist on urlcap.

get post  Hostnames per site

/api/v1/sites/{site_id}/domains

Each event ingested under a site must have a host field that's already attached to that site. Register hostnames once up front; /events rejects any row whose host isn't registered, with the error "host '…' not registered for site_id=…".

Hostnames are unique across the whole urlcap database: a hostname belongs to exactly one site. Attempting to add one that's already attached elsewhere returns 409.

hostname  string · required
FQDN. Lowercased, then stored as-is. Wildcards are not supported; attach each hostname individually.
kind  string · default doc
What this hostname serves. One of doc (HTML pages, the normal case), cdn (assets / images / CSS / JS on a separate domain), api (JSON endpoints), or other. Used by the discovery scan to de-prioritise candidate JA4s that only ever touch CDN-kind hostnames, which are usually image crawlers (Pinterest, archive.org) you don't want to block. Invalid values silently fall back to doc.
List hostnames
curl -s https://urlcap.com/api/v1/sites/Xy9KqZ7mNvB2/domains \
  -H "Authorization: Bearer $URLCAP_KEY"
Add a doc hostname
curl -s https://urlcap.com/api/v1/sites/Xy9KqZ7mNvB2/domains \
  -X POST \
  -H "Authorization: Bearer $URLCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{"hostname": "www.datocapital.com"}'
Add a CDN hostname
curl -s https://urlcap.com/api/v1/sites/Xy9KqZ7mNvB2/domains \
  -X POST \
  -H "Authorization: Bearer $URLCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{"hostname": "cdn.datocapital.com", "kind": "cdn"}'

get post  Ingest tokens per site

/api/v1/sites/{site_id}/ingest_keys

Per-site bearer tokens (ingest_<32hex>) that authenticate /events and /outcomes calls. Distinct from the urlcap API key you use everywhere else — ingest tokens are scoped only to the ingest channel, and you can revoke / rotate them independently.

Storage: only the SHA-256 of the token sits in the database. The cleartext is returned exactly once, on creation. Capture it then; if you lose it, mint a new one and revoke the old.

List existing ingest tokens (prefixes + status only)
curl -s https://urlcap.com/api/v1/sites/3/ingest_keys \
  -H "Authorization: Bearer $URLCAP_KEY"
Mint a new token
curl -s https://urlcap.com/api/v1/sites/3/ingest_keys \
  -X POST \
  -H "Authorization: Bearer $URLCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{"label": "datocapital-prod-2026-05"}'
201 Created — token shown once, never returned again
{
  "version": "1",
  "data": {
    "id": 3,
    "siteId": 3,
    "label": "datocapital-prod-2026-05",
    "token": "ingest_c7e57f1d9d43866bba19bf95d65c9457",
    "warning": "Save this token now; it is not stored in plaintext and cannot be retrieved later."
  }
}

To rotate, mint a new token with the same label suffix, update your edge config, then revoke the old one (currently a direct DB update; an admin UI is coming). Multiple active tokens per site are fine; we recommend one per environment (prod, staging, ci).

Ingest channel — events & outcomes

The ingest channel is how a site streams its own traffic into urlcap and gets back per-fingerprint intelligence: JA4 / IP profiles, bot likelihood, and (with the outcomes endpoint below) high-confidence "real human" signals like JS challenge pass rate, registered-user observation, and completed purchases. Two complementary endpoints do the work: /events takes raw request observations, and /outcomes takes the asynchronous verdicts that refer back to them.

Auth: site ingest tokens

Both ingest endpoints authenticate with a per-site bearer token (separate from your urlcap API key). Mint one with your urlcap API key:

POST /api/v1/sites/{site_id}/ingest_keys
curl -X POST https://urlcap.com/api/v1/sites/42/ingest_keys \
  -H "Authorization: Bearer $URLCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{"label": "my-prod-site-2026-05"}'

The response includes token: "ingest_…" — shown once at creation, stored only as a SHA-256 hash thereafter. Store it in your edge config.

The request_id binding

Every event you send carries a request_id (the patched nginx emits $request_id). Outcomes you post later refer back to that id, and urlcap resolves it to the JA4/IP fingerprint server-side at ingest time — so the binding persists even after the raw event ages out of the 7-day window.

post  Send request events

/api/v1/ingest/{site_id}/events

NDJSON batch of request observations. One row per request. Used as the raw input for every downstream signal — JA4 reqs / unique IPs / asset ratio / classifier / intelligence. Caps: up to 1,000 events / 4 MB per batch. Per-row errors never fail the batch.

The patched nginx ships these natively and the NginxBotLogTail background component picks them up from /var/log/nginx/bot-access.ndjson when sites.local_log_path is set. The HTTP endpoint is the alternative for clients that prefer to push.

ts  ISO-8601 string or epoch ms
Event time. Defaults to receive time if absent.
request_id  string
Opaque per-request id (nginx $request_id). The binding key for later outcomes.
ja4  string
JA4 fingerprint of the TLS hello (e.g. t13d1516h2_8daaf6152771_b0da82dd1658). Empty = no TLS handshake; the row is skipped.
ja4_hash  UInt64
Numeric form of the JA4 fingerprint. The patched nginx ships this. If omitted, urlcap derives it from the JA4 string.
ip, host, path  string
Mandatory. host must be in the site's registered hostnames or the row is rejected.
method, status, http_version, user_agent  string / int
Optional; absent fields default to empty/0.
ja3_hash, asn, country, accept_language, referer_host, has_cookie, has_referer  various
Optional enrichment hints. asn / country are looked up from MaxMind if absent.
POST /api/v1/ingest/42/events
curl -X POST https://urlcap.com/api/v1/ingest/42/events \
  -H "Authorization: Bearer $INGEST_TOKEN" \
  -H "Content-Type: application/x-ndjson" \
  --data-binary '{"ts":"2026-05-21T08:11:30Z","request_id":"abc123…","ja4":"t13d1516h2_8daaf6152771_b0da82dd1658","ja4_hash":12345,"ip":"1.2.3.4","host":"shop.example.com","path":"/products","method":"GET","status":200,"user_agent":"Mozilla/5.0…"}
{"ts":"2026-05-21T08:11:31Z","request_id":"def456…","ja4":"t13d1516h2_8daaf6152771_b0da82dd1658","ja4_hash":12345,"ip":"1.2.3.5","host":"shop.example.com","path":"/products/sku-7","method":"GET","status":200}'
202 Accepted
{
  "version": "1",
  "requestId": "…",
  "data": { "siteId": 42, "accepted": 2, "rejected": 0, "errors": [] }
}
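
The batch caps above (1,000 events / 4 MB) can be enforced client-side before you POST. The following is a minimal sketch, not urlcap code: chunk_events and its parameters are illustrative names, and reading "4 MB" as 4 × 1024 × 1024 bytes is an assumption.

```python
import json

MAX_EVENTS = 1000             # documented cap: events per batch
MAX_BYTES = 4 * 1024 * 1024   # documented cap: 4 MB (binary megabytes assumed)


def chunk_events(events, max_events=MAX_EVENTS, max_bytes=MAX_BYTES):
    """Yield NDJSON request bodies that stay within both batch caps.

    A single event larger than max_bytes is still emitted on its own;
    this sketch does not try to split or drop it.
    """
    lines = []
    size = 0
    for event in events:
        line = json.dumps(event, separators=(",", ":")).encode("utf-8") + b"\n"
        # Flush the current batch if adding this line would break either cap.
        if lines and (len(lines) >= max_events or size + len(line) > max_bytes):
            yield b"".join(lines)
            lines, size = [], 0
        lines.append(line)
        size += len(line)
    if lines:
        yield b"".join(lines)


if __name__ == "__main__":
    sample = [{"request_id": f"r{i}", "ip": "1.2.3.4",
               "host": "shop.example.com", "path": "/"} for i in range(2500)]
    # Number of rows in each body that would be POSTed to /events.
    print([body.count(b"\n") for body in chunk_events(sample)])
```

Each yielded body can be sent as-is with Content-Type: application/x-ndjson. Because per-row errors never fail the batch, check the errors array of every response rather than only the HTTP status.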

post  Send outcomes (challenges, auth, purchases)

/api/v1/ingest/{site_id}/outcomes

NDJSON batch of asynchronous verdicts — the strongest "is this a real human?" signals in the system. Each outcome refers back to the request_id of an event you already sent; urlcap resolves it to a JA4 at ingest time so the binding survives even after the raw event ages out.

Four canonical kinds today. The first three each drive one cluster of fields in ja4_intelligence_latest; edge_action feeds the bot-ja4s response:

kind          verdict values                                 meta keys (canonical)                   drives
js_challenge  passed | failed | abandoned                    challenge, reason                       js_challenge_pass_rate
auth          authenticated | signup | anonymous             user_id_hash (SHA-256 of your user id)  auth_observations, distinct_users
purchase      completed | refunded | disputed                order_id, amount_cents, currency        purchases, total_purchase_cents, last_purchase_at
edge_action   blocked | allowed | challenged | rate_limited  rule, rule_id (your blocklist label)    cross_customer_action.sites_blocking on bot-ja4s

request_id  string · required
The same value sent on the original event. Binds this verdict to a specific JA4/IP fingerprint.
kind  string · required
js_challenge, auth, purchase, edge_action, or any other label you want to track.
verdict  string
Outcome status; see canonical values above.
ts  ISO-8601 string or epoch ms
Optional. Defaults to receive time.
score  float 0..1
Optional. For continuous-score signals like reCAPTCHA v3 or fraud scores.
meta  object
Kind-specific extras. Stored as raw JSON; downstream queries pull individual keys with JSONExtractString(meta,'...'). See the kinds table above for canonical keys.
ja4_hash  UInt64
Optional. If you already know it, ship it inline and we skip the server-side lookup against request_events.

JS challenge example

Send when your edge issues a JS challenge (Turnstile, hCaptcha, your own proof-of-work) and gets a verdict back.

POST /api/v1/ingest/42/outcomes
curl -X POST https://urlcap.com/api/v1/ingest/42/outcomes \
  -H "Authorization: Bearer $INGEST_TOKEN" \
  -H "Content-Type: application/x-ndjson" \
  --data-binary '{"request_id":"abc123","kind":"js_challenge","verdict":"passed","score":0.92,"meta":{"challenge":"turnstile_v0"}}
{"request_id":"def456","kind":"js_challenge","verdict":"failed","meta":{"reason":"no-js-execution"}}'

Auth example — is this a registered user?

Send when a request you previously logged was authenticated — login session active, signup completed, password reset confirmed. The user_id_hash should be a hash of your internal user id, not the raw value — we only need to count distinct users, never identify them.

POST /api/v1/ingest/42/outcomes
curl -X POST https://urlcap.com/api/v1/ingest/42/outcomes \
  -H "Authorization: Bearer $INGEST_TOKEN" \
  -H "Content-Type: application/x-ndjson" \
  --data-binary '{"request_id":"abc123","kind":"auth","verdict":"authenticated","meta":{"user_id_hash":"u_sha256:7a3f…"}}
{"request_id":"def456","kind":"auth","verdict":"signup","meta":{"user_id_hash":"u_sha256:e9c1…"}}'
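
Computing user_id_hash on your side might look like the sketch below. The u_sha256: prefix is copied from the example payload above; whether urlcap requires that exact prefix is an assumption, and both function names are illustrative.

```python
import hashlib
import json


def user_id_hash(user_id: str) -> str:
    """SHA-256 of an internal user id, so the raw value never leaves your system."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return f"u_sha256:{digest}"  # prefix mirrors the documented example (assumed format)


def auth_outcome_line(request_id: str, user_id: str, verdict: str = "authenticated") -> str:
    """Build one NDJSON row for the outcomes endpoint."""
    row = {
        "request_id": request_id,
        "kind": "auth",
        "verdict": verdict,
        "meta": {"user_id_hash": user_id_hash(user_id)},
    }
    return json.dumps(row, separators=(",", ":"))


if __name__ == "__main__":
    print(auth_outcome_line("abc123", "user-42"))
```

The same user id always hashes to the same value, which is what lets distinct_users be counted without the id itself being shared.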

Purchase example — has this fingerprint converted?

The strongest "real human, valuable visitor" signal. Send from your order-confirmation webhook. Use the request_id of the request that closed the order (the checkout-complete POST, not the first product view).

POST /api/v1/ingest/42/outcomes
curl -X POST https://urlcap.com/api/v1/ingest/42/outcomes \
  -H "Authorization: Bearer $INGEST_TOKEN" \
  -H "Content-Type: application/x-ndjson" \
  --data-binary '{"request_id":"abc123","kind":"purchase","verdict":"completed","meta":{"order_id":"o_9876","amount_cents":4999,"currency":"USD"}}
{"request_id":"def456","kind":"purchase","verdict":"refunded","meta":{"order_id":"o_9876"}}'
202 Accepted
{
  "version": "1",
  "requestId": "…",
  "data": { "siteId": 42, "accepted": 2, "rejected": 0, "errors": [] }
}

A single request_id can carry multiple outcomes — one page-view that triggered a challenge, then authenticated, then converted is three rows with the same id. They aggregate independently into the matching counters.

get  Read JA4 intelligence

/api/v1/ja4/intelligence

Returns the rolled-up profile for one JA4 fingerprint over a window (7 / 30 / 90 days). Authenticated with your urlcap API key — not the site ingest token.

site_id  UInt64 · required
Your site.
ja4_hash  UInt64 · required
The fingerprint to look up.
window_days  int · required
7, 30, or 90 (must match a configured window in intelligence.compute.windows_days).
GET /api/v1/ja4/intelligence
curl -G https://urlcap.com/api/v1/ja4/intelligence \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "site_id=42" \
  --data-urlencode "ja4_hash=17888951274072987679" \
  --data-urlencode "window_days=7"
200 OK
{
  "version": "1",
  "requestId": "…",
  "data": {
    "siteId": 42, "ja4Hash": "17888951274072987679", "windowDays": 7,
    "ja4": "t13d3613h2_018971650b2c_23cd79a6e20d",
    "reqs": 10, "unique_ips": 2, "unique_uas": 1, "unique_paths": 6,

    "likely_os_name": "OS X",         "likely_os_confidence": 1.0,
    "likely_agent_name": "Chrome",    "likely_agent_confidence": 1.0,
    "likely_device_class": "Desktop", "likely_device_confidence": 1.0,
    "ja4_ua_consistency": 1.0,

    "ua_diversity_score": 0.1,
    "ip_diversity_score": 0.2,
    "suspicious_score":   0.0,

    "js_challenge_attempts": 1, "js_challenge_passes": 1, "js_challenge_pass_rate": 1.0,
    "auth_observations": 1,     "distinct_users": 1,
    "purchases": 1, "total_purchase_cents": 4999, "last_purchase_at": "2026-05-21T07:12:02Z"
  }
}

Field meanings:

  • ja4_ua_consistency: 1.0 means this JA4 always claims the same (agent, os) tuple; lower values mean the JA4 is observed claiming mismatched UAs (a spoofed UA on the same TLS library).
  • js_challenge_pass_rate: null when never challenged; 0.0 means challenged but never passes (the strongest pure-bot signal).
  • distinct_users: ≥ 1 means at least one registered user was observed on this fingerprint.
  • purchases: > 0 is the highest-confidence "real human, valuable visitor" signal.
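
As a rough illustration of reading those fields together, the sketch below turns the data object of a /ja4/intelligence response into a list of plain-language notes. The function name and the note wording are illustrative only; the rules are the ones stated above.

```python
def summarise_intelligence(data: dict) -> list:
    """Translate a /ja4/intelligence data object into human-readable notes."""
    notes = []

    rate = data.get("js_challenge_pass_rate")
    if rate is None:
        notes.append("never challenged")
    elif rate == 0.0:
        notes.append("challenged but never passes")

    if data.get("ja4_ua_consistency", 1.0) < 1.0:
        notes.append("mismatched UAs observed on this JA4")
    if data.get("distinct_users", 0) >= 1:
        notes.append("registered user observed")
    if data.get("purchases", 0) > 0:
        notes.append("purchase observed")
    return notes


if __name__ == "__main__":
    example = {"js_challenge_pass_rate": 1.0, "ja4_ua_consistency": 1.0,
               "distinct_users": 1, "purchases": 1}
    print(summarise_intelligence(example))
```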

get  Trailing-hour JA4 signals

/api/v1/ja4/signals

Cloudflare-style "what does this JA4 look like right now?" snapshot. Returns the latest 1-hour rollup with 10 ratios + 4 ranks + 2 quantiles. Recomputed every minute by an internal job; 404 if the fingerprint hasn't been seen on this site within the trailing hour. Authenticated with your urlcap API key.

site_id  UInt64 · required
Your site.
ja4_hash  UInt64 · required
JA4 hash as an unsigned decimal string.
GET /api/v1/ja4/signals
curl -G https://urlcap.com/api/v1/ja4/signals \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "site_id=3" \
  --data-urlencode "ja4_hash=13366807129412944815"
200 OK
{
  "version": "1",
  "requestId": "…",
  "data": {
    "siteId": 3, "ja4Hash": "13366807129412944815",
    "calculatedAt": "2026-05-25T09:14:00Z",
    "ja4": "t13d1516h2_8daaf6152771_b1ff8ab2d16f",
    "reqs_1h": 14528,
    "h2h3_ratio_1h": 0.97, "browser_ratio_1h": 0.99,
    "cache_ratio_1h": 0.42, "heuristic_ratio_1h": 0.0,
    "unique_ips_1h": 3211, "unique_uas_1h": 14, "unique_paths_1h": 882,
    "reqs_rank_1h": 4, "reqs_quantile_1h": 0.97,
    "ips_rank_1h":  5, "ips_quantile_1h":  0.96,
    "uas_rank_1h": 18, "paths_rank_1h": 6
  }
}

*_ratio_1h are 0..1 shares of the trailing-hour request volume. *_rank_1h is per-site rank (1 = highest) and *_quantile_1h the corresponding quantile — a rank-1 JA4 will sit near 1.0. Use this for hot-path decisions; for stable long-window classification use /ja4/intelligence.

get  JA4 25-metric snapshot

/api/v1/ja4/metrics

The "prioritised 25" metric snapshot for one JA4 — computed on-demand from the 5-minute / 1-hour / 1-day aggregates (no precompute job). Three blocks in one response: a trailing-hour rollup for the JA4, an optional IP+JA4 sub-block when ip= is supplied, and a top-10 JA4×UA breakdown over 24h with each UA's share of the JA4's volume.

site_id  UInt64 · required
Your site.
ja4_hash  UInt64 · required
JA4 hash.
ip  string · optional
Narrow the snapshot to one IP. Adds the ip_ja4_1h block and an is_new_24h flag.
GET /api/v1/ja4/metrics
curl -G https://urlcap.com/api/v1/ja4/metrics \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "site_id=3" \
  --data-urlencode "ja4_hash=13366807129412944815" \
  --data-urlencode "ip=203.0.113.5"
200 OK
{
  "version": "1",
  "requestId": "…",
  "data": {
    "siteId": 3, "ja4Hash": "13366807129412944815",
    "ja4_1h": {
      "req_count": 14528, "unique_ips": 3211, "unique_uas": 14, "unique_paths": 882, "unique_hosts": 1,
      "h2h3_ratio": 0.97, "browser_ua_ratio": 0.99,
      "error_ratio": 0.018, "s403_ratio": 0.0, "s404_ratio": 0.012, "s429_ratio": 0.0
    },
    "ip_ja4_1h": {
      "ip": "203.0.113.5", "req_count": 41, "unique_uas": 1, "unique_paths": 18, "unique_hosts": 1,
      "browser_ua_ratio": 1.0, "library_ua_ratio": 0.0, "h2h3_ratio": 1.0,
      "error_ratio": 0.0, "s404_ratio": 0.0,
      "first_seen": "2026-05-25T08:42:00Z", "last_seen": "2026-05-25T09:14:21Z",
      "is_new_24h": true
    },
    "ja4_ua_24h_top": [
      { "ua_hash_128": "8d9c…", "req_count": 218341, "unique_ips": 5621, "unique_asns": 412,
        "error_ratio": 0.02, "share_of_ja4": 0.71 },
      { "ua_hash_128": "ab12…", "req_count":  41203, "unique_ips":  331, "unique_asns":  18,
        "error_ratio": 0.01, "share_of_ja4": 0.13 }
    ]
  }
}

library_ua_ratio on the IP block counts UAs Yauaa classifies as Special or Robot; high values are a non-browser client tell. share_of_ja4 in the UA breakdown is each UA's share of the JA4's total 24h volume, so the top-N values sum to at most 1.0 (0.84 in the example above); a single UA > 0.95 means "one client owns this JA4."

get  JA4 profile breakdown

/api/v1/ja4/profile

Top-N values of one profile dimension for a JA4 over the last N days. Useful for "show me every agent_name ever observed on this JA4" or "which countries does this fingerprint actually come from." Each row returns the request count plus HLL-merged distinct ips/uas/paths.

site_id  UInt64 · required
Your site.
ja4_hash  UInt64 · required
JA4 hash.
dim  enum · required
One of os_name, os_class, agent_name, agent_class, device_class, device_brand, country, asn, http_version.
days  int 1..730 · default 90
Sliding window.
limit  int 1..500 · default 20
Top-N cap.
GET /api/v1/ja4/profile
curl -G https://urlcap.com/api/v1/ja4/profile \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "site_id=3" \
  --data-urlencode "ja4_hash=13366807129412944815" \
  --data-urlencode "dim=country" \
  --data-urlencode "days=30" \
  --data-urlencode "limit=10"
200 OK
{
  "version": "1",
  "requestId": "…",
  "data": {
    "siteId": 3, "ja4Hash": "13366807129412944815",
    "dim": "country", "days": 30,
    "values": [
      { "value": "US", "reqs": 482931, "unique_ips":  9214, "unique_uas":  211, "unique_paths": 5410 },
      { "value": "DE", "reqs": 121034, "unique_ips":  1632, "unique_uas":   88, "unique_paths": 2231 },
      { "value": "JP", "reqs":  88412, "unique_ips":   941, "unique_uas":   42, "unique_paths": 1844 }
    ]
  }
}

A JA4 returning many distinct agent_name values with even shares is a strong UA-spoofing tell — pair with ja4_ua_consistency. Same trick for country or asn to spot scrapers behind residential-proxy networks.

get  Per-IP rollup

/api/v1/ip/profile

Per-IP behavioural summary on a single site over the last N days. Returns request count plus distinct JA4s / UAs / paths / hosts; an IP showing many of each is a proxy / NAT tell. For cross-site investigation including geo, PTR, bot-CIDR membership and every bot attribution, use the richer /api/v1/ip/intelligence.

site_id  UInt64 · required
Your site.
ip  string · required
IPv4 or IPv6 address.
days  int 1..730 · default 30
Sliding window.
GET /api/v1/ip/profile
curl -G https://urlcap.com/api/v1/ip/profile \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "site_id=3" \
  --data-urlencode "ip=203.0.113.5" \
  --data-urlencode "days=30"
200 OK
{
  "version": "1",
  "requestId": "…",
  "data": {
    "siteId": 3, "ip": "203.0.113.5", "days": 30,
    "reqs": 18421,
    "unique_ja4s": 9, "unique_uas": 31, "unique_paths": 482, "unique_hosts": 3,
    "heuristic_reqs": 411, "challenge_reqs": 0,
    "first_seen": "2026-04-28T14:21:08Z", "last_seen": "2026-05-25T09:14:21Z"
  }
}

Heuristics: unique_ja4s >= 3 with unique_uas >= 10 is proxy-shaped; unique_ja4s = 1 with high reqs and low unique_paths is a single headless client. 404 if the IP hasn't sent any traffic to this site in the window.
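
The two rules of thumb above can be applied to the data object of an /ip/profile response as in this sketch. The function name is illustrative, and the high_reqs / low_paths defaults are assumptions: the text says "high reqs" and "low unique_paths" without giving numbers, so tune them to your own traffic.

```python
def classify_ip_profile(profile: dict, high_reqs: int = 1000, low_paths: int = 10) -> str:
    """Label an /ip/profile data object using the documented heuristics."""
    ja4s = profile["unique_ja4s"]
    uas = profile["unique_uas"]

    # Many TLS fingerprints and many user agents behind one IP: proxy / NAT shape.
    if ja4s >= 3 and uas >= 10:
        return "proxy-shaped"

    # One fingerprint, lots of requests, few distinct paths: single headless client.
    # The numeric thresholds here are assumed, not documented values.
    if ja4s == 1 and profile["reqs"] >= high_reqs and profile["unique_paths"] <= low_paths:
        return "single-headless-client"

    return "unclassified"


if __name__ == "__main__":
    example = {"reqs": 18421, "unique_ja4s": 9, "unique_uas": 31, "unique_paths": 482}
    print(classify_ip_profile(example))
```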

get  List blockable JA4s per site

/api/v1/sites/{site_id}/bot-ja4s

Returns every JA4 the discovery system has flagged on this site in the last window_days days, labelled by classification. The customer pulls this and feeds the JA4 strings into their edge blocklist (nginx $ja4 map, Cloudflare WAF rule, etc.). Browser-shaped fingerprints are excluded because they're not blockable.

window_days  int 1..90 · default 7
Sliding window for the per-site JA4 enumeration. Values up to 90 are accepted, but the effective upper bound is request_events' 7-day TTL.
include  csv · default known,candidate
Comma-separated classifications to include. known = attributed to a bot_group (Bingbot, GPTBot, …); candidate = pending admin review.
limit  int 1..1000 · default 200
Cap on returned rows.
GET /api/v1/sites/3/bot-ja4s
curl -s "https://urlcap.com/api/v1/sites/3/bot-ja4s?window_days=7&limit=200" \
  -H "Authorization: Bearer $URLCAP_KEY"
200 OK
{
  "version": "1",
  "data": {
    "siteId": 3,
    "windowDays": 7,
    "items": [
      {
        "ja4": "t13d181300_e8a523a41297_69f017ebb96f",
        "ja4_hash": "13366807129412944815",
        "classification": "known_bot",
        "bot_group": "Googlebot",
        "bot_group_id": 4,
        "reqs": 2885, "ips": 162, "active_days": 1,
        "asset_ratio": 0.0,
        "first_seen": "2026-05-21T19:35:00Z",
        "last_seen":  "2026-05-21T22:01:38Z"
      },
      {
        "ja4": "t13d311100_e8f1e7e78f70_b6426fc6f187",
        "ja4_hash": "2034759142565420012",
        "classification": "candidate",
        "score": 0.76,
        "candidate_id": 1222,
        "reqs": 11317, "ips": 4953, "active_days": 1,
        "asset_ratio": 0.0,
        "first_seen": "2026-05-21T20:33:22Z",
        "last_seen":  "2026-05-21T22:01:38Z",

        "cross_customer_action": {
          "sites_blocking":    12,
          "sites_allowing":     0,
          "sites_challenging":  3
        },
        "block_likely_on_this_site": true,
        "block_likely_ratio": 0.997,

        "reqs_per_minute_peak":  21334,
        "reqs_per_minute_mean":   293,
        "active_minutes":        2289,
        "burstiness":            4.28
      }
    ]
  }
}

Rate & burstiness fields

These fields surface the per-minute traffic-shape signal from urlcap's internal ja4_agg_1m rollup. They are useful for catching high-volume bursty JA4s that hide under a moderate asset_ratio but spike to thousands of req/min during their active windows: the no-man's-land case where score >= 0.70 and asset_ratio falls between 0.05 and 0.50, so neither Tier 1 nor Tier 2 fires.

  • reqs_per_minute_peak — max requests in any 1-minute bucket for this JA4 over the window.
  • reqs_per_minute_mean — average across active minutes only (silent buckets between bursts are excluded so a bursty bot's mean reads its real when-active rate, not a diluted overall average).
  • active_minutes — count of 1-min buckets with reqs > 0.
  • burstiness — coefficient of variation (stddev / mean of per-minute counts). 0 = perfectly uniform; 1 = Poisson-like; > 2 = attack-shape; > 5 = textbook scheduled-burst pattern.

Practical use: add a Tier 1 boost condition like burstiness > 2.0 OR reqs_per_minute_peak > 1000 to catch attack-shape JA4s your other rules would miss.
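
To make the four definitions concrete, here is a sketch that derives them from a list of per-minute request counts and then applies the boost condition. It is not urlcap's implementation: computing burstiness over active minutes only (matching the mean) and using the population standard deviation are both assumptions, and the function names are illustrative.

```python
import statistics


def traffic_shape(per_minute_counts):
    """Derive the rate and burstiness fields from 1-minute request counts."""
    active = [c for c in per_minute_counts if c > 0]  # silent buckets are excluded
    if not active:
        return None
    mean = statistics.fmean(active)
    return {
        "reqs_per_minute_peak": max(active),
        "reqs_per_minute_mean": mean,
        "active_minutes": len(active),
        # Coefficient of variation: stddev / mean (population stddev assumed).
        "burstiness": statistics.pstdev(active) / mean,
    }


def tier1_boost(shape) -> bool:
    """The boost condition suggested in the text."""
    return shape["burstiness"] > 2.0 or shape["reqs_per_minute_peak"] > 1000


if __name__ == "__main__":
    shape = traffic_shape([0, 5, 0, 0, 2400, 3, 0, 4])
    print(shape, tier1_boost(shape))
```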

The two policy axes

Each row carries enough information to drive a per-operator decision. bot_group and score are the two axes; the last two fields below are supporting signals:

  • bot_group is the operator name — your policy lever. Most customers keep Googlebot + Bingbot for search referral traffic, block GPTBot / ClaudeBot / CCBot for AI-training without traffic return.
  • score is the confidence on unattributed candidates — start with score >= 0.7 and tighten as you watch analytics for collateral damage.
  • cross_customer_action.sites_blocking >= 3 is a strong "other sites block this too" vote, regardless of score.
  • block_likely_on_this_site is what we can already tell from your own status codes — useful as a sanity check, not a recommendation.
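
One way to turn these fields into an edge decision is sketched below. It is an example policy, not a recommendation from urlcap: the blocked-operator set mirrors the "most customers" pattern described above, the thresholds are the starting points named in the list, and should_block is an illustrative name.

```python
# Operators this example site chooses to block (a policy choice, adjust freely).
BLOCKED_GROUPS = {"GPTBot", "ClaudeBot", "CCBot"}


def should_block(row: dict, min_score: float = 0.7, min_sites_blocking: int = 3) -> bool:
    """Decide whether one item from the bot-ja4s response goes on the blocklist."""
    if row.get("classification") == "known_bot":
        # Attributed bots: the operator name is the policy lever.
        return row.get("bot_group") in BLOCKED_GROUPS

    # Unattributed candidates: confidence score, or a cross-customer vote.
    votes = row.get("cross_customer_action", {}).get("sites_blocking", 0)
    return row.get("score", 0.0) >= min_score or votes >= min_sites_blocking


if __name__ == "__main__":
    items = [
        {"ja4": "t13d181300_e8a523a41297_69f017ebb96f",
         "classification": "known_bot", "bot_group": "Googlebot"},
        {"ja4": "t13d311100_e8f1e7e78f70_b6426fc6f187",
         "classification": "candidate", "score": 0.76,
         "cross_customer_action": {"sites_blocking": 12}},
    ]
    print([item["ja4"] for item in items if should_block(item)])
```

block_likely_on_this_site is deliberately left out of the decision, in line with its description as a sanity check rather than a recommendation.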

Closing the loop: report your edge actions

When your edge takes a decision on a JA4, post an edge_action outcome. That populates the cross_customer_action field on every other site's bot-ja4s response — your blocks become a signal for everyone else, and theirs become a signal for you.

POST /api/v1/ingest/Xy9KqZ7mNvB2/outcomes
curl -X POST https://urlcap.com/api/v1/ingest/Xy9KqZ7mNvB2/outcomes \
  -H "Authorization: Bearer $INGEST_TOKEN" \
  -H "Content-Type: application/x-ndjson" \
  --data-binary '{"request_id":"abc123","kind":"edge_action","verdict":"blocked","meta":{"rule":"urlcap-blocklist","rule_id":"v1"}}'

Send one outcome per edge decision. Auth is the site's ingest token (the same one used for /events), not your urlcap API key. request_id lets us auto-resolve the JA4 server-side from the original event; you don't have to ship the fingerprint inline.

post  URL monitors — up/down checks with alerts

/api/v1/monitors

Schedule a recurring /capture or /extract run and urlcap will alert you when the target changes state (up → down or down → up). Same primitives as UptimeRobot — plus full headless-browser checks, JSON-API validation, and the User-Agent personas from /user_agent_profiles.

Plan limits

  • Free: 1 monitor, minimum 300 s.
  • Developer: 25 monitors, minimum 60 s.
  • Startup: 100 monitors, minimum 30 s.
  • Business: unlimited, minimum 30 s.

Create a monitor

The spec object is shipped verbatim to the chosen engine, so anything you can do via /capture or /extract works here too — custom headers, POST bodies, JSON-content extractors, navigation actions, Web Bot Auth signing.

POST /api/v1/monitors
curl -X POST https://urlcap.com/api/v1/monitors \
  -H "X-API-Key: $URLCAP_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "prod API healthcheck",
    "kind": "capture",
    "spec": { "url": "https://api.example.com/healthz" },
    "intervalSeconds": 60,
    "expectedStatus": 200,
    "userAgentProfile": "chrome-latest-mac",
    "alertWebhookUrl": "https://example.com/webhooks/urlcap-monitor",
    "alertEmail": "oncall@example.com",
    "alertFailureThreshold": 2
  }'
201 Created
{
  "version": "1",
  "requestId": "…",
  "data": {
    "publicId": "Xy9KqZ7mNvB2",
    "name": "prod API healthcheck",
    "kind": "capture",
    "spec": { "url": "https://api.example.com/healthz" },
    "intervalSeconds": 60,
    "expectedStatus": 200,
    "userAgentProfile": "chrome-latest-mac",
    "alertWebhookUrl": "https://example.com/webhooks/urlcap-monitor",
    "alertWebhookSecret": "Tw3qHk…48charsBase62…M7",
    "alertWebhookSecretWarning": "Save this secret now — it signs outbound webhook payloads (X-urlcap-Signature: sha256=…) and is not retrievable later.",
    "alertEmail": "oncall@example.com",
    "alertFailureThreshold": 2,
    "paused": false,
    "currentState": "unknown",
    "createdAt": "2026-05-26T12:00:00Z"
  }
}

alertWebhookSecret is returned only on this create call. Subsequent reads omit it — you'll see alertWebhookSecretSet: true instead. The secret signs every outbound webhook with HMAC-SHA256 in X-urlcap-Signature; your verifier compares its own HMAC of the raw body to the header.
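
A verifier along those lines could look like the following sketch. The text above confirms HMAC-SHA256 over the raw body and the sha256= prefix; that the digest is hex-encoded and that the secret is used as UTF-8 bytes are assumptions, and verify_signature is an illustrative name.

```python
import hashlib
import hmac


def verify_signature(secret: str, raw_body: bytes, header_value: str) -> bool:
    """Check an X-urlcap-Signature header against the raw request body."""
    prefix = "sha256="
    if not header_value.startswith(prefix):
        return False
    provided = header_value[len(prefix):]
    # Hex encoding of the digest is assumed; the docs only show "sha256=…".
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("ascii"),
                               provided.encode("utf-8"))


if __name__ == "__main__":
    secret = "example-secret"
    body = b'{"event":"monitor.state_changed"}'
    header = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    print(verify_signature(secret, body, header))
```

Compute the HMAC over the bytes exactly as received, before any JSON parsing or re-serialisation, since a re-encoded body will generally not match.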

Pass rule (v1)

Status-code only. If expectedStatus is set, the check passes only when the response status matches exactly. If it's absent, any 2xx is a pass. Richer assertions (body-contains, JSONPath predicates) are on the roadmap.

Alerts

Alerts fire only on state transitions, not on every failing check. To debounce flapping targets, the state machine requires alertFailureThreshold consecutive failures before flipping to down. Both alertWebhookUrl and alertEmail are optional; set neither and the monitor still records check history but won't notify.
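
The pass rule and the debounce can be modelled in a few lines. This is a sketch of the behaviour as described, not urlcap's scheduler: the class and function names are illustrative, and how the initial unknown state is treated for alerting is not documented, so the sketch simply reports every state change and leaves that choice to the caller.

```python
def check_passed(status: int, expected_status=None) -> bool:
    """Pass rule (v1): exact match if expectedStatus is set, otherwise any 2xx."""
    if expected_status is not None:
        return status == expected_status
    return 200 <= status <= 299


class MonitorState:
    """Debounced up/down state driven by consecutive failures."""

    def __init__(self, failure_threshold: int = 1):
        self.failure_threshold = failure_threshold
        self.state = "unknown"
        self.consecutive_failures = 0

    def record(self, passed: bool):
        """Record one check; return (old, new) on a state change, else None."""
        if passed:
            self.consecutive_failures = 0
            new_state = "up"
        else:
            self.consecutive_failures += 1
            # Only flip to down once the threshold of consecutive failures is met.
            reached = self.consecutive_failures >= self.failure_threshold
            new_state = "down" if reached else self.state
        if new_state == self.state:
            return None
        old_state, self.state = self.state, new_state
        return (old_state, new_state)


if __name__ == "__main__":
    monitor = MonitorState(failure_threshold=2)
    for status in (200, 503, 503, 503, 200):
        print(status, monitor.record(check_passed(status, expected_status=200)))
```

With a threshold of 2, a single failed check between two passing ones produces no transition and therefore no alert.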

Webhook payload
{
  "event": "monitor.state_changed",
  "monitorPublicId": "Xy9KqZ7mNvB2",
  "monitorName": "prod API healthcheck",
  "newState": "down",
  "changedAt": "2026-05-26T12:34:56Z",
  "latestCheck": { "httpStatus": 503, "latencyMs": 421, "passed": false, "error": null }
}
// Headers: X-urlcap-Event, X-urlcap-Monitor, X-urlcap-Timestamp, X-urlcap-Signature: sha256=<HMAC>

Inspect a monitor

GET /api/v1/monitors/{publicId}
curl https://urlcap.com/api/v1/monitors/Xy9KqZ7mNvB2 -H "X-API-Key: $URLCAP_KEY"
curl "https://urlcap.com/api/v1/monitors/Xy9KqZ7mNvB2/checks?limit=20" -H "X-API-Key: $URLCAP_KEY"
curl "https://urlcap.com/api/v1/monitors/Xy9KqZ7mNvB2/uptime?days=30" -H "X-API-Key: $URLCAP_KEY"

Phase-level timings

Capture monitors record a phase breakdown on every check: dnsMs, connectMs (TCP + TLS), ttfbMs (time-to-first-byte), bodyMs (body download), plus resolvedIp (which A/AAAA the socket actually used). Phase fields are absent when their hook didn't fire — pooled keep-alive reuse skips DNS / connect, and followRedirects=true only captures the first leg. Same fields show up in data.response.timings on bare /capture too. Extract monitors don't have these (the HtmlUnit engine doesn't surface phase timing).

Lifecycle

  • PATCH /api/v1/monitors/{publicId} — whole-spec replace (partial updates not supported in v1).
  • POST /api/v1/monitors/{publicId}/pause — scheduler skips it; state is preserved.
  • POST /api/v1/monitors/{publicId}/resume — reverse the above.
  • DELETE /api/v1/monitors/{publicId} — hard-delete. Check history is removed by the daily sweeper.

Check history (monitor_checks rows) is kept for 30 days. A daily internal job at 03:05 UTC sweeps anything older.

get  Candidate IPs per site

/api/v1/sites/{site_id}/bot-ip-candidates

Live feed of IPs exhibiting abusive behaviour on your site under five composite signals. The IP equivalent of the JA4 candidate queue — small list by construction (score floor), recomputed on every call. Use it to populate an iptables / ipset / Cloudflare IP-list at the edge for IP-based blocking.

Scoring (weights sum to 1.0)

  • 40% block_ratio: share of this IP's requests on the site that returned 4xx or 444.
  • 20% volume: log10(1+reqs)/4, saturates at ~10,000 requests.
  • 15% path breadth: distinct_paths/100, saturates at 100 paths.
  • 15% JA4 churn: distinct_ja4s/3, saturates at 3 JA4s (anti-fingerprinting tell).
  • 10% vuln-probe hits: distinct vuln-probe paths hit, saturates at 3.
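
The weighted sum can be reproduced offline as in the sketch below. It assumes "saturates" means each component is clamped to 1.0 and that the vuln-probe component is hits/3 by analogy with JA4 churn; candidate_score is an illustrative name, not an API.

```python
import math


def candidate_score(block_ratio: float, reqs: int, distinct_paths: int,
                    distinct_ja4s: int, vuln_probe_hits: int) -> float:
    """Composite IP score from the five documented components (weights sum to 1.0)."""
    volume = min(1.0, math.log10(1 + reqs) / 4)   # saturates near 10,000 requests
    breadth = min(1.0, distinct_paths / 100)      # saturates at 100 paths
    churn = min(1.0, distinct_ja4s / 3)           # saturates at 3 JA4s
    probes = min(1.0, vuln_probe_hits / 3)        # saturates at 3 (form assumed)
    return (0.40 * block_ratio
            + 0.20 * volume
            + 0.15 * breadth
            + 0.15 * churn
            + 0.10 * probes)


if __name__ == "__main__":
    # 99 requests, half of them blocked, 50 distinct paths, no churn, no probes.
    print(round(candidate_score(0.5, 99, 50, 0, 0), 3))
```

Because block_ratio carries 40% of the weight, an IP with every other component saturated but nothing blocked tops out at 0.60, just above the default min_score of 0.50.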

Hard exclusions (no scoring, row dropped)

  • IP is on any user's trust list — signal modulation is global.
  • IP is already attributed in bot_observed_ips — no point re-discovering known bots. Use /bot-traffic for those.
  • reqs < 20 — insufficient evidence on this site within the window.

Request

window_days  int 1..30 · default 7
Sliding window for per-IP aggregation.
min_score  float 0..1 · default 0.50
Score floor. Default keeps the list small by construction; raise to ~0.70 for only the most obvious abusers, lower to ~0.30 for a wider net.
limit  int 1..1000 · default 200
Hard cap. Default + score floor together = "small list."
format  enum · default json
json returns the operator-rich shape below. txt returns one IP per line. cidr returns each IP as /32 (or /128). Both text formats use text/plain, so they are easy to feed into ipset.
GET /api/v1/sites/{site_id}/bot-ip-candidates
curl -G https://urlcap.com/api/v1/sites/2DrxGfsYW0jv/bot-ip-candidates \
  -H "Authorization: Bearer $URLCAP_KEY" \
  --data-urlencode "window_days=7" \
  --data-urlencode "min_score=0.50"
200 OK
{
  "version": "1",
  "requestId": "…",
  "data": {
    "siteId": 3, "windowDays": 7, "minScore": 0.5, "excludedClassified": 4954,
    "candidates": [
      {
        "ip": "203.0.113.5", "asn": 9009, "country": "VN", "score": 0.93,
        "components": {
          "block_ratio": 0.92, "reqs": 1850,
          "distinct_paths": 1452, "distinct_ja4s": 4, "vuln_probe_hits": 12
        },
        "first_seen": "2026-05-22 03:48:52.000",
        "last_seen":  "2026-05-22 04:03:07.000"
      }
    ]
  }
}

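For one-shot ipset feeding, note that ipset restore reads add commands rather than bare addresses, so prefix each line first: curl -s … &format=txt | sed 's/^/add SETNAME /' | ipset restore -exist (SETNAME is a placeholder for a set you have already created). For Cloudflare IP-list import: … &format=cidr | cf-cli ip-list update ....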
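
If the list can contain both IPv4 and IPv6 addresses, a little preprocessing helps, because an ipset hash:ip set holds one address family. The sketch below converts a format=txt body into ipset restore input; the set names urlcap_v4 and urlcap_v6 are hypothetical and must already exist on the host, and to_ipset_restore is an illustrative name.

```python
import ipaddress


def to_ipset_restore(txt_body: str, set_v4: str = "urlcap_v4",
                     set_v6: str = "urlcap_v6") -> str:
    """Turn one-IP-per-line text into 'add <set> <ip>' lines for ipset restore."""
    commands = []
    for raw in txt_body.splitlines():
        entry = raw.strip()
        if not entry:
            continue
        # Raises ValueError on anything that is not a valid address.
        addr = ipaddress.ip_address(entry)
        target = set_v4 if addr.version == 4 else set_v6
        commands.append(f"add {target} {addr}")
    return "".join(line + "\n" for line in commands)


if __name__ == "__main__":
    body = "203.0.113.5\n2001:db8::1\n"
    print(to_ipset_restore(body), end="")
```

Pipe the result into ipset restore -exist so that entries already present in the set do not abort the import.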

get  URL traffic + blocks summary

/api/v1/sites/{site_id}/url-stats?host=&path=&days=N

Quick "is this URL healthy?" report for a specific page on your site. One call returns totals, status-code breakdown, per-day chart, and up to 25 most recent non-2xx requests with their IP / ASN / JA4. Use it to answer "any rejections on URL X today?".

GET /api/v1/sites/{site_id}/url-stats
curl -G https://urlcap.com/api/v1/sites/2DrxGfsYW0jv/url-stats \
  -H "X-API-Key: $URLCAP_KEY" \
  --data-urlencode "host=en.example.com" \
  --data-urlencode "path=/terms-of-service.html" \
  --data-urlencode "days=7"
200 OK (truncated)
{
  "version": "1",
  "data": {
    "siteId": 3, "host": "en.example.com", "path": "/terms-of-service.html",
    "windowDays": 7,
    "total": 351, "ok_2xx": 341, "redirect_3xx": 2,
    "rejected_4xx": 8, "error_5xx": 0,
    "unique_ips": 122, "unique_ja4": 55,
    "by_status": [
      { "status": 200, "count": 341, "unique_ips": 122 },
      { "status": 405, "count": 4,   "unique_ips": 1 },
      { "status": 444, "count": 4,   "unique_ips": 1 }
    ],
    "by_day": [ /* one row per UTC day */ ],
    "rejected_sample": [
      { "ts": "2026-05-26 04:02:44.000", "status": 444, "ip": "::ffff:45.33.69.206",
        "asn": 63949, "method": "GET", "ja4": "" }
    ]
  }
}
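To turn this response into a single health signal, you can compute the share of non-successful requests on the client. The following is a minimal sketch under assumptions: the helper name rejection_summary and the choice to count 4xx and 5xx together are illustrative, and only the field names come from the example response above.

```python
def rejection_summary(response: dict) -> dict:
    """Summarise rejections and errors from a url-stats JSON response."""
    data = response["data"]
    total = data["total"]
    # Assumption: treat 4xx rejections and 5xx errors together as "not OK".
    non_ok = data["rejected_4xx"] + data["error_5xx"]
    return {
        "total": total,
        "non_ok": non_ok,
        # Guard against an empty window to avoid dividing by zero.
        "rate": non_ok / total if total else 0.0,
        # Status codes >= 400 that actually occurred, in ascending order.
        "statuses": sorted(
            row["status"] for row in data["by_status"] if row["status"] >= 400
        ),
    }


if __name__ == "__main__":
    # Trimmed copy of the example response above.
    sample = {
        "data": {
            "total": 351, "ok_2xx": 341, "redirect_3xx": 2,
            "rejected_4xx": 8, "error_5xx": 0,
            "by_status": [
                {"status": 200, "count": 341, "unique_ips": 122},
                {"status": 405, "count": 4, "unique_ips": 1},
                {"status": 444, "count": 4, "unique_ips": 1},
            ],
        }
    }
    summary = rejection_summary(sample)
    print(f"{summary['non_ok']} of {summary['total']} requests were not OK "
          f"({summary['rate']:.1%}); statuses seen: {summary['statuses']}")
```

A threshold on rate is a reasonable starting point for alerting; what counts as healthy for a given page is up to you.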

get  Per-bot accessibility check

/api/v1/sites/{id}/bot-traffic?bot_group=&days=N

Did traffic from a urlcap-discovered bot land successfully on your site? Given a bot_group name (substring match on bot_groups.description), this returns the group's recent visits, a status breakdown, a per-day chart, and a sample of recent requests with non-2xx responses surfaced first. Use it to answer "are we blocking Google?" in one call.

Bot identification matches against every JA4 that the discovery system has attributed to the bot_group via bot_observed_ja4s, so it catches both CIDR- and UA-matched traffic without enumerating IP lists. Optional host and path parameters narrow the lookup to a single page.

GET /api/v1/sites/{id}/bot-traffic
curl -G https://urlcap.com/api/v1/sites/2DrxGfsYW0jv/bot-traffic \
  -H "X-API-Key: $URLCAP_KEY" \
  --data-urlencode "bot_group=Googlebot" \
  --data-urlencode "days=1"
200 OK (truncated)
{
  "version": "1",
  "data": {
    "siteId": 3, "botGroupFilter": "Googlebot", "windowDays": 1,
    "matchedBotGroups": [ { "botGroupId": 4, "description": "Googlebot" } ],
    "ja4HashesUsedForFilter": 12,
    "total": 98831, "ok_2xx": 96603, "redirect_3xx": 1010,
    "rejected_4xx": 969, "error_5xx": 0,
    "by_status": [
      { "status": 200, "count": 96603 },
      { "status": 404, "count": 955 },
      { "status": 403, "count": 14 }
    ],
    "by_day": [ /* one row per UTC day */ ],
    "recent_sample": [
      { "ts": "...", "host": "en.example.com", "path": "/images/consumo.png", "status": 403, "ip": "..." }
    ]
  }
}

Substring matching: bot_group=Google matches both Googlebot and User-triggered fetchers (Google); pass the exact full description to narrow the result to one bot_group.
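Because the filter is a substring match, it helps to check matchedBotGroups before trusting the totals. The sketch below shows one way to do that and to count likely blocks. It is illustrative only: the helper name bot_access_report is hypothetical, and treating 403, 429, and 444 as "blocked" is an assumption rather than something the API defines.

```python
# Assumption: status codes interpreted as "the bot was blocked".
BLOCK_STATUSES = frozenset({403, 429, 444})


def bot_access_report(response: dict, block_statuses=BLOCK_STATUSES) -> dict:
    """Summarise a bot-traffic JSON response for an 'are we blocking X?' check."""
    data = response["data"]
    groups = [group["description"] for group in data["matchedBotGroups"]]
    blocked = sum(
        row["count"] for row in data["by_status"] if row["status"] in block_statuses
    )
    total = data["total"]
    return {
        "groups": groups,
        # More than one match means the substring filter mixed several bot_groups.
        "ambiguous": len(groups) > 1,
        "blocked": blocked,
        "blocked_share": blocked / total if total else 0.0,
    }


if __name__ == "__main__":
    # Trimmed copy of the example response above.
    sample = {
        "data": {
            "matchedBotGroups": [{"botGroupId": 4, "description": "Googlebot"}],
            "total": 98831,
            "by_status": [
                {"status": 200, "count": 96603},
                {"status": 404, "count": 955},
                {"status": 403, "count": 14},
            ],
        }
    }
    report = bot_access_report(sample)
    if report["ambiguous"]:
        print("Filter matched several bot_groups:", report["groups"])
    print(f"{report['blocked']} blocked requests for {report['groups']}")
```

If ambiguous is true, re-run the request with the exact full description so the totals describe a single bot_group.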

get  Legacy — /auth

/auth

The original TOTP endpoint, kept for backwards compatibility. It takes the same uri query parameter and the same X-API-Key header, but responds with the bare code as text/html — no JSON envelope, no metadata. Prefer /api/v1/totp for new integrations; this endpoint will not change.

Legacy request
curl -G https://urlcap.com/auth \
  -H "X-API-Key: $URLCAP_KEY" \
  --data-urlencode "uri=otpauth://totp/Acme:alice@acme.io?secret=JBSWY3DPEHPK3PXP"

492039

If the key is invalid or the URI can't be parsed, the legacy endpoint responds with 404 Not Found and an empty body.
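Since the legacy endpoint has no JSON envelope, a client has to rely on the status code and the bare body. A minimal sketch of that handling follows. The helper names build_legacy_url and parse_legacy_response are hypothetical, and the assumptions that a success is HTTP 200 and that the body holds only digits plus optional surrounding whitespace are mine, not documented behaviour.

```python
from urllib.parse import urlencode

LEGACY_URL = "https://urlcap.com/auth"


def build_legacy_url(otpauth_uri: str) -> str:
    """Build the legacy request URL with the otpauth URI percent-encoded."""
    return f"{LEGACY_URL}?{urlencode({'uri': otpauth_uri})}"


def parse_legacy_response(status: int, body: str) -> str:
    """Return the bare TOTP code, or raise ValueError on a failed call."""
    if status == 404:
        # The legacy endpoint signals both failure modes the same way.
        raise ValueError("invalid API key or unparseable otpauth URI")
    code = body.strip()
    # Assumption: success is HTTP 200 with a digits-only body.
    if status != 200 or not code.isdigit():
        raise ValueError(f"unexpected legacy response: HTTP {status}")
    return code


if __name__ == "__main__":
    url = build_legacy_url(
        "otpauth://totp/Acme:alice@acme.io?secret=JBSWY3DPEHPK3PXP"
    )
    print(url)
    # Send the request with any HTTP client, adding the X-API-Key header,
    # then pass the status code and body text to parse_legacy_response().
    print(parse_legacy_response(200, "492039"))
```

Because a 404 does not distinguish a bad key from a bad URI, checking both is the first debugging step when this raises.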


Need an API key, or want to talk through a use case? Email info@urlcap.com. Track changes in the changelog.