urlcap API
API reference
urlcap is a power tool for developers: a fast HTTP service that lets you craft and replay requests
with low-level control, and exposes purpose-built endpoints — like a TOTP code generator — over a clean,
versioned REST API. Everything here lives under https://urlcap.com/api/v1.
Introduction
The API is organised around predictable resource URLs, accepts standard HTTP, and returns JSON for
everything under /api/v1. It uses conventional HTTP verbs and status codes, and every
response carries a requestId you can quote when contacting support.
You authenticate with an API key sent in a request header. There are no SDKs to install — any HTTP
client works, and a machine-readable description is published at
/api/v1/openapi.yaml
(OpenAPI 3.1) so you can generate a client if you'd like.
Quickstart
Make your first call in under a minute. You'll need an API key — sign up for a free account, then create one on the API keys page.
1. Export your key:
export URLCAP_KEY="cb7c07df-…"
2. Call an endpoint with the X-API-Key header.
3. Read the JSON response; every call includes a requestId.
curl -G https://urlcap.com/api/v1/totp \
-H "X-API-Key: $URLCAP_KEY" \
--data-urlencode "uri=otpauth://totp/Acme:alice@acme.io?secret=JBSWY3DPEHPK3PXP&period=30&digits=6"
{
"version": "1",
"requestId": "9f1c0b7a-3e2d-4a51-9b88-2f6c1e7d4a02",
"data": {
"code": "492039",
"digits": 6,
"period": 30,
"algorithm": "SHA1",
"expiresIn": 14
}
}
Base URL & conventions
All versioned endpoints share a common base:
https://urlcap.com/api/v1
- Requests and responses under /api/v1 use JSON with UTF-8 encoding.
- Parameters may be sent as query-string values or, for POST, as application/x-www-form-urlencoded body fields. Always URL-encode values that contain reserved characters.
- Successful responses use 2xx (200 unless an endpoint documents 201 or 202). Client errors use 4xx; server errors use 5xx.
- Every response includes a top-level version field and, on success, a requestId.
The original, pre-versioned endpoint at https://urlcap.com/auth remains available and returns a plain-text code — see Legacy /auth.
Authentication
urlcap authenticates requests with an API key. Send it in the X-API-Key header on every
request. Keys are UUID-shaped strings; treat them like passwords — never embed one in client-side code
or commit it to a repository.
curl https://urlcap.com/api/v1/totp \
-H "X-API-Key: cb7c07df-588e-4ef8-ae42-458fe1e90fd0" \
--data-urlencode "uri=otpauth://totp/Acme:alice@acme.io?secret=JBSWY3DPEHPK3PXP"
Requests with a missing, malformed, or unknown key are rejected with 401 Unauthorized.
Expired keys are treated the same way. Authentication failures are recorded against the attempted key
for auditing.
{
"version": "1",
"error": {
"type": "unauthorized",
"message": "Missing or invalid X-API-Key header."
}
}
Making requests
Endpoints under /api/v1 accept GET for read-style calls. Where an endpoint
also accepts POST, parameters go in an application/x-www-form-urlencoded body,
which is handy when a value (such as an otpauth:// URI) is long or contains characters that are
awkward in a URL.
A note on +: in a query string, + decodes to a space. urlcap restores
+ in the uri parameter so secrets and labels survive round-tripping, but the
most robust approach is to always percent-encode (curl --data-urlencode,
encodeURIComponent, urllib.parse.quote, …).
curl -X POST https://urlcap.com/api/v1/totp \
-H "X-API-Key: $URLCAP_KEY" \
--data-urlencode "uri=otpauth://totp/Acme:alice@acme.io?secret=JBSWY3DPEHPK3PXP&algorithm=SHA256&digits=8&period=30"
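The + pitfall described above can be reproduced locally with Python's standard library. This is a minimal sketch; the otpauth:// label containing a literal + is made up for illustration, and the decoder shown is Python's generic form decoder, not urlcap's.

```python
from urllib.parse import parse_qs, quote, urlencode

# Hypothetical URI whose label contains a literal "+".
uri = "otpauth://totp/Acme:alice+qa@acme.io?secret=JBSWY3DPEHPK3PXP"

# Naive concatenation: a generic form decoder turns "+" into a space.
naive_query = "uri=" + uri
naive_decoded = parse_qs(naive_query)["uri"][0]

# Percent-encoding every reserved character keeps the value intact.
safe_query = "uri=" + quote(uri, safe="")
safe_decoded = parse_qs(safe_query)["uri"][0]

# urlencode() produces the same encoding for a whole parameter dict.
same_as_urlencode = urlencode({"uri": uri}) == safe_query

print(naive_decoded)
print(safe_decoded)
```

Libraries such as requests apply this encoding for you when values are passed via params= or data= rather than concatenated into the URL by hand.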
Errors
urlcap uses standard HTTP status codes. 2xx means success. 4xx means the
request was rejected (a missing parameter, a bad key, a malformed URI). 5xx means
something went wrong on our side — these are rare and safe to retry with backoff.
Error responses under /api/v1 have a consistent shape:
{
"version": "1",
"error": {
"type": "invalid_uri",
"message": "Could not parse the supplied otpauth:// URI: ..."
}
}
| Status | error.type | When it happens |
|---|---|---|
| 400 | invalid_request | A required parameter is missing or empty. |
| 400 | invalid_uri | The uri value is not a parseable otpauth:// URI. |
| 401 | unauthorized | The X-API-Key header is missing, malformed, unknown, or expired. |
| 404 | not_found | No endpoint matches the requested path under /api/v1. |
| 5xx | internal_error | An unexpected error on our side. Retry with exponential backoff. |
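Because the error envelope has one shape, a client can turn it into a typed exception in a single place. The sketch below is one way to do that; UrlcapError and parse_envelope are hypothetical names, not part of any urlcap library.

```python
import json


class UrlcapError(Exception):
    """Hypothetical exception carrying the fields of the error envelope."""

    def __init__(self, status: int, error_type: str, message: str):
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.error_type = error_type
        self.message = message


def parse_envelope(status: int, body: str) -> dict:
    """Return the payload on 2xx, raise UrlcapError otherwise."""
    payload = json.loads(body)
    if 200 <= status < 300:
        # Most endpoints wrap results in "data"; fall back to the whole
        # payload for responses that put fields at the top level.
        return payload.get("data", payload)
    error = payload.get("error", {})
    raise UrlcapError(status, error.get("type", "unknown"), error.get("message", ""))
```

Callers can then branch on error_type (for example invalid_uri versus unauthorized) instead of matching on message text.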
Rate limits
urlcap enforces two independent limits:
- Monthly quota — the request budget published for your plan on the pricing page. The Business plan's "Unlimited" tier has no monthly cap, but it is not uncapped throughput — see below.
- Fair-use per-second rate limits — per-API-key and per-IP burst caps scaled to your plan, to protect the platform from abuse and noisy neighbours. They apply to every plan, including unlimited tiers. They also apply to the no-key free trial (5 requests per 24 hours per IP / UA / fingerprint).
When you exceed either limit the API responds with 429 Too Many Requests; back off and retry.
Detailed per-plan QPS numbers and the accompanying response headers are published alongside the public
launch — until then, build assuming generous-but-finite throughput and add retry-with-backoff for
429 and 5xx.
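The retry-with-backoff advice above could look like the following sketch. The attempt count, base delay, and jitter are illustrative assumptions, not published urlcap limits; send is any callable you supply that performs one request and returns (status, body).

```python
import random
import time
from typing import Callable, Tuple


def is_retryable(status: int) -> bool:
    """429 and 5xx are worth retrying; other 4xx responses are not."""
    return status == 429 or 500 <= status <= 599


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if not is_retryable(status) or attempt == max_attempts - 1:
            return status, body
        # Exponential backoff (0.5s, 1s, 2s, ...) plus up to 50% jitter.
        delay = base_delay * (2 ** attempt)
        sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

Injecting sleep keeps the helper easy to test; in production the default time.sleep is used.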
Anonymous free trial
A small allow-list of endpoints can be hit without an API key, with a strict per-day budget — useful for
demos, tinkering, and tutorials. After the budget is spent the endpoint returns
402 anon_limit_reached with a sign-up link.
- Allow-listed endpoints: /api/v1/capture, /totp, /is_bot, /ip/contains, /ip/lookup.
- Budget: 5 requests per rolling 24h, counted independently on three identity dimensions: IP, User-Agent, and the optional X-Client-Fingerprint header set by the in-browser try-it widget. If any dimension hits the cap, the request is blocked.
- Response headers on allowed calls: X-RateLimit-Limit / X-RateLimit-Remaining.
- Block response: {"error":{"type":"anon_limit_reached","signup_url":"/register","limit":5,"window":"24h"}}.
Heavy or stateful endpoints (/extract, /datasets, /schedules) are
not on the allow-list and continue to require a valid API key.
Versioning
The API is versioned in the URL path. The current version is v1:
https://urlcap.com/api/v1. Backwards-incompatible changes — removing a field, changing a
type, renaming a parameter — ship under a new path segment (/api/v2); v1 keeps
working. Additive changes (new optional parameters, new fields in a response, new endpoints) can appear
within v1, so write clients that tolerate unknown fields.
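One way to tolerate unknown fields is to copy only the keys your client declares. The sketch below does this for the TOTP code object; issuedAt is an invented field standing in for a future additive change.

```python
from dataclasses import dataclass, fields


@dataclass
class TotpCode:
    code: str
    digits: int
    period: int
    algorithm: str
    expiresIn: int


def from_known_fields(cls, raw: dict):
    """Build cls from raw, silently dropping keys the class does not declare."""
    known = {f.name for f in fields(cls)}
    return cls(**{key: value for key, value in raw.items() if key in known})


raw = {
    "code": "492039", "digits": 6, "period": 30,
    "algorithm": "SHA1", "expiresIn": 14,
    "issuedAt": "2026-05-11T13:00:00Z",  # hypothetical field added later
}
totp = from_known_fields(TotpCode, raw)
print(totp.code, totp.expiresIn)
```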
The legacy endpoint at https://urlcap.com/auth predates the versioning scheme. It is frozen:
it will keep its current plain-text behaviour indefinitely, but new functionality only lands under
/api/v{n}.
Pricing
urlcap is a paid API with usage-based pricing — you pay for the requests you make, with a free tier to build and prototype on. Sign up for a free account and create your first API key on the API keys page; paid tiers (Developer / Startup / Business) are managed from Billing.
The capture object
A successful call to the capture endpoint returns an envelope whose data field describes
the request urlcap sent and the response it got back — parsed the way a browser's network inspector
would show it. The shape is:
{
"request": {
"url": "https://example.com/path?q=1", "method": "GET", "httpVersion": "HTTP/1.1",
"scheme": "https", "host": "example.com", "port": 443, "path": "/path", "query": "q=1",
"queryParameters": [ { "name": "q", "value": "1" } ],
"headers": [ { "name": "User-Agent", "value": "..." }, { "name": "Accept", "value": "*/*" } ],
"followRedirects": false, "body": "", "bodyEncoding": "UTF-8", "technology": "reactor-netty"
},
"response": {
"status": 200, "statusText": "OK", "httpVersion": "HTTP/1.1",
"headers": [ { "name": "Date", "value": "..." }, { "name": "Content-Type", "value": "text/html" } ],
"cookies": [ { "name": "sid", "value": "abc", "path": "/", "domain": ".example.com", "secure": true, "httpOnly": true, "sameSite": "Lax" } ],
"contentType": "text/html", "charset": "utf-8", "contentLength": 1256,
"bodyBytes": 1256, "bodyTruncated": false, "body": "...",
"bodyEncoding": "UTF-8",
"timings": {
"totalMs": 84,
"dnsMs": 12,
"connectMs": 35,
"requestSendMs": 1,
"ttfbMs": 28,
"bodyMs": 7,
"resolvedIp": "203.0.113.42"
}
}
}
- request.headers: the headers urlcap actually sent, in order. If you supply none, it adds a default User-Agent and Accept; if you supply any, it sends exactly what you give. The runtime may add Host / Content-Length on top.
- response.headers: every response header in the exact order received (duplicates preserved).
- response.cookies: each Set-Cookie header parsed into name/value plus attributes.
- response.body: the response body decoded with bodyEncoding. Bodies over ~1 MB are truncated in the response (bodyTruncated=true); the full body is still recorded server-side.
- response.timings: wall-clock totalMs plus a phase breakdown: dnsMs (DNS resolution + scheduling), connectMs (TCP + TLS), requestSendMs, ttfbMs (time-to-first-byte), bodyMs (body download), and resolvedIp (the A/AAAA we landed on). Phase fields are absent when their hook didn't fire, e.g. a pooled keep-alive reuse skips DNS / connect, and followRedirects=true captures only the first leg's timings.
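Since headers arrive as an ordered list that may contain duplicates, and timing phases may be absent, a client needs slightly more care than a plain dict lookup. A minimal sketch, with helper names of our own choosing:

```python
from typing import Dict, List

PHASES = ("dnsMs", "connectMs", "requestSendMs", "ttfbMs", "bodyMs")


def header_values(headers: List[Dict[str, str]], name: str) -> List[str]:
    """All values for a header name, case-insensitively, in received order."""
    wanted = name.lower()
    return [h["value"] for h in headers if h["name"].lower() == wanted]


def phase_breakdown(timings: dict) -> Dict[str, int]:
    """Only the phases present in this capture; absent phases are skipped, not zeroed."""
    return {phase: timings[phase] for phase in PHASES if phase in timings}
```

With the timings from the example object, the phases sum to 83 ms against a totalMs of 84, so the breakdown should not be expected to add up exactly to the wall-clock total.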
get post Capture a request
/api/v1/capture
Sends an HTTP request to a target URL and returns its response as a capture object.
Use GET with a url query parameter for a quick fetch, or POST a JSON
body for full control — including the exact order of request headers.
GET — quick fetch
curl -G https://urlcap.com/api/v1/capture \
-H "X-API-Key: $URLCAP_KEY" \
--data-urlencode "url=https://example.com/path?q=1"
Optional query parameters: method, followRedirects (true/false), timeoutMs.
POST — full control
Send Content-Type: application/json with a body of the following shape (only url is required):
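- url (required): the target URL (http or https). Query string and port are honoured.
- method: GET (default), POST, PUT, DELETE, HEAD, OPTIONS, PATCH.
- headers: an ordered list of { "name": …, "value": … } objects, sent exactly as given. If omitted, urlcap adds a default User-Agent/Accept.
- body: the request body, as a string.
- bodyEncoding: the character encoding of body. Defaults to UTF-8.
- followRedirects: whether to follow 3xx responses. Defaults to false, so by default you see the redirect itself.
- timeoutMs: defaults to 10000; clamped to 1000–30000.
- webBotAuth: defaults to false. Sign the outbound request with urlcap's Web Bot Auth signature (Ed25519). Adds Signature-Agent, Signature-Input and Signature headers. Target sites verify against the JWKS at /.well-known/http-message-signatures-directory.
Headers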
Returns
A 200 response whose data is a capture object, plus a requestId. Note that 200 means urlcap reached the target — the target's own status is in data.response.status. On error: the error envelope with 400 (invalid_request) for a bad/missing URL, 401 (unauthorized), or 502 (upstream_error) when the target couldn't be reached.
curl -X POST https://urlcap.com/api/v1/capture \
-H "X-API-Key: $URLCAP_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://httpbin.org/post?x=1",
"method": "POST",
"headers": [
{ "name": "User-Agent", "value": "my-app/1.0" },
{ "name": "X-Trace-Id", "value": "abc-123" },
{ "name": "Content-Type", "value": "application/json" }
],
"body": "{\"hello\":\"world\"}",
"followRedirects": false,
"timeoutMs": 10000
}'
const res = await fetch("https://urlcap.com/api/v1/capture", {
method: "POST",
headers: { "X-API-Key": process.env.URLCAP_KEY, "Content-Type": "application/json" },
body: JSON.stringify({
url: "https://example.com",
headers: [
{ name: "User-Agent", value: "my-app/1.0" },
{ name: "Accept", value: "text/html" }
],
followRedirects: true
})
});
const { data } = await res.json();
console.log(data.response.status, data.response.headers.map(h => h.name));
get User-Agent profiles — identify as Chrome / Firefox / Safari / …
/api/v1/user_agent_profiles
By default /capture and
/extract identify as
urlcap/1.0 (+https://urlcap.com/bot). Two knobs
let callers identify as something specific:
- userAgent: a raw UA string. Wins over the profile.
- userAgentProfile: a key into the catalogue. For /extract this also selects the HtmlUnit BrowserVersion (Chrome / Firefox / Edge) so JS-fingerprinted targets see a coherent (engine, UA) pairing.
The catalogue is operator-managed in the user_agent_profiles
MySQL table. Hit this endpoint to discover the available keys:
curl -s https://urlcap.com/api/v1/user_agent_profiles \
-H "X-API-Key: $URLCAP_KEY"
{
"version": "1",
"data": {
"profiles": [
{
"key": "chrome-latest-mac",
"description": "Chrome 131 on macOS",
"userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
"browserEngine": "chrome",
"extraHeaders": {
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.9",
"sec-ch-ua": "\"Chromium\";v=\"131\", \"Not_A Brand\";v=\"24\", \"Google Chrome\";v=\"131\"",
"sec-ch-ua-mobile": "?0",
"sec-ch-ua-platform": "\"macOS\""
}
},
{ "key": "firefox-latest-win", "...": "..." },
{ "key": "edge-latest-win", "...": "..." },
{ "key": "safari-latest-mac", "...": "..." },
{ "key": "googlebot", "...": "..." },
{ "key": "urlcap", "...": "..." }
]
}
}
curl -X POST https://urlcap.com/api/v1/capture \
-H "X-API-Key: $URLCAP_KEY" \
-H "Content-Type: application/json" \
-d '{ "url": "https://example.com", "userAgentProfile": "chrome-latest-mac" }'
Resolution order on /capture: explicit
headers[] wins (you're in full-control mode and we don't second-guess) →
userAgent →
userAgentProfile → the system default.
On /extract, only the profile selects the JS engine —
if you supply userAgent alone the engine stays Chromium.
Coherence caveat: the safari-latest-mac
profile ships a Safari UA on the Chromium engine because HtmlUnit doesn't include a Safari engine —
targets that fingerprint navigator.vendor /
navigator.platform will detect the mismatch.
The googlebot profile only sets the UA string —
urlcap is not Googlebot and reverse-DNS / published-CIDR checks at the target side will reject.
Use webBotAuth=true
for cryptographic urlcap attribution.
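The resolution order on /capture can be sketched as a small function. This is our reading of the rules above, not urlcap's implementation: in particular, returning None when headers[] is supplied without a User-Agent, and falling back to the default for an unknown profile key, are assumptions.

```python
from typing import Dict, List, Optional

DEFAULT_UA = "urlcap/1.0 (+https://urlcap.com/bot)"


def resolve_user_agent(
    headers: Optional[List[Dict[str, str]]],
    user_agent: Optional[str],
    profile_key: Optional[str],
    catalogue: Dict[str, str],
) -> Optional[str]:
    """Pick the User-Agent: headers[] > userAgent > userAgentProfile > default."""
    if headers:
        # Full-control mode: only what the caller listed is sent.
        for header in headers:
            if header["name"].lower() == "user-agent":
                return header["value"]
        return None  # assumption: nothing is added on the caller's behalf
    if user_agent:
        return user_agent
    if profile_key in catalogue:
        return catalogue[profile_key]
    return DEFAULT_UA  # assumption: unknown keys fall back to the default
```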
Extract — navigation & information retrieval
The extract tool runs a small "recipe" against a web page using a headless browser
engine: it loads a URL, optionally performs a sequence of actions (typing into fields, clicking,
waiting, navigating), then pulls out the data you describe with CSS or XPath selectors — optionally
walking through paginated results. It also handles JSON responses (a REST API, say):
point it at a URL that returns JSON and pull out values with JSONPath.
Because a job can take a while, it runs asynchronously: you submit a job model, get a
taskId back, and poll for completion.
The underlying browser engine is an implementation detail and may change; the job model and the result shape below are the contract.
The job model
A job is a JSON object:
{
"search_id": "my-job-1", // optional: a correlation id you choose; echoed back in the result
"url": "https://example.com/search", // required: the page to load
"content": "json", // optional: "json" forces the response to be processed as JSON (see below)
"actions": [ /* steps performed before extraction — see below */ ],
"extractors": [ /* what to pull from the final page — see below */ ],
"pagination": { /* optional: walk through multiple pages — see below */ }
}
- search_id: optional correlation id you choose; echoed back in the result as search_id so you can correlate jobs.
- url (required): the page to load.
- content: set to json to process the response body as JSON regardless of its Content-Type. If omitted, the engine auto-detects: an HTML response is treated as a page, anything else (application/json, text/plain, …) as JSON. See JSON content.
- webBotAuth: defaults to false. Sign every outbound request the headless browser makes (main document, scripts, XHR, …) with urlcap's Web Bot Auth signature (Ed25519). Target sites verify against the JWKS at /.well-known/http-message-signatures-directory.
Selectors
Every selector field accepts a CSS selector by default, or an XPath expression with an xpath: prefix. You may also write css: explicitly. When the response is JSON content, selector is a JSONPath expression instead (the jsonpath: prefix is accepted but optional there):
"#numResultados" // CSS (no prefix)
"css:.results a.title" // CSS (explicit)
"xpath://a[starts-with(@href,'item.php?')]" // XPath
"$.store.book[*].title" // JSONPath (only for JSON content)
"jsonpath:$..price" // JSONPath (explicit)
Actions
Each entry in actions is performed in order before the extractors run. An action object has a type and the fields that type needs:
| type | Other fields | Effect |
|---|---|---|
| fill | selector, value | Sets the value of the matched input/textarea (or the value attribute otherwise). |
| select | selector, value | Selects the option with that value in the matched <select>. |
| click | selector | Clicks the matched element, then waits briefly for background JavaScript. |
| wait | ms | Waits ms milliseconds (default 1000) for background JavaScript to settle. |
| navigate | url | Loads a different URL. |
"actions": [
{ "type": "fill", "selector": "#q", "value": "widgets" },
{ "type": "select", "selector": "#category", "value": "hardware" },
{ "type": "click", "selector": "css:button[type=submit]" },
{ "type": "wait", "ms": 2000 }
]
Extractors
Each entry in extractors produces one top-level field in the result, named by its name. The type decides what is produced:
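- text: the text of the first element matching selector.
- attr: the value of the attr attribute on the first matching element.
- list: the text (or, if attr is given, the attribute) of every matching element.
- items: one object per matching element, built from fields, evaluated relative to that element.
A fields entry (used by items) has name, selector, optional attr, and type (text or attr).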
"extractors": [
{ "name": "total", "selector": "#numResults", "type": "text" },
{ "name": "results", "selector": "css:.result", "type": "items",
"fields": [
{ "name": "title", "selector": "a.title", "type": "text" },
{ "name": "href", "selector": "a.title", "type": "attr", "attr": "href" },
{ "name": "price", "selector": ".price", "type": "text" }
]
}
]
Every object inside a list/items array is automatically stamped with result_global_id (a counter across the whole job), result_relative_id (a counter within its page), and result_page (the 0-based page index it came from). (Stamping applies to HTML pages only — JSON results are returned verbatim.)
For JSON content the same extractor shapes apply, but selector is a JSONPath expression and the types are value (the default), list and items — see below.
Pagination
If pagination is present, the job visits multiple pages and runs per_page_extractors on each; the per-page results appear under a pages array in the result. There are two strategies:
- sequential (default): repeatedly click next_selector; or link_tour: visit every distinct pagination link found by link_selector exactly once (handles AJAX paginators whose whole bar re-renders each page).
- next_selector: … (sequential strategy).
- link_selector: … (link_tour strategy).
- … 10.
- … 1000.
- … true (default), stop quietly when the next-page element is gone; if false, the job fails.
JSON content
When the page you load returns JSON — a REST API endpoint, for example — the extract tool parses the
response body and runs your extractors with JSONPath instead of CSS/XPath.
This happens automatically when the response isn't HTML (application/json,
text/plain, anything that isn't text/html);
to force it (e.g. an API that mislabels its Content-Type as
text/html), set "content": "json" in the job model.
In JSON mode each extractor's selector is a JSONPath expression (the jsonpath: prefix is
accepted but optional; an expression that doesn't start with $ gets one prepended, so
store.book[0].title works too). actions and
pagination don't apply and are ignored. The type decides what each extractor produces:
- value: the matched value (null if nothing matched). A path that selects more than one node (uses .., [*], a filter or a slice) yields an array of all matches. This is the default when type is omitted.
- list: an array of every match.
- items: one object per matched node, built from fields, evaluated relative to that node; a field's selector is a JSONPath where $ is the node (an empty selector means the node itself).
The attr type (and a field's attr) is for HTML only; using it on JSON content fails the job.
// for a URL returning: { "page": 1, "total": 128,
// "results": [ { "id": 1, "name": "Widget A", "price": 9.99 },
// { "id": 2, "name": "Widget B", "price": 12 } ] }
{
"url": "https://api.example.com/products?q=widget",
"content": "json", // optional — auto-detected for application/json
"extractors": [
{ "name": "total", "selector": "$.total" },
{ "name": "names", "selector": "$.results[*].name", "type": "list" },
{ "name": "cheap", "selector": "$.results[?(@.price < 10)]", "type": "items",
"fields": [
{ "name": "id", "selector": "$.id" },
{ "name": "name", "selector": "$.name" },
{ "name": "price", "selector": "$.price" }
]
}
]
}
// → result:
{
"total": 128,
"names": [ "Widget A", "Widget B" ],
"cheap": [ { "id": 1, "name": "Widget A", "price": 9.99 } ]
}
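The mode detection and selector prefixes described in this section can be mirrored client-side, for example to validate a job before submitting it. A minimal sketch under the rules stated above; the function names are ours, and the engine's real detection may consider more than the media type.

```python
from typing import Optional, Tuple


def uses_json_mode(content_type: str, content: Optional[str] = None) -> bool:
    """True when a job would be processed as JSON rather than as an HTML page."""
    if content == "json":
        return True  # explicit override in the job model
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type != "text/html"


def parse_selector(selector: str, json_mode: bool = False) -> Tuple[str, str]:
    """Split a selector into (kind, expression) using the documented prefixes."""
    for kind in ("css", "xpath", "jsonpath"):
        prefix = kind + ":"
        if selector.startswith(prefix):
            return kind, selector[len(prefix):]
    # No prefix: CSS for pages, JSONPath for JSON content.
    return ("jsonpath" if json_mode else "css"), selector
```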
post Submit a job
/api/v1/extract
Send the job model as a JSON body. The job is queued and you get a taskId immediately (status 202). Poll GET /api/v1/extract/{taskId} for progress.
curl -X POST https://urlcap.com/api/v1/extract \
-H "X-API-Key: $URLCAP_KEY" \
-H "Content-Type: application/json" \
-d '{
"search_id": "demo-1",
"url": "https://example.com",
"extractors": [ { "name": "heading", "selector": "css:h1", "type": "text" } ]
}'
{
"version": "1",
"taskId": "2b9a0c3e-7d11-4f44-9a8c-2c1d4e5f6a7b",
"status": "pending",
"statusUrl": "/api/v1/extract/2b9a0c3e-7d11-4f44-9a8c-2c1d4e5f6a7b"
}
If too many jobs are queued you get 503 (service_busy) — retry shortly. 400 (invalid_request) means the model couldn't be parsed or is missing url.
get Task status
/api/v1/extract/{taskId}
Returns the task's current state. status is one of pending, running, succeeded, failed. When succeeded, a result object is included; when failed, an error object. httpRequestCount is how many HTTP requests the engine has performed for this job so far.
You only see your own tasks; an unknown id (or one belonging to another key) returns 404. GET /api/v1/extract (no id) lists your recent tasks.
curl https://urlcap.com/api/v1/extract/2b9a0c3e-7d11-4f44-9a8c-2c1d4e5f6a7b \
-H "X-API-Key: $URLCAP_KEY"
{
"version": "1",
"taskId": "2b9a0c3e-7d11-4f44-9a8c-2c1d4e5f6a7b",
"status": "succeeded",
"url": "https://example.com",
"httpRequestCount": 1,
"createdAt": "2026-05-11T13:00:00.000Z",
"startedAt": "2026-05-11T13:00:00.100Z",
"finishedAt": "2026-05-11T13:00:01.900Z",
"result": {
"search_id": "demo-1",
"heading": "Example Domain"
}
}
For a job with an items extractor, the result looks like:
"result": {
"search_id": "demo-1",
"total": "128 results",
"results": [
{ "result_global_id": 1, "result_relative_id": 1, "result_page": 0, "title": "Widget A", "href": "/item?id=1", "price": "9.99" },
{ "result_global_id": 2, "result_relative_id": 2, "result_page": 0, "title": "Widget B", "href": "/item?id=2", "price": "12.00" }
]
}
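The submit-then-poll flow can be wrapped in a small helper. In this sketch fetch_status is a callable you provide (for example a function that GETs /api/v1/extract/{taskId} and returns the parsed JSON); the interval and timeout defaults are assumptions, not documented limits.

```python
import time
from typing import Callable

TERMINAL_STATES = {"succeeded", "failed"}


def wait_for_task(
    fetch_status: Callable[[], dict],
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the task reaches succeeded or failed, or the timeout passes."""
    deadline = clock() + timeout
    while True:
        task = fetch_status()
        if task["status"] in TERMINAL_STATES:
            return task
        if clock() >= deadline:
            raise TimeoutError(f"task still {task['status']} after {timeout}s")
        sleep(interval)
```

The caller inspects the returned task: result is present when status is succeeded, error when it is failed.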
IP & CIDR
Work with IPv4 and IPv6 addresses and CIDR ranges: check whether an address falls inside a range, keep a list of named ranges (allow/block lists, ASN or geo blocks, your own networks, …) and ask which of them contain a given address.
Stored ranges live in an optimised table: every address is kept as a 16-byte value (IPv4 is stored
IPv4-mapped, ::ffff:a.b.c.d, so v4 and v6 share one comparable key space), each range as
its first and last address plus a prefix length, and a B-tree index on those bounds turns
"which ranges contain this address?" into an index range scan.
get post Is an address in a CIDR?
/api/v1/ip/contains
A pure calculation — does ip fall within cidr? (Different address families ⇒ false.) A single host can be written with or without /32 · /128.
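- ip: the address to test.
- cidr: a CIDR range (e.g. 10.0.0.0/8) or a single address. Required unless you pass cidrs.
- cidrs: a list of ranges to test in one call; the response then carries a results array.
curl -G https://urlcap.com/api/v1/ip/contains \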
-H "X-API-Key: $URLCAP_KEY" \
--data-urlencode "ip=10.20.30.40" \
--data-urlencode "cidr=10.0.0.0/8"
{
"version": "1",
"requestId": "…",
"data": {
"ip": "10.20.30.40",
"cidr": "10.0.0.0/8",
"contains": true,
"range": {
"cidr": "10.0.0.0/8",
"family": 4,
"prefixLength": 8,
"networkAddress": "10.0.0.0",
"lastAddress": "10.255.255.255"
}
}
}
Batch form: POST /api/v1/ip/contains with { "ip": "10.20.30.40", "cidrs": ["10.0.0.0/8", "192.168.0.0/16", "2001:db8::/32"] } → data.results is [ { "cidr": "10.0.0.0/8", "contains": true }, … ].
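For comparison, the same membership calculation can be done locally with Python's ipaddress module. This sketch follows the rules stated above (different families give false; a bare address is a single-host range) but is not urlcap's code.

```python
import ipaddress
from typing import Dict, List, Union


def cidr_contains(ip: str, cidr: str) -> bool:
    """Does ip fall within cidr? Different address families give False."""
    address = ipaddress.ip_address(ip)
    # strict=False accepts host bits; a bare address becomes /32 or /128.
    network = ipaddress.ip_network(cidr, strict=False)
    if address.version != network.version:
        return False
    return address in network


def batch_contains(ip: str, cidrs: List[str]) -> List[Dict[str, Union[str, bool]]]:
    """Mirror the shape of data.results in the batch form."""
    return [{"cidr": cidr, "contains": cidr_contains(ip, cidr)} for cidr in cidrs]


print(batch_contains("10.20.30.40", ["10.0.0.0/8", "192.168.0.0/16", "2001:db8::/32"]))
```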
get post Which stored ranges contain an address?
/api/v1/ip/lookup
Looks up every range in your stored set (see below) that contains ip, most-specific first.
- ip: the address to look up, as a query parameter (or POST it as a JSON body: { "ip": "…" }).
curl -G https://urlcap.com/api/v1/ip/lookup \
-H "X-API-Key: $URLCAP_KEY" \
--data-urlencode "ip=10.20.30.40"
{
"version": "1",
"requestId": "…",
"data": {
"ip": "10.20.30.40",
"family": 4,
"matchCount": 2,
"matches": [
{ "id": 7, "cidr": "10.20.0.0/16", "family": 4, "prefixLength": 16, "label": "office-lan" },
{ "id": 3, "cidr": "10.0.0.0/8", "family": 4, "prefixLength": 8, "label": "rfc1918" }
]
}
}
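The storage model described at the top of this section (16-byte keys, first/last bounds, most-specific match first) can be illustrated in memory. In this sketch a linear scan stands in for the B-tree index, and RangeStore is a hypothetical name.

```python
import ipaddress
from typing import Dict, List, Tuple


def to_key(ip) -> bytes:
    """16-byte key; IPv4 is stored IPv4-mapped (::ffff:a.b.c.d)."""
    if ip.version == 4:
        return bytes(10) + b"\xff\xff" + ip.packed
    return ip.packed


class RangeStore:
    def __init__(self) -> None:
        # Each row: (first_key, last_key, prefix_length, cidr, label)
        self.rows: List[Tuple[bytes, bytes, int, str, str]] = []

    def add(self, cidr: str, label: str) -> None:
        net = ipaddress.ip_network(cidr, strict=False)  # canonicalises host bits
        self.rows.append((to_key(net.network_address),
                          to_key(net.broadcast_address),
                          net.prefixlen, str(net), label))

    def lookup(self, address: str) -> List[Dict[str, object]]:
        key = to_key(ipaddress.ip_address(address))
        hits = [row for row in self.rows if row[0] <= key <= row[1]]
        hits.sort(key=lambda row: row[2], reverse=True)  # most specific first
        return [{"cidr": r[3], "prefixLength": r[2], "label": r[4]} for r in hits]
```

Because the keys are fixed-width byte strings, plain byte comparison orders addresses correctly, which is what lets a range index answer the containment query.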
get post del Manage stored ranges
/api/v1/ip/ranges · /api/v1/ip/ranges/{id}
- GET /api/v1/ip/ranges: list your stored ranges (newest first): id, cidr, family, prefixLength, label, createdAt.
- POST /api/v1/ip/ranges with { "cidr": "10.20.0.0/16", "label": "office-lan" }: adds the range (the cidr is canonicalised on the way in). If that CIDR is already stored its label is updated. Returns 201 with the row.
- DELETE /api/v1/ip/ranges/{id}: removes a stored range. 404 if there's no such id.
curl -X POST https://urlcap.com/api/v1/ip/ranges \
-H "X-API-Key: $URLCAP_KEY" \
-H "Content-Type: application/json" \
-d '{ "cidr": "2001:db8::/32", "label": "documentation-prefix" }'
{
"version": "1",
"requestId": "…",
"data": {
"id": 12,
"cidr": "2001:db8::/32",
"family": 6,
"prefixLength": 32,
"networkAddress": "2001:db8::",
"lastAddress": "2001:db8:ffff:ffff:ffff:ffff:ffff:ffff",
"label": "documentation-prefix"
}
}
get Full IP intelligence
/api/v1/ip/intelligence
One request, every signal urlcap has on an IP. Composes the static lookups
(CIDR membership, GeoIP, reverse DNS, bot registry,
trust list) with the behavioural data we've aggregated from
ingested traffic: which JA4 fingerprints the IP has used,
which user agents, what sites it hits, the status-code mix it gets back, vulnerability-
probe hits, and whether other customers have voted on its JA4s via
edge_action.
Use it as the investigation pane for a single IP — e.g., "tell me everything about the
IP that just showed up in my logs." For inline blocklist hot paths, stay on
/api/v1/ip/contains
and /api/v1/ja4/intelligence — they're optimised for membership checks; this one
composes ~7 sub-queries per call.
- ip: the address to investigate.
- Look-back window (reported as windowDays in the response): … request_events TTL. Static sections (geo, rDNS, bot registry, trust list, bot observations) ignore this.
curl -s "https://urlcap.com/api/v1/ip/intelligence?ip=216.73.217.19" \
-H "X-API-Key: $URLCAP_KEY"
{
"version": "1",
"data": {
"ip": "216.73.217.19",
"family": 4,
"windowDays": 7,
"geo": {
"countryCode": "US", "countryName": "United States",
"city": "Columbus", "asn": 16509,
"asnOrganization": "Amazon.com, Inc."
},
"reverseDns": { "names": [], "ttlSeconds": 300, "error": "no PTR record" },
"isBot": {
"matched": true,
"matches": [
{ "botGroup": "Claude bot", "botGroupId": 10, "cidr": "216.73.216.0/22" }
]
},
"trustList": { "trusted": false, "byUsers": 0 },
"botObservations": [
{ "botGroup": "Claude bot", "source": "cidr_match",
"observations": 439677, "firstSeen": "…", "lastSeen": "…" }
],
"behaviour": {
"totalRequests": 29179,
"distinctJa4s": 1, "distinctUserAgents": 1, "distinctSites": 1,
"firstSeen": "…", "lastSeen": "…",
"statusMix": { "count2xx": 28912, "count3xx": 267, "count4xx": 0,
"count5xx": 0, "count403": 0, "count444": 0 },
"assetRatio": 0.0,
"vulnProbeHits": 0, "vulnProbesUnique": 0,
"blockRatio": 0.0
},
"ja4s": [
{ "ja4": "t13d1011h2_61a7ad8aa9b6_867a6ff6dde3",
"ja4Hash": "7561205223071800741",
"requestCount": 29179,
"classification": "known_bot",
"botGroup": "OAI-SearchBot" }
],
"userAgents": [
{ "userAgent": "Mozilla/5.0 AppleWebKit/537.36 (…; compatible; Claude-SearchBot/1.0; +searchbot@anthropic.com)",
"requestCount": 29179 }
],
"crossCustomerAction": null
}
}
Reading the response
- isBot.matches: every published-CIDR registry hit. Operator-grade attribution: Googlebot, Bingbot, GPTBot, ClaudeBot, etc. CIDRs refreshed daily from operators' own JSON.
- botObservations: every {bot_group, IP} attribution we've made internally. source tells you how: cidr_match (registry), ua_match (UA self-identification), vuln_match (≥3 vuln-probe paths), manual (admin override).
- trustList.byUsers: how many distinct urlcap accounts have whitelisted this IP. "12 customers trust this IP" is a strong "do not block" vote.
- behaviour.statusMix: what response codes this IP gets back across all sites. High 403/444 ratio means edges are already blocking it.
- behaviour.assetRatio: fraction of requests that fetched images/CSS/JS/fonts. Browsers ≈ 0.5-0.9; bots ≈ 0.
- ja4s[].classification: per fingerprint: known_bot (attributed in bot_observed_ja4s), candidate (pending review, includes score), or unknown.
- crossCustomerAction: when other customers have submitted edge_action outcomes for any JA4 this IP has used, the headcounts surface here. Highest-confidence label we publish.
Heads up: the per-IP attribution from botObservations and the per-JA4 attribution from ja4s[].botGroup can disagree. In the example above, the IP belongs to Anthropic (Claude bot CIDR), while the JA4 fingerprint is currently attributed to OAI-SearchBot because Anthropic and OpenAI's HTTP clients ship the same TLS library and produce identical fingerprints. That's not a bug — it's the kind of cross-axis fact this endpoint exists to surface. Decide policy based on which axis matters more for your case.
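A client can surface the cross-axis disagreement described above before applying policy. The sketch below compares the two attribution axes using only fields shown in the example response; the helper name and the returned shape are our own.

```python
def attribution_axes(data: dict) -> dict:
    """Compare per-IP and per-JA4 bot attribution from an intelligence response."""
    ip_groups = {obs["botGroup"] for obs in data.get("botObservations", [])}
    ja4_groups = {fp["botGroup"] for fp in data.get("ja4s", []) if fp.get("botGroup")}
    return {
        "ip": sorted(ip_groups),
        "ja4": sorted(ja4_groups),
        # Only flag a disagreement when both axes have an opinion.
        "disagree": bool(ip_groups and ja4_groups and ip_groups != ja4_groups),
    }


sample = {
    "botObservations": [{"botGroup": "Claude bot", "source": "cidr_match"}],
    "ja4s": [{"ja4": "t13d1011h2_61a7ad8aa9b6_867a6ff6dde3",
              "classification": "known_bot", "botGroup": "OAI-SearchBot"}],
}
print(attribution_axes(sample))
```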
The TOTP code object
A successful call to the TOTP endpoint returns an envelope containing a data object that
represents a freshly computed time-based one-time password and the parameters used to compute it.
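- code: the one-time password, as a string of length digits.
- digits: the length of code (commonly 6, taken from the URI or its default).
- period: the rotation period in seconds (default 30).
- algorithm: SHA1, SHA256, or SHA512.
- expiresIn: seconds until code rotates. Use it to render a countdown; refetch when it reaches zero.
{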
"code": "492039",
"digits": 6,
"period": 30,
"algorithm": "SHA1",
"expiresIn": 14
}
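To make the fields concrete, here is a local computation of the same object following RFC 6238. It is a sketch for understanding and for cross-checking responses, not urlcap's implementation; in particular the exact expiresIn rounding the service uses is an assumption.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional
from urllib.parse import parse_qs, urlparse


def totp_from_uri(uri: str, now: Optional[float] = None) -> dict:
    """Compute a TOTP code object from an otpauth:// URI (RFC 6238)."""
    query = parse_qs(urlparse(uri).query)
    secret = query["secret"][0].upper()
    algorithm = query.get("algorithm", ["SHA1"])[0].upper()
    digits = int(query.get("digits", ["6"])[0])
    period = int(query.get("period", ["30"])[0])
    seconds = int(time.time() if now is None else now)

    # Base32 secrets are often written without padding; restore it.
    key = base64.b32decode(secret + "=" * (-len(secret) % 8))
    counter = struct.pack(">Q", seconds // period)
    digest = hmac.new(key, counter, getattr(hashlib, algorithm.lower())).digest()

    # Dynamic truncation: 31 bits starting at the offset in the last nibble.
    offset = digest[-1] & 0x0F
    number = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return {
        "code": str(number % 10 ** digits).zfill(digits),
        "digits": digits,
        "period": period,
        "algorithm": algorithm,
        "expiresIn": period - seconds % period,
    }
```

Passing now explicitly makes the function deterministic, which is how it can be checked against the published RFC 6238 test vectors.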
get post Generate a TOTP code
/api/v1/totp
Related guide
- Automating TOTP codes for staging and QA workflows — Playwright + Python examples, secret-handling rules.
Computes the current TOTP code for an otpauth:// URI — the same string you'd scan into an
authenticator app. The URI carries the shared secret and the algorithm/digits/period; nothing is stored
server-side. Accepts GET (query string) or POST
(application/x-www-form-urlencoded).
How secrets are handled
- The otpauth:// URI (and its embedded shared secret) is processed in memory for the duration of one request and is never persisted.
- The uri parameter is redacted from request logs and excluded from analytics.
- Transport is TLS 1.2+ only; cleartext requests are refused.
- The endpoint is intended for automated testing and internal automation against systems you own — not for storing or generating codes for your personal 2FA accounts.
- Full posture: /security.
Parameters
uri (required): an otpauth://totp/... URI containing at least a secret parameter (Base32). May also include algorithm, digits, and period. Always percent-encode this value.
Headers
Returns
A 200 response whose data field is a TOTP code object, plus a requestId. On error, the standard error envelope with status 400 or 401.
curl -G https://urlcap.com/api/v1/totp \
-H "X-API-Key: $URLCAP_KEY" \
--data-urlencode "uri=otpauth://totp/Acme:alice@acme.io?secret=JBSWY3DPEHPK3PXP&period=30&digits=6"
const uri = "otpauth://totp/Acme:alice@acme.io?secret=JBSWY3DPEHPK3PXP&period=30&digits=6";
const res = await fetch(
`https://urlcap.com/api/v1/totp?uri=${encodeURIComponent(uri)}`,
{ headers: { "X-API-Key": process.env.URLCAP_KEY } }
);
if (!res.ok) throw new Error(`urlcap: ${res.status}`);
const { data, requestId } = await res.json();
console.log(`${data.code} (expires in ${data.expiresIn}s) — request ${requestId}`);
import os, requests
uri = "otpauth://totp/Acme:alice@acme.io?secret=JBSWY3DPEHPK3PXP&period=30&digits=6"
resp = requests.get(
"https://urlcap.com/api/v1/totp",
params={"uri": uri},
headers={"X-API-Key": os.environ["URLCAP_KEY"]},
timeout=10,
)
resp.raise_for_status()
print(resp.json()["data"]["code"])
{
"version": "1",
"requestId": "9f1c0b7a-3e2d-4a51-9b88-2f6c1e7d4a02",
"data": {
"code": "492039",
"digits": 6,
"period": 30,
"algorithm": "SHA1",
"expiresIn": 14
}
}
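To cross-check a response offline, the same computation can be sketched locally. This is a minimal sketch of the standard RFC 6238 algorithm, assuming the service follows it; totp_from_uri is a hypothetical helper (not part of urlcap), and its return keys simply mirror the response fields shown above.

```python
import base64
import hmac
import struct
import time
from urllib.parse import parse_qs, urlparse


def totp_from_uri(uri, now=None):
    """Compute a TOTP code from an otpauth://totp/... URI (RFC 6238 sketch)."""
    parsed = urlparse(uri)
    if parsed.scheme != "otpauth" or parsed.netloc != "totp":
        raise ValueError("expected an otpauth://totp/... URI")
    params = {k: v[0] for k, v in parse_qs(parsed.query).items()}

    # Base32 secrets are often written without padding; restore it.
    secret = params["secret"].replace(" ", "").upper()
    key = base64.b32decode(secret + "=" * (-len(secret) % 8))

    digits = int(params.get("digits", 6))
    period = int(params.get("period", 30))
    algorithm = params.get("algorithm", "SHA1").upper()

    now = time.time() if now is None else now
    counter = int(now // period)

    # HOTP (RFC 4226): HMAC over the 8-byte big-endian counter, then
    # dynamic truncation to a 31-bit integer.
    digest = hmac.new(key, struct.pack(">Q", counter), algorithm.lower()).digest()
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF

    return {
        "code": str(value % 10 ** digits).zfill(digits),
        "digits": digits,
        "period": period,
        "algorithm": algorithm,
        "expiresIn": period - int(now) % period,
    }
```

Passing a fixed now makes the function deterministic, which is convenient in tests; omit it to use the current time.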
Scheduled tasks
Run a task at a future time — once, or repeatedly on a cron schedule — without keeping a
connection open. A scheduled task is either a capture (kind: "capture") or an
extract job (kind: "extract"). Every execution stores its full result, which
you can fetch later.
The base path is /api/v1/schedules. Everything here uses the standard JSON envelope and your API key.
The schedule object
{
"id": "0f4c…-uuid",
"kind": "capture", // "capture" | "extract"
"name": "prod health check", // optional label
"cron": "*/15 * * * *", // recurring; null for a one-shot
"runAt": null, // one-shot ISO-8601 time; null for recurring
"timezone": "Europe/Madrid", // the cron is evaluated in this zone (default UTC)
"status": "active", // active | paused | done | disabled
"nextRunAt": "2026-06-01T07:15:00Z",
"lastRunAt": "2026-06-01T07:00:00Z",
"runs": 12,
"maxRuns": null, // stop after this many runs; null = unlimited
"until": null, // stop after this ISO-8601 time; null = no end
"createdAt": "2026-05-12T10:00:00Z",
"capture": { "url": "https://example.com/health", "method": "GET" } // present for kind "capture";
// an "extract" key holds the job model for kind "extract"
}
Cron syntax. The classic 5-field crontab form — minute hour day-of-month month day-of-week
— with ranges (1-5), lists (1,15), steps (*/15) and names (MON, JAN).
An optional 6th leading field adds seconds. The macros @hourly @daily @weekly
@monthly @yearly work too. (The Quartz-only L/W/# do not.)
A job whose time was missed while the service was down runs once on the next poll, then resumes at its next future occurrence.
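The range, list and step forms above can be illustrated with a small field expander. This is a sketch only, not urlcap's parser: expand_field is a hypothetical helper, it handles numeric values only (no MON/JAN names, no macros), and the lo/hi bounds are supplied by the caller.

```python
def expand_field(expr, lo, hi):
    """Expand one numeric cron field (e.g. '*/15', '1-5', '1,15') into a sorted list."""
    values = set()
    for part in expr.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = lo, hi
        elif "-" in base:
            first, last = base.split("-")
            start, end = int(first), int(last)
        else:
            start = int(base)
            # 'N/step' means "from N to the maximum, every step".
            end = hi if step_text else start
        if step < 1 or not (lo <= start <= end <= hi):
            raise ValueError(f"bad cron field: {part!r}")
        values.update(range(start, end + 1, step))
    return sorted(values)
```

For the minute field of the example schedule, expand_field("*/15", 0, 59) gives the quarter-hour marks.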
Extract schedules. An extract task runs through the (asynchronous) extract engine; the
scheduler waits for it to finish and stores the engine's result in the run row (no httpStatus — that's
a capture-only field). It also shows up in your extract task list.
post Create a schedule
/api/v1/schedules
Send either a cron expression (recurring) or a runAt timestamp (one-shot) — not both — plus exactly one of a capture object (same shape as the capture object) or an extract object (the extract job model). Which one you send determines the kind.
curl -X POST https://urlcap.com/api/v1/schedules \
-H "Authorization: Bearer api_…" -H "Content-Type: application/json" \
-d '{
"name": "prod health check",
"cron": "*/15 * * * *",
"timezone": "Europe/Madrid",
"maxRuns": 96,
"capture": { "url": "https://example.com/health", "method": "GET" }
}'
A one-shot capture instead:
{ "runAt": "2026-06-01T09:00:00Z", "capture": { "url": "https://example.com/report" } }
Or schedule an extract job — pass an extract object holding the job model:
{
"name": "daily price scrape",
"cron": "0 6 * * *",
"extract": {
"url": "https://example.com/products",
"extractors": [ { "name": "prices", "selector": ".price", "type": "list" } ]
}
}
Body fields
- capture / extract — exactly one is required. capture: same fields as the capture endpoint's JSON body (at minimum a url). extract: the extract job model (at minimum a url).
- cron — a cron expression (see above). Mutually exclusive with runAt.
- runAt — an ISO-8601 timestamp (e.g. 2026-06-01T09:00:00Z) for a single run. Mutually exclusive with cron.
- timezone — IANA zone the cron is evaluated in. Default UTC.
- name — optional label.
- maxRuns — optional; stop after this many executions.
- until — optional ISO-8601 timestamp; stop after this time.
Returns 201 with data.schedule set to a schedule object. An invalid cron expression or timezone, a missing task object, or sending both capture and extract returns 400.
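The mutual-exclusion rules above can be checked client-side before sending the request. A minimal sketch, assuming nothing beyond the body fields listed; build_schedule_body is a hypothetical helper, and the server remains the authority on what is valid.

```python
def build_schedule_body(*, cron=None, run_at=None, capture=None, extract=None, **optional):
    """Build a create-schedule JSON body, enforcing the 'exactly one of' rules."""
    if (cron is None) == (run_at is None):
        raise ValueError("send exactly one of cron or runAt")
    if (capture is None) == (extract is None):
        raise ValueError("send exactly one of capture or extract")

    task_key, task = ("capture", capture) if capture is not None else ("extract", extract)
    if "url" not in task:
        raise ValueError(f"{task_key} needs at least a url")

    body = {"cron": cron} if cron is not None else {"runAt": run_at}
    body[task_key] = task
    # Optional fields are copied through only when supplied.
    for key in ("timezone", "name", "maxRuns", "until"):
        if optional.get(key) is not None:
            body[key] = optional[key]
    return body
```

The returned dict can be serialised with json.dumps and posted to /api/v1/schedules.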
get post del List, inspect & manage
- GET /api/v1/schedules — your schedules (data.schedules: an array of schedule objects).
- GET /api/v1/schedules/{id} — one schedule (data.schedule).
- POST /api/v1/schedules/{id} with { "action": "pause" | "resume" | "run-now" } — pause stops future runs; resume re-arms it; run-now makes it fire on the next poll (within ~20s).
- DELETE /api/v1/schedules/{id} — cancel the schedule. Its run history is kept.
get Run history & results
- GET /api/v1/schedules/{id}/runs — the executions, newest first (?limit=N). Each: runNo, scheduledFor, startedAt, finishedAt, status (running / ok / error), httpStatus, error.
- GET /api/v1/schedules/{id}/runs/{runNo} — one execution including its full result. For a capture task that's the same JSON the capture endpoint returns ({ version, requestId, data: { request, response } }); for an extract task it's the engine's extract result (the same shape as GET /extract/{taskId}'s result).
curl https://urlcap.com/api/v1/schedules/0f4c…/runs/3 \
-H "Authorization: Bearer api_…"
Datasets
Named, de-duplicated collections of items of a single type — either IP / CIDR ranges
(canonical CIDRs, as in IP & CIDR; a single host becomes a /32 or
/128) or URLs (absolute http / https URLs). A dataset is yours; the API only ever shows you your own.
With history: true, every replace items operation first copies the items it
drops into the dataset's history with their removal date — useful for tracking a set as it evolves.
Plan caps
- free — up to 1 dataset, up to 1,000 items each, history not allowed.
- developer — up to 10 datasets, up to 100,000 items, history allowed.
- startup — up to 50 datasets, up to 1,000,000 items, history allowed.
- business — unlimited datasets & items, history allowed.
The dataset object
- name: defaults to dataset-… if omitted on create.
- type: ip (IP/CIDR) or url.
- history: when true, a replace keeps the dropped items (see History).
get post List & create
/api/v1/datasets · /api/v1/datasets/{id}
- GET /api/v1/datasets — data.datasets is an array of dataset objects (newest first).
- POST /api/v1/datasets with { "type": "ip" | "url", "name"?: "…", "history"?: true, "items"?: [ … ] } — creates a dataset and (optionally) seeds it. 201 with the created object.
- GET /api/v1/datasets/{id} — one dataset (with its current size).
- DELETE /api/v1/datasets/{id} — deletes the dataset and its items (and history).
curl -X POST https://urlcap.com/api/v1/datasets \
-H "Authorization: Bearer $URLCAP_KEY" \
-H "Content-Type: application/json" \
-d '{ "name": "office-ranges", "type": "ip", "history": true,
"items": ["10.0.0.0/8", "192.168.1.0/24", "2001:db8::/32"] }'
{
"version": "1",
"requestId": "…",
"data": {
"dataset": {
"id": "9d811aa8-dbe6-4c48-a811-29f49dd9f49c",
"name": "office-ranges",
"type": "ip",
"itemType": 1,
"history": true,
"size": 3,
"createdAt": "2026-05-13T07:00:00Z",
"updatedAt": "2026-05-13T07:00:00Z"
}
}
}
Bad input — unknown type, duplicate name, plan limit reached, invalid item value, or a name starting with the reserved internal: prefix — returns 400 invalid_request.
get post put del Items: add, replace, remove
/api/v1/datasets/{id}/items
- GET — paged list (?limit=N, default 1000, max 5000; ?after=ID for the next page). Returns items and nextAfter.
- POST with { "items": [ … ] } — adds items, de-duplicated against what's already there. Returns { added, size }.
- PUT with { "items": [ … ] } — replaces the whole set. On a history: true dataset, the dropped items are first written to history. Returns { size }.
- DELETE with { "items": [ … ] } — removes the listed items. Returns { removed, size }.
Each item value is canonicalised on the way in: an IP/CIDR is reduced to its canonical form (host bits cleared, single hosts become /32 or /128); a URL must be absolute and http / https. The dataset cannot contain two items with the same canonical value.
curl -X POST https://urlcap.com/api/v1/datasets/9d81…/items \
-H "Authorization: Bearer $URLCAP_KEY" \
-H "Content-Type: application/json" \
-d '{ "items": ["203.0.113.0/24", "198.51.100.7"] }'
{
"version": "1",
"data": { "datasetId": "9d81…", "added": 2, "size": 5 }
}
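The canonicalisation rule described above (host bits cleared, single hosts widened to /32 or /128) can be previewed locally with Python's ipaddress module. This is a sketch under assumptions: canonical_item and dedupe are hypothetical helpers, and for URLs it only validates the scheme, since the exact URL canonical form is not specified here.

```python
import ipaddress
from urllib.parse import urlsplit


def canonical_item(value, dataset_type):
    """Return the canonical form of one dataset item, or raise ValueError."""
    value = value.strip()
    if dataset_type == "ip":
        # strict=False clears host bits; a bare address becomes /32 or /128.
        return str(ipaddress.ip_network(value, strict=False))
    if dataset_type == "url":
        parts = urlsplit(value)
        if parts.scheme.lower() not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an absolute http/https URL: {value!r}")
        return value  # assumption: no further URL normalisation in this sketch
    raise ValueError(f"unknown dataset type: {dataset_type!r}")


def dedupe(values, dataset_type):
    """Canonicalise and drop duplicates, keeping first-seen order."""
    return list(dict.fromkeys(canonical_item(v, dataset_type) for v in values))
```

Running dedupe on a batch before a POST gives a rough idea of how many items the call would add.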
get History
/api/v1/datasets/{id}/history
For datasets created with history: true, the items previously dropped by a replace are kept, each stamped with the date it was removed. Newest removal first; ?limit=N (default 1000, max 5000). Always empty for non-history datasets.
{
"version": "1",
"data": {
"datasetId": "9d81…",
"count": 2,
"history": [
{ "id": 41, "value": "10.0.0.0/8", "addedAt": "2026-05-10T…", "removedAt": "2026-05-13T07:36:18.07Z" },
{ "id": 40, "value": "192.168.1.0/24", "addedAt": "2026-05-10T…", "removedAt": "2026-05-13T07:36:18.07Z" }
]
}
}
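The replace-with-history behaviour can be modelled in a few lines. This is an illustrative in-memory model, not the service's storage code: replace_items is a hypothetical function, the current set is a plain dict of value to addedAt, and timestamps are passed around as strings.

```python
from datetime import datetime, timezone


def replace_items(current, new_items, history, keep_history=True, now=None):
    """Replace the whole item set; optionally record dropped items in history.

    current: dict mapping item value -> addedAt timestamp
    history: list of removal records, newest first (mutated in place)
    Returns the new value -> addedAt mapping.
    """
    now = now or datetime.now(timezone.utc).isoformat()
    wanted = list(dict.fromkeys(new_items))  # de-duplicate, keep order
    if keep_history:
        for value, added_at in current.items():
            if value not in wanted:
                # Dropped items are stamped with the removal time.
                history.insert(0, {"value": value, "addedAt": added_at, "removedAt": now})
    # Items that survive keep their original addedAt; new ones get 'now'.
    return {value: current.get(value, now) for value in wanted}
```

With keep_history=False the history list is left untouched, matching the "always empty for non-history datasets" note above.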
get post Membership check
/api/v1/datasets/{id}/contains
Normalises value to the dataset's type and reports whether that exact item is in the dataset (contains). For IP datasets, if value is a single address (not a CIDR), the response also includes covered — whether that address falls inside some CIDR stored in the dataset (a fast B-tree range lookup over the dataset's 16-byte bounds).
curl -G https://urlcap.com/api/v1/datasets/9d81…/contains \
-H "Authorization: Bearer $URLCAP_KEY" \
--data-urlencode "value=10.20.30.40"
{
"version": "1",
"data": {
"datasetId": "9d81…",
"type": "ip",
"value": "10.20.30.40",
"normalizedValue": "10.20.30.40/32",
"contains": false,
"covered": true
}
}
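The difference between contains and covered can be reproduced with the ipaddress module. A minimal sketch, assuming the semantics described above; membership is a hypothetical helper, and it uses a linear scan for clarity rather than the range lookup the service performs.

```python
import ipaddress


def membership(items, value):
    """Report exact membership (contains) and, for single addresses, CIDR coverage."""
    stored = {ipaddress.ip_network(item, strict=False) for item in items}
    probe = ipaddress.ip_network(value, strict=False)
    result = {"normalizedValue": str(probe), "contains": probe in stored}
    if probe.num_addresses == 1:
        # Single address: check whether any stored range of the same family holds it.
        address = probe.network_address
        result["covered"] = any(
            address.version == net.version and address in net for net in stored
        )
    return result
```

For the example above, a dataset holding 10.0.0.0/8 does not contain 10.20.30.40/32 as an item, but it does cover that address.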
get post Is an IP a known bot?
/api/v1/is_bot
Tells you whether an IPv4 or IPv6 address belongs to a well-known web crawler (Googlebot, Bingbot, Yandex,
DuckDuckBot & DuckAssistBot, Applebot, GPTBot & ChatGPT-User & OAI-SearchBot, ClaudeBot,
PerplexityBot & Perplexity-User, AhrefsBot, Amazonbot, CCBot, …) and, if so, which one.
On every match the response includes the bot's search engine, bot group,
the matching CIDR, and the bot's categories
(SEARCH_INDEXING, AI_TRAINING,
AI_SEARCH_OR_ANSWERING, USER_INITIATED_FETCHING,
SEO_ANALYTICS, WEB_DATASET_ARCHIVING,
COMMERCIAL_PLATFORM, SOCIAL_PREVIEW, …) with per-link confidence.
Backed by an in-memory index: every CIDR each bot publishes is preloaded into a sorted array of 16-byte bounds with side-tables of bot / search-engine / category metadata. Lookups never hit the database (sub-millisecond per IP) and the index is rebuilt daily from the bots' published sources. The index also remembers CIDRs that have since been retired (replaced out by a later refresh) — so you can ask "was this IP a known bot on a specific date?" or "has this IP ever been a known bot?".
- ip: a single address to look up. Send either ip or ips.
- ips: several addresses, comma-separated in the query string (?ips=a,b,c). As a JSON body: { "ips": [ … ] }.
- date: a date or timestamp (YYYY-MM-DD is treated as that day's start in UTC; or YYYY-MM-DDTHH:MM:SSZ for an exact moment). Runs an as-of-date lookup — returns the CIDR-bot mappings that were active at this moment, automatically including retired records that had been live then.
- historical: default false. When true, the lookup also considers retired CIDR records — every CIDR that has ever been in any bot's published list, regardless of whether it's still current. Useful to answer "has this IP ever been a known bot?". Ignored if date is supplied.
- reverseDns: default false. When true, attaches a reverseDns object (names, ttlSeconds, cached) to each per-IP result with the PTR names for the address. Results are cached in-process for the upstream record's TTL (clamped 30s–1h; negative answers 5 minutes). For the FCrDNS forward-confirm check, see /reverse_dns.
Each match always carries addedAt (when the CIDR first appeared in the bot's list),
removedAt (null while still current), and active (true iff
the CIDR was live at the query's time — now by default, or at date when supplied).
With historical=true you'll see active: false matches that report when the CIDR
was retired.
Every match's botGroup.honoursRobots reports four booleans — robots,
crawlDelay, allow, sitemap — for whether the bot operator
publicly commits to that aspect of robots.txt and has no documented violations.
null means we haven't researched it; false means at least one credible
report of the bot ignoring that rule (e.g. Googlebot's crawlDelay is false
since Google explicitly ignores Crawl-delay).
curl -G https://urlcap.com/api/v1/is_bot \
-H "Authorization: Bearer $URLCAP_KEY" \
--data-urlencode "ip=66.249.66.1"
{
"version": "1",
"requestId": "…",
"data": {
"ip": "66.249.66.1",
"family": 4,
"isBot": true,
"matchCount": 1,
"matches": [
{
"cidr": "66.249.66.0/27",
"active": true,
"addedAt": "2026-05-13T07:39:30Z",
"removedAt": null,
"botGroup": { "id": 4, "description": "Common crawlers" },
"searchEngine": { "id": 1, "description": "Google" },
"categories": [ { "code": "SEARCH_INDEXING", "confidence": "high" } ]
}
],
"cache": { "entries": 36670, "bots": 20, "builtAt": "2026-05-13T07:39:38Z" }
}
}
Batch form returns data.results, one entry per IP, with data.count and
data.matchCount at the top level. An unparseable address returns
isBot: false with family: null and a descriptive error field — the
whole call still succeeds.
# Was this IP ever a known bot?
curl -G https://urlcap.com/api/v1/is_bot \
-H "Authorization: Bearer $URLCAP_KEY" \
--data-urlencode "ip=66.249.64.5" \
--data-urlencode "historical=true"
# As of a specific date — includes retired CIDRs that were live then.
curl -G https://urlcap.com/api/v1/is_bot \
-H "Authorization: Bearer $URLCAP_KEY" \
--data-urlencode "ip=66.249.64.5" \
--data-urlencode "date=2025-04-01"
data.asOf and data.historical are echoed back when those parameters are supplied,
and cache.entries includes retired CIDRs alongside currently-active ones (so the number is
larger than the daily-active count).
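The in-memory index described earlier (a sorted array of 16-byte bounds) can be approximated with bisect. This is a sketch under stated assumptions, not the service's implementation: CidrIndex is a hypothetical class, IPv4 addresses are stored in IPv4-mapped IPv6 form so every key is 16 bytes, and the ranges are assumed not to overlap.

```python
import bisect
import ipaddress


def to_key(address):
    """Map an IPv4 or IPv6 address to a sortable 16-byte key."""
    ip = ipaddress.ip_address(address)
    number = int(ip)
    if ip.version == 4:
        number |= 0xFFFF << 32  # IPv4-mapped IPv6 (::ffff:a.b.c.d)
    return number.to_bytes(16, "big")


class CidrIndex:
    def __init__(self, entries):
        """entries: iterable of (cidr, label) pairs with non-overlapping ranges."""
        rows = []
        for cidr, label in entries:
            net = ipaddress.ip_network(cidr, strict=False)
            rows.append((to_key(str(net.network_address)),
                         to_key(str(net.broadcast_address)), label))
        rows.sort()
        self._rows = rows
        self._starts = [row[0] for row in rows]

    def lookup(self, address):
        """Return the label of the range holding the address, or None."""
        key = to_key(address)
        # Rightmost range whose lower bound is <= key.
        i = bisect.bisect_right(self._starts, key) - 1
        if i >= 0 and key <= self._rows[i][1]:
            return self._rows[i][2]
        return None
```

Each lookup is a binary search over the lower bounds followed by one upper-bound comparison, which is why no database round trip is needed.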
get post Reverse DNS (PTR)
/api/v1/reverse_dns
Resolves an IPv4 or IPv6 address to the names returned by its PTR records
(in-addr.arpa for v4, ip6.arpa for v6).
With forwardConfirm=true, each PTR name is re-resolved to A/AAAA and we report whether the original
IP is in the answer — the FCrDNS check Googlebot, Bingbot and friends recommend to verify
that a crawler is who its PTR claims it is.
Results are cached in-process for the minimum TTL the upstream resolver returned, clamped to
30s..1h; negative answers (NXDOMAIN / no record / lookup error) are cached for 5 minutes.
Every result reports ttlSeconds remaining and cached.
- ip: a single address to resolve. Send either ip or ips.
- ips: several addresses, comma-separated in the query string (?ips=a,b,c). As a JSON body: { "ips": [ … ] }.
- forwardConfirm: default false. When true, each PTR name is forward-resolved (A + AAAA) and the response reports forwardConfirmed (true iff at least one PTR resolves back to the original IP) plus per-name forwardChecks with addresses, TTL and the matched flag.
curl -G https://urlcap.com/api/v1/reverse_dns \
-H "Authorization: Bearer $URLCAP_KEY" \
--data-urlencode "ip=66.249.66.1"
{
"version": "1",
"requestId": "…",
"data": {
"ip": "66.249.66.1",
"family": 4,
"names": ["crawl-66-249-66-1.googlebot.com"],
"ttlSeconds": 3600,
"cached": false
}
}
Batch form returns data.results with data.count at the top level. An unparseable
address returns an entry with a descriptive error field — the whole call still succeeds.
# FCrDNS: is this really Googlebot?
curl -G https://urlcap.com/api/v1/reverse_dns \
-H "Authorization: Bearer $URLCAP_KEY" \
--data-urlencode "ip=66.249.66.1" \
--data-urlencode "forwardConfirm=true"
{
"version": "1",
"requestId": "…",
"data": {
"ip": "66.249.66.1",
"family": 4,
"names": ["crawl-66-249-66-1.googlebot.com"],
"ttlSeconds": 3600,
"cached": true,
"forwardConfirmed": true,
"forwardChecks": [
{
"name": "crawl-66-249-66-1.googlebot.com",
"matched": true,
"ttlSeconds": 300,
"cached": false,
"addresses": ["66.249.66.1"]
}
],
"forwardConfirm": true
}
}
IPv6 works the same way — the lookup uses ip6.arpa automatically.
Single-IP queries can also be made through /is_bot?reverseDns=true if
you want the PTR result alongside the bot match.
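The PTR query name, the TTL clamp and the forward-confirm rule can each be expressed in a few lines. A minimal sketch: all three functions are hypothetical helpers, and resolve is a caller-supplied function (name to list of address strings) so the logic can be exercised without touching DNS.

```python
import ipaddress


def ptr_query_name(ip):
    """Name queried for PTR records (in-addr.arpa for v4, ip6.arpa for v6)."""
    return ipaddress.ip_address(ip).reverse_pointer


def clamp_ttl(ttl_seconds, lo=30, hi=3600):
    """Clamp an upstream TTL to the 30s..1h cache window."""
    return max(lo, min(hi, ttl_seconds))


def forward_confirm(ip, ptr_names, resolve):
    """FCrDNS: confirmed iff at least one PTR name resolves back to the original IP."""
    target = ipaddress.ip_address(ip)
    checks = []
    for name in ptr_names:
        addresses = list(resolve(name))
        matched = any(ipaddress.ip_address(a) == target for a in addresses)
        checks.append({"name": name, "matched": matched, "addresses": addresses})
    return {
        "forwardConfirmed": any(check["matched"] for check in checks),
        "forwardChecks": checks,
    }
```

In real use, resolve could wrap a DNS library of your choice; comparing parsed addresses rather than strings avoids false negatives from differing IPv6 spellings.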
get Fetch robots.txt
/api/v1/robots
Pulls /robots.txt from a site, parses it per
RFC 9309,
and returns the user-agent groups, sitemaps and any unknown directives.
Fetched bodies are TTL-cached in-process for 1 hour (1 minute on transport errors / 5xx);
every response reports cached and ageSeconds.
Per the RFC: a 4xx (except 429) means "no rules apply" — reported
as effect: "no_rules_unrestricted"; a 5xx or 429
means "disallow everything" — effect: "restricted_by_error".
- site: a hostname (e.g. example.com) or any URL — the scheme/path are normalised away.
curl -G https://urlcap.com/api/v1/robots \
-H "Authorization: Bearer $URLCAP_KEY" \
--data-urlencode "site=example.com"
{
"version": "1",
"requestId": "…",
"data": {
"site": "example.com",
"status": 200,
"contentSha256": "…",
"sizeBytes": 412,
"cached": false,
"ageSeconds": 0,
"body": "User-agent: *\nDisallow: /search\n",
"groups": [
{ "userAgents": ["*"], "rules": [{ "type": "disallow", "pattern": "/search" }] }
],
"sitemaps": [],
"extensions": {}
}
}
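The status-code rule quoted from the RFC can be written as a small mapping. This is a sketch: effect_for_status is a hypothetical helper, and it returns None for every other status because the effect value used for a successful fetch is not named in this section.

```python
def effect_for_status(status):
    """Map the robots.txt fetch status to the effect strings described above."""
    if status == 429 or 500 <= status <= 599:
        return "restricted_by_error"      # treat as "disallow everything"
    if 400 <= status <= 499:
        return "no_rules_unrestricted"    # treat as "no rules apply"
    return None  # assumption: other statuses are handled by normal parsing
```

Note that 429 is checked first, so it is never caught by the general 4xx branch.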
get URL allow / deny check
/api/v1/robots/check
Decides whether a URL is allowed for a given user-agent. Longest-match wins; on a tie,
Allow beats Disallow. If site is omitted, it's
derived from url.
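- url: the URL to check.
- site: the site whose robots.txt applies. Derived from url if omitted.
- userAgent: the user-agent to test (e.g. Googlebot). Case-insensitive substring match.
curl -G https://urlcap.com/api/v1/robots/check \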
-H "Authorization: Bearer $URLCAP_KEY" \
--data-urlencode "url=https://example.com/private/foo" \
--data-urlencode "userAgent=Googlebot"
{
"version": "1",
"requestId": "…",
"data": {
"site": "https://example.com/private/foo",
"url": "https://example.com/private/foo",
"userAgent": "Googlebot",
"allowed": false,
"reason": "disallow '/private/' matched",
"matchedRule": { "type": "disallow", "pattern": "/private/" },
"matchedGroupUserAgents": ["Googlebot"],
"robotsStatus": 200,
"robotsCached": true,
"robotsAgeSeconds": 142
}
}
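The longest-match rule, with Allow winning ties, can be sketched as follows. This is a simplified illustration, not the service's matcher: check_path is a hypothetical helper, it treats patterns as plain path prefixes (no * or $ wildcards), and it expects rules in the same shape as the groups[].rules entries returned by /robots.

```python
def check_path(rules, path):
    """Pick the longest matching rule; on equal length, allow beats disallow."""
    best = None
    for rule in rules:
        pattern = rule["pattern"]
        if not pattern or not path.startswith(pattern):
            continue  # empty patterns match nothing
        longer = best is None or len(pattern) > len(best["pattern"])
        tie_allow = (
            best is not None
            and len(pattern) == len(best["pattern"])
            and rule["type"] == "allow"
        )
        if longer or tie_allow:
            best = rule
    # No matching rule means the path is allowed.
    return {"allowed": best is None or best["type"] == "allow", "matchedRule": best}
```

The caller is expected to have already selected the rule group for the user-agent in question.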
get post del Watch robots.txt for changes
/api/v1/robots/watch
Registers a per-user watch on a site's /robots.txt. A background job sweeps
every 15 minutes; each watch is re-fetched no sooner than its frequencyMinutes
(default 60, clamped to 15..1440). Snapshots are stored
only when the content hash changes — full body + previous hash kept on each.
If webhookUrl is set, every change triggers an HMAC-SHA256-signed POST.
Watches require a per-user API key — the legacy X-API-Key can't create them.
Free-trial calls return 401.
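- site: the site whose /robots.txt to watch.
- webhookUrl: optional; must be http(s)://….
- webhookSecret: optional; generated when webhookUrl is set and you don't supply one — returned on the create response and the GET-one endpoint, never on list.
- frequencyMinutes: default 60, clamped 15..1440.
curl -X POST https://urlcap.com/api/v1/robots/watch \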
-H "Authorization: Bearer $URLCAP_KEY" \
-H "Content-Type: application/json" \
-d '{"site":"example.com","webhookUrl":"https://hooks.example.com/robots","frequencyMinutes":30}'
{
"version": "1",
"requestId": "…",
"data": {
"watch": {
"id": "1f1b…-…-…",
"site": "example.com",
"webhookUrl": "https://hooks.example.com/robots",
"webhookSecret": "f3a2…64-hex-chars…", // shown once on create; store it
"frequencyMinutes": 30,
"active": true,
"createdAt": "2026-05-16T13:00:00Z"
}
}
}
On every detected change, urlcap POSTs JSON to webhookUrl with an HMAC-SHA256
signature in X-urlcap-Signature: sha256=hex computed over the request
body with your watch's webhookSecret. Verify it before trusting the payload.
POST /robots HTTP/1.1
Content-Type: application/json
User-Agent: urlcap-robots-webhook/1.0
X-urlcap-Event: robots.changed
X-urlcap-Timestamp: 1747400400
X-urlcap-Signature: sha256=e3b0c44298…
{
"type": "robots.changed",
"watchId": "1f1b…",
"site": "example.com",
"snapshotId": 42,
"fetchedAt": "2026-05-16T13:30:00Z",
"httpStatus": 200,
"contentSha256": "…",
"previousSha256": "…",
"sizeBytes": 412,
"body": "User-agent: *\nDisallow: /\n"
}
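On the receiving side, the signature check described above might look like this. A minimal sketch with two loudly stated assumptions: the HMAC key is the webhookSecret string encoded as UTF-8 (it could instead be the hex-decoded bytes), and the signature is lowercase hex. verify_signature is a hypothetical helper.

```python
import hashlib
import hmac


def verify_signature(secret, body, signature_header):
    """Check an 'X-urlcap-Signature: sha256=<hex>' value against the raw body bytes."""
    scheme, _, received = signature_header.partition("=")
    if scheme != "sha256" or not received:
        return False
    # Assumption: the secret string itself (UTF-8) is the HMAC key.
    expected = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, received.strip().lower())
```

Always verify against the raw request bytes as received; re-serialising the parsed JSON can change whitespace and break the comparison.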
The other operations are:
- GET /api/v1/robots/watch — list your watches
- GET /api/v1/robots/watch/{id} — fetch one (includes webhookSecret)
- DELETE /api/v1/robots/watch/{id} — remove a watch (cascades to its snapshots)
- GET /api/v1/robots/watch/{id}/history?limit=&changedOnly=&includeBody= — list snapshots
- POST /api/v1/robots/watch/{id}/poll — force a poll right now, bypassing the per-watch frequency throttle
post Verify a Web Bot Auth signature
/api/v1/web_bot_auth/verify
Decides whether an inbound HTTP request's RFC 9421 HTTP Message Signature is valid against the operator's published Ed25519 key directory — the cryptographic identity check the Web Bot Auth draft proposes as a successor to relying on IP ranges and reverse DNS for "is this really Googlebot?".
You hand urlcap the inbound request's method,
url and the headers the bot sent. The verifier
parses Signature-Input +
Signature, fetches the JWKS-style directory at
Signature-Agent (TTL-cached 1 h on success,
1 min on errors), rebuilds the canonical signature base from the covered components,
and verifies with Ed25519 using the key whose kid
matches the keyid parameter.
Failures (expired signature, missing keyid, directory unreachable, signature mismatch,
unsupported algorithm, missing tag="web-bot-auth")
come back as verified:false with a reason — never as HTTP errors,
so you can feed the answer straight into a policy without a try/catch.
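- method: the inbound request's HTTP method (e.g. GET).
- url: the inbound request's URL. With method, it supplies @method, @authority, @target-uri, @path, @query, @scheme in the signature base.
- headers: must include Signature, Signature-Input and Signature-Agent; any other header the signature covers (named in Signature-Input's inner-list) must also be present.
- Signature label (optional): when Signature-Input carries several signatures (sig1=…, sig2=…), pick a specific label. Default: first.
- Skip-expiry flag (optional): default false. Set true to skip the expires check (useful for forensic analysis of a stored request).
- Require-tag flag (optional): default true. The draft requires (MUST) that Web Bot Auth signatures carry tag="web-bot-auth"; set this to false only when verifying a non-WBA RFC 9421 signature.
curl -X POST https://urlcap.com/api/v1/web_bot_auth/verify \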
-H "Authorization: Bearer $URLCAP_KEY" \
-H "Content-Type: application/json" \
-d '{
"method": "GET",
"url": "https://example.com/foo",
"headers": {
"Signature-Agent": "\"https://example.com/.well-known/http-message-signatures-directory\"",
"Signature-Input": "sig1=(\"@authority\" \"signature-agent\" \"@method\" \"@target-uri\");created=1747500000;expires=1747500060;keyid=\"abc\";alg=\"ed25519\";tag=\"web-bot-auth\"",
"Signature": "sig1=:base64-signature-here:"
}
}'
{
"version": "1",
"requestId": "…",
"data": {
"verified": true,
"label": "sig1",
"keyid": "abc",
"algorithm": "ed25519",
"signatureAgent": "https://example.com/.well-known/http-message-signatures-directory",
"tag": "web-bot-auth",
"createdAt": "2026-05-16T13:00:00Z",
"expiresAt": "2026-05-16T13:01:00Z",
"expired": false,
"coveredComponents": ["@authority", "signature-agent", "@method", "@target-uri"],
"directory": {
"url": "https://example.com/.well-known/http-message-signatures-directory",
"httpStatus": 200,
"keyCount": 3,
"rawKeyCount": 3,
"cached": false,
"ageSeconds": 0,
"kids": ["abc", "def", "ghi"]
}
}
}
Algorithm support is Ed25519 only (the draft's MUST). The signature is required
to cover @authority and the signature-agent header — both are checked
before any directory fetch happens, so an attacker swapping in a friendly directory can't
short-circuit the binding to the original request.
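To make the "rebuilds the canonical signature base" step concrete, here is a simplified reconstruction following RFC 9421's layout (one line per covered component, then @signature-params). It is a sketch with limits: signature_base is a hypothetical helper, it handles a single labelled signature without component parameters, it only lowercases @authority (no default-port stripping), and it stops short of the Ed25519 check itself.

```python
from urllib.parse import urlsplit


def signature_base(method, url, headers, signature_input, label="sig1"):
    """Build the RFC 9421 signature base for one label (simplified sketch)."""
    prefix = label + "="
    if not signature_input.startswith(prefix):
        raise ValueError(f"label {label!r} not found")
    params = signature_input[len(prefix):]  # '("@authority" ...);created=...'
    inner = params[params.index("(") + 1:params.index(")")]
    components = [item.strip('"') for item in inner.split()]

    parts = urlsplit(url)
    derived = {
        "@method": method.upper(),
        "@authority": parts.netloc.lower(),
        "@target-uri": url,
        "@path": parts.path or "/",
        "@query": "?" + parts.query,
        "@scheme": parts.scheme.lower(),
    }
    lowered = {name.lower(): value.strip() for name, value in headers.items()}

    lines = []
    for name in components:
        # Derived components start with '@'; everything else is a header field.
        value = derived[name] if name.startswith("@") else lowered[name]
        lines.append(f'"{name}": {value}')
    lines.append(f'"@signature-params": {params}')
    return "\n".join(lines)
```

The resulting string, encoded as bytes, is what the Ed25519 signature would be verified against using the key selected by keyid.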
Pairs naturally with /is_bot and
/reverse_dns?forwardConfirm=true:
is_bot says "this IP is in Google's CIDR list," reverse_dns says "the rDNS
points back to it," and web_bot_auth says "the bot proved it with a signature only
Google could have made." All three together is the gold-standard identity check.
Identifying as urlcap: signed capture & extract
The other direction: when you make a capture
or extract request, set
webBotAuth: true and urlcap signs every
outbound HTTP request it makes on your behalf with our own Ed25519 key. Sites that allow
known crawlers but block unknown bots can then choose to allow urlcap-attributed traffic
— and verify that what claims to be urlcap really is.
Our public keys are served at
https://urlcap.com/.well-known/http-message-signatures-directory
as a JWKS-style JSON document. Each signed request carries a
Signature-Agent header pointing at that URL plus
standard Signature-Input /
Signature headers per RFC 9421.
curl -X POST https://urlcap.com/api/v1/capture \
-H "Authorization: Bearer $URLCAP_KEY" \
-H "Content-Type: application/json" \
-d '{ "url": "https://example.com/", "webBotAuth": true }'
The response's data.request.headers shows the
three signature headers that went on the wire, so you can verify the integration end-to-end
by feeding them back into /api/v1/web_bot_auth/verify.
Signature lifetime is 60 seconds.
get post List & create sites
/api/v1/sites
A site is the multi-tenancy unit for ingest. Every event you ship via
/events or
/outcomes is scoped
to one site, and every {site, ingest token, hostname} triple has to line up or the row is
rejected. Customers usually create one site for their whole edge and add every public
hostname to it; large multi-product orgs sometimes split per product.
Auth: your urlcap API key (the same one used for capture / TOTP / is_bot). Distinct from ingest tokens, which are per-site and used only for the ingest channel.
curl -s https://urlcap.com/api/v1/sites \
-H "Authorization: Bearer $URLCAP_KEY"
curl -s https://urlcap.com/api/v1/sites \
-X POST \
-H "Authorization: Bearer $URLCAP_KEY" \
-H "Content-Type: application/json" \
-d '{"name": "datocapital"}'
{
"version": "1",
"data": {
"id": 47,
"publicId": "Xy9KqZ7mNvB2",
"name": "datocapital"
}
}
Save the publicId — that's what you pass as
{site_id} in every downstream URL. The numeric
id still works for backward compatibility, but
new integrations should use publicId because
it's randomly generated and doesn't disclose how many sites exist on urlcap.
get post Hostnames per site
/api/v1/sites/{site_id}/domains
Each event ingested under a site must have a host
field that's already attached to that site. Submit them once up front; /events
will reject any rows whose host isn't registered with "host '…' not registered for site_id=…".
Hostnames are UNIQUE across the whole urlcap database — a hostname belongs to exactly one site. Attempting to add one that's already attached elsewhere returns 409.
- hostname: the hostname to attach to the site.
- kind: default doc. One of doc (HTML pages — the normal case), cdn (assets / images / CSS / JS on a separate domain), api (JSON endpoints), or other. Used by the discovery scan to de-prioritise candidate JA4s that only ever touch CDN-kind hostnames — those are usually image crawlers (Pinterest, archive.org) you don't want to block. Invalid values silently fall back to doc.
curl -s https://urlcap.com/api/v1/sites/Xy9KqZ7mNvB2/domains \
-H "Authorization: Bearer $URLCAP_KEY"
curl -s https://urlcap.com/api/v1/sites/Xy9KqZ7mNvB2/domains \
-X POST \
-H "Authorization: Bearer $URLCAP_KEY" \
-H "Content-Type: application/json" \
-d '{"hostname": "api.datocapital.com"}'
curl -s https://urlcap.com/api/v1/sites/Xy9KqZ7mNvB2/domains \
-X POST \
-H "Authorization: Bearer $URLCAP_KEY" \
-H "Content-Type: application/json" \
-d '{"hostname": "cdn.datocapital.com", "kind": "cdn"}'
get post Ingest tokens per site
/api/v1/sites/{site_id}/ingest_keys
Per-site bearer tokens (ingest_<32hex>)
that authenticate
/events and
/outcomes calls.
Distinct from the urlcap API key you use everywhere else — ingest tokens
are scoped only to the ingest channel, and you can revoke / rotate them independently.
Storage: only the SHA-256 of the token sits in the database. The cleartext is returned exactly once, on creation. Capture it then; if you lose it, mint a new one and revoke the old.
curl -s https://urlcap.com/api/v1/sites/3/ingest_keys \
-H "Authorization: Bearer $URLCAP_KEY"
curl -s https://urlcap.com/api/v1/sites/3/ingest_keys \
-X POST \
-H "Authorization: Bearer $URLCAP_KEY" \
-H "Content-Type: application/json" \
-d '{"label": "datocapital-prod-2026-05"}'
{
"version": "1",
"data": {
"id": 3,
"siteId": 3,
"label": "datocapital-prod-2026-05",
"token": "ingest_c7e57f1d9d43866bba19bf95d65c9457",
"warning": "Save this token now; it is not stored in plaintext and cannot be retrieved later."
}
}
To rotate: mint a new one with the same label suffix, update your edge config, then revoke the old one (direct DB update for now — admin UI coming). Multiple active tokens per site are fine; we recommend one per environment (prod, staging, ci).
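The storage rule above (only the SHA-256 of the token is kept, cleartext shown once) follows a common pattern that can be sketched as below. This illustrates the pattern only and is not urlcap's code: both functions are hypothetical, and the token shape simply follows the ingest_<32hex> format given earlier.

```python
import hashlib
import hmac
import secrets


def mint_ingest_token():
    """Return (cleartext_token, sha256_hex); only the hash would be stored."""
    token = "ingest_" + secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    return token, hashlib.sha256(token.encode("ascii")).hexdigest()


def token_matches(presented, stored_hash):
    """Check a presented bearer token against the stored hash."""
    candidate = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, stored_hash)
```

Because the hash cannot be reversed, a lost token can only be replaced, never recovered, which is why rotation means minting a new one.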
Ingest channel — events & outcomes
The ingest channel is how a site streams its own traffic into urlcap and gets back per-fingerprint intelligence — JA4 / IP profiles, bot likelihood, and (with the outcomes endpoint below) high-confidence "real human" signals like JS challenge pass rate, registered-user observation, and completed purchases. Two complementary endpoints:
- POST /api/v1/ingest/{site_id}/events — one NDJSON line per request. The patched nginx ships these natively.
- POST /api/v1/ingest/{site_id}/outcomes — asynchronous verdicts tied to the request_id of an earlier event: challenge passed, user authenticated, order completed.
Auth: site ingest tokens
Both ingest endpoints authenticate with a per-site bearer token (separate from your urlcap API key). Mint one with your urlcap API key:
curl -X POST https://urlcap.com/api/v1/sites/42/ingest_keys \
-H "Authorization: Bearer $URLCAP_KEY" \
-H "Content-Type: application/json" \
-d '{"label": "my-prod-site-2026-05"}'
The response includes token: "ingest_…" — shown
once at creation, stored only as a SHA-256 hash thereafter. Store it in your edge config.
The request_id binding
Every event you send carries a request_id (the patched
nginx emits $request_id). Outcomes you post later refer
back to that id, and urlcap resolves it to the JA4/IP fingerprint server-side at ingest time —
so the binding persists even after the raw event ages out of the 7-day window.
post Send request events
/api/v1/ingest/{site_id}/events
NDJSON batch of request observations. One row per request. Used as the raw input for every downstream signal — JA4 reqs / unique IPs / asset ratio / classifier / intelligence. Caps: up to 1,000 events / 4 MB per batch. Per-row errors never fail the batch.
The patched nginx ships these natively and the
NginxBotLogTail background component picks them up
from /var/log/nginx/bot-access.ndjson when
sites.local_log_path is set. The HTTP endpoint is
the alternative for clients that prefer to push.
- request_id: the request's unique id (the patched nginx emits $request_id). The binding key for later outcomes.
- ja4: the JA4 fingerprint (e.g. t13d1516h2_8daaf6152771_b0da82dd1658). Empty = no TLS handshake; the row is skipped.
- host: must be in the site's registered hostnames or the row is rejected.
- asn / country: looked up from MaxMind if absent.
curl -X POST https://urlcap.com/api/v1/ingest/42/events \
-H "Authorization: Bearer $INGEST_TOKEN" \
-H "Content-Type: application/x-ndjson" \
--data-binary '{"ts":"2026-05-21T08:11:30Z","request_id":"abc123…","ja4":"t13d1516h2_8daaf6152771_b0da82dd1658","ja4_hash":12345,"ip":"1.2.3.4","host":"shop.example.com","path":"/products","method":"GET","status":200,"user_agent":"Mozilla/5.0…"}
{"ts":"2026-05-21T08:11:31Z","request_id":"def456…","ja4":"t13d1516h2_8daaf6152771_b0da82dd1658","ja4_hash":12345,"ip":"1.2.3.5","host":"shop.example.com","path":"/products/sku-7","method":"GET","status":200}'
{
"version": "1",
"requestId": "…",
"data": { "siteId": 42, "accepted": 2, "rejected": 0, "errors": [] }
}
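Clients that push over HTTP need to respect the batch caps (1,000 events / 4 MB). A minimal batching sketch: ndjson_batches is a hypothetical helper, and the byte limit of 4,000,000 is a conservative assumption, since the text does not say whether "4 MB" means 4,000,000 or 4,194,304 bytes.

```python
import json

MAX_EVENTS = 1000
MAX_BYTES = 4_000_000  # assumption: conservative reading of "4 MB"


def ndjson_batches(events, max_events=MAX_EVENTS, max_bytes=MAX_BYTES):
    """Yield NDJSON payloads (bytes), each within the event and byte caps."""
    lines, size = [], 0
    for event in events:
        line = json.dumps(event, separators=(",", ":")).encode("utf-8")
        needed = len(line) + (1 if lines else 0)  # +1 for the joining newline
        if lines and (len(lines) >= max_events or size + needed > max_bytes):
            yield b"\n".join(lines)
            lines, size = [], 0
            needed = len(line)
        lines.append(line)
        size += needed
    if lines:
        yield b"\n".join(lines)
```

Each yielded payload can be sent as the body of one POST with Content-Type: application/x-ndjson. A single event larger than the byte cap is still emitted on its own, so oversized rows should be filtered beforehand.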
post Send outcomes (challenges, auth, purchases)
/api/v1/ingest/{site_id}/outcomes
NDJSON batch of asynchronous verdicts — the strongest "is this a real human?" signals in
the system. Each outcome refers back to the request_id
of an event you already sent; urlcap resolves it to a JA4 at ingest time so the binding survives
even after the raw event ages out.
Four canonical kinds today, each driving one cluster of fields in ja4_intelligence_latest:
| kind | verdict values | meta keys (canonical) | drives |
|---|---|---|---|
| js_challenge | passed / failed / abandoned | challenge, reason | js_challenge_pass_rate |
| auth | authenticated / signup / anonymous | user_id_hash (SHA-256 of your user id) | auth_observations, distinct_users |
| purchase | completed / refunded / disputed | order_id, amount_cents, currency | purchases, total_purchase_cents, last_purchase_at |
| edge_action | blocked / allowed / challenged / rate_limited | rule, rule_id (your blocklist label) | cross_customer_action.sites_blocking on bot-ja4s |
- kind: js_challenge, auth, purchase, edge_action, or any other label you want to track.
- meta: … JSONExtractString(meta,'...'). See the kinds table above for canonical keys.
- … request_events.
JS challenge example
Send when your edge issues a JS challenge (Turnstile, hCaptcha, your own proof-of-work) and gets a verdict back.
curl -X POST https://urlcap.com/api/v1/ingest/42/outcomes \
-H "Authorization: Bearer $INGEST_TOKEN" \
-H "Content-Type: application/x-ndjson" \
--data-binary '{"request_id":"abc123","kind":"js_challenge","verdict":"passed","score":0.92,"meta":{"challenge":"turnstile_v0"}}
{"request_id":"def456","kind":"js_challenge","verdict":"failed","meta":{"reason":"no-js-execution"}}'
Auth example — is this a registered user?
Send when a request you previously logged was authenticated — login session active, signup
completed, password reset confirmed. The user_id_hash
should be a hash of your internal user id, not the raw value — we only need
to count distinct users, never identify them.
curl -X POST https://urlcap.com/api/v1/ingest/42/outcomes \
-H "Authorization: Bearer $INGEST_TOKEN" \
-H "Content-Type: application/x-ndjson" \
--data-binary '{"request_id":"abc123","kind":"auth","verdict":"authenticated","meta":{"user_id_hash":"u_sha256:7a3f…"}}
{"request_id":"def456","kind":"auth","verdict":"signup","meta":{"user_id_hash":"u_sha256:e9c1…"}}'
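A minimal sketch of deriving user_id_hash on your side before sending an auth outcome. It assumes the u_sha256: prefix seen in the sample payload is followed by the lowercase hex SHA-256 digest of your internal user id; the prefix, the encoding and the helper name user_id_hash are inferred for illustration, not confirmed by this reference.

```python
import hashlib


def user_id_hash(user_id: str) -> str:
    """Return a prefixed SHA-256 hex digest of an internal user id.

    Assumption: the "u_sha256:" prefix mirrors the sample payload above.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return f"u_sha256:{digest}"


# The raw id never leaves your system; only the digest is sent.
print(user_id_hash("user-1234"))
```

Hashing before sending keeps the raw id private while still letting distinct users be counted.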
Purchase example — has this fingerprint converted?
The strongest "real human, valuable visitor" signal. Send from your order-confirmation webhook.
Use the request_id of the request that closed
the order (the checkout-complete POST, not the first product view).
curl -X POST https://urlcap.com/api/v1/ingest/42/outcomes \
-H "Authorization: Bearer $INGEST_TOKEN" \
-H "Content-Type: application/x-ndjson" \
--data-binary '{"request_id":"abc123","kind":"purchase","verdict":"completed","meta":{"order_id":"o_9876","amount_cents":4999,"currency":"USD"}}
{"request_id":"def456","kind":"purchase","verdict":"refunded","meta":{"order_id":"o_9876"}}'
{
"version": "1",
"requestId": "…",
"data": { "siteId": 42, "accepted": 2, "rejected": 0, "errors": [] }
}
A single request_id can carry multiple outcomes —
one page-view that triggered a challenge, then authenticated, then converted is three rows with
the same id. They aggregate independently into the matching counters.
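The "three rows with the same id" case can be sketched as an NDJSON body built in code. This is an illustrative helper (build_outcomes_ndjson is a hypothetical name); it only serialises the payload and leaves the HTTP call to whichever client you use.

```python
import json


def build_outcomes_ndjson(outcomes: list[dict]) -> str:
    """Serialise outcome dicts as NDJSON: one compact JSON object per line."""
    return "\n".join(json.dumps(o, separators=(",", ":")) for o in outcomes)


request_id = "abc123"  # id of an event you already sent
batch = build_outcomes_ndjson([
    {"request_id": request_id, "kind": "js_challenge", "verdict": "passed"},
    {"request_id": request_id, "kind": "auth", "verdict": "authenticated"},
    {"request_id": request_id, "kind": "purchase", "verdict": "completed",
     "meta": {"order_id": "o_9876", "amount_cents": 4999, "currency": "USD"}},
])
print(batch)
```

Send the resulting string as the request body with Content-Type: application/x-ndjson, as in the curl examples above.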
get Read JA4 intelligence
/api/v1/ja4/intelligence
Returns the rolled-up profile for one JA4 fingerprint over a window (7 / 30 / 90 days). Authenticated with your urlcap API key — not the site ingest token.
7, 30, or 90 (must match a configured window in intelligence.compute.windows_days).
curl -G https://urlcap.com/api/v1/ja4/intelligence \
-H "Authorization: Bearer $URLCAP_KEY" \
--data-urlencode "site_id=42" \
--data-urlencode "ja4_hash=17888951274072987679" \
--data-urlencode "window_days=7"
{
"version": "1",
"requestId": "…",
"data": {
"siteId": 42, "ja4Hash": "17888951274072987679", "windowDays": 7,
"ja4": "t13d3613h2_018971650b2c_23cd79a6e20d",
"reqs": 10, "unique_ips": 2, "unique_uas": 1, "unique_paths": 6,
"likely_os_name": "OS X", "likely_os_confidence": 1.0,
"likely_agent_name": "Chrome", "likely_agent_confidence": 1.0,
"likely_device_class": "Desktop", "likely_device_confidence": 1.0,
"ja4_ua_consistency": 1.0,
"ua_diversity_score": 0.1,
"ip_diversity_score": 0.2,
"suspicious_score": 0.0,
"js_challenge_attempts": 1, "js_challenge_passes": 1, "js_challenge_pass_rate": 1.0,
"auth_observations": 1, "distinct_users": 1,
"purchases": 1, "total_purchase_cents": 4999, "last_purchase_at": "2026-05-21T07:12:02Z"
}
}
Field meanings:
- ja4_ua_consistency: 1.0 means this JA4 always claims the same (agent, os) tuple; lower values mean the JA4 is observed claiming mismatched UAs, i.e. a spoofed UA on the same TLS library.
- js_challenge_pass_rate: NULL when never challenged; 0.0 means challenged but never passes (the strongest pure-bot signal).
- distinct_users ≥ 1: at least one registered user was observed on this fingerprint.
- purchases > 0: the highest-confidence "real human, valuable visitor" signal.
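As a rough illustration of reading these fields together, the sketch below turns a data object from this endpoint into plain-language notes. The function name read_signals and the wording of the notes are hypothetical; the thresholds are the ones stated in the field meanings.

```python
def read_signals(profile: dict) -> list[str]:
    """Translate a /ja4/intelligence data object into human-readable notes."""
    notes = []
    if profile.get("purchases", 0) > 0:
        notes.append("has purchases: highest-confidence human signal")
    if profile.get("distinct_users", 0) >= 1:
        notes.append("registered user observed")
    # JSON null (never challenged) arrives as None in Python.
    pass_rate = profile.get("js_challenge_pass_rate")
    if pass_rate is None:
        notes.append("never challenged")
    elif pass_rate == 0.0:
        notes.append("challenged but never passes: strong bot signal")
    # Assumption: a missing consistency field is treated as fully consistent.
    if profile.get("ja4_ua_consistency", 1.0) < 1.0:
        notes.append("claims mismatched UAs: possible UA spoofing")
    return notes


print(read_signals({"js_challenge_pass_rate": 0.0, "ja4_ua_consistency": 0.4}))
```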
get Trailing-hour JA4 signals
/api/v1/ja4/signals
Cloudflare-style "what does this JA4 look like right now?" snapshot. Returns the latest
1-hour rollup with 10 ratios + 4 ranks + 2 quantiles. Recomputed every minute by an
internal job; 404 if the fingerprint hasn't
been seen on this site within the trailing hour. Authenticated with your urlcap API key.
curl -G https://urlcap.com/api/v1/ja4/signals \
-H "Authorization: Bearer $URLCAP_KEY" \
--data-urlencode "site_id=3" \
--data-urlencode "ja4_hash=13366807129412944815"
{
"version": "1",
"requestId": "…",
"data": {
"siteId": 3, "ja4Hash": "13366807129412944815",
"calculatedAt": "2026-05-25T09:14:00Z",
"ja4": "t13d1516h2_8daaf6152771_b1ff8ab2d16f",
"reqs_1h": 14528,
"h2h3_ratio_1h": 0.97, "browser_ratio_1h": 0.99,
"cache_ratio_1h": 0.42, "heuristic_ratio_1h": 0.0,
"unique_ips_1h": 3211, "unique_uas_1h": 14, "unique_paths_1h": 882,
"reqs_rank_1h": 4, "reqs_quantile_1h": 0.97,
"ips_rank_1h": 5, "ips_quantile_1h": 0.96,
"uas_rank_1h": 18, "paths_rank_1h": 6
}
}
*_ratio_1h are 0..1 shares of the trailing-hour
request volume. *_rank_1h is per-site rank (1 = highest)
and *_quantile_1h the corresponding quantile —
a rank-1 JA4 will sit near 1.0. Use this for hot-path decisions; for stable long-window
classification use /ja4/intelligence.
get JA4 25-metric snapshot
/api/v1/ja4/metrics
The "prioritised 25" metric snapshot for one JA4 — computed on-demand from the
5-minute / 1-hour / 1-day aggregates (no precompute job). Three blocks in one response:
a trailing-hour rollup for the JA4, an optional IP+JA4 sub-block when
ip= is supplied, and a top-10 JA4×UA breakdown
over 24h with each UA's share of the JA4's volume.
ip_ja4_1h block and an is_new_24h flag.
curl -G https://urlcap.com/api/v1/ja4/metrics \
-H "Authorization: Bearer $URLCAP_KEY" \
--data-urlencode "site_id=3" \
--data-urlencode "ja4_hash=13366807129412944815" \
--data-urlencode "ip=203.0.113.5"
{
"version": "1",
"requestId": "…",
"data": {
"siteId": 3, "ja4Hash": "13366807129412944815",
"ja4_1h": {
"req_count": 14528, "unique_ips": 3211, "unique_uas": 14, "unique_paths": 882, "unique_hosts": 1,
"h2h3_ratio": 0.97, "browser_ua_ratio": 0.99,
"error_ratio": 0.018, "s403_ratio": 0.0, "s404_ratio": 0.012, "s429_ratio": 0.0
},
"ip_ja4_1h": {
"ip": "203.0.113.5", "req_count": 41, "unique_uas": 1, "unique_paths": 18, "unique_hosts": 1,
"browser_ua_ratio": 1.0, "library_ua_ratio": 0.0, "h2h3_ratio": 1.0,
"error_ratio": 0.0, "s404_ratio": 0.0,
"first_seen": "2026-05-25T08:42:00Z", "last_seen": "2026-05-25T09:14:21Z",
"is_new_24h": true
},
"ja4_ua_24h_top": [
{ "ua_hash_128": "8d9c…", "req_count": 218341, "unique_ips": 5621, "unique_asns": 412,
"error_ratio": 0.02, "share_of_ja4": 0.71 },
{ "ua_hash_128": "ab12…", "req_count": 41203, "unique_ips": 331, "unique_asns": 18,
"error_ratio": 0.01, "share_of_ja4": 0.13 }
]
}
}
library_ua_ratio on the IP block counts UAs Yauaa
classifies as Special or Robot —
high values are a non-browser client tell. share_of_ja4 in the UA
breakdown sums to 1.0 across the top-N; a single UA > 0.95 means "one client owns this JA4."
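The "one client owns this JA4" check can be sketched as a scan over ja4_ua_24h_top. The helper name dominant_ua is hypothetical, and the 0.95 default simply mirrors the figure quoted above.

```python
def dominant_ua(ja4_ua_24h_top: list[dict], threshold: float = 0.95):
    """Return the ua_hash_128 of a UA whose share exceeds threshold, else None."""
    for row in ja4_ua_24h_top:
        if row["share_of_ja4"] > threshold:
            return row["ua_hash_128"]
    return None


rows = [
    {"ua_hash_128": "ua-a", "share_of_ja4": 0.71},
    {"ua_hash_128": "ua-b", "share_of_ja4": 0.13},
]
print(dominant_ua(rows))
```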
get JA4 profile breakdown
/api/v1/ja4/profile
Top-N values of one profile dimension for a JA4 over the last N days. Useful for "show me
every agent_name ever observed on this JA4" or
"which countries does this fingerprint actually come from." Each row returns the request
count plus HLL-merged distinct ips/uas/paths.
os_name, os_class, agent_name, agent_class, device_class, device_brand, country, asn, http_version.
curl -G https://urlcap.com/api/v1/ja4/profile \
-H "Authorization: Bearer $URLCAP_KEY" \
--data-urlencode "site_id=3" \
--data-urlencode "ja4_hash=13366807129412944815" \
--data-urlencode "dim=country" \
--data-urlencode "days=30" \
--data-urlencode "limit=10"
{
"version": "1",
"requestId": "…",
"data": {
"siteId": 3, "ja4Hash": "13366807129412944815",
"dim": "country", "days": 30,
"values": [
{ "value": "US", "reqs": 482931, "unique_ips": 9214, "unique_uas": 211, "unique_paths": 5410 },
{ "value": "DE", "reqs": 121034, "unique_ips": 1632, "unique_uas": 88, "unique_paths": 2231 },
{ "value": "JP", "reqs": 88412, "unique_ips": 941, "unique_uas": 42, "unique_paths": 1844 }
]
}
}
A JA4 returning many distinct agent_name values
with even shares is a strong UA-spoofing tell — pair with ja4_ua_consistency.
Same trick for country or asn to
spot scrapers behind residential-proxy networks.
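One way to quantify "even shares" is normalised Shannon entropy over the reqs column of values. This is a technique chosen for illustration, not something the API computes for you; share_evenness is a hypothetical helper.

```python
import math


def share_evenness(values: list[dict]) -> float:
    """Normalised Shannon entropy of request shares.

    0.0 means a single value dominates; 1.0 means perfectly even shares.
    """
    counts = [v["reqs"] for v in values if v["reqs"] > 0]
    if len(counts) < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts)
    return entropy / math.log(len(counts))


values = [
    {"value": "US", "reqs": 482931},
    {"value": "DE", "reqs": 121034},
    {"value": "JP", "reqs": 88412},
]
print(round(share_evenness(values), 2))
```

A result close to 1.0 across many distinct values matches the spoofing or proxy-network pattern described above; where to draw the line is a judgment call for your own traffic.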
get Per-IP rollup
/api/v1/ip/profile
Per-IP behavioural summary on a single site over the last N days. Returns request count
plus distinct JA4s / UAs / paths / hosts — an IP serving many of each is a proxy / NAT tell.
For cross-site investigation including geo, PTR, bot-CIDR membership and every bot
attribution, use the richer /api/v1/ip/intelligence.
curl -G https://urlcap.com/api/v1/ip/profile \
-H "Authorization: Bearer $URLCAP_KEY" \
--data-urlencode "site_id=3" \
--data-urlencode "ip=203.0.113.5" \
--data-urlencode "days=30"
{
"version": "1",
"requestId": "…",
"data": {
"siteId": 3, "ip": "203.0.113.5", "days": 30,
"reqs": 18421,
"unique_ja4s": 9, "unique_uas": 31, "unique_paths": 482, "unique_hosts": 3,
"heuristic_reqs": 411, "challenge_reqs": 0,
"first_seen": "2026-04-28T14:21:08Z", "last_seen": "2026-05-25T09:14:21Z"
}
}
Heuristics: unique_ja4s >= 3 with
unique_uas >= 10 is proxy-shaped;
unique_ja4s = 1 with high
reqs and low unique_paths is a single
headless client. 404 if the IP hasn't sent any traffic
to this site in the window.
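The two heuristics above can be sketched as a small classifier over the data object. The reference does not define "high reqs" or "low unique_paths", so the high_reqs and low_paths defaults below are placeholder assumptions to tune; classify_ip is a hypothetical name.

```python
def classify_ip(profile: dict, high_reqs: int = 1000, low_paths: int = 5) -> str:
    """Label an /ip/profile data object using the documented heuristics."""
    ja4s = profile["unique_ja4s"]
    uas = profile["unique_uas"]
    if ja4s >= 3 and uas >= 10:
        return "proxy-shaped"
    # Assumed cut-offs: high_reqs and low_paths are not specified by the API.
    if ja4s == 1 and profile["reqs"] >= high_reqs and profile["unique_paths"] <= low_paths:
        return "single-headless-client"
    return "unclassified"


sample = {"reqs": 18421, "unique_ja4s": 9, "unique_uas": 31, "unique_paths": 482}
print(classify_ip(sample))
```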
get List blockable JA4s per site
/api/v1/sites/{site_id}/bot-ja4s
Returns every JA4 the discovery system has flagged on this site in the last
window_days days, labelled by classification.
Pull this list and feed the JA4 strings into your edge blocklist (nginx
$ja4 map, Cloudflare WAF rule, etc.). Browser-shaped
fingerprints are excluded because they're not blockable.
request_events' 7-day TTL on the upper end.
known,candidate
known = attributed to a bot_group (Bingbot, GPTBot, …); candidate = pending admin review.
curl -s "https://urlcap.com/api/v1/sites/3/bot-ja4s?window_days=7&limit=200" \
-H "Authorization: Bearer $URLCAP_KEY"
{
"version": "1",
"data": {
"siteId": 3,
"windowDays": 7,
"items": [
{
"ja4": "t13d181300_e8a523a41297_69f017ebb96f",
"ja4_hash": "13366807129412944815",
"classification": "known_bot",
"bot_group": "Googlebot",
"bot_group_id": 4,
"reqs": 2885, "ips": 162, "active_days": 1,
"asset_ratio": 0.0,
"first_seen": "2026-05-21T19:35:00Z",
"last_seen": "2026-05-21T22:01:38Z"
},
{
"ja4": "t13d311100_e8f1e7e78f70_b6426fc6f187",
"ja4_hash": "2034759142565420012",
"classification": "candidate",
"score": 0.76,
"candidate_id": 1222,
"reqs": 11317, "ips": 4953, "active_days": 1,
"asset_ratio": 0.0,
"first_seen": "2026-05-21T20:33:22Z",
"last_seen": "2026-05-21T22:01:38Z",
"cross_customer_action": {
"sites_blocking": 12,
"sites_allowing": 0,
"sites_challenging": 3
},
"block_likely_on_this_site": true,
"block_likely_ratio": 0.997,
"reqs_per_minute_peak": 21334,
"reqs_per_minute_mean": 293,
"active_minutes": 2289,
"burstiness": 4.28
}
]
}
}
Rate & burstiness fields
These fields surface per-minute traffic-shape signals from urlcap's internal
ja4_agg_1m rollup. They are useful for catching
high-volume bursty JA4s that hide under a moderate asset_ratio
but spike to thousands of req/min during their active windows. This is the no-man's-land case
where score >= 0.70 and
asset_ratio falls between 0.05 and 0.50, so
neither Tier 1 nor Tier 2 fires.
- reqs_per_minute_peak: max requests in any 1-minute bucket for this JA4 over the window.
- reqs_per_minute_mean: average across active minutes only (silent buckets between bursts are excluded so a bursty bot's mean reads its real when-active rate, not a diluted overall average).
- active_minutes: count of 1-min buckets with reqs > 0.
- burstiness: coefficient of variation (stddev / mean of per-minute counts). 0 = perfectly uniform; 1 = Poisson-like; > 2 = attack-shape; > 5 = textbook scheduled-burst pattern.
Practical use: add a Tier 1 boost condition like
burstiness > 2.0 OR reqs_per_minute_peak > 1000
to catch attack-shape JA4s your other rules would miss.
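To make the four fields concrete, here is a sketch that recomputes them from a list of per-minute request counts and then applies the boost condition. It assumes burstiness uses the population standard deviation over active minutes only; the reference does not say which variant the service uses, so treat the numbers as indicative. rate_fields and tier1_boost are hypothetical names.

```python
import statistics


def rate_fields(per_minute_counts: list[int]) -> dict:
    """Recompute the rate and burstiness fields from 1-minute buckets."""
    active = [c for c in per_minute_counts if c > 0]  # silent buckets excluded
    if not active:
        return {"reqs_per_minute_peak": 0, "reqs_per_minute_mean": 0.0,
                "active_minutes": 0, "burstiness": 0.0}
    mean = statistics.fmean(active)
    return {
        "reqs_per_minute_peak": max(active),
        "reqs_per_minute_mean": mean,
        "active_minutes": len(active),
        # Coefficient of variation: stddev / mean (population stddev assumed).
        "burstiness": statistics.pstdev(active) / mean,
    }


def tier1_boost(fields: dict) -> bool:
    """The boost condition quoted above."""
    return fields["burstiness"] > 2.0 or fields["reqs_per_minute_peak"] > 1000


steady = rate_fields([10, 10, 0, 10])
spiky = rate_fields([1] * 99 + [5000])
print(tier1_boost(steady), tier1_boost(spiky))
```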
The two policy axes
Each row carries enough information to drive a per-operator decision:
- bot_group is the operator name, your policy lever. Most customers keep Googlebot + Bingbot for search referral traffic and block GPTBot / ClaudeBot / CCBot, which crawl for AI training without returning traffic.
- score is the confidence on unattributed candidates. Start with score >= 0.7 and tighten as you watch analytics for collateral damage.
- cross_customer_action.sites_blocking >= 3 is a strong "other sites block this too" vote, regardless of score.
- block_likely_on_this_site is what we can already tell from your own status codes. It is useful as a sanity check, not a recommendation.
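A possible starting policy built from these levers, applied to one entry of items. The allow and block sets repeat the examples named above and are meant to be edited; decide is a hypothetical helper, and the order in which the rules are checked is a design choice, not something the API prescribes.

```python
ALLOW_GROUPS = {"Googlebot", "Bingbot"}          # search referral traffic
BLOCK_GROUPS = {"GPTBot", "ClaudeBot", "CCBot"}  # AI-training crawlers


def decide(item: dict, min_score: float = 0.7, min_sites_blocking: int = 3) -> str:
    """Return "allow" or "block" for one bot-ja4s item."""
    group = item.get("bot_group")
    if group in ALLOW_GROUPS:
        return "allow"
    if group in BLOCK_GROUPS:
        return "block"
    # Cross-customer vote applies regardless of score.
    sites_blocking = item.get("cross_customer_action", {}).get("sites_blocking", 0)
    if sites_blocking >= min_sites_blocking:
        return "block"
    if item.get("classification") == "candidate" and item.get("score", 0.0) >= min_score:
        return "block"
    return "allow"


print(decide({"classification": "candidate", "score": 0.76,
              "cross_customer_action": {"sites_blocking": 12}}))
```

Checking the operator name first means an explicit allow always wins over score or cross-customer votes.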
Closing the loop: report your edge actions
When your edge takes a decision on a JA4, post an
edge_action outcome.
That populates the cross_customer_action field on every
other site's bot-ja4s response — your blocks become a
signal for everyone else, and theirs become a signal for you.
curl -X POST https://urlcap.com/api/v1/ingest/Xy9KqZ7mNvB2/outcomes \
-H "Authorization: Bearer $INGEST_TOKEN" \
-H "Content-Type: application/x-ndjson" \
--data-binary '{"request_id":"abc123","kind":"edge_action","verdict":"blocked","meta":{"rule":"urlcap-blocklist","rule_id":"v1"}}'
Send one outcome per edge decision. Auth is the site's ingest token (the same one used for
/events), not your urlcap
API key. request_id lets us auto-resolve the JA4
server-side from the original event; you don't have to ship the fingerprint inline.
post URL monitors — up/down checks with alerts
/api/v1/monitors
Schedule a recurring /capture
or /extract run and
urlcap will alert you when the target changes state (up → down or down → up).
Same primitives as UptimeRobot — plus full headless-browser checks, JSON-API
validation, and the User-Agent personas from
/user_agent_profiles.
Plan limits
- Free: 1 monitor, minimum 300 s.
- Developer: 25 monitors, minimum 60 s.
- Startup: 100 monitors, minimum 30 s.
- Business: unlimited, minimum 30 s.
Create a monitor
The spec object is shipped verbatim to the
chosen engine, so anything you can do via /capture
or /extract works here too — custom headers,
POST bodies, JSON-content extractors, navigation actions, Web Bot Auth signing.
curl -X POST https://urlcap.com/api/v1/monitors \
-H "X-API-Key: $URLCAP_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "prod API healthcheck",
"kind": "capture",
"spec": { "url": "https://api.example.com/healthz" },
"intervalSeconds": 60,
"expectedStatus": 200,
"userAgentProfile": "chrome-latest-mac",
"alertWebhookUrl": "https://example.com/webhooks/urlcap-monitor",
"alertEmail": "oncall@example.com",
"alertFailureThreshold": 2
}'
{
"version": "1",
"requestId": "…",
"data": {
"publicId": "Xy9KqZ7mNvB2",
"name": "prod API healthcheck",
"kind": "capture",
"spec": { "url": "https://api.example.com/healthz" },
"intervalSeconds": 60,
"expectedStatus": 200,
"userAgentProfile": "chrome-latest-mac",
"alertWebhookUrl": "https://example.com/webhooks/urlcap-monitor",
"alertWebhookSecret": "Tw3qHk…48charsBase62…M7",
"alertWebhookSecretWarning": "Save this secret now — it signs outbound webhook payloads (X-urlcap-Signature: sha256=…) and is not retrievable later.",
"alertEmail": "oncall@example.com",
"alertFailureThreshold": 2,
"paused": false,
"currentState": "unknown",
"createdAt": "2026-05-26T12:00:00Z"
}
}
alertWebhookSecret is returned only
on this create call. Subsequent reads omit it — you'll see
alertWebhookSecretSet: true instead. The secret
signs every outbound webhook with HMAC-SHA256 in X-urlcap-Signature;
your verifier compares its own HMAC of the raw body to the header.
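A minimal verifier sketch for the receiving side. It assumes the value after sha256= is the lowercase hex HMAC-SHA256 digest of the raw request body and that nothing else (for example X-urlcap-Timestamp) is mixed into the signed payload; neither detail is spelled out here, so confirm against a real delivery before relying on it. verify_signature is a hypothetical name.

```python
import hashlib
import hmac


def verify_signature(secret: str, raw_body: bytes, header_value: str) -> bool:
    """Compare our own HMAC of the raw body with the X-urlcap-Signature header."""
    digest = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    expected = f"sha256={digest}"
    # Constant-time comparison on bytes avoids timing leaks.
    return hmac.compare_digest(expected.encode("utf-8"), header_value.encode("utf-8"))
```

Always verify against the bytes exactly as received; re-serialising parsed JSON can change whitespace and break the match.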
Pass rule (v1)
Status-code only. If expectedStatus is set, the
check passes only when the response status matches exactly. If it's absent, any
2xx is a pass. Richer assertions
(body-contains, JSONPath predicates) are on the roadmap.
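The v1 pass rule is small enough to state as code. The function below is only an illustration of the rule as described, with check_passes as a hypothetical name.

```python
from typing import Optional


def check_passes(status: int, expected_status: Optional[int] = None) -> bool:
    """Status-code-only pass rule: exact match if expected, otherwise any 2xx."""
    if expected_status is not None:
        return status == expected_status
    return 200 <= status <= 299


print(check_passes(204), check_passes(204, expected_status=200))
```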
Alerts
Alerts fire only on state transitions, not on every failing check.
The state machine debounces flapping targets: a monitor flips to
down only after alertFailureThreshold consecutive
failures. Both alertWebhookUrl
and alertEmail are optional; set neither and
the monitor still records check history but won't notify.
{
"event": "monitor.state_changed",
"monitorPublicId": "Xy9KqZ7mNvB2",
"monitorName": "prod API healthcheck",
"newState": "down",
"changedAt": "2026-05-26T12:34:56Z",
"latestCheck": { "httpStatus": 503, "latencyMs": 421, "passed": false, "error": null }
}
// Headers: X-urlcap-Event, X-urlcap-Monitor, X-urlcap-Timestamp, X-urlcap-Signature: sha256=<HMAC>
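The debounce behaviour described under Alerts can be sketched as a small state machine. Two points are assumptions rather than documented behaviour: a single passing check is enough to return to up, and the first transition out of unknown does not alert. MonitorState is a hypothetical class, not urlcap's implementation.

```python
class MonitorState:
    """Tracks up/down state and reports only alert-worthy transitions."""

    def __init__(self, failure_threshold: int = 1):
        self.failure_threshold = failure_threshold
        self.state = "unknown"
        self.consecutive_failures = 0

    def record(self, passed: bool):
        """Record one check; return the new state if an alert should fire, else None."""
        previous = self.state
        if passed:
            self.consecutive_failures = 0
            self.state = "up"  # assumed: one pass is enough to recover
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.state = "down"
        # Alert only on up -> down or down -> up, not when leaving "unknown".
        if previous != self.state and previous != "unknown":
            return self.state
        return None


monitor = MonitorState(failure_threshold=2)
print([monitor.record(p) for p in [True, False, False, False, True]])
```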
Inspect a monitor
curl https://urlcap.com/api/v1/monitors/Xy9KqZ7mNvB2 -H "X-API-Key: $URLCAP_KEY"
curl "https://urlcap.com/api/v1/monitors/Xy9KqZ7mNvB2/checks?limit=20" -H "X-API-Key: $URLCAP_KEY"
curl "https://urlcap.com/api/v1/monitors/Xy9KqZ7mNvB2/uptime?days=30" -H "X-API-Key: $URLCAP_KEY"
Phase-level timings
Capture monitors record a phase breakdown on every check: dnsMs,
connectMs (TCP + TLS),
ttfbMs (time-to-first-byte),
bodyMs (body download), plus
resolvedIp (which A/AAAA the socket actually used).
Phase fields are absent when their hook didn't fire — pooled keep-alive reuse skips
DNS / connect, and followRedirects=true only
captures the first leg. Same fields show up in
data.response.timings on bare /capture too.
Extract monitors don't have these (the HtmlUnit engine doesn't surface phase timing).
Lifecycle
- PATCH /api/v1/monitors/{publicId}: whole-spec replace (partial updates not supported in v1).
- POST /api/v1/monitors/{publicId}/pause: scheduler skips it; state is preserved.
- POST /api/v1/monitors/{publicId}/resume: reverses the above.
- DELETE /api/v1/monitors/{publicId}: hard-delete. Check history is removed by the daily sweeper.
Check history (monitor_checks rows) is kept for
30 days. A daily internal job at 03:05 UTC sweeps anything older.
get Candidate IPs per site
/api/v1/sites/{id}/bot-ip-candidates
Live feed of IPs exhibiting abusive behaviour on your site under five composite signals. The IP equivalent of the JA4 candidate queue — small list by construction (score floor), recomputed on every call. Use it to populate an iptables / ipset / Cloudflare IP-list at the edge for IP-based blocking.
Scoring (weights sum to 1.0)
- 40% block_ratio: share of this IP's requests on the site that returned 4xx or 444.
- 20% volume: log10(1+reqs)/4, saturates at ~10,000 requests.
- 15% path breadth: distinct_paths/100, saturates at 100 paths.
- 15% JA4 churn: distinct_ja4s/3, saturates at 3 JA4s (anti-fingerprinting tell).
- 10% vuln-probe hits: distinct vuln-probe paths hit, saturates at 3.
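One reading of the weights above, as a sketch. It assumes every component is clamped to 1.0 at its saturation point and that vuln-probe hits scale linearly up to 3, which the list does not state explicitly; it is not guaranteed to reproduce the scores the service returns. candidate_score is a hypothetical name.

```python
import math

WEIGHTS = {
    "block_ratio": 0.40,
    "volume": 0.20,
    "path_breadth": 0.15,
    "ja4_churn": 0.15,
    "vuln_probe": 0.10,
}


def candidate_score(components: dict) -> float:
    """Weighted sum of the five signals, each clamped to the range 0..1."""
    parts = {
        "block_ratio": min(1.0, components["block_ratio"]),
        "volume": min(1.0, math.log10(1 + components["reqs"]) / 4),
        "path_breadth": min(1.0, components["distinct_paths"] / 100),
        "ja4_churn": min(1.0, components["distinct_ja4s"] / 3),
        # Assumed linear ramp: the list only says this saturates at 3.
        "vuln_probe": min(1.0, components["vuln_probe_hits"] / 3),
    }
    return sum(WEIGHTS[name] * parts[name] for name in WEIGHTS)


quiet = {"block_ratio": 0.0, "reqs": 0, "distinct_paths": 0,
         "distinct_ja4s": 0, "vuln_probe_hits": 0}
print(candidate_score(quiet))
```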
Hard exclusions (no scoring, row dropped)
- IP is on any user's trust list: signal modulation is global.
- IP is already attributed in bot_observed_ips: no point re-discovering known bots. Use /bot-traffic for those.
- reqs < 20: insufficient evidence on this site within the window.
Request
json returns the operator-rich shape below. txt returns one IP per line. cidr returns each IP as /32 (or /128). Both text formats use text/plain, which is easy to feed straight into ipset.
curl -G https://urlcap.com/api/v1/sites/2DrxGfsYW0jv/bot-ip-candidates \
-H "Authorization: Bearer $URLCAP_KEY" \
--data-urlencode "window_days=7" \
--data-urlencode "min_score=0.50"
{
"version": "1",
"requestId": "…",
"data": {
"siteId": 3, "windowDays": 7, "minScore": 0.5, "excludedClassified": 4954,
"candidates": [
{
"ip": "203.0.113.5", "asn": 9009, "country": "VN", "score": 0.78,
"components": {
"block_ratio": 0.92, "reqs": 1850,
"distinct_paths": 1452, "distinct_ja4s": 4, "vuln_probe_hits": 12
},
"first_seen": "2026-05-22 03:48:52.000",
"last_seen": "2026-05-22 04:03:07.000"
}
]
}
}
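For one-shot ipset feeding, note that ipset restore expects add <set> <ip> lines rather than bare addresses, so prefix each line first: curl -s … &format=txt | sed 's/^/add urlcap_bots /' | ipset restore -exist (urlcap_bots is a placeholder for a set you have already created).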
For Cloudflare IP-list import: … &format=cidr | cf-cli ip-list update ....
get URL traffic + blocks summary
/api/v1/sites/{id}/url-stats?host=&path=&days=N
Quick "is this URL healthy?" report for a specific page on your site. One call returns totals, status-code breakdown, per-day chart, and up to 25 most recent non-2xx requests with their IP / ASN / JA4. Use to answer "any rejections on URL X today?".
curl -G https://urlcap.com/api/v1/sites/2DrxGfsYW0jv/url-stats \
-H "X-API-Key: $URLCAP_KEY" \
--data-urlencode "host=en.example.com" \
--data-urlencode "path=/terms-of-service.html" \
--data-urlencode "days=7"
{
"version": "1",
"data": {
"siteId": 3, "host": "en.example.com", "path": "/terms-of-service.html",
"windowDays": 7,
"total": 351, "ok_2xx": 341, "redirect_3xx": 2,
"rejected_4xx": 8, "error_5xx": 0,
"unique_ips": 122, "unique_ja4": 55,
"by_status": [
{ "status": 200, "count": 341, "unique_ips": 122 },
{ "status": 405, "count": 4, "unique_ips": 1 },
{ "status": 444, "count": 4, "unique_ips": 1 }
],
"by_day": [ /* one row per UTC day */ ],
"rejected_sample": [
{ "ts": "2026-05-26 04:02:44.000", "status": 444, "ip": "::ffff:45.33.69.206",
"asn": 63949, "method": "GET", "ja4": "" }
]
}
}
get Per-bot accessibility check
/api/v1/sites/{id}/bot-traffic?bot_group=&days=N
Did urlcap-discovered bot traffic land successfully on your site? Given a
bot_group name (substring-match on
bot_groups.description), this returns its
recent visits, status breakdown, per-day chart, and a sample of recent requests with
non-2xx surfaced first. Use to answer "are we blocking Google?" in one call.
Bot identification matches against every JA4 the discovery system has attributed to the
bot_group via bot_observed_ja4s — catches
both CIDR- and UA-matched traffic without enumerating IP lists. Optional
host and path
parameters narrow the lookup to a single page.
curl -G https://urlcap.com/api/v1/sites/2DrxGfsYW0jv/bot-traffic \
-H "X-API-Key: $URLCAP_KEY" \
--data-urlencode "bot_group=Googlebot" \
--data-urlencode "days=1"
{
"version": "1",
"data": {
"siteId": 3, "botGroupFilter": "Googlebot", "windowDays": 1,
"matchedBotGroups": [ { "botGroupId": 4, "description": "Googlebot" } ],
"ja4HashesUsedForFilter": 12,
"total": 98831, "ok_2xx": 96603, "redirect_3xx": 1010,
"rejected_4xx": 969, "error_5xx": 0,
"by_status": [
{ "status": 200, "count": 96603 },
{ "status": 404, "count": 955 },
{ "status": 403, "count": 14 }
],
"by_day": [ /* one row per UTC day */ ],
"recent_sample": [
{ "ts": "...", "host": "en.example.com", "path": "/images/consumo.png", "status": 403, "ip": "..." }
]
}
}
Substring matching: bot_group=Google catches both
Googlebot and
User-triggered fetchers (Google) together; pass the
exact full description to narrow to one bot_group.
get Legacy — /auth
/auth
The original TOTP endpoint, kept for backwards compatibility. It takes the same uri query
parameter and the same X-API-Key header, but responds with the bare code as
text/html — no JSON envelope, no metadata. Prefer /api/v1/totp for new integrations; this endpoint will not change.
curl -G https://urlcap.com/auth \
-H "X-API-Key: $URLCAP_KEY" \
--data-urlencode "uri=otpauth://totp/Acme:alice@acme.io?secret=JBSWY3DPEHPK3PXP"
492039
If the key is invalid or the URI can't be parsed, the legacy endpoint responds with 404 Not Found and an empty body.
Need an API key, or want to talk through a use case? Email info@urlcap.com. Track changes in the changelog.