Deployment Guide¶
This guide covers how to build, configure, and run web-researcher-mcp — whether locally on your machine or deployed to a server. Most users only need the Quick Start in the README; this doc is for production deployments and advanced configuration.
Package Manager Distribution¶
AUR (Arch Linux)¶
yay -S web-researcher-mcp # or paru, trizen, etc.
Manual install:
git clone https://aur.archlinux.org/web-researcher-mcp.git
cd web-researcher-mcp
makepkg -si
The PKGBUILD and .SRCINFO in packaging/aur/ are updated automatically on every release by the update-packaging CI job. To update a manual install: yay -Syu web-researcher-mcp.
Nix / NixOS¶
Run without installing:
nix run github:zoharbabin/web-researcher-mcp
Add to a flake:
inputs.web-researcher-mcp.url = "github:zoharbabin/web-researcher-mcp";
# In your environment packages:
packages = [ inputs.web-researcher-mcp.packages.${system}.default ];
Profile install:
nix profile install github:zoharbabin/web-researcher-mcp
The flake in packaging/nix/flake.nix ships pre-built binaries for x86_64-linux, aarch64-linux, x86_64-darwin, and aarch64-darwin. Hashes are updated automatically on every release.
Continue.dev¶
Continue.dev does not have a package marketplace; configure the server via your ~/.continue/config.json:
{
  "mcpServers": {
    "web-researcher": {
      "command": "uvx",
      "args": ["web-researcher-mcp"]
    }
  }
}
A ready-to-copy snippet is in packaging/continue/config.json.
Build¶
# Development build
go build -o web-researcher-mcp ./cmd/web-researcher-mcp
# Production build (static, stripped)
CGO_ENABLED=0 go build -ldflags="-s -w" -o web-researcher-mcp ./cmd/web-researcher-mcp
# FIPS-compliant build (government/enterprise)
GOEXPERIMENT=boringcrypto CGO_ENABLED=0 go build -ldflags="-s -w" -o web-researcher-mcp ./cmd/web-researcher-mcp
# Cross-compile
GOOS=linux GOARCH=amd64 go build -ldflags="-s -w" -o web-researcher-mcp-linux-amd64 ./cmd/web-researcher-mcp
GOOS=darwin GOARCH=arm64 go build -ldflags="-s -w" -o web-researcher-mcp-darwin-arm64 ./cmd/web-researcher-mcp
Output: single static binary. No runtime dependencies.
Transport Modes¶
STDIO (Default — Claude Code, Cursor, Claude Desktop)¶
# Direct
./web-researcher-mcp
# With env
GOOGLE_CUSTOM_SEARCH_API_KEY=AIza... GOOGLE_CUSTOM_SEARCH_ID=017... ./web-researcher-mcp
The server reads MCP JSON-RPC from stdin, writes to stdout. No port, no network.
Claude Code config (~/.claude.json):
{
  "mcpServers": {
    "web-researcher": {
      "command": "/path/to/web-researcher-mcp",
      "env": {
        "GOOGLE_CUSTOM_SEARCH_API_KEY": "AIza...",
        "GOOGLE_CUSTOM_SEARCH_ID": "017...",
        "SEARCH_PROVIDER": "brave",
        "BRAVE_API_KEY": "BSA..."
      }
    }
  }
}
HTTP (Multi-client, web apps)¶
PORT=3000 \
OAUTH_ISSUER_URL=https://auth.example.com \
OAUTH_AUDIENCE=https://api.example.com \
./web-researcher-mcp
When PORT is set, the server runs the HTTP (Streamable) transport exclusively
and does not read STDIO; when PORT is unset it runs STDIO exclusively. The two
transports are mutually exclusive, so a container started with PORT set but no
stdin attached (docker run -p ... -e PORT=...) stays up serving HTTP.
Endpoints:
- /mcp/ — Streamable HTTP MCP endpoint (handles POST and streaming)
- GET /health/live — Liveness probe (always 200, ok; a degraded-but-alive
process must not be killed)
- GET /health/ready — Readiness probe. When multi-provider routing is
configured, returns 503 (with the health snapshot JSON) only when every
provider's circuit breaker is open — the pod cannot serve any query and
should be pulled from the load balancer; 200 otherwise (healthy or
degraded, since fallback providers still serve). With no routing
(single-provider / zero-config), it is a static 200 ready — there is no
breaker ladder to gate on and the process is ready by construction.
- GET /metrics — Prometheus metrics
- GET /dashboard — read-only operator dashboard (HTML); its data endpoint GET /dashboard/data is admin-gated. Both are registered only when ADMIN_API_KEY is set. See Operator Dashboard
- GET /.well-known/oauth-authorization-server — OAuth metadata
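The readiness rule described for /health/ready reduces to a small decision function. The following is a minimal Python sketch of that rule, not the server's implementation; the function name readiness_status and the shape of the breaker map are assumptions made for illustration.

```python
from typing import Mapping, Optional


def readiness_status(breaker_open: Optional[Mapping[str, bool]]) -> int:
    """Return the HTTP status a readiness probe would report.

    breaker_open maps provider name -> True when that provider's circuit
    breaker is open. None or an empty map models the no-routing case
    (single-provider / zero-config).
    """
    if not breaker_open:
        return 200  # no breaker ladder to gate on: ready by construction
    if all(breaker_open.values()):
        return 503  # every provider is tripped: the pod cannot serve a query
    return 200  # healthy or degraded: a fallback provider still serves
```

A pod with one tripped provider out of two stays in rotation; only a fully tripped pod is pulled from the load balancer.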
Transport Mode Differences¶
| Behavior | STDIO (Local) | HTTP (Cloud/Team) |
|---|---|---|
| Tool functionality | Identical | Identical |
| Tool descriptions | Identical | Identical |
| Auth | No | OAuth 2.1 when OAUTH_ISSUER_URL is set; open otherwise |
| Rate limiting (server-side) | None | Per-tenant + global |
| Rate limiting (upstream APIs) | Applies | Applies |
| Session persistence | Local disk | Local disk (use sticky sessions for multi-instance) |
| Audit logging | Yes | Yes |
| SSRF protection | Yes | Yes |
| Cache | Local memory + disk | Local memory + disk |
Design intent: STDIO mode trusts the local user implicitly (it runs as their process). HTTP mode adds auth and rate limiting for untrusted network clients. Tool handlers execute identically regardless of transport.
Connecting an AI client to a remote HTTP endpoint¶
Once you have a server running with PORT set (locally or in the cloud), point your MCP client at the /mcp/ endpoint. The path is the same regardless of host.
Claude Code (~/.claude.json):
{
  "mcpServers": {
    "web-researcher-remote": {
      "type": "http",
      "url": "https://your-server.example.com/mcp/"
    }
  }
}
VS Code (.vscode/mcp.json). Cursor (~/.cursor/mcp.json) takes the same entry, but under a top-level mcpServers key instead of servers:
{
  "servers": {
    "web-researcher": {
      "type": "http",
      "url": "https://your-server.example.com/mcp/"
    }
  }
}
Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):
{
  "mcpServers": {
    "web-researcher": {
      "type": "http",
      "url": "https://your-server.example.com/mcp/"
    }
  }
}
When OAuth is enabled (OAUTH_ISSUER_URL set), the client must present a valid Bearer token on each request. When OAuth is not configured, the endpoint is open to any request — restrict access via firewall or reverse-proxy auth if needed.
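For scripted checks outside an MCP client, the request shape can be sketched as follows. This is an illustrative Python example under stated assumptions: the helper name build_initialize_request, the client name, and the protocolVersion value are hypothetical choices, and the request is only built, not sent.

```python
import json
import urllib.request
from typing import Optional


def build_initialize_request(base_url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) an MCP initialize POST for the /mcp/ endpoint."""
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumed; use the version your client speaks
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.0.1"},  # hypothetical
        },
    }
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    }
    if token:
        # Only needed when the server runs with OAUTH_ISSUER_URL set.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/mcp/",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the result to urllib.request.urlopen would send it; how the token is obtained depends on your identity provider and is out of scope here.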
Docker¶
The project includes two Dockerfiles in the repo root:
- Dockerfile — multi-stage build (builder + Alpine runtime), used for local builds
- Dockerfile.release — slim Alpine image used by GoReleaser (expects pre-built binary)
Both images bundle Chromium plus the fonts/libraries go-rod needs for full browser-tier rendering, run as a non-root UID (65534), and set CHROME_PATH=/usr/bin/chromium-browser so the browser scrape tier works out of the box — no extra layers required.
# Build and run locally
docker build -t web-researcher-mcp .
docker run -i --rm \
-e GOOGLE_CUSTOM_SEARCH_API_KEY=... \
-e GOOGLE_CUSTOM_SEARCH_ID=... \
web-researcher-mcp
# HTTP mode
docker run -p 3000:3000 \
-e PORT=3000 \
-e GOOGLE_CUSTOM_SEARCH_API_KEY=... \
-e GOOGLE_CUSTOM_SEARCH_ID=... \
-e OAUTH_ISSUER_URL=... \
-e OAUTH_AUDIENCE=... \
web-researcher-mcp
For headless browser (go-rod): The bundled images already ship Chromium and set CHROME_PATH. Override CHROME_PATH only if you mount a different Chromium/Chrome binary.
Kubernetes¶
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-researcher-mcp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-researcher-mcp
  template:
    metadata:
      labels:
        app: web-researcher-mcp
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "3000"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: server
          image: web-researcher-mcp:latest
          ports:
            - containerPort: 3000
          env:
            - name: PORT
              value: "3000"
            - name: GOOGLE_CUSTOM_SEARCH_API_KEY
              valueFrom:
                secretKeyRef:
                  name: mcp-secrets
                  key: google-api-key
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 1000m
              memory: 512Mi
          livenessProbe:
            httpGet:
              path: /health/live
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-researcher-mcp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-researcher-mcp
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: mcp_active_connections
        target:
          type: AverageValue
          averageValue: "50"
Environment Variables¶
Required¶
None. With no configuration at all, the server falls back to DuckDuckGo (zero-config, no API key). For higher-quality results configure a provider below. .env.example is the authoritative source for the full variable set.
| Variable | Description | Example |
|---|---|---|
| GOOGLE_CUSTOM_SEARCH_API_KEY | Google API key. Required only when SEARCH_PROVIDER=google and routing is unset; otherwise optional | AIzaSy... (39 chars) |
| GOOGLE_CUSTOM_SEARCH_ID | Search engine ID (paired with the key above) | From PSE console |
Note: Google keys are validated as required only when you explicitly select SEARCH_PROVIDER=google without multi-provider routing. With SEARCH_PROVIDER unset (or any other value), the server starts keyless and falls back to the zero-config DuckDuckGo provider — in both STDIO and HTTP mode. A genuine misconfiguration (e.g. SEARCH_PROVIDER=google with no key) is fatal in HTTP mode (PORT set) and logged-but-non-fatal in STDIO mode so local use is never blocked.
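The validation rule in the note can be written out as a small function. This is a hedged Python sketch of the stated behavior only; the name google_key_check, the "ok"/"warn"/"fatal" labels, and the assumption that both the key and the search engine ID must be present are illustrative, not taken from the server source.

```python
from typing import Mapping


def google_key_check(env: Mapping[str, str]) -> str:
    """Classify a configuration as "ok", "warn" (logged, non-fatal) or "fatal"."""
    explicit_google = env.get("SEARCH_PROVIDER", "").strip().lower() == "google"
    routing_set = bool(env.get("SEARCH_ROUTING", "").strip())
    if not explicit_google or routing_set:
        return "ok"  # keys are optional; keyless start is allowed
    has_keys = bool(env.get("GOOGLE_CUSTOM_SEARCH_API_KEY")) and bool(
        env.get("GOOGLE_CUSTOM_SEARCH_ID")
    )
    if has_keys:
        return "ok"
    # Misconfiguration: fatal in HTTP mode (PORT set), logged-only in STDIO.
    return "fatal" if env.get("PORT") else "warn"
```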
Search Provider¶
| Variable | Description | Default |
|---|---|---|
| SEARCH_PROVIDER | Primary provider: google, brave, serper, searxng, searchapi, duckduckgo, tavily, exa, hackernews | google (variable default); at runtime, when google is selected but no Google key is set, the server falls back to the zero-config duckduckgo provider |
| SEARCH_ROUTING | Multi-provider routing (see below) | (unset) |
| BRAVE_API_KEY | Brave Search API key | (unset) |
| BRAVE_EXTRA_SNIPPETS | Return up to 5 extra snippets per Brave result | false |
| SERPER_API_KEY | Serper.dev API key | (unset) |
| SEARCHAPI_API_KEY | SearchAPI.io API key | (unset) |
| TAVILY_API_KEY | Tavily API key (AI-agent search; sent as a Bearer token) | (unset) |
| EXA_API_KEY | Exa API key (neural/semantic search; sent as x-api-key). Also backs academic_search, the answer/structured_search tools, and a paid /contents scrape fallback tier | (unset) |
| SEARXNG_URL | SearXNG instance URL | (unset) |
| SEARXNG_BASIC_AUTH | HTTP Basic credential user:password for a SearXNG behind Basic auth (malformed value fails startup; never logged) | (unset) |
| SEARXNG_HEADERS | Static request headers for SearXNG as comma-separated Name: Value pairs (no commas/newlines in a value; a custom Authorization overrides SEARXNG_BASIC_AUTH) | (unset) |
| CUSTOM_LENSES_PATH | Directory of custom lens JSON files, loaded after (and able to override) the bundled lenses; schema-validated at startup. See lenses/README.md | (unset) |
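The SEARXNG_BASIC_AUTH and SEARXNG_HEADERS formats in the table above can be illustrated with a short parser. This is a sketch under assumptions: the function names are hypothetical, and the real server may validate more strictly than shown here.

```python
import base64
from typing import Dict, Optional


def parse_header_pairs(raw: str) -> Dict[str, str]:
    """Parse comma-separated "Name: Value" pairs (values may not contain commas)."""
    headers: Dict[str, str] = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue
        name, sep, value = pair.partition(":")  # split on the first colon only
        if not sep or not name.strip():
            raise ValueError(f"malformed header pair: {pair!r}")
        headers[name.strip()] = value.strip()
    return headers


def searxng_headers(basic_auth: Optional[str], raw_headers: str) -> Dict[str, str]:
    """Combine Basic auth with static headers; a custom Authorization wins."""
    headers: Dict[str, str] = {}
    if basic_auth:
        user, sep, _password = basic_auth.partition(":")
        if not sep or not user:
            raise ValueError("SEARXNG_BASIC_AUTH must look like user:password")
        encoded = base64.b64encode(basic_auth.encode("utf-8")).decode("ascii")
        headers["Authorization"] = f"Basic {encoded}"
    for name, value in parse_header_pairs(raw_headers).items():
        if name.lower() == "authorization":
            headers.pop("Authorization", None)  # custom header overrides Basic auth
        headers[name] = value
    return headers
```

Splitting on the first colon keeps values such as "Bearer a:b" intact, which is why commas, not colons, are the forbidden character inside a value.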
Patent Providers (Optional)¶
These enable structured patent search via official APIs. Without them, patent_search falls back to web search discovery.
| Variable | Description | Coverage |
|---|---|---|
| USPTO_API_KEY | USPTO API key (data.uspto.gov) | US patents |
| EPO_OPS_CONSUMER_KEY | EPO OPS consumer key (developers.epo.org) | Worldwide |
| EPO_OPS_CONSUMER_SECRET | EPO OPS consumer secret | Worldwide |
| LENS_API_TOKEN | The Lens API token (lens.org) | Worldwide + scholarly |
Each configured provider gets an independent circuit breaker. The patent_search tool automatically selects providers based on the requested patent_office region.
Academic Providers (Optional)¶
These enable rich scholarly metadata (DOIs, authors, citation counts, abstracts, OA status) for academic_search, and back the citation_graph tool. Without them, academic_search falls back to site-restricted web search.
| Variable | Description | Default |
|---|---|---|
| OPENALEX_EMAIL | Contact email for the OpenAlex polite pool (287M+ works). Also enables citation_graph (counts-only) | (unset) |
| CROSSREF_EMAIL | Contact email for the CrossRef polite pool (140M+ DOI-registered works) | (unset) |
| SEMANTIC_SCHOLAR_API_KEY | Semantic Scholar API key (200M+ papers + tldr + citation intent/influence). Works without a key at a lower shared rate; also powers citation_graph (rich edges) | (unset) |
| PUBMED_API_KEY | NCBI E-utilities API key for PubMed (biomedical literature). PubMed is always available keyless (~3 req/s); this key raises the rate (~10 req/s) | (unset) |
| PUBMED_EMAIL | Optional NCBI contact for PubMed requests (recommended by NCBI). Falls back to OPENALEX_EMAIL | (unset; falls back to OPENALEX_EMAIL) |
| UNPAYWALL_EMAIL | Contact email enabling Unpaywall open-access enrichment (fills free-PDF links on DOI-bearing results that lack one). Falls back to OPENALEX_EMAIL when unset; no-op when neither is set | (unset; falls back to OPENALEX_EMAIL) |
citation_graph registers only when a citation-capable academic provider (Semantic Scholar or OpenAlex) is configured. Open-access enrichment is best-effort and never fails or slows a search beyond its own bounded request.
Structured-Domain Providers (Optional)¶
These enable dedicated structured-research tools. Each provider is independent. filing_search registers only when its provider is configured; legal_search, econ_search, and clinical_search each have a keyless provider, so they are always available (a key/token only adds coverage or raises limits).
| Variable | Tool | Description | Default |
|---|---|---|---|
| EDGAR_CONTACT_EMAIL | filing_search | Contact email for SEC EDGAR's required User-Agent (no API key). Falls back to OPENALEX_EMAIL; filing_search registers only when one is set | (unset; falls back to OPENALEX_EMAIL) |
| COURTLISTENER_API_TOKEN | legal_search | Optional token raising the CourtListener rate limit (~100→~5000 req/day). legal_search is always available (CourtListener works keyless) | (unset) |
| FRED_API_KEY | econ_search | Federal Reserve Economic Data API key (free at fred.stlouisfed.org). econ_search is always available via keyless World Bank / OECD / Eurostat providers; this key adds FRED's US macro series | (unset) |
| (none) | econ_search | World Bank Open Data (global development indicators, 200+ economies), OECD (SDMX economy indicators), and Eurostat (European official statistics). All keyless; no configuration | n/a |
| (none) | clinical_search | ClinicalTrials.gov v2: 400K+ clinical-trial registrations as typed data. Keyless; no configuration. Always available | n/a |
| IA_ACCESS_KEY + IA_SECRET_KEY | archive_source | Optional Internet Archive S3-style credentials for Save Page Now. archive_source is always available keyless; both keys together authenticate captures for higher reliability. Never logged. Get a pair at archive.org/account/s3.php | (unset) |
Each structured-domain provider gets an independent circuit breaker and uses the SSRF-safe HTTP client. filing_search returns XBRL company facts (with facts=true); econ_search returns observations passed through exactly as the source provides them — no rounding; clinical_search returns trial metadata for discovery (not medical advice).
BrandFetch (Optional)¶
These enable the Tier 1 BrandFetch API for brand_research. The tool always works without them — it falls back to CSS extraction, homepage meta, and web search.
| Variable | Description |
|---|---|
| BRANDFETCH_API_KEY | BrandFetch Brand + Context API key (pk_*). Free tier: 100 req/month. Never logged |
| BRANDFETCH_CLIENT_ID | BrandFetch logo CDN client ID. Free tier: 500K req/month |
Multi-Provider Routing¶
When SEARCH_ROUTING is set, the server uses all configured providers with priority-ordered fallback:
# Simple: comma-separated priority list (applies to all operations)
SEARCH_ROUTING=brave,google,serper
# Advanced: per-operation routing (JSON)
SEARCH_ROUTING='{"web":"brave,google","news":"brave,serper","images":"google,brave","academic":"openalex,crossref","patents":"epo,lens,searchapi,uspto","default":"brave,google,searchapi"}'
How it works:
- Requests route to the first healthy provider in the priority list
- If a provider fails (timeout, rate limit, 5xx), the next provider is tried automatically
- Each provider gets an independent circuit breaker. The routing-layer breakers that govern fallback (web, patent, and academic alike) open after 3 consecutive failures and reset after 30s (internal/search/router.go). Domain providers additionally wrap their own upstream HTTP calls in an inner breaker (5 failures / 60s, internal/search/domain.go) — a separate, deeper layer, not the effective routing breaker. See those files for the authoritative values.
- Lenses can override routing via the "routing" field in their JSON definition
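The routing-layer breaker behavior (open after 3 consecutive failures, reset after 30s) can be sketched as below. This is a minimal illustration, not the code in internal/search/router.go: the class and method names are assumptions, and treating "reset" as "allow a trial call once the window has elapsed" is one plausible reading.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Consecutive-failure breaker with a timed reset (illustrative only)."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        """Return True when a call to the provider may be attempted."""
        if self.opened_at is None:
            return True
        # Once the reset window has elapsed, let a trial call through.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()  # open, or re-open after a failed trial
```

A router would call allow() on each provider in priority order and skip any provider whose breaker is still open.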
Operation types: web, images, news, academic, patents, default. The academic and patents lists are filtered to providers that implement the academic/patent interface — academic accepts openalex, crossref, pubmed, semanticscholar, exa; patents accepts searchapi, epo, lens, uspto. Names that don't implement the interface are silently dropped, so use the example values above.
When no explicit routing is configured for an operation, the default list is used. When SEARCH_ROUTING is not set at all, the server uses SEARCH_PROVIDER as a single provider (backward compatible).
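The two SEARCH_ROUTING forms and the fallback to the default list can be illustrated with a short parser. This is a sketch under assumptions: parse_routing and providers_for are hypothetical names, and the interface filtering described for academic and patents is not modeled.

```python
import json
from typing import Dict, List


def _split(providers: str) -> List[str]:
    return [p.strip() for p in providers.split(",") if p.strip()]


def parse_routing(raw: str) -> Dict[str, List[str]]:
    """Parse either the simple comma list or the per-operation JSON form."""
    raw = raw.strip()
    if raw.startswith("{"):
        return {op: _split(providers) for op, providers in json.loads(raw).items()}
    # Simple form: one priority list that applies to all operations.
    return {"default": _split(raw)}


def providers_for(routing: Dict[str, List[str]], operation: str) -> List[str]:
    """Return the priority list for an operation, falling back to default."""
    return routing.get(operation) or routing.get("default", [])
```

With the JSON form, an operation that has no explicit entry (for example images when only web and default are given) resolves to the default list.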
HTTP Transport¶
| Variable | Description | Default |
|---|---|---|
| PORT | HTTP listen port (enables HTTP mode) | (unset; STDIO only) |
| OAUTH_ISSUER_URL | JWT issuer URL | (unset) |
| OAUTH_AUDIENCE | Expected JWT audience | (unset) |
| ALLOWED_ORIGINS | CORS origins (comma-separated). Browser-only; backend connectors and STDIO are unaffected | (unset; denies cross-origin by default, see CORS_STRICT) |
| CORS_STRICT | When true (default), an empty ALLOWED_ORIGINS denies all cross-origin browser requests (fail-closed). When false, an empty ALLOWED_ORIGINS reflects any Origin (legacy permissive escape hatch). See MIGRATION.md for the breaking change. | true |
| ENFORCE_SCOPES | When true, a token that carries a scope/scp claim must include tool:*, tool:<name>, or the coarse research scope to invoke a tool. Tokens with no scope claim are still allowed (permissive; fail-closed only on present-but-insufficient scopes). | false |
| REQUIRED_SCOPES | Optional comma-separated scopes that every request must carry when ENFORCE_SCOPES=true. Only meaningful with ENFORCE_SCOPES. | (unset) |
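The ENFORCE_SCOPES rule in the table above can be sketched as a predicate. This is an illustrative reading only: scope_allows is a hypothetical name, REQUIRED_SCOPES is not modeled, and accepting both a space-delimited string and a list for the claim is an assumption about token shapes.

```python
from typing import Any, Mapping


def scope_allows(tool_name: str, claims: Mapping[str, Any], enforce_scopes: bool = False) -> bool:
    """Decide whether a token's scopes permit invoking tool_name."""
    if not enforce_scopes:
        return True
    raw = claims.get("scope")
    if raw is None:
        raw = claims.get("scp")
    if raw is None:
        return True  # no scope claim at all: still allowed (permissive)
    scopes = set(raw.split()) if isinstance(raw, str) else set(raw)
    accepted = {"tool:*", f"tool:{tool_name}", "research"}
    # Fail closed only when a scope claim is present but insufficient.
    return not scopes.isdisjoint(accepted)
```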
Connecting browser-based clients (CORS)¶
CORS is a browser-only mechanism — it governs whether JavaScript running on one origin may read responses from your server. It is not an authentication layer (that is OAuth). Two cases:
- Hosted connectors (ChatGPT, Claude.ai, and most agent platforms). When a user adds your remote server as a connector, the platform's backend opens the connection, not the user's browser tab. These requests carry no enforced Origin, so CORS never applies and the fail-closed default has no effect. You do not need to control the client app; just configure OAuth. This is the common case.
- A genuine in-browser MCP client (JavaScript calling your server directly with fetch). Here CORS applies. The operator allow-lists the client's public origin; you don't need to own the app to do this:
ALLOWED_ORIGINS=https://claude.ai,https://chatgpt.com
To restore the legacy permissive behavior wholesale, set CORS_STRICT=false (see MIGRATION.md).
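The interaction between ALLOWED_ORIGINS and CORS_STRICT can be summarized as a decision function. This is a minimal Python sketch of the documented behavior, with a hypothetical function name; it returns the value a response would carry in Access-Control-Allow-Origin, or None when no such header would be sent.

```python
from typing import Optional


def access_control_allow_origin(origin: Optional[str], allowed_origins: str,
                                cors_strict: bool = True) -> Optional[str]:
    """Return the origin to echo back, or None to deny cross-origin reads."""
    if not origin:
        return None  # backend connector or STDIO: CORS does not apply
    allowed = {o.strip() for o in allowed_origins.split(",") if o.strip()}
    if allowed:
        return origin if origin in allowed else None
    # Empty allow-list: fail closed when strict, reflect any Origin otherwise.
    return None if cors_strict else origin
```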
HTTP Hardening¶
These tune the embedded http.Server and response security headers. All are ignored in STDIO mode (when PORT is unset). Defaults are permissive so long scrape/research responses are never truncated — HTTP_WRITE_TIMEOUT=0 (unlimited) in particular keeps multi-minute responses intact.
| Variable | Description | Default |
|---|---|---|
| HTTP_READ_HEADER_TIMEOUT | Max time to read request headers (primary slowloris guard) | 5s |
| HTTP_READ_TIMEOUT | Max time to read the full request | 30s |
| HTTP_WRITE_TIMEOUT | Max time to write the response. 0 = unlimited (keep permissive for long responses) | 0 |
| HTTP_IDLE_TIMEOUT | Frees idle keep-alive connections | 120s |
| HTTP_SHUTDOWN_TIMEOUT | Grace period to drain in-flight requests on SIGINT/SIGTERM before a hard close | 30s |
| HTTP_MAX_HEADER_BYTES | Caps total request header size against header-flood memory exhaustion | 1048576 (1 MB) |
| MAX_REQUEST_BODY_BYTES | Caps /mcp and /admin request body size; oversized bodies are rejected with 413. Set higher for large MCP payloads | 10485760 (10 MB) |
| HTTP_CSP | Content-Security-Policy response header. Safe for a JSON-only API (no HTML served). An empty value omits the header | default-src 'none'; frame-ancestors 'none' |
| HTTP_REFERRER_POLICY | Referrer-Policy response header | no-referrer |
| HTTP_PERMISSIONS_POLICY | Permissions-Policy response header (empty-deny set). An empty value omits the header | geolocation=(), camera=(), microphone=() |
Cache¶
| Variable | Description | Default |
|---|---|---|
| CACHE_DIR | Disk cache directory | Platform cache dir (e.g., ~/Library/Caches/web-researcher-mcp) |
| CACHE_MAX_MEMORY_MB | Max memory cache size | 64 |
| CACHE_ENCRYPTION_KEY | 64 hex chars for AES-256-GCM | (unset; plaintext) |
| CACHE_ENCRYPTION_KEY_PREV | Optional 64-hex previous key for zero-downtime key rotation. When set, the disk cache and session store decrypt-fallback to it and lazily re-encrypt with the current key on read. Empty = no fallback | (unset) |
| REDIS_URL | HTTP mode only. When set, enables distributed state across pods (shared cache L2, cross-pod sessions, atomic daily-quota). Requires CACHE_ENCRYPTION_KEY (personal data is encrypted at rest in Redis). Fail-fast: an unreachable Redis at startup is fatal. Unset = per-pod in-memory/disk (unchanged). Ignored in STDIO mode | (unset) |
| SESSION_TTL | Session idle timeout (the in-memory TTL resets on every read or write of the session) | 4h |
| SESSION_DATA_DIR | Directory for encrypted session files | {CACHE_DIR}/sessions |
| SESSION_MAX_STEPS | Maximum steps per research session before auto-completion | 200 |
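A CACHE_ENCRYPTION_KEY is 64 hexadecimal characters, which is 32 random bytes (the AES-256 key size). The snippet below is a small sketch for generating and sanity-checking such a value; the function names are illustrative and the server's own validation may differ.

```python
import secrets
import string


def new_cache_encryption_key() -> str:
    """Generate 32 random bytes rendered as 64 hex characters."""
    return secrets.token_hex(32)


def is_valid_cache_encryption_key(value: str) -> bool:
    """Check the documented shape: exactly 64 hex characters."""
    return len(value) == 64 and all(c in string.hexdigits for c in value)
```

When rotating, the old value moves to CACHE_ENCRYPTION_KEY_PREV and a freshly generated one becomes CACHE_ENCRYPTION_KEY.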
Rate Limiting¶
The per-tenant, global, and per-IP limits below apply only in HTTP mode (when PORT is set). MAX_CALLS_PER_DAY is the exception: it is a transport-agnostic in-process cap that also applies in STDIO mode (and to all tools, not just web requests). All other STDIO calls are subject only to upstream API quotas.
| Variable | Description | Default |
|---|---|---|
| MAX_CALLS_PER_DAY | Tool calls per day per (tenant, user) pair (STDIO + HTTP). In-process denial-of-wallet backstop; resets at UTC midnight. | (unset; disabled) |
| RATE_LIMIT_PER_TENANT | Requests per minute per tenant | 120 |
| RATE_LIMIT_GLOBAL | Total requests per second | 1000 |
| DAILY_QUOTA_PER_TENANT | Max API calls per tenant per day | 5000 |
| RATE_LIMIT_PER_IP | Requests per minute per client IP, enforced pre-auth (outermost middleware). 0 disables it (default), so zero-config use is never blocked. Set generous (hundreds) for public HTTP | 0 (disabled) |
| TRUST_PROXY | When true, the per-IP limiter reads the leftmost X-Forwarded-For entry (behind a trusted load balancer). Default false uses RemoteAddr only, preventing spoofed-IP bypass | false |
| RATE_LIMIT_PERSIST | When true, daily-quota counters write through to the encrypted persist store and survive restarts. Default false keeps the pure in-memory zero-config behavior | false |
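The TRUST_PROXY rule determines which address the per-IP limiter keys on. A minimal sketch of that selection follows; the function name client_ip is hypothetical, and the "host:port" and "[ipv6]:port" shapes assumed for the peer address follow the usual form of Go's RemoteAddr.

```python
from typing import Optional


def client_ip(remote_addr: str, x_forwarded_for: Optional[str], trust_proxy: bool = False) -> str:
    """Pick the address used as the per-IP rate-limit key."""
    if trust_proxy and x_forwarded_for:
        leftmost = x_forwarded_for.split(",")[0].strip()
        if leftmost:
            return leftmost  # only safe behind a trusted load balancer
    # Default: ignore the header entirely so a client cannot spoof its IP.
    host, _, _port = remote_addr.rpartition(":")
    return host.strip("[]") or remote_addr
```

With TRUST_PROXY=false a forged X-Forwarded-For header has no effect, which is the spoofed-IP bypass the default guards against.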
How tenant identity works:
- With OAuth configured: tenant ID is extracted from the JWT tenant_id claim. Each authenticated tenant gets independent rate limit buckets.
- Without OAuth: all requests share a single "default" tenant bucket. This means multiple AI sessions hitting the same HTTP instance share 120 req/min by default.
Recommended settings for common scenarios:
# Single developer, multiple AI sessions (no OAuth)
RATE_LIMIT_PER_TENANT=200
DAILY_QUOTA_PER_TENANT=5000
# Team server with OAuth (each team member gets their own bucket)
RATE_LIMIT_PER_TENANT=60
DAILY_QUOTA_PER_TENANT=2000
# High-throughput automation
RATE_LIMIT_PER_TENANT=500
RATE_LIMIT_GLOBAL=5000
DAILY_QUOTA_PER_TENANT=10000
Note: These limits protect the server, not your upstream API quota. Google PSE free tier is 100 queries/day regardless of what you set here. Configure SEARCH_ROUTING with multiple providers if you need higher throughput.
Scraping¶
| Variable | Description | Default |
|---|---|---|
| ALLOW_PRIVATE_IPS | Disable SSRF protection | false |
| ALLOWED_DOMAINS | Domain whitelist (comma-separated) | (unset; all allowed) |
| CHROME_PATH | Custom Chrome/Chromium binary path; set to "disabled" to turn the browser tier off entirely (no autodetect, no download) | auto-detect |
| MAX_SCRAPE_CONCURRENCY | Parallel scrape limit | 5 |
| MAX_HTML_BYTES | Decompressed HTML body read cap per scrape tier | 8388608 (8 MB) |
| MAX_DOCUMENT_BYTES | Document (PDF/DOCX/PPTX) download cap | 52428800 (50 MB) |
Features (Opt-In)¶
Additive output features (content-only, no personal data, no model calls):
| Variable | Description | Default |
|---|---|---|
| SOURCE_RECOMMENDATIONS | Surface advisory "related higher-quality sources" on search_and_scrape, derived from the existing transparent quality signals. Content-based; never re-ranks or hides results. Set false to omit the field | true |
| GENERATIVE_UI_ENABLED | Emit additive, deterministic mcp-auto-formatted components (source cards, quality-comparison table) built from already-extracted data, with no model call. Off → output byte-for-byte unchanged | false |
Regulated features (per-user personal data; each activates the consent subsystem and is covered by the data-subject rights endpoints). All default off. Consent is normally host-asserted over HTTP via POST /admin/consent; in STDIO it is reachable only by setting STDIO_USER_ID (see that row). Per-variable rows note any mode-specific behavior:
| Variable | Description | Default |
|---|---|---|
| MEMORY_ENABLED | Opt-in long-term cross-session memory (memory_save/memory_recall). Consent-gated on the memory purpose | false |
| MEMORY_RETENTION | Max lifetime of a saved memory before auto-expiry | 2160h (90d) |
| USER_ANALYTICS_ENABLED | Opt-in per-user usage analytics (get_my_analytics). Consent-gated on the analytics purpose | false |
| WORKSPACES_ENABLED | Opt-in shared research workspaces (workspace_contribute/workspace_read + /admin/workspace/members). Consent-gated on the workspace purpose; membership host-managed | false |
| WORKSPACE_TTL | Max lifetime of shared-workspace data | 720h (30d) |
| STDIO_USER_ID | STDIO-only. Names the single local user so the per-user regulated features (memory, analytics) work without OAuth. When set (+ the feature flag), consent for memory/analytics (never workspace) is auto-granted at startup on a grant-only-if-absent basis (a later withdrawal is never re-granted), audited via consent.grant. Data keyed (tenant=default, user=<value>). Allowed: A-Za-z0-9._@-, len 1–128, not anonymous. Ignored in HTTP mode | (unset → anonymous) |
Enabling any regulated feature activates the consent subsystem automatically; there is no standalone CONSENT_ENABLED knob. Consent is asserted by the host (via POST /admin/consent) and recorded/verified/honored by the server. See docs/SECURITY.md and docs/SECURITY_AND_COMPLIANCE.md.

STDIO single-user exception: STDIO has no OAuth, so by default the user is anonymous and the per-user regulated features (memory, analytics) stay off (fail-closed). Setting STDIO_USER_ID is the operator asserting their own identity. In that single-user model the host, operator, and subject are the same person, so the server auto-grants consent for memory/analytics (never workspace) at startup. The grant is grant-only-if-absent: a consent decision the user later changes (e.g. a withdrawal recorded out-of-band) is never overwritten on restart, and each grant emits an audited consent.grant event.
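The allowed shape of STDIO_USER_ID (characters A-Za-z0-9._@-, length 1 to 128, not the literal anonymous) can be checked ahead of deployment. This is a sketch of that rule with a hypothetical function name; whether the server compares anonymous case-sensitively is not stated, so an exact match is assumed.

```python
import re

# Character set and length bounds as documented for STDIO_USER_ID.
_STDIO_USER_ID = re.compile(r"[A-Za-z0-9._@-]{1,128}")


def is_valid_stdio_user_id(value: str) -> bool:
    """Return True when value is an acceptable STDIO_USER_ID."""
    if value == "anonymous":  # assumption: exact, case-sensitive match
        return False
    return _STDIO_USER_ID.fullmatch(value) is not None
```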
Observability¶
| Variable | Description | Default |
|---|---|---|
| LOG_LEVEL | slog level | info |
| LOG_FORMAT | Output format | json |
| METRICS_ENABLED | Enable Prometheus metrics | true |
Audit¶
| Variable | Description | Default |
|---|---|---|
| AUDIT_ENABLED | Enable structured audit logging | true |
| AUDIT_OUTPUT_PATH | File path for audit log output (JSONL format) | (unset; stderr) |
| AUDIT_BUFFER_SIZE | Internal event buffer size | 1000 |
| AUDIT_INCLUDE_REQUEST_BODY | When true, raw query text is attached to audit metadata. When false, only a length/hash is recorded; raw query text is omitted | false |
| AUDIT_MAX_BYTES | Rotate the active audit file to a timestamped sibling at this size. File output only; ignored for stderr/STDIO | 104857600 (100 MB) |
| AUDIT_RETENTION_DAYS | Rotated audit files older than this are deleted on startup and hourly. 0 disables cleanup. Any non-zero value is clamped to [180, 3650] per NIS2/HGB retention floors | 180 |
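The AUDIT_RETENTION_DAYS clamp is easy to misread, so here is the arithmetic as a sketch. The function name is illustrative, and the handling of negative input (clamped up to 180 like any other non-zero value) is an assumption.

```python
def effective_retention_days(configured: int) -> int:
    """Apply the documented rule: 0 disables cleanup, otherwise clamp to [180, 3650]."""
    if configured == 0:
        return 0  # cleanup disabled; rotated files are kept indefinitely
    return max(180, min(3650, configured))
```

Setting AUDIT_RETENTION_DAYS=30 therefore still keeps rotated files for 180 days.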
Multi-Tenancy¶
| Variable | Description | Default |
|---|---|---|
| CACHE_ISOLATION | Cache isolation mode (shared or tenant) | shared |
| DATA_REGION | Advisory label for where cache/session/audit data resides; surfaced in stats/audit. No functional restriction | (unset) |
When CACHE_ISOLATION=tenant, all cache keys are prefixed with the authenticated tenant ID from the JWT token. This ensures tenant A's cached results are invisible to tenant B. Default (shared) is appropriate for single-tenant deployments or when search results are inherently public. Use tenant for multi-tenant deployments with strict data isolation requirements.
DATA_REGION is an operator-supplied label only (e.g. eu-central, us-east). It is echoed in stats and audit records for residency documentation but does not move, restrict, or constrain where data is physically stored — that is governed by CACHE_DIR, SESSION_DATA_DIR, and AUDIT_OUTPUT_PATH.
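The effect of CACHE_ISOLATION on cache keys can be pictured with a tiny helper. This is purely illustrative: the function name and the ":" separator are hypothetical, and the server's real key layout is not documented here.

```python
def cache_key(base_key: str, tenant_id: str, isolation: str = "shared") -> str:
    """Prefix the key with the tenant ID only in tenant isolation mode."""
    if isolation == "tenant":
        return f"{tenant_id}:{base_key}"  # separator is an assumption
    return base_key
```

In shared mode two tenants issuing the same query hit the same entry; in tenant mode their keys differ, so neither can observe the other's cached results.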
Auth (Advanced)¶
| Variable | Description | Default |
|---|---|---|
| JWKS_REFRESH_INTERVAL | How often to refresh JWKS keys | 1h |
| ADMIN_API_KEY | Shared secret gating all /admin/* endpoints, sent as X-Admin-Key (min 16 chars). Generate with openssl rand -hex 32 | (unset) |
| CACHE_ADMIN_KEY | Deprecated alias for ADMIN_API_KEY (still accepted; logs a startup warning). ADMIN_API_KEY wins if both are set | (unset) |
Horizontal Scaling¶
Two modes. Without REDIS_URL, the server uses in-memory + encrypted-disk state, per-instance:
- Cache: Each instance has its own memory + disk cache. Cache hits are local only. Acceptable since search results are deterministic (same query = same results).
- Sessions: Persist to local encrypted disk with an in-memory index; survive restarts within the TTL window (default 4h). If a client reconnects to a different instance, the session is not there; use sticky sessions (the typed session_not_found error lets clients recover cleanly otherwise).
- Rate limits: Per-instance. A tenant hitting N instances gets up to N× the per-tenant limit.
- go-rod browser instances are per-pod. No shared browser pool.
With REDIS_URL set (HTTP mode), distributed state is enabled:
- Cache gains a shared Redis L2 tier (memory L1 → Redis L2 → disk L3), so a query warmed by one pod is served from Redis by the others — upstream quota is burned once, not once-per-pod.
- Sessions live in Redis with a server-side EXPIRE, so they survive pod restarts and a client reaching any pod finds its research (sticky sessions become optional).
- Daily rate quota is enforced fleet-wide via an atomic Redis counter (single INCR keyed to a midnight-UTC TTL), so N pods share one limit, with no N× over-spend and no double-spend under concurrency.
- Token revocation is shared across pods via the same Redis-backed persist store.
- All personal-data namespaces (sessions, persist) are AES-256-GCM encrypted before write, so Redis holds only ciphertext, with at-rest protection identical to disk. REDIS_URL therefore requires CACHE_ENCRYPTION_KEY.
- Fail-fast: if REDIS_URL is set but Redis is unreachable at startup, the server exits rather than silently degrading to per-pod mode.
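The fleet-wide daily quota relies on a counter whose lifetime ends at the next UTC midnight. The sketch below shows only the time arithmetic such a scheme needs; the key format and function names are hypothetical, and no Redis client is used.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_utc_midnight(now: datetime) -> int:
    """Seconds from a timezone-aware instant to the next 00:00 UTC."""
    now = now.astimezone(timezone.utc)
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return int((next_midnight - now).total_seconds())


def quota_key(tenant_id: str, now: datetime) -> str:
    """Hypothetical per-tenant, per-UTC-day counter key."""
    return f"quota:{tenant_id}:{now.astimezone(timezone.utc):%Y-%m-%d}"
```

A pod would increment the counter under quota_key(...) and, on first creation, give it an expiry of seconds_until_utc_midnight(...), so every pod sees the same count and the count disappears at the day boundary.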
Recommendations for multi-instance HTTP deployments:
- Preferred: set REDIS_URL (+ CACHE_ENCRYPTION_KEY) for correct cross-pod sessions, cache, and rate limits.
- Without Redis: use sticky sessions at your L7 load balancer and divide rate limits by expected instance count.

go-rod browser rendering remains per-pod regardless (stateless, no shared pool needed).
Production Readiness Checklist¶
Before running multiple instances behind a load balancer, work through this checklist. Items marked (per-pod without Redis) behave differently across pods unless REDIS_URL is set.
- [ ] Distributed state: set REDIS_URL (+CACHE_ENCRYPTION_KEY) to share sessions, cache, and rate limits across pods. This is the recommended multi-instance configuration; the items below are only concerns when Redis is not used.
- [ ] Sticky sessions: without Redis, configure session affinity at the L7 load balancer so a client's follow-up sequential_search steps reach the pod holding its session. A step routed to another pod returns a typed session_not_found error with a recoveryHint (last known step) so the client can restart cleanly rather than silently forking. (per-pod without Redis)
- [ ] Rate-limit math for N pods: without Redis, per-tenant and global limits are per-instance, so N pods allow up to N× the configured value. Set RATE_LIMIT_PER_TENANT / RATE_LIMIT_GLOBAL to desired_total / N, or use REDIS_URL for fleet-wide atomic enforcement. (per-pod without Redis)
- [ ] Log aggregation: ship each pod's structured JSON audit/log output (stderr or AUDIT_OUTPUT_PATH) to a central sink. Every audit event carries pod_id (from HOSTNAME / os.Hostname()) for cross-pod correlation; filter or group by it to trace a request or identify a pod dropping events under backpressure.
- [ ] Monitoring & dashboards: scrape /metrics (Prometheus) from every pod; alert on error rate, upstream-provider failures (circuit-breaker trips), and latency percentiles. Liveness /health/live and readiness /health/ready are wired for orchestrator probes.
- [ ] Encryption key: set CACHE_ENCRYPTION_KEY (and rotate it per Key Rotation) so disk-persisted sessions/cache/quota are encrypted at rest on every pod.
- [ ] Admin key: set ADMIN_API_KEY if you use the /admin/* operational endpoints; it is required to enable them.
- [ ] CORS: set ALLOWED_ORIGINS if a browser client connects directly; the default is fail-closed (see Connecting browser-based clients).
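As a small illustration of the log-aggregation item, the sketch below groups JSON audit lines by pod_id. Only the pod_id field comes from this guide; the other field in the sample (event) is a made-up placeholder, and the real audit schema may differ.

```python
import json
from collections import Counter


def events_per_pod(lines) -> Counter:
    """Count audit events per pod_id from an iterable of JSON log lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON noise mixed into the stream
        if isinstance(event, dict):
            counts[event.get("pod_id", "unknown")] += 1
    return counts


sample = [
    '{"pod_id": "web-researcher-0", "event": "tool.call"}',  # hypothetical fields
    '{"pod_id": "web-researcher-1", "event": "tool.call"}',
    '{"pod_id": "web-researcher-0", "event": "tool.error"}',
]
print(events_per_pod(sample).most_common())
```

A pod whose count falls well below its peers over the same window is a candidate for the "dropping events under backpressure" case mentioned above.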
Persistence¶
Two HTTP-mode subsystems can durably persist state across restarts via a single internal persist.Store interface (internal/persist):
- Token revocation: revoked JWT IDs (JTIs) survive a restart so a revoked token stays revoked.
- Daily quota counters: enabled by RATE_LIMIT_PERSIST=true, so per-tenant daily quotas are not reset by a restart.
The default persist.Store implementation is the same proven encrypted-disk pattern as the session store: AES-256-GCM (using CACHE_ENCRYPTION_KEY, with CACHE_ENCRYPTION_KEY_PREV fallback), atomic temp-file-and-rename writes, 0600 file permissions, an 8-byte big-endian expiry prefix, and an in-memory index. Keys are SHA-256-hashed for the on-disk filename and bound as GCM additional authenticated data so a blob cannot be swapped to a different key's file. Local (memory) and disk implementations behave identically, so there is no behavioral drift between STDIO and HTTP deployments.
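The on-disk layout described above (hashed filename, 8-byte big-endian expiry prefix, atomic temp-file-and-rename, 0600 permissions) can be sketched as below. This is a simplified illustration with several assumptions: the expiry is treated as Unix seconds (the unit is not stated here), the function names are hypothetical, and the AES-256-GCM sealing step is omitted because the Python standard library has no AEAD, so the payload is assumed to be already-encrypted bytes.

```python
import hashlib
import os
import struct
import tempfile
import time


def blob_filename(key: str) -> str:
    """On-disk name: SHA-256 of the logical key, hex-encoded."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def write_blob(directory: str, key: str, payload: bytes, expires_at: int) -> str:
    """Write expiry prefix + payload atomically; returns the final path."""
    path = os.path.join(directory, blob_filename(key))
    fd, tmp = tempfile.mkstemp(dir=directory)  # same directory, so rename is atomic
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(struct.pack(">Q", expires_at))  # 8-byte big-endian expiry
            f.write(payload)  # assumed to be ciphertext in the real store
        os.chmod(tmp, 0o600)
        os.replace(tmp, path)  # readers see the old file or the new one, never half
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return path


def read_blob(directory: str, key: str, now=None):
    """Return the payload, or None if the blob is missing, truncated, or expired."""
    try:
        with open(os.path.join(directory, blob_filename(key)), "rb") as f:
            data = f.read()
    except FileNotFoundError:
        return None
    if len(data) < 8:
        return None
    (expires_at,) = struct.unpack(">Q", data[:8])
    current = int(time.time()) if now is None else now
    return None if expires_at <= current else data[8:]
```

Writing to a temporary file in the same directory and renaming it over the target is what makes a crash mid-write harmless: the previous blob stays intact until the rename completes.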
When REDIS_URL is set (HTTP mode), a RedisStore satisfying this same interface backs token revocation and the daily quota, so both are shared across pods and survive restarts. Redis-stored values are AES-256-GCM encrypted (parity with disk). All Redis code is isolated in internal/redisbackend — the only package that imports the Redis client — and is constructed in exactly one gated place in main.go, so STDIO and the zero-config path never touch it.
PyPI (uvx / uv / pip)¶
The server is published to PyPI as platform wheels that vendor the prebuilt, signed Go binary — no Go toolchain, no compilation. This is the broadest, fastest path for Python-native users (the uvx one-liner is the officially recommended way to run Python MCP servers):
# Run on demand (no install) — uv fetches the right binary for your platform:
uvx web-researcher-mcp
# Or install as a persistent tool:
uv tool install web-researcher-mcp
# Or via pip:
pip install web-researcher-mcp
Claude Code config (uvx):
{
"mcpServers": {
"web-researcher": {
"command": "uvx",
"args": ["web-researcher-mcp"],
"env": { "GOOGLE_CUSTOM_SEARCH_API_KEY": "...", "GOOGLE_CUSTOM_SEARCH_ID": "..." }
}
}
}
The wheels are py3-none-<platform> (one per OS/arch; the none ABI means any Python 3.10+), built by scripts/build_wheels.py (stdlib-only — no build backend) from the same GoReleaser binaries every other channel ships, and published on each release via PyPI Trusted Publishing (OIDC). Publishing is gated on the PYPI_PUBLISH_ENABLED GitHub Actions repository variable (a CI knob, like SMITHERY_ENABLED/AZURE_SIGNING_ENABLED — not a runtime env var); an unset repo is a clean no-op. The PyPI side uses Trusted Publishing configured against this repo + the release workflow + the pypi environment. The wheel is a thin launcher that execs the bundled binary, so behavior is identical to running it directly.
go install¶
# Install globally
go install github.com/zoharbabin/web-researcher-mcp/cmd/web-researcher-mcp@latest
# The binary is available as:
web-researcher-mcp
Claude Code config (go install):
{
"mcpServers": {
"web-researcher": {
"command": "web-researcher-mcp",
"env": {
"GOOGLE_CUSTOM_SEARCH_API_KEY": "...",
"GOOGLE_CUSTOM_SEARCH_ID": "..."
}
}
}
}
The Go binary runs directly with no wrapper process — clean process lifecycle with immediate EOF detection on parent exit.
Client Configurations¶
Claude Code / Cursor¶
Add to your project root as .mcp.json or run:
claude mcp add --scope user --transport stdio web-researcher -- web-researcher-mcp
Project config (.mcp.json):
{
"mcpServers": {
"web-researcher": {
"command": "web-researcher-mcp",
"args": [],
"env": {
"GOOGLE_CUSTOM_SEARCH_API_KEY": "${GOOGLE_CUSTOM_SEARCH_API_KEY}",
"GOOGLE_CUSTOM_SEARCH_ID": "${GOOGLE_CUSTOM_SEARCH_ID}"
}
}
}
}
VS Code / GitHub Copilot¶
Add .vscode/mcp.json to your project:
{
"inputs": [
{
"id": "google_api_key",
"type": "promptString",
"description": "Google Custom Search API key",
"password": true
},
{
"id": "google_cx",
"type": "promptString",
"description": "Google Custom Search engine ID"
}
],
"servers": {
"web-researcher": {
"command": "web-researcher-mcp",
"args": [],
"env": {
"GOOGLE_CUSTOM_SEARCH_API_KEY": "${input:google_api_key}",
"GOOGLE_CUSTOM_SEARCH_ID": "${input:google_cx}"
}
}
}
}
Claude Desktop¶
Download the .mcpb bundle for your platform from GitHub Releases and open it in Claude Desktop. It will prompt for your API keys.
Or manually add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"web-researcher": {
"command": "/usr/local/bin/web-researcher-mcp",
"env": {
"GOOGLE_CUSTOM_SEARCH_API_KEY": "...",
"GOOGLE_CUSTOM_SEARCH_ID": "..."
}
}
}
}
Windsurf¶
Add to ~/.codeium/windsurf/mcp_config.json:
{
"mcpServers": {
"web-researcher": {
"command": "web-researcher-mcp",
"args": [],
"env": {
"GOOGLE_CUSTOM_SEARCH_API_KEY": "...",
"GOOGLE_CUSTOM_SEARCH_ID": "..."
}
}
}
}
Docker (any client)¶
docker run -i --rm \
-e GOOGLE_CUSTOM_SEARCH_API_KEY=... \
-e GOOGLE_CUSTOM_SEARCH_ID=... \
zoharbabin/web-researcher-mcp:latest
Use with any MCP client that supports Docker transports or pipe STDIO to the container.
MCP Registry & Marketplace¶
This server is distributed via:
| Registry | Config File | Status |
|---|---|---|
| Official MCP Registry | server.json | Publish with mcp-publisher publish |
| Smithery.ai | smithery.yaml | Auto-detected from repo root |
| Docker Hub | docker.io/zoharbabin/web-researcher-mcp | Published on release |
| GHCR | ghcr.io/zoharbabin/web-researcher-mcp | Published on release |
| GitHub Releases | .mcpb bundles | Attached per-platform on release |
Health Checks¶
| Endpoint | Method | Response | Use |
|---|---|---|---|
| /health/live | GET | 200 OK always (ok) | K8s liveness probe |
| /health/ready | GET | 200 OK (ready/snapshot); 503 when all provider breakers are open | K8s readiness probe |
/health/live is a static process-up check: a 200 means the process is running
and the HTTP listener is bound (the server completes all initialization —
providers, cache, sessions, audit — before binding the port, so a successful
connection already implies a fully-constructed server). A degraded-but-alive
process must not be killed, so liveness never flips on dependency state.
/health/ready reflects whether the pod can serve a query. With multi-provider
routing configured, it returns 503 (body {"status":"unhealthy"}) only when
every provider's circuit breaker is open — the pod can serve nothing and should
be pulled from the load balancer — and 200 otherwise (healthy/degraded,
since fallback providers still serve). With no routing (single-provider /
zero-config) there is no breaker ladder, so it stays a static 200. The body is
the aggregate status only; the per-provider breaker list is operator data behind
the admin-gated dashboard and diagnostics://health, not this unauthenticated probe.
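The readiness rule above can be expressed as a small decision function. This is a sketch, not the server's implementation: the breaker state names other than "open" ("closed", "half-open") are assumptions, as is treating a half-open breaker as still able to serve.

```python
def aggregate_status(breakers: dict) -> str:
    """Map provider -> breaker state onto healthy / degraded / unhealthy."""
    if not breakers:
        return "healthy"  # no routing configured: no breaker ladder to observe
    open_count = sum(1 for state in breakers.values() if state == "open")
    if open_count == len(breakers):
        return "unhealthy"  # every provider is tripped: the pod can serve nothing
    return "degraded" if open_count else "healthy"


def readiness_code(breakers: dict) -> int:
    """HTTP status for /health/ready: 503 only when all breakers are open."""
    return 503 if aggregate_status(breakers) == "unhealthy" else 200


def liveness_code() -> int:
    """HTTP status for /health/live: never depends on provider state."""
    return 200
```

Keeping liveness independent of dependency state is the important design choice: an orchestrator restarts a pod on liveness failure, and restarting does nothing to fix an upstream outage.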
Graceful Shutdown¶
On SIGINT/SIGTERM (HTTP mode), or SIGINT/SIGTERM/stdin EOF (STDIO mode):
1. Stop accepting new connections
2. Drain in-flight requests (HTTP_SHUTDOWN_TIMEOUT, default 30s; hard close on timeout)
3. Flush cache to disk
4. Close audit logger (drains buffered events including swap file)
5. Terminate headless browsers
6. Exit 0
No orphan processes. No watchdog needed.
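The drain-then-clean-up ordering can be sketched as follows. It is a simplified model under stated assumptions: in-flight requests are represented as threads, the cleanup callables are placeholders supplied by the caller, and the step labels are illustrative names rather than the server's own.

```python
import threading
import time


def drain(workers, timeout_s: float) -> bool:
    """Wait for in-flight work up to one shared deadline; True if all finished."""
    deadline = time.monotonic() + timeout_s
    for worker in workers:
        worker.join(max(0.0, deadline - time.monotonic()))
    return all(not worker.is_alive() for worker in workers)


def shutdown(workers, timeout_s: float, cleanup_steps) -> list:
    """Run the shutdown sequence in order and return a log of what happened."""
    log = ["stop_accepting"]
    # A timeout means remaining connections are closed hard, then cleanup continues.
    log.append("drained" if drain(workers, timeout_s) else "hard_close")
    for name, step in cleanup_steps:  # e.g. flush cache, close audit, stop browsers
        step()
        log.append(name)
    return log
```

The single shared deadline matters: joining each worker with the full timeout would let N slow requests stretch shutdown to N times the configured limit.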
Admin Endpoints (HTTP Mode)¶
All admin endpoints require the X-Admin-Key header matching the ADMIN_API_KEY env var (the deprecated CACHE_ADMIN_KEY is still accepted). The header is compared in constant time. They are separate from OAuth — admin auth is a simple shared secret for operational use.
| Method | Path | Purpose |
|---|---|---|
| DELETE | /admin/cache | Flush all cache (memory + disk) |
| DELETE | /admin/sessions | Kill all active sessions |
| GET | /admin/analytics | Per-tenant aggregate usage (calls, error/cache-hit rates, provider breakdown, latency percentiles) for billing/capacity. Optional ?tenant_id= filter. Aggregate-only: no per-query or per-user content |
| GET | /admin/data?tenant_id=&user_id= | GDPR access/portability (Art. 15/20): JSON export of all data held for a data subject across every registered store. tenant_id required; user_id optional |
| DELETE | /admin/data?tenant_id=&user_id= | GDPR erasure (Art. 17): purge the subject's data across all stores and withdraw their consent; records a data.erasure audit event |
| POST | /admin/consent | Record a host-asserted consent decision. Body: {tenant_id, user_id, purpose, granted, terms_version?}. Only present when a regulated feature is enabled |
| GET | /admin/consent?tenant_id=&user_id=&purpose= | Query the current consent decision for a subject + purpose |
| POST | /admin/workspace/members | Add a member to a shared workspace (host's RBAC hook). Body: {workspace_id, tenant_id, user_id}. Only present when WORKSPACES_ENABLED |
| DELETE | /admin/workspace/members | Remove a member from a shared workspace. Body: {workspace_id, tenant_id, user_id}. Only present when WORKSPACES_ENABLED |
| GET | /dashboard/data | Aggregate JSON powering the operator dashboard (tool stats, active sessions, rate-limit config, provider health, recent errors). Aggregate-only: no per-user/per-query data. Registered with the dashboard (admin key required) |
These are HTTP-only operational endpoints, not exposed via MCP tools. The /admin/data endpoints exist only when a personal-data store is registered; /admin/consent and /admin/workspace/members only when the corresponding regulated feature is enabled.
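For operational scripts, a request to one of these endpoints might be built as in the sketch below, alongside an illustration of what a constant-time key comparison looks like. Both helpers are hypothetical; the base URL http://localhost:8080 is a placeholder, since the listen address depends on your deployment.

```python
import hmac
import urllib.request


def admin_key_matches(presented: str, configured: str) -> bool:
    """Constant-time comparison; an unset key never matches (endpoints disabled)."""
    if not configured:
        return False
    return hmac.compare_digest(presented.encode("utf-8"), configured.encode("utf-8"))


def admin_request(base_url: str, method: str, path: str, admin_key: str):
    """Build (but do not send) an admin request carrying the X-Admin-Key header."""
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        method=method,
        headers={"X-Admin-Key": admin_key},
    )


# Example: flush the cache. Send it with urllib.request.urlopen(req) when ready.
req = admin_request("http://localhost:8080", "DELETE", "/admin/cache", "example-key")
```

Read the key from your secret manager or environment rather than hard-coding it as the example does.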
Operator Observability¶
Three operator-facing surfaces expose runtime behavior without leaking infrastructure into LLM content. They share one rule: routing/health/error internals are operator/debug data, never part of a tool's model-facing result body. The provider name is the disclosure boundary — no upstream URLs, credentials, or breaker counts are surfaced anywhere.
Per-call routing (_meta.routing)¶
When SEARCH_ROUTING is active, search-family tool results carry a routing block on the MCP _meta channel (LLM-invisible, client-app visible): provider_used, providers_attempted, fallback, a coarse fallback_reason (circuit_open / primary_unavailable), cache_hit, and latency_ms. It answers "why did I get Google when I expected Brave?". Full field contract: see Routing Provenance in docs/TOOLS.md. The same summary is mirrored to audit.AuditEvent.Metadata["routing"].
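A client application could turn that block into a one-line explanation along these lines. The field names come from the list above; the value types and the sample values are assumptions made for the example, so check docs/TOOLS.md for the actual contract.

```python
def describe_routing(result: dict) -> str:
    """Summarize the _meta.routing block of a tool result, if present."""
    routing = (result.get("_meta") or {}).get("routing")
    if not routing:
        return "no routing metadata"
    parts = [f"served by {routing.get('provider_used')}"]
    if routing.get("fallback"):
        attempted = ", ".join(routing.get("providers_attempted", []))
        parts.append(
            f"fallback ({routing.get('fallback_reason')}) after trying {attempted}"
        )
    if routing.get("cache_hit"):
        parts.append("cache hit")
    if "latency_ms" in routing:
        parts.append(f"{routing['latency_ms']} ms")
    return "; ".join(parts)


sample_result = {  # illustrative values only
    "_meta": {
        "routing": {
            "provider_used": "google",
            "providers_attempted": ["brave", "google"],
            "fallback": True,
            "fallback_reason": "circuit_open",
            "cache_hit": False,
            "latency_ms": 412,
        }
    }
}
print(describe_routing(sample_result))
```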
On-demand diagnostics (MCP Resources)¶
Read-only Resources beside stats://*, for operators to read on demand:
| URI | Returns |
|---|---|
| diagnostics://errors/recent | The most recent tool errors (bounded ring, newest-first): tool, error kind, provider, redacted cause. Memory-only and bounded: no unbounded accumulation, no disk. Scoped to the caller's tenant when authenticated. Causes pass through audit.MaskSecrets, so no secrets, user queries, or full URLs appear |
| diagnostics://health | Live provider health: an overall status (healthy / degraded / unhealthy) plus each routed provider's circuit-breaker state. Complements stats://providers (which lists configured providers) with current availability. Empty/healthy when multi-provider routing is not enabled (no breaker ladder to observe) |
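The "bounded ring, newest-first" behavior of diagnostics://errors/recent can be illustrated with a fixed-size deque. This is a sketch of the data structure only: the class name, the default capacity of 100, and the sample tool names are hypothetical, and secret masking is left out.

```python
from collections import deque


class RecentErrors:
    """Memory-only ring of recent errors; the oldest entry is dropped when full."""

    def __init__(self, capacity: int = 100) -> None:  # capacity is an assumed default
        self._ring = deque(maxlen=capacity)

    def record(self, tool: str, kind: str, provider: str, cause: str, tenant=None):
        self._ring.append(
            {"tool": tool, "kind": kind, "provider": provider,
             "cause": cause, "tenant": tenant}
        )

    def recent(self, tenant=None) -> list:
        """Newest-first view, optionally scoped to a single tenant."""
        return [
            dict(entry)
            for entry in reversed(self._ring)
            if tenant is None or entry["tenant"] == tenant
        ]
```

Because the deque has a fixed maximum length, memory use stays constant no matter how many errors occur, which is the property the resource description relies on.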
Operator dashboard (HTTP mode)¶
A lightweight, read-only, aggregate-only dashboard at GET /dashboard for self-hosters who don't run their own Grafana/Prometheus stack. It is a single self-contained HTML page (no CDN, no build step) that polls the admin-gated GET /dashboard/data and renders per-tool call counts / latency (avg, p95) / error rates, active session count, rate-limit configuration, live provider/breaker health, and the recent-errors ring.
- Auth: the page is an inert shell that prompts for the admin key client-side; GET /dashboard/data is gated by X-Admin-Key exactly like /admin/*. Both routes register only when ADMIN_API_KEY is set.
- CSP: each page response sets a per-request nonce-based Content-Security-Policy (default-src 'none'; nonce'd inline script/style; connect-src 'self'; frame-ancestors 'none'), with no unsafe-inline and no third-party origins.
- STDIO unaffected: the dashboard is HTTP-only by construction (it lives in ServeHTTP).
- No new data: it visualizes aggregate operational data that already exists via /metrics and the Resources above, with no per-user, per-query, or tenant-identifiable data and no new collection.
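A per-request nonce policy of the kind described in the CSP item can be sketched as below. The directive values mirror the list above, but the exact header string the server emits is not shown in this guide, so the directive order and the nonce length here are assumptions.

```python
import secrets


def dashboard_csp() -> tuple:
    """Return (nonce, policy) for one page response; call once per request."""
    nonce = secrets.token_urlsafe(16)  # fresh, unguessable value each time
    policy = "; ".join(
        [
            "default-src 'none'",
            f"script-src 'nonce-{nonce}'",
            f"style-src 'nonce-{nonce}'",
            "connect-src 'self'",
            "frame-ancestors 'none'",
        ]
    )
    return nonce, policy


nonce, policy = dashboard_csp()
# The same nonce goes on the inline <script nonce="..."> and <style nonce="..."> tags.
print(policy)
```

Since the nonce changes on every response, injected markup cannot carry a valid one, which is why no unsafe-inline allowance is needed.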
Key Rotation¶
The server uses two independent secrets. Both rotate without downtime.
Admin key (ADMIN_API_KEY)¶
The admin key is stateless — rotating it is a single env-var change:
- Generate a new key: openssl rand -hex 32.
- Update ADMIN_API_KEY in your deployment and restart (or rolling-restart) the pods.
- Update any operational scripts/dashboards that send X-Admin-Key.
There is no stored state encrypted under the admin key, so no migration is needed. In a rolling deployment, in-flight admin calls against an old pod use that pod's old key until it cycles; admin endpoints are operational, not user-facing, so a brief overlap is harmless.
Encryption key (CACHE_ENCRYPTION_KEY) — zero-downtime re-encryption¶
Disk-persisted data (cache, sessions, and any encrypted persist.Store namespace) is sealed with AES-256-GCM under CACHE_ENCRYPTION_KEY. Rotating it without stranding existing data uses the previous-key fallback:
1. Move the current key to CACHE_ENCRYPTION_KEY_PREV and set a new 64-hex-character CACHE_ENCRYPTION_KEY (generate with openssl rand -hex 32).
2. Restart. On every read, data sealed with the previous key is decrypted with the fallback key and lazily re-encrypted with the new key, so hot data migrates automatically with no flush and no downtime.
3. After at least one full data lifetime (e.g. SESSION_TTL for sessions, the cache TTL for cache) has elapsed, remove CACHE_ENCRYPTION_KEY_PREV. Any still-unread blobs from before the rotation expire naturally.
To force immediate re-encryption rather than waiting for natural reads, flush the affected store (DELETE /admin/cache, DELETE /admin/sessions) after step 2 — data repopulates under the new key on demand.
Compliance note: rotating CACHE_ENCRYPTION_KEY periodically (and immediately on suspected exposure) satisfies common key-lifecycle controls (e.g. NIST SP 800-57 crypto-period guidance). The previous-key window should be kept as short as your longest TTL; never keep more than one previous key.
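The previous-key fallback with lazy re-sealing can be sketched as follows. One loud caveat: the Python standard library has no AES-GCM, so seal and open_sealed below use an HMAC-SHA256 tag as a stand-in that authenticates but does not encrypt. Only the control flow (try the current key, fall back to the previous one, rewrite under the current key) is the point; every function name is hypothetical.

```python
import hashlib
import hmac
import secrets

TAG_LEN = 32  # bytes in an HMAC-SHA256 tag


def new_key_hex() -> str:
    """64 hex characters, the same shape as `openssl rand -hex 32`."""
    return secrets.token_hex(32)


def seal(key_hex: str, data: bytes) -> bytes:
    """STAND-IN for AES-256-GCM: prepends an auth tag, does NOT encrypt."""
    tag = hmac.new(bytes.fromhex(key_hex), data, hashlib.sha256).digest()
    return tag + data


def open_sealed(key_hex: str, blob: bytes):
    """Return the data if the tag verifies under key_hex, else None."""
    tag, data = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(bytes.fromhex(key_hex), data, hashlib.sha256).digest()
    return data if hmac.compare_digest(tag, expected) else None


def read_with_fallback(store: dict, name: str, current: str, previous=None):
    """Read a blob, falling back to the previous key and re-sealing on success."""
    blob = store.get(name)
    if blob is None:
        return None
    data = open_sealed(current, blob)
    if data is None and previous:
        data = open_sealed(previous, blob)
        if data is not None:
            store[name] = seal(current, data)  # lazy migration to the new key
    return data
```

After the first successful read, a blob no longer depends on the previous key, which is why the previous key can be removed once every live item has either been read or expired.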
MCP Resources & Prompts¶
Resources¶
| URI | Description |
|---|---|
| stats://tools | Per-tool execution metrics (totalCalls, avgLatencyMs, etc.) |
| stats://sessions | Count of active sequential research sessions |
| stats://rate-limits | Rate limit config and usage (per-tenant limits, daily quota remaining, reset time) |
| stats://providers | Search, patent, and academic providers currently configured and available |
| lenses://catalog | All registered lenses with their names, domains, and descriptions |
| diagnostics://errors/recent | Bounded ring of recent errors for operator diagnostics |
| diagnostics://health | Server health: version, uptime, provider availability |
| research://artifact/{id} | Large-payload resource store for tool results that exceed inline size limits |
Prompts¶
| Prompt | Description | Required Args |
|---|---|---|
| comprehensive-research | Multi-step research process | topic |
| fact-check | Verify a claim from multiple sources | claim |
| competitive-analysis | Research competitors in a market | company |
| literature-review | Systematic academic literature review | topic |