Contributing to web-researcher-mcp

Thanks for contributing to web-researcher-mcp! We build reliable research tools for AI assistants via the Model Context Protocol, and every contribution makes a difference.

Whether you're fixing a typo, adding a search provider, improving docs, or proposing a new tool — your help matters.

Development Setup

Prerequisites

  • Go — version requirement is specified in go.mod (the toolchain directive pins the exact patched release; Go auto-downloads it, so you never build with an unpatched compiler)
  • API keys (only for E2E tests that hit live services):
      • Google Custom Search: GOOGLE_CUSTOM_SEARCH_API_KEY and GOOGLE_CUSTOM_SEARCH_ID
      • Brave Search (optional): BRAVE_API_KEY
  • Chrome/Chromium — optional, only needed for headless scraping features

Linters and the vulnerability scanner are not separate installs — they are pinned in go.mod as tool directives and invoked via go tool, so every contributor and CI run uses byte-identical versions (no drift, no "works on my machine").

One-time setup

make tools   # warms the pinned golangci-lint + govulncheck + gosec (go tool fetches on first use anyway)
make hooks   # installs the git pre-commit hook (fmt + vet + lint on staged files)

The pre-commit hook keeps commits fast by checking only staged Go files with the quick gates; the full suite (race, vuln, e2e) runs in CI. Bypass a hook in an emergency with git commit --no-verify — CI still enforces everything.

Getting Started

# Clone the repository
git clone https://github.com/zoharbabin/web-researcher-mcp.git
cd web-researcher-mcp

# Download dependencies
go mod download

# Build the binary
go build -o web-researcher-mcp ./cmd/web-researcher-mcp

# Verify everything works
go test ./...

Environment Setup

Export the environment variables you need for testing:

export GOOGLE_CUSTOM_SEARCH_API_KEY="your-key"
export GOOGLE_CUSTOM_SEARCH_ID="your-cx"
# Optional:
export BRAVE_API_KEY="your-brave-key"
export SEARCH_PROVIDER="google"  # or brave, serper, searxng, searchapi, duckduckgo, tavily, exa, hackernews (see search.SupportedProviders)

Unit and integration tests do not require API keys. Only E2E tests that hit live services need them.
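
A live or E2E test can check for these variables up front and skip when they are unset. The sketch below shows one way such a guard might look; missingEnv is a hypothetical helper for illustration, not a function from this repository.

```go
package main

import (
	"fmt"
	"os"
)

// missingEnv returns the names in keys whose value is empty according to
// lookup. Passing the lookup function in keeps the helper free of global
// state and easy to test.
func missingEnv(lookup func(string) string, keys ...string) []string {
	var missing []string
	for _, k := range keys {
		if lookup(k) == "" {
			missing = append(missing, k)
		}
	}
	return missing
}

func main() {
	required := []string{"GOOGLE_CUSTOM_SEARCH_API_KEY", "GOOGLE_CUSTOM_SEARCH_ID"}
	if missing := missingEnv(os.Getenv, required...); len(missing) > 0 {
		// Inside a test this would be t.Skipf rather than a print.
		fmt.Printf("skipping live checks, unset: %v\n", missing)
		return
	}
	fmt.Println("credentials present, live checks can run")
}
```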

Running Tests

# Unit and integration tests (no API keys needed)
go test ./...

# With race detector (recommended before submitting)
go test -race ./...

# E2E tests (requires API keys and the e2e build tag)
go test -tags=e2e -count=1 -v ./tests/e2e/...

# Benchmarks (with memory allocation stats)
make test-bench
# Or directly:
go test -bench=. -benchmem ./tests/benchmark/

# Specific package
go test ./internal/scraper/...

# With coverage report
go test -coverprofile=coverage.out ./...
go tool cover -html=coverage.out

Linting

Tools are pinned in go.mod and invoked through go tool so every contributor and CI run uses byte-identical versions. Use the make targets, or the go tool … form if running directly — never the bare globally-installed binaries.

# Run all linters (Makefile target wraps `go tool golangci-lint run`)
make lint

# Auto-fix where possible
go tool golangci-lint run --fix

# Vet
go vet ./...

# Static security analysis (gosec)
make sec

# Dependency vulnerability check
make vuln

Build

# Standard build
go build -o web-researcher-mcp ./cmd/web-researcher-mcp

# With version info (reads from VERSION file)
go build -ldflags "-X main.version=$(cat VERSION)" -o web-researcher-mcp ./cmd/web-researcher-mcp

# Docker build
docker build -t web-researcher-mcp .
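
The -X flag in the version build overwrites a package-level string variable at link time. A minimal sketch of the receiving side follows. The variable name version comes from the flag itself; the default value "dev" and the versionString helper are assumptions made for illustration.

```go
package main

import "fmt"

// version is replaced at link time by -ldflags "-X main.version=<value>".
// It must be a package-level string variable (not a constant) for -X to apply.
var version = "dev"

// versionString formats the value for a --version style output.
func versionString() string {
	return "web-researcher-mcp " + version
}

func main() {
	fmt.Println(versionString())
}
```

Built without the flag, the program reports the default; built with it, the default is replaced by the contents of the VERSION file.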

Code Style

This project follows Effective Go and enforces style via golangci-lint. Key principles:

  1. Accept interfaces, return structs — for testability and clarity
  2. Context is always the first parameter: func DoThing(ctx context.Context, ...)
  3. Error messages are lowercase, no punctuation: fmt.Errorf("invalid query: %w", err)
  4. Exported names have doc comments — unexported names generally don't need them
  5. One package per concern — no "utils" or "helpers" packages
  6. Wrap errors with context: fmt.Errorf("brave search for %q: %w", query, err)
  7. Table-driven tests — with t.Parallel() where possible
  8. No global state — all dependencies are injected

See ARCHITECTURE.md for package organization and docs/TOOLS.md for tool specifications.

Before Submitting

Run the full check suite — the same gate CI enforces:

make verify   # fmt-check + vet + lint + sec + vuln + validate-lenses + test-race + test-e2e + check-python-drift + test-python + build

Commit Messages

This project uses Conventional Commits. Each commit message should follow this format:

<type>(<optional scope>): <description>

[optional body]

[optional footer(s)]

Types

Type      Purpose
feat      A new feature
fix       A bug fix
docs      Documentation-only changes
style     Formatting, missing semicolons, etc. (no code change)
refactor  Code change that neither fixes a bug nor adds a feature
perf      Performance improvement
test      Adding or updating tests
build     Build system or external dependency changes
ci        CI configuration changes
chore     Other changes that don't modify src or test files

Examples

feat(search): add Brave Search provider
fix(scraper): handle timeout on large PDF downloads
docs: update deployment guide for patent providers
test(cache): add benchmark for hybrid cache operations
refactor(content): extract sanitization into pipeline pattern
perf(scraper): reduce allocations in HTML parsing

Breaking Changes

Append ! after the type/scope, and include a BREAKING CHANGE: footer:

feat(auth)!: require OAuth 2.1 for HTTP transport

BREAKING CHANGE: HTTP transport now requires a valid JWT token.
STDIO transport is unaffected.
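
The header format is regular enough to check mechanically. The following is a rough sketch of such a check, not the project's actual tooling; parseHeader is a hypothetical helper, and the character set allowed inside the scope is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// headerRE matches "<type>(<optional scope>)!: <description>" for the types
// listed in the table above. The scope character set is an assumption.
var headerRE = regexp.MustCompile(
	`^(feat|fix|docs|style|refactor|perf|test|build|ci|chore)(\([a-z0-9_-]+\))?(!)?: (.+)$`)

// parseHeader reports the type, scope, and breaking marker of a commit
// header, or ok=false if the header does not follow the format.
func parseHeader(header string) (typ, scope string, breaking, ok bool) {
	m := headerRE.FindStringSubmatch(header)
	if m == nil {
		return "", "", false, false
	}
	return m[1], strings.Trim(m[2], "()"), m[3] == "!", true
}

func main() {
	for _, h := range []string{
		"feat(auth)!: require OAuth 2.1 for HTTP transport",
		"docs: update deployment guide for patent providers",
		"Added a new provider",
	} {
		typ, scope, breaking, ok := parseHeader(h)
		fmt.Printf("valid=%v type=%q scope=%q breaking=%v\n", ok, typ, scope, breaking)
	}
}
```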

Pull Request Process

  1. Fork and branch: create a branch from main with a descriptive name, for example:
       • feat/brave-search-provider
       • fix/ssrf-ipv6-bypass
       • docs/quickstart-guide

  2. Keep changes focused: one logical change per PR. Split large features into smaller, reviewable pieces.

  3. Ensure quality before requesting review. One command runs the full gate:
       • make verify (formatting, vet, lint, gosec, govulncheck, validate-lenses, race tests, e2e, python-drift check, python tests, build)
       • Individual targets exist too: make test-race, make lint, make sec, make vuln
       • New code has tests; documentation is updated if behavior changes

main is branch-protected: the Lint, Test, Security (govulncheck + gosec), and E2E CI checks must all pass before a PR can merge, the branch must be up to date with main, linear history is required, and all PR conversations must be resolved. Running make verify locally reproduces the CI checks exactly (same pinned tool versions via go tool). Human approval is not required (the repo is maintainer-driven; see the merge policy below).

  4. Write a clear PR description: explain what changed and why. Include:
       • Summary of changes
       • Motivation/context
       • Testing done
       • Screenshots (if UI-related)

  5. Respond to review feedback: push additional commits (don't force-push during review). Commits are squashed at merge.

  6. Benchmarks: if your change touches hot paths (cache, scraping pipeline, content processing), include before/after benchmark results.

PR Checklist

  • [ ] Full gate passes locally (make verify)
  • [ ] New functionality has tests
  • [ ] Documentation updated (if applicable)
  • [ ] Commit messages follow Conventional Commits

Maintainer Merge Policy

main requires zero human approvals (this is a maintainer-driven repo, so a required-reviewer rule would just block the maintainer's own PRs). Quality is held by two gates instead: the CI checks above, and a mandatory Copilot review as a second set of eyes. Every PR is reviewed by Copilot and every finding is either fixed or rebutted before merge.

How Copilot review is triggered: by the repo setting Settings → Rules → Rulesets → "Request pull request review from Copilot" (a one-time UI toggle). Copilot cannot be requested per-PR via the API or gh — it is not a collaborator, so gh pr edit --add-reviewer and the requested_reviewers REST/GraphQL endpoints all reject it. The automatic setting is the only mechanism; if a fast PR merges before Copilot posts, address its findings in a follow-up PR.

Per-PR cycle the maintainer follows:

  1. Open the PR; CI runs and Copilot review is auto-requested.
  2. Wait for Copilot's review to post (copilot-pull-request-reviewer[bot]).
  3. For each Copilot finding: fix it, or reply in-thread explaining why it's incorrect, then resolve the conversation. (Copilot's reviews are only ever COMMENTED, never APPROVED, so they can't satisfy an approval gate by design.)
  4. Confirm all CI checks are green and every Copilot thread is resolved.
  5. Merge: gh pr merge <N> --squash --admin.

# Inspect Copilot's findings on a PR. Note the two different bot logins:
# the review summary is authored by `copilot-pull-request-reviewer`, but the
# inline review comments are authored by `Copilot`.
gh pr view <N> --json reviews \
  --jq '.reviews[] | select(.author.login=="copilot-pull-request-reviewer") | .body'
gh api repos/zoharbabin/web-researcher-mcp/pulls/<N>/comments \
  --jq '.[] | select(.user.login=="Copilot") | "\(.path):\(.line // .original_line)  \(.body)"'

# After CI is green and every finding is addressed/resolved:
gh pr merge <N> --squash --admin

--admin clears the conversation-resolution/up-to-date formalities at merge time; it is not a substitute for steps 2–3 — never run it before Copilot's findings are genuinely addressed.

Issue Guidelines

Reporting Bugs

Please include:

  • Go version (go version)
  • Operating system and architecture
  • Steps to reproduce
  • Expected vs. actual behavior
  • Relevant logs or error messages (redact any API keys)

Requesting Features

Please include:

  • Use case description: what problem does this solve?
  • Proposed solution (if you have one)
  • Alternatives considered
  • Whether you'd be willing to implement it

Security Issues

Do NOT report security vulnerabilities via public issues. See SECURITY.md for responsible disclosure instructions.

Code of Conduct

This project follows the Contributor Covenant Code of Conduct. By participating, you are expected to uphold this code. To report unacceptable behavior, see the confidential reporting instructions in CODE_OF_CONDUCT.md.

When to use which extension point

Not sure whether your idea should be a Tool, Provider, Lens, Enrichment Resolver, Prompt, or Resource? See docs/EXTENSION_GUIDE.md for a decision path and canonical examples.

Adding a New Tool

Adding a tool requires:

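  1. Create the handler in internal/tools/<toolname>.go:

package tools

import (
    "context"
    "time"

    // Also import the MCP Go SDK package that provides mcp.Server and
    // mcp.AddTool, using the same import path as the other files in this package.
)

// myToolInput is the typed input; the SDK derives the JSON Schema from the tags.
type myToolInput struct {
    Query string `json:"query" jsonschema:"Search query,required"`
}

// registerMyTool wires my_tool into the server with its injected dependencies.
func registerMyTool(srv *mcp.Server, deps Dependencies) {
    mcp.AddTool(srv, &mcp.Tool{
        Name:         "my_tool",
        Description:  "One-line description for the AI assistant",
        Annotations:  readOnlyAnnotations(true, true),
        OutputSchema: myToolOutputSchema,
    }, func(ctx context.Context, req *mcp.CallToolRequest, input myToolInput) (*mcp.CallToolResult, any, error) {
        start := time.Now()
        // Implementation here: use deps.Cache, deps.Search, etc., and marshal
        // the result into jsonBytes (JSON matching myToolOutputSchema).
        deps.Metrics.RecordToolCall("my_tool", time.Since(start), nil, "", false)
        auditToolCall(ctx, deps, "my_tool", time.Since(start), nil, "")
        return structuredResult(jsonBytes), nil, nil
    })
}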

  2. Register it in internal/tools/registry.go: add registerMyTool(srv, deps) to RegisterAll().

  3. Add tests in internal/tools/tools_test.go or a dedicated <toolname>_test.go; add the tool name to expectedTools in internal/tools/metadata_test.go.

  4. Document it in docs/TOOLS.md with a "## Tool N: `name`" section. The drift test TestToolsDocMatchesRegistry (internal/tools/metadata_test.go) fails CI if a registered tool is undocumented or vice versa.

  5. Regenerate the Python client: run make gen-python-client and commit the result. This updates python/web_researcher_mcp/{models.py,client.py,__init__.py} with the new typed method and response class. The python-drift CI job and pre-commit hook both fail if you skip this step.
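
To make the documentation drift check concrete, here is a minimal sketch of the idea: collect tool names from the doc headings and compare them with the registered names in both directions. This is not the code of TestToolsDocMatchesRegistry; the drift function, the heading pattern, and the sample tool names web_search and old_tool are assumptions for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// toolHeading captures the tool name from a "## Tool N: `name`" heading.
// The exact heading shape is assumed from the convention described above.
var toolHeading = regexp.MustCompile("(?m)^## Tool \\d+: `([a-z0-9_]+)`")

// drift returns tools that are registered but undocumented, and tools that
// are documented but not registered. Both slices are sorted.
func drift(registered []string, doc string) (undocumented, unregistered []string) {
	documented := map[string]bool{}
	for _, m := range toolHeading.FindAllStringSubmatch(doc, -1) {
		documented[m[1]] = true
	}
	reg := map[string]bool{}
	for _, name := range registered {
		reg[name] = true
		if !documented[name] {
			undocumented = append(undocumented, name)
		}
	}
	for name := range documented {
		if !reg[name] {
			unregistered = append(unregistered, name)
		}
	}
	sort.Strings(undocumented)
	sort.Strings(unregistered)
	return undocumented, unregistered
}

func main() {
	doc := "## Tool 1: `web_search`\n\n## Tool 2: `old_tool`\n"
	undoc, unreg := drift([]string{"web_search", "my_tool"}, doc)
	fmt.Println("undocumented:", undoc) // undocumented: [my_tool]
	fmt.Println("unregistered:", unreg) // unregistered: [old_tool]
}
```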

Key conventions:

  • All tool inputs use typed structs with jsonschema tags (the SDK auto-generates JSON Schema from these).
  • Use deps.Cache for caching, deps.Metrics for telemetry, and deps.Auditor for audit logging.
  • Return validation errors via toolError(msg), upstream errors via upstreamErrorResponse(toolName, err), and success via structuredResult(jsonBytes). These helpers are defined in internal/tools/search.go; scrapeErrorResponse is in internal/tools/scrape.go; the ToolError types and structuredError are in internal/tools/errors.go (see docs/ERROR_HANDLING.md for the full pattern).

Most tools are read-only. For the rare tool that mutates server-side state (e.g. memory_save, workspace_contribute, archive_source):

  • Annotate with writeAnnotations(idempotent) instead of readOnlyAnnotations(...). ReadOnlyHint becomes false; DestructiveHint stays false, because deletion is never a tool flag: it is the GDPR erasure endpoint (DELETE /admin/data). Update the writeTools set and add a case in TestAllToolsHaveAnnotations (internal/tools/metadata_test.go).
  • If the tool processes per-user personal data, gate it on consent: if deps.Consent == nil || !deps.Consent.HasConsent(ctx, consent.PurposeXxx) { return structuredResult(... "status":"no_consent" ...) }. Take the subject from auth.UserIDFromContext(ctx) / auth.TenantIDFromContext(ctx), never from a tool parameter. Refuse anonymous callers.
  • Register conditionally in RegisterAll() — only when the feature dependency is non-Noop (mirror the if _, isNoop := deps.X.(*pkg.Noop); deps.X != nil && !isNoop pattern), so the default tool surface is unchanged.
  • Register the store's Exporter/Eraser into the data-subject registry (internal/datasubject) in main.go so the data is covered by /admin/data export + erasure.
  • Add the feature dependency to setupTestDeps() (internal/tools/tools_test.go) so the conditionally-registered tool is visible to the drift tests, and add the tool name to expectedTools (metadata_test.go).
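
The conditional-registration check in the list above can be tried in isolation. The sketch below uses hypothetical stand-ins (Store, NoopStore, MemStore, a one-field Dependencies, and a registeredTools helper with a placeholder default tool name web_search); only the shape of the if condition is taken from the text.

```go
package main

import "fmt"

// Store is a hypothetical feature dependency, for example a memory store.
type Store interface {
	Save(key, value string) error
}

// NoopStore is the do-nothing implementation used when the feature is off.
type NoopStore struct{}

func (*NoopStore) Save(key, value string) error { return nil }

// MemStore is a real, in-memory implementation.
type MemStore struct{ data map[string]string }

func (m *MemStore) Save(key, value string) error {
	m.data[key] = value
	return nil
}

// Dependencies is a reduced stand-in for the tool-layer dependency struct.
type Dependencies struct {
	Memory Store
}

// registeredTools returns the tool names that would be registered.
func registeredTools(deps Dependencies) []string {
	tools := []string{"web_search"} // default surface, always present
	// Register the write tool only for a real (non-nil, non-Noop) store.
	// The comma-ok assertion is safe even when deps.Memory is nil.
	if _, isNoop := deps.Memory.(*NoopStore); deps.Memory != nil && !isNoop {
		tools = append(tools, "memory_save")
	}
	return tools
}

func main() {
	fmt.Println(registeredTools(Dependencies{}))                    // [web_search]
	fmt.Println(registeredTools(Dependencies{Memory: &NoopStore{}})) // [web_search]
	fmt.Println(registeredTools(Dependencies{Memory: &MemStore{data: map[string]string{}}}))
	// [web_search memory_save]
}
```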

Docs-only PRs skip the Go drift gates. CI sets code=false and skips the test job when every changed file is docs/meta. A pure-doc edit to docs/TOOLS.md will NOT run the drift tests on that PR (the standalone docs-drift job covers this — see .github/workflows/ci.yml). When a doc edit pairs with a tool/schema change, keep the code file in the same PR so the gates fire.

Adding a Search Provider

Web search providers implement the search.Provider interface (Web, Images, News, Name) — the core extension path.

  1. Implement search.Provider in internal/search/<name>.go (add a var _ Provider = (*XProvider)(nil) assertion; return (nil, nil) from any unsupported sub-capability such as Images — never an error, which would trip the breaker).
  2. Wire the factory — add a case to both NewProvider() and NewProviderByName() in internal/search/provider.go. These are separate switch statements, so both need a new case. The credential check lives in the NewProviderByName() case (return the provider only when its key is set).
  3. Add the credential/config env var to internal/config/config.go and document it in .env.example.
  4. Make it discoverable — add the name to search.SupportedProviders. AvailableProviders() ranges over that list (constructing each via NewProviderByName()), so no edit there — the Router picks it up automatically.

Academic providers implement search.AcademicProvider and register via NewAcademicProviderByName() (internal/search/domain.go) + AvailableAcademicProviders(). See the existing openalex.go / crossref.go for the pattern.

Adding a Patent Provider

Patent providers implement the PatentProvider interface for structured patent search from authoritative APIs.

  1. Create the provider in internal/search/<provider>.go:

package search

import "context"

type MyProvider struct {
    apiKey string
    deps   Deps
}

func NewMyProvider(apiKey string, deps Deps) *MyProvider {
    return &MyProvider{apiKey: apiKey, deps: deps}
}

func (p *MyProvider) Name() string { return "myprovider" }

func (p *MyProvider) Metadata() ProviderMeta {
    return ProviderMeta{
        Regions:      []string{"US"},       // or []string{"*"} for worldwide
        Capabilities: []string{"search", "biblio"},
        RateClass:    "metered",
        Description:  "My Provider — brief description",
    }
}

func (p *MyProvider) Patents(ctx context.Context, params PatentSearchParams) ([]PatentResult, error) {
    // Wrap in circuit breaker, call API, parse response
    var results []PatentResult
    err := p.deps.Breaker.Execute(func() error {
        // API call and parsing here
        return nil
    })
    return results, err
}

  2. Register it: add a case to NewPatentProviderByName() in internal/search/domain.go and add the provider name to SupportedPatentProviders in the same file. AvailablePatentProviders() iterates that slice, so both edits are required. Add the env var to internal/config/config.go.

  3. Add tests: create internal/search/<provider>_test.go with httptest mocks and a _live_test.go that skips without credentials.

  4. Document: add the env var to .env.example and setup instructions to docs/API_SETUP.md.

The ProviderMeta.Regions field controls intelligent routing — set it to the jurisdictions your provider covers so queries for other regions skip it automatically.
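
As a rough illustration of region-based skipping, the sketch below filters on Regions with "*" treated as worldwide. The covers function and the case-insensitive comparison are assumptions; the real routing logic may weigh other metadata as well.

```go
package main

import (
	"fmt"
	"strings"
)

// ProviderMeta is reduced here to the one field the sketch needs.
type ProviderMeta struct {
	Regions []string
}

// covers reports whether a provider with the given metadata should be tried
// for region. "*" means worldwide.
func covers(meta ProviderMeta, region string) bool {
	for _, r := range meta.Regions {
		if r == "*" || strings.EqualFold(r, region) {
			return true
		}
	}
	return false
}

func main() {
	us := ProviderMeta{Regions: []string{"US"}}
	world := ProviderMeta{Regions: []string{"*"}}
	fmt.Println(covers(us, "US"), covers(us, "EP"), covers(world, "EP"))
}
```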

Adding a Structured-Domain Capability

New structured-research domains (financial filings, case law, economic data, …) follow one repeatable pattern, proven by the SEC EDGAR / CourtListener / FRED trio in internal/search/structured_domains.go. Unlike web providers, these are not Router-routed — they resolve directly from the Dependencies maps in the tool layer, like the synthesis tools.

  1. Define the capability in internal/search/structured_domains.go (or a sibling file): a …Searcher interface (the method), a …Provider interface (…Searcher + Name() + Metadata()), the …SearchParams / …Result structs, a …ProviderConfig, a Supported…Providers slice, a New…ProviderByName() factory, and an Available…Providers() constructor (it gives each provider its own circuit.New(...) breaker — copy the EDGAR/FRED shape exactly).
  2. Implement the provider in internal/search/<name>.go with the usual Deps{HTTPClient, Breaker}, a SetBaseURL (or SetBaseURLs for providers with multiple base URLs, as in edgar.go) test hook, typed 429/404/4xx handling, io.LimitReader-bounded reads, and a var _ …Provider = (*XProvider)(nil) assertion. Mirror edgar.go / courtlistener.go / fred.go.
  3. Add the tool in internal/tools/<name>.go: a typed input struct (use omitempty + in-handler validation for "provide X or Y" inputs — a ,required jsonschema tag is enforced by the SDK before your handler runs), a resolve…Searcher helper, structuredResult, readOnlyAnnotations(true, true), the untrusted-external-content trust marker, audit + metrics, and an output schema in internal/tools/schemas.go. Register it conditionally in RegisterAll() (internal/tools/registry.go) and add a provider map to the Dependencies struct.
  4. Wire it in cmd/web-researcher-mcp/main.go (construct the config + Available…Providers map) and add the credential/contact env var to internal/config/config.go + .env.example.
  5. Test + document: httptest unit tests plus a //go:build live test that skips without credentials; add the tool to expectedTools, setupTestDeps, and the trust-marker/output-schema maps in metadata_test.go; write a ## Tool N: section in docs/TOOLS.md and a setup section in docs/API_SETUP.md.

Getting Help

Recognition

Contributors are recognized in release notes. Significant contributions may be highlighted in the README.

Thank you for helping make web-researcher-mcp better!