# GoldenMatch MCP server

Find duplicate records in 30 seconds. Zero-config entity resolution, 97.2% F1 out of the box.

## Links
- Registry page: https://www.getdrio.com/mcp/io-github-benzsevern-goldenmatch
- Repository: https://github.com/benzsevern/goldenmatch
- Website: https://benzsevern.github.io/goldenmatch/

## Install
- Command: `uvx goldenmatch`
- Endpoint: https://goldenmatch-mcp-production.up.railway.app/mcp/
- Auth: Not captured
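
Many MCP clients register a STDIO server through a JSON config entry that names the launch command. The sketch below generates such an entry from the command listed above. The `mcpServers` layout and the server key `goldenmatch` are assumptions based on a common client convention, not something this listing specifies; check your client's documentation for the exact schema and file location.

```python
import json

# Launch command as given in the Install section.
COMMAND = "uvx"
ARGS = ["goldenmatch"]


def build_client_config(server_key: str = "goldenmatch") -> dict:
    """Return an `mcpServers`-style entry (assumed client schema)."""
    return {"mcpServers": {server_key: {"command": COMMAND, "args": list(ARGS)}}}


config = build_client_config()
print(json.dumps(config, indent=2))
```

Paste the printed JSON into the client's MCP configuration file if it uses this layout.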

## Setup notes
- Package: PyPI goldenmatch v1.4.5
- Remote endpoint: https://goldenmatch-mcp-production.up.railway.app/mcp/
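
For the HTTP transport, an MCP session typically opens with a JSON-RPC `initialize` request posted to the endpoint. The sketch below only builds that request and does not send it. The protocol version string, the client name, and the `Accept` header values are assumptions drawn from the general MCP HTTP transport conventions; since auth is not captured in this listing, the server may also require credentials that are not shown here.

```python
import json
import urllib.request

# Remote endpoint as listed above.
ENDPOINT = "https://goldenmatch-mcp-production.up.railway.app/mcp/"


def build_initialize_request(endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC `initialize` POST request."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumption: use what your client supports
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},  # hypothetical
        },
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


request = build_initialize_request()
print(request.get_method(), request.full_url)
```

In practice an MCP client library handles this handshake; the sketch is only meant to show what travels over the wire.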

## Tools
- analyze_data - Profile data, detect domain, recommend ER strategy
- auto_configure - Run AutoConfigController on a CSV; return the committed GoldenMatchConfig (incl. negative_evidence / Path Y when chosen) plus telemetry: stop_reason, health, decision trace, indicator column priors. Programmatic equivalent of `goldenmatch autoconfig`.
- controller_telemetry - Return the AutoConfigController telemetry from the most recent `auto_configure` or `agent_deduplicate` call in this MCP session. Same JSON shape as the web /api/v1/controller/telemetry endpoint.
- agent_deduplicate - Run full ER pipeline with confidence gating and reasoning
- agent_match_sources - Match two files with intelligent strategy selection
- agent_explain_pair - Natural language explanation for a record pair
- agent_explain_cluster - Explain why records are in the same cluster
- agent_review_queue - Get borderline pairs awaiting approval
- agent_approve_reject - Approve or reject a review queue pair
- agent_compare_strategies - Compare ER strategies on your data
- suggest_pprl - Check if data needs privacy-preserving matching
- scan_quality - Run GoldenCheck data quality scan on a CSV file. Returns issues found (encoding errors, Unicode problems, format violations) without applying fixes. Requires goldencheck: `pip install goldenmatch[quality]`
- fix_quality - Run GoldenCheck scan and apply fixes to a CSV file. Returns the fixed data summary and a manifest of all fixes applied. Requires goldencheck: `pip install goldenmatch[quality]`
- run_transforms - Run GoldenFlow data transforms on a CSV file. Normalizes phone numbers (E.164), dates (ISO), categorical spelling, and Unicode issues. Returns a manifest of transforms applied. Requires goldenflow: `pip install goldenmatch[transform]`
- list_corrections - List stored Learning Memory corrections, optionally filtered by dataset. Returns id_a, id_b, decision, source, trust, reason, matchkey_name, dataset, original_score, created_at.
- add_correction - Add a pair correction to Learning Memory. Source is set to 'agent' with trust=0.5 (lower than human steward decisions, which are 1.0). Pair (id_a, id_b) is canonicalized to (min, max) before storage.
- learn_thresholds - Force a MemoryLearner pass over accumulated corrections. Returns the list of LearnedAdjustments produced (matchkey_name, threshold, sample_size, learned_at). Requires >= 10 corrections per matchkey before threshold tuning fires; otherwise returns an empty list.
- memory_stats - Return Learning Memory status: total correction count, last learn time, and current learned adjustments. Cheap; safe for status checks.
- memory_export - Return all corrections as a list of dicts (CSV-shaped). Caller is responsible for writing the file. Optionally filter by dataset.
- identity_resolve - Resolve a record_id to its durable identity. Returns the full identity view (members, evidence edges, recent events) or null when no identity exists for that record.
- identity_list - List identities, optionally filtered by dataset/status.
- identity_history - Return the temporal event log for an identity.
- identity_conflicts - List evidence edges marked `conflicts_with`.
- identity_merge - Manually merge two identities. All records from `absorb_entity_id` are reassigned to `keep_entity_id`.
- identity_split - Split a subset of records off an identity into a brand-new identity. The original keeps the remaining records.
- get_stats - Get dataset statistics: record count, cluster count, match rate, cluster sizes.
- find_duplicates - Find duplicate matches for a record. Provide field values to search against the loaded dataset.
- explain_match - Explain why two records match or don't match. Shows per-field score breakdown.
- list_clusters - List duplicate clusters found in the dataset. Returns cluster IDs, sizes, and member counts.
- get_cluster - Get details of a specific cluster: all member records and their field values.
- get_golden_record - Get the merged golden (canonical) record for a cluster.
- match_record - Match a single record against the loaded dataset in real time. Paste a record's fields and instantly see if it matches any existing record. Uses the configured matchkeys, scorers, and thresholds. Example: `{"name": "John Smith", "email": "john@test.com", "zip": "10001"}`
- unmerge_record - Remove a record from its cluster. The record becomes a singleton. Remaining cluster members are re-clustered using stored pair scores. Use this to fix bad merges.
- shatter_cluster - Break an entire cluster into individual records. All members become singletons. Use when a cluster is completely wrong.
- suggest_config - Analyze bad merges and suggest config changes. Provide examples of incorrect merges (pairs that should NOT have matched) and GoldenMatch will identify which fields/thresholds to tighten. Example: `[{"record_a": {...}, "record_b": {...}, "reason": "different people"}]`
- profile_data - Get data quality profile: column types, null rates, unique counts, sample values.
- export_results - Export matching results to a file (CSV or JSON).
- list_domains - List available domain extraction rulebooks (built-in + user-defined).
- create_domain - Create a custom domain extraction rulebook. Define patterns for a specific data domain (medical devices, automotive parts, real estate, etc.).
- test_domain - Test a domain extraction rulebook against sample records. Shows what features would be extracted from the loaded data.
- pprl_auto_config - Analyze the loaded dataset and recommend optimal PPRL (privacy-preserving record linkage) configuration. Returns recommended fields, bloom filter parameters, threshold, and explanation.
- pprl_link - Run privacy-preserving record linkage between two parties' data. Computes bloom filters, matches records without sharing raw data. Specify fields, threshold, and security level.

All tools are served from the endpoint listed under Install.
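
MCP clients invoke a tool with a JSON-RPC `tools/call` request that carries the tool name and an `arguments` object. The sketch below builds such a request for `match_record` using the example record from the tool description. The `record` argument key is an assumption: this listing does not capture the tools' input schemas, so inspect the schema returned by `tools/list` before relying on it.

```python
import json
from itertools import count

_request_ids = count(1)  # simple monotonically increasing JSON-RPC ids


def tool_call(name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request body for an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Example record copied from the match_record description.
record = {"name": "John Smith", "email": "john@test.com", "zip": "10001"}

# Assumption: the tool accepts the record under a "record" key.
request = tool_call("match_record", {"record": record})
print(json.dumps(request, indent=2))
```

The same helper works for any tool in the list; only the name and the arguments change.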
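
The Learning Memory rules described for `add_correction` and `learn_thresholds` can be illustrated with a small standalone sketch: pairs are stored as (min, max), agent corrections carry trust 0.5 versus 1.0 for human stewards, and a matchkey needs at least 10 corrections before tuning fires. This is not GoldenMatch's implementation; the function names and the matchkey names `name_email` and `phone` are hypothetical.

```python
from collections import Counter

# Values taken from the tool descriptions above.
MIN_CORRECTIONS_PER_MATCHKEY = 10
AGENT_TRUST = 0.5
HUMAN_TRUST = 1.0


def canonical_pair(id_a, id_b):
    """Order a pair as (min, max) so (a, b) and (b, a) share one key."""
    return (min(id_a, id_b), max(id_a, id_b))


def matchkeys_ready_for_learning(corrections):
    """Return matchkeys with enough corrections for threshold tuning."""
    counts = Counter(c["matchkey_name"] for c in corrections)
    return sorted(k for k, n in counts.items() if n >= MIN_CORRECTIONS_PER_MATCHKEY)


# Ten agent corrections on one matchkey, one human correction on another.
corrections = [
    {"pair": canonical_pair(i + 100, i), "matchkey_name": "name_email", "trust": AGENT_TRUST}
    for i in range(10)
]
corrections.append(
    {"pair": canonical_pair(7, 3), "matchkey_name": "phone", "trust": HUMAN_TRUST}
)

print(canonical_pair(42, 17))                     # (17, 42)
print(matchkeys_ready_for_learning(corrections))  # ['name_email']
```

Canonicalizing the pair before storage means a correction submitted as (b, a) lands on the same entry as one submitted as (a, b).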

## Resources
Not captured

## Prompts
- deduplicate-walkthrough - Step-by-step guided deduplication workflow: profile data, configure matching, run, review results, fix bad merges. Arguments: focus
- investigate-cluster - Deep-dive into a specific cluster: explain why records matched, identify potential bad merges, suggest fixes. Arguments: cluster_id
- compare-records - Detailed comparison of two records: field-by-field scoring, match/no-match verdict, explanation. Arguments: record_a, record_b
- data-quality-audit - Full data quality audit: profile columns, identify issues, recommend cleaning steps before matching.
- pprl-setup - Guide through privacy-preserving record linkage setup: assess data sensitivity, recommend PPRL config, run linkage. Arguments: security_level
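
Prompts are fetched with a JSON-RPC `prompts/get` request that names the prompt and passes its arguments as strings. The sketch below builds requests for two of the prompts listed above; the cluster id `42` is a hypothetical value, and the helper name is illustrative.

```python
import json


def prompt_request(name, arguments=None, request_id=1):
    """Build a JSON-RPC 2.0 `prompts/get` request body."""
    params = {"name": name}
    if arguments:
        # Prompt arguments are sent as strings, so coerce values.
        params["arguments"] = {key: str(value) for key, value in arguments.items()}
    return {"jsonrpc": "2.0", "id": request_id, "method": "prompts/get", "params": params}


# investigate-cluster takes a cluster_id argument (value here is hypothetical).
investigate = prompt_request("investigate-cluster", {"cluster_id": 42})

# data-quality-audit lists no arguments, so none are sent.
audit = prompt_request("data-quality-audit", request_id=2)

print(json.dumps(investigate, indent=2))
print(json.dumps(audit, indent=2))
```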

## Metadata
- Owner: io.github.benzsevern
- Version: 1.4.8
- Runtime: PyPI
- Transports: STDIO, HTTP
- License: Not captured
- Language: Not captured
- Stars: Not captured
- Updated: Jun 1, 2026
- Source: https://registry.modelcontextprotocol.io