# XFMS — Model Source MCP server

Pick the right LLM for any task. Returns a ranked shortlist with rationale across 8 evaluators.

## Links
- Registry page: https://www.getdrio.com/mcp/dev-xpansion-xfms
- Repository: https://github.com/VisionAIrySE/XFMS
- Website: https://xpansion.dev/xfms

## Install
- Endpoint: https://xfms.vercel.app/mcp/
- Auth: Required (per registry metadata)

## Setup notes
- Remote endpoint: https://xfms.vercel.app/mcp/
- Remote header: `Authorization` (required; secret)
- The upstream registry signals that auth or secrets are required.
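
As a rough illustration of the notes above, the sketch below prepares (but does not send) an HTTP POST to the remote endpoint with the required `Authorization` header. The `Bearer` scheme, the `XFMS_TOKEN` environment variable, and the helper name `build_request` are assumptions for illustration; the registry metadata only says that the header is required and secret. The `Accept` and `Content-Type` values follow the usual convention for MCP's Streamable HTTP transport.

```python
import json
import os
import urllib.request

XFMS_ENDPOINT = "https://xfms.vercel.app/mcp/"  # from the Install section


def build_request(payload: dict, token: str) -> urllib.request.Request:
    """Prepare a POST carrying a JSON-RPC payload; nothing is sent here."""
    headers = {
        # Assumption: Bearer scheme. The registry only names the header.
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        # Streamable HTTP servers may answer with JSON or an SSE stream.
        "Accept": "application/json, text/event-stream",
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        XFMS_ENDPOINT, data=body, headers=headers, method="POST"
    )


if __name__ == "__main__":
    # Hypothetical env var name; keep the secret out of source control.
    token = os.environ.get("XFMS_TOKEN", "replace-me")
    req = build_request({"jsonrpc": "2.0", "id": 1, "method": "tools/list"}, token)
    print(req.get_method(), req.full_url)
```

In practice an MCP client library would normally handle the `initialize` handshake and session management before any tool is listed or called; the snippet only shows where the secret header fits.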

## Tools
- rank (Rank LLMs) - Rank LLMs for a stated purpose. Returns a shortlist with weights, scores, and plain-English rationale per pick. Use when the user wants to see and compare alternatives, not just one answer. Endpoint: https://xfms.vercel.app/mcp/
- pick (Pick the best LLM) - Return the single best LLM for a stated purpose. Concise output, no list. Use when the user has settled on the criteria and just wants one answer. Endpoint: https://xfms.vercel.app/mcp/
- discover (Discover quality dimensions) - Show which quality dimensions matter for a stated purpose, WITHOUT ranking any models. Returns the inferred weights and the discovery-walk trace. Useful for understanding how XFMS interprets the purpose before committing to a pick. Endpoint: https://xfms.vercel.app/mcp/
- benchmark (Benchmark the engine's top picks with real test queries) - Run a live A/B test against the engine's top 3 picks for a stated purpose; the engine chooses the candidates from the full catalog. Generates 5 representative test queries (auto-expanding to 10 or 15 if results are too close to call), runs them through the picked models in parallel, and returns real cost, latency, and plain-English commentary on who won what. Use AFTER `pick` or `rank` when the user wants the engine's own picks stress-tested with live data. DO NOT use this when the user has already named specific candidate models: the engine will ignore the names and test its own picks. Use `compare` instead in that case. Costs more than `rank` (15+ live LLM calls). Endpoint: https://xfms.vercel.app/mcp/
- compare (Compare specific models head-to-head with real test queries) - Run a live A/B test between 2–5 user-specified models for a stated purpose. There is NO ranking step: the supplied model_ids ARE the candidate set. Generates 5 representative test queries from the purpose, runs them through every named model in parallel, and returns real cost, latency, and plain-English commentary on who won what. Unknown IDs are dropped with a note; if fewer than 2 IDs resolve, the call is refused. Use this whenever the user names specific models to compare (e.g. 'A/B test X and Y'). For engine-chosen candidates, use `benchmark` instead. Costs more than `rank` (10+ live LLM calls). Free-tier note: when any candidate ID ends in ':free', the probe is capped at 3 queries with no adaptive expansion, because free-tier rate limits often push longer probes past the deployment's 5-minute ceiling. The evidence is shallower in that case, and the commentary says so when it happens. Endpoint: https://xfms.vercel.app/mcp/
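
The choice between `benchmark` and `compare` described above can be sketched as a small client-side helper that builds an MCP JSON-RPC `tools/call` payload. This is a minimal sketch, not the server's schema: the `purpose` argument name is an assumption inferred from "stated purpose" (only `model_ids` is named in the tool description), the helper names are hypothetical, and `min_live_calls` assumes one live call per model per query, which is consistent with the "10+" and "15+" figures but is not confirmed by the source.

```python
from typing import Optional


def tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Wrap a tool invocation in an MCP JSON-RPC `tools/call` envelope."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def live_test_call(purpose: str, model_ids: Optional[list[str]] = None) -> dict:
    """Choose `compare` when models are named, otherwise `benchmark`."""
    if not model_ids:
        # No named candidates: let the engine test its own top 3 picks.
        return tool_call("benchmark", {"purpose": purpose})
    unique_ids = list(dict.fromkeys(model_ids))  # de-duplicate, keep order
    if not 2 <= len(unique_ids) <= 5:
        raise ValueError("compare expects 2 to 5 model IDs")
    # Assumption: the purpose argument is called "purpose".
    return tool_call("compare", {"purpose": purpose, "model_ids": unique_ids})


def min_live_calls(model_ids: list[str]) -> int:
    """Lower bound on live LLM calls for `compare`: models x queries."""
    # Any ':free' candidate caps the probe at 3 queries; otherwise 5.
    queries = 3 if any(m.endswith(":free") for m in model_ids) else 5
    return len(model_ids) * queries
```

For example, with two hypothetical IDs, `live_test_call("summarize legal contracts", ["model-a", "model-b"])` yields a `compare` call, while omitting the IDs yields a `benchmark` call. Checking the ID count locally avoids a round trip that the server would refuse anyway, although the server may still drop IDs it does not recognize.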

## Resources
Not captured

## Prompts
Not captured

## Metadata
- Owner: dev.xpansion
- Version: 0.4.0
- Runtime: Streamable HTTP
- Transports: HTTP
- License: Not captured
- Language: Not captured
- Stars: Not captured
- Updated: May 18, 2026
- Source: https://registry.modelcontextprotocol.io
