BenchLocal
Test LLMs on real tasks. Compare models side by side.
BenchLocal is a local-first desktop app for running practical benchmark packs against local or remote models. Point it at Ollama, LM Studio, OpenRouter, or any OpenAI-compatible API, install a Bench Pack, and run real scenarios for extraction, tool use, debugging, structured output, and agent workflows.
It is built for deterministic scoring, verifier-backed benchmarks, and side-by-side model comparison without giving up local control.
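As a rough illustration of what deterministic, verifier-backed scoring can mean for a structured-output task, here is a minimal Python sketch. It is not BenchLocal's API or its Bench Pack format: `verify_extraction`, the invoice fields, and the model names are all hypothetical, and the model outputs are hard-coded rather than fetched from a real endpoint.

```python
import json


def verify_extraction(model_output: str, expected: dict) -> float:
    """Hypothetical verifier: fraction of expected fields matched exactly.

    The score depends only on the output text and the expected values,
    so the same output always receives the same score.
    """
    if not expected:
        raise ValueError("expected must contain at least one field")
    try:
        parsed = json.loads(model_output)
    except json.JSONDecodeError:
        return 0.0  # output that is not valid JSON scores zero
    if not isinstance(parsed, dict):
        return 0.0  # valid JSON, but not an object with named fields
    matches = sum(
        1 for key, value in expected.items()
        if key in parsed and parsed[key] == value
    )
    return matches / len(expected)


# Assumed sample data: one scenario, three made-up model responses.
expected = {"invoice_id": "INV-1042", "total": 199.5}
outputs = {
    "model-a": '{"invoice_id": "INV-1042", "total": 199.5}',
    "model-b": '{"invoice_id": "INV-1042", "total": "199.50"}',
    "model-c": "Sure! The invoice is INV-1042.",
}

# Score every model against the same expected answer.
scores = {name: verify_extraction(text, expected) for name, text in outputs.items()}
for name, score in sorted(scores.items()):
    print(f"{name}: {score:.2f}")
```

Because the verifier is a plain function of the output text, rerunning it on the same responses gives the same scores, which is what makes comparing models against each other meaningful.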