Anubis OSS — native macOS app for benchmarking local LLMs with real-time hardware telemetry (free, open source)

Native, Open Source macOS app for benchmarking local LLMs with real-time hardware telemetry - Anubis OSS

I built a free, open-source macOS app for benchmarking local LLMs on Apple Silicon. It correlates real-time hardware telemetry (GPU/CPU power, frequency, memory, thermals via IOReport) with inference performance, with exportable and stored benchmark results - something I couldn’t find in any existing tool. Extra are was taken to make the app and inferences light-weight- the performance hit with metrics on and off was negligible after extensive tuning - but I only have my lowly 24GB M4 Air to test on. Help me make it better for the community!

What it does

  • Real-time metrics - tok/s, GPU/CPU utilization, power consumption (watts), GPU frequency, and memory - all charted live during inference
  • Any backend - works with Ollama, mlx_lm.server, LM Studio, vLLM, LocalAI, or any OpenAI-compatible endpoint
  • A/B Arena - compare two models side-by-side with the same prompt and vote on a winner
  • History & Export - session history with full replay, CSV export, and one-click image export for sharing results
  • Process monitoring - auto-detects backend processes and tracks their actual memory footprint (including Metal/GPU allocations)

Why I hope this might be useful for the HF community

  • Quantization comparisons - comparing Q4_K_M vs Q8_0 vs FP16 on your hardware? Anubis shows the actual power/performance tradeoff - not just tok/s but watts-per-token
  • MLX users - works with mlx_lm.server out of the box. Just start the server and add it as an OpenAI-compatible backend
  • Model cards & benchmarks - if you’re publishing benchmarks for the community, the image export gives you shareable, branded results with one click
  • Apple Silicon insights - per-core CPU utilization, GPU frequency, ANE/DRAM power - hardware data that no chat wrapper or CLI tool surfaces

Links

GitHub
Download
Requirements
License

Looking for feedback

I’d especially love to hear from anyone running MLX or GGUF models locally:

  • What metrics matter most to you?
  • What backends should I prioritize?
  • What would make this useful for your workflow?

Open an issue or start a discussion on the repo. If Anubis is useful to you, consider buying me a coffee :hot_beverage:

1 Like

whoops here are links to the project page and git repo - d love to hit 75 stars to get the app on Homebrew as a cask.

1 Like