Anubis OSS — native macOS app for benchmarking local LLMs with real-time hardware telemetry (free, open source)

JT-UncSoft · February 10, 2026, 12:09am

Native, Open Source macOS app for benchmarking local LLMs with real-time hardware telemetry - Anubis OSS

I built a free, open-source macOS app for benchmarking local LLMs on Apple Silicon. It correlates real-time hardware telemetry (GPU/CPU power, frequency, memory, thermals via IOReport) with inference performance, with exportable and stored benchmark results - something I couldn’t find in any existing tool. Extra are was taken to make the app and inferences light-weight- the performance hit with metrics on and off was negligible after extensive tuning - but I only have my lowly 24GB M4 Air to test on. Help me make it better for the community!

What it does

Real-time metrics - tok/s, GPU/CPU utilization, power consumption (watts), GPU frequency, and memory - all charted live during inference
Any backend - works with Ollama, mlx_lm.server, LM Studio, vLLM, LocalAI, or any OpenAI-compatible endpoint
A/B Arena - compare two models side-by-side with the same prompt and vote on a winner
History & Export - session history with full replay, CSV export, and one-click image export for sharing results
Process monitoring - auto-detects backend processes and tracks their actual memory footprint (including Metal/GPU allocations)

Why I hope this might be useful for the HF community

Quantization comparisons - comparing Q4_K_M vs Q8_0 vs FP16 on your hardware? Anubis shows the actual power/performance tradeoff - not just tok/s but watts-per-token
MLX users - works with mlx_lm.server out of the box. Just start the server and add it as an OpenAI-compatible backend
Model cards & benchmarks - if you’re publishing benchmarks for the community, the image export gives you shareable, branded results with one click
Apple Silicon insights - per-core CPU utilization, GPU frequency, ANE/DRAM power - hardware data that no chat wrapper or CLI tool surfaces

Links


GitHub
Download
Requirements
License

Looking for feedback

I’d especially love to hear from anyone running MLX or GGUF models locally:

What metrics matter most to you?
What backends should I prioritize?
What would make this useful for your workflow?

Open an issue or start a discussion on the repo. If Anubis is useful to you, consider buying me a coffee

JT-UncSoft · February 11, 2026, 2:55am

whoops here are links to the project page and git repo - d love to hit 75 stars to get the app on Homebrew as a cask.

Topic		Replies	Views
Anubis OSS - Local LLM Benchmarking for Apple Silicon with Real-Time Hardware Telemetry (Looking for Testers + Open Data) Show and Tell	0	40	February 25, 2026
[Tool] M-Courtyard – GUI for local LLM fine-tuning on Apple Silicon Show and Tell	0	73	February 17, 2026
M-Courtyard: Fine-tune LLMs on macOS with zero code (MLX + Ollama GUI) Models	0	89	February 13, 2026
Track real-time GPU and LLM pricing across all major cloud and inference providers Show and Tell	2	8	March 4, 2026
M-Courtyard: GUI app for fine-tuning mlx-community models on Mac (open source) Beginners	0	55	February 13, 2026