Papers
arxiv:2602.06057

QEIL v2: Heterogeneous Computing for Edge Intelligence via Roofline-Derived Pareto-Optimal Energy Modeling and Multi-Objective Orchestration

Published on Apr 5
· Submitted by
Satyam Kumar
on Apr 10
Authors:

Abstract

QEIL v2 improves energy efficiency and performance of large language model inference on edge devices through physics-based adaptive optimization and workload-aware resource allocation.

AI-generated summary

Deploying large language models (LLMs) on heterogeneous edge devices demands frameworks that jointly optimize energy efficiency, inference quality, and reliability. Our prior QEIL v1 (Kumar & Jha, 2026) achieved 4.82x IPW improvement but relied on static efficiency factors, greedy optimization, and unverified candidate selection. QEIL v2 replaces every static heuristic with physics-grounded, runtime-adaptive models. We introduce three device-workload metrics: DASI (roofline-derived compute utilization), CPQ (memory pressure from allocation theory), and Phi (thermal yield from CMOS leakage physics), forming a unified energy equation with every coefficient traceable to semiconductor physics. For optimization, PGSAM (Pareto-Guided Simulated Annealing with Momentum) simultaneously minimizes energy, latency, and device underutilization. At inference time, the EAC/ARDE selection cascade with CSVET early stopping provides progressive verification among repeated samples. Evaluated on WikiText-103, GSM8K, and ARC-Challenge across seven model families (125M-8B parameters, including one pre-quantized variant), QEIL v2 achieves 75.7% pass@k at 63.8W (IPW=0.9749), a 2.86x improvement over standard inference. When applied to a 4-bit Llama-3.1-8B, QEIL v2's physics-grounded routing achieves IPW=1.024 at 54.8W -- the first edge orchestration system to surpass the IPW=1.0 empirical reference mark, with the gain attributable entirely to QEIL v2's workload-adaptive device allocation on a model with reduced memory bandwidth requirements. Total energy drops 75.6% vs. standard with 38.3% latency reduction, zero thermal throttling, and 100% fault recovery across all benchmarks and model families.

Community

Paper author Paper submitter

Excited to share QEIL v2: Roofline-Derived Pareto-Optimal Edge Intelligence via First-Principles Energy Modeling and Multi-Objective Orchestration.

This work replaces static heuristics with physics-grounded runtime modeling using DASI, CPQ, and Φ, then combines them with PGSAM for Pareto-optimal device allocation and an EAC/ARDE verification cascade for higher-quality inference. On our heterogeneous edge platform, QEIL v2 achieves 75.7% pass@k at 63.8W (IPW=0.9749), 75.6% lower energy than standard inference, and zero thermal throttling across benchmarks.

We believe this provides a practical path toward reliable, high-efficiency LLM deployment on resource-constrained edge hardware. Feedback and discussion are very welcome.

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face checkout this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Sign up or log in to comment

Get this paper in your agent:

hf papers read 2602.06057
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2602.06057 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2602.06057 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2602.06057 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.