ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands
ShowUI-π is a Vision-Language-Action model for GUI drag-and-drop, built on SmolVLA (500M). It uses a flow-matching action head to predict drag trajectories from a single screenshot and a natural-language instruction.
Paper: ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands
Code: https://github.com/showlab/showui-pi
Training Data: showlab/ShowUI-pi-data
Evaluation Benchmark: h-siyuan/ScreenDrag
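As a rough illustration of what a flow-matching action head does at inference time, the sketch below integrates a velocity field from Gaussian noise toward an action vector with fixed Euler steps. It is a toy under stated assumptions, not the ShowUI-π implementation: `oracle_velocity`, `sample_flow`, the time convention (noise at t = 0, action at t = 1), and the four-number target (two hypothetical normalized drag waypoints) are all made up for illustration. In the real model the velocity would come from a learned network conditioned on the screenshot and instruction, and the repository's convention may differ.

```python
import random


def oracle_velocity(x, t, target):
    """Toy stand-in for a learned velocity network (hypothetical).

    For a straight-line path x_t = (1 - t) * noise + t * target, the velocity
    that carries the current point x to `target` by t = 1 is
    (target - x) / (1 - t).
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample_flow(target, num_steps=10, seed=0):
    """Euler-integrate the toy velocity field from noise (t = 0) to t = 1."""
    rng = random.Random(seed)
    # Start from Gaussian noise with the same dimensionality as the action.
    x = [rng.gauss(0.0, 1.0) for _ in target]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so (1 - t) is never zero
        v = oracle_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical flattened action: two (x, y) waypoints in normalized coordinates.
waypoints = sample_flow([0.2, 0.3, 0.8, 0.6], num_steps=10)
print(waypoints)
```

Because the oracle field already knows the target, the integration lands on it up to floating-point error; a trained network only approximates this field, which is why the number of integration steps typically trades accuracy against latency.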
git clone https://github.com/showlab/showui-pi.git
cd showui-pi
pip install -e .
import torch
from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy
from lerobot.policies.factory import make_pre_post_processors
policy = SmolVLAPolicy.from_pretrained("showlab/ShowUI-pi").to("cuda").eval()
preprocessor, postprocessor = make_pre_post_processors(
    policy.config,
    "showlab/ShowUI-pi",
    preprocessor_overrides={"device_processor": {"device": "cuda"}},
)
bash scripts/train_showui_pi.sh
See the training script for all flags and defaults.
PYTHONPATH=lerobot/src \
python scripts/eval_dex.py \
    --ckpt <path/to/checkpoint> \
    --output_dir outputs/eval_dex
PYTHONPATH=lerobot/src \
python scripts/eval_screenspot_pro.py \
    --ckpt <path/to/checkpoint> \
    --annotations_root <path/to/ScreenSpot-Pro/annotations> \
    --images_root <path/to/ScreenSpot-Pro/images>
@article{hu2025showui,
title={ShowUI-$\pi$: Flow-based Generative Models as GUI Dexterous Hands},
author={Hu, Siyuan and Lin, Kevin Qinghong and Shou, Mike Zheng},
journal={arXiv preprint arXiv:2512.24965},
year={2025}
}