ShowUI-π

ShowUI-π is a Vision-Language-Action model for GUI drag-and-drop, built on SmolVLA (500M parameters). Given a single screenshot and a natural-language instruction, it predicts the drag trajectory with a flow-matching action head.
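To give a rough sense of what a flow-matching head does at sampling time, the sketch below integrates a velocity field from noise to a trajectory with Euler steps. It is a toy illustration under stated assumptions, not the ShowUI-π implementation: `euler_sample` and `make_toy_velocity` are hypothetical names, the closed-form velocity field stands in for the learned network, and the time convention (noise at t=0, sample at t=1) is chosen for illustration and may differ from the one used in the codebase.

```python
import random


def euler_sample(velocity, x_init, num_steps=10):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x_init)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target):
    """Hypothetical stand-in for the learned action head.

    On the straight-line path x_t = (1 - t) * noise + t * target, the
    velocity that carries a point x at time t to the target is
    (target - x) / (1 - t). A real model predicts this from the
    screenshot and instruction instead of knowing the target.
    """
    def velocity(x, t):
        return [(gi - xi) / (1.0 - t) for gi, xi in zip(target, x)]
    return velocity


# Toy "drag trajectory": four (x, y) waypoints in normalized screen
# coordinates, flattened into a single vector (an assumed layout).
target = [0.10, 0.20, 0.30, 0.25, 0.50, 0.30, 0.70, 0.35]

# Start from Gaussian noise, as flow-matching samplers typically do.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in target]

flat = euler_sample(make_toy_velocity(target), noise, num_steps=10)
waypoints = list(zip(flat[0::2], flat[1::2]))
```

Because the toy field is exact for straight-line paths, the Euler integration lands on the target waypoints; with a learned field the result is only an approximation whose quality depends on the model and the number of steps.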

Paper: ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands

Code: https://github.com/showlab/showui-pi

Training Data: showlab/ShowUI-pi-data

Evaluation Benchmark: h-siyuan/ScreenDrag

Quick start

git clone https://github.com/showlab/showui-pi.git
cd showui-pi
pip install -e .

Inference

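import torch
from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy
from lerobot.policies.factory import make_pre_post_processors

# Use the GPU when one is available; otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the pretrained policy from the Hugging Face Hub and switch to eval mode.
policy = SmolVLAPolicy.from_pretrained("showlab/ShowUI-pi").to(device).eval()

# Build the matching pre/post processors on the same device as the policy.
preprocessor, postprocessor = make_pre_post_processors(
    policy.config,
    "showlab/ShowUI-pi",
    preprocessor_overrides={"device_processor": {"device": device}},
)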

Training

bash scripts/train_showui_pi.sh

See scripts/train_showui_pi.sh for all flags and defaults.

Evaluation

DEX Benchmark

PYTHONPATH=lerobot/src \
python scripts/eval_dex.py \
    --ckpt <path/to/checkpoint> \
    --output_dir outputs/eval_dex

ScreenSpot-Pro

PYTHONPATH=lerobot/src \
python scripts/eval_screenspot_pro.py \
    --ckpt <path/to/checkpoint> \
    --annotations_root <path/to/ScreenSpot-Pro/annotations> \
    --images_root <path/to/ScreenSpot-Pro/images>

Citation

@article{hu2025showui,
  title={ShowUI-$\pi$: Flow-based Generative Models as GUI Dexterous Hands},
  author={Hu, Siyuan and Lin, Kevin Qinghong and Shou, Mike Zheng},
  journal={arXiv preprint arXiv:2512.24965},
  year={2025}
}