The standard SFT loss path materializes the full [batch × seq × vocab] logits tensor before computing cross-entropy, which dominates peak memory at long context lengths. The new `loss_type="chunked_nll"` path drops ignored-label tokens before the `lm_head` matmul and computes cross-entropy in checkpointed chunks of 256. That keeps peak memory far below the default `nll` loss, and unlocks sequence lengths that don't fit at all under the standard path. Opting in is a single config switch:

```python
SFTConfig(loss_type="chunked_nll")
```

The `trl.experimental.openreward` adapter plugs any environment speaking the [Open Reward Standard](https://openrewardstandard.io) protocol into any TRL trainer that takes an `environment_factory`. One string (a catalog name or a URL) wires the dataset, factory, and `reward_func` slots; tools are bound dynamically from JSON Schema, with no per-env wrapper code:

```python
from trl import GRPOTrainer
from trl.experimental.openreward import OpenRewardSpec
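# One catalog name (or URL) resolves the dataset, environment factory, and reward functions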
spec = OpenRewardSpec("Eigent/SETA", num_tasks=64)
trainer = GRPOTrainer(
...,
train_dataset=spec.train_dataset,
environment_factory=spec.environment_factory,
reward_funcs=spec.reward_funcs,
)
```

Qwen3.6 (`Qwen/Qwen3.6-27B`, `Qwen/Qwen3.6-35B-A3B`) reuses the Qwen3.5-MoE architecture but ships a slightly different chat template, so we updated the stack end-to-end: a new training template with `{% generation %}` markers, tool-call response schema routing, and tiny test models for the VLM matrix.

```python
from trl import SFTConfig, SFTTrainer
trainer = SFTTrainer(
model="Qwen/Qwen3.6-27B",
args=SFTConfig(assistant_only_loss=True),
train_dataset=dataset,
)
trainer.train()
```

You can also pass `tools=[...]` to `GRPOTrainer` (see the sketch at the end of this post). Elsewhere in this release: draft-model support in `trl vllm-serve` (Qwen3 MTP / Eagle3 drafts), 12 more KTO → DPO alignment PRs (KTO promotion to stable is now in reach), three more `{% generation %}` chat templates (Gemma/Gemma 2, Phi-3, GLM-4-MoE), and a chunky SFT entropy bug fix. The experimental SSD trainer keeps the familiar trainer surface:

```python
from trl.experimental.ssd import SSDConfig, SSDTrainer
trainer = SSDTrainer(
model="Qwen/Qwen3-4B-Instruct",
args=SSDConfig(temperature=0.6, top_k=20, top_p=0.95),
train_dataset=dataset,
)
trainer.train()
```

Also in this batch: `use_transformers_paged`, and key fixes for VLM response parsing.
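Trying the paged generation path is a config flag away. A minimal sketch, assuming the flag lives on `GRPOConfig` as in recent TRL releases:

```python
from trl import GRPOConfig

# Assumption: use_transformers_paged sits on GRPOConfig and switches rollout
# generation to transformers' paged attention implementation
args = GRPOConfig(use_transformers_paged=True)
```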
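And the promised tool-calling sketch. The tool itself (`get_weather`) and the schema-binding details are assumptions; the notes above only confirm that a `tools=[...]` list can be handed to `GRPOTrainer`:

```python
from trl import GRPOTrainer

def get_weather(city: str) -> str:
    """Hypothetical tool: return a one-line weather report for `city`."""
    return f"Sunny in {city}"

trainer = GRPOTrainer(
    ...,  # model, args, dataset as usual
    tools=[get_weather],  # exposed to the policy during rollouts
)
```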