15.1 TFLOPS
Nicholas
nickoo004
16 followers · 74 following
NursultanMRX
nursultan-koshekbaev
AI & ML interests
ML and NLP, and also DL, NN
Recent Activity
reacted to anakin87's post with ❤️ about 19 hours ago
How does LLM training with RL environments work?

It all starts with Reinforcement Learning with Verifiable Rewards:
- question asked
- model generates reasoning + answer
- answer checked against ground truth
- reward drives RL training

In this setup, the environment is simple: fixed questions and answers, rollout logic, reward(s).

Consider a more complex tic-tac-toe env ❌⭕
It adds:
- dynamic game generation/handling
- tunable opponent skill
- multi-turn interactions
(envs can also include tools)

---

What happens at training?

We use Group Relative Policy Optimization with a tic-tac-toe env.
No critic model needed, the group is the baseline.
Simpler than PPO.

1. Rollout generation: from the same board, model plays N games via sampling
2. Each game scored with deterministic rewards (win, format, ...)
3. Mean score computed across the group
4. Each rollout's advantage = its score minus the group mean
5. Model updated to favor trajectories above baseline

Repeat.

For a deep dive, check out https://github.com/anakin87/llm-rl-environments-lil-course, a free hands-on course on RL environments for LLMs.
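Steps 3 and 4 of the post (group mean as baseline, advantage as score minus mean) can be illustrated with a minimal Python sketch. The function name `group_relative_advantages` and the example scores are hypothetical, chosen only for illustration; they do not come from the post or the linked course. Many GRPO implementations also divide by the group's standard deviation, which the post does not mention and which is left out here.

```python
from statistics import mean


def group_relative_advantages(scores):
    """Return each rollout's advantage: its score minus the group mean.

    `scores` holds one scalar reward per rollout sampled from the same
    starting board (hypothetical values in the example below).
    """
    if not scores:
        raise ValueError("need at least one rollout score")
    baseline = mean(scores)  # the group itself acts as the baseline, no critic
    return [s - baseline for s in scores]


# Hypothetical rewards for N = 4 games played from the same board
# (e.g. 1.0 = win, 0.5 = draw, 0.0 = loss).
scores = [1.0, 0.0, 0.5, 1.0]
advantages = group_relative_advantages(scores)
print(advantages)
```

Rollouts with a positive advantage scored above the group mean and would be reinforced in step 5; those with a negative advantage would be discouraged. By construction the advantages within one group sum to zero.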
liked a dataset 2 days ago
tencent/MegaStyle-1.4M
published a dataset 6 days ago
nickoo004/kaa-parallel-corpus
Organizations
None yet
models
3
nickoo004/karakalpak-gpt2-v3 • Text Generation • 97M • Updated 20 days ago • 374 • 1
nickoo004/gemma-2b-reasoning-keras • Updated Jan 11 • 3
nickoo004/gpt2_karakalpak • Text Generation • 0.1B • Updated Jun 6, 2025 • 6 • 4
datasets
4
nickoo004/kaa-parallel-corpus • Viewer • Updated 6 days ago • 14.1k • 33
nickoo004/gemma-reasoning-gold-15k • Viewer • Updated Jan 9 • 27.1k • 20
nickoo004/FeruzaSpeech_to_fine_tuning • Viewer • Updated Sep 2, 2025 • 13k • 101 • 2
nickoo004/uzbekdata • Viewer • Updated Feb 23, 2025 • 7.27k • 5 • 3