ChenWu98/opd_grpo_verifier_hard_Qwen-Qwen3-8B_alpha0.5_lr1e-6_opd1.0_pg0.1_k3_o_r5_rlgen_initrlgen Updated 2 days ago
ChenWu98/opd_grpo_verifier_hard_Qwen-Qwen3-8B_alpha0.5_lr1e-6_opd1.0_pg1.0_k3_o_r5_rlgen Updated 3 days ago
ChenWu98/opd_grpo_verifier_hard_Qwen-Qwen3-8B_alpha0.5_lr1e-6_opd1.0_pg0.1_k3_o_r5 Updated 8 days ago
ChenWu98/grpo_sciknoweval_from_math_easy_Qwen-Qwen2.5-1.5B-Instruct_lr1e-6_global_step_3400 Updated Feb 16
ChenWu98/grpo_rl_ref_math_easy_Qwen-Qwen2.5-Math-1.5B-Instruct_rlglobal_step_1600_kl1.0_lr1e-6 Updated Feb 14
ChenWu98/grpo_Qwen-Qwen3-8B_ref_math_hard_Qwen-Qwen2.5-1.5B-Instruct_kl1.0_lr1e-6_kl_incorrect Updated Feb 14
ChenWu98/grpo_Qwen-Qwen2.5-7B-Instruct_ref_math_easy_Qwen-Qwen2.5-1.5B-Instruct_kl1.0_lr1e-6 Updated Feb 13