cool datasets
some interesting datasets to use for language modeling
54rt1n/wikipedia-summary-dataset Viewer • Updated Sep 10, 2024 • 5.32M • 23 • 2
appvoid/raw-corpus Viewer • Updated Feb 23, 2025 • 1.6M • 17
pszemraj/simple_wikipedia Viewer • Updated Dec 29, 2025 • 238k • 302 • 8
common-pile/youtube Viewer • Updated Jun 6, 2025 • 1.13M • 718 • 12
cool spaces
Stable Audio Open Zero • Running on Zero • Agents • 468
Generate immersive audio from text prompts
arco releases
small models aiming at language modeling without system prompts
appvoid/arco-3 Text Generation • 0.6B • Updated Nov 7, 2025 • 3
appvoid/arco-2 Text Generation • 0.5B • Updated May 31, 2025 • 14 • 7
appvoid/arco 0.5B • Updated Dec 5, 2024 • 12 • 14