Favorites Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30, 2024 • 80
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30, 2024 • 80
To read Mamba: Linear-Time Sequence Modeling with Selective State Spaces Paper • 2312.00752 • Published Dec 1, 2023 • 150 Elucidating the Design Space of Diffusion-Based Generative Models Paper • 2206.00364 • Published Jun 1, 2022 • 18 GLU Variants Improve Transformer Paper • 2002.05202 • Published Feb 12, 2020 • 5 StarCoder 2 and The Stack v2: The Next Generation Paper • 2402.19173 • Published Feb 29, 2024 • 156
Mamba: Linear-Time Sequence Modeling with Selective State Spaces Paper • 2312.00752 • Published Dec 1, 2023 • 150
Elucidating the Design Space of Diffusion-Based Generative Models Paper • 2206.00364 • Published Jun 1, 2022 • 18
Favorites Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30, 2024 • 80
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30, 2024 • 80
To read Mamba: Linear-Time Sequence Modeling with Selective State Spaces Paper • 2312.00752 • Published Dec 1, 2023 • 150 Elucidating the Design Space of Diffusion-Based Generative Models Paper • 2206.00364 • Published Jun 1, 2022 • 18 GLU Variants Improve Transformer Paper • 2002.05202 • Published Feb 12, 2020 • 5 StarCoder 2 and The Stack v2: The Next Generation Paper • 2402.19173 • Published Feb 29, 2024 • 156
Mamba: Linear-Time Sequence Modeling with Selective State Spaces Paper • 2312.00752 • Published Dec 1, 2023 • 150
Elucidating the Design Space of Diffusion-Based Generative Models Paper • 2206.00364 • Published Jun 1, 2022 • 18