- In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss
  Paper • 2402.10790 • Published • 42
- Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
  Paper • 2408.03314 • Published • 63
- Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
  Paper • 2403.09629 • Published • 79
Gabriel Pendl
jompaaa
AI & ML interests
None yet
Recent Activity
liked a model about 16 hours ago
microsoft/harrier-oss-v1-27b
liked a model 1 day ago
jinaai/jina-embeddings-v4
liked a model 5 days ago
google/gemma-4-26B-A4B-it