InCoder-32B: Code Foundation Model for Industrial Scenarios Paper • 2603.16790 • Published 4 days ago • 289
FineRMoE: Dimension Expansion for Finer-Grained Expert with Its Upcycling Approach Paper • 2603.13364 • Published 12 days ago • 9
The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training Paper • 2603.10444 • Published 10 days ago • 10
Mixture of Attention Heads: Selecting Attention Heads Per Token Paper • 2210.05144 • Published Oct 11, 2022 • 3
MeKi: Memory-based Expert Knowledge Injection for Efficient LLM Scaling Paper • 2602.03359 • Published Feb 3 • 10
MemoryLLM: Plug-n-Play Interpretable Feed-Forward Memory for Transformers Paper • 2602.00398 • Published Jan 30 • 5
Decoding as Optimisation on the Probability Simplex: From Top-K to Top-P (Nucleus) to Best-of-K Samplers Paper • 2602.18292 • Published 29 days ago • 10
RaBiT: Residual-Aware Binarization Training for Accurate and Efficient LLMs Paper • 2602.05367 • Published Feb 5 • 7
DFlash: Block Diffusion for Flash Speculative Decoding Paper • 2602.06036 • Published Feb 5 • 44
POP: Prefill-Only Pruning for Efficient Large Model Inference Paper • 2602.03295 • Published Feb 3 • 4
Fairy2i: Training Complex LLMs from Real LLMs with All Parameters in {±1, ±i} Paper • 2512.02901 • Published Dec 2, 2025 • 6
Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models Paper • 2511.23319 • Published Nov 28, 2025 • 24
Metis: Training Large Language Models with Advanced Low-Bit Quantization Paper • 2509.00404 • Published Aug 30, 2025 • 7
Jamba 1.7 Collection The AI21 Jamba family of models is a set of hybrid SSM-Transformer foundation models, blending speed, efficient long-context processing, and accuracy. • 4 items • Updated Jul 2, 2025 • 12