paper-buf
LLM safety
LLM360 K2: Scaling up 360-open-source large language models
NOTES
Risk-averse fine-tuning of large language models
NOTES
Reasoning
Multiagent finetuning: Self improvement with diverse reasoning chains
code
The lessons of developing process reward models in mathematical reasoning
O1 replication journey - Part 2: Surpassing O1-preview through simple distillation, big progress or bitter lesson?
O1 replication journey - Part 3: Inference-time scaling for medical reasoning
LLM
LongProc: Benchmarking long-context language models on long procedural generation
NOTES
Step-by-step mastery: Enhancing soft constraint following ability of large language models
Agent
WebWalker: Benchmarking LLMs in web traversal