As a note-taking practitioner (sometimes of evergreen notes), I have finally found that splitting notes across separate per-date files does not work well. So I have decided to keep all my notes in one holistic place, and here it is. I hope I can keep writing here in well-formed Markdown.

Table of Contents:


25-08-11

<aside> 💡

Overtrained language models are harder to fine-tune.

  1. Extended pre-training hurts post-training
    1. Experiments on OLMo-1B/7B and LLM360-Amber-7B, three models with intermediate checkpoints (see the sketch below)
    2. Two observations: (1) extended pre-training always benefits the base model; (2) extended pre-training beyond a certain budget can hurt task-specific post-training on both ID and OOD tasks
  2. Catastrophic overtraining
  3. A theoretical perspective on overtraining

</aside>
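
To make the first point concrete, here is a minimal sketch of that experimental protocol, not the paper's code: iterate over a model's intermediate pre-training checkpoints and post-train each one. The repo name and branch layout are assumptions based on OLMo's public Hugging Face releases, and `fine_tune`/`evaluate` are hypothetical placeholders.

```python
# Minimal sketch, not the paper's code: compare task-specific post-training
# across intermediate pre-training checkpoints of the same model.
# Repo/branch layout is an assumption based on OLMo's Hugging Face releases.
from huggingface_hub import list_repo_refs
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO = "allenai/OLMo-1B-hf"

# OLMo publishes intermediate checkpoints as git branches, one per
# pre-training budget (names like "step20000-tokens84B").
revisions = sorted(b.name for b in list_repo_refs(REPO).branches if b.name != "main")

for rev in revisions:
    tokenizer = AutoTokenizer.from_pretrained(REPO, revision=rev)
    model = AutoModelForCausalLM.from_pretrained(REPO, revision=rev)
    # fine_tune() / evaluate() are hypothetical placeholders for your own
    # post-training recipe and ID/OOD evals; the claim above predicts the
    # fine-tuned score eventually drops as the pre-training budget grows.
    # score = evaluate(fine_tune(model, tokenizer))
    print(rev, "loaded")
```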

<aside> 💡

Diffusion language models are super data learners, released on Aug. 9, 2025.

</aside>

<aside> 💡

ovatarl: training a language model from scratch with pure reinforcement learning, Aug. 9, 2025.

</aside>

<aside> 💡

Statistical suggestions for mech interp research and beyond, Aug. 6, 2025.

</aside>

<aside> 💡

Assessing skeptical views of interpretability research, by Prof. Chris Potts.

</aside>

<aside> 💡

System prompts of LLM vendors and agent vendors.

</aside>

<aside> 💡

yet-another-applied-llm-benchmark

</aside>

<aside> 📃

Paper buffer

xxx

</aside>

<aside> 📃

RL2 (Ray-Less Reinforcement Learning): a concise library of reinforcement learning for large language models.

</aside>

25-08-13

<aside> 📃

The 2025 IMO: a blog post rethinking LLMs’ IMO breakthrough.

</aside>

<aside> 💡

gpt-oss-reverse-engineering: a repo that analyzes prior knowledge in these LLMs by priming them with sentence-leading tokens (‘What’, ‘The’, ‘How’, etc.) and then analyzing the characteristics of the generated content.

</aside>
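
As a minimal sketch of this priming idea (not the repo's actual code, and with `gpt2` standing in for the probed checkpoint), one can sample continuations from each leading token and tally what tends to follow:

```python
# Minimal sketch of token priming, not the repo's code: seed the model with
# a single sentence-leading token, sample continuations, and fingerprint
# what the model's priors surface. "gpt2" is a stand-in model.
from collections import Counter
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

leading_tokens = ["What", "The", "How"]

for tok in leading_tokens:
    outs = generator(
        tok,
        max_new_tokens=30,
        do_sample=True,
        temperature=0.8,
        num_return_sequences=20,
    )
    # Tally the first word after the primer as a crude fingerprint of the
    # model's default continuations.
    first_words = Counter(
        o["generated_text"].split()[1]
        for o in outs
        if len(o["generated_text"].split()) > 1
    )
    print(tok, first_words.most_common(5))
```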

25-08-14

<aside> 💡

Here I have a question: given a set of short phrases denoted $\mathcal{P}$, which you know to have certain relationships such as:

</aside>

25-08-15