Andy’s evergreen notes with Obsidian

To get Andy Matuschak mode in Obsidian, add the following code to your vault’s .obsidian/themes/obsidian.css file, then click “Stack tabs” near the top right of the window.

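/* Andy Matuschak mode! */
/* Let side-by-side panes scroll horizontally instead of being squeezed. */
.workspace-split.mod-vertical { overflow-x: auto; }
/* Keep every pane at a readable minimum size. */
.workspace-leaf, .workspace-split > .workspace-split { min-width: 650px; min-height: 500px; }
/* Let vertically arranged panes scroll as well. */
.workspace-split.mod-horizontal { overflow-y: auto; }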

Miles’s tweet on RL-based LLMs for everything

Here is the original tweet, which I screenshotted. As I understand it, he makes the following points:
1. RL with CoT is a generally useful technique for boosting LLM capability, not only in math/code.
2. CoT with RL explores the problem-solving space, so sampling diverse candidate solutions can greatly increase the probability of finding a correct one, even for tasks where ground truth is hard to obtain.
3. Ways of creating reward signals matter even outside math/code tasks.
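
To make point 2 concrete, here is a minimal sketch of my own (not from the tweet). It assumes each sampled solution is correct independently with the same probability p, which is a simplification: real samples from one model are correlated, and we still need some reward or verifier to pick out the correct candidate. Under that assumption, the chance that at least one of k samples is correct is 1 - (1 - p)^k. The function name and the value p = 0.05 are hypothetical, chosen only for illustration.

```python
def prob_at_least_one_correct(p: float, k: int) -> float:
    """Chance that at least one of k samples is correct.

    Assumption (mine, for illustration): each sample is correct
    independently with the same probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    # All k samples are wrong with probability (1 - p) ** k.
    return 1.0 - (1.0 - p) ** k


# Hypothetical per-sample success rate of 5%.
for k in (1, 4, 16, 64):
    print(k, round(prob_at_least_one_correct(0.05, k), 3))
```

Even a weak per-sample success rate compounds quickly with more samples, which is how I read the “make up for imperfection with diversity” idea. The gain shrinks when samples are near-duplicates, so diversity matters.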

He also recommends a paper by a DeepMind scientist, which he says explains his “make up for imperfection with diversity” claim:
- Boundless Socratic Learning with Language Games, Nov. 2024.