Andy’s evergreen notes with Obsidian
- To enable Andy Matuschak mode in Obsidian, write the following code into your vault’s .obsidian/themes/obsidian.css file, then click “stack tabs” somewhere near the top right of the window.
/* Andy Matuschak mode! */
.workspace-split.mod-vertical { overflow-x:auto; }
.workspace-leaf, .workspace-split > .workspace-split { min-width: 650px; min-height: 500px; }
.workspace-split.mod-horizontal { overflow-y: auto; }
Miles’s tweet on RL-based LLMs for everything
- Here is the original tweet, screenshotted below. As I understand it, he argues the following points:
- RL with CoT is a generally useful technique for boosting LLM capability beyond math/code,
- CoT with RL explores the problem-solving space, so sampling diverse candidate solutions may substantially increase the probability of finding a correct solution, even for tasks where ground truth is hard to obtain,
- The ways of creating reward signals are significant even outside math/code tasks.
- He also recommends a paper by a DeepMind scientist, which he says explains his “make up for imperfection with diversity” claim.
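
To make the “diversity” point concrete, here is a minimal sketch of the arithmetic behind it. It is my own illustration, not something taken from the tweet or the paper. It assumes each sampled solution is correct independently with the same probability p, and that some checker or reward signal can recognize a correct sample when one appears; the function name success_probability and the value p = 0.05 are hypothetical.

```python
def success_probability(p: float, k: int) -> float:
    """Probability that at least one of k samples is correct.

    Assumes (for illustration only) that samples are independent and
    each one is correct with the same probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    # All k samples fail with probability (1 - p) ** k.
    return 1.0 - (1.0 - p) ** k


if __name__ == "__main__":
    # Hypothetical per-sample success rate of 5%.
    for k in (1, 4, 16, 64):
        print(f"k={k:3d}  P(at least one correct)={success_probability(0.05, k):.3f}")
```

Under these assumptions, even a weak per-sample success rate compounds quickly as more candidates are drawn. In practice samples from one model are correlated, so the real gain is likely smaller, which is presumably why diversity among the candidates matters.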