paper-buf
LLM safety
Machine unlearning doesn’t do what you think: Lessons for generative AI policy, research, and practice
Open problems in machine unlearning for AI safety, tweet
Tensor Trust: Interpretable prompt injection attacks from an online game, code
Bag of tricks: Benchmarking of jailbreak attacks on LLMs, NeurIPS 2024, code
NOTES
LLM
MUSE: Machine unlearning six-way evaluation for language models
OpenAssistant Conversations - Democratizing large language model alignment (OASST1), a crowd-sourced human preference dataset for LLM alignment
NOTES
Others
Rethinking aleatoric and epistemic uncertainty