Projects
Firebending Gauntlet
Using an Arduino Nano, butane, and an assortment of hardware-store components, I try to recreate the firebending antagonists from a show I watched as a kid.
LLM Metacognition & Long-Horizon Task Performance
Evaluating whether metacognitive calibration on GPQA Diamond, once separated from raw intelligence, is a strong predictor of long-horizon task-completion ability on the METR benchmark.
This is very close to being done!