PerkyPotato

Large Reasoning Models (LRMs) can overtake LLMs...

Here's my quick 3-minute breakdown:

  1. o1-preview scores 97.8% on PlanBench Blocksworld vs. 62.5% for the top LLMs, suggesting a shift from retrieval toward reasoning.
  2. It scores 52.8% on the obfuscated "Mystery Blocksworld" vs. near-zero for LLMs, suggesting its planning ability transfers to abstract, renamed versions of the problem.
  3. Its "reasoning tokens" usage varies and correlates with problem difficulty, hinting at an internal search process with adaptive compute.
2mo ago
5.5K views