JumpyTaco

Breakthrough in Test-Time Compute

I came across two interesting papers recently on scaling laws in AI and wanted to share a summary. Here are the key takeaways:

Scaling LLM Test-Time Compute

Two papers looked at how to scale up test-time compute for LLMs:

  1. Simple strategies like weighted voting keep improving as you scale up test-time compute
  2. There's a regime where recognizing good solutions becomes the bottleneck, not generating them
  3. The ratio of test-time to training-time compute is increasing
  4. Batch size 1 inference may become less important; parallel generations could become standard
  5. Tree search with Process Reward Models is emerging as a legitimate strategy
  6. We may see more compound systems with separate proposer and verifier modules
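Point 1 above (weighted voting) can be sketched in a few lines. This is a minimal illustration, not code from either paper: I'm assuming each of N parallel generations comes with a verifier/reward-model score, and the `samples` data is made up.

```python
from collections import defaultdict

def weighted_vote(samples):
    """Pick the answer whose total verifier score across samples is highest.

    `samples` is a list of (answer, score) pairs, e.g. N generations for the
    same prompt, each scored by a reward model (names are illustrative).
    """
    totals = defaultdict(float)
    for answer, score in samples:
        totals[answer] += score
    return max(totals, key=totals.get)

# Two samples agree on "42", so their combined score beats any single answer.
samples = [("42", 0.9), ("41", 0.8), ("42", 0.7), ("40", 0.95)]
print(weighted_vote(samples))  # -> 42
```

Scaling test-time compute here just means increasing N: more samples give the vote more evidence, which is why the simple strategy keeps improving.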

Finetuning Effects

A study on finetuning 1-16B param LLMs found:

  1. Model size matters more than finetuning dataset size
  2. Pretraining dataset size matters more than finetuning dataset size
  3. Finetuning dataset size matters way more than params added by PEFT methods
  4. Power law curves fit the results well, but coefficients vary by method/task
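On point 4, fitting a power law of the form loss = a * D^(-b) to (dataset size, loss) points is just linear regression in log-log space. A minimal sketch with synthetic data (the sizes, losses, and coefficients below are made up for illustration, not values from the study):

```python
import numpy as np

# Synthetic (finetuning-dataset-size, loss) points lying on loss = 2.0 * D**-0.15
D = np.array([1e3, 1e4, 1e5, 1e6])   # number of finetuning examples (made up)
loss = 2.0 * D ** -0.15

# Power law becomes linear in log space: log(loss) = log(a) - b * log(D)
slope, log_a = np.polyfit(np.log(D), np.log(loss), 1)
a, b = np.exp(log_a), -slope
print(f"loss ~= {a:.2f} * D^(-{b:.2f})")
```

The study's observation is that this functional form fits well across settings, but the fitted `a` and `b` shift with the finetuning method (full vs PEFT) and the task.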
5mo ago

Discover more

From Software Engineers, by SqueakyPickleGoogle

Gemini Pro 1.5 launched! A new model just a week after the launch of Gemini Ultra 1.0. It has a 10M token context window and can perceive and answer questions about 10 hours of video, 100 hours of audio, or 300K lines of code...

Human minds are not trained to understand the exponential nature of any trend... Currently, people are mocking AI because current models are not that good. However, they are unable to see the double-exponential trend with which AI is improv...

Top comment:

Software engineering career as it is right now will be the earliest casualty

Data Scientists

Paper with Code: You can now run LLMs without Matrix Multiplications

Saw this paper: https://arxiv.org/pdf/2406.02528

In essence, MatMul operations can be completely eliminated from LLMs while maintaining strong performance at billion-parameter scales, by utilising an optimised kernel during inferen...
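The core trick, as I understand the paper, is constraining weights to ternary values {-1, 0, +1}, so a matrix-vector product reduces to additions and subtractions of input elements. A toy illustration of that idea (not the paper's actual kernel):

```python
import numpy as np

def ternary_matvec(W_ternary, x):
    """Multiply-free mat-vec for ternary weights in {-1, 0, +1}:
    each output element is just a sum of inputs where the weight is +1
    minus a sum where it is -1."""
    out = np.zeros(W_ternary.shape[0])
    for i, row in enumerate(W_ternary):
        out[i] = x[row == 1].sum() - x[row == -1].sum()
    return out

W = np.array([[1, 0, -1],
              [0, 1, 1]])
x = np.array([2.0, 3.0, 5.0])
print(ternary_matvec(W, x))  # matches W @ x, computed without any multiplies
```

The efficiency claims in the paper come from fused GPU kernels exploiting this structure, not from a Python loop like the one above.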