SnoozyBiscuit

Interview Question: Transformers

Can someone answer the question below? I was asked this in a data scientist (8 YOE) interview.

Why do large language models need a multi-headed attention layer as opposed to a single attention layer?

Follow-up question: during training, why do the different attention heads get tuned to have different weights?
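
For concreteness, here is a minimal sketch of what a multi-head attention layer computes, using the standard Transformer-style head splitting (project once, reshape the feature dimension into heads, attend per head, concatenate). The function and weight names are just illustrative, not any particular library's API:

```python
import torch
import torch.nn.functional as F

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Minimal multi-head self-attention (no masking, no dropout).

    x:   (batch, seq_len, d_model) input embeddings
    w_*: (d_model, d_model) projection weights; the split into heads
         is done by reshaping, as in the original Transformer paper.
    """
    batch, seq_len, d_model = x.shape
    d_head = d_model // num_heads  # each head attends in a smaller subspace

    # Project once, then split the feature dimension into independent heads.
    def split_heads(t):
        return t.view(batch, seq_len, num_heads, d_head).transpose(1, 2)

    q = split_heads(x @ w_q)  # (batch, heads, seq, d_head)
    k = split_heads(x @ w_k)
    v = split_heads(x @ w_v)

    # Scaled dot-product attention, computed for all heads in parallel.
    scores = q @ k.transpose(-2, -1) / d_head**0.5   # (batch, heads, seq, seq)
    attn = F.softmax(scores, dim=-1)                 # each head learns its own pattern
    out = attn @ v                                   # (batch, heads, seq, d_head)

    # Concatenate the heads and mix them back together.
    out = out.transpose(1, 2).contiguous().view(batch, seq_len, d_model)
    return out @ w_o

# Toy usage: 2 heads over an 8-dimensional model.
torch.manual_seed(0)
x = torch.randn(1, 4, 8)
w = [torch.randn(8, 8) * 0.1 for _ in range(4)]
y = multi_head_attention(x, *w, num_heads=2)
print(y.shape)  # torch.Size([1, 4, 8])
```

The key point the interviewer is likely probing: because each head gets its own softmax over its own low-dimensional subspace, the heads can specialize in different relations, which a single full-width attention pattern cannot represent.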

6mo ago
3.3K views