
Interview Question: Transformers

Can someone answer the question below? I was asked it in a data scientist (8 YOE) interview: why do large language models need a multi-headed attention layer as opposed to a single attention layer? Follow-up question: during training, why do the different attention heads get tuned to have different weights?
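For reference, here is a minimal sketch of the mechanism the question is about: multi-head attention projects the input into several smaller query/key/value subspaces, runs scaled dot-product attention in each head independently, then concatenates the heads and mixes them back together. The class name, dimensions, and PyTorch framing below are my own illustrative assumptions, not anything posted in this thread.

```python
# Minimal multi-head self-attention sketch (PyTorch), for illustration only.
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # Separate learned projections for queries, keys, values, and output.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, t, d = x.shape

        # Project, then split the model dimension into independent heads.
        def split(proj: nn.Linear) -> torch.Tensor:
            return proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)

        q, k, v = split(self.q_proj), split(self.k_proj), split(self.v_proj)
        # Scaled dot-product attention per head: each head attends over the
        # sequence using its own projected subspace.
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        attn = scores.softmax(dim=-1)          # (b, n_heads, t, t)
        out = attn @ v                         # (b, n_heads, t, d_head)
        # Concatenate heads and mix them back into d_model.
        out = out.transpose(1, 2).contiguous().view(b, t, d)
        return self.out_proj(out)

x = torch.randn(2, 10, 64)
print(MultiHeadSelfAttention()(x).shape)  # torch.Size([2, 10, 64])
```

Note that a single attention layer with the same d_model would compute one softmax-weighted average over the sequence, whereas the split above lets each head form its own attention pattern over the same tokens.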

