Interview Question: Transformers
Can someone answer the question below? I was asked it in a data scientist (8 YOE) interview.
Why do large language models need a multi-headed attention layer as opposed to a single attention layer?
Follow-up question: during training, why do the different attention layers get tuned to have different weights?
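
For context on what the question is contrasting, here is a minimal NumPy sketch (my own toy example with made-up dimensions `d_model=8`, `n_heads=2`, not anything from the interview) of single-head vs. multi-head scaled dot-product attention:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)
    return softmax(scores) @ V

rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 4, 8, 2   # toy sizes, chosen for illustration only
d_head = d_model // n_heads
X = rng.normal(size=(seq_len, d_model))

# Single-head: one Q/K/V projection triple, so one attention pattern per layer.
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
single_out = attention(X @ Wq, X @ Wk, X @ Wv)            # shape (4, 8)

# Multi-head: each head gets its own smaller projections, so each head can
# form a different attention pattern; head outputs are concatenated and mixed.
heads = []
for h in range(n_heads):
    Wq_h, Wk_h, Wv_h = (rng.normal(size=(d_model, d_head)) for _ in range(3))
    heads.append(attention(X @ Wq_h, X @ Wk_h, X @ Wv_h))  # shape (4, 4) each
Wo = rng.normal(size=(d_model, d_model))
multi_out = np.concatenate(heads, axis=-1) @ Wo            # shape (4, 8)

print(single_out.shape, multi_out.shape)  # both (4, 8)
```

The sketch is only meant to show the mechanical difference: multi-head attention splits the model dimension into several smaller projections, so the layer produces `n_heads` independent attention maps at roughly the same cost as one full-width head.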