This Research Paper changed my life forever.
It was one of the papers discussed in my interview at Goldman. I came to know about this research paper a few years back after consulting a friend doing an ML PhD at the University of Maryland, College Park. The explanation of the paper:

1. Initialize the neural network with small random values, typically in (-0.1, 0.1), to avoid symmetry issues.
2. Now get ready to do forward propagation: pass the training data through the multilayer perceptron and compute the output. For each neuron in the MLP, calculate the weighted sum of its inputs and apply the activation function (my favourite is tanh for LSTM applications).
3. Now compute the loss, using a loss function like mean squared error, between the computed output and the actual value.
4. Now get ready to do backpropagation, where you calculate the gradient of the loss function with respect to each weight by propagating the error backward through the network.
5. That is, compute partial derivatives of the loss with respect to each weight, starting from the output layer and moving back to the input layer.
6. Here is the fun part: update the weights using the gradients obtained from the backward pass. People usually use the Adam optimizer here, which accelerates stochastic gradient descent. Fun trivia: Adam stands for "Adaptive Moment Estimation".
7. Now repeat the forward and backward propagation process for numerous iterations until the performance of the model stabilizes.
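A minimal NumPy sketch of steps 1-7 for a one-hidden-layer MLP with tanh and MSE. The layer sizes, toy data, and plain gradient-descent update (rather than Adam) are my own illustrative choices, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: small random weights in (-0.1, 0.1) to break symmetry.
W1 = rng.uniform(-0.1, 0.1, (3, 8))        # input dim 3 -> hidden dim 8 (illustrative)
b1 = np.zeros(8)
W2 = rng.uniform(-0.1, 0.1, (8, 1))        # hidden dim 8 -> output dim 1
b2 = np.zeros(1)

X = rng.normal(size=(32, 3))               # toy training batch
y = np.sin(X.sum(axis=1, keepdims=True))   # toy regression target

lr = 0.05
for epoch in range(500):                   # step 7: repeat until performance stabilizes
    # Steps 2-3: forward pass, then MSE loss.
    h = np.tanh(X @ W1 + b1)               # hidden activations
    y_hat = h @ W2 + b2                    # linear output layer
    loss = np.mean((y_hat - y) ** 2)

    # Steps 4-5: backward pass, from the output layer back to the input layer.
    d_yhat = 2 * (y_hat - y) / len(X)      # dL/d(y_hat)
    dW2 = h.T @ d_yhat
    db2 = d_yhat.sum(axis=0)
    d_h = (d_yhat @ W2.T) * (1 - h ** 2)   # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ d_h
    db1 = d_h.sum(axis=0)

    # Step 6: weight update (plain gradient descent here; Adam would
    # additionally track running first/second moment estimates of the gradients).
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

    if epoch % 100 == 0:
        print(epoch, loss)
```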
https://www.iro.umontreal.ca/~vincentp/ift3395/lectures/backprop_old.pdf
Data Scientists on · by Gooner7, Goldman Sachs
OpenAI Cofounder: "Learn these 30 research papers and you will know 90% of what matters today."
Please like and bookmark this if you find it useful. I have access to top AI researchers and ML PhDs working on the cutting edge in India and the US, and will post some good content if this crosses 100 likes. Ilya Sutskever, OpenAI cofounder, gave John Carmack this reading list of approximately 30 research papers and said, ‘If you really learn all of these, you’ll know 90% of what matters today.’ FYI: John Carmack is also widely considered one of the greatest programmers of all time.
https://arc.net/folder/D0472A20-9C20-4D3F-B145-D2865C0A9FEE
Data Scientists on · by Babel, Yubi
The research paper that changed the world...
Anything happening in AI today can be traced back to this one brief moment in history... Share your favourite papers. GPT ftw :)
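The linked paper is "Attention Is All You Need" (Vaswani et al., 2017). A minimal NumPy sketch of its core operation, scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V; the shapes here are illustrative choices of mine, not the paper's:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (Eq. 1 of the paper)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # pairwise query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # attention-weighted sum of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))    # 4 query positions, d_k = 8 (illustrative)
K = rng.normal(size=(6, 8))    # 6 key positions
V = rng.normal(size=(6, 16))   # matching values, d_v = 16
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 16)
```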
https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
Data Scientists on · by Gooner7, Goldman Sachs
This paper kicked off the AI revolution in selecting stocks for investment...
I was talking to my friend from the University of Maryland, College Park, and he told me to read this paper. I will continue to share more such papers if everyone who reads this post decides to upvote it. Next paper at 50 upvotes.
https://proceedings.neurips.cc/paper_files/paper/1996/file/1d72310edc006dadf2190caad5802983-Paper.pdf
Data Scientists on · by Gooner7, Goldman Sachs
Paper with Code: You can now run LLMs without Matrix Multiplications
Saw this paper: https://arxiv.org/pdf/2406.02528 In essence, MatMul operations can be completely eliminated from LLMs while maintaining strong performance at billion-parameter scales. And by utilising an optimised kernel during inference, the model's memory consumption can be reduced by more than 10x compared to unoptimised models. Source: https://x.com/rohanpaul_ai/status/1799122826114330866
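The central trick, as I read the paper, is constraining weights to ternary values {-1, 0, +1}, so a matrix-vector product collapses into additions and subtractions with no multiplies. A toy NumPy sketch of that idea; the thresholding scheme and sizes here are simplified illustrations, not the paper's exact quantisation method:

```python
import numpy as np

def ternarize(W):
    """Crude ternary quantisation: map weights to {-1, 0, +1}.
    (Illustrative thresholding, not the paper's exact scheme.)"""
    t = 0.5 * np.abs(W).mean()
    return np.where(W > t, 1, np.where(W < -t, -1, 0)).astype(np.int8)

def ternary_matvec(W_tern, x):
    """Matrix-vector product with ternary weights, written with only
    additions and subtractions: equivalent to W_tern @ x, no MatMul."""
    out = np.zeros(W_tern.shape[0])
    for i in range(W_tern.shape[0]):
        row = W_tern[i]
        out[i] = x[row == 1].sum() - x[row == -1].sum()
    return out

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 16))   # a dense layer's weights (illustrative size)
x = rng.normal(size=16)
W_t = ternarize(W)
print(np.allclose(ternary_matvec(W_t, x), W_t @ x))  # True: same result, no multiplies
```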
Implementation for MatMul-free LM:
https://github.com/ridgerchu/matmulfreellm
Data Scientists on · by Gooner7, Goldman Sachs
ImageNet Classification with Deep Convolutional Neural Networks
The 2012 breakthrough by Krizhevsky (of AlexNet fame), Sutskever (co-founder of OpenAI), and Hinton (the "Godfather of AI") with AlexNet revolutionized AI. By using deep convolutional neural networks and leveraging GPUs for training, they achieved a dramatic jump in accuracy on the ImageNet dataset. This not only validated deep learning's potential but also introduced key innovations like ReLU activations, dropout, and data augmentation.
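Two of those innovations are one-liners on arrays. A minimal NumPy sketch of ReLU and training-time dropout; I use the modern "inverted" dropout formulation here, whereas AlexNet's original version instead scaled activations at test time:

```python
import numpy as np

def relu(x):
    """ReLU: zero out negatives; cheaper and less saturation-prone than tanh/sigmoid."""
    return np.maximum(0.0, x)

def dropout(x, p_drop=0.5, rng=np.random.default_rng(0)):
    """Inverted dropout (training time): randomly zero units, rescale survivors.
    AlexNet used p_drop = 0.5 in its fully connected layers."""
    mask = rng.random(x.shape) >= p_drop
    return x * mask / (1.0 - p_drop)

h = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(h))             # [0.  0.  0.  1.5 3. ]
print(dropout(relu(h)))    # roughly half the units zeroed, survivors scaled up 2x
```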
Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton
https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf