The Magic Behind Neural Networks
Oftentimes we lose sight of the magic behind neural nets and get discouraged while learning about some new modeling architecture. I felt this personally when studying the transformer architecture and trying to grasp its complex components, such as multi-headed attention and the positional encoding scheme. I wrote a post here describing the magic behind these magnificent networks, to remind us not to get intimidated! At the end of the day, we are still just feeding data into the model and letting backpropagation work its magic.
Here it is. Cheers!
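To make that last point concrete, here is a minimal sketch of what "feeding data in and letting backpropagation work" looks like in PyTorch. The model, data, and hyperparameters are placeholders, not anything from the post: the point is that whatever the architecture, transformer or otherwise, training reduces to the same loop of forward pass, loss, backward pass, and update.

```python
import torch
import torch.nn as nn

# Placeholder model: any architecture slots in here.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

# Toy data standing in for a real dataset.
x = torch.randn(64, 10)
y = torch.randn(64, 1)

for step in range(100):
    optimizer.zero_grad()        # clear gradients from the previous step
    loss = loss_fn(model(x), y)  # forward pass: feed data into the model
    loss.backward()              # backpropagation works its magic
    optimizer.step()             # gradient descent updates the weights
```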