Dear Krish, the formula at 6:00 made my day. My month, actually. I built my CNN with the standard library only, the CUDA toolkit, and a pinch of OpenCV. Implementing this in my CNN's fully-connected section brought it alive, boosting its performance. Tonight we will clink glasses to your health! Thanks, man!
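In case it helps anyone else, here is a very rough sketch of what plugging a fan-in based initialization into a fully connected layer can look like. This is a simplified Python illustration rather than my actual C++/CUDA code, and it assumes the formula in question is a He-style rule, std = sqrt(2 / fan_in); adjust if the video uses a different rule.

```python
import numpy as np

def init_fc_weights(fan_in, fan_out, rng=np.random.default_rng(0)):
    # He-style rule (assumed here): scale weights by sqrt(2 / fan_in) so the
    # activation variance is roughly preserved through a ReLU layer.
    std = np.sqrt(2.0 / fan_in)
    W = rng.normal(0.0, std, size=(fan_out, fan_in))
    b = np.zeros(fan_out)
    return W, b

def fc_forward(x, W, b):
    # Fully connected layer followed by ReLU
    return np.maximum(0.0, W @ x + b)

W, b = init_fc_weights(fan_in=512, fan_out=128)
x = np.random.default_rng(1).standard_normal(512)
print(fc_forward(x, W, b).shape)  # (128,)
```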
My search of so many days has ended here. Thank you very much. You are the best ML DL guru so far. May God keep you healthy.
I can't express how thankful I am to have stumbled upon your channel.
You're amazing at explaining. An excellent Guruji I have finally found in the data science field.
The video is great. However, could you explain why the respective techniques are good with sigmoid/relu?
This video is very good. I'm also watching the MIT deep learning videos; there they just skim the topics and don't explain the actual workings in detail. This video is very easy to understand.
Thank you so much man, amazing.
Best video and explanation. Thank you, Sir...
It may be worth noting that instead of partial derivatives one can work with derivatives as the linear transformations they really are. Also, looking at the networks in a more structured manner makes clear that the basic ideas of backpropagation (BPP) apply to very general types of neural networks. Several steps are involved.

1.- More general processing units. Any continuously differentiable function of inputs and weights will do; these inputs and weights can belong, beyond Euclidean spaces, to any Hilbert space. Derivatives are linear transformations, and the derivative of a neural processing unit is the direct sum of its partial derivatives with respect to the inputs and with respect to the weights. This is a linear transformation expressed as the sum of its restrictions to a pair of complementary linear subspaces.

2.- More general layers (any number of units). Single-unit layers can create a bottleneck that renders the whole network useless. Putting several units together in a single layer is equivalent to taking their product (as functions, in the sense of set theory). The layers are functions of the inputs and of the weights of the totality of the units. The derivative of a layer is then the product of the derivatives of the units; this is a product of linear transformations.

3.- Networks with any number of layers. A network is the composition (as functions, in the set-theoretical sense) of its layers. By the chain rule the derivative of the network is the composition of the derivatives of the layers; this is a composition of linear transformations (see the small sketch below).

4.- Quadratic error of a function. ...

With the additional text below this is getting excessively long, so I will stop the itemized comments here. The point is that a sufficiently general, precise and manageable foundation for NNs clarifies many aspects of BPP. If you are interested in the full story and have some familiarity with Hilbert spaces, please google our paper dealing with Backpropagation in Hilbert spaces. A related article with matrix formulas for backpropagation on semilinear networks is also available.

We have developed a completely new deep learning algorithm called Neural Network Builder (NNB), which is orders of magnitude more efficient, controllable, precise and faster than BPP. The NNB algorithm assumes the following guiding principle: the neural networks that recognize given data, that is, the "solution networks", should depend only on the training data vectors. Optionally the solution network may also depend on parameters that specify the distances of the training vectors to the decision boundaries, as chosen by the user and up to the theoretically possible maximum. The parameters specify the width of chosen strips that enclose the decision boundaries, from which strips the data vectors must stay away.

When using traditional BPP the solution network depends, besides the training vectors, on guessing a more or less arbitrary initial network architecture and initial weights. Such is not the case with the NNB algorithm. With the NNB algorithm the network architecture and the initial (same as the final) weights of the solution network depend only on the data vectors and on the decision parameters. No modification of weights, whether incremental or otherwise, needs to be done.

For a glimpse into the NNB algorithm, search on this platform for our video: NNB Deep Learning Without Backpropagation. Links to free demo software can be found in the video description.
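To make point 3 above concrete, here is a toy sketch of my own (a minimal NumPy illustration, not code from the paper): the derivative of a small two-layer network is the composition, i.e. the matrix product, of the layer Jacobians. The network shape and numbers are arbitrary.

```python
import numpy as np

# Toy two-layer network y = W2 @ tanh(W1 @ x). Illustrates that the
# derivative of the composed network equals the composition (matrix
# product) of the layer derivatives, i.e. the chain rule.
rng = np.random.default_rng(0)
x  = rng.standard_normal(4)        # input vector
W1 = rng.standard_normal((5, 4))   # first-layer weights
W2 = rng.standard_normal((3, 5))   # second-layer weights

# Forward pass
z1 = W1 @ x
a1 = np.tanh(z1)
y  = W2 @ a1

# Layer derivatives with respect to their inputs, as linear maps (Jacobians)
J_layer1 = np.diag(1.0 - np.tanh(z1) ** 2) @ W1   # d tanh(W1 x) / dx
J_layer2 = W2                                     # d (W2 a1) / d a1

# Chain rule: derivative of the network = composition of layer derivatives
J_network = J_layer2 @ J_layer1

# Check against a finite-difference approximation
eps = 1e-6
J_fd = np.empty((3, 4))
for i in range(4):
    dx = np.zeros(4)
    dx[i] = eps
    J_fd[:, i] = (W2 @ np.tanh(W1 @ (x + dx)) - y) / eps

print(np.allclose(J_network, J_fd, atol=1e-4))    # True
```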
The new algorithm is based on the following very general and powerful result (google it): Polyhedrons and Perceptrons Are Functionally Equivalent. For the conceptual basis of general NNs, see our article Neural Network Formalism. Regards, Daniel Crespin
You are amazing, Sir...
Very lucid and insightful
Great job, Krish!
Thank you Sir 🙏🙏🙏🙏♥️😊♥️
Very helpful! Thank you!
Simple and useful
Good stuff
Thanks a lot, an excellent explanation.
Thanks
Wow! Didn't know about the maths behind weight initialization!