@tharukabalasooriya3269

Gosh!! India is on another level of education, man!! From a higher dimension.

@DanielSzalko

Dear Krish,
The formula at 6:00 made my day. My month, actually.
I built my CNN with the standard library only, the CUDA toolkit, and a pinch of OpenCV.
Implementing this in my CNN's fully-connected section brought it alive, boosting its performance.
Tonight we will clink glasses to your health! Thanks, man!

@AdityaRajPVSS

My search of so many days has ended here. Thank you very, very much. You are the best ML/DL guru so far. May God keep you healthy.

@jaskarankaur4971

I can't express how thankful I am to have stumbled upon your channel.

@hanman5195

You're amazing at explaining. You are the excellent Guruji I finally found in the data science field.

@AdmMusicc

The video is great. However, could you explain why the respective techniques are good with sigmoid/ReLU?

@ayushmansharma4362

This video is very good. I'm also watching the MIT deep learning videos; there they just skim the topics and don't explain the actual workings in detail. This video is very easy to understand.

@yousefsrour3316

Thank you so much man, amazing.

@aish_waryaaa

Best video and explanation. Thank you, Sir...

@dcrespin

It may be worth noting that instead of partial derivatives one can work with derivatives as the linear transformations they really are. 

Also, looking at networks in a more structured manner makes it clear that the basic ideas of backpropagation (BPP) apply to very general types of neural networks. Several steps are involved. 

1.- More general processing units. 
Any continuously differentiable function of inputs and weights will do; these inputs and weights can belong, beyond Euclidean spaces, to any Hilbert space. Derivatives are linear transformations and the derivative of a neural processing unit is the direct sum of its partial derivatives with respect to the inputs and with respect to the weights. This is a linear transformation expressed as the sum of its restrictions to a pair of complementary linear subspaces. 

2.- More general layers (any number of units). 
Single-unit layers can create a bottleneck that renders the whole network useless. Putting together several units in a single layer is equivalent to taking their product (as functions, in the sense of set theory). The layers are functions of the inputs and of the weights of the totality of the units. The derivative of a layer is then the product of the derivatives of the units; this is a product of linear transformations. 

3.- Networks with any number of layers. 
A network is the composition (as functions, and in the set-theoretical sense) of its layers. By the chain rule, the derivative of the network is the composition of the derivatives of the layers; this is a composition of linear transformations (a small numerical sketch of this appears right after the list below). 

4.- Quadratic error of a function. 
...
——-
With the additional text down below, this comment is going to get excessively long, so I will stop the itemized list here. 
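
To make points 1 to 3 concrete, here is a minimal numerical sketch (my own illustration; the names W1, W2, layer1, layer2 and network are hypothetical) checking that the derivative of a small two-layer network is the composition, as linear transformations, of the derivatives of its layers. It uses JAX's jacobian:

import jax.numpy as jnp
from jax import jacobian

# Two toy layers with fixed weights; the network is their composition.
W1 = jnp.array([[1.0, 2.0],
                [0.5, -1.0],
                [0.0, 1.0]])          # maps R^2 into R^3
W2 = jnp.array([[1.0, -1.0, 0.5]])    # maps R^3 into R^1

def layer1(x):
    # a continuously differentiable processing layer
    return jnp.tanh(W1 @ x)

def layer2(h):
    return W2 @ h

def network(x):
    # the network as the composition of its layers
    return layer2(layer1(x))

x = jnp.array([0.3, -0.7])

# Chain rule: the derivative of the composition equals the composition
# (here, the matrix product) of the derivatives of the layers, each of
# which is a linear transformation.
J_network  = jacobian(network)(x)                                # 1 x 2
J_composed = jacobian(layer2)(layer1(x)) @ jacobian(layer1)(x)   # (1x3)(3x2)

print(jnp.allclose(J_network, J_composed))   # prints True

Backpropagation amounts to accumulating this product of linear factors starting from the output side.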

The point is that a sufficiently general, precise and manageable foundation for NNs clarifies many aspects of BPP. 

If you are interested in the full story and have some familiarity with Hilbert spaces, please google our paper dealing with Backpropagation in Hilbert spaces. A related article with matrix formulas for backpropagation on semilinear networks is also available. 

We have developed a completely new deep learning algorithm called Neural Network Builder (NNB) which is orders of magnitude more efficient, controllable, precise and faster than BPP. 

The NNB algorithm assumes the following guiding principle:
The neural networks that recognize given data, that is, the “solution networks”, should depend only on the training data vectors. 

Optionally, the solution network may also depend on parameters that specify the distances of the training vectors to the decision boundaries, as chosen by the user and up to the theoretically possible maximum. The parameters specify the width of chosen strips that enclose the decision boundaries and from which the data vectors must stay away. 

When using traditional BPP the solution network depends, besides the training vectors, on guessing a more or less arbitrary initial network architecture and initial weights. Such is not the case with the NNB algorithm. 

With the NNB algorithm the network architecture and the initial (which are the same as the final) weights of the solution network depend only on the data vectors and on the decision parameters. No modification of weights, whether incremental or otherwise, needs to be done. 

For a glimpse into the NNB algorithm, search on this platform for our video: 
NNB Deep Learning Without Backpropagation. 

Links to free demo software can be found in the description of the video. 

The new algorithm is based on the following very general and powerful result (google it): Polyhedrons and Perceptrons Are Functionally Equivalent. 

For the conceptual basis of general NNs, see our article Neural Network Formalism. 

Regards,
Daniel Crespin

@muppurigopi9576

You are amazing, Sir...

@ArthurCor-ts2bg

Very lucid and insightful

@cristianchavez5674

Great Job Krish !

@good114

Thank you Sir 🙏🙏🙏🙏♥️😊♥️

@marijatosic217

Very helpful! Thank you!

@ahmedelsabagh6990

Simple and useful

@rohitrn4568

Good stuff

@fthialbkosh1632

Thanks a lot, an excellent explanation.

@nitayg1326

Wow! Didn't know about the maths behind weight initialization!