Thinking about hyper-parameter choosing
”If we were coming to this problem for the first time then there wouldn’t be much in the output to guide us on what to do. We might worry not only about the learning rate, but about every other aspect of our neural network. We might wonder if we’ve initialized the weights and biases in a way that makes it hard for the network to learn? Or maybe we don’t have enough training data to get meaningful learning? Perhaps we haven’t run for enough epochs? Or maybe it’s impossible for a neural network with this architecture to learn to recognize handwritten digits? Maybe the learning rate is too low? Or, maybe, the learning rate is too high? When you’re coming to a problem for the first time, you’re not always sure.
The lesson to take away from this is that debugging a neural network is not trivial, and, just as for ordinary programming, there is an art to it. You need to learn that art of debugging in order to get good results from neural networks. More generally, we need to develop heuristics for choosing good hyper-parameters and a good architecture.”
Inspiration from Face detection:
“The end result is a network which breaks down a very complicated question - does this image show a face or not - into very simple questions answerable at the level of single pixels. It does this through a series of many layers, with early layers answering very simple and specific questions about the input image, and later layers building up a hierarchy of ever more complex and abstract concepts. Networks with this kind of many-layer structure - two or more hidden layers - are called deep neural networks.”
CHAPTER 2 How the backpropagation algorithm works
Backpropagation（BP）： a fast algorithm for computing the gradient of the cost function.