Reading questions
If you are unsure about any of these concepts, post a question on the
discussion forum!
- What is the loss function used to train a neural network for classification?
- What is the chain rule?
- What is backpropagation?
- What is a flow graph?
- What are the convergence conditions for (stochastic) gradient descent?
- What is Newton's method?
- What is gradient checking based on the finite difference approximation?
- What is the best way to initialize the parameters of a feedforward neural network?
- What is early stopping?
- What is weight decay?
- What is a training epoch?
- What is momentum?
- What is the time constant of the learning-rate decay (the "decrease constant" in assignment #1)?
- How do you perform a grid search over the hyper-parameters?
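As a concrete reference for the gradient-checking question: the finite difference approximation compares an analytic gradient against a numerical one computed by perturbing each parameter. A minimal sketch (the function and tolerance here are illustrative, not from the course):

```python
import numpy as np

def numerical_gradient(f, theta, eps=1e-5):
    """Central finite-difference approximation of the gradient of f at theta."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        # central difference: (f(theta + eps*e_i) - f(theta - eps*e_i)) / (2*eps)
        grad[i] = (f(theta + e) - f(theta - e)) / (2 * eps)
    return grad

# Example: f(theta) = sum(theta**2), whose exact gradient is 2*theta.
theta = np.array([1.0, -2.0, 0.5])
analytic = 2 * theta
numeric = numerical_gradient(lambda t: np.sum(t ** 2), theta)
print(np.max(np.abs(analytic - numeric)))  # prints a tiny value (floating-point noise)
```

In practice you would compare the analytic gradient produced by backpropagation against this numerical gradient and flag any coordinate where the relative difference is large.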
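For the momentum question, the update keeps a velocity that accumulates past gradients, which damps oscillations and speeds up progress along consistent directions. A minimal sketch on a 1-D quadratic loss (the loss, coefficients, and iteration count are illustrative assumptions):

```python
# Hypothetical 1-D loss: L(w) = 0.5 * w**2, so the gradient is w.
w, v = 5.0, 0.0
lr, beta = 0.1, 0.9  # learning rate and momentum coefficient
for _ in range(300):
    grad = w
    v = beta * v - lr * grad  # velocity: decayed history of gradients
    w = w + v                 # parameter moves along the velocity
print(w)  # w has converged close to the minimum at 0
```

With beta = 0 this reduces to plain gradient descent; larger beta gives the update more "inertia".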
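For the grid-search question: every combination of candidate hyper-parameter values is tried, each model is scored on a validation set, and the best-scoring setting is kept. A minimal sketch where the grid values and the scoring function are illustrative stand-ins (a real run would train the network inside `validation_score`):

```python
import itertools

# Hypothetical hyper-parameter grid.
grid = {
    "learning_rate": [0.1, 0.01, 0.001],
    "hidden_units": [50, 100],
}

def validation_score(params):
    # Stand-in for training a network and measuring validation accuracy;
    # peaks at learning_rate=0.01 and prefers more hidden units.
    return -abs(params["learning_rate"] - 0.01) + params["hidden_units"] / 1000

best_params, best_score = None, float("-inf")
for values in itertools.product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    score = validation_score(params)
    if score > best_score:
        best_params, best_score = params, score

print(best_params)  # -> {'learning_rate': 0.01, 'hidden_units': 100}
```

Note that the number of combinations grows multiplicatively with each hyper-parameter, which is why grids are usually kept coarse (e.g. powers of 10 for the learning rate).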