Reading questions

If you are unsure about any of these concepts, post a question on the discussion forum!

  • What is the loss function used to train a neural network for classification?
  • What is the chain rule?
  • What is backpropagation?
  • What is a flow graph?
  • What are the convergence conditions for (stochastic) gradient descent?
  • What is Newton's method?
  • What is gradient checking based on the finite difference approximation?
  • What is the best way to initialize the parameters of a feedforward neural network?
  • What is early stopping?
  • What is weight decay?
  • What is a training epoch?
  • What is momentum?
  • What is the time-constant of decay for the learning rate (or decrease constant in assignment #1)?
  • How do you perform a grid search over the hyper-parameters?
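As a concrete illustration of gradient checking with the finite difference approximation, here is a minimal sketch. It compares an analytic gradient against a central-difference estimate for a simple squared-error loss on a linear model; the loss and model are illustrative choices, not tied to any particular assignment.

```python
import numpy as np

def loss(w, x, y):
    # Squared-error loss for a linear model: L = 0.5 * (w.x - y)^2
    return 0.5 * (np.dot(w, x) - y) ** 2

def analytic_grad(w, x, y):
    # Analytic gradient: dL/dw = (w.x - y) * x
    return (np.dot(w, x) - y) * x

def numeric_grad(f, w, eps=1e-6):
    # Central finite differences: (f(w + eps*e_i) - f(w - eps*e_i)) / (2*eps)
    g = np.zeros_like(w)
    for i in range(w.size):
        wp, wm = w.copy(), w.copy()
        wp[i] += eps
        wm[i] -= eps
        g[i] = (f(wp) - f(wm)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
w = rng.normal(size=3)
x = rng.normal(size=3)
y = 1.0

ga = analytic_grad(w, x, y)
gn = numeric_grad(lambda w_: loss(w_, x, y), w)
# Relative error between the two gradients; a correct implementation
# should give a very small value (limited only by floating-point precision).
rel_err = np.abs(ga - gn).max() / (np.abs(ga).max() + np.abs(gn).max())
print(rel_err)
```

The same check applies to a full network: evaluate the loss twice per parameter with a small perturbation and compare against the backpropagated gradient.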
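To illustrate a grid search over hyper-parameters, here is a minimal sketch: it enumerates the Cartesian product of candidate values and keeps the configuration with the lowest validation error. The grid values and the `validation_error` function are stand-ins; in practice you would train a model for each configuration and measure its error on a held-out validation set.

```python
from itertools import product

# Hypothetical grid of candidate hyper-parameter values (illustrative only).
grid = {
    "learning_rate": [0.1, 0.01, 0.001],
    "weight_decay": [0.0, 1e-4, 1e-2],
}

def validation_error(config):
    # Stand-in for training a model and measuring validation error;
    # a made-up smooth function keeps the example runnable.
    return (config["learning_rate"] - 0.01) ** 2 + config["weight_decay"]

best = None
for values in product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    err = validation_error(config)
    if best is None or err < best[0]:
        best = (err, config)

print(best[1])  # -> {'learning_rate': 0.01, 'weight_decay': 0.0}
```

Note that the number of configurations grows multiplicatively with each hyper-parameter, which is why grid search is usually restricted to a few values per axis.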