## Neural Networks

http://www.cs.toronto.edu/~bonner/courses/2014s/csc321/lectures/lec5.pdf

DNN: gradient check, learning rate, initialization (not zero, asymmetric, so that units do not all learn the same thing), local minima, plateaus, momentum (an exponential average of previous gradients, which speeds up progress when successive gradients point in the same direction), saddle points
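A minimal sketch of two of the items above, the gradient check and momentum, on a toy quadratic loss. The function names, the loss, and the hyperparameter values are illustrative assumptions, not taken from the lecture.

```python
def numerical_grad(f, w, eps=1e-5):
    """Central-difference estimate of the gradient of f at w (a list of floats)."""
    grad = []
    for i in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[i] += eps
        w_minus[i] -= eps
        grad.append((f(w_plus) - f(w_minus)) / (2 * eps))
    return grad


def loss(w):
    # Toy quadratic bowl with a different curvature along each axis (assumed example).
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)


def loss_grad(w):
    # Analytic gradient of the toy loss; this is what the gradient check verifies.
    return [w[0], 10.0 * w[1]]


def sgd_momentum(w, grad_fn, lr=0.05, beta=0.9, steps=200):
    """Gradient descent where the step follows an exponential average of gradients."""
    v = [0.0] * len(w)
    for _ in range(steps):
        g = grad_fn(w)
        v = [beta * vi + (1 - beta) * gi for vi, gi in zip(v, g)]
        w = [wi - lr * vi for wi, vi in zip(w, v)]
    return w


w0 = [1.0, -2.0]
# Gradient check: largest absolute gap between analytic and numerical gradients.
max_err = max(abs(a - n) for a, n in zip(loss_grad(w0), numerical_grad(loss, w0)))
w_final = sgd_momentum(w0, loss_grad)
```

A large `max_err` would usually point to a bug in the analytic gradient rather than in the optimizer.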

Regularization: norm penalties, early stopping, data augmentation, dropout (hidden units cannot co-adapt; widely used; test time: use the expected activations, which for a single layer corresponds to the geometric average of the sub-networks)
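A minimal sketch of dropout, assuming the "inverted" variant: kept activations are rescaled during training so that their expectation matches the unscaled activations used at test time. The function name and arguments are illustrative.

```python
import random


def dropout(activations, p_drop, training, rng):
    """Inverted dropout on a list of activations.

    During training each unit is zeroed with probability p_drop and the
    survivors are divided by the keep probability, so the expected value of
    every output equals the input. At test time the input passes through.
    """
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
hidden = [1.0] * 10000
train_out = dropout(hidden, p_drop=0.5, training=True, rng=rng)
test_out = dropout(hidden, p_drop=0.5, training=False, rng=rng)
train_mean = sum(train_out) / len(train_out)
```

With the non-inverted formulation, the scaling by the keep probability would instead be applied to the weights at test time.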

Autoencoder: sparse AE, denoising AE, contractive AE (penalizes the derivatives of the hidden representation with respect to the input)

Uncertainty in neural networks: in a Bayesian neural network, instead of having fixed values, each weight is drawn from some distribution. By applying dropout to all the weight layers in a neural network, we are essentially drawing each weight from a Bernoulli distribution.

1. Regularization (irrespective of whether the model is overfitting or not)

2. Model selection or comparison without the need for a cross-validation data set
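One common way to use the dropout view of uncertainty is to keep dropout active at prediction time and look at the spread of several stochastic forward passes. The sketch below assumes a tiny one-hidden-layer network with hand-picked weights; all names and values are hypothetical.

```python
import random
import statistics


def stochastic_forward(x, w_hidden, w_out, p_drop, rng):
    """One forward pass with a fresh dropout mask on the hidden units."""
    keep = 1.0 - p_drop
    hidden = [max(0.0, w * x) for w in w_hidden]  # ReLU hidden layer
    # Dropping a hidden unit is the same as zeroing its outgoing weight.
    hidden = [h / keep if rng.random() < keep else 0.0 for h in hidden]
    return sum(wo * h for wo, h in zip(w_out, hidden))


def mc_dropout_predict(x, w_hidden, w_out, p_drop=0.5, n_samples=200, seed=0):
    """Mean and standard deviation of the prediction over random dropout masks."""
    rng = random.Random(seed)
    samples = [stochastic_forward(x, w_hidden, w_out, p_drop, rng)
               for _ in range(n_samples)]
    return statistics.mean(samples), statistics.stdev(samples)


W_HIDDEN = [0.5, -1.0, 2.0, 1.5]  # hypothetical weights
W_OUT = [1.0, 0.5, -0.3, 0.8]
mean_pred, std_pred = mc_dropout_predict(2.0, W_HIDDEN, W_OUT)
```

The standard deviation serves as a rough uncertainty estimate; with `p_drop=0.0` every pass is identical and the spread collapses to zero.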

Variational inference

**Sparse Coding**

## CNN

convolution: sparse interactions (each unit connects to only a small number of input units), parameter sharing (units organized into the same feature map share parameters), and equivariant representations (the same feature is extracted at every position, so shifting the input shifts the output)
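A small 1D sketch of these three properties, assuming a "valid" cross-correlation as used in most CNN libraries; the signal and kernel values are made up for illustration.

```python
def conv1d(signal, kernel):
    """Valid 1D cross-correlation.

    Sparse interactions: each output reads only len(kernel) inputs.
    Parameter sharing: the same kernel is reused at every position.
    """
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]


kernel = [1, 0, -1]                     # simple edge-like filter
signal = [0, 0, 1, 2, 1, 0, 0, 0]
shifted = [0, 0, 0, 1, 2, 1, 0, 0]      # same pattern moved one step right

out = conv1d(signal, kernel)
out_shifted = conv1d(shifted, kernel)
# Equivariance: away from the border, the output moves one step right as well,
# i.e. out_shifted[1:] equals out[:-1].
```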

pooling and subsampling: max pooling, average pooling (invariance to local translations; reduces the number of hidden units). Prior: convolution assumes that the function the layer should learn contains only local interactions and is equivariant to translation; pooling assumes invariance to small translations.
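A minimal sketch of max and average pooling over non-overlapping 2x2 windows, assuming the input is a plain list of rows; the helper names and the example image are illustrative.

```python
def pool2d(x, size=2, stride=2, reducer=max):
    """Apply `reducer` to each size-by-size window of the 2D list x."""
    rows = (len(x) - size) // stride + 1
    cols = (len(x[0]) - size) // stride + 1
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            window = [x[r * stride + i][c * stride + j]
                      for i in range(size) for j in range(size)]
            row.append(reducer(window))
        out.append(row)
    return out


def mean(values):
    return sum(values) / len(values)


image = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]
max_pooled = pool2d(image, reducer=max)    # [[4, 2], [2, 8]]
avg_pooled = pool2d(image, reducer=mean)   # [[2.5, 1.0], [0.75, 6.5]]
```

The 4x4 input becomes 2x2, which is the reduction in hidden units mentioned above; moving a maximum within its window leaves the max-pooled output unchanged.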

stride: the step size by which the filter moves/slides over the input

zero-padding: pad the input volume with zeros around the border, which allows us to control the spatial size of the output volume
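Stride and zero-padding together determine the output size along each spatial dimension through the usual formula `(W - F + 2P) / S + 1`. A small helper, with a hypothetical name and illustrative sizes:

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension.

    Raises ValueError when the filter does not tile the padded input evenly.
    """
    span = input_size - filter_size + 2 * padding
    if span < 0 or span % stride != 0:
        raise ValueError("filter does not tile the padded input evenly")
    return span // stride + 1


size_a = conv_output_size(227, 11, stride=4, padding=0)  # 55
size_b = conv_output_size(32, 3, stride=1, padding=1)    # 32: padding keeps the size
size_c = conv_output_size(7, 3, stride=2, padding=0)     # 3
```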

architecture: AlexNet, VGGNet

transfer learning: use the ConvNet as a fixed feature extractor, or fine-tune the ConvNet (with a smaller learning rate)

## RNN

drawback: when making a prediction, a (unidirectional) RNN uses only earlier information in the sequence, not later information

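backpropagation through time (BPTT): unroll the network over the time steps and apply ordinary backpropagation to the unrolled graph

vanishing gradients problem: the gradient is multiplied by the recurrent Jacobian at every time step, so it can shrink exponentially over long sequences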

exploding gradients: gradient clipping (rescale the gradient once its norm exceeds a certain threshold)
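A minimal sketch of clipping by norm, assuming the gradient is a flat list of floats; the function name and threshold are illustrative.

```python
import math


def clip_by_norm(grads, max_norm):
    """Rescale grads so that their L2 norm is at most max_norm.

    The direction of the gradient is preserved; only its length changes.
    """
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)    # norm 5 is rescaled to norm 1
untouched = clip_by_norm([0.3, 0.4], max_norm=1.0)  # already below the threshold
```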

## Gated Recurrent Unit (GRU)

## LSTM

vanishing/exploding gradients; gates i, f, o, g (input gate, forget gate, output gate, candidate value)
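A sketch of one LSTM step with the four gates written out, assuming scalar input, hidden state, and cell state to keep it short. The parameter layout (one `(w_x, w_h, b)` triple per gate) is a hypothetical convention chosen for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell; params maps a gate name to (w_x, w_h, b)."""
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre_activation("i"))    # input gate: how much new content to write
    f = sigmoid(pre_activation("f"))    # forget gate: how much old cell state to keep
    o = sigmoid(pre_activation("o"))    # output gate: how much cell state to expose
    g = math.tanh(pre_activation("g"))  # candidate value to write into the cell
    c = f * c_prev + i * g              # additive cell update
    h = o * math.tanh(c)
    return h, c


zero_params = {name: (0.0, 0.0, 0.0) for name in "ifog"}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, params=zero_params)
```

The additive update `c = f * c_prev + i * g` is what lets gradients flow across many steps when `f` stays close to 1.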

DBN (deep belief network)

RBM (restricted Boltzmann machine): binary factor analysis

GAN (generative adversarial network): sample from a simple distribution and learn a transformation to the training distribution; consists of a generator network and a discriminator network

Variational autoencoder: generate data from a latent variable $z$, model the decoder as a neural network, train with variational inference

Neural network regression: the units in the output layer most commonly do not have an activation function, because the outputs are usually taken to represent class scores in classification and arbitrary real-valued numbers in regression. For classification, the number of output units matches the number of categories to predict, while regression on a single target needs only one output node.

Interpretation of deep learning: universal approximation properties and depth; **learning representations / distributed representations**

In summary, a feedforward network with a single layer is sufficient to represent any function, but the layer may be infeasibly large and may fail to learn and generalize correctly. In many circumstances, using deeper models can reduce the number of units required to represent the desired function and can reduce the amount of generalization error.

## Geometric deep learning

Data on a domain: fixed graph and changing manifolds

- spectral domain: use FFT to calculate convolution
- spatial domain: mean over neighbours|edge decoration, vertex decoration
- spatial domain (charting based) methods: spatial convolution on manifolds, multiple domains
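The spectral approach rests on the convolution theorem: convolution in the original domain is pointwise multiplication in the Fourier domain. The sketch below checks this on a regular 1D grid. It uses a plain O(n^2) DFT as a stand-in for the FFT, and the function names are illustrative; on a graph the Fourier basis would instead come from the eigenvectors of the graph Laplacian.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out


def circular_conv_direct(x, h):
    """Circular convolution computed directly in the original domain."""
    n = len(x)
    return [sum(x[k] * h[(m - k) % n] for k in range(n)) for m in range(n)]


def circular_conv_spectral(x, h):
    """Same convolution via pointwise multiplication of the two spectra."""
    spectrum = [a * b for a, b in zip(dft(x), dft(h))]
    return [v.real for v in dft(spectrum, inverse=True)]


x = [1.0, 2.0, 3.0, 4.0]
h = [1.0, 0.0, -1.0, 0.0]
direct = circular_conv_direct(x, h)      # [-2.0, -2.0, 2.0, 2.0]
spectral = circular_conv_spectral(x, h)  # same values up to rounding
```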

local system of coordinates to solve the shift-variant problem

set of weighting functions: select pixels

spatial convolution: patch operator (local weighted average around the points)

Geodesic polar coordinates + weight function. Mixture Model Network (MoNet): generalizes CNN architectures to non-Euclidean domains (graphs and manifolds); the framework also generalizes several spectral methods on graphs [2], [21] as well as some models on manifolds [22], [23]. Function map net.

One of the major problems in applying the same paradigm to non-Euclidean domains is the lack of shift-invariance, implying that the ‘patch operator’ extracting a local ‘patch’ would be position-dependent.

application: shape correspondence, texture mapping, invariance to deformation (intrinsic deep net)

- Geodesic CNN: weights defined over radial and angular coordinates
- Anisotropic CNN: anisotropic heat kernels
- Mixture model network (MoNet): the weight function is Gaussian
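A minimal sketch of a Gaussian weight function applied in a patch operator, assuming each neighbour is described by pseudo-coordinates `(radius, angle)` and a scalar feature, and assuming a diagonal covariance. In MoNet the means and covariances are learned; here they are fixed by hand, and all names and values are hypothetical.

```python
import math


def gaussian_weight(u, mu, sigma):
    """exp(-0.5 * sum(((u_d - mu_d) / sigma_d) ** 2)) for a diagonal covariance."""
    return math.exp(-0.5 * sum(((ud - md) / sd) ** 2
                               for ud, md, sd in zip(u, mu, sigma)))


def patch_operator(pseudo_coords, features, mu, sigma):
    """Weighted sum of neighbour features for one Gaussian kernel."""
    return sum(gaussian_weight(u, mu, sigma) * f
               for u, f in zip(pseudo_coords, features))


# Hypothetical neighbourhood of one vertex: (radius, angle) per neighbour.
neighbour_coords = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.5708)]
neighbour_features = [2.0, 1.0, 4.0]

kernel_mu = (1.0, 0.0)     # kernel centred one unit away at angle 0
kernel_sigma = (0.5, 1.0)
value = patch_operator(neighbour_coords, neighbour_features, kernel_mu, kernel_sigma)
```

Using several such kernels with different means gives the mixture; each kernel plays the role that one filter tap plays in an ordinary CNN.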