machine learning

Deep Learning Foundations

Neural Networks

DNN: gradient check, learning rate, initialization (nonzero and asymmetric, to break symmetry between units), local minima, plateaus, saddle points, momentum (exponentially decaying average of previous gradients; it builds up speed when successive gradients point in the same direction)
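As a rough illustration of two items above, the sketch below checks an analytic gradient against a central-difference estimate and then runs gradient descent with momentum. The toy quadratic `loss`, the helper names, and the values of `lr` and `beta` are all assumptions chosen for the example, not part of the original notes.

```python
import numpy as np

def numerical_grad(f, w, eps=1e-6):
    """Central-difference estimate of the gradient of f at w."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step.flat[i] = eps
        grad.flat[i] = (f(w + step) - f(w - step)) / (2 * eps)
    return grad

def loss(w):
    # Hypothetical toy objective: an elongated quadratic bowl.
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)

def loss_grad(w):
    # Analytic gradient of the toy objective above.
    return np.array([w[0], 10.0 * w[1]])

w = np.array([1.0, -2.0])

# Gradient check: relative error between analytic and numerical gradients.
analytic = loss_grad(w)
rel_err = np.linalg.norm(analytic - numerical_grad(loss, w)) / np.linalg.norm(analytic)

# Momentum: the velocity is a decaying sum of past (negative) gradients.
lr, beta = 0.05, 0.9  # assumed hyperparameters
velocity = np.zeros_like(w)
for _ in range(300):
    velocity = beta * velocity - lr * loss_grad(w)
    w = w + velocity
```

A small `rel_err` suggests the analytic gradient is implemented correctly; a large one usually points to a bug in the backward pass.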

Tips for training

Regularization: norm penalties, early stopping, data augmentation, dropout (hidden units cannot co-adapt; widely used; test time: take the expectation via weight scaling, which matches the geometric average over sub-networks for a single layer)
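A minimal sketch of dropout, assuming the "inverted" variant in which activations are rescaled during training so that the test-time pass needs no change. This is one common formulation, not necessarily the one the notes had in mind; `dropout`, `keep_prob`, and the array shapes are illustrative names and values.

```python
import numpy as np

def dropout(h, keep_prob, rng, train=True):
    """Inverted dropout: rescale at training time so the expectation is unchanged."""
    if not train:
        # Test time: use the activations as-is (their expected value).
        return h
    mask = rng.random(h.shape) < keep_prob  # keep each unit with prob keep_prob
    return h * mask / keep_prob

rng = np.random.default_rng(0)
h = np.ones((1000, 50))           # hypothetical hidden activations
out = dropout(h, 0.8, rng)        # training-time pass
out_test = dropout(h, 0.8, rng, train=False)
```

Because of the division by `keep_prob`, the mean of `out` stays close to the mean of `h`, which is the "expectation" view of test-time behaviour.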

Autoencoder: sparse AE, denoising AE, contractive AE (penalizing derivatives)

Uncertainty in neural networks: In a Bayesian neural network, instead of having fixed weights, each weight is drawn from some distribution. By applying dropout to all the weight layers in a neural network, we are essentially drawing each weight from a Bernoulli distribution. Benefits of the Bayesian view:

1. Regularization (irrespective of overfitting or not)

2. Model selection or comparison without the need for a cross-validation data set
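One way to read the dropout-as-Bayesian idea is to keep dropout switched on at prediction time and look at the spread of repeated forward passes. The sketch below does this for a tiny network with random, untrained weights; the network, `keep_prob`, and the number of passes are assumptions for illustration, and the mask is applied to hidden units rather than to individual weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, untrained weights for a 1-32-1 network.
W1 = rng.normal(size=(1, 32))
b1 = np.zeros(32)
W2 = rng.normal(size=(32, 1)) / np.sqrt(32)

def stochastic_forward(x, keep_prob=0.9):
    """One forward pass with a fresh Bernoulli dropout mask on the hidden layer."""
    h = np.maximum(0.0, x @ W1 + b1)
    mask = rng.random(h.shape) < keep_prob
    return (h * mask / keep_prob) @ W2

x = np.array([[0.5]])
samples = np.stack([stochastic_forward(x) for _ in range(200)])
pred_mean = samples.mean(axis=0)  # point prediction
pred_std = samples.std(axis=0)    # spread across masks, a rough uncertainty proxy
```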

Variational inference

Sparse Coding


convolution: sparse interactions (each unit connects to only a small number of input units), parameter sharing (units organized into the same feature map share parameters), and equivariant representations (the same feature is extracted at every position)
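The three properties can be seen in a naive 2D "valid" convolution (implemented, as in most deep learning code, as cross-correlation). The function name and the toy image and kernel below are assumptions for the example.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 2D cross-correlation without padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Sparse interaction: each output sees only a kh x kw window.
            # Parameter sharing: the same kernel is reused at every (i, j).
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((6, 6))
image[1, 1] = 1.0
kernel = np.array([[1.0, 2.0], [3.0, 4.0]])

# Equivariance: shifting the input by (2, 2) shifts the output by (2, 2).
shifted = np.roll(image, shift=(2, 2), axis=(0, 1))
out = conv2d_valid(image, kernel)
out_shifted = conv2d_valid(shifted, kernel)
```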

pooling and sampling: max pooling, average pooling (invariance to local translations, reduces the number of hidden units); subsampling; prior (this prior says that the function the layer should learn contains only local interactions and is equivariant to translation)
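A minimal sketch of non-overlapping max pooling, assuming a single-channel 2D input and a square window; `max_pool2d` is an illustrative name, and rows or columns that do not fill a whole window are simply cropped.

```python
import numpy as np

def max_pool2d(x, size=2):
    """Non-overlapping max pooling over size x size windows."""
    h, w = x.shape
    h_out, w_out = h // size, w // size
    x = x[:h_out * size, :w_out * size]  # crop any incomplete border windows
    # Split each axis into (window index, position in window), then reduce.
    return x.reshape(h_out, size, w_out, size).max(axis=(1, 3))

x = np.arange(16).reshape(4, 4)
pooled = max_pool2d(x)  # 4x4 input -> 2x2 output
```

Replacing `.max` with `.mean` gives average pooling under the same assumptions.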

stride: the step size with which the filter moves/slides over the input

zero-padding: pad the input volume with zeros around the border, which lets us control the spatial size of the output volume
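Stride and zero-padding together determine the output size. For input size W, filter size F, padding P, and stride S, the usual formula is (W - F + 2P) / S + 1. The helper below is a small sketch of that arithmetic; the function name is hypothetical, and raising an error when the filter does not tile the input evenly is a choice made here (many libraries round down instead).

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    span = input_size - filter_size + 2 * padding
    if span < 0 or span % stride != 0:
        raise ValueError("filter does not tile the padded input evenly")
    return span // stride + 1

same_size = conv_output_size(32, 5, stride=1, padding=2)   # padding keeps size at 32
strided = conv_output_size(227, 11, stride=4, padding=0)   # large stride shrinks output
```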

architecture: AlexNet, VGGNet

transfer learning: ConvNet as a fixed feature extractor, or fine-tuning the ConvNet (with a smaller learning rate)


drawback (unidirectional RNN): when making a prediction, only earlier information in the sequence is used, not later information

backpropagation through time (BPTT):

vanishing gradients problem:

exploding gradients: gradient clipping (rescale the gradients once their norm exceeds a certain threshold)
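A minimal sketch of gradient clipping, assuming the "global norm" variant in which all parameter gradients are rescaled together so their direction is preserved. The function name and the example gradients are illustrative.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays if their joint L2 norm exceeds max_norm."""
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total <= max_norm:
        return grads, total
    scale = max_norm / total
    return [g * scale for g in grads], total

grads = [np.array([3.0, 4.0]), np.array([12.0])]  # joint norm is 13
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = np.sqrt(sum(np.sum(g ** 2) for g in clipped))
```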

Gated Recurrent Unit (GRU)



LSTM: vanishing/exploding gradients; i, f, o, g gates (input, forget, output, and candidate)

DBN (deep belief network)

RBM (restricted Boltzmann machine): binary factor analysis

GAN (generative adversarial network): sample from a simple distribution and learn a transformation to the training distribution; generator network and discriminator network

Variational autoencoder (VAE): generate data from a latent variable $z$, model the decoder as a neural network, train with variational inference
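The variational-inference step can be summarized by the evidence lower bound (ELBO) that a VAE maximizes. The notation here is an assumption not used in the notes: $q_\phi(z \mid x)$ is the encoder (approximate posterior), $p_\theta(x \mid z)$ the decoder, and $p(z)$ the prior over $z$.

$$
\log p_\theta(x) \ge \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] - \mathrm{KL}\left(q_\phi(z \mid x) \,\|\, p(z)\right)
$$

The first term rewards accurate reconstruction of $x$ from $z$; the second keeps the approximate posterior close to the prior.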

Neural network regression: The units in the output layer usually have no activation function, because their values represent class scores in classification and arbitrary real-valued numbers in regression. For classification, the number of output units matches the number of categories, while regression typically uses a single output node.
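The difference between the two kinds of output layer can be sketched as follows. The weights are random and hypothetical, and the softmax is shown only as the usual way to turn raw class scores into probabilities; it is not an activation inside the output layer itself.

```python
import numpy as np

def softmax(scores):
    """Convert raw class scores to probabilities (numerically stable)."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))      # batch of 4 hidden vectors (hypothetical)
W_cls = rng.normal(size=(8, 3))  # classification head: 3 categories
W_reg = rng.normal(size=(8, 1))  # regression head: 1 output node

class_scores = h @ W_cls         # linear output, one score per category
class_probs = softmax(class_scores)
regression_out = h @ W_reg       # linear output, arbitrary real values
```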

Interpretation of deep learning: universal approximation properties and depth; representation learning / distributed representations

In summary, a feedforward network with a single layer is sufficient to represent any function, but the layer may be infeasibly large and may fail to learn and generalize correctly. In many circumstances, using deeper models can reduce the number of units required to represent the desired function and can reduce the amount of generalization error.


Geometric deep learning

Data on a domain: fixed graph and changing manifolds

  • spectral domain: use FFT to calculate convolution
  • spatial domain: mean over neighbours|edge decoration, vertex decoration
  • spatial domain (charting based) methods: spatial convolution on manifolds, multiple domains

local system of coordinates to solve the shift-variant problem

set of weighting functions: select pixels

spatial convolution: patch operator (local weighted average around the points)

Geodesic polar coordinates + weight function: Mixture Model Network (generalizes CNN architectures to non-Euclidean domains such as graphs and manifolds; the framework can also express several spectral methods on graphs [2], [21] as well as some models on manifolds [22], [23]), function map net

One of the major problems in applying the same paradigm to non-Euclidean domains is the lack of shift-invariance, implying that the ‘patch operator’ extracting a local ‘patch’ would be position-dependent.

application: shape correspondence, texture mapping, invariance to deformation (intrinsic deep net)

  • Geodesic CNN: weights defined over radial and angular coordinates
  • Anisotropic CNN: anisotropic heat kernels
  • Mixture model network (MoNet): the weight function is Gaussian
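A minimal sketch of a MoNet-style patch operator, assuming Gaussian weight functions with diagonal covariance applied to local pseudo-coordinates `u` of each neighbour. In the actual model the means and covariances are learned; here `mu`, `sigma`, and the neighbour features are fixed, hypothetical values.

```python
import numpy as np

def gaussian_weights(u, mu, sigma):
    """Weights w_j(u) = exp(-0.5 * sum(((u - mu_j) / sigma_j)^2)) for each kernel j.

    u: (n_neighbours, d) pseudo-coordinates, mu and sigma: (n_kernels, d).
    """
    diff = u[:, None, :] - mu[None, :, :]  # (n_neighbours, n_kernels, d)
    return np.exp(-0.5 * np.sum((diff / sigma[None]) ** 2, axis=-1))

def patch_operator(f_neighbours, u, mu, sigma):
    """Weighted sum of neighbour features, one row per Gaussian kernel."""
    w = gaussian_weights(u, mu, sigma)     # (n_neighbours, n_kernels)
    return w.T @ f_neighbours              # (n_kernels, n_channels)

u = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # three neighbours in 2D coords
mu = np.array([[0.0, 0.0], [1.0, 0.0]])             # two kernels
sigma = np.ones((2, 2))
f = np.array([[1.0], [2.0], [3.0]])                 # one feature channel
patch = patch_operator(f, u, mu, sigma)
```

Each row of `patch` plays the role of one "pixel" of the extracted patch, so a learned linear map over the rows gives the convolution output at that point.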
