machine learning

Deep Learning

http://www.cs.toronto.edu/~bonner/courses/2014s/csc321/lectures/lec5.pdf
DNN: gradient check, learning rate, initialization (not zero, asymmetric), local minimum, plateau, momentum (exponential average of previous gradients, which builds up when they point in the same direction), saddle point.
Tips for training.
Regularization: norm penalties, early stopping, data augmentation, dropout (hidden units cannot co-adapt; generally used; at test time: expectation, geometric average for a single layer).
Autoencoder: sparse AE, denoising…
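As a rough illustration of two items from these notes, momentum and the gradient check, here is a minimal sketch in plain Python. The toy objective f(w) = w0^2 + 10*w1^2, the function names, and the hyperparameter values are all assumptions chosen for the example; they do not come from the linked lecture.

```python
def f(w):
    # Toy objective (an assumption for illustration): an elongated quadratic bowl.
    return w[0] ** 2 + 10.0 * w[1] ** 2

def grad_f(w):
    # Analytic gradient of f.
    return [2.0 * w[0], 20.0 * w[1]]

def numeric_grad(fn, w, eps=1e-5):
    # Gradient check: central finite differences, one coordinate at a time.
    g = []
    for i in range(len(w)):
        up = list(w)
        up[i] += eps
        down = list(w)
        down[i] -= eps
        g.append((fn(up) - fn(down)) / (2 * eps))
    return g

def sgd_momentum(grad_fn, w, lr=0.1, beta=0.9, steps=500):
    # v is an exponential average of previous gradients; components that keep
    # pointing in the same direction accumulate, oscillating ones cancel out.
    v = [0.0] * len(w)
    for _ in range(steps):
        g = grad_fn(w)
        v = [beta * vi + (1 - beta) * gi for vi, gi in zip(v, g)]
        w = [wi - lr * vi for wi, vi in zip(w, v)]
    return w

w0 = [5.0, 5.0]
check_error = max(abs(a - b) for a, b in zip(grad_f(w0), numeric_grad(f, w0)))
w_final = sgd_momentum(grad_f, w0)
```

Here `check_error` should be tiny if the analytic gradient is right, and `w_final` should end up close to the minimum at the origin. Other momentum formulations (for example without the `1 - beta` factor) are also common.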

machine learning

Kinds of Machine Learning

Kinds of learning
Based on the information available: supervised learning, reinforcement learning, unsupervised learning.
Based on the role of the learner: passive learning, active learning.
Scenarios (active learning):
Membership Query Synthesis: the learner may request labels for any unlabeled instance in the input space.
Stream-Based Selective Sampling: an unlabeled instance can first be sampled from the actual distribution, and then…

machine learning

Feature Engineering

Data Pre-processing (Transformation)
Sampling: under-sampling, over-sampling, increasing minority samples and decreasing majority samples simultaneously, synthesising “new” samples from the minority class, bootstrap.
Normalization: sigmoid normalization, 0-1 normalization ((x - min(x)) / (max(x) - min(x))), z-score, Gaussian normalization (Gaussian kernel), Box-Cox transformation, log transformation, Tukey’s Ladder of Powers.
Feature Engineering: image, speech, text, time series: entropy,…
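Three of the normalizations listed above can be sketched in a few lines of plain Python. The function names and the sample data are hypothetical, and the z-score here uses the population standard deviation, which is one assumption among several reasonable choices. Constant inputs (where max equals min, or the standard deviation is zero) are not handled.

```python
import math
import statistics

def min_max(xs):
    # 0-1 normalization: (x - min(x)) / (max(x) - min(x)).
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def z_score(xs):
    # Subtract the mean and divide by the (population) standard deviation.
    mu = statistics.mean(xs)
    sigma = statistics.pstdev(xs)
    return [(x - mu) / sigma for x in xs]

def log_transform(xs):
    # Natural log; only defined for strictly positive values.
    return [math.log(x) for x in xs]

data = [2.0, 4.0, 6.0, 8.0]  # hypothetical feature column
scaled = min_max(data)
standardized = z_score(data)
logged = log_transform(data)
```

After `min_max` the smallest value maps to 0 and the largest to 1; after `z_score` the column has mean 0 and standard deviation 1.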

data science · machine learning

Classification

Elements of a model: objective; model structure (e.g. variables, formula, equation, parameters); model assumptions; parameter estimates and interpretation; model fit (e.g. goodness-of-fit tests and statistics); model selection.
LDA: generative model; models p(x|y) as a multivariate Gaussian; both classes share the same covariance matrix Σ.
QDA: each class has its own Σ.
Naive Bayes: generative model; assume the xj…
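To make the LDA description concrete, here is a minimal one-dimensional sketch in plain Python: each class gets its own mean, all classes share one pooled variance (the 1-D analogue of the shared Σ), and prediction picks the class with the largest linear discriminant score. The function names and the toy data are assumptions for illustration only; a real analysis would more likely use a library implementation.

```python
import math

def fit_lda_1d(xs, ys):
    # Estimate per-class means and priors plus a single pooled variance.
    classes = sorted(set(ys))
    n = len(xs)
    means, priors = {}, {}
    pooled = 0.0
    for k in classes:
        xk = [x for x, y in zip(xs, ys) if y == k]
        means[k] = sum(xk) / len(xk)
        priors[k] = len(xk) / n
        pooled += sum((x - means[k]) ** 2 for x in xk)
    var = pooled / (n - len(classes))
    return means, priors, var

def predict_lda_1d(x, means, priors, var):
    # Discriminant: x*mu_k/var - mu_k^2/(2*var) + log(prior_k), linear in x.
    def score(k):
        return x * means[k] / var - means[k] ** 2 / (2 * var) + math.log(priors[k])
    return max(means, key=score)

xs = [1.0, 1.5, 2.0, 5.0, 5.5, 6.0]  # hypothetical feature values
ys = [0, 0, 0, 1, 1, 1]              # hypothetical class labels
means, priors, var = fit_lda_1d(xs, ys)
label = predict_lda_1d(3.4, means, priors, var)
```

With equal priors the decision boundary sits halfway between the two class means, at 3.5 for this data. Giving each class its own variance instead of the pooled one would turn this into the QDA variant, whose discriminant is quadratic in x.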

Big Data · data science · machine learning · programming

Apache Hadoop (projects)

QUESTIONS: setInputFormat, comparator, top k frequent words.
HADOOP SYSTEM
Apache Hadoop is an open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware.
HDFS (Hadoop Distributed File System): data storage (data splitting and data replication).
MapReduce (data processing): how to leverage job; how do nodes communicate; how to deal with node…
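The "top k frequent words" question can be sketched as a map, shuffle, and reduce pipeline. The version below runs in a single Python process and does not use any Hadoop API; the function names and the input lines are hypothetical, and it is only meant to show the shape of the computation that a real MapReduce job would distribute across nodes.

```python
from collections import defaultdict
import heapq

def map_phase(line):
    # Mapper: emit (word, 1) for every word in a line.
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    # Shuffle: group all emitted values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word, counts):
    # Reducer: sum the counts for one word.
    return word, sum(counts)

def top_k_words(lines, k):
    pairs = (pair for line in lines for pair in map_phase(line))
    counts = [reduce_phase(w, c) for w, c in shuffle(pairs).items()]
    # Highest count first; ties are broken alphabetically.
    return heapq.nsmallest(k, counts, key=lambda wc: (-wc[1], wc[0]))

lines = ["the quick brown fox", "the lazy dog", "the fox"]
top2 = top_k_words(lines, 2)
```

For these three lines, `top2` is `[("the", 3), ("fox", 2)]`. On a cluster the final top-k selection is often done in a second job or a single reducer, since it needs a global view of the counts.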