machine learning

Deep Learning

DNN: gradient check, learning rate, initialization (not zero; asymmetric), local minimum, plateau, momentum (exponential average of previous gradients, pointing in the same direction), saddle point. Tips for training. Regularization: norm penalties, early stopping, data augmentation, dropout (hidden units cannot co-adapt; generally used; at test time: expectation, geometric average for a single layer). Autoencoder: sparse AE, denoising… Continue reading Deep Learning
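The momentum item above (an exponential average of previous gradients) can be illustrated with a minimal sketch. The function name `sgd_momentum`, the quadratic test objective, and the values of `lr` and `beta` are assumptions chosen for illustration, not taken from the post.

```python
def sgd_momentum(grad, w0, lr=0.1, beta=0.9, steps=300):
    """Gradient descent where each step follows an exponential
    moving average of past gradients (one common momentum variant)."""
    w = w0
    v = 0.0  # running average of gradients
    for _ in range(steps):
        g = grad(w)
        v = beta * v + (1.0 - beta) * g  # exponential average of previous gradients
        w = w - lr * v                   # step along the averaged direction
    return w


# Hypothetical objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_final = sgd_momentum(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_final)
```

When successive gradients point in the same direction the average stays large and the steps keep their size; when they alternate in sign they partly cancel, which damps oscillation.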

machine learning

Kinds of Machine Learning

Kinds of learning. Based on the information available: supervised learning, reinforcement learning, unsupervised learning. Based on the role of the learner: passive learning, active learning. Scenarios: Membership Query Synthesis: the learner may request labels for any unlabeled instance in the input space. Stream-Based Selective Sampling: an unlabeled instance can first be sampled from the actual distribution, and then… Continue reading Kinds of Machine Learning

machine learning

Feature Engineering

Data Pre-processing (Transformation). Sampling: under-sampling, over-sampling, increasing minority samples and decreasing majority samples simultaneously, synthesising “new” samples from the minority class, bootstrap. Normalization: sigmoid normalization, 0-1 normalization ((x - min(x)) / (max(x) - min(x))), z-score, Gaussian normalization (Gaussian kernel), Box-Cox transformation, log transformation, Tukey’s Ladder of Powers. Feature Engineering: image, speech, text, time series: entropy,… Continue reading Feature Engineering
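Two of the normalizations listed above, 0-1 normalization and the z-score, can be sketched in a few lines. The function names `min_max` and `z_score` are illustrative, and using the population standard deviation is an assumption; the sample standard deviation is an equally common choice.

```python
from statistics import mean, pstdev


def min_max(xs):
    """0-1 normalization: (x - min) / (max - min)."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        raise ValueError("all values are equal; the range is zero")
    return [(x - lo) / (hi - lo) for x in xs]


def z_score(xs):
    """z-score: subtract the mean, divide by the standard deviation."""
    mu = mean(xs)
    sigma = pstdev(xs)  # population standard deviation (assumption)
    if sigma == 0:
        raise ValueError("standard deviation is zero")
    return [(x - mu) / sigma for x in xs]


values = [2.0, 4.0, 6.0]
scaled = min_max(values)        # values mapped onto [0, 1]
standardized = z_score(values)  # mean 0, unit variance
print(scaled, standardized)
```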

data science · machine learning

Classification

Elements of a model: objective; model structure (e.g. variables, formula, equation, parameters); model assumptions; parameter estimates and interpretation; model fit (e.g. goodness-of-fit tests and statistics); model selection. LDA: generative model; models p(x|y) as a multivariate Gaussian; both classes share the same covariance matrix Σ. QDA: each class has its own Σ. Naive Bayes: generative model; assume the xj… Continue reading Classification
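The LDA description above (Gaussian class-conditional densities with one shared covariance) can be sketched for the simplest case. To stay dependency-free this sketch uses a single feature, so the shared covariance matrix Σ reduces to one pooled variance; the function names and the toy data are assumptions made for illustration.

```python
import math


def fit_lda_1d(xs, ys):
    """Estimate class means, class priors, and one pooled variance."""
    classes = sorted(set(ys))
    n = len(xs)
    means, priors = {}, {}
    within_ss = 0.0  # within-class sum of squared deviations
    for c in classes:
        xc = [x for x, y in zip(xs, ys) if y == c]
        means[c] = sum(xc) / len(xc)
        priors[c] = len(xc) / n
        within_ss += sum((x - means[c]) ** 2 for x in xc)
    var = within_ss / (n - len(classes))  # shared by all classes
    return means, priors, var


def predict_lda_1d(x, means, priors, var):
    """Pick the class with the largest linear discriminant score."""
    def score(c):
        return x * means[c] / var - means[c] ** 2 / (2 * var) + math.log(priors[c])
    return max(means, key=score)


# Hypothetical toy data: class 0 clusters near 2, class 1 near 8.
xs = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
ys = [0, 0, 0, 1, 1, 1]
means, priors, var = fit_lda_1d(xs, ys)
print(predict_lda_1d(2.5, means, priors, var), predict_lda_1d(8.0, means, priors, var))
```

Because the variance is shared, the quadratic term in x cancels between classes and the decision rule is linear in x. Estimating a separate variance per class would give the one-dimensional analogue of QDA.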

Big Data · data science · machine learning · programming

Apache Hadoop (projects)

QUESTIONS: setInputFormat, comparator, top k frequent words. HADOOP SYSTEM: Apache Hadoop is an open source software framework for storage and large scale processing of data-sets on clusters of commodity hardware. HDFS (Hadoop Distributed File System): data storage (data split and data replication). MapReduce (data processing): how to leverage job; how do nodes communicate; how to deal with node… Continue reading Apache Hadoop (projects)
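The "top k frequent words" question above follows the usual map, shuffle, reduce pattern. The sketch below simulates that flow in plain Python inside a single process; it does not use the Hadoop API, and every function name in it is hypothetical.

```python
import heapq
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reducer: sum the counts collected for one word."""
    return word, sum(counts)


def top_k_words(lines, k):
    pairs = [pair for line in lines for pair in map_phase(line)]
    totals = [reduce_phase(w, c) for w, c in shuffle(pairs).items()]
    # Highest count first; ties broken alphabetically.
    return heapq.nsmallest(k, totals, key=lambda wc: (-wc[1], wc[0]))


top_two = top_k_words(["a b a", "b a c"], k=2)
print(top_two)
```

On a real cluster the final selection is often done as a second job, or by having each reducer keep a bounded heap of size k, so that the full word list never has to sit on one node.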

data science · machine learning

Time Series Analysis

TIME SERIES BASICS. Difference between regression and time series: the observations in a time series are not necessarily independent and not necessarily identically distributed. A time series is a list of observations in which the ordering matters: the observations depend on one another, so changing the order could change the meaning of the data. Characteristics: is there a trend, on average, the… Continue reading Time Series Analysis
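The point above that ordering matters can be made concrete with the lag-1 autocorrelation, which changes when the same values are rearranged. This is a minimal sketch; the function name `lag1_autocorr` and the two example series are assumptions made for illustration.

```python
def lag1_autocorr(xs):
    """Lag-1 sample autocorrelation: how strongly each value
    tracks the value immediately before it."""
    n = len(xs)
    mu = sum(xs) / n
    denom = sum((x - mu) ** 2 for x in xs)
    numer = sum((xs[t] - mu) * (xs[t + 1] - mu) for t in range(n - 1))
    return numer / denom


trending = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]    # steadily increasing
reordered = [1, 10, 2, 9, 3, 8, 4, 7, 5, 6]   # same values, different order

r_trending = lag1_autocorr(trending)
r_reordered = lag1_autocorr(reordered)
print(r_trending, r_reordered)
```

Both lists have the same mean and variance, so any summary that ignores order cannot tell them apart, yet the first has positive lag-1 autocorrelation and the second negative.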