Uncategorized

Feature Engineering

Data pre-processing (transformation)
Sampling: under-sampling; over-sampling; increasing minority samples and decreasing majority samples simultaneously; synthesising "new" samples from the minority class; bootstrap.
Normalization: sigmoid normalization; 0-1 normalization, (x - min(x)) / (max(x) - min(x)); z-score; Gaussian normalization (Gaussian kernel); Box-Cox transformation.
Feature engineering: image; speech; text; time series: entropy, approximate entropy, sample entropy, plus some…
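The sampling and normalization items above can be made concrete with a minimal sketch in plain Python (standard library only). The function names are hypothetical. The sigmoid variant assumes the common choice of applying the logistic function to z-scores, since the notes do not say what is squashed. Box-Cox is shown with a fixed, user-supplied lambda rather than a fitted one. The over-sampling helper only redraws existing minority samples with replacement (bootstrap-style), which is not the same as synthesising new samples (e.g. SMOTE).

```python
import math
import random
import statistics


def min_max(xs):
    """0-1 normalization: (x - min(x)) / (max(x) - min(x))."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0] * len(xs)  # constant feature: nothing to rescale
    return [(x - lo) / (hi - lo) for x in xs]


def z_score(xs):
    """Centre on the mean and scale by the population standard deviation."""
    mu, sd = statistics.fmean(xs), statistics.pstdev(xs)
    if sd == 0:
        return [0.0] * len(xs)
    return [(x - mu) / sd for x in xs]


def sigmoid_normalize(xs):
    """Assumed variant: logistic function applied to z-scores, values in (0, 1)."""
    return [1.0 / (1.0 + math.exp(-z)) for z in z_score(xs)]


def box_cox(xs, lam):
    """Box-Cox with a fixed lambda; requires strictly positive data."""
    if any(x <= 0 for x in xs):
        raise ValueError("Box-Cox needs strictly positive values")
    if lam == 0:
        return [math.log(x) for x in xs]
    return [(x ** lam - 1) / lam for x in xs]


def random_over_sample(minority, target_size, rng):
    """Bootstrap-style over-sampling: redraw minority samples with replacement."""
    extra = rng.choices(minority, k=max(0, target_size - len(minority)))
    return list(minority) + extra


def random_under_sample(majority, target_size, rng):
    """Random under-sampling: keep a subset of the majority class, no repeats."""
    return rng.sample(majority, target_size)


if __name__ == "__main__":
    print(min_max([1.0, 2.0, 3.0, 4.0, 5.0]))  # [0.0, 0.25, 0.5, 0.75, 1.0]
    rng = random.Random(0)
    minority, majority = ["a", "b"], list(range(10))
    # Grow the minority and shrink the majority at the same time, meeting at 6 each.
    balanced_min = random_over_sample(minority, 6, rng)
    balanced_maj = random_under_sample(majority, 6, rng)
    print(len(balanced_min), len(balanced_maj))  # 6 6
```

Calling both samplers with the same target size corresponds to the "increase minority and decrease majority simultaneously" option in the list.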

Classification

Elements of a model: objective; model structure (e.g. variables, formula, equation, parameters); model assumptions; parameter estimates and interpretation; model fit (e.g. goodness-of-fit tests and statistics); model selection.
LDA
Naive Bayes
Decision tree
Logistic regression (one of the GLMs)
Variables: Y is a binary response variable; Y_i = 1 if the trait is present in observation i (person, unit, etc.); Y_i…
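For the logistic regression item, here is a minimal sketch of fitting P(Y_i = 1 | x_i) = 1 / (1 + exp(-(b0 + b1 x_i))) with a single predictor by plain gradient ascent on the Bernoulli log-likelihood, using no libraries. The function names, learning rate, iteration count and toy data are assumptions for illustration; a statistics package would normally fit this by maximum likelihood (e.g. IRLS) and also report standard errors.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.1, n_iter=5000):
    """Estimate b0, b1 in P(Y=1 | x) = sigmoid(b0 + b1 * x) by gradient ascent."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(n_iter):
        # y_i - p_i is the per-observation gradient of the Bernoulli log-likelihood.
        resid = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += lr * sum(resid) / n
        b1 += lr * sum(r * x for r, x in zip(resid, xs)) / n
    return b0, b1


def log_likelihood(b0, b1, xs, ys):
    """Bernoulli log-likelihood, usable as a simple model-fit measure."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


if __name__ == "__main__":
    # Hypothetical toy data: Y_i = 1 if the trait is present in observation i.
    xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
    ys = [0, 0, 0, 1, 0, 1, 1, 1]
    b0, b1 = fit_logistic(xs, ys)
    print(f"b0={b0:.3f}, b1={b1:.3f}, odds ratio per unit x={math.exp(b1):.3f}")
    print("log-likelihood:", log_likelihood(b0, b1, xs, ys))
```

Because the model is linear in the log-odds, exp(b1) is the estimated multiplicative change in the odds of Y = 1 per one-unit increase in x, which is the usual interpretation of the parameter estimate.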

Intern Meeting Logs

2017.2.17
Types of data collected from the Smart Eye tracker: gaze, saccade, fixation, pursuit. We first need to smooth the data and then calculate the velocity.
Smoothing filters: moving average (Savitzky-Golay filter, polynomial, spatial-exponential).
Feature selection: LDA, FDA, mutual information, Minimum Redundancy Maximum Relevance (MRMR), self-organizing map (SOM).
Clustering: variational Bayesian mixture.
Ideas: build better features, rather than use sophisticated…
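The "smooth first, then compute velocity" step can be sketched with a 5-point Savitzky-Golay filter (quadratic local fit), using its standard convolution weights for the smoothed value and for the first derivative. This is an illustrative sketch only: the window length, the polynomial order, dropping the two edge samples at each end, and the 60 Hz sampling rate and gaze values in the example are all assumptions, not settings taken from the Smart Eye tracker.

```python
# Standard 5-point Savitzky-Golay weights for a quadratic (order-2) local fit.
SG_SMOOTH = (-3, 12, 17, 12, -3)  # fitted value at the window centre; divide by 35
SG_DERIV = (-2, -1, 0, 1, 2)      # fitted slope at the window centre; divide by 10


def _convolve_valid(weights, norm, xs):
    """Slide the window over xs, keeping only positions where it fits entirely."""
    half = len(weights) // 2
    return [
        sum(w * xs[k + i - half] for i, w in enumerate(weights)) / norm
        for k in range(half, len(xs) - half)
    ]


def savgol_smooth(xs):
    """Smoothed signal; the first and last two samples are dropped."""
    return _convolve_valid(SG_SMOOTH, 35.0, xs)


def velocity(xs, fs):
    """First derivative in units per second, given the sampling rate fs in Hz."""
    # The weights give the slope per sample; multiplying by fs converts to per second.
    return [d * fs for d in _convolve_valid(SG_DERIV, 10.0, xs)]


if __name__ == "__main__":
    # Hypothetical horizontal gaze angles (degrees) at an assumed 60 Hz.
    gaze_x = [0.0, 0.1, 0.1, 0.2, 1.5, 4.0, 6.2, 7.0, 7.1, 7.1]
    print(savgol_smooth(gaze_x))
    print(velocity(gaze_x, fs=60.0))  # degrees per second
```

The derivative weights come from the same local quadratic fit, so `velocity` smooths and differentiates in one pass. A quadratic Savitzky-Golay window reproduces any quadratic signal exactly, which is why it generally preserves the shape of fast gaze changes better than a plain moving average of the same width.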

Apache Hadoop (projects)

QUESTIONS: setInputFormat; comparator; top k frequent words.
HADOOP SYSTEM
Apache Hadoop is an open-source software framework for the storage and large-scale processing of data sets on clusters of commodity hardware.
HDFS (Hadoop Distributed File System): data storage (data splitting and data replication).
MapReduce (data processing): how to leverage a job; how nodes communicate; how to deal with node…
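The "top k frequent words" question maps naturally onto the map, shuffle and reduce stages listed above. The sketch below simulates those stages in memory with plain Python; it does not use the Hadoop API, and all function names are hypothetical. The final selection plays the role of the comparator: words are ordered by descending count, with ties broken alphabetically (an assumed tie-break rule).

```python
import heapq
import re
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in re.findall(r"[a-z']+", line.lower()):
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer: sum the counts for each word."""
    return {word: sum(values) for word, values in groups.items()}


def top_k(counts, k):
    """Comparator: higher count first, ties broken alphabetically."""
    return heapq.nsmallest(k, counts.items(), key=lambda kv: (-kv[1], kv[0]))


def top_k_frequent_words(lines, k):
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return top_k(reduce_phase(shuffle(pairs)), k)


if __name__ == "__main__":
    lines = ["the cat sat", "the dog sat", "the end"]
    print(top_k_frequent_words(lines, 2))  # [('the', 3), ('sat', 2)]
```

On a real cluster the map and reduce steps would run as separate tasks across nodes; this version only shows the data flow and where the ordering rule is applied.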