data science · machine learning

Deep Learning – NLP

Word2Vec (CBOW, skip-gram) | GloVe: each word gets a single fixed embedding, so these models cannot separate the different senses of the same word.
ELMo (Embeddings from Language Models): deep contextualized word representations. Instead of using a fixed embedding for each word, ELMo looks at the entire sentence before assigning each word in it an embedding. It uses a bi-directional LSTM trained on a specific… Continue reading Deep Learning – NLP
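The two Word2Vec variants named in this excerpt, CBOW and skip-gram, differ mainly in how training examples are cut from a sentence. The following is a minimal sketch of that step only, not of the training itself. The function names, the `window` parameter and the sample sentence are illustrative assumptions, not taken from the post.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram: each center word is paired with every word in its window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """CBOW: the surrounding context words jointly predict the center word."""
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        if context:
            examples.append((context, target))
    return examples


# Hypothetical sample sentence, already tokenized.
tokens = ["the", "quick", "brown", "fox"]
sg = skipgram_pairs(tokens, window=1)
cb = cbow_examples(tokens, window=1)
```

With `window=1`, skip-gram yields pairs such as `("quick", "the")` and `("quick", "brown")`, while CBOW yields `(["the", "brown"], "quick")`. In both cases a word ends up with one vector regardless of the sentence it appears in, which is the limitation that contextual models such as ELMo address.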

data science · machine learning

Interpretability of ML

Global interpretability: partial dependence and the partial dependence plot (PDP); individual conditional expectation (ICE); total and two-way H-statistics; global feature importance using permutation; global surrogate models.
Local interpretability: Local Interpretable Model-agnostic Explanations (LIME); SHapley Additive exPlanations (SHAP).
An intuitive way to understand the Shapley value is the following illustration: The feature values enter a… Continue reading Interpretability of ML
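One standard way to state the Shapley value is as a feature's marginal contribution averaged over every possible ordering of the features. The sketch below computes it exactly by brute force for a toy payout function. The function `toy_value`, the feature names and the payout numbers are hypothetical, and enumerating all orderings is only feasible for a handful of features.

```python
from itertools import permutations


def shapley_values(value_fn, players):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when it joins the current coalition.
            totals[p] += value_fn(with_p) - value_fn(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


def toy_value(coalition):
    """Hypothetical payout: 'a' adds 10, 'b' adds 20, together they add 6 more."""
    value = 0.0
    if "a" in coalition:
        value += 10
    if "b" in coalition:
        value += 20
    if "a" in coalition and "b" in coalition:
        value += 6  # interaction term, split evenly between a and b
    return value


phi = shapley_values(toy_value, ["a", "b", "c"])
```

Here `phi` assigns 13 to `a`, 23 to `b` and 0 to `c`: the interaction bonus is shared equally, the unused feature gets nothing, and the values sum to the payout of the full coalition (36).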

data science · machine learning


Classification

Elements of a model: objective; model structure (e.g. variables, formula, equation, parameters); model assumptions; parameter estimates and interpretation; model fit (e.g. goodness-of-fit tests and statistics); model selection.
LDA: generative model; models p(x|y) as a multivariate Gaussian; both classes share the same covariance matrix Σ.
QDA: each class has its own Σ.
Naive Bayes: generative model. Assume the xj… Continue reading Classification
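The difference between LDA and QDA described here can be seen in one dimension, where the covariance matrix Σ reduces to a variance. The sketch below fits a Gaussian per class and classifies by the largest log prior plus log density. The function names, the `shared_variance` flag and the toy data are assumptions made for illustration.

```python
import math


def fit_gaussian_classifier(xs, ys, shared_variance=True):
    """Fit per-class Gaussians; pooled variance gives LDA, per-class gives QDA."""
    classes = sorted(set(ys))
    n = len(xs)
    params = {}
    for c in classes:
        xc = [x for x, y in zip(xs, ys) if y == c]
        mean = sum(xc) / len(xc)
        ss = sum((x - mean) ** 2 for x in xc)  # within-class sum of squares
        params[c] = {"prior": len(xc) / n, "mean": mean, "ss": ss, "n": len(xc)}
    if shared_variance:
        pooled = sum(p["ss"] for p in params.values()) / (n - len(classes))
        for p in params.values():
            p["var"] = pooled
    else:
        for p in params.values():
            p["var"] = p["ss"] / (p["n"] - 1)
    return params


def predict(params, x):
    """Return the class maximizing log prior + Gaussian log density at x."""
    def score(p):
        return (math.log(p["prior"])
                - 0.5 * math.log(2 * math.pi * p["var"])
                - (x - p["mean"]) ** 2 / (2 * p["var"]))
    return max(params, key=lambda c: score(params[c]))


# Hypothetical data: class 0 is tightly clustered, class 1 is spread out.
xs = [-1.0, 0.0, 1.0, 2.0, 5.0, 8.0]
ys = [0, 0, 0, 1, 1, 1]
lda = fit_gaussian_classifier(xs, ys, shared_variance=True)
qda = fit_gaussian_classifier(xs, ys, shared_variance=False)
```

At `x = -4` the two models disagree: with a shared variance the boundary is a single threshold, so the point goes to the nearer mean (class 0), while with per-class variances the wide class 1 is more likely that far out in the tail, so the point goes to class 1.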

Big Data · data science · machine learning · programming

Apache Hadoop (projects)

Questions: setInputFormat; comparator; top k frequent words.
Hadoop system: Apache Hadoop is an open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware.
HDFS (Hadoop Distributed File System): data storage (data splitting and data replication).
MapReduce (data processing): how to leverage job; how do nodes communicate; how to deal with node… Continue reading Apache Hadoop (projects)
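The "top k frequent words" question listed above maps naturally onto the map, shuffle and reduce phases. The following is an in-memory Python sketch of that data flow under simplifying assumptions. It does not use the Hadoop API, and every function name and the sample input are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield (word.lower(), 1)


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reduce: sum the counts collected for one word."""
    return (word, sum(counts))


def top_k_words(lines, k):
    """Run the three phases, then keep the k most frequent words."""
    pairs = (pair for line in lines for pair in map_phase(line))
    totals = [reduce_phase(w, c) for w, c in shuffle(pairs).items()]
    # Sort by descending count, breaking ties alphabetically.
    return sorted(totals, key=lambda wc: (-wc[1], wc[0]))[:k]


# Hypothetical input split into two lines.
lines = ["the cat and the hat", "the cat sat"]
result = top_k_words(lines, 2)
```

On a real cluster the final sort would not run on one machine over all words. A common design is for each reducer to keep only its local top k and for a second, small job to merge those candidates.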