data science · machine learning

Deep Learning – NLP

https://zhuanlan.zhihu.com/p/49271699 · https://jalammar.github.io/
Neural language model: predicts the next word; replaces HMMs.
RNN, LSTM: different architectures.
Stateful LSTM: carries state over from the last batch; the final hidden and cell states of one batch become the initial states of the next, so batches are dependent.
Stateless LSTM: parameters are updated on batch one; when batch two starts, the hidden states and cell states are re-initialized to zero, so batches are independent.
Word2Vec: CBOW, skip-grams | GloVe (cannot solve the…
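CBOW and skip-gram differ mainly in which direction the prediction runs over a context window. A minimal sketch in plain Python that only builds the training examples (no embeddings are learned); the function names `skipgram_pairs` and `cbow_examples` and the fixed symmetric window are illustrative assumptions, not any library's API:

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram: predict each context word from the center word.

    Every word within `window` positions of the center word
    (excluding the center itself) yields one (center, context) pair.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW: predict the center word from its surrounding context words."""
    examples = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, center))
    return examples


if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    print(skipgram_pairs(sentence, window=1)[:4])
    print(cbow_examples(sentence, window=1)[2])
```

With `window=1` on "the cat sat on the mat", skip-gram yields pairs such as ("cat", "the") and ("cat", "sat"), while CBOW turns position 2 into (["cat", "on"], "sat"). Real Word2Vec implementations also subsample frequent words and train with negative sampling or hierarchical softmax; those steps are omitted here.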

data science · machine learning

Interpretability of ML

https://github.com/jphall663/awesome-machine-learning-interpretability · https://christophm.github.io/interpretable-ml-book/index.html
Global interpretability: partial dependence and the partial dependence plot (PDP); individual conditional expectation (ICE); total and two-way H-statistics; global feature importance via permutation; global surrogate model.
Local interpretability: Local Interpretable Model-agnostic Explanations (LIME); SHapley Additive exPlanations (SHAP). An intuitive way to understand the Shapley value is the following illustration: The feature values enter a…
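The "feature values enter" picture corresponds to the permutation form of the Shapley value: average each feature's marginal contribution over every order in which the features can join. A minimal exact sketch, assuming a hypothetical value function `toy_value` that returns the model output when only the features in a given set are known; enumerating all n! orders is only feasible for a handful of features, which is why SHAP and similar tools approximate it:

```python
from itertools import permutations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(players: Sequence[str],
                   value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley values: for every ordering in which the players
    (feature values) enter, credit each player with value(S | {p}) - value(S),
    where S is the set that entered before p; then average over orderings."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        entered: FrozenSet[str] = frozenset()
        for p in order:
            with_p = entered | {p}
            totals[p] += value(with_p) - value(entered)
            entered = with_p
    n_orders = factorial(len(players))
    return {p: t / n_orders for p, t in totals.items()}


def toy_value(s: FrozenSet[str]) -> float:
    """Hypothetical model output when only the features in s are known.
    Feature "c" never changes the output; "a" and "b" interact."""
    v = 0.0
    if "a" in s:
        v += 10
    if "b" in s:
        v += 20
    if "a" in s and "b" in s:
        v += 30  # interaction term, shared between a and b
    return v


if __name__ == "__main__":
    print(shapley_values(["a", "b", "c"], toy_value))  # {'a': 25.0, 'b': 35.0, 'c': 0.0}
```

The interaction bonus of 30 is split evenly between "a" and "b", the irrelevant feature "c" gets 0, and the values sum to `toy_value` of the full set minus `toy_value` of the empty set (the efficiency property).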

data science · machine learning

Classification

Elements of a model: objective; model structure (e.g. variables, formula, equation, parameters); model assumptions; parameter estimates and interpretation; model fit (e.g. goodness-of-fit tests and statistics); model selection.
LDA: generative model; models p(x|y) as a multivariate Gaussian; both classes share the same covariance matrix Σ.
QDA: each class has its own Σ.
Naive Bayes: generative model. Assume the xj…
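To make the shared-covariance assumption of LDA concrete, a minimal one-dimensional sketch (a single feature, so Σ reduces to one pooled variance σ²). The helper names `fit_lda_1d` and `predict_lda_1d` are hypothetical; the discriminant is the standard LDA one, δ_k(x) = x·μ_k/σ² - μ_k²/(2σ²) + log π_k:

```python
import math
from statistics import mean
from typing import List


def fit_lda_1d(xs: List[float], ys: List[int]):
    """Fit 1-D LDA: per-class means, class priors, and ONE pooled
    variance shared by all classes (the LDA assumption)."""
    classes = sorted(set(ys))
    n = len(xs)
    means = {k: mean(x for x, y in zip(xs, ys) if y == k) for k in classes}
    priors = {k: sum(1 for y in ys if y == k) / n for k in classes}
    # Pooled within-class variance with n - K degrees of freedom.
    ss = sum((x - means[y]) ** 2 for x, y in zip(xs, ys))
    var = ss / (n - len(classes))
    return means, priors, var


def predict_lda_1d(x: float, means, priors, var) -> int:
    """Return the class with the largest linear discriminant delta_k(x)."""
    def delta(k):
        mu = means[k]
        return x * mu / var - mu * mu / (2 * var) + math.log(priors[k])
    return max(means, key=delta)


if __name__ == "__main__":
    means, priors, var = fit_lda_1d([1.0, 2.0, 3.0, 7.0, 8.0, 9.0], [0, 0, 0, 1, 1, 1])
    print(means, var)  # {0: 2.0, 1: 8.0} 1.0
    print(predict_lda_1d(4.9, means, priors, var), predict_lda_1d(5.1, means, priors, var))  # 0 1
```

With equal priors and class means 2 and 8, the decision boundary falls at the midpoint 5. A QDA version would estimate a separate variance σ_k² per class, which adds a -½·log σ_k² term and makes the boundary quadratic in x.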

Big Data · data science · machine learning · programming

Apache Hadoop (projects)

Questions: setInputFormat; comparator; top k frequent words.
Hadoop system: Apache Hadoop is an open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware.
HDFS (Hadoop Distributed File System): data storage (data splitting and data replication).
MapReduce (data processing): how to leverage jobs; how nodes communicate; how to deal with node…
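The "top k frequent words" question maps naturally onto the map, shuffle and reduce phases. The sketch below simulates those phases in a single Python process; it is not Hadoop's Java API, and the function names are hypothetical. In an actual Hadoop job the final top-k selection would usually be a separate step (for example a second job or a single reducer), with key ordering controlled by a comparator; here the sort key plays that role:

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in an input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all values by key, as the framework does
    between the map and reduce phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reducer: sum the partial counts for one word."""
    return word, sum(counts)


def top_k_words(lines: Iterable[str], k: int) -> List[Tuple[str, int]]:
    mapped = (pair for line in lines for pair in map_phase(line))
    reduced = [reduce_phase(w, c) for w, c in sorted(shuffle(mapped).items())]
    # Keep the k largest counts; ties are broken alphabetically by word.
    return heapq.nsmallest(k, reduced, key=lambda wc: (-wc[1], wc[0]))


if __name__ == "__main__":
    docs = ["the cat sat", "the cat ran", "the dog"]
    print(top_k_words(docs, 2))  # [('the', 3), ('cat', 2)]
```

Using a heap of size k instead of fully sorting all counts keeps the final step at O(n log k), which matters when the vocabulary is large.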