
Deep Learning – NLP

https://zhuanlan.zhihu.com/p/49271699


https://jalammar.github.io/

Neural Language Model

predict the next word given the preceding words; replaced HMM-based statistical language models
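
A minimal Keras sketch of the idea (vocabulary and layer sizes are illustrative assumptions, not from any specific paper): an embedding layer feeds an LSTM, and a softmax over the vocabulary gives the probability of the next word.

```python
# Minimal sketch of a neural language model (hypothetical sizes): given the
# previous tokens, output a probability distribution over the vocabulary
# for the next word.
import tensorflow as tf

vocab_size, embed_dim, hidden = 10000, 128, 256
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embed_dim),   # token ids -> vectors
    tf.keras.layers.LSTM(hidden),                        # encode the prefix
    tf.keras.layers.Dense(vocab_size, activation="softmax"),  # P(next word)
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
```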

RNN

LSTM

different architectures

stateful LSTM: the hidden and cell states from the last batch are carried over as the initial states of the next batch, so consecutive batches are dependent

stateless LSTM: parameters are updated on batch one as usual, but when batch two begins the hidden and cell states are re-initialized to zero; state does not carry over from batch to batch, so batches are independent
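
A minimal Keras sketch contrasting the two modes (shapes and layer sizes are illustrative assumptions):

```python
# Contrast between stateless and stateful LSTMs in Keras; shapes and layer
# sizes are illustrative assumptions.
import tensorflow as tf

batch_size, timesteps, features = 32, 10, 8

# Stateless (the default): hidden/cell states are reset to zero for every
# batch, so consecutive batches are treated as independent sequences.
stateless = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(timesteps, features)),
    tf.keras.layers.Dense(1),
])

# Stateful: the final states of batch i become the initial states of batch
# i+1, so the model can carry context across batches. The batch size must
# be fixed, and the states are only cleared when you ask for it.
stateful = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, stateful=True,
                         batch_input_shape=(batch_size, timesteps, features)),
    tf.keras.layers.Dense(1),
])
stateful.reset_states()  # e.g. call between epochs or between sequences
```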

Word2Vec (CBOW, skip-gram) | GloVe: both assign each word a single static vector, so they cannot distinguish different senses of the same word in different contexts (the polysemy problem)
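
A quick gensim sketch of that limitation (toy corpus and parameters are made up, gensim 4.x API): a static embedding returns the same vector for "bank" whether it means a financial institution or a riverbank.

```python
# Toy gensim example (gensim 4.x API; corpus and parameters are made up):
# a static embedding gives "bank" the same vector in both sentences, even
# though the word has two different senses.
from gensim.models import Word2Vec

sentences = [["the", "bank", "approved", "the", "loan"],
             ["she", "sat", "on", "the", "river", "bank"]]
model = Word2Vec(sentences, vector_size=50, window=2, sg=1, min_count=1)  # sg=1: skip-gram

vec = model.wv["bank"]   # one fixed vector, regardless of context
print(vec.shape)         # (50,)
```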

ELMo (Embedding from Language Models): Deep contextualized word representation


Instead of using a fixed embedding for each word, ELMo looks at the entire sentence before assigning each word in it an embedding. It uses a bi-directional LSTM trained on a specific task to be able to create those embeddings. ELMo gained its language understanding from being trained to predict the next word in a sequence of words – a task called Language Modeling.
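
A conceptual sketch only, not the real ELMo architecture (ELMo stacks bi-directional language-model layers and learns a weighted sum of them): a bidirectional LSTM that returns one vector per token, so each word's representation depends on the whole sentence rather than on a fixed lookup. Sizes are assumptions.

```python
# Conceptual sketch, not the actual ELMo implementation: a bidirectional
# LSTM that returns one vector per token, so each word's representation
# depends on the whole sentence. Sizes are illustrative assumptions.
import tensorflow as tf

vocab_size, embed_dim, hidden = 10000, 128, 256
tokens = tf.keras.Input(shape=(None,), dtype="int32")        # token ids
static = tf.keras.layers.Embedding(vocab_size, embed_dim)(tokens)
contextual = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(hidden, return_sequences=True))(static)
model = tf.keras.Model(tokens, contextual)  # output: (batch, seq_len, 2 * hidden)
```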

ULM-FiT

ULM-FiT introduced a language model and a process to effectively fine-tune that language model for various tasks.

NLP finally had a way to do transfer learning probably as well as Computer Vision could.

GPT (Generative Pre-Training)

 

Attention

An attention model differs from a classic sequence-to-sequence model in two main ways:

First, the encoder passes a lot more data to the decoder. Instead of passing only the last hidden state of the encoding stage, the encoder passes all of its hidden states to the decoder.

Second, an attention decoder does an extra step before producing its output. In order to focus on the parts of the input that are relevant to this decoding time step, the decoder does the following:

  1. Look at the set of encoder hidden states it received – each encoder hidden state is most associated with a certain word in the input sentence
  2. Give each hidden state a score (let’s ignore how the scoring is done for now)
  3. Multiply each hidden state by its softmaxed score, thus amplifying hidden states with high scores and drowning out those with low scores, then sum the weighted states into a context vector for this time step (see the sketch after this list)
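
A minimal NumPy sketch of those steps; the dot-product scoring function and the tiny dimensions are illustrative assumptions.

```python
# NumPy sketch of the three steps above; the dot-product scoring function
# and the tiny dimensions are illustrative assumptions.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

seq_len, hidden = 5, 4
encoder_states = np.random.randn(seq_len, hidden)   # one state per input word
decoder_state  = np.random.randn(hidden)            # state at this decoding step

scores  = encoder_states @ decoder_state             # step 2: score each state
weights = softmax(scores)                            # softmax the scores
context = (weights[:, None] * encoder_states).sum(axis=0)  # step 3: weighted sum
# `context` emphasizes the encoder states most relevant to this time step.
```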

Transformer

http://nlp.seas.harvard.edu/2018/04/03/attention.html

parallelization: self-attention replaces recurrence, so all positions in the sequence can be processed in parallel rather than one time step at a time as in an RNN
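
A sketch of scaled dot-product self-attention, the core Transformer operation (single head, no mask, random weights are simplifying assumptions): every position is handled in one batched matrix multiply instead of a sequential loop, which is what enables the parallelization.

```python
# Sketch of scaled dot-product self-attention (single head, no mask,
# random weights: all simplifying assumptions). Every position is computed
# in one batched matrix multiply rather than a sequential loop.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                  # project all tokens at once
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # contextualized tokens

seq_len, d_model = 6, 8
X = np.random.randn(seq_len, d_model)                 # one row per token
Wq, Wk, Wv = (np.random.randn(d_model, d_model) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)                   # (seq_len, d_model)
```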


OpenAI Transformer

pre-trains a Transformer language model on unlabeled text, then transfers it to downstream tasks by fine-tuning

uses only the Transformer's decoder stack (masked self-attention, no encoder-decoder attention sub-layer)
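
A small sketch of the causal (look-ahead) mask a decoder-only model applies so that position i can only attend to positions up to i; the mask value and size are illustrative.

```python
# Sketch of the causal (look-ahead) mask a decoder-only model applies: it is
# added to the attention scores before the softmax so position i cannot
# attend to positions after i. Mask value and size are illustrative.
import numpy as np

seq_len = 5
mask = np.triu(np.ones((seq_len, seq_len)), k=1) * -1e9  # block future positions
# usage, continuing the self-attention sketch above:
#   scores = Q @ K.T / np.sqrt(d_k) + mask
#   weights = softmax(scores)   # future positions get ~0 weight
```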

BERT

Neural Machine Translation

 
