
Deep Learning – NLP


Neural Language Model

predicts the next word; replaces HMM-based language models
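The prediction step can be sketched as a single forward pass: embed the current word, project to vocabulary logits, softmax into next-word probabilities. The toy vocabulary, the random (untrained) parameters, and the single-word context are all illustrative assumptions, not any particular model:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["<s>", "the", "cat", "sat"]          # toy vocabulary (assumption)
V, D = len(vocab), 8                          # vocab size, embedding dim

E = rng.normal(size=(V, D))                   # word embeddings (untrained)
W = rng.normal(size=(D, V))                   # output projection (untrained)

def next_word_probs(word_idx):
    """One forward step of a minimal neural LM: embed the current word,
    project to vocabulary logits, and softmax into probabilities."""
    h = E[word_idx]                           # context representation
    logits = h @ W
    e = np.exp(logits - logits.max())         # numerically stable softmax
    return e / e.sum()

probs = next_word_probs(vocab.index("the"))   # distribution over next words
```

A real model would condition on a longer history (via an RNN/LSTM or Transformer) and train `E` and `W` by maximizing the likelihood of observed next words.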



different architectures

stateful LSTM: carries the hidden and cell states over from the last batch, so consecutive batches are dependent

stateless LSTM: updates parameters on batch one, then re-initializes the hidden and cell states to zero before batch two; batches are independent of each other
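A minimal sketch of the difference: the stateless loop zeroes the hidden and cell states before every batch, while the stateful loop carries them over. The update rule inside the inner loop is a hypothetical stand-in for a real LSTM cell, kept deliberately simple:

```python
import numpy as np

def run_batches(batches, hidden_size, stateful):
    """Run a toy recurrent cell over a list of batches and return the
    final hidden state after each batch."""
    h = np.zeros(hidden_size)                 # hidden state
    c = np.zeros(hidden_size)                 # cell state
    history = []
    for batch in batches:
        if not stateful:
            h = np.zeros(hidden_size)         # stateless: forget last batch
            c = np.zeros(hidden_size)
        for x in batch:
            c = 0.5 * c + 0.5 * x             # stand-in for an LSTM update
            h = np.tanh(c)
        history.append(h.copy())
    return history
```

Feeding two identical batches: stateless mode produces identical final states (each batch starts from zero), while stateful mode produces different ones (the second batch starts from the first batch's states).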

Word2Vec: CBOW, skip-gram | GloVe (one fixed vector per word, so these embeddings cannot distinguish different senses of the same word)
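The two Word2Vec training objectives differ in how (input, target) pairs are built from a window around each word: skip-gram predicts each context word from the center word, CBOW predicts the center word from its whole context. A sketch of the pair generation (the function name and `mode` strings are mine):

```python
def training_pairs(tokens, window=1, mode="skipgram"):
    """Generate (input, target) training pairs in the word2vec style.

    skip-gram: one (center, context_word) pair per context word;
    CBOW:      one (context_list, center) pair per position.
    """
    pairs = []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window),
                                  min(len(tokens), i + window + 1))
                   if j != i]
        if mode == "skipgram":
            pairs.extend((center, ctx) for ctx in context)
        else:  # CBOW
            pairs.append((context, center))
    return pairs
```

For `["the", "cat", "sat"]` with `window=1`, skip-gram yields `("the","cat"), ("cat","the"), ("cat","sat"), ("sat","cat")`, while CBOW yields pairs like `(["the","sat"], "cat")`.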

ELMo (Embeddings from Language Models): deep contextualized word representations


Instead of using a fixed embedding for each word, ELMo looks at the entire sentence before assigning each word in it an embedding. It uses a bi-directional LSTM trained on a specific task to be able to create those embeddings. ELMo gained its language understanding from being trained to predict the next word in a sequence of words – a task called Language Modeling.


ULM-FiT introduced a language model and a process to effectively fine-tune that language model for various tasks.

NLP finally had a way to do transfer learning, arguably as well as Computer Vision could.

GPT (Generative Pre-Training)



An attention model differs from a classic sequence-to-sequence model in two main ways:

First, the encoder passes a lot more data to the decoder. Instead of passing the last hidden state of the encoding stage, the encoder passes all the hidden states to the decoder

Second, an attention decoder does an extra step before producing its output. In order to focus on the parts of the input that are relevant to this decoding time step, the decoder does the following:

  1. Look at the set of encoder hidden states it received – each encoder hidden state is most associated with a certain word in the input sentence
  2. Give each hidden state a score (let’s ignore how the scoring is done for now)
  3. Multiply each hidden state by its softmaxed score, thus amplifying hidden states with high scores and drowning out hidden states with low scores
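The steps above can be sketched in a few lines. Dot-product scoring is used here as one common scoring choice (the source leaves the scoring method open), and the vectors in the usage note are made-up toy values:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_context(decoder_h, encoder_hs):
    """One decoding time step of attention.

    encoder_hs: (T, D) matrix of all encoder hidden states (step 1)
    decoder_h:  (D,) current decoder hidden state
    """
    scores = encoder_hs @ decoder_h      # step 2: one score per hidden state
    weights = softmax(scores)            # softmaxed scores
    context = weights @ encoder_hs       # step 3: amplify high, drown out low
    return context, weights
```

With two encoder states `[1, 0]` and `[0, 1]` and a decoder state `[10, 0]`, the first state scores far higher, its softmaxed weight approaches 1, and the context vector is almost exactly that state.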




OpenAI Transformer

how is the pre-trained model transferred to downstream tasks?

uses only the Transformer's decoder blocks


Neural Machine Translation

