Reference: CS224d-lecture8

Language Models

A language model computes a probability for a sequence of words: P(w1, w2, …, wT)
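By the chain rule this factorizes as P(w1, w2, …, wT) = ∏t P(wt | w1, …, wt−1). A minimal sketch of that factorization, using a toy bigram approximation (the table of probabilities and the `<s>` start token are illustrative assumptions, not from the lecture):

```python
# Toy bigram model: each word is assumed to depend only on the previous word.
# These probabilities are made-up illustrative values.
bigram_prob = {
    ("<s>", "the"): 0.5,
    ("the", "cat"): 0.2,
    ("cat", "sat"): 0.3,
}

def sequence_prob(words, probs):
    """P(w1..wT) via the chain rule, approximated with bigram factors."""
    p = 1.0
    prev = "<s>"  # start-of-sequence token (assumption)
    for w in words:
        p *= probs.get((prev, w), 0.0)
        prev = w
    return p

p = sequence_prob(["the", "cat", "sat"], bigram_prob)  # 0.5 * 0.2 * 0.3
```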

RNN advantages

Given a list of word vectors: x1, x2, …, xt−1, xt, xt+1, …, xT

At a single time step:

ht = σ(W(hh) h(t−1) + W(hx) xt)
ŷt = softmax(W(S) ht)

Main idea: use the same set of W weights at all time steps.
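The weight sharing across time steps can be sketched in NumPy as follows (sizes, the tanh nonlinearity, and the random initialization are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d_h, d_x, V = 4, 3, 5   # hidden size, input size, vocab size (toy numbers)

# One set of weights, defined once and reused at every time step.
W_hh = rng.normal(scale=0.1, size=(d_h, d_h))
W_hx = rng.normal(scale=0.1, size=(d_h, d_x))
W_s = rng.normal(scale=0.1, size=(V, d_h))

def rnn_step(h_prev, x_t):
    """One time step: new hidden state and a softmax distribution over words."""
    h_t = np.tanh(W_hh @ h_prev + W_hx @ x_t)
    scores = np.exp(W_s @ h_t)
    y_t = scores / scores.sum()   # softmax over the vocabulary
    return h_t, y_t

h = np.zeros(d_h)
for x in rng.normal(size=(6, d_x)):   # same W matrices applied at every step
    h, y = rnn_step(h, x)
```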

Same cross-entropy loss function as before, but predicting words instead of classes:

J(t)(θ) = −Σj yt,j log ŷt,j   (sum over the vocabulary)
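With a one-hot target (the actual next word), the cross-entropy sum collapses to the negative log probability the model assigned to that word. A small sketch with made-up numbers:

```python
import numpy as np

y_hat = np.array([0.1, 0.6, 0.1, 0.1, 0.1])  # predicted distribution over a toy 5-word vocab
target = 1                                    # index of the actual next word (assumption)

# Cross-entropy with a one-hot target reduces to -log of the target word's probability.
loss = -np.log(y_hat[target])
```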

Evaluation could just be the negative average log probability over a dataset of T words:

J = −(1/T) Σt log2 P(wt | w1, …, wt−1)

But more common: perplexity = 2^J, so lower is better.
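Computing perplexity from per-word model probabilities can be sketched as follows (the probability values are made-up illustrative numbers):

```python
import numpy as np

# Probabilities some model assigned to each word of a tiny 4-word dataset (toy values).
word_probs = np.array([0.2, 0.1, 0.25, 0.05])

J = -np.mean(np.log2(word_probs))  # average negative log2-probability per word
perplexity = 2.0 ** J              # lower is better; equals the inverse geometric mean
```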

Training RNNs is hard

Sequence modeling for other tasks

Opinion Mining with DNN

Goal: classify each word as a direct subjective expression (DSE) or an expressive subjective expression (ESE).


Approach: RNN

Bidirectional RNNs

Problem: For classification you want to incorporate information from words both preceding and following the word being classified.

Deep Bidirectional RNNs

Each memory layer passes an intermediate sequential representation to the next.
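A NumPy sketch of stacked bidirectional layers: each layer runs a forward and a backward pass, concatenates the two hidden states at each position, and passes that full sequence to the next layer (sizes, tanh, and random weights are illustrative assumptions):

```python
import numpy as np

def birnn_layer(X, d_h, rng):
    """One bidirectional layer: left-to-right and right-to-left passes, concatenated."""
    T, d_x = X.shape
    Wf_hh = rng.normal(scale=0.1, size=(d_h, d_h))  # forward-direction weights
    Wf_hx = rng.normal(scale=0.1, size=(d_h, d_x))
    Wb_hh = rng.normal(scale=0.1, size=(d_h, d_h))  # backward-direction weights
    Wb_hx = rng.normal(scale=0.1, size=(d_h, d_x))
    hf, hb = np.zeros(d_h), np.zeros(d_h)
    fwd, bwd = [], []
    for t in range(T):                  # left-to-right pass
        hf = np.tanh(Wf_hh @ hf + Wf_hx @ X[t])
        fwd.append(hf)
    for t in reversed(range(T)):        # right-to-left pass
        hb = np.tanh(Wb_hh @ hb + Wb_hx @ X[t])
        bwd.append(hb)
    bwd.reverse()
    # Each position's representation is [forward state; backward state].
    return np.stack([np.concatenate([f, b]) for f, b in zip(fwd, bwd)])

rng = np.random.default_rng(0)
X = rng.normal(size=(7, 3))             # sequence of 7 word vectors
H1 = birnn_layer(X, d_h=4, rng=rng)     # layer 1 output: (7, 8)
H2 = birnn_layer(H1, d_h=4, rng=rng)    # layer 2 consumes layer 1's full sequence
```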


Below is a bidirectional recurrent neural network (LSTM) implementation using the TensorFlow library.

from __future__ import print_function
import tensorflow as tf
from tensorflow.contrib import rnn  # tf.contrib is TensorFlow 1.x only; removed in TF 2.x
import numpy as np

# Import MNIST data (TF 1.x tutorial helper)
from tensorflow.examples.tutorials.mnist import input_data