# Topic Model Implementation

## 1. Introduction

## 2. LDA

Latent Dirichlet Allocation (LDA) is a generative probabilistic model for collections of **grouped discrete data**. Each group is described as **a random mixture** over a set of latent topics, where each topic is a discrete distribution over the collection’s vocabulary.

Corpus: a collection of documents; each document is one group

Data: the words in each document

**The generative process for a document collection D under the LDA model:**

For each topic k = 1, …, K

a) draw a topic-word distribution phi_k from Dirichlet(beta)

For each **document d** in D

a) draw a doc-topic distribution theta_d from Dirichlet(alpha)

b) for each word **w_i** in d

i. z_i <- Discrete(theta_d)

ii. w_i <- Discrete(phi_{z_i})
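The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names `sample_dirichlet` and `generate_corpus`, the fixed document length `doc_len`, and the symmetric scalar values for alpha and beta are all assumptions made for this example. Words and topics are represented as integer ids.

```python
import random


def sample_dirichlet(rng, concentration, dim):
    """Draw from a symmetric Dirichlet by normalising Gamma(concentration, 1) draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [x / total for x in draws]


def generate_corpus(n_docs, doc_len, n_topics, vocab_size, alpha, beta, seed=0):
    """Sample a synthetic corpus following the LDA generative process.

    Assumption: every document has the same length doc_len.
    """
    rng = random.Random(seed)

    # For k = 1..K: draw a topic-word distribution phi_k ~ Dirichlet(beta).
    phi = [sample_dirichlet(rng, beta, vocab_size) for _ in range(n_topics)]

    theta, docs, topics = [], [], []
    for _ in range(n_docs):
        # a) draw the doc-topic distribution for this document ~ Dirichlet(alpha).
        theta_d = sample_dirichlet(rng, alpha, n_topics)
        # b) i. draw a topic z_i for every word position.
        z = rng.choices(range(n_topics), weights=theta_d, k=doc_len)
        # b) ii. draw each word w_i from the distribution of its topic.
        w = [rng.choices(range(vocab_size), weights=phi[k])[0] for k in z]
        theta.append(theta_d)
        topics.append(z)
        docs.append(w)
    return docs, topics, theta, phi


docs, topics, theta, phi = generate_corpus(
    n_docs=3, doc_len=5, n_topics=2, vocab_size=6, alpha=0.5, beta=0.1, seed=1
)
```

Each entry of `docs` is a list of word ids, and the matching entry of `topics` holds the topic that generated each word. Small values of alpha and beta tend to produce sparse mixtures, so most documents concentrate on a few topics and most topics on a few words.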

The generative process described above results in the following joint distribution:

**p(w, z, θ, φ | α, β) = p(φ | β) p(θ | α) p(z | θ) p(w | φ, z)**
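Each factor can be expanded over topics, documents, and word positions, following the independence structure of the generative process. In the sketch below, the symbol N_d (the number of words in document d) and the double subscripts on z and w are notation introduced here for illustration:

```latex
p(w, z, \theta, \phi \mid \alpha, \beta)
  = \prod_{k=1}^{K} p(\phi_k \mid \beta)
    \prod_{d \in D} \left[ p(\theta_d \mid \alpha)
    \prod_{i=1}^{N_d} p(z_{d,i} \mid \theta_d)\, p(w_{d,i} \mid \phi_{z_{d,i}}) \right]
```

Here the first product corresponds to the topic loop, the second to the document loop, and the innermost product to the per-word draws of z_i and w_i.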