Topic Model Implementation

1. Introduction

2. LDA

Latent Dirichlet Allocation (LDA) is a generative probabilistic model for collections of grouped discrete data. Each group is described as a random mixture over a set of latent topics, where each topic is a discrete distribution over the collection's vocabulary.

Corpus: a collection of documents (each document is one group)

Data: the words in each document

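The generative process for a document collection D under the LDA model: for each topic $k$, draw a word distribution $\phi_k \sim \mathrm{Dirichlet}(\beta)$; for each document $d$ in D, draw a topic mixture $\theta_d \sim \mathrm{Dirichlet}(\alpha)$; then, for each word position $n$ in document $d$, draw a topic assignment $z_{d,n} \sim \mathrm{Categorical}(\theta_d)$ and draw the word $w_{d,n} \sim \mathrm{Categorical}(\phi_{z_{d,n}})$.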

The generative process described above results in the following joint distribution:

$$p(w, z, \theta, \phi \mid \alpha, \beta) = p(\phi \mid \beta)\, p(\theta \mid \alpha)\, p(z \mid \theta)\, p(w \mid \phi_z)$$
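
To make the factorization concrete, here is a minimal sketch that samples a small synthetic corpus, drawing the variables in the same order as the factors of the joint distribution: φ given β, θ given α, z given θ, and w given φ and z. It rests on several assumptions that are not part of the text above: symmetric Dirichlet priors with scalar `alpha` and `beta`, a fixed document length `doc_len`, and words represented as integer ids. The names `sample_dirichlet` and `generate_corpus` are hypothetical helpers chosen for illustration.

```python
import random


def sample_dirichlet(rng, concentration, dim):
    """Draw from a symmetric Dirichlet by normalising Gamma(concentration, 1) draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [x / total for x in draws]


def generate_corpus(n_docs, doc_len, n_topics, vocab_size, alpha, beta, seed=0):
    """Sample (w, z, theta, phi) following the LDA generative process.

    Assumptions for this sketch: symmetric priors, fixed document length,
    and integer word ids in range(vocab_size).
    """
    rng = random.Random(seed)

    # p(phi | beta): one word distribution per topic
    phi = [sample_dirichlet(rng, beta, vocab_size) for _ in range(n_topics)]
    # p(theta | alpha): one topic mixture per document
    theta = [sample_dirichlet(rng, alpha, n_topics) for _ in range(n_docs)]

    z, w = [], []
    for d in range(n_docs):
        # p(z | theta): a topic assignment for every word position
        z_d = rng.choices(range(n_topics), weights=theta[d], k=doc_len)
        # p(w | phi_z): each word is drawn from its assigned topic
        w_d = [rng.choices(range(vocab_size), weights=phi[k])[0] for k in z_d]
        z.append(z_d)
        w.append(w_d)
    return w, z, theta, phi


# Example: 3 documents of 5 words, 2 topics, vocabulary of 4 word ids
w, z, theta, phi = generate_corpus(3, 5, 2, 4, alpha=0.5, beta=0.1, seed=1)
```

Only `w` would be observed in practice; `z`, `theta`, and `phi` are the latent quantities that inference tries to recover. Very small concentration values can make the Gamma draws underflow, so this sketch is best used with moderate `alpha` and `beta`.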