Latent Dirichlet Allocation in C
Latent Dirichlet allocation
This is a C implementation of variational EM for latent Dirichlet
allocation (LDA), a topic model for text or other discrete data. LDA
allows you to analyze a corpus and extract the topics that combine
to form its documents. For example, click
here to see the topics estimated from a small corpus of
Associated Press documents. LDA is fully described in
Blei et al. (2003).
This code contains:
- an implementation of variational inference for the per-document
topic proportions and per-word topic assignments
- a variational EM procedure for estimating the topics
and the exchangeable Dirichlet hyperparameter
Downloads
Download the readme.txt.
Download the code: lda-c.tgz.
Sample data
2246 documents from the Associated Press [download].
Top 20 words from 100 topics estimated from the AP corpus [pdf].
Bug fixes and updates
To learn about bug fixes and updates, and to discuss LDA and related
techniques, please join the topic-models mailing list,
topic-models [at] lists.cs.princeton.edu.
To join, click here.
Other implementations on the web
There are several other implementations of LDA on the web: