-1 votes
1 answer
76 views

Unsupervised Topic Modeling for Short Event Descriptions

I have a dataset of approximately 750 lines containing quite short texts (fewer than 150 words each). These are all event descriptions related to a single broad topic (which I cannot specify for ...
Arthur GONAY's user avatar
0 votes
1 answer
90 views

MiniBatchKMeans BERTopic not returning topics for half of data

I am trying to topic-model a dataset of tweets. I have around 50 million tweets. Unfortunately, such a large dataset will not fit in RAM (even 128 GB) due to the embeddings. Therefore, I have been working on ...
Matthieu B's user avatar
0 votes
0 answers
35 views

Calculating Topic Correlations or Co-occurrences for keyATM

I have been playing around with the keyATM package extensively; unfortunately, there is no approach for calculating topic correlations and co-occurrences once the model has been fitted. I already ...
dpaltra22's user avatar
0 votes
1 answer
100 views

Correct topics from LDA Sequence Model in Gensim

Python's Gensim package offers a dynamic topic model called LdaSeqModel(). I have run into the same problem as in this issue from the Gensim mailing list (which has not been solved). The problem is ...
hyco's user avatar
  • 221
1 vote
1 answer
140 views

Inspect all probabilities of BERTopic model

Say I build a BERTopic model using from bertopic import BERTopic topic_model = BERTopic(n_gram_range=(1, 1), nr_topics=20) topics, probs = topic_model.fit_transform(docs) Inspecting probs gives me ...
coolhand's user avatar
  • 2,109
0 votes
0 answers
41 views

importing util library failed

I am trying to install BERTopic with pip install bertopic and use a BERTopic model. Here is my code: from bertopic import BERTopic topic_model = BERTopic.load("MaartenGr/BERTopic_Wikipedia...
0 votes
0 answers
85 views

Unhashable type when calling HuggingFace topic model `topic_labels_` function

If I try to follow the topic modeling tutorial at https://huggingface.co/docs/hub/en/bertopic, the first few lines give me an error: from bertopic import BERTopic topic_model = BERTopic.load("...
coolhand's user avatar
  • 2,109
0 votes
0 answers
25 views

PackagesNotFound error even when verified packages as installed

I am trying to follow this tutorial for BERT topic modeling: https://jpcompartir.github.io/BertopicR/ library(reticulate) reticulate::install_miniconda() library(BertopicR) BertopicR::...
coolhand's user avatar
  • 2,109
0 votes
0 answers
53 views

Topic modelling outputs are gender biased?

Has anyone had this issue? My topic modelling seems to be presenting responses that are very dominated by male respondents. The volume of responses across three different questions is over 800 in each ...
GrBrn's user avatar
  • 3
0 votes
1 answer
62 views

Stopwords problem in text data preprocessing in Python

I want to do topic modeling in Python. To remove stop words, I used my own stop word list, a stop word list I found on GitHub, and nltk's stop word list. However, when I examined the ...
deniz's user avatar
  • 11
0 votes
0 answers
41 views

Cannot find AIC/BIC of my topic modelling after using "lda.collapsed.gibbs.sampler" in LDA package

I have used "lda.collapsed.gibbs.sampler" to do my topic modelling and LDA visualisation, and now I want to determine which number of topics (K) best fits my data. Then I tried to use AIC/...
Pang kalok's user avatar
4 votes
1 answer
469 views

Topic modelling many documents with low memory overhead

I've been working on a topic modelling project using BERTopic 0.16.3, and the preliminary results were promising. However, as the project progressed and the requirements became apparent, I ran into a ...
Bbrk24's user avatar
  • 1,033
0 votes
1 answer
45 views

How to extract terms and probabilities from tmResult$terms in topic modeling?

I would like to create separate word clouds for each of my 8 topics in an LDA model. I extracted the top 40 words for each of the 8 topics: an object of length 320 containing the top words and their occurrence probabilities. I ...
NoaMi's user avatar
  • 41
0 votes
1 answer
100 views

How is coherence score calculated in Mallet?

I understand how the diagnostics output shows the coherence values for each topic, but my values range between -150 and -600, whereas other posts I have seen where Mallet was used show coherence ...
Glorifier's user avatar
0 votes
1 answer
65 views

Inconsistent Results When Running Python Mallet/Gibbs Sampling as a Soft-Clustering Method to Identify Optimal Number of Topics

Sorry, but I am inexperienced with Mallet and could use some help. I am currently trying to use Mallet as a soft-clustering technique to assign group membership for a given set of terms contained ...
A Bolton's user avatar
