976 questions
-1 votes | 1 answer | 76 views
Unsupervised Topic Modeling for Short Event Descriptions
I have a dataset of approximately 750 lines containing quite short texts (fewer than 150 words each). These are all event descriptions related to a single broad topic (which I cannot specify for ...
0 votes | 1 answer | 90 views
BERTopic with MiniBatchKMeans not returning topics for half of the data
I am trying to topic-model a dataset of tweets. I have around 50 million tweets. Unfortunately, a dataset this large will not fit in RAM (even 128 GB) because of the embeddings. Therefore, I have been working on ...
0 votes | 0 answers | 35 views
Calculating Topic Correlations or Co-occurrences for keyATM
I have been experimenting with the keyATM package extensively, but unfortunately it offers no way to calculate topic correlations and co-occurrences once the model has been fitted. I already ...
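A minimal sketch of one common way to get both quantities, assuming you can export the document-topic proportions (for example the `theta` matrix of a fitted keyATM model, if your version exposes it) as rows of floats, one row per document. Correlation here is plain Pearson correlation between topic columns; "co-occurrence" is defined, as an assumption, as the number of documents in which both topics reach a hypothetical proportion `threshold`.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def topic_correlations(theta):
    """Correlate every pair of topic columns in a document-topic matrix (rows = documents)."""
    k = len(theta[0])
    cols = [[row[t] for row in theta] for t in range(k)]
    return {(a, b): pearson(cols[a], cols[b]) for a, b in combinations(range(k), 2)}


def topic_cooccurrences(theta, threshold=0.2):
    """Count documents in which both topics of a pair have proportion >= threshold.

    The threshold is a hypothetical cut-off; choose one that suits your data.
    """
    k = len(theta[0])
    return {
        (a, b): sum(1 for row in theta if row[a] >= threshold and row[b] >= threshold)
        for a, b in combinations(range(k), 2)
    }
```

Because topic proportions in each row sum to one, correlations between topics are biased towards negative values; the thresholded co-occurrence count is one way to sidestep that.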
0 votes | 1 answer | 100 views
Correct topics from LDA Sequence Model in Gensim
Python's Gensim package offers a dynamic topic model called LdaSeqModel(). I have run into the same problem as in this issue from the Gensim mailing list (which has not been solved). The problem is ...
1 vote | 1 answer | 140 views
Inspect all probabilities of BERTopic model
Say I build a BERTopic model using
from bertopic import BERTopic
topic_model = BERTopic(n_gram_range=(1, 1), nr_topics=20)
topics, probs = topic_model.fit_transform(docs)
Inspecting probs gives me ...
0 votes | 0 answers | 41 views
Importing util library failed
I am trying to install BERTopic with pip and use a BERTopic model. Here is my code:
from bertopic import BERTopic
topic_model = BERTopic.load("MaartenGr/BERTopic_Wikipedia&...
0 votes | 0 answers | 85 views
Unhashable type when calling HuggingFace topic model `topic_labels_` function
If I follow the topic modeling tutorial at https://huggingface.co/docs/hub/en/bertopic, the first few lines give me an error:
from bertopic import BERTopic
topic_model = BERTopic.load("...
0 votes | 0 answers | 25 views
PackagesNotFound error even after verifying that the packages are installed
I am trying to follow this tutorial for BERTopic modeling in R:
https://jpcompartir.github.io/BertopicR/
library(reticulate)
reticulate::install_miniconda()
library(BertopicR)
BertopicR::...
0 votes | 0 answers | 53 views
Are topic modelling outputs gender-biased?
Has anyone had this issue?
My topic model seems to present responses that are heavily dominated by male respondents.
The volume of responses across three different questions is over 800 in each ...
0 votes | 1 answer | 62 views
Stopwords problem in text data preprocessing in Python
I want to do topic modeling in Python. To that end, I used my own stop word list, a stop word list I found on GitHub, and NLTK's stop word list to remove stop words. However, when I examined the ...
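A minimal sketch of merging several stop-word lists, assuming the usual culprits are mismatched case and stray whitespace (entries like `"The "` never match the token `"the"`). The regex tokenizer is an assumption suited to English text; swap in your own tokenizer as needed. NLTK's list can be passed in as `stopwords.words("english")` after `nltk.download("stopwords")`.

```python
import re


def build_stopwords(*lists):
    """Merge any number of stop-word lists into one lowercase, whitespace-stripped set."""
    merged = set()
    for words in lists:
        merged.update(w.strip().lower() for w in words if w.strip())
    return merged


def remove_stopwords(text, stopwords):
    """Lowercase the text, split it into ASCII-letter/apostrophe tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in stopwords]
```

For example, `remove_stopwords("The results of the survey is to be Analysed", build_stopwords(["The ", "AND"], ["of", "a"], ["is", "to"]))` keeps only the tokens not in any of the three lists. Normalizing both the lists and the tokens the same way is the design choice that matters here.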
0 votes | 0 answers | 41 views
Cannot find AIC/BIC of my topic modelling after using "lda.collapsed.gibbs.sampler" in LDA package
I have used "lda.collapsed.gibbs.sampler" for my topic modelling and LDA visualisation, and now I want to determine which number of topics (K) best fits my data. Then I tried to use AIC/...
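A minimal sketch of the standard AIC and BIC formulas, assuming you can obtain a log-likelihood for each fitted model (the R `lda` package documents a `compute.log.likelihood` option for its sampler, but check what that value includes). The parameter count for LDA is a hypothetical choice, not something the package defines, and AIC/BIC on Gibbs-sampled LDA should be treated as a rough heuristic at best.

```python
from math import log


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * log(n_obs) - 2 * log_likelihood


def lda_param_count(k, vocab_size, n_docs):
    """Hypothetical LDA parameter count: K topic-word distributions over V words
    plus D document-topic distributions over K topics, each losing one degree
    of freedom because it must sum to one."""
    return k * (vocab_size - 1) + n_docs * (k - 1)
```

You would compute these for each candidate K and prefer the lowest value; for BIC, `n_obs` is usually taken to be the total number of tokens, which is again an assumption.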
4 votes | 1 answer | 469 views
Topic modelling many documents with low memory overhead
I've been working on a topic modelling project using BERTopic 0.16.3, and the preliminary results were promising. However, as the project progressed and the requirements became apparent, I ran into a ...
0 votes | 1 answer | 45 views
How to extract terms and probabilities from tmResult$terms in topic modeling?
I would like to create separate word clouds for each of the 8 topics in my LDA model. I extracted the top 40 words for each of the 8 topics, giving an object of length 320 containing top words and their occurrence probabilities.
I ...
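A minimal sketch of the two steps involved, written in Python although the question uses R. It assumes `phi` is a topic-term probability matrix (one row per topic, one column per vocabulary term, which is what `tmResult$terms` usually holds when `tmResult` comes from `posterior()`) and that any flat list of 320 pairs is in topic-major order (all 40 pairs of topic 1, then topic 2, and so on). Both function names are hypothetical.

```python
def top_terms_per_topic(phi, vocab, n=40):
    """For each topic row of phi, return its n most probable (term, probability) pairs."""
    result = []
    for row in phi:
        # sorted() is stable, so tied probabilities keep vocabulary order
        ranked = sorted(zip(vocab, row), key=lambda pair: pair[1], reverse=True)
        result.append(ranked[:n])
    return result


def split_flat(pairs, n_topics, n_per_topic):
    """Split a flat, topic-major list of n_topics * n_per_topic pairs into per-topic lists."""
    if len(pairs) != n_topics * n_per_topic:
        raise ValueError("expected n_topics * n_per_topic pairs")
    return [pairs[i * n_per_topic:(i + 1) * n_per_topic] for i in range(n_topics)]
```

With 8 topics and 40 words each, `split_flat(pairs, 8, 40)` yields 8 lists of 40 pairs, one per word cloud.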
0 votes | 1 answer | 100 views
How is coherence score calculated in Mallet?
I understand how the diagnostics output shows the coherence value for each topic, but my values range between -150 and -600, whereas other posts I have seen that used Mallet show coherence ...
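A minimal sketch of UMass-style coherence as defined by Mimno et al. (2011), which, to my understanding, is the measure behind Mallet's diagnostics; Mallet's exact smoothing constant and default number of top words may differ from this sketch. For a topic with top words ranked v_1, ..., v_M, the score sums log((D(v_m, v_l) + 1) / D(v_l)) over all pairs with l < m, where D counts documents containing the given words. Each term is typically negative, and with M top words there are M(M-1)/2 terms (190 for 20 words), so totals in the hundreds below zero are plausible.

```python
from itertools import combinations
from math import log


def umass_coherence(top_words, docs, smoothing=1.0):
    """UMass-style coherence for one topic.

    top_words: the topic's top words, most probable first.
    docs: iterable of documents, each an iterable of word types.
    """
    doc_sets = [set(d) for d in docs]

    def doc_freq(*words):
        # number of documents containing every word in `words`
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for l, m in combinations(range(len(top_words)), 2):  # l < m, so w_l ranks higher
        w_l, w_m = top_words[l], top_words[m]
        denom = doc_freq(w_l)
        if denom == 0:
            continue  # skip words absent from the corpus to avoid dividing by zero
        score += log((doc_freq(w_m, w_l) + smoothing) / denom)
    return score
```

Because the score is a sum rather than an average, it is only comparable across topics or models that use the same number of top words.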
0 votes | 1 answer | 65 views
Inconsistent Results When Running Python Mallet/Gibbs Sampling as a Soft-Clustering Method to Identify Optimal Number of Topics
Sorry, but I am inexperienced with Mallet and could use some help. I am currently trying to use Mallet as a soft-clustering technique to assign group membership for a given set of terms contained ...