5 questions
0 votes, 1 answer, 97 views
Compute Jaccard Index for two similar but not equal shapefiles in R [closed]
I have two distinct shapefiles that overlap heavily but aren't identical. I want to compare them, and one of the things I would like to generate is the Jaccard index of regions ...
0 votes, 0 answers, 48 views
Evaluating Fuzzy clustering quality
Initially, I performed k-means clustering and obtained some meaningful clusters. To refine these clusters, I ran fuzzy c-means on the k-means centers using the "e1071" package. Are there any ...
0 votes, 0 answers, 58 views
How to optimize PySpark code to calculate Jaccard Similarity for a huge dataset
I have a huge PySpark DataFrame that contains 250 million rows, with columns ItemA and ItemB. I'm trying to calculate the Jaccard similarity M_ij in a way that runs efficiently and takes a short amount of ...
2 votes, 0 answers, 59 views
What's going wrong in these weighted Jaccard sum calculations for comparing the pronunciation of consonant clusters? [closed]
Context: I have this code for my attempt to create a "similarity mapping" between consonants (or consonant clusters), to the same set of consonants/clusters (basically a cross product mapping)...
0 votes, 3 answers, 88 views
R: calculating a distance matrix between two lists of strings
Please consider the reprex at the end of the post.
I have two lists of data frames. Each data frame has a $keyword column, which is a character vector.
I am looking for a computationally efficient way to ...