354 questions
1
vote
1
answer
63
views
Algorithm to find the best file match when the filename is often a substring of the search term
I'm making a Chrome extension to find lyrics for songs on YouTube. I have the lyrics stored in files in the extension folder. The files are named Name of song.txt. What I'm trying to do is to match search term ...
0
votes
1
answer
66
views
Compare 2 large tables of objects by string similarity of one of their properties in JavaScript [closed]
I would like to compare 2 large tables of objects from two different databases:
an array of 2700 objects
an array of 1800 objects
Each object is a scientific publication with 30 properties. The aim ...
0
votes
0
answers
75
views
Rapidfuzz critical error when using workers != 1
With the latest version of rapidfuzz (3.9.6), I get a critical error when using workers != 1, i.e. I cannot use multiprocessing to speed up string comparisons:
Process finished with exit code -...
-1
votes
1
answer
71
views
Fuzzy name matching with conditions
I want to do a fuzzy match between two dfs. The first df has a list of names with unique IDs and other data, e.g.:
ID Name A B
0 12445 John Smith a b
1 23455 Jack Smith ...
4
votes
0
answers
101
views
Performing a join between dataframes with fuzzy matching without iterrows?
I have looked around a bit but have not found a similar question; forgive me if I missed something.
Using pandas, I am trying to write a function to merge 2 dataframes: df_ref and df_to_merge.
...
-1
votes
1
answer
88
views
Best way to match strings from different systems [closed]
Suppose that I have a list of strings like this (the real dataset is far larger and contains other data too):
List<string> modelNames =
[
"XC60 Momentum Standard T6",
...
0
votes
1
answer
2k
views
Grouping similar text?
I have a list of landowners, and whatever is highlighted shows a similar text string. These highlighted groupings are the same landowner with a slightly different text string. I was thinking maybe ...
0
votes
1
answer
198
views
Python or R Context aware fuzzy matching
I am trying to match two string columns containing food descriptions [foods1 and foods2]. I applied an algorithm weighting the word frequency so less frequent words have more weight but it fails as it ...
-2
votes
1
answer
699
views
Fuzzy comparison of strings in lists of huge length (taking into account performance)
I have two lists:
The first list, which I get from the database, contains the names of various companies (they can be written in uppercase, lowercase, or a combination)
list_from_DB = ["Reebok", "MAZDA"...
0
votes
0
answers
81
views
String Match using Fuzzy Lookup in Excel
I am trying to use Fuzzy Lookup to match two strings in two columns of a table that looks like the one below.
Table1 Table2
| Column A | Column B |
| -------- | -------- |
| Flower.com |...
0
votes
1
answer
52
views
Fuzzification of an Excel file using ranges from a txt with simpful
I want to fuzzify this Excel file using simpful:
with these fuzzy rules:
In this case, for example, I'd need the Excel to be 'FIFTIES' if the age is between 50 and 59 and EVOL to be '10to15' if ...
0
votes
2
answers
74
views
Best way to Join 2 Tables with columns containing SIMILAR data
I am having trouble joining two tables together: the 2 columns have similar, but not identical, data.
Example:
Table 1: Column 1 = "Expect rain for todays weather"
Table 2: Column 2 ...
2
votes
1
answer
230
views
Is it possible to merge two tables in Power Query Editor (Power BI) with Python fuzzy matching?
Merge two tables in Power Query Editor (Power BI) based on string similarity with Python
Consider the tables below:
Table1
Name
...
Apple Fruit A11
...
Banana Fruit B12
...
...
...
Table2
...
1
vote
1
answer
81
views
use adist to determine which element only needs deletions
I've got this vector of strings (y) and a single string (x) which I want to compare and see which y fits x best if only deletions are considered.
x = "PCOR1"
y = c("PCor", "...
0
votes
1
answer
749
views
Levenshtein on dataframe column and input list
I'm new to PySpark and need to do a fuzzy match. I found that levenshtein is a native function that can do that. I have a dataframe like this:
+----------------+----------------+
| col1| ...