961 questions
1
vote
1
answer
99
views
How to improve word_similarity query performance in postgresql?
In a project, we are using postgresql v12.11 (updating is sadly not an option at the moment).
Consider the following relations:
document (
id uuid primary key
)
page (
id uuid primary key,
...
2
votes
0
answers
30
views
Maximum number of records for rapidfuzz duplicate searches
I’m using RapidFuzz (token_set_ratio) to detect near-duplicate news articles from Refinitiv data. My dataset size can go beyond 20k records, and I'm comparing record pairs using multiprocessing. While ...
4
votes
0
answers
141
views
Something better than Levenshtein for fuzzy search
It seems like Levenshtein distance is considered a standard, but for fuzzy searches, it doesn't quite work well.
For example, if we have the strings:
xyz
abc123456
And we search with the query:
abc
...
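A minimal sketch of the comparison this excerpt sets up, assuming plain (unweighted) Levenshtein distance; the helper name `levenshtein` is illustrative and not from the question. It shows that the unrelated string `xyz` scores closer to the query `abc` than `abc123456` does, even though the latter starts with the query.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


query = "abc"
for candidate in ("xyz", "abc123456"):
    print(candidate, levenshtein(query, candidate))
# xyz 3        (three substitutions)
# abc123456 6  (six insertions)
```

Because every extra character costs one edit, raw edit distance penalizes long candidates that contain the query, which is one reason prefix-aware or substring-aware scores are often tried instead.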
0
votes
0
answers
109
views
Optimise FAISS vector search results
I am using FAISS vector search to search across about 6 million records stored in different vectors, and on top of those results I am using fuzzysearch to filter the top results.
The problem here is ...
0
votes
1
answer
50
views
R: fuzzyjoin, or other, to merge accelerometer and magnetometer data
I have two datasets, one containing accelerometer data and another magnetometer data. I want to merge both datasets but the values are not measured at the same milliseconds:
ACC data:
head(...
0
votes
1
answer
30
views
Fuzzy matching multi-term query returns wrong results
It looks like I'm missing something obvious when trying to fuzzy match a multi-term query.
What I'd like to achieve is to get only the "Goleniow Helenow" result when providing "Goleniow Heleniow...
0
votes
1
answer
64
views
Azure AI Search - Fuzzy search with same terms, one char difference
I have a Cosmos document with the name "Test Only".
There is a search index created on Cosmos.
When I search with the search text "olly~1 AND olly~1", I get the above record.
Below is ...
0
votes
0
answers
24
views
RapidFuzz discount matching of common tokens
Using
choices = [rapidfuzz.utils.default_process(sentence=x) for x in allcrmaccts['Account Name']]
choices = [re.sub('|'.join(extraneous),'',x) for x in choices]
choices = sorted(choices)
queries = [...
1
vote
1
answer
48
views
Fetch rows from PostgreSQL with rearranged words similar to a given string
I want to retrieve all rows from a PostgreSQL database that contain sentences similar to a provided string. The sentences in the database can have their words in any order (rearranged). How can I ...
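A rough illustration of order-insensitive matching, written in Python rather than PostgreSQL and not taken from the question: each sentence is normalized by sorting its lowercase tokens before the strings are compared. The function names are hypothetical, and `difflib.SequenceMatcher` stands in for whatever similarity measure the database would actually use.

```python
from difflib import SequenceMatcher


def token_sort_key(sentence: str) -> str:
    """Lowercase the sentence and sort its whitespace-separated words."""
    return " ".join(sorted(sentence.lower().split()))


def order_insensitive_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1] that ignores word order."""
    return SequenceMatcher(None, token_sort_key(a), token_sort_key(b)).ratio()


print(order_insensitive_ratio("the quick brown fox", "fox brown quick the"))
# 1.0 (same words, different order)
```

Sorting tokens makes rearranged sentences identical after normalization, so any remaining difference reflects changed or missing words rather than word order.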
2
votes
1
answer
272
views
Fuzzy Search with OpenStreetMap
I am trying to implement a restaurant search using OpenStreetMap that corrects typos similarly to Google Search. For example, if a user types 'Tresch,' it should still find the restaurant 'Brasserie ...
1
vote
0
answers
76
views
How to adjust mongoose query for Atlas Search?
This is my index definition in MongoDB Atlas Search.
{
"mappings": {
"dynamic": false,
"fields": {
"title": {
"type": "...
0
votes
1
answer
34
views
Why does the PHP spread operator in an Elasticsearch query return different results than hardcoding parts of the query?
I've got a problem that I'm not entirely sure I understand. I've got the following Elasticsearch query, written in PHP, with fuzziness matching hardcoded; this results in about an expected 1,500 ...
0
votes
1
answer
112
views
Fuzziness AUTO doesn't give desired results
I am using fuzzy logic in Elasticsearch to search for similar names in the watchlist.
I am trying
{
"query": {
"fuzzy": {
"column1": {
"value...
2
votes
0
answers
638
views
How to implement a proper fuzzy search in Flutter (dart)?
I did a lot of research before posting here and found two libraries, fuzzy and distance; I tried both, but neither gives good results.
It is incredible the way the sort is working although my search term is ...
2
votes
2
answers
332
views
How can I find all exact occurrences of a string, or close matches of it, in a longer string in Python?
Goal:
I'd like to find all exact occurrences of a string, or close matches of it, in a longer string in Python.
I'd also like to know the location of these occurrences in the longer string.
To define ...
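One possible sketch of the stated goal, using only the standard library. The question's own definition of a "close match" is cut off in this excerpt, so the similarity measure (`difflib.SequenceMatcher.ratio`), the fixed window length, and the `threshold` parameter are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def find_close_matches(pattern: str, text: str, threshold: float = 0.8):
    """Slide a window of len(pattern) over text and keep windows whose
    similarity ratio to pattern is at least threshold.

    Returns a list of (start_index, matched_substring, ratio) tuples.
    """
    n = len(pattern)
    hits = []
    for start in range(len(text) - n + 1):
        window = text[start:start + n]
        ratio = SequenceMatcher(None, pattern, window).ratio()
        if ratio >= threshold:
            hits.append((start, window, ratio))
    return hits


# Exact occurrences only (threshold of 1.0):
print(find_close_matches("abc", "xxabcxxabc", threshold=1.0))
# [(2, 'abc', 1.0), (7, 'abc', 1.0)]
```

With a lower threshold, neighbouring windows that overlap a good match can also clear the bar, so a real implementation would likely keep only the best-scoring window in each overlapping group. Windows of a single fixed length also miss matches that differ by insertions or deletions.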