-
Notifications
You must be signed in to change notification settings - Fork 4.1k
Search Tags in Elasticsearch from Classified Listing and Post Creation #6024
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just some comments.
@@ -36,14 +36,6 @@ class Tags extends Component { | |||
prevLen: 0, | |||
showingRulesForTag: null, | |||
}; | |||
|
|||
const algoliaId = document.querySelector("meta[name='algolia-public-id']") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming we're not removing these <meta /> tags from markup in
_top.html.erb` yet because other things still rely on Algolia?
<meta name="algolia-public-id" content="<%= ApplicationConfig["ALGOLIASEARCH_APPLICATION_ID"] %>">
<meta name="algolia-public-key" content="<%= ALGOLIASEARCH_PUBLIC_SEARCH_ONLY_KEY %>">
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah
app/javascript/article-form/elements/__tests__/__snapshots__/tags.test.jsx.snap
Show resolved
Hide resolved
"value": "#56c938", | ||
}, | ||
}, | ||
"bg_color_hex": "#888751", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably doesn't matter for the use case in this PR, but if other parts of the application start to use Elastic Search for tag search, I imagine we'll need this other metadata about tags? e.g. bg_color_hex
, text_color_hex
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a possibility we end up needing other attributes in the future but for now I kept it only to what we are using. One thing about Elasticsearch is that you can very easily add fields, you CANNOT take them away without creating an entirely new index and reindexing all of your data. For this reason, I like to move slow on adding things and only do it when we absolutely need to.
.then(content => { | ||
return fetch(`/tags/search?name=${query}`, { | ||
method: 'GET', | ||
headers: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for this PR, but I imagine we reuse these same headers for almost all calls on the frontend for our endpoints. It might be good to consider having this consolidated so a dev doesn't need to add common headers every time a fetch
call is made.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do have this utility https://github.com/thepracticaldev/dev.to/blob/master/app/assets/javascripts/utilities/sendFetch.js which is not great but it's something - but I think we can keep it like this for now :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 💯 Added a couple of comments for quick clarification!
spec/requests/tags_spec.rb
Outdated
mock_search_response, | ||
) | ||
get "/tags/search" | ||
expect(response.parsed_body).to eq( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is super clean 🔪
Would it be okay for me to something like this in the future?
source_hashes_array = mock_search_response.dig("hits", "hits").map { |tag| tag.dig("_source")
expect(response.parsed_body["result"]).to eq(source_hashes_array)
My thinking was I have to stop and walk through the digging of 2 levels deep, the map, and then the dig to understand what exactly the expected result format is. And with source_hashes_array
or some other variable name, I can read that and (hopefully) more quickly understand what is being pulled out of mock_search_response
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could definitely do it that way, honestly now that I look at it, the "hits" hash is very Elasticsearch specific. I think that is something we should tuck behind the Search::Model
class. Going to make that change
query_string: { | ||
query: query_string, | ||
analyze_wildcard: true, | ||
allow_leading_wildcard: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this because leading wildcards are less performant? Or maybe there isn't a use case where we really need this ability?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leading wildcards are WAY less performant and any ES person will always tell you to avoid them at all costs. Honestly, on small indexes they are not that bad, but when you get a big index 😬. One of the big lessons I learned from ES training is "Dont let your users take you down" and that means putting things like this in so if someone enters the tag "*tag" we stop that search from happening.
92105fd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
L-O-V-E the change, thanks! LGTM 💯
spec/requests/tags_spec.rb
Outdated
mock_documents, | ||
) | ||
get "/tags/search" | ||
expect(response.parsed_body).to eq("result" => mock_documents) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks good on a first glance. I have a suggestion that I'd like you to consider before merging this PR or submitting other PRs to index things:
I think we should move TagsController#search
into SearchController#tags
and have:
/search/tags?params
/search/articles?params
and so on.
Another thing is that I wasn't able to test this locally :( - I first tried to run the scripts search.rake
(one of them is broken, search:destroy
as it calls a method that doesn't exist anymore). I then tried to fire up the console but encountered the following errors:
[4] pry(main)> Search::Cluster.recreate_indexes
ETHON: performed EASY effective_url=http://localhost:9200/tags_development response_code=200 return_code=ok total_time=0.008455
2020-02-12 11:00:00 +0100: HEAD http://localhost:9200/tags_development [status:200, request:0.010s, query:n/a]
2020-02-12 11:00:00 +0100: <
ETHON: performed EASY effective_url=http://localhost:9200/tags_development response_code=200 return_code=ok total_time=0.098912
2020-02-12 11:00:00 +0100: DELETE http://localhost:9200/tags_development [status:200, request:0.100s, query:n/a]
2020-02-12 11:00:00 +0100: < {"acknowledged":true}
ETHON: performed EASY effective_url=http://localhost:9200/tags_development response_code=404 return_code=ok total_time=0.00239
2020-02-12 11:00:00 +0100: HEAD http://localhost:9200/tags_development [status:404, request:0.004s, query:N/A]
2020-02-12 11:00:00 +0100: <
2020-02-12 11:00:00 +0100: [404]
ETHON: performed EASY effective_url=http://localhost:9200/tags_development response_code=200 return_code=ok total_time=0.354333
2020-02-12 11:00:00 +0100: PUT http://localhost:9200/tags_development [status:200, request:0.355s, query:n/a]
2020-02-12 11:00:00 +0100: > {"settings":{"index":{"number_of_shards":1,"number_of_replicas":0}}}
2020-02-12 11:00:00 +0100: < {"acknowledged":true,"shards_acknowledged":true,"index":"tags_development"}
ETHON: performed EASY effective_url=http://localhost:9200/tags_development/_alias/tags_development_alias response_code=400 return_code=ok total_time=0.004234
2020-02-12 11:00:00 +0100: PUT http://localhost:9200/tags_development/_alias/tags_development_alias [status:400, request:0.006s, query:N/A]
2020-02-12 11:00:00 +0100: < {"error":{"root_cause":[{"type":"invalid_alias_name_exception","reason":"Invalid alias name [tags_development_alias], an index exists with the same name as the alias","index_uuid":"jQGFOhT3ToyeMuHZ1NrFWQ","index":"tags_development_alias"}],"type":"invalid_alias_name_exception","reason":"Invalid alias name [tags_development_alias], an index exists with the same name as the alias","index_uuid":"jQGFOhT3ToyeMuHZ1NrFWQ","index":"tags_development_alias"},"status":400}
2020-02-12 11:00:00 +0100: [400] {"error":{"root_cause":[{"type":"invalid_alias_name_exception","reason":"Invalid alias name [tags_development_alias], an index exists with the same name as the alias","index_uuid":"jQGFOhT3ToyeMuHZ1NrFWQ","index":"tags_development_alias"}],"type":"invalid_alias_name_exception","reason":"Invalid alias name [tags_development_alias], an index exists with the same name as the alias","index_uuid":"jQGFOhT3ToyeMuHZ1NrFWQ","index":"tags_development_alias"},"status":400}
Elasticsearch::Transport::Transport::Errors::BadRequest: [400] {"error":{"root_cause":[{"type":"invalid_alias_name_exception","reason":"Invalid alias name [tags_development_alias], an index exists with the same name as the alias","index_uuid":"jQGFOhT3ToyeMuHZ1NrFWQ","index":"tags_development_alias"}],"type":"invalid_alias_name_exception","reason":"Invalid alias name [tags_development_alias], an index exists with the same name as the alias","index_uuid":"jQGFOhT3ToyeMuHZ1NrFWQ","index":"tags_development_alias"},"status":400}
from /Users/rhymes/.rvm/gems/ruby-2.6.5/gems/elasticsearch-transport-7.4.0/lib/elasticsearch/transport/transport/base.rb:205:in `__raise_transport_error'
The reason why I tried to recreate the index is because when I started the server and tried to search for tags in the editor, nothing came up (from the developer tools console I get {"result":[]}
as a response).
Last but not least, I think we should consider adding debounce
for this (cc @nickytonline on this). We do make use of it in the reading list to avoid firing too many HTTP calls:
.then(content => { | ||
return fetch(`/tags/search?name=${query}`, { | ||
method: 'GET', | ||
headers: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do have this utility https://github.com/thepracticaldev/dev.to/blob/master/app/assets/javascripts/utilities/sendFetch.js which is not great but it's something - but I think we can keep it like this for now :)
@rhymes what likely happened in your case is when you originally tried to run the code you attempted to index a tag and since no tag index existed it created an index with the name of the alias. This then breaks the ability to recreate indexes because we can no longer use that alias. I'll set up a rake task that'll just blast everything so if you get in a bad situation like that you can start over. Also might be a good idea to turn off auto index creation. I like the idea of a search controller and keeping everything in one spot. I can't really think of a downside to it but I definitely like the idea of having all the code in one place and making it really easy to find |
01cc2a4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there anything special I need to do to get this working?
I deleted the index, ran rails search:setup
, fired up the web server but still can't get any results in the tag search. I also tried with Search::Tag.refresh_index
from the console but still get no results. I'm sure I'm missing an obvious step somewhere but I'm not sure which...
How does this work in development mode?
@rhymes you have to index all of your tags |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
I wonder if we should add two things in the following PRs:
-
I know they were removed explicitly, but maybe we can add back the rules, as for example now I can find tags by keywords in the rule (but if this saves a lot of memory let's not do it). I do think we should mention this change in a changelog post on the website. We usually assume people don't use a feature, until we find out they do :D
-
add debouncing to the JS search function, as we might want to limit the amount of HTTP calls to the server
What type of PR is this? (check all applicable)
Description
This updates our Tag's searching for tagging a post and tagging a classified listing to use Elasticsearch instead of Algolia! 🙌
There is 1 minor feature change. BEFORE we were searching tag summary AND name. Now we are only searching the name field. I think this makes sense and I know I personally only expect to see tags autocomplete based on how well their name matches what I enter rather than their summary which I can't see.
Another small change, Elasticsearch by default returns the top 10 records, I stuck with that rather than 8 which is what we had before.
Related Tickets & Documents
https://github.com/thepracticaldev/dev.to/projects/6#card-32289413
Mobile & Desktop Screenshots/Recordings (if there are UI changes)
Added to documentation?