Skip to content

Search Tags in Elasticsearch from Classified Listing and Post Creation #6024

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Feb 17, 2020

Conversation

mstruve
Copy link
Contributor

@mstruve mstruve commented Feb 11, 2020

What type of PR is this? (check all applicable)

  • Feature

Description

This updates our Tag's searching for tagging a post and tagging a classified listing to use Elasticsearch instead of Algolia! 🙌

There is 1 minor feature change. BEFORE we were searching tag summary AND name. Now we are only searching the name field. I think this makes sense and I know I personally only expect to see tags autocomplete based on how well their name matches what I enter rather than their summary which I can't see.

Another small change, Elasticsearch by default returns the top 10 records, I stuck with that rather than 8 which is what we had before.

Related Tickets & Documents

https://github.com/thepracticaldev/dev.to/projects/6#card-32289413

Mobile & Desktop Screenshots/Recordings (if there are UI changes)

Screen Shot 2020-02-11 at 1 58 17 PM

Screen Shot 2020-02-11 at 2 01 02 PM

Added to documentation?

  • no documentation needed

alt_text

@mstruve mstruve requested review from a team and atsmith813 and removed request for a team February 11, 2020 19:19
@pr-triage pr-triage bot added the PR: unreviewed bot applied label for PR's with no review label Feb 11, 2020
Copy link
Contributor

@nickytonline nickytonline left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just some comments.

@@ -36,14 +36,6 @@ class Tags extends Component {
prevLen: 0,
showingRulesForTag: null,
};

const algoliaId = document.querySelector("meta[name='algolia-public-id']")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming we're not removing these <meta /> tags from markup in _top.html.erb` yet because other things still rely on Algolia?

      <meta name="algolia-public-id" content="<%= ApplicationConfig["ALGOLIASEARCH_APPLICATION_ID"] %>">
      <meta name="algolia-public-key" content="<%= ALGOLIASEARCH_PUBLIC_SEARCH_ONLY_KEY %>">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah

"value": "#56c938",
},
},
"bg_color_hex": "#888751",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It probably doesn't matter for the use case in this PR, but if other parts of the application start to use Elastic Search for tag search, I imagine we'll need this other metadata about tags? e.g. bg_color_hex, text_color_hex.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a possibility we end up needing other attributes in the future but for now I kept it only to what we are using. One thing about Elasticsearch is that you can very easily add fields, you CANNOT take them away without creating an entirely new index and reindexing all of your data. For this reason, I like to move slow on adding things and only do it when we absolutely need to.

.then(content => {
return fetch(`/tags/search?name=${query}`, {
method: 'GET',
headers: {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR, but I imagine we reuse these same headers for almost all calls on the frontend for our endpoints. It might be good to consider having this consolidated so a dev doesn't need to add common headers every time a fetch call is made.

Copy link
Contributor

@rhymes rhymes Feb 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do have this utility https://github.com/thepracticaldev/dev.to/blob/master/app/assets/javascripts/utilities/sendFetch.js which is not great but it's something - but I think we can keep it like this for now :)

atsmith813
atsmith813 previously approved these changes Feb 11, 2020
Copy link
Contributor

@atsmith813 atsmith813 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 💯 Added a couple of comments for quick clarification!

mock_search_response,
)
get "/tags/search"
expect(response.parsed_body).to eq(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is super clean 🔪

Would it be okay for me to something like this in the future?

source_hashes_array = mock_search_response.dig("hits", "hits").map { |tag| tag.dig("_source")

expect(response.parsed_body["result"]).to eq(source_hashes_array)

My thinking was I have to stop and walk through the digging of 2 levels deep, the map, and then the dig to understand what exactly the expected result format is. And with source_hashes_array or some other variable name, I can read that and (hopefully) more quickly understand what is being pulled out of mock_search_response.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could definitely do it that way, honestly now that I look at it, the "hits" hash is very Elasticsearch specific. I think that is something we should tuck behind the Search::Model class. Going to make that change

query_string: {
query: query_string,
analyze_wildcard: true,
allow_leading_wildcard: false
Copy link
Contributor

@atsmith813 atsmith813 Feb 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this because leading wildcards are less performant? Or maybe there isn't a use case where we really need this ability?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leading wildcards are WAY less performant and any ES person will always tell you to avoid them at all costs. Honestly, on small indexes they are not that bad, but when you get a big index 😬. One of the big lessons I learned from ES training is "Dont let your users take you down" and that means putting things like this in so if someone enters the tag "*tag" we stop that search from happening.

@pr-triage pr-triage bot added PR: reviewed-approved bot applied label for PR's where reviewer approves changes and removed PR: unreviewed bot applied label for PR's with no review labels Feb 11, 2020
nickytonline
nickytonline previously approved these changes Feb 11, 2020
@mstruve mstruve dismissed stale reviews from nickytonline and atsmith813 via 92105fd February 11, 2020 21:59
@pr-triage pr-triage bot added PR: unreviewed bot applied label for PR's with no review and removed PR: reviewed-approved bot applied label for PR's where reviewer approves changes labels Feb 11, 2020
atsmith813
atsmith813 previously approved these changes Feb 11, 2020
Copy link
Contributor

@atsmith813 atsmith813 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

L-O-V-E the change, thanks! LGTM 💯

mock_documents,
)
get "/tags/search"
expect(response.parsed_body).to eq("result" => mock_documents)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

love_it_gif

@pr-triage pr-triage bot added PR: reviewed-approved bot applied label for PR's where reviewer approves changes and removed PR: unreviewed bot applied label for PR's with no review labels Feb 11, 2020
@pr-triage pr-triage bot added PR: unreviewed bot applied label for PR's with no review and removed PR: reviewed-approved bot applied label for PR's where reviewer approves changes labels Feb 11, 2020
@mstruve mstruve requested a review from atsmith813 February 11, 2020 22:28
atsmith813
atsmith813 previously approved these changes Feb 11, 2020
@pr-triage pr-triage bot added PR: reviewed-approved bot applied label for PR's where reviewer approves changes and removed PR: unreviewed bot applied label for PR's with no review labels Feb 11, 2020
nickytonline
nickytonline previously approved these changes Feb 11, 2020
Copy link
Contributor

@nickytonline nickytonline left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@rhymes rhymes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good on a first glance. I have a suggestion that I'd like you to consider before merging this PR or submitting other PRs to index things:

I think we should move TagsController#search into SearchController#tags and have:

  • /search/tags?params
  • /search/articles?params

and so on.

Another thing is that I wasn't able to test this locally :( - I first tried to run the scripts search.rake (one of them is broken, search:destroy as it calls a method that doesn't exist anymore). I then tried to fire up the console but encountered the following errors:

[4] pry(main)> Search::Cluster.recreate_indexes
ETHON: performed EASY effective_url=http://localhost:9200/tags_development response_code=200 return_code=ok total_time=0.008455
2020-02-12 11:00:00 +0100: HEAD http://localhost:9200/tags_development [status:200, request:0.010s, query:n/a]
2020-02-12 11:00:00 +0100: <
ETHON: performed EASY effective_url=http://localhost:9200/tags_development response_code=200 return_code=ok total_time=0.098912
2020-02-12 11:00:00 +0100: DELETE http://localhost:9200/tags_development [status:200, request:0.100s, query:n/a]
2020-02-12 11:00:00 +0100: < {"acknowledged":true}
ETHON: performed EASY effective_url=http://localhost:9200/tags_development response_code=404 return_code=ok total_time=0.00239
2020-02-12 11:00:00 +0100: HEAD http://localhost:9200/tags_development [status:404, request:0.004s, query:N/A]
2020-02-12 11:00:00 +0100: <
2020-02-12 11:00:00 +0100: [404]
ETHON: performed EASY effective_url=http://localhost:9200/tags_development response_code=200 return_code=ok total_time=0.354333
2020-02-12 11:00:00 +0100: PUT http://localhost:9200/tags_development [status:200, request:0.355s, query:n/a]
2020-02-12 11:00:00 +0100: > {"settings":{"index":{"number_of_shards":1,"number_of_replicas":0}}}
2020-02-12 11:00:00 +0100: < {"acknowledged":true,"shards_acknowledged":true,"index":"tags_development"}
ETHON: performed EASY effective_url=http://localhost:9200/tags_development/_alias/tags_development_alias response_code=400 return_code=ok total_time=0.004234
2020-02-12 11:00:00 +0100: PUT http://localhost:9200/tags_development/_alias/tags_development_alias [status:400, request:0.006s, query:N/A]
2020-02-12 11:00:00 +0100: < {"error":{"root_cause":[{"type":"invalid_alias_name_exception","reason":"Invalid alias name [tags_development_alias], an index exists with the same name as the alias","index_uuid":"jQGFOhT3ToyeMuHZ1NrFWQ","index":"tags_development_alias"}],"type":"invalid_alias_name_exception","reason":"Invalid alias name [tags_development_alias], an index exists with the same name as the alias","index_uuid":"jQGFOhT3ToyeMuHZ1NrFWQ","index":"tags_development_alias"},"status":400}
2020-02-12 11:00:00 +0100: [400] {"error":{"root_cause":[{"type":"invalid_alias_name_exception","reason":"Invalid alias name [tags_development_alias], an index exists with the same name as the alias","index_uuid":"jQGFOhT3ToyeMuHZ1NrFWQ","index":"tags_development_alias"}],"type":"invalid_alias_name_exception","reason":"Invalid alias name [tags_development_alias], an index exists with the same name as the alias","index_uuid":"jQGFOhT3ToyeMuHZ1NrFWQ","index":"tags_development_alias"},"status":400}
Elasticsearch::Transport::Transport::Errors::BadRequest: [400] {"error":{"root_cause":[{"type":"invalid_alias_name_exception","reason":"Invalid alias name [tags_development_alias], an index exists with the same name as the alias","index_uuid":"jQGFOhT3ToyeMuHZ1NrFWQ","index":"tags_development_alias"}],"type":"invalid_alias_name_exception","reason":"Invalid alias name [tags_development_alias], an index exists with the same name as the alias","index_uuid":"jQGFOhT3ToyeMuHZ1NrFWQ","index":"tags_development_alias"},"status":400}
from /Users/rhymes/.rvm/gems/ruby-2.6.5/gems/elasticsearch-transport-7.4.0/lib/elasticsearch/transport/transport/base.rb:205:in `__raise_transport_error'

The reason why I tried to recreate the index is because when I started the server and tried to search for tags in the editor, nothing came up (from the developer tools console I get {"result":[]} as a response).

Last but not least, I think we should consider adding debounce for this (cc @nickytonline on this). We do make use of it in the reading list to avoid firing too many HTTP calls:

https://github.com/thepracticaldev/dev.to/blob/3f08572306f8e0c7650379097131240c09f532d8/app/javascript/readingList/readingList.jsx#L42-L44

.then(content => {
return fetch(`/tags/search?name=${query}`, {
method: 'GET',
headers: {
Copy link
Contributor

@rhymes rhymes Feb 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do have this utility https://github.com/thepracticaldev/dev.to/blob/master/app/assets/javascripts/utilities/sendFetch.js which is not great but it's something - but I think we can keep it like this for now :)

@mstruve
Copy link
Contributor Author

mstruve commented Feb 12, 2020

@rhymes what likely happened in your case is when you originally tried to run the code you attempted to index a tag and since no tag index existed it created an index with the name of the alias. This then breaks the ability to recreate indexes because we can no longer use that alias. I'll set up a rake task that'll just blast everything so if you get in a bad situation like that you can start over. Also might be a good idea to turn off auto index creation.

I like the idea of a search controller and keeping everything in one spot. I can't really think of a downside to it but I definitely like the idea of having all the code in one place and making it really easy to find

@benhalpern
Copy link
Contributor

@rhymes @mstruve yeah search controller makes sense...Especially for true search usecases. If we use Algolia for more resource retrieval scenarios, then it could live under that resource controller.

@mstruve mstruve dismissed stale reviews from nickytonline and atsmith813 via 01cc2a4 February 12, 2020 19:22
@mstruve mstruve requested a review from rhymes February 12, 2020 20:06
Copy link
Contributor

@benhalpern benhalpern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me

Copy link
Contributor

@rhymes rhymes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anything special I need to do to get this working?

I deleted the index, ran rails search:setup, fired up the web server but still can't get any results in the tag search. I also tried with Search::Tag.refresh_index from the console but still get no results. I'm sure I'm missing an obvious step somewhere but I'm not sure which...

How does this work in development mode?

@mstruve
Copy link
Contributor Author

mstruve commented Feb 13, 2020

@rhymes you have to index all of your tags Tag.find_each(&:index_to_elasticsearch) which is what the first DataUpdateScript is going to be once that PR goes out

Copy link
Contributor

@rhymes rhymes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

I wonder if we should add two things in the following PRs:

  • I know they were removed explicitly, but maybe we can add back the rules, as for example now I can find tags by keywords in the rule (but if this saves a lot of memory let's not do it). I do think we should mention this change in a changelog post on the website. We usually assume people don't use a feature, until we find out they do :D

  • add debouncing to the JS search function, as we might want to limit the amount of HTTP calls to the server

@mstruve mstruve merged commit 7ff79f7 into master Feb 17, 2020
@pr-triage pr-triage bot added PR: merged bot applied label for PR's that are merged and removed PR: reviewed-approved bot applied label for PR's where reviewer approves changes labels Feb 17, 2020
@mstruve mstruve deleted the mstruve/search-tags branch February 17, 2020 13:16
mstruve added a commit that referenced this pull request Feb 17, 2020
@mstruve mstruve mentioned this pull request Feb 17, 2020
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR: merged bot applied label for PR's that are merged
5 participants
close