In Elasticsearch, I want to query terms and also get matches from within URLs. To that end, I tried analyzing the "url" field and querying it as follows, but the result was always empty.

index-config.json:

{
  "mappings": {
    "Mytype": {
      "properties": {
        "about": {
          "url": {
            "type": "string",
            "analyzer":"url_analyzer"
  }}}}},
  "settings" : {
    "analysis": {
      "analyzer": {
        "url_analyzer": {
          "type": "custom",
          "tokenizer": "url_tokenizer"
        }
      },
      "tokenizer": {
        "url_tokenizer" : {
          "type": "pattern",
          "pattern": "[\\.:/]+"
}}}}}

Query in head-plugin:

{
  "query": {
    "bool": {
      "must": [{
          "query_string": {
            "default_field": "Mytype.url",
            "query": "myquery"
      }}],
      "must_not": [],
      "should": []
  }},
  "from": 0,
  "size": 10,
  "sort": [],
  "facets": {}
}

(I've also queried slightly differently through the Java API, and the same problem occurs.)

Result:

This works if I use e.g. stackoverflow.com as myquery.
But the result is empty if I use only stackoverflow.

That confuses me, because I thought the pattern of the url_tokenizer would treat the . as a delimiter.
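
To illustrate what I expect the pattern to do, here is a small Python sketch. It only mimics the regular expression with `re.split` and is not Elasticsearch's pattern tokenizer itself; the helper name `split_url` is made up for this example.

```python
import re

# Same character class as the "pattern" of url_tokenizer (after JSON unescaping).
URL_SPLIT = re.compile(r"[\.:/]+")

def split_url(text):
    """Split text on runs of '.', ':' and '/', dropping empty pieces."""
    return [token for token in URL_SPLIT.split(text) if token]

print(split_url("http://stackoverflow.com/questions"))
# ['http', 'stackoverflow', 'com', 'questions']
```

If the analyzer really were applied to the field, I would expect stackoverflow to be one of the indexed tokens.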

What is wrong here?

  • Your mapping seems not well-formed, i.e. I'm not sure about the "properties / about / url" field, something must be missing. If about is an object field, then you must include the url sub-field inside a properties structure. Can you extract the actual mapping using curl -XGET localhost:9200/your_index and update your question with it? Commented Mar 1, 2016 at 13:47
  • @Val Thanks for the reply! This is JSON Schema notation and should not be part of the problem, since searching for "stackoverflow.com" and many other queries work quite well. Nevertheless, I've extracted and checked the actual mapping. It is exactly the same as posted above. Commented Mar 1, 2016 at 14:19
  • I'll check the details shortly, but the following answer should give you a head start: stackoverflow.com/questions/34887458/… Commented Mar 1, 2016 at 14:20
  • @Val I've installed the analysis-url plugin mentioned in stackoverflow.com/a/34986008/4420271 and the log told me it was installed without errors. After restarting ES, I still get the following exception: org.elasticsearch.ElasticsearchIllegalArgumentException: failed to find token filter type [url] for [url_host]. ==> Maybe a new question? Commented Mar 1, 2016 at 15:31
  • I was going to submit an answer working with that plugin, but I'm not done yet :-) Judging by the error, though, it looks like the plugin is simply not installed. Can you make sure you see its name appearing in the logs when ES starts up? Commented Mar 1, 2016 at 15:35
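
The point raised in the first comment, that a sub-field of an object field has to be declared inside its own properties block, could look roughly like the sketch below. It is written as a Python dict and printed as JSON; the nesting is an assumption about what the asker intended, not a mapping confirmed anywhere in this thread.

```python
import json

# Hypothetical corrected mapping: "about" is treated as an object field,
# so its sub-field "url" sits inside a nested "properties" block.
mapping = {
    "mappings": {
        "Mytype": {
            "properties": {
                "about": {
                    "properties": {
                        "url": {"type": "string", "analyzer": "url_analyzer"}
                    }
                }
            }
        }
    }
}

print(json.dumps(mapping, indent=2))
```

With a mapping shaped like this, the field's full path is about.url. Whether that accounts for the empty result in the question is not established by the thread.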

1 Answer

I came across this while looking for an Elasticsearch analyzer that would let me search for both stackoverflow and stackoverflow.com. I also thought your pattern tokenizer looked like it should work, but it didn't work for me either. Instead of investigating why, I ended up using the built-in lowercase tokenizer, which splits tokens on all non-letter characters. That is not perfect for domains containing non-letter characters (such as digits or hyphens), but it is good enough for my use case. I also filtered out the http and https tokens, so that searching for either of them alone doesn't return every result, while searching for http://stackoverflow.com still works.

"analysis": {
  "filter": {
    "url_stop": {
      "type": "stop",
      "stopwords": ["http", "https"]
    }
  },
  "analyzer": {
    "url_analyzer": {
      "tokenizer": "lowercase",
      "filter": "url_stop"
    }
  }
}

And used it in the mapping:

"mappings": {
  "my_mapping": {
    "properties": {
      "url": {
        "analyzer": "url_analyzer",
        "type": "string"
      }
    }
  }
}
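
As a rough illustration of what this analyzer does to a URL, here is a plain-Python approximation. It is not Elasticsearch code: the regular expression only imitates "split on non-letters and lowercase", and analyze_url is a name I made up for the sketch.

```python
import re

STOPWORDS = {"http", "https"}  # same list as the url_stop filter

def analyze_url(text):
    """Approximate the lowercase tokenizer followed by the url_stop filter."""
    # Keep runs of letters only; digits, '_' and punctuation act as separators.
    letters = re.findall(r"[^\W\d_]+", text)
    tokens = [token.lower() for token in letters]
    return [token for token in tokens if token not in STOPWORDS]

print(analyze_url("http://StackOverflow.com/questions/34887458"))
# ['stackoverflow', 'com', 'questions']
print(analyze_url("https://web2py.example.com"))
# ['web', 'py', 'example', 'com']
```

The second call shows the limitation mentioned above: a host name containing a digit is broken into separate letter-only tokens.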

Hopefully the OP has solved their problems, but maybe this is useful to someone else in the future.
