I store a URL as a field in Elasticsearch. However, I would like to return only documents whose URL contains a subdomain.

For example, I want my search results to include:

http://any-subdomain.example.com

But I don't want the results to include:

https://www.example.com

Is this possible with an Elasticsearch query?

1 Answer

Have you tried the query_string query? For example, I used it on Twitter data like this:

GET /twitter2/tweet/_search
{
    "query": {
        "query_string": {
           "default_field": "entities.media.url",
           "query": "https\\:\\/\\/t.co\\/* AND -https\\:\\/\\/t.co\\/6*"
        }
    },
    "_source": ["entities.media.url"]
}

The mapping I used for this search:

PUT /twitter2/tweet/_mapping
{
    "properties": {
        "entities": {
            "properties": {
                "media": {
                    "properties": {
                        "url": {
                            "type": "string",
                            "index": "not_analyzed"
                        }
                    }
                }
            }
        }
    }
}

You can use the following query for your case:

GET /your-index/your-type/_search
{
    "query": {
        "query_string": {
           "default_field": "url",
           "query": "http\\:\\/\\/*.example.com AND -http\\:\\/\\/www.example.com"
        }
    }
}
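
If you need the same query for several domains, the request body can be built programmatically. Below is a minimal Python sketch, not a definitive implementation. The helper names escape_query_string and build_subdomain_query are hypothetical, and the sketch escapes only ':' and '/', as the query above does, so a domain containing other reserved query_string characters would need extra handling.

```python
import json


def escape_query_string(text: str) -> str:
    """Backslash-escape ':' and '/', the two characters escaped in the query above."""
    return text.replace(":", "\\:").replace("/", "\\/")


def build_subdomain_query(domain: str, field: str = "url", scheme: str = "http") -> dict:
    """Build a query_string body: any host under `domain`, excluding www.<domain>.

    Hypothetical helper: `domain` itself is not escaped (assumption: it contains
    no reserved query_string characters).
    """
    prefix = escape_query_string(f"{scheme}://")
    query = f"{prefix}*.{domain} AND -{prefix}www.{domain}"
    return {"query": {"query_string": {"default_field": field, "query": query}}}


body = build_subdomain_query("example.com")
print(json.dumps(body, indent=4))
```

The printed JSON can be sent as the body of the _search request. Passing scheme="https" produces the equivalent query for https URLs.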

Note: you will get results faster if you split the URL into separate fields, such as url and host, at index time. With Elasticsearch 5.x, you can use an ingest node to transform your data this way. I will try to create a pipeline for this, but you can check the documentation for more information.
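
As an alternative to an ingest pipeline, the same split can be done on the client before indexing. This is only a rough Python sketch under stated assumptions: the function enrich_url and the fields host, subdomain and has_subdomain are hypothetical names, and it assumes you know the base domain (example.com here) in advance.

```python
from urllib.parse import urlsplit


def enrich_url(url: str, base_domain: str = "example.com") -> dict:
    """Return a document with the host and subdomain split out of `url`.

    Hypothetical helper: the field names are illustrative, not an
    Elasticsearch convention.
    """
    host = (urlsplit(url).hostname or "").lower()
    suffix = "." + base_domain
    # Everything in front of ".example.com", or "" for the bare domain
    # and for hosts outside base_domain.
    subdomain = host[: -len(suffix)] if host.endswith(suffix) else ""
    return {
        "url": url,
        "host": host,
        "subdomain": subdomain,
        # Treat "www" like no subdomain, matching the question's example.
        "has_subdomain": subdomain not in ("", "www"),
    }


doc = enrich_url("http://any-subdomain.example.com")
```

With documents indexed this way, an exact-match filter on has_subdomain could replace the wildcard query, which avoids scanning terms at search time.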
