
Term vectors

The _termvectors API retrieves term vector information for a single document. Term vectors provide detailed information about the terms (words) in a document, including term frequency, positions, offsets, and payloads. This can be useful for applications such as relevance scoring, highlighting, or similarity calculations. For more information, see Term vector parameter.

Endpoints

GET  /{index}/_termvectors
POST /{index}/_termvectors
GET  /{index}/_termvectors/{id}
POST /{index}/_termvectors/{id}

Path parameters

The following table lists the available path parameters.

| Parameter | Required | Data type | Description |
| :--- | :--- | :--- | :--- |
| index | Required | String | The name of the index containing the document. |
| id | Optional | String | The unique identifier of the document. |

Query parameters

The following table lists the available query parameters. All query parameters are optional.

| Parameter | Data type | Description |
| :--- | :--- | :--- |
| field_statistics | Boolean | If true, the response includes the document count, sum of document frequencies, and sum of total term frequencies. (Default: true) |
| fields | List or String | A comma-separated list or a wildcard expression specifying the fields to include in the statistics. Used as the default list unless a specific field list is provided in the completion_fields or fielddata_fields parameters. |
| offsets | Boolean | If true, the response includes term offsets. (Default: true) |
| payloads | Boolean | If true, the response includes term payloads. (Default: true) |
| positions | Boolean | If true, the response includes term positions. (Default: true) |
| preference | String | Specifies the node or shard on which the operation should be performed. See preference query parameter for a list of available options. By default, requests are routed randomly to available shard copies (primary or replica), with no guarantee of consistency across repeated queries. |
| realtime | Boolean | If true, the request is real time as opposed to near real time. (Default: true) |
| routing | List or String | A custom value used to route operations to a specific shard. |
| term_statistics | Boolean | If true, the response includes term frequency and document frequency. (Default: false) |
| version | Integer | The explicit document version number to use for concurrency control. |
| version_type | String | The specific version type. Valid values: external (the version number must be greater than the current version), external_gte (the version number must be greater than or equal to the current version), force (the version number is forced to be the given value), internal (the version number is managed internally by OpenSearch). |

Request body fields

The following table lists the fields that can be specified in the request body.

| Field | Data type | Description |
| :--- | :--- | :--- |
| doc | Object | A document to analyze. If provided, the API does not retrieve an existing document from the index but uses the provided content. |
| fields | Array of strings | A list of field names for which to return term vectors. |
| offsets | Boolean | If true, the response includes character offsets for each term. (Default: true) |
| payloads | Boolean | If true, the response includes payloads for each term. (Default: true) |
| positions | Boolean | If true, the response includes token positions. (Default: true) |
| field_statistics | Boolean | If true, the response includes statistics such as document count, sum of document frequencies, and sum of total term frequencies. (Default: true) |
| term_statistics | Boolean | If true, the response includes term frequency and document frequency. (Default: false) |
| routing | String | A custom routing value used to identify the shard. Required if custom routing was used during indexing. |
| version | Integer | The specific version of the document to retrieve. |
| version_type | String | The type of versioning to use. Valid values: internal, external, external_gte, force. |
| filter | Object | Allows filtering of tokens returned in the response (for example, by frequency or position). See Filtering terms for available options. |
| per_field_analyzer | Object | Specifies a custom analyzer to use per field. Format: { "field_name": "analyzer_name" }. |
| preference | String | Specifies shard or node routing preferences. See preference query parameter. |
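
As an illustration of the doc and per_field_analyzer fields, the following Python sketch assembles a request body for a document supplied inline rather than retrieved from the index. The helper name build_artificial_doc_body is hypothetical, and the whitespace analyzer is only an example value. The sketch builds the JSON body and does not send a request; it assumes the body would be posted to /{index}/_termvectors without an id.

```python
import json


def build_artificial_doc_body(doc, fields=None, per_field_analyzer=None,
                              term_statistics=False):
    """Build a _termvectors request body for an inline document (hypothetical helper)."""
    body = {"doc": doc}
    if fields:
        body["fields"] = list(fields)
    if per_field_analyzer:
        # Format documented above: {"field_name": "analyzer_name"}
        body["per_field_analyzer"] = dict(per_field_analyzer)
    if term_statistics:
        # Only set when requested, because the documented default is false.
        body["term_statistics"] = True
    return body


body = build_artificial_doc_body(
    {"text": "OpenSearch is a search engine."},
    fields=["text"],
    per_field_analyzer={"text": "whitespace"},  # example analyzer name
)
print(json.dumps(body, indent=2))
```

Optional fields are left out unless requested so that the documented defaults apply.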

Filtering terms

The filter object in the request body allows you to filter the tokens to include in the term vector response. The filter object supports the following fields.

| Field | Data type | Description |
| :--- | :--- | :--- |
| max_num_terms | Integer | The maximum number of terms to return. |
| min_term_freq | Integer | The minimum term frequency in the document required for a term to be included. |
| max_term_freq | Integer | The maximum term frequency in the document allowed for a term to be included. |
| min_doc_freq | Integer | The minimum document frequency across the index required for a term to be included. |
| max_doc_freq | Integer | The maximum document frequency across the index allowed for a term to be included. |
| min_word_length | Integer | The minimum length a term must have to be included. |
| max_word_length | Integer | The maximum length a term can have to be included. |
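
The filter options above can be combined in a single request body. The following Python sketch is one way to assemble such a body on the client while rejecting option names that are not in the table. The helper build_filter and the chosen threshold values are hypothetical. The sketch only builds the body; it does not reproduce how the server selects terms.

```python
ALLOWED_FILTER_KEYS = {
    "max_num_terms", "min_term_freq", "max_term_freq",
    "min_doc_freq", "max_doc_freq", "min_word_length", "max_word_length",
}


def build_filter(**options):
    """Validate filter option names and values, then return the filter object."""
    unknown = set(options) - ALLOWED_FILTER_KEYS
    if unknown:
        raise ValueError(f"Unsupported filter options: {sorted(unknown)}")
    for name, value in options.items():
        # All documented filter fields are integers; bool is excluded explicitly
        # because Python treats it as a subclass of int.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer")
    return dict(options)


body = {
    "fields": ["text"],
    # Enabled here so that document frequencies appear in the response.
    "term_statistics": True,
    "filter": build_filter(max_num_terms=3, min_term_freq=1, min_doc_freq=1),
}
```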

Example

Create an index:

PUT /my-index
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "term_vector": "with_positions_offsets_payloads"
      }
    }
  }
}

Index a document:

POST /my-index/_doc/1
{
  "text": "OpenSearch is a search engine."
}

Example request

Retrieve the term vectors:

GET /my-index/_termvectors/1
{
  "fields": ["text"],
  "term_statistics": true
}

Alternatively, you can provide fields and term_statistics as query parameters:

GET /my-index/_termvectors/1?fields=text&term_statistics=true
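
The query-parameter form can also be generated programmatically. The following Python sketch uses only the standard library to build the request path. The helper name termvectors_path is hypothetical, and the sketch assumes that Boolean values are sent as lowercase true or false and that lists are sent as comma-separated strings, matching the request above.

```python
from urllib.parse import quote, urlencode


def termvectors_path(index, doc_id=None, **params):
    """Build a _termvectors path with an optional document ID and query string."""
    path = f"/{quote(index, safe='')}/_termvectors"
    if doc_id is not None:
        path += f"/{quote(str(doc_id), safe='')}"
    query = {}
    for name, value in params.items():
        if isinstance(value, bool):
            query[name] = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            query[name] = ",".join(value)
        else:
            query[name] = str(value)
    if query:
        # Keep commas unescaped so that field lists stay readable.
        path += "?" + urlencode(query, safe=",")
    return path


print(termvectors_path("my-index", 1, fields=["text"], term_statistics=True))
# prints /my-index/_termvectors/1?fields=text&term_statistics=true
```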

Example response

The response displays term vector information:

{
  "_index": "my-index",
  "_id": "1",
  "_version": 1,
  "found": true,
  "took": 1,
  "term_vectors": {
    "text": {
      "field_statistics": {
        "sum_doc_freq": 5,
        "doc_count": 1,
        "sum_ttf": 5
      },
      "terms": {
        "a": {
          "doc_freq": 1,
          "ttf": 1,
          "term_freq": 1,
          "tokens": [
            {
              "position": 2,
              "start_offset": 14,
              "end_offset": 15
            }
          ]
        },
        "engine": {
          "doc_freq": 1,
          "ttf": 1,
          "term_freq": 1,
          "tokens": [
            {
              "position": 4,
              "start_offset": 23,
              "end_offset": 29
            }
          ]
        },
        "is": {
          "doc_freq": 1,
          "ttf": 1,
          "term_freq": 1,
          "tokens": [
            {
              "position": 1,
              "start_offset": 11,
              "end_offset": 13
            }
          ]
        },
        "opensearch": {
          "doc_freq": 1,
          "ttf": 1,
          "term_freq": 1,
          "tokens": [
            {
              "position": 0,
              "start_offset": 0,
              "end_offset": 10
            }
          ]
        },
        "search": {
          "doc_freq": 1,
          "ttf": 1,
          "term_freq": 1,
          "tokens": [
            {
              "position": 3,
              "start_offset": 16,
              "end_offset": 22
            }
          ]
        }
      }
    }
  }
}
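
The positions and offsets in the response can be mapped back to the original text, which is the basis for uses such as highlighting. The following Python sketch is illustrative only. The helper name tokens_in_order is hypothetical, and the response dictionary is an abridged copy of the example response above that keeps only the fields the sketch reads.

```python
def tokens_in_order(response, field):
    """Return (position, term, start_offset, end_offset) tuples sorted by position."""
    terms = response["term_vectors"][field]["terms"]
    tokens = []
    for term, info in terms.items():
        for token in info.get("tokens", []):
            tokens.append(
                (token["position"], term, token["start_offset"], token["end_offset"])
            )
    return sorted(tokens)


source_text = "OpenSearch is a search engine."

# Abridged copy of the example response.
response = {"term_vectors": {"text": {"terms": {
    "a": {"term_freq": 1, "tokens": [{"position": 2, "start_offset": 14, "end_offset": 15}]},
    "engine": {"term_freq": 1, "tokens": [{"position": 4, "start_offset": 23, "end_offset": 29}]},
    "is": {"term_freq": 1, "tokens": [{"position": 1, "start_offset": 11, "end_offset": 13}]},
    "opensearch": {"term_freq": 1, "tokens": [{"position": 0, "start_offset": 0, "end_offset": 10}]},
    "search": {"term_freq": 1, "tokens": [{"position": 3, "start_offset": 16, "end_offset": 22}]},
}}}}

for position, term, start, end in tokens_in_order(response, "text"):
    # Slicing with the offsets recovers the original surface form of each token.
    print(position, term, repr(source_text[start:end]))
```

The terms map is keyed alphabetically in the example response, so sorting by position is what restores the original word order.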

Response body fields

The following table lists all response body fields.

| Field | Data type | Description |
| :--- | :--- | :--- |
| term_vectors | Object | Contains term vector data for each specified field. |
| term_vectors.text | Object | Contains term vector details for the text field. |
| term_vectors.text.field_statistics | Object | Contains statistics for the entire field. Present only if field_statistics is true. |
| term_vectors.text.field_statistics.doc_count | Integer | The number of documents that contain at least one term in the specified field. |
| term_vectors.text.field_statistics.sum_doc_freq | Integer | The sum of document frequencies for all terms in the field. |
| term_vectors.text.field_statistics.sum_ttf | Integer | The sum of total term frequencies (including repetitions) for all terms in the field. |
| term_vectors.text.terms | Object | A map in which each key is a term and each value contains details about that term. |
| term_vectors.text.terms.<term>.term_freq | Integer | The number of times the term appears in the document. |
| term_vectors.text.terms.<term>.doc_freq | Integer | The number of documents containing the term. Present only if term_statistics is true. |
| term_vectors.text.terms.<term>.ttf | Integer | The total term frequency across all documents. Present only if term_statistics is true. |
| term_vectors.text.terms.<term>.tokens | Array | A list of token objects providing information about individual term instances. |
| term_vectors.text.terms.<term>.tokens[].position | Integer | The position of the token within the text. Present only if positions is true. |
| term_vectors.text.terms.<term>.tokens[].start_offset | Integer | The start character offset of the token. Present only if offsets is true. |
| term_vectors.text.terms.<term>.tokens[].end_offset | Integer | The end character offset of the token. Present only if offsets is true. |
| term_vectors.text.terms.<term>.tokens[].payload | String (Base64) | Optional payload data associated with the token. Present only if payloads is true and available. |
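
Because several of these fields are present only when the corresponding option is enabled, client code should not assume that they exist. The following Python sketch shows one defensive way to read a field's terms. The helper name summarize_terms and the sample data are hypothetical, and the sketch assumes that term_freq is always returned, as the table above suggests.

```python
def summarize_terms(field_vector):
    """Flatten a field's terms map into rows, tolerating absent optional fields."""
    rows = []
    for term, info in field_vector.get("terms", {}).items():
        tokens = info.get("tokens", [])
        rows.append({
            "term": term,
            "term_freq": info["term_freq"],
            # doc_freq and ttf are None unless term_statistics was true.
            "doc_freq": info.get("doc_freq"),
            "ttf": info.get("ttf"),
            # Positions are skipped for tokens returned with positions=false.
            "positions": [t["position"] for t in tokens if "position" in t],
        })
    return rows


# Hypothetical field vector: one term with statistics and one without.
sample = {"terms": {
    "search": {"term_freq": 1, "doc_freq": 1, "ttf": 1,
               "tokens": [{"position": 3, "start_offset": 16, "end_offset": 22}]},
    "engine": {"term_freq": 1,
               "tokens": [{"start_offset": 23, "end_offset": 29}]},
}}

rows = summarize_terms(sample)
```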