Term vectors

The _termvectors API retrieves term vector information for a single document. Term vectors provide detailed information about the terms (words) in a document, including term frequency, positions, offsets, and payloads. This can be useful for applications such as relevance scoring, highlighting, or similarity calculations. For more information, see Term vector parameter.

Endpoints

GET  /{index}/_termvectors
POST /{index}/_termvectors
GET  /{index}/_termvectors/{id}
POST /{index}/_termvectors/{id}

Path parameters

The following table lists the available path parameters.

Parameter	Required	Data type	Description
`index`	Required	String	The name of the index containing the document.
`id`	Optional	String	The unique identifier of the document.

Query parameters

The following table lists the available query parameters. All query parameters are optional.

Parameter	Data type	Description
`field_statistics`	Boolean	If `true`, the response includes the document count, sum of document frequencies, and sum of total term frequencies. (Default: `true`)
`fields`	List or String	A comma-separated list or a wildcard expression specifying the fields to include in the statistics. Used as the default list unless a specific field list is provided in the `completion_fields` or `fielddata_fields` parameters.
`offsets`	Boolean	If `true`, the response includes term offsets. (Default: `true`)
`payloads`	Boolean	If `true`, the response includes term payloads. (Default: `true`)
`positions`	Boolean	If `true`, the response includes term positions. (Default: `true`)
`preference`	String	Specifies the node or shard on which the operation should be performed. See preference query parameter for a list of available options. By default the requests are routed randomly to available shard copies (primary or replica), with no guarantee of consistency across repeated queries.
`realtime`	Boolean	If `true`, the request is real time as opposed to near real time. (Default: `true`)
`routing`	List or String	A custom value used to route operations to a specific shard.
`term_statistics`	Boolean	If `true`, the response includes term frequency and document frequency. (Default: `false`)
`version`	Integer	If `true`, returns the document version as part of a hit.
`version_type`	String	The specific version type. Valid values are: - `external`: The version number must be greater than the current version. - `external_gte`: The version number must be greater than or equal to the current version. - `force`: The version number is forced to be the given value. - `internal`: The version number is managed internally by OpenSearch.

Request body fields

The following table lists the fields that can be specified in the request body.

Field	Data type	Description
`doc`	Object	A document to analyze. If provided, the API does not retrieve an existing document from the index but uses the provided content.
`fields`	Array of strings	A list of field names for which to return term vectors.
`offsets`	Boolean	If `true`, the response includes character offsets for each term. (Default: `true`)
`payloads`	Boolean	If `true`, the response includes payloads for each term. (Default: `true`)
`positions`	Boolean	If `true`, the response includes token positions. (Default: `true`)
`field_statistics`	Boolean	If `true`, the response includes statistics such as document count, sum of document frequencies, and sum of total term frequencies. (Default: `true`)
`term_statistics`	Boolean	If `true`, the response includes term frequency and document frequency. (Default: `false`)
`routing`	String	A custom routing value used to identify the shard. Required if custom routing was used during indexing.
`version`	Integer	The specific version of the document to retrieve.
`version_type`	String	The type of versioning to use. Valid values: `internal`, `external`, `external_gte`, `force`.
`filter`	Object	Allows filtering of tokens returned in the response (for example, by frequency or position). See Filtering terms for available options.
`per_field_analyzer`	Object	Specifies a custom analyzer to use per field. Format: `{ "field_name": "analyzer_name" }`.
`preference`	String	Specifies shard or node routing preferences. See preference query parameter.

Filtering terms

The filter object in the request body allows you to filter the tokens to include in the term vector response. The filter object supports the following fields.

Field	Data type	Description
`max_num_terms`	Integer	The maximum number of terms to return.
`min_term_freq`	Integer	The minimum term frequency in the document required for a term to be included.
`max_term_freq`	Integer	The maximum term frequency in the document required for a term to be included.
`min_doc_freq`	Integer	The minimum document frequency across the index required for a term to be included.
`max_doc_freq`	Integer	The maximum document frequency across the index required for a term to be included.
`min_word_length`	Integer	The minimum length of the term to be included.
`max_word_length`	Integer	The maximum length of the term to be included.

Example

Create an index:

PUT /my-index
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "term_vector": "with_positions_offsets_payloads"
      }
    }
  }
}

Index the document:

POST /my-index/_doc/1
{
  "text": "OpenSearch is a search engine."
}

Example request

Retrieve the term vectors:

GET /my-index/_termvectors/1
{
  "fields": ["text"],
  "term_statistics": true
}

Alternatively, you can provide fields and term_statistics as query parameters:

GET /my-index/_termvectors/1?fields=text&term_statistics=true

Example response

The response displays term vector information:

{
  "_index": "my-index",
  "_id": "1",
  "_version": 1,
  "found": true,
  "took": 1,
  "term_vectors": {
    "text": {
      "field_statistics": {
        "sum_doc_freq": 5,
        "doc_count": 1,
        "sum_ttf": 5
      },
      "terms": {
        "a": {
          "doc_freq": 1,
          "ttf": 1,
          "term_freq": 1,
          "tokens": [
            {
              "position": 2,
              "start_offset": 14,
              "end_offset": 15
            }
          ]
        },
        "engine": {
          "doc_freq": 1,
          "ttf": 1,
          "term_freq": 1,
          "tokens": [
            {
              "position": 4,
              "start_offset": 23,
              "end_offset": 29
            }
          ]
        },
        "is": {
          "doc_freq": 1,
          "ttf": 1,
          "term_freq": 1,
          "tokens": [
            {
              "position": 1,
              "start_offset": 11,
              "end_offset": 13
            }
          ]
        },
        "opensearch": {
          "doc_freq": 1,
          "ttf": 1,
          "term_freq": 1,
          "tokens": [
            {
              "position": 0,
              "start_offset": 0,
              "end_offset": 10
            }
          ]
        },
        "search": {
          "doc_freq": 1,
          "ttf": 1,
          "term_freq": 1,
          "tokens": [
            {
              "position": 3,
              "start_offset": 16,
              "end_offset": 22
            }
          ]
        }
      }
    }
  }
}

Response body fields

The following table lists all response body fields.

Field	Data type	Description
`term_vectors`	Object	Contains term vector data for each specified field.
`term_vectors.text`	Object	Contains term vector details for the `text` field.
`term_vectors.text.field_statistics`	Object	Contains statistics for the entire field. Present only if `field_statistics` is `true`.
`term_vectors.text.field_statistics.doc_count`	Integer	The number of documents that contain at least one term in the specified field.
`term_vectors.text.field_statistics.sum_doc_freq`	Integer	The sum of document frequencies for all terms in the field.
`term_vectors.text.field_statistics.sum_ttf`	Integer	The sum of total term frequencies (including repetitions) for all terms in the field.
`term_vectors.text.terms`	Object	A map, in which each key is a term and each value contains details about that term.
`term_vectors.text.terms.<term>.term_freq`	Integer	The number of times the term appears in the document.
`term_vectors.text.terms.<term>.doc_freq`	Integer	The number of documents containing the term. Present only if `term_statistics` is `true`.
`term_vectors.text.terms.<term>.ttf`	Integer	The total term frequency across all documents. Present only if `term_statistics` is `true`.
`term_vectors.text.terms.<term>.tokens`	Array	A list of token objects providing information about individual term instances.
`term_vectors.text.terms.<term>.tokens[].position`	Integer	The position of the token within the text. Present only if `positions` is `true`.
`term_vectors.text.terms.<term>.tokens[].start_offset`	Integer	The start character offset of the token. Present only if `offsets` is `true`.
`term_vectors.text.terms.<term>.tokens[].end_offset`	Integer	The end character offset of the token. Present only if `offsets` is `true`.
`term_vectors.text.terms.<term>.tokens[].payload`	String (Base64)	Optional payload data associated with the token. Present only if `payloads` is `true` and available.

WAS THIS PAGE HELPFUL?

✔ Yes ✖ No

Tell us why

350 characters left

Have a question? Ask us on the OpenSearch forum.

Want to contribute? Edit this page or create an issue.