This page shows you how to import records from Amazon S3, Google Cloud Storage, or Azure Blob Storage into an index. Importing from object storage is the most efficient and cost-effective way to load large numbers of records into an index. To run through this guide in your browser, see the Bulk import colab notebook.
This feature is in public preview and available only on Standard and Enterprise plans.

Before you import

Before you can import records, ensure you have a serverless index, a storage integration, and data formatted in a Parquet file and uploaded to an Amazon S3 bucket, Google Cloud Storage bucket, or Azure Blob Storage container.

Create an index

Create a serverless index for your data. Be sure to create your index on a cloud that supports importing from the object storage you want to use:
Index location | AWS S3 | Google Cloud Storage | Azure Blob Storage
-------------------------------------------------------------------
AWS            | ✓      | ✓                    | ✓
GCP            | ✗      | ✓                    | ✓
Azure          | ✗      | ✓                    | ✓

Add a storage integration

To import records from a public data source, a storage integration is not required. However, to import records from a secure data source, you must create an integration that allows Pinecone access to the data in your object storage. See the integration guides for Amazon S3, Google Cloud Storage, and Azure Blob Storage.

Prepare your data

  1. In your Amazon S3 bucket, Google Cloud Storage bucket, or Azure Blob Storage container, create an import directory containing a subdirectory for each namespace you want to import into. The namespaces must not yet exist in your index. For example, to import data into the namespaces example_namespace1 and example_namespace2, your directory structure would look like this:
    example_bucket/
    --/imports/
    ----/example_namespace1/
    ----/example_namespace2/
    
    To import into the default namespace, use a subdirectory called __default__. The default namespace must be empty.
  2. For each namespace, create one or more Parquet files defining the records to import. Parquet files must contain specific columns, depending on the index type:
    To import into a namespace in a dense index, the Parquet file must contain the following columns:
    Column name | Parquet type | Description
    -------------------------------------------------------------------------------------------------
    id          | STRING       | Required. The unique identifier for each record.
    values      | LIST<FLOAT>  | Required. A list of floating-point values that make up the dense vector embedding.
    metadata    | STRING       | Optional. Additional metadata for each record. To omit from specific rows, use NULL.
    The Parquet file cannot contain additional columns. For a sketch of creating such a file, see the example after this list.
    For example:
    id | values                   | metadata
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    1  | [ 3.82  2.48 -4.15 ... ] | {"year": 1984, "month": 6, "source": "source1", "title": "Example1", "text": "When ..."}
    2  | [ 1.82  3.48 -2.15 ... ] | {"year": 1990, "month": 4, "source": "source2", "title": "Example2", "text": "Who ..."}
    
  3. Upload the Parquet files into the relevant subdirectory. For example, if you have subdirectories for the namespaces example_namespace1 and example_namespace2 and upload 4 Parquet files into each, your directory structure would look as follows after the upload:
    example_bucket/
    --/imports/
    ----/example_namespace1/
    ------0.parquet
    ------1.parquet
    ------2.parquet
    ------3.parquet
    ----/example_namespace2/
    ------4.parquet
    ------5.parquet
    ------6.parquet
    ------7.parquet
    
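The following is a minimal sketch of creating a Parquet file with the required columns for a dense index, assuming the pyarrow package is installed. The record IDs, embedding values, metadata fields, and file and directory names are illustrative only:
import json
import pyarrow as pa
import pyarrow.parquet as pq

# Schema matching the required columns for a dense index:
# id (STRING), values (LIST<FLOAT>), metadata (STRING).
schema = pa.schema([
    ("id", pa.string()),
    ("values", pa.list_(pa.float32())),
    ("metadata", pa.string()),
])

# Example records; metadata is serialized as a JSON string.
table = pa.table(
    {
        "id": ["1", "2"],
        "values": [[3.82, 2.48, -4.15], [1.82, 3.48, -2.15]],
        "metadata": [
            json.dumps({"year": 1984, "source": "source1", "title": "Example1"}),
            json.dumps({"year": 1990, "source": "source2", "title": "Example2"}),
        ],
    },
    schema=schema,
)

# Write a local Parquet file, then upload it to the relevant namespace
# subdirectory, for example example_bucket/imports/example_namespace1/.
pq.write_table(table, "0.parquet")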

Import records into an index

Review current limitations before starting an import.
Use the start_import operation to start an asynchronous import of vectors from object storage into an index.
  • For uri, specify the URI of the bucket and import directory containing the namespaces and Parquet files you want to import. For example:
    • Amazon S3: s3://BUCKET_NAME/IMPORT_DIR
    • Google Cloud Storage: gs://BUCKET_NAME/IMPORT_DIR
    • Azure Blob Storage: https://STORAGE_ACCOUNT.blob.core.windows.net/CONTAINER_NAME/IMPORT_DIR
  • For integration_id, specify the Integration ID of the Amazon S3, Google Cloud Storage, or Azure Blob Storage integration you created. The ID is found on the Storage integrations page of the Pinecone console.
    An Integration ID is not needed to import from a public bucket.
  • For error_mode, use CONTINUE or ABORT.
    • With ABORT, the operation stops if any records fail to import.
    • With CONTINUE, the operation continues even if some records fail to import, but Pinecone does not report which records, if any, failed. To see how many records were successfully imported, use the describe_import operation.
from pinecone import Pinecone, ImportErrorMode

pc = Pinecone(api_key="YOUR_API_KEY")

# To get the unique host for an index, 
# see https://docs.pinecone.io/guides/manage-data/target-an-index
index = pc.Index(host="INDEX_HOST")
root = "s3://example_bucket/imports"

index.start_import(
    uri=root,
    integration_id="a12b3d4c-47d2-492c-a97a-dd98c8dbefde", # Optional for public buckets
    error_mode=ImportErrorMode.CONTINUE # or ImportErrorMode.ABORT
)
The response contains an id that you can use to check the status of the import:
Response
{
   "id": "101"
}
Once all the data is loaded, the index builder indexes the records, which usually takes at least 10 minutes. During this indexing phase, the job status remains InProgress even though percent_complete has reached 100.0. Once all the imported records are indexed and fully available for querying, the import status changes to Completed.
You can also start a new import using the Pinecone console. Find the index you want to import into, and click the ellipsis (...) menu > Import data.

Track import progress

The amount of time required for an import depends on various factors, including:
  • The number of records to import
  • The number of namespaces to import, and the number of records in each
  • The total size (in bytes) of the import
To track an import’s progress, check its status bar in the Pinecone console or use the describe_import operation with the import ID:
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# To get the unique host for an index, 
# see https://docs.pinecone.io/guides/manage-data/target-an-index
index = pc.Index(host="INDEX_HOST")

index.describe_import(id="101")
The response contains the import details, including the import status, percent_complete, and records_imported:
Response
{
  "id": "101",
  "uri": "s3://example_bucket/imports",
  "status": "InProgress",
  "created_at": "2024-08-19T20:49:00.754Z",
  "finished_at": "2024-08-19T20:49:00.754Z",
  "percent_complete": 42.2,
  "records_imported": 1000000
}
If the import fails, the response contains an error field with the reason for the failure:
Response
{
  "id": "102",
  "uri": "s3://example_bucket/imports",
  "status": "Failed",
  "percent_complete": 0.0,
  "records_imported": 0,
  "created_at": "2025-08-21T11:29:47.886797+00:00",
  "error": "User error: The namespace \"namespace1\" already exists. Imports are only allowed into nonexistent namespaces.",
  "finished_at": "2025-08-21T11:30:05.506423+00:00"
}
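To wait for an import to finish before running downstream steps, you can poll describe_import until the status reaches a terminal state. The following is a minimal sketch; the import ID, the 60-second polling interval, and the assumption that Pending and InProgress are the only non-terminal statuses are illustrative:
import time
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# To get the unique host for an index, 
# see https://docs.pinecone.io/guides/manage-data/target-an-index
index = pc.Index(host="INDEX_HOST")

# Poll every 60 seconds until the import is no longer Pending or InProgress.
while True:
    description = index.describe_import(id="101")
    print(f"status: {description.status}, percent_complete: {description.percent_complete}")
    if description.status not in ("Pending", "InProgress"):
        break
    time.sleep(60)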

Manage imports

List imports

Use the list_imports operation to list all of the recent and ongoing imports. By default, the operation returns up to 100 imports per page. If the limit parameter is passed, the operation returns up to that number of imports per page instead. For example, if limit=3, up to 3 imports are returned per page. Whenever there are additional imports to return, the response includes a pagination_token for fetching the next page of imports.
When using the Python SDK, list_imports paginates automatically.
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# To get the unique host for an index, 
# see https://docs.pinecone.io/guides/manage-data/target-an-index
index = pc.Index(host="INDEX_HOST")

# List using a generator that handles pagination
for i in index.list_imports():
    print(f"id: {i.id} status: {i.status}")

# List using a generator that fetches all results at once
operations = list(index.list_imports())
print(operations)
Response
{
  "data": [
    {
      "id": "1",
      "uri": "s3://BUCKET_NAME/PATH/TO/DIR",
      "status": "Pending",
      "started_at": "2024-08-19T20:49:00.754Z",
      "finished_at": "2024-08-19T20:49:00.754Z",
      "percent_complete": 42.2,
      "records_imported": 1000000
    }
  ],
  "pagination": {
    "next": "Tm90aGluZyB0byBzZWUgaGVyZQo="
  }
}
You can view the list of imports for an index in the Pinecone console. Select the index and navigate to the Imports tab.

Cancel an import

The cancel_import operation cancels an import if it is not yet finished. It has no effect if the import is already complete.
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# To get the unique host for an index, 
# see https://docs.pinecone.io/guides/manage-data/target-an-index
index = pc.Index(host="INDEX_HOST")

index.cancel_import(id="101")
Response
{}
You can also cancel an ongoing import in the Pinecone console. Select the index you are importing into, navigate to the Imports tab, and click the ellipsis (...) menu > Cancel.

Import limits

Metric                            | Limit
----------------------------------------------------------------
Max size per import request       | 2 TB or 200,000,000 records
Max namespaces per import request | 10,000
Max files per import request      | 100,000
Max size per file                 | 10 GB
Also:
  • You cannot import data from an AWS S3 bucket into a Pinecone index hosted on GCP or Azure.
  • You cannot import data from S3 Express One Zone storage.
  • You cannot import data into an existing namespace.
  • When importing data into the __default__ namespace of an index, the default namespace must be empty.
  • Each import takes at least 10 minutes to complete.
  • When importing into an index with integrated embedding, records must contain vectors, not text. To add records with text, you must use upsert.
