From Data Lake to Dialogue: Talking to Your S3 Iceberg Tables with Snowflake Intelligence
A step-by-step guide to connecting Snowflake to your AWS Glue catalog and querying open data formats without moving a single byte.
The modern data stack is all about openness and flexibility. We store massive amounts of data in cloud data lakes like Amazon S3.
But how do we bring the powerful analytics and AI capabilities of a platform like Snowflake to data that is hosted externally?
The answer is Iceberg Tables with the power of Snowflake Cortex.
In this guide, I’ll walk you through the exact steps to connect Snowflake to an existing Iceberg data lake managed by AWS Glue. We’ll set up the necessary permissions and integrations to query data in S3 directly from the Snowflake UI.
Finally, I’ll show you how this setup unlocks incredible potential with Snowflake Intelligence.
What We’re Building
By the end of this article, you will have:
- An External Volume in Snowflake that securely points to your S3 bucket.
- A Catalog Integration that allows Snowflake to talk to your AWS Glue Data Catalog.
- An Iceberg Table in your Snowflake database that reads directly from your S3 data lake, without data duplication.
- A Snowflake Intelligence agent ready to answer natural-language questions about your data.
The first three items mainly follow an existing Quickstart.
Let’s get started.
Step 1: Setting the Stage in Snowflake
First, let’s get our Snowflake environment ready. This ensures all our new objects are organized and we have the compute power to run our queries. Log into your Snowflake account and run the following commands in a worksheet.
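-- Use an admin role for the setup
USE ROLE ACCOUNTADMIN;
-- Use a standard warehouse
CREATE WAREHOUSE IF NOT EXISTS a_warehouse;
USE WAREHOUSE a_warehouse;
-- Create a dedicated database and schema
CREATE DATABASE IF NOT EXISTS iceberg_db;
-- Qualify the schema so it lands in iceberg_db even if the session points elsewhere
CREATE SCHEMA IF NOT EXISTS iceberg_db.iceberg_schema;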
USE SCHEMA iceberg_db.iceberg_schema;
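Before moving on, it can help to confirm that the session is really using the role, warehouse, and schema set up above. A quick check, using only built-in context functions:

```sql
-- Show the active session context; each value should match the objects created above
SELECT
    CURRENT_ROLE()      AS role_name,
    CURRENT_WAREHOUSE() AS warehouse_name,
    CURRENT_DATABASE()  AS database_name,
    CURRENT_SCHEMA()    AS schema_name;
```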
Step 2: Connecting Snowflake to Your S3 Storage (External Volume)
Snowflake needs a secure way to access the raw Parquet files in your S3 bucket. We do this by creating an External Volume. This requires setting up an IAM role in AWS that grants Snowflake read access to a specific S3 bucket.
- Create an S3 Bucket & IAM Policy: If you don’t have one already, create a standard S3 bucket. Then, create an IAM policy that grants access to this bucket. It should look something like this:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<your-bucket-name>/*",
        "arn:aws:s3:::<your-bucket-name>"
      ]
    }
  ]
}
- Create the IAM Role: Create a new IAM Role in AWS that trusts your Snowflake account and attach the policy you just created.
- Create the External Volume in Snowflake: Now, back in your Snowflake worksheet, create the External Volume. This tells Snowflake it’s allowed to use the role you created to access the S3 path.
CREATE OR REPLACE EXTERNAL VOLUME iceberg_external_volume
STORAGE_LOCATIONS =
(
  (
    NAME = 'your-s3-iceberg-glue-us-east-1'
    STORAGE_PROVIDER = 'S3'
    STORAGE_BASE_URL = 's3://<your-bucket-name>/'
    STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::<your-aws-account-id>:role/<your-iam-role-name>'
  )
);
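The IAM role's trust relationship depends on two values that Snowflake generates when the external volume is created, so the role can only be finalized after this step. A minimal sketch of how to look them up, assuming the volume name used above:

```sql
-- List the external volume's properties. In the storage location's JSON,
-- look for STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID.
DESC EXTERNAL VOLUME iceberg_external_volume;

-- Optional: ask Snowflake to test access to the storage location.
-- This check may also attempt a write, which a read-only policy would reject.
SELECT SYSTEM$VERIFY_EXTERNAL_VOLUME('iceberg_external_volume');
```

In the role's trust policy, the user ARN goes in as the trusted principal and the external ID as the sts:ExternalId condition.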
With the External Volume in place, Snowflake can now see your storage. Next, we need to teach it how to understand the structure of your data using the Glue catalog.
Step 3: Connecting Snowflake to Your Metadata (Catalog Integration)
The AWS Glue Data Catalog holds the metadata for your Iceberg tables — the schema, partitions, and pointers to the data files. A Catalog Integration lets Snowflake read this metadata directly.
This is another IAM Role setup, but this one grants Snowflake specific permissions to interact with Glue.
- Create the IAM Policy for Glue: This policy allows Snowflake to read from the Glue Data Catalog.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"glue:GetDatabase",
"glue:GetTable",
"glue:GetTables"
],
"Resource": "*"
}
]
}
- Create the IAM Role: Just like before, create a new IAM Role that trusts your Snowflake account and attach this new Glue policy.
- Create the Catalog Integration in Snowflake: This final piece connects everything together.
CREATE OR REPLACE CATALOG INTEGRATION glue_catalog_int
CATALOG_SOURCE = GLUE
CATALOG_NAMESPACE = '<your-glue-database-name>'
TABLE_FORMAT = ICEBERG
GLUE_AWS_ROLE_ARN = 'arn:aws:iam::<your-aws-account-id>:role/<your-glue-role-name>'
GLUE_CATALOG_ID = '<your-glue-catalog-id>' -- typically your 12-digit AWS account ID
ENABLED = TRUE;
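As with the external volume, the Glue role's trust policy needs values that only exist once the integration has been created. A short sketch of how to retrieve them, assuming the integration name used above:

```sql
-- List the integration's properties; note GLUE_AWS_IAM_USER_ARN and
-- GLUE_AWS_EXTERNAL_ID for the Glue role's trust policy.
DESCRIBE CATALOG INTEGRATION glue_catalog_int;
```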
Now for the magic moment. We have connected storage and metadata. Let’s create the table.
Step 4: Creating the Iceberg Table
This is the simplest, yet most powerful step. We’ll create a table in Snowflake that is nothing more than a pointer to the Iceberg table defined in your Glue Catalog. No data is copied or moved.
CREATE OR REPLACE ICEBERG TABLE my_iceberg_table
EXTERNAL_VOLUME = 'iceberg_external_volume'
CATALOG = 'glue_catalog_int'
CATALOG_TABLE_NAME = '<your-glue-table-name>';
That’s it! You can now browse this table in the Snowflake UI and, more importantly, query it like any other table.
SELECT * FROM my_iceberg_table LIMIT 100;
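Because the table is managed by an external catalog, Snowflake works from the metadata snapshot it last read from Glue. If another engine such as Spark writes new data, a manual refresh is one way to pick it up. A short sketch, assuming the table name above:

```sql
-- Sync the table with the latest Iceberg snapshot registered in Glue
ALTER ICEBERG TABLE my_iceberg_table REFRESH;

-- Confirm the refresh by checking the row count
SELECT COUNT(*) AS row_count FROM my_iceberg_table;
```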
Step 5: Querying Your Iceberg Table with Snowflake Intelligence
The true power of this architecture is its openness; your data remains in your S3 data lake, accessible by a whole ecosystem of tools like Spark or Trino. But what if you could go a step further? We want to leverage Snowflake Intelligence to empower our teams to simply talk to this data.
Snowflake Intelligence is a powerful set of AI capabilities that creates a conversational bridge to your information. It allows anyone in the organization, from analysts to the sales team, to ask questions in plain English and get immediate, secure answers from both your structured tables and unstructured documents.
5.1 The Semantic Model: Teaching AI Your Business Language
To unlock this capability, we first need to give the AI some context about our data’s business meaning. This is done by creating a Semantic Model using Snowflake Cortex Analyst, which will act as the “brain” for our conversational agent.
The Semantic Model is a YAML file where you define the business logic of your data. It tells Cortex Analyst which columns are metrics, which are dimensions, and how they relate. This is the crucial translation layer.
For our my_iceberg_table, a simple semantic model might look like this:
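# In a file named 'my_iceberg_model.yml'
# Simplified sketch: a complete Cortex Analyst model also specifies each table's
# base_table (database, schema, table) and an expr for every column.
semantic_model:
  name: 'Iceberg Table Business View'
  tables:
    - name: MY_ICEBERG_TABLE
      columns:
        - name: TOTAL_RIDERS
          semantic_type: METRIC
        - name: DATE
          semantic_type: TIME
        - name: DAYTYPE
          semantic_type: DIMENSION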
5.2 Using Snowflake Intelligence
With the semantic model defined, you can now create an agent in Snowflake that uses it as its brain. In the Snowflake UI, you would navigate to the Agent section, create a new agent, and point it to your database and the YAML file you created above. This agent is the personality you will converse with.
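The agent can only use the YAML file once it is somewhere Snowflake can read it, and Cortex Analyst reads semantic model files from a stage. A minimal sketch, assuming a hypothetical stage named semantic_models and a hypothetical local path for the file. The PUT command has to be run from a client such as SnowSQL rather than a worksheet:

```sql
-- Hypothetical internal stage for semantic model files
CREATE STAGE IF NOT EXISTS iceberg_db.iceberg_schema.semantic_models
  DIRECTORY = (ENABLE = TRUE);

-- Upload the YAML uncompressed so it can be read as-is
-- (replace the local path with the real location of your file)
PUT file:///tmp/my_iceberg_model.yml @iceberg_db.iceberg_schema.semantic_models
  AUTO_COMPRESS = FALSE
  OVERWRITE = TRUE;

-- Confirm that the file landed in the stage
LIST @iceberg_db.iceberg_schema.semantic_models;
```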
5.3 Starting the Conversation
Now for the payoff. Let’s head over to the Snowflake Intelligence interface, choose the agent we just created, and start asking questions in plain English.
Instead of writing complex SQL, you can simply ask a question. With an insurance dataset, for example:
“Show me the total number of insurance policies sold by day of the week for January 2019.”
Or, for a more complex query:
“Compare the average insurance purchases in 2018 vs 2019”
You can even ask for visualizations:
“Can you plot the daily insurance trend for the last three months?”
“Can you tell me the number of policy numbers and customers by region? Give me a graph please”
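Behind each answer, Cortex Analyst turns the question into SQL against the table. As a rough illustration only, using the columns from the semantic model in 5.1 rather than the insurance examples above, a question like "total riders by day type for January 2019" could correspond to a query along these lines. The SQL the agent actually generates may differ:

```sql
-- Hand-written equivalent of a natural-language question; column names
-- are taken from the example semantic model and are assumptions
SELECT
    t.DAYTYPE,
    SUM(t.TOTAL_RIDERS) AS total_riders
FROM iceberg_db.iceberg_schema.my_iceberg_table AS t
WHERE t.DATE >= '2019-01-01'
  AND t.DATE <  '2019-02-01'
GROUP BY t.DAYTYPE
ORDER BY t.DAYTYPE;
```

Running a query like this by hand is also a simple way to sanity-check the numbers the agent reports.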
Conclusion: The Best of Both Worlds
By connecting your S3 data lake to Snowflake through Iceberg tables, you achieve the perfect balance: your data remains in an open, flexible format, while your teams get to use the world-class capabilities of Snowflake’s platform.
Adding Snowflake Intelligence and Cortex Analyst on top transforms this powerful architecture into an intuitive and conversational experience. You’re no longer just querying data; you’re having a dialogue with it, democratizing insights and empowering your entire organization to make faster, smarter decisions.