This article is an English translation of the Japanese version.
Hello, this is Sagara.
One of the new features announced at Snowflake Summit 2025 is "Cortex AISQL," which enables multimodal data analysis of text, images, and other data types using SQL commands.
https://www.snowflake.com/ja/blog/ai-sql-query-language/
I've tried out this Cortex AISQL feature, so I'd like to summarize my experience.
What is Cortex AISQL?
Cortex AISQL is a new feature that directly integrates AI capabilities into Snowflake's SQL engine, enabling analysis of diverse data types (multimodal data) such as text and images using familiar SQL commands.
https://docs.snowflake.com/en/user-guide/snowflake-cortex/aisql
The main characteristics include:
- AI utilization with familiar SQL: Data analysts can perform advanced analysis like AI engineers without complex coding.
- High-speed and low-cost: Deeply integrated with Snowflake's engine, it's over 30% faster than traditional AI implementations and achieves up to 60% cost reduction.
-
Support for diverse data types: The new
FILE
data type enables seamless analysis combining unstructured data like text and images with structured data.
The following Snowflake Summit session report article is also very helpful for understanding Cortex AISQL:
https://zenn.dev/finatext/articles/snowflake-summit-2025-report-day3
Differences from Previously Available LLM Functions
At this point, you might think, "Wait, wasn't AI usage within SQL already provided through a feature set called LLM Functions?"
These LLM Functions have been integrated into the Cortex AISQL feature set with the release of Cortex AISQL. The figure below is from the Cortex AISQL documentation, which states "(including previous LLM functions)."
Cost
The cost structure is as shown in the figure below (credit consumption per million tokens is listed).
※Quoted from the following link:
https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf
What We'll Try
There's an official Quickstart available for trying out Cortex AISQL, so let's give it a try.
Prerequisites
Following the "2.Setup" section from the link below, let's prepare the environment.
- Launch a worksheet in your Snowflake account and execute all queries in setup.sql from top to bottom
- Download all image files from this folder and upload them to the stage
@DASH_DB.DASH_SCHEMA.DASH_IMAGE_FILES
created in step 1 - Launch a worksheet in your Snowflake account and execute all queries in images.sql from top to bottom
- When I tried it, there was
USE WAREHOUSE DASH_S;
which caused an error, so I changed it toUSE WAREHOUSE DASH_WH_S;
and executed it. - This query creates a table containing the
IMG_FILE
column ofFILE
type using theTO_FILE
function based on the screenshots uploaded to the stage.
- When I tried it, there was
- Download cortex_aisql.ipynb and create a Notebook by uploading it to your Snowflake account following the steps shown below
After the Notebook launches, execute the top cell that imports libraries.
Trying Cortex AISQL by Executing SQL in the Notebook
AI_COMPLETE Function
First, let's execute the following query in Identify Customer Issues Across Text and Image Data Formats
.
This query uses the AI_COMPLETE
function to perform the following tasks. It also uses the PROMPT
function to dynamically insert column values specified as arguments (using {0}
to specify which argument column to insert):
- WITH clause
IMAGE_INSIGHTS
: Summarizes the content of screenshot images in theIMAGES
table using AI. - WITH clause
EMAIL_INSIGHTS
: Summarizes the content of inquiry emails in theEMAILS
table using AI. - UNION and CREATE TABLE: Unions the above two results and finally saves them as a new table called
INSIGHTS
.
create table if not exists insights as
with IMAGE_INSIGHTS as (
select created_at,user_id,relative_path as ticket_id,img_file as input_file,file_url as input_file_url,
AI_COMPLETE('pixtral-large', prompt('Summarize this issue shown in this screenshot in one concise sentence: {0}', img_file)) as summary, summary as content
from images
),
EMAIL_INSIGHTS as (
select created_at,user_id,ticket_id::text as ticket_id,null as input_file,'' as input_file_url,content as content,
AI_COMPLETE('claude-3-7-sonnet', prompt('Summarize this issue in one concise sentence.
If the user mentioned anything related to music preference, please keep that information: {0}', content)) as summary
from emails
)
select 'Image' as source, created_at, user_id, ticket_id, input_file, input_file_url, content, summary
from IMAGE_INSIGHTS
union
select 'Email' as source, created_at, user_id, ticket_id, input_file, input_file_url, content, summary
from EMAIL_INSIGHTS;
After creating this table, executing the following SQL in the Notebook's Consolidated Data Across Text, Image, and Audio Data Formats
section shows summaries of the target screenshots and emails in the SUMMARY
column, and the original email text in the CONTENT
column. (Looking at the CTAS statement above, the image-related descriptions in the CONTENT
column output exactly the same content as the SUMMARY
column.)
AI_FILTER Function in JOIN Clauses
As background, the SOLUTION_CENTER_ARTICLES
table summarizes solutions to troubles in the SOLUTION
column as shown below.
The query in Semantically JOIN Issues with Solutions Library
uses the AI_FILTER
function to join and output results when the SOLUTION
column in the SOLUTION_CENTER_ARTICLES
table seems to match as a solution for the customer problems held in the CONTENT
column of the INSIGHTS
table created earlier using the AI_COMPLETE
function.
Multiple solution
columns may match in some cases, such as when the TICKET_ID
column is 177
in the figure below.
select
c.ticket_id,
c.content as "CUSTOMER ISSUE",
s.solution,
c.created_at,
from
INSIGHTS c
left join
SOLUTION_CENTER_ARTICLES s
on AI_FILTER(prompt('You are provided a customer issue and a solution center article. Please check if the solution article can address customer concerns. Reminder to check if the error details are matching. Customer issues: {0}; \n\nSolution: {1}', content, s.solution))
order by created_at asc;
Important note: When executing JOIN
operations using the AI_FILTER
function, the number of rows in each table within the JOIN
cannot exceed "500". (From the official docs) This makes sense, as JOINs with tens of thousands of rows would quickly result in exponentially large token counts...
To enable large-scale joins with AI_FILTER
function JOIN operations, contact your account manager to enable the adaptive optimization preview.
AI_AGG Function
In Aggregate Top Pain Points By Month
, Python code is written, but let's first look at just the SQL using the AI_AGG
function contained in this Python code.
The following SQL uses the AI_AGG
function to analyze the values in the column specified as the first argument with the instruction: "Analyze all the support ticket reviews and provide a comprehensive list of all issues mentioned. Format your response as a bulleted list of issues with their approximate frequency in percentage."
select
monthname(created_at) as month,
count(*) as total_tickets,
count(distinct user_id) as unique_users,
AI_AGG(summary,'Analyze all the support ticket reviews and provide a comprehensive list of all issues mentioned.
Format your response as a bulleted list of issues with their approximate frequency in percentage.') as top_issues_reported,
from (select * from insights order by random() limit 200)
group by month
order by total_tickets desc,month desc;
The Python code in the Notebook then visualizes these aggregation results using Streamlit. Although the target records are changed using the RANDOM
function, it's concerning that the output format varies slightly between the first and second executions, but this might be unavoidable due to the nature of generative AI...
AGGREGATE_PROMPT = """
Analyze all the support ticket reviews and provide a comprehensive list of all issues mentioned.
Format your response as a bulleted list of issues with their approximate frequency in percentage.
"""
sql = f"""
select
monthname(created_at) as month,
count(*) as total_tickets,
count(distinct user_id) as unique_users,
AI_AGG(summary,'{AGGREGATE_PROMPT}') as top_issues_reported,
from (select * from insights order by random() limit 200)
group by month
order by total_tickets desc,month desc
"""
df = session.sql(sql).to_pandas()
rows_to_display = 1
for row in df[:rows_to_display].itertuples():
st.subheader(f"Aggregated Insights for {row.MONTH}")
st.metric("Total Tickets",row.TOTAL_TICKETS)
st.metric("Unique Users", row.UNIQUE_USERS)
st.subheader("Top Issues")
st.markdown(row.TOP_ISSUES_REPORTED)
st.divider()
df_long = df.melt(id_vars='MONTH', value_vars=['TOTAL_TICKETS', 'UNIQUE_USERS'],
var_name='Metric', value_name='Total')
chart = alt.Chart(df_long).mark_bar().encode(
y=alt.Y('MONTH:N', sort='-x'),
x=alt.X('Total:Q'),
color=alt.Color('Metric:N', scale=alt.Scale(scheme='tableau10')),
tooltip=['MONTH', 'Metric', 'Total']
).properties(height=300)
st.altair_chart(chart, use_container_width=True)
- First execution
- Second execution
AI_CLASSIFY Function
In Classification
, the following query is described. This query uses both AI_FILTER
and AI_CLASSIFY
functions to perform the following:
- Use
AI_FILTER
function to extract only music-related emails from theINSIGHTS
table to create aFILTERED
table - Use
AI_CLASSIFY
function to classify which music genre from theMUSIC_GENRES
specification theFILTERED
table corresponds to
SET FILTER_PROMPT = '
I am trying to find if the customer has mentioned any music genre perference in their comment.
Does this comment mentioning specific music genre preference from the customer?: {0}';
SET CLASSIFY_PROMPT = 'Please help me classify the music preference mentioned in this comment: ';
SET MUSIC_GENRES = 'Electronic/Dance Music (EDM), Jazz, Indie/Folk, Rock, Classical, World Music, Blues, Pop';
create table if not exists filtered as
select * from
(select * from insights order by random() limit 500)
where AI_FILTER(prompt($FILTER_PROMPT, summary));
select
source,
summary,
AI_CLASSIFY($CLASSIFY_PROMPT || summary,SPLIT($MUSIC_GENRES, ','))['labels'][0] as classified_label
from filtered;
Additionally, visualization using Streamlit was also performed based on these classification results.
Conclusion
I tried out "Cortex AISQL," one of the new features announced at Snowflake Summit 2025, which enables multimodal data analysis of text, images, and other data types using SQL commands.
As a genuine impression, functions like AI_FILTER
and AI_AGG
use completely different syntax from traditional SQL! However, I also felt that Cortex AISQL would be very useful in cases where clear join conditions cannot be created for text data or image data.
Top comments (0)