DEV Community

Sagara
Sagara

Posted on

[Snowflake's New Feature]Cortex AISQL: Multimodal Data Analysis with SQL Commands

This article is an English translation of the Japanese version.

Hello, this is Sagara.

One of the new features announced at Snowflake Summit 2025 is "Cortex AISQL," which enables multimodal data analysis of text, images, and other data types using SQL commands.

https://www.snowflake.com/ja/blog/ai-sql-query-language/

I've tried out this Cortex AISQL feature, so I'd like to summarize my experience.

What is Cortex AISQL?

Cortex AISQL is a new feature that directly integrates AI capabilities into Snowflake's SQL engine, enabling analysis of diverse data types (multimodal data) such as text and images using familiar SQL commands.

https://docs.snowflake.com/en/user-guide/snowflake-cortex/aisql

The main characteristics include:

  • AI utilization with familiar SQL: Data analysts can perform advanced analysis like AI engineers without complex coding.
  • High-speed and low-cost: Deeply integrated with Snowflake's engine, it's over 30% faster than traditional AI implementations and achieves up to 60% cost reduction.
  • Support for diverse data types: The new FILE data type enables seamless analysis combining unstructured data like text and images with structured data.

The following Snowflake Summit session report article is also very helpful for understanding Cortex AISQL:

https://zenn.dev/finatext/articles/snowflake-summit-2025-report-day3

Differences from Previously Available LLM Functions

At this point, you might think, "Wait, wasn't AI usage within SQL already provided through a feature set called LLM Functions?"

These LLM Functions have been integrated into the Cortex AISQL feature set with the release of Cortex AISQL. The figure below is from the Cortex AISQL documentation, which states "(including previous LLM functions)."

2025-06-06_15h29_02

Cost

The cost structure is as shown in the figure below (credit consumption per million tokens is listed).

2025-06-06_15h49_14

※Quoted from the following link:

https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf

What We'll Try

There's an official Quickstart available for trying out Cortex AISQL, so let's give it a try.

https://quickstarts.snowflake.com/guide/getting-started-with-cortex-aisql/index.html?index=..%2F..index#0

Prerequisites

Following the "2.Setup" section from the link below, let's prepare the environment.

https://quickstarts.snowflake.com/guide/getting-started-with-cortex-aisql/index.html?index=..%2F..index#1

  1. Launch a worksheet in your Snowflake account and execute all queries in setup.sql from top to bottom
  2. Download all image files from this folder and upload them to the stage @DASH_DB.DASH_SCHEMA.DASH_IMAGE_FILES created in step 1
  3. Launch a worksheet in your Snowflake account and execute all queries in images.sql from top to bottom
    • When I tried it, there was USE WAREHOUSE DASH_S; which caused an error, so I changed it to USE WAREHOUSE DASH_WH_S; and executed it.
    • This query creates a table containing the IMG_FILE column of FILE type using the TO_FILE function based on the screenshots uploaded to the stage.

2025-06-06_16h18_53

  1. Download cortex_aisql.ipynb and create a Notebook by uploading it to your Snowflake account following the steps shown below

2025-06-06_16h01_34

2025-06-06_16h04_52

After the Notebook launches, execute the top cell that imports libraries.

2025-06-06_16h09_18

Trying Cortex AISQL by Executing SQL in the Notebook

AI_COMPLETE Function

First, let's execute the following query in Identify Customer Issues Across Text and Image Data Formats.

This query uses the AI_COMPLETE function to perform the following tasks. It also uses the PROMPT function to dynamically insert column values specified as arguments (using {0} to specify which argument column to insert):

  • WITH clause IMAGE_INSIGHTS: Summarizes the content of screenshot images in the IMAGES table using AI.
  • WITH clause EMAIL_INSIGHTS: Summarizes the content of inquiry emails in the EMAILS table using AI.
  • UNION and CREATE TABLE: Unions the above two results and finally saves them as a new table called INSIGHTS.
create table if not exists insights as
with IMAGE_INSIGHTS as (
    select created_at,user_id,relative_path as ticket_id,img_file as input_file,file_url as input_file_url,
        AI_COMPLETE('pixtral-large', prompt('Summarize this issue shown in this screenshot in one concise sentence: {0}', img_file)) as summary, summary as content
    from images
    ),
    EMAIL_INSIGHTS as (
    select created_at,user_id,ticket_id::text as ticket_id,null as input_file,'' as input_file_url,content as content,
        AI_COMPLETE('claude-3-7-sonnet', prompt('Summarize this issue in one concise sentence. 
If the user mentioned anything related to music preference, please keep that information: {0}', content)) as summary
    from emails
    )
select 'Image' as source, created_at, user_id, ticket_id, input_file, input_file_url, content, summary 
from IMAGE_INSIGHTS
    union
select 'Email' as source, created_at, user_id, ticket_id, input_file, input_file_url, content, summary 
from EMAIL_INSIGHTS;
Enter fullscreen mode Exit fullscreen mode

After creating this table, executing the following SQL in the Notebook's Consolidated Data Across Text, Image, and Audio Data Formats section shows summaries of the target screenshots and emails in the SUMMARY column, and the original email text in the CONTENT column. (Looking at the CTAS statement above, the image-related descriptions in the CONTENT column output exactly the same content as the SUMMARY column.)

2025-06-06_16h25_33

AI_FILTER Function in JOIN Clauses

As background, the SOLUTION_CENTER_ARTICLES table summarizes solutions to troubles in the SOLUTION column as shown below.

2025-06-06_17h52_31

The query in Semantically JOIN Issues with Solutions Library uses the AI_FILTER function to join and output results when the SOLUTION column in the SOLUTION_CENTER_ARTICLES table seems to match as a solution for the customer problems held in the CONTENT column of the INSIGHTS table created earlier using the AI_COMPLETE function.

Multiple solution columns may match in some cases, such as when the TICKET_ID column is 177 in the figure below.

select 
    c.ticket_id,
    c.content as "CUSTOMER ISSUE",
    s.solution,
    c.created_at,
from
    INSIGHTS c
left join
    SOLUTION_CENTER_ARTICLES s
on AI_FILTER(prompt('You are provided a customer issue and a solution center article. Please check if the solution article can address customer concerns. Reminder to check if the error details are matching. Customer issues: {0}; \n\nSolution: {1}', content, s.solution))
order by created_at asc;
Enter fullscreen mode Exit fullscreen mode

2025-06-06_18h03_14

Important note: When executing JOIN operations using the AI_FILTER function, the number of rows in each table within the JOIN cannot exceed "500". (From the official docs) This makes sense, as JOINs with tens of thousands of rows would quickly result in exponentially large token counts...

To enable large-scale joins with AI_FILTER function JOIN operations, contact your account manager to enable the adaptive optimization preview.

AI_AGG Function

In Aggregate Top Pain Points By Month, Python code is written, but let's first look at just the SQL using the AI_AGG function contained in this Python code.

The following SQL uses the AI_AGG function to analyze the values in the column specified as the first argument with the instruction: "Analyze all the support ticket reviews and provide a comprehensive list of all issues mentioned. Format your response as a bulleted list of issues with their approximate frequency in percentage."

select 
    monthname(created_at) as month, 
    count(*) as total_tickets,
    count(distinct user_id) as unique_users,
    AI_AGG(summary,'Analyze all the support ticket reviews and provide a comprehensive list of all issues mentioned. 
Format your response as a bulleted list of issues with their approximate frequency in percentage.') as top_issues_reported,
    from (select * from insights order by random() limit 200)
    group by month
    order by total_tickets desc,month desc;
Enter fullscreen mode Exit fullscreen mode

2025-06-06_18h26_05

The Python code in the Notebook then visualizes these aggregation results using Streamlit. Although the target records are changed using the RANDOM function, it's concerning that the output format varies slightly between the first and second executions, but this might be unavoidable due to the nature of generative AI...

AGGREGATE_PROMPT = """
Analyze all the support ticket reviews and provide a comprehensive list of all issues mentioned. 
Format your response as a bulleted list of issues with their approximate frequency in percentage.
"""

sql = f"""
select 
    monthname(created_at) as month, 
    count(*) as total_tickets,
    count(distinct user_id) as unique_users,
    AI_AGG(summary,'{AGGREGATE_PROMPT}') as top_issues_reported,
    from (select * from insights order by random() limit 200)
    group by month
    order by total_tickets desc,month desc
"""

df = session.sql(sql).to_pandas()
rows_to_display = 1
for row in df[:rows_to_display].itertuples():
    st.subheader(f"Aggregated Insights for {row.MONTH}")
    st.metric("Total Tickets",row.TOTAL_TICKETS)
    st.metric("Unique Users", row.UNIQUE_USERS)
    st.subheader("Top Issues")
    st.markdown(row.TOP_ISSUES_REPORTED)
    st.divider()

df_long = df.melt(id_vars='MONTH', value_vars=['TOTAL_TICKETS', 'UNIQUE_USERS'],
                  var_name='Metric', value_name='Total')

chart = alt.Chart(df_long).mark_bar().encode(
    y=alt.Y('MONTH:N', sort='-x'),
    x=alt.X('Total:Q'),
    color=alt.Color('Metric:N', scale=alt.Scale(scheme='tableau10')),
    tooltip=['MONTH', 'Metric', 'Total']
).properties(height=300)

st.altair_chart(chart, use_container_width=True)
Enter fullscreen mode Exit fullscreen mode
  • First execution

2025-06-06_18h21_33

  • Second execution

2025-06-06_18h27_28

AI_CLASSIFY Function

In Classification, the following query is described. This query uses both AI_FILTER and AI_CLASSIFY functions to perform the following:

  • Use AI_FILTER function to extract only music-related emails from the INSIGHTS table to create a FILTERED table
  • Use AI_CLASSIFY function to classify which music genre from the MUSIC_GENRES specification the FILTERED table corresponds to
SET FILTER_PROMPT = '
I am trying to find if the customer has mentioned any music genre perference in their comment. 
Does this comment mentioning specific music genre preference from the customer?: {0}';

SET CLASSIFY_PROMPT = 'Please help me classify the music preference mentioned in this comment: ';
SET MUSIC_GENRES = 'Electronic/Dance Music (EDM), Jazz, Indie/Folk, Rock, Classical, World Music, Blues, Pop';

create table if not exists filtered as 
select * from 
        (select * from insights order by random() limit 500)
    where AI_FILTER(prompt($FILTER_PROMPT, summary));

select 
    source, 
    summary,
    AI_CLASSIFY($CLASSIFY_PROMPT || summary,SPLIT($MUSIC_GENRES, ','))['labels'][0] as classified_label
from filtered;
Enter fullscreen mode Exit fullscreen mode

2025-06-06_18h39_51

Additionally, visualization using Streamlit was also performed based on these classification results.

2025-06-06_18h34_06

Conclusion

I tried out "Cortex AISQL," one of the new features announced at Snowflake Summit 2025, which enables multimodal data analysis of text, images, and other data types using SQL commands.

As a genuine impression, functions like AI_FILTER and AI_AGG use completely different syntax from traditional SQL! However, I also felt that Cortex AISQL would be very useful in cases where clear join conditions cannot be created for text data or image data.

Top comments (0)