How to return result of a SELECT inside a function in PostgreSQL?

Question

I have this function in PostgreSQL, but I don't know how to return the result of the query:

CREATE OR REPLACE FUNCTION wordFrequency(maxTokens INTEGER)
  RETURNS SETOF RECORD AS
$$
BEGIN
  SELECT text, count(*), 100 / maxTokens * count(*)
  FROM (
    SELECT text
    FROM token
    WHERE chartype = 'ALPHABETIC'
    LIMIT maxTokens
  ) AS tokens
  GROUP BY text
  ORDER BY count DESC
END
$$
LANGUAGE plpgsql;

I found that the return type should be SETOF RECORD, right? But I can't get the return command right.

What is the right way to do this?

Why do you count them? Do you have duplicate tokens in your token table? Also, please add the table definition to your question.
– wildplasser, Commented Oct 30, 2011 at 15:52
Is this your entire function? If you don't have any other statements in the function, you should just make it LANGUAGE SQL.
– jpmc26, Commented Oct 4, 2014 at 18:38

Erwin Brandstetter · Accepted Answer · 2022-05-20 21:34:37Z

Use RETURN QUERY:

CREATE OR REPLACE FUNCTION word_frequency(_max_tokens int)
  RETURNS TABLE (txt   text   -- also visible as OUT param in function body
               , cnt   bigint
               , ratio bigint)
  LANGUAGE plpgsql AS
$func$
BEGIN
   RETURN QUERY
   SELECT t.txt
        , count(*) AS cnt                 -- column alias only visible in this query
        , (count(*) * 100) / _max_tokens  -- I added parentheses
   FROM  (
      SELECT t.txt
      FROM   token t
      WHERE  t.chartype = 'ALPHABETIC'
      LIMIT  _max_tokens
      ) t
   GROUP  BY t.txt
   ORDER  BY cnt DESC;                    -- potential ambiguity 
END
$func$;

Call:

SELECT * FROM word_frequency(123);

Defining the return type explicitly is much more practical than returning a generic record. This way you don't have to provide a column definition list with every function call. RETURNS TABLE is one way to do that. There are others. Data types of OUT parameters have to match exactly what is returned by the query.
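For contrast, here is a minimal sketch of what the generic variant demands from the caller. The function name word_frequency_rec is hypothetical, and the table token is assumed to have columns txt (text) and chartype, as in the example above.

```sql
-- Hypothetical variant returning anonymous records instead of a declared table type.
CREATE OR REPLACE FUNCTION word_frequency_rec(_max_tokens int)
  RETURNS SETOF record
  LANGUAGE plpgsql AS
$func$
BEGIN
   RETURN QUERY
   SELECT t.txt, count(*), (count(*) * 100) / _max_tokens
   FROM  (
      SELECT tk.txt
      FROM   token tk
      WHERE  tk.chartype = 'ALPHABETIC'
      LIMIT  _max_tokens
      ) t
   GROUP  BY t.txt
   ORDER  BY 2 DESC;
END
$func$;

-- Every call has to spell out names and types in a column definition list:
SELECT * FROM word_frequency_rec(123) AS f(txt text, cnt bigint, ratio bigint);
```

With RETURNS TABLE the same information is declared once in the function header, so callers can write a plain SELECT * FROM word_frequency(123).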

Choose names for OUT parameters carefully. They are visible in the function body almost anywhere. Table-qualify columns of the same name to avoid conflicts or unexpected results. I did that for all columns in my example.

But note the potential naming conflict between the OUT parameter cnt and the column alias of the same name. In this particular case (RETURN QUERY SELECT ...) Postgres uses the column alias over the OUT parameter either way. This can be ambiguous in other contexts, though. There are various ways to avoid any confusion:

- Use the ordinal position of the item in the SELECT list: ORDER BY 2 DESC. Example:
  - Select first row in each GROUP BY group?
- Repeat the expression: ORDER BY count(*).
- (Not required here.) Set the configuration parameter plpgsql.variable_conflict or use the special command #variable_conflict error | use_variable | use_column in the function. See:
  - Naming conflict between function parameter and result of JOIN with USING clause
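A minimal sketch of the #variable_conflict option, for a case where it actually matters. The function name word_list is hypothetical, and the token table is assumed to have the columns txt and chartype used above. Here the column txt is deliberately left unqualified, which raises an ambiguity error under the default setting (error).

```sql
CREATE OR REPLACE FUNCTION word_list(_max_tokens int)
  RETURNS TABLE (txt text)
  LANGUAGE plpgsql AS
$func$
#variable_conflict use_column   -- resolve ambiguous names as table columns
BEGIN
   RETURN QUERY
   SELECT txt                   -- unqualified: clashes with the OUT parameter txt
   FROM   token
   WHERE  chartype = 'ALPHABETIC'
   LIMIT  _max_tokens;
END
$func$;
```

Table-qualifying the column (t.txt) avoids the conflict without any special setting, which is why the option is not required in the main example.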

Don't use "text" or "count" as column names. Both are legal to use in Postgres, but "count" is a reserved word in standard SQL and a basic function name, and "text" is a basic data type. This can lead to confusing errors. I use txt and cnt in my examples; you may want more explicit names.

Added a missing ; and corrected a syntax error in the header: (_max_tokens int), not (int maxTokens). The data type goes after the parameter name.

While working with integer division, it's better to multiply first and divide later, to minimize the rounding error. Or work with numeric or a floating point type. See below.
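A small standalone illustration of the order-of-operations point, using made-up numbers (7 matching rows out of 123 tokens):

```sql
SELECT 100 / 123 * 7               AS divide_first     -- 0: 100 / 123 truncates to 0 first
     , (7 * 100) / 123             AS multiply_first   -- 5: truncated only once, at the end
     , round((7 * 100) / 123.0, 2) AS as_numeric;      -- 5.69: numeric division keeps the fraction
```

The first form is what the query in the question computes, which is why its ratio collapses to 0 whenever maxTokens is greater than 100.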

Alternative

This is what I think your query should actually look like (calculating a relative share per token):

CREATE OR REPLACE FUNCTION word_frequency(_max_tokens int)
  RETURNS TABLE (txt            text
               , abs_cnt        bigint
               , relative_share numeric)
  LANGUAGE plpgsql AS
$func$
BEGIN
   RETURN QUERY
   SELECT t.txt, t.cnt
        , round((t.cnt * 100) / (sum(t.cnt) OVER ()), 2)  -- AS relative_share
   FROM  (
      SELECT t.txt, count(*) AS cnt
      FROM   token t
      WHERE  t.chartype = 'ALPHABETIC'
      GROUP  BY t.txt
      ORDER  BY cnt DESC
      LIMIT  _max_tokens
      ) t
   ORDER  BY t.cnt DESC;
END
$func$;

The expression sum(t.cnt) OVER () is a window function. You could use a CTE instead of the subquery. A CTE reads nicely, but a subquery is typically cheaper in simple cases like this one (mostly before Postgres 12, where a CTE was always materialized).
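For comparison, a sketch of the same function written with a CTE. The function name word_frequency_cte and the CTE name top_tokens are hypothetical; the logic is otherwise unchanged.

```sql
CREATE OR REPLACE FUNCTION word_frequency_cte(_max_tokens int)
  RETURNS TABLE (txt            text
               , abs_cnt        bigint
               , relative_share numeric)
  LANGUAGE plpgsql AS
$func$
BEGIN
   RETURN QUERY
   WITH top_tokens AS (
      SELECT t.txt, count(*) AS cnt
      FROM   token t
      WHERE  t.chartype = 'ALPHABETIC'
      GROUP  BY t.txt
      ORDER  BY cnt DESC
      LIMIT  _max_tokens
      )
   SELECT tt.txt, tt.cnt
        , round((tt.cnt * 100) / (sum(tt.cnt) OVER ()), 2)  -- share among the selected tokens
   FROM   top_tokens tt
   ORDER  BY tt.cnt DESC;
END
$func$;
```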

A final explicit RETURN statement is not required (but allowed) when working with OUT parameters or RETURNS TABLE (which makes implicit use of OUT parameters).
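A sketch of how this plays out with more than one result set: RETURN QUERY appends rows to the function's result and does not exit the function, so several of them can be stacked. The function name and the chartype value 'NUMERIC' are hypothetical.

```sql
CREATE OR REPLACE FUNCTION word_frequency_two_parts(_max_tokens int)
  RETURNS TABLE (txt text, cnt bigint)
  LANGUAGE plpgsql AS
$func$
BEGIN
   RETURN QUERY                      -- appends rows, execution continues
   SELECT t.txt, count(*)
   FROM   token t
   WHERE  t.chartype = 'ALPHABETIC'
   GROUP  BY t.txt
   LIMIT  _max_tokens;

   RETURN QUERY                      -- appends more rows to the same result
   SELECT t.txt, count(*)
   FROM   token t
   WHERE  t.chartype = 'NUMERIC'     -- assumed value, for illustration only
   GROUP  BY t.txt
   LIMIT  _max_tokens;

   RETURN;                           -- optional: the function also ends at END
END
$func$;
```

A single RETURN QUERY over a UNION ALL of both SELECTs would return the same rows; stacking is mainly useful when the parts are built conditionally.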

round() with two parameters only works for numeric types. count() in the subquery produces a bigint result and a sum() over this bigint produces a numeric result, thus we deal with a numeric number automatically and everything just falls into place.
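The type behavior described above can be checked directly; the literal values below are arbitrary.

```sql
SELECT pg_typeof(count(*)) AS count_type      -- bigint
     , pg_typeof(sum(v.n)) AS sum_type        -- numeric
FROM  (VALUES (2::bigint), (1::bigint)) v(n);

SELECT round(200 / 3::numeric, 2);            -- 66.67
-- SELECT round(200 / 3::float8, 2);          -- fails: there is no round(double precision, integer)
SELECT round((200 / 3::float8)::numeric, 2);  -- 66.67 after an explicit cast to numeric
```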

Many thanks for your answer and corrections. It is working fine now (I only changed the ratio type to numeric).
@RenatoDinhaniConceição Cool! I added a version that may or may not answer an additional question that you haven't actually asked. ;)
Nice, the only thing is I think you need a RETURN; before that END;, at least I did - but I'm doing a UNION so I'm not sure if that makes it different.
@yekta: I added some information concerning the role of RETURN. Fixed an unrelated error and added some improvements while at it.
What is the way to do this when you don't want to constrain what is in RETURNS TABLE(), i.e. something like RETURNS TABLE(*)?

Lee Goddard · Accepted Answer · 2022-08-29 07:20:08Z

Please see the following link for documentation:

https://www.postgresql.org/docs/current/xfunc-sql.html

Example:

    CREATE FUNCTION sum_n_product_with_tab (x int)
    RETURNS TABLE(sum int, product int) AS $$
        SELECT $1 + tab.y, $1 * tab.y FROM tab;
    $$ LANGUAGE SQL;
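Applied to the function from the question, a plain SQL version might look like the sketch below. The function name word_frequency_sql is hypothetical, the STABLE marker is an assumption (the function only reads data), and the token table is assumed to have the columns txt (text) and chartype. Referencing the parameter by name in an SQL function requires Postgres 9.2 or later.

```sql
CREATE OR REPLACE FUNCTION word_frequency_sql(_max_tokens int)
  RETURNS TABLE (txt text, cnt bigint, ratio bigint)
  LANGUAGE sql STABLE AS
$func$
SELECT t.txt, count(*), (count(*) * 100) / _max_tokens
FROM  (
   SELECT tk.txt
   FROM   token tk
   WHERE  tk.chartype = 'ALPHABETIC'
   LIMIT  _max_tokens
   ) t
GROUP  BY t.txt
ORDER  BY 2 DESC;   -- ordinal position sidesteps any name conflict
$func$;

SELECT * FROM word_frequency_sql(123);
```

The result of the last statement in the body is the function's result, so no RETURN QUERY is needed.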

answered Jun 26, 2019 at 7:09 by Moumita Das; edited Aug 29, 2022 at 7:20 by Lee Goddard

3 Comments

Peter Krauss Over a year ago

Yes, better to use "pure SQL" whenever you can. You can use two or more commands (SELECTs, INSERTs, etc.); only the last one provides the return value. A workaround for procedural, dependent step-by-step logic is to chain clauses in a WITH. For example: WITH t1 AS (SELECT etc1), t2 AS (SELECT etc2 FROM t1) SELECT result FROM t2;

Rafs Over a year ago

@PeterKrauss may I ask why? Are there any references recommending SQL over PLPGSQL?

Peter Krauss Over a year ago

Hi @Rafs, yes. It's not easy to find out exactly what PostgreSQL does nowadays, but it is the JIT optimizer that, for example, reuses SQL code in an SQL view: postgresql.org/docs/current/jit-reason.html

blackgreen · Accepted Answer · 2024-02-19 07:49:35Z

For example, create a person table and insert 2 rows into it:

CREATE TABLE person (
  id INT,
  name VARCHAR(20),
  age INT
);

INSERT INTO person (id, name, age) 
VALUES (1, 'John', 27), (2, 'David', 32);

Now, you can create my_func() with a RETURN QUERY statement as shown below:

CREATE FUNCTION my_func() RETURNS SETOF person AS $$
BEGIN
  RETURN QUERY SELECT * FROM person; -- Here
END;
$$ LANGUAGE plpgsql;

Then, calling my_func() returns 2 rows as shown below:

postgres=# SELECT * FROM my_func();
 id | name  | age
----+-------+-----
  1 | John  |  27
  2 | David |  32
(2 rows)

postgres=# SELECT my_func();
   my_func
--------------
 (1,John,27)
 (2,David,32)
(2 rows)
