I have a list in my application:

tags = ["nature", "science", "funny", "politics", "news"]

and I want to check whether all of these elements exist in the name column of my tags table. The idea is that a user cannot add any new tag that is not already in the system.

Currently, I am trying to run a FOREACH loop in my query:

DO $$
BEGIN
  FOREACH itag IN ARRAY {'nature', 'politics'} LOOP
    IF EXISTS (SELECT 1 FROM tags WHERE name = itag) THEN
      INSERT INTO TAGS (name, post_id) values itag, 'post01' ;
    ELSE
      RAISE EXCEPTION 'Tag % doesnt exists in table', itag;
    END IF;
  END LOOP;
END; $$

This gives the following error:

ERROR:  syntax error at or near "{"
LINE 3:   FOREACH itag SLICE 1 IN ARRAY {'nature', 'politics'} LOOP

I am not sure how to query for a list in Postgres. I don't want to do this in application code and fire multiple queries, as there can be a lot of elements in my list. I want the list checked against the table values in a single query.

Also, if possible, is there a way to optimize my query?


EDIT: I have used @a_horse_with_no_name's answer to come up with a flow similar to what I am looking for. If all tags are present in my table, I will add these entries; otherwise I will throw an exception.

DO $$
BEGIN
 IF (with to_check (itag) as (
   values ('nature'),('science'),('politics'),('scary')
  )
  select bool_and(exists (select * from tags t where t.name = tc.itag)) as all_tags_present
  from to_check tc) THEN
  RAISE INFO 'ALL GOOD. Here I will add the insert statement in my app';
ELSE
  RAISE EXCEPTION 'One or more tags are not present';
END IF;

END; $$

1 Answer

There is no need for PL/pgSQL or a loop. You can use a list of tags and check the existence for each of them in a single statement:

with to_check (itag) as (
   values ('nature'),('science'),('funny'),('politics'),('news')
)
select tc.itag, 
       exists (select * from tags t where t.name = tc.itag) as tag_exists
from to_check tc;

If you just want a single flag telling you whether all tags exist (it is false if at least one tag is missing), you can use the following:

with to_check (itag) as (
   values ('nature'),('science'),('funny'),('politics'),('news')
)
select bool_and(exists (select * from tags t where t.name = tc.itag)) as all_tags_present
from to_check tc;

bool_and will only return true if all values are true.
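
If the application would rather send the whole list as one array value and get back only the missing tags, something along these lines may work. This is a minimal sketch, assuming the same tags table with a name column; the alias tc and the output column name missing_tag are chosen here for illustration and are not from the question.

```sql
-- Expand the array into rows, then keep only the tags
-- that have no matching row in the tags table.
select tc.itag as missing_tag
from unnest(array['nature', 'science', 'funny', 'politics', 'news']) as tc(itag)
where not exists (select 1 from tags t where t.name = tc.itag);
```

An empty result means every tag in the list already exists. In application code the array constructor would typically be replaced by a single bound parameter of type text[].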


The error you get is because {'nature', 'politics'} is an invalid array literal. You either need to use the array constructor

array['nature', 'politics'] 

or a string literal that can be cast to an array

'{nature, politics}'::text[]

(note the curly brackets are inside of the string).

I prefer the array constructor as I don't have to worry about nesting string literals.
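
For reference, a version of the loop from the question that should parse might look like the sketch below. Besides the array constructor, it assumes two further changes that the error message does not mention: the loop variable itag has to be declared, and the block only performs the existence check (the insert is left out).

```sql
DO $$
DECLARE
  itag text;  -- the FOREACH target must be a declared variable
BEGIN
  FOREACH itag IN ARRAY array['nature', 'politics'] LOOP
    -- abort as soon as one tag from the list is unknown
    IF NOT EXISTS (SELECT 1 FROM tags WHERE name = itag) THEN
      RAISE EXCEPTION 'Tag % does not exist in table', itag;
    END IF;
  END LOOP;
END;
$$;
```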


"The idea is that user is not able to add any new tag which is not already in the system"

The correct solution to this problem is to have one table that contains the tag definitions and ensures that every tag name is only used once:

create table tag_definition
(
   name   varchar(50) primary key
);

Then in your tags table reference the tag_definition:

create table tags
(
   name     varchar(50) not null references tag_definition, 
   post_id  integer  not null references posts
);

Now it's impossible to insert a non-existing tag to the tags table.

All you need to do now is to catch the exception when you insert the rows. No need to check for the tags before inserting.
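
If the insert runs inside PL/pgSQL, catching the error could look roughly like this. It is only a sketch: it assumes the two tables defined above, that a row with id 1 exists in posts, and that 'scary' is not present in tag_definition. Most applications would catch the equivalent error (SQLSTATE 23503) in their database driver instead.

```sql
DO $$
BEGIN
  -- 'scary' is assumed to be missing from tag_definition,
  -- so the foreign key rejects the whole statement
  INSERT INTO tags (name, post_id)
  VALUES ('nature', 1), ('scary', 1);
EXCEPTION
  WHEN foreign_key_violation THEN
    RAISE EXCEPTION 'One or more tags are not defined in tag_definition';
END;
$$;
```

Because the statement fails as a whole, none of the rows are inserted, which matches the all-or-nothing behaviour asked for in the question.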

You could save space and make the tags table substantially smaller by using a generated primary key for the tag_definition table (e.g. a serial) and using that for the reference in the tags table.
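
A sketch of that variant, replacing the two definitions above, is shown below. The column names id and tag_id are assumptions made for illustration, and the posts table is assumed to exist with an integer primary key.

```sql
create table tag_definition
(
   id    serial primary key,
   name  varchar(50) not null unique  -- each tag name still occurs only once
);

create table tags
(
   tag_id   integer not null references tag_definition,
   post_id  integer not null references posts
);

-- Associate existing tags with post 1 by looking up their ids.
insert into tags (tag_id, post_id)
select td.id, 1
from tag_definition td
where td.name in ('nature', 'politics');
```

Note that with a lookup like this, a name missing from tag_definition is silently skipped rather than raising an error, so the number of inserted rows would have to be compared with the length of the list.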


Given the insert statement in your question, you could achieve the same with a single insert statement:

with to_check (itag) as (
   values ('nature'),('politics')
)
insert into tags (name, post_id)
select tc.itag, 'post01'
from to_check tc 
where not exists (select itag
                  from to_check
                  except
                  select t.name
                  from tags t);

That, however, won't scale very well as the tags table grows. The sub-select returns nothing if all tags from to_check exist, so the not exists condition is true and every row from to_check is inserted. If at least one tag doesn't exist, nothing will be inserted.

To make that (somewhat) efficient you will need an index on tags(name).
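
Such an index could be created as follows; the index name tags_name_idx is just a placeholder.

```sql
-- plain b-tree index to speed up lookups by tag name
create index tags_name_idx on tags (name);
```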


4 Comments

Regarding your comment on the tag modelling, I actually went with a 2-table solution (instead of a 3-table one) based on a Stack Overflow recommendation: stackoverflow.com/questions/20856/… I agree that 3 tables have their own benefits.
Apparently you have a one-table solution, otherwise you wouldn't need to check the tags table to see if the tag is already there. (By the way: how do you create new tags if you can only add tags that are already there and everything is stored in the same table?)
My tags table has the attributes name, postid, authorid, none of which are unique. It is more of a tags_post_association table, so the same tag name and combinations can appear multiple times in my table. Now, when I need to find the score of tags for a post (how many users tagged the post as such), I just need to get a unique count for a specific postid and authorid. The first instance of a tag name will be added by me directly in the database. I already have frontend checks to ensure that no new tags can be entered, but people are free to associate existing tags with posts.
And that first tag you insert manually is associated with which post_id? What happens if you get a thousand posts per minute? Will you check and verify new tags manually a thousand times per minute? Sounds like a really strange setup to me.
