How to check if a substring is in any array value in Postgres?

Question

So I have the following use case:

I have three varchar values, street_a, number_a and address_list, like this:

street_a = 'Main Street'
number_a = '4'
address_list = 'Lower Street 6;Main Street 3,4,5'

I would like to get TRUE when I check whether the combination of street_a and number_a (Main Street 4) is part of address_list, which it is. I hoped that the following would work:

SELECT 
    lower(REPLACE(street_a, ' ', '')) || '%,' || number_a || ',%' 
    LIKE ANY 
    (string_to_array(lower(REPLACE(address_list, ' ','')), ';'))

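However, LIKE only treats % as a wildcard in the pattern, which is its right-hand operand. With LIKE ANY (array), the array elements are the patterns and my concatenated string is the left-hand operand, so its % signs are just literal characters in the string being tested.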
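For reference, a minimal sketch of the operand order LIKE expects: unnest the list so that each entry is the left-hand string and the built pattern is on the right. The function name address_match_like is hypothetical, and the sketch assumes street_a contains no % or _ (they would act as wildcards). Appending a comma to each entry lets the last number match too, and a separate pattern covers the first number. The % after the street name still accepts any extra characters there, so this is looser than a regular expression.

```sql
-- Hypothetical helper: LIKE expands % and _ only in its right-hand operand,
-- so each list entry goes on the left and the built pattern on the right.
CREATE OR REPLACE FUNCTION address_match_like(street_a text, number_a text, address_list text)
RETURNS boolean LANGUAGE sql AS $$
  SELECT EXISTS (
    SELECT 1
    FROM unnest(string_to_array(lower(replace(address_list, ' ', '')), ';')) AS a(address)
    -- Append ',' to each entry so the last number is also followed by a comma.
    WHERE a.address || ',' LIKE lower(replace(street_a, ' ', '')) || number_a || ',%'          -- number comes first
       OR a.address || ',' LIKE lower(replace(street_a, ' ', '')) || '%,' || number_a || ',%'  -- number comes later
  );
$$;

SELECT address_match_like('Main Street', '4', 'Lower Street 6;Main Street 3,4,5');
```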

EDIT

And because of other cases like

address_list = 'Main Street 1,2;Lower Street 4'

I cannot do

SELECT 
    lower(REPLACE(address_list, ' ',''))
    LIKE
    lower(REPLACE(street_a, ' ', '')) || '%' || number_a || '%'
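To illustrate why this whole-list LIKE is not an option, here is the same expression evaluated with literal values for the problem case. The % after the street name is free to run past the ';' into the next entry, so the 4 of Lower Street satisfies the pattern.

```sql
-- Street 'Main Street', number '4', list 'Main Street 1,2;Lower Street 4'.
SELECT lower(replace('Main Street 1,2;Lower Street 4', ' ', ''))          -- 'mainstreet1,2;lowerstreet4'
       LIKE lower(replace('Main Street', ' ', '')) || '%' || '4' || '%'   -- pattern 'mainstreet%4%'
       AS false_positive;  -- true, although Main Street only lists 1 and 2
```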

Is there a way to return TRUE for my case here?

The underlying problem is storing lists as CSV fields. Store each address as its own row in its own table. – Schwern, commented Sep 10, 2021 at 0:38

NullPointerException · Accepted Answer · 2021-09-10 03:09:46Z

I'd say that this is not a good way to store addresses or to look them up. Addresses have a ton of complexities that this approach is not well suited to, and I'd advise against going this route at all. However, if you're forced to do so, I think a regular expression is a better fit than a LIKE expression.

Note: I'm not handling the possibility that the street or number contains characters that would need escaping in a regular expression, or many other scenarios, but this code "could" work under certain circumstances.

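CREATE OR REPLACE FUNCTION address_match(street_a text, number_a text, address_list text) RETURNS BOOLEAN LANGUAGE plpgsql AS
$$
declare
  -- street without spaces, then any earlier "n," numbers, then the wanted number,
  -- then either the end of the entry or a comma followed by the remaining numbers
  pattern text := replace(street_a, ' ', '') || '(([0-9]+,)*)' || number_a || '(,.*)?$';
  result bool;
begin
  -- remove spaces and split the list on ';' into one row per entry
  with a as (SELECT unnest(string_to_array(replace(address_list, ' ', ''), ';')) as address)
  -- true if any entry matches the pattern case-insensitively (~*)
  select exists(select a.address from a where a.address ~* pattern) into result;
  return result;
end;
$$;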

Some quick test scenarios ...

postgres=# select address_match('Main Street', '3', 'Lower Street 6;Main Street 3,4,5');
 address_match 
---------------
 t
(1 row)

postgres=# select address_match('Main Street', '4', 'Lower Street 6;Main Street 3,4,5');
 address_match 
---------------
 t
(1 row)

postgres=# select address_match('Main Street', '5', 'Lower Street 6;Main Street 3,4,5');
 address_match 
---------------
 t
(1 row)

postgres=# select address_match('maiN StreeT', '3', 'Lower Street 6;Main Street 3,4,5');
 address_match 
---------------
 t
(1 row)

postgres=# select address_match('Main Street', '33', 'Lower Street 6;Main Street 3,4,5');
 address_match 
---------------
 f
(1 row)

postgres=# select address_match('Main Street', '45', 'Lower Street 6;Main Street 3,4,5');
 address_match 
---------------
 f
(1 row)

postgres=# select address_match('maiN StreeT', '1', 'Lower Street 6;Main Street 3,4,5');
 address_match 
---------------
 f
(1 row)

postgres=# select address_match('Lower    StreeT', '6', 'Lower Street 6;Main Street 3,4,5');
 address_match 
---------------
 t
(1 row)

postgres=# select address_match('Lower    StreeT', '3', 'Lower Street 6;Main Street 3,4,5');
 address_match 
---------------
 f
(1 row)
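A sketch of a variant, under the hypothetical name address_match_escaped: the same pattern in a plain LANGUAGE sql function with no PL/pgSQL, plus two changes. Regex metacharacters in street_a and number_a are backslash-escaped, which covers the escaping caveat from the note, and the pattern is anchored with ^ so that a street such as 'Street' no longer matches inside 'MainStreet'. The escaping character class is an assumption about which characters matter, not an exhaustive treatment.

```sql
-- Hypothetical variant of address_match: plain SQL, escaped inputs, anchored start.
CREATE OR REPLACE FUNCTION address_match_escaped(street_a text, number_a text, address_list text)
RETURNS boolean LANGUAGE sql AS $$
  SELECT EXISTS (
    SELECT 1
    FROM unnest(string_to_array(replace(address_list, ' ', ''), ';')) AS a(address)
    WHERE a.address ~* (
      '^'  -- the street must start the entry
      -- prefix each of ] \ [ ( ) { } . * + ? ^ $ | with a backslash
      || regexp_replace(replace(street_a, ' ', ''), '([]\\[(){}.*+?^$|])', '\\\1', 'g')
      || '(([0-9]+,)*)'
      || regexp_replace(number_a, '([]\\[(){}.*+?^$|])', '\\\1', 'g')
      || '(,.*)?$'
    )
  );
$$;

SELECT address_match_escaped('St. Mary Road', '2', 'St. Mary Road 1,2');
```

Without the escaping, the '.' in 'St. Mary Road' would match any character, so an entry like 'StX Mary Road 2' would also count as a match.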

Thanks! That works great! And yes, I am forced into this strategy, as this is how the data currently arrives from the frontend (hopefully that will change soon!). I store my addresses more efficiently ;)
I just think that '(,.*)?$' should be '(.*)?$', because if I check address_match('Main Street', '5', 'Main Street 5;Lower Street 6') it would return f.
I added the comma to the expression so that it would not match something that merely begins with 5, like 50, 51, etc. The '?' makes that group optional, so even if there are no further numbers, I believe it should still work. That scenario is covered by one of the "Lower Street" tests I listed.
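To check the reasoning in this exchange, here is the pattern address_match builds for street 'Main Street' and number '5', applied to single entries with the spaces already stripped (as the function does). Because the trailing (,.*)? group is optional, an entry that ends with the number still matches, while requiring either a comma or the end of the entry after the number stops a match on 50.

```sql
-- Pattern from address_match('Main Street', '5', ...), tested entry by entry.
SELECT 'MainStreet5'     ~* 'MainStreet(([0-9]+,)*)5(,.*)?$' AS number_at_end,     -- true
       'MainStreet3,5,6' ~* 'MainStreet(([0-9]+,)*)5(,.*)?$' AS number_in_middle,  -- true
       'MainStreet50'    ~* 'MainStreet(([0-9]+,)*)5(,.*)?$' AS only_prefix_5;     -- false
```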

Schwern · Answer · 2021-09-10 00:46:58Z

0

address_list = 'Lower Street 6;Main Street 3,4,5'

It is very inefficient and complicated to store lists like this in SQL. Instead, store each address in its own row.

create table addresses (
  address text not null
);

insert into addresses values
  ('Lower Street 6'), ('Main Street 3'), ('Main Street 4'), ('Main Street 5');
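Since the list arrives in the 'street n,n;street n' form, one way to fill such a table without typing rows by hand is to split the string in SQL. This is a sketch assuming PostgreSQL 10 or later (for regexp_match) and entries of the form '<street> <n>[,<n>...]' with no spaces inside the number list; split_address_list is a hypothetical name, and entries that don't fit the format are silently skipped.

```sql
-- Hypothetical helper: 'Lower Street 6;Main Street 3,4,5' -> one row per address.
CREATE OR REPLACE FUNCTION split_address_list(address_list text)
RETURNS SETOF text LANGUAGE sql AS $$
  SELECT p.parts[1] || ' ' || n.num
  FROM unnest(string_to_array(address_list, ';')) AS e(entry),
       -- parts[1] = street, parts[2] = '3,4,5'; NULL (no rows) if the entry doesn't fit
       regexp_match(e.entry, '^(.*) ([0-9,]+)$') AS p(parts),
       unnest(string_to_array(p.parts[2], ',')) AS n(num);
$$;

SELECT * FROM split_address_list('Lower Street 6;Main Street 3,4,5');
```

With the table in place, `INSERT INTO addresses (address) SELECT * FROM split_address_list(...)` would load an incoming list in one statement.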

Now it's a simple select.

select 1
from addresses
where lower(address) = lower(street_a) || ' ' || number_a

You can index lower(address) so the lookup runs quickly, and making the index unique prevents duplicates that differ only in letter case.

create unique index lower_case_address ON addresses ((lower(address)));
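A quick self-contained check of the duplicate protection, using a temporary copy of the table: the second insert differs only in case, so lower(address) collides in the unique index and the insert fails with unique_violation, which the DO block catches so the script keeps running.

```sql
-- Temporary copy of the answer's table and index.
CREATE TEMP TABLE addresses (address text NOT NULL);
CREATE UNIQUE INDEX lower_case_address ON addresses ((lower(address)));
INSERT INTO addresses VALUES ('Main Street 4');

DO $$
BEGIN
  -- Same address in different case: lower(address) is already in the index.
  INSERT INTO addresses VALUES ('MAIN STREET 4');
  RAISE NOTICE 'inserted';
EXCEPTION WHEN unique_violation THEN
  RAISE NOTICE 'rejected as a duplicate';
END $$;

SELECT count(*) FROM addresses;  -- still 1
```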


3 Comments

JoeBe Over a year ago

Thanks for the reply. However, I am forced into this strategy: the address list is not stored in my database (I store addresses more efficiently); it is the format in which the data arrives from the frontend. For now, the reply by @NullPointerException solved my use case.

Schwern Over a year ago

@JoeBe What is the reason you're doing it in Postgres? This sort of work is easier to do in a regular programming language. If you edit the question with your full circumstances, you might get a better solution.

JoeBe Over a year ago

The reason is that this is the current state of our stack. So I think my question was formulated exactly the way I needed it, and the solution I got here was perfect for my specific use case.
