python get difference from arrays

Question

I have the following two arrays. I am trying to check whether the elements in invalid_id_arr exist in valid_id_arr; if an element doesn't exist, it should go into the diff array. But with the code below, diff contains ['id123', 'id124', 'id125', 'id126', 'id789', 'id666'], whereas I expect the output to be ["id789","id666"]. What am I doing wrong here?

tag_file= {}
tag_file['invalid_id_arr']=["id123-3431","id124-4341","id125-4341","id126-1w","id789-123","id666"] 
tag_file['valid_id_arr']=["id123-12345","id124-1122","id125-13232","id126-12332","id1new","idagain"] 
diff = [ele.split('-')[0] for ele in tag_file['invalid_id_arr'] if str(ele.split('-')[0]) not in tag_file['valid_id_arr']]

Current Output:

 ['id123', 'id124', 'id125', 'id126', 'id789', 'id666']

Expected output:

 ["id789","id666"]

Check out sets: if you clean your data, you can do set(a).difference(set(b)).
– monkut, Commented Jun 13, 2012 at 8:37

Rodrigo Queiro · Accepted Answer · 2012-06-13 08:33:18Z

4

Using a set is more efficient, but your main problem is that you weren't removing the part after the hyphen from the elements in valid_id_arr, so the bare prefixes never matched anything.

invalid_id_arr=["id123-3431","id124-4341","id125-4341","id126-1w","id789-123","id666"] 
valid_id_arr=["id123-12345","id124-1122","id125-13232","id126-12332","id1new","idagain"]
valid_id_set = set(ele.split('-')[0] for ele in valid_id_arr)
diff = [ele for ele in invalid_id_arr if ele.split('-')[0] not in valid_id_set]
print(diff)

output:

['id789-123', 'id666']

http://ideone.com/Q9JBw

answered Jun 13, 2012 at 8:33

community wiki

Rodrigo Queiro


3 Comments

Rajeev Over a year ago

I thought the Python interpreter would handle it... id1 in valid_arr >> True

Rodrigo Queiro Over a year ago

No, (x in arr) is True if and only if x == arr[i] for some i. See the first line of the table in this section: docs.python.org/library/…
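
To illustrate the point in this comment, here is a minimal sketch using the list from the question. It only shows that `in` on a list compares whole elements with `==` and does no prefix or substring matching; the variable names `prefix_found`, `full_found` and `valid_prefixes` are made up for the illustration.

```python
valid_id_arr = ["id123-12345", "id124-1122", "id125-13232",
                "id126-12332", "id1new", "idagain"]

# `in` on a list compares each element with ==, so a bare prefix never
# matches an element that still carries its "-suffix".
prefix_found = "id123" in valid_id_arr
full_found = "id123-12345" in valid_id_arr

# Stripping the suffix first makes the prefixes comparable.
valid_prefixes = [ele.split('-')[0] for ele in valid_id_arr]
prefix_found_after_split = "id123" in valid_prefixes

print(prefix_found, full_found, prefix_found_after_split)  # False True True
```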

Rodrigo Queiro Over a year ago

Yes - you could replace set(...) with list(...) in my code, but it will be slower if the lists are large. x in list(...) is O(n), but x in set(...) is O(1) on average, which is faster. However, your example was evaluating "id1" in ["id1-1", "id1-2"], which is False, because the stuff after the hyphen hadn't been removed.
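
For anyone who wants to see the list-versus-set difference rather than take it on faith, a rough sketch with `timeit` follows. The 10,000 ids are made-up data, and the timings will vary by machine, so treat the numbers as indicative only.

```python
import timeit

# Hypothetical data: 10,000 made-up ids, just to make the gap visible.
ids_list = ["id%d" % i for i in range(10000)]
ids_set = set(ids_list)

needle = "id9999"  # last element: worst case for the linear list scan

# Both containers give the same answer; only the lookup cost differs.
same_answer = (needle in ids_list) == (needle in ids_set)

list_time = timeit.timeit(lambda: needle in ids_list, number=200)
set_time = timeit.timeit(lambda: needle in ids_set, number=200)
print("list: %.6fs  set: %.6fs" % (list_time, set_time))
```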

georg · Accepted Answer · 2012-06-13 08:51:19Z

3

Try sets:

invalid_id_arr = ["id123-3431","id124-4341","id125-4341","id126-1w","id789-123","id666"] 
valid_id_arr = ["id123-12345","id124-1122","id125-13232","id126-12332","id1new","idagain"] 

set_invalid = set(x.split('-')[0] for x in invalid_id_arr)
print(set_invalid.difference(x.split('-')[0] for x in valid_id_arr))
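
A variant of the same idea, sketched under the assumption that only the prefixes matter: a set difference drops duplicates and does not preserve the order of the original list, so the result is sorted here to make it predictable. The helper `prefix` is a hypothetical name introduced for the example.

```python
invalid_id_arr = ["id123-3431", "id124-4341", "id125-4341",
                  "id126-1w", "id789-123", "id666"]
valid_id_arr = ["id123-12345", "id124-1122", "id125-13232",
                "id126-12332", "id1new", "idagain"]


def prefix(item):
    """Return the part of an id before the first hyphen (hypothetical helper)."""
    return item.split('-')[0]


invalid_prefixes = set(prefix(x) for x in invalid_id_arr)
valid_prefixes = set(prefix(x) for x in valid_id_arr)

# Sets are unordered, so sort the difference if a stable order matters.
diff = sorted(invalid_prefixes - valid_prefixes)
print(diff)  # ['id666', 'id789']
```

If the original order of invalid_id_arr has to be kept, a list comprehension that tests membership against `valid_prefixes` is the safer choice.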

edited Jun 13, 2012 at 8:51

answered Jun 13, 2012 at 8:35

georg


shiva · Accepted Answer · 2012-06-13 08:53:28Z

0

    >>> a = ["id123-3431","id124-4341","id125-4341","id126-1w","id789-123","id666"]
    >>> b = ["id123-12345","id124-1122","id125-13232","id126-12332","id1new","idagain"]
    >>> # Build a list, not a generator: a generator is used up by the first failed lookup
    >>> c = [s.split('-')[0] for s in b]
    >>> [ele.split('-')[0] for ele in a if ele.split('-')[0] not in c]
    ['id789', 'id666']
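
One thing to watch with this approach: the prefixes of b should be held in a list or set, not in a generator expression. A generator can be iterated only once, and every `in` test consumes it up to the first match, or entirely if there is no match. The sketch below shows the failure on a reordered input; `a_reordered`, `gen`, `wrong` and `right` are made-up names for the illustration.

```python
b = ["id123-12345", "id124-1122", "id125-13232",
     "id126-12332", "id1new", "idagain"]

# Hypothetical input where the unmatched id comes first.
a_reordered = ["id789-123", "id123-3431"]

gen = (s.split('-')[0] for s in b)
# Looking for "id789" runs the generator to the end without a match;
# the later lookup of "id123" then sees an empty generator.
wrong = [e.split('-')[0] for e in a_reordered if e.split('-')[0] not in gen]

# A set can be searched any number of times.
prefixes = set(s.split('-')[0] for s in b)
right = [e.split('-')[0] for e in a_reordered if e.split('-')[0] not in prefixes]

print(wrong)  # ['id789', 'id123']
print(right)  # ['id789']
```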

edited Jun 13, 2012 at 8:53

answered Jun 13, 2012 at 8:39

shiva


2 Comments

Rajeev Over a year ago

I was trying to avoid two for loops

kaitian521 Over a year ago

O(n) on average using a set, O(n * n) using your method, but your solution works very well!
