1

I have the following two arrays. I am trying to check whether the elements in invalid_id_arr exist in valid_id_arr; any that do not exist should go into the diff array. But with the code below, diff is ['id123', 'id124', 'id125', 'id126', 'id789', 'id666'], whereas I expect ["id789","id666"]. What am I doing wrong here?

tag_file= {}
tag_file['invalid_id_arr']=["id123-3431","id124-4341","id125-4341","id126-1w","id789-123","id666"] 
tag_file['valid_id_arr']=["id123-12345","id124-1122","id125-13232","id126-12332","id1new","idagain"] 
diff = [ele.split('-')[0] for ele in tag_file['invalid_id_arr'] if str(ele.split('-')[0]) not in tag_file['valid_id_arr']]

Current Output:

 ['id123', 'id124', 'id125', 'id126', 'id789', 'id666']

Expected output:

 ["id789","id666"]
2
  • Do you only want to check the value just after 'id'? Commented Jun 13, 2012 at 8:34
  • 1
    Check out sets: if you clean your data first, you can do set(a).difference(set(b)). Commented Jun 13, 2012 at 8:37

3 Answers

4

Using a set is more efficient, but your main problem is that you weren't removing the part after the hyphen from the elements in valid_id_arr, so the bare prefixes never matched.

invalid_id_arr=["id123-3431","id124-4341","id125-4341","id126-1w","id789-123","id666"] 
valid_id_arr=["id123-12345","id124-1122","id125-13232","id126-12332","id1new","idagain"]
valid_id_set = set(ele.split('-')[0] for ele in valid_id_arr)
diff = [ele for ele in invalid_id_arr if ele.split('-')[0] not in valid_id_set]
print(diff)

output:

['id789-123', 'id666']

http://ideone.com/Q9JBw


3 Comments

I thought the Python interpreter would handle it, i.e. that id1 in valid_arr would give True.
No, (x in arr) is True if and only if x == arr[i] for some i. See the first line of the table in this section: docs.python.org/library/…
Yes - you could replace set(...) with list(...) in my code, but it will be slower if the lists are large. x in list(...) is O(n) but x in set(...) is O(1) on average, which is faster. However, your example was doing "id1" in ["id1-1", "id1-2"], which is False because the stuff after the hyphen hadn't been removed.
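
To make the point in these comments concrete, here is a small sketch (the variable names are only for illustration, and the list is shortened) showing that in on a list compares whole elements, while stripping the suffix first gives the match the question expected:

```python
valid_id_arr = ["id123-12345", "id124-1122", "id1new"]

# 'in' on a list tests equality against each whole element, not substrings.
exact_match = "id123" in valid_id_arr        # no element equals "id123"
full_match = "id123-12345" in valid_id_arr   # this exact string is present

# Strip the part after the hyphen once, then test against the prefixes.
valid_prefixes = set(ele.split('-')[0] for ele in valid_id_arr)
prefix_match = "id123" in valid_prefixes

print(exact_match, full_match, prefix_match)  # False True True
```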
3

Try sets:

invalid_id_arr = ["id123-3431","id124-4341","id125-4341","id126-1w","id789-123","id666"] 
valid_id_arr = ["id123-12345","id124-1122","id125-13232","id126-12332","id1new","idagain"] 

set_invalid = set(x.split('-')[0] for x in invalid_id_arr)
print(set_invalid.difference(x.split('-')[0] for x in valid_id_arr))
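
One caveat with a plain set difference is that a set has no guaranteed order, so the result may not come back in the order of invalid_id_arr. If order matters, something along these lines might work; ordered_prefix_diff is a hypothetical helper name, not something from the question:

```python
invalid_id_arr = ["id123-3431", "id124-4341", "id125-4341", "id126-1w", "id789-123", "id666"]
valid_id_arr = ["id123-12345", "id124-1122", "id125-13232", "id126-12332", "id1new", "idagain"]

def ordered_prefix_diff(candidates, known):
    """Return prefixes of candidates that are absent from known, in first-seen order."""
    known_prefixes = set(x.split('-')[0] for x in known)
    seen = set()      # prefixes already added, to skip duplicates
    result = []
    for item in candidates:
        prefix = item.split('-')[0]
        if prefix not in known_prefixes and prefix not in seen:
            seen.add(prefix)
            result.append(prefix)
    return result

diff = ordered_prefix_diff(invalid_id_arr, valid_id_arr)
print(diff)  # ['id789', 'id666']
```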

Comments

0
    >>> a = ["id123-3431","id124-4341","id125-4341","id126-1w","id789-123","id666"]
    >>> b = ["id123-12345","id124-1122","id125-13232","id126-12332","id1new","idagain"]
    >>> c = [s.split('-')[0] for s in b]  # a list, so it can be searched repeatedly
    >>> [ele.split('-')[0] for ele in a if ele.split('-')[0] not in c]
    ['id789', 'id666']
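
Note that it matters whether c is a list or a generator expression such as (s.split('-')[0] for s in b). A membership test on a generator consumes it, so repeated tests only give the right answer if the items happen to arrive in a favourable order. A minimal sketch with shortened, illustrative data:

```python
b = ["id123-12345", "id124-1122"]

gen = (s.split('-')[0] for s in b)   # generator: can be walked only once
lst = [s.split('-')[0] for s in b]   # list: can be searched repeatedly

first_gen = "id124" in gen    # True, but consumes gen up to and including "id124"
second_gen = "id123" in gen   # False: gen is already exhausted
first_lst = "id124" in lst    # True
second_lst = "id123" in lst   # True

print(first_gen, second_gen, first_lst, second_lst)  # True False True True
```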

2 Comments

I was trying to avoid two for loops
O(n) on average using a set, O(n * n) using your method, but your solution can work very well!
