0

I have two files: f1 contains user input and f2 is the database. I want to check whether each string from f1 is in the database (f2) and print the ones that do not exist in f2. My code is not working correctly. Here is f1:

rbs003491
rbs003499
rbs003531
rbs003539
rbs111111

Here is f2:

AHPTUR13,rbs003411 
AHPTUR13,rbs003419 
AHPTUR13,rbs003451 
AHPTUR13,rbs003459 
AHPTUR13,rbs003469 
AHPTUR13,rbs003471 
AHPTUR13,rbs003479 
AHPTUR13,rbs003491 
AHPTUR13,rbs003499 
AHPTUR13,rbs003531 
AHPTUR13,rbs003539 
AHPTUR13,rbs003541 
AHPTUR13,rbs003549 
AHPTUR13,rbs003581 

In this case it should return rbs111111, because it is not in f2. My code is:

with open(c,'r') as f1:
    s1 = set(x.strip() for x in f1)
    print s1
    with open("/tmp/ARNE/blt",'r') as f2:
        for line in f2:
            if line not in s1:
                print line
1
  • You could always feed the data from each file into a string and use difflib which is a built-in module. If your "database" is a sqlite or mysql database, then this probably won't work, but I'm guessing when you say database, you just mean a file containing data. Let me know if this assumption is incorrect. Commented Feb 16, 2015 at 16:00

3 Answers

1

If you only care about the second part of each line (rbs003411 from AHPTUR13,rbs003411):

with open(user_input_path) as f1, open('/tmp/ARNE/blt') as f2:
    not_found = set(f1.read().split())
    for line in f2:
        _, found = line.strip().split(',')
        not_found.discard(found)  # remove found word
    print not_found
    # for x in not_found:
    #     print x

4 Comments

It's not working; it prints the whole database f2 instead of only rbs111111, which is not in f2.
@user3319356, it prints exactly what you want. See the demo run: asciinema.org/a/16511
OK, can we print just rbs111111 instead of set(['rbs111111'])? rstrip() is not working.
@user3319356, loop over the result set: for x in not_found: print x. See the commented lines in the answer.
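
For reference, a minimal sketch of the difference between printing the set itself and printing its members. It uses Python 3 syntax (print as a function) rather than the Python 2 used in this thread, and the contents of `not_found` are made-up example values:

```python
not_found = {"rbs111111", "rbs222222"}  # hypothetical leftover IDs

# Printing the set shows its repr, braces and quotes included.
print(not_found)

# Printing each member gives one bare ID per line.
# sorted() is only there to make the output order stable.
lines = sorted(not_found)
for x in lines:
    print(x)
```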
0

Your line variable in the for loop will contain something like "AHPTUR13,rbs003411", but you are only interested in the second part. You should do something like:

for line in f2:
    line = line.strip().split(",")[1]
    if line not in s1:
        print line
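
To see why the original membership test fails, here is a small sketch (Python 3 syntax, unlike the Python 2 code in this thread; the sample values are copied from the question). A raw line from f2 still carries the prefix, the trailing space and the newline, so it can never equal an entry of s1:

```python
# Sample data copied from the question; in practice these come from the files.
s1 = {"rbs003491", "rbs003499", "rbs003531", "rbs003539", "rbs111111"}
raw_line = "AHPTUR13,rbs003491 \n"  # a line exactly as read from f2

# The whole line is compared, so the test is False even though the ID is in s1.
whole_line_found = raw_line in s1

# Strip whitespace and keep only the part after the comma before comparing.
code = raw_line.strip().split(",")[1]
code_found = code in s1

print(whole_line_found, code_found)
```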


0

You need to compare only the last part of each line in f2, not the whole line. Split each line from f2 on the comma and take the last part (x.strip().split(',')[-1]). Also, if you want to check whether the strings from f1 are in the database (f2), your logic is reversed: you need to build the set from f2 and test each line of f1 against it:

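with open(c,'r') as f1, open("/tmp/ARNE/blt",'r') as f2:
    # collect the part after the last comma of every database line
    s1 = set(x.strip().split(',')[-1] for x in f2)
    # print s1  # debug output, prints the whole database set
    for line in f1:
        # strip once so the newline breaks neither the lookup nor the output
        line = line.strip()
        if line not in s1:
            print line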
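
A self-contained sketch of this approach, assuming Python 3 and using io.StringIO objects in place of the real files so it can be run as-is. The helper name `missing_codes` is hypothetical, and the data is a shortened copy of the question's sample:

```python
import io


def missing_codes(user_file, db_file):
    """Return the entries of user_file that never appear after the comma in db_file."""
    # Build the lookup set from the database: keep the last comma-separated field.
    known = {line.strip().split(",")[-1] for line in db_file if line.strip()}
    # Walk the user input in order and keep what the database lacks.
    stripped = (line.strip() for line in user_file)
    return [code for code in stripped if code and code not in known]


# Stand-ins for f1 (user input) and f2 (database); note the trailing spaces in f2.
f1 = io.StringIO("rbs003491\nrbs003499\nrbs111111\n")
f2 = io.StringIO("AHPTUR13,rbs003491 \nAHPTUR13,rbs003499 \nAHPTUR13,rbs003531 \n")

result = missing_codes(f1, f2)
for code in result:
    print(code)
```

With real files, the objects returned by open() can be passed in place of the StringIO stand-ins.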
