I have two data files made up of 4-line records. I need to extract the 4-line records from the second file whose first line partly matches the first line of a record in the first file.
Here is an example of input data:
input1.txt:

```
@abcde:134/1
JDOIJDEJAKJ
content1
content2
```

input2.txt:

```
@abcde:134/2
JKDJFLJSIEF
content3
content4
@abcde:135/2
KFJKDJFKLDJ
content5
content6
```

Here is what the output should look like:

output.txt:

```
@abcde:134/2
JKDJFLJSIEF
content3
content4
```
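In the example, the part of the first line that has to match appears to be everything before the final `/1` or `/2`. A minimal sketch of that comparison, assuming that interpretation (the helper name `record_key` is hypothetical):

```python
def record_key(header: str) -> str:
    """Return the part of a header line shared by both files.

    Assumption: the identifier is everything before the last '/', so
    '@abcde:134/1' and '@abcde:134/2' both map to '@abcde:134'.
    """
    # Drop the trailing newline first, then split off the '/1' or '/2' suffix.
    return header.rstrip("\n").rsplit("/", 1)[0]


# Headers taken from the example input1.txt and input2.txt above.
print(record_key("@abcde:134/1\n") == record_key("@abcde:134/2\n"))  # True
print(record_key("@abcde:134/1\n") == record_key("@abcde:135/2\n"))  # False
```

Comparing normalized keys avoids depending on exactly how many trailing characters to slice off with `line[:-1]` or `line[:-2]`.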
Here is my attempt at the code:
```python
import sys
filename1 = sys.argv[1] #input1.txt
filename2 = sys.argv[2] #input2.txt
F = open(filename1, 'r')
R = open(filename2, 'r')
def output(input1, input2):
    for line in input1:
        if "@" in line:
            for line2 in input2:
                if line[:-1] in line2:
                    for i in range(4):
                        print next(input2)
output = output(F, R)
write(output)
```
I get an invalid syntax error on the `next()` line, which I can't figure out. I would be grateful if someone could correct my code or give me tips on how to make this work.
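The invalid syntax error most likely comes from `print next(input2)`, which is the Python 2 `print` statement; under Python 3, `print` is a function and needs parentheses. A second, quieter issue: once `for line2 in input2` has yielded a header line, each `next(input2)` returns the line *after* it, so four calls skip the header and spill into the next record. A minimal Python 3 sketch of a corrected inner loop, using a hypothetical helper `print_matching_record` and an in-memory stand-in for input2.txt; it assumes every record is complete (otherwise `next()` raises `StopIteration`):

```python
import io


def print_matching_record(input2, prefix, out=None):
    """Print every 4-line record in input2 whose header starts with prefix.

    Hypothetical helper for illustration; assumes complete 4-line records.
    """
    for line2 in input2:
        if line2.startswith(prefix):
            # The for loop has already consumed the header, so print it first.
            print(line2, end="", file=out)
            # Each next() call advances the same iterator the for loop uses.
            for _ in range(3):
                print(next(input2), end="", file=out)


if __name__ == "__main__":
    # In-memory stand-in for input2.txt from the question.
    sample = io.StringIO(
        "@abcde:134/2\nJKDJFLJSIEF\ncontent3\ncontent4\n"
        "@abcde:135/2\nKFJKDJFKLDJ\ncontent5\ncontent6\n"
    )
    print_matching_record(sample, "@abcde:134/")
```

With the example data this prints the four lines of the `@abcde:134/2` record, header included.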
EDIT: OK, I think I have managed to implement the solutions proposed in the comments below (thank you). I am now running the code in a Terminal session connected over ssh to a remote Ubuntu server. Here is what the code looks like now (this time I am running Python 2.7):
```python
import sys
filename1 = sys.argv[1] #input file 1
filename2 = sys.argv[2] #input file 2 (some lines of which will be in the output)
F = open(filename1, 'r')
R = open(filename2, 'r')
def output(input1, input2):
    for line in input1:
        input2.seek(0)
        if "@" in line:
            for line2 in input2:
                if line[:-2] in line2:
                    for i in range(4):
                        out = next(input2)
                        print out
    return
output (F, R)
```
Then I run this command:

```
python fetch_reverse.py test1.fq test.fq > test2.fq
```
I don't get any warnings, but the output file is empty. What am I doing wrong?
`print` is a function and requires parentheses: `print(next(reverse))`. Note that this works even in Python 2.

`output()` doesn't `return` anything, and you then try to shadow its name by assigning the result of calling it to `output`. You will also need to store your results in some container, pass it back to the caller and rename the variable before this will work at all.

… `input1` once, but trying to loop over `input2` each time you hit a match; you'll read all of `input2` the first time `"@" in line` is true and then, as the file pointer is at the end of the file, will not read another line again. Your code needs to gather all matching `@` lines from `input1` first, then loop over `input2` searching for matches, instead.

`line` will include a newline character; `line[:-1]` is the same line without the newline, but that last digit is still going to be present.

… `return` but will try; I corrected the name shadowing.
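The two-pass approach described in the comments (gather the keys from `input1` first, then scan `input2`) can be sketched as follows. This is a minimal Python 3 sketch, not the commenters' code; it assumes every record is exactly four lines and that the shared identifier is everything before the last `/` in the header. The function names (`record_key`, `read_records`, `matching_records`, `main`) are hypothetical. Reading in strict 4-line steps, rather than testing `"@" in line`, avoids mistaking a non-header line that happens to contain `@` for a header, and collecting the keys into a set means `input2` is read only once instead of being rewound with `seek(0)` for every line of `input1`.

```python
import sys
from typing import Iterable, Iterator, List, Set, Tuple

Record = Tuple[str, str, str, str]


def record_key(header: str) -> str:
    """Identifier shared by both files: the header minus '/1' or '/2'.

    Assumes headers look like '@abcde:134/1'.
    """
    return header.rstrip("\n").rsplit("/", 1)[0]


def read_records(lines: Iterable[str]) -> Iterator[Record]:
    """Group lines into consecutive 4-line records.

    zip() pulls four lines at a time from one shared iterator; a trailing
    partial record (fewer than 4 lines) is silently dropped.
    """
    it = iter(lines)
    return zip(it, it, it, it)


def matching_records(input1: Iterable[str], input2: Iterable[str]) -> List[str]:
    """Return the lines of every input2 record whose key appears in input1."""
    # Pass 1: read input1 once and collect all header keys into a set.
    keys: Set[str] = {record_key(rec[0]) for rec in read_records(input1)}
    # Pass 2: read input2 once; keep whole records whose header key matches.
    result: List[str] = []
    for rec in read_records(input2):
        if record_key(rec[0]) in keys:
            result.extend(rec)
    return result


def main(argv: List[str]) -> None:
    if len(argv) != 2:
        print("usage: fetch_reverse.py INPUT1 INPUT2", file=sys.stderr)
        return
    with open(argv[0]) as f1, open(argv[1]) as f2:
        sys.stdout.writelines(matching_records(f1, f2))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run as `python fetch_reverse.py input1.txt input2.txt > output.txt`, this should write only the `@abcde:134/2` record for the example data. The type hints require Python 3; under Python 2.7 they would have to be removed.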