
I have a file with the following contents:

1 5 9 14 15  
00000
10000
00010
11010
00010

I want to parse the file and produce the following output:

UUUUUUUUUUUUUU
YUUUUUUUUUUUUU
UUUUUUUUUUUUYY
YUUUYUUUUUUUYU
UUUUUUUUUUUUYU

This means the first row provides the positions. In the rows below, a 0 becomes U and a 1 becomes Y. Columns that are not listed in the first row are unmapped and count as 0, so they are U in every row; for example, the columns between the first two positions (1 and 5) are U in all rows.

I tried the following in Python:

#!/usr/bin/env python2
import sys
with open(sys.argv[1]) as f:
    f.readline()
    for line in f:
        new = ''
        for char in line.rstrip():
            if char == '0':
                new += 'UU'
            elif char == '1':
                new +='YU'
        print new.rstrip()[:-1]

The problem is that this script only works if the positions are 2 apart, but the gaps can also be larger. How can I extend the script?

There is a problem when I run the code from Delimity: I get the error shown below.

The real data is here: dropbox.com/s/cf8rbv20bgyvssq/conv_inp?dl=0

Traceback (most recent call last):
  File "./con.py", line 8, in <module>
    for v in xrange(max(positions) + 1):
OverflowError: long int too large to convert to int
  • No ideas? Any would be great. Commented Jun 23, 2015 at 13:11
  • I made some edits to the question thinking I'd understand it better, but I still have no idea what the first line (1 5 9 14 15) is for. Commented Jun 23, 2015 at 13:21
  • Is the first line the column indices of a column-sparse matrix? Commented Jun 23, 2015 at 13:24
  • Is this a binary file or a text file containing numbers? Commented Jun 23, 2015 at 13:25
  • Are you sure that 00010 is UUUUUUUUUUUUYY? It should be UUUUUUUUUUUUYU! Commented Jun 23, 2015 at 13:28

2 Answers

Just a guess.

Implement the converter:

def convert(s):
    return "UUU".join({"0": "U", "1": "Y"}[c] for c in s[:-1]) + "U"

And test it:

assert convert("00000") == "UUUUUUUUUUUUUU"
assert convert("10000") == "YUUUUUUUUUUUUU"
assert convert("00010") == "UUUUUUUUUUUUYU"
assert convert("11010") == "YUUUYUUUUUUUYU"
assert convert("00010") == "UUUUUUUUUUUUYU"

4 Comments

How could I write the recoded output to a file without the quotation marks (")?
The quotation marks are only in the source code. When you write a string to a file, the quotation marks are not written.
How could I just use this as a file to execute?
You can use your original code, reduced like this: for line in f: print convert(line.rstrip()). You may need to skip the first input line, but you can solve that yourself.
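The reduced loop described in the last comment could be sketched as follows. This is only an illustration, written for Python 3 rather than the Python 2 used in the question. The helper name `convert_lines` is hypothetical, and `convert` keeps the answer's assumption of a fixed spacing of four columns, so it does not use the positions in the header line.

```python
def convert(s):
    # Same mapping as in the answer: every digit except the last is
    # followed by three unmapped columns, the last digit by none.
    return "UUU".join({"0": "U", "1": "Y"}[c] for c in s[:-1]) + "U"


def convert_lines(lines):
    """Yield one converted row per data line (hypothetical helper)."""
    lines = iter(lines)
    next(lines, None)          # skip the header line with the positions
    for line in lines:
        row = line.rstrip()
        if row:                # ignore blank lines
            yield convert(row)
```

With a file, this could be used as `for out in convert_lines(open(path)): print(out)`; redirecting the script's output to a file writes the rows without any quotation marks.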

Check this code:

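#!/usr/bin/env python2
import sys

# Generator replacement for xrange, which cannot handle values
# larger than a C long in Python 2.
def myxrange(to):
    x = 0
    while x < to:
        yield x
        x += 1

with open(sys.argv[1]) as f:
    # The header holds 1-based positions; convert them to 0-based indices.
    positions = map(lambda x: long(x) - 1, f.readline().split())
    max_pos = max(positions)
    for line in f:
        new = ''
        for i in myxrange(max_pos + 1):
            # A mapped column whose digit is 1 becomes Y; everything else is U.
            if i in positions and line[positions.index(i)] == '1':
                new += 'Y'
            else:
                new += 'U'
        print new.rstrip()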
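The loop above scans the `positions` list once for every output column, which gets expensive on large inputs. A possible faster variant is sketched here: it precomputes the number of unmapped columns in front of each position once and then builds every row from those gaps. It is written for Python 3, where integers do not overflow. The names `make_expander` and `expand_lines` are hypothetical, and the sketch assumes the header positions are 1-based and strictly increasing. For the sample header it is intended to produce the same 15-column rows as the code above.

```python
def make_expander(positions):
    """Return a function that expands one 0/1 row to full width.

    positions: 1-based, strictly increasing column indices (assumed).
    """
    # Number of unmapped columns in front of each mapped column.
    gaps = []
    previous = 0
    for pos in positions:
        gaps.append(pos - previous - 1)
        previous = pos

    def expand(row):
        parts = []
        for gap, char in zip(gaps, row):
            parts.append("U" * gap)                    # unmapped columns
            parts.append("Y" if char == "1" else "U")  # mapped column
        return "".join(parts)

    return expand


def expand_lines(lines):
    """Yield expanded rows; the first line holds the positions."""
    lines = iter(lines)
    header = next(lines, "")
    expand = make_expander([int(p) for p in header.split()])
    for line in lines:
        row = line.strip()
        if row:                # ignore blank lines
            yield expand(row)
```

Each output row is still as wide as the largest position, so very large positions mean very long rows whichever way they are built.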

3 Comments

The code works for the example but not for my real file. The real file is here; do you think you could have a quick look? dropbox.com/s/cf8rbv20bgyvssq/conv_inp?dl=0
The error is: Traceback (most recent call last): File "./con.py", line 8, in <module> for v in xrange(max(positions) + 1): OverflowError: long int too large to convert to int
Is there a way to make it faster? It is rather slow on the input, the file of more than 120 MB linked in the description.
