1

I want to write the contents of lists on a text file. The way I currently do it is the following:

for k in li:
    outp.write(str(k))
out.write('\n')

however, this is taking a long time to run (I have tens of millions of lines)

Is there any faster way?

Sample Lines

The lists are from a sparse matrix so there are plenty of zeros. The non null elements are ones only

eg of such a list: [0 0 0 0 0 0 0 1 0 0 1 1 1 0 0]

The numbers are separated by tabs

8
  • 2
    Does it have to be human readable? i.e. What uses the file afterwards? Commented Feb 20, 2014 at 9:34
  • @doctorlove yes in a text file read Commented Feb 20, 2014 at 9:38
  • Please post a couple of sample lines in your lists. We might be able to exploit some properties of the data Commented Feb 20, 2014 at 9:38
  • What does "in a text file read " mean? If a machine reads it, binary (eg pickle) might be quicker. And then you don't need to make it a string first. Commented Feb 20, 2014 at 9:40
  • 1
    @bigTree Then maybe this entry could be interesting to you: Speed up writing to files. Writing to the disk is slow. There is not a golden rule you can use to speed it up. But maybe you find some optimizations there... Commented Feb 20, 2014 at 9:46

2 Answers 2

1

I wonder if storing the coordinates of the 1s would yield desired speedups, as you mention that this is a sparse matrix. Consider the following:

Write to file:

def writeMatrix(matrix):
    ones = [[i for i,num in row if num==1] for row in matrix]
    with open('path/to/file', 'w') as outfile:
        for row in ones:
            outfile.write(' '.join(str(i) for i in row))
            outfile.write('\n')

Read from file:

def readMatrix(infilepath, width):
    answer = []
    with open(infilepath) as infile:
        for line in infile:
            row = [None]*width
            for i in set(int(i) for i in line.split()):
                row[i] = 1
            answer.append(row)
    return answer
Sign up to request clarification or add additional context in comments.

Comments

1

You should be able to get a pretty decent speed up by only writing once and leaving out all of those manual casts:

out.write('\n'.join(li))

There's probably further optimizations, but really you should look at a binary-based filed format for something where you have such a big array.

Specifically marshal seems like a great fit, but there are faster things out there.

1 Comment

it output an error saying "Type Error: sequence item 0: expected str instance, int found"

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.