
I am working with output from big models, which means I have a lot of large ASCII files with two float columns (let's say X and Y). However, reading these files takes a long time, so I thought converting them to binary files might make the reading process much faster.

I converted my ASCII files into binary files using the uu.encode(ascii_file, binary_file) command, and it worked quite well (actually, I tested the decode part and recovered the same files).

My question is: is there any way to read the binary files directly into Python and get the data into two variables (x and y)?

Thanks!

2 Comments

  • If you intend to use the files entirely from Python, pickle the model data with cPickle instead, as that is a faster way to save and load data in Python. Commented Sep 29, 2012 at 16:33
  • Also, uu.encode doesn't encode anything into binary; it actually encodes binary into ASCII. So loading uuencoded text in Python will actually be slower, since you have to unwrap the uuencoding and then load the ASCII floats from the decoded text. Commented Sep 29, 2012 at 16:35

3 Answers


You didn't specify how your float columns are represented in Python. The cPickle module is a fast general solution, with the drawbacks that it creates files readable only from Python, and that it must never be used to load untrusted data (e.g. received from the network). It is likely to just work with all regular datatypes, including numpy arrays.
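For instance, a minimal sketch of pickling two float columns (the filename and sample values are placeholders; in Python 3 the pickle module replaces cPickle):

```python
import pickle  # cPickle in Python 2

# Hypothetical x/y columns standing in for the model data.
x = [0.0, 1.5, 3.0]
y = [2.0, 4.5, 9.0]

# Dump both columns into one binary file.
with open("model.pkl", "wb") as f:
    pickle.dump((x, y), f, protocol=pickle.HIGHEST_PROTOCOL)

# Load them back into two variables.
with open("model.pkl", "rb") as f:
    x2, y2 = pickle.load(f)
```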

If you can use numpy and store your data in numpy arrays, look into numpy.save and numpy.savetxt and the corresponding loading functions, which should offer performance superior to manually extracting the data.
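A rough sketch with numpy (filenames are placeholders): save the two columns to a binary .npy file and load them back, or read an existing ASCII file directly with numpy.loadtxt:

```python
import numpy as np

# Hypothetical x/y columns standing in for the model data.
x = np.array([0.0, 1.5, 3.0])
y = np.array([2.0, 4.5, 9.0])

# Binary .npy file: fast to write and fast to load.
np.save("model.npy", np.column_stack((x, y)))
data = np.load("model.npy")
x2, y2 = data[:, 0], data[:, 1]

# For an existing two-column ASCII file, this reads both columns at once:
# x, y = np.loadtxt("model.txt", unpack=True)
```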

array.array also has methods for writing array data to file, with the drawback that the array data is written in the native format and cannot be read from a different architecture.


2 Comments

Thanks, numpy was a great idea (really straightforward, and the loading takes only 10% of the time it used to take!)
You're welcome. I expect numpy will serve you well in other respects, too. Just be careful when extracting data: creating a single numpy float from a numpy array is slower than getting a float object out of a Python list. Do as many operations as you can with numpy's own looping constructs, and you'll be fine. Also, feel free to accept the answer. :)

Check out Python's struct module. It's probably what you'd want to use for reading and writing your data.
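A minimal sketch of reading and writing float pairs with struct (the filename and sample values are placeholders; "d" is a native 8-byte double, so each x/y pair occupies 16 bytes):

```python
import struct

# Hypothetical x/y columns standing in for the model data.
x = [0.0, 1.5, 3.0]
y = [2.0, 4.5, 9.0]

# Pack each (x, y) pair as two doubles into a binary file.
with open("model.bin", "wb") as f:
    for xi, yi in zip(x, y):
        f.write(struct.pack("dd", xi, yi))

# Read them back, 16 bytes (two doubles) at a time.
x2, y2 = [], []
with open("model.bin", "rb") as f:
    while True:
        chunk = f.read(16)
        if not chunk:
            break
        xi, yi = struct.unpack("dd", chunk)
        x2.append(xi)
        y2.append(yi)
```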



Instead of the struct module suggested above, if your model is just floats/doubles (coordinates), look at the array module, which should be much faster than anything built on struct. The downside is that the collection is homogeneous: you need to interleave the two columns (alternating x and y values) or store them one after the other.
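A rough sketch of the interleaved layout with the array module (the filename and sample values are placeholders): alternate x and y values in one homogeneous array of doubles, write it with tofile, read it back with fromfile, then split the columns with strided slices:

```python
from array import array

# Hypothetical x/y columns standing in for the model data.
x = [0.0, 1.5, 3.0]
y = [2.0, 4.5, 9.0]

# Interleave x and y into a single array of doubles ("d").
a = array("d")
for xi, yi in zip(x, y):
    a.extend((xi, yi))
with open("model.arr", "wb") as f:
    a.tofile(f)

# Read back the same number of doubles, then de-interleave:
# even indexes hold x, odd indexes hold y.
b = array("d")
with open("model.arr", "rb") as f:
    b.fromfile(f, len(a))
x2, y2 = list(b[0::2]), list(b[1::2])
```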
