I'm unit testing Python 2.7 code that writes a NumPy array via ndarray.tofile(fileHandle, ...). Since doing file I/O in unit tests is bad for a number of reasons, how do I substitute an in-memory byte stream for the file handle? (io.BytesIO did not work because ndarray.tofile() asks it for a file name.)
2 Answers
Shouldn't tobytes [1] and frombuffer [2] do what you need for testing purposes?
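import numpy as np
m = np.random.rand(5, 3)
b = m.tobytes()  # raw bytes of the array, in C order
mb = np.frombuffer(b, dtype=m.dtype).reshape(m.shape)  # rebuild the array from the bytes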
1 Comment
Lance Kind
That will work, assuming that tofile() doesn't deviate from tobytes(). This is the best answer for the current state of NumPy. It's unfortunate that tofile() doesn't accept a stream, which would make it possible to unit test the tofile() API directly.
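The assumption in the comment above can be checked once, outside the fast unit-test suite. The following is a minimal sketch, not part of the original answer: the helper name tofile_matches_tobytes is hypothetical, and it assumes tofile() is called in its default binary mode (no sep argument). It writes the array to a throwaway file and compares the bytes on disk with tobytes().

```python
import os
import tempfile

import numpy as np


def tofile_matches_tobytes(arr):
    """Hypothetical helper: True if arr.tofile() writes exactly arr.tobytes()."""
    fd, path = tempfile.mkstemp()
    os.close(fd)  # tofile() reopens the file by name below
    try:
        arr.tofile(path)  # default binary mode, no separator
        with open(path, "rb") as f:
            on_disk = f.read()
    finally:
        os.remove(path)
    return on_disk == arr.tobytes()


m = np.random.rand(5, 3)
print(tofile_matches_tobytes(m))
```

If this holds for the dtypes and shapes the application uses, the unit tests can compare against tobytes() without touching the disk.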
Would a tempfile.TemporaryFile suit your purposes?
It exposes the same interface as a normal file object, so you can pass it directly to np.ndarray.tofile(), and it will be deleted immediately when it is either explicitly closed or garbage collected:
import numpy as np
from tempfile import TemporaryFile

x = np.random.randn(1000)
with TemporaryFile() as t:
    x.tofile(t)
    # do your testing...
# t is closed and deleted
It will, however, reside temporarily on disk (usually in /tmp/ on a Linux machine). I don't see an easy way to avoid I/O altogether, since .tofile() will ultimately need a valid OS-level file descriptor.
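For the "do your testing" step, one possibility is to rewind the temporary file and read back what was written. This is only a sketch of that idea; the variable names raw and y are illustrative, and it assumes the file was opened in the default binary mode.

```python
import numpy as np
from tempfile import TemporaryFile

x = np.random.randn(1000)
with TemporaryFile() as t:
    x.tofile(t)
    t.seek(0)       # rewind to the start before reading
    raw = t.read()  # the exact bytes tofile() wrote

# Rebuild an array from the raw bytes and compare it with the original.
y = np.frombuffer(raw, dtype=x.dtype)
print(len(raw) == x.nbytes, np.array_equal(x, y))
```

The comparison happens after the with block, so the file is already closed and deleted by the time the assertions run.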
6 Comments
Lance Kind
Building automated unit tests on file I/O introduces race conditions that make tests behave nondeterministically. If one adds sleeps to ensure asynchronous file I/O has finished, the unit tests become slow, and slow unit tests don't scale to hundreds of tests that run in a few seconds. What you suggest is perfectly fine for a few system tests, but that's not what I'm doing.
ali_m
It would be helpful if you could provide a bit more information about your requirements. How much data are you writing? Do you need to be able to read it back? What sort of race conditions are you concerned about? Do you absolutely need to use tofile?
Lance Kind
I want to test an application that uses numpy. The application creates files. I need to write out anywhere from a few bytes to a few kB to confirm that the correct bytes are being produced. To write automated tests that work consistently across hardware, I want to check the in-memory output stream before it is written to a file. This last part I'm finding difficult because, although the API docs for ndarray.tofile() say it takes a file handle or a handle to a stream, it doesn't accept the io.BytesIO handle I'm passing in. I fear that, as of now, it requires a real file handle. :-)
Lance Kind
I'm looking at ndarray.tobytes() and doing some tests to see if I can assume tobytes() will mirror what is written out with tofile(). If that works, I'll figure something out. Maybe use it for mocking.
ali_m
FWIW, here's the relevant method in the C source, which in turn calls npy_PyFile_Dup2 (note the use of os.dup to duplicate a file descriptor). This is all going on at C level, so I don't see an easy way to fake an open file via a Python object.
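Given that tofile() needs a real file descriptor, one workaround is to have the application write through a small helper that accepts any binary stream, so tests can pass an io.BytesIO. This is a sketch under assumptions, not NumPy's API: write_array is a hypothetical name, and it relies on tobytes() producing the same bytes that tofile() would write in its default binary mode.

```python
import io

import numpy as np


def write_array(arr, stream):
    """Hypothetical helper: write arr's raw bytes to any binary stream."""
    stream.write(arr.tobytes())


arr = np.arange(6, dtype=np.float64)
buf = io.BytesIO()  # in-memory stand-in for a file opened in "wb" mode
write_array(arr, buf)

# In a test, inspect the bytes directly or rebuild the array from them.
restored = np.frombuffer(buf.getvalue(), dtype=arr.dtype)
print(np.array_equal(arr, restored))
```

In production the same helper can be handed a file opened with open(path, "wb"), so only the stream differs between the application and its tests.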