I have a homebrew web based file system that allows users to download their files as zips; however, I found an issue while dev'ing on my local box not present on the production system.
In linux this is a non-issue (the local dev box is a windows system).
I have the following code
algo = CipherType('AES-256', 'CBC')
decrypt = DecryptCipher(algo, cur_share.key[:32], cur_share.key[-16:])
file = open(settings.STORAGE_ROOT + 'f_' + str(cur_file.id), 'rb')
temp_file = open(temp_file_path, 'wb+')
data = file.read(settings.READ_SIZE)
while data:
dec_data = decrypt.update(data)
temp_file.write(dec_data)
data = file.read(settings.READ_SIZE)
# Takes a dump right here!
# error in cipher operation (wrong final block length)
final_data = decrypt.finish()
temp_file.write(final_data)
file.close()
temp_file.close()
The above code opens a file, and (using the key for the current file share) decrypts the file and writes it to a temporary location (that will later be stuffed into a zip file).
My issue is on the file = open(settings.STORAGE_ROOT + 'f_' + str(cur_file.id), 'rb') line. Since windows cares a metric ton about binary files if I don't specify 'rb' the file will not read to end on the data read loop; however, for some reason since I am also writing to temp_file it never completely reads to the end of the file...UNLESS i add a + after the b 'rb+'.
if i change the code to file = open(settings.STORAGE_ROOT + 'f_' + str(cur_file.id), 'rb+') everything works as desired and the code successfully scrapes the entire binary file and decrypts it. If I do not add the plus it fails and cannot read the entire file...
Another section of the code (for downloading individual files) reads (and works flawlessly no matter the OS):
algo = CipherType('AES-256', 'CBC')
decrypt = DecryptCipher(algo, cur_share.key[:32], cur_share.key[-16:])
file = open(settings.STORAGE_ROOT + 'f_' + str(cur_file.id), 'rb')
filename = smart_str(cur_file.name, errors='replace')
response = HttpResponse(mimetype='application/octet-stream')
response['Content-Disposition'] = 'attachment; filename="' + filename + '"'
data = file.read(settings.READ_SIZE)
while data:
dec_data = decrypt.update(data)
response.write(dec_data)
data = file.read(settings.READ_SIZE)
# no dumps to be taken when finishing up the decrypt process...
final_data = decrypt.finish()
temp_file.write(final_data)
file.close()
temp_file.close()
Clarification
The cipher error is likely because the file was not read in its entirety. For example, I have a 500MB file I am reading in at 64*1024 bytes at a time. I read until I receive no more bytes, when I don't specify b in windows it cycles through the loop twice and returns some crappy data (because python thinks it is interacting with a string file not a binary file).
When I specify b it takes 10-15 seconds to completely read in the file, but it does it succesfully, and the code completes normally.
When I am concurrently writing to another file as i read in from the source file (as in the first example) if I do not specify rb+ it displays the same behavior as not even specifying b which is, that it only reads a couple segments from the file before closing the handle and moving on, i end up with an incomplete file and the decryption fails.
+seems to fix this, but I want to know why it does on windows and what is going on when it opens the file handle. The python docs in this area are quite weak.decrypt = DecryptCipher(algo, cur_share.key[:32], cur_share.key[-16:]), so is it still using thedecryptfrom the first section?'rb+'where'rb'should suffice...bit cycles through the loop twice, and then you say that withbbut not+it does the same thing as withoutb, and then you say that it reads a couple segments and then closes the handle. That's contradictory. And neither one makes any sense. The only difference betweenrandrbis thatrwill translate line endings (so each pair of bytes that matches 0D 0A will be replaced by a single byte 0A).