
I am reading a binary file (which I inspect as hexadecimal). I need to remove bytes after seeking to a specific location. The code below reads the binary file, but I don't know how to remove 4 bytes in the middle of the file.

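import os
import struct

with open("esears36_short.dat", "rb") as f:
    data = f.read(2)                       # 2-byte length field at the start of the file
    number = struct.unpack(">h", data)[0]  # big-endian signed 16-bit integer
    f.seek(number, 1)                      # skip `number` bytes forward; now at offset 2 + number
    # need code here to remove the next 4 bytes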

I need to run this in a loop until EOF: remove 4 bytes after every n bytes, where n is the value of the number field.

The value of the number field in this case is 28045.

Please help!
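A note on the format string used above: `>h` reads a big-endian *signed* 16-bit integer. 28045 (0x6D8D) is below 32768, so it decodes correctly, but a length of 32768 or more would come out negative. If the length field is actually unsigned (an assumption; the question does not say), `>H` is the safer choice. A quick check with the standard `struct` module:

```python
import struct

# 28045 stored as 2 big-endian bytes
raw = struct.pack(">H", 28045)
print(raw.hex())                    # 6d8d

# Below 32768, signed and unsigned decoding agree
print(struct.unpack(">h", raw)[0])  # 28045
print(struct.unpack(">H", raw)[0])  # 28045

# At 32768 and above, ">h" wraps around to a negative value
big = struct.pack(">H", 40000)
print(struct.unpack(">h", big)[0])  # -25536
print(struct.unpack(">H", big)[0])  # 40000
```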

  • So do you want to delete the bytes of the file at positions 28047 to 28051? Commented Apr 29, 2020 at 4:37
  • Yes. Then seek another 28045 bytes and delete the bytes from 56094 to 56097, and so on. Commented Apr 29, 2020 at 4:43
  • You want to move everything forward 4 bytes and thus make the file 4 bytes smaller? This is easier to do if you write a new smaller file. Commented Apr 29, 2020 at 4:44
  • Is it possible to edit the file in place? Copying a large file to another file multiple times will take a lot of time. Commented Apr 29, 2020 at 4:49
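The rule described in the question and these comments can be sketched in memory, assuming the file fits in RAM and the layout is: a 2-byte big-endian number field at offset 0 (not kept), then alternating runs of `number` bytes to keep and 4 bytes to drop. Under that assumed layout the first dropped run starts at offset 2 + 28045 = 28047, matching the first comment; the second would start at 28051 + 28045 = 56096, so the exact layout is worth double-checking against the file. `strip_gaps` is a hypothetical helper name, not an existing API.

```python
import struct

def strip_gaps(data: bytes, gap: int = 4) -> bytes:
    """Hypothetical helper: drop `gap` bytes after every `number` bytes.

    Assumed layout: a 2-byte big-endian `number` at offset 0 (not kept),
    then repeating [number bytes to keep][gap bytes to drop] until the end.
    """
    number = struct.unpack(">h", data[:2])[0]
    if number <= 0:
        raise ValueError(f"unexpected chunk length {number}")
    out = bytearray()
    pos = 2  # first kept byte comes right after the number field
    while pos < len(data):
        out += data[pos:pos + number]  # keep one run (may be short at the end)
        pos += number + gap            # jump over the dropped bytes
    return bytes(out)

# Toy example with number = 3: keep "abc", drop "1234", keep "def", drop "5678", keep "gh"
sample = struct.pack(">h", 3) + b"abc" + b"1234" + b"def" + b"5678" + b"gh"
print(strip_gaps(sample))  # b'abcdefgh'
```

Reading the whole file at once keeps the logic easy to verify on small samples; for large files the streaming approach in the answer below avoids holding everything in memory.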

1 Answer


To remove 4 bytes in place, you have to move the rest of the file 4 bytes toward the start, and that can get messy because you are reading and writing buffers in the same file. It's easier to write a new file and rename it. In that case, you just seek ahead 4 bytes wherever bytes should be dropped.

import os
import struct

src = "esears36_short.dat"
tmp = src + ".tmp"

with open(src, "rb") as f, open(tmp, "wb") as f_out:
    # The first 2 bytes hold the chunk length as a big-endian 16-bit integer
    number = struct.unpack(">h", f.read(2))[0]
    f.seek(2, 1)  # skip 2 more bytes before the first chunk
    while True:
        buf = f.read(number)  # copy one chunk of `number` bytes
        if not buf:
            break  # EOF
        f_out.write(buf)
        f.seek(4, 1)  # skip the 4 bytes to be removed
# Both files are closed here; os.replace overwrites the original with the new file
os.replace(tmp, src)

Although you are writing a new file, you do less actual copying: each kept byte is copied once, instead of shifting the rest of the file after every deletion.


6 Comments

Sorry if my question was not clear. I need to remove 4 bytes after every n bytes specified in the number field, until EOF. In this case, 28047 to 28051, then 56094 to 56097, and so on.
@Arvinth - does the counting start at the beginning of the file or after number has been read? Should the first 2 bytes be in the output file? And is number read only once, from the first 2 bytes?
The counting starts at the beginning of the file. The first 2 bytes should not be in the output file. The number field appears again after every 28045 bytes, but its value stays the same.
Okay, the current revision writes chunks of number bytes starting after the initial seek of 2, then skips 4 bytes between chunks until EOF.
I'm not sure whether it should skip 4 bytes at the front or just the 2 bytes of the number value.
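On the in-place question from the comments: one alternative to the temporary file is a single pass with separate read and write offsets, followed by a truncate. This is a sketch, not the answer's method; `strip_in_place` and its `lead_skip` parameter are hypothetical names. `lead_skip` covers the open question above: 2 reproduces the answer's `f.seek(2, 1)`, while 0 starts the first chunk right after the number field. Because the file is overwritten as the loop runs, an interruption midway leaves it half-rewritten, which is one reason to prefer the temporary-file approach.

```python
import struct

def strip_in_place(path, lead_skip=2, gap=4):
    """Hypothetical helper: drop `gap` bytes after every `number` bytes, in place.

    Assumes a 2-byte big-endian `number` at offset 0, then `lead_skip`
    more bytes skipped before the first chunk (as in the answer's code).
    """
    with open(path, "r+b") as f:
        number = struct.unpack(">h", f.read(2))[0]
        if number <= 0:
            raise ValueError(f"unexpected chunk length {number}")
        read_pos = 2 + lead_skip  # where the next chunk to keep starts
        write_pos = 0             # where that chunk will be written
        while True:
            f.seek(read_pos)
            buf = f.read(number)
            if not buf:
                break  # EOF
            # write_pos <= read_pos, so this only overwrites bytes already read
            f.seek(write_pos)
            f.write(buf)
            write_pos += len(buf)
            read_pos += len(buf) + gap
        # Cut off the leftover tail so the file shrinks by the removed bytes
        f.truncate(write_pos)
```

Each kept byte is still copied once, as with the temporary file, but no second copy of the file is needed on disk.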
