
I am trying to combine close to 3k text files from a folder into a single pandas DataFrame in Python. I have successfully combined all the text files into one text file; however, when I try to read that file, it keeps throwing an error:

ParserError: NULL byte detected. This byte cannot be processed in Python's native csv library at the moment, so please pass in engine='c' instead

Any help with this would be appreciated.

```python
import glob

import pandas as pd

# Concatenate every file in the folder into a single result.txt (byte-for-byte copy)
file_list = glob.glob(r"C:\Users\E0565588\Documents\POS Downloaded Data\New folder\*.*")
with open("result.txt", "wb") as outfile:
    for f in file_list:
        with open(f, "rb") as infile:
            outfile.write(infile.read())

# Reading the combined file with the python engine is what raises the ParserError
a = pd.read_csv('result.txt', delimiter=",", header=None, engine='python', names=["Duns ID","Invoice Number","Invoice Line Number","Salesperson Name","Customer Number","Customer Name","Address Line 1","Address Line 2","Address Line ","City","State/Province","Postal Code","Country Code","NAICS","Part Number","Invoice Price","Invoice Quantity","Unit of Measure","Invoice Date","Order Date","Ship Date","Require Date","Program Type","Rebated Location ID"])
# An empty DataFrame plus df.append(a) is unnecessary (DataFrame.append was removed in pandas 2.0)
df = a
```



Short Answer:

Either use the default C engine by omitting engine='python' from the read_csv call, or replace the NULL bytes while concatenating the files:

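```python
# Copy every file into result.txt in text mode, dropping NULL characters on the way.
# encoding="utf-8" is an assumption about the source files; adjust it if they differ.
with open("result.txt", "w", encoding="utf-8") as outfile:
    for f in file_list:
        with open(f, "r", encoding="utf-8") as infile:
            outfile.write(infile.read().replace("\0", ""))
```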

(Note that the b has also been removed from the open() modes; see below.)

Long Answer:

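I'm not sure why you are using the "python" engine, but you can fix the problem by using the default "c" engine; the latter handles NULL bytes without problems (the error message itself suggests passing engine='c'). Note that NULL bytes are not an end-of-file marker. Concatenating files adds no bytes of its own, so the NULL bytes must already be present in some of your source files (for instance, UTF-16 encoded text contains a NULL byte next to every ASCII character), and the concatenation simply copies them into result.txt.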
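To see which source files actually contain NULL bytes, a small diagnostic along these lines could scan the folder before concatenating. This is only a sketch: the function name find_files_with_null_bytes is hypothetical, and it just reports counts without modifying any file.

```python
import glob
from pathlib import Path


def find_files_with_null_bytes(pattern):
    """Return (path, null_count, total_bytes) for each matched file that contains NULL bytes."""
    report = []
    for path in sorted(glob.glob(pattern)):
        data = Path(path).read_bytes()  # raw bytes: no decoding step that could fail
        nulls = data.count(b"\x00")
        if nulls:
            report.append((path, nulls, len(data)))
    return report
```

It could be called with the folder pattern from the question, e.g. find_files_with_null_bytes(r"C:\Users\E0565588\Documents\POS Downloaded Data\New folder\*.*"). A file in which roughly half of the bytes are NULL is very likely UTF-16 encoded, in which case decoding it with the right encoding is a better fix than stripping the NULLs.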

If you have to use the "python" engine, then you can replace the NULL bytes as shown above.
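If you stay with the "python" engine, the same NULL-stripping idea can also be applied per file, reading each one straight into pandas and combining the results with pd.concat instead of going through result.txt. This is a sketch under assumptions: read_all_without_nulls is a hypothetical helper, and the files are assumed to be UTF-8 encoded comma-separated text without a header row, as in the question.

```python
import glob
import io

import pandas as pd


def read_all_without_nulls(pattern, names, encoding="utf-8"):
    """Read every file matched by `pattern` with the python engine after removing NULL characters."""
    frames = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "r", encoding=encoding) as infile:
            text = infile.read().replace("\0", "")  # drop NULL characters before parsing
        frames.append(
            pd.read_csv(io.StringIO(text), header=None, names=names, engine="python")
        )
    if not frames:
        # pd.concat raises ValueError on an empty list, so return an empty frame instead
        return pd.DataFrame(columns=names)
    # pd.concat replaces the DataFrame.append pattern, which was removed in pandas 2.0
    return pd.concat(frames, ignore_index=True)
```

Parsing each file separately also avoids a side effect of plain concatenation: if a file does not end with a newline, its last row would otherwise be glued to the first row of the next file.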

I would also recommend not using binary mode (b) for your reads and writes: it is meant for binary data, not for text data such as CSV.


Comments:

Hi, thanks for the help. However, after making the changes you suggested, the code threw a new error: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 65572: character maps to <undefined>
@ZulfikarSKhan Are you running python 2 or 3?
@ZulfikarSKhan try to add encoding="utf-8" to your read_csv call.
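Regarding the UnicodeDecodeError: a 'charmap' error is what Python raises when open() decodes text with a single-byte Windows code page such as cp1252, which is typically the default when no encoding is given, and cp1252 has no character for byte 0x9d. So the encoding most likely needs to be passed to the open() calls that read the source files, not only to read_csv. The sketch below assumes the files are UTF-8 (where 0x9d can legitimately appear inside a multi-byte character such as ”); concatenate_text_files is a hypothetical helper name.

```python
import glob


def concatenate_text_files(pattern, out_path, encoding="utf-8", errors="strict"):
    """Concatenate the files matched by `pattern` into out_path using an explicit encoding."""
    with open(out_path, "w", encoding=encoding) as outfile:
        for path in sorted(glob.glob(pattern)):
            # An explicit encoding avoids the platform default code page, which may not
            # be able to decode bytes such as 0x9d (assumption: the files are UTF-8)
            with open(path, "r", encoding=encoding, errors=errors) as infile:
                text = infile.read().replace("\0", "")  # strip NULL characters
            outfile.write(text)
            if text and not text.endswith("\n"):
                outfile.write("\n")  # keep the next file's first row on its own line
```

Passing errors="replace" would substitute U+FFFD for any undecodable byte instead of raising, which keeps the run going but silently loses those characters. The output file should live outside the globbed folder, as result.txt does in the question, so it is not picked up as an input.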
