Note: This issue is specific to Microsoft Excel. The same file opens correctly in LibreOffice.
I am developing a React application where I need to let the user download some table data upon a button click. On the download, I generate a UTF-16 encoded CSV file.
export const stringToUTF16Blob = (content: string): Blob => {
  // Initialize array with BOM for UTF-16-LE
  const charCodes = [0xff, 0xfe];
  // Convert each UTF-16 code unit of the string to two little-endian bytes
  for (let i = 0; i < content.length; i++) {
    const charCode = content.charCodeAt(i);
    // Push the low byte first, then high byte
    charCodes.push(charCode % 256, Math.floor(charCode / 256) % 256);
  }
  return new Blob([new Uint8Array(charCodes)], { type: 'text/csv;charset=utf-16le;' });
};
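To make the byte layout concrete, here is a minimal sketch of the same encoding written with `DataView`, where the little-endian order is explicit. The name `encodeUtf16LeWithBom` is hypothetical and not part of my application; it returns the raw bytes rather than a Blob so the result is easy to inspect.

```typescript
// Hypothetical alternative to stringToUTF16Blob: same byte layout, written with DataView.
const encodeUtf16LeWithBom = (content: string): Uint8Array => {
  // 2 bytes for the BOM plus 2 bytes per UTF-16 code unit.
  const buffer = new ArrayBuffer(2 + content.length * 2);
  const view = new DataView(buffer);
  // The BOM U+FEFF written little-endian becomes the bytes FF FE.
  view.setUint16(0, 0xfeff, true);
  for (let i = 0; i < content.length; i++) {
    // charCodeAt returns one UTF-16 code unit, so surrogate pairs pass through unchanged.
    view.setUint16(2 + i * 2, content.charCodeAt(i), true);
  }
  return new Uint8Array(buffer);
};
```

The bytes could then be wrapped with `new Blob([bytes], { type: 'text/csv;charset=utf-16le;' })` in the same way as above.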
When I generate this file and open it in Excel with a double click, the Unicode characters render correctly, but the rows are not split into columns correctly.
I can add a separator hint string to the beginning of the file:
const separatorString = '"sep=,"';
(The content value that I'm passing to the stringToUTF16Blob function in this code consists of separatorString + headers + content.)
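For reference, a minimal sketch of how such a content string might be assembled. The helpers `escapeField` and `buildCsv` are hypothetical names, not my actual code, and the sketch assumes CRLF line endings and RFC 4180-style quoting, with the hint line placed on its own first line.

```typescript
// Hypothetical helper: quote a field when it contains the delimiter, a quote, or a line break.
const escapeField = (value: string, delimiter: string): string => {
  const needsQuotes = value.includes(delimiter) || /["\r\n]/.test(value);
  // Embedded double quotes are doubled, then the whole field is wrapped in quotes.
  return needsQuotes ? `"${value.replace(/"/g, '""')}"` : value;
};

// Hypothetical helper: join headers and rows into CSV text, optionally prefixed by a hint line.
const buildCsv = (
  headers: string[],
  rows: string[][],
  delimiter: string = ',',
  hintLine?: string,
): string => {
  const lines = [headers, ...rows].map((row) =>
    row.map((field) => escapeField(field, delimiter)).join(delimiter),
  );
  if (hintLine !== undefined) {
    // The hint line is passed through untouched; it is not escaped or validated.
    lines.unshift(hintLine);
  }
  return lines.join('\r\n') + '\r\n';
};
```

The returned string is what would then be handed to the function that produces the UTF-16 Blob.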
Once I add it, Excel splits the columns by comma correctly, but it no longer handles the Unicode characters. I think it just renders their ASCII representation.
Here is what my CSV file looks like when opened in a text editor (you can see the encoding is UTF-16LE):
I tried UTF-8 with no luck, and found out that Excel's support is better with UTF-16. Indeed, it made the Unicode characters work without the separator string.
In Europe (and many, many other countries) `,` is the decimal separator, so `;` is used as the field separator. `.` is the decimal separator around the Pacific.

CSV is not a well-defined format, and applications used delimited files with the `csv` extension for decades (since the 1970s) before RFC 4180 was introduced in 2005. And even now, all kinds of applications will create CSVs by just writing formatted decimals to text files without considering the separator. `Sep=^` isn't standard; it's only an application trick used to handle the different formats one can encounter.

What about `12045.3456` in Brazil, Germany, South Africa, where `,` is the decimal separator? Or Canada, where BOTH separators are used in different territories? The application should detect the client's desired language from `Accept-Language` or ask the user what to use.
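As a rough illustration of that last point, the sketch below picks a locale from an `Accept-Language` header value and derives a field separator from that locale's decimal separator. The names `preferredLocale`, `decimalSeparator`, and `fieldSeparator` are hypothetical, and the rule "use `;` when the decimal separator is `,`" is an assumption based on the convention described above, not something Excel guarantees. The header parsing is deliberately simplistic.

```typescript
// Hypothetical helper: pick the highest-weighted language tag from an Accept-Language value.
const preferredLocale = (acceptLanguage: string, fallback: string = 'en-US'): string => {
  const ranked = acceptLanguage
    .split(',')
    .map((part) => {
      const pieces = part.trim().split(';');
      const tag = (pieces[0] ?? '').trim();
      const q = pieces.slice(1).map((p) => p.trim()).find((p) => p.startsWith('q='));
      // A missing q parameter means the default weight of 1.
      return { tag, weight: q ? parseFloat(q.slice(2)) : 1 };
    })
    .filter((entry) => entry.tag !== '' && entry.tag !== '*' && !Number.isNaN(entry.weight))
    .sort((a, b) => b.weight - a.weight);
  return ranked[0]?.tag ?? fallback;
};

// Ask Intl which character the locale uses as its decimal separator.
const decimalSeparator = (locale: string): string => {
  try {
    const parts = new Intl.NumberFormat(locale).formatToParts(1.5);
    return parts.find((p) => p.type === 'decimal')?.value ?? '.';
  } catch {
    // Malformed language tags make Intl.NumberFormat throw; fall back to '.'.
    return '.';
  }
};

// Assumption: locales with a decimal comma get ';' as the field separator, others get ','.
const fieldSeparator = (locale: string): string =>
  decimalSeparator(locale) === ',' ? ';' : ',';
```

In a browser-only application, `navigator.language` could be passed to `fieldSeparator` instead of a parsed header value, and asking the user remains the more reliable option when the locale is ambiguous.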