Note: This issue is specific to Microsoft Excel. It works fine on LibreOffice without any problems.

I am developing a React application that lets the user download table data on a button click. For the download, I generate a UTF-16 encoded CSV file.

export const stringToUTF16Blob = (content: string): Blob => {
  // Initialize array with BOM for UTF-16-LE
  const charCodes = [0xff, 0xfe];

  // Convert string to UTF-16-LE in one step
  for (let i = 0; i < content.length; i++) {
    const charCode = content.charCodeAt(i);
    // Push the low byte first, then high byte
    charCodes.push(charCode % 256, Math.floor(charCode / 256) % 256);
  }

  return new Blob([new Uint8Array(charCodes)], { type: 'text/csv;charset=utf-16le;' });
};

When I generate this file and open it with a double-click, Excel renders the Unicode characters correctly, but it doesn't split the data into columns correctly.

I can add a separator declaration as the first line of the file:

const separatorString = '"sep=,"';

(The content value that I'm passing to the stringToUTF16Blob function in this code consists of separatorString + headers + content.)
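
A minimal sketch of how such a content string might be assembled, assuming the rows are plain arrays of strings. The helper names escapeCsvField and buildCsvContent are hypothetical and not part of the original code. The quoting rule follows RFC 4180: a field is wrapped in double quotes when it contains the delimiter, a double quote, or a line break, and embedded double quotes are doubled.

```typescript
// Hypothetical helpers for illustration; not part of the original code.
const escapeCsvField = (value: string, delimiter: string): string => {
  const needsQuoting =
    value.includes(delimiter) || value.includes('"') || /[\r\n]/.test(value);
  return needsQuoting ? `"${value.replace(/"/g, '""')}"` : value;
};

const buildCsvContent = (
  separatorLine: string,
  headers: string[],
  rows: string[][],
  delimiter = ',',
): string => {
  const toLine = (fields: string[]): string =>
    fields.map((f) => escapeCsvField(f, delimiter)).join(delimiter);
  // Every line ends with CRLF, the line break RFC 4180 specifies.
  return [separatorLine, toLine(headers), ...rows.map(toLine)]
    .map((line) => line + '\r\n')
    .join('');
};
```

The result could then be passed to stringToUTF16Blob as its content argument.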

Once I add it, Excel splits on commas correctly, but it no longer handles the Unicode characters. I think it just renders their ASCII representation.

Here is what my CSV file looks like when opened in a text editor (you can see the encoding is UTF-16LE):

[Image: CSV file opened in a text editor]

I tried UTF-8 with no luck, then found out that Excel's support is better with UTF-16. Indeed, it made the Unicode characters work without the separator string.

  • No it's not specific to Excel at all and not about Unicode. You aren't loading a CSV file, you're importing it. When you double-click on a text file Excel has no idea what encoding or separator is used so it uses the user's locale. If the file has a BOM (including UTF8 files), that will be used to load the text. The field separator though will always be determined by the user's locale and specifically, the List Separator setting. In Europe (and many, many other countries) , is the decimal separator, so ; is used as the field separator. . is the decimal separator around the Pacific. Commented Sep 3 at 11:24
  • CSV is not a well defined format and applications used delimited files with the csv extension for decades (since the 1970s) before RFC4180 was introduced in 2005. And even now, all kinds of applications will create CSVs by just writing formatted decimals to text files without considering the separator. Sep=^ isn't standard, it's only an application trick used to handle the different formats one can encounter Commented Sep 3 at 11:33
  • Arguably, that's a bug of the React application and possibly a serious one. What happens when saving the number 12045.3456 in Brazil, Germany, South Africa, where , is the decimal separator? Or Canada where BOTH separators are used in different territories? The application should detect the client's desired language from Accept-Language or ask the user what to use. Commented Sep 3 at 11:39
  • @panagiotis, that means there is nothing we can do programmatically to handle Excel not using Unicode when a CSV is opened with a double-click. I'm using LibreOffice, and it always opens the import wizard when a CSV is double-clicked. I am currently looking for a way to tell Excel to open the import wizard when a CSV is double-clicked. Commented Sep 3 at 11:51
  • That means you've made incorrect assumptions. And OpenOffice/LibreOffice didn't even handle Unicode (much less localized settings) at first. I had some very, very interesting issues with it in 2001. You're comparing two programs using different heuristics - Excel tries to use the user's locale (users would be very cross if 101234.456 was misread) while Libre just assumes US settings and lets the user ask for a refund. Commented Sep 3 at 11:56

1 Answer

This isn't specific to Excel, and it isn't about Unicode either. CSV isn't a well-defined format. Delimited files appeared in the 1970s, but the first thing resembling a "standard" was RFC 4180 in 2005. Even that says nothing about number or date formats; the words don't even appear in the RFC.

Why you can't make assumptions

Different countries use different locale conventions. Most do not use . as the decimal separator, and Canada uses both . and , (never mind date formats). If you look at the world map on Wikipedia's Decimal separator page, you'll see that most countries don't use ., although about half of the world's population does (India and China use ., as does the US, but that's only about 350M people).
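
To make the difference concrete, here is a small sketch that formats the number from the comments (12045.3456) under a few locales with the built-in Intl.NumberFormat API. The helper name formatForLocale is hypothetical, and the exact output depends on the locale data shipped with the runtime.

```typescript
// Hypothetical helper: format one number under a given BCP 47 locale tag.
const formatForLocale = (value: number, locale: string): string =>
  new Intl.NumberFormat(locale, { maximumFractionDigits: 4 }).format(value);

const sample = 12045.3456;
const formatted = ['en-US', 'de-DE', 'pt-BR'].map(
  (locale) => `${locale}: ${formatForLocale(sample, locale)}`,
);
console.log(formatted.join('\n'));
```

On a runtime with full locale data, en-US should give 12,045.3456 while de-DE and pt-BR should give 12.045,3456. Written unquoted into a comma-delimited file, the second form would be split into two fields.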

When you double-click on a text file with the .csv extension you're actually importing a text file. The application has to decide what settings to use for separators, numbers and dates. Different applications will use different heuristics.

Excel was built to "just work" everywhere, with whatever files users had, and those files are most likely to use the country's formats. Users paid for this and want it to just work. Those formats are taken from the current user's Locale settings, specifically the Decimal, List separator and Date formats.
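
A web application could approximate this by inferring the decimal separator for the user's locale and choosing a field delimiter that doesn't clash with it. This is only a heuristic sketch: a browser cannot read the operating system's List separator setting, so the hypothetical helper guessDelimiter simply assumes that locales with a decimal comma use ; as the field separator.

```typescript
// Hypothetical helpers: guess a CSV field delimiter from a BCP 47 locale tag.
const decimalSeparatorFor = (locale: string): string => {
  // formatToParts exposes the decimal separator as a part of type 'decimal'.
  const parts = new Intl.NumberFormat(locale).formatToParts(1.5);
  return parts.find((p) => p.type === 'decimal')?.value ?? '.';
};

// Assumption: a decimal comma implies ';' as the list separator.
const guessDelimiter = (locale: string): string =>
  decimalSeparatorFor(locale) === ',' ? ';' : ',';
```

In a browser this might be called as guessDelimiter(navigator.language). The guess can still be wrong when the user has customized their regional settings, so offering an explicit choice remains the safer option.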

OpenOffice/LibreOffice on the other hand, didn't even handle Unicode at first (I had some "interesting" experiences in 2001) and decided to use commas. No idea what it does with dates.

Possible Solution: Create an Excel file directly

You can create real XLSX files in JavaScript (and many other languages) without having to install Excel or LibreOffice. An XLSX is a ZIP package containing well-defined XML files. There are many npm packages that can generate XLSX files from data, create pivot tables etc. For example, exceljs has 2M weekly downloads (!).

Based on the docs, you could save a list of rows to an Excel file with something like :

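import ExcelJS from 'exceljs';

const workbook = new ExcelJS.Workbook();
const sheet = workbook.addWorksheet('My Sheet');
// Column keys let rows passed as objects map to the right cells
sheet.columns = [{ key: 'id' }, { key: 'name' }, { key: 'dob' }];

const rows = [
  [5, 'Bob', new Date()], // row by array
  { id: 6, name: 'Barbara', dob: new Date() }, // row by object, matched on column keys
];

const newRows = sheet.addRows(rows);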

And save it either as Excel :

// write to a new buffer
const buffer = await workbook.xlsx.writeBuffer();

or CSV with settings :

// write the workbook created above to a file with European Date-Times
const options = {
  dateFormat: 'DD/MM/YYYY HH:mm:ss',
  dateUTC: true, // use utc when rendering dates
};
await workbook.csv.writeFile(filename, options);
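
In a browser there is no file system to write to, so the buffer returned by writeBuffer() would typically be wrapped in a Blob, much like stringToUTF16Blob in the question. A minimal sketch, assuming the buffer is available as an ArrayBuffer; the helper name xlsxBufferToBlob is hypothetical.

```typescript
// MIME type registered for .xlsx (Office Open XML spreadsheet) files.
const XLSX_MIME_TYPE =
  'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet';

// Hypothetical helper: wrap XLSX bytes in a Blob that can be offered for download.
const xlsxBufferToBlob = (buffer: ArrayBuffer): Blob =>
  new Blob([buffer], { type: XLSX_MIME_TYPE });
```

The resulting Blob can be handed to whatever download mechanism the application already uses for the CSV Blob.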