I'm a beginner in SQL. I would like to import a CSV file containing Japanese text into a PostgreSQL table. I created a table and tried importing the CSV, but this error appears:

ERROR:  invalid byte sequence for encoding "UTF8": 0x8c
CONTEXT:  COPY tTokyoDir, line 1

********** Error **********

ERROR: invalid byte sequence for encoding "UTF8": 0x8c
SQL state: 22021
Context: COPY tTokyoDir, line 1

Can anyone help?

  • Can you also post your import command? Commented Sep 6, 2013 at 2:27
  • It has been a long time since I did anything with SQL, but it sounds like your table is UTF-8 encoded while your data is EUC-JP or perhaps Shift-JIS. Re-save your file as UTF-8 encoded rather than your local encoding, whatever that may be. Commented Sep 6, 2013 at 3:09
  • As @ChronoKitsune said, unless the data is supposed to be UTF-8, you'll need to convert it to UTF-8 from its current code set (maybe using iconv) before importing it. Or you will have to set up the database so that it expects the correct Japanese code set (but then it won't be able to store text from other languages such as Hebrew, Arabic or Russian that use letters not found in Japanese). If the data is supposed to be UTF-8, then there's a bug in the code generating the 'UTF-8' data; you can't have a byte 0x8C (a continuation byte) as the first byte of a character, but that's what you have. Commented Sep 6, 2013 at 6:54

1 Answer

You need to identify the encoding of the CSV file, since it is not UTF-8.

See "How to auto detect text file encoding?" if you need help with that.

As said in the comments, EUC-JP and Shift-JIS are plausible encodings for Japanese text, and PostgreSQL can convert from both.
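If no detection tool is at hand, one rough way to narrow the candidates down is to check which encodings the file decodes under without error. The following is only a minimal sketch: the function name `plausible_encodings` and the candidate list are made up for illustration, and the codec names are Python's, not PostgreSQL's. A clean decode does not prove the encoding is right, since some byte sequences are valid in more than one of these encodings.

```python
from pathlib import Path

# Candidate codecs to try, using Python's codec names (an assumption for this sketch).
CANDIDATES = ("utf-8", "euc_jp", "shift_jis")


def plausible_encodings(data, candidates=CANDIDATES):
    """Return the candidate encodings under which `data` (bytes) decodes cleanly."""
    matches = []
    for name in candidates:
        try:
            data.decode(name)  # strict decoding raises on any invalid byte sequence
        except UnicodeDecodeError:
            continue
        matches.append(name)
    return matches


if __name__ == "__main__":
    import sys

    # Usage: python check_encoding.py file.csv [more files...]
    for path in sys.argv[1:]:
        print(path, plausible_encodings(Path(path).read_bytes()))
```

If more than one candidate survives, decode a few lines with each and see which one produces readable Japanese. As a side note, the byte 0x8C from the error message is a valid lead byte in Shift-JIS but cannot start a character in EUC-JP, which may hint at Shift-JIS here. PostgreSQL's own names for these encodings are EUC_JP and SJIS.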

Then instruct the server to expect that encoding for the duration of the import.

For example:

SET client_encoding TO 'EUC-JP';
COPY table_name FROM 'file.csv' CSV;
SET client_encoding TO default;

This method converts the data on the fly. It is the simplest approach and works with any PostgreSQL version.

If you use PostgreSQL 9.1 or a more recent version, COPY has an ENCODING option that makes it a one-liner:

COPY table_name FROM 'file.csv' CSV ENCODING 'EUC-JP';
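
The comments suggest another route: convert the file to UTF-8 before importing it, for instance with iconv. A rough Python equivalent might look like the sketch below. The function name `convert_to_utf8` and the file names in the usage line are hypothetical, and the source encoding has to be known beforehand.

```python
def convert_to_utf8(src_path, dst_path, source_encoding):
    """Re-encode a text file from `source_encoding` to UTF-8, line by line."""
    # newline="" leaves the original line endings untouched in both files
    with open(src_path, "r", encoding=source_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:  # raises UnicodeDecodeError if the encoding is wrong
            dst.write(line)


if __name__ == "__main__":
    # Hypothetical file names; replace with the real paths and encoding.
    convert_to_utf8("file.csv", "file_utf8.csv", "shift_jis")
```

The converted file can then be loaded with a plain COPY ... CSV, assuming the client encoding is UTF8, which the original error message suggests it is.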