I am confused! My web host recently updated PHP, and now the special characters in my old tables render differently (wrongly). Both my tables and my input/output PHP pages are set to utf-8. Since the update, input from PHP is also treated differently: my special characters are now being utf-8-encoded as they enter the database. So when I review the tables in phpMyAdmin, the old inserts have the original (non-encoded) special characters, while the new posts have utf-8-encoded characters (also special).

So what I would like to do is rewrite input and output to insert and show non-encoded characters, but I am not sure whether this is possible without skipping utf-8 entirely (in PHP and MySQL). Is there a utf-8 way to submit non-encoded characters?

AND - perhaps more fundamentally - I need to understand what the possible downsides are. I am using Danish characters in and out and I'm not going to use any other language (for this project). So if it IS possible to insert and output non-encoded characters using utf-8 - am I then going to have unexpected/destructive issues?

I have read a lot of posts about PHP/MySQL/special characters, but I haven't seen this angle on the issue yet. I hope I am not duplicating an existing question. Everything had been working very nicely until the update.

  • If you have a DB for testing, I would try mb_convert_encoding. I would recommend trying this only in the test DB until you know it works. Commented Mar 6, 2015 at 16:48
  • I don't have a testing db, but I might possibly need one for this reason. Undecided yet. But thanks. Commented Mar 6, 2015 at 19:19

1 Answer

Even if you are using only Danish characters, you may as well go utf8 all the way.

There are many places where the encoding needs to be stated:

  • The <meta> charset declaration at the top of the html
  • The columns in the database (column CHARACTER SET defaults from table, which defaults from database)
  • The encoding in your PHP code.
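For the first item in the list, a minimal sketch of declaring utf-8 to the browser from PHP might look like the following. The helper name utf8_page_head and the sample title are assumptions made for illustration; they do not come from the question.

```php
<?php
// Hypothetical helper: builds the <head> of a page that declares utf-8.
function utf8_page_head(string $title): string
{
    // Escape the title as utf-8 so Danish letters pass through unchanged.
    $safeTitle = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    return "<!DOCTYPE html>\n<html lang=\"da\">\n<head>\n"
         . "<meta charset=\"utf-8\">\n"
         . "<title>{$safeTitle}</title>\n</head>\n";
}

// State the encoding in the HTTP header as well (has no effect on the command line).
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// The title is written with explicit utf-8 bytes for the letter ø (C3 B8).
echo utf8_page_head("Sm\xC3\xB8rrebr\xC3\xB8d");
```

The HTTP header and the meta tag should name the same encoding, and the PHP source file itself should be saved in that encoding too.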

When you CREATE TABLE, tack on DEFAULT CHARACTER SET utf8. If you have existing tables without that, speak up; we may need to deal with them. If you want Danish collation, then specify COLLATE utf8_danish_ci, too. Then (if I recall correctly), aa will sort after z. (The default is utf8_general_ci, which won't do that sorting.)

Figure out what encoding you have (or can get) in your php code. If you have some text with accents in it, do this:

// $text holds the string to inspect; unpack('H*', ...) returns its bytes as hex digits
$hex = unpack('H*', $text);
echo implode('', $hex);

If you have utf8, å will be C3A5; for latin1 it will be E5. (PHP prints the hex digits in lowercase.)
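The same check can be wrapped in a small function. This is only a sketch: describe_encoding is a hypothetical name, and the "probably latin1" verdict is a heuristic, since any byte string that fails utf-8 validation is simply assumed to be a single-byte encoding.

```php
<?php
// Hypothetical helper: report the hex bytes of $text and guess its encoding.
function describe_encoding(string $text): string
{
    $hex = strtoupper(bin2hex($text));
    // No bytes above 0x7F means plain ASCII, which is the same in both encodings.
    if (preg_match('/[\x80-\xFF]/', $text) !== 1) {
        return $hex . ': plain ASCII, identical in latin1 and utf8';
    }
    // With the /u flag, preg_match returns false when the subject is not valid utf-8.
    $isUtf8 = preg_match('//u', $text) === 1;
    return $hex . ': ' . ($isUtf8 ? 'valid utf8' : 'not utf8, probably latin1');
}

echo describe_encoding("\xC3\xA5"), "\n"; // å as utf8 bytes: C3A5: valid utf8
echo describe_encoding("\xE5"), "\n";     // å as a latin1 byte: E5: not utf8, probably latin1
```

Running this on a value taken straight from a form submission shows which of the two set_charset values matches the data PHP actually holds.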

Regardless of what encoding is in the tables, you must call set_charset('utf8') or set_charset('latin1'), depending on which encoding the data in PHP uses. MySQL will gladly transcode between latin1 and utf8 as things are passed between PHP and MySQL. For different APIs:

⚈  mysql: mysql_set_charset('utf8');
⚈  mysqli: $mysqli_obj->set_charset('utf8');
⚈  PDO: $db = new PDO('mysql:host=host;dbname=db;charset=utf8', $user, $pwd);
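As a slightly fuller sketch of the PDO variant, assuming the MySQL PDO driver is installed: the function names, host and database name below are placeholders invented for illustration, and the functions only build the connection when called.

```php
<?php
// Hypothetical helper: builds a PDO DSN for MySQL that names the connection charset.
function utf8_dsn(string $host, string $dbName): string
{
    return "mysql:host={$host};dbname={$dbName};charset=utf8";
}

// Hypothetical helper: opens the connection; the caller supplies real credentials.
function open_utf8_connection(string $host, string $dbName, string $user, string $pwd): PDO
{
    return new PDO(utf8_dsn($host, $dbName), $user, $pwd, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION, // throw on errors instead of failing silently
    ]);
}

// Prints the DSN only; no database is contacted here.
echo utf8_dsn('localhost', 'mydb'), "\n";
```

Whichever API is used, the charset named on the connection should describe the bytes PHP sends and expects back, not necessarily the encoding of the table.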

For much more info, see http://mysql.rjweb.org/doc.php/charcoll .

7 Comments

With utf8_danish_ci, these sort after z, in the clumps shown: Ä=Æ=ä=æ Ö=Ø=ö=ø Aa=Å=å Þ=þ
Well, what I am really asking is: is there a way to store the actual special characters in the db using utf-8? What I am getting now is "Ã¦" instead of "æ", "Ã˜" instead of "Ø", etc. This seems stupid to me; I am getting different special characters inserted into the db when I would rather have "my own" special characters inserted. As I see it, you are guiding me to work with (and accept) the utf-encoded characters, but I would only like to do that IF I AM CONVINCED THAT IT SERVES A FEASIBLE/REASONABLE PURPOSE - OR IF IT IS UNAVOIDABLE?
(It is a common problem, and it is fixable.) The utf8 encoding for Ø is hex C398. But when that hex is interpreted as latin1, it comes out as Ã˜. So, the problem is that PHP had bytes in one encoding, but the transmission to/from MySQL was assuming a different encoding. That inconsistency led to an error either on INSERTion or on SELECTing. Do SELECT HEX(col) ... to see what is in the table. Then we can pursue where the 'bug' lies. My blog covers the problem and too much more: mysql.rjweb.org/doc.php/charcoll . I have provided tidbits of it in this thread.
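The misreading described in this comment can be reproduced in PHP without a database. This is a sketch that assumes the mbstring extension is available and uses Windows-1252 as the stand-in for what MySQL calls latin1.

```php
<?php
// "\xC3\x98" is Ø encoded as utf8 (two bytes).
$utf8Bytes = "\xC3\x98";

// Misread those two bytes as Windows-1252 and re-encode the result as utf8:
// each original byte becomes a character of its own.
$misread = mb_convert_encoding($utf8Bytes, 'UTF-8', 'Windows-1252');
echo $misread, "\n";           // two characters instead of one
echo bin2hex($misread), "\n";  // c383cb9c

// Reversing the misreading recovers the original two bytes.
$repaired = mb_convert_encoding($misread, 'Windows-1252', 'UTF-8');
echo bin2hex($repaired), "\n"; // c398
```

Seeing C383CB9C from SELECT HEX(col) for a single Ø would suggest the value was converted twice on the way in, while C398 would mean the stored bytes are correct utf8 and only the display side is misreading them.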
Sorry - haven't had much time to read up on your blog. But I intend to. So far, many thanks for the elaborate answer. But perhaps you could answer me this: when I look into my database, via mySQL, and I see my desired special characters displaying correctly (those inserted before the update) - are they actually formatted correctly (c398, etc.) and then presented the right way because the phpMyAdmin page is built the right way - and my own front end is flawed? OR! Am I supposed to see the utf-8 encoding when I check phpMyAdmin?
Right. I read up on the blog, but too many unknowns for me. When I investigated further on "SET NAMES" I saw several warnings about that and stumbled upon recommendations (stackoverflow-1650591) to use mysql_set_charset('utf8', $link). When I tried that something happened that seems to have affected ALL MY PHP-pages so the old inserts show correctly, and only the few recent look wrong (the false latino-special characters). Not sure if the problem is solved for good - but it's working at the moment...