I'm comparing a string from the database to a list of strings in an array:

if (in_array($entry, array('Söme&nbsp;string', 'other-string')))

This works for other-string, but not for Söme&nbsp;string, the main difference being that this string has an umlaut and an HTML entity in it. If $entry is Söme&nbsp;string in the database, the comparison fails, even though it should be the same string.

I also tried strcmp and direct comparison using === and ==, but the comparison is always negative. I also tried utf8_encode before comparison, but that did nothing.

The database is using UTF-8, I fetch the data using Drupal API functions, and my PHP file is also UTF-8 encoded. If I print $entry and Söme&nbsp;string to the output HTML, they are indistinguishable.

Any idea what could be causing this behaviour?

Update

Thanks for the help. It seems the &nbsp; is converted on the way and is stored as a real non-breaking space in the database, not as an HTML entity. Printing it converts it back to an HTML entity (or maybe Firebug does that when I look at it).

The output of var_dump() (using print function, taken from resulting html source):

$entry: string(14) "Söme string"

"Söme&nbsp;string": string(18) "Söme&nbsp;string"

(I've edited the string as the real one contains a name)

Update 2

I've changed the string to "Some&nbsp;string" and here's the output of

var_dump(bin2hex($entry));
var_dump(bin2hex('Some&nbsp;string'));

$entry: string(24) "536f6d65c2a0737472696e67"
"Some&nbsp;string": string(32) "536f6d65266e6273703b737472696e67"
  • What encoding does the connection have? Commented Sep 8, 2010 at 8:29
  • what encoding does the PHP file have? (the one which defines the hard-coded string), or whatever source you use for comparison. Commented Sep 8, 2010 at 8:32
  • @Gumbo Drupal uses UTF-8 everywhere, so I'm pretty sure the connection is also using UTF-8 Commented Sep 8, 2010 at 8:34
  • @Alexander The php file is UTF-8 encoded. Commented Sep 8, 2010 at 8:35
  • Is the form where you enter $entry UTF-8 encoded as well? Commented Sep 8, 2010 at 8:41

2 Answers

4

Then the strings are not the same. Perhaps:

  • $entry has an actual space instead of a non-breaking space.
  • One has the HTML entity &nbsp; while the other has an actual non-breaking space.
  • In one of the scripts the character ö is decomposed and in the other it isn't.

Try to var_dump the array and $entry.
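
Since var_dump output viewed in a browser can hide the difference, it may help to look at the raw bytes instead. The following is a minimal sketch, assuming the values from the question's second update; the variable names are made up for illustration. It checks each of the possibilities listed above in turn.

```php
<?php
// Hypothetical values mirroring the question: the database value holds a real
// UTF-8 non-breaking space (bytes C2 A0), the literal holds the entity text.
$entry   = "Some\xC2\xA0string";
$literal = 'Some&nbsp;string';

// bin2hex() exposes the raw bytes that rendered HTML output hides.
$entryHex   = bin2hex($entry);
$literalHex = bin2hex($literal);

// Test each suspected cause explicitly.
$hasPlainSpace = strpos($entry, ' ') !== false;           // ordinary space (0x20)?
$hasRealNbsp   = strpos($entry, "\xC2\xA0") !== false;    // UTF-8 non-breaking space?
$hasEntity     = strpos($literal, '&nbsp;') !== false;    // entity text in the literal?
$isDecomposed  = strpos($entry, "o\xCC\x88") !== false;   // "o" + combining diaeresis (U+0308)?

var_dump($entryHex, $literalHex, $hasPlainSpace, $hasRealNbsp, $hasEntity, $isDecomposed);
```

If the two hex strings differ, the strings are different at the byte level, whatever they look like on the page.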


0

The problem was that $entry contained a UTF-8 encoded non-breaking space (0xc2a0). Just calling htmlentities on it did not work, because I did not specify the charset. So my solution is the following:

htmlentities($entry, ENT_QUOTES, 'UTF-8')
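
To show the fix in the context of the original comparison, here is a minimal sketch, assuming the simplified "Some" string from the question; $known, $encoded and $decoded are illustrative names. It also shows a possible alternative that goes the other way: decoding the hard-coded literals with html_entity_decode and comparing raw UTF-8.

```php
<?php
// Hypothetical value as fetched from the database: a real UTF-8 non-breaking space.
$entry = "Some\xC2\xA0string";
$known = array('Some&nbsp;string', 'other-string');

// Option 1 (the fix above): encode $entry to entities, stating the charset.
$encoded      = htmlentities($entry, ENT_QUOTES, 'UTF-8');
$matchEncoded = in_array($encoded, $known, true);

// Option 2 (an alternative): decode the literals and compare raw UTF-8 bytes.
$decoded = array();
foreach ($known as $item) {
    $decoded[] = html_entity_decode($item, ENT_QUOTES, 'UTF-8');
}
$matchDecoded = in_array($entry, $decoded, true);

var_dump($encoded, $matchEncoded, $matchDecoded);
```

One thing to keep in mind with option 1: htmlentities also converts characters such as ö to &ouml;, so a literal that contains a raw ö next to &nbsp; would still not match. Decoding the literals, as in option 2, avoids that.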
