4

I have two strings with seemingly the same values. One is stored as a key in an array, the other as a value in a different array. I compare the two using ==, ===, and strcmp. All treat them as different strings. I do a var_dump and this is what I get:

string(17) "Valentine’s Day" 
string(15) "Valentine's Day"

Does anyone have any idea why the first string would be 17 characters and the second 15?

Update: This became slightly more obvious once I pasted the strings out of my editor, whose font made the two different apostrophes almost indistinguishable.

1
  • 8
    Looks like an issue with different character encodings: notice that the apostrophe is different in each case. Commented Jan 3, 2011 at 21:23

3 Answers

8

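The first string contains a Unicode apostrophe, ’ (U+2019, RIGHT SINGLE QUOTATION MARK), while the second string just has a regular ASCII ' character.

The Unicode character takes up more space: encoded as UTF-8 it is three bytes, while the ASCII apostrophe is one. The length reported by var_dump counts bytes, which is why the first string comes out two longer (17 versus 15).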

If you run the PHP ord() function on each of those characters you'll see that you get different values for each:

echo ord("’"); // 226 (this is just the first of the character's 3 bytes; see the comments below for details from ircmaxell)
echo ord("'"); // 39 (0x27 in hex)

3 Comments

Bingo. It's all in the apostrophe.
Just to clarify a little, "takes up more space" means that the Unicode character takes multiple bytes to represent. PHP's core string functions are not multibyte safe; they count bytes. For multibyte-aware handling, there are the mb_* functions.
Pedantic note: the 226 result is only the first of the 3 bytes that make up that character (226, or 0xE2, is the lead byte of a 3-byte UTF-8 sequence). So there are 2 more bytes that ord silently discards (which is why you see 2 extra in the length)...
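
To see the byte-level difference described in these comments, here is a minimal sketch. It assumes the strings are UTF-8 encoded and that the mbstring extension is enabled; the variable names are only illustrative, and the curly apostrophe is written as byte escapes so the example does not depend on how the file itself is saved.

```php
<?php
// Curly apostrophe U+2019 written as its three UTF-8 bytes.
$curly = "Valentine\xE2\x80\x99s Day";
$plain = "Valentine's Day";

// strlen() counts bytes, which is the number var_dump() reports.
$curlyBytes = strlen($curly);
$plainBytes = strlen($plain);

// mb_strlen() counts characters when told the encoding (needs mbstring).
$curlyChars = mb_strlen($curly, 'UTF-8');

// bin2hex() shows the raw bytes behind each apostrophe.
$curlyHex = bin2hex("\xE2\x80\x99");
$plainHex = bin2hex("'");

echo $curlyBytes, ' ', $plainBytes, ' ', $curlyChars, ' ', $curlyHex, ' ', $plainHex, PHP_EOL;
```

With those inputs this prints 17 15 15 e28099 27: the two strings have the same number of characters, but the curly apostrophe accounts for two extra bytes.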
1

As a complement to @Mark's answer above, which is right (the ’ is a multi-byte character, most probably UTF-8, while ' is not): you can easily convert it to ASCII (or ISO-8859-1) using iconv, for example:

echo iconv('utf-8', 'ascii//TRANSLIT', $str);

Note: Not all characters can be transformed from multi-byte to ASCII or latin1. You can use //IGNORE to have them removed from the resulting string.
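
Since the conversion can fail, a small wrapper may help. This is only a sketch: to_ascii is a hypothetical helper name, the input is assumed to be UTF-8, and what //TRANSLIT actually produces for a given character depends on the iconv implementation PHP was built against, so no particular output is promised for non-ASCII input.

```php
<?php
// Hypothetical helper: try to transliterate UTF-8 text to ASCII and
// fall back to the original string if iconv reports a failure.
function to_ascii(string $text): string
{
    // iconv() returns false when it cannot convert the input;
    // @ suppresses the notice it raises in that case.
    $converted = @iconv('UTF-8', 'ASCII//TRANSLIT', $text);

    return $converted === false ? $text : $converted;
}

// Curly apostrophe U+2019 written as its three UTF-8 bytes.
$curly = "Valentine\xE2\x80\x99s Day";
$result = to_ascii($curly);

// Input that is already ASCII passes through unchanged.
$unchanged = to_ascii("Valentine's Day");

var_dump($result === "Valentine's Day");
```

The final var_dump is worth checking on your own system before relying on the conversion for comparisons.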


0

’ != '

Mainly. If you want this not to be an issue, you could do something like this:

if (str_replace('’', '\'', "Valentine’s Day") == "Valentine's Day") { /* the strings now compare equal */ }
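
If the same comparison is needed in more than one place, the replacement could be wrapped in a function. The following is a sketch under assumptions: normalize_apostrophes is a hypothetical name, the input is assumed to be UTF-8, and only the two typographic single quotes (U+2018 and U+2019) are mapped.

```php
<?php
// Hypothetical helper: map typographic single quotes to the ASCII apostrophe.
function normalize_apostrophes(string $text): string
{
    return strtr($text, [
        "\xE2\x80\x98" => "'", // U+2018 LEFT SINGLE QUOTATION MARK
        "\xE2\x80\x99" => "'", // U+2019 RIGHT SINGLE QUOTATION MARK
    ]);
}

// Curly apostrophe written as byte escapes so the file encoding does not matter.
$fromKey = "Valentine\xE2\x80\x99s Day";
$fromValue = "Valentine's Day";

// Normalize both sides before comparing.
$same = normalize_apostrophes($fromKey) === normalize_apostrophes($fromValue);

var_dump($same); // bool(true)
```

Normalizing both sides means it does not matter which array the curly version came from.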

