
This is what I have right now.

I'm pulling an RSS feed into PHP. The raw XML from the RSS feed reads:

Paul’s Confidence

The PHP I have so far is this:

$newtitle = $item->title;
$newtitle = utf8_decode($newtitle);

The above returns:

Paul?s Confidence

If I remove the utf8_decode(), I get this:

Paul’s Confidence

When I try a str_replace():

$newtitle = str_replace("”", "", $newtitle);

It doesn't work; I get:

Paul’s Confidence

Any thoughts?

  • In your first code block you wrote ’, but in your str_replace() you wrote ”. Is this affecting the results? Commented Jul 27, 2009 at 16:04
  • I would say the character encoding of the page you're trying to show the string on could be affecting your result. Is the above output on a web page somewhere we could peek at for reference? (When I test locally I don't get any funky output, just a single quote.) Commented Jul 27, 2009 at 16:08
  • The feed is claygroup.org/blog/feed. @sshow: that was a typo. Commented Jul 27, 2009 at 16:31
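A small sketch of what is probably going on, assuming the feed text is valid UTF-8 and the mbstring extension is installed: the curly apostrophe U+2019 is stored as the three bytes E2 80 99. A page decoded as Windows-1252 shows each of those bytes as its own character, which gives ’. utf8_decode() converts to ISO-8859-1, which has no curly apostrophe, so it substitutes ? instead (utf8_decode() is also deprecated as of PHP 8.2).

```php
<?php
// The curly apostrophe as it arrives from the feed: U+2019, valid UTF-8.
$title = "Paul\u{2019}s Confidence";

// It occupies three bytes, not one.
echo bin2hex("\u{2019}"), "\n"; // e28099

// Reinterpret those UTF-8 bytes as Windows-1252 (what a browser does when the
// page declares, or defaults to, that charset) and convert back to UTF-8:
// each byte becomes its own character, producing mojibake.
$garbled = mb_convert_encoding($title, 'UTF-8', 'Windows-1252');
echo $garbled, "\n"; // Paul’s Confidence
```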

14 Answers

27

This is my function that always works, regardless of encoding:

function RemoveBS($Str) {
  $StrArr = str_split($Str); // split into single bytes
  $NewStr = '';
  foreach ($StrArr as $Char) {
    $CharNo = ord($Char); // byte value 0-255
    if ($CharNo == 163) { $NewStr .= $Char; continue; } // keep £ (byte 163 in ISO-8859-1)
    if ($CharNo > 31 && $CharNo < 127) { // keep printable ASCII only
      $NewStr .= $Char;
    }
  }
  return $NewStr;
}

How it works:

echo RemoveBS('Hello õhowå åare youÆ?'); // Hello how are you?

4 Comments

This does not preserve UTF8 encoding.
I don't believe it was supposed to - that said, you can encode to UTF-8 afterwards or modify the function for your own needs!
There are so many docs out there about setting the character encoding to UTF-8... but this is the only thing that really worked for me! I will read up on the ord() function. Thanks!!!
All hail RemoveBS()! Worked for me!
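Following up on the UTF-8 comment: a minimal sketch of the same idea that keeps the result valid UTF-8, assuming the input is already UTF-8. The function name removeBsUtf8() is hypothetical. With the /u modifier the pattern works on whole characters instead of bytes, so £ survives intact; preg_replace() returns null for invalid UTF-8, which this sketch turns into an empty string.

```php
<?php
/**
 * Hypothetical UTF-8-aware variant of RemoveBS(): keep printable ASCII
 * (U+0020-U+007E) plus the pound sign (U+00A3), drop every other character.
 * Assumes $str is valid UTF-8; returns '' if it is not.
 */
function removeBsUtf8(string $str): string
{
    // /u makes PCRE match code points, so multibyte characters are
    // kept or removed as a unit rather than byte by byte.
    $result = preg_replace('/[^\x{20}-\x{7E}\x{A3}]+/u', '', $str);
    return $result ?? '';
}

echo removeBsUtf8('Hello õhowå åare youÆ? £5'), "\n"; // Hello how are you? £5
```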
24

Try this:

$newtitle = html_entity_decode($newtitle, ENT_QUOTES, "UTF-8");

If this is not the solution, see the manual page: https://www.php.net/manual/en/function.html-entity-decode.php
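This only helps if the title arrives as HTML entities rather than raw UTF-8. A minimal sketch, assuming (hypothetically) that the feed delivered the apostrophe as the numeric reference &#8217;:

```php
<?php
// Hypothetical raw title containing a numeric character reference.
$raw = 'Paul&#8217;s Confidence';

// ENT_QUOTES: also decode entities for ' and "; 'UTF-8': charset of the result.
$decoded = html_entity_decode($raw, ENT_QUOTES, 'UTF-8');
echo $decoded, "\n"; // Paul’s Confidence

// The apostrophe is now the real UTF-8 character U+2019, so a search string
// written as an escape matches it regardless of how the .php file is saved.
$plain = str_replace("\u{2019}", "'", $decoded);
echo $plain, "\n"; // Paul's Confidence
```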

16

This will remove all non-ascii characters / special characters from a string.

//Remove from a single-line string: every byte outside 0x20-0x7F is deleted,
//so all bytes of a multibyte UTF-8 character are removed too
$output = "Likening ‘not-critical’ with";
$output = preg_replace('/[^\x20-\x7F]+/', '', $output);
echo $output;

//Remove from a multi-line string, also keeping \n (0x0A) and \r (0x0D)
$output = "Likening ‘not-critical’ with \n Likening ‘not-critical’ with \r Likening ‘not-critical’ with. ' ! -.";
$output = preg_replace('/[^\x20-\x7F\x0A\x0D]+/', '', $output);
echo $output;

10

I solved the problem. It seems to be a quick fix rather than a fix for the larger issue, but it works.

$newtitle = str_replace('’', "'", $newtitle);

I also found this useful snippet that may help others with the same problem:

<?php
// These literals are mojibake: UTF-8 punctuation displayed as Windows-1252.
$find[] = '“'; // left side double smart quote
$find[] = '‘'; // left side single smart quote
$find[] = '’'; // right side single smart quote
$find[] = '…'; // ellipsis
$find[] = '—'; // em dash
$find[] = '–'; // en dash
$find[] = 'â€'; // right side double smart quote; keep last, it is a prefix of all the others

$replace[] = '"';
$replace[] = "'";
$replace[] = "'";
$replace[] = "...";
$replace[] = "-";
$replace[] = "-";
$replace[] = '"';

$text = str_replace($find, $replace, $text);
?>

Thanks everyone for your time and consideration.

3 Comments

This fails on a Linux box, however, because the .php file's encoding could be different, which renders the special characters useless. Just an FYI.
Yeah this is not working for me. What is the workaround for this?
You need to put the 'â€' (right side double smart quote) at the end of the array or it's going to match anything starting with â€.
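If the string is valid UTF-8 (the page was just being displayed in the wrong charset), here is a sketch that avoids both problems raised in these comments, assuming PHP 7.0+ for the \u{...} escapes: the keys are written as escapes, so the .php file's own encoding doesn't matter, and strtr() tries the longest key first, so the order of the array doesn't matter either. The helper name smartPunctuationToAscii() is hypothetical.

```php
<?php
/**
 * Hypothetical helper: replace typographic punctuation with ASCII look-alikes.
 * Assumes $text is valid UTF-8 (not already mojibake).
 */
function smartPunctuationToAscii(string $text): string
{
    return strtr($text, [
        "\u{201C}" => '"',   // left double quotation mark
        "\u{201D}" => '"',   // right double quotation mark
        "\u{2018}" => "'",   // left single quotation mark
        "\u{2019}" => "'",   // right single quotation mark
        "\u{2026}" => '...', // horizontal ellipsis
        "\u{2014}" => '-',   // em dash
        "\u{2013}" => '-',   // en dash
    ]);
}

echo smartPunctuationToAscii("Paul\u{2019}s \u{201C}Confidence\u{201D}\u{2026}"), "\n";
// Paul's "Confidence"...
```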
7

Yeah this is not working for me. What is the workaround for this? – vaichidrewar Mar 12 at 22:29

Add this to the HTML head (or modify if already there):

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

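This tells the browser to decode the page as UTF-8, so funny characters like "“" display as the intended punctuation instead of being garbled.

Or you can set PHP's default_charset, which PHP uses for the charset in the Content-Type header it sends: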

ini_set('default_charset', 'utf-8');

2

Is the character encoding setting for your PHP server something other than UTF-8? If so, is there a reason or could it be changed to UTF-8? Though we don't store data in UTF-8 in our database, I've found that setting the webserver's character set to UTF-8 seems to help resolve character set issues.

I'd be interested in hearing others' opinions about this... whether I'm setting myself up for problems by setting webserver to UTF-8 while storing submitted data in Latin1 in our mysql database. I know there was a reason I chose Latin1 for the database but can't recall what it was. Interestingly, our current setup seems to allow for non-UTF-8 character entry and subsequent rendering... it seems that storing in Latin1 doesn't prevent subsequent decoding and display of all UTF-8 characters?

2

Use the PHP code below to remove them:

html_entity_decode(mb_convert_encoding(stripslashes($name), "HTML-ENTITIES", 'UTF-8'));

1

Read up on http://us.php.net/manual/en/function.html-entity-decode.php

That & symbol starts an HTML entity, so you can easily decode it with html_entity_decode().

1

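A super simple solution is to have the characters decoded correctly when the page is loaded.

Simply copy/paste the following at the beginning of the script:

 header('Content-Type: text/html; charset=UTF-8'); // tell the browser the page is UTF-8

 mb_internal_encoding('UTF-8'); // default encoding for mb_* string functions
 mb_http_output('UTF-8');       // HTTP output encoding (used by mb_output_handler)
 mb_regex_encoding('UTF-8');    // encoding for mb_ereg* functions
 // mb_http_input() is a getter: it takes an input type ('G', 'P', 'C', ...),
 // not an encoding name, and returns the detected input encoding.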

Reference: http://php.net/manual/en/function.mb-internal-encoding.php comment left by webfav at web dot de
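To see what the internal encoding affects, a small sketch assuming the mbstring extension is loaded: byte-based functions such as strlen() and str_replace() still see the curly apostrophe as three bytes, while mb_* functions count it as one character.

```php
<?php
// Same setting as in the answer above.
mb_internal_encoding('UTF-8');

$title = "Paul\u{2019}s";

// strlen() counts bytes; the curly apostrophe is three of them.
echo strlen($title), "\n";    // 8
// mb_strlen() counts characters in the internal encoding.
echo mb_strlen($title), "\n"; // 6
```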

1

Many strange characters can be avoided by calling mysqli_set_charset($con, "utf8"); right after the MySQL connection code.

But in some cases, to remove strange characters like â€, we need to use:

$title = ' Stefen Suraj';
$newtitle = preg_replace('/[^(\x20-\x7F)]*/','', $title);
echo $newtitle;

Output will be: Stefen Suraj

1 Comment

mysqli_set_charset($con,"utf8"); is helpful
0

It does not work. You need to use $arr1 = str_split($str), then loop over it with foreach and echo($arr1[$k]). This will show you exactly which characters are written into the string.
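A sketch of that debugging step, wrapped in a hypothetical helper dumpBytes() that lists each byte as hex. The bytes E2 80 99 are the UTF-8 encoding of the curly apostrophe, and that exact sequence is what str_replace() has to be given:

```php
<?php
/**
 * Hypothetical debugging helper: return every byte of $str as a two-digit
 * uppercase hex string, so you can see what str_replace() must match.
 */
function dumpBytes(string $str): array
{
    if ($str === '') {
        return []; // str_split('') returns [''] before PHP 8.2
    }
    $bytes = [];
    foreach (str_split($str) as $byte) {
        $bytes[] = sprintf('%02X', ord($byte));
    }
    return $bytes;
}

echo implode(' ', dumpBytes("Paul\u{2019}s")), "\n";
// 50 61 75 6C E2 80 99 73
```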

0
Please try this:


$find[] = '&acirc;&#128;&#156;'; // '“' left side double smart quote
$find[] = '&acirc;&#128;&#157;'; // 'â€' right side double smart quote
$find[] = '&acirc;&#128;&#152;'; // '‘' left side single smart quote
$find[] = '&acirc;&#128;&#153;'; // '’' right side single smart quote
$find[] = '&acirc;&#128;&#166;'; // '…' ellipsis
$find[] = '&acirc;&#128;&#148;'; // '—' em dash
$find[] = '&acirc;&#128;&#147;'; // '–' en dash

$replace[] = '&ldquo;';  // "
$replace[] = '&rdquo;';  // "
$replace[] = '&lsquo;';  // '
$replace[] = '&rsquo;';  // '
$replace[] = '&hellip;'; // ...
$replace[] = '&mdash;';  // -
$replace[] = '&ndash;';  // -

$text = str_replace($find, $replace, $text);

0

1. The order of the strings in the $find array is significant.
2. This string "‘" should contain a tilde and look like three characters. If I save the .php file with my Genie editor, it gets changed to just two characters: "â€".
3. This is a useful reference: https://www.i18nqa.com/debug/utf8-debug.html

<?php
$text = "‘’“â€1‘ 2’ 3â€â€œâ€™â€˜ 4’ 5 6 7’ ‘, ’, “, â€â€˜";
echo($text . "<br>");
$find = array("‘", "’", "“", "â€");
$replace = array("‘", "’", "“", "”");
$text = str_replace($find, $replace, $text);
echo($text);
?>
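A variant of the same fix using strtr() instead of str_replace(), sketched under the assumption that the text is UTF-8 that was once misread as Windows-1252. strtr() tries the longest key first and never rescans replaced text, so the order of the array no longer matters, and spelling the keys as byte escapes sidesteps the editor re-saving problem from point 2. The helper name fixQuoteMojibake() and the exact set of keys are illustrative assumptions; the \xC2\x9D key covers the case where byte 0x9D was decoded as the invisible control character U+009D.

```php
<?php
/**
 * Hypothetical helper: turn common quote mojibake (UTF-8 read as
 * Windows-1252, then saved as UTF-8 again) back into curly quotes.
 */
function fixQuoteMojibake(string $text): string
{
    return strtr($text, [
        "\xC3\xA2\xE2\x82\xAC\xCB\x9C"     => "\u{2018}", // ‘ -> ‘
        "\xC3\xA2\xE2\x82\xAC\xE2\x84\xA2" => "\u{2019}", // ’ -> ’
        "\xC3\xA2\xE2\x82\xAC\xC5\x93"     => "\u{201C}", // “ -> “
        "\xC3\xA2\xE2\x82\xAC\xC2\x9D"     => "\u{201D}", // â€ + U+009D -> ”
        "\xC3\xA2\xE2\x82\xAC"             => "\u{201D}", // bare â€ -> ” (shortest, tried last)
    ]);
}

echo fixQuoteMojibake("\xC3\xA2\xE2\x82\xAC\xC5\x93Paul\xC3\xA2\xE2\x82\xAC\xE2\x84\xA2s\xC3\xA2\xE2\x82\xAC"), "\n";
// “Paul’s”
```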

-1

Just one simple solution.

If your string $text contains these kinds of strange characters, just do as shown below:

$mytext = mb_convert_encoding($text, "HTML-ENTITIES", 'UTF-8');

and it will work.
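Note that the HTML-ENTITIES pseudo-encoding is deprecated for mbstring as of PHP 8.2. A minimal sketch of a replacement using mb_encode_numericentity(), assuming UTF-8 input; the helper name toNumericEntities() is hypothetical. The map turns every code point from U+0080 upward into a decimal &#N; reference and leaves ASCII untouched, which may differ in detail from what HTML-ENTITIES produced:

```php
<?php
/**
 * Hypothetical helper: encode every non-ASCII character of a UTF-8 string
 * as a decimal numeric character reference (&#N;).
 */
function toNumericEntities(string $text): string
{
    // Map format: [first code point, last code point, offset, mask].
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($text, $map, 'UTF-8');
}

echo toNumericEntities("Paul\u{2019}s Confidence"), "\n"; // Paul&#8217;s Confidence
```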
