2

I have a problem with the special character §. I want to replace each run of consecutive § characters with a single §. The following regex works fine on Regex 101.

$file_data = file_get_contents($file_name);
$file_data = preg_replace('/\§+/g', '§',$file_data);

It changed

§§§§§§§§§This free 3D robot game could redefine how kids learn to codeDigital Trends It’s hard to get kids to code. Up until very recently, it was largely ....

to

§This free 3D robot game could redefine how kids learn to codeDigital Trends It’s hard to get kids to code. Up until very recently, it was largely ....

However, it does not work on the server after I upload it. Here is the output of var_dump($file_data) in PHP:

§§§§§§§§ This free 3D robot game could redefine how kids learn to codeDigital Trends It’s hard to get kids to code. Up until very recently, it was largely ....

So there seems to be an additional character before every § in the var_dump output. The extra character does not show up on the web page when the string is echoed as HTML; it appears only in the plain PHP var_dump output. How can I replace multiple occurrences of § using a regex in PHP?

  • I would start by removing the g modifier, since it doesn't exist in PHP regex. My first guess would be to try the u modifier: /§+/u. Have fun. Commented Nov 15, 2015 at 11:13
  • Second guess: make sure to use UTF-8 in your HTML document, or send a header beforehand to define the type: header('Content-Type: text/html; charset=utf-8'); Commented Nov 15, 2015 at 11:16
  • @HamZa Thank you. It was working on regex101, so I thought it would work on the server too. I will see if it solves the problem. Commented Nov 15, 2015 at 11:17
  • @HamZa The document is a .dat file with HTML content. Commented Nov 15, 2015 at 11:17
  • @HamZa It is working now. What you said about using u was correct. If you write it as an answer, I will accept it. Otherwise, I will accept the other answer, which does what your comment says. Commented Nov 15, 2015 at 11:35
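
The point about the g modifier can be checked with a minimal sketch, assuming a UTF-8 encoded source file and input (the variable names are illustrative and not taken from the question). In PHP an unknown modifier makes the pattern fail to compile, so preg_replace() raises a warning and returns null; no g is needed because preg_replace() already replaces every match.

```php
<?php
// Assumption: this source file and the strings below are UTF-8 encoded.
$input = "§§a§§§b";

// PHP has no g modifier: the pattern fails to compile, a warning is
// raised (suppressed here with @), and preg_replace() returns null.
$bad = @preg_replace('/§+/ug', '§', $input);
var_dump($bad === null);

// Without g, every run of § is still replaced, not just the first one.
$good = preg_replace('/§+/u', '§', $input);
var_dump($good === "§a§b");
```

Checking the return value for null before using it avoids silently overwriting the file data with an empty result.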

2 Answers

2

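You will need to set the u (UTF-8) modifier, which makes PHP treat the pattern and the subject string as UTF-8.

The perlre documentation describes Perl's similarly named /u flag: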

/u means to use Unicode rules when pattern matching. On ASCII platforms, this means that the code points between 128 and 255 take on their Latin-1 (ISO-8859-1) meanings (which are the same as Unicode's)....

$output = preg_replace('/§+/u', '§', $input);
                         // ^ 
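
To see why the modifier matters, here is a minimal sketch, assuming the source file and the input are both UTF-8 encoded (the variable names are illustrative only). In UTF-8, § is the two-byte sequence 0xC2 0xA7. Without u, the + quantifier applies only to the final byte, so each § matches on its own and is replaced by itself.

```php
<?php
// Assumption: this source file and $input are UTF-8 encoded.
$input = "§§§abc";

// Byte mode: the pattern means 0xC2 followed by one or more 0xA7,
// so each § is matched separately and the run is never collapsed.
$withoutU = preg_replace('/§+/', '§', $input);

// UTF-8 mode: § is a single character, so + covers the whole run.
$withU = preg_replace('/§+/u', '§', $input);

var_dump($withoutU === $input); // bool(true): nothing changed
var_dump($withU === "§abc");    // bool(true): run collapsed to one §
```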

0
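$str = "§§§§§§§§§This free 3D robot game could redefine how kids learn to codeDigital Trends It's hard to get kids to code. Up until very recently, it was largely ....";
// {2,} matches runs of two or more §; the u modifier treats pattern and subject as UTF-8.
// The m modifier only affects ^ and $, so it has no effect on this pattern.
$pttn = '@\§{2,}@um';
echo preg_replace($pttn, '§', $str);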

/* will output */
/*
   §This free 3D robot game could redefine how kids learn to codeDigital Trends It's hard to get kids to code. Up until very recently, it was largely .... 
*/
