3

Hello, I tried the code below, which uses a regex:

$str =  preg_replace("/[^a-z0-9_]/i", '', 'New_text % *');

//output => New_text

(_ is the one extra character I want to keep.)

This works perfectly, but when the input string is in another language (for example Hindi), the Hindi characters are deleted as well.

Using the same pattern as in the example above:

$str =  preg_replace("/[^a-z0-9_]/i", '', 'कपिल शर्मा % * _');

//output => _

How can I get: कपिल शर्मा _

Is there a mistake in the regex, or is there another way to do this in PHP?

  • Use \W instead of the full character range. Also add the u modifier. Commented Jun 24, 2016 at 12:16
  • That a-z does not cover Hindi letters should be obvious … those are Latin letters. Commented Jun 24, 2016 at 12:24
  • So, what exactly do you allow then? a-z0-9 is simple and specific. "And also Hindi" is very wide, vague and unspecific. What about Arabic, Japanese and other languages? Commented Jun 24, 2016 at 12:29

3 Answers

2

You need to use

'~[^\p{M}\w]+~u'

See the regex demo

It seems that in PHP's PCRE, \w with the /u modifier does not match combining marks, so \W matches them and they get removed. We therefore need to use the corresponding negated character class [^\w] and add the \p{M} Unicode property (combining marks) to it.
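As a minimal sketch of the pattern above applied to the string from the question: the helper name keepWordChars is hypothetical and only for illustration. Note that this class also removes whitespace, just as the original [^a-z0-9_] pattern did.

```php
<?php
// Hypothetical helper; the pattern itself is the one given in this answer.
function keepWordChars(string $input): string
{
    // \w (with the u modifier) covers letters, digits and underscore in any script;
    // \p{M} additionally keeps combining marks such as Devanagari vowel signs.
    // preg_replace returns null on failure (for example invalid UTF-8 input).
    return preg_replace('~[^\p{M}\w]+~u', '', $input) ?? '';
}

echo keepWordChars('कपिल शर्मा % * _'), "\n"; // prints: कपिलशर्मा_
echo keepWordChars('New_text % *'), "\n";     // prints: New_text
```

If spaces should survive as well, adding \s to the class ('~[^\p{M}\w\s]+~u') should keep them, at the cost of leaving the spaces that surrounded the removed symbols.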

See more on Unicode properties here.


1 Comment

Thanks Wiktor Stribiżew
2

Use unicode properties:

$str =  preg_replace("/[^\p{L}\p{N}\p{Z}_]/u", '', 'कपिल शर्मा % * _');

Where

  • \p{L} stands for any letter in any script
  • \p{N} stands for any kind of numeric character in any script
  • \p{Z} stands for any kind of separator (space, line, or paragraph separator)
  • the u flag enables Unicode (UTF-8) mode
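A comment under this answer reports that the vowel signs are lost with this class (कपल शरम instead of कपिल शर्मा). One possible fix, sketched here as an assumption rather than as part of the original answer, is to add \p{M} (combining marks) to the class. The helper name stripSymbols and its $keepMarks switch are hypothetical, and the input is a variant of the question's string without spaces around the symbols so the output is easier to read.

```php
<?php
// Hypothetical helper comparing the class from this answer with a variant
// that also keeps combining marks (\p{M}).
function stripSymbols(string $input, bool $keepMarks = true): string
{
    $class = $keepMarks
        ? '\p{L}\p{M}\p{N}\p{Z}_'  // assumed fix: letters, marks, numbers, separators, underscore
        : '\p{L}\p{N}\p{Z}_';      // class as written in this answer
    // preg_replace returns null on failure (for example invalid UTF-8 input).
    return preg_replace('/[^' . $class . ']/u', '', $input) ?? '';
}

echo stripSymbols('कपिल शर्मा%*_', false), "\n"; // prints: कपल शरम_
echo stripSymbols('कपिल शर्मा%*_'), "\n";        // prints: कपिल शर्मा_
```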

Documentation

7 Comments

It gives me a warning: preg_replace(): Compilation failed: unknown property name after \P or \p
@user3736833: add spaces in the character class, see my edit.
Come on, use '~\W+~u'. @Toto: \p{Z} does not match a tab character. Nor does it match vertical whitespace.
It gives me the output कपल शरम _, which is not correct. I need कपिल शर्मा _
@user3736833 what is पि? Maybe just add that in a character class? It doesn't appear to be a word character. e.g. maybe [^\wपि]+?
0

You can use filter_var (the flag goes in the third argument, not OR-ed into the filter ID):

filter_var('your string &% * _', FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_HIGH);

Or, if you only need to remove a small set of symbols, you can use str_replace:

$arrayRequer = array('*','_','^','%');
$yourString = str_replace($arrayRequer, '', $yourString);

2 Comments

If you downvote, please don't be afraid to tell me why.
Just because no one explained why, I will make up for it. I have got a similar downvote today. Even for a perfectly working and well-explained answer. That is not what SO must be: downvoting because you do not like something is definitely evil.
