14

I am trying to build a regex function that removes any non-alphanumeric characters and all duplicate characters, e.g. aabcd*def%gGGhhhijkklmnoP\1223 would become abcddefgGhijklmnoPR3. I can remove the special characters easily, but I can't for the life of me work out how to remove the duplicate characters. This is my current code for removing the special characters:

var oldString = "aabcd*def%gGGhhhijkklmnoP\122";
var filtered = oldString.replace(/[^\w\s]/gi, "");

How can I extend the above regex to also remove duplicate characters, including duplicates that are separated by non-alphanumeric characters?

4 Answers

35

The regex is /[^\w\s]|(.)\1/gi

Test here: http://jsfiddle.net/Cte94/

It uses a backreference: (.) captures any character and \1 matches the same character immediately after it, so both characters of the pair are matched together.

Unless by "check for duplicate characters" you meant that aaa => a, in which case it's

/[^\w\s]|(.)(?=\1)/gi

Test here: http://jsfiddle.net/Cte94/1/

Be aware that neither regex distinguishes case: A == a, so Aa counts as a repetition. If you don't want that, remove the i from /gi.
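To make the difference between the two patterns concrete, here is a small sketch that runs both on a shortened version of the question's input. The sample string and variable names are my own, chosen for illustration.

```javascript
// Shortened sample input (an assumption; not the exact string from the question).
const input = "aabcd*def%gGGhhhijkk";

// Variant 1: (.)\1 consumes BOTH characters of each adjacent pair.
const pairsRemoved = input.replace(/[^\w\s]|(.)\1/gi, "");

// Variant 2: the lookahead consumes only the first character of a pair,
// so every run collapses to its last character.
const collapsed = input.replace(/[^\w\s]|(.)(?=\1)/gi, "");

// Variant 2 without the i flag: g and G are treated as different characters.
const collapsedCaseSensitive = input.replace(/[^\w\s]|(.)(?=\1)/g, "");

console.log(pairsRemoved);           // "bcddefGhij"
console.log(collapsed);              // "abcddefGhijk"
console.log(collapsedCaseSensitive); // "abcddefgGhijk"
```

Note that d*d still yields dd in every variant: the replacement does not rescan the result, so characters that only become adjacent after * is removed are not collapsed.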

9 Comments

Thank you so much. Is there any way I can ensure that duplicates separated by non-alphanumeric characters are removed as well?
@jonathanp Make it two Regexes (one to remove non-alphanumeric and one to remove duplicated). It's useless to make uber-complex Regexes, especially if you then have to handle/modify them
@jonathanp var filtered = oldString.replace(/[^\w\s]/g, "").replace(/(.)(?=\1)/gi, "");
@Pascalius Yep. Because the question didn't require it. The question explicitly used \w, that is a-zA-Z0-9 plus the underscore.
@Faks /[^\w\s]|(.)(?=\1\1)/gi will remove only if there are at least three aaa, and will leave aa alone
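The two-step approach suggested in the comments can be sketched as a small helper. The function name and the ignoreCase parameter are hypothetical, added only for illustration. Because the separators are stripped first, duplicates that were separated by non-alphanumeric characters (d*d) end up adjacent and get collapsed too.

```javascript
// Hypothetical helper: strip first, then collapse adjacent duplicates.
function stripThenCollapse(input, ignoreCase = true) {
  // Step 1: drop everything that is not a word character or whitespace.
  // Note that \w also keeps the underscore.
  const stripped = input.replace(/[^\w\s]/g, "");

  // Step 2: remove each character that is immediately followed by itself.
  const duplicate = ignoreCase ? /(.)(?=\1)/gi : /(.)(?=\1)/g;
  return stripped.replace(duplicate, "");
}

console.log(stripThenCollapse("aabcd*def%gGGhhhijkk"));        // "abcdefGhijk"
console.log(stripThenCollapse("aabcd*def%gGGhhhijkk", false)); // "abcdefgGhijk"
```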
5

\1+ is the key

"aabcdd".replace(/(\w)\1+/g, function (str, match) {
    // match is the captured first character of the run
    return match;
}); // abcd
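As a side note, the callback is not strictly needed: the replacement string "$1" refers to the same captured group. A minimal sketch (the collapseRuns name is mine), which also shows one difference from the lookahead version when the i flag is used:

```javascript
// Collapse each run of a repeated word character to a single character.
const collapseRuns = (text) => text.replace(/(\w)\1+/g, "$1");

console.log(collapseRuns("aabcdd")); // "abcd"

// With the i flag, this pattern keeps the FIRST character of a mixed-case run,
// while the lookahead pattern keeps the LAST one.
const keepsFirst = "gGG".replace(/(\w)\1+/gi, "$1");
const keepsLast = "gGG".replace(/(.)(?=\1)/gi, "");

console.log(keepsFirst); // "g"
console.log(keepsLast);  // "G"
```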

2

Non regex version:

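var oldString = "aabcd*def%gGGhhhijkklmnoP\122";
var newString = "";

var len = oldString.length;
var c = oldString[0];
for ( var i = 1; i < len; ++i ) {
  // Emit the previous character only when the next one differs.
  if ( c != oldString[i] ) {
    newString += c;
  }
  c = oldString[i];
}
// The loop never emits the final character, so append it here.
if ( len > 0 ) {
  newString += c;
}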
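If you want a single pass without any replace call, the filtering and the duplicate check can be folded into one loop. This is only a sketch under my own assumptions: the function name is hypothetical, "alphanumeric" is taken to mean A-Z, a-z and 0-9, and the comparison is case-sensitive.

```javascript
// Hypothetical single-pass helper: skip non-alphanumerics and any character
// equal to the last character that was kept.
function dedupeAlphanumeric(input) {
  let result = "";
  let previous = "";
  for (const ch of input) {
    if (!/[A-Za-z0-9]/.test(ch)) continue; // drop special characters
    if (ch === previous) continue;         // drop adjacent duplicates
    result += ch;
    previous = ch;
  }
  return result;
}

console.log(dedupeAlphanumeric("aabcd*def%gGGhhhijkk")); // "abcdefgGhijk"
```

Because skipped characters do not reset previous, d*d collapses to a single d here.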

1 Comment

Hi, that does a great job of removing the duplicated characters. Would you suggest running my regex first and then executing the above?
1

Short and simple. Input: Brahmananda, output: Brahmnd. Reference: http://jsfiddle.net/p7yu8etz/5/

var str = "Brahmananda";
var reg = /(.)(.*?)(\1)/g;
while (reg.test(str))
  str = str.replace(reg, "$1$2");
$("div").text(str);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div></div>
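Note that this pattern removes every later occurrence of a character, not just adjacent repeats. Here is a jQuery-free sketch of the same loop, next to a Set-based alternative that is not part of the answer above; both function names are hypothetical. For single-line input the two should give the same result.

```javascript
// Same idea as the loop above: keep replacing until nothing changes.
function removeRepeatedChars(str) {
  const repeated = /(.)(.*?)\1/g;
  let previous;
  do {
    previous = str;
    // Keep the first occurrence ($1) and the text in between ($2).
    str = str.replace(repeated, "$1$2");
  } while (str !== previous);
  return str;
}

// Alternative technique: a Set keeps the first occurrence of each character
// in insertion order.
const uniqueViaSet = (str) => [...new Set(str)].join("");

console.log(removeRepeatedChars("Brahmananda")); // "Brahmnd"
console.log(uniqueViaSet("Brahmananda"));        // "Brahmnd"
```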
