1

Hello everyone. I want to ask about memory utilization and the time required for a process. I have the following code and want to optimize it so that it runs faster. String seems to take a lot of memory; is there an alternative?

public String replaceSingleToWord(String strFileText) {

    strFileText = strFileText.replaceAll("\\b(\\d+)[ ]?'[ ]?(\\d+)\"", "$1 feet $2  ");
    strFileText = strFileText.replaceAll("\\b(\\d+)[ ]?'[ ]?(\\d+)''", "$1 feet $2     inch");

    //for 23o34'
    strFileText = strFileText.replaceAll("(\\d+)[ ]?(degree)+[ ]?(\\d+)'", "$1 degree $3 second");

    strFileText = strFileText.replaceAll("(\\d+((,|.)\\d+)?)sq", " $1 sq");

    strFileText = strFileText.replaceAll("(?i)(sq. Km.)", " sqkm");
    strFileText = strFileText.replaceAll("(?i)(sq.[ ]?k.m.)", " sqkm");
    strFileText = strFileText.replaceAll("(?i)\\s(lb.)", " pound");
    //for pound
    strFileText = strFileText.replaceAll("(?i)\\s(am|is|are|was|were)\\s?:", "$1 ");
    return strFileText;
}

I think this takes more memory and time than necessary, and I want to reduce both. What changes do I need to make? Is there an alternative to the replaceAll function? How can I shrink this code so that it runs faster and uses less memory? Thank you in advance.

6
  • Consider changing the title to something like "optimizing regex performance" or "how to make regex file processing faster". Your current title says very little about what you are actually asking. Commented Oct 14, 2013 at 12:39
  • @Dariusz Would StringUtil.ReplaceEach() be useful for the code above? Commented Oct 14, 2013 at 12:49
  • StringUtils is non-standard. Do you mean apache-commons? That does not use regexes. Commented Oct 14, 2013 at 12:52
  • @Dariusz Yes. Can I use StringUtil.ReplaceEach() instead of writing regexes? Commented Oct 14, 2013 at 12:55
  • Your last replacement is garbled. Commented Oct 14, 2013 at 12:57

4 Answers 4

3

Optimization methods:

  • Use Pattern.compile() for each replacement. Create a class, make the patterns fields, and compile them only once. This saves a lot of time, because String.replaceAll() recompiles its regex on every call, and compilation is a costly operation.
  • Use non-greedy regexes: instead of (\\d+) use (\\d+?).
  • Try not to use regexes at all where possible (lb. -> pound?).
  • Merge several regexes with the same substitution into one; this is applicable to your sqkm or feet replacements.
  • You could try to base your API on StringBuilder (StringBuffer before Java 9) and use Matcher.appendReplacement() to process your text.

Moreover, the dot in many of your patterns is unescaped. An unescaped dot matches any character; use \\. to match a literal dot.

Class idea:

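import java.util.regex.Pattern;

class RegexProcessor {
  // Compiled once, when the processor is created, and reused on every call.
  private final Pattern feet1rep = Pattern.compile("\\b(\\d+)[ ]?'[ ]?(\\d+)\"");
  // ...

  public String process(String org) {
    String mod = feet1rep.matcher(org).replaceAll("$1 feet $2  ");
    // ...
    return mod;
  }
}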
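As a slightly fuller sketch of the same idea, the two sqkm patterns from the question could be merged into one precompiled pattern with the dots escaped. The class name UnitNormalizer and the method normalize are made up for illustration, and the merged regex is my assumption about which spellings should match, not a drop-in equivalent of the original patterns.

```java
import java.util.regex.Pattern;

// Hypothetical helper: names and the merged pattern are illustrative assumptions.
final class UnitNormalizer {
    // Covers "sq. Km.", "sq.km." and "sq. k.m."; every literal dot is escaped.
    private static final Pattern SQ_KM = Pattern.compile("(?i)sq\\. ?k\\.?m\\.");
    // "lb." preceded by whitespace; the escaped dot leaves words like "lbs" alone.
    private static final Pattern POUND = Pattern.compile("(?i)\\slb\\.");

    private UnitNormalizer() {}

    static String normalize(String text) {
        // Each pattern was compiled once above; only matching happens per call.
        String result = SQ_KM.matcher(text).replaceAll(" sqkm");
        return POUND.matcher(result).replaceAll(" pound");
    }
}
```

Calling UnitNormalizer.normalize("3 lbs") leaves the text unchanged, which the unescaped version of the pound pattern would not.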

7 Comments

What is the difference between (\\d+) and (\\d?+)? Any example, please.
But \\d?+ is no replacement for \\d+. It should be \\d++, right?
@maaartinus \\d+?, I had a typo, corrected it some time ago
I see that you've corrected it, but somehow I thought you wanted possessive, which IMHO makes more sense.
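For readers following this thread, here is a small sketch of how the three quantifier flavours behave in java.util.regex. The class QuantifierDemo and its method are made-up names for illustration only.

```java
// Hypothetical demo class; only standard String/regex calls are used.
final class QuantifierDemo {
    private QuantifierDemo() {}

    // Shows what the first two capture groups grabbed, e.g. "[12345][]".
    static String showGroups(String regex, String input) {
        return input.replaceFirst(regex, "[$1][$2]");
    }

    public static void main(String[] args) {
        // Greedy: the first group takes as many digits as it can.
        System.out.println(showGroups("(\\d+)(\\d*)", "12345"));   // [12345][]
        // Reluctant: the first group takes as few digits as it can.
        System.out.println(showGroups("(\\d+?)(\\d*)", "12345"));  // [1][2345]
        // Greedy gives a digit back so the trailing \d can match.
        System.out.println("12345".matches("\\d+\\d"));            // true
        // Possessive never gives anything back, so the trailing \d fails.
        System.out.println("12345".matches("\\d++\\d"));           // false
    }
}
```

The possessive form avoids backtracking entirely, which is why it is the one usually suggested for performance.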
1

The StringBuffer and StringBuilder classes are used when you need to make a lot of modifications to strings of characters.

Unlike String objects, objects of type StringBuffer and StringBuilder can be modified over and over again without leaving behind a lot of new unused objects.

The StringBuilder class was introduced in Java 5. The main difference between StringBuffer and StringBuilder is that StringBuilder's methods are not thread-safe (not synchronized).

It is recommended to use StringBuilder whenever possible because it is faster than StringBuffer. However, if thread safety is necessary, StringBuffer is the better option.

public class Test{

    public static void main(String args[]){
       StringBuffer sBuffer = new StringBuffer(" test");
       sBuffer.append(" String Buffer");
       System.out.println(sBuffer);
   }
}




public class StringBuilderDemo {
    public static void main(String[] args) {
        String palindrome = "Dot saw I was Tod";

        StringBuilder sb = new StringBuilder(palindrome);

        sb.reverse();  // reverse it

        System.out.println(sb);
    }
}

So, depending on your needs, you can select one of them.

Reference http://docs.oracle.com/javase/tutorial/java/data/buffers.html


1

Use precompiled Pattern and a loop just like Joop Eggen suggested. Group your expressions together. For example, the first two can be written like

`"\\b(\\d++) ?' ?(\\d+)(?:''|\")"`

You can go much further at the expense of readability loss. A single expression for all your replacements is possible, too.

`"\\b(\\d++) ?(?:' ?(?:(\\d+)(?:''|\")|degree ?(\\d++)|...)"`

Then you need to branch on conditions like group(2) == null. This gets very hard to maintain, but with a single loop and cleverly written regex you'll win the race. :D
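A minimal sketch of that single-loop idea, restricted to the feet and degree cases so the pattern stays readable. The class name CombinedReplacer is made up, and the replacement texts are simplified assumptions (the original code uses different spacing and leaves out "inch" for the double-quote form).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; replacement wording is simplified relative to the question.
final class CombinedReplacer {
    // group(1): leading number; group(2): inches (feet branch);
    // group(3): second number (degree branch). Exactly one of 2 and 3 is non-null.
    private static final Pattern COMBINED = Pattern.compile(
            "\\b(\\d++) ?(?:' ?(\\d++)(?:''|\")|degree ?(\\d++)')");

    private CombinedReplacer() {}

    static String replace(String text) {
        Matcher m = COMBINED.matcher(text);
        StringBuffer out = new StringBuffer(text.length());
        while (m.find()) {
            // Branch on which alternative matched.
            String replacement = (m.group(2) != null)
                    ? m.group(1) + " feet " + m.group(2) + " inch"
                    : m.group(1) + " degree " + m.group(3) + " second";
            // quoteReplacement guards against '$' or '\' in the replacement text.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With this sketch the whole input is scanned once, no matter how many alternatives the pattern contains.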


What would the regex be for words like can't -> cannot, shouldn't -> should not, etc.?

It depends how exact you want to be. The most trivial way is s.replaceAll("\\Bn't\\b", " not"). The above optimizations apply, so don't ever use replaceAll when performance matters.

A general solution could go like this

Pattern SHORTENED_WORD_PATTERN =
    Pattern.compile("\\b(ca|should|wo|must|might)(n't)\\b");

String getReplacement(String trunk) {
    switch (trunk) { // needs Java 7
        case "wo": return "will not";
        case "ca": return "cannot";
        default: return trunk + " not";
    }
}

... relevant part of the replacer loop (see replaceAll)

    while (matcher.find()) {
        matcher.appendReplacement(result, getReplacement(matcher.group(1)));
    }
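Put together, the pieces above might look like the following self-contained sketch. The wrapper class ContractionExpander and the method expand are names I made up; the pattern and the switch are taken from the snippets above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the pattern and switch shown above.
final class ContractionExpander {
    private static final Pattern SHORTENED_WORD_PATTERN =
            Pattern.compile("\\b(ca|should|wo|must|might)(n't)\\b");

    private ContractionExpander() {}

    static String getReplacement(String trunk) {
        switch (trunk) { // string switch needs Java 7
            case "wo": return "will not";
            case "ca": return "cannot";
            default: return trunk + " not";
        }
    }

    static String expand(String text) {
        Matcher matcher = SHORTENED_WORD_PATTERN.matcher(text);
        StringBuffer result = new StringBuffer(text.length());
        while (matcher.find()) {
            matcher.appendReplacement(result, getReplacement(matcher.group(1)));
        }
        // Copy whatever follows the last match.
        matcher.appendTail(result);
        return result.toString();
    }
}
```

Note that the pattern is case-sensitive as written, so a capitalized "Can't" would be left untouched.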

what should i do in case of strFileText = strFileText.replace("á", "a"); strFileText = strFileText.replace("’", "\'"); strFileText = strFileText.replace("â€Â", "\'"); strFileText = strFileText.replace("ó", "o"); strFileText = strFileText.replace("é", "e"); strFileText = strFileText.replace("á", "a"); strFileText = strFileText.replace("ç", "c"); strFileText = strFileText.replace("ú", "u"); if i want to write this in one line or other way replaceEach() is better for that case

If you go for efficiency note that all the above string starts with the same character Ã. A single regex could like á|’"|... is much slower than Ã(ƒÂƒÃ‚¡|¢Â€Â™"|...) (unless the regex engine can optimize it itself, which is currently not the case).

So write a regex where all common prefixes are extracted and use

String getReplacement(String match) {
    switch (match) { // needs Java 7
        case "á": return "a";
        case "’": return "'";
        ...
        default: throw new IllegalArgumentException("Unexpected: " + match);
    }
}

and

    while (matcher.find()) {
        matcher.appendReplacement(result, getReplacement(matcher.group()));
    }

Maybe a HashMap might be faster than the switch above.
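A sketch of the HashMap variant with the common prefix factored out of the regex. The class name MojibakeCleaner is made up, and the three keys are illustrative two-character sequences written as Unicode escapes, not the exact strings from the comment; whether the map really beats the switch would need measuring.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the keys are example sequences, not the asker's exact ones.
final class MojibakeCleaner {
    private static final Map<String, String> REPLACEMENTS = new HashMap<>();
    static {
        REPLACEMENTS.put("\u00C3\u00A1", "a"); // mis-decoded a-acute
        REPLACEMENTS.put("\u00C3\u00A9", "e"); // mis-decoded e-acute
        REPLACEMENTS.put("\u00C3\u00B3", "o"); // mis-decoded o-acute
    }

    // Shared first character written once, followed by the possible second characters.
    private static final Pattern BROKEN =
            Pattern.compile("\u00C3[\u00A1\u00A9\u00B3]");

    private MojibakeCleaner() {}

    static String clean(String text) {
        Matcher matcher = BROKEN.matcher(text);
        StringBuffer result = new StringBuffer(text.length());
        while (matcher.find()) {
            // The pattern only matches keys present in the map.
            String replacement = REPLACEMENTS.get(matcher.group());
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }
}
```

Adding a new mapping then means one extra map entry and one extra character in the character class.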

3 Comments

What would the regex be for words like can't -> cannot, shouldn't -> should not, etc.?
what should i do in case of strFileText = strFileText.replace("á", "a"); strFileText = strFileText.replace("’", "\'"); strFileText = strFileText.replace("â€Â", "\'"); strFileText = strFileText.replace("ó", "o"); strFileText = strFileText.replace("é", "e"); strFileText = strFileText.replace("á", "a"); strFileText = strFileText.replace("ç", "c"); strFileText = strFileText.replace("ú", "u"); if i want to write this in one line or other way replaceEach() is better for that case
@Aditya: Isn't it time for a new question? My answer is overlong already. :D
0

The regex patterns can be improved in spots: [,.] (instead of (,|.)) or  ? (instead of [ ]?).

Use compiled static final Patterns declared outside the functions.

private static final Pattern PAT = Pattern.compile("...");


StringBuffer sb = new StringBuffer();
Matcher m = PAT.matcher(strFileText);
while (m.find()) {
    m.appendReplacement(sb, "...");
}
m.appendTail(sb);
strFileText = sb.toString();

This can be optimized further by testing m.find() first, before creating a new StringBuffer.
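A minimal sketch of that check, assuming a made-up class LazyReplacer and an example pattern for "lb." (the original leaves the pattern and replacement elided):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the pattern and replacement are example values.
final class LazyReplacer {
    private static final Pattern PAT = Pattern.compile("\\slb\\.");

    private LazyReplacer() {}

    static String replace(String text) {
        Matcher m = PAT.matcher(text);
        if (!m.find()) {
            // No match: return the input itself, no buffer is allocated.
            return text;
        }
        StringBuffer sb = new StringBuffer(text.length() + 16);
        do {
            // The first iteration handles the match found by the check above.
            m.appendReplacement(sb, " pound");
        } while (m.find());
        m.appendTail(sb);
        return sb.toString();
    }
}
```

When most inputs contain no match, this avoids both the buffer allocation and the final copy.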

