
I am reading a CSV file that has about 50,000 lines and is 1.1 MiB in size (and it can grow larger).

In Code1, I use String to process the CSV, while in Code2 I use StringBuilder (only one thread executes the code, so there are no concurrency issues).

Using StringBuilder makes the code a little harder to read than using the normal String class.

Am I prematurely optimizing things with StringBuilder in Code2 to save a bit of heap space and memory?

Code1

            fr = new FileReader(file);
            BufferedReader reader = new BufferedReader(fr);

            String line = reader.readLine();
            while ( line != null )
            {
                int separator = line.indexOf(',');
                String symbol = line.substring(0, separator);
                int begin = separator;
                separator = line.indexOf(',', begin+1);
                String price = line.substring(begin+1, separator);

                // Publish this update
                publisher.publishQuote(symbol, price);

                // Read the next line of fake update data
                line = reader.readLine();
            }
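For anyone who wants to run Code1 outside its original program, here is a minimal sketch of the same indexOf/substring parsing. QuotePublisher is a hypothetical stand-in for the publisher object, the input is any Reader (so a StringReader can replace the FileReader), and, like Code1, it assumes every line contains at least two commas. It also uses try-with-resources (Java 7+) so the reader is closed even if parsing fails.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the publisher object used in Code1.
interface QuotePublisher {
    void publishQuote(String symbol, String price);
}

class Code1Sketch {
    // Reads "symbol,price,..." lines and publishes the first two fields.
    // Assumes every line contains at least two commas, as Code1 does.
    static void publishAll(Reader source, QuotePublisher publisher) throws IOException {
        // try-with-resources closes the reader (and the wrapped Reader) even on error
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int first = line.indexOf(',');
                int second = line.indexOf(',', first + 1);
                String symbol = line.substring(0, first);
                String price = line.substring(first + 1, second);
                publisher.publishQuote(symbol, price);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> published = new ArrayList<>();
        String csv = "IBM,120.50,2010-04-07\nMSFT,29.16,2010-04-07\n";
        publishAll(new StringReader(csv), (symbol, price) -> published.add(symbol + "=" + price));
        System.out.println(published); // [IBM=120.50, MSFT=29.16]
    }
}
```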

Code2

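            fr = new FileReader(file);
            BufferedReader reader = new BufferedReader(fr);
            String line = reader.readLine();
            StringBuilder stringBuilder = new StringBuilder(line == null ? "" : line);

            // toString() never returns null, so test the length to detect end of input
            while( stringBuilder.length() > 0 ) {
                int separator = stringBuilder.toString().indexOf(',');
                String symbol = stringBuilder.toString().substring(0, separator);
                int begin = separator;
                separator = stringBuilder.toString().indexOf(',', begin+1);
                String price = stringBuilder.toString().substring(begin+1, separator);
                publisher.publishQuote(symbol, price);

                // readLine() returns null at end of file, and replace() throws on null
                line = reader.readLine();
                stringBuilder.replace(0, stringBuilder.length(), line == null ? "" : line);
            }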

Edit

I eliminated the toString() calls, so fewer String objects will be produced.

Code3

            while( stringBuilder.length() > 0 ) {
                int separator = stringBuilder.indexOf(",");
                String symbol = stringBuilder.substring(0, separator);
                int begin = separator;
                separator = stringBuilder.indexOf(",", begin+1);
                String price = stringBuilder.substring(begin+1, separator);
                publisher.publishQuote(symbol, price);
                Thread.sleep(10);
                // readLine() returns null at end of file, and replace() throws on null
                String next = reader.readLine();
                stringBuilder.replace(0, stringBuilder.length(), next == null ? "" : next);
            }

Also, the original code was downloaded from http://www.devx.com/Java/Article/35246/0/page/1

  • Even without the toString() call, Code1 will be more efficient. Why? Since String is immutable, the String.substring() call will share the underlying char[] (which is a bigger memory issue than the String object itself). The StringBuilder.substring() call must copy the char[] since the StringBuilder is mutable. So Code3 involves more char copying and more char[] allocation. (That was true of the JDK at the time; since Java 7 update 6, String.substring() copies the characters as well, so both versions now copy the substring.) Commented Apr 7, 2010 at 17:19

5 Answers

3

Will the optimized code increase the performance of the app? - my question

The second code sample will save you neither memory nor computation time. I am afraid you might have misunderstood the purpose of StringBuilder, which is really meant for building strings, not reading them.

Within the loop of your second code sample, every single line contains the expression stringBuilder.toString(), which turns the buffered string into a new String object over and over again. Your actual string operations are done against these objects. Not only is the first code sample easier to read, but it is almost certainly at least as performant as the second.

Am I prematurely optimizing things with StringBuilder? - your question

Unless you have profiled your application and concluded that these very lines cause a notable slowdown in execution speed, yes. Unless you are really sure that something will be slow (e.g. if you recognize high computational complexity), you definitely want to do some profiling before you start making optimizations that hurt the readability of your code.

What kind of optimizations could be done to this code? - my question

If you have profiled the application and decided this is the right place for an optimization, you should consider looking into the features offered by the Scanner class. This might give you both better performance (profiling will tell you whether it does) and simpler code.
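As a rough illustration of the Scanner suggestion, here is one possible sketch, not a tuned implementation: an outer Scanner hands out lines and a second Scanner with useDelimiter(",") tokenizes each line. It assumes the first two fields are symbol and price and that there are no blank lines (next() would throw on one). Note that it creates an extra Scanner per line, so whether it is actually faster or lighter on the heap than Code1 is exactly what profiling would have to show.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerSketch {
    // Returns "symbol=price" for each line of the input.
    // Assumes the first two comma-separated fields are symbol and price, and no blank lines.
    static List<String> parse(Readable source) {
        List<String> quotes = new ArrayList<>();
        try (Scanner lines = new Scanner(source)) {
            while (lines.hasNextLine()) {
                // A second Scanner tokenizes the current line on commas.
                try (Scanner fields = new Scanner(lines.nextLine()).useDelimiter(",")) {
                    String symbol = fields.next();
                    String price = fields.next();
                    quotes.add(symbol + "=" + price);
                }
            }
        }
        return quotes;
    }

    public static void main(String[] args) {
        String csv = "IBM,120.50,2010-04-07\nMSFT,29.16,2010-04-07\n";
        System.out.println(parse(new StringReader(csv))); // [IBM=120.50, MSFT=29.16]
    }
}
```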


3 Comments

+1, the code seems to be misusing StringBuilder. Implementation (1) is cleaner and possibly better.
@Jørn In Code3 above I have removed the stringBuilder.toString() calls, which should reduce the number of String objects created. The reason Code1 uses the functionality of the String class instead of a CSV parser is to minimize the number of objects on the heap, as this code runs on a Java real-time VM (devx.com/Java/Article/35246/0/page/1). Based on your experience, will the Scanner class perform better?
@por: My experience with performance critical string operations in Java is not very thorough. If you really think it matters, you could try both in a tight loop and measure the throughput.
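A crude version of the "try both in a tight loop" idea could look like the sketch below. It compares Code1-style String parsing with Code3-style StringBuilder parsing on 50,000 synthetic lines (the line format is an assumption) and sums the substring lengths so the work is not optimized away. Treat System.nanoTime numbers from a loop like this as rough only; a dedicated harness such as JMH gives far more trustworthy results, and on a real-time VM the allocation profile may matter more than raw speed.

```java
class CrudeTiming {
    // Code1-style extraction: substring directly on the String line.
    static int viaString(String[] lines) {
        int total = 0;
        for (String line : lines) {
            int first = line.indexOf(',');
            int second = line.indexOf(',', first + 1);
            total += line.substring(0, first).length() + line.substring(first + 1, second).length();
        }
        return total;
    }

    // Code3-style extraction: copy each line into one reused StringBuilder first.
    static int viaBuilder(String[] lines) {
        StringBuilder sb = new StringBuilder();
        int total = 0;
        for (String line : lines) {
            sb.replace(0, sb.length(), line);
            int first = sb.indexOf(",");
            int second = sb.indexOf(",", first + 1);
            total += sb.substring(0, first).length() + sb.substring(first + 1, second).length();
        }
        return total;
    }

    public static void main(String[] args) {
        String[] lines = new String[50_000];
        for (int i = 0; i < lines.length; i++) {
            lines[i] = "SYM" + i + "," + (i * 0.01) + ",extra";
        }
        // Several rounds give the JIT a chance to compile both methods before the later timings.
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            int a = viaString(lines);
            long t1 = System.nanoTime();
            int b = viaBuilder(lines);
            long t2 = System.nanoTime();
            System.out.printf("round %d: String %d us, StringBuilder %d us (checksums %d/%d)%n",
                    round, (t1 - t0) / 1_000, (t2 - t1) / 1_000, a, b);
        }
    }
}
```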
2

Am I prematurely optimizing things with StringBuilder in Code2 to save a bit of heap space and memory?

Most probably: yes. But, only one way to find out: profile your code.

Also, I'd use a proper CSV parser instead of what you're doing now: http://ostermiller.org/utils/CSV.html

2 Comments

+1 for recommending a real parser. The given code fails with quotes and commas as actual values.
+1 I am using the Ostermiller CSV/ExcelCSV parser/printer in a production environment, and it is very nice.
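To make the quoting problem concrete, the sketch below runs the question's indexOf-based extraction on a line whose first field is a quoted value containing a comma, which CSV permits. The sample line is an assumption chosen for illustration.

```java
class NaiveCsvPitfall {
    // The indexOf-based extraction from the question: symbol, then price.
    static String[] naiveFields(String line) {
        int first = line.indexOf(',');
        int second = line.indexOf(',', first + 1);
        return new String[] { line.substring(0, first), line.substring(first + 1, second) };
    }

    public static void main(String[] args) {
        // A quoted field containing a comma.
        String line = "\"ACME, Inc.\",42.00,2010-04-07";
        String[] f = naiveFields(line);
        // f[0] is "\"ACME": the opening quote is kept and the name is cut at the embedded comma
        System.out.println(f[0]);
        // f[1] is " Inc.\"": the rest of the name, not the price 42.00
        System.out.println(f[1]);
    }
}
```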
1

Code2 is actually less efficient than Code1 because every time you call stringBuilder.toString() you're creating a new java.lang.String instance (in addition to the existing StringBuilder object). This is less efficient in terms of space and time due to the object creation overhead.

Assigning the contents of readLine() directly to a String and then splitting that String will typically be performant enough. You could also consider using the Scanner class.
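Splitting the line read by readLine() can be as short as the sketch below. Passing a limit of 3 to String.split stops after the second comma, so any trailing fields stay together in the last element instead of being split further. It assumes each line has at least a symbol and a price field; otherwise parts[1] does not exist.

```java
class SplitSketch {
    // Splits one "symbol,price,..." line; limit 3 means at most three parts,
    // so everything after the second comma stays together in parts[2].
    static String[] symbolAndPrice(String line) {
        String[] parts = line.split(",", 3);
        return new String[] { parts[0], parts[1] };
    }

    public static void main(String[] args) {
        String[] quote = symbolAndPrice("IBM,120.50,2010-04-07,NYSE");
        System.out.println(quote[0] + " " + quote[1]); // IBM 120.50
    }
}
```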

Memory Saving Tip

If your input contains many repeated tokens, consider using String.intern() so that each identical token references the same String object; e.g.

String[] tokens = parseTokens(line);
for (String token : tokens) {
  // Construct business object referencing interned version of token.
  BusinessObject bo = new BusinessObject(token.intern());
  // Add business object to collection, etc.
}
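The effect of intern() can be seen in isolation with the small program below: two equal tokens cut from different lines start out as distinct objects, and interning maps both to one canonical instance. The sample lines are assumptions for illustration.

```java
class InternDemo {
    public static void main(String[] args) {
        // Two equal tokens parsed from different lines are separate String objects.
        String a = "IBM,120.50".substring(0, 3);
        String b = "IBM,121.00".substring(0, 3);
        System.out.println(a == b);                   // false: two separate String objects
        System.out.println(a.equals(b));              // true: same characters
        System.out.println(a.intern() == b.intern()); // true: both map to one canonical instance
    }
}
```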

1 Comment

+1 Do note though, @portoalet, that string interning is not guaranteed to yield any additional performance. Do some profiling (at least with a stopwatch) to check whether it helps in your case :)
0

StringBuilder is usually used like this:

StringBuilder sb = new StringBuilder();
sb.append("You").append(" can chain")
  .append(" your").append(" strings")
  .append(" for better readability.");

String myString = sb.toString(); // only call once when you are done
System.out.println(sb); // also calls sb.toString(); print myString instead


0

StringBuilder has several advantages:

  • StringBuffer's operations are synchronized but StringBuilder's are not, so using StringBuilder can improve performance in single-threaded scenarios.
  • Once the buffer has been expanded, it can be reused by invoking setLength(0) on the object. Interestingly, if you step into the debugger and examine the contents of the StringBuilder, you will see that the old characters still exist even after invoking setLength(0). The StringBuilder simply resets its internal length to zero, and the next append overwrites the characters from the beginning.
  • If you are not really sure about the length of the string, it is better to use a StringBuilder, because once the buffer is expanded you can reuse it for strings of smaller or equal size.
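The setLength(0) reuse described in the list above might look like the following sketch, which formats symbol/price pairs with a single builder. The second part of main checks that emptying the builder leaves its grown capacity unchanged; the sample strings and the initial capacity of 64 are arbitrary assumptions.

```java
import java.util.ArrayList;
import java.util.List;

class ReuseBuilder {
    // Formats each {symbol, price} pair as "symbol=price", reusing a single StringBuilder.
    static List<String> format(String[][] quotes) {
        StringBuilder sb = new StringBuilder(64);
        List<String> out = new ArrayList<>();
        for (String[] q : quotes) {
            sb.setLength(0);                        // empty the builder, keep its buffer
            sb.append(q[0]).append('=').append(q[1]);
            out.add(sb.toString());                 // one String copy per entry
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(format(new String[][] {{"IBM", "120.50"}, {"MSFT", "29.16"}}));
        // prints [IBM=120.50, MSFT=29.16]

        StringBuilder sb = new StringBuilder();     // default capacity is 16
        sb.append("a line long enough to make the builder grow its buffer");
        int grown = sb.capacity();
        sb.setLength(0);
        System.out.println(sb.length() + " " + (sb.capacity() == grown)); // prints 0 true
    }
}
```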

StringBuffer and StringBuilder are almost the same in all operations, except that StringBuffer is synchronized and StringBuilder is not.

If you don't have multithreading, it is better to use StringBuilder.
