
I'm writing some code for college and I have to sort two classes by their name. So I started using Java's compareTo for Strings, but it doesn't order them the way I need. For example, with the two names TEST-6 and TEST-10, the result was TEST-10 ahead of TEST-6.
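This is how String.compareTo is specified: it compares the strings character by character and stops at the first difference, so the length of the number is never taken into account. A minimal demo (the class LexicographicDemo is only for illustration):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class LexicographicDemo {
    // Sorts with String's natural order, i.e. the order compareTo gives you.
    static List<String> sortNaturally(List<String> input) {
        List<String> names = new ArrayList<>(input);
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // Index 5 is the first difference: '1' (49) vs '6' (54), so the result is 49 - 54.
        System.out.println("TEST-10".compareTo("TEST-6"));               // -5
        System.out.println(sortNaturally(List.of("TEST-6", "TEST-10"))); // [TEST-10, TEST-6]
    }
}
```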

I searched and found this solution:

private int compare(String o1, String o2) {
    return extractInt(o1) - extractInt(o2);
}
private int extractInt(String s) {
    String num = s.replaceAll("\\D", "");
    // return 0 if no digits found
    return num.isEmpty() ? 0 : Integer.parseInt(num);
}

But my strings can take any form. When I tried TEST-6 and TEST10, the result was TEST-6 ahead of TEST10, but what I expect is TEST10, then TEST-6.
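The snippet above throws away every non-digit character (including the '-'), so it only ever sees 6 and 10 and never looks at the text. A small check, wrapping the question's two methods in a hypothetical class ExtractIntDemo; compare uses Integer.compare instead of subtraction, which gives the same sign for these non-negative values:

```java
class ExtractIntDemo {
    // Same logic as the question's extractInt: keep only the digits, 0 if there are none.
    static int extractInt(String s) {
        String num = s.replaceAll("\\D", "");
        return num.isEmpty() ? 0 : Integer.parseInt(num);
    }

    static int compare(String o1, String o2) {
        return Integer.compare(extractInt(o1), extractInt(o2));
    }

    public static void main(String[] args) {
        System.out.println(extractInt("TEST-6"));            // 6
        System.out.println(extractInt("TEST10"));            // 10
        System.out.println(compare("TEST-6", "TEST10") < 0); // true: TEST-6 sorts first
    }
}
```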

The expected result is a normal string comparison, except that the full number is compared when needed: if the substrings before the numbers are equal, the numbers are compared; if not, the string comparison is kept. Something like this:

TE
TES-100
TEST-1
TEST-6
TESTT-0
TEXT-2
109
8
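A minimal sketch of the rule described above, assuming each name is "some text, then at most one number": split at the first digit, compare the text prefixes as strings, and only when they are equal compare the digit runs as numbers. PrefixNumberComparator is a hypothetical name, and the tie-breaking choices noted in the comments are assumptions. Note that it puts 8 and 109 first (an empty prefix sorts before any text), which differs from the last two lines of the list; one of the comments raises the same point.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PrefixNumberComparator implements Comparator<String> {
    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Index of the first digit, or s.length() if the string has none.
    private static int firstDigit(String s) {
        int i = 0;
        while (i < s.length() && !isAsciiDigit(s.charAt(i))) i++;
        return i;
    }

    // The run of digits starting at 'from' (empty if there is none).
    private static String digitRun(String s, int from) {
        int end = from;
        while (end < s.length() && isAsciiDigit(s.charAt(end))) end++;
        return s.substring(from, end);
    }

    @Override
    public int compare(String a, String b) {
        int da = firstDigit(a);
        int db = firstDigit(b);
        // 1) Different text before the number: ordinary string comparison decides.
        int byPrefix = a.substring(0, da).compareTo(b.substring(0, db));
        if (byPrefix != 0) return byPrefix;

        // 2) Same prefix: compare the full numbers (BigInteger avoids int overflow).
        String na = digitRun(a, da);
        String nb = digitRun(b, db);
        if (na.isEmpty() != nb.isEmpty()) return na.isEmpty() ? -1 : 1; // no number sorts first
        if (!na.isEmpty()) {
            int byNumber = new BigInteger(na).compareTo(new BigInteger(nb));
            if (byNumber != 0) return byNumber;
        }

        // 3) Still tied (e.g. "A01" vs "A1", or text after the number): plain string order.
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(List.of(
                "TEXT-2", "8", "TEST-6", "TEST-1", "109", "TE", "TESTT-0", "TES-100", "TEST10"));
        names.sort(new PrefixNumberComparator());
        System.out.println(names);
        // [8, 109, TE, TES-100, TEST10, TEST-1, TEST-6, TESTT-0, TEXT-2]
    }
}
```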
  • Is it always a non-number followed by a positive integer? Can there be things like a1b2c3d4? Commented Oct 22, 2019 at 12:39
  • Also, why is 109 last? Since the non-number portion is an empty string, shouldn't it be the first? Commented Oct 22, 2019 at 12:40
  • You'd basically have to split the strings into numeric and non-numeric parts and compare those one by one. The numeric parts would have to be parsed to get a proper order, though. Commented Oct 22, 2019 at 12:40
  • -6 is less than 10, hence the sort works as expected. Commented Oct 22, 2019 at 12:46
  • Your compareTo could compare the substrings that come before the digits (if a string has no digits, use the whole string). So a = "foo1bar" and b = "foo2" would result in "foo" being compared with "foo". If that already determines the order, fine; but when they are equal (like "foo" = "foo"), you compare their digits, 1 compared to 2, which puts foo1bar before foo2. And so on: you have to convert and compare all substrings until you get an order. Commented Oct 22, 2019 at 12:52
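Splitting into numeric and non-numeric parts, as these comments suggest, can be done with a lookaround regex that splits wherever a digit meets a non-digit. A minimal sketch (SplitDemo and parts are illustrative names):

```java
import java.util.Arrays;
import java.util.List;

class SplitDemo {
    // Zero-width split points: between a non-digit and a digit, or between a digit and a non-digit.
    static List<String> parts(String s) {
        return Arrays.asList(s.split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)"));
    }

    public static void main(String[] args) {
        System.out.println(parts("foo1bar"));              // [foo, 1, bar]
        System.out.println(parts("foo2"));                 // [foo, 2]
        System.out.println(parts("TEST-42-Subsection-3")); // [TEST-, 42, -Subsection-, 3]
    }
}
```

Comparing the parts pairwise then gives the order described in the last comment: "foo" equals "foo", and 1 < 2 puts foo1bar first.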

3 Answers


You can do something like this:

list.sort(Comparator.comparing(YourClass::removeNumbers).thenComparing(YourClass::keepNumbers));

These are the two methods:

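// Primary key: the string with every digit removed
private static String removeNumbers(String s) {
    return s.replaceAll("\\d", "");
}

// Secondary key: all digits concatenated and parsed as a number (0 if there are none)
private static Integer keepNumbers(String s) {
    String number = s.replaceAll("\\D", "");
    if (!number.isEmpty()) {
        return Integer.parseInt(number);
    }
    return 0;
}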

For the following data:

List<String> list = new ArrayList<>();
list.add("TEXT-2");
list.add("TEST-6");
list.add("TEST-1");
list.add("109");
list.add("TE");
list.add("TESTT-0");
list.add("TES-100");

This is the sorting result:

[109, TE, TES-100, TEST-1, TEST-6, TESTT-0, TEXT-2]
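For reference, a self-contained version of the above that can be run as is. YourClass from the answer is renamed NumberAwareSort here purely for illustration, and the helper sorted just sorts a copy of the sample data:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class NumberAwareSort {
    // Primary key: the string with every digit removed.
    static String removeNumbers(String s) {
        return s.replaceAll("\\d", "");
    }

    // Secondary key: all digits concatenated and parsed (0 if there are none).
    static Integer keepNumbers(String s) {
        String number = s.replaceAll("\\D", "");
        return number.isEmpty() ? 0 : Integer.parseInt(number);
    }

    static List<String> sorted(List<String> input) {
        List<String> list = new ArrayList<>(input);
        list.sort(Comparator.comparing(NumberAwareSort::removeNumbers)
                .thenComparing(NumberAwareSort::keepNumbers));
        return list;
    }

    public static void main(String[] args) {
        System.out.println(sorted(List.of("TEXT-2", "TEST-6", "TEST-1", "109", "TE", "TESTT-0", "TES-100")));
        // [109, TE, TES-100, TEST-1, TEST-6, TESTT-0, TEXT-2]
    }
}
```

One design consequence to keep in mind: keepNumbers concatenates every digit in the string, so "A-1-2" is keyed as ("A--", 12), and a digit run outside the int range makes Integer.parseInt throw NumberFormatException.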

5 Comments

Nice solution. But be aware that with your solution, numbers are not compared by their numerical value but by their string representation. For example, TEST-12 < TEST-3. I would change the return value of keepNumbers to int.
@Eritrean I have updated the keepNumbers method. Now it returns the Integer representation of the number instead of a String.
I've tried your solution and it seems to work perfectly. I'll do more tests and then mark it as the accepted answer. Thank you!
@CaioAmbrosio Sure, I hope it works. Otherwise, you might have to tweak it a little bit depending on your data.
@CaioAmbrosio I am glad that my answer helped you.
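To see what the first comment means: a digit key kept as a String is compared character by character, so "12" sorts before "3"; as a number it sorts after. A small check (KeyTypeDemo and its methods are illustrative names; byIntKey assumes every name contains digits, since Integer.parseInt("") throws):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class KeyTypeDemo {
    static String text(String s) { return s.replaceAll("\\d", ""); }
    static String digits(String s) { return s.replaceAll("\\D", ""); }

    // Digit key as a String: "12" < "3" because '1' < '3'.
    static List<String> byStringKey(List<String> input) {
        List<String> list = new ArrayList<>(input);
        list.sort(Comparator.comparing(KeyTypeDemo::text).thenComparing(KeyTypeDemo::digits));
        return list;
    }

    // Digit key as an int: 3 < 12. Assumes every name contains at least one digit.
    static List<String> byIntKey(List<String> input) {
        List<String> list = new ArrayList<>(input);
        list.sort(Comparator.comparing(KeyTypeDemo::text)
                .thenComparingInt(s -> Integer.parseInt(digits(s))));
        return list;
    }

    public static void main(String[] args) {
        List<String> names = List.of("TEST-3", "TEST-12");
        System.out.println(byStringKey(names)); // [TEST-12, TEST-3]
        System.out.println(byIntKey(names));    // [TEST-3, TEST-12]
    }
}
```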

If I am right, the problem is your '-' character. Remove it with string.replace("-", "") and then proceed with the normal sorting, keeping the original string as it is for the sort. Hopefully it works as you expect.

String num = s.replaceAll("\\D", "").replace("-","");

If you won't have any negative values, it should work. Otherwise, use a regex to check whether it is a negative number or the string just contains a '-'.
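Since \D matches any character that is not 0-9, the '-' is already gone after the first replaceAll, so the extra replace("-", "") has nothing left to remove, and a leading '-' is never read as a minus sign. A quick check (NonDigitDemo and its methods are illustrative names):

```java
class NonDigitDemo {
    // \D = any non-digit, which includes '-'
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // The line from this answer: the second replace is a no-op.
    static String digitsOnlyTwice(String s) {
        return s.replaceAll("\\D", "").replace("-", "");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("TEST-6"));      // 6
        System.out.println(digitsOnlyTwice("TEST-6")); // 6
        System.out.println(digitsOnly("-12"));         // 12 (the sign is lost)
    }
}
```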

1 Comment

"\\D" includes the - too. You don't have to redo it.

Here's a compare method that we're using to sort strings that can contain multiple numbers at any location (e.g. strings like "TEST-10.5" or "TEST-42-Subsection-3"):

import java.text.Collator;

boolean isDigit( char c ) {
  return '0' <= c && c <= '9';
}

int compare( String left, String right, Collator collator ) {
  if ( left == null || right == null ) {
    return left == right ? 0 : ( left == null ? -1 : 1 );
  }

  String s1 = left.trim();
  String s2 = right.trim();

  int l1 = s1.length();
  int l2 = s2.length();
  int i1 = 0;
  int i2 = 0;
  while ( i1 < l1 && i2 < l2 ) {
    boolean isSectionNumeric = isDigit( s1.charAt( i1 ) );
    if ( isSectionNumeric != isDigit( s2.charAt( i2 ) ) ) {
      // one of the strings now enters a digit section and one is in a text section so we're done 
      //switch to -1 : 1 if you want numbers before text
      return isSectionNumeric ? 1 : -1;
    }

    // read next section
    int start1 = i1;
    int start2 = i2;
    for ( ++i1; i1 < l1 && isDigit( s1.charAt( i1 ) ) == isSectionNumeric; ++i1 ){/* no operation */}
    for ( ++i2; i2 < l2 && isDigit( s2.charAt( i2 ) ) == isSectionNumeric; ++i2 ){/* no operation */}
    String section1 = s1.substring( start1, i1 );
    String section2 = s2.substring( start2, i2 );

    // compare the sections:
    int result =
        isSectionNumeric ? Long.valueOf( section1 ).compareTo( Long.valueOf( section2 ) )
      : collator == null ? section1.trim().compareTo( section2.trim() )
      :                    collator.compare( section1.trim(), section2.trim() );

    if ( result != 0 ) {
      return result;
    }

    if ( isSectionNumeric ) {
      // skip whitespace
      for (; i1 < l1 && Character.isWhitespace( s1.charAt( i1 ) ); ++i1 ){/* no operation */}
      for (; i2 < l2 && Character.isWhitespace( s2.charAt( i2 ) ); ++i2 ){/* no operation */}
    }
  }

  // we've reached the end of both strings but they still are equal, so let's do a "normal" comparison
  if ( i1 == l1 && i2 == l2 ) {      
    return collator == null ? left.compareTo( right ) : collator.compare( left, right );
  }

  // we've reached the end of only one string, so the other must either be greater or smaller
  return ( i1 == l1 )? -1 : 1;
}

The idea is to "split" the strings into "text" and numeric sections and to compare the sections one by one. Decimal numbers are supported in the sense that the integer part, the decimal point and the fraction part become 3 sections that are compared individually. Note that the fraction is then compared as a whole number, so "TEST-10.5" sorts before "TEST-10.25".

This is basically similar to splitting a string into an array of substrings and comparing the elements at each corresponding index. You then have the following situations:

  • both elements are texts: do a normal string comparison
  • both elements represent numbers: parse and compare the numbers
  • one element is a text and the other represents a number: decide which one is greater
  • we've reached the end of both strings but all elements are equal: we could be done or do a "normal" comparison on the entire strings to get an order if possible
  • we've reached the end of only one string and they are still equal: the longer one is reported to be greater (must be because there's more content ;) )
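The array-based variant described above can be sketched as follows: split both strings into text and digit sections, then walk the sections pairwise using the rules in the list. This is an illustrative sketch, not the answer's code: it uses a regex split instead of the index loop, compares numbers with BigInteger to avoid overflow, does not skip whitespace or use a Collator, and (like the method above) puts text before numbers when the section types differ. The class name SectionComparator is hypothetical.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SectionComparator implements Comparator<String> {
    // Split wherever a digit meets a non-digit; \d is [0-9] by default in Java regex.
    private static final String SPLIT = "(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)";

    private static boolean isNumeric(String section) {
        return !section.isEmpty() && section.charAt(0) >= '0' && section.charAt(0) <= '9';
    }

    @Override
    public int compare(String left, String right) {
        String[] a = left.split(SPLIT);
        String[] b = right.split(SPLIT);
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            boolean numA = isNumeric(a[i]);
            boolean numB = isNumeric(b[i]);
            if (numA != numB) {
                return numA ? 1 : -1; // text before numbers, as in the method above
            }
            int result = numA
                    ? new BigInteger(a[i]).compareTo(new BigInteger(b[i])) // both numbers
                    : a[i].compareTo(b[i]);                                // both text
            if (result != 0) {
                return result;
            }
        }
        if (a.length != b.length) {
            return a.length < b.length ? -1 : 1; // the one with more sections is greater
        }
        return left.compareTo(right); // all sections equal: plain comparison as a tie-breaker
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(List.of(
                "TEST-42-Subsection-10", "TEST-10", "TEST-42-Subsection-3", "TEST-6"));
        names.sort(new SectionComparator());
        System.out.println(names);
        // [TEST-6, TEST-10, TEST-42-Subsection-3, TEST-42-Subsection-10]
    }
}
```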

Note that this is just our way of doing it and there are others as well (e.g. ones that don't skip whitespace).
