16
ArrayList<String> list = new ArrayList<String>();
list.add("behold");
list.add("bend");
list.add("bet");
list.add("bear");
list.add("beat");
list.add("become");
list.add("begin"); 

Is there a way to search for the regexp bea.* and get the indexes, like ArrayList.indexOf does?

EDIT: Returning the items is fine, but I need something faster than a linear search.

4
  • You can't get better performance if you put your strings in a List. Is your regex always a prefix, or do you want to handle any regex? Commented Nov 20, 2008 at 23:35
  • Then which data structure should I use? My regex is always a prefix. Commented Nov 21, 2008 at 16:08
  • I recommend an automaton-style data structure such as a trie: en.wikipedia.org/wiki/Trie Commented Jan 17, 2015 at 14:47
  • It is fundamental that unless you know something about the ordering of the list, you cannot do better than linear search. Without knowing anything about the ordering, you must test every element in order to locate every matching element. If you only want the first matching element, the only optimization you can apply is to test in an order that lets you terminate at the first hit (i.e. first to last). If you want sub-linear performance, you have to tell us how your elements are ordered. Commented Jan 25, 2017 at 18:31
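Since the regex is said to always be a prefix, the trie suggested in the comments could look roughly like the sketch below. PrefixTrie and indexesWithPrefix are hypothetical names, not part of any library, and the lookup takes a literal prefix ("bea") rather than a regex. Each node stores the indexes of all words passing through it, which trades memory for a lookup that depends on the prefix length and the number of hits rather than on the list size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: a minimal trie that remembers each word's index in the original list. */
class PrefixTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        // Indexes of every word whose path passes through this node.
        final List<Integer> indexes = new ArrayList<>();
    }

    private final Node root = new Node();

    PrefixTrie(List<String> words) {
        for (int i = 0; i < words.size(); i++) {
            Node node = root;
            node.indexes.add(i); // the empty prefix matches every word
            for (char c : words.get(i).toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Node());
                node.indexes.add(i);
            }
        }
    }

    /** Returns the indexes of all words that start with the given literal prefix. */
    List<Integer> indexesWithPrefix(String prefix) {
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return new ArrayList<>(); // no word has this prefix
            }
        }
        return new ArrayList<>(node.indexes);
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("behold", "bend", "bet", "bear", "beat", "become", "begin");
        PrefixTrie trie = new PrefixTrie(list);
        System.out.println(trie.indexesWithPrefix("bea")); // prints [3, 4]
    }
}
```

Building the trie is still linear in the total number of characters, so this only pays off when the same list is searched many times.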

8 Answers

20

Herms got the basics right. If you want the Strings rather than the indexes, you can simplify the code by using the Java 5 foreach loop and compiling the pattern once:

import java.util.regex.Pattern;
import java.util.List;
import java.util.ArrayList;

/**
 * Finds all entries in the list that match the regex
 * @param list The list of strings to check
 * @param regex The regular expression to use
 * @return list containing all matching entries
 */
List<String> getMatchingStrings(List<String> list, String regex) {

  ArrayList<String> matches = new ArrayList<String>();

  Pattern p = Pattern.compile(regex);

  for (String s:list) {
    if (p.matcher(s).matches()) {
      matches.add(s);
    }
  }

  return matches;
}
1 Comment

I thought of returning the actual matching strings, but the question specifically asked for the indices. Returning the matching strings is generally a bit cleaner though.
8

Is there a built-in method? Not that I know of. However, it should be rather easy to do it yourself. Here's some completely untested code that should give you the basic idea:

import java.util.regex.Pattern;
import java.util.ListIterator;
import java.util.List;
import java.util.ArrayList;

/**
 * Finds the indexes of all entries in the list that match the regex
 * @param list The list of strings to check
 * @param regex The regular expression to use
 * @return list containing the indexes of all matching entries
 */
List<Integer> getMatchingIndexes(List<String> list, String regex) {
  ListIterator<String> li = list.listIterator();

  List<Integer> indexes = new ArrayList<Integer>();

  while(li.hasNext()) {
    int i = li.nextIndex();
    String next = li.next();
    if(Pattern.matches(regex, next)) {
      indexes.add(i);
    }
  }

  return indexes;
}

I might have the usage of Pattern and ListIterator parts a bit wrong (I've never used either), but that should give the basic idea. You could also do a simple for loop instead of the while loop over the iterator.

5 Comments

Personally, I think that API methods should take arguments and return values of the most abstract type possible. Hence, my pedantic correction of your answer would be: public List<int> getMatchingIndices(List<String> list, String regex) {..}
Good point. I was just throwing it together really quick and didn't pay as much attention to that stuff.
FYI, <int> is not a valid type parameter. You would have to make it a List<Integer>. Also, when you use a regex in a loop like this, you should compile it into a Pattern object before entering the loop, like DJClayworth did.
Hmm, I thought autoboxing took care of the int->Integer thing. Oh well. Like I said, it was untested and quickly thrown together :)
Autoboxing will let you .add an int and convert it to an Integer, but a primitive type cannot be used as a type parameter like that.
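Putting the comments together, a version that returns List<Integer> and compiles the Pattern once before the loop might look like the sketch below. The class name MatchingIndexes is made up for the example; the method name follows the answer above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class MatchingIndexes {
    /** Returns the indexes of all entries that fully match the regex. */
    static List<Integer> getMatchingIndexes(List<String> list, String regex) {
        Pattern pattern = Pattern.compile(regex); // compiled once, not once per element
        List<Integer> indexes = new ArrayList<>();
        int i = 0;
        for (String s : list) {
            if (pattern.matcher(s).matches()) {
                indexes.add(i); // the int is autoboxed to Integer here
            }
            i++;
        }
        return indexes;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("behold", "bend", "bet", "bear", "beat", "become", "begin");
        System.out.println(getMatchingIndexes(list, "bea.*")); // prints [3, 4]
    }
}
```

This is still a linear scan; it only avoids recompiling the regex for every element.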
4

One option is to use Apache Commons CollectionUtils "select" method. You would need to create a Predicate object (an object with a single "evaluate" method that uses the regular expression to check for a match and return true or false) and then you can search for items in the list that match. However, it won't return the indexes, it will return a collection containing the items themselves.

3

This is a one-liner in Guava:

final Iterable<String> matches = Iterables.filter(myStrings, Predicates.contains(Pattern.compile("myPattern")));

for (final String matched : matches) {
   ...
}

1

I do not believe there is a Java API way of doing this, nor is there an Apache Commons way of doing this. It would not be difficult to roll your own, however.

0

This is a thread revival, but it might be useful to somebody. You might not need the indexes: presumably you asked for them because the next step does something with the items that matched the regex. In that case you can use Java 8 streams and a lambda expression:

  import java.util.regex.Pattern;
  import java.util.stream.Collectors;
  import java.util.List;

  ...

  var pattern = Pattern.compile(define);  // var requires Java 10 or later

  List<String> list = originalList
      .stream()
      .filter(e -> pattern.matcher(e).matches())
      .collect(Collectors.toList());

This takes the original list, converts it to a stream, filters it with a lambda that matches each element against your pattern, and collects the result back into a List. Alternatively, you can keep it as a stream and call .forEach on it with another lambda expression.
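If the indexes are needed after all, the same stream style can iterate over positions instead of elements. The following is a minimal sketch; StreamIndexes and matchingIndexes are hypothetical names, and it assumes a list with cheap random access such as ArrayList, since list.get(i) is called for every position.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class StreamIndexes {
    /** Returns the indexes of all entries that fully match the regex. */
    static List<Integer> matchingIndexes(List<String> list, String regex) {
        Pattern pattern = Pattern.compile(regex);
        return IntStream.range(0, list.size())                        // stream of positions 0..size-1
                .filter(i -> pattern.matcher(list.get(i)).matches())  // keep positions whose element matches
                .boxed()                                              // int -> Integer
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("behold", "bend", "bet", "bear", "beat", "become", "begin");
        System.out.println(matchingIndexes(list, "bea.*")); // prints [3, 4]
    }
}
```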

0

For large lists, it can make sense to stream them in parallel with Java 8's built-in functions.

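import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

@Test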
public void testRegexPerformance()
{
    List<String> list = new ArrayList<>();
    list.add("behold");
    list.add("bend");
    list.add("bet");
    list.add("bear");
    list.add("beat");
    list.add("become");
    list.add("begin");
    for (int i = 0; i < 20; i++)
    {
        list.addAll(list);
    }
    System.out.println("Original list size: " + list.size());
    Instant startTime = Instant.now();
    List<String> results = testLoopApproach(list, "bea.*");
    Instant current = Instant.now();
    System.out.println("Found List size: " + results.size());
    System.out.println("Elapsed millis: " + (current.toEpochMilli() - startTime.toEpochMilli()));
    startTime = Instant.now();
    results = testStreamApproach(list, "bea.*");
    current = Instant.now();
    System.out.println("Found List size: " + results.size());
    System.out.println("Elapsed millis: " + (current.toEpochMilli() - startTime.toEpochMilli()));
}

private List<String> testStreamApproach(List<String> list, String regex)
{
    Predicate<String> pred = Pattern.compile(regex).asPredicate();
    return list.parallelStream().filter(pred).collect(Collectors.toList());
}

private List<String> testLoopApproach(List<String> list, String regex)
{
    Pattern p = Pattern.compile(regex);
    List<String> results = new ArrayList<>();
    for (String string : list)
    {
        if (p.matcher(string).find())
        {
            results.add(string);
        }
    }
    return results;
}

The results are (loop approach first, then the parallel stream):
Original list size: 7340032
Found List size: 2097152
Elapsed millis: 1785
Found List size: 2097152
Elapsed millis: 260

0

Here is an answer with linear complexity using a simple for loop, which lets you report either the index or the word:

ArrayList<String> wordList = new ArrayList<String>(Arrays.asList("behold", "bend", "bet", "bear", "beat", "become", "begin"));
for (int i = 0; i < wordList.size(); i++) {
  String word = wordList.get(i);
  if (word.matches("bea.*")) {
    System.out.println("index for " + word + " is: " + i);
  }
}

As previously mentioned, you cannot do better than linear search unless you know something about the ordering of the list.
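To illustrate what "knowing something about the ordering" can buy you when the pattern is a plain prefix, here is a rough sketch that keeps the words in a sorted TreeMap from word to original index and answers a prefix query with a range lookup. SortedPrefixIndex and indexesWithPrefix are hypothetical names. It assumes the query is a literal prefix rather than a regex, keeps only the first index of a duplicated word (as indexOf would), and ignores the edge case of words containing the character '\uffff'.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

class SortedPrefixIndex {
    // Sorted map from word to its index in the original list.
    private final TreeMap<String, Integer> indexByWord = new TreeMap<>();

    SortedPrefixIndex(List<String> words) {
        for (int i = 0; i < words.size(); i++) {
            indexByWord.putIfAbsent(words.get(i), i); // keep the first occurrence
        }
    }

    /** Returns the original indexes of all words starting with the literal prefix, in sorted word order. */
    List<Integer> indexesWithPrefix(String prefix) {
        // Every word with this prefix sorts at or after prefix and before prefix + '\uffff'.
        String upperBound = prefix + Character.MAX_VALUE;
        return new ArrayList<>(indexByWord.subMap(prefix, upperBound).values());
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("behold", "bend", "bet", "bear", "beat", "become", "begin");
        SortedPrefixIndex index = new SortedPrefixIndex(list);
        System.out.println(index.indexesWithPrefix("bea")); // prints [3, 4]
    }
}
```

Building the map costs O(n log n) once; after that, each query locates the range in logarithmic time and then only visits the matching words.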
