I have a method that reads a file, splits each line into an array of strings, and then adds each word to a tree. I want to modify it so that a word is not added to the tree if it contains non-English characters (e.g. Spanish letters). I thought about the 'contains' method, but it doesn't work on an array of type String. How would I do it?

    public void parse(File f) throws Exception {
        Node root = new Node('+'); // create a root node

        // try-with-resources closes the reader once parsing finishes
        try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(f)))) {
            String line;
            while ((line = br.readLine()) != null) {
                String[] words = line.toLowerCase().split(" ");

                for (int i = 0; i < words.length; i++) {
                    addToTree(words[i], root);
                }
            } // end of while
        }
    }
  • Can't you use the contains method on the String (words[i]) that you are trying to add to the tree? Commented Apr 4, 2013 at 15:09
  • You can use a regex that accepts only the letters a to z and A to Z, plus punctuation such as -;!,'. Commented Apr 4, 2013 at 15:10
  • stackoverflow.com/questions/2774320/… this should solve your issue. Commented Apr 4, 2013 at 15:11
  • This question is pretty meaningless unless you define exactly what 'English characters' are. For example, both English and Spanish are based on the Roman alphabet. Are you talking about excluding things like diacritics? Commented Apr 4, 2013 at 15:13
  • stackoverflow.com/questions/150033/… could be useful here Commented Apr 4, 2013 at 15:15

2 Answers

You can use regex for that:

    Pattern nonEng = Pattern.compile("[^A-Za-z]");
    ...
    for (int i = 0; i < words.length; i++) {
        // find() is true if the word has at least one character outside A-Z / a-z
        if (!nonEng.matcher(words[i]).find()) {
            addToTree(words[i], root);
        }
    }

This skips every word that contains any character other than the English letters A-Z or a-z. Note that this also rejects words containing digits or punctuation, such as an apostrophe.
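For reference, the same idea can be sketched as a small self-contained helper. The class and method names here (EnglishWordFilter, isEnglishWord, keepEnglishWords) are hypothetical and not part of the question's code, and the sketch assumes "English" means only the 26 unaccented letters. It also splits on any run of whitespace rather than a single space, which is a small departure from the original loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original question's code.
class EnglishWordFilter {
    // Matches any single character outside A-Z / a-z.
    private static final Pattern NON_ENGLISH = Pattern.compile("[^A-Za-z]");

    // True if the word is non-empty and made up only of A-Z / a-z.
    static boolean isEnglishWord(String word) {
        return !word.isEmpty() && !NON_ENGLISH.matcher(word).find();
    }

    // Lower-cases the line, splits it on whitespace, and keeps only the
    // words that pass isEnglishWord.
    static List<String> keepEnglishWords(String line) {
        List<String> kept = new ArrayList<>();
        // Locale.ROOT avoids locale-specific case mappings.
        for (String word : line.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (isEnglishWord(word)) {
                kept.add(word);
            }
        }
        return kept;
    }
}
```

Inside parse, the loop could then call addToTree only for the words returned by keepEnglishWords(line). With this sketch a word such as "niño" is dropped, and so is "don't", because the apostrophe is outside A-Z / a-z.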

1 Comment

should be words[i], not words[1]

If valid words consist only of characters from [a-zA-Z_0-9], the following returns true for a string that contains any other character:

return !myString.matches("^\\w+$");

If you have special requirements, such as allowing punctuation marks or other characters, add them to the regex as well, e.g. [^\w.,;:'"]
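As a rough sketch of that suggestion, the check below accepts a string only if every character is a word character or one of the listed punctuation marks. The class and method names are hypothetical, and the set of allowed punctuation is just the one from the character class above. In Java, \w covers only [a-zA-Z_0-9] unless Pattern.UNICODE_CHARACTER_CLASS is set, so accented letters are rejected.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating the extended character class.
class WordCharCheck {
    // One or more characters from \w plus . , ; : ' "
    private static final Pattern ALLOWED = Pattern.compile("[\\w.,;:'\"]+");

    // True if the whole string consists only of allowed characters.
    // matches() tests the entire input, so no ^ or $ anchors are needed.
    static boolean hasOnlyAllowedChars(String s) {
        return ALLOWED.matcher(s).matches();
    }
}
```

Unlike a letters-only pattern, this accepts digits and underscores, so it may be too permissive if the tree should hold alphabetic words only.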
