My goal is to read from standard input, break the input into words case-insensitively, and write a report to standard output in which each line is a word, followed by a space, followed by its count. The output should be sorted by word.
This is my code:

    use std::collections::BTreeMap;
    use std::io;
    use std::io::BufRead;

    fn main() {
        let mut counts: BTreeMap<String, isize> = BTreeMap::new();
        let stdin = io::stdin();
        for line_result in stdin.lock().lines() {
            match line_result {
                Ok(line) => {
                    let lowercase_line = line.to_lowercase();
                    let words = lowercase_line.split(|c: char| {
                        !(c.is_alphabetic() || c == '\'')
                    }).filter(|s| !s.is_empty());
                    for word in words {
                        *(counts.entry(word.to_string()).or_insert(0)) += 1;
                    }
                },
                Err(e) => {
                    panic!("Error parsing stdin: {:?}", e);
                }
            }
        }
        for (key, value) in counts.iter() {
            println!("{} {}", key, value);
        }
    }

My questions are:
- Is `BTreeMap` the proper dictionary?
- I know that there is a regex crate, but I would like to stay with things in standard Rust. That said, splitting is a terrible way to break up lines because you have to filter empties. Is there a way to just match the words, rather than splitting on non-word sequences?
- Is matching on the `Err` part of the result proper? Or should we let the script crash? Is panicking okay?
- I noticed one is not allowed to say `let words = line.to_lowercase().split(...)` because of the infamous "borrowed reference does not live long enough", but is there a cleaner way?
- Is there a nicer way to count words in a map? I don't like the asterisk.
- I wish I didn't have to do an explicit lock on stdin.
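
To make the borrow question concrete, here is a minimal sketch of the same counting logic pulled out of `main` so it can be exercised without stdin. The helper name `count_words` is hypothetical and not part of my script, and I switched the count type to `usize` for the sketch. It keeps the intermediate binding because `split` borrows from the lowercased `String`, which would otherwise be a temporary dropped at the end of the `let` statement.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper (not in the original script): counts the words in
/// `text` case-insensitively, treating letters and apostrophes as word
/// characters.
fn count_words(text: &str) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    // `split` yields slices that borrow from the lowercased String, so the
    // String has to be bound to a name that outlives the iterator.
    let lowercase = text.to_lowercase();
    let words = lowercase
        .split(|c: char| !(c.is_alphabetic() || c == '\''))
        .filter(|s| !s.is_empty());
    for word in words {
        // Insert 0 for an unseen word, then increment through the reference.
        *counts.entry(word.to_string()).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let counts = count_words("Don't stop. DON'T stop now!");
    // BTreeMap iterates in ascending key order.
    for (word, count) in &counts {
        println!("{} {}", word, count);
    }
}
```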
Rust has a lot of things going for it, but when I compare what I got to the much prettier Julia version of this script, namely...

    counts = Dict{AbstractString, UInt64}()
    for line in eachline(STDIN)
        for word in matchall(r"[a-z\']+", lowercase(line))
            counts[word] = get(counts, word, 0) + 1
        end
    end
    for (word, count) in sort(collect(counts))
        println("$word $count")
    end

...I'm thinking I don't know Rust very well, or, that's just the way things are. I mean, I know as a systems language, it's really hard to make vectors and strings. And they tell me I will learn to love the borrow checker. :) Hopefully someone with expertise in idiomatic Rust can be of service here. I'm not expecting it to be as short as the Julia code but I do fear my Rust is not idiomatic enough.
... `HashMap` and `BTreeMap`. In the Julia example you clearly want to have the stuff sorted, which is guaranteed with the `BTreeMap` and NOT the `HashMap`. So in this case I would argue that `BTreeMap` is the "proper dictionary".

... `BTreeMap`, but thought there might be something else, specific to counters.
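
The ordering point in the comment above can be illustrated with a small sketch. The function names `keys_via_btreemap` and `keys_via_hashmap` are hypothetical, chosen only for this example. A `BTreeMap` yields its keys in ascending order on iteration, while a `HashMap` makes no ordering promise, so its keys need an explicit sort before reporting.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical helper: counts words in a BTreeMap and returns the keys
/// in iteration order, which for BTreeMap is ascending key order.
fn keys_via_btreemap(words: &[&str]) -> Vec<String> {
    let mut counts: BTreeMap<String, usize> = BTreeMap::new();
    for w in words {
        *counts.entry(w.to_string()).or_insert(0) += 1;
    }
    counts.keys().cloned().collect()
}

/// Hypothetical helper: the same counting with a HashMap, whose iteration
/// order is unspecified, so the keys are sorted explicitly afterwards.
fn keys_via_hashmap(words: &[&str]) -> Vec<String> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for w in words {
        *counts.entry(w.to_string()).or_insert(0) += 1;
    }
    let mut keys: Vec<String> = counts.keys().cloned().collect();
    keys.sort();
    keys
}

fn main() {
    let words = ["pear", "apple", "fig", "apple"];
    println!("{:?}", keys_via_btreemap(&words));
    println!("{:?}", keys_via_hashmap(&words));
}
```

For a report that must come out sorted, as here, the `BTreeMap` version avoids the extra collect-and-sort step.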