Take this string as an example: "will see you in London tomorrow and Kent the day after tomorrow".

How would I convert this to an associative array that has the keywords as keys, preferably leaving out the common words, like this:

Array ( [tomorrow] => 2 [London] => 1 [Kent] => 1)

Any help greatly appreciated.

3 Answers

7

I would say you could:

  • split the string into an array of words
    • with explode
    • or preg_split
    • depending on how complex your word separators are allowed to be
  • use array_filter to keep only the entries (i.e. words) you want
    • the callback function has to return false for every word you want to drop
  • and then use array_count_values on the resulting list of words
    • which counts how many times each word appears in the array of words


EDIT: and, just for fun, here's a quick example:

First of all, the string, which gets split into words:

$str = "will see you in London tomorrow and Kent the day after tomorrow";
$words = preg_split('/\s+/', $str, -1, PREG_SPLIT_NO_EMPTY);
var_dump($words);

Which gives you:

array
  0 => string 'will' (length=4)
  1 => string 'see' (length=3)
  2 => string 'you' (length=3)
  3 => string 'in' (length=2)
  4 => string 'London' (length=6)
  5 => string 'tomorrow' (length=8)
  6 => string 'and' (length=3)
  7 => string 'Kent' (length=4)
  8 => string 'the' (length=3)
  9 => string 'day' (length=3)
  10 => string 'after' (length=5)
  11 => string 'tomorrow' (length=8)

Then, the filtering:
function filter_words($word) {
    // A deliberately simple filter: keep only words of five or more characters
    return strlen($word) >= 5;
}
$words_filtered = array_filter($words, 'filter_words');
var_dump($words_filtered);

Which outputs:

array
  4 => string 'London' (length=6)
  5 => string 'tomorrow' (length=8)
  10 => string 'after' (length=5)
  11 => string 'tomorrow' (length=8)

And, finally, the counting:
$counts = array_count_values($words_filtered);
var_dump($counts);

And the final result:

array
  'London' => int 1
  'tomorrow' => int 2
  'after' => int 1

Now it's up to you to build on this ;-)
Mainly, you'll have to work on:
  • A better splitting function that deals with punctuation (or deal with that during filtering)
  • A smarter filtering function that suits your needs better than mine

Have fun!
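
As a rough sketch of those two improvements, the split below treats anything that is not a letter or an apostrophe as a separator, and the filter checks each word against a stop-word list instead of its length. The $stop_words list and the punctuated sample sentence are assumptions made up for illustration, and strtolower only lowercases ASCII letters reliably.

```php
<?php
// Hypothetical stop-word list; extend it to suit your own text.
$stop_words = array('will', 'see', 'you', 'in', 'and', 'the', 'day', 'after');

$str = "Will see you in London tomorrow, and Kent the day after tomorrow.";

// Split on any run of characters that is neither a letter nor an apostrophe,
// so whitespace, commas and full stops all act as separators.
$words = preg_split("/[^\p{L}']+/u", $str, -1, PREG_SPLIT_NO_EMPTY);

// Keep a word only if its lowercase form is not in the stop-word list.
$keywords = array_filter($words, function ($word) use ($stop_words) {
    return !in_array(strtolower($word), $stop_words, true);
});

$counts = array_count_values($keywords);
print_r($counts);
```

With this list, only London, tomorrow and Kent survive the filter, so short keywords such as Kent are no longer lost the way they are with a length check.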


7 Comments

str_word_count might also be interesting: php.net/manual/en/function.str-word-count.php
Thanks, that works. Is it possible to get the final result without the "int", i.e. just the number on its own?
@Steven: yes, of course it's possible. The "int", "string", and similar labels in the output I showed are there because I used var_dump, which is great for inspecting variables but not for displaying them to users ;-) It's just a matter of displaying the data with something other than var_dump.
I feel like filtering by word length would be troublesome. It could easily get rid of valid words, e.g. honda, jeep, paris. Which method you should choose really depends on what you're using this for.
Galen's solution below works well, except when there is an apostrophe. How would I fix that? (Thanks again.)
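
The thread does not say exactly what goes wrong with apostrophes, so the following is only one possible reading: it assumes the goal is to keep words such as "don't" intact. It uses the str_word_count function mentioned in the first comment, which accepts apostrophes and hyphens inside a word, and prints the counts with a plain loop instead of var_dump. The sample sentence is made up, and it is written in double quotes so that the apostrophes do not end the string literal early.

```php
<?php
$str = "I don't think London's weather will change tomorrow";

// Format 1 makes str_word_count() return the words as an array.
// An apostrophe or hyphen inside a word is kept as part of that word.
$words = str_word_count($str, 1);

$counts = array_count_values($words);

// Plain output, without var_dump's "int" and "string" annotations.
foreach ($counts as $word => $count) {
    echo $word, ': ', $count, "\n";
}
```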
1

You could keep a table of common words, then go through your string one word at a time and check whether each word is in the table. If it is not, add it to your associative array, or increment its count if it is already there.
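
A minimal sketch of that loop, assuming the "table" is just a PHP array held in memory; the $common list is made up for illustration, and a database table would work along the same lines.

```php
<?php
// Hypothetical common-word table, flipped so the words become keys
// and each lookup is a cheap isset() check.
$common = array_flip(array('will', 'see', 'you', 'in', 'and', 'the', 'day', 'after'));

$str = 'will see you in London tomorrow and Kent the day after tomorrow';
$counts = array();

foreach (preg_split('/\s+/', $str, -1, PREG_SPLIT_NO_EMPTY) as $word) {
    if (isset($common[strtolower($word)])) {
        continue;              // common word: skip it
    }
    if (isset($counts[$word])) {
        $counts[$word]++;      // seen before: add one
    } else {
        $counts[$word] = 1;    // first occurrence
    }
}

print_r($counts);
```

For the sample string this leaves London => 1, tomorrow => 2 and Kent => 1.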

0

Using a blacklist of words that should not be included:

$str = 'will see you in London tomorrow and Kent the day after tomorrow';
$skip_words = array( 'in', 'the', 'will', 'see', 'and', 'day', 'you', 'after' );
// get words in sentence that aren't to be skipped and count their values
$words = array_count_values( array_diff( explode( ' ', $str ), $skip_words ) );

print_r( $words );
