Create a words histogram of a given string in python

Question

I wrote a code that given a string (txt), will print a histogram that contains the number of occurrences of each word in it and the word itself.

A word is defined to be a sequence of chars separated by one space or more.

The code is working, I just want to see if I can improve it.

def PrintWordsOccurence(txt):
    lst = [x for x in txt.split(' ') if x != ' ' and x != '']
    newlst = list(dict.fromkeys(lst))
    for item in newlst:
        print("[{0}] {1}".format(lst.count(item),item))

An input-output example:

>> PrintWordsOccurence("hello code review!  this is  my   code")

[1] hello
[2] code
[1] review!
[1] this
[1] is
[1] my

The first line creates a list with all words, there will be duplicates in it, so to remove them, I turn the list into a dictionary and then I turn it back to a list. In the end, for each item in the list, I print the number of its occurrences in the original string and the item itself.

The time complexity of the code is \$\Theta(n^2)\$

Any suggestions to improve the code?

Pavlo Slavynskyy · Accepted Answer · 2021-07-08 11:33:24Z

str.split method without arguments

It does the same filtering as you do. Just skip ' ' argument.

list(dict.fromkeys(list comprehension))

It looks overburdened, right? You can create a set of list comprehension as well:

newlst = list({x for x in txt.split()})

or just

newlst = list(set(txt.split()))

but the best way is

collections.Counter

You're reinventing it with worse time complexity.

Format strings

f-strings look better (but it depends on your task).

Function name

PEP8 advices to use lower case with underscores: print_words_occurence

All together

from collections import Counter
def print_words_occurence(txt):
   for word, count in Counter(txt.split()).items():
       print(f"[{count}] {word}")

Also consider dividing an algorithmic part and input/output - like yielding a pair (word, count) from the function and outputting it somewhere else.

ingyhere · Accepted Answer · 2021-07-16 02:54:02Z

2

This can be improved by implementing a two-line, basic counting sort in core python:

def print_words_occurence(txt):
    wdict = dict()
    for w in txt.strip().split(): 
        wdict[w] = wdict.get(w, 0) + 1
    # just to print results
    [print(f"[{wdict[k]}] {k}") for k in wdict.keys()]

The run time is O(n) if the final dict isn't sorted. [If it is sorted, it's still O(n+k).] I don't think anything can be more efficient.

edited Jul 16, 2021 at 2:54

answered Jul 9, 2021 at 2:40

ingyhere

1294 bronze badges

Add a comment |

Stack Exchange Network

Create a words histogram of a given string in python

2 Answers 2

str.split method without arguments

list(dict.fromkeys(list comprehension))

collections.Counter

Format strings

Function name

All together

You must log in to answer this question.

Hot Network Questions

Create a words histogram of a given string in python

2 Answers 2

str.split method without arguments

list(dict.fromkeys(list comprehension))

collections.Counter

Format strings

Function name

All together

You must log in to answer this question.

Related

Hot Network Questions