
My program scrapes a website and creates two lists, one for categories and one for content. I then use dict(zip(...)) to pair them up and put them into a dict.

Something like this:

complete_dict=dict(zip(category_list,info_list))

The problem is that my program reads empty elements into both lists (category and info). That would be fine if I could remove them later, but I have not found a way to do so. When printed, both lists contain empty elements: not empty strings, but more like an empty list within a list. I tried to remove them both from the lists and from the dictionary after zipping, using commands like:

category_list=filter(None, category_list)

or:

info_list=[x for x in info_list if x != []]

Of course, I applied the operation to both lists.

Neither approach worked. I then tried filtering the dictionary itself with:

dict((k, v) for k, v in complete_dict.iteritems() if v)

What else can I try at this point?

Edit

I tried filtering, and either my conditions are not set correctly or filtering simply doesn't solve the problem. I'm looking for another approach, so this is not a duplicate of the other thread (although that thread has some useful info).

Edit 2

What I'm getting right now is:

[u'info1', u'info2', u'info3', u'info4', ...]

[]

[]

[]

[]

[u'info1', u'info2', u'info3', u'info4', ...]

[]

[]

[]

[u'info1', u'info2', u'info3', u'info4', ...]

info1, info2, info3, and info4 (there are actually more elements) stand for content scraped from the website. Sorry, I can't reveal what they really are, but this shows the idea. This is one of the lists (info_list), and I'm trying to remove all the []'s stuck in the middle, so the result should look like

[u'info1', u'info2', u'info3', u'info4', ...]

[u'info1', u'info2', u'info3', u'info4', ...]

[u'info1', u'info2', u'info3', u'info4', ...]

and so on

Edit 3

My result looks like this after dict(zip(...))

{u'category1': u'info1', u'category2': u'info2', ...}

{}

{}

{u'category1': u'info1', u'category2': u'info2', ...}

{u'category1': u'info1', u'category2': u'info2', ...}

{}

{}

{}

and so on.

  • possible duplicate of Python: Filter a dictionary Commented May 26, 2015 at 18:15
  • Please show expected input and output. This is still vague to me with the way you've described it. Commented May 26, 2015 at 18:17
  • This doesn't look like a problem with individual empty keys or values. It looks like you're running this comprehension over and over, and sometimes, it produces a completely empty dict. It's probably a problem with the surrounding code. Commented May 26, 2015 at 18:44
  • Hi @user2357112 Do you think there's a way to get around it by simply removing those empty ones? Commented May 26, 2015 at 18:50
  • It's impossible for us to tell. If there's a bug in the code causing this comprehension to be run on inputs it shouldn't, though, you should fix that code rather than trying to patch up the results afterward. Commented May 26, 2015 at 19:07

3 Answers

2

Using a dict comprehension with an is not None check:

info_list = {k: v for (k, v) in complete_list.iteritems() if v is not None}

From the documentation on dict comprehensions
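
One caveat worth checking: an is not None test keeps empty lists, because [] is not None. Below is a minimal sketch comparing the two checks. It is written in Python 3 syntax (items() rather than iteritems()) and the sample data is made up for illustration.

```python
# Hypothetical sample data: one real value, one empty list, one None.
complete_dict = {'category1': 'info1', 'category2': [], 'category3': None}

# `is not None` only drops the None value; the empty list survives.
not_none = {k: v for k, v in complete_dict.items() if v is not None}

# A plain truthiness test drops None, [], '' and other falsy values.
truthy = {k: v for k, v in complete_dict.items() if v}

print(not_none)
print(truthy)
```

If the values to discard are empty lists rather than None, the truthiness test is probably the closer fit.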


1 Comment

I tried this, and the result is still the same. I used 'complete_dict2 = {k: v for k, v in complete_dict.iteritems() if v is not None}', where complete_dict is the dict after zipping the two lists together.
1

but more like an empty list within a list.

Assuming this is guaranteed, you can do:

# make sure value is not "[]" or "[[]]"
{k: v for k, v in complete_list.iteritems() if v and v[0]}

Example:

complete_list = {'x': [[]], 'y': [], 'z': [[1]]}
{k: v for k, v in complete_list.iteritems() if v and v[0]}
# returns {'z': [[1]]}
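
Note that a v and v[0] test also drops values whose first element is falsy for other reasons, such as [0]. A slightly more explicit variant is sketched below in Python 3 syntax. The helper is_blank is a hypothetical name introduced here, and it assumes that only lists and tuples can count as "empty"; an empty string would be kept.

```python
def is_blank(value):
    """Return True for [], [[]], [[], []] and similar empty nestings (hypothetical helper)."""
    if isinstance(value, (list, tuple)):
        # A sequence is blank when it has no elements or all of its elements are blank.
        return all(is_blank(item) for item in value)
    # Anything that is not a list or tuple counts as real content.
    return False


# Same sample data as above, plus a value whose first element is falsy.
complete_list = {'x': [[]], 'y': [], 'z': [[1]], 'w': [0]}
cleaned = {k: v for k, v in complete_list.items() if not is_blank(v)}
print(cleaned)
```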

EDIT

From your updated question, I see you are zipping lists together after scraping from a website like so:

complete_dict=dict(zip(category_list,info_list))

It looks like your info_list is empty in some cases. Just do

if info_list:
    complete_dict=dict(zip(category_list,info_list))

to ensure you don't zip category_list with an empty list.
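
If the zipping happens inside a loop over scraped pages, the same guard can sit in the loop so that empty results are never stored. This is a rough sketch in Python 3, assuming the scraper produces one (category_list, info_list) pair per page. The names scraped_pages and results are hypothetical and do not come from the question.

```python
# Hypothetical stand-in for the scraper output: one (categories, infos) pair per page.
scraped_pages = [
    (['category1', 'category2'], ['info1', 'info2']),
    ([], []),
    (['category1', 'category2'], ['info3', 'info4']),
    ([], []),
]

results = []
for category_list, info_list in scraped_pages:
    # Skip pages where either list came back empty; zipping them would give {}.
    if not category_list or not info_list:
        continue
    results.append(dict(zip(category_list, info_list)))

for complete_dict in results:
    print(complete_dict)
```

Checking both lists, rather than only info_list, also covers the case where the categories are missing but some content was scraped.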

2 Comments

Sorry, it doesn't seem to be working. One thing is, I don't know what's being used as the "key" when I zip those two lists together. Both lists have empty entries, so there doesn't seem to be any problem zipping them together. However, since there's no "key" for the empty values, I don't know how to reference or remove them.
Hmm, OK, I updated the answer based on your edited question.
-1

Use filter. The first argument must be either None or a callable accepting one argument; passing the built-in bool function makes the intent explicit:

category_list = filter(bool, category_list)
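
For reference, here is a small check of how filter treats empty elements. It is a sketch in Python 3, where filter returns a lazy iterator and needs list(...) around it (in Python 2 it returns a list directly). The sample data is made up.

```python
# Made-up list of lists, with empty lists mixed in as in the question.
info_list = [['info1', 'info2'], [], [], ['info3', 'info4'], []]

# Passing None as the first argument keeps only truthy elements.
with_none = list(filter(None, info_list))

# Passing bool does the same thing with the intent spelled out.
with_bool = list(filter(bool, info_list))

print(with_none)
print(with_bool)
```

Both calls give the same result, so if the empty elements remain after filtering, the filter may be applied at the wrong level (for example, to an inner list rather than to the list that holds the empty ones).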
