I want to create a boolean DataFrame from sets.

There are 4 sets, each containing a collection of names:

a = { a collection of names }
b = { another collection of names}
c = { ... } 
d = { ... }

The result should be a DataFrame that looks like this:

 Name   |   a   |   b   |  c    |   d 
 --------------------------------------
'John'  | True  | True  | False | True
'Mike'  | False | True  | False | False
   .
   .
   .

I want an efficient way to do this in Python using pandas.

One way is to take each name, check whether it is in each set, and then add that name to the DataFrame. But there should be faster ways, such as merging the sets and applying some function.

Comment (Jun 27, 2017 at 19:56): What have you tried so far? Please also post sample data.

2 Answers

I've put together some random sample data that should scale:

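import pandas as pd

a = ['foo', 'bob']
b = ['foo', 'john', 'jeff']

# Sample frame holding the names to test
df = pd.DataFrame({'name': ['jeff', 'john', 'bob']})

df
   name
0  jeff
1  john
2   bob

# One boolean column per collection: True where the name is a member
df['a'] = df.name.isin(a)
df['b'] = df.name.isin(b)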

df
   name      a      b
0  jeff  False   True
1  john  False   True
2   bob   True  False
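
To cover the four sets from the question, the same isin idea could be looped over a dict of sets, with the name column built from their union. This is only a sketch: the set contents and the variable names `sets` and `names` are made up for illustration, chosen so that the John and Mike rows match the table in the question.

```python
import pandas as pd

# Hypothetical sample sets; the question does not give the actual contents
sets = {
    'a': {'John'},
    'b': {'John', 'Mike'},
    'c': {'Jake'},
    'd': {'John'},
}

# Every name that appears in at least one set, sorted for a stable row order
names = sorted(set().union(*sets.values()))
df = pd.DataFrame({'Name': names})

# One boolean column per set: True where the name is a member
for label, members in sets.items():
    df[label] = df['Name'].isin(members)

print(df)
```

Each isin call is vectorized over the whole column, so the Python-level loop runs only once per set rather than once per name.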

Here is one possible approach:

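import pandas as pd

a = {'John', 'Mike'}
b = {'Mike', 'Jake'}

# Each inner dict maps name -> True; names missing from a set become NaN,
# which fillna(False) turns into False
pd.DataFrame.from_dict({
    'a': dict.fromkeys(a, True),
    'b': dict.fromkeys(b, True),
}).fillna(False)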
          a      b
Jake  False   True
John   True  False
Mike   True   True

dict.fromkeys(..., True) gives you something like

{'John': True, 'Mike': True}

Each such dictionary is interpreted as a Series when passed to DataFrame. Pandas takes care of aligning the indices, so the final DataFrame is indexed by the union of all the sets.
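
The alignment step can be made explicit by building one Series per set and concatenating them column-wise. This is a sketch of the same idea rather than what the constructor does internally; the names `s_a` and `s_b` are introduced here for illustration, and `notna()` is swapped in for `fillna(False)` so the result has a plain bool dtype.

```python
import pandas as pd

a = {'John', 'Mike'}
b = {'Mike', 'Jake'}

# One Series per set, indexed by name, with True for every member
s_a = pd.Series(dict.fromkeys(a, True), name='a')
s_b = pd.Series(dict.fromkeys(b, True), name='b')

# Column-wise concat aligns on the index labels (outer join),
# leaving NaN where a name is absent from a set
df = pd.concat([s_a, s_b], axis=1)

# Present -> True, missing -> False; sort rows for a stable order
df = df.notna().sort_index()

print(df)
```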
