5

I want to make a pandas dataframe with three columns whose rows contain every combination of values from three ranges, one range per column. In addition, I want the rows sorted ascending by c1, c2, c3.

For example, a = [0,1,2,3,4,5,6,7], b = [0,1,2], and c = [0,1]. The result I want looks like this:

c1 c2 c3
 0  0  0
 0  0  1
 0  1  0
 0  1  1
 0  2  0
 0  2  1
 1  0  0 
 1  0  1
 1  1  0
 ...
 7  2  0
 7  2  1

I keep trying to fill columns using numpy.arange, e.g., numpy.arange(0,8,1) for c1, but that doesn't easily create all the possible rows. In my example, I should end up with 8 * 3 * 2 = 48 unique rows of three values each.

I need this to act as a mask of all possible value combinations onto which I can merge a sparse matrix of experimental data.

Does anyone know how to do this? Recursion?

5 Answers

5

If you would like to stay in Numpy you can also rely on np.meshgrid to compute the Cartesian product of the values, like so:

import numpy as np

a = [0, 1, 2, 3, 4, 5, 6, 7]
b = [0, 1, 2]
c = [0, 1]

values = np.meshgrid(a, b, c, indexing="ij")
cartesian_product = np.column_stack([_.flat for _ in values])

print(cartesian_product)

Which prints:

[[0 0 0]
 [0 0 1]
 [0 1 0]
 [0 1 1]
 [0 2 0]
 [0 2 1]
 [1 0 0]
 [1 0 1]
 [1 1 0]
 [1 1 1]
...

I have not compared the performance against the itertools.product() approach. NumPy might be faster for large arrays.
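One way to check the comparison left open above is to time both approaches on the same inputs. This is only a rough sketch: the helper names with_meshgrid and with_product are made up for illustration, and the timings depend on the input sizes and the machine.

```python
import timeit
from itertools import product

import numpy as np

a = list(range(8))
b = list(range(3))
c = list(range(2))


def with_meshgrid(*arrays):
    # Hypothetical helper: Cartesian product via np.meshgrid, one column per input.
    grids = np.meshgrid(*arrays, indexing="ij")
    return np.column_stack([g.ravel() for g in grids])


def with_product(*arrays):
    # Hypothetical helper: Cartesian product via itertools.product, as an array.
    return np.array(list(product(*arrays)))


# Check that both helpers return the same rows in the same order.
same = np.array_equal(with_meshgrid(a, b, c), with_product(a, b, c))
print("identical output:", same)

# Time each helper; absolute numbers are machine-dependent.
for func in (with_meshgrid, with_product):
    seconds = timeit.timeit(lambda: func(a, b, c), number=1000)
    print(f"{func.__name__}: {seconds:.4f} s for 1000 runs")
```

For a meaningful comparison, the inputs should be replaced with ranges closer to the real problem size, since three short lists say little about how either approach scales.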

5

Pure pandas approach

As you are trying to create a data frame, you can do this entirely in pandas using pd.merge(how="cross"), available since pandas 1.2, to take the Cartesian product:

import pandas as pd
a = [0, 1, 2, 3, 4, 5, 6, 7]
b = [0, 1, 2]
c = [0, 1]

(
    pd.DataFrame({"c1": a})
    .merge(pd.DataFrame({"c2": b}), how="cross")
    .merge(pd.DataFrame({"c3": c}), how="cross")
)

itertools approach

This is an appropriate use for the built-in itertools.product() which takes the Cartesian product of the input iterables.

from itertools import product
import pandas as pd

a = [0, 1, 2, 3, 4, 5, 6, 7]
b = [0, 1, 2]
c = [0, 1]

pd.DataFrame(product(a,b,c), columns=['c1', 'c2', 'c3'])

Output (with both approaches):

c1  c2  c3
0   0   0
0   0   1
0   1   0
0   1   1
0   2   0
0   2   1
1   0   0
1   0   1
1   1   0
1   1   1
1   2   0
1   2   1
2   0   0
2   0   1
2   1   0
2   1   1
2   2   0
2   2   1
3   0   0
3   0   1
3   1   0
3   1   1
3   2   0
3   2   1
4   0   0
4   0   1
4   1   0
4   1   1
4   2   0
4   2   1
5   0   0
5   0   1
5   1   0
5   1   1
5   2   0
5   2   1
6   0   0
6   0   1
6   1   0
6   1   1
6   2   0
6   2   1
7   0   0
7   0   1
7   1   0
7   1   1
7   2   0
7   2   1
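If the number of columns is not fixed at three, the chained cross merges can be folded into a loop. The following is a minimal sketch, assuming the values are held in a dict (the name ranges is made up for illustration) and that pandas 1.2 or later is installed:

```python
from functools import reduce

import pandas as pd

# Hypothetical mapping of column name -> allowed values (same data as above).
ranges = {"c1": range(8), "c2": range(3), "c3": range(2)}

# Build one single-column frame per entry, then cross-merge them left to right.
frames = [pd.DataFrame({name: list(values)}) for name, values in ranges.items()]
grid = reduce(lambda left, right: left.merge(right, how="cross"), frames)

print(grid.shape)  # (48, 3)
```

The column order follows the insertion order of the dict, so the first key varies slowest in the result.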

1 Comment

Thank you for this! I chose the other pure pandas answer below over this one after much consideration of which one was easier for my pandas-leaning colleagues to understand. They slightly preferred the indexing approach below over your excellent merge response solely because they understand indexing more clearly than merge. Maintainability is an important consideration in my project since others will continue it after I am gone, so I simply asked them which approach was easier to comprehend. Thanks for being the quickest answer, much appreciated
4

You can use pandas MultiIndex.from_product (see the pandas documentation):

import pandas as pd

a = range(8)
b = range(3)
c = range(2)
multi_index = pd.MultiIndex.from_product([a, b, c], names=['c1', 'c2', 'c3'])
df = multi_index.to_frame(index=False)
print(df)

Output:

    c1  c2  c3
0    0   0   0
1    0   0   1
2    0   1   0
3    0   1   1
4    0   2   0
5    0   2   1
6    1   0   0
7    1   0   1
8    1   1   0
9    1   1   1
10   1   2   0
11   1   2   1
12   2   0   0
13   2   0   1
14   2   1   0
15   2   1   1
16   2   2   0
17   2   2   1
18   3   0   0
19   3   0   1
20   3   1   0
21   3   1   1
22   3   2   0
23   3   2   1
24   4   0   0
25   4   0   1
26   4   1   0
27   4   1   1
28   4   2   0
29   4   2   1
30   5   0   0
31   5   0   1
32   5   1   0
33   5   1   1
34   5   2   0
35   5   2   1
36   6   0   0
37   6   0   1
38   6   1   0
39   6   1   1
40   6   2   0
41   6   2   1
42   7   0   0
43   7   0   1
44   7   1   0
45   7   1   1
46   7   2   0
47   7   2   1
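Since the question mentions merging sparse experimental data onto this grid, here is a minimal sketch of that step. The experiments frame and its value column are invented for illustration; real data would take their place.

```python
import pandas as pd

# Full grid of every (c1, c2, c3) combination, built as above.
grid = pd.MultiIndex.from_product(
    [range(8), range(3), range(2)], names=["c1", "c2", "c3"]
).to_frame(index=False)

# Hypothetical sparse measurements: only a few combinations were observed.
experiments = pd.DataFrame(
    {"c1": [0, 3, 7], "c2": [1, 2, 0], "c3": [0, 1, 1], "value": [0.5, 1.2, 3.4]}
)

# A left merge keeps all 48 grid rows; unobserved combinations get NaN.
full = grid.merge(experiments, on=["c1", "c2", "c3"], how="left")

print(len(full))                     # 48
print(full["value"].notna().sum())   # 3
```

A left merge is used here so that the grid decides which rows exist and in what order, while the experimental data only fills in values.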

1 Comment

Thanks everyone - so many excellent answers! I am in an environment where my colleagues operate in and best understand pandas, so I selected this answer as the preferred one solely because of its maintainability by my colleagues.
3
  • Option 1 (Numpy)

Since you added the numpy tag to your question, here is a numpy-based option

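np.array(list(map(np.ravel, np.meshgrid(a, b, c)))).T

which shows the output below. Note that with meshgrid's default indexing="xy" the rows are not sorted by c1, c2, c3; passing indexing="ij" to np.meshgrid gives that order.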

array([[0, 0, 0],
       [0, 0, 1],
       [1, 0, 0],
       [1, 0, 1],
       [2, 0, 0],
       [2, 0, 1],
       [3, 0, 0],
       [3, 0, 1],
       [4, 0, 0],
       [4, 0, 1],
       [5, 0, 0],
       [5, 0, 1],
       [6, 0, 0],
       [6, 0, 1],
       [7, 0, 0],
       [7, 0, 1],
       [0, 1, 0],
       [0, 1, 1],
       [1, 1, 0],
       [1, 1, 1],
       [2, 1, 0],
       [2, 1, 1],
       [3, 1, 0],
       [3, 1, 1],
       [4, 1, 0],
       [4, 1, 1],
       [5, 1, 0],
       [5, 1, 1],
       [6, 1, 0],
       [6, 1, 1],
       [7, 1, 0],
       [7, 1, 1],
       [0, 2, 0],
       [0, 2, 1],
       [1, 2, 0],
       [1, 2, 1],
       [2, 2, 0],
       [2, 2, 1],
       [3, 2, 0],
       [3, 2, 1],
       [4, 2, 0],
       [4, 2, 1],
       [5, 2, 0],
       [5, 2, 1],
       [6, 2, 0],
       [6, 2, 1],
       [7, 2, 0],
       [7, 2, 1]])
  • Option 2 (Recursion, without Numpy)

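For fun, you can define a recursive function like the one below

def expandgrid(*args):
    if len(args) < 2:
        # Guard: with fewer than two inputs the recursion below would never terminate.
        raise ValueError("expandgrid needs at least two iterables")
    if len(args) == 2:
        # Base case: pair every item of a with every item of b.
        a, b = args
        res = []
        for x in a:
            for y in b:
                # x is already a list when it comes from a recursive call.
                res.append((x if isinstance(x, list) else [x]) + [y])
        return res
    # Recursive case: expand all but the last input, then combine with the last.
    return expandgrid(expandgrid(*args[:-1]), args[-1])

When you run expandgrid(a, b, c), you will obtain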

[[0, 0, 0],
 [0, 0, 1],
 [0, 1, 0],
 [0, 1, 1],
 [0, 2, 0],
 [0, 2, 1],
 [1, 0, 0],
 [1, 0, 1],
 [1, 1, 0],
 [1, 1, 1],
 [1, 2, 0],
 [1, 2, 1],
 [2, 0, 0],
 [2, 0, 1],
 [2, 1, 0],
 [2, 1, 1],
 [2, 2, 0],
 [2, 2, 1],
 [3, 0, 0],
 [3, 0, 1],
 [3, 1, 0],
 [3, 1, 1],
 [3, 2, 0],
 [3, 2, 1],
 [4, 0, 0],
 [4, 0, 1],
 [4, 1, 0],
 [4, 1, 1],
 [4, 2, 0],
 [4, 2, 1],
 [5, 0, 0],
 [5, 0, 1],
 [5, 1, 0],
 [5, 1, 1],
 [5, 2, 0],
 [5, 2, 1],
 [6, 0, 0],
 [6, 0, 1],
 [6, 1, 0],
 [6, 1, 1],
 [6, 2, 0],
 [6, 2, 1],
 [7, 0, 0],
 [7, 0, 1],
 [7, 1, 0],
 [7, 1, 1],
 [7, 2, 0],
 [7, 2, 1]]
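Returning to Option 1: its output is ordered differently from Option 2's because np.meshgrid defaults to indexing="xy". The sketch below compares both indexing modes against itertools.product, which is used here only as a reference ordering:

```python
from itertools import product

import numpy as np

a = [0, 1, 2, 3, 4, 5, 6, 7]
b = [0, 1, 2]
c = [0, 1]

# Default indexing="xy": the second input varies slowest,
# so the rows are not sorted by the first column.
xy = np.array(list(map(np.ravel, np.meshgrid(a, b, c)))).T

# indexing="ij": the first input varies slowest, giving lexicographic order.
ij = np.array(list(map(np.ravel, np.meshgrid(a, b, c, indexing="ij")))).T

expected = np.array(list(product(a, b, c)))
print(np.array_equal(xy, expected))  # False
print(np.array_equal(ij, expected))  # True
```

Both variants contain the same 48 rows; only the row order differs.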

3

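Another possible solution, which requires a, b, and c to be NumPy arrays:

import numpy as np
import pandas as pd

# a, b, c must be NumPy arrays so that a[:, None, None] indexing works
a, b, c = np.arange(8), np.arange(3), np.arange(2)

pd.DataFrame(
    np.stack(
        np.broadcast_arrays(
            a[:, None, None],
            b[None, :, None],
            c[None, None, :]),
        axis=-1)
    .reshape(-1, 3),
    columns=['c1','c2','c3'])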

This solution generates all combinations of values from the arrays a, b, and c using NumPy broadcasting and reshapes them into a dataframe. np.broadcast_arrays aligns a[:, None, None], b[None, :, None], and c[None, None, :] to the common shape (8, 3, 2), which forms a grid of all combinations. np.stack then stacks the broadcast arrays along a new last axis, producing a 4D array of shape (8, 3, 2, 3) in which each innermost vector is one (a, b, c) combination. Finally, .reshape(-1, 3) flattens this into a 2D array whose rows are sorted lexicographically by c1, c2, and c3 (provided a, b, and c are themselves sorted), and the result is wrapped in a dataframe.

Output:

    c1  c2  c3
0    0   0   0
1    0   0   1
2    0   1   0
3    0   1   1
4    0   2   0
5    0   2   1
6    1   0   0
7    1   0   1
8    1   1   0
9    1   1   1
10   1   2   0
11   1   2   1
12   2   0   0
13   2   0   1
14   2   1   0
15   2   1   1
16   2   2   0
17   2   2   1
18   3   0   0
19   3   0   1
20   3   1   0
21   3   1   1
22   3   2   0
23   3   2   1
24   4   0   0
25   4   0   1
26   4   1   0
27   4   1   1
28   4   2   0
29   4   2   1
30   5   0   0
31   5   0   1
32   5   1   0
33   5   1   1
34   5   2   0
35   5   2   1
36   6   0   0
37   6   0   1
38   6   1   0
39   6   1   1
40   6   2   0
41   6   2   1
42   7   0   0
43   7   0   1
44   7   1   0
45   7   1   1
46   7   2   0
47   7   2   1
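The shapes described in the explanation above can be checked one step at a time. This is a small sketch with intermediate names (views, stacked, rows) chosen only for illustration:

```python
import numpy as np

a, b, c = np.arange(8), np.arange(3), np.arange(2)

# Each input gets its own axis; broadcasting expands the other two axes.
views = np.broadcast_arrays(a[:, None, None], b[None, :, None], c[None, None, :])
print([v.shape for v in views])  # three arrays of shape (8, 3, 2)

# Stacking on a new last axis puts one (a, b, c) triple at each grid point.
stacked = np.stack(views, axis=-1)
print(stacked.shape)  # (8, 3, 2, 3)

# Flattening the first three axes yields one row per combination.
rows = stacked.reshape(-1, 3)
print(rows.shape)  # (48, 3)
print(rows[:3])
```

Because the reshape walks the last grid axis fastest, c3 changes on every row, c2 every two rows, and c1 every six rows.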
