Basically, I have two DataFrames, df1 and df2. df1 contains only a "year/week" column listing five weeks.

df2 has the same "year/week" column plus a "product" column.

If a year/week from df1 is missing for a given product in df2, I want to add a row for that product in that year/week, so that every product ends up with every week.

df1:
+----+-------------+
|    | year/week   |
+====+=============+
|  0 | 2022/01     |
+----+-------------+
|  1 | 2022/02     |
+----+-------------+
|  2 | 2022/03     |
+----+-------------+
|  3 | 2022/04     |
+----+-------------+
|  4 | 2022/05     |
+----+-------------+

df2:
+----+-------------+-----------+
|    | year/week   | product   |
+====+=============+===========+
|  0 | 2022/01     | A         |
+----+-------------+-----------+
|  1 | 2022/02     | A         |
+----+-------------+-----------+
|  2 | 2022/01     | B         |
+----+-------------+-----------+
|  3 | 2022/04     | B         |
+----+-------------+-----------+
|  4 | 2022/05     | C         |
+----+-------------+-----------+
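If you want to try the answers yourself, you can rebuild the two input frames from the tables above with something like this. It's a minimal sketch that assumes "year/week" is stored as plain strings, which the tables don't actually say:

```python
import pandas as pd

# df1: only the list of weeks (assumed to be strings like "2022/01")
df1 = pd.DataFrame({"year/week": ["2022/01", "2022/02", "2022/03", "2022/04", "2022/05"]})

# df2: the weeks each product actually appears in
df2 = pd.DataFrame({
    "year/week": ["2022/01", "2022/02", "2022/01", "2022/04", "2022/05"],
    "product": ["A", "A", "B", "B", "C"],
})

print(df1)
print(df2)
```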

This is the expected output I want to obtain. Is there a Pythonic way to get it?

+----+-------------+-----------+
|    | year/week   | product   |
+====+=============+===========+
|  0 | 2022/01     | A         |
+----+-------------+-----------+
|  1 | 2022/02     | A         |
+----+-------------+-----------+
|  2 | 2022/03     | A         |
+----+-------------+-----------+
|  3 | 2022/04     | A         |
+----+-------------+-----------+
|  4 | 2022/05     | A         |
+----+-------------+-----------+
|  5 | 2022/01     | B         |
+----+-------------+-----------+
|  6 | 2022/02     | B         |
+----+-------------+-----------+
|  7 | 2022/03     | B         |
+----+-------------+-----------+
|  8 | 2022/04     | B         |
+----+-------------+-----------+
|  9 | 2022/05     | B         |
+----+-------------+-----------+
| 10 | 2022/01     | C         |
+----+-------------+-----------+
| 11 | 2022/02     | C         |
+----+-------------+-----------+
| 12 | 2022/03     | C         |
+----+-------------+-----------+
| 13 | 2022/04     | C         |
+----+-------------+-----------+
| 14 | 2022/05     | C         |
+----+-------------+-----------+

Answer:

You could build the Cartesian product of the "year/week" column in df1 and the unique values of the "product" column in df2, then convert it into a DataFrame. You can omit sort_values if you don't particularly care about the order.

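import pandas as pd
# Pair every week in df1 with every distinct product in df2, then group the rows by product
out = (pd.MultiIndex.from_product([df1['year/week'], df2['product'].unique()],
                                  names=['year/week', 'product']).to_frame()
       .reset_index(drop=True)
       .sort_values(by='product', kind='stable', ignore_index=True))  # stable sort keeps df1's week order within each product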

Output:

   year/week product
0    2022/01       A
1    2022/02       A
2    2022/03       A
3    2022/04       A
4    2022/05       A
5    2022/01       B
6    2022/02       B
7    2022/03       B
8    2022/04       B
9    2022/05       B
10   2022/01       C
11   2022/02       C
12   2022/03       C
13   2022/04       C
14   2022/05       C
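A cross merge is another way to build the same Cartesian product. This is a swapped-in technique, not the method used above, and it's only a minimal sketch assuming pandas 1.2 or later (where `how="cross"` was added) and the sample data from the question. Because the distinct products sit on the left side of the merge, each product's weeks come out contiguous and in df1's order, so no sort is needed:

```python
import pandas as pd

df1 = pd.DataFrame({"year/week": ["2022/01", "2022/02", "2022/03", "2022/04", "2022/05"]})
df2 = pd.DataFrame({
    "year/week": ["2022/01", "2022/02", "2022/01", "2022/04", "2022/05"],
    "product": ["A", "A", "B", "B", "C"],
})

# One row per distinct product, in order of first appearance in df2
products = df2[["product"]].drop_duplicates()

# how="cross" pairs every product row with every week row (left-major order),
# so all weeks for A come first, then B, then C; the result gets a fresh 0..n-1 index
out = products.merge(df1, how="cross")[["year/week", "product"]]

print(out)
```

Swapping the order of the two iterables passed to `MultiIndex.from_product` has a similar effect, since its last level varies fastest; you would then just reorder the columns back to year/week, product.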