I am trying to replace rows in a dataframe with rows from another dataframe. I have an Excel file called 'MASTER.xlsx' with all the existing product codes in column 0; the remaining columns are empty. I have another Excel file called 'COUT PROJET - HOTEL DE VILLE.xlsx' containing some of the product codes in column 0, with the remaining columns filled with values.

Ultimately, I want to iterate through both the 'MASTER.xlsx' and 'COUT PROJET - HOTEL DE VILLE.xlsx' files. When a product code is in both files, I want to replace the corresponding row in 'MASTER.xlsx' with the filled-in row from 'COUT PROJET - HOTEL DE VILLE.xlsx'. When a product code is not in 'COUT PROJET - HOTEL DE VILLE.xlsx', I want that row in 'MASTER.xlsx' to remain unchanged (empty).

import numpy as np
import pandas as pd
import time
import glob

df_master = pd.read_excel('MASTER.XLSX')

df = pd.read_excel('COÛT PROJET - HÔTEL DE VILLE.xlsx')

for index, column in df.iterrows():
    for index, row in df_master.iterrows():
        if row['DATE :'] == column['DATE :']:
            df_master.update(df)
        else:
            continue
        
df_master.to_excel('UPDATED COÛT PROJET - HÔTEL DE VILLE.xlsx')

The current code seems to partly work, but not fully; I think this is because the dataframes don't have the same size. I have included pictures of what the Excel files look like. I apologize for my lack of knowledge; I am a beginner trying to help out the family business. Thank you for the help!


  • Please include sample input and expected output. Commented Jul 11, 2020 at 15:54

2 Answers

You can do most things in pandas without loops.

Try something like this:

import pandas as pd

df1 = pd.DataFrame({'A': ['A0'],
                     'B': ['B0'],
                     'C': ['C0'],
                     'D': ['D0']})

df2 = pd.DataFrame({'A': ['A0','','','',''],
                    'B': ['B1','B2', 'B3', 'B4', 'B5'],
                    'C': ['C0','','','',''],
                    'D': ['D1','D2', 'D3', 'D4', 'D5']})

pd.concat([df1, df2], axis=0, sort=False).T
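
For the specific problem in the question, one loop-free possibility is to index both dataframes by product code and call DataFrame.update, which aligns rows on the index rather than on row position. The following is only a minimal sketch: the small in-memory frames stand in for the two Excel files, and the column names 'code', 'qty' and 'price' are hypothetical. It also assumes each product code appears at most once in the project file.

```python
import pandas as pd

# Hypothetical stand-ins for the two Excel files; column names are assumptions.
master = pd.DataFrame({
    "code": ["P1", "P2", "P3", "P4"],
    "qty": [None, None, None, None],
    "price": [None, None, None, None],
})
project = pd.DataFrame({
    "code": ["P3", "P1"],
    "qty": [5, 2],
    "price": [9.5, 4.0],
})

# Index both frames by product code so update() matches rows on the code,
# not on their position in the file.
master = master.set_index("code")
master.update(project.set_index("code"))

# Codes missing from the project frame keep their empty values.
updated = master.reset_index()
print(updated)
```

With real files, the two frames would come from pd.read_excel and the result would be written with to_excel(..., index=False), using whichever column actually holds the product code.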

Typically you want to avoid explicit loops when using pandas, since they are much slower and less efficient. The best method is to use the apply feature in pandas, documentation here. Here are a few examples of how to use apply: example 1, example 2, example 3.
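
As a rough illustration of apply on the problem from the question, the sketch below fills one column of a master frame from a lookup keyed by product code. The frames, the column names 'code' and 'qty', and the helper fill_qty are all hypothetical, and the project frame is assumed to list each code only once.

```python
import pandas as pd

# Hypothetical stand-ins for the two Excel files; column names are assumptions.
master = pd.DataFrame({"code": ["P1", "P2", "P3"], "qty": [None, None, None]})
project = pd.DataFrame({"code": ["P3"], "qty": [7]})

# Series mapping product code -> quantity from the project frame.
lookup = project.set_index("code")["qty"]

def fill_qty(row):
    # Use the project's quantity when the code is present,
    # otherwise keep the value already in the master row.
    return lookup.get(row["code"], row["qty"])

# axis=1 passes each row to fill_qty in turn.
master["qty"] = master.apply(fill_qty, axis=1)
print(master)
```

Note that apply with axis=1 still calls the function once per row, so for large files an index-aligned method such as DataFrame.update may be faster.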
