I would like to merge three CSV files as follows. First CSV file:

import pandas as pd

df = pd.DataFrame()
df["train_board_station"] = ['Tokyo','LA','Paris','New_York','Delhi']
df["train_off_station"] = ['Phoenix','London','Sydney','Berlin','Shanghai']

Second csv file:

ref = pd.DataFrame() 
ref["station"] = ['Tokyo','London','Paris','New_York','Shanghai','LA','Sydney','Berlin','Phoenix','Delhi','Tokyo','London','Paris','Sydney','Berlin']
ref["point_A"] = ['-34.54','56.789','-78,98','45.62','111.67','23.78','-98.40','-76.89','23.98','23.89']
ref["point_B"] = ['34.89','-78.55','78.89','34.12','56.56','23.23','-78.65','34.76','23.67','21.645']

Third csv file:

rec = pd.DataFrame()
rec["code"] = ['Tokyo','London','Paris','New_York','Shanghai','LA','Sydney','Berlin','Phoenix','Delhi']
rec["count_A"] = ['1.2','7.8','4','8','7.8','3','8','5','2','10']
rec["count_B"] = ['12','78','4','8','78','36','88','51','25','10']

I tried this, but I get a memory error:

for x in ["board", "off"]:
    df["station"] = df["train_" + x + "_station"]
    df["code"] = df["train_" + x + "_station"]
    df = pd.concat([df, ref,rec], axis=1, join_axes=[df.index])
    df[x + "_point_A"] = df["point_A"]
    df[x + "_point_B"] = df["point_B"]
    df[x + "_count_A"] = df["count_A"]
    df[x + "_count_B"] = df["count_B"]
    df = df.drop(["station", "point_A","point_B","code","count_A","count_B"], axis=1)

I get the memory error.

  • Because your for loop is trying to access df["train_" + x + "_station"] with x = board which is invalid. Commented May 3, 2017 at 8:55
  • I have the df["train_board_station"] in my first csv file Commented May 3, 2017 at 8:57

1 Answer

It seems you need separate df1 and df2 variables inside the loops, so that df itself is not overwritten by the concatenated result:

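# Names here differ from the question: latitude/longitude appear to correspond to
# point_A/point_B, freq/count to count_A/count_B, and por to the third frame (rec).
# Note: join_axes was removed in pandas 1.0; on newer versions use
# pd.concat([df, ref], axis=1).reindex(df.index) instead.
for x in ["board", "off"]:
    df["station"] = df["train_" + x + "_station"]
    df1 = pd.concat([df, ref], axis=1, join_axes=[df.index])
    df[x + "_latitude"] = df1["latitude"]
    df[x + "_longitude"] = df1["longitude"]
    df = df.drop("station", axis=1)

for x in ["board", "off"]:
    df["code"] = df["train_" + x + "_station"]
    df2 = pd.concat([df, por], axis=1, join_axes=[df.index])
    df[x + "_freq"] = df2["freq"]
    df[x + "_count"] = df2["count"]
    df = df.drop(["code"], axis=1)

print(df)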
  train_board_station train_off_station board_latitude board_longitude  \
0               Tokyo           Phoenix         -34.54           34.89   
1                  LA            London         56.789          -78.55   
2               Paris            Sydney         -78,98           78.89   
3            New_York            Berlin          45.62           34.12   
4               Delhi          Shanghai         111.67           56.56   

  off_latitude off_longitude board_freq board_count off_freq off_count  
0       -34.54         34.89        1.2          12      1.2        12  
1       56.789        -78.55        7.8          78      7.8        78  
2       -78,98         78.89          4           4        4         4  
3        45.62         34.12          8           8        8         8  
4       111.67         56.56        7.8          78      7.8        78  
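
Note that pd.concat with axis=1 lines rows up by index position, not by station name (in the output above, LA gets the values listed second in ref, which belong to London). If the goal is to look up each station by name, a key-based merge may be closer to what is wanted. Below is a minimal sketch under that assumption; the small frames and their numeric values are made up for illustration, and only the column names follow the question.

```python
import pandas as pd

# Small illustrative frames (values are invented for this sketch).
df = pd.DataFrame({
    "train_board_station": ["Tokyo", "LA", "Paris"],
    "train_off_station": ["Phoenix", "London", "Tokyo"],
})
ref = pd.DataFrame({
    "station": ["Tokyo", "London", "Paris", "LA", "Phoenix"],
    "point_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "point_B": [10.0, 20.0, 30.0, 40.0, 50.0],
})

merged = df.copy()
for x in ["board", "off"]:
    # Rename the lookup columns so board_* and off_* do not collide,
    # and so the key column matches the one in df.
    lookup = ref.rename(columns={
        "station": "train_" + x + "_station",
        "point_A": x + "_point_A",
        "point_B": x + "_point_B",
    })
    # A left join keeps every row of df and matches on the station name.
    merged = merged.merge(lookup, on="train_" + x + "_station", how="left")

print(merged)
```

The same loop could be repeated for the third frame by renaming code, count_A and count_B in the same way. Stations missing from the lookup table end up as NaN rather than raising an error.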

3 Comments

I have a small problem with this: the files don't get merged correctly if there are two or more lines with the same code name. Only the first line with that code gets merged; if a second line has the same code name, it doesn't get merged. Can you please help me solve this problem?
It is better to create a new question, but I am working on a solution.
I have another error as well. Please check: stackoverflow.com/questions/43977906/…
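
For the repeated-code case raised in the comments, one possible approach is to build a lookup with unique codes and use Series.map, which handles every row of df independently. This is only a sketch: the data is invented, and it assumes that duplicate rows in the lookup table carry the same values, so keeping the first one is safe.

```python
import pandas as pd

# Illustrative data: "Tokyo" appears twice in df and twice in the lookup table.
df = pd.DataFrame({"train_board_station": ["Tokyo", "LA", "Tokyo"]})
rec = pd.DataFrame({
    "code": ["Tokyo", "LA", "Tokyo"],
    "count_A": [1.2, 3.0, 1.2],
})

# Keep one row per code so the lookup index is unique
# (assumption: duplicated codes carry identical values).
lookup = rec.drop_duplicates(subset="code").set_index("code")

# map() looks each station up by name, so repeated codes all get a value.
df["board_count_A"] = df["train_board_station"].map(lookup["count_A"])

print(df)
```

If duplicated codes can carry different values, they would need to be aggregated or disambiguated first, since map requires a unique index on the lookup.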