I am trying to import multiple files between two dates into a pandas DataFrame, but the resulting DataFrame contains multiple copies of the data instead of one.

My code looks like this:

import pandas as pd

Mu = pd.DataFrame()
lis = []  # collects one DataFrame per file
for date in daterange:  # daterange (the dates to load) is not shown here
    path = 'Z:/directory/to/files' + date + '.txt'
    frame = pd.read_csv(path, delimiter=' ', skipinitialspace=True,
                        usecols=[0, 1, 2, 3],
                        names=['date', 'time', 'type1', 'type2'],
                        parse_dates={'timestamp': ['date', 'time']})
    lis.append(frame)

# Concatenate all collected frames into a single DataFrame
Mu = pd.concat(lis, axis=0, ignore_index=True)

If I have files like this:

File A:
20170501 00:00:11 11 1
20170501 00:00:20 21 2

File B:
20170502 00:06:11 31 3
20170502 00:30:11 41 4

File C:
20170503 00:40:11 51 5
20170503 00:50:11 61 6 

The resulting dataframe looks like this:

20170501 00:00:11 11 1
20170501 00:00:20 21 2
20170502 00:06:11 31 3
20170502 00:30:11 41 4
20170503 00:40:11 51 5
20170503 00:50:11 61 6    
20170501 00:00:11 11 1
20170501 00:00:20 21 2
20170502 00:06:11 31 3
20170502 00:30:11 41 4
20170503 00:40:11 51 5
20170503 00:50:11 61 6   
20170501 00:00:11 11 1
20170501 00:00:20 21 2
20170502 00:06:11 31 3
20170502 00:30:11 41 4
20170503 00:40:11 51 5
20170503 00:50:11 61 6   

What I want is this:

20170501 00:00:11 11 1
20170501 00:00:20 21 2
20170502 00:06:11 31 3
20170502 00:30:11 41 4
20170503 00:40:11 51 5
20170503 00:50:11 61 6 

How can I create the wanted dataframe?

1 Answer

You can remove the repeated rows with drop_duplicates:

Mu = Mu.drop_duplicates()

Output:

0   20170501    00:00:11    11  1
1   20170501    00:00:20    21  2
2   20170502    00:06:11    31  3
3   20170502    00:30:11    41  4
4   20170503    00:40:11    51  5
5   20170503    00:50:11    61  6
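
For a version that can be run without the original files, here is a minimal sketch of the same idea. It assumes hypothetical in-memory stand-ins for files A, B and C (the `files` dict and the `read_one` helper are illustrative names, not part of the question), simulates the tripled data by concatenating the same frames three times, and builds the timestamp with `pd.to_datetime` rather than the `parse_dates` dict used in the question.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for files A, B and C from the question.
files = {
    "A": "20170501 00:00:11 11 1\n20170501 00:00:20 21 2\n",
    "B": "20170502 00:06:11 31 3\n20170502 00:30:11 41 4\n",
    "C": "20170503 00:40:11 51 5\n20170503 00:50:11 61 6\n",
}


def read_one(text):
    """Parse one space-delimited file into timestamp/type1/type2 columns."""
    frame = pd.read_csv(
        io.StringIO(text),
        delimiter=" ",
        skipinitialspace=True,
        usecols=[0, 1, 2, 3],
        names=["date", "time", "type1", "type2"],
        dtype={"date": str, "time": str},
    )
    # Combine the date and time strings into a single datetime column.
    frame["timestamp"] = pd.to_datetime(
        frame["date"] + " " + frame["time"], format="%Y%m%d %H:%M:%S"
    )
    return frame[["timestamp", "type1", "type2"]]


frames = [read_one(text) for text in files.values()]

# Simulate the problem: every frame ends up in the result three times.
Mu = pd.concat(frames * 3, axis=0, ignore_index=True)

# Keep the first occurrence of each row, then renumber the index.
deduped = Mu.drop_duplicates().reset_index(drop=True)
print(deduped)
```

Note that drop_duplicates also removes rows that are legitimately identical in the source files, so it treats the symptom rather than the cause. If the list of frames is being filled more than once (for example by re-running the loop without resetting `lis`), fixing that may be the safer option.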
1 Comment

@Sanne No problem. Could you accept the answer if it fits your needs?
