Python beginner here.

I couldn't find anything similar to this, but I have the feeling it shouldn't be so hard.

I have a large Excel sheet with values from different sensors, but some of the values are missing due to measurement errors. So when I load everything into a pandas dataframe I get something like this:

TimeStamp1  Sensor1  TimeStamp2  Sensor2
08:00       100      08:00       60
08:05       102      08:10       40
08:10       105      08:15       50
08:15       101      08:25       31
08:20       103      NaT         NaN
08:25       104      NaT         NaN

The real dataframe has 7 sensors and more than 100k rows, so there are different numbers of NaT's and NaN's in different columns.

I need the timestamps for each sensor to be aligned in order to avoid inconsistencies. So I want to shift the rows in TimeStamp2 and Sensor2 from the point where they differ from TimeStamp1, insert the missing time with a NaN (or empty) value in the corresponding position in Sensor2, and make the NaT and NaN at the end disappear from both columns.

An output like this:

TimeStamp1  Sensor1  TimeStamp2  Sensor2
08:00       100      08:00       60
08:05       102      08:05       Empty (NaN)
08:10       105      08:10       40
08:15       101      08:15       50
08:20       103      08:20       Empty (NaN)
08:25       104      08:25       31

I guess I could simplify the question by asking for a way to insert a specific element in a specific row of a specific column. All the shifting examples I've seen shift the entire column up or down. Is there an easy way to do this?

If it's easier, this solution also works for me:

TimeStamp  Sensor1  Sensor2
08:00      100      60
08:05      102      Empty (NaN)
08:10      105      40
08:15      101      50
08:20      103      Empty (NaN)
08:25      104      31

Comment: It may be sufficient to set the TimeStamp column type to be a timeseries and then do an outer merge on it. (Commented Oct 18, 2021 at 15:19)

2 Answers

@ti7's suggestion is spot on: split the dataframe into one frame per sensor, outer-merge them on the timestamp columns, and fill the resulting gaps with fillna:

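# one frame per sensor: filter(like='1') keeps the columns whose name contains '1'
sensor1 = df.filter(like='1')
sensor2 = df.filter(like='2')
(sensor1.merge(sensor2,
               how='outer',
               left_on='TimeStamp1',
               right_on='TimeStamp2',
               sort=True)
        # fill the gaps in TimeStamp2 from TimeStamp1 of the same merged row
        .assign(TimeStamp2=lambda d: d['TimeStamp2'].fillna(d['TimeStamp1']))
        # drop the rows that come from the trailing NaT/NaN padding of sensor2
        .dropna(subset=['TimeStamp1'])
)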
  TimeStamp1  Sensor1 TimeStamp2  Sensor2
0      08:00    100.0      08:00     60.0
1      08:05    102.0      08:05      NaN
2      08:10    105.0      08:10     40.0
3      08:15    101.0      08:15     50.0
4      08:20    103.0      08:20      NaN
5      08:25    104.0      08:25     31.0
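
For the real data with 7 sensors, the same outer merge can be repeated over every pair of columns. The following is only a sketch, assuming the columns always come in (TimeStampN, SensorN) pairs in that order and that timestamps are unique within each pair. The shared column name "TimeStamp" and the variable names are made up for illustration. It produces the single-TimeStamp layout from the end of the question:

```python
from functools import reduce

import pandas as pd

# Sample data copied from the question; None marks the missing trailing rows.
df = pd.DataFrame({
    "TimeStamp1": ["08:00", "08:05", "08:10", "08:15", "08:20", "08:25"],
    "Sensor1": [100, 102, 105, 101, 103, 104],
    "TimeStamp2": ["08:00", "08:10", "08:15", "08:25", None, None],
    "Sensor2": [60, 40, 50, 31, None, None],
})

# Assumption: columns are laid out as (TimeStampN, SensorN) pairs.
frames = []
for i in range(0, df.shape[1], 2):
    ts_col = df.columns[i]
    # keep only the rows of this pair that actually have a timestamp
    pair = df.iloc[:, i:i + 2].dropna(subset=[ts_col])
    # give every pair the same key name so they can be merged on it
    frames.append(pair.rename(columns={ts_col: "TimeStamp"}))

# Outer-merge all pairs on the shared key, then order by time.
aligned = reduce(
    lambda left, right: left.merge(right, on="TimeStamp", how="outer"),
    frames,
).sort_values("TimeStamp", ignore_index=True)

print(aligned)
```

Because the merge is an outer one, a timestamp that appears for only one sensor is kept, and the other sensors get NaN on that row.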
1 Comment

This method worked perfectly, I appreciate the help!

This will work if your data is set up exactly as in your example; otherwise you'll have to adapt it to your data.

import pandas as pd

# change the timestamp columns to datetime. You don't say if there's a date
# component, so you may have to get your timestamps in order before moving on.
timestamps = df.filter(regex='TimeStamp').columns.tolist()
for t in timestamps:
    df[t] = pd.to_datetime(df[t])

# get the max and min of all datetimes in the timestamp columns
end = df.filter(regex='TimeStamp').max().max()
start = df.filter(regex='TimeStamp').min().min()

# create a new date range
new_dates = pd.date_range(start=start, end=end, freq='5Min')

# number of columns to iterate over - it should be even, with the columns
# laid out as (timestamp, sensor) pairs as in your example
num_columns = df.shape[1]

# iterate and concat
dflist = []
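for i in range(0, num_columns, 2):
    # take one (timestamp, sensor) pair, index it by its timestamp column,
    # drop the NaT/NaN padding and reindex onto the full 5-minute range
    d = df.iloc[:, i:i+2].set_index(df.iloc[:, i].name).dropna().reindex(new_dates)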
    dflist.append(d)
pd.concat(dflist, axis=1)

                     Sensor1  Sensor2
2021-10-18 08:00:00      100     60.0
2021-10-18 08:05:00      102      NaN
2021-10-18 08:10:00      105     40.0
2021-10-18 08:15:00      101     50.0
2021-10-18 08:20:00      103      NaN
2021-10-18 08:25:00      104     31.0
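
If the sampling interval is not a fixed 5 minutes, a variation is to skip the date range and let pd.concat align the pairs on the union of their timestamps. This is a sketch under the same assumptions as above (columns in (timestamp, sensor) pairs, unique timestamps within each pair, since concat along axis=1 cannot align duplicated index labels). The index name "TimeStamp" is chosen here for illustration:

```python
import pandas as pd

# Sample data copied from the question; None marks the missing trailing rows.
df = pd.DataFrame({
    "TimeStamp1": ["08:00", "08:05", "08:10", "08:15", "08:20", "08:25"],
    "Sensor1": [100, 102, 105, 101, 103, 104],
    "TimeStamp2": ["08:00", "08:10", "08:15", "08:25", None, None],
    "Sensor2": [60, 40, 50, 31, None, None],
})

pieces = []
for i in range(0, df.shape[1], 2):
    ts_col = df.columns[i]
    # drop the padding rows, then index the sensor values by their timestamp
    piece = df.iloc[:, i:i + 2].dropna(subset=[ts_col]).set_index(ts_col)
    piece.index.name = "TimeStamp"
    pieces.append(piece)

# axis=1 aligns on the index; the default outer join keeps every timestamp
combined = pd.concat(pieces, axis=1).sort_index()

print(combined)
```

Timestamps seen by only some sensors show up as NaN for the others, without assuming any particular frequency.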
