I would like to compare the min/max values of a time-series with a test time-series. Additionally, I would like to compare the time of the "peaks". However, I'm having trouble extracting these features from a Pandas DataFrame.
Given the following data:
from datetime import datetime, timedelta

import numpy as np
import pandas as pd


def fake_phase_data():
    in_li = []
    sample_points = 24 * 4  # one sample every 15 minutes over a day
    for day, bias in zip((11, 12, 13), (.5, .7, 1.)):
        day_time = datetime(2016, 6, day, 0, 0, 0)
        for x in range(int(sample_points)):
            in_li.append((day_time + timedelta(minutes=15*x),
                          bias * np.sin(2 * np.pi * x / sample_points + (1.2*bias)),
                          bias))
    fake_df = pd.DataFrame(in_li, columns=("time", "phase_sig", "bias")).set_index("time")
    return fake_df
fp = fake_phase_data()
# Convert to pivot-table with 24 hour columns
dfs = {
    col: pd.pivot_table(
        fp,
        index=fp.index.date,
        columns=fp.index.hour,
        values=col,
        aggfunc='mean',
    )
    for col in fp.columns
}
ddf = pd.concat(dfs, axis=1)
Which looks like:
for i in range(len(ddf)):
    ddf["phase_sig"].iloc[i].plot()
I process the data:
def col_peaks(df, cols, peak_func):
    return [list(getattr(df[col], peak_func)(axis=1).values) for col in cols]


def peak_vals(df, cols, t_peak):
    peak_v = []
    for c_i, col in enumerate(cols):
        vals = df[col].values
        peak_idx = t_peak[c_i]
        peak_v.append(list(vals[np.arange(len(peak_idx)), peak_idx]))
    return peak_v
# I may want to process multiple columns later
# but let's focus on the single-column case
x_cols = ["phase_sig"]
# Technically, I also want the minimum
# but let's focus on the maximum case first
orig_t_max = col_peaks(ddf, x_cols, "idxmax")
print("Orig t_max", orig_t_max)
orig_v_max = peak_vals(ddf, x_cols, orig_t_max)
print("Orig v_max", orig_v_max)
# actual test data will be a single row in a dataframe
# but this test is fine for now
test_df = ddf.iloc[[0]]
test_t_max = col_peaks(test_df, x_cols, "idxmax")
print("Test t_max", test_t_max)
test_v_max = peak_vals(test_df, x_cols, test_t_max)
print("Test v_max", test_v_max)
And get the result:
Orig t_max [[4, 3, 1]]
Orig v_max [[0.4985414229286749, 0.6989567830263389, 0.9940657122457474]]
Test t_max [[4]]
Test v_max [[0.4985414229286749]]
How do I get both of these values without the weird loops I'm doing? I know I could make them more compact by using a list-comprehension, but I'd rather get rid of them altogether. Is there a way to deal with both DataFrame and Series without the awkward if-statement I use in col_peaks and peak_vals?

Series by e.g. .iloc[[0]] instead of .iloc[0], then just remove the two else branches.
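
To illustrate the distinction this comment draws, here is a minimal sketch on a made-up frame. The names `toy` and `peaks` are hypothetical and not part of the question's code. It assumes rows are days and columns are hours, as in the pivoted data, and uses only the row-wise `idxmax(axis=1)` and `max(axis=1)` methods of a DataFrame.

```python
import pandas as pd

# Hypothetical toy data: 3 days (rows) x 4 hours (columns).
toy = pd.DataFrame(
    [[0.1, 0.5, 0.3, 0.2],
     [0.7, 0.2, 0.1, 0.4],
     [0.3, 0.6, 1.0, 0.9]],
    index=["d1", "d2", "d3"],
    columns=[0, 1, 2, 3],
)

row_as_series = toy.iloc[0]    # scalar selector: returns a Series
row_as_frame = toy.iloc[[0]]   # list selector: returns a one-row DataFrame


def peaks(df):
    """Return (column label of the row maximum, row maximum) for each row."""
    return df.idxmax(axis=1), df.max(axis=1)


# The same function handles the full frame and the one-row frame.
t_all, v_all = peaks(toy)
t_one, v_one = peaks(row_as_frame)

print(type(row_as_series).__name__, type(row_as_frame).__name__)
print(t_all.tolist(), v_all.tolist())
print(t_one.tolist(), v_one.tolist())
```

Because `.iloc[[0]]` keeps the row as a DataFrame, the `axis=1` calls work unchanged and no separate Series branch is needed. Whether this fits the real data depends on details not shown here, so treat it as a starting point only.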