I am running an experiment with three time-series datasets that have different characteristics. Each dataset has the following format:
0.086206438,10
0.086425551,12
0.089227066,20
0.089262508,24
0.089744425,30
0.090036815,40
0.090054172,28
0.090377569,28
0.090514071,28
0.090762872,28
0.090912691,27
The first column is a timestamp. For reproducibility, I am sharing the data here. For the second column, I want to compare each row's value with the value in the previous row. If the current value is greater, I move on to the next row. If the current value is smaller than the previous row's value, I divide the current (smaller) value by the previous (larger) value. Here is the code:
import numpy as np
import matplotlib.pyplot as plt

protocols = {}
types = {"data1": "data1.csv", "data2": "data2.csv", "data3": "data3.csv"}

for protname, fname in types.items():
    col_time, col_window = np.loadtxt(fname, delimiter=',').T
    trailing_window = col_window[:-1]  # "past" values at a given index
    leading_window = col_window[1:]    # "current" values at a given index
    decreasing_inds = np.where(leading_window < trailing_window)[0]
    quotient = leading_window[decreasing_inds] / trailing_window[decreasing_inds]
    quotient_times = col_time[decreasing_inds]

    protocols[protname] = {
        "col_time": col_time,
        "col_window": col_window,
        "quotient_times": quotient_times,
        "quotient": quotient,
    }

    plt.figure(); plt.clf()
    plt.plot(quotient_times, quotient, ".", label=protname, color="blue")
    plt.ylim(0, 1.0001)
    plt.title(protname)
    plt.xlabel("time")
    plt.ylabel("quotient")
    plt.legend()
    plt.show()
This produces the following three plots, one for each dataset I shared.
As the plots show, the quotients for data1 are fairly consistent at around 1, the quotients for data2 concentrate around two values (about 0.5 or 0.8), and the quotients for data3 also concentrate around two values (about 0.5 or 0.7). Given a new data point (with quotient and quotient_times), I want to know which cluster it belongs to. My plan is to build each dataset by stacking the two transformed features, quotient and quotient_times. I am trying this with KMeans clustering as follows:
from sklearn.cluster import KMeans
k_means = KMeans(n_clusters=3, random_state=0)
k_means.fit(quotient)
But this gives me an error: ValueError: n_samples=1 should be >= n_clusters=3. How can I fix this error?
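One likely cause, assuming quotient is a 1-D NumPy array as in the loop above: scikit-learn estimators expect a 2-D array of shape (n_samples, n_features), and the error message suggests the 1-D array is being read as a single sample. The sketch below only illustrates the shapes involved. The array values are hypothetical stand-ins, not the real data.

```python
import numpy as np

# Hypothetical stand-in values; the real arrays come from the loop above.
quotient = np.array([0.7, 0.5, 0.7, 0.5])
quotient_times = np.array([0.10, 0.25, 0.40, 0.55])

# A 1-D array has no separate "samples" and "features" axes.
print(quotient.shape)

# Option 1: treat each quotient as one sample with a single feature.
X_single = quotient.reshape(-1, 1)

# Option 2: stack both transformed features, one row per sample.
X_pair = np.column_stack([quotient_times, quotient])

print(X_single.shape, X_pair.shape)
```

Either X_single or X_pair could then be passed to k_means.fit in place of quotient. Note that with both features stacked, the time axis has a different scale from the quotient, so it may dominate the distances unless the features are rescaled.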
Update: sample quotient data = array([ 0.7 , 0.7 , 0.4973262 , 0.7008547 , 0.71287129,
0.704 , 0.49723757, 0.49723757, 0.70676692, 0.5 ,
0.5 , 0.70754717, 0.5 , 0.49723757, 0.70322581,
0.5 , 0.49723757, 0.49723757, 0.5 , 0.49723757])
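As a minimal sketch of how the sample above could be clustered, assuming it is reshaped into a single-feature 2-D array: n_clusters=2 is an assumption based on the two groups visible in this sample (around 0.5 and around 0.7), and the new value 0.69 is a hypothetical point used only to show prediction.

```python
import numpy as np
from sklearn.cluster import KMeans

# Sample quotient values copied from the update above.
quotient = np.array([0.7, 0.7, 0.4973262, 0.7008547, 0.71287129,
                     0.704, 0.49723757, 0.49723757, 0.70676692, 0.5,
                     0.5, 0.70754717, 0.5, 0.49723757, 0.70322581,
                     0.5, 0.49723757, 0.49723757, 0.5, 0.49723757])

# KMeans expects shape (n_samples, n_features): one row per quotient.
X = quotient.reshape(-1, 1)

# Assumption: two clusters, matching the two visible groups in the sample.
k_means = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = k_means.fit_predict(X)

# Cluster centers, sorted for readability.
centers = np.sort(k_means.cluster_centers_.ravel())
print(centers)

# Assign a hypothetical new quotient to its nearest cluster.
new_label = k_means.predict(np.array([[0.69]]))[0]
print(new_label)
```

The same pattern should apply if quotient_times is stacked in as a second column, although the cluster count and feature scaling would then need to be chosen per dataset.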



