I'm sorry, but this code is really hard to read. I must admit I don't know Cython too well, so I won't be able to comment too much on that part. But anyways, here are a few comments, in random order.
While Cython does not fully support docstrings (they do not show up in the interactive help), this should not prevent you from adding some to explain what the different functions do and what arguments they take.

You seem to be doing `np.int64(start).astype('M8[D]').astype('M8[M]').view("int64")` quite a lot. As far as I can tell, this extracts the month from a date, which was given as an integer(?). There is quite possibly a better way to do that (using the functions in `datetime`), but they might be slower. Nevertheless, you should put this into its own function.

You do `freq.decode("utf-8")[len(freq)-1]` twice. Do it once and save it to a variable. Also, `freq[len(freq)-1]` should be the same as `freq[-1]` and `freq[:len(freq)-1]` the same as `freq[:-1]`. Calling `len(freq)` repeatedly is especially costly because it is \$\mathcal{O}(n)\$ for a `char *` in Cython.

You create `datetime.fromtimestamp(start*24*60*60)` three times, once each to get the day, month and year. Save it to a variable and reuse it.

The last two comments in `loanDates` seem not to be true anymore:

    # If no dates generated (start date>end date)
    ts = ts
    # If last date generated is not end date add it
    return ts.astype('int64')

The documentation seems to recommend against using C strings, unless you really need them. If I read the documentation correctly, you could just make the type of `freq` `str` and get rid of all your `encode("utf-8")` and `decode("utf-8")` code.

The definition of the `months` array is done every time the function `get_days` is called. In normal Python I would recommend making it a global constant; here you would have to try and see if it makes the runtime worse.

Python has an official style-guide, PEP8. Since Cython is only an extension of Python, it presumably also applies here. It recommends surrounding operators with whitespace (`freq[len(freq) - 1]`), using `lower_case` for all function and variable names and limiting your line length (to 79 characters by default, but 120 is also an acceptable choice).
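To make the `freq` comment more concrete, here is a rough sketch in plain Python of decoding once and using negative indices. The name `split_freq` and the assumed "number followed by one letter" layout (e.g. `"3M"`) are made up for illustration; your actual parsing may differ.

```python
def split_freq(freq):
    """Split a frequency string such as "3M" into (count, unit).

    Hypothetical helper: the name and the assumed "<number><letter>"
    layout are illustrative, not taken from the reviewed code.
    """
    # Decode once if we were handed bytes, then reuse the result.
    if isinstance(freq, bytes):
        freq = freq.decode("utf-8")
    unit = freq[-1]          # same as freq[len(freq) - 1]
    count = int(freq[:-1])   # same as freq[:len(freq) - 1]
    return count, unit


print(split_freq(b"3M"))   # (3, 'M')
print(split_freq("12D"))   # (12, 'D')
```

If `freq` is typed as `str` in the Cython signature, the `bytes` branch should become unnecessary.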
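The two date-related comments (wrapping the month conversion in a function, and building the date only once) could be combined along these lines. This is only a sketch using the standard library: it assumes `start` is a plain integer count of days since 1970-01-01, which is what the `M8[D]` cast suggests, and the names `EPOCH`, `to_date` and `months_since_epoch` are my own. I have not timed it against the NumPy version.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def to_date(days):
    """Convert a count of days since 1970-01-01 into a datetime.date."""
    return EPOCH + timedelta(days=days)


def months_since_epoch(days):
    """Number of whole months from 1970-01 to the month containing `days`."""
    d = to_date(days)  # build the date once, then reuse it
    return (d.year - EPOCH.year) * 12 + (d.month - 1)


# Build the date once and read all three fields from the same object.
d = to_date(17897)
year, month, day = d.year, d.month, d.day
print(year, month, day)           # 2019 1 1
print(months_since_epoch(17897))  # 588
```

Unlike `datetime.fromtimestamp`, this does not depend on the local timezone of the machine running it.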
In the end, taking 1 ms to create a date range is already quite fast. As you said, this is already faster than `pandas.date_range` (which does a lot of parsing of the input first, which you avoid by passing in numbers directly). You might be able to push it down to microseconds, but you should ask yourself if and why you need this many date ranges per second.