edited Aug 27, 2019 at 9:52

I'm sorry, but this code is really hard to read. I must admit I don't know Cython too well, so I won't be able to comment too much on that part. But anyways, here are a few comments, in random order.

While Cython does not fully support docstrings (they do not show up in the interactive help), this should not prevent you from adding some to explain what the different functions do and what arguments they take.
You seem to be doing np.int64(start).astype('M8[D]').astype('M8[M]').view("int64") quite a lot. As far as I can tell, this extracts the month from a date, which was given as an integer(?). There is quite possibly a better way to do that (using the functions in datetime), but they might be slower. Nevertheless, you should put this into its own function.
You do freq.decode("utf-8")[len(freq)-1] twice. Do it once and save it to a variable. Also, freq[len(freq)-1] should be the same as freq[-1] and freq[:len(freq)-1] the same as freq[:-1]. Calling len(freq) repeatedly is especially costly because it is \$\mathcal{O}(n)\$ for a char * in Cython.
You create datetime.fromtimestamp(start*24*60*60) three times, once each to get the day, month and year. Save it to a variable and reuse it.
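
The two "compute once, reuse" points above could look roughly like the sketch below. It is plain Python using only the standard library, not the NumPy cast chain from the question, so it may well be slower and is only meant to show the structure. The helper names days_to_date and months_since_epoch are made up for illustration, and it assumes start is a count of days since 1970-01-01.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def days_to_date(days):
    """Convert a day count since 1970-01-01 into a datetime.date."""
    return EPOCH + timedelta(days=days)


def months_since_epoch(days):
    """Return the number of whole months between 1970-01 and the given day.

    Intended to give the same number as the
    np.int64(days).astype('M8[D]').astype('M8[M]').view("int64") chain,
    but written with datetime instead of NumPy.
    """
    d = days_to_date(days)
    return (d.year - 1970) * 12 + (d.month - 1)


# Build the date object once and read all three fields from it,
# instead of constructing it separately for day, month and year.
start = 18128  # assumed example input: days since the epoch
start_date = days_to_date(start)
day, month, year = start_date.day, start_date.month, start_date.year
```

Whether a helper like this is fast enough inside the Cython loop is something you would have to time. The point is that the conversion lives in one named place.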

The last two comments in loanDates seem not to be true anymore:

 # If no dates generated (start date>end date)
 ts = ts

 # If last date generated is not end date add it
 return ts.astype('int64')

The documentation seems to recommend against using C strings, unless you really need them. If I read the documentation correctly you could just make the type of freq str and get rid of all your encode("utf-8") and decode("utf-8") code.
The definition of the months array is done every time the function get_days is called. In normal Python I would recommend making it a global constant, here you would have to try and see if it makes the runtime worse.
Python has an official style-guide, PEP8. Since Cython is only an extension, it presumably also applies here. It recommends surrounding operators with whitespace (freq[len(freq) - 1]), using lower_case for all function and variable names and limiting your line length (to 80 characters by default, but 120 is also an acceptable choice).
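
To make the last three points concrete, here is a minimal sketch in plain Python. It assumes freq looks like a number followed by a one-letter unit (for example "3M") and that the months array holds the number of days per month. Neither is shown in this answer, so treat both as guesses. The names split_freq and days_in_month are hypothetical, not taken from your code.

```python
# Module-level constant: built once at import time rather than on every call.
# Assumption: the months array in get_days holds days per month.
DAYS_IN_MONTH = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)


def split_freq(freq):
    """Split a frequency string such as '3M' into (3, 'M').

    freq is a normal str here, so no encode/decode calls are needed and
    negative indexing replaces the len(freq) - 1 arithmetic.
    """
    unit = freq[-1]
    # Assumption: a bare unit such as 'D' means a count of 1.
    count = int(freq[:-1]) if len(freq) > 1 else 1
    return count, unit


def days_in_month(year, month):
    """Return the number of days in the given month (1-12) of year."""
    is_leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    if month == 2 and is_leap:
        return 29
    return DAYS_IN_MONTH[month - 1]
```

In Cython you would still have to measure whether the module-level tuple is faster or slower than a local C array.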

In the end, taking 1ms to create a date range is already quite fast. As you said, this is already faster than pandas.date_range (which does a lot of parsing of the input first, which you avoid by passing in numbers directly). You might be able to push it down to microseconds, but you should ask yourself if and why you need this many date ranges per second.

answered Aug 20, 2019 at 9:20

Graipher
