I have a pandas DataFrame created from a dictionary with the following code:
import pandas as pd
pd.set_option('max_colwidth', 150)
df = pd.DataFrame.from_dict(data, orient='index', columns=['text'])
df
The output is as follows:
                                            text
./form/2003Q4/0001041379_2003-12-15.html    \n10-K\n1\ng86024e10vk.htm\nAFC ENTERPRISES\n\n\n\nAFC ENTERPRISES\n\n\n\nTable of Contents\n\n\n\n\n\n\n\nUNITED STATES SECURITIES AND EXCHANGE\n...
./form/2007Q2/0001303804_2007-04-17.html    \n10-K\n1\na07-6053_210k.htm\nANNUAL REPORT PURSUANT TO SECTION 13 AND 15(D)\n\n\n\n\n\n\n   \nUNITED\nSTATES\nSECURITIES AND EXCHANGE\nCOMMISSION...
./form/2007Q2/0001349848_2007-04-02.html    \n10-K\n1\nff060310k.txt\n\n UNITED STATES\n SECURITIES AND EXCHANGE COMMISSION\n ...
./form/2014Q1/0001141807_2014-03-31.html    \n10-K\n1\nf32414010k.htm\nFOR THE FISCAL YEAR ENDED DECEMBER 31, 2013\n\n\n\nf32414010k.htm\n\n\n\n\n\n\n\n\n\n\nUNITED STATES\nSECURITIES AND EX...
./form/2007Q2/0001341853_2007-04-02.html    \n10-K\n1\na07-9697_110k.htm\n10-K\n\n\n\n\n\n\n   \n \nUNITED STATES\nSECURITIES AND EXCHANGE COMMISSION\nWashington, D.C. 20549\n \nFORM 10-K\n ...
I need to split the first column (the index) into three separate columns: Year & Qtr, CIK, and Filing Date. For the first row, the values in these columns would be 2003Q4, 0001041379, and 2003-12-15.
I think that if this were a regular column, I could do this using code similar to Example #2 found here:
https://www.geeksforgeeks.org/python-pandas-split-strings-into-two-list-columns-using-str-split/
However, I am thrown by the fact that it is the index that I need to split, not a named column.
Is there a way to split the index directly, or do I need to save it as another column first? If so, how?
I'd appreciate any help. I am a newbie, so I don't always understand the more difficult solutions. Thanks in advance.
df.index.str.extract(r'\./form/(.*)/(.*)_(.*)\.html')
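To get the extracted pieces back into df as named columns, something like the following sketch may help. The two-row `data` dictionary is a shortened stand-in for the real one, and the column names `year_qtr`, `cik`, and `filing_date` are my own choices, not anything pandas requires. Calling `str.extract` on an index returns a new DataFrame numbered 0..n-1, so it has to be realigned with the original index before joining.

```python
import pandas as pd

# Hypothetical stand-in for the `data` dictionary in the question (text shortened)
data = {
    "./form/2003Q4/0001041379_2003-12-15.html": "\n10-K\n1\ng86024e10vk.htm\nAFC ENTERPRISES",
    "./form/2007Q2/0001303804_2007-04-17.html": "\n10-K\n1\na07-6053_210k.htm\nANNUAL REPORT",
}
df = pd.DataFrame.from_dict(data, orient="index", columns=["text"])

# Named groups (?P<name>...) become the column names of the extracted frame.
# The column names here are illustrative assumptions.
pattern = r"\./form/(?P<year_qtr>[^/]+)/(?P<cik>[^_]+)_(?P<filing_date>[^.]+)\.html"

# Extracting from an Index yields a DataFrame with a fresh 0..n-1 index
parts = df.index.str.extract(pattern)

# Realign the extracted rows with the original file-path index, then join
parts.index = df.index
result = df.join(parts)

print(result[["year_qtr", "cik", "filing_date"]])
```

The new columns hold strings, which keeps the leading zeros in the CIK. If you would rather work with a regular column, `df.reset_index()` moves the index into a column named `index`, and the same pattern can then be used with `.str.extract` on that column.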