I am new to Python and pandas. I created a test web page with HTML to learn how to pull data from it and format it as CSV for use in Excel. The code below prints the table in a readable form, but I am stuck on how to shape it into a CSV file that I can import.
Code:
# Importing pandas
import pandas as pd
# The webpage URL whose table we want to extract
url = "/home/dvm01/e007"
# Assign the table data to a Pandas dataframe
table = pd.read_html(url, index_col=0)[0]
#table2 = pd.read_html(url)[0],pd.read_html(url)[1],pd.read_html(url)[6]
# Print the dataframe
print(table)
#print(table2)
# Store the dataframe in Excel file
#table.to_excel("data.xlsx")
Output:
Account Account.1
ID: e007
Description: ABST: 198, SUR: J DOUTHIT
Geo ID: 014.0198.0000
What I am trying to figure out is how to remove the row index and turn the text before the first ':' in each row into a column header. The Description row contains two ':' characters, but everything after the first ':' should be the data for that column header.
I would like to take the current output above and have ID, Description, and Geo ID as the column headers, with the text that comes after the first ':' as the data under each header.
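A minimal sketch of splitting at only the first ':' follows. It assumes the label and value arrive together in one cell; the `cells` Series is a hypothetical stand-in for the real page data. Passing `n=1` to `str.split` limits the split to the first colon, so the later colon in the Description value is left alone.

```python
import pandas as pd

# Hypothetical stand-in: each cell holds "label: value" as one string.
cells = pd.Series([
    "ID: e007",
    "Description: ABST: 198, SUR: J DOUTHIT",
    "Geo ID: 014.0198.0000",
])

# n=1 splits only at the first ":"; expand=True returns two columns (0 and 1).
parts = cells.str.split(":", n=1, expand=True)

# Trim the whitespace left around each piece.
labels = parts[0].str.strip()
values = parts[1].str.strip()

print(labels.tolist())
print(values.tolist())
```

If `read_html` already returns the label and the value in separate columns, this split is not needed and only the trailing ':' has to be removed from the labels.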
I do not need 'Account' and 'Account.1'; I believe these are being recognized as column headers. Below is what I would like the output to look like in Excel, but I cannot figure out how to shape the data correctly so that it can be exported to a CSV and imported. Maybe I do not even need a CSV, since 'table.to_excel' seems to skip that step.
+------+---------------------------+---------------+
| ID | Description | Geo ID |
+------+---------------------------+---------------+
| e007 | ABST: 198, SUR: J Douthit | 014.0198.0000 |
+------+---------------------------+---------------+
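One possible way to get that layout is sketched below. It assumes `pd.read_html(url)[0]`, called without `index_col`, returns two columns named `Account` and `Account.1`, as the printed output suggests; the `table` frame here is a hand-built stand-in for the real page, not the actual data.

```python
import pandas as pd

# Assumed stand-in for pd.read_html(url)[0] (no index_col argument).
table = pd.DataFrame({
    "Account": ["ID:", "Description:", "Geo ID:"],
    "Account.1": ["e007", "ABST: 198, SUR: J DOUTHIT", "014.0198.0000"],
})

# The labels become headers once the trailing ":" is removed.
headers = table["Account"].str.rstrip(":").str.strip()

# Build a one-row frame: the values form the row, the labels the columns.
wide = pd.DataFrame([table["Account.1"].tolist()], columns=headers.tolist())

print(wide)
```

Building a new one-row frame also drops the unwanted 'Account' and 'Account.1' headers, because they are never carried over.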
I was able to remove the index numbers by passing index_col=0 in the read_html call above, where I define the table variable. I am not sure that is the best way, but it does what I was trying to accomplish for that part.
Since I am new to Python, I am having a hard time phrasing my question for Google or StackOverflow to find the answers I am looking for. If someone could point me in the right direction, that would be enough, but examples would be nice as well.
Thanks for any guidance
Use the index=False keyword argument in your output function, such as to_csv or to_excel. A sample of how you want your output to look vs. how it looks now is always good practice.
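To illustrate the `index=False` suggestion, here is a small sketch comparing the default CSV output with the index-free one. The one-row `wide` frame is a hypothetical example built from the values in the question. Calling `to_csv` without a path returns the CSV text instead of writing a file, which makes the difference easy to see.

```python
import pandas as pd

# Hypothetical one-row frame with the headers the question asks for.
wide = pd.DataFrame({
    "ID": ["e007"],
    "Description": ["ABST: 198, SUR: J DOUTHIT"],
    "Geo ID": ["014.0198.0000"],
})

# Default: the row index is written as an unnamed first column.
with_index = wide.to_csv()

# index=False leaves the row index out of the output.
without_index = wide.to_csv(index=False)

print(with_index)
print(without_index)

# To write files instead of strings, pass a path:
# wide.to_csv("data.csv", index=False)
# wide.to_excel("data.xlsx", index=False)  # .xlsx output needs openpyxl installed
```

The Description value contains a comma, so pandas wraps it in double quotes in the CSV; Excel reads it back as a single cell.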