I have some CSV data, pasted below. I would like to parse it and load it into a DataFrame so that it is easier to analyze.

I want to grab the values for each logStreamName group, like so:

import pandas as pd

df = pd.read_csv('mydata.csv')

logs = df['logStreamName'].unique()

for i in logs:
    grouped_df = df[df['logStreamName'] == i]
    

But then how do I parse the message column of each subset DataFrame to get the associated names and values?

CSV data:

message,logStreamName
20/10/07 17:40:42 - INFO - dse_run_model - n_i*n_j*n_k*n_l: 247632,data-science-dse-cplex/default/27f44fce-90f8-40b2-83f7-3e1aef216fa6
20/10/07 17:40:42 - INFO - dse_model_assets - n_i*n_j*n_k*n_l = 247632,data-science-dse-cplex/default/27f44fce-90f8-40b2-83f7-3e1aef216fa6
20/10/07 17:40:42 - INFO - dse_run_model - len(placed_ijkl): 40944,data-science-dse-cplex/default/27f44fce-90f8-40b2-83f7-3e1aef216fa6
20/10/07 17:40:42 - INFO - dse_run_model - len(placed_region_ijl): 1706,data-science-dse-cplex/default/27f44fce-90f8-40b2-83f7-3e1aef216fa6
20/10/07 17:40:42 - INFO - dse_run_model - len(not_placed_region_ijl): 1706,data-science-dse-cplex/default/27f44fce-90f8-40b2-83f7-3e1aef216fa6
20/10/07 17:41:01 - INFO - __main__ - Maximum memory usage: 12258.98828125,data-science-dse-cplex/default/27f44fce-90f8-40b2-83f7-3e1aef216fa6
20/10/07 17:40:24 - INFO - dse_run_model - n_i*n_j*n_k*n_l: 323680,data-science-dse-cplex/default/11c5884b-f7c5-4600-99d2-70584036ba3d
20/10/07 17:40:24 - INFO - dse_model_assets - n_i*n_j*n_k*n_l = 323680,data-science-dse-cplex/default/11c5884b-f7c5-4600-99d2-70584036ba3d
20/10/07 17:40:24 - INFO - dse_run_model - len(placed_ijkl): 59280,data-science-dse-cplex/default/11c5884b-f7c5-4600-99d2-70584036ba3d
20/10/07 17:40:24 - INFO - dse_run_model - len(placed_region_ijl): 2964,data-science-dse-cplex/default/11c5884b-f7c5-4600-99d2-70584036ba3d
20/10/07 17:40:24 - INFO - dse_run_model - len(not_placed_region_ijl): 2964,data-science-dse-cplex/default/11c5884b-f7c5-4600-99d2-70584036ba3d
20/10/07 17:41:01 - INFO - __main__ - Maximum memory usage: 12313.5390625,data-science-dse-cplex/default/11c5884b-f7c5-4600-99d2-70584036ba3d
20/10/07 17:40:24 - INFO - dse_run_model - n_i*n_j*n_k*n_l: 301312,data-science-dse-cplex/default/cb304e99-2c5f-4a13-b454-32de8e1370e2
20/10/07 17:40:24 - INFO - dse_model_assets - n_i*n_j*n_k*n_l = 301312,data-science-dse-cplex/default/cb304e99-2c5f-4a13-b454-32de8e1370e2
20/10/07 17:40:25 - INFO - dse_run_model - len(placed_ijkl): 44128,data-science-dse-cplex/default/cb304e99-2c5f-4a13-b454-32de8e1370e2
20/10/07 17:40:25 - INFO - dse_run_model - len(placed_region_ijl): 2758,data-science-dse-cplex/default/cb304e99-2c5f-4a13-b454-32de8e1370e2
20/10/07 17:40:25 - INFO - dse_run_model - len(not_placed_region_ijl): 2758,data-science-dse-cplex/default/cb304e99-2c5f-4a13-b454-32de8e1370e2
20/10/07 17:41:07 - INFO - __main__ - Maximum memory usage: 12286.75,data-science-dse-cplex/default/cb304e99-2c5f-4a13-b454-32de8e1370e2

Desired final output:

d = {'n_i*n_j*n_k*n_l': [247632, 323680, 301312], 'len(placed_ijkl)': [40944, 59280, 44128], 
     'len(placed_region_ijl)':[1706, 2964, 2758], 'len(not_placed_region_ijl)': [1706, 2964, 2758],
     'Maximum memory usage': [12258.98828125, 12313.5390625, 12286.75]}
df = pd.DataFrame(data=d)

1 Answer

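You can use a regular expression to capture the relevant parts of the message column: the metric name (id) and its numeric value. Then drop the duplicated metrics within each logStreamName and use pivot to create the final output. Note that the extracted values are strings until you convert them.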

# Raw string so that \s, \d and \. reach the regex engine unchanged
df[["id", "value"]] = df["message"].str.extract(r".*-\s.*-\s(?P<id>.*)(?::\s|\s=\s)(?P<value>(?:\d+|\d+\.\d+)$)")

out = df.drop_duplicates(["logStreamName", "id"]).pivot(index="logStreamName", columns="id", values="value")

print(out)
id                   Maximum memory usage len(not_placed_region_ijl) len(placed_ijkl) len(placed_region_ijl) n_i*n_j*n_k*n_l
logStreamName                                                                                                               
data-science-dse-...        12313.5390625                 2964                  59280                 2964            323680
data-science-dse-...       12258.98828125                 1706                  40944                 1706            247632
data-science-dse-...             12286.75                 2758                  44128                 2758            301312
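
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same extract-then-pivot approach. It uses a subset of the rows from the question, read through io.StringIO instead of a file. The pd.to_numeric conversion and the slightly simplified value pattern are my additions, not part of the code above, and the names csv_text, pattern and parsed are only for illustration.

```python
import io

import pandas as pd

# A subset of the rows from the question (two log streams), inlined for the demo.
csv_text = """message,logStreamName
20/10/07 17:40:42 - INFO - dse_run_model - n_i*n_j*n_k*n_l: 247632,data-science-dse-cplex/default/27f44fce-90f8-40b2-83f7-3e1aef216fa6
20/10/07 17:40:42 - INFO - dse_model_assets - n_i*n_j*n_k*n_l = 247632,data-science-dse-cplex/default/27f44fce-90f8-40b2-83f7-3e1aef216fa6
20/10/07 17:40:42 - INFO - dse_run_model - len(placed_ijkl): 40944,data-science-dse-cplex/default/27f44fce-90f8-40b2-83f7-3e1aef216fa6
20/10/07 17:41:01 - INFO - __main__ - Maximum memory usage: 12258.98828125,data-science-dse-cplex/default/27f44fce-90f8-40b2-83f7-3e1aef216fa6
20/10/07 17:40:24 - INFO - dse_run_model - n_i*n_j*n_k*n_l: 323680,data-science-dse-cplex/default/11c5884b-f7c5-4600-99d2-70584036ba3d
20/10/07 17:40:24 - INFO - dse_model_assets - n_i*n_j*n_k*n_l = 323680,data-science-dse-cplex/default/11c5884b-f7c5-4600-99d2-70584036ba3d
20/10/07 17:40:24 - INFO - dse_run_model - len(placed_ijkl): 59280,data-science-dse-cplex/default/11c5884b-f7c5-4600-99d2-70584036ba3d
20/10/07 17:41:01 - INFO - __main__ - Maximum memory usage: 12313.5390625,data-science-dse-cplex/default/11c5884b-f7c5-4600-99d2-70584036ba3d
"""
df = pd.read_csv(io.StringIO(csv_text))

# Skip "<timestamp> - <level> - <module> - ", then capture the metric name (id)
# and the trailing integer or decimal value. The separator is ": " or " = ".
pattern = r".*-\s.*-\s(?P<id>.*)(?::\s|\s=\s)(?P<value>\d+(?:\.\d+)?)$"

parsed = df["message"].str.extract(pattern)
parsed["value"] = pd.to_numeric(parsed["value"])  # strings -> numbers
parsed["logStreamName"] = df["logStreamName"]

# n_i*n_j*n_k*n_l is logged twice per stream, so keep the first occurrence only;
# pivot raises an error on duplicate (index, column) pairs.
out = (
    parsed.drop_duplicates(["logStreamName", "id"])
    .pivot(index="logStreamName", columns="id", values="value")
)
out.columns.name = None  # drop the "id" label above the column headers

print(out)
```

If you want the exact layout from the question, without the stream names, out.reset_index(drop=True) should get you there. Because the value column mixes integers and decimals, pd.to_numeric returns floats for every metric; cast individual columns with astype(int) if that matters.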