
I have data coming from Redis and I'm struggling to convert it into a pandas DataFrame.

Data from Redis

data = ["[Timestamp('2018-05-22 09:15:00'), 3555.75, 3559.15, 3546.45, 3548.3, 34250, 'Green', 34250]",
 "[Timestamp('2018-05-22 09:16:00'), 3549.05, 3551, 3543.25, 3548, 19500, 'Green', 53750]",
 "[Timestamp('2018-05-22 09:17:00'), 3548.95, 3553.2, 3548.05, 3548.9, 12000, 'Green', 65750]"]

How can I store the above data in a pandas DataFrame with the columns below?

df = pd.DataFrame(columns=['date','open','high','close','low','volume','close','total_volume'])
    The easiest approach is going to be changing how to send this data upstream. Is that a possibility for you? Commented May 22, 2018 at 19:46

2 Answers


As I said above, the easiest approach here is changing how you send the data upstream. If that is not an option, here is an approach using your current data:

split with strip

data = [i.strip('[]').split(',') for i in data]

pd.DataFrame

df = pd.DataFrame(data, columns=['date','open','high','close','low','volume','close','total_volume'])

                               date      open      high     close      low  \
0  Timestamp('2018-05-22 09:15:00')   3555.75   3559.15   3546.45   3548.3
1  Timestamp('2018-05-22 09:16:00')   3549.05      3551   3543.25     3548
2  Timestamp('2018-05-22 09:17:00')   3548.95    3553.2   3548.05   3548.9

   volume     close total_volume
0   34250   'Green'        34250
1   19500   'Green'        53750
2   12000   'Green'        65750

If your Timestamp column always has the above format, you can postprocess it using basic string slicing:

pd.to_datetime(df.date.str[11:-2])

0   2018-05-22 09:15:00
1   2018-05-22 09:16:00
2   2018-05-22 09:17:00
Name: date, dtype: datetime64[ns]
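
For what it's worth, here is a minimal sketch of how the steps above might be combined so that every column ends up with a usable dtype. It assumes the fields are always separated by `', '` and that the timestamp always has the format shown. The name `color` for the seventh field is hypothetical (not from the question); it is used only to avoid having two columns called `close`.

```python
import pandas as pd

data = [
    "[Timestamp('2018-05-22 09:15:00'), 3555.75, 3559.15, 3546.45, 3548.3, 34250, 'Green', 34250]",
    "[Timestamp('2018-05-22 09:16:00'), 3549.05, 3551, 3543.25, 3548, 19500, 'Green', 53750]",
    "[Timestamp('2018-05-22 09:17:00'), 3548.95, 3553.2, 3548.05, 3548.9, 12000, 'Green', 65750]",
]

# 'color' is a hypothetical name for the seventh field (assumption),
# chosen so that no two columns share the name 'close'.
columns = ['date', 'open', 'high', 'close', 'low', 'volume', 'color', 'total_volume']

# Drop the surrounding brackets and split on ', ' so no leading spaces remain.
rows = [row.strip('[]').split(', ') for row in data]
df = pd.DataFrame(rows, columns=columns)

# Slice away "Timestamp('" and "')" before parsing the date.
df['date'] = pd.to_datetime(df['date'].str[11:-2])

# Remove the literal quote characters around the label.
df['color'] = df['color'].str.strip("'")

# Everything else is still text at this point, so cast explicitly.
df = df.astype({
    'open': float, 'high': float, 'close': float, 'low': float,
    'volume': int, 'total_volume': int,
})
```

After this, `df.dtypes` should show a datetime column, four float columns, two integer columns, and one text column.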

5 Comments

I would suggest using ast.literal_eval instead of manually parsing the strings. See stackoverflow.com/questions/10775894/… for reference.
Using ast.literal_eval will result in a malformed node or string error here. That was my first thought as well.
The timestamps are the problem, but they are awkward this way too: the timestamp string still needs further parsing.
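
One possible way around the ast.literal_eval error, sketched under the assumption that every row contains `Timestamp('...')` calls of exactly this shape: rewrite each call to its bare quoted string with a regular expression, after which the row is a plain Python literal. The pattern and the column name `color` are illustrative choices, not something from the question.

```python
import ast
import re

import pandas as pd

data = [
    "[Timestamp('2018-05-22 09:15:00'), 3555.75, 3559.15, 3546.45, 3548.3, 34250, 'Green', 34250]",
    "[Timestamp('2018-05-22 09:16:00'), 3549.05, 3551, 3543.25, 3548, 19500, 'Green', 53750]",
    "[Timestamp('2018-05-22 09:17:00'), 3548.95, 3553.2, 3548.05, 3548.9, 12000, 'Green', 65750]",
]

# Matches Timestamp('...') and captures the quoted string inside the call.
timestamp_call = re.compile(r"Timestamp\(('[^']*')\)")

# With the call replaced by its quoted argument, literal_eval can parse the row
# and the numbers come back as real ints and floats.
rows = [ast.literal_eval(timestamp_call.sub(r"\1", row)) for row in data]

# 'color' is a hypothetical name for the seventh field (assumption).
columns = ['date', 'open', 'high', 'close', 'low', 'volume', 'color', 'total_volume']
df = pd.DataFrame(rows, columns=columns)
df['date'] = pd.to_datetime(df['date'])
```

The numeric columns need no further cleaning here, since literal_eval already produced numbers rather than strings.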
The above solution works like a charm. However, the data cleaning is time-consuming. I receive a chunk of data from an API, and that data is stored in Redis. When retrieving the data from Redis, I'm having trouble putting it into a pandas DataFrame in one shot. Link to the data: s3-us-west-2.amazonaws.com/shared-girish/Data.txt @chrisz is there a better way to upload data to Redis? I'm currently using rpush(data) to push the data to Redis.
I was able to upload the data to Redis as a dictionary. Thank you @chrisz for the direction.
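
A rough sketch of what sending the data upstream in a friendlier format could look like, assuming each row is serialized to JSON with the timestamp as an ISO string. The plain Python list `stored` is only a stand-in for the Redis list (no Redis client is used here), and `encode_row` and the key `color` are hypothetical names.

```python
import json

import pandas as pd

records = [
    {'date': pd.Timestamp('2018-05-22 09:15:00'), 'open': 3555.75, 'high': 3559.15,
     'close': 3546.45, 'low': 3548.3, 'volume': 34250, 'color': 'Green', 'total_volume': 34250},
    {'date': pd.Timestamp('2018-05-22 09:16:00'), 'open': 3549.05, 'high': 3551,
     'close': 3543.25, 'low': 3548, 'volume': 19500, 'color': 'Green', 'total_volume': 53750},
]


def encode_row(row):
    """Serialize one row to JSON, storing the timestamp as an ISO 8601 string."""
    return json.dumps({**row, 'date': row['date'].isoformat()})


# Stand-in for the Redis list: in practice each encoded string would be pushed
# to Redis and read back later (assumption, not shown here).
stored = [encode_row(row) for row in records]

# Reading side: decode each JSON string and build the frame in one go.
df = pd.DataFrame([json.loads(item) for item in stored])
df['date'] = pd.to_datetime(df['date'])
```

Because JSON keeps numbers as numbers, only the date column needs converting on the way back.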
import pandas as pd

# Rows as real Python lists (not the string representations from the question)
data = [[pd.Timestamp('2018-05-22 09:15:00'), 3555.75, 3559.15, 3546.45, 3548.3, 34250, 'Green', 34250],
        [pd.Timestamp('2018-05-22 09:16:00'), 3549.05, 3551, 3543.25, 3548, 19500, 'Green', 53750],
        [pd.Timestamp('2018-05-22 09:17:00'), 3548.95, 3553.2, 3548.05, 3548.9, 12000, 'Green', 65750]]

df = pd.DataFrame(data, columns=['date', 'open', 'high', 'close', 'low', 'volume', 'close', 'total_volume'])
print(df)

And here is your output:

                 date     open     high    close     low  volume  close  \
0 2018-05-22 09:15:00  3555.75  3559.15  3546.45  3548.3   34250  Green
1 2018-05-22 09:16:00  3549.05  3551.00  3543.25  3548.0   19500  Green
2 2018-05-22 09:17:00  3548.95  3553.20  3548.05  3548.9   12000  Green

   total_volume
0         34250
1         53750
2         65750

4 Comments

This is not particularly helpful, as he is getting the data as a list of strings, not as a list of pd.Timestamps and other data.
@chrisz Thanks for the feedback. Could you clarify? The OP doesn't specify whether to use a pandas Timestamp. They can obviously use this later to slice and look up data values. The string 'Green' is returned. I also assumed that since this is stock data, they would indeed want numerical data besides the string 'Green' (for a Green close). I may have made the wrong assumption, but that's why I intentionally got rid of the strings.
You are treating the data like an actual list. What he has is the string representation of lists, and due to the Timestamp, they cannot be parsed using something like ast.literal_eval. If he had the actual list, your approach would work fine.
@chrisz I see what you mean. I did that intentionally, and must have misunderstood what the OP was looking for. That's my error. That's why I converted the string representations to actual lists.
