I have a txt file as follows:

sub_ID: ['sub-01','sub-02']

ses_ID: ['ses-01','ses-01']

mean: [0.3456,0.446]

I want to read this and convert it to a dataframe like the one in the image (desired dataframe). Don't mind the values in the mean_e_field column; it's just an example. The values should be the same as in the txt file.

I tried the following and got a dataframe (image: dataframe), but I can't transform it into my preferred df:

data = pd.read_csv(filename, sep=",", header=None)
data

I appreciate your answers in advance.

  • You said that you've tried using read_csv, but the example data you've provided is not in a csv format (in fact, it seems like YAML). Is your data presented in the format above, i.e. one line per column and a list of values? Commented Jan 10, 2023 at 16:05
  • Yes, my data is a txt file with each list on a separate line. I want to convert it to a dataframe where the first element is the column name and the others are row values. With read_csv in pandas I could automatically convert my txt file into a dataframe, but the dataframe I got is different from the one I want. Commented Jan 10, 2023 at 16:11

1 Answer

So, several things here.

The reason why your previous data = pd.read_csv(filename, sep=",", header=None) did not work is that sep="," tells pandas to split on commas, and it treats every single line of the file as a row to be split. So, sub_ID: ['sub-01','sub-02'] is split into two cells: sub_ID: ['sub-01' and 'sub-02'].
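To see that split concretely, here is a small sketch that uses Python's standard csv module as a stand-in for read_csv. This is an assumption for illustration only: pandas has its own parser, but with a comma separator and the default double-quote quoting, the line is broken at the same comma. The single quotes in the file are not treated as quote characters, so the comma inside the brackets acts as a field separator.

```python
import csv
import io

# One line of the question's file, exactly as it appears in the txt file.
line = "sub_ID: ['sub-01','sub-02']\n"

# csv.reader splits on "," by default and only treats double quotes as quoting,
# so the comma inside the brackets is used as a field separator.
fields = next(csv.reader(io.StringIO(line)))

print(fields)
# ["sub_ID: ['sub-01'", "'sub-02']"]
```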

The example data you've provided seems to be in YAML format:

sub_ID: [ 'sub-01','sub-02' ]

ses_ID: [ 'ses-01','ses-01' ]

mean: [ 0.3456,0.446 ]

If it were CSV, the data would look as follows (it does not):

sub_ID,ses_ID,mean
sub-01,ses-01,0.3456
sub-02,ses-01,0.446

To read this data into a dataframe, you will either need to preprocess it into another format (e.g. csv) or read it as YAML into a dict and pass that to pandas.DataFrame.

For example:

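import yaml  # provided by the PyYAML package

with open("data.txt", "r") as file:
    try:
        # This returns a dict from the given YAML data.
        data = yaml.safe_load(file)
    except yaml.YAMLError as exc:
        # Report the parse error and stop; data would be undefined otherwise.
        print(exc)
        raise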

print(data)
# {'sub_ID': ['sub-01', 'sub-02'], 'ses_ID': ['ses-01', 'ses-01'], 'mean': [0.3456, 0.446]}

After that, you can create a DataFrame from this dict:

import pandas as pd

df = pd.DataFrame(data)
df.head()


+-----+--------+--------+--------+
|     | sub_ID | ses_ID |  mean  |
+-----+--------+--------+--------+
|   0 | sub-01 | ses-01 | 0.3456 |
|   1 | sub-02 | ses-01 |  0.446 |
+-----+--------+--------+--------+

as desired.

If you have certain entries that are not valid YAML, you will need to preprocess the data before loading it into pandas.
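As a rough illustration of one such preprocessing step, the sketch below parses the same "name: [values]" layout using only the standard library. The function name parse_columns and the sample text are hypothetical, and it assumes every non-empty line is a column name followed by a Python-style list literal; adapt it to whatever actually breaks in your file.

```python
import ast


def parse_columns(text):
    """Turn lines like "name: [v1, v2]" into {name: [v1, v2]}."""
    columns = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip the blank lines between entries
        # Split only on the first colon so values containing ":" survive.
        name, _, raw_values = line.partition(":")
        # literal_eval evaluates the list literal without running arbitrary code.
        columns[name.strip()] = ast.literal_eval(raw_values.strip())
    return columns


# Hypothetical sample mirroring the file from the question.
sample = """sub_ID: ['sub-01','sub-02']

ses_ID: ['ses-01','ses-01']

mean: [0.3456,0.446]
"""

data = parse_columns(sample)
print(data)
# {'sub_ID': ['sub-01', 'sub-02'], 'ses_ID': ['ses-01', 'ses-01'], 'mean': [0.3456, 0.446]}
```

The resulting dict has the same shape as the one returned by yaml.safe_load, so it can be passed to pd.DataFrame(data) in the same way.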
