Z-score normalization in pandas DataFrame (python)

Question

I am using Python 3 (Spyder), and I have a table of type pandas.core.frame.DataFrame. I want to z-score normalize the values in that table row by row (subtract the mean of its row from each value and divide by the sd of its row), so that each row has mean=0 and sd=1. I have tried 2 approaches.
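Written out, the row-wise z-score described above is the following. Here $x_{ij}$ is the value in row $i$, column $j$, and $n$ is the number of columns. $s_i$ is taken as the population standard deviation (dividing by $n$), which is what `np.std` and `scipy.stats.zscore` use by default.

```latex
z_{ij} = \frac{x_{ij} - \bar{x}_i}{s_i},\qquad
\bar{x}_i = \frac{1}{n}\sum_{j=1}^{n} x_{ij},\qquad
s_i = \sqrt{\frac{1}{n}\sum_{j=1}^{n} \left(x_{ij} - \bar{x}_i\right)^2}
```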

First approach

from scipy.stats import zscore
zetascore_table=zscore(table,axis=1)
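A minimal sketch of the first approach on a small hypothetical table (the values and labels are made up for illustration). `scipy.stats.zscore` standardizes with ddof=0, so the check only gives an sd of exactly 1 per row when it also uses ddof=0. pandas' `DataFrame.std` defaults to ddof=1. Passing the result through `np.asarray` and `pd.DataFrame` restores the row and column labels, whatever array type `zscore` hands back.

```python
import numpy as np
import pandas as pd
from scipy.stats import zscore

# Hypothetical all-numeric table standing in for the question's `table`.
table = pd.DataFrame(
    {"a": [1.0, 10.0], "b": [2.0, 20.0], "c": [3.0, 30.0], "d": [4.0, 60.0]},
    index=["r1", "r2"],
)

# axis=1 standardizes each row (ddof=0); rebuild a labelled DataFrame.
zetascore_table = pd.DataFrame(
    np.asarray(zscore(table, axis=1)), index=table.index, columns=table.columns
)

print(zetascore_table.mean(axis=1))         # ~0 for each row
print(zetascore_table.std(axis=1, ddof=0))  # ~1 for each row, same ddof as zscore
print(zetascore_table.std(axis=1))          # pandas default ddof=1, so not 1
```

With 4 columns, the last line shows sqrt(4/3) ≈ 1.155 per row rather than 1. The z-scores are correct, but the check uses a different denominator.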

Second approach

import numpy as np

rows=table.index.values
columns=table.columns
for i in range(len(rows)):
    for j in range(len(columns)):
        table.loc[rows[i],columns[j]]=(table.loc[rows[i],columns[j]] - np.mean(table.loc[rows[i],]))/np.std(table.loc[rows[i],])
table
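In the second approach, each assignment in the inner loop overwrites a cell before the next cell's `np.mean(...)` and `np.std(...)` are evaluated. Every later cell in the row is therefore standardized against the statistics of a partly modified row. The following is a minimal sketch of the same loop idea that computes each row's mean and sd once, before overwriting the row. It assumes a small hypothetical all-numeric table.

```python
import numpy as np
import pandas as pd

# Hypothetical all-numeric table standing in for the question's `table`.
table = pd.DataFrame(
    {"a": [1.0, 10.0], "b": [2.0, 20.0], "c": [3.0, 30.0], "d": [4.0, 60.0]},
    index=["r1", "r2"],
)

for row in table.index:
    values = table.loc[row].to_numpy(dtype=float)
    # Row statistics are taken once, from the untouched values.
    mean = np.mean(values)
    sd = np.std(values)  # ddof=0, same as scipy.stats.zscore
    table.loc[row] = (values - mean) / sd

print(table.mean(axis=1))         # ~0 for every row
print(table.std(axis=1, ddof=0))  # ~1 for every row
```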

Both approaches seem to work, but when I check the mean and sd of each row, they are not 0 and 1 as they are supposed to be, but other float values. I don't know what the problem could be.

Thanks in advance for your help!

Maybe worth noting that (a) df['z score'] = zscore(df['col A']) and (b) df['z score'] = (df['col A']-df['col A'].mean())/df['col A'].std() do not give exactly the same z-scores. By default, (a) uses ddof=0 (delta degrees of freedom, i.e. the population standard deviation) and (b) uses ddof=1 (the sample standard deviation). Depending on the application, you can make the ddof equal: e.g. using df['col A'].std(ddof=0) in (b) makes them match (the default for zscore() is ddof=0). See stackoverflow.com/questions/59668597/… for ddof.
– J Prestone, commented Jun 10, 2023 at 21:41
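The comment's point (a) vs (b) can be sketched as follows. The column values are hypothetical, chosen so that the population sd is exactly 2.

```python
import numpy as np
import pandas as pd
from scipy.stats import zscore

df = pd.DataFrame({"col A": [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]})
col = df["col A"]

a = np.asarray(zscore(col))                             # (a) scipy default: ddof=0
b = ((col - col.mean()) / col.std()).to_numpy()         # (b) pandas default: ddof=1
c = ((col - col.mean()) / col.std(ddof=0)).to_numpy()   # (b) with ddof=0

print(np.allclose(a, b))  # False: the denominators differ
print(np.allclose(a, c))  # True: same ddof, same z-scores
```

Here the mean is 5 and the population sd is 2, so (a) gives [-1.5, -0.5, -0.5, -0.5, 0, 0, 1, 2]. (b) divides by the sample sd sqrt(32/7) ≈ 2.14 instead.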

BGG16 · Accepted Answer · 2021-02-26 22:51:43Z

14

The code below calculates a z-score for each value in a column of a pandas df. It then saves the z-score in a new column (here, called 'num_1_zscore'). Very easy to do.

from scipy.stats import zscore
import pandas as pd

# Create a sample df
df = pd.DataFrame({'num_1': [1,2,3,4,5,6,7,8,9,3,4,6,5,7,3,2,9]})

# Calculate the zscores and drop zscores into new column
df['num_1_zscore'] = zscore(df['num_1'])

print(df)  # display(df) also works in Jupyter/IPython consoles
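The answer standardizes a single column. The following is a minimal sketch of doing several columns in one `zscore` call (axis=0, column by column), with the results joined back as new `*_zscore` columns. The second column `num_2` and its values are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import zscore

# 'num_1' is from the answer; 'num_2' is a made-up second column.
df = pd.DataFrame({
    "num_1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 3, 4, 6, 5, 7, 3, 2, 9],
    "num_2": [5, 3, 8, 1, 9, 2, 7, 4, 6, 5, 3, 8, 1, 9, 2, 7, 4],
})
cols = ["num_1", "num_2"]

# zscore on the 2-D selection works column by column (axis=0 default).
z = pd.DataFrame(
    np.asarray(zscore(df[cols])),
    index=df.index,
    columns=[f"{c}_zscore" for c in cols],
)
df = df.join(z)
print(df.head())
```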


pablo11prade · Accepted Answer · 2020-01-09 21:14:15Z

Sorry, after thinking about it, I found an easier way than the for loops to calculate the z-score (subtract the mean of each row and divide the result by the sd of the row):

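table_t = table.T  # transpose so that each original row becomes a column
sd = table_t.std(ddof=0)  # sd of each column (= original row), ddof=0 like np.std
mean = table_t.mean()  # mean of each column (= original row)
# explicit pandas methods avoid relying on how np.mean/np.std treat a DataFrame
numerator = table_t - mean  # numerator in the formula for z-score
z_score = numerator / sd
z_norm_table = z_score.T  # transpose back: the initial table, but with all the
# values z-scored by row.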

I checked, and now the mean of each row is 0 (or very close to 0) and the sd is 1 (or very close to 1), so this works for me. Sorry, I have little coding experience, and sometimes easy things take a lot of trial and error before I figure out how to solve them.
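The transpose step can also be avoided. `DataFrame.sub` and `DataFrame.div` with `axis=0` align a per-row Series with the row index. The following is a minimal sketch under that approach, using a small hypothetical table. It also checks the result against the transpose-based version.

```python
import numpy as np
import pandas as pd

# Hypothetical all-numeric table standing in for `table`.
table = pd.DataFrame(
    {"a": [1.0, 10.0], "b": [2.0, 20.0], "c": [3.0, 30.0], "d": [4.0, 60.0]},
    index=["r1", "r2"],
)

row_mean = table.mean(axis=1)       # one mean per row
row_sd = table.std(axis=1, ddof=0)  # one population sd per row, like np.std

# axis=0 matches each row's statistic to that row's label.
z_norm_table = table.sub(row_mean, axis=0).div(row_sd, axis=0)

# Transpose-based version, for comparison.
t = table.T
z_via_transpose = ((t - t.mean()) / t.std(ddof=0)).T
print(np.allclose(z_norm_table, z_via_transpose))  # True
```

Both versions divide by the ddof=0 sd. A check with pandas' default `std()` (ddof=1) would not show exactly 1.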
