I am working on a project that plots clinical values using Matplotlib and want to display a y-axis with both negative and positive values going from -3 to 3. I'm getting the data from a DataFrame.

An example of the data I'm trying to plot:

analyte_name = ['Uric Acid - Basic', 'Urea', 'Triglycerides - Basic', 'Sodium', 'Potassium - Basic', 'Glucose - Basic', 'Gamma Glutamytranferase - Basic', 'Creatinine - Basic', 'Cholesterol Total - Basic', 'Cholesterol LDL - Basic', 'Cholesterol HDL - Basic', 'Chloride - Basic']
z_scores = ['-0.10', '-0.60', '-0.01', '-0.77', '-12.95', '-0.55', '-0.58', '-0.37', '-0.07', '0.19', '0.88', '0.69']

This is what I could come up with:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np

df = pd.DataFrame({'x_':analyte_name, 'y_':z_scores})
fig = plt.figure()
ax = fig.add_subplot(111)

ax.set_xlabel('analyte name')
ax.set_ylabel('z-score')

# plt.axhline(0, color='black')
plt.ylim(-3, 3)
plt.xticks(rotation=90)
plt.scatter('x_', 'y_' ,data=df, marker='o')
# plt.style.use('seaborn-dark')
plt.show()

But this gives me a plot that looks like this:

y-axis plotted in sequence from z_scores[0] onwards but not displaying all z_scores

Commenting out the plt.ylim(-3, 3) line gives me an image like this:

y-axis plotted in sequence from z_scores[0] onwards and displaying all z_score but in sequence

The code I'm using is modified from one I tried using before which was:

df = pd.DataFrame({'x_':['A','B','C','D','E'], 
'y_':np.random.uniform(-3,3,5)})

fig = plt.figure()
ax = fig.add_subplot(111)

# ax.spines['top'].set_visible(False)
# ax.spines['right'].set_visible(False)

ax.set_xlabel('sample')
ax.set_ylabel('z-score')

plt.axhline(0, color='black')
plt.ylim(-3, 3)
plt.scatter('x_', 'y_' ,data=df, marker='o')
# plt.style.use('seaborn-dark')
plt.show()

That code generated what I want my final output to look like before some slight styling:

y-axis with negative and positive values

I've tried different ways of passing the data to the x and y axes, such as passing it as a dictionary, but the results have been the same.

I'm still learning how to plot data and hope to get some help.

Thanks.

  • What if you try plotting x = df['x_'].values and y = df['y_'].values? Just wondering whether the error is reproduced if you have the data as a raw numpy array. Edit: It looks like your z-scores are stored as strings - what if you change these to floats? Commented Mar 4, 2019 at 7:29

1 Answer

Your problem is that your z-scores are stored as strings. Matplotlib does not interpret strings as numbers; it treats them as categorical labels, so you end up with a straight line of two 'categorical variables' plotted against each other. To fix the issue, convert your z-scores to floats:

import matplotlib.pyplot as plt
import numpy as np

# convert to numpy arrays
analyte_name = np.array(['Uric Acid - Basic', 'Urea', 'Triglycerides - Basic', 'Sodium', 'Potassium - Basic', 'Glucose - Basic', 'Gamma Glutamytranferase - Basic', 'Creatinine - Basic', 'Cholesterol Total - Basic', 'Cholesterol LDL - Basic', 'Cholesterol HDL - Basic', 'Chloride - Basic'])
z_scores = np.array(['-0.10', '-0.60', '-0.01', '-0.77', '-12.95', '-0.55', '-0.58', '-0.37', '-0.07', '0.19', '0.88', '0.69'])

# plot, converting your z-scores to floats
plt.plot(analyte_name, z_scores.astype(float))
plt.xticks(rotation=90)  # keep the long analyte names readable
plt.show()

This will fix your problem!

Without converting them to floats I got this image:

zscores_as_strings

When converted you can see things are being plotted correctly:

z_scores_as_float
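
Since the question builds its plot from a DataFrame, here is a minimal sketch of the same fix applied there. It assumes the column names 'x_' and 'y_' from the question and uses pd.to_numeric for the conversion (df['y_'].astype(float) would do the same job here). The Agg backend line is only there so the sketch runs without a display; it is not part of the fix.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line and call plt.show() when working interactively
import matplotlib.pyplot as plt
import pandas as pd

analyte_name = ['Uric Acid - Basic', 'Urea', 'Triglycerides - Basic', 'Sodium',
                'Potassium - Basic', 'Glucose - Basic', 'Gamma Glutamytranferase - Basic',
                'Creatinine - Basic', 'Cholesterol Total - Basic', 'Cholesterol LDL - Basic',
                'Cholesterol HDL - Basic', 'Chloride - Basic']
z_scores = ['-0.10', '-0.60', '-0.01', '-0.77', '-12.95', '-0.55',
            '-0.58', '-0.37', '-0.07', '0.19', '0.88', '0.69']

df = pd.DataFrame({'x_': analyte_name, 'y_': z_scores})

# The actual fix: turn the string column into floats before plotting
df['y_'] = pd.to_numeric(df['y_'])

fig, ax = plt.subplots()
ax.scatter('x_', 'y_', data=df, marker='o')
ax.axhline(0, color='black')          # reference line at z = 0
ax.set_ylim(-3, 3)                    # now applies to real numeric values
ax.set_xlabel('analyte name')
ax.set_ylabel('z-score')
ax.tick_params(axis='x', labelrotation=90)
```

Note that with the limits fixed at -3 to 3, the -12.95 value for 'Potassium - Basic' falls outside the visible range, so you may want to widen the limits or mark that point separately.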

Edit:

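This also explains why only 4 data points appear when you call plt.ylim(-3, 3). Because the y values are strings, Matplotlib treats them as categories and places them at integer positions 0, 1, 2, ... in order of first appearance, so the axis has no concept of the numeric range you intended. The limits -3 to 3 are applied to those positions instead, which leaves only the points at positions 0 to 3 visible (i.e., the 0th, 1st, 2nd and 3rd data points).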
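
If you want to check this yourself, the following sketch reads back the positions Matplotlib actually assigned to the string values. The x labels ('s0', 's1', ...) are made up for brevity, and reading the positions from the scatter's get_offsets() is just one convenient way to inspect them.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

z_scores = ['-0.10', '-0.60', '-0.01', '-0.77', '-12.95', '-0.55',
            '-0.58', '-0.37', '-0.07', '0.19', '0.88', '0.69']
labels = [f's{i}' for i in range(len(z_scores))]  # hypothetical x labels

fig, ax = plt.subplots()
points = ax.scatter(labels, z_scores)  # y values are still strings here
ax.set_ylim(-3, 3)

# Each string was mapped to a category position; column 1 holds the y positions
y_positions = np.asarray(points.get_offsets())[:, 1]

# Only categories whose *position* lies within the limits are drawn
visible = [z for z, pos in zip(z_scores, y_positions) if -3 <= pos <= 3]

print(y_positions)
print(visible)
```

The positions depend only on the order in which the strings appear, not on the numbers they spell out, which is why '-12.95' ends up in the middle of the axis rather than far below the others.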

3 Comments

Thanks for the help. Such a small thing had me racking my brain for hours. I hadn't even thought about checking the data type of the z_scores.
That’s OK, these things come with experience. I didn’t think to check until I saw the straight line of data despite the -12.95 in the middle! I guess it’s important to always be explicit and store variables in the type you expect. Numbers should be integers or floats to ensure you get the expected behaviour. Unless of course you want string behaviour...
Nice. I'll remember this from now on. Thanks again.
