Covariance and Correlation

Last Updated : 23 Jul, 2025

Covariance and correlation are two key concepts in statistics that help us analyze the relationship between two variables. Covariance measures how two variables change together, indicating whether they move in the same or opposite directions. Correlation rescales covariance so that it also shows how strong the linear relationship is.

Figure: Relationship between independent and dependent variables

To understand this relationship better, consider factors like sunlight, water and soil nutrients (as shown in the image). These are independent variables that influence plant growth, which is our dependent variable. Covariance describes whether each factor and plant growth tend to rise and fall together or move in opposite directions.

What is Covariance?

Covariance is a statistical measure of the relationship between a pair of random variables: it describes how a change in one variable is associated with a change in the other, without implying that one causes the other. It assesses how much two variables vary together around their mean values. Covariance is calculated by averaging the products of the deviations of each variable from its mean. It tells us the direction of the relationship but not its strength, because its magnitude depends on the units in which the variables are measured.

  1. It can take any value from -infinity to +infinity: a negative value represents a negative relationship, and a positive value represents a positive relationship.
  2. It captures only the linear relationship between variables.
  3. It gives the direction of the relationship between variables, not its strength.

Covariance Formula

1. Sample Covariance

\text{Cov}_S(X, Y) = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \overline{X})(Y_i - \overline{Y})

Where:

  • X_i​: The i^{th} value of the variable X in the sample.
  • Y_i​: The i^{th} value of the variable Y in the sample.
  • \overline{X}: The sample mean of variable X (i.e., the average of all X_i​ values in the sample).
  • \overline{Y}: The sample mean of variable Y (i.e., the average of all Y_i​ values in the sample).
  • n: The number of data points in the sample.
  • \sum: The summation symbol means we sum the products of the deviations for all the data points.
  • n - 1: This is the degrees of freedom. When working with a sample, we divide by n - 1 to correct for the bias introduced by estimating the population covariance based on the sample data. This is known as Bessel's correction.
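The sample covariance formula above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name `sample_covariance` and the study-hours data are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def sample_covariance(xs, ys):
    """Sample covariance with Bessel's correction (divide by n - 1)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of the deviations from each sample mean.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical sample: hours studied and exam scores for four students.
hours_studied = [2, 4, 6, 8]
exam_scores = [65, 70, 80, 85]

cov_s = sample_covariance(hours_studied, exam_scores)
print(cov_s)  # approximately 23.33 (70 / 3)
```

The deviation products here are 30, 5, 5 and 30, which sum to 70. Dividing by n - 1 = 3 gives a positive covariance, so the two variables tend to increase together.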

2. Population Covariance

\text{Cov}_P(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu_X)(Y_i - \mu_Y)

Where:

  • X_i​: The i^{th} value of the variable X in the population.
  • Y_i​: The i^{th} value of the variable Y in the population.
  • \mu_X: The population mean of variable X (i.e., the average of all X_i​ values in the population).
  • \mu_Y: The population mean of variable Y (i.e., the average of all Y_i​ values in the population).
  • n: The total number of data points in the population.
  • \sum: The summation symbol means we sum the products of the deviations for all the data points.
  • Division by n: In the case of population covariance, we divide by n because we are using the entire population data. There is no need for Bessel's correction because the population means are known exactly rather than estimated from a sample.
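For comparison, the population formula can be sketched the same way. The function name `population_covariance` and the data are hypothetical; the four points are treated here as if they were the whole population, which is an assumption made purely for illustration.

```python
def population_covariance(xs, ys):
    """Population covariance: divide by n, no Bessel's correction."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one data point")
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    total = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys))
    return total / n


# Hypothetical data, treated as a complete population of four points.
x_values = [2, 4, 6, 8]
y_values = [65, 70, 80, 85]

cov_p = population_covariance(x_values, y_values)
print(cov_p)  # 17.5

# The sample estimate from the same numbers is larger by a factor n / (n - 1).
n = len(x_values)
cov_s_equivalent = cov_p * n / (n - 1)
```

The only difference from the sample version is the divisor, so the two results always differ by the factor n / (n - 1), which approaches 1 as n grows.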

Types of Covariance

  • Positive Covariance: When one variable increases, the other variable tends to increase as well and vice versa.
  • Negative Covariance: When one variable increases, the other variable tends to decrease.
  • Zero Covariance: There is no linear relationship between the two variables. This does not necessarily mean they are independent, because a non-linear relationship can still produce a covariance of zero.
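The three cases can be illustrated with small made-up datasets. This is a sketch under stated assumptions: the helper `covariance` and the variable names are hypothetical, and the last dataset is chosen deliberately to show that a covariance of zero does not rule out a non-linear dependence.

```python
def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


x = [-2, -1, 0, 1, 2]
rising = [2 * v + 1 for v in x]   # increases with x
falling = [-3 * v for v in x]     # decreases as x increases
parabola = [v * v for v in x]     # depends on x, but not linearly

cov_rising = covariance(x, rising)
cov_falling = covariance(x, falling)
cov_parabola = covariance(x, parabola)

print(cov_rising)    # 5.0  -> positive covariance
print(cov_falling)   # -7.5 -> negative covariance
print(cov_parabola)  # 0.0  -> zero covariance, yet y is fully determined by x
```

In the last case the positive and negative deviation products cancel exactly because the data are symmetric around zero.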


What is Correlation?

Correlation is a standardized measure of the strength and direction of the linear relationship between two variables. It is derived from covariance by dividing by the standard deviations of both variables, and it ranges between -1 and 1. Unlike covariance, whose magnitude depends on the units of measurement, correlation is unit-free, so its values can be compared across datasets.

  • Positive Correlation (close to +1): As one variable increases, the other variable also tends to increase.
  • Negative Correlation (close to -1): As one variable increases, the other variable tends to decrease.
  • Zero Correlation: There is no linear relationship between the variables.

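The correlation coefficient \rho (rho) for variables X and Y is defined as the covariance divided by the product of the standard deviations:

\rho_{X,Y} = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}

Where \sigma_X and \sigma_Y are the standard deviations of X and Y.

  1. Correlation takes values from -1 to +1, where values close to +1 represent a strong positive correlation and values close to -1 represent a strong negative correlation.
  2. It describes how the variables are associated with each other; it does not show that one causes the other.
  3. It gives both the direction and the strength of the linear relationship between variables.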
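The relationship between covariance and correlation can be sketched as follows: correlation is covariance divided by the product of the two standard deviations. The helper names `covariance` and `correlation` and the datasets are hypothetical, chosen so that the two exactly linear cases land on the ends of the range.

```python
import math


def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Correlation as covariance scaled by both standard deviations."""
    var_x = covariance(xs, xs)  # covariance of a variable with itself is its variance
    var_y = covariance(ys, ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return covariance(xs, ys) / math.sqrt(var_x * var_y)


x = [-2, -1, 0, 1, 2]
r_up = correlation(x, [2 * v + 1 for v in x])    # exactly linear, rising
r_down = correlation(x, [-3 * v for v in x])     # exactly linear, falling
r_noisy = correlation(x, [1, -1, 2, 0, 3])       # loosely rising

print(r_up, r_down, r_noisy)
```

Exactly linear data gives +1 or -1 regardless of the slope, while the noisier data gives a value strictly between the two.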

Correlation Formula

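r = \frac{\sum_{i=1}^{n} (x_i - \overline{x})(y_i - \overline{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \overline{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \overline{y})^2}}

Here,

  • \overline{x} and \overline{y}: The means of the two sample sets.
  • n: The total number of samples.
  • x_i and y_i: The individual sample values of each set.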
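The sample correlation coefficient can also be computed directly from sums of deviations, without computing covariance first, because the divisor (n or n - 1) cancels between the numerator and the denominator. The sketch below assumes the standard Pearson form; the function name `pearson_r` and the data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation from sums of deviations around the sample means."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical sample: hours studied and exam scores for four students.
hours_studied = [2, 4, 6, 8]
exam_scores = [65, 70, 80, 85]

r = pearson_r(hours_studied, exam_scores)
print(round(r, 2))  # 0.99
```

Here the sums are 70 for the cross products, 20 for the squared x deviations and 250 for the squared y deviations, so r = 70 / sqrt(20 * 250), which is close to +1.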


Difference between Covariance and Correlation

This table shows the difference between Covariance and Correlation:

Covariance | Correlation
Measures how much two random variables vary together | Measures how strongly two variables are linearly related
Computed for a pair of variables or data sets | Computed for a pair of variables; a correlation matrix collects the pairwise values for multiple variables
Lies between -infinity and +infinity | Lies between -1 and +1
Unscaled measure of joint variability | Scaled version of covariance
Provides the direction of the relationship | Provides the direction and strength of the relationship
Dependent on the scale of the variables | Independent of the scale of the variables
Has dimensions (the product of the units of the two variables) | Dimensionless

The key difference is that covariance shows only the direction of the relationship between variables, while correlation shows both the direction and the strength in a standardized form.
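The scale dependence of covariance can be seen by changing the units of one variable. In this sketch the height and weight figures are made up, and the helpers `covariance` and `correlation` are hypothetical names for the standard sample formulas.

```python
import math


def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Sample correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical measurements of the same four people.
height_cm = [150, 160, 170, 180]
height_m = [h / 100 for h in height_cm]  # same heights, different unit
weight_kg = [50, 58, 65, 77]

cov_cm = covariance(height_cm, weight_kg)
cov_m = covariance(height_m, weight_kg)
r_cm = correlation(height_cm, weight_kg)
r_m = correlation(height_m, weight_kg)

print(cov_cm, cov_m)  # the covariance differs by a factor of 100
print(r_cm, r_m)      # the correlation is the same up to rounding error
```

Switching from metres to centimetres multiplies the covariance by 100 but leaves the correlation unchanged, which is why correlation is the better choice for comparing relationships across datasets.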

Applications of Covariance and Correlation

Applications of Covariance

  • Portfolio Management in Finance: Covariance is used to measure how different stocks or financial assets move together, aiding in portfolio diversification to minimize risk.
  • Genetics: In genetics, covariance can help understand the relationship between different genetic traits and how they vary together.
  • Econometrics: Covariance is employed to study the relationship between different economic indicators, such as the relationship between GDP growth and inflation rates.
  • Signal Processing: Covariance is used to analyze and filter signals in various forms, including audio and image signals.
  • Environmental Science: Covariance is applied to study relationships between environmental variables, such as temperature and humidity changes over time.
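As an illustration of the portfolio use case, the variance of a two-asset portfolio is w_a^2 Var(A) + w_b^2 Var(B) + 2 w_a w_b Cov(A, B), so a negative covariance lowers the overall risk. The sketch below uses made-up return series and weights, and the names `returns_a`, `returns_b`, `w_a` and `w_b` are hypothetical.

```python
def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical periodic returns of two assets and their portfolio weights.
returns_a = [0.02, -0.01, 0.03, 0.01]
returns_b = [0.01, 0.02, -0.02, 0.00]
w_a, w_b = 0.6, 0.4

var_a = covariance(returns_a, returns_a)
var_b = covariance(returns_b, returns_b)
cov_ab = covariance(returns_a, returns_b)

# Portfolio variance from the two-asset formula.
portfolio_var = w_a**2 * var_a + w_b**2 * var_b + 2 * w_a * w_b * cov_ab

# Cross-check: variance of the combined return series computed directly.
combined = [w_a * a + w_b * b for a, b in zip(returns_a, returns_b)]
portfolio_var_direct = covariance(combined, combined)

print(cov_ab)  # negative for this made-up data
print(portfolio_var, portfolio_var_direct)
```

Because the covariance is negative for this data, the cross term reduces the portfolio variance below what the two weighted variances alone would give.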

Applications of Correlation

  • Market Research: Correlation is used to identify relationships between consumer behavior and sales trends, helping businesses make informed marketing decisions.
  • Medical Research: Correlation helps in understanding the relationship between different health indicators, such as the correlation between blood pressure and cholesterol levels.
  • Weather Forecasting: Correlation is used to analyze the relationship between various meteorological variables, such as temperature and humidity, to improve weather predictions.
  • Machine Learning: Correlation analysis is used in feature selection to identify which variables have strong relationships with the target variable, improving model accuracy.
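The feature selection idea can be sketched by ranking features by the absolute value of their correlation with the target. Everything in this example is hypothetical: the feature names, the numbers and the helper `pearson_r`. Note that such a ranking only reflects linear relationships, so it is a first-pass filter rather than a complete selection method.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation from sums of deviations around the sample means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical housing data: two candidate features and a target.
features = {
    "area_sqm": [50, 70, 90, 110, 130],
    "age_years": [30, 5, 20, 10, 25],
}
price = [150, 200, 260, 300, 370]

# Rank features by the strength (absolute value) of their correlation with the target.
scores = {name: pearson_r(values, price) for name, values in features.items()}
ranked = sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

for name, r in ranked:
    print(f"{name}: r = {r:.3f}")
```

With this data the area feature has a correlation close to +1 with the price, while the age feature is close to 0, so area is ranked first.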

Suggested Quiz
5 Questions

What is the main difference between covariance and correlation?

  • A

    Covariance measures both direction and strength, while correlation only measures direction

  • B

    Covariance depends on the scale of variables, whereas correlation is standardized

  • C

    Covariance is always positive, while correlation can be negative

  • D

    Covariance is used only for categorical data, while correlation is used for numerical data

Explanation:

Covariance indicates the direction of the relationship but is scale-dependent, making it difficult to compare across datasets. Correlation standardizes covariance making it unit-free and easier to interpret.

Which of the following is true about a covariance matrix?

  • A

    It only measures relationships between two variables

  • B

    The diagonal elements represent the covariance between different variables

  • C

    It is a square matrix where diagonal elements represent variance and off-diagonal elements represent covariance

  • D

    It is used to find non-linear relationships in a dataset

Explanation:

The covariance matrix is a square matrix where the diagonal elements capture the variance of individual features, while the off-diagonal elements show how pairs of features vary together.

What does covariance measure?

  • A

    Difference between two means

  • B

    Direction of the linear relationship between two variables

  • C

    Spread of a single variable

  • D

    Probability of events

Explanation:

Covariance tells us how two variables change together — positively, negatively or not at all.

If the covariance between two variables is positive, what does it indicate?


  • A

    Variables move in opposite directions

  • B

    One variable is constant

  • C

    Variables move in the same direction

  • D

    No relationship

Explanation:

A positive covariance means both variables increase or decrease together.

Which of the following is true about correlation?


  • A

    It has no units

  • B

    It ranges from 0 to 1

  • C

    It is always positive

  • D

    It measures frequency

Explanation:

Correlation is unit-free and shows how strong the linear relationship is.
