I'm studying MPI and I'm trying to solve exercise 3.5 from Pacheco's book Introduction to Parallel Programming. Essentially, the exercise is:
Write an MPI program that computes the matrix vector product
Ax using a block-column distribution for A. Look at what the function MPI_Reduce_scatter does. Assume that the number of processes p divides the sizes of the matrix.
Given an m by n matrix A and p processes, I have, as usual, local_m = m/p and local_n = n/p. Then:
- I have p rectangular sub-matrices of size m by local_n, one per process.
- Each rank multiplies its m by local_n block with the corresponding local_n entries of the vector x.
- I end up with p vectors of length m, one per process, whose element-wise sum gives the desired result y (written out below).
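To spell this out (this is just my notation, not the book's): splitting A into column blocks A = [A_0 | A_1 | ... | A_{p-1}], where block A_k holds columns k*local_n through (k+1)*local_n - 1, and splitting x accordingly into slices x_0, ..., x_{p-1}, we get

y = A x = A_0 x_0 + A_1 x_1 + ... + A_{p-1} x_{p-1},

so rank k computes the length-m partial vector A_k x_k, and the element-wise sum of these over all ranks is y.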
Here it's written that this MPI function first applies a reduction on the vector and then scatters each segment to the corresponding process. My full code is further below; I tested it with different matrices and it seems to work just fine. Note that the vector x lives on every process.
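First, to convince myself of the semantics, here is a minimal sketch of MPI_Reduce_scatter in isolation (my own toy example, not from the book): every process contributes p integers, the vectors are summed element-wise across processes, and process i receives element i of that sum.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
  int rank, p;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &p);
  int sendbuf[p];     // p segments of length 1 each
  int recvcounts[p];  // how many elements each process receives
  for (int i = 0; i < p; i++) {
    sendbuf[i] = rank + i; // arbitrary data
    recvcounts[i] = 1;     // segment i (one element) goes to process i
  }
  int recv;
  MPI_Reduce_scatter(sendbuf, &recv, recvcounts, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
  // Each process receives the sum over ranks r of (r + rank) = p*rank + p*(p-1)/2.
  printf("rank %d received %d\n", rank, recv);
  MPI_Finalize();
  return 0;
}

With that out of the way, here is the actual program: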
#include <assert.h>
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
  const int m = 12;
  const int n = 12;
  double matrix[m][n];
  double x[n]; // vector
  // Store the actual matrix and vector
  for (int i = 0; i < m; i++) {
    for (int j = 0; j < n; j++) {
      matrix[i][j] = i + j;
    }
    x[i] = 1. + i; // filling x in the same loop relies on m == n
  }
  int rank, p;
  int local_m, local_n;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &p);
  assert(n % p == 0 && m % p == 0);
  local_m = m / p;
  local_n = n / p;
  double local_matrix[m][local_n];
  double local_y[m];  // the partial result of length m computed on each process
  double y[local_m];  // the block of the final result owned by this process
  // MPI_Reduce_scatter requires the send buffer to hold sum(recvcounts)
  // elements, so with local_y of length m each process must receive
  // local_m = m / p entries of y.
  int recvcounts[p];
  for (int i = 0; i < p; i++) {
    recvcounts[i] = local_m;
  }
  // Store on each process its block of local_n consecutive columns of the
  // matrix (the vector x is already available everywhere).
  for (int i = 0; i < m; i++) {           // get all the rows now
    for (int j = 0; j < local_n; j++) {   // only local_n columns
      local_matrix[i][j] = matrix[i][j + rank * local_n];
    }
  }
  // Compute the local mat-vec product with this rank's columns of A and the
  // matching slice of x; the result is a vector of length m on each process.
  for (int i = 0; i < m; i++) {
    local_y[i] = 0.;
    for (int j = 0; j < local_n; j++) {
      local_y[i] += local_matrix[i][j] * x[j + rank * local_n];
    }
  }
  // Sum the partial vectors element-wise across all processes and hand each
  // process its block of m / p entries of the result.
  MPI_Reduce_scatter(local_y, y, recvcounts, MPI_DOUBLE, MPI_SUM,
                     MPI_COMM_WORLD);
  if (rank == 0) {
    printf("The expected result is \n");
    double result[m];
    for (int i = 0; i < m; i++) {
      result[i] = 0.;
      for (int j = 0; j < n; j++) {
        result[i] += matrix[i][j] * x[j];
      }
      printf("result[%d] is %lf \n", i, result[i]);
    }
    printf("The block of the result owned by process %d is: \n", rank);
    // Rank r owns rows r * local_m through (r + 1) * local_m - 1 of y.
    for (int i = 0; i < local_m; i++)
      printf("y[%d] = %lf \n", i, y[i]);
  }
  MPI_Finalize();
  return 0;
}
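In case it helps anyone reproduce this, a typical way to build and run it is shown below (assuming an MPI installation that provides the mpicc wrapper and mpiexec, and that the file is saved as main.c; the process count 4 is just an example that divides m = n = 12):

mpicc -o matvec main.c
mpiexec -n 4 ./matvec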