
Observe that each process only multiplies with a small part of x, so you should declare x[local_n].

Next, think about context. If you are doing something like a power method, your output serves as the input for the next matrix-vector multiplication. So, inductively, if you start out using only part of x as input, you need only part of y as output, because that will be your next input. (You could also argue that one should never store redundant information unless strictly necessary.) In other words: your recvcounts vector should be filled with local_m, not the global m. (As written, with the global m, you are actually doing more of a "reduce-bcast" than a reduce-scatter, which you could have done with an allreduce.)
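To make the data layout concrete, here is a minimal serial sketch in plain C that simulates the p processes without any MPI calls. The function names (local_matvec, reduce_scatter_sum, simulate_matvec), the row-major storage, and the assumption that p divides both m and n evenly are mine for illustration; they are not taken from your code. Each simulated process sees only its local_n entries of x and ends up with only local_m entries of y.

```c
#include <assert.h>
#include <stdlib.h>

/* Simulated process r owns columns [r*local_n, (r+1)*local_n) of the
   row-major m x n matrix A, plus the matching local_n entries of x.
   It produces a full-length (m) partial result. */
void local_matvec(const double *A, int m, int n, int r, int local_n,
                  const double *x_local, double *y_partial)
{
    for (int i = 0; i < m; i++) {
        y_partial[i] = 0.0;
        for (int j = 0; j < local_n; j++)
            y_partial[i] += A[i * n + r * local_n + j] * x_local[j];
    }
}

/* What a sum reduce-scatter with every recvcount equal to local_m
   delivers to process r: segment r of the element-wise sum of all
   partial vectors. partials holds p vectors of length m back to back. */
void reduce_scatter_sum(const double *partials, int p, int m, int r,
                        int local_m, double *y_local)
{
    for (int i = 0; i < local_m; i++) {
        y_local[i] = 0.0;
        for (int q = 0; q < p; q++)
            y_local[i] += partials[q * m + r * local_m + i];
    }
}

/* Runs all p simulated processes in turn. The local pieces of y are
   written next to each other in y (length m) only so the result can be
   checked; in a distributed run each piece would stay on its process.
   Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int simulate_matvec(const double *A, const double *x, int m, int n,
                    int p, double *y)
{
    assert(p > 0 && m % p == 0 && n % p == 0);
    int local_m = m / p;
    int local_n = n / p;

    double *partials = malloc((size_t)p * (size_t)m * sizeof *partials);
    if (partials == NULL)
        return -1;

    for (int r = 0; r < p; r++)
        local_matvec(A, m, n, r, local_n, x + r * local_n, partials + r * m);
    for (int r = 0; r < p; r++)
        reduce_scatter_sum(partials, p, m, r, local_m, y + r * local_m);

    free(partials);
    return 0;
}
```

In an actual MPI program the second loop would presumably collapse into a single MPI_Reduce_scatter call with MPI_SUM, where each process passes its length-m partial vector as the send buffer and a recvcounts array in which every entry is local_m.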

Aside: you are using static allocation, which goes on the stack. The stack is limited in size, so large arrays can overflow it; it is better to do dynamic allocation on the heap.
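As a rough sketch of what heap allocation could look like for the local vectors: alloc_vector is a hypothetical helper name of my own, not something from your code or from MPI. It uses calloc so the buffer starts out zeroed, and it leaves error handling to the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Allocates count doubles on the heap, initialized to zero.
   Returns NULL on failure; the caller owns the buffer and must free() it. */
double *alloc_vector(size_t count)
{
    assert(count > 0); /* calloc(0, ...) is implementation-defined */
    double *v = calloc(count, sizeof *v);
    if (v == NULL)
        fprintf(stderr, "alloc_vector: could not allocate %zu doubles\n", count);
    return v;
}
```

You would then write something like double *x = alloc_vector(local_n); instead of double x[local_n];, check the result for NULL, and call free(x) once the vector is no longer needed.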

But otherwise, nicely done. Finally, let me remark that Pacheco's book is quite old by now. May I suggest https://theartofhpc.com/pcse.html ?
