With matrix multiplication on a TB of data, memory bandwidth can easily be your bottleneck. You only have 40 CPUs and probably nicely vectorised code… that puts a lot of pressure on your memory subsystem, and the better your code, the higher the pressure.
I would start by making sure that you partition the work into chunks that can be handled inside the cache on each CPU, e.g. splitting each matrix into 256 chunks of under 2 MB each (see the sketch below). You might also check how much work per dollar you get out of an M1 Ultra Mac: not that much RAM, but tons of memory bandwidth (800 GB/s), a very fast SSD for virtual memory, and a lot cheaper than any 1 TB machine.
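To make the chunking idea concrete, here is a minimal sketch of loop tiling in C, assuming row-major double matrices; the TILE size, the min_sz helper, and the function name matmul_blocked are mine, chosen so that three 256x256 double tiles (~1.5 MB) fit in a per-core cache. Tune TILE for your actual hardware.

```c
#include <stddef.h>

#define TILE 256  /* 3 * 256 * 256 * 8 bytes ~= 1.5 MB of working set */

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* C += A * B, all matrices n x n with leading dimension ld. */
void matmul_blocked(size_t n, size_t ld,
                    const double *A, const double *B, double *C)
{
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t kk = 0; kk < n; kk += TILE)
            for (size_t jj = 0; jj < n; jj += TILE)
                /* Multiply one cache-resident pair of tiles. */
                for (size_t i = ii; i < min_sz(ii + TILE, n); i++)
                    for (size_t k = kk; k < min_sz(kk + TILE, n); k++) {
                        double a = A[i * ld + k];
                        for (size_t j = jj; j < min_sz(jj + TILE, n); j++)
                            C[i * ld + j] += a * B[k * ld + j];
                    }
}
```

The point of the tile loops is that each TILE x TILE block is loaded from RAM once and then reused many times from cache, instead of streaming the full matrices through memory over and over.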
PS: your caches may have problems if the distance between rows is a power of two, because successive rows then map onto the same cache sets and keep evicting each other. A 4096x4096 matrix could be a very bad idea; if that's what you have, try padding it to 4100x4096, for example.
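For illustration, here is one way to do that padding in C: keep the logical width at 4096 but use a row stride (leading dimension) of 4100, matching the numbers above. The names alloc_padded and get are hypothetical.

```c
#include <stdlib.h>

enum { N = 4096, LD = 4100 };  /* logical width N, padded row stride LD */

double *alloc_padded(void)
{
    /* N rows of LD doubles; the last LD - N entries per row are
     * unused padding that breaks the power-of-two row distance. */
    return malloc((size_t)N * LD * sizeof(double));
}

/* Element (i, j): index with the stride LD, not the width N. */
static inline double get(const double *M, size_t i, size_t j)
{
    return M[i * LD + j];
}
```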