Hardcoding matrix multiplication

Question

I was doing a problem, SPOJ: FLIB. My code was running slow (getting Time Limit Exceeded error) when I was using Code1.

However, my code was accepted within the time limit when I hardcoded the multiplication instead, using Code2. Can someone explain why this made such a significant difference? Note that this program relies heavily on multiplication, which is performed repeatedly.

The question is on matrix exponentiation, and multiplication of two matrices is a very crucial aspect in it.

A[3][3] and B[3][3] are the two matrices being multiplied, and the result is stored in C[3][3]. M is the int modulus I am supposed to apply, M = \$10^9 + 7\$. s is of type long long, and I have used it to reduce the number of times I have to apply the modulus operator.

Each element in these matrices can be of the order of \$10^9\$, after taking the modulus.
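Applying the modulus once per entry is safe because the three products cannot overflow a long long: each entry is below M, so each product is below roughly \$10^{18}\$, and three of them sum to about \$3 \times 10^{18}\$, under the long long maximum of about \$9.2 \times 10^{18}\$. A minimal sketch of that bound as a compile-time check, where MOD, MAX_ENTRY and worst_case_sum are illustrative names that do not appear in my program:

```cpp
#include <cassert>
#include <limits>

// Modulus from the question (called M there); MOD is an illustrative name.
constexpr long long MOD = 1000000007LL;

// Largest value an entry can hold after reduction modulo MOD.
constexpr long long MAX_ENTRY = MOD - 1;

// Worst case for one dot product in a 3x3 multiply: three maximal products.
constexpr long long worst_case_sum() {
    return 3 * (MAX_ENTRY * MAX_ENTRY);
}

// Compile-time check, written with divisions so the check itself cannot overflow.
static_assert(MAX_ENTRY <= std::numeric_limits<long long>::max() / MAX_ENTRY / 3,
              "three maximal products must fit in long long");
```

The same reasoning would not hold for much larger matrices, where the running sum would need reducing more often.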

Code1:

for(i=0;i<3;i++)
    for(j=0;j<3;j++)
        C[i][j] = 0;

for(i=0;i<3;i++)
    for(j=0;j<3;j++)
    {
        s=0;  
        for(k=0;k<3;k++)
            s += ((long long)(A[i][k]))*B[k][j];
        s = s%M;
        C[i][j] = s;
    }

Code2:

C[0][0]= (((long long)A[0][0])*B[0][0] + ((long long)A[0][1])*B[1][0] + ((long long)A[0][2])*B[2][0])%M;
C[0][1]= (((long long)A[0][0])*B[0][1] + ((long long)A[0][1])*B[1][1] + ((long long)A[0][2])*B[2][1])%M;
C[0][2]= (((long long)A[0][0])*B[0][2] + ((long long)A[0][1])*B[1][2] + ((long long)A[0][2])*B[2][2])%M;
C[1][0]= (((long long)A[1][0])*B[0][0] + ((long long)A[1][1])*B[1][0] + ((long long)A[1][2])*B[2][0])%M;
C[1][1]= (((long long)A[1][0])*B[0][1] + ((long long)A[1][1])*B[1][1] + ((long long)A[1][2])*B[2][1])%M;
C[1][2]= (((long long)A[1][0])*B[0][2] + ((long long)A[1][1])*B[1][2] + ((long long)A[1][2])*B[2][2])%M;
C[2][0]= (((long long)A[2][0])*B[0][0] + ((long long)A[2][1])*B[1][0] + ((long long)A[2][2])*B[2][0])%M;
C[2][1]= (((long long)A[2][0])*B[0][1] + ((long long)A[2][1])*B[1][1] + ((long long)A[2][2])*B[2][1])%M;
C[2][2]= (((long long)A[2][0])*B[0][2] + ((long long)A[2][1])*B[1][2] + ((long long)A[2][2])*B[2][2])%M;

This is the complete code:

#include <bits/stdc++.h>
using namespace std;

#define M 1000000007

void matmult(int A[3][3], int B[3][3], int C[3][3])
{

    C[0][0]= (((long long)A[0][0])*B[0][0] + ((long long)A[0][1])*B[1][0] + ((long long)A[0][2])*B[2][0])%M;
    C[0][1]= (((long long)A[0][0])*B[0][1] + ((long long)A[0][1])*B[1][1] + ((long long)A[0][2])*B[2][1])%M;
    C[0][2]= (((long long)A[0][0])*B[0][2] + ((long long)A[0][1])*B[1][2] + ((long long)A[0][2])*B[2][2])%M;
    C[1][0]= (((long long)A[1][0])*B[0][0] + ((long long)A[1][1])*B[1][0] + ((long long)A[1][2])*B[2][0])%M;
    C[1][1]= (((long long)A[1][0])*B[0][1] + ((long long)A[1][1])*B[1][1] + ((long long)A[1][2])*B[2][1])%M;
    C[1][2]= (((long long)A[1][0])*B[0][2] + ((long long)A[1][1])*B[1][2] + ((long long)A[1][2])*B[2][2])%M;
    C[2][0]= (((long long)A[2][0])*B[0][0] + ((long long)A[2][1])*B[1][0] + ((long long)A[2][2])*B[2][0])%M;
    C[2][1]= (((long long)A[2][0])*B[0][1] + ((long long)A[2][1])*B[1][1] + ((long long)A[2][2])*B[2][1])%M;
    C[2][2]= (((long long)A[2][0])*B[0][2] + ((long long)A[2][1])*B[1][2] + ((long long)A[2][2])*B[2][2])%M;

}

void matpow(int Z[3][3], long long n, int A[3][3])
{
    int temp[3][3];
    int i,j;

    A[0][0] = 1;
    A[0][1] = 0;
    A[0][2] = 0;
    A[1][0] = 0;
    A[1][1] = 1;
    A[1][2] = 0;
    A[2][0] = 0;
    A[2][1] = 0;
    A[2][2] = 1;


    while(n>0)
    {
        if(n&1)
        {
            matmult(A,Z,temp);

            for(i=0;i<3;i++)
                for(j=0;j<3;j++)
                    A[i][j] = temp[i][j];
        }

        matmult(Z, Z, temp);

        for(i=0;i<3;i++)
            for(j=0;j<3;j++)
                Z[i][j] = temp[i][j];

        n/=2;
    }

}

int main(int argc, const char * argv[])
{
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int t;
    long long n,ans = 0;

    int A[3][3];

    cin>>t;

    while(t--)
    {
        cin>>n;

        int Z[3][3] = {{1,2,1},{0,5,3},{0,3,2}};

        if(n>1)
        {
            matpow(Z,n-1,A);

            ans = ((long long)(A[0][0])*2 + (long long)(A[0][1])*5 + (long long)A[0][2]*3)%M;
        }

        else
        {
            if(n==0)
                ans = 0;
            else if(n==1)
                ans = 2;
            else if(n==2)
                ans = 15;
        }

        cout<<ans<<"\n";

    }

    return 0;
}

Hi. Welcome to Code Review! We tend to prefer questions with more context. In particular, a problem statement explaining what the purpose of the code is (preferably with a link to the original) and the rest of the code. As written, this looks more like a Stack Overflow question with a minimal example. I'd like to know how this code is called, how often, and what data types A, B, C, M, and s are. Also, what are typical values for each?
– Brythan, Commented Jan 30, 2015 at 8:37
@Brythan: I have edited the question. Please let me know if there is something more I should add.
– Dhruv Mullick, Commented Jan 30, 2015 at 8:58
I'd strongly recommend posting the rest of the code. Particularly as the linked problem doesn't obviously involve matrix multiplication but Fibonacci calculations. As it is, people can't run the code, so no one can try making changes to see what happens.
– Brythan, Commented Jan 30, 2015 at 9:10

Loki Astari · Accepted Answer · 2015-01-30 19:58:28Z

You also have this little loop in your original code that is not needed.

for(i=0;i<3;i++)
    for(j=0;j<3;j++)
        C[i][j] = 0;

Note: It is not needed because you have an explicit assignment to each element.

    C[i][j] = s;

So the two pieces of code are not equivalent.

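Also be careful with precedence in the hardcoded version:

C[2][2]= (((long long)A[2][0])*B[0][2] + ((long long)A[2][1])*B[1][2] + ((long long)A[2][2])*B[2][2])%M;

The operator % has a higher precedence than +, so it is only the outer parentheses around the whole sum that make the modulus apply to all three terms. As posted, those parentheses are in place; drop them and the modulus is applied to the last term only, before the addition.

// ie. Without the outer parentheses you get
C[2][2] = T1 + T2 + (T3 % M);
// You want (and, as posted, have)
C[2][2] = (T1 + T2 + T3) % M;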
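
The two groupings give different results, which is easy to confirm on small numbers. A minimal sketch, using two made-up helper functions that are not part of the question's code:

```cpp
#include <cassert>

// Hypothetical helpers, only meant to contrast the two groupings.

// % binds tighter than +, so this is a + b + (c % m).
long long mod_last_term_only(long long a, long long b, long long c, long long m) {
    return a + b + c % m;
}

// Parentheses force the modulus to apply to the whole sum.
long long mod_whole_sum(long long a, long long b, long long c, long long m) {
    return (a + b + c) % m;
}
```

For example, with a = 5, b = 6, c = 7 and m = 4 the first form yields 14 and the second yields 2.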

Other things:

// Don't do this.
using namespace std;

// Macros have no type information.
#define M 1000000007

// Prefer to use static const
static long long const M = 1000000007;

You may want to initialize t

int t;
cin>>t;    // If the user types in Fred, the read will fail.
           // Before C++11 t is then left uninitialized;
           // since C++11 it is set to 0.

while(t--) // This could go for a very long time.
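
One way to guard against a failed read is to give the variable a known value and test the stream state after extraction. A minimal sketch, assuming a hypothetical helper named read_count that is not part of the original program:

```cpp
#include <cassert>
#include <istream>
#include <sstream>

// Hypothetical helper: reads the test-case count from a stream.
// Returns false if extraction fails (for example on input such as "Fred").
bool read_count(std::istream& in, int& count) {
    count = 0;  // known value even if the read fails
    return static_cast<bool>(in >> count);
}
```

In main this could be used as: if (!read_count(std::cin, t)) return 1;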

One variable per line please.

long long n,ans = 0;

Declare variables as close to the point of use as possible.

    cin>>n;  // n is defined outside the current scope.
             // yet not used anywhere but inside the loop

Some white space between identifiers would be nice.

ratchet freak · Accepted Answer · 2015-01-30 08:31:54Z

First, did you compile with optimizations enabled? If not, do so when profiling. A good optimizer will produce equivalent code after unrolling the loops.

The first double for loop is superfluous. It is doing busy work that is overwritten in the second loop.
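
With the zeroing pass removed, the loop version reduces to something like the sketch below. It keeps the question's parameter types; the names matmult_loop and MOD are illustrative, and whether a given compiler actually unrolls the fixed-size loops depends on the compiler and the optimization flags used.

```cpp
#include <cassert>

// Same value as the question's M, as a typed constant (illustrative name).
static const long long MOD = 1000000007LL;

// Loop version of Code1 without the zeroing pass: every C[i][j] is
// assigned exactly once, so clearing C beforehand is unnecessary.
void matmult_loop(int A[3][3], int B[3][3], int C[3][3]) {
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            long long s = 0;  // three products of values below MOD fit in long long
            for (int k = 0; k < 3; ++k) {
                s += static_cast<long long>(A[i][k]) * B[k][j];
            }
            C[i][j] = static_cast<int>(s % MOD);
        }
    }
}
```

As in the question's matmult, C must not alias A or B, because entries of C are written while A and B are still being read.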
