I am trying to get this code running fast in python however I am having trouble getting it to run anywhere near the speed it runs in MATLAB. The problem seems to be this for loop which takes about 2 second to run when the number "SRpixels" is approximately equal to 25000.
I cant seem to find any way to trim this down any further, and I am looking for suggestions.
The datatypes for the numpy arrays below are float32 for all except the **_Location[] which are uint32.
for j in range (0,SRpixels):
    #Skip data if outside valid range
    if (abs(SR_pointCloud[j,0]) > SR_xMax or SR_pointCloud[j,2] > SR_zMax or SR_pointCloud[j,2] < 0):
        pass
    else:           
        RIGrid1_Location[j,0] = np.floor(((SR_pointCloud[j,0] + xPosition + 5) - xGrid1Center) / gridSize)
        RIGrid1_Location[j,1] = np.floor(((SR_pointCloud[j,2] + yPosition) - yGrid1LowerBound) / gridSize)
        RIGrid1_Count[RIGrid1_Location[j,0],RIGrid1_Location[j,1]] += 1
        RIGrid1_Sum[RIGrid1_Location[j,0],RIGrid1_Location[j,1]] += SR_pointCloud[j,1]
        RIGrid1_SumofSquares[RIGrid1_Location[j,0],RIGrid1_Location[j,1]] += SR_pointCloud[j,1] * SR_pointCloud[j,1]
        RIGrid2_Location[j,0] = np.floor(((SR_pointCloud[j,0] + xPosition + 5) - xGrid2Center) / gridSize)
        RIGrid2_Location[j,1] = np.floor(((SR_pointCloud[j,2] + yPosition) - yGrid2LowerBound) / gridSize)
        RIGrid2_Count[RIGrid2_Location[j,0],RIGrid2_Location[j,1]] += 1 
        RIGrid2_Sum[RIGrid2_Location[j,0],RIGrid2_Location[j,1]] += SR_pointCloud[j,1]
        RIGrid2_SumofSquares[RIGrid2_Location[j,0],RIGrid2_Location[j,1]] += SR_pointCloud[j,1] * SR_pointCloud[j,1]
I did attempt to use Cython, where I replaced  j with a cdef int j and compiled. There was no noticeable performance gain. Anyone have suggestions?

SR_pointCloud[j,2]for example, as you call that several times.