Check for Possible Errors
There are a number of possible errors in the code:
- The variable
size may be zero which will lead to division by zero.
- The vectors A and B may not be the same length, if the
size variable
is the size of the shorter vector this is not a problem, but if the
size variable is the size of the larger vector this can lead to unknown
behavior.
- Either d_a or d_b can add up to zero, which can lead to division by zero
as @coderodde pointed out.
C++ provides the exception class std:exception that has multiple sub classes.
Two of the classes are std::runtime_error and std::logic_error. Item 1 and 2
above belong in the std::logic_error class, item 3 belongs in the
std::runtime_error class.
Code Readability and Maintainability
The code has:
double mul = 0.0, d_a = 0.0, d_b = 0.0 ;
This is less readable and maintainable than putting the initialization on separate lines:
double mul = 0.0;
double d_a = 0.0;
double d_b = 0.0;
Direct Addressing Versus Indirect Addressing
The the most efficient way to write the function is to use direct addressing rather than indirect addressing. Since the code is already passing pointers
use the pointers rather than indexing them.
double cosine_similarity(double *A, double *B, unsigned int size)
{
double mul = 0.0;
double d_a = 0.0;
double d_b = 0.0 ;
for(unsigned int i = 0; i < size; ++i)
{
mul += *A * *B;
d_a += *A * *A;
d_b += *B * *B;
A++;
B++;
}
if (d_a == 0.0f || d_b == 0.0f)
{
throw std::logic_error(
"cosine similarity is not defined whenever one or both "
"input vectors are zero-vectors.");
}
return mul / (sqrt(d_a) * sqrt(d_b)) ;
}
Using Container Classes Can be Safer as Well as Efficient
The container classes such as std::vector and std::array provide safer containers than C style arrays. If the number of elements in the array are know, the std::array class is more efficient. If the number of elements in the array are not known std::vector provides a flexible as well as convenient way to store values.
These container classes provide additional information such as the size
of the array/vector and iterators for direct addressing. They also allow
for indexing through the 'array'.
#include <math.h>
#include <vector>
#include <stdexcept>
double cosine_similarity_vectors(std::vector<double> A, std::vector<double>B)
{
double mul = 0.0;
double d_a = 0.0;
double d_b = 0.0 ;
if (A.size() != B.size())
{
throw std::logic_error("Vector A and Vector B are not the same size");
}
// Prevent Division by zero
if (A.size() < 1)
{
throw std::logic_error("Vector A and Vector B are empty");
}
std::vector<double>::iterator B_iter = B.begin();
std::vector<double>::iterator A_iter = A.begin();
for( ; A_iter != A.end(); A_iter++ , B_iter++ )
{
mul += *A_iter * *B_iter;
d_a += *A_iter * *A_iter;
d_b += *B_iter * *B_iter;
}
if (d_a == 0.0f || d_b == 0.0f)
{
throw std::logic_error(
"cosine similarity is not defined whenever one or both "
"input vectors are zero-vectors.");
}
return mul / (sqrt(d_a) * sqrt(d_b));
}