##You have a threading bug here:
void fetch_weather_async()
{
while (current_city + 1 < (int) cities.size()){
// Say there is only one city left to fetch the data from.
// And both threads have reached this lock at the same time
// (multi processor system). One will get the lock the other
// will stall waiting for the lock.
city_counter_lock.lock();
int city_index = 0;
// The first thread through the lock will satisfy this
// condition and get to processes the last city.
// The second thread through the lock (after the first
// releases it) will find this condition will fail and
// as a result have `city_index = 0`
if (current_city + 1 < (int) cities.size()) {
city_index = ++current_city;
}
city_counter_lock.unlock();
// Both threads now execute the following code.
// But one is set to processes the last item,
// the other is going to re-processes the first item.
std::cout << "requesting " << city_index << std::endl;
fetchJson(city_index);
std::cout << "acquired " << city_index << std::endl;
// You have encountered your first race condition.
}
}
##Your timing is not accurate:
std::thread thread_1(&fetch_weather_async);
std::thread thread_2(&fetch_weather_async);
// By the time you get the time here.
// The two threads could have actually finished.
// So you are just timing the cost of the join operation.
// --
// Or the threads could be half finished by the time you get
// here. So what you are timing here is indeterminate.
start = std::time(nullptr);
thread_1.join();
thread_2.join();
finish = std::time(nullptr);
##Your biggest bottle neck is here:
city_counter_lock.lock();
int city_index = 0;
if (current_city + 1 < (int) cities.size()) {
city_index = ++current_city;
}
city_counter_lock.unlock();
By simply removing this bottle neck I removed 5 seconds (averaged over three runs). Time to run was reduced from 18 to 13 seconds (timed externally). I simply made one thread do even cities and the other thread do odd cities (I know its not a perfect solution but I wanted do demonstrate the cost of threads being blocked by each other on such a coarse grain).
// I changed the declaration of `fetch_weather_async()`
void fetch_weather_async(bool);
// I changed how the the thread is called.
std::thread thread_1(fetch_weather_async, false);
std::thread thread_2(fetch_weather_async, true);
// Finally I simply changed the function
// so they started at different points.
void fetch_weather_async(bool odd){
unsigned int cityIndex = (odd ? 1 : 0);
for (;cityIndex < cities.size(); cityIndex += 2){
std::cout << "requesting " << cityIndex << std::endl;
fetchJson(cityIndex);
std::cout << "acquired " << cityIndex << std::endl;
}
}
##Another Bug (but this one a normal bug)
// This is a classic anti pattern for reading a file.
// The trouble is the last read reads upto but not past
// the EOF. Thus even though there is no data left in the
// file to read the method eof() will return false.
// So you will enter the loop even though there is nothing
// to read.
while(! idFile.eof()){
// The problem is here.
// If there is nothing left to read in the file this
// getline will fail. You should be checking the result
// of calling getline before performing the next action.
//
// I am not sure of the affect on `currentLine` when
// the call fails it is either made blank or left unchanged.
// So the last line in your list is either empty
// or a repeat of the last item in the file.
getline (idFile, currentLine);
cities.push_back(currentLine);
}
The correct way to use a loop to read a file is:
std::ifstream file("FileName");
std::string line;
// Notice we check the result of the `getline()` as part of
// the loop condition. So we only enter the loop body if the
// read happened correctly.
while(std::getline(file, line)) {
cities.push_back(line);
}
If all your cities are guaranteed to be one word. you can use a simpler trick.
std::vector<std::string> cities(std::istream_iterator<std::string>(file),
std::istream_iterator<std::string>());
##Threading and Networking.
I would say that threading and networking is not the best way to demonstrate a good threading application.
The problem is that your thread spends an awful lot of its time blocked waiting for network to catch up and give it data. Usually what you do is have a single thread handling many connections simultaneously and use an even library to tell the thread when there is action on any of the sockets. I like libEvent for this it is one of the more modern ones.
But Curl has its own way of doing this with the "Multi Handle". You can create many curl handles and attach them to a multi handle then a single thread will watch all the handles simultaneously.
##Curl Handle re-use.
Like Edward mentioned (but worth re-mentioning). Reusing the same curl handle is essential to efficiency when connecting to the same site.
##Code Review
Lets stop using C-casts. They are hard to see and dangerous.
((std::string*)userp)->append((char*)contents, size * nmemb);
To make it obvious for code review please use the C++ casts. They emphasis how dangrerious they are and make them visable:
reinterpret_cast<std::string*>(userp)->append(static_cast<char*>(contents), size * nmemb);