
I'm wondering what would be the right way to design a web service like this:

Say I have a server listening for requests. It receives a key and checks whether the answer is cached (for example, in some DB). If it isn't, the server does some processing, generates the answer, stores it in the cache DB, and returns the answer to the client.

This seems to work OK, but what happens if two clients request the same non-existent key? In that case a race condition occurs, and it looks like this:

client 1 -> check cache DB -> generate answer -> store in cache -> reply to client
client 2 -> check cache DB -> generate answer -> store in cache -> reply to client

One way to avoid this issue would be to use a UNIQUE constraint in the DB, so that when the second answer is generated and written, an error occurs. This works, but it feels more like a patch than a real solution. Especially in a case where generating the answer takes a lot of processing, something else would be preferable.

One option I can think of is using job queues: whenever a key is received, it is either attached to an existing job for that key, or a new job is added to the queue.

I've been playing with node.js for a few weeks and I'm surprised that I haven't found examples showing this kind of use case. So I'm wondering whether this is an acceptable solution for cases like this, or whether something better exists.


2 Answers


Here is how you can do that in a single-process setup:

var Emitter = require('events').EventEmitter;

// In-flight requests, keyed by cache key. Each value is an emitter
// that fires once with the computed result (or an error).
var requests = Object.create(null);

function getSomething (key, callback) {

  var request = requests[key];

  if (!request) {
    // First caller for this key: start the actual work.
    request = requests[key] = new Emitter();

    // Note: getSomethingActually must call back asynchronously,
    // otherwise the events below fire before any listener is attached.
    getSomethingActually(key, function (err, result) {
      delete requests[key];
      if (err) return request.emit('error', err);
      request.emit('result', result);
    });
  }

  // Every caller, first or not, just waits on the emitter.
  request.once('result', function (result) {
    callback(null, result);
  });

  request.once('error', function (err) {
    callback(err);
  });

}

If you want to scale this beyond a single process, you need external storage plus an event bus, such as Redis.
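A rough sketch of what that cross-process version could look like. `store` and `bus` here are placeholder interfaces I'm assuming, not a real Redis client API: `store` needs `get`/`set` with an "only if absent" (NX) option, and `bus` needs `publish`/`subscribe`/`unsubscribe`. With Redis you would map these onto GET, SET NX, and pub/sub channels:

```javascript
// Cross-process coalescing on top of an external store + event bus.
// `store` and `bus` are assumed interfaces (see above), injected as
// parameters so any compatible backend works.
async function getShared(store, bus, key, compute) {
  // Fast path: the answer is already cached.
  var cached = await store.get('result:' + key);
  if (cached != null) return cached;

  // Atomically claim the key; exactly one process wins the claim.
  var claimed = await store.set('claim:' + key, '1', { nx: true });
  if (claimed) {
    var result = await compute(key);
    await store.set('result:' + key, result);
    await bus.publish('done:' + key, result); // wake up all waiters
    return result;
  }

  // Another process is already computing: wait for its announcement, but
  // also re-check the store in case the result landed before we subscribed.
  return new Promise(function (resolve) {
    var settled = false;
    function onDone(result) {
      if (settled) return;
      settled = true;
      bus.unsubscribe('done:' + key, onDone);
      resolve(result);
    }
    bus.subscribe('done:' + key, onDone);
    store.get('result:' + key).then(function (r) {
      if (r != null) onDone(r);
    });
  });
}
```

With real Redis you would also want an expiry on the claim key, so that a worker crashing mid-computation doesn't leave the key permanently claimed.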



You should be using job queues (or some other way of offloading jobs) either way. Processing-intensive tasks should always be taken out of your main Node application (by a queue, by spawning a separate process, etc.), or they will block the event loop and thus all other requests.

That said, if you choose a queue that supports a unique constraint, such as a Postgres-backed queue, and set a unique constraint on the key, duplicates will never be inserted into the work queue and so will never be processed twice. You can simply ignore the unique constraint error in this case.

Note that a sequence of events like the following is still possible, though very unlikely:

  1. a request checks the 'cache' for key x and gets a miss
  2. a worker completes the answer for key x, inserts it into the 'cache', and removes x from the queue
  3. the request, having received the miss for key x, adds x to the queue
  4. a worker pulls key x from the queue and starts the computation

After this sequence of events, the second worker would get an error when inserting the key into the cache. In my opinion, this is unlikely enough that adding a unique key constraint and simply ignoring the constraint violation error on the second worker is a viable option.
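To make "ignore the constraint violation" concrete, here is a hedged sketch. The `insert` function is a hypothetical enqueue operation injected as a parameter; the `'23505'` check is PostgreSQL's `unique_violation` error code, exposed as the `code` property on errors by the node-postgres (`pg`) driver — other drivers or queues may surface it differently:

```javascript
// Enqueue a key, treating a unique-constraint violation as
// "already scheduled by someone else" rather than a failure.
// '23505' is PostgreSQL's unique_violation code (node-postgres convention).
function enqueueIgnoringDuplicates(insert, key, callback) {
  insert(key, function (err) {
    if (err && err.code === '23505') return callback(null, false); // duplicate: already queued
    if (err) return callback(err); // any other error is a real failure
    callback(null, true);          // we enqueued it
  });
}
```

The boolean passed to the callback lets the caller distinguish "I scheduled the work" from "someone beat me to it", without surfacing the duplicate as an error.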

