
Assume we have 3 services that run in parallel and write to our MongoDB storage. They create records that contain the following info:

{
   guid: GUID,
   ts: Timestamp,
   data: Object
}

As a result, at any moment the MongoDB storage should hold either nothing for a particular GUID or the LATEST (max ts) record.

Small example:

  1. 1 call with {guid: 1, ts: 10, data: {}} - inserted {guid: 1, ts: 10, data: {}}
  2. 2 call with {guid: 1, ts: 5, data: {}} - nothing updated
  3. 3 call with {guid: 1, ts: 15, data: {}} - document updated with ts: 15 and new data.

In other words, we have to insert the record if there is no document with the provided GUID, and update the record when such a guid already exists and the ts is greater than in the existing record. DO NOT update the record if the ts is less than in the existing record.

I understand that this is some kind of upsert operation, but I can't figure out how to do it. I tried findAndModify, mapReduce and the $max update operator, but no luck so far. Thank you in advance.

  • Can the same guid ever be modified by two things at once? What are the rules in that case? The highest timestamp should always win and have its ts and data set on the document? Commented Mar 18, 2015 at 20:52
  • Yes, you have a point. If several update operations take place, the document with the highest timestamp should win and its data should be set on the document. Also I want to mention that you can consider the guid field as a primary key for the document (we can put it in the _id field or apply a unique index). Commented Mar 19, 2015 at 7:46

2 Answers

There are three parts to the logic that we need to model. Given a new document

var newDoc = { "guid" : 1, "ts" : 10, "data" : "asdf" }
  1. there is no document with the same guid as newDoc - insert newDoc
  2. there is a document oldDoc with the same guid as newDoc and oldDoc.ts < newDoc.ts - newDoc overrides oldDoc
  3. there is a document oldDoc with the same guid as newDoc and oldDoc.ts > newDoc.ts - newDoc has no effect

If the ts values are equal, I guess we don't care which of oldDoc or newDoc we keep.
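
To make the three cases concrete, here is a minimal in-memory sketch in plain JavaScript. The `applyNewDoc` function and the `Map` standing in for the collection are hypothetical names for illustration only; nothing here talks to MongoDB, and ties are resolved in favour of newDoc purely as an assumption.

```javascript
// Hypothetical in-memory model: `store` maps guid -> document.
function applyNewDoc(store, newDoc) {
  const oldDoc = store.get(newDoc.guid);
  if (oldDoc === undefined) {
    // Case 1: no document with this guid yet.
    store.set(newDoc.guid, newDoc);
    return "inserted";
  }
  if (oldDoc.ts <= newDoc.ts) {
    // Case 2: newDoc is at least as recent (ties go to newDoc, by assumption).
    store.set(newDoc.guid, newDoc);
    return "replaced";
  }
  // Case 3: the stored document is newer, so newDoc has no effect.
  return "ignored";
}

// The sequence of calls from the question.
const store = new Map();
console.log(applyNewDoc(store, { guid: 1, ts: 10, data: {} })); // inserted
console.log(applyNewDoc(store, { guid: 1, ts: 5, data: {} }));  // ignored
console.log(applyNewDoc(store, { guid: 1, ts: 15, data: {} })); // replaced
```

This only models the rule itself. It says nothing about atomicity, which is what the rest of the answer is about.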

There isn't one step to handle all this logic. We can handle the first condition with either the second or third using one update, but handling the three together requires multiple steps. Example code for the shell, assuming unique index on { "guid" : 1 }:

// just try to insert
var wr = db.test.insert(newDoc)
if (wr.getWriteError() && wr.getWriteError().code === 11000) {
    // this is case 2 or 3 - 11000 is duplicate key error
    // so update if case 2
    db.test.update({ "guid" : newDoc.guid, "ts" : { "$lte" : newDoc.ts } }, newDoc)
}
else {
    // this is case 1 - newDoc was inserted if there weren't other errors
}

I think this order works for concurrent requests. If we suppose we have two workers Alice and Bob wanting to work with the same guid, then, if the guid doesn't exist, one of Alice and Bob will perform the insert first and the other will receive the index error. Both, in some order, will run the second update, and the higher timestamp will win no matter what. If the guid does exist, we reduce to the second part of when it doesn't exist. I think we're ok in all cases, but concurrency is hard so you should think it over yourself. And test it.

It's important that these updates only hit one document - I'm depending on insert/update of one document being atomic.
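
One way to check the Alice and Bob argument is to enumerate every interleaving of their steps against a toy model. The sketch below is plain JavaScript with hypothetical helpers (`makeCollection`, `makeWorker`, `run`); it assumes each insert or update is an atomic step, as stated above, and it is not a test against a real MongoDB server.

```javascript
// Toy stand-in for a collection with a unique index on guid (hypothetical).
function makeCollection() {
  const docs = new Map();
  return {
    // Returns 11000 on a duplicate guid, otherwise stores the doc and returns 0.
    insert(doc) {
      if (docs.has(doc.guid)) return 11000;
      docs.set(doc.guid, doc);
      return 0;
    },
    // Mirrors update({ guid, ts: { $lte: doc.ts } }, doc).
    updateIfNotNewer(doc) {
      const old = docs.get(doc.guid);
      if (old !== undefined && old.ts <= doc.ts) docs.set(doc.guid, doc);
    },
    find(guid) {
      return docs.get(guid);
    },
  };
}

// A worker is two atomic steps: try to insert, then update on duplicate key.
function makeWorker(coll, doc) {
  let duplicate = false;
  return [
    () => { duplicate = coll.insert(doc) === 11000; },
    () => { if (duplicate) coll.updateIfNotNewer(doc); },
  ];
}

// Runs one schedule such as "ABBA" (worker A step, B step, B step, A step).
function run(schedule, docA, docB) {
  const coll = makeCollection();
  const steps = { A: makeWorker(coll, docA), B: makeWorker(coll, docB) };
  const next = { A: 0, B: 0 };
  for (const w of schedule) steps[w][next[w]++]();
  return coll.find(docA.guid);
}

const alice = { guid: 1, ts: 15, data: "from alice" };
const bob = { guid: 1, ts: 5, data: "from bob" };
const schedules = ["AABB", "ABAB", "ABBA", "BAAB", "BABA", "BBAA"];
for (const s of schedules) {
  console.log(s, run(s, alice, bob).ts); // 15 for every schedule
}
```

In this model the higher timestamp survives all six orderings. It covers only two writers and a guid that starts out absent, so it supports the reasoning rather than proving it.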


1 Comment

Thank you! I believe I can deal with this.

This can be done in a single atomic operation if we slightly tweak wdberkeley's answer. If we create a unique index on the guid field and use this query:

db.guids.replaceOne({guid : newDoc.guid, ts : {$lte : newDoc.ts}}, newDoc, {upsert:true});

Now the three cases work like this:

  1. There is no document with the same guid as newDoc - insert newDoc

  2. There is a document oldDoc with the same guid as newDoc and oldDoc.ts < newDoc.ts - newDoc overrides oldDoc

  3. There is a document oldDoc with the same guid as newDoc and oldDoc.ts > newDoc.ts - The filter matches nothing, so the upsert tries to insert newDoc, and the unique index rejects it with this duplicate key error, which we can just ignore:

WriteError({
        "index" : 0,
        "code" : 11000,
        "errmsg" : "E11000 duplicate key error collection: test.guids index: guid_1 dup key: { : 1.0 }",
        "op" : {
            "q" : {
                "guid" : 1,
                "ts" : {
                    "$lte" : 14
                }
            },
            "u" : {
                "guid" : 1,
                "ts" : 14,
                "data" : "asdf"
            },
            "multi" : false,
            "upsert" : true
        }
    })
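
Ignoring the error could look roughly like the sketch below. `upsertIfNewer` is a hypothetical helper, and it assumes a shell-style collection object whose `replaceOne` runs synchronously and throws an error carrying a numeric `code`. With a driver that returns promises, the call would need to be awaited instead.

```javascript
const DUPLICATE_KEY = 11000;

// Hypothetical helper: returns true if newDoc was written,
// false if the duplicate key error was swallowed.
function upsertIfNewer(collection, newDoc) {
  try {
    collection.replaceOne(
      { guid: newDoc.guid, ts: { $lte: newDoc.ts } },
      newDoc,
      { upsert: true }
    );
    return true;
  } catch (e) {
    // Swallow only the duplicate key error; rethrow everything else.
    if (e && e.code === DUPLICATE_KEY) return false;
    throw e;
  }
}
```

In the shell this would be called as `upsertIfNewer(db.guids, newDoc)`. One caveat worth testing: a duplicate key error may also show up when two writers race to insert the same new guid, in which case the swallowed document is not necessarily the older one, so retrying once on 11000 may be safer than ignoring it outright.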
