0

I am new to mongodb and got stuck on this task for some time now. I need to perform a query on the collection with the following schema:

{
  ticker: String,
  asks: [[Number]],
  bids: [[Number]],
  timestamp: String,
  datetime: String
}

Here is the document example:

  { 
    "_id" : ObjectId("5fc17630cf8dff0cd5506dc4"), 
    "asks" : [ 
      [ "23685.0", "0.008" ],
      [ "23688.8", "0.000" ],
      [ "23696.7", "0.000" ]
    ], 
    "bids" : [ 
      [ "23553.7", "0.200" ],
      [ "23557.8", "0.207" ],
      [ "23558.4", "0.045" ],
      [ "23563.4", "0.020" ] 
    ],
    "timestamp" : 1606514176211,
    "datetime" : "2020-11-27T21:56:16.211Z",
    "ticker" : "YFI/USDT"
  }

Inner array of asks and bids has two elements: first element is the price and the second element is the quantity.

I have two questions:

  1. I need to query the collection to get all entries where price is greater than a value in both asks and bids ( only asks or only bids) for the time interval based on the datetime field. I tried to mess with $elemMatch and aggregate framework but had no luck so far.

  2. Another question I have is whether array or arrays field type is a good choice when it comes to querying those fields. I expect to have millions of records of this type. Please suggest whether adding indexes and/or use some other data structure to hold this data in the collection.

Any help is greatly appreciated. Thanks.

0

1 Answer 1

1

Question 1 :

You have to first $match your date to return only documents in the time interval you need, then $filter your array (asks or bids) to return only what you need.

Here's the query (of course you can do exactly the same for bids):

db.collection.aggregate([
  {
    "$match": {
      $expr: {
        "$and": [
          {
            "$gte": [
              "$datetime",
              "2020-11-01"
            ]
          },
          {
            "$lt": [
              "$datetime",
              "2020-12-01"
            ]
          }
        ]
      }
    }
  },
  {
    "$addFields": {
      "asks": {
        "$filter": {
          "input": "$asks",
          "as": "ask",
          "cond": {
            "$gte": [
              {
                "$toDouble": {
                  "$arrayElemAt": [
                    "$$ask",
                    0
                  ]
                }
              },
              23686
            ]
          }
        }
      }
    }
  }
])

You can test it here

Question 2 :

However, as you noticed, it's probably not the best way to store data and query it. I see some simple things to improve :

1- Replace your nested array by an object. It will be easier to query directly the object field instead of the position in array.

{
  "price" : yourPrice, 
  "quantity":yourQuantity
}

2- Use numbers (int/double) instead of strings, you'll avoid the conversion step, and so improve the performances of your aggreggation.

{
  "price" : 21563, 
  "quantity":0.154
}

3-Do you really need date and time? If not, just store the date, so you can put an index on it and take the benefit of it during the $match stage.

Here's a sample of such a dataset/query

Sign up to request clarification or add additional context in comments.

5 Comments

Thanks matthPen! I wanted to avoid objects and went with array to save on record size - some records may have hundreds of inner arrays populated so I thought it will be redundant to have keys populated in there. I will try to add an index on date field and see if there is any difference.
I will mark your answer as solved once I test the solution. Thanks!
mongoplayground.net/p/Cg_QBzAa2qg this code still returns the records which don't qualify to be in the result set.
Not exactly what I needed but it is a good start. Thanks for help. I will play with it

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.