1

I'm experienced in MongoDB and I'm learning PostgreSQL. In this case I'm using Node.js with the pg library.

I have 2 tables: posts and comments. What I need to do is make a query which will return an array of posts, and each returned post should have a nested array of comments.

Now, in Mongo, I would simply store posts as a comments field which would be a nested array of objects. In PostgreSQL, what I did is I made every comment have a post_id column which references the it's parent post. My question is, how to I retrieve an array structured like the mongo example? That is, how do I return an array of posts, each post having a field called comments which would be an array of the appropriate comment rows?

Right now I'm doing a JOIN of the 2 tables, but what ends up happening is I get an array of objects which contain all the fields for the posts as well as all the fields for the comments.

What's happening now:

[
  {
     id: 1,
     post_title: "something",
     posted_at: "some date",
     comment_text: "nice post",
     author_id: 31
  },
  {
     id: 1,
     post_title: "something",
     posted_at: "some date",
     comment_text: "i dont like this post",
     author_id: 4
  }
]

What I want to happen

[
  {
     id: 1,
     post_title: "something",
     posted_at: "some date",
     comments: [
       {
         comment_text: "nice post",
         author_id: 31
       },
       {
         comment_text: "i dont like this post",
         author_id: 4
       }
  },
]
2
  • 1
    SQL Sets don't naturally support nested structures. So, the question becomes why do you Need the second structure to be supplied by the database? What's stopping your application layer from transforming a recordset (as per the first structure) in to the nested structure? Commented Apr 29, 2021 at 15:39
  • The reason I didn't want to do this is because, in mongo at least, it's way better (faster) to do everything regarding data structuring on the server (it's written in C++) than on the client (javascript). I don't know if the same logic applies for PostgreSQL, but if it doesn't I can definitely solve my initial problem by transforming the structure manually. Commented Apr 29, 2021 at 16:09

1 Answer 1

2

I'm guessing as to your table structure here, but it should be close enough. So let's say this is your setup:

CREATE TABLE posts (id INTEGER, post_title TEXT, posted_at TEXT);

INSERT INTO posts
VALUES
(1, 'title 1', 'date 1'),
(2, 'title 2', 'date 2'),
(3, 'title 3', 'date 3');

CREATE TABLE comments (post_id INTEGER, comment_text TEXT, author_id INTEGER);

INSERT INTO comments
VALUES
(1, 'comment text 1', 34),
(3, 'comment text 3 a', 45),
(3, 'comment text 3 b', 67);

The simplest way to do it would be like so:

SELECT JSON_AGG(x ORDER BY id)
FROM (
  SELECT p.id, p.post_title, p.posted_at, JSON_AGG(c) AS comments
  FROM posts p
  LEFT JOIN comments c
      ON p.id = c.post_id
  GROUP BY p.id, p.post_title, p.posted_at
) x;

Which returns approximately the structure you want:

[{
    "id": 1,
    "post_title": "title 1",
    "posted_at": "date 1",
    "comments": [{
        "post_id": 1,
        "comment_text": "comment text 1",
        "author_id": 34
    }]
}, {
    "id": 2,
    "post_title": "title 2",
    "posted_at": "date 2",
    "comments": [null]
}, {
    "id": 3,
    "post_title": "title 3",
    "posted_at": "date 3",
    "comments": [{
        "post_id": 3,
        "comment_text": "comment text 3 a",
        "author_id": 45
    }, {
        "post_id": 3,
        "comment_text": "comment text 3 b",
        "author_id": 67
    }]
}]

I notice your comments array doesn't contain the reference to the parent post id. The best way to select only a subset of fields depends on your use case, but this should be a performant enough way:

SELECT JSON_AGG(x ORDER BY id)
FROM (
  SELECT
    p.id,
    p.post_title,
    p.posted_at,
    (
      SELECT JSON_AGG(c)
      FROM (
        SELECT c.comment_text, c.author_id
        FROM comments c
        WHERE c.post_id = p.id
      ) c
    ) AS comments
  FROM posts p
) x;

Returns

[{
    "id": 1,
    "post_title": "title 1",
    "posted_at": "date 1",
    "comments": [{
        "comment_text": "comment text 1",
        "author_id": 34
    }]
}, {
    "id": 2,
    "post_title": "title 2",
    "posted_at": "date 2",
    "comments": null
}, {
    "id": 3,
    "post_title": "title 3",
    "posted_at": "date 3",
    "comments": [{
        "comment_text": "comment text 3 a",
        "author_id": 45
    }, {
        "comment_text": "comment text 3 b",
        "author_id": 67
    }]
}]

Obviously the comments for post 2 differ between them, you'd have to put some logic to decide what you want no comments to look like.

Sign up to request clarification or add additional context in comments.

2 Comments

Thank you so much! I forgot to add it (sorry) but the comments table does indeed contain a post_id col which references the parent post.
It should be noted that doing this transforms the results from binary representations to strings. This increases the workload on the database, increases the network traffic, and requires the string to be parsed locally to create a usable data-structure. That's not to say it's the wrong thing to do, just to explicitly state that there are many overheads involved; it's not free.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.