Suppose I have a table with two columns, Order ID and Student ID:

Order ID | Student ID
---------|-----------
1        | 1
1        | 2
1        | 3
2        | 1
2        | 3
3        | 1
3        | 2
4        | 1
4        | 2
4        | 3
5        | 2
5        | 3
.....

Here, it's a many-to-many relationship: one course (identified by Order ID) can include many students, and one student can enroll in many courses.

The question is: I want to filter for courses that contain a specific set of student IDs. For example:

  • If the student ID set is (1,2,3), then the returned course IDs should be (1,4), since only those two courses have every student in the set enrolled.

  • If the student ID set is (1,2), then the returned course IDs should be (1,3,4).

  • If the student ID set is (2,3), then the result should be (1,4,5).

etc.
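
To pin down the expected behaviour, here is a minimal in-memory sketch of the subset check using the sample rows from the table above. The function name `courses_containing` and the `ROWS` list are illustrative only; this is not how the data is actually stored or queried.

```python
from collections import defaultdict

# Sample rows copied from the table above: (order_id, student_id).
ROWS = [
    (1, 1), (1, 2), (1, 3),
    (2, 1), (2, 3),
    (3, 1), (3, 2),
    (4, 1), (4, 2), (4, 3),
    (5, 2), (5, 3),
]


def courses_containing(rows, wanted):
    """Return sorted course IDs whose students include every ID in `wanted`."""
    students_by_course = defaultdict(set)
    for order_id, student_id in rows:
        students_by_course[order_id].add(student_id)
    wanted = set(wanted)
    # `wanted <= students` is Python's subset test.
    return sorted(
        course
        for course, students in students_by_course.items()
        if wanted <= students
    )


print(courses_containing(ROWS, {1, 2, 3}))  # [1, 4]
print(courses_containing(ROWS, {1, 2}))     # [1, 3, 4]
print(courses_containing(ROWS, {2, 3}))     # [1, 4, 5]
```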

The student ID set can vary in size, up to the limit of a Python set.

Currently, I query specific courses, store the objects in lists, and then filter in Python. However, querying thousands of rows from the table above several times is slow.

  • What database are you using? This looks a lot like set logic. Commented Jan 14, 2020 at 19:15
  • @SunnyPatel It's PostgreSQL, and I'd prefer to do it in SQLAlchemy before resorting to raw SQL. For testing I use in-memory SQLite and it was fast, but once deployed on AWS with PostgreSQL it was unacceptably slow, and I'm fixing that now. Commented Jan 14, 2020 at 19:20
  • Here are a few examples of performing this type of query: stackoverflow.com/questions/58080691/…, stackoverflow.com/questions/49438529/…, stackoverflow.com/questions/42673699/… Commented Jan 14, 2020 at 19:41
  • I'm of the DB-first mindset: I like to make sure my solution is possible with clean SQL code first, because sometimes transforming it into ORM language may dilute it. So I provided a DB-based solution for you. I hope this helps you out! Commented Jan 14, 2020 at 19:44

1 Answer

This was a fun solve in PostgreSQL for me. Check out my DB Fiddle of this:

SELECT "Order ID"
FROM enrollments
GROUP BY "Order ID"
HAVING ARRAY[1, 2, 3] <@ array_agg("Student ID")

For the uninformed, the above query groups the rows by Order ID and keeps only the groups where the array [1, 2, 3] is completely contained (the <@ operator) in the array of all Student IDs for that Order (Course).

This can be translated into SQLAlchemy (untested) to something like:

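from sqlalchemy import Integer
from sqlalchemy.dialects.postgresql import ARRAY, array_agg

# Assumes a mapped class Enrollments whose attributes order_id and student_id
# map to the "Order ID" and "Student ID" columns (attribute names are hypothetical).
course_ids = (
    session.query(Enrollments)
    .with_entities(Enrollments.order_id)
    .group_by(Enrollments.order_id)
    .having(
        # array_agg(student_id) @> ARRAY[1, 2, 3]
        array_agg(Enrollments.student_id, type_=ARRAY(Integer))
        .contains([1, 2, 3])
    )
    .all()
)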
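
The array containment operator is PostgreSQL-specific, so the query above will not run on the in-memory SQLite used for testing in the question. A portable variant uses a different technique (sometimes called relational division): restrict the rows to the wanted student IDs, then keep the groups whose distinct match count equals the size of the set. Below is a minimal sketch with the standard-library `sqlite3` module; the helper name `courses_with_all` and the sample rows are assumptions for illustration, and I haven't compared its performance against the array version.

```python
import sqlite3

# Sample rows from the question: (order_id, student_id).
ROWS = [
    (1, 1), (1, 2), (1, 3), (2, 1), (2, 3), (3, 1),
    (3, 2), (4, 1), (4, 2), (4, 3), (5, 2), (5, 3),
]


def courses_with_all(conn, student_ids):
    """Return course IDs whose enrollments cover every ID in student_ids."""
    wanted = sorted(set(student_ids))
    if not wanted:
        raise ValueError("student_ids must not be empty")
    placeholders = ", ".join("?" for _ in wanted)
    sql = (
        'SELECT "Order ID" FROM enrollments '
        f'WHERE "Student ID" IN ({placeholders}) '
        'GROUP BY "Order ID" '
        # A group qualifies only if it matched every wanted ID.
        'HAVING COUNT(DISTINCT "Student ID") = ? '
        'ORDER BY "Order ID"'
    )
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE enrollments ("Order ID" INTEGER, "Student ID" INTEGER)')
conn.executemany("INSERT INTO enrollments VALUES (?, ?)", ROWS)

print(courses_with_all(conn, [1, 2, 3]))  # [1, 4]
print(courses_with_all(conn, [1, 2]))     # [1, 3, 4]
print(courses_with_all(conn, [2, 3]))     # [1, 4, 5]
```

The same WHERE / GROUP BY / HAVING shape should also work on PostgreSQL, which makes it convenient when tests and production use different databases.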

3 Comments

Thanks! @IljaEverilä
Thanks Sunny, your example helped! It is fast now. But I want to slice the result further to display only 10 rows at a time. I put a .slice(start, end) right after the having clause, but it didn't work.
@Amumu, it should work (I haven't used it personally). Did you take off the .all() finalizer after .slice()? It should fetch your results immediately.
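
For reference, SQLAlchemy's Query.slice(start, stop) applies LIMIT and OFFSET to the query, and paging is only stable when the query also has an ORDER BY. The sketch below shows the equivalent raw SQL using the standard-library `sqlite3` module and the portable COUNT(DISTINCT ...) form of the filter rather than the PostgreSQL array operator. The helper name `page_of_courses` and the sample rows are hypothetical.

```python
import sqlite3

# Sample rows from the question: (order_id, student_id).
ROWS = [
    (1, 1), (1, 2), (1, 3), (2, 1), (2, 3), (3, 1),
    (3, 2), (4, 1), (4, 2), (4, 3), (5, 2), (5, 3),
]


def page_of_courses(conn, student_ids, start, stop):
    """Return matching course IDs in positions [start, stop), ordered by ID."""
    wanted = sorted(set(student_ids))
    placeholders = ", ".join("?" for _ in wanted)
    sql = (
        'SELECT "Order ID" FROM enrollments '
        f'WHERE "Student ID" IN ({placeholders}) '
        'GROUP BY "Order ID" '
        'HAVING COUNT(DISTINCT "Student ID") = ? '
        # ORDER BY keeps page boundaries stable between calls.
        'ORDER BY "Order ID" '
        'LIMIT ? OFFSET ?'
    )
    params = (*wanted, len(wanted), stop - start, start)
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE enrollments ("Order ID" INTEGER, "Student ID" INTEGER)')
conn.executemany("INSERT INTO enrollments VALUES (?, ?)", ROWS)

print(page_of_courses(conn, [1], 0, 2))  # [1, 2]
print(page_of_courses(conn, [1], 2, 4))  # [3, 4]
```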
