Repository concerns

Question

We have a layered application with (basically): WebAPI, App Services, Domain and Repository layers. This fits for most situations, but now we face a slightly different challenge on where we need to build a query based on what we retrieve from the database.

Usually, when we use the repository, we know what query to use. This time it's not the case: now we have to retrieve some formulas, arrange and create a query with the formulas as columns.

In short we are doing the following:

//Method in our repository
public IEnumerable<Results> CreateCustomerResults(public Guid customerId)
{
    using var connection = new SqlConnection(connectionString);

    var formulaDefinitions = connection.Query(GET_CUSTOMER_FORMULAS_QUERY, { customerId} );
    formulaDefinitions.SortByDependency();

    var resultsQuery = BuildResultsQuery(formulaDefinitions);
    return connection.Query(resultsQuery);
}

As SortByDependency and BuildResultsQuery start to gain complexity, we started to think this is too much responsibility for a repository method.

We need some advice on how to manage this. Probably we should provide the repository formulas already sorted to method CreateCustomerResults and do all the sorting outside of the repository? We also have doubts with that approach, because we´ll have to read the formulas as entities, map to business, sort and map to entities again.

In other hand, could BuildResultsQuery be part of a helper maybe?

Maybe if we are flexible enough to introduce another pattern this could be solved in a more natural way?

Can we assume that it is a fixed truth that SortByDependency cannot be migrated in to the GET_CUSTOMER_FORMULAS_QUERY? I don't need to know the reason why but my first port of call is wondering why you're sorting in-memory instead of having the DB do it for you. — Flater
– Flater, Commented Nov 15, 2023 at 21:51
@Flater, if there is a way, we haven't find it. Basically, there are formulas that depend in other ones. For example: a=b+c; c=d*10. You can see "a" depends indirectly on "d", but I think this would be challenging to know in the DB side. With C# was not difficult. — zameb
– zameb, Commented Nov 16, 2023 at 1:55

Ewan · Accepted Answer · 2023-11-15 18:15:23Z

4

The repository pattern is designed for adding and retrieving domain/business/logic models from storage such as a database. It doesn't really fit a reporting scenario where you are querying data and utilising the database to perform calculations.

I'm presuming your Customer Formulae are fragments of sql like "avg(col6)", "where customerId=3" or the like.

Usually this practice of dynamically generating a query based on user input is frowned upon in an application context, as your user can make queries that crash the database, or access data they shouldn't see etc.

But, for reporting its a common scenario to allow the user to craft queries to show whatever results they like.

I think this is the choice you have to make.

Move the calculations and formula to your logic layer - Repository works
Restrict the possible formula to a small known set - Repository works
Point a reporting tool at your database and give the customer access to make whatever they want - Repository doesn't work

answered Nov 15, 2023 at 18:15

Ewan

84.4k5 gold badges91 silver badges189 bronze badges

The middle of the answer seems to be attacking a straw man. You assume that the formulae are an unvalidated string that could break things, and then you negatively judge that approach that you assumed might be the case. I don't see a basis for this claim, given that (a) we don't know what the formulae contain, (b) nor how they are stored, (c) nor how they are validated/restricted, (d) nor do we see how they are actually used inside the BuildResultsQuery. Any one of these can be doing things a different way from the one specific (bad) way you've assumed it works.

Flater
– Flater

2023-11-15 21:56:54 +00:00
Commented Nov 15, 2023 at 21:56
shrug there's always a bit of guesswork in answering, OP has said that the BuildResultQuery produces a string, I state my assumptions, answer 2 covers the case you seem to be talking about, where the calculations are performed on the DB, but are restricted to some set of known safe, and de/serialiseable operations you can handle in code.

Ewan
– Ewan

2023-11-15 22:34:30 +00:00
Commented Nov 15, 2023 at 22:34
Reasonable assumptions in answering is fine, I agree this is sometimes a necessity as we can't afford a lengthy back and forth. However, such assumptions should be tangential to the answer, it shouldn't be the focal point for the answer's conclusion. "OP has said that the BuildResultQuery produces a string" says nothing about what the formulae are and if they've been validated, and how rigorously they've been validated, and if validation is even needed (e.g. if there's no risk of accessing data that you don't already get to access anyway).

Flater
– Flater

2023-11-15 22:37:22 +00:00
Commented Nov 15, 2023 at 22:37
tbh i think ive added enough "I presumes" , "normalys" and "usuallys" to cover my arse, but sure, im taking a view here I think thats the main difference between our answering style. I like to keep it short but opinionated. If its hits the mark then its a great and useful answer, if it doesn't then yeah it's useless. But I think overall its more helpful to the questioner than an answer that covers all the bases but doesnt really offer a solution

Ewan
– Ewan

2023-11-15 22:41:55 +00:00
Commented Nov 15, 2023 at 22:41
Thanks for the comments and for narrow it to 3 decision points. The number 3 is definitely new to me and I will explore in that area. Fortunately, the team is open to have a few different implementations to compare and let the best to survive.

zameb
– zameb

2023-11-16 00:54:02 +00:00
Commented Nov 16, 2023 at 0:54

| Show 2 more comments

Flater · Accepted Answer · 2023-11-15 22:42:40Z

Dealing with implementation complexity

You've touched on several points that I could add to, but the core of your concern seems to be this:

As SortByDependency and BuildResultsQuery start to gain complexity, we started to think this is too much responsibility for a repository method.

The spirit is good, but it's being misapplied.

I suspect that your conclusion of needing to avoid complexity fits somewhere where it's trying to adhere to both the (admittedly vague) single responsibility principle and the general advice to not let methods get too long. These are actually two separate points with a different focus.

SRP

SRP does not (directly) care about the complexity of a method body. It cares about the purpose of a method (or class), and that that purpose isn't bundling totally unrelated concerns together.

Indirectly, the SRP argument is that this kind of bundling leads to a complex method body, which you don't want. But the logical inverse isn't true. A method body being complex does not prove that SRP is being violated. Simply put, you can have a complex method body that still only does one thing, which would not be great but not an SRP violation either.

You said:

As SortByDependency and BuildResultsQuery start to gain complexity, we started to think this is too much responsibility for a repository method.

And the key thing I want to point out here is that making a method more complex doesn't necessarily mean that you're actually expanding its responsibilities; it can just as well mean that you're rewriting the same responsibility to be handled a different way.

Avoiding complex method bodies

The general advice on not letting method bodies get too long or comples only cares about this method, not (directly) any submethods that this method might be calling.

So, yes, you are correct that you want to ensure that CreateCustomerResults's method body does not grow to be complex. However, you said:

As SortByDependency and BuildResultsQuery start to gain complexity, we started to think this is too much responsibility for a repository method.

Those are different methods. You could rewrite SortByDependency to be a massively monolithic function, and that would be a bad idea, but it doesn't impact our judgment of what the CreateCustomerResults method looks like since we didn't make a change to CreateCustomerResults.

So what should you do?

You need to distinguish more between what kinds of changes you could be making to this code.

If you are rewriting the same responsibility in a more complex way, then you can safely ignore SRP as it will be maintained (i.e. not violated any further). The only thing you really need to be on the lookout for is that when a given method becomes too complex, it needs to be broken down into smaller pieces. Whether those pieces are private submethods or a new public dependency is contextual. The latter usually suggests that the new complexity is introducing a new kind of responsibility.

If you are introducing a new responsibility, you need to abstract it away into its own thing which your original uses as a dependency. When your original class starts using this new dependency, it might be that this is the straw that breaks the camel's back in terms of method body complexity. If so, refer to the previous paragraph. If not, then no further action is needed.

The right layer for the job

You brought up the layers in your project, because you suspect that you might be able to shift some of this logic into another layer. That is another way in which this can be tackled, but I want to first state something very clearly:

The right layer for the job is defined by what the job is, not how complex the job is.

In order words, what this section is talking about is a decision that you need to make before you tackle the previous section, and decisions made in the previous section should not change the decision you make with regards to what layer you implement it in.

Whether or not the current implementation should live in your persistence layer hinges on one compound question:

Does the domain/application really only care about CustomerResults here, and are the rest of the details merely a persistence implementation detail?

If yes, then all good.

However, I suspect this is not the case. Persistence layers (at least their queries) should really stick to one retrieval, i.e. it really always boils down to "get me the XYZ" (this is not the same as "one database call"!).

You're tying the sourcing of your formulae to the retrieval of the subsequent customer results. I can't definitively judge your business requirements, but it looks to me like these are two separate domain concept.
If I'm right, then the formulae should become an input parameter to this method, and your application logic should be first retrieving the formulae (however it wants to retrieve them), and then subsequently pass those formulae to the method that retrieves the customer results.

We also have doubts with that approach, because we´ll have to read the formulas as entities, map to business, sort and map to entities again.

Unless you are dealing with vast number of items (I'm thinking upwards of 6 digits here), in-memory mapping is a negligible performance cost and should not be driving your design.

Secondly, you might not need to map back from the domain model back to the entity - but this depends on how you're using those formulae in the BuildResultsQuery implementation, which I simply don't know.

However, the main point here is a repeat of what I already pointed out: these concerns do not trump the decision on which layer should be handling which part of the job.

Clean layer separation is not done to squeeze every drop of runtime performance that you could possibly get. Clean layer separation gives you significantly simpler code maintenance costs and significantly increased readability, at the potential cost of having to do a bit of extra runtime legwork to transition from one layer to another.

If you only care about runtime performance above anything else, no matter the cost, then you're not going to be looking favorably at layer separation. However, runtime performance is generally not the main focus, development velocity is (as it defines what developers can achieve for a given budget).

Thank you very much for the detailed answer. I think you're right from start to finish. Initially we felt like was not a repository concern to Sort formulas and not even to build an unknown query. But those two were simple enough to act like it was not a problem. In other hand, we already tried the formulae resolution in the business layer, but we didn't get good performance (calculations are done once per obtained record vs calculations done once in the query), which drove us to try the "repo" aproach and resulted in a dramatic performance increase. — zameb
– zameb, Commented Nov 16, 2023 at 1:00
@zameb: I'm not convinced that merely moving the same implementation between layers and adding a mapping onto it somehow forces you to move away from a bulk calculation to a per-entity request, but this is breaking the question open to an unanswerably big size. What I proposed you do by moving it into two separate repo methods being called by the business layer would not require any more DB calls than what you had before. — Flater
– Flater, Commented Nov 16, 2023 at 4:43
@Flater: let's say I want these results: A, B, and (A + B)/30. But... for 300 item sales. I can get the 300 sales and obtain the (A + B)/30 result for each sale (in-program-loop). For this, I only need the repository to get the 300 sales, then perform the calculation and here its more or less easy to spot the layers. Until now, the optimal aproach, for us however, is to build a query which already contains A, B, (A + B)/30 and avoid any in-program-loop. I can't see yet how could I do a bulk calculation like this outside of the repository layer. — zameb
– zameb, Commented Nov 21, 2023 at 17:25
@zameb: At the end of the day, the database would similarly loop over your items and provide the calculated value, it doesn't do "calculations done once in the query". You're focusing on whether you're explicitly writing a loop construct, but I'm focusing on the fact that in either case, something will be looping over each entity to calculate the value. Avoiding an in-program loop is only really useful if you'd avoid something that isn't negligible. Your example is a trivial calculation on a trivial amount of items, and I would consider coding style/cleanliness more important here. — Flater
– Flater, Commented Nov 21, 2023 at 21:33

Stack Exchange Network

Repository concerns

2 Answers 2

Dealing with implementation complexity

The right layer for the job

Hot Network Questions

Repository concerns

2 Answers 2

Dealing with implementation complexity

The right layer for the job

Related

Hot Network Questions