-1

Edit #1

I believe that I misrepresented my intent when writing this question by focusing too much on the issues that have arisen from our misuse/misunderstanding of locks. I am making this edit to try to clear my goal up.

I am looking to gain advice on whether or not performing some sort of static analysis and building a call graph once at the beginning of the lifetime of a server is a good idea. I am intending to use this analysis to solve lock issues, but that is another issue. Is this feasible, or is it too costly of an operation?

Original Post

I am currently tasked with finding a way to fix deadlock problems in a server that is constantly being maintained and developed on. I am wondering about the feasability on building some sort of map of function call trees representing the paths we might take given a request to the server. The trees do not have to be fully built, as I will explain below.

Setup

  • The server is written in Java using the Spring framework.
  • We are using SQL to communicate with our relational database. Inside of our database is a table marked tb_lock that is used as way to simplify locking resources that might exist in multiple tables across the database.

tb_lock looks something like this:

function_nm
tb_member
tb_member^0
tb_member^1
CHANGE_BALANCE
CHANGE_BALANCE^1234
ADD_MEMBER
...

Instead of having to lock multiple tables that are involved in, say, changing the balance for one (or many) account, we are able to just lock one row in this tb_lock table.

Deadlock Issues

Currently, as the developer sees fit, locks are grabbed at various places during code execution. This, obviously, currently leads to deadlock issues that are hard to debug (this could be remedied by adding logs that mark where, when, and by who a lock is grabbed or released).

Proposed Solution(s)

In order to solve this, to me there seems to be multiple possible solutions. In general, each solution involves changing from grabbing locks at multiple points in the code to grabbing all locks needed at the initial execution at the entry point.

  • Force developers to manually trace back to entry points and add the locks they need to the list of locks that will be grabbed at the entry point.

This is obviously not a sustainable option. As code is refactored and grows, the locks that are grabbed need to be reanalyzed constantly. This option is also prone to human error.

  • Through custom Java annotations, allow developers to continue not bothering about when or where they grab a lock, but make them annotate the locks that will be grabbed. When the server is first run, build some sort of map of trees to determine what locks we need to grab (based off of a possibility of use, not on the certainty of use) and grab these locks at the entry point.

In my opinion, this seems like a much more sustainable solution, as well as much less prone to human error. However, the problem now is whether or not this is feasible. If the server were full of regular old POJO's and had an easily determined call tree it would be easy, but using Spring framework, as well as things like reflection, tracking down the possible paths from an entry point to all annotations seems like a much more difficult task.

All in all, I am looking for advice on whether or not this is a feasible path to go down. Could I use some sort of analysis to build some sort of lock tree, or should I rely on other methods?

7
  • 1
    I would expect most relational databases to only lock the required rows being updated instead of the whole table anyway. I don't think I've ever heard of having a custom table to determine row locks. Have you actually benchmarked just letting the database itself deal with locks and using concurrency checks like they usually do? Commented Dec 16, 2021 at 4:22
  • @Turksarama As our system is using InnoDB, we have both row-level locks as well as full table locks. There are situations in which we need to lock a table in order to stop more rows from being added, and situations where we can just lock a single row/rows. I agree that it is strange to have a table for managing locks, but it is currently how it is being done. I do not believe we have let the DB itself deal with locks, but as I have only been here for < 0.5 years, this might have happened in the past. I will update the question to reflect this comment Commented Dec 16, 2021 at 5:17
  • 1
    @notphilphil, a problem would be identifying all possible code paths from a static analysis of application code. Using method pointers (including those implied by normal OO code) also means that code paths can change dynamically. Also, avoiding deadlocks is about avoiding a conflicting ordering of lock-taking, so you need to know the order in which the code would take locks, not just that it does take the lock at some point. (1/2) Commented Dec 17, 2021 at 11:25
  • 1
    Thirdly, a question, are the deadlocks arising purely from your explicit locking mechanism, or from the implicit locks taken by the database engine? If deadlocks arise by the latter, then it would imply a more fundamental problem with the explicit locking mechanism, not just that they are badly ordered, but that they are less restrictive than necessary to maintain basic data integrity (if the implicit locks were not there to catch it). In other words, it's not just opposing trains being deadlocked at red signals, it's opposing trains heading for a crash when both have green signals. (2/2) Commented Dec 17, 2021 at 11:31
  • 1
    It is probably possible to show that this problem (listing all possible paths that the code may take) is equivalent to the halting problem, which is known to be undecidable in limited time. What might be possible is to keep lists of acquired locks and the places where they were acquired per thread, and dump those lists to files when you encounter a deadlock situation, to help in understanding the cause of deadlocks. Commented Dec 20, 2021 at 14:27

1 Answer 1

2

In light of your edit, I'd say your approach of building a call graph from a custom static analyser tool - and deriving enough useful information about the database locking from a large codebase - is not feasible in general.

Many paths are not apparent statically, but are assembled (and altered) dynamically by using and following method pointers (including those implied by OO programming). Determining these paths statically will vary from complicated to theoretically impossible (such as where the calls might be into external code that hasn't even yet been written).

Any conditional logic or iteration may also produce a large number of permutations of code paths. Some of these may be extreme or unlikely in practice, or made impossible by constraints implied elsewhere in the system, and would not be considered in a manual analysis.

You also haven't made clear whether the deadlocks arise purely from your explicit locking system, or whether implicit locks are involved. The exact effect of implicit locking cannot be statically analysed, as the locking is not determined until run time by the database engine.

Your proposed solution of simply grabbing every lock that might be necessary at the outset of a high-level operation, and holding them until the conclusion, may have the effect of simply serialising the whole system, or at least to enough degree that is unacceptable for performance. Giving the locks up again in a timely fashion is an important consideration in maintaining performance and high levels of concurrency.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.