11 votes

Boneheaded exceptions should not be caught. Then how to provide fault tolerance and reliability?

Fail as early as possible, and catch in context. Going by the definition on https://ericlippert.com/2008/09/10/vexing-exceptions/, a boneheaded exception isn't one that should not be caught, it's in ...
Duroth's user avatar
  • 900
7 votes

What difference and relation are between fault tolerance and (high) availability?

The basic concepts are orthogonal; however, they are related. One has to do with the availability of your application, and the other has to do with the correctness of your application. Remember, ...
Berin Loritsch's user avatar
6 votes

Boneheaded exceptions should not be caught. Then how to provide fault tolerance and reliability?

Your examples are both in the area of interfaces to systems that are not under your control, which is different from the interfaces between components that you control and where you can ensure that ...
Hans-Martin Mosner's user avatar
6 votes

Design pattern for objects in invalid states

No no no no. First, stop using floating point numbers to represent base 10 money. Ints work fine if you count pennies and remember to add the decimal point when presenting them as dollars. ...
candied_orange's user avatar
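
A minimal sketch of the integer-pennies idea from the excerpt above. The helper name `format_cents` is hypothetical and only illustrates adding the decimal point at presentation time.

```python
def format_cents(cents: int) -> str:
    """Render an integer number of cents as a dollar string (hypothetical helper)."""
    sign = "-" if cents < 0 else ""
    dollars, rest = divmod(abs(cents), 100)
    return f"{sign}${dollars}.{rest:02d}"


# Binary floating point cannot represent 0.10 or 0.20 exactly,
# so the float sum does not compare equal to 0.30.
float_sum_is_exact = (0.10 + 0.20 == 0.30)

# Counting cents as integers keeps the arithmetic exact.
total_cents = 10 + 20
total_text = format_cents(total_cents)
```

The amount stays an `int` everywhere in the program; only the display code knows about dollars and the decimal point.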
5 votes

How does a distributed system both tolerate network partition and achieve consistency?

The CAP Theorem says that you can only achieve at maximum two out of the three properties of Consistency (every read receives the most recent write or an error), Availability (every read receives a ...
Jörg W Mittag's user avatar
4 votes

What is the crux of difference between N version programming and self monitoring architecture?

The difference is in what is done if the outputs are different: In the self-monitoring architecture, if the outputs are different then a fault is indicated; no recovery is possible - i.e. this is a ...
Philip Kendall's user avatar
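
A rough sketch of the contrast described in the excerpt above. The self-monitoring half follows the excerpt (a disagreement only signals a fault). The N-version half assumes the usual majority-voting arrangement, which the truncated excerpt does not spell out. All names here (`FaultDetected`, `self_monitoring`, `n_version`) are hypothetical.

```python
from collections import Counter


class FaultDetected(Exception):
    """Raised when redundant computations disagree and no result can be trusted."""


def self_monitoring(primary, monitor, x):
    # Two channels: any disagreement only indicates a fault; there is no recovery.
    a, b = primary(x), monitor(x)
    if a != b:
        raise FaultDetected(f"outputs differ: {a!r} vs {b!r}")
    return a


def n_version(versions, x):
    # Assumed behaviour: run every independently written version and take a
    # strict majority vote, so a minority of faulty versions is outvoted.
    outputs = [version(x) for version in versions]
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise FaultDetected(f"no majority among {outputs!r}")
    return value


def good(x):
    return x * 2


def faulty(x):
    return x * 2 + 1
```

With three versions of which one is faulty, `n_version([good, good, faulty], 3)` still yields a result, while `self_monitoring(good, faulty, 3)` can only report that something is wrong.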
4 votes

Boneheaded exceptions should not be caught. Then how to provide fault tolerance and reliability?

"Fail fast" is a good default, but fault tolerance may be worth it in some cases. You just have to be really careful how you do it, because an unexpected exception means some of the program ...
JacquesB's user avatar
  • 62.3k
3 votes

How does a distributed system both tolerate network partition and achieve consistency?

How can we have both P and C in the second case? I will answer with a common real-world example. Common CP system in AWS cloud: consider a distributed system made up of parts deployed to 3 ...
Jonas's user avatar
  • 14.9k
3 votes
Accepted

Unexpected shutdown before a saga completion

That’s not the way a saga works: every involved microservice performs a step, which is handled locally as a transaction; every completed step shall result in an event being triggered; the events must ...
Christophe's user avatar
  • 82.2k
3 votes
Accepted

Design pattern for objects in invalid states

How are responsibilities between classes? There is no single answer to that question. It's first a question of responsibilities: Shall using classes be responsible for verifying if they can do the ...
Christophe's user avatar
  • 82.2k
2 votes

How to guarantee HTTP message delivery in fault tolerant way

Your colleague is right. You can't eliminate all failure modes. The goal should be predictable failure modes, e.g. to meet a certain SLA perhaps you want 99.99% reliability and a response time of ...
John Wu's user avatar
  • 27k
2 votes
Accepted

How to guarantee HTTP message delivery in fault tolerant way

The database solution is definitely the best; transactional filesystems are not common, unless you assume that the filesystem never fails (permission settings, disk full, ...). I'll detail a more ...
Walfrat's user avatar
  • 3,536
2 votes

Design pattern for objects in invalid states

Throwing exceptions on normal object access ("Solution 1") follows the general pattern known in Python as "easier to ask for forgiveness than permission" (EAFP). This pattern is heavy on the user side ...
Diane M's user avatar
  • 2,116
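
A small sketch of what the "ask forgiveness" style looks like for a caller, next to the "look before you leap" alternative. The `Wallet` class, `InsufficientFunds`, and both `pay_*` functions are hypothetical stand-ins, not the code from the original question.

```python
class InsufficientFunds(Exception):
    """Raised when a withdrawal exceeds the wallet balance."""


class Wallet:
    def __init__(self, cents: int) -> None:
        self.cents = cents

    def withdraw(self, cents: int) -> None:
        # The object protects its own invariant by raising on misuse.
        if cents > self.cents:
            raise InsufficientFunds(f"need {cents}, have {self.cents}")
        self.cents -= cents


def pay_eafp(wallet: Wallet, cents: int) -> bool:
    # "Forgiveness": just attempt the operation and handle the failure.
    try:
        wallet.withdraw(cents)
    except InsufficientFunds:
        return False
    return True


def pay_lbyl(wallet: Wallet, cents: int) -> bool:
    # "Permission": the caller checks the precondition before acting.
    if wallet.cents < cents:
        return False
    wallet.withdraw(cents)
    return True
```

In both styles the caller has to deal with the failure case; the difference is whether it does so with an exception handler or with an up-front check.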
2 votes

What is the difference between masking and tolerating failures?

From what I understand, the two differ in the level of abstraction involved: "Masked" here means that lower levels "mask" failure transparently for higher levels of the system. Failure on a ...
Thomas Junk's user avatar
  • 9,623
2 votes

Feedback on Multi-Process Software Architecture

If it makes the code easier to read/debug and the system easier to reason about, then your decision to use three separate applications is a good one. Your reasoning for using a file to communicate the ...
Bart van Ingen Schenau's user avatar
2 votes

Boneheaded exceptions should not be caught. Then how to provide fault tolerance and reliability?

It's okay to sandbox (as narrowly as possible). While your ERP does indeed seem boneheaded, it is not boneheaded to sandbox a third party interaction. Just make it as narrow as possible. Also, it would be ...
John Wu's user avatar
  • 27k
1 vote

Boneheaded exceptions should not be caught. Then how to provide fault tolerance and reliability?

On the net, there are too many articles with advice on how to handle various types of exceptions. Even the wording "to handle an exception" puts the focus on the exception object instead of ...
Ralf Kleberhoff's user avatar
1 vote

How does a distributed system both tolerate network partition and achieve consistency?

I am going to add another perspective. CAP: if partitioning is happening, then the system may be either available or consistent. The million dollar question is what is partitioning? Let's say I have a ...
AndrewR's user avatar
  • 196
1 vote

Does stale data due to weak level of consistency count as Byzantine failure?

A Byzantine fault can appear to be both functioning and not functioning to different actors: a server can inconsistently appear both failed and functioning to failure-detection systems, presenting ...
Jonas's user avatar
  • 14.9k
1 vote

What is the difference between masking and tolerating failures?

Maybe a comparison could help you understand the difference. Imagine you're going to an e-commerce website. You found a product you want to buy and you click on the “Add to cart” button. Under the ...
Arseni Mourzenko's user avatar
1 vote

When do I stop being paranoid about my code failing?

A lot of it depends on what kind of application you are building and what SLAs you intend to provide. No system has been built to handle all the scenarios perfectly so that the developer can rest. ...
skott's user avatar
  • 509
1 vote

When do I stop being paranoid about my code failing?

"If I do more checks, it becomes difficult to read through even for me": this is the worrying part. As you spend more time crafting your code it should become easier to read. Considering the sheer ...
Martin Maat's user avatar
  • 18.6k
1 vote

Design pattern for objects in invalid states

Solution 1 w/ YAGNI applied: public class Wallet { /// <summary> /// Indicates the amount of Cash in the wallet /// </summary> public double Cash ...
RandomUs1r's user avatar
1 vote

Design pattern for objects in invalid states

I suspect you are suffering from “primitive obsession”. Having the validation code for valid states of cash in the wallet means anywhere else you use cash needs them too. If you create a new class ...
Adam B's user avatar
  • 1,660
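
One way to read the "create a new class" advice in the excerpt above is a small value object that carries its own validation. This is only a sketch under that assumption; the `Cash` class and its `cents` field are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cash:
    """Non-negative amount of money in cents (hypothetical value object)."""

    cents: int

    def __post_init__(self) -> None:
        # Validation lives here once, instead of at every place cash is used.
        if not isinstance(self.cents, int) or self.cents < 0:
            raise ValueError(f"invalid cash amount: {self.cents!r}")

    def __add__(self, other: "Cash") -> "Cash":
        return Cash(self.cents + other.cents)

    def __sub__(self, other: "Cash") -> "Cash":
        # Constructing the result re-runs validation, so an overdraw raises.
        return Cash(self.cents - other.cents)
```

Because every `Cash` instance is validated on construction, code that receives one does not need to repeat the checks.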
1 vote

Concurrent fault-safe data structure

I think that you should look at using the TPL (Task Parallel Library by Microsoft). If I am correctly understanding the scenario outlined in your question, then this would provide you with the low ...
Chaplin Marchais's user avatar

Only top scored, non community-wiki answers of a minimum length are eligible