11
votes
Boneheaded exceptions should not be caught. Then how to provide fault tolerance and reliability?
Fail as early as possible, and catch in context.
Going by the definition at https://ericlippert.com/2008/09/10/vexing-exceptions/, a boneheaded exception isn't one that should not be caught; it's in ...
7
votes
What difference and relation are between fault tolerance and (high) availability?
The basic concepts are orthogonal, but they are related. One has to do with the availability of your application, and the other has to do with the correctness of your application. Remember, ...
6
votes
Boneheaded exceptions should not be caught. Then how to provide fault tolerance and reliability?
Your examples are both in the area of interfaces to systems that are not under your control, which is different from the interfaces between components that you control and where you can ensure that ...
6
votes
Design pattern for objects in invalid states
No no no no.
First, stop using floating point numbers to represent base 10 money. Ints work fine if you count pennies and remember to add the decimal point when presenting them as dollars.
...
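A minimal sketch of the "count pennies" advice above, in Python. The helper name `format_dollars` is hypothetical and only illustrates keeping amounts as integer cents and adding the decimal point at presentation time:

```python
def format_dollars(cents: int) -> str:
    """Render an integer number of cents as a dollar string."""
    sign = "-" if cents < 0 else ""
    dollars, remainder = divmod(abs(cents), 100)
    return f"{sign}${dollars}.{remainder:02d}"


# Binary floating point cannot represent 0.10 or 0.20 exactly,
# so the sum is not exactly 0.30.
float_total = 0.10 + 0.20

# Integer cents add exactly.
cent_total = 10 + 20

print(float_total == 0.30)         # False
print(format_dollars(cent_total))  # $0.30
```

Python's `decimal.Decimal` is another option when fractional cents or specific rounding rules matter; the integer approach is shown here only because it is the one the answer describes.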
5
votes
How does a distributed system both tolerate network partition and achieve consistency?
The CAP theorem says that you can achieve at most two of the three properties of Consistency (every read receives the most recent write or an error), Availability (every read receives a ...
4
votes
What is the crux of difference between N version programming and self monitoring architecture?
The difference is in what is done if the outputs are different:
In the self-monitoring architecture, if the outputs are different then a fault is indicated; no recovery is possible - i.e. this is a ...
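The contrast can be sketched in Python. The excerpt only shows the self-monitoring half, so the majority-vote half below follows the usual textbook description of N-version programming and is an assumption about where the answer goes; all function and class names are hypothetical:

```python
from collections import Counter
from typing import Callable, Sequence


class FaultDetected(Exception):
    """Raised when redundant channels disagree and no result can be trusted."""


def self_monitoring(channels: Sequence[Callable[[int], int]], x: int) -> int:
    """Compare the outputs; any disagreement only indicates a fault."""
    outputs = [channel(x) for channel in channels]
    if any(out != outputs[0] for out in outputs[1:]):
        raise FaultDetected(f"channels disagree: {outputs}")
    return outputs[0]


def n_version(versions: Sequence[Callable[[int], int]], x: int) -> int:
    """Compare the outputs; a strict majority masks a faulty version."""
    outputs = [version(x) for version in versions]
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise FaultDetected(f"no majority among outputs: {outputs}")
    return value


def square(x: int) -> int:
    return x * x


def faulty_square(x: int) -> int:
    return x * x + 1  # deliberately wrong, to simulate a fault


print(n_version([square, square, faulty_square], 3))  # 9
```

With the same inputs, `self_monitoring([square, faulty_square], 3)` raises `FaultDetected` instead of returning a value, which is the fail-stop behaviour the answer describes.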
4
votes
Boneheaded exceptions should not be caught. Then how to provide fault tolerance and reliability?
"Fail fast" is a good default, but fault tolerance may be worth it in some cases. You just have to be really careful how you do it, because an unexpected exception means some of the program ...
3
votes
How does a distributed system both tolerate network partition and achieve consistency?
How can we have both P and C in the second case?
I will answer with a common real-world example.
Common CP system in AWS cloud
Consider a distributed system made up of parts deployed to 3 ...
3
votes
Accepted
Unexpected shutdown before a saga completion
That’s not the way a saga works:
Every involved microservice performs a step, which is handled locally as a transaction.
Every completed step must result in an event being triggered.
the events must ...
3
votes
Accepted
Design pattern for objects in invalid states
How are responsibilities divided between classes?
There is no single answer to that question. It's first a question of responsibilities:
Should the using classes be responsible for verifying whether they can do the ...
2
votes
How to guarantee HTTP message delivery in fault tolerant way
Your colleague is right. You can't eliminate all failure modes. The goal should be predictable failure modes, e.g. to meet a certain SLA perhaps you want 99.99% reliability and a response time of ...
2
votes
Accepted
How to guarantee HTTP message delivery in fault tolerant way
The database solution is definitely the best: transactional filesystems are not common, so a file-based approach is only safe if you assume that the filesystem never fails (permission settings, disk full, ...).
I'll detail a more ...
2
votes
Design pattern for objects in invalid states
Throwing exceptions on normal object access ("Solution 1") follows the general pattern known in Python as "easier to ask for forgiveness than permission" (EAFP).
This pattern is heavy on the user side ...
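To make the EAFP idea concrete, here is a small Python sketch contrasting it with "look before you leap" (LBYL). The `Wallet` class, its `pay` and `can_pay` methods, and the `PaymentRejected` exception are hypothetical stand-ins, not the code from the question:

```python
class PaymentRejected(Exception):
    """Raised when the wallet cannot make the requested payment."""


class Wallet:
    def __init__(self, cents: int) -> None:
        self.cents = cents

    def can_pay(self, amount: int) -> bool:
        return 0 <= amount <= self.cents

    def pay(self, amount: int) -> None:
        # The object protects its own invariant by raising.
        if not self.can_pay(amount):
            raise PaymentRejected(f"cannot pay {amount} from {self.cents}")
        self.cents -= amount


def buy_eafp(wallet: Wallet, price: int) -> bool:
    """EAFP: attempt the operation and handle the exception."""
    try:
        wallet.pay(price)
    except PaymentRejected:
        return False
    return True


def buy_lbyl(wallet: Wallet, price: int) -> bool:
    """LBYL: check first, then act."""
    if wallet.can_pay(price):
        wallet.pay(price)
        return True
    return False
```

In the EAFP version every caller needs a `try`/`except`, which is the cost on the user side; in the LBYL version every caller has to remember to check first.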
2
votes
What is the difference between masking and tolerating failures?
From what I understand, the two differ in the level of abstraction involved:
"Masked" means here: Lower levels "mask" failure transparently for higher levels of the system. Failure on a ...
2
votes
Feedback on Multi-Process Software Architecture
If it makes the code easier to read/debug and the system easier to reason about, then your decision to use three separate applications is a good one.
Your reasoning for using a file to communicate the ...
2
votes
Boneheaded exceptions should not be caught. Then how to provide fault tolerance and reliability?
It's okay to sandbox (as narrowly as possible)
While your ERP does indeed seem boneheaded, it is not boneheaded to sandbox a third-party interaction. Just make it as narrow as possible.
Also, it would be ...
1
vote
Boneheaded exceptions should not be caught. Then how to provide fault tolerance and reliability?
On the net, there are too many articles with advice on how to handle various types of exceptions. Even the wording "to handle an exception" puts the focus on the exception object instead of ...
1
vote
How does a distributed system both tolerate network partition and achieve consistency?
I am going to add another perspective.
CAP: if partitioning is happening, then the system may be either available or consistent. The million-dollar question is: what is partitioning?
Let's say I have a ...
1
vote
Does stale data due to weak level of consistency count as Byzantine failure?
A component with a Byzantine fault can appear to be functioning to some actors and failed to others.
a server can inconsistently appear both failed and functioning to failure-detection systems, presenting ...
1
vote
What is the difference between masking and tolerating failures?
Maybe a comparison could help you understand the difference. Imagine you're visiting an e-commerce website. You find a product you want to buy and you click on the “Add to cart” button.
Under the ...
1
vote
When do I stop being paranoid about my code failing?
A lot of it depends on what kind of application you are building and what SLAs you intend to provide.
No system has been built to handle all the scenarios perfectly so that the developer can rest. ...
1
vote
When do I stop being paranoid about my code failing?
if I do more checks, it becomes difficult to read through even for me
This is the worrying part. As you spend more time crafting your code it should become easier to read.
Considering the sheer ...
1
vote
Design pattern for objects in invalid states
Solution 1 w/ YAGNI applied:
public class Wallet
{
    /// <summary>
    /// Indicates the amount of Cash in the wallet
    /// </summary>
    public double Cash
...
1
vote
Design pattern for objects in invalid states
I suspect you are suffering from “primitive obsession”. Having the validation code for valid states of cash in the wallet means anywhere else you use cash needs them too. If you create a new class ...
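One way to picture the fix for primitive obsession is a small value class that owns the validation. This is a Python sketch only; the `Cash` name, the integer-cents representation, and the rule that cash can never be negative are assumptions made for illustration, not details from the question:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cash:
    """An amount of money in cents that can never be negative (assumed rule)."""

    cents: int

    def __post_init__(self) -> None:
        # Validation lives here once, instead of at every place cash is used.
        if isinstance(self.cents, bool) or not isinstance(self.cents, int):
            raise TypeError("cents must be an int")
        if self.cents < 0:
            raise ValueError("cash cannot be negative")

    def __add__(self, other: "Cash") -> "Cash":
        return Cash(self.cents + other.cents)

    def __sub__(self, other: "Cash") -> "Cash":
        # Constructing the result re-runs the validation, so an
        # overdraw raises ValueError instead of yielding invalid cash.
        return Cash(self.cents - other.cents)


print(Cash(500) - Cash(200))  # Cash(cents=300)
```

Because every `Cash` instance is validated on construction, a wallet holding a `Cash` field does not need its own range checks.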
1
vote
Concurrent fault-safe data structure
I think that you should look at using the TPL (Task Parallel Library by Microsoft). If I am correctly understanding the scenario outlined in your question, then this would provide you with the low ...