8

I am developing projects for my private use and have been wondering how I should design them. I always try to keep the code as efficient and concise as possible while also keeping it as readable as possible. However, I still find it difficult to balance both sides and wonder how professionals set up their projects.

Personally, I can imagine where writing and keeping two versions of the code makes sense. For example, one wants a test version that simply implements all the desired functionality, and uses that one as the baseline and test system. The second version of the code can then be the performance-critical one, but it still has to contain all the features that exist in the first version. This is, however, purely my own thought and might not be how it's done in the industry. I can well imagine that the above procedure is simply not applicable, because everything now costs twice the time, and time is money.

  • 23
    "However, I still find it difficult to balance both sides" It's not clear to me: what are the two sides you're balancing? Commented Jan 14 at 18:40
  • 2
    I have never seen anyone maintain duplicate code bases, except one project where we had to do manual version control ('70s-era platform, no tools); we had one area for dev, one for test, and one for release, and code would be copied from one to the next as it moved through the process. Commented Jan 14 at 20:27
  • 1
    I think the question talks about a reference implementation... but for it to be maintained by the same person/entity is quite unlikely, as @JohnBode said. Commented Jan 16 at 1:52
  • 3
    You're maintaining two code bases purely for testing. That'll fail, because your implementations are different, and you'll have different bugs between the two. For productionising code, read up on the differences between 'dev' and 'prod' builds, minification (e.g. for JavaScript prod builds), etc. Commented Jan 16 at 9:08
  • @AhmedTawfik Why would that fail? Two implementations and random testing/property testing is a great way to find those bugs. Commented Jan 17 at 12:14

9 Answers

45

As a professional software engineer writing embedded software (i.e., the software itself is not a product, but it is part of a physical product), nearly all the time I am working with and extending an existing code base.

As I am not in the habit of doing things twice, we only have a single code base for a product (or a range of products), but using different build settings we can create variants with, for example, test and debug facilities enabled or disabled.
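For illustration, a minimal C++ sketch of that single-source approach; the DEBUG_BUILD macro name is an assumption, standing in for whatever flag the build settings define for the test/debug variant:

    #include <cstdio>

    // DEBUG_BUILD is a hypothetical macro that the build system would define
    // for the test/debug variant (e.g. via -DDEBUG_BUILD); the release build
    // simply omits it, so both variants come from the same source.
    #ifdef DEBUG_BUILD
    static void trace(const char* msg) { std::printf("[trace] %s\n", msg); }
    #else
    static void trace(const char*) { /* compiled out of release builds */ }
    #endif

    int main() {
        trace("entering control loop");
        // ... product logic, identical in both variants ...
        return 0;
    }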

As software gets read far more often than it gets written, the first rule of good software engineering is to write good, readable (maintainable) code. Only if measurements have shown that the performance is not good enough might there be an argument for sacrificing the readability of the code that is positively identified as the bottleneck, in order to meet the performance requirements. But that is a rare occurrence in my experience.

22

Personally, I can imagine where writing and keeping two versions of the code makes sense. For example, one wants a test version that simply implements all the desired functionality, and uses that one as the baseline and test system.

Sometimes we do this in professional projects. But we usually don't do it for full programs; we do it only for small, time-critical parts, like single functions or single classes, and in rare cases for single modules. In the majority of cases, a concise and readable version of a module is already efficient enough. Even if we need an optimized version of a certain function, it can often still be readable and concise enough that there is no need to maintain two variants in parallel.

But for sure, there are indeed cases where we keep a concise and readable, but slow function as a reference, whilst we also maintain a more sophisticated, faster, and more complex version of the same function. Whilst the complex version then goes into the production code, the slow function may be used inside some automated test to verify the correctness of the more complex version.
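As a sketch of that pattern (the function names, the trivial example operation, and the use of random inputs are illustrative assumptions, not from any specific project), such an automated test might look like this in C++:

    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <random>
    #include <vector>

    // Slow but obviously correct reference version.
    std::uint64_t sum_of_squares_reference(const std::vector<std::uint32_t>& v) {
        std::uint64_t sum = 0;
        for (std::uint32_t x : v) sum += static_cast<std::uint64_t>(x) * x;
        return sum;
    }

    // Stand-in for the optimized production version; here merely unrolled
    // by four, in place of whatever real optimization (SIMD etc.) is used.
    std::uint64_t sum_of_squares_fast(const std::vector<std::uint32_t>& v) {
        std::uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        std::size_t i = 0, n = v.size();
        for (; i + 4 <= n; i += 4) {
            s0 += static_cast<std::uint64_t>(v[i])     * v[i];
            s1 += static_cast<std::uint64_t>(v[i + 1]) * v[i + 1];
            s2 += static_cast<std::uint64_t>(v[i + 2]) * v[i + 2];
            s3 += static_cast<std::uint64_t>(v[i + 3]) * v[i + 3];
        }
        for (; i < n; ++i) s0 += static_cast<std::uint64_t>(v[i]) * v[i];
        return s0 + s1 + s2 + s3;
    }

    // Automated test: feed both variants random inputs and demand agreement.
    int main() {
        std::mt19937 rng(42);  // fixed seed keeps failures reproducible
        std::uniform_int_distribution<std::uint32_t> dist(0, 100000);
        for (int round = 0; round < 1000; ++round) {
            std::vector<std::uint32_t> input(256);
            for (auto& x : input) x = dist(rng);
            assert(sum_of_squares_fast(input) == sum_of_squares_reference(input));
        }
        return 0;
    }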

And yes, specifically when using compiled languages, we also do what Bart already wrote: we make use of build settings to let our compiler or build system create two differently optimized variants from the same source code, one for debugging and one for production use.

21

We actively do this. Our reference code base follows academic papers, and it's easy to show a one-to-one mapping. This gives us the confidence that the results are correct. It's rather slow, though - the theoretical model implemented was not designed for speed. On the plus side, it runs on pretty much all hardware.

We also have SIMD implementations for AVX and NEON CPUs. These implementations are much faster, not only due to the parallel execution but also due to manual optimizations. We can test these against the reference implementation, and the results agree to within rounding error.

This is isolated in a few classes, though, and 95% of the code is still identical. Which class is used is a simple compile-time choice based on the target CPU.
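A minimal sketch of such a compile-time choice; the class and header names are hypothetical, while __AVX2__ and __ARM_NEON are the macros the respective compilers predefine:

    // transform.h - selects the implementation class at compile time.
    // TransformAvx, TransformNeon and TransformScalar are hypothetical
    // classes, each implementing the same interface in its own file.
    #if defined(__AVX2__)
      #include "transform_avx.h"     // AVX2 SIMD implementation
      using Transform = TransformAvx;
    #elif defined(__ARM_NEON)
      #include "transform_neon.h"    // NEON SIMD implementation
      using Transform = TransformNeon;
    #else
      #include "transform_scalar.h"  // portable reference implementation
      using Transform = TransformScalar;
    #endif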

16

One thing I would add here on the performance topic: I have encountered in my career far more code that some engineer thought might be performance critical, and over-engineered an unreadable solution for, than code that was actually performance critical. (And we do have actual performance-critical code where I work now.) In general, I strongly recommend writing code to be as clear as possible. Only when you have a performance problem do you need to troubleshoot.

Now, troubleshooting performance hot spots is a valuable skill to learn, and one a lot of developers do not have the patience for. Earlier this year, we had a performance issue with one of our products, and for months various engineers would guess what was wrong and optimize that path, and invariably the issue would not be resolved, or would only be resolved in a very small subset of cases. Finally I got assigned to it and actually spent some real time figuring out where we were spending our time (and documenting it), so that we could implement a real fix that addressed the problem - which was completely unrelated to what everyone had been guessing.
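In that measure-first spirit, a minimal C++ sketch of the simplest possible timing harness (process_batch() is a hypothetical placeholder for the suspected hot path; a real profiler would be the next step):

    #include <chrono>
    #include <cstdio>

    // Dummy workload standing in for the code path suspected to be slow.
    void process_batch() {
        volatile long sink = 0;
        for (long i = 0; i < 1000000; ++i) sink += i;
    }

    int main() {
        using clock = std::chrono::steady_clock;
        const auto start = clock::now();
        for (int i = 0; i < 100; ++i) process_batch();  // average over many runs
        const auto us = std::chrono::duration_cast<std::chrono::microseconds>(
            clock::now() - start);
        std::printf("100 iterations took %lld us\n",
                    static_cast<long long>(us.count()));
        return 0;
    }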

On another topic you touched on: in practice, if I see commented-out code, I always have to question whether someone left something unfinished. In code I am responsible for, that is a hard no: you never leave commented-out code lying around; it just confuses future developers.

In rare cases (every couple of years or so), I am refactoring some legacy code where I think the refactor might cause us to lose some knowledge that we don't want to have to relearn. In those cases, I use our version control system to tag the previous revision of the software and add a comment like: "This code previously used to connect to service "X" to perform its job. Connecting to "X" is a bit complex; if you want to see an example of how to do it, please look at tag "Y" in git for an example of how we used to do this."

8

There are several more cases not covered so far where several versions of the code are maintained for some time, or forever: reference implementations, dark launches, backward compatibility, and versioned APIs. Usually, in all these cases, only a small part of a product/service has multiple implementations. So the answer is: it's unlikely for a whole code base to be duplicated and maintained, but having multiple implementations of small pieces of functionality is very common, especially for products that have many customers.

In the case of a reference implementation, it is unlikely the code will be in the same code base. For example, the people who wrote a standard for some feature like encryption may provide an implementation that is clear but slow, and implementers of the standard will have their own versions targeting different languages or platforms with different implementations. Another visible case of reference implementations is a testing site like LeetCode, which has reference code for all problems to run against code submitted by users - again, code maintained by different people in different places.

Dark launches of updates to existing features are usually implemented in exactly the way described in the question (again, usually for a tiny part of the code base): two variants of the code exist at the same time, and which one is executed is selected based on some condition at run-time. Unlike a reference implementation, the goal is usually either to pick the better variant (A/B testing) or eventually to replace the old one (rolling out a new version). Making sure that the unused version is removed requires a process, and if such a process is missing or broken, one ends up with multiple variants of the same code that have to be maintained.
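A minimal C++ sketch of such a run-time selection, with hypothetical names (feature_enabled() stands in for whatever configuration service or experiment framework does the flag lookup):

    #include <cstdio>
    #include <string>

    // Hypothetical feature-flag lookup; in practice this would query a
    // configuration service or experiment framework, possibly per user.
    bool feature_enabled(const std::string& /*name*/) { return false; }

    double score_old(const std::string& item) { return item.size() * 1.0; }
    double score_new(const std::string& item) { return item.size() * 1.5; }

    // Both variants live in the code base at the same time; the flag
    // decides at run-time which one is actually executed.
    double score(const std::string& item) {
        return feature_enabled("new-scoring") ? score_new(item)
                                              : score_old(item);
    }

    int main() {
        std::printf("%f\n", score("example"));
        return 0;
    }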

Backward compatibility can be implemented in multiple ways, and one of them is indeed to keep both the old and new versions in the live code and maintain both forever, allowing consumers of the code to pick a variant using some sort of switch at run-time. In extreme cases, one may end up with multiple active versions of "the same" code that are maintained in parallel for a long time - Python 2 vs. Python 3, or quirks mode in browsers, can be considered such cases.

The versioned-API case is somewhat different from the one asked about in the question (as the code explicitly behaves differently), but it is a case where very similar code sometimes has to be written and maintained for some time in the same project, to provide incompatible versions of an API when your customers are unable or unwilling to move and there is a significant financial benefit (or, more likely, a legal requirement). For services, there is usually some way to stop "maintain for some time" from turning into "forever", but for applications that expose APIs for extensibility (e.g., AutoCAD, MS Word), keeping all API variants working for as long as possible is critical.
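In C++, one way to keep several versions of an API alive in a single code base is inline namespaces; a minimal sketch with illustrative names:

    #include <cstdio>
    #include <string>

    namespace api {

    namespace v1 {
    // Old behaviour, kept working for customers who cannot move yet.
    std::string greet(const std::string& name) { return "Hello " + name; }
    }  // namespace v1

    inline namespace v2 {
    // Current behaviour; "inline" makes plain api::greet resolve to this
    // version, while api::v1::greet stays callable for old clients.
    std::string greet(const std::string& name) { return "Hello, " + name + "!"; }
    }  // namespace v2

    }  // namespace api

    int main() {
        std::printf("%s\n", api::v1::greet("Ada").c_str());  // old API
        std::printf("%s\n", api::greet("Ada").c_str());      // current API
        return 0;
    }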

3

I wanted to flesh out Bart's answer a bit.

As Bart noted, we typically don't want to maintain multiple versions of code because, generally speaking, we don't see a return on the extra development effort. But we do have tools that make life easier, specifically:

  • We can choose debug or release builds; typically debug builds:
    • Include extra symbols to make it easier to figure out what went wrong.
    • In some languages, automatically include additional checks, e.g. for accessing data off the end of an array or dereferencing a null pointer.
    • Support asserts, which allow the programmer to state that they believe the program state will be a certain way; typically these are removed from release builds (see the sketch after this list).
  • Choose compiler optimization levels - reducing the optimization level can make it easier to debug issues.
  • Many languages include logging levels, so more logging can be enabled only when trying to diagnose an issue; optionally, a check for a logging level can be used to enter a debug routine that prints a whole bunch of extra details.
  • Cache configuration can also be different in production vs. development environments.
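A minimal C++ sketch of the assert and logging-level points above (the log_level variable and LOG_DEBUG macro are hypothetical names, not from any particular library; assert is the standard macro that defining NDEBUG disables in release builds):

    #include <cassert>
    #include <cstdio>

    // Hypothetical run-time logging level: 0 = errors only, 2 = debug chatter.
    static int log_level = 0;

    #define LOG_DEBUG(msg) \
        do { if (log_level >= 2) std::printf("[debug] %s\n", msg); } while (0)

    int divide(int a, int b) {
        // The standard assert macro compiles to nothing when the release
        // build defines NDEBUG, so this check only exists in debug builds.
        assert(b != 0 && "divide() called with zero divisor");
        LOG_DEBUG("divide() called");
        return a / b;
    }

    int main() {
        log_level = 2;  // e.g. raised while diagnosing an issue
        std::printf("%d\n", divide(10, 2));
        return 0;
    }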
  • One particular value of having logging levels and these other options is that you get an easy way to replicate, in a testing environment, something that is wrong in the production/"performance critical" environment: you can take the time to write out debugging logs for things like SQL queries, or check which branches a piece of code takes, without worrying that you are changing how the flow would happen in production (apart from the logging delays that are skipped in production for better performance). Commented Jan 15 at 8:23
3

Especially with test-driven development, one often encounters a quick first solution, followed by a reimplementation with more effort: special cases, border cases, extra data structures, parallelism, whatever.

Sometimes (rarely) it may be useful to keep separate implementations of one API. This is true if, say, one implementation uses an embedded database, or is far more general (shorter, fewer cases) but slower. Those separate implementations may reside in the same code base, as an API library plus multiple implementation libraries, as sketched below.
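A minimal C++ sketch of that layout, with hypothetical names - one abstract API that callers depend on, and interchangeable implementation libraries behind it:

    #include <map>
    #include <optional>
    #include <string>

    // The API library: callers depend only on this interface.
    class KeyValueStore {
    public:
        virtual ~KeyValueStore() = default;
        virtual void put(const std::string& key, const std::string& value) = 0;
        virtual std::optional<std::string> get(const std::string& key) = 0;
    };

    // One implementation: short and general, but slower for large data sets.
    class MapStore : public KeyValueStore {
        std::map<std::string, std::string> data_;
    public:
        void put(const std::string& k, const std::string& v) override {
            data_[k] = v;
        }
        std::optional<std::string> get(const std::string& k) override {
            auto it = data_.find(k);
            if (it == data_.end()) return std::nullopt;
            return it->second;
        }
    };

    // A second implementation wrapping an embedded database would live in
    // its own library and implement the same KeyValueStore interface.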

Mostly, one would keep a single best version, and try to avoid the mistake of leaving "alternative code" commented out; version control / local history should suffice.

-3

You are correct: professionals do have different versions of their code for different purposes. Different programming languages handle this in different ways; let's take JavaScript as an example.

To make a website interactive, you write a program in JavaScript. JS is an interpreted language, so it doesn't need to be compiled (into a machine-readable, but not easily human-readable, format). This source code can be sent from the server to a user's browser and run directly as it is. This is easy to debug, but it is not the most efficient way to transfer large codebases across the internet.

There are several ways of tackling this; one is Webpack. It can take your program, place all the code in one file, remove all the comments, spaces, and line breaks, and rename all of your variables to short names, e.g. a, b, c. This single file, traditionally called bundle.js, can then be sent to the user's browser, which takes less time because the file is smaller.

The problem is that when you try to debug this bundle in your browser, nothing looks the same as in your source code; debugging is still possible, but hard. Webpack has a trick for this too: it can make something called a source map available at another URL, so that your browser, if it knows where to look, can reconstruct the original source code and make it easy to debug again.

Other languages have similar ideas, implemented in different ways. For example, C++ is a compiled language - your source code is turned into something the machine can run directly - and the toolchain can apply many transformations between your source code and what the target computer runs: removing debugging information, applying common optimisations to your statements, etc.

All of these are jobs for a computer to do; they are complex, repetitive tasks, so we let the build pipeline do them for us.

It can be necessary to maintain two or more different copies of the source code of a program that will diverge over time. If you are still being paid for support on version 3 while you are working on version 4, you need to be able to add new features to v4 that would break v3, but you still need to add bug fixes to v3. This is where source control and branches can help out.

You mention testing. We often do have parts of the system that are designed to only ever run when we are testing: test infrastructure, stubs, mocks, etc. But these are in addition to the main program's source code and are not considered a separate copy. The tests should, in general, run the main source code, to verify that it works - not an approximation of it. This isn't always possible, and sometimes we use fake implementations, such as in-memory databases, so that our tests run faster; but we do this only when testing other parts of the system, and the database can be considered not under test in that case, as it is tested elsewhere.
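A minimal C++ sketch of such a fake, with hypothetical names (the real database-backed implementation would live elsewhere and be tested separately):

    #include <cassert>
    #include <map>
    #include <string>

    // Interface the production code depends on.
    class UserStore {
    public:
        virtual ~UserStore() = default;
        virtual std::string lookup(int id) = 0;
    };

    // Fake used only in tests: an in-memory map instead of a real database.
    class FakeUserStore : public UserStore {
    public:
        std::map<int, std::string> users;
        std::string lookup(int id) override { return users.at(id); }
    };

    // The code under test - it doesn't care which UserStore it gets.
    std::string greeting_for(UserStore& store, int id) {
        return "Hello, " + store.lookup(id) + "!";
    }

    int main() {
        FakeUserStore fake;
        fake.users[1] = "Ada";
        assert(greeting_for(fake, 1) == "Hello, Ada!");  // fast, no real DB
        return 0;
    }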

-3

When I was developing software for clients, I maintained multiple versions of the code, but I did it through Subversion branches. If I were to start a new project, I would probably use GitHub branches instead.

On my development platform I had multiple copies of the code. One was used for all releases to the client. If I was working on an issue (including a new feature), I would create a branch; the repository holds only the differences. Successful changes were merged into the release branch; failures were thrown away. IMHO, branching is the way to do what you describe: if you try to maintain separate sets of code manually, your code will eventually succumb to chaos.

  • I really prefer one repository, possibly with many different targets, one file with target-specific settings for each target (like one customer really wanting Color X for buttons instead of Color Y), and feature flags per customer. Commented Jan 17 at 19:41
  • 1
    @gnasher729 I should have mentioned that I've been a developer for medical devices. The principal branch (or "trunk") is the code that will be submitted for regulatory approval, and it is vital that changes be kept separate until their impact has been assessed. So "successful changes" didn't just mean that the code "worked"; there needed to be an analysis showing that there was no increase in risk (ideally a lower risk). And yes, I agree about one repository (but with multiple branches). Commented Jan 17 at 20:34
