\$\begingroup\$

Is this the fastest and most reliable way to copy multiple SQLite database files from one location to another, safely overwriting them if they already exist?

Currently I'm using the approach below:

        string sourceDir = @"G:\test\from\";
        string backupDir = @"G:\test\to\";

        string[] databaseFiles = Directory.GetFiles(sourceDir, "*.db");

        foreach (string databaseFile in databaseFiles)
        {
            string fileName = databaseFile.Substring(sourceDir.Length);
            
            BackupDatabase(Path.Combine(sourceDir, fileName), Path.Combine(backupDir, fileName));
        }

Using this helper method:

    public static void BackupDatabase(string sourceFile, string destFile)
    {
        using (SQLiteConnection source = new SQLiteConnection(String.Format("Data Source = {0}", sourceFile)))
            using (SQLiteConnection destination = new SQLiteConnection(String.Format("Data Source = {0}", destFile)))
        {
            source.Open();
            destination.Open();
            source.BackupDatabase(destination, "main", "main", -1, null, -1);
        }
    }

Is there a better way to do this?

\$\endgroup\$
1
  • 1
    \$\begingroup\$ Welcome to the Code Review Community. Our goal on this site is to help you improve your coding skills by making insightful observations about your code. Unlike Stack Overflow we want to see as much code as possible to provide as much help as possible. In programming languages such as C# we like to see complete classes including the using statements at the top. Please read A guide to Code Review for Stack Overflow users \$\endgroup\$ Commented Jan 1, 2023 at 15:59

1 Answer

\$\begingroup\$

most reliable way to copy [SQLite files] safely?

I'm not sure I understand what "safe" and "reliable" mean in this context. Partly this is because SQLite is an embedded database that allows only one writer at a time, and you didn't reveal how participants open / select / update / close it in your environment. I can share what I know.

I'm not seeing any "multi file consistency" issues here, so treating your problem as "copy one file", repeated multiple times, seems suitable. If, say, a pair of recently updated files were tightly bound to one another, then I would advocate putting them in their own uniquely named directory and doing an atomic rename on the directory.


We are replicating a source file to a destination location. It doesn't matter whether src & dst are on the same filesystem, on two different locally mounted filesystems, or on different hosts.

Begin by copying to a temp file on the destination filesystem.

Roll a random number and generate a new unique temp filename. No one has ever used this name before, no one knows to look for it, so it is free of conflict from readers and writers.

Use whatever method you like to copy from src to temp. Alter the buffer size to find what's well suited to your setup, perhaps using /bin/dd bs=1M .... Suffer CTRL/C restarts, network interruptions, reboots, and other calamities, and then pick up where you left off, appending to the partially written temp file. At the end, consider computing a SHA3 hash of both temp and src, to verify the bits made it there alive.

Now, $ mv temp dst, or use the rename sys call. We are taking advantage of the fact that this is an atomic rename, which holds because temp and dst live on the same filesystem. So readers or writers that attempt to open(dst) will see either all the old bits or all the new bits, never a partly copied file, never a corrupt file.
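In C#, the copy-to-temp-then-rename steps above might look roughly like the sketch below. The class and method names (AtomicCopy, CopyThenRename) and the .tmp suffix are made up for illustration, and the three-argument File.Move overload assumes .NET Core 3.0 or later. How atomic the final rename really is depends on the OS and filesystem, so treat this as a starting point rather than a guarantee.

```csharp
using System;
using System.IO;

public static class AtomicCopy
{
    // Copies sourceFile to a uniquely named temp file in the destination
    // directory, then renames the temp file over destFile in one step.
    public static void CopyThenRename(string sourceFile, string destFile)
    {
        string destDir = Path.GetDirectoryName(Path.GetFullPath(destFile))
            ?? throw new ArgumentException("Destination has no directory.", nameof(destFile));
        Directory.CreateDirectory(destDir);

        // A random name nobody has used before, on the destination filesystem.
        // The ".tmp" suffix is an assumed convention, not something required.
        string tempFile = Path.Combine(destDir, Guid.NewGuid().ToString("N") + ".tmp");

        try
        {
            File.Copy(sourceFile, tempFile);

            // Rename over the destination, replacing any existing file.
            // This overload (overwrite flag) needs .NET Core 3.0 or later.
            File.Move(tempFile, destFile, true);
        }
        catch
        {
            // Best-effort cleanup; File.Delete does nothing if the file is absent.
            File.Delete(tempFile);
            throw;
        }
    }
}
```

Note that this copies the raw file bytes, so it only makes sense when no writer is touching the source during the copy.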


This is a battle tested setup that works reliably in a wide range of practical deployments.

The qmail maildir format for inboxes manages to offer reliable service even over flaky Sun NFS mounts using this technique.

The highly efficient rsync utility accomplishes reliable network transfers via this technique.


Some minor caveats:

An interrupted transfer might leave behind a turd file. No harm done, just a little clutter and some disk blocks allocated. You should be able to both identify and delete ancient (day old?) temp files.
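A cleanup pass for those leftovers could be sketched like this, assuming the temp files can be recognised by a .tmp suffix (a hypothetical naming convention) and that "ancient" means older than some age you choose. TempCleanup and DeleteStaleTempFiles are illustrative names.

```csharp
using System;
using System.IO;

public static class TempCleanup
{
    // Deletes "*.tmp" files in the directory whose last write time is
    // older than maxAge. Returns how many files were removed.
    public static int DeleteStaleTempFiles(string directory, TimeSpan maxAge)
    {
        DateTime cutoff = DateTime.UtcNow - maxAge;
        int deleted = 0;

        foreach (string path in Directory.GetFiles(directory, "*.tmp"))
        {
            // Recently written temp files may belong to a transfer
            // that is still in progress, so leave them alone.
            if (File.GetLastWriteTimeUtc(path) < cutoff)
            {
                File.Delete(path);
                deleted++;
            }
        }

        return deleted;
    }
}
```

Calling it as TempCleanup.DeleteStaleTempFiles(backupDir, TimeSpan.FromDays(1)) would match the "day old" threshold suggested above.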

It took you some amount of time to read the source and write the destination. That is a "race window". To make it slightly less racy, you might examine source timestamp or data, and re-copy relevant portions of the file. Given that the bulk of the file is already at the destination, the 2nd pass might go quicker. There's still a race. This tends to work well with append-only syslog text files, where 2nd pass appends the occasional late-arriving log message.


For a belt-and-suspenders approach, do something like this:

$ shasum *.db  > manifest.txt

Transfer the manifest along with the DB files.

Verify the manifest at the far end. Now two applications, one which copies and one which computes hashes, would have to be buggy in the same way in order for an error to sneak past your end-to-end checking. GPG signatures offer a similar approach. (There are still perverse FS failure modes related to caching and to RAID which could let such a check pass yet deliver bad files at a later date. Not a big concern in most practical systems we see nowadays. According to your level of paranoia, you might find zfs to be a suitable backend store.)
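If you would rather do the hashing from C# than from the shell, something along these lines could work. It is only a sketch: it uses SHA-256 (not whatever algorithm your shasum defaults to), it keeps the manifest in memory instead of reading manifest.txt, and the names Manifest, HashFile, Build and FindMismatches are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

public static class Manifest
{
    // Returns the SHA-256 hash of a file as lowercase hex.
    public static string HashFile(string path)
    {
        using (SHA256 sha = SHA256.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            byte[] hash = sha.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    // Maps each *.db file name in the directory to its hash.
    public static Dictionary<string, string> Build(string directory)
    {
        var manifest = new Dictionary<string, string>();
        foreach (string path in Directory.GetFiles(directory, "*.db"))
        {
            manifest[Path.GetFileName(path)] = HashFile(path);
        }
        return manifest;
    }

    // Returns the names of files that are missing from the directory
    // or whose hash differs from the manifest entry.
    public static List<string> FindMismatches(string directory, Dictionary<string, string> manifest)
    {
        var mismatches = new List<string>();
        foreach (KeyValuePair<string, string> entry in manifest)
        {
            string path = Path.Combine(directory, entry.Key);
            if (!File.Exists(path) || HashFile(path) != entry.Value)
            {
                mismatches.Add(entry.Key);
            }
        }
        return mismatches;
    }
}
```

Build the manifest from the source directory, then call FindMismatches against the destination directory; an empty list means every file arrived with matching contents. Doing both halves in the same program gives up some of the "two independent applications" benefit described above, so an external tool at one end is still worth considering.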


SQLite writers should COMMIT prior to your open + read of the file. Speaking for myself, I would have writers close the file, e.g. by exiting, before the read.

\$\endgroup\$
