  • This strategy seems like a very good solution because it is completely agnostic to file types, and it is much better than the failing strategy of trying to parse complex PE headers. It also lets us guarantee that the client ends up with exactly the final files we expect, especially if we perform a hash check on all target files before we start to patch them. Commented Sep 19, 2022 at 11:18
  • @JohnKugelman Where we make a cut depends solely on the contents of the sliding window used to calculate the rolling hash. Even if content changes, this doesn't affect the rolling hash after it leaves the window, so later blocks will remain the same. In the method outlined above, the sliding window will be much smaller than the block size. For example, consider each . to be one window wide, and what happens when the leading .... of block 5633 is deleted: the rolling hash will suggest a cut in the same place, but the block ID changes to 7a5d. Commented Sep 21, 2022 at 16:18
  • @JohnKugelman And you're right, I was thinking of gzip --rsyncable when writing that explanation. Rsync itself seems to split the old version (the copy on the receiving side) into fixed-size blocks, and slides the rolling hash over the new version to find potential occurrences of those blocks, which are then verified using a cryptographic hash. The existing blocks can then be reused when patching the file. The idea of using the rolling hash to cut blocks is advantageous in a fully offline setting where only one version is available, but not when both the old and new version are available. Commented Sep 21, 2022 at 16:23
  • This is exactly what Microsoft's "Remote Differential Compression" did, and there used to be a supported API for it that the OP could have used (since the OP's system seems Microsoft-oriented). But that API is gone. Commented Sep 24, 2022 at 2:02
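The hash check mentioned in the first comment could look roughly like the sketch below. It is only an illustration under assumptions: the manifest format (relative path mapped to an expected SHA-256 hex digest) and the helper names `sha256_of` and `files_not_matching` are hypothetical and not taken from the original post.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, bufsize: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in pieces."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while True:
            buf = f.read(bufsize)
            if not buf:
                break
            digest.update(buf)
    return digest.hexdigest()


def files_not_matching(root: Path, manifest: dict[str, str]) -> list[str]:
    """List relative paths that are missing or differ from the manifest.

    `manifest` is a hypothetical mapping of relative path -> expected digest.
    """
    bad = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel)
    return bad
```

A patcher would call `files_not_matching` on the old files before patching and refuse to continue if the list is non-empty, then call it again with the manifest of the new version to confirm the result.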
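The claim that cut points depend only on the sliding window can be tried out with a small sketch. This is not the code of gzip --rsyncable, rsync, or Remote Differential Compression; the polynomial rolling hash, the window width, the cut mask, and the names `chunk`, `block_id`, and `demo` are all assumptions chosen for illustration.

```python
import hashlib
import random

WINDOW = 16          # assumed sliding-window width in bytes
MASK = (1 << 6) - 1  # cut when the low 6 bits are zero (about 64-byte blocks)
BASE = 257
MOD = (1 << 31) - 1


def chunk(data: bytes) -> list[bytes]:
    """Cut data wherever the rolling hash of the last WINDOW bytes hits MASK."""
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    blocks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                 # shift in the new byte
        if i + 1 >= WINDOW and (h & MASK) == 0:
            blocks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        blocks.append(data[start:])  # trailing remainder
    return blocks


def block_id(block: bytes) -> str:
    """Short content-derived ID, in the spirit of the IDs used in the answer."""
    return hashlib.sha256(block).hexdigest()[:8]


def demo(seed: int = 0, size: int = 4096):
    """Delete the first 4 bytes of one block and re-chunk the result."""
    rng = random.Random(seed)
    data = bytes(rng.getrandbits(8) for _ in range(size))
    old = chunk(data)
    # Pick a block long enough that its closing window lies past the deletion.
    k = next(i for i in range(1, len(old) - 1) if len(old[i]) >= WINDOW + 4)
    offset = sum(len(b) for b in old[:k])
    new = chunk(data[:offset] + data[offset + 4:])
    return [block_id(b) for b in old], [block_id(b) for b in new], k


if __name__ == "__main__":
    old_ids, new_ids, k = demo()
    print("changed block:", old_ids[k], "->", new_ids[k])
    print("later blocks identical:", new_ids[-len(old_ids[k + 1:]):] == old_ids[k + 1:])
```

Because the hash is never reset at a block boundary, a cut depends only on the preceding WINDOW bytes. The blocks before the deletion and the blocks after the edited one keep their IDs, while the edited block gets a new one.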