10

Consider a web app that allows the user to upload files to a server. While uploading a file, the app generates a hash from the file content and sends that hash to the server along with the actual file content. The server then generates its own hash from the received content and compares it with the hash it received from the web app. This is done to check file integrity, making sure the uploaded file is the same as the original.

But with HTTPS, is this unnecessary? HTTPS encrypts the data with a key, so if something were altered during transfer, the decryption on the server would fail, right?

3
  • 7
    Are you trying to protect the file content against bit rot (accidental corruption) or against manipulation (by an attacker)? Commented Mar 10 at 21:47
  • 4
    And if you're trying to protect against manipulation, is the attacker the same person who uploads the file or a Man-in-the-Middle? HTTPS will protect you against MitM attacks. Commented Mar 11 at 10:47
  • @Bergi: corruption during network transfer Commented Mar 13 at 11:28

5 Answers

26

But with HTTPS, is this unnecessary? HTTPS encrypts the data with a key, so if something were altered during transfer, the decryption on the server would fail, right?

It's not uncommon for a file-upload service to accept an MD5 hash in a request header. The hash is recalculated after the transfer completes, and if it doesn't match, an error is returned to the client. S3 provides this kind of feature, for example.
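Here is a rough sketch of that client-side flow. The upload URL is hypothetical, but the Content-MD5 header itself is the standard mechanism (RFC 1864) that services like S3 verify:

```python
import base64
import hashlib

import requests  # third-party HTTP client (pip install requests)

# Hypothetical endpoint; S3 accepts the same Content-MD5 header on PUT.
UPLOAD_URL = "https://files.example.com/upload/report.pdf"

with open("report.pdf", "rb") as f:
    body = f.read()

# Content-MD5 is the base64-encoded 128-bit MD5 digest of the request body.
content_md5 = base64.b64encode(hashlib.md5(body).digest()).decode("ascii")

resp = requests.put(UPLOAD_URL, data=body, headers={"Content-MD5": content_md5})
resp.raise_for_status()  # the service rejects the upload if the digests differ
```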

This is not redundant, but the point isn't security. What you are missing is that the HTTPS mechanisms you describe only protect the data while it is in transit. Many things can go wrong that HTTPS will not prevent. Your assumption is a bit like thinking that if your code compiles, it must execute properly.

For example, suppose your client fails to read the full content of the file, or something is corrupted while reading from disk. A common issue is unwanted conversion of newlines from one OS convention to another, e.g. bytes that look like \n being converted to \r\n in a binary file. The file on the server is then corrupt, but as far as the HTTPS transfer is concerned, the stream was delivered exactly as it was sent, and there is nothing to detect. The file on the server is nevertheless not the same as the file on the client.
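Here's a tiny self-contained demonstration of that failure mode, using nothing beyond the standard library. The corruption happens before HTTPS ever sees the bytes, yet the digests disagree immediately:

```python
import hashlib

# A "binary" payload that happens to contain newline-like bytes.
original = b"\x89PNG\n\x1a\nimage data\nmore image data"

# Simulate an accidental text-mode round trip: "\n" becomes "\r\n".
corrupted = original.replace(b"\n", b"\r\n")

print(hashlib.sha256(original).hexdigest())
print(hashlib.sha256(corrupted).hexdigest())
# The two digests differ, so a client-supplied hash lets the server
# report the corruption even though the HTTPS transfer itself was flawless.
```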

In such cases, the successfully uploaded file will hash to something other than what the client calculated. If the client provided a hash with its content, the server can inform the client that what was received doesn't match the client's expectation.

11

Hashing alone cannot ensure integrity. If you assume the file can be manipulated by an attacker, then nothing prevents that attacker from also changing the hash to match the new file content. Integrity requires a digital signature or a message authentication code (the former is a public-key solution, the latter a symmetric one).
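As a minimal sketch of the symmetric approach, using Python's standard hmac module (the key here is a placeholder; in practice it would be provisioned out of band):

```python
import hashlib
import hmac

KEY = b"shared-secret-key"  # placeholder; a real key is provisioned out of band
data = b"file content"

# A plain hash can be recomputed by anyone who alters the data...
plain_hash = hashlib.sha256(data).hexdigest()

# ...but a MAC cannot be forged without knowing the key.
tag = hmac.new(KEY, data, hashlib.sha256).hexdigest()

# Verification should use a constant-time comparison.
expected = hmac.new(KEY, data, hashlib.sha256).hexdigest()
print(hmac.compare_digest(tag, expected))  # True
```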

But, yes, TLS already ensures integrity for the network connection between the client and the TLS server. There's no need for extra signatures or message authentication codes unless you require them outside the TLS connection. For example, if the TLS connection is terminated before it reaches the actual server application (at a load balancer or reverse proxy, say), then it may make sense to have application-level integrity protection that covers the entire path from the client app to the server application. But this depends on the exact infrastructure and requirements.

If you mean “integrity” outside the context of security (i.e., protecting data from accidental corruption rather than deliberate manipulation), see the answer by JimmyJames.

10

With HTTPS, data is first encrypted, then decrypted. The data could be damaged before or during encryption, or during or after decryption; to HTTPS, everything may look fine.

Adding a checksum will catch such problems, unless the damage happens before the checksum is calculated or after it is verified.
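A sketch of what the server side can look like under that constraint: the digest is computed while the body is still being read, keeping the unprotected window before the check as small as possible. The stream and the expected-digest value are stand-ins for whatever your web framework hands you:

```python
import hashlib

def verify_upload(stream, expected_sha256_hex, chunk_size=64 * 1024):
    """Hash the upload while reading it; raise if it doesn't match."""
    hasher = hashlib.sha256()
    chunks = []  # a real server would likely spool to disk instead
    while chunk := stream.read(chunk_size):
        hasher.update(chunk)
        chunks.append(chunk)
    if hasher.hexdigest() != expected_sha256_hex:
        raise ValueError("checksum mismatch: upload corrupted or incomplete")
    return b"".join(chunks)
```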

Also, your server might support both HTTP and HTTPS. You'd want a checksum with plain HTTP, and you'd want identical code paths for HTTP and HTTPS; as a consequence of these two demands, HTTPS gets a checksum too.

8

Another factor your method addresses is data integrity in storage. Once the file is uploaded to the server, it's important to ensure that it remains unchanged while stored. Hashing can be used to verify the integrity of the file at rest, allowing checks during storage or before processing.

Yes, data in transit typically poses the greatest risk, but it's equally crucial to protect data at rest, such as files stored in a database.
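A minimal at-rest check might look like this; the storage path is made up, and the recorded digest would be persisted alongside the file (in a database column, say):

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    hasher = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

stored = Path("/var/uploads/report.pdf")  # hypothetical storage location
recorded = sha256_of(stored)              # persist this when the file arrives

# Later, e.g. from a scheduled job or just before processing:
if sha256_of(stored) != recorded:
    raise RuntimeError(f"{stored} changed at rest (bit rot or tampering)")
```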

1
  • 2
    While you are right that protecting against bit rot is important, this doesn't explain why the file should be hashed at client and server and the hashes be compared as OP described. Commented Mar 12 at 7:06
7

Hashing has a secondary purpose of verifying that the data is complete, e.g. that it was not truncated by a connection interrupt. There are other ways to check for this, depending on how much access you have to the server implementation and to transfer-status events, but a hash is still useful as a final sanity check.
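For illustration, a truncated body fails both a simple length check and the hash comparison; the hash has the advantage of also working when no trustworthy length header was sent:

```python
import hashlib

full = b"x" * 1_000_000
truncated = full[:700_000]  # the connection dropped mid-transfer

assert len(truncated) != len(full)  # caught by a length check, if one exists
assert (hashlib.sha256(truncated).hexdigest()
        != hashlib.sha256(full).hexdigest())  # always caught by the hash
```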
