10

Consider a web app that allows the user to upload files to a server. While uploading a file, the app generates a hash from the file content and sends that hash to the server along with the actual file content. The server then generates its own hash from the received content and compares it with the hash it received from the web app. This is done to check file integrity, making sure the uploaded file is the same as the original.

But with HTTPS, is this unnecessary? HTTPS encrypts the data with a key, so if something were altered during transfer, the decryption on the server would fail, right?

3
  • 7
    Are you trying to protect the file content against bit rot (accidental corruption) or against manipulation (by an attacker)? Commented Mar 10 at 21:47
  • 4
    And if you're trying to protect against manipulation, is the attacker the same person who uploads the file or a Man-in-the-Middle? HTTPS will protect you against MitM attacks. Commented Mar 11 at 10:47
  • @Bergi: corruption during network transfer Commented Mar 13 at 11:28

5 Answers

26

But with HTTPS, is this unnecessary? HTTPS encrypts the data with a key, so if something were altered during transfer, the decryption on the server would fail, right?

It's not uncommon for a file-upload service to accept an MD5 hash in a request header. The hash is recalculated after the transfer completes, and if it doesn't match, an error is returned to the client. S3 provides this kind of feature, for example.
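Here is a rough sketch of that client-side flow. The upload URL is hypothetical, but the Content-MD5 header itself is the standard mechanism (RFC 1864) that services like S3 verify:

```python
import base64
import hashlib

import requests  # third-party HTTP client (pip install requests)

# Hypothetical endpoint; S3 accepts the same Content-MD5 header on PUT.
UPLOAD_URL = "https://files.example.com/upload/report.pdf"

with open("report.pdf", "rb") as f:
    body = f.read()

# Content-MD5 is the base64-encoded 128-bit MD5 digest of the request body.
content_md5 = base64.b64encode(hashlib.md5(body).digest()).decode("ascii")

resp = requests.put(UPLOAD_URL, data=body, headers={"Content-MD5": content_md5})
resp.raise_for_status()  # the service rejects the upload if the digests differ
```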

This is not redundant, but the point isn't security. What you are missing is that the HTTPS mechanisms you describe only protect the data while it is in transit. Many things can go wrong that HTTPS will not prevent. Your assumption is a bit like thinking that if your code compiles, it must execute properly.

For example, suppose your client fails to read the full content of the file, or something is corrupted while reading from disk. A common issue is unwanted conversion of newlines from one OS convention to another, e.g. bytes that look like \n being converted to \r\n in a binary file. The file on the server is then corrupt, but as far as the HTTPS transfer is concerned, the stream was delivered exactly as it was sent, and there is nothing to detect. The file on the server is nevertheless not the same as the file on the client.
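Here's a tiny self-contained demonstration of that failure mode, using nothing beyond the standard library. The corruption happens before HTTPS ever sees the bytes, yet the digests disagree immediately:

```python
import hashlib

# A "binary" payload that happens to contain newline-like bytes.
original = b"\x89PNG\n\x1a\nimage data\nmore image data"

# Simulate an accidental text-mode round trip: "\n" becomes "\r\n".
corrupted = original.replace(b"\n", b"\r\n")

print(hashlib.sha256(original).hexdigest())
print(hashlib.sha256(corrupted).hexdigest())
# The two digests differ, so a client-supplied hash lets the server
# report the corruption even though the HTTPS transfer itself was flawless.
```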

In such cases, the successfully uploaded file will hash to something other than what the client calculated. If the client provided a hash with its content, the server can inform the client that what was received doesn't match the client's expectation.

11

Hashing alone cannot ensure integrity. If you assume the file can be manipulated by an attacker, then nothing prevents that attacker from also changing the hash to match the new file content. Integrity requires a digital signature or a message authentication code (the former is a public-key solution, the latter a symmetric one).
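As a minimal sketch of the symmetric approach, using Python's standard hmac module (the key here is a placeholder; in practice it would be provisioned out of band):

```python
import hashlib
import hmac

KEY = b"shared-secret-key"  # placeholder; a real key is provisioned out of band
data = b"file content"

# A plain hash can be recomputed by anyone who alters the data...
plain_hash = hashlib.sha256(data).hexdigest()

# ...but a MAC cannot be forged without knowing the key.
tag = hmac.new(KEY, data, hashlib.sha256).hexdigest()

# Verification should use a constant-time comparison.
expected = hmac.new(KEY, data, hashlib.sha256).hexdigest()
print(hmac.compare_digest(tag, expected))  # True
```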

But, yes, TLS already ensures integrity for the network connection between the client and the TLS server. There's no need for extra signatures or message authentication codes unless you require them outside the TLS connection. For example, if the TLS connection is terminated before it reaches the actual server application (at a load balancer or reverse proxy, say), then it may make sense to have application-level integrity protection that covers the entire path from the client app to the server application. But this depends on the exact infrastructure and requirements.

If you mean “integrity” outside the context of security (i.e., protecting data from accidental corruption rather than deliberate manipulation), see the answer by JimmyJames.

10

With HTTPS, data is first encrypted, then decrypted. The data could be damaged before or during encryption, or during or after decryption; to HTTPS, everything may look fine.

Adding a checksum will catch such problems, unless the damage happens before the checksum is calculated or after it is verified.
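A sketch of what the server side can look like under that constraint: the digest is computed while the body is still being read, keeping the unprotected window before the check as small as possible. The stream and the expected-digest value are stand-ins for whatever your web framework hands you:

```python
import hashlib

def verify_upload(stream, expected_sha256_hex, chunk_size=64 * 1024):
    """Hash the upload while reading it; raise if it doesn't match."""
    hasher = hashlib.sha256()
    chunks = []  # a real server would likely spool to disk instead
    while chunk := stream.read(chunk_size):
        hasher.update(chunk)
        chunks.append(chunk)
    if hasher.hexdigest() != expected_sha256_hex:
        raise ValueError("checksum mismatch: upload corrupted or incomplete")
    return b"".join(chunks)
```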

Also, your server might support both HTTP and HTTPS. You'd want a checksum with plain HTTP, and you'd want identical code paths for HTTP and HTTPS; as a consequence of these two demands, HTTPS gets a checksum too.

8

Another factor your method addresses is data integrity in storage. Once the file is uploaded to the server, it's important to ensure that it remains unchanged while stored. Hashing can be used to verify the integrity of the file at rest, allowing checks during storage or before processing.

Yes, data in transit typically poses the greatest risk, but it's equally crucial to protect data at rest, such as files stored in a database.
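A minimal at-rest check might look like this; the storage path is made up, and the recorded digest would be persisted alongside the file (in a database column, say):

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    hasher = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

stored = Path("/var/uploads/report.pdf")  # hypothetical storage location
recorded = sha256_of(stored)              # persist this when the file arrives

# Later, e.g. from a scheduled job or just before processing:
if sha256_of(stored) != recorded:
    raise RuntimeError(f"{stored} changed at rest (bit rot or tampering)")
```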

1
  • 2
    While you are right that protecting against bit rot is important, this doesn't explain why the file should be hashed at client and server and the hashes be compared as OP described. Commented Mar 12 at 7:06
7

Hashing has a secondary purpose of verifying that the data is complete, e.g. that it was not truncated by a connection interrupt. There are other ways to check for this, depending on how much access you have to the server implementation and to transfer-status events, but a hash is still useful as a final sanity check.
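For illustration, a truncated body fails both a simple length check and the hash comparison; the hash has the advantage of also working when no trustworthy length header was sent:

```python
import hashlib

full = b"x" * 1_000_000
truncated = full[:700_000]  # the connection dropped mid-transfer

assert len(truncated) != len(full)  # caught by a length check, if one exists
assert (hashlib.sha256(truncated).hexdigest()
        != hashlib.sha256(full).hexdigest())  # always caught by the hash
```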
