The Wayback Machine - https://web.archive.org/web/20240324153525/https://github.com/github/codeql/issues/11603
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Java] "Deserialization of user-controlled data" is overly broad to be useful to end users #11603

Open
JLLeitschuh opened this issue Dec 7, 2022 · 11 comments
Labels
question Further information is requested

Comments

@JLLeitschuh
Copy link
Contributor

Description of the issue

Looking at all the results that "Deserialization of user-controlled data" returns, from Alibaba JSON.parseObject, to Kryo, to XStream, to Java Deserialization, the results are all over the place. But the advice is incredibly simplistic and only provides a single example on how to fix this.

https://codeql.github.com/codeql-query-help/java/java-unsafe-deserialization/

Deserialization of user-controlled data has a massive impact, usually RCE, but the documentation on how, or even why, you might need to fix it, is basically completely missing.

The one example ObjectInputStream is useful, but how do I fix Kryo, how do I fix Alibaba JSON, how does CodeQL know that a given ObjectMapper from Jackson is configured to be vulnerable? None of this information is captured either in the documentation, or in the query results (for example, what ObjectMapper has enableDefaultTyping() enabled).

I'm a security researcher, and I know how these are vulnerable, but scrolling through the results on LGTM.com (which I know is about to die), most of these, I don't know how I'd fix them.

Additionally, you see results like JSON.parseObject(httpResponse.getHttpContentString()) getting flagged, and from what I can see from documentation, this was potentially fixed? Maybe? I though CodeQL had a policy against flagging vulnerabilities that can be fixed with a dependency update. Why this is getting flagged is inherently unclear.
https://jfrog.com/blog/cve-2022-25845-analyzing-the-fastjson-auto-type-bypass-rce-vulnerability/

In summary, this is a significant vulnerability finding, with potentially massive impact, but the CodeQL documentation doesn't do it justice, and doesn't give valuable information to the end user on determining if they are truly vulnerable, and then how to fix it.

@JLLeitschuh
Copy link
Contributor Author

JLLeitschuh commented Dec 14, 2022

It looks like XStream is now secure by default as well: https://x-stream.github.io/security.html

Looking at CodeQL, it's incorrectly flagging code as vulnerable if there isn't a call to XStream.addPermission(NoTypePermission.NONE). From looking at the above document, this is no longer required.

@smowton
Copy link
Contributor

smowton commented Dec 15, 2022

Is that as of a particular version? Can we recognise if the user is using a new enough version e.g. by looking for new or changed method signatures?

@JLLeitschuh
Copy link
Contributor Author

Can we recognise if the user is using a new enough version e.g. by looking for new or changed method signatures?

I don't know. But the general policy AFAIK has been to always write queries against the latest version and not to assume that the user is using out-of-date vulnerable versions. I believe the idea has been that suggesting dependency updates has been outside of the scope of CodeQL.

@JLLeitschuh
Copy link
Contributor Author

Is that as of a particular version?

From that doc:

XStream itself sets up a whitelist by default, i.e. it blocks all classes except those types it has explicit converters for. Until version 1.4.17 it used a blacklist by default, i.e. it tried to block all currently known critical classes of the Java runtime. Main reason for the blacklist were compatibility, it allowed to use newer versions of XStream as drop-in replacement. However, this approach has failed. A growing list of security reports has proven, that a blacklist is inherently unsafe, apart from the fact that types of 3rd libraries were not even considered. XStream provides the ability to setup a whitelist since version 1.4.7, a version released nine years before 1.4.18. Clients who have adapted their setup and initialize the security framework are able to use newer versions again as drop-in replacement. A blacklist scenario should be avoided in general, because it provides a false sense of security.

@smowton
Copy link
Contributor

smowton commented Dec 16, 2022

👍 ok agreed that we should change our alerting behaviour to match the library's changed characteristics. PR welcome to flip that around, or if you'd prefer to leave it to us I'll make an issue :) Are there libs you're aware of that have gone secure-by-default besides XStream and Kyro?

@JLLeitschuh
Copy link
Contributor Author

Related #11912

@JLLeitschuh
Copy link
Contributor Author

I'd like to propose that we continue to work on this problem, specifically with UnsafeDeserialization. As an example, I'd like to propose that the messaging that the message, which is currently just "Unsafe deserialization depends on a $@.", source.getNode(), "user-provided value" be expanded to provide more detail.

Here are some examples of messages that could be generated to better communicate the reason WHY this vulnerability is present.

  • Unsafe deserialization depends on a [user-provided value]. This use of Faster Jackson is unsafe because there are multiple types on the classpath annotated with @com.fasterxml.jackson.annotation.JsonTypeInfo(CLASS )
  • Unsafe deserialization depends on a [user-provided value]. This use of Faster Jackson is unsafe because there is one class com.example.company.Vulnerable on the classpath annotated with @com.fasterxml.jackson.annotation.JsonTypeInfo(CLASS )
  • Unsafe deserialization depends on a [user-provided value]. This use of Faster Jackson is unsafe because the ObjectMapper used has had enableDefaultTyping() called on it
  • Unsafe deserialization depends on a [user-provided value]. This use of Kryo is unsafe because Kryo.setRegistrationRequired(false was called
  • Unsafe deserialization depends on a [user-provided value]. This use of ObjectInputStream is vulnerable because it does not restrict the types the deserializer is allowed to instantiate. Consider using org.apache.commons.io.serialization.ValidatingObjectInputStream instead.
  • Ect...
@smowton
Copy link
Contributor

smowton commented Jan 17, 2023

We'll need to keep these brief since they're one-line-ish descriptions that get shown inline with source code (the qhelp gets to be more verbose), but where compatible with that constraint I agree this would be useful. For example, the Kyro message could be Unsafe deserialization depends on [user-provided value], and [registration is not required].

@JLLeitschuh
Copy link
Contributor Author

We'll need to keep these brief since they're one-line-ish descriptions that get shown inline

Why? This seems like an artificial limitation. What's the rationale, besides UI elements expecting it? Couldn't this be changed to support more complex formatted reports specific to the project?

@smowton
Copy link
Contributor

smowton commented Jan 18, 2023

Because they look like this: https://github.com/github/codeql/security/code-scanning/2401

If I hack that to include quite a long message it looks like this:

Screenshot 2023-01-18 at 23 07 21

You can see how that would really drown the user in text if there were lots of these in a file, particularly if they were using a narrow display. Also note we have the qhelp below which can give extra detail, so I'd suggest something more like [Unsafe ObjectInputStream instance] deserializes a [user-provided value]. Provide the extra link needed to specify what kind of alert we're raising and why, but leave the more verbose stuff for the qhelp.

@JLLeitschuh
Copy link
Contributor Author

JLLeitschuh commented Jan 19, 2023

You can see how that would really drown the user in text if there were lots of these in a file

What about exposing a header and a 'details' section?

Also, personally, I don't see an issue with this amount of data if we can provide more actionable information in this message with more specific links to locations in the code where the vulnerabilities root cause lies.

For example, the place where a jackson ObjectMapper is instantiated and configured can be far (in terms of data flow steps) from where it is used to deserialize JSON. Being able to link to the specific line where the ObjectMapper::enableDefaultTyping method is called to flag why CodeQL thinks a usage is vulnerable would significantly reduce cycle time on attempting to fix the vulnerability. Certainly the value in decreasing the cycle time of a responder to a vulnerability is worth the cognitive overhead of more information provided to them up-front.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
2 participants