Often I see developers debating using String vs. string as if it’s a simple style decision. No different than
discussing the position of braces, tabs vs. spaces, etc … A meaningless distinction where there is no right answer,
just finding a decision everyone can agree on. The debate between String and string though is not a simple
style debate, instead it has the potential to radically change the semantics of a program.
The keyword string has concrete meaning in C#. It is the type System.String which exists in the core runtime
assembly. The runtime intrinsictly understands this type and provides the capabilities developers expect for strings
in .NET. Its presence is so critical to C# that if that type doesn’t exist the compiler will exit before attempting to
even parse a line of code. Hence string has a precise, unambiguous meaning in C# code.
The identifier String though has no concrete meaning in C#. It is an identifier that goes through all the name
lookup rules as Widget, Student, etc … It could bind to string or it could bind to a type in another assembly
entirely whose purposes may be entirely differnt than string. Worse it could be defined in a way such that code
like String s = "hello"; continued to compile.
class TricksterString {
void Example() {
String s = "Hello World"; // Okay but probably not what you expect.
}
}
class String {
public static implicit operator String(string s) => null;
}
The actual meaning of String will always depend on name resolution. That means it depends on all the source files in the
project and all the types defined in all the referenced assemblies. In short it requires quite a bit of context to
know what it means.
True that in the vast majority of cases String and string will bind to the same type. But using String still
means developers are leaving their program up to interpretation in places where there is only one correct answer. When
String does bind to the wrong type it can leave developers debugging for hours, filing bugs on the compiler team
and generally wasting time that could’ve been saved by using string.
Another way to visualize the difference is with this sample:
string s1 = 42; // Errors 100% of the time
String s2 = 42; // Might error, might not, depends on the code
Many will argue that while this is information technically accurate using String is still fine because it’s
exceedingly rare that a code base would define a type of this name. Or that when String is defined it’s a sign of a
bad code base.
The reality though is quite different. Defining String happens with some regularity as is demonstrated by the
following BigQuery:
SELECT
sample_path, sample_repo_name
FROM `fh-bigquery.github_extracts.contents_net_cs`
WHERE
NOT STRPOS(sample_repo_name, 'coreclr') > 0
AND NOT STRPOS(sample_repo_name, 'corefx') > 0
AND NOT STRPOS(sample_repo_name, 'roslyn') > 0
AND NOT STRPOS(sample_repo_name, 'corert') > 0
AND NOT STRPOS(sample_repo_name, 'mono') > 0
AND STRPOS(content, 'class String ') > 0
LIMIT 100
Looking through these results you’ll see that String is defined for a number of completely valid purposes:
reflection helpers, serialization libraries, lexers, protocols, etc … For any of these libraries String vs.
string has real consequences depending on where the code is used.
So remember when you see the String vs. string debate this is about semantics, not style. Choosing string gives
crisp meaning to your code base. Choosing String isn’t wrong but it’s leaving the door open for surprises in the
future.
Note: This discussion is not limited to string. It also applies to object, int, long, short, etc …
essentially any of the type keywords introduced in C# 1.0.