The DISPLAY environment variable tells an application how to connect to the X server. The X server is the part of the system that displays windows on the screen. A display is something on which windows can be displayed.
A unix system can have multiple displays, for example on multiple virtual consoles, or on multiple real consoles, or because some displays are virtual, or because some displays are accessed over the network. ssh -X forwards a connection to a remote display over the network.
Each display has a number. The purpose of these numbers is just to tell the displays apart. :0 is display number 0, :1 is display number 1, etc. One of the ways the display number is used is in how applications connect to the X server: it is used to calculate the name of a socket file (/tmp/.X11-unix/X0, …) or the number of a TCP port (6000 plus the display number) on which the X server listens.
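The mapping from display number to connection endpoint can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the socket directory is assumed to be the conventional /tmp/.X11-unix.

```python
# Illustrative sketch: how a display number maps to a socket file or a TCP port.
# The names below are hypothetical helpers, not part of any X library.

X11_SOCKET_DIR = "/tmp/.X11-unix"  # assumed conventional location
X11_TCP_BASE_PORT = 6000           # TCP port for display number 0


def socket_path(display_number: int) -> str:
    """Return the Unix socket path for a local display number."""
    return f"{X11_SOCKET_DIR}/X{display_number}"


def tcp_port(display_number: int) -> int:
    """Return the TCP port for a display number (6000 plus the number)."""
    return X11_TCP_BASE_PORT + display_number


if __name__ == "__main__":
    for n in (0, 1, 10):
        print(f":{n} -> {socket_path(n)} or TCP port {tcp_port(n)}")
```

For instance, display :10 (the first number SSH typically uses) corresponds to TCP port 6010.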
The .0 part after the display number is an obsolete concept. It's a “screen” number, where a display can consist of multiple screens, and a window is tied to a particular screen. On modern systems, the X server presents a single screen and allows application windows to move between monitors. :NUMBER is equivalent to :NUMBER.0.
There can be a machine name before the colon. This allows TCP communication between the application and the X server. This communication is not protected against network snooping and man-in-the-middle attacks, so it's mostly deprecated on real networks, but can be useful in some cases, for example on a network between virtual machines running on the same host. In practice, if both work, localhost:NUMBER is functionally equivalent to :NUMBER (but :NUMBER may use a faster communication mechanism under the hood, and it's possible for only one of them to work because not all X servers listen both locally and via TCP; it's even technically possible to have different servers listening on localhost:NUMBER and :NUMBER but that would be a misconfiguration somewhere).
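To make the HOST:NUMBER.SCREEN structure concrete, here is a minimal parser sketch. It is deliberately simplified and is not how Xlib parses the variable: it assumes the host part contains no colon (so bracketed IPv6 addresses and other unusual forms are not handled), and `DisplayName` and `parse_display` are hypothetical names chosen for this example.

```python
import re
from typing import NamedTuple


class DisplayName(NamedTuple):
    host: str     # empty string means a local connection
    display: int  # the display number
    screen: int   # the screen number, 0 when omitted


# Simplified pattern: optional host, a colon, display number, optional ".screen".
_DISPLAY_RE = re.compile(r"^(?P<host>[^:]*):(?P<display>\d+)(?:\.(?P<screen>\d+))?$")


def parse_display(value: str) -> DisplayName:
    """Split a DISPLAY value such as 'localhost:10.0' into its parts."""
    m = _DISPLAY_RE.match(value)
    if m is None:
        raise ValueError(f"unrecognized DISPLAY value: {value!r}")
    # A missing screen part is treated as screen 0, so :N equals :N.0.
    return DisplayName(m["host"], int(m["display"]), int(m["screen"] or 0))


if __name__ == "__main__":
    for value in (":0", ":1.0", "localhost:10.0"):
        print(value, "->", parse_display(value))
```

With this sketch, `:1` and `:1.0` parse to the same value, while `localhost:10.0` differs from `:10.0` only in the host field.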
The entity that creates an X display must choose a display number. (It might do that by letting the server decide, but if so it needs to find out what number the server picked in order to set DISPLAY for applications.) Most programs pick the lowest available number, or a number that's hard-coded in some configuration file. In order to leave room for physical displays, SSH only picks numbers starting at 10.
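The "lowest available number" policy can be sketched as follows. This is only a model of the idea: `pick_display_number` and its `in_use` argument are hypothetical, and a real program would discover which numbers are taken by trying to create the socket or lock file rather than by consulting a set. The `start=10` case mirrors SSH's behavior of starting at 10.

```python
from typing import Iterable


def pick_display_number(in_use: Iterable[int], start: int = 0) -> int:
    """Return the lowest display number >= start that is not in use."""
    taken = set(in_use)
    n = start
    while n in taken:
        n += 1
    return n


if __name__ == "__main__":
    # A local server with displays :0 and :1 already running.
    print(pick_display_number({0, 1}))
    # An SSH-like policy: skip the low numbers reserved for physical displays.
    print(pick_display_number({0, 1, 10}, start=10))
```

Whatever number is chosen, it is the creator of the display that knows it, which is why the creator is also the one that should set DISPLAY.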
It's generally not correct to set DISPLAY manually because only the entity that creates the display can pick the number. For example, in the case of ssh -X, that's SSH itself. If you set DISPLAY manually, you might get the number wrong, or you might advertise a display that doesn't exist (for example if X11 forwarding was refused).
Other users, or users on other machines, could attempt to connect to an X display that isn't their own. Because X was designed to allow remote connection, it couldn't just rely on the unix user. So X has an authorization mechanism: when an application wants to connect to an X display, it must prove that it is authorized. In the modern world, there's just one authorization mechanism, which is a “cookie” in the MIT-MAGIC-COOKIE-1 format (16 bytes, presented in hexadecimal). The cookie is a long random string that is generated when the server starts and stored in a file that only the legitimate user can read. In order to connect to the X display, the application must send the cookie value to the server. If the cookie value is incorrect, the server rejects the connection.
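The cookie idea can be illustrated with a short sketch. This is not the X11 wire protocol or the server's actual code; it only models "generate a random 16-byte secret, then accept a client only if it presents the same value". The function names are invented for this example, and the constant-time comparison is a choice made here, not a claim about what X servers do.

```python
import hmac
import secrets

COOKIE_BYTES = 16  # a MIT-MAGIC-COOKIE-1 value is 16 bytes (32 hex digits)


def generate_cookie() -> bytes:
    """Generate a fresh random cookie, as would happen when a server starts."""
    return secrets.token_bytes(COOKIE_BYTES)


def cookie_matches(expected: bytes, presented: bytes) -> bool:
    """Accept the connection only if the presented cookie equals the stored one."""
    return hmac.compare_digest(expected, presented)


if __name__ == "__main__":
    server_cookie = generate_cookie()
    print("cookie (hex):", server_cookie.hex())
    print("right cookie accepted:", cookie_matches(server_cookie, server_cookie))
    print("wrong cookie accepted:", cookie_matches(server_cookie, generate_cookie()))
```

On a real system the stored value can be inspected with `xauth list`, which prints the cookie in hexadecimal next to the display name.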
localhost:10.0 and not localhost:1.0?" A: To leave space for 10 local X11 servers (0..9), assuming they still want to listen on a TCP socket (which they no longer do by default).