The kernel itself does not need much memory, but it does need a mapping from virtual addresses to physical memory. Once paging is enabled on the processor, the only way to access memory is through the paging mechanism, i.e. the addresses that CPU instructions use are virtual addresses, not physical addresses.
The kernel mapping is needed so that the kernel can access any part of physical memory, for example to zero pages before mapping them into a process's address space.
When Linux was designed in the early 1990s, it targeted the 80386, which offered a 32-bit virtual address space. However, the typical PC at that time didn't have more than 8 megabytes of physical memory. The kernel was designed to create a direct, fixed-offset ("1:1") mapping where the virtual address 0xC0000000 points to physical address 0, the virtual address 0xC0001000 points to physical address 0x1000, and so on, until the end of physical memory. The kernel's code and data use part of this address space, but most of the memory consists of "free" pages that can be allocated to processes.
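The fixed-offset mapping described above comes down to one addition or subtraction. The following is a minimal sketch of that arithmetic for the 3 GB + 1 GB layout; the function names are made up for illustration and are not the kernel's own helpers, and the constant is simply the 0xC0000000 base mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Base of the kernel's direct mapping in the 3 GB + 1 GB layout. */
#define PAGE_OFFSET 0xC0000000u

/* Illustrative helper: kernel virtual address that maps a physical address.
 * Only physical addresses below 1 GB fit into the window above PAGE_OFFSET. */
static uint32_t phys_to_virt(uint32_t phys)
{
    assert(phys <= UINT32_MAX - PAGE_OFFSET);
    return phys + PAGE_OFFSET;
}

/* Illustrative helper: physical address behind a direct-mapped kernel address. */
static uint32_t virt_to_phys(uint32_t virt)
{
    assert(virt >= PAGE_OFFSET);
    return virt - PAGE_OFFSET;
}

/* True if the address lies in the kernel part of the address space. */
static bool is_kernel_address(uint32_t virt)
{
    return virt >= PAGE_OFFSET;
}
```

With these definitions, phys_to_virt(0x1000) yields 0xC0001000, matching the example in the text, and the assertion in phys_to_virt shows why this scheme cannot cover more than 1 GB of physical memory.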
The process address space in this model is mapped to the address space below 3 GB. Several mappings to the same physical memory pages can coexist. A memory page allocated to a running process has at least two mappings: the "kernel view" described above, and the "process view", which uses addresses below the 3 GB mark.
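That one physical page can be visible at two different virtual addresses can be observed from user space as well. The sketch below is only an analogy, not the kernel's own mechanism: it assumes a POSIX system, maps the same page of a temporary file twice with MAP_SHARED, and checks that a write through one view shows up in the other. The function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if a write through one mapping is visible through the other,
 * 0 if it is not, and -1 if the setup failed. */
static int two_views_share_one_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    /* An anonymous temporary file provides the page both views will share. */
    FILE *backing = tmpfile();
    if (backing == NULL)
        return -1;
    int fd = fileno(backing);
    if (ftruncate(fd, page) != 0) {
        fclose(backing);
        return -1;
    }

    /* Two independent mappings of the same file page. */
    char *view_a = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    char *view_b = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (view_a == MAP_FAILED || view_b == MAP_FAILED) {
        if (view_a != MAP_FAILED)
            munmap(view_a, (size_t)page);
        if (view_b != MAP_FAILED)
            munmap(view_b, (size_t)page);
        fclose(backing);
        return -1;
    }

    /* Write through one view, read back through the other. */
    strcpy(view_a, "hello");
    int shared = (view_a != view_b) && (strcmp(view_b, "hello") == 0);

    munmap(view_a, (size_t)page);
    munmap(view_b, (size_t)page);
    fclose(backing);
    return shared;
}
```

The two pointers differ, yet both refer to the same underlying page, which is the same situation as the "kernel view" and the "process view" of a page described above.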
As PCs shipped with increasing amounts of RAM, the (originally luxurious) 1 GB kernel space got cramped, so all kinds of stopgap solutions were introduced. For example, we got the 2 GB + 2 GB split, which let the kernel map 2 GB of RAM, but at the same time limited the process address space to 2 GB. The "final" solution was of course moving to 64-bit processors, which has solved the problem for the time being.
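The trade-off between the two splits is plain arithmetic on a 32-bit address space: whatever lies above the split point belongs to the kernel, and whatever lies below it belongs to the process. A minimal sketch follows, with illustrative function names, assuming the split point is page aligned and ignoring any part of the kernel range that a real kernel reserves for other purposes.

```c
#include <assert.h>
#include <stdint.h>

/* Total size of a 32-bit virtual address space: 4 GB. */
#define ADDRESS_SPACE_SIZE (UINT64_C(1) << 32)

/* Bytes of virtual address space above the split point (kernel side). */
static uint64_t kernel_space_bytes(uint32_t split)
{
    assert((split & 0xFFFu) == 0); /* assumption: split is 4 KB aligned */
    return ADDRESS_SPACE_SIZE - split;
}

/* Bytes of virtual address space below the split point (process side). */
static uint64_t user_space_bytes(uint32_t split)
{
    assert((split & 0xFFFu) == 0);
    return split;
}
```

For a split at 0xC0000000 this gives 1 GB to the kernel and 3 GB to the process; moving the split to 0x80000000 gives 2 GB to each, which is exactly the gain and the cost described above.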