
TLS and networking


This is the third post in a series: “The absolute minimum every Python web application developer must know about security.”

Transport Layer Security

TLS (Transport Layer Security), the compatible successor to SSL (Secure Socket Layer), is the basis of “https” secure web traffic and provides authenticated encryption.

Obsolete versions of TLS permit insecure algorithms, so we have to ensure we only support current, secure versions of TLS. Penetration testing or automated scanning tools can verify this.

TLS is a hybrid cryptosystem: it uses both symmetric and asymmetric algorithms in unison. For example, asymmetric algorithms such as signature algorithms can be used to authenticate peers, while public key encryption algorithms or Diffie-Hellman exchanges can be used to negotiate shared secrets and authenticate certificates. On the symmetric side, stream ciphers (both native ones and block ciphers in a mode of operation) are used to encrypt the actual data being transmitted, and MAC algorithms are used to authenticate that data. TLS is the world's most common cryptosystem, and hence probably also the most studied.

– Crypto 101

The part of your system handling the TLS protocol is said to be doing “TLS termination”. If you terminate TLS at the first entry point to your network, traffic on your internal network may be unencrypted.

The four major components of TLS are:

  1. The handshake
  2. Certificate verification
  3. Encryption (ciphers)
  4. Data validation (MACs)

Handshake

The purpose of a TLS handshake is to establish a secure connection between a client and a server by exchanging messages to:

  • Verify each other: The client verifies the server’s certificate and checks that the server holds the certificate-associated private key. (With mutual TLS the server also verifies the client’s certificate.)
  • Agree on session keys: The two parties generate session keys that will encrypt and decrypt all communications during the session. 
  • Establish cryptographic algorithms: The two parties agree on the cryptographic algorithms they will use. 
  • Authenticate the server: The server proves its identity to the client using public keys. 
  • Sign data: The data is signed with a message authentication code (MAC) to ensure its integrity.

Several different cryptography algorithms are used here.

Earlier versions of TLS (SSL 2.0 had unauthenticated handshakes) were vulnerable to a “downgrade attack” in the handshake, where an attacker could force a weaker encryption algorithm to be used. Your TLS termination technology (the part of your system that receives the TLS communication) must not support obsolete versions of TLS or insecure cryptography algorithms.
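With Python’s standard library ssl module, for example, a client context can be configured to refuse obsolete protocol versions. A minimal sketch:

```python
import ssl

# Create a client context with secure defaults (certificate
# verification and hostname checking are enabled).
context = ssl.create_default_context()

# Refuse to negotiate anything older than TLS 1.2, ruling out
# downgrades to obsolete protocol versions.
context.minimum_version = ssl.TLSVersion.TLSv1_2

print(context.minimum_version.name)  # TLSv1_2
```

Recent Python versions already default to TLS 1.2 as the minimum, but setting it explicitly documents the intent and protects you on older runtimes.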

Certificate Verification

TLS certificates can be used to authenticate peers, but how do we authenticate the certificate itself? This is done by the standard TLS trust model of certificate authorities. TLS clients come with a list of trusted certificate authorities, commonly shipped with your operating system or your browser. These authorities will only sign a certificate, for a fee, after verifying the identity of the party requesting it. To fake a certificate, one of the certificate authorities would have to be compromised.

When a TLS client connects to a server, that server provides a certificate chain. Typically, their own certificate is signed by an intermediary CA certificate, which is signed by another, and another, and one that is signed by a trusted root certificate authority. Since the client already has a copy of that root certificate, they can verify the signature chain starting with the root.

– Crypto 101

By default most network libraries, like requests or httpx, will refuse to communicate with a server when certificate verification fails. A failure could indicate a “man in the middle” (MITM) attack, where you are not talking to who you think you are talking to, or your communications have been intercepted and are being snooped on.

If you want to sign your own certificates you’ll have to set yourself up as a certificate authority on your systems. 

For local development, where you’re probably using a self-signed certificate, this verification can be switched off.

import httpx

# Making a GET request with certificate verification disabled.
# Only do this for local development, never in production.
r = httpx.get("https://example.org", verify=False)

Client certificates exist in TLS so that both sides of the communication can verify each other’s identity, but they are not widely used; mTLS (covered below) is the main exception.

TLS certificates are called X.509 certificates and a Certificate Authority (CA) will require a Certificate Signing Request (CSR) to create a certificate. The cryptography library has a tutorial with code showing you how to do this:

This tutorial also shows you how to create a root CA with a signing intermediary.

Self-signed Certificates

For TLS communication in development environments you may be happy to switch off certificate verification and use a self-signed certificate. Browsers will show a warning that these are insecure.

You can use tools from the cryptography library to generate self-signed certificates, in the form of .pem files used by the server.

We start by creating a new private key:

>>> import datetime
>>> from cryptography import x509
>>> from cryptography.x509.oid import NameOID
>>> from cryptography.hazmat.primitives import hashes, serialization
>>> from cryptography.hazmat.primitives.asymmetric import rsa
>>>
>>> # Generate our key
>>> key = rsa.generate_private_key(
...     public_exponent=65537,
...     key_size=2048,
... )
>>> # Write our key to disk for safe keeping
>>> with open("path/to/store/key.pem", "wb") as f:
...     f.write(key.private_bytes(
...         encoding=serialization.Encoding.PEM,
...         format=serialization.PrivateFormat.TraditionalOpenSSL,
...         encryption_algorithm=serialization.BestAvailableEncryption(b"passphrase"),
...     ))

Then we generate the certificate itself:

>>> # Various details about who we are. For a self-signed certificate the
>>> # subject and issuer are always the same.
>>> subject = issuer = x509.Name([
...     x509.NameAttribute(NameOID.COUNTRY_NAME, "US"),
...     x509.NameAttribute(NameOID.STATE_OR_PROVINCE_NAME, "California"),
...     x509.NameAttribute(NameOID.LOCALITY_NAME, "San Francisco"),
...     x509.NameAttribute(NameOID.ORGANIZATION_NAME, "My Company"),
...     x509.NameAttribute(NameOID.COMMON_NAME, "mysite.com"),
... ])
>>> cert = x509.CertificateBuilder().subject_name(
...     subject
... ).issuer_name(
...     issuer
... ).public_key(
...     key.public_key()
... ).serial_number(
...     x509.random_serial_number()
... ).not_valid_before(
...     datetime.datetime.now(datetime.timezone.utc)
... ).not_valid_after(
...     # Our certificate will be valid for 10 days
...     datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(days=10)
... ).add_extension(
...     x509.SubjectAlternativeName([x509.DNSName("localhost")]),
...     critical=False,
... # Sign our certificate with our private key
... ).sign(key, hashes.SHA256())
>>> # Write our certificate out to disk.
>>> with open("path/to/certificate.pem", "wb") as f:
...     f.write(cert.public_bytes(serialization.Encoding.PEM))

And now we have a private key and certificate that can be used for local testing.

Using certifi for certificates

System certificate stores can become out of date, causing communication breakdowns as newer certificates cannot be verified. An alternative is to use the certifi package, which bundles an up-to-date certificate collection from Mozilla.

To reference the installed certificate authority (CA) bundle, you can use the provided where function:

>>> import certifi
>>> certifi.where()
'/usr/local/lib/python3.12/site-packages/certifi/cacert.pem'

Example of using certifi certificates when fetching OAuth 2 JWT signing keys:

import ssl

import certifi
import jwt  # PyJWT

jwks_url = (
    "https://login.microsoftonline.com/"
    f"{tenant_id}/discovery/v2.0/keys"
)
ssl_context: ssl.SSLContext = ssl.create_default_context(cafile=certifi.where())
jwks_client = jwt.PyJWKClient(jwks_url, ssl_context=ssl_context)

mTLS

Mutual TLS (mTLS) is a protocol that verifies the identities of both a client and a server before they can communicate. It’s an industry standard that uses X.509 certificates and the TLS (Transport Layer Security) encryption protocol to ensure that traffic is secure.

mTLS can be used within networks as well as across them, facilitating a zero trust architecture as part of a Defence in Depth approach, providing network security (authentication and encryption) even in the event of a network breach.

Networking

When we’re discussing security a large part of what we’re talking about is network security and the threats that come with being connected to the internet. We can’t understand security without understanding something of networking, how application servers run, how they are connected to the internet, and so on.

Network Layers

In the 1970s, the International Organization for Standardization (ISO) began work on the Open Systems Interconnection (OSI) model for networking. Under this framework, each of the seven layers relies only on the one immediately below it, rather than directly on the entire stack.

Each layer is orthogonal in its concerns to those implemented in layers either higher or lower than it. Security concerns can arise at all layers, but for the most part the specific threats and requirements can be analysed and isolated to the single layer of current concern. Many protocols with familiar names exist at each OSI layer.

In quick summary, we can think of the layers as:

  1. Physical: RJ45 plug, 100Base-T, Bluetooth, etc.
  2. Data link: Ethernet, 802.11, etc.
  3. Network: IP, IPSec, etc.
  4. Transport: TCP, UDP, QUIC, etc.
  5. Session: Sockets
  6. Presentation: MIME, TLS/SSL, etc.
  7. Application: HTTP, HTTPS, FTP, SMTP, DNS, NTP, etc.

Web developers are primarily concerned about security at layer 7, but an awareness of what contributes to the application layer being possible at all is worthwhile.

Application Protocols

Network communication happens using protocols like HTTP (HyperText Transfer Protocol), secured with TLS, across networks using TCP/IP (Transport Control Protocol/Internet Protocol), DHCP (Dynamic Host Configuration Protocol) and DNS (Domain Name System), via sockets in an app server process running in a Kubernetes pod, on a virtual machine, in a VLAN (Virtual Local Area Network), as part of an SDN (Software Defined Network), running in the cloud. That is one common example.

HTTP is a text based protocol allowing client machines (running a browser or perhaps another service) to communicate with servers and exchange information. We can build REST (REpresentational State Transfer) APIs on top of HTTP, using JSON or XML to exchange data. 

This doesn’t make up all network communication, there is also for example using UDP (User Datagram Protocol) to communicate instead of TCP; but API servers exposing an API over HTTP (secured with TLS) is the more common case. 

There are protocols involved at every stage. Spolsky’s Law of Leaky Abstractions tells us that in order to work at any level of abstraction we need to have some understanding of the layer below. Because abstractions leak. When things go wrong the only way to understand what is happening is to understand the current layer of abstraction in terms of its implementation, what it is doing in the layer below.

Sockets

Essentially all networking is done via sockets: Berkeley sockets, which originated in release 4.2 of the BSD operating system in 1983. They’re used for communicating from client to server, within the server to listen for client connections, for interprocess communication (IPC) on the same machine, and anywhere else network communication happens.

Python exposes a low-level interface to sockets through the socket module. It’s not common to use this directly, but other libraries like web servers and web clients (urllib and http.server in the standard library, for example) are built on top of it. The basics of socket communication are very simple, however, and it’s easy to build simple servers and clients directly with sockets.

Sockets are either TCP or UDP. TCP is slower but more reliable. It verifies that the client received all the data, which is broken into packets for communication, and got it in the right order. UDP does not guarantee this, but is used for broadcast or where performance is important and it doesn’t matter if a few packets are dropped or reordered along the way.
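The difference is visible in the standard socket module: SOCK_DGRAM gives a UDP socket that sends individual datagrams with no connection and no delivery guarantees. A minimal sketch on the loopback interface:

```python
import socket

# SOCK_DGRAM selects UDP: no connection, no ordering or
# delivery guarantees, just individual datagrams.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))    # port 0: the OS picks a free port
address = server.getsockname()

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"ping", address)  # fire and forget: no handshake

data, sender = server.recvfrom(1024)  # read one datagram
print(data)  # b'ping'

client.close()
server.close()
```

Note there is no accept or connect step as there is with TCP; on a real network the datagram could simply be lost.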

Packets contain:

  • A header (containing):
    • Source IP address: The client’s IP address.
    • Destination IP address: The server’s IP address.
    • Source Port: The client’s socket port.
    • Destination Port: The server’s socket port (e.g., port 80 for HTTP).
    • Packet Number: so packets can be reassembled in the right order
    • Protocol: information firewalls use to permit or deny traffic based on allowed protocols
  • The Payload: the data being transmitted

In order to communicate we need a system of addressing. For that we use IPv4 or IPv6. IPv4 uses “dotted-quad” numbers to specify addresses; “127.0.0.1” is the loopback address, also known as “localhost”. IPv6 uses 128 bit addresses, giving it a much bigger address space. In IPv6 “::1” is the loopback address.

IPv6 addresses are written as eight segments of hexadecimal numbers separated by colons. For example, 2001:db8:3333:4444:5555:6666:7777:8888.

IPv6 addresses can be shortened by removing leading zeros or replacing consecutive sections of zeros with two colons (::). For example, 2001:0db8:0000:0000:0000:7a6e:0680:9668 can be shortened to 2001:db8::7a6e:680:9668.
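The standard library ipaddress module applies these shortening rules for you:

```python
import ipaddress

addr = ipaddress.ip_address("2001:0db8:0000:0000:0000:7a6e:0680:9668")
print(addr.compressed)   # 2001:db8::7a6e:680:9668
print(addr.exploded)     # 2001:0db8:0000:0000:0000:7a6e:0680:9668

# The IPv6 loopback address
print(ipaddress.ip_address("::1").is_loopback)  # True
```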

Routing of packets is done using protocols like RIP (Routing Information Protocol) which are only rarely directly relevant to application developers.

If the communication is not on the same machine it will go via a “network interface”. Each network interface is on a network, and any machine may have several network interfaces.

As well as an IP address we also need a port number. Ports allow computers to use a single physical network connection for many incoming and outgoing requests.

Port numbers range from 0 to 65,535. Non-ephemeral ports are fixed and associated with a service or server (port 443 for HTTPS, port 22 for SSH, port 25 for SMTP etc), while ephemeral ports (49152 to 65535) are temporary and used on a client to communicate with servers.
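You can see ephemeral port assignment from Python by binding a socket to port 0, which asks the operating system to choose a free port:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("127.0.0.1", 0))   # port 0: let the OS choose a free port
host, port = sock.getsockname()
print(port)                   # an OS-assigned port number
sock.close()
```

This is exactly what a client socket does implicitly when it connects to a server: the OS assigns it a temporary source port for the conversation.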

Here’s code for an echo socket server in Python.

import socket

# Create a TCP/IP socket
# AF_INET means use IPv4
# SOCK_STREAM specifies TCP instead of UDP
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Bind the socket to the port
server_address = ('localhost', 10000)
print('starting up on {} port {}'.format(*server_address))
sock.bind(server_address)

# Listen for incoming connections
sock.listen(1)

while True:
    # Wait for a connection
    print('waiting for a connection')
    connection, client_address = sock.accept()
    with connection:
        print('connection from', client_address)
        # Receive the data in small chunks and retransmit it
        while True:
            data = connection.recv(16)
            print('received {!r}'.format(data))
            if data:
                print('sending data back to the client')
                connection.sendall(data)
            else:
                print('no data from', client_address)
                break

This code creates a TCP/IPv4 socket, binds it to an address (localhost on port 10000) and listens for connections. The call to “sock.accept” blocks until we receive a connection; when we get a client connection we can send and receive data.

This is all synchronous, blocking, code. Async programming uses non-blocking system calls to access sockets. These low level calls are available as methods on the event loop object, but you’ll almost always use higher level abstractions – provided by your web application framework or client library – to access them.

Here’s the client code to connect to the echo server:

import socket

# Create a TCP/IP socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Connect the socket to the port where the server is listening
server_address = ('localhost', 10000)
print('connecting to {} port {}'.format(*server_address))
sock.connect(server_address)

with sock:
    # Send data
    message = b'This is the message.  It will be repeated.'
    print(f'sending {message!r}')
    sock.sendall(message)

    # Look for the response
    amount_received = 0
    amount_expected = len(message)
    while amount_received < amount_expected:
        data = sock.recv(16)
        amount_received += len(data)
        print(f'received {data!r}')

The client uses sock.connect instead of the sock.bind we used in the server code. 

More complex protocols, like HTTP, are layered on top of socket communication like this. You can use a simple socket connection tool like telnet to connect to a server and type the HTTP protocol text by hand; HTTP is based on the request-response cycle of a client communicating with a server.

To use TLS for secure socket communication we can use the ssl module to “wrap sockets”:

import socket
import ssl

HOST = "www.agileabstractions.com"
PORT = 443
GET = f"GET / HTTP/1.1\r\nHost: {HOST}\r\n\r\n"

context = ssl.create_default_context()
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s_sock = context.wrap_socket(s, server_hostname=HOST)

with s_sock:
    s_sock.connect((HOST, PORT))
    s_sock.send(GET.encode())
    while True:
        data = s_sock.recv(2048)
        if not data:
            break
        print(data)

A server socket listening with TLS over TCP and IPv4 needs access to the certificate chain and the private key:

import socket
import ssl

context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
context.load_cert_chain('/path/to/certchain.pem', '/path/to/private.key')

with socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0) as sock:
    sock.bind(('127.0.0.1', 8443))
    sock.listen(5)
    with context.wrap_socket(sock, server_side=True) as ssock:
        conn, addr = ssock.accept()
        ...

DNS

The Domain Name System is how a web address like “www.agileabstractions.com” is turned into an IP address (either IPv4 or IPv6) to enable socket communication. Every domain has registered “nameservers” that specify addresses for the domain.

The nameserver can hold several kinds of records for a domain: A records (IPv4 addresses), AAAA records (IPv6 addresses), CNAME records (aliases), MX records (mail servers), TXT records, and more.

DNS Servers convert domain addresses into IP addresses by looking up the nameserver and fetching the address for a domain; so to “resolve” an address a socket has to query a DNS Server. 

We can do this from Python with the socket.getaddrinfo function:

import socket

# Query for socket info - criteria is IPv4, TCP
address_info = socket.getaddrinfo(
    host="example.com", port=80,
    family=socket.AF_INET, proto=socket.IPPROTO_TCP
)

# Print socket info
print(address_info)

DNS relies on the DNS Server addresses being configured on your machine making the query (/etc/resolv.conf on Linux) and the correct DNS records being published for the domain.

Reverse lookups, hostname from IP address, can be done with socket.gethostbyaddr:

>>> import socket
>>> socket.gethostbyaddr("69.59.196.211")
('stackoverflow.com', ['211.196.59.69.in-addr.arpa'], ['69.59.196.211'])

Request Response Cycle

REST APIs are exposed over HTTP, meaning standard HTTP server frameworks can be used to build API servers and standard HTTP client libraries used to write API clients. The Request-Response cycle is the level of abstraction that developers usually work with. Below this is HTTP, below that sockets, and below that are packets and transport protocols and networking.

HTTP is a text based protocol over TCP/IP and it is based on the “request -> response” model. The client makes a request specifying the request “method” and the resource as a path, along with any parameters for the request – the server replies with a response. 

The request and response both contain headers (language, user-agent, cookies, compression, encoding, etc are specified in headers). Some request methods, and most responses, include a “body” and responses have a status code.

The basis of REST is that several of the request methods roughly correspond to the basic actions of a CRUD (Create/Read/Update/Delete) application with a database:

  • POST: Creates new resources (Create)
  • GET: Reads a representation of a resource (Read)
  • PUT: Replaces data (Update by replacing)
  • PATCH: Modifies resources (Update)
  • DELETE: Deletes a resource (Delete)

A normal request to fetch a web page/resource is a GET request. A form submission is a POST request.

A request is sent as text consisting of:

  • A request method (and http version)
  • A location on the server (a path)
  • Headers
  • Optionally data (message body or parameters)

Example GET request:

GET /hello.html HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Host: www.agileabstractions.com
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: Keep-Alive

A single server may serve resources for several domains, so the host domain is specified as the Host header in the request. Server configuration will tell the server which applications should handle requests for specific domains.
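Because HTTP is plain text, a request like the one above can be pulled apart with ordinary string handling. A sketch (real servers use a proper parser, but the structure is the same):

```python
raw_request = (
    "GET /hello.html HTTP/1.1\r\n"
    "Host: www.agileabstractions.com\r\n"
    "Accept-Language: en-us\r\n"
    "\r\n"
)

# Headers are separated from the (optional) body by a blank line
head, _, body = raw_request.partition("\r\n\r\n")
request_line, *header_lines = head.split("\r\n")
method, path, version = request_line.split()
headers = dict(line.split(": ", 1) for line in header_lines)

print(method, path)     # GET /hello.html
print(headers["Host"])  # www.agileabstractions.com
```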

If the request doesn’t reach a server the client library will likely raise a socket connection error. If the request reaches a server but there is a different kind of error, e.g. 401 for authentication, 404 for a missing resource, 500 for a server error, 504 for a gateway timeout, etc, then the client will see an HTTP error instead. (urllib responds to these with an exception; requests returns a response object with the error status code on it.)

If the request reaches a server we get a response. The response consists of:

  • A status code 
  • Headers
  • The message body (data)

Example response returning JSON:

HTTP/1.1 200 OK
Cache-Control: no-cache
Server: libnhttpd
Date: Wed Jul 4 15:32:03 2022
Connection: Keep-Alive
Content-Type: application/json;charset=UTF-8
Content-Length: 5964

{"rowset": {"osname": "NCOMS", "dbname": "alerts", ...}}

Status codes range (in theory) from 100-599:

  • 100-199 Informational (rare)
  • 200-299 Success
  • 300-399 Redirect
  • 400-499 Client error
  • 500-599 Server error

Common status codes:

  • 200 – Success, OK
  • 301 – Permanent redirect
  • 302 – Temporary redirect
  • 400 – Bad request
  • 401 – Unauthenticated
  • 403 – Forbidden (even if authenticated)
  • 404 – Resource missing
  • 500 – Internal server error
  • 504 – Gateway timeout
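The standard library’s http.HTTPStatus enum knows the canonical codes and their reason phrases, which is handy when working with status codes in Python:

```python
from http import HTTPStatus

status = HTTPStatus(404)
print(status.phrase)           # Not Found
print(status.value // 100)     # 4 - the class digit: a client error

print(HTTPStatus(301).phrase)  # Moved Permanently
print(HTTPStatus(504).phrase)  # Gateway Timeout
```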

Cookies are a useful part of the HTTP specification and may contain data that identifies a request as part of a “session”. This permits state relating to the client to be stored on the server. A JWT for authentication may be stored in a cookie.

Cookies may be sent as part of the request like this:

GET /sample_page.html HTTP/2.0
Host: www.example.org
Cookie: yummy_cookie=chocolate; tasty_cookie=strawberry

Cookies can be returned as part of a response (along with either a max age or an explicit expiry date) like this:

HTTP/2.0 200 OK
Content-Type: text/html
Set-Cookie: id=a3fWa; Expires=Thu, 31 Oct 2021 07:28:00 GMT
Set-Cookie: id=a3fWa; Max-Age=2592000

Cookies without a Max-Age or Expires attribute are called “session cookies” and are deleted when the current session ends, although the session is defined by the browser and some browsers use “session restoring”, permitting session cookies to live indefinitely. 
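The standard library’s http.cookies module can parse Cookie headers and build Set-Cookie headers like the ones above:

```python
from http.cookies import SimpleCookie

# Parse an incoming Cookie request header
cookies = SimpleCookie()
cookies.load("yummy_cookie=chocolate; tasty_cookie=strawberry")
print(cookies["yummy_cookie"].value)  # chocolate

# Build a Set-Cookie response header with a max age
response = SimpleCookie()
response["id"] = "a3fWa"
response["id"]["max-age"] = 2592000
print(response.output())  # Set-Cookie: id=a3fWa; Max-Age=2592000
```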

Storing session state on the server is a violation of one of the principles of REST which is that the state should only be stored by the client and each request is a separate transaction.

Basic Authentication

Basic Authentication is one of the simplest and most common authentication schemes, and it has been built into HTTP since version 1.0 in 1996. An “Authorization” header is sent as part of the request, with the username and password encoded with base64 (effectively sent in plaintext).

With TLS the headers are encrypted so it can be used to provide security for basic authentication.

The authentication header field has the form “Authorization: Basic <credentials>”, where <credentials> is the Base64 encoding of the ID and password joined by a single colon (:). For example:

Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
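That header value is just base64, as a few lines of Python show (using the classic “Aladdin:open sesame” example from the spec):

```python
import base64

credentials = "Aladdin:open sesame"
token = base64.b64encode(credentials.encode()).decode()
print(f"Authorization: Basic {token}")
# Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==

# Base64 is an encoding, not encryption: anyone can reverse it
print(base64.b64decode(token).decode())  # Aladdin:open sesame
```

This is why Basic Authentication is only safe over TLS.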

An unauthenticated request to a resource protected with authentication will receive a 401 response with a “WWW-Authenticate” header field. If the user is authenticated but does not have permission to access the requested resource, a 403 error will be returned.

The server response header field requesting authentication, with an error 401, is constructed like this:

WWW-Authenticate: Basic realm="User Visible Realm", charset="UTF-8"

Here’s an example of using basic authentication with the httpx client library:

>>> import httpx
>>> auth = httpx.BasicAuth(username="finley", password="secret")
>>> client = httpx.Client(auth=auth)
>>> response = client.get("https://httpbin.org/basic-auth/finley/secret")
>>> response
<Response [200 OK]>

Digest authentication is an alternative HTTP authentication scheme that uses hashing to avoid sending the username and password in plaintext, but as it is built on MD5, which is broken, it is considered obsolete.

JSON Web Tokens (JWT) (specified in RFC 7519), along with a protocol like OAuth 2 and an identity provider, are a much more modern way of doing authentication.  JWT relies on other JSON-based standards: JSON Web Signature and JSON Web Encryption.

An example of signing and verifying a JWT using a shared secret (a public key and signing algorithm is an alternative approach):

>>> import jwt
>>> encoded_jwt = jwt.encode({"some": "payload"}, "secret", algorithm="HS256")
>>> jwt.decode(encoded_jwt, "secret", algorithms=["HS256"])
{'some': 'payload'}

WebSockets

Some services may use WebSockets, originally implemented for the browser, which allow servers to push data to clients over long lived connections. WebSockets have a slightly different API, but the underlying principles are the same. The websocket client typically makes a request which then blocks until data is available from the server. WebSockets are also used for streaming data.

As with all socket communication security is usually handled by the TLS protocol.

The browser (JavaScript) doesn’t expose the request headers to the websocket API which unfortunately means that normal authentication protocols can’t work with websockets. Authentication is commonly done by sending a token as the first message.

The best reference on working with websockets in Python web applications is this article by Armin Ronacher:

websockets is a library, built on top of asyncio but with sync and async interfaces, for building WebSocket servers and clients in Python:

Network Interfaces, Routers and Firewalls

When we think of a network interface (a NIC – Network Interface Controller) we think of a physical card with an ethernet socket. Whilst most computers have physical network interfaces (or WiFi which provides a kind of physical interface) they will likely have virtual network interfaces too and if your code is running in a container or a virtual machine in the cloud then all the network interfaces will be virtual and part of an SDN (Software Defined Network).

For a connection from one computer to another, via sockets, to work both endpoints must be addressable. They must be connected to networks that can reach each other and both have IP addresses and know each other’s ports. A computer with a single NIC/IP address can have multiple sockets open for communication using different ports. 

The IP address for a device belongs to the NIC, which (unless it is using static addressing) will normally get its IP address from DHCP (Dynamic Host Configuration Protocol). So a machine with multiple NICs will have an IP address per NIC. 

In a simple network the DHCP server will run on the router. The DHCP server uses the MAC (Media Access Control) address of the device to assign IP addresses so the same device will get the same IP address whenever it requests an IP address.

Traffic from a socket will go over the NIC and then through multiple routers and firewalls until it reaches its destination. Each packet contains its source and destination IPs, which the routers use to route the traffic to its destination. Just putting a network behind a router provides some level of security as no devices outside of your router’s private network can see any device in it except the router itself. 

Routers often contain firewalls which will block all traffic except explicitly permitted traffic. More complex networks for application deployment will have many network switches and specific firewalls.

A Local Area Network (LAN) consists of all the computers on a network, which may be behind a router with only a single public IP address. All of the computers on the local network will have addresses from one of the ranges of IP addresses reserved for private addresses.

Private IP addresses are reserved for private networks and are not publicly routable on the internet. The Internet Assigned Numbers Authority (IANA) has reserved the following IPv4 address ranges for private use:

  • Class A: 10.0.0.0–10.255.255.255
  • Class B: 172.16.0.0–172.31.255.255
  • Class C: 192.168.0.0–192.168.255.255 

Private IP addresses are non-routable, meaning that they don’t appear on the internet. Private IP addresses are reusable across different private networks. 
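The ipaddress module can check these reserved ranges directly:

```python
import ipaddress

# One address from each private range, plus a public one
for text in ["10.1.2.3", "172.16.0.9", "192.168.0.15", "8.8.8.8"]:
    addr = ipaddress.ip_address(text)
    print(text, addr.is_private)
# 10.1.2.3 True
# 172.16.0.9 True
# 192.168.0.15 True
# 8.8.8.8 False
```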

Your router probably has the address 192.168.0.1 (or similar) whilst your computer probably has an address like 192.168.0.15. Not all private networks use the 192.168 private address space.

Networks use Network Address Translation (NAT) to allow multiple devices with private IP addresses to share a public IP address. Devices on a private network can’t directly receive internet traffic, but they can still communicate on the public internet through the network’s public IP address. NAT is typically done on the router.

A complex real world scenario will likely have several connected networks, and if the network exists in the cloud then those networks will be defined in the SDN (Software Defined Network) using VLANs (Virtual LANs). (In AWS your virtual network is called a VPC – Virtual Private Cloud).

With physical networks all network traffic is visible to anyone on the network. Virtual LANs tag traffic as only belonging to a single VLAN, and computers on a VLAN will only “see” traffic tagged for that VLAN (although it’s all still present at the physical layer). VLANs are more effective in the cloud where access to the physical network by an attacker is (hopefully) unlikely.

Firewalls are used to create a barrier between a trusted internal network and an untrusted external network, such as the internet. They help prevent cyberattacks by enforcing policies that block unauthorized traffic from accessing a secure network. 

Firewalls can be configured to permit or deny traffic based on various criteria, such as: Source and destination IP addresses, Port numbers, and Protocol type.

The standard Linux firewall is the notoriously arcane iptables. It is much easier to configure iptables rules through the UFW (Uncomplicated Firewall) interface.

One way to arrange your networks for security is to keep all servers that are exposed to the public internet in a single network (a subnetwork), the DMZ (DeMilitarized Zone). A machine (firewall machine or network switch) with multiple NICs can act as a gateway so that machines from the internal network can access the DMZ but nothing from the external internet can see the private network. Another firewall will protect the DMZ from the public internet.

Subnets are defined using a “subnet mask”, usually written in CIDR (Classless Inter-Domain Routing) notation, which specifies how many leading bits of an address identify the network. Another reason for a subnet within a network might be a high-speed traffic backbone needed only by some machines.
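Python’s standard-library `ipaddress` module makes CIDR notation concrete. A minimal sketch (the 10.0.1.0/24 subnet is just an example):

```python
import ipaddress

# A /24 keeps the first 24 bits as the network portion: mask 255.255.255.0.
dmz = ipaddress.ip_network("10.0.1.0/24")

dmz.netmask        # 255.255.255.0
dmz.num_addresses  # 256 (including the network and broadcast addresses)

# Membership tests are how routers decide which subnet traffic belongs to.
ipaddress.ip_address("10.0.1.42") in dmz   # True
ipaddress.ip_address("10.0.2.42") in dmz   # False

# 10.* is one of the RFC 1918 private address ranges.
ipaddress.ip_address("10.0.1.42").is_private  # True
```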

A Virtual Private Network (VPN) is a network architecture (along with tunnelling protocols, etc.) for virtually extending a private network. VPNs are often a fundamental component of security, permitting secure remote connection to a network. They allow network hosts to exchange messages across another (untrusted) network to access private content, as if they were part of the same network.

Container systems like LXC/LXD/Docker create a virtual NIC on your machine so they can do software-defined networking for containers. All the containers get a private address on this network. This is often done with a “network bridge”: a virtual switch that forwards traffic between the containers and the host. It’s common to see the private 10.* address range used for container networking.

The basic unit of deployment in Kubernetes is a pod. Each pod has a unique IP address and a private network namespace that’s shared by all the containers within it. All containers in a pod can see each other on the local network. Using Kubernetes for deployment simplifies network design by limiting the options available to you.

Running and Exposing an App Server

An application server is a process that accepts socket connections. It is rarely exposed directly to the internet; instead it sits behind a web server and possibly a load balancer. The web server accepts connections either from a load balancer (which routes requests between multiple application servers) or directly from the internet. TLS termination (the handshake, behind which traffic is unencrypted) is usually done by the web server, which needs access (via configuration) to the certificate and private key. It’s less common to do TLS termination directly in the application server (except in zero-trust systems).
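In Python, terminating TLS means building a server-side context with the standard-library `ssl` module. A minimal sketch, assuming your certificate chain and private key live in the (hypothetical) files cert.pem and key.pem:

```python
import ssl

# Server-side context for terminating TLS.
context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)

# Refuse obsolete protocol versions (SSL 3.0, TLS 1.0/1.1).
context.minimum_version = ssl.TLSVersion.TLSv1_2

# The terminating server needs the certificate chain and private key.
# ("cert.pem" and "key.pem" are placeholder paths, so this stays commented out.)
# context.load_cert_chain(certfile="cert.pem", keyfile="key.pem")

# Wrapping a listening socket makes every accepted connection a TLS one:
# tls_socket = context.wrap_socket(plain_socket, server_side=True)
```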

The web server will then route the request to the app server, which may mean launching a process per request, maintaining a process pool, or connecting to a single app server that can handle multiple simultaneous requests (for example using threads, forking, or an async event loop).
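The threading option can be seen with the standard-library `socketserver` module. A minimal sketch of an echo server that gives each connection its own thread (the address and payload are arbitrary):

```python
import socket
import socketserver
import threading

class EchoHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Echo each line back until the client disconnects.
        for line in self.rfile:
            self.wfile.write(line)

# ThreadingTCPServer handles each connection in its own thread, so one
# slow client doesn't block the others. Port 0 asks the OS for a free port.
server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

host, port = server.server_address
with socket.create_connection((host, port)) as conn:
    conn.sendall(b"ping\n")
    reply = conn.makefile("rb").readline()

server.shutdown()
```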

The application server will often run as a process in a container under something like gunicorn (for WSGI apps; for ASGI apps use uvicorn), which uses Unix forks (the pre-fork model) to manage app server processes and requests.
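The contract between gunicorn and your code is the WSGI interface (PEP 3333): a callable that takes the request environment and a start_response function. A minimal app (the module name and response body are just examples), runnable with `gunicorn app:application`:

```python
# app.py -- a minimal WSGI application.
def application(environ, start_response):
    body = b"Hello from the app server\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

gunicorn imports this module in each forked worker process; adding `--workers 4` would give four pre-forked processes all serving the same callable.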

An app server is usually deployed alongside a database server (connected to fast storage, and in a Kubernetes app typically in the same cluster), potentially with a load balancer in front of the app server to distribute traffic between multiple instances. Multiple instances of an app server can be used to provide High Availability (HA). Your deployment infrastructure and your devops tooling (Infrastructure as Code) all need to be understood and secured.

Conclusion

Ensuring secure web development starts with a holistic, defense-in-depth approach. This means baking security thinking into every stage of the development process, leveraging secure libraries and tools, and implementing layered protections. By adhering to these fundamental principles, Python web developers can build applications that are resilient to a wide range of security threats.

This is the final part of a series: “The absolute minimum every Python web application developer must know about security”.

Author

  • Michael Foord

    I’m a Python trainer and contractor. I specialise in teaching Python and the end-to-end automated testing of systems. My passion is for simplicity and clarity in code, efficient lightweight processes and for well designed systems. As a Python core developer I wrote parts of unittest and created the mock library which became unittest.mock.
