Sockets and the socket API are used to send messages across a network. They provide a form of inter-process communication (IPC). An example is the internet, which we connect to via our ISP.
The most common type of socket applications are client-server applications, where one side acts as the server and waits for connections from clients.
Python's socket module provides an interface to the Berkeley Sockets API. The primary socket API functions and methods in this module are:
- socket(): creates a new socket and allocates resources to it.
- bind(): associates a socket with a socket address structure, i.e. a specified local IP address and a port number. It is used on the server side.
- listen(): causes a bound Transmission Control Protocol (TCP) socket to enter a listening state. It is used on the server side.
- connect(): assigns a free local port number to a socket. In the case of a TCP socket, it causes an attempt to establish a new TCP connection. It is used on the client side.
- accept(): accepts a received incoming attempt to create a new TCP connection from the remote client, and creates a new socket associated with the socket address pair of this connection. It is used on the server side.
- connect_ex(): like connect(), but returns an error indicator instead of raising an exception for errors returned by the C-level connect() call (other problems, such as "host not found", can still raise exceptions). The error indicator is 0 if the operation succeeded, otherwise the value of the errno variable. This is useful to support, for example, asynchronous connects.
- send(): sends data.
- recv(): receives data.
- close(): causes the system to release resources allocated to a socket. In the case of TCP, the connection is terminated.
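The behaviour of connect_ex() described in the list above can be sketched as follows. This is a minimal illustration, assuming a loopback interface is available; try_connect is a hypothetical helper name, not part of the socket module.

```python
import socket


def try_connect(host, port, timeout=1.0):
    """Hypothetical helper: return 0 on success, otherwise an errno value."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex() reports connection failures as a return value
        # instead of raising an exception.
        return s.connect_ex((host, port))


# Listen on an ephemeral port (port 0 asks the OS to pick a free one).
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.bind(("127.0.0.1", 0))
    server.listen()
    port = server.getsockname()[1]
    open_result = try_connect("127.0.0.1", port)

# The listening socket is closed now, so the same port should refuse us.
closed_result = try_connect("127.0.0.1", port)
print(open_result, closed_result)
```

The first result should be 0, while the second should be a non-zero errno value (typically the "connection refused" code on the platform).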
As part of its standard library, Python also has classes that make using these low-level socket functions easier. Read about implementing internet protocols like HTTP and SMTP here.
We are going to use TCP sockets here.
On the server side, socket() creates a new socket, bind() associates the new socket with an address, and listen() makes it listen for connection requests. When a client connects using connect(), the server calls accept() to accept the connection.
TCP uses a three-way handshake to establish a connection:
- The client sends the server a synchronize (SYN) message with its own sequence number x.
- The server replies with a synchronize-acknowledgment (SYN-ACK) message with its own sequence number y and acknowledgement number x + 1.
- The client replies with an acknowledgement (ACK) message with acknowledgement number y + 1.
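The numbering in the three steps above can be traced with a small simulation. This is only an illustration of the arithmetic, not how TCP is implemented (the operating system performs the real handshake inside connect() and accept()); the function name and the dictionary layout are assumptions made for this sketch.

```python
def three_way_handshake(x, y):
    """Return the three handshake messages for initial sequence numbers x and y."""
    # 1. Client -> server: SYN carrying the client's sequence number x.
    syn = {"flags": "SYN", "seq": x}
    # 2. Server -> client: SYN-ACK carrying the server's sequence number y
    #    and acknowledging the client's number plus one.
    syn_ack = {"flags": "SYN-ACK", "seq": y, "ack": syn["seq"] + 1}
    # 3. Client -> server: ACK acknowledging the server's number plus one.
    ack = {"flags": "ACK", "ack": syn_ack["seq"] + 1}
    return [syn, syn_ack, ack]


for message in three_way_handshake(x=100, y=300):
    print(message)
```

With x = 100 and y = 300, the server acknowledges 101 and the client acknowledges 301.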
In the middle is the round-trip section, where data is exchanged between the client and server using calls to send() and recv().
In the end, the client and server close() their respective sockets to end the connection.
Here, the server will simply echo whatever it receives back to the client.
Create the file echo-server.py:
import socket
# host can be a hostname, an IP address, or an empty string
# the IP address usually needs to be an IPv4-formatted address string, e.g. 127.0.0.1
# empty string - the server will accept connections on all available IPv4 interfaces
HOST = '127.0.0.1'  # Standard loopback interface address (localhost)
# port should be an integer between 1 and 65535 (0 is reserved)
PORT = 65432  # Port to listen on (non-privileged ports are > 1023)
# socket.socket() creates a socket object that supports the context manager type, so we can use a with statement. Hence, there is no need to call s.close().
# context manager type - https://docs.python.org/3/reference/datamodel.html#context-managers
# with statement - https://docs.python.org/3/reference/compound_stmts.html#with
# The arguments passed to socket() specify the address family and the socket type.
# AF_INET - internet address family for IPv4 https://en.wikipedia.org/wiki/IPv4
# SOCK_STREAM - socket type for TCP
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    # the value that is passed to bind() depends on the address family of the socket
    # for AF_INET (IPv4), bind() expects a tuple (host, port)
    s.bind((HOST, PORT))
    s.listen()
    # accept() blocks until a client connects, then returns
    # conn - a new socket object representing the connection
    # addr - a tuple holding the address of the client: (host, port) for IPv4 or (host, port, flowinfo, scopeid) for IPv6
    conn, addr = s.accept()
    # conn is a different socket from 's', which is the original socket used to listen for and accept new connections
    with conn:
        print("Connected by", addr)
        # an infinite while loop is used to loop over blocking calls to conn.recv().
        # it reads whatever data the client sends and echoes it back using conn.sendall()
        while True:
            data = conn.recv(1024)
            # if conn.recv() returns an empty bytes object, b'', the client has closed the connection and the loop is terminated
            if not data:
                break
            conn.sendall(data)
Create the file echo-client.py:
import socket
HOST = '127.0.0.1' # The server's hostname or IP address
PORT = 65432 # The port used by the server
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.connect((HOST, PORT))
    s.sendall(b'Hello, world')
    data = s.recv(1024)
print('Received', repr(data))
In one terminal, run:
python echo-server.py
In the next terminal, run:
python echo-client.py
server:
Connected by ('127.0.0.1', 54803)
client:
Received b'Hello, world'
On Windows, macOS, and Linux, we can see the current state of the sockets on the host by using netstat, which is available on all of these platforms:
netstat -an
The output looks like:
Proto Local Address Foreign Address State
TCP 0.0.0.0:557 0.0.0.0:0 LISTENING
TCP 0.0.0.0:544 0.0.0.0:0 LISTENING
TCP 127.0.0.1:65432 0.0.0.0:0 LISTENING # this one is our server
Note: the loopback interface, with IP address 127.0.0.1 (IPv4) or ::1 (IPv6), is also referred to as "localhost". The data never leaves the host or touches the external network.
Here, we will create a server and client that handle multiple connections using a selector object created from the selectors module.
Some keywords:
- select: a module that is a direct interface to the underlying operating system implementation. It monitors sockets, open files, and pipes (anything with a fileno() method that returns a valid file descriptor) until they become readable or writable, or a communication error occurs.
- selectors: a Python module which allows high-level and efficient I/O multiplexing, built upon the select module primitives. It defines a BaseSelector abstract base class, along with several concrete implementations (KqueueSelector, EpollSelector, etc.), that can be used to wait for I/O readiness notification on multiple file objects.
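To make "waiting for I/O readiness" concrete, here is a minimal sketch that registers one socket with a selector and checks it before and after data arrives. It uses socket.socketpair() to get two connected sockets inside one process, which is an assumption made to keep the example self-contained; the string stored in data is arbitrary.

```python
import selectors
import socket

sel = selectors.DefaultSelector()
reader, writer = socket.socketpair()  # two already-connected sockets

# Ask the selector to tell us when 'reader' has data to read.
sel.register(reader, selectors.EVENT_READ, data="reader")

# Nothing has been sent yet, so no socket is reported as ready.
before = sel.select(timeout=0)

writer.sendall(b"ping")

# Now 'reader' has pending data, so select() returns its key and mask.
after = sel.select(timeout=1)
ready = [(key.data, bool(mask & selectors.EVENT_READ)) for key, mask in after]
print(before, ready)

sel.unregister(reader)
sel.close()
reader.close()
writer.close()
```

The same register/select pattern is what lets a single loop watch many client sockets at once.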
The main objective of a multi-connection server is to be non-blocking, so that while one socket has nothing to read or write, the server can still accept and service connections on other sockets.
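The difference between blocking and non-blocking sockets can be seen in a short sketch. As before, socket.socketpair() is used only to keep the example in a single process, and the variable names are illustrative.

```python
import select
import socket

a, b = socket.socketpair()  # two connected sockets in one process
a.setblocking(False)

try:
    a.recv(1024)  # nothing has been sent yet
    would_block = False
except BlockingIOError:
    # A blocking socket would have waited here. A non-blocking socket
    # raises immediately, so the program is free to do other work.
    would_block = True

b.sendall(b"hi")
# Wait (up to 1 second) until 'a' is readable, then read without blocking.
readable, _, _ = select.select([a], [], [], 1.0)
data = a.recv(1024) if readable else b""

print(would_block, data)
a.close()
b.close()
```

Calling recv() only after the socket has been reported readable is the pattern the server below follows.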
Create multiconn-server.py:
import selectors
import socket
import types
import sys
sel = selectors.DefaultSelector()
def accept_wrapper(sock):
    # since the listening socket was registered for the event selectors.EVENT_READ,
    # it should be ready to read
    # Hence we can call sock.accept()
    conn, addr = sock.accept()
    print('accepted connection from', addr)
    # put the new connection socket in non-blocking mode as well
    conn.setblocking(False)
    # creates an object to hold the data we want included along with the socket
    data = types.SimpleNamespace(addr=addr, inb=b'', outb=b'')
    # monitor the client connection for readiness to read and to write
    events = selectors.EVENT_READ | selectors.EVENT_WRITE
    # the arguments are socket, mask, and data
    sel.register(conn, events, data=data)
def service_connection(key, mask):
    # key is the 'namedtuple' returned from select() that contains
    # - fileobj: the socket object
    # - data: the data object
    sock = key.fileobj
    data = key.data
    # 'mask' contains the events that are ready
    if mask & selectors.EVENT_READ:
        recv_data = sock.recv(1024)  # should be ready to read
        if recv_data:
            # any data that's read is appended to data.outb so it can be sent later
            data.outb += recv_data
        else:
            # no data was received, which means that the client has closed its socket
            print('closing connection to', data.addr)
            # unregister so that it's no longer monitored by select()
            sel.unregister(sock)
            # close the socket
            sock.close()
            # do not try to write to the socket we just closed
            return
    # a healthy socket should always be ready for writing
    if mask & selectors.EVENT_WRITE:
        if data.outb:
            print('echoing', repr(data.outb), 'to', data.addr)
            # any received data stored in data.outb is echoed to the client
            sent = sock.send(data.outb)  # should be ready to write
            # the bytes sent are then removed from the buffer
            data.outb = data.outb[sent:]
if len(sys.argv) != 3:
    print("usage:", sys.argv[0], "<host> <port>")
    sys.exit(1)
host, port = sys.argv[1], int(sys.argv[2])
# ...
lsock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
lsock.bind((host, port))
lsock.listen()
print("listening on", (host, port))
# this is different from 'echo-server.py' as lsock is set to non-blocking mode
# when it's used with sel.select() below, we can wait for events on one or more sockets
# and read and write data when it's ready
# - key: a SelectorKey 'namedtuple' that contains a 'fileobj' attribute
# - mask: an event mask of the operations that are ready
lsock.setblocking(False)
# this registers the socket to be monitored with sel.select() for the events we are
# interested in
# data - is used to store whatever arbitrary data we'd like along with the socket.
# It's returned when select() returns. data will be used to keep track what's been sent
# and received on the socket.
sel.register(lsock, selectors.EVENT_READ, data=None)
try:
    while True:
        # sel.select(timeout=None) blocks until there are sockets ready for I/O.
        # It returns a list of (key, events) tuples, one for each ready socket.
        events = sel.select(timeout=None)
        for key, mask in events:
            if key.data is None:
                # if key.data is 'None', we know it's from the listening socket and
                # we need to accept() the connection
                accept_wrapper(key.fileobj)
            else:
                # if key.data is not 'None', we know it's a client that's already been
                # accepted, and we need to service it.
                service_connection(key, mask)
except KeyboardInterrupt:
    print("caught keyboard interrupt, exiting")
finally:
    sel.close()
Next, create multiconn-client.py. It is similar to multiconn-server.py, but instead of listening for connections, it starts by initiating connections via start_connections():
import socket
import selectors
import types
import sys
sel = selectors.DefaultSelector()
messages = [b'Message 1 from client.', b'Message 2 from client.']
# 'num_conns' is read from the command line; it is the number of connections to create to the server
def start_connections(host, port, num_conns):
    server_addr = (host, port)
    for i in range(0, num_conns):
        connid = i + 1
        print("starting connection", connid, "to", server_addr)
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # each socket is set to non-blocking mode
        sock.setblocking(False)
        # 'connect_ex()' is used instead of 'connect()' since 'connect()' would immediately raise a
        # 'BlockingIOError' exception.
        # 'connect_ex()' initially returns an error indicator, 'errno.EINPROGRESS', instead of raising
        # an exception while the connection is in progress
        sock.connect_ex(server_addr)
        # once the connection is completed, the socket is ready for reading and writing
        # the status is returned by 'select()'
        events = selectors.EVENT_READ | selectors.EVENT_WRITE
        # creates the data we want stored along with the socket
        # the messages the client will send to the server are copied using 'list(messages)' since each
        # connection will call 'socket.send()' and modify the list.
        # connid - connection id
        # msg_total - total number of bytes in the messages to send
        # recv_total - total number of bytes received so far
        # messages - the messages still to be sent
        # outb - the message currently being sent with 'send()'
        data = types.SimpleNamespace(
            connid=connid,
            msg_total=sum(len(m) for m in messages),
            recv_total=0,
            messages=list(messages),
            outb=b'',
        )
        sel.register(sock, events, data=data)
def service_connection(key, mask):
    sock = key.fileobj
    data = key.data
    if mask & selectors.EVENT_READ:
        recv_data = sock.recv(1024)  # should be ready to read
        if recv_data:
            print('received', repr(recv_data), 'from connection', data.connid)
            # keeps track of the number of bytes received from the server
            data.recv_total += len(recv_data)
        # when no data is received or when 'data.recv_total' == 'data.msg_total', close the connection
        if not recv_data or data.recv_total == data.msg_total:
            print('closing connection', data.connid)
            sel.unregister(sock)
            sock.close()
            # do not try to write to the socket we just closed
            return
    if mask & selectors.EVENT_WRITE:
        if not data.outb and data.messages:
            # takes the first element out of the list and saves it to 'data.outb'
            data.outb = data.messages.pop(0)
        if data.outb:
            print('sending', repr(data.outb), 'to connection', data.connid)
            # sends 'data.outb'
            sent = sock.send(data.outb)  # should be ready to write
            # removes the bytes that were sent from 'data.outb'
            data.outb = data.outb[sent:]
if len(sys.argv) != 4:
    print("usage:", sys.argv[0], "<host> <port> <num_connections>")
    sys.exit(1)
start_connections(sys.argv[1], int(sys.argv[2]), int(sys.argv[3]))
try:
    while True:
        events = sel.select(timeout=1)
        if events:
            for key, mask in events:
                service_connection(key, mask)
        # Check for a socket being monitored to continue
        if not sel.get_map():
            break
except KeyboardInterrupt:
    print("caught keyboard interrupt, exiting")
finally:
    sel.close()
Our client here keeps track of the number of bytes it's received from the server so it can close its side of the connection. When the server detects this, it also closes its side of the connection.
Hence, the server here depends on the client being well-behaved. If the client doesn't close, the server will leave the connection open. In a real application, we may want to guard against this and prevent client connections from accumulating if they don't send a request after a certain amount of time or if a specific data usage limit has been reached.
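One possible way to guard against idle clients is to record when each connection was last active and periodically look for stale ones. The sketch below shows only the bookkeeping. The last_active attribute, the IDLE_TIMEOUT value, and the idle_connections helper are all assumptions made for illustration and are not part of the server above.

```python
import types

IDLE_TIMEOUT = 30.0  # seconds; a hypothetical limit chosen for this sketch


def idle_connections(connections, now, timeout=IDLE_TIMEOUT):
    """Return the data objects whose last activity is older than 'timeout'."""
    return [d for d in connections if now - d.last_active > timeout]


# Two fake per-connection data objects, like the SimpleNamespace objects
# the server stores next to each socket, with an extra 'last_active' field.
conns = [
    types.SimpleNamespace(addr=("127.0.0.1", 50001), last_active=0.0),
    types.SimpleNamespace(addr=("127.0.0.1", 50002), last_active=25.0),
]

# At time 40.0 the first connection has been silent for 40 seconds
# and the second one for 15 seconds.
stale = idle_connections(conns, now=40.0)
print([d.addr for d in stale])
```

In the server, the timestamp could be refreshed with time.monotonic() whenever data is received, and the check could run after each call to sel.select() with a finite timeout, unregistering and closing every stale socket.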
Start the server first:
# usage: ./multiconn-server.py <host> <port>
python multiconn-server.py 127.0.0.1 65432
Then, start the client:
# usage: ./multiconn-client.py <host> <port> <num_connections>
python multiconn-client.py 127.0.0.1 65432 5
Sample Client output:
starting connection 1 to ('127.0.0.1', 65432)
starting connection 2 to ('127.0.0.1', 65432)
starting connection 3 to ('127.0.0.1', 65432)
starting connection 4 to ('127.0.0.1', 65432)
starting connection 5 to ('127.0.0.1', 65432)
sending b'Message 1 from client.' to connection 4
sending b'Message 1 from client.' to connection 5
sending b'Message 1 from client.' to connection 1
sending b'Message 1 from client.' to connection 2
sending b'Message 1 from client.' to connection 3
received b'Message 1 from client.' from connection 4
sending b'Message 2 from client.' to connection 4
received b'Message 1 from client.' from connection 5
sending b'Message 2 from client.' to connection 5
sending b'Message 2 from client.' to connection 1
sending b'Message 2 from client.' to connection 2
sending b'Message 2 from client.' to connection 3
received b'Message 2 from client.' from connection 4
closing connection 4
received b'Message 2 from client.' from connection 5
closing connection 5
received b'Message 1 from client.Message 2 from client.' from connection 1
closing connection 1
received b'Message 1 from client.Message 2 from client.' from connection 2
closing connection 2
received b'Message 1 from client.Message 2 from client.' from connection 3
closing connection 3
Sample Server output:
listening on ('127.0.0.1', 65432)
accepted connection from ('127.0.0.1', 53242)
accepted connection from ('127.0.0.1', 53243)
accepted connection from ('127.0.0.1', 53244)
accepted connection from ('127.0.0.1', 53245)
accepted connection from ('127.0.0.1', 53246)
echoing b'Message 1 from client.' to ('127.0.0.1', 53245)
echoing b'Message 1 from client.' to ('127.0.0.1', 53246)
echoing b'Message 1 from client.' to ('127.0.0.1', 53242)
echoing b'Message 1 from client.' to ('127.0.0.1', 53243)
echoing b'Message 1 from client.' to ('127.0.0.1', 53244)
echoing b'Message 2 from client.' to ('127.0.0.1', 53245)
echoing b'Message 2 from client.' to ('127.0.0.1', 53246)
echoing b'Message 2 from client.' to ('127.0.0.1', 53242)
echoing b'Message 2 from client.' to ('127.0.0.1', 53243)
echoing b'Message 2 from client.' to ('127.0.0.1', 53244)
closing connection to ('127.0.0.1', 53245)
closing connection to ('127.0.0.1', 53246)
closing connection to ('127.0.0.1', 53242)
closing connection to ('127.0.0.1', 53243)
closing connection to ('127.0.0.1', 53244)
Apart from OSError, timeouts, etc., the main errors can occur while processing the data itself. TCP only understands that it is receiving and sending raw bytes to and from the network; it doesn't understand the kind of data being transferred. This is where the application layer comes in.
According to Bitesize, the application layer is a networking layer which encodes or decodes a message in a form that is understood by the sender and the recipient. It is used to define the length and format of the messages the application exchanges.
When we're reading bytes with recv(), we need to keep up with how many bytes were read and figure out where the message boundaries are. How is this done?
- One way is to always send fixed-length messages. This is inefficient for small messages, which have to be padded, and insufficient if the data is larger than the size we defined.
- Another way is what HTTP also does. We use a header that includes the content length as well as any other fields we need. Once we've read the header, we can process it to determine the length of the message's content and then allocate resources to consume the expected number of bytes.
We'll implement this by creating a custom header class that can send and receive messages that contain text or binary data.
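The idea of a length-carrying header can be sketched in a few lines. Note that this is a deliberately simplified stand-in for the custom header class mentioned above: instead of a header with several fields, it uses a fixed 4-byte length prefix. The names pack_message and unpack_messages are hypothetical.

```python
import struct

# 4-byte unsigned length in big-endian (network) byte order.
HEADER = struct.Struct(">I")


def pack_message(payload):
    """Prefix the payload with its length so the receiver can find its end."""
    return HEADER.pack(len(payload)) + payload


def unpack_messages(buffer):
    """Split a receive buffer into complete payloads and leftover bytes."""
    messages = []
    while len(buffer) >= HEADER.size:
        (length,) = HEADER.unpack_from(buffer)
        end = HEADER.size + length
        if len(buffer) < end:
            break  # the body has not fully arrived yet
        messages.append(buffer[HEADER.size:end])
        buffer = buffer[end:]
    return messages, buffer


# Simulate a stream holding one full message and part of the next header.
stream = pack_message(b"hello") + pack_message(b"hi")[:3]
messages, rest = unpack_messages(stream)
print(messages, rest)
```

Here the first message is returned whole, while the three leftover bytes stay in the buffer until more data arrives from recv().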
Another problem can occur with data interpretation. For example, if we receive data and want to use it in a context where it's interpreted as multiple bytes, such as a 4-byte integer, we'll need to take into account that it could be in a format that's not native to our machine's CPU. If this is the case, we'll need to convert it to the host's native byte order before using it. We'll avoid this by taking advantage of Unicode for our message and using the UTF-8 encoding. Since UTF-8 uses 8-bit code units, there are no byte ordering issues. Read more in Python's Encodings and Unicode documentation.
The byte order is referred to as the CPU's endianness. Depending on where the system stores the most significant byte of a word (smallest memory address or largest memory address), the endianness is categorized as big-endian (BE) or little-endian (LE).
We can determine the byte order of our machine using sys.byteorder by doing:
python -c 'import sys; print(repr(sys.byteorder))'
'little' # output on my laptop
My laptop has little-endian byte ordering.
In our application, the UTF-8 encoding will only be used for the header. For the actual content in the message, we might have to swap the byte order manually if needed. This will depend on the application and whether or not it needs to process multi-byte binary data from a machine with a different endianness.
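A short sketch shows what byte order means for a multi-byte integer and what happens when bytes are read with the wrong order. The value 1 and the 4-byte width are arbitrary choices for illustration.

```python
import struct
import sys

value = 1

# The same integer laid out in the two byte orders.
big = value.to_bytes(4, byteorder="big")        # b'\x00\x00\x00\x01'
little = value.to_bytes(4, byteorder="little")  # b'\x01\x00\x00\x00'

# struct's "=" prefix packs using the machine's native byte order.
native = struct.pack("=I", value)
matches_native = native == (little if sys.byteorder == "little" else big)

# Reading big-endian bytes as if they were little-endian gives a wrong number.
misread = int.from_bytes(big, byteorder="little")   # 16777216

# Stating the sender's byte order explicitly recovers the original value.
correct = int.from_bytes(big, byteorder="big")      # 1

print(big, little, matches_native, misread, correct)
```

Agreeing on one byte order for the wire format (big-endian is the usual network convention) and converting explicitly on each side avoids this kind of misreading.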
More will be added soon..
Thanks to RealPython.