Plenary sessions

Summaries of the plenary sessions

We will write up short summaries of the plenary sessions here.

You can always find code examples from the sessions at:

https://gitlab.com/kristahi/in3230-plenaries-h20/

Plenary 1: 27.08

We talked about the general purpose of the plenaries and went through some of the skills required to master the practical aspects of the course, with self-evaluation of the participants.

We did a brief tutorial on using NREC, corresponding to the guide we have written.

We had a live-programming session focusing on the C socket API, implementing a simple client/server program. Sadly we ran into some technical problems at the end of the session, but we were in fact done with the code. Really, just try :)

You can find code, both the actual code we wrote in the session as well as more complete and annotated code at https://gitlab.com/kristahi/in3230-plenaries-h20/-/tree/master/01_basic_sockets.

Finally, we also talked about the most important UNIX command (according to me): man, or how to answer your own questions, figure out how system calls work and what includes you need in your source files (and so much more!). To discover even more fascinating information about the manual pages system, try:

$ man man

Plenary 2: 03.09

You'll find code examples related to this session here.

Socket API: raw sockets

Raw sockets are a special type of socket that enable programmers to both write and read frames directly at the link layer level, bypassing most of the kernel network stack.

This also means we have to construct and decode the whole frame, including the link layer (Ethernet in our case) headers.

Receiving raw frames

Receiving raw frames is pretty straightforward: the trick is just to use the AF_PACKET address family combined with the SOCK_RAW socket type when creating our socket. Note that these are Linux-specific mechanisms; raw sockets on other platforms use slightly different parameters.

A single raw socket can send and receive from any available network interface. The user executing the program needs the CAP_NET_RAW capability to be permitted to set up a raw socket; the easiest for us is to run as the superuser (root), which obviously has this capability.
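As a minimal sketch of the above (not the code from the session), creating and reading from such a socket could look like this. The helper names open_raw_socket and receive_one_frame are made up for illustration, and ETH_P_ALL is an assumption meaning "give me frames of every protocol".

```c
#include <arpa/inet.h>        /* htons */
#include <linux/if_ether.h>   /* ETH_P_ALL */
#include <stdio.h>            /* perror */
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a raw packet socket that sees all protocols.
 * Returns the descriptor, or -1 on failure (typically EPERM when the
 * process lacks CAP_NET_RAW). */
int open_raw_socket(void)
{
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

    if (fd == -1)
        perror("socket");
    return fd;
}

/* Hypothetical helper: block until one frame arrives and copy it,
 * Ethernet header included, into buf. Returns the number of bytes
 * received, or -1 on error. */
ssize_t receive_one_frame(int fd, unsigned char *buf, size_t len)
{
    return recv(fd, buf, len, 0);
}
```

Run without root, open_raw_socket will most likely just report "Operation not permitted", which is a quick way to see the capability check in action.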

Sending raw frames

Things get a bit more interesting when it comes to sending frames.

Packed structs

As mentioned above, we will need to construct all headers ourselves. A useful tool here is C structs declared with the __attribute__((packed)) attribute.

This instructs the compiler to lay out the struct in memory exactly as we specify. Normally, the compiler is free to do tricks behind the scenes, such as adding padding bytes to make members better aligned in memory. This usually improves performance; you can read more about memory alignment here. However, padding is not very useful when we want to express a specific encoding so that we may directly copy the struct's in-memory representation into a frame.

Packed structs are technically not part of standard C, but are available in most common C compilers in some form; you can use this form in both GCC (which we use) and clang (the other main C compiler in the UNIX world, part of the LLVM compiler project).
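To make the effect of packing concrete, here is a small sketch comparing the same two fields with and without the attribute, plus an Ethernet header laid out as it appears on the wire. All struct and function names are invented for this illustration; the padded size is printed rather than promised, since it depends on the platform.

```c
#include <stddef.h>   /* offsetof */
#include <stdint.h>
#include <stdio.h>

/* Without packing: the compiler typically inserts padding after `type`
 * so that `length` starts on a 4-byte boundary. */
struct padded_hdr {
    uint8_t  type;
    uint32_t length;
};

/* With packing: members are laid out back to back, exactly as declared. */
struct packed_hdr {
    uint8_t  type;
    uint32_t length;
} __attribute__((packed));

/* An Ethernet header as it appears on the wire: 6 + 6 + 2 = 14 bytes. */
struct ether_frame_hdr {
    uint8_t  dst_addr[6];
    uint8_t  src_addr[6];
    uint16_t eth_proto;   /* stored in network byte order */
} __attribute__((packed));

/* Hypothetical helper: print the sizes and offsets the compiler chose. */
void print_layouts(void)
{
    printf("padded_hdr:      %zu bytes, length at offset %zu\n",
           sizeof(struct padded_hdr), offsetof(struct padded_hdr, length));
    printf("packed_hdr:      %zu bytes, length at offset %zu\n",
           sizeof(struct packed_hdr), offsetof(struct packed_hdr, length));
    printf("ether_frame_hdr: %zu bytes\n", sizeof(struct ether_frame_hdr));
}
```

The packed variant is always 5 bytes with length at offset 1, which is what lets us copy it straight into a frame buffer.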

Enumerating interfaces

Filling in our own headers means we need to know about things such as the proper hardware MAC addresses corresponding to the host's network interfaces. We'll also be specifying which interface exactly to use for emitting each frame.

Hardcoding these attributes, or expecting the user to input things such as the correct internal index into the interface list, is not really reasonable. The only reasonable way to obtain them is to enumerate the available interfaces.

There are several ways of doing this; in the example we use getifaddrs, which will return all the information we need. Other ways include using the ioctl API, etc.

getifaddrs also confronts us with some memory management issues. This function will dynamically allocate memory (since it returns a variable-size linked list of interfaces with their associated attributes), and hands it over to us. It then becomes our responsibility to free this memory when we are done processing the data; there is a handy freeifaddrs function which will sort it for us, but we have to remember to call it.

One useful abstraction you can use to keep track of dynamic memory is to maintain a notion of "ownership" of dynamically allocated buffers/objects. The principle is that the owner of a buffer is the one responsible for cleaning it up (i.e. freeing it). In some cases it is necessary to hand over ownership to someone else (well, some other function usually), when the lifetime of the object is longer than the current execution context. This is the case for getifaddrs, which returns before the caller has had the opportunity to process the data generated. Make sure to document these ownership handovers well and your memory-managing life will become easier!
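The enumeration and the ownership handover can be sketched roughly as follows. This is not the session's code: the function name list_link_layer_interfaces is made up, and filtering on AF_PACKET entries is one assumed way of picking out the link-layer (MAC) information.

```c
#include <ifaddrs.h>
#include <linux/if_packet.h>  /* struct sockaddr_ll */
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: print name, index and MAC address of each
 * interface. Returns the number of interfaces printed, or -1 on error. */
int list_link_layer_interfaces(void)
{
    struct ifaddrs *ifaces, *ifp;
    int count = 0;

    /* getifaddrs allocates the list; from this point on WE own it. */
    if (getifaddrs(&ifaces) == -1) {
        perror("getifaddrs");
        return -1;
    }

    for (ifp = ifaces; ifp != NULL; ifp = ifp->ifa_next) {
        /* An interface shows up once per address family, and ifa_addr
         * may be NULL; keep only the link-layer entries. */
        if (ifp->ifa_addr == NULL || ifp->ifa_addr->sa_family != AF_PACKET)
            continue;

        struct sockaddr_ll *sll = (struct sockaddr_ll *)ifp->ifa_addr;

        printf("%-8s index %d  mac", ifp->ifa_name, sll->sll_ifindex);
        for (int i = 0; i < sll->sll_halen; i++)
            printf("%c%02x", i ? ':' : ' ', sll->sll_addr[i]);
        printf("\n");
        count++;
    }

    /* We are the owner, so we clean up. */
    freeifaddrs(ifaces);
    return count;
}
```

Note that every pointer into the list (names, addresses) becomes invalid after freeifaddrs, so anything we want to keep must be copied out first.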

Combining several buffers into one message with sendmsg

Since we are constructing the entire frame (message to be more general) in several logically distinct parts - Ethernet header, network header, transport header, application header, payload, etc. - it can be useful to also let some of these elements actually exist in different parts of memory and only assemble them when we make the system call to send them.

In terms of performance, recall that there will need to be a final copy operation between userspace and kernel space during the system call, so keeping copying operations to a minimum in our application code is going to be beneficial.

The standard library offers us an API for achieving exactly that: sendmsg and recvmsg.

These functions take a struct msghdr argument, which will point at an array of struct iovec elements that in turn specify buffer location and sizes. The kernel will then only copy these buffers into its kernel space buffer (or vice versa for receiving) when it is time to cross the user/kernel space boundary. This is called a scatter/gather operation.

Note that it is perfectly valid to construct your frames in other ways, for instance using memcpy operations or by pointing struct pointers into appropriate locations within one "bulk" buffer.
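Here is a small sketch of the scatter/gather idea. To keep it runnable without root, it assumes a local AF_UNIX datagram socket pair instead of a raw socket; the struct demo_hdr and both function names are invented for illustration. With a raw socket the msghdr setup is the same, except that msg_name would point at a struct sockaddr_ll naming the outgoing interface.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>    /* struct iovec */
#include <unistd.h>

/* Hypothetical application header, for illustration only. */
struct demo_hdr {
    uint8_t type;
    uint8_t length;
} __attribute__((packed));

/* Send a header and a payload that live in two separate buffers as ONE
 * message. Returns the number of bytes sent, or -1 on error. */
ssize_t send_two_parts(int fd, struct demo_hdr *hdr,
                       const char *payload, size_t payload_len)
{
    struct iovec iov[2];
    struct msghdr msg;

    iov[0].iov_base = hdr;
    iov[0].iov_len  = sizeof(*hdr);
    iov[1].iov_base = (void *)payload;   /* sendmsg only reads from it */
    iov[1].iov_len  = payload_len;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov    = iov;
    msg.msg_iovlen = 2;

    return sendmsg(fd, &msg, 0);
}

/* Round trip over a local socket pair: the receiver sees one contiguous
 * message. Returns the number of bytes received into out, or -1. */
ssize_t scatter_gather_demo(unsigned char *out, size_t out_len)
{
    int sv[2];
    struct demo_hdr hdr = { .type = 1, .length = 5 };
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
        return -1;

    n = send_two_parts(sv[0], &hdr, "hello", 5);
    if (n != -1)
        n = recv(sv[1], out, out_len, 0);

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

The receiving side gets 7 bytes: the 2-byte header immediately followed by the 5 payload bytes, even though we never copied them into a common buffer ourselves.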

C aside: combining code from several files

The sniffer example also illustrates how to correctly combine code that is split across several source files (compilation units is the fancy name for them).

The DumpHex function we found on GitHub lives in its own file, DumpHex.c. To use it in, for example, packet_socket.c, we need to provide a forward declaration of the function to allow the compiler to do its work:

extern void DumpHex(const void* data, size_t size);

The extern keyword is technically not mandatory here, because external linkage is the default for function declarations. However, I find it is helpful to specify it explicitly to highlight the fact this function is actually defined elsewhere. The complement to extern linkage is static (internal) linkage, which means the symbol (function or variable) is local to the current compilation unit only.
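As a tiny illustration of the two kinds of linkage (all names here are made up), imagine this being the content of one source file:

```c
/* Internal linkage: visible only inside this compilation unit. Another
 * file may define its own `call_count` or `bump` without any conflict. */
static int call_count = 0;

static void bump(void)
{
    call_count++;
}

/* External linkage (the default for functions): other files can call
 * this after declaring it, e.g. `extern int next_id(void);`. */
int next_id(void)
{
    bump();
    return call_count;
}
```

Another file can call next_id, but any attempt to reach bump or call_count from outside fails at link time, which makes static a handy way to hide helpers.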

Mostly, we will organize such declarations in separate header (.h) files, more on that in an upcoming session.

Now we just need to tell the compiler (actually the linker, but these steps are all chained automatically for us) what files it needs to look at to find all definitions:

$ cc -o sniff sniff.c DumpHex.c

Now that we know the right way, it is worth highlighting that it is not correct to try to use preprocessor includes to achieve this:

#include "DumpHex.c" /* WRONG! Never do this! */

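It might actually work in simple cases, but quickly causes problems: for example, if the same .c file is included from more than one compilation unit, every function in it gets defined several times and the linker fails with duplicate symbol errors.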

Other socket programming resources

There are a number of good resources on BSD socket programming out there, although you need to remain critical since there is also a lot of incorrect information floating about.

One resource we can recommend is the rather well known "Beej guide".

Published Aug. 27, 2020 15:48 - Last modified Sep. 4, 2020 11:10