-------------------------[ Readings ]-------------------------

Today we looked into the programming patterns and tools that can help you
emulate Linux's TCP stack with the most accuracy and the least effort.

Work through the details of the following diagrams: the TCP state machine,
the sliding window algorithm, retransmission timers for multiple packets,
and TCP connection teardown:

http://www.tcpipguide.com/free/t_TCPOperationalOverviewandtheTCPFiniteStateMachineF-2.htm
http://www.tcpipguide.com/free/t_TCPConnectionTermination-2.htm
http://www.tcpipguide.com/free/t_TCPSlidingWindowDataTransferandAcknowledgementMech-5.htm
http://www.tcpipguide.com/free/t_TCPSegmentRetransmissionTimersandtheRetransmission-3.htm

The TCP 3-way handshake has a similar diagram:

http://www.tcpipguide.com/free/t_TCPConnectionEstablishmentSequenceNumberSynchroniz-2.htm

Handling TCP packets and connections in the Linux kernel has converged on
several data structures. These posts summarize Linux's per-packet structure,
the SKB, and its specific subset used to track TCP packets:

http://vger.kernel.org/~davem/skb.html
http://vger.kernel.org/~davem/skb_redundancy.html   --how a packet is handled through the layers
http://vger.kernel.org/~davem/skb_list.html
http://vger.kernel.org/~davem/skb_data.html
http://vger.kernel.org/~davem/tcp_skbcb.html        --TCP-specific per-packet data
http://vger.kernel.org/~davem/tcp_output.html       --how outgoing packets are queued

Of course, the kernel has a much bigger scope of functionality to implement
than you do, so copying these data structures for your own use would be
tremendously excessive; you will be able to get away with a lot less. But I
believe it helps to see what data is collected into SKBs and TCBs, and why.
Note that these structures do not include variables to track the TCP sliding
window, only the packets themselves and per-packet information!

----------[ Multiplexing listening on several sockets ]----------------

The blueprint for a typical TCP server is built on the accept() blocking
system call: the server blocks on a listening socket until new input is
available, then hands off the processing of that input to a fork()-ed child
process or thread, while continuing to listen for more input. The decision
that accept() returns a new socket descriptor---which is inherited through a
fork---is what makes this design work.

However, note that this scheme breaks down if you have more than one source
of input. Imagine that you have not one but two or more sockets to listen
on, or a socket to read from and a timer to attend to. A blocking system
call like accept() or recv() is only meant to react to _one_ source of input
and one kind of event; you need something else to handle more than one! This
was the motivation for introducing first the select() system call, then the
poll() system call, and, lately, the even more versatile epoll(). In class,
we considered poll().

Poll() allows you to wait on an array of sockets (more precisely, an array
of "struct pollfd" entries, each holding a file descriptor and the bitmasks
of events desired and observed), blocking until one of these sockets has
data to read (or will allow you to write data into it, or encounters an
error condition---we won't be using these events). You get to define which
events you are interested in for each file descriptor by setting the bitmask
"events" in the corresponding array entry. When the call to poll() unblocks,
this means that one or more of the desired events have occurred, and you can
find which one(s) by examining the "revents" member of each entry.
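As a quick illustration---a minimal sketch, not part of the lab handout---here
is what waiting on two listening sockets with poll() might look like. It
assumes the descriptors listen_fd1 and listen_fd2 have already been set up
with socket(), bind(), and listen(); error handling is abbreviated:

#include <poll.h>
#include <stdio.h>
#include <sys/socket.h>

/* Sketch: multiplex two listening sockets with a single poll() loop.
   listen_fd1 and listen_fd2 are assumed to be valid listening sockets. */
void serve(int listen_fd1, int listen_fd2)
{
    struct pollfd fds[2];

    fds[0].fd = listen_fd1;
    fds[0].events = POLLIN;          /* we only care about "readable" */
    fds[1].fd = listen_fd2;
    fds[1].events = POLLIN;

    for (;;) {
        int n = poll(fds, 2, -1);    /* -1: block until some event occurs */
        if (n < 0) {
            perror("poll");
            break;
        }
        for (int i = 0; i < 2; i++) {
            if (fds[i].revents & POLLIN) {
                /* On a listening socket, POLLIN means accept() won't block. */
                int conn = accept(fds[i].fd, NULL, NULL);
                if (conn >= 0) {
                    /* ... hand off conn to a child process/thread,
                       or handle it right here ... */
                }
            }
        }
    }
}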
In the timer example linked in the next section, we combine this use of
poll() with Linux timers.

There are various sources comparing select() vs poll() vs epoll(). A useful
intro is http://www.linux-mag.com/id/357/ , and
http://stackoverflow.com/questions/970979/what-are-the-differences-between-poll-and-select
provides more pointers.

----------------[ Linux timers ]----------------

Linux provides a convenient way of combining poll()-ing sockets and acting
on timers. Specifically, you can create a timer that causes a poll()-able
file descriptor to have data to read exactly when the timer expires. That
way, a single poll() call can wait for two kinds of events: data arriving
on a socket (in our case, a raw packet arriving on a raw socket), and a
timer (such as a TCP retransmit timer) expiring.

Example from class, combining timerfd_create() and poll():

http://www.cs.dartmouth.edu/~sergey/cs60/lab4/tcp-timer.c

Read "man 2 poll", "man 7 time", and "man timerfd_create" in Linux for the
documentation of these calls and their arguments.

----------------[ Libnet ]----------------

Libnet is a library for crafting IP packets of various kinds; it saves you
the effort of computing IP and TCP checksums and a number of other manual
tasks needed to send a packet via raw sockets. Libnet was the basis of the
first generation of network security tools that exposed many vulnerabilities
of TCP/IP implementations. You can install Libnet in your virtual machines
with "apt-get install libnet1-dev" as root.

You will find the libnet tutorial at
https://repolinux.wordpress.com/2011/09/18/libnet-1-1-tutorial/ ,
and code examples from it on Github at
https://github.com/repolho/Libnet-1.1-tutorial-examples .

My example from class is in
http://www.cs.dartmouth.edu/~sergey/cs60/lab4/libnet-example-icmp.c

I posted a local copy of the libnet manual at
http://www.cs.dartmouth.edu/~sergey/cs60/libnet1-doc/
Specifically, the function list is in
http://www.cs.dartmouth.edu/~sergey/cs60/libnet1-doc/libnet-functions_8h.html

Note that Libnet has a lot of knowledge about IP protocols encoded not only
into its functions like libnet_build_ipv4() but also into its environment.
When you build a TCP, UDP, or ICMP payload and save the tag from it, and
then finish the packet by adding the IPv4 layer, you won't need to rebuild
the IPv4 layer again for another ICMP packet---if you reuse the tag. Libnet
uses tags to keep track of how packets were built, including their outer
layers, and will rebuild them automatically when it can. This functionality
is, of course, heuristic, but it can save you a lot of effort.

Also note that Libnet's raw packet sending function, libnet_write(l), takes
just the opaque context l, not the packet! This means that Libnet has the
concept of the current packet being built, with all of its layers belonging
together, and will send _that_ packet when you request it with
libnet_write(). At the same time, you can have many buffers for packets
lying around from previous libnet_build_* calls, identified by tags, and
reuse them as the "current" packet. See
https://repolinux.wordpress.com/2011/09/18/libnet-1-1-tutorial/#sending-multiple-packets
about the use of tags.
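To make the tag mechanics concrete, here is a minimal sketch (not the class
example) of sending two ICMP echo requests while building the IPv4 layer
only once. The destination address, payload, and the id/seq values are
placeholders; error handling is mostly omitted:

#include <libnet.h>
#include <stdio.h>

int main(void)
{
    char errbuf[LIBNET_ERRBUF_SIZE];
    libnet_t *l = libnet_init(LIBNET_RAW4, NULL, errbuf);  /* raw IPv4 injection */
    if (l == NULL) {
        fprintf(stderr, "libnet_init: %s\n", errbuf);
        return 1;
    }

    uint32_t dst = libnet_name2addr4(l, (char *)"10.0.0.1", LIBNET_DONT_RESOLVE);
    uint32_t src = libnet_get_ipaddr4(l);
    uint8_t payload[] = "hello";

    /* Build the ICMP echo layer; a ptag of 0 means "create a new block". */
    libnet_ptag_t icmp_tag = libnet_build_icmpv4_echo(
        ICMP_ECHO, 0,
        0,                    /* checksum 0: let libnet compute it */
        0x42, 1,              /* id, seq (placeholders) */
        payload, sizeof(payload) - 1, l, 0);

    /* Build the IPv4 layer once, on top of the ICMP layer. */
    libnet_build_ipv4(
        LIBNET_IPV4_H + LIBNET_ICMPV4_ECHO_H + sizeof(payload) - 1,
        0, 1234, 0, 64, IPPROTO_ICMP, 0, src, dst, NULL, 0, l, 0);

    libnet_write(l);          /* sends the "current" packet */

    /* Reuse icmp_tag to update just the ICMP layer (new sequence number);
       the saved IPv4 layer is reused and fixed up automatically. */
    libnet_build_icmpv4_echo(
        ICMP_ECHO, 0, 0, 0x42, 2,
        payload, sizeof(payload) - 1, l, icmp_tag);

    libnet_write(l);          /* sends the updated packet */

    libnet_destroy(l);
    return 0;
}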
One caveat about Libnet: it exists in two incompatible versions, 1.0 and
1.1. Some tutorials and examples (such as my favorite tool Dsniff,
https://www.monkey.org/~dugsong/dsniff/) use the older 1.0 version, which
is very different from the more complex 1.1 version we discussed. Libnet
1.0, though, has useful functions and is easier to read. You can find this
older, deprecated version at
http://packetfactory.openwall.net/projects/libnet/index.html
together with its reference manual.

Thanks,
--Sergey