------------------[ Readings ]------------------

Read chapters on UDP and ICMP in Shalunov's book. These are brief
summaries; the Stevens textbook chapters on these are in-depth.

Read Ch. 8 of the textbook about ICMP. We covered roughly 8.1--8.3;
you can stop there at a first reading.

Read Ch. 10 of the textbook about UDP. We covered roughly 10.1--10.4;
you can stop there at a first reading.

Re-read notes from Lecture 2 and think about how NAT works. Read
textbook Ch 7.3 about NAT.

A reminder: nice diagrams of IP, UDP, TCP, and ICMP headers can be
found at https://nmap.org/book/tcpip-ref.html
Keep looking at these!

Read ahead: Ch. 12 (TCP preliminaries)

------------------[ Optional: more on NAT ]------------------

NAT is described in Ch. 7 of the textbook. We covered roughly Ch. 7.3.

NAT is one of the "hacks" on the Internet protocols. In Linux, it is
implemented by the Netfilter subsystem of the kernel's TCP/IP stack,
and is described here:
  http://netfilter.org/documentation/HOWTO//NAT-HOWTO.html
(read http://netfilter.org/documentation/HOWTO//networking-concepts-HOWTO.html
for some terminology), and in detail here:
  https://www.netfilter.org/documentation/HOWTO/netfilter-hacking-HOWTO-4.html#ss4.3

Note that these recipes go back to the time when your Internet
connection was over dialup, using the PPP protocol and an interface
named like "ppp0", which represented the modem's uplink connection.
In class, we just used a regular Ethernet interface connected to an
external network.

==================[ Switches and LANs ]==================

In networking terms, a LAN is a "broadcast domain": all hosts
connected to a LAN see each other's broadcast packets (such as ARP
requests).

Broadcast on a LAN is how hosts can find each other's MAC addresses
knowing only IP addresses (remember that the ARP who-has request is
broadcast).
ARP-ing for an IP of a host that cannot receive your broadcasts will
fail (unless the ARP request is somehow specially proxied to the host,
and its response is proxied back). So being a broadcast domain is a
necessary condition for all computers on an Ethernet LAN to be able to
talk to each other.

A switch maintains a table (a "CAM table") that maps source MAC
addresses to the (physical) port on which they were seen. Incoming
packets (more correctly called "frames" at the Ethernet layer) get
their destination MAC checked against this table; if the dst MAC is in
the switch's CAM table, the frame will be forwarded only to the port
where the interface with that MAC is known to be connected (if it is
not in the table, the frame is flooded to all ports).

This creates "virtual circuits" on a switch; sniffing on a third port
for the communications between machines connected to two other ports
will get you no unicast packets going between these machines. Only the
broadcast packets that these machines occasionally send will be heard
on your port by your sniffer.

A way of getting around this is ARP poisoning. Look for links about it
in the previous lecture notes.

A VLAN is a grouping of a switch's ports into several broadcast
domains: when a port is a part of a VLAN, only ports in the same VLAN
group get the broadcast traffic. VLANs can be extended across switches
by having the switch add a special tag (IEEE 802.1Q is the standard
one) ahead of the Ethertype and the IP header to every incoming frame
on a VLAN port, and having other ports strip it---except for the
"trunk port" that connects to another switch similarly configured.
This way the grouping of ports can be extended through many switches
in an organization, all having the same configured VLAN numbers.

Dartmouth uses VLANs heavily (and mostly without the users noticing).
For example, all VoIP phones are on a separate VLAN from other
computers.
You can find out much about VLANs from this Cisco presentation:
  http://www.cs.dartmouth.edu/~sergey/me/netreads/L2-security-Bootcamp.pdf
(but skip the intro to the MAC attacks).

------------------[ DHCP ]-------------------

DHCP is described in Ch. 6 of the textbook. We covered roughly
6.1--6.2.4.

A popular mechanism for a computer joining a LAN without an assigned
IP address is DHCP (Dynamic Host Configuration Protocol). Under this
protocol, the host sends broadcast packets with a request for an
available IP address and queries for several other network details
such as netmask, default router/gateway, local DNS server, etc.

These requests are wrapped in Ether/IP/UDP, but the host has no IP
address yet; so the Ethernet destination address is ff:ff:ff:ff:ff:ff
and the destination IP address is 255.255.255.255 (or 0xffffffff),
while the source IP address is 0.0.0.0.

DHCP could have been a special protocol like ARP, wrapped only in an
Ethernet header, and with a separate Ethertype (e.g., IP is 0x0800,
ARP is 0x0806, IPv6 is 0x86DD, etc.---see more types at
https://en.wikipedia.org/wiki/EtherType); instead, it rides on top of
UDP.

A prominent feature of DHCP is a list of "options", i.e., pieces of
configuration info requested. These sets of options are slightly
different between different OSes, and can be used to fingerprint the
OS of the requester.

Find sample DHCP requests and responses in l3-dhcp.pcapng. The first
six packets (3 requests and responses) are from one network, a
hospital guest network (where the answering DHCP server has the
address 1.1.1.1, which it legally cannot have---find out who it
belongs to!).

Then the laptop was closed and slept while being physically moved to
another location (our classroom). Upon waking, it requested the same
IP address it had on the previous network ("Requested IP address:
172.18.65.104"). But our classroom's Dartmouth Public uses a
_different_ private range within 10/8, as opposed to the hospital's
172.18/16---so the DHCP server responded with a NAK---"you can't have
it".
After this, the laptop went through the full phase of DHCP discovery:
it sent a DHCP Discover packet to find out more about the server
(since the DHCP server apparently changed), and got a DHCP Offer from
it. This part was cut off from the hospital capture, but this is how
DHCP is started on a new network. If you occasionally wonder why
getting a new IP seems slow, remember that DHCP is a two-step protocol
(Discover/Offer, then Request/ACK).

DHCP is not authenticated & is easily attacked. See, e.g.,
  http://hakipedia.com/index.php/DHCP_Starvation
  http://hakipedia.com/index.php/DHCP_Rogue_Server

----------------[ Datagrams and Multiplexing ]----------------

The two main principles of the Internet design are that

(1) IP packets are _datagrams_, in the sense that each contains all
    the information needed to route it to the destination, and routing
    only needs this information (i.e., the dst IP address). Packets
    may even take different paths to the target, as we saw with our
    traceroutes to Google's servers. Each IP packet lives and dies on
    its own.

(2) Connections are _multiplexed_ through several layers. Ethernet
    carries IP and ARP (and IPv6 when available); IP carries TCP, UDP,
    and ICMP (and many others); UDP carries DHCP and DNS; TCP carries
    HTTP, HTTPS, SSH, and many more.

In each layer, some number tells how to interpret the next layer's
payload: in Ethernet, the 2-byte Ethertype field (0x0800 for IP,
0x0806 for ARP, ...); in IP, the 1-byte protocol field at offset 9
(1 for ICMP, 6 for TCP, 17 for UDP, etc.---see /etc/protocols).

For TCP and UDP payloads, demultiplexing switches to port numbers,
assigned by convention to application protocols: 80 for HTTP, 22 for
SSH, 443 for HTTPS in TCP, 53 for DNS in UDP (as well as in TCP, for
longer responses and zone transfers). For more, see /etc/services .

Since there are many application protocols, applications cannot be
held to use only pre-assigned ports---this works only for a few
"well-known" ports and applications.
The rest register with the operating system and negotiate for port
numbers; we'll see how this works when we look at the use of the
bind() system call in the TCP server code.

Figure 1.6 in the textbook illustrates this principle.

----------------[ Berkeley sockets ]----------------

Look at the TCP client code in tcpcli.c .

"Beej's Guide to Network Programming"
  http://beej.us/guide/bgnet/output/html/multipage/index.html
explains the function of the structs and the system calls. Look them
up in this guide as you read the code line by line.

NOTE: socket() and connect() are _system calls_: that is, with these
functions you ask the kernel to perform a low-level task on your
behalf. The small integer _socket descriptor_, which is returned by
socket(), is used to connect all your requests together (so that the
kernel knows which data and connection they pertain to).

Moreover, you can read and write bytes on this socket descriptor the
same way you read and write open files in C. That is the whole point
of "stream" (TCP) sockets: they create an illusion that you read and
write a sequential source, whereas in reality your reads and writes
turn into chunks of data that traverse the 'net, encapsulated in
TCP/IP packets.

In Lab 1, you will write C client code for a custom protocol.

----------------[ Examining headers and data structs ]----------------

You might wonder why it takes so much work and a special C struct to
pass just an IP address and a port number to Berkeley sockets.

Berkeley sockets were designed to accommodate a variety of protocols
and network address families (many of which have fallen out of use).
We'll be using only a few combinations out of these many. You'll get
some idea of that variety as you read "man 2 socket".

It's important to not let this variety confuse you.
For example, it helped me to know that AF_INET is just a define for
"2", and it resides in /usr/include headers (this output is from a
Mac; the header paths and some values, such as AF_INET6, differ on
Linux):

  $ grep -r AF_INET /usr/include | grep define
  /usr/include/sys/socket.h:#define AF_INET   2        /* internetwork: UDP, TCP, etc. */
  /usr/include/sys/socket.h:#define AF_INET6  30       /* IPv6 */
  /usr/include/sys/socket.h:#define PF_INET   AF_INET
  /usr/include/sys/socket.h:#define PF_INET6  AF_INET6

and SOCK_STREAM is 1:

  $ grep -r SOCK_STREAM /usr/include | grep define
  /usr/include/sys/socket.h:#define SOCK_STREAM 1      /* stream socket */

and the other defines nearby spell out other usable constants, like
SOCK_DGRAM 2, and my favorite, SOCK_RAW 3.

Another way to see what definitions are pulled into your program is to
stop the C compiler just after the preprocessing stage, when all
#include's are resolved:

  $ gcc -E tcpcli.c | less

That will pull in _a lot_ of typedef's---because in network code the
exact byte lengths of values are important, as they must match exactly
the fields of a packet on the wire.
So you will see types for "8-bit unsigned integer" (uint8_t), "16-bit
unsigned integer" (uint16_t), and so on:

  $ gcc -E tcpcli.c | grep uint8 | grep typedef
  typedef unsigned char __uint8_t;
  typedef __uint8_t sa_family_t;     <-- see below, this is the "address family" like AF_INET
  typedef unsigned char uint8_t;
  typedef uint8_t uint_least8_t;
  typedef uint8_t uint_fast8_t;

  $ gcc -E tcpcli.c | grep uint16 | grep typedef
  typedef unsigned short __uint16_t;
  typedef __uint16_t __darwin_mode_t;  <-- I am doing this on a Mac, hence Darwin
  typedef __uint16_t in_port_t;        <--- that's a type for a port in UDP or TCP header
  typedef __uint16_t nlink_t;
  typedef unsigned short uint16_t;
  typedef uint16_t uint_least16_t;
  typedef uint16_t uint_fast16_t;

You will also see the expanded structure definitions that system calls
like connect() use, e.g.:

  struct sockaddr_in {
      __uint8_t sin_len;
      sa_family_t sin_family;
      in_port_t sin_port;        <--- a two-byte integer for a port number
      struct in_addr sin_addr;   <--- this must be an IP address
      char sin_zero[8];
  };

and then you can search back for these types, like

  struct in_addr {
      in_addr_t s_addr;
  };

  typedef __uint32_t in_addr_t;  <--- a 4-byte integer, as expected of an IP address

and so on. Once you've chased them through the code, these data
structures don't look so intimidating.

----------------[ Look ahead: server C code ]----------------

Look ahead at the TCP _server_ code examples:

  tcpserv.c           (a simple echo server that forks a child for each connection)
  tcpserver-nofork.c  (no fork, all work is done in the main process)