=========================[ Readings ]=========================

The BPF/Libpcap paper from 1992:
  http://www.vodun.org/papers/net-papers/van_jacobson_the_bpf_packet_filter.pdf

A quick tutorial on using IPtables:
  https://www.netfilter.org/documentation/HOWTO/packet-filtering-HOWTO-7.html

On IPtables internals:
  https://www.netfilter.org/documentation/HOWTO/netfilter-hacking-HOWTO.html
especially Section 3:
  https://www.netfilter.org/documentation/HOWTO/netfilter-hacking-HOWTO-3.html

==================[ Packet-level networking ]==================

Berkeley sockets APIs abstract away the packet level of networking, giving you
abstractions of streams (SOCK_STREAM) or "datagrams" (SOCK_DGRAM, i.e., UDP).
However, most network tools---and the kernel itself---work with raw packets.
Today we surveyed several of these mechanisms:

1) Raw sockets (SOCK_RAW). These are the primary way to send a crafted packet,
   bypassing the kernel's construction of its headers (at the cost of having to
   build all the headers yourself, including the pesky TCP/IP checksums; see
   the sketch at the end of this section). But for sniffing raw packets, there
   is a better way.

2) Libpcap + BPF (Berkeley Packet Filter). BPF is a kernel mechanism for taking
   packets from the network driver, buffering and filtering them, and then
   giving them to a userland application like tcpdump or Wireshark for
   analysis. These applications use the libpcap library to read these packets.
   We went through the code of tcpdump in class; I suggest you spend more time
   with it. See below for more detail.

3) IPtables/Netfilter in Linux and PF in *BSD (including MacOS). These
   firewalls dissect packets to apply rule-based decisions to them (e.g., block
   all packets for port 22 from a particular network, or allow only packets
   from a particular network like 129.170.0.0/16 and drop packets from all
   other sources). These firewalls also rewrite packets, replacing, e.g.,
   source or destination IP addresses as needed by NAT. You specify the rules
   with the command-line tool iptables.

   Linux's Netfilter has an amazing feature: you can write rules to match
   packets, "steal" them from the kernel into a buffer, hand them to a userland
   program, and then, after the userland program works on them, re-inject the
   packets back into the kernel, changed or unchanged. This feature exists in
   two versions: IPQUEUE (older) and NFQUEUE (newer). That way you could do NAT
   in your own userland program if the kernel's version is, for some reason,
   insufficient.

4) Virtual network interfaces, TUN/TAP on Linux, UTUN on MacOS. These are used
   by VPNs such as OpenVPN. With these drivers loaded, you can tell the kernel
   to create and configure a network interface with its own IP address and
   netmask, and to route into it packets for certain destinations. Then a
   userland program can open() this interface and read these packets as byte
   buffers with read(), and send packets on that interface with write(). This
   is close to raw sockets, but requires setting up routing so that packets
   actually show up on the interface.

We will discuss (4) later in the course. (1)--(3) are the key tools of network
administration. These mechanisms are like the Force: they hold networks
together and make them understandable and manageable.
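To make (1) concrete, here is a minimal sketch (my illustration, not code we
wrote in class) of crafting and sending an ICMP echo request over a raw
socket. It assumes Linux's <netinet/ip_icmp.h> and must run as root; with
IPPROTO_ICMP the kernel still builds the IP header, but the ICMP header and
its checksum are ours to build.

  /* Sketch (not class code): craft and send an ICMP echo request over a
   * raw socket. Assumes Linux headers; requires root. */
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>
  #include <arpa/inet.h>
  #include <sys/socket.h>
  #include <netinet/ip_icmp.h>

  /* The classic Internet checksum: one's-complement sum of 16-bit words. */
  static unsigned short cksum(void *data, int len)
  {
      unsigned long sum = 0;
      unsigned short *p = data;
      while (len > 1) { sum += *p++; len -= 2; }
      if (len == 1) sum += *(unsigned char *)p;
      sum = (sum >> 16) + (sum & 0xffff);
      sum += (sum >> 16);
      return (unsigned short)~sum;
  }

  int main(void)
  {
      int s = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP);
      if (s < 0) { perror("socket (needs root)"); return 1; }

      struct {
          struct icmphdr hdr;            /* we build this ourselves */
          unsigned char payload[16];     /* arbitrary zeroed payload */
      } pkt;
      memset(&pkt, 0, sizeof(pkt));
      pkt.hdr.type = ICMP_ECHO;                      /* echo request */
      pkt.hdr.un.echo.id = htons(getpid() & 0xffff);
      pkt.hdr.un.echo.sequence = htons(1);
      pkt.hdr.checksum = cksum(&pkt, sizeof(pkt));   /* over header + payload */

      struct sockaddr_in dst = { .sin_family = AF_INET };
      inet_pton(AF_INET, "127.0.0.1", &dst.sin_addr);

      if (sendto(s, &pkt, sizeof(pkt), 0,
                 (struct sockaddr *)&dst, sizeof(dst)) < 0)
          perror("sendto");
      close(s);
      return 0;
  }

Running "tcpdump -i lo icmp" in another terminal should show the request going
out and the kernel's echo reply coming back.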
=====================[ Libpcap, tcpdump ]=====================

Read the 1992 paper:
  http://www.vodun.org/papers/net-papers/van_jacobson_the_bpf_packet_filter.pdf

Libpcap allows you to configure capture and filtering of packets by the kernel,
and then run an endless loop that calls your "callback" function for every
captured packet. This callback function gets a pointer to the raw bytes of a
packet (3rd argument), a struct with packet metadata (2nd argument), and a
pointer to some data structure of your own (1st argument) that is the same
between all calls; it is intended to let the different invocations of the
handler share data without making that data global. From "man pcap_loop":

  #include <pcap/pcap.h>

  typedef void (*pcap_handler)(u_char *user, const struct pcap_pkthdr *h,
                               const u_char *bytes);

  struct pcap_pkthdr {
          struct timeval ts;      /* time stamp */
          bpf_u_int32 caplen;     /* length of portion present */
          bpf_u_int32 len;        /* length this packet (off wire) */
  #ifdef __APPLE__
          char comment[256];
  #endif
  };

This becomes the main loop of a pcap-using program:

  int pcap_loop(pcap_t *p, int cnt, pcap_handler callback, u_char *user);

Most of Libpcap's functions (and there are many, all named pcap_*) take a
"pcap descriptor" pcap_t, which is created when you open either a live
interface for sniffing (pcap_open_live) or a saved file (pcap_open_offline).
You can think of this descriptor as an object (in the C++ or Java sense) on
which these functions work as methods (in fact, C++ methods under the hood are
functions that take an extra argument, a pointer to the object, called the
"implicit" argument, which is available inside these functions as "this").

Note this pattern of a callback called multiple times with a shared context.
It is a very important C design pattern.

Looking through tcpdump's main source file, tcpdump.c (get
http://www.tcpdump.org/release/tcpdump-4.9.0.tar.gz), you see the skeleton of
libpcap-based programs:

  *ebuf = '\0';
  pc = pcap_open_live(device, ndo->ndo_snaplen, !pflag, 1000, ebuf);
  if (pc == NULL) {
          ...
          error("%s", ebuf);
  }

(device is a string name, such as "eth0", "lo", or "any"; ebuf is a buffer to
catch a pcap-specific error message, if any), and then

  status = pcap_loop(pd, cnt, callback, pcap_userdata);

(the main loop; only an error or ^C will break out of it; the rest is up to
the callback. Any shared context is passed via pcap_userdata.)

See my pcap-hexprint.c example in
http://www.cs.dartmouth.edu/~sergey/cs60/pcap/

Practice with extending the printer to print the IP addresses of the source
and destination, and the ports for TCP and UDP, for each frame.

--------------------[ BPF filtering ]--------------------

Libpcap allows you to set a filter for packets to capture. This saves both CPU
and storage. The filtering occurs in the kernel, on a buffer of frames received
by the network driver; only those passing the filter are copied into the
buffers given to your libpcap-based application.

Even though a packet filter ultimately performs the same checks of packet
fields that you'd write in C, it would be foolhardy to let tcpdump users inject
C code into the kernel. Instead, the filter is compiled into bytecode for a
very simple virtual machine; that bytecode is sent to the kernel and executed
there in a VM. The idea is that limited bytecode is easier to analyze and
constrain to just the desired operations than C code.
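Putting the pieces of this section together, here is a minimal sketch of a
libpcap sniffer (mine, not the pcap-hexprint.c from the course page): it opens
an interface, compiles and installs a filter, then loops, handing every
matching packet to a callback. The interface name "eth0" and the filter "icmp"
are arbitrary choices for the sketch; link with -lpcap and run as root.

  /* Minimal libpcap sketch (not class code): open an interface, compile and
   * install a BPF filter, then loop, calling the callback on each packet. */
  #include <stdio.h>
  #include <pcap/pcap.h>

  static void handler(u_char *user, const struct pcap_pkthdr *h,
                      const u_char *bytes)
  {
      long *count = (long *)user;        /* shared context, no globals */
      (*count)++;
      printf("packet %ld: caplen=%u len=%u\n", *count, h->caplen, h->len);
  }

  int main(void)
  {
      char errbuf[PCAP_ERRBUF_SIZE];
      long count = 0;

      pcap_t *pc = pcap_open_live("eth0", 65535, 1, 1000, errbuf);
      if (pc == NULL) {
          fprintf(stderr, "pcap_open_live: %s\n", errbuf);
          return 1;
      }

      struct bpf_program prog;
      /* Compile the filter expression to BPF bytecode ... */
      if (pcap_compile(pc, &prog, "icmp", 1, PCAP_NETMASK_UNKNOWN) == -1 ||
          /* ... and push that bytecode into the kernel. */
          pcap_setfilter(pc, &prog) == -1) {
          fprintf(stderr, "filter: %s\n", pcap_geterr(pc));
          return 1;
      }

      /* The endless loop: -1 means "until error or interrupt". */
      pcap_loop(pc, -1, handler, (u_char *)&count);

      pcap_close(pc);
      return 0;
  }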
You can see the bytecode with the -d option to tcpdump, and it's really easy
to read:

  tcpdump -d -i en0 icmp

  (000) ldh      [12]                // get 2 bytes at offset 12, the ethertype
  (001) jeq      #0x800  jt 2  jf 5  // compare with 0x800, ethertype for IP
                                     // jump to 002 if true, 005 if false
  (002) ldb      [23]                // get one byte at offset 23, i.e., offset 9
                                     // into the IP header, the protocol number
  (003) jeq      #0x1    jt 4  jf 5  // compare with 1, proto number for ICMP
                                     // jump to 004 if true, 005 if false
  (004) ret      #65535              // accept packet, we want it
  (005) ret      #0                  // discard packet

Check out the filters for "host 10.10.10.10" (anything to or from that IP) and
for "not port 22" (useful when you run tcpdump on a machine you are ssh-ing
into).

Check out where in tcpdump's code pcap_compile() creates a bytecode blob from
your filter string, and where pcap_setfilter() sends it into the kernel.

Note that frames given by a device driver to BPF may have link-layer headers
of different sizes for different drivers and links. An Ethernet header of 14
bytes (with the ethertype/protocol number in its last two bytes, offsets
12--13) is just one option. For this reason, pcap provides the function
pcap_datalink(). The returned number is one of the DLT_ constants from
/usr/include/pcap/bpf.h and is used to determine the offset of the IP header
from the start of the frame, and the layout of the frame.
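For example, a capture program might turn the DLT_ value into the number of
link-layer bytes to skip before the IP header. A small sketch of mine follows;
the lengths for these common link types are the standard ones, but check
pcap/bpf.h and the libpcap documentation for whatever your interface actually
reports:

  /* Sketch: map pcap_datalink()'s DLT_ value to the number of bytes to skip
   * to reach the IP header. Only a few common link types are handled. */
  #include <pcap/pcap.h>

  static int linkhdr_len(pcap_t *pc)
  {
      switch (pcap_datalink(pc)) {
      case DLT_EN10MB:    return 14;  /* Ethernet: dst(6) + src(6) + ethertype(2) */
      case DLT_NULL:      return 4;   /* BSD loopback: 4-byte address family */
      case DLT_LINUX_SLL: return 16;  /* Linux "cooked" capture, e.g. the "any" device */
      case DLT_RAW:       return 0;   /* no link header; frame starts with IP */
      default:            return -1;  /* unknown here: look it up before parsing */
      }
  }

In a callback, the IP header then starts at bytes + linkhdr_len(pc), ignoring
complications such as VLAN tags.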
--------------------[ Control flow ]--------------------

Tcpdump's code consists mainly of parsers-and-printers for the various
protocols, contained in files named print-*.c. The parsers for protocols lower
in the stack pass control to those higher in the stack, based on values in the
packet such as the "ethertype" in the Ethernet frame ("ether[12:2]" in
tcpdump's filter language), the "protocol" field in the IP packet ("ip[9]"),
well-known ports like UDP port 53 for DNS, and so on.

For example, there's print-ip.c, with the main entry function ip_print(),
which prints the bulk of the info you see in tcpdump's output. This function
is called from files that deal with link-layer protocols, most frequently
print-ether.c.

[I use ctags for quickly finding where a function or variable is defined in
code. Consult this tutorial (where ctags are applied to the Linux kernel
tree):
https://courses.cs.washington.edu/courses/cse451/10au/tutorials/tutorial_ctags.html]

  $ grep -r ' ip_print' . | grep -v tags
  ...
  ./print-ether.c:        ip_print(ndo, p, length);
  ...

The caller is ethertype_print(), itself called from ether_print(), called in
turn from ether_if_print(), the top-level routine of the printer:

  /*
   * This is the top level routine of the printer.  'p' points
   * to the ether header of the packet, 'h->len' is the length
   * of the packet off the wire, and 'h->caplen' is the number
   * of bytes actually captured.
   */
  u_int
  ether_if_print(netdissect_options *ndo, const struct pcap_pkthdr *h,
                 const u_char *p)
  {
          return (ether_print(ndo, p, h->len, h->caplen, NULL, NULL));
  }

But who calls ether_if_print()? It is called based on the data link type---the
same as returned by pcap_datalink()---via the top-level table called
"printers". In version 4.3.0, this table is in tcpdump.c; in the latest
version, 4.9.0, it's in a separate file, print.c, and has some new utility
functions, like has_printer() and lookup_printer() (trace the use of these
through the code):

  static const struct printer printers[] = {
          { ether_if_print,          DLT_EN10MB },
  ...
  #ifdef DLT_IEEE802_15_4
          { ieee802_15_4_if_print,   DLT_IEEE802_15_4 },  // this is ZigBee
  #endif
  ...
          { raw_if_print,            DLT_RAW },
  #ifdef DLT_IPV4
          { raw_if_print,            DLT_IPV4 },
  #endif
  ...
  #ifdef DLT_IPV6
          { raw_if_print,            DLT_IPV6 },
  #endif
  ...
  #ifdef HAVE_PCAP_USB_H
  #ifdef DLT_USB_LINUX
          { usb_linux_48_byte_print, DLT_USB_LINUX },     // pcap can sniff the USB bus!
  #endif /* DLT_USB_LINUX */
  ...
  #ifdef DLT_IEEE802_11
          { ieee802_11_if_print,     DLT_IEEE802_11 },    // this is Wi-Fi in monitor mode
  #endif

and so on. Note this pattern, a dispatch table: it is also very common in C.

I suggest you spend some time with tcpdump's code to see what a mature network
tool looks like.
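To show the dispatch-table pattern in its smallest form, here is a sketch of
mine; the names and key values are made up for illustration and are not
tcpdump's. It has the same shape as printers[] and lookup_printer() above: an
array of {key, handler} pairs and a lookup function that walks it.

  /* Generic dispatch-table sketch; names and keys are invented for
   * illustration, not taken from tcpdump. */
  #include <stdio.h>

  typedef void (*handler_fn)(const unsigned char *pkt, unsigned len);

  static void handle_ethernet(const unsigned char *pkt, unsigned len)
  {
      (void)pkt;
      printf("ethernet frame, %u bytes\n", len);
  }

  static void handle_raw_ip(const unsigned char *pkt, unsigned len)
  {
      (void)pkt;
      printf("raw IP packet, %u bytes\n", len);
  }

  enum { LINK_ETHERNET, LINK_RAW_IP };   /* made-up keys for the sketch */

  /* The dispatch table: a key paired with the function that handles it. */
  static const struct dispatch {
      int        linktype;
      handler_fn handler;
  } table[] = {
      { LINK_ETHERNET, handle_ethernet },
      { LINK_RAW_IP,   handle_raw_ip   },
  };

  static handler_fn lookup(int linktype)
  {
      for (unsigned i = 0; i < sizeof(table) / sizeof(table[0]); i++)
          if (table[i].linktype == linktype)
              return table[i].handler;
      return NULL;   /* no handler registered for this type */
  }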
=====================[ Netfilter and IPtables ]=====================

Tcpdump's code parses packets in userland. The kernel must also process
packets, hand them to the appropriate functions for further processing, and so
on. So kernel code branches out into a tree of per-protocol handlers just like
tcpdump's (for processing rather than printing, of course).

We saw how Netfilter hooks fit with the kernel's networking code in slides
4--9 of
http://www.cs.dartmouth.edu/~sergey/netreads/joanna-rutkowska-passive-covert-channels-slides.pdf
All functions in blue boxes are actual function names in the Linux kernel;
e.g., arp_rcv() is at
http://lxr.free-electrons.com/source/net/ipv4/arp.c#L901
and so on.

Netfilter is very powerful, and can act as a packet-filtering firewall, a NAT,
a port forwarder, a DDoS countermeasure, or all of the above. For examples,
see:

  http://blog.erratasec.com/2016/10/configuring-raspberry-pi-as-router.html
  https://javapipe.com/iptables-ddos-protection
  https://www.karlrupp.net/en/computer/nat_tutorial

Examples from my machine:

  # iptables-save
  # Generated by iptables-save v1.4.21 on Wed Apr 26 18:46:44 2017
  *filter
  :INPUT ACCEPT [522874:336579980]
  :FORWARD ACCEPT [48849:6966028]
  :OUTPUT ACCEPT [165758:31420812]
  -A INPUT -s 218.64.0.0/15 -i eth0 -p tcp -m tcp --dport 22 -j DROP
  -A INPUT -s 103.193.200.0/22 -i eth0 -p tcp -m tcp --dport 22 -j DROP
  -A INPUT -s 121.16.0.0/13 -i eth0 -p tcp -m tcp --dport 22 -j DROP
  -A INPUT -s 210.13.0.0/16 -i eth0 -p tcp -m tcp --dport 22 -j DROP
  -A INPUT -s 220.191.0.0/16 -i eth0 -p tcp -m tcp --dport 22 -j DROP
  -A INPUT -s 221.192.0.0/14 -i eth0 -p tcp -m tcp --dport 22 -j DROP

These block incoming connections from networks attempting to brute-force my
SSH password for root:

  # grep 218.65. /var/log/auth.log | head
  Apr 23 10:48:03 throk sshd[30863]: reverse mapping checking getaddrinfo for 38.30.65.218.broad.xy.jx.dynamic.163data.com.cn [218.65.30.38] failed - POSSIBLE BREAK-IN ATTEMPT!
  Apr 23 10:48:03 throk sshd[30863]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.65.30.38 user=root
  Apr 23 10:48:04 throk sshd[30863]: Failed password for root from 218.65.30.38 port 10518 ssh2
  Apr 23 10:48:06 throk sshd[30863]: Failed password for root from 218.65.30.38 port 10518 ssh2
  Apr 23 10:48:10 throk sshd[30863]: Failed password for root from 218.65.30.38 port 10518 ssh2
  Apr 23 10:48:10 throk sshd[30863]: Received disconnect from 218.65.30.38: 11: [preauth]
  Apr 23 10:48:10 throk sshd[30863]: PAM 2 more authentication failures; logname= uid=0 euid=0 tty=ssh ruser= rhost=218.65.30.38 user=root
  Apr 23 10:48:40 throk sshd[30865]: reverse mapping checking getaddrinfo for 38.30.65.218.broad.xy.jx.dynamic.163data.com.cn [218.65.30.38] failed - POSSIBLE BREAK-IN ATTEMPT!
  ...

  # grep 218.65. /var/log/auth.log | wc
    19763  288473 2345516

That's nearly 20,000 logged attempts from Apr 23 to Apr 25, when I blocked it,
and over 2MB of log space. This is the Internet these days, folks.

Moreover, these rules are mostly futile as a defense, since brute-force
scanners change networks all the time, and there's no lack of devices with
default passwords for root---like many IoT cameras---to join the ranks of the
scanners. It's a losing battle; I just put in these rules to save myself some
scrolling while I had to watch the log in real time for something else.

The right way to frustrate the scanners is to allow connections from specific
networks and drop the rest:

  # iptables -A INPUT -s 129.170.0.0/16 -i eth0 -p tcp -m tcp --dport 22 -j ACCEPT
  # iptables -A INPUT -s <another trusted network> -i eth0 -p tcp -m tcp --dport 22 -j ACCEPT
  # iptables -A INPUT -i eth0 -p tcp -m tcp --dport 22 -j DROP

(-A means append. The rule order matters: if I were to start with the DROP
rule, my own connection would get dropped, too! It's a good idea to leave
yourself a connection that you know will not be blocked by your new rules, or
to test new rules with "iptables-save > sane.rules" followed by
"iptables <your new rule> && sleep 20 && iptables-restore < sane.rules", which
will reload the known-sane rules after 20 seconds. I learned this the hard
way.)

Use your virtual machine to try various rules from
https://www.netfilter.org/documentation/HOWTO/packet-filtering-HOWTO-7.html

One rule we will need in Lab 4 drops outgoing TCP RST packets (so that our own
raw TCP implementation is not interfered with by the kernel's TCP stack, which
knows nothing about our connections):

  # iptables -A OUTPUT -p tcp --tcp-flags RST RST -j DROP

(The first RST is the set of flags to examine; the second is exactly which of
these flags must be set for the rule to match.)

We verified this by setting the rule in the VM and observing that the RSTs
sent in response to a netcat connection attempt no longer reach the host; as a
result, netcat does not immediately quit after getting a RST back, but keeps
sending SYN packets.
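Finally, to tie this back to the NFQUEUE feature mentioned in item (3) at the
top: a userland program receives queued packets through libnetfilter_queue and
returns a verdict on each. Below is a rough sketch of mine, patterned on the
library's standard usage rather than anything we wrote in class; check the
libnetfilter_queue documentation for the exact signatures on your system. You
would steer packets into it with a rule such as
"iptables -A OUTPUT -p icmp -j NFQUEUE --queue-num 0", and link with
-lnetfilter_queue.

  /* Rough sketch (not class code): read packets queued by an iptables
   * NFQUEUE rule and accept each one unchanged. Needs root. */
  #include <stdio.h>
  #include <stdint.h>
  #include <sys/socket.h>
  #include <arpa/inet.h>
  #include <linux/netfilter.h>                       /* NF_ACCEPT */
  #include <libnetfilter_queue/libnetfilter_queue.h>

  /* Called once per queued packet; we must return a verdict. */
  static int cb(struct nfq_q_handle *qh, struct nfgenmsg *nfmsg,
                struct nfq_data *nfa, void *data)
  {
      uint32_t id = 0;
      struct nfqnl_msg_packet_hdr *ph = nfq_get_msg_packet_hdr(nfa);
      if (ph)
          id = ntohl(ph->packet_id);

      unsigned char *payload;
      int len = nfq_get_payload(nfa, &payload);
      if (len >= 0)
          printf("packet id %u, %d bytes (starts with the IP header)\n", id, len);

      /* Re-inject the packet unchanged; could also be NF_DROP, or a
       * modified payload passed in the last two arguments. */
      return nfq_set_verdict(qh, id, NF_ACCEPT, 0, NULL);
  }

  int main(void)
  {
      struct nfq_handle *h = nfq_open();
      if (!h) { fprintf(stderr, "nfq_open failed\n"); return 1; }

      struct nfq_q_handle *qh = nfq_create_queue(h, 0, &cb, NULL);  /* queue 0 */
      if (!qh) { fprintf(stderr, "nfq_create_queue failed\n"); return 1; }

      nfq_set_mode(qh, NFQNL_COPY_PACKET, 0xffff);   /* copy whole packets */

      char buf[65536];
      int fd = nfq_fd(h), n;
      while ((n = recv(fd, buf, sizeof(buf), 0)) >= 0)
          nfq_handle_packet(h, buf, n);              /* dispatches to cb() */

      nfq_destroy_queue(qh);
      nfq_close(h);
      return 0;
  }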