------------------[ Readings ]------------------

Read chapters on UDP and ICMP in Shalunov's book. These are brief
summaries; the Stevens textbook chapters on these are in-depth.

Read Ch. 8 of the textbook about ICMP. We covered roughly 8.1--8.3;
you can stop there at a first reading.

Read Ch. 10 of the textbook about UDP. We covered roughly 10.1--10.4;
you can stop there at a first reading.

Re-read notes from Lecture 2 and think about how NAT works.
Read textbook Ch 7.3 about NAT.

A reminder: nice diagrams of IP, UDP, TCP, and ICMP headers can be
found at https://nmap.org/book/tcpip-ref.html . Keep looking at these!

Read ahead: Ch. 12 (TCP preliminaries)

------------------[ Optional: more on NAT ]------------------

NAT is described in Ch. 7 of the textbook. We covered roughly Ch. 7.3.

NAT is one of the "hacks" on the Internet protocols. In Linux,
it is implemented by the Netfilter subsystem of the kernel's
TCP/IP stack, and is described here:
http://netfilter.org/documentation/HOWTO//NAT-HOWTO.html
(read http://netfilter.org/documentation/HOWTO//networking-concepts-HOWTO.html
 for some terminology), and in detail here:
https://www.netfilter.org/documentation/HOWTO/netfilter-hacking-HOWTO-4.html#ss4.3

Note that these recipes go back to the time when your Internet
connection was over dialup, using the PPP protocol and an
interface named like "ppp0", which represented the modem's uplink
connection. In class, we just used a regular Ethernet interface
connected to an external network.

==================[ Switches and LANs ]==================

In networking terms, a LAN is a "broadcast domain": all hosts
connected to a LAN see each other's broadcast packets (such as
ARP requests).

Broadcast on a LAN is how hosts can find each other's MAC addresses
knowing only IP addresses (remember that the ARP who-has request
is broadcast). ARP-ing for an IP of a host that cannot receive your
broadcasts will fail (unless the ARP request is somehow specially
proxied to the host, and its response is proxied back). So being
a broadcast domain is a necessary condition for all computers
on an Ethernet LAN to be able to talk to each other.

A switch maintains a table (a "CAM table") that maps source MAC
addresses to the (physical) port on which they were seen. Incoming
packets (more correctly called "frames" at the Ethernet layer)
get their destination MAC checked against this table; if the
dst MAC is in the switch's CAM table, the frame will only be
forwarded to the port where the interface with that MAC is known
to be connected. Frames for a MAC that is not yet in the table,
and broadcast frames, are sent out of all other ports.

This creates "virtual circuits" on a switch; sniffing on a third port
for the communications between machines connected to two other ports
will get you no unicast packets going between these machines. Only
broadcast packets sent by these machines occasionally will be heard on
your port by your sniffer.

A way of getting around this is ARP poisoning. Look for links
about it in the previous lecture notes.

A VLAN is a grouping of a switch's ports into several broadcast
domains: when a port is a part of a VLAN, only ports in the same VLAN
group get the broadcast traffic. VLANs can be extended across switches
by having the switch add a special 4-byte tag (IEEE 802.1Q) to every
incoming frame on a VLAN port, placed inside the Ethernet header just
before the Ethertype, and having other ports strip it---except for the
"trunk port" that connects to another switch similarly configured.
This way the grouping of ports
can be extended through many switches in an organization, all having
the same configured VLAN numbers. 

Dartmouth uses VLANs heavily (and mostly without the users
noticing). For example, all VoIP phones are on a separate VLAN from
other computers.

You can find out much about VLANs from this Cisco presentation:
http://www.cs.dartmouth.edu/~sergey/me/netreads/L2-security-Bootcamp.pdf
(but skip the intro to the MAC attacks).

------------------[ DHCP ]-------------------

DHCP is described in Ch. 6 of the textbook. We covered roughly
6.1--6.2.4.

A popular mechanism for a computer joining a LAN without an assigned
IP address is DHCP (Dynamic Host Configuration Protocol). Under this
protocol, the host sends broadcast packets with a request for an
available IP address and queries for several other network details
such as netmask, default router/gateway, local DNS server, etc.  These
requests are wrapped in Ether/IP/UDP, but the host has no IP address
yet; so the Ethernet destination address is ff:ff:ff:ff:ff:ff, the
destination IP address is 255.255.255.255 (or 0xffffffff), and the
source IP address is 0.0.0.0. The DHCP payload could have
been a special protocol like ARP, wrapped only in an Ethernet header,
and with a separate Ethertype (e.g., IP is 0x0800, ARP is 0x0806, IPv6
is 0x86DD etc.---see more types at
https://en.wikipedia.org/wiki/EtherType), but instead it rides on UDP
(server port 67, client port 68).

A prominent feature of DHCP is a list of "options", i.e., pieces of
configuration info requested. These sets of options are slightly
different between different OSes, and can be used to fingerprint the
OS of the requester.

Find sample DHCP requests and responses in l3-dhcp.pcapng. The first
six packets (3 requests and responses) are from one network, a
hospital guest network (where the answering DHCP server has the
address 1.1.1.1, which it legally cannot have---find out who it
belongs to!). Then the laptop was closed and slept while being
physically moved to another location (our classroom).  Upon waking, it
requested the same IP address it had on the previous network 
("Requested IP address: 172.18.65.104"). 

But our classroom's Dartmouth Public uses a _different_ private range
within 10/8, as opposed to the hospital's 172.18/16---so the DHCP
server responded with a NAK---"you can't have it". After this, the
laptop went through the full phase of DHCP discovery: sent a DHCP
Discover packet to find out more about the server (since the DHCP 
server apparently changed), and got a DHCP Offer from it. This part
was cut off from the hospital capture, but this is how DHCP is started
on a new network. If you occasionally wonder why getting a new IP
seems slow, remember that DHCP is a two-step protocol: Discover/Offer
first, then Request/ACK.

DHCP is not authenticated and is easily attacked. See, e.g.,
  http://hakipedia.com/index.php/DHCP_Starvation
  http://hakipedia.com/index.php/DHCP_Rogue_Server

----------------[ Datagrams and Multiplexing ]----------------

The two main principles of the Internet design are that 

(1) IP packets are _datagrams_, in the sense that each contains all
    the information needed to route it to the destination, and routing
    only needs this information (i.e., the dst IP address). Packets
    may even take different paths to the target, as we saw with our
    traceroutes to Google's servers. Each IP packet lives and dies
    on its own.

(2) Connections are _multiplexed_ through several layers. Ethernet
    carries IP and ARP (and IPv6 when available); IP carries TCP, UDP,
    and ICMP (and many others); UDP carries DHCP and DNS; TCP carries
    HTTP, HTTPS, SSH, and many more.

    In each layer, some number tells how to interpret the next layer's
    payload: in Ethernet, the 2-byte Ethertype field (0x0800 for IP,
    0x0806 for ARP, ...); in IP, the 1-byte protocol field at offset 9
    (1 for ICMP, 6 for TCP, 17 for UDP, etc.---see /etc/protocols).

    For TCP and UDP payloads, demultiplexing switches to port numbers,
    assigned by convention to application protocols: 80 for HTTP, 22
    for SSH, 443 for HTTPS in TCP, 53 for DNS in UDP (as well as in
    TCP, for longer name queries). For more, see /etc/services .

    Since there are many application protocols, applications cannot
    be held to use only pre-assigned ports---this works only for
    a few "well-known" ports and applications. The rest register
    with the operating system and negotiate for port numbers; we'll
    see how this works when we look at the use of the bind() system
    call in the TCP server code.

    Figure 1.6 in the textbook illustrates this principle.

----------------[ Berkeley sockets ]----------------

Look at the TCP client code in tcpcli.c . 

"Beej's Guide to Network Programming"
http://beej.us/guide/bgnet/output/html/multipage/index.html 
explains the function of the structs and the system calls.
Look them up in this guide as you read the code line by line.

NOTE: socket() and connect() are _system calls_: that is, with these
functions you ask the kernel to perform a low-level task on your
behalf. The small integer _socket descriptor_, which is returned by
socket(), is used to connect all your requests together (so that the
kernel knows which data and connection they pertain to).

Moreover, you can read and write bytes from this socket descriptor the
same way you read and write open files in C. That is the whole point
of "stream" (TCP) sockets: they create an illusion that you read and
write a sequential source, whereas in reality your reads and writes
turn into chunks of data that traverse the 'net, encapsulated in
TCP/IP packets.

In Lab 1, you will write C client code for a custom protocol.

----------------[ Examining headers and data structs ]----------------

You might wonder why it takes so much work and a special C struct
to pass just an IP address and a port number to Berkeley sockets.

Berkeley sockets were designed to accommodate a variety of protocols
and network address families (many of which have fallen out of use).
We'll be using only a few combinations out of these many.
You'll get some idea of that variety as you read "man 2 socket".

It's important to not let this variety confuse you. For example,
it helped me to know that AF_INET is just a define for "2", and 
it resides in /usr/include headers:

$ grep -r AF_INET /usr/include | grep define
/usr/include/sys/socket.h:#define     AF_INET      2       /* internetwork: UDP, TCP, etc. */
/usr/include/sys/socket.h:#define     AF_INET6     30      /* IPv6 */
/usr/include/sys/socket.h:#define     PF_INET      AF_INET
/usr/include/sys/socket.h:#define     PF_INET6     AF_INET6
<some lines skipped>

and SOCK_STREAM is 1 :

$ grep -r SOCK_STREAM /usr/include | grep define
/usr/include/sys/socket.h:#define     SOCK_STREAM  1       /* stream socket */
<some lines skipped>

and the other defines nearby spell out other usable constants, like SOCK_DGRAM 2,
and my favorite, SOCK_RAW 3.

Another way to see what definitions are pulled into your program is
to stop the C compiler just after the preprocessing stage, when all
#include's are resolved:

$ gcc -E tcpcli.c | less 

That will pull in _a lot_ of typedef's---because in network code the exact
byte lengths of values are important, as they must match exactly the fields of
a packet on the wire. So you will see types for "8-bit unsigned integer" (uint8_t),
"16-bit unsigned integer" (uint16_t), and so on:

$ gcc -E tcpcli.c | grep uint8 | grep typedef
typedef unsigned char __uint8_t;
typedef __uint8_t sa_family_t;        <-- see below, this is the "address family" like AF_INET
typedef unsigned char uint8_t;
typedef uint8_t uint_least8_t;
typedef uint8_t uint_fast8_t;

$ gcc -E tcpcli.c | grep uint16 | grep typedef
typedef unsigned short __uint16_t;
typedef __uint16_t __darwin_mode_t;   <-- I am doing this on a Mac, hence Darwin
typedef __uint16_t in_port_t;         <--- that's a type for a port in UDP or TCP header 
typedef __uint16_t nlink_t;
typedef unsigned short uint16_t;
typedef uint16_t uint_least16_t;
typedef uint16_t uint_fast16_t;

You will also see the expanded structure definitions that system calls
like connect() use, e.g.:

struct sockaddr_in {
    __uint8_t sin_len;         
    sa_family_t sin_family;   
    in_port_t sin_port;        <--- a two-byte integer for a port number
    struct in_addr sin_addr;   <--- this must be an IP address 
    char sin_zero[8];
};

and then you can search back for these types, like 

struct in_addr {
    in_addr_t s_addr;
};

typedef __uint32_t in_addr_t;   <--- a 4-byte integer, as expected of an IP address

and so on. Once you've chased them through the code, these data structures
don't look so intimidating.

----------------[ Look ahead: server C code ]----------------

Look ahead at the TCP _server_ code examples:  
  tcpserv.c (a simple echo server that forks a child for each connection)
  tcpserver-nofork.c (no fork, all work is done in the main process)