Network intrusion detection systems rely on one or more methods of traffic analysis to determine whether a given stream of network traffic is suspicious. Network analysts and system administrators can use stateless traffic filters to understand what is going on inside their networks; such filters can be used for traffic inspection, filtering, and shaping.

Packets can be filtered based on fixed characteristics of the packet, such as the values of its header fields (i.e. the signature), or based on one or more heuristics that flag the packet as anomalous. This article focuses solely on signature-based packet filtering, specifically how to work with Berkeley/BSD Packet Filters (BPF).

The Berkeley/BSD packet filter can be used for stateless traffic inspection by examining both the headers and the payload of each packet. The nice thing about this is that, since BPF programs are attached directly to a socket, the filter runs inside the kernel and only the packets that match need to cross the kernel/user-space boundary.

The signature-based approach relies on a series of predefined rules that are matched against individual bytes within the payload or protocol header of a given packet. Rules can be grouped hierarchically into rule chains as part of a larger filtering pattern, in which a signature can be compared against one or more rule chains.

There are a few important restrictions on BPF bytecode to note. First, only forward jumps are permitted, which ensures that the generated filter contains no loops and that every filter terminates. There is also a fixed upper bound on program length: on Linux, a filter may contain at most 4096 instructions.
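To make those two restrictions concrete, here is a minimal sketch of a checker. It assumes a hypothetical representation in which each instruction is a (code, jt, jf, k) tuple with jump offsets counted from the next instruction, mirroring the layout of the kernel's struct sock_filter. The function name validate is made up for illustration, and this is not the kernel's actual verifier. In the real encoding jt and jf are unsigned 8-bit fields, so a backward jump cannot even be written down; the explicit check below only spells the rule out.

```python
BPF_MAXINSNS = 4096   # upper bound on program length (Linux)
BPF_JMP = 0x05        # instruction class: jumps
BPF_RET = 0x06        # instruction class: return
BPF_JA = 0x00         # unconditional jump, offset taken from k


def validate(program):
    """Return a list of problems; an empty list means the checks passed."""
    n = len(program)
    if n == 0 or n > BPF_MAXINSNS:
        return ["program must contain between 1 and %d instructions" % BPF_MAXINSNS]

    problems = []
    for pc, (code, jt, jf, k) in enumerate(program):
        if code & 0x07 != BPF_JMP:
            continue
        # Unconditional jumps use k, conditional jumps use jt and jf.
        offsets = (k,) if code & 0xF0 == BPF_JA else (jt, jf)
        for off in offsets:
            if off < 0:
                problems.append("(%03d) jumps backward" % pc)
            elif pc + 1 + off >= n:
                problems.append("(%03d) jumps past the end" % pc)

    # Every path has to end in a return, so the last instruction must be one.
    if program[-1][0] & 0x07 != BPF_RET:
        problems.append("last instruction is not a ret")
    return problems


# A two-instruction example: load a half-word, then accept everything.
ACCEPT_ALL = [(0x28, 0, 0, 12), (0x06, 0, 0, 65535)]
print(validate(ACCEPT_ALL))
```

Because every jump moves strictly forward and the program is bounded in length, the number of instructions executed per packet is bounded too, which is what makes it safe to run the filter inside the kernel.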

The following packet filter matches UDP datagrams carried over IPv4 on the lo interface. Passing -d to tcpdump prints the BPF bytecode that the expression compiles to:

$ sudo tcpdump -p -i lo -d 'ip and udp'
(000) ldh [12]
(001) jeq #0x800  jt 2  jf 5
(002) ldb [23]
(003) jeq #0x11   jt 4  jf 5
(004) ret #65535
(005) ret #0

The following control flow graph expresses the steps taken by the BPF virtual machine in order to evaluate the above program. If a packet matches the signature, the filter returns a non-zero value and the packet is accepted; otherwise it returns 0 and the packet is dropped. It’s as simple as that.


What does this mean?

  • Load a half-word (2 bytes) from the packet at offset 12 (i.e. the EtherType field of the Ethernet header) into the accumulator
  • Check if the accumulator holds 0x800 (i.e. IPv4)
    • If the accumulator holds 0x800, JMP to instruction (002) (i.e. proceed to the next instruction)
    • Otherwise, JMP to instruction (005) (i.e. return 0, which rejects the packet)
  • Load a byte from the packet at offset 23 (i.e. the protocol field of the IPv4 header, 14 + 9), and check if the accumulator holds 0x11 (i.e. UDP)
    • If the accumulator holds 0x11, JMP to instruction (004) (i.e. return 65535, the number of bytes to capture, since the packet matches the rule)
    • Otherwise, JMP to instruction (005) (i.e. return 0 as the packet does not match the rule)
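The same two checks can be written out by hand, which may make the offsets easier to see. The following is a rough Python equivalent of the six instructions above, assuming an Ethernet frame with no VLAN tag; the function name and the make_frame helper are made up for illustration.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
IPPROTO_UDP = 0x11


def matches_ip_and_udp(frame: bytes) -> bool:
    """Mirror of the 'ip and udp' bytecode for a raw Ethernet frame."""
    # A BPF load that runs past the end of the packet rejects the packet.
    if len(frame) < 24:
        return False
    # (000) ldh [12]: the EtherType sits after two 6-byte MAC addresses.
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    # (001) jeq #0x800
    if ethertype != ETHERTYPE_IPV4:
        return False
    # (002) ldb [23] and (003) jeq #0x11: byte 9 of the IPv4 header.
    return frame[23] == IPPROTO_UDP


def make_frame(ethertype: int, protocol: int) -> bytes:
    """Hypothetical helper: a zero-filled frame with only two fields set."""
    ip_header = bytearray(20)
    ip_header[9] = protocol
    return bytes(12) + struct.pack("!H", ethertype) + bytes(ip_header)


print(matches_ip_and_udp(make_frame(0x0800, 0x11)))  # IPv4 + UDP
print(matches_ip_and_udp(make_frame(0x0800, 0x06)))  # IPv4 + TCP
```

Note that the filter never parses the IPv4 header length: offset 23 is fixed because the protocol field always sits in the first 20 bytes of the header.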

Cool, right? Now, let’s try out that expression using tcpdump.

$ sudo tcpdump -i wlp2s0 'ip and udp' -v -c 1 -X
tcpdump: listening on wlp2s0, link-type EN10MB (Ethernet), capture size 262144 bytes
00:51:37.973442 IP (tos 0x0, ttl 1, id 8999, offset 0, flags [DF], proto UDP (17), length 195)
    midgard.57199 > 239.255.255.250.ssdp: UDP, length 167
  ...
	0x0040:  3535 2e32 3530 3a31 3930 300d 0a4d 414e  55.250:1900..MAN
	0x0050:  3a20 2273 7364 703a 6469 7363 6f76 6572  :."ssdp:discover
	0x0060:  220d 0a4d 583a 2031 0d0a 5354 3a20 7572  "..MX:.1..ST:.ur
	0x0070:  6e3a 6469 616c 2d6d 756c 7469 7363 7265  n:dial-multiscre
	0x0080:  656e 2d6f 7267 3a73 6572 7669 6365 3a64  en-org:service:d
	0x0090:  6961 6c3a 310d 0a55 5345 522d 4147 454e  ial:1..USER-AGEN
	0x00a0:  543a 2043 6872 6f6d 6975 6d2f 3433 2e30  T:.Chromium/43.0
	0x00b0:  2e32 3335 372e 3132 3520 4c69 6e75 780d  .2357.125.Linux.
	0x00c0:  0a0d 0a                                  ...
1 packet captured
5 packets received by filter
0 packets dropped by kernel
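tcpdump hands the compiled program to the kernel on our behalf, but a program can attach a filter to its own socket as well. The sketch below encodes the 'ip and udp' bytecode as an array of struct sock_filter entries and attaches it with setsockopt. It is Linux-only and rests on a few assumptions: Python's socket module does not export SO_ATTACH_FILTER by name, so the value 26 (correct on common architectures such as x86 and ARM) is hard-coded, and the names encode and attach are made up for illustration. Note that the kernel encoding uses jump offsets relative to the next instruction, so "jt 2 jf 5" at instruction (001) becomes jt=0, jf=3.

```python
import ctypes
import socket
import struct

# Assumption: value of SO_ATTACH_FILTER on common Linux architectures.
SO_ATTACH_FILTER = 26

# (code, jt, jf, k) for each instruction of 'ip and udp'.
IP_AND_UDP = [
    (0x28, 0, 0, 12),      # (000) ldh [12]
    (0x15, 0, 3, 0x0800),  # (001) jeq #0x800  jt 2  jf 5
    (0x30, 0, 0, 23),      # (002) ldb [23]
    (0x15, 0, 1, 0x11),    # (003) jeq #0x11   jt 4  jf 5
    (0x06, 0, 0, 65535),   # (004) ret #65535
    (0x06, 0, 0, 0),       # (005) ret #0
]


def encode(program):
    """Pack the instructions as struct sock_filter entries (8 bytes each)."""
    return b"".join(struct.pack("HBBI", *insn) for insn in program)


def attach(sock, program):
    """Attach a classic BPF program to a socket (Linux only)."""
    buf = ctypes.create_string_buffer(encode(program))
    # struct sock_fprog { unsigned short len; struct sock_filter *filter; }
    fprog = struct.pack("HP", len(program), ctypes.addressof(buf))
    sock.setsockopt(socket.SOL_SOCKET, SO_ATTACH_FILTER, fprog)


print(encode(IP_AND_UDP).hex())
```

With root privileges, something like attach(socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(0x0003)), IP_AND_UDP) should leave the socket receiving only IPv4 UDP frames. Running tcpdump with -dd instead of -d prints the same numeric encoding, which is a handy way to cross-check the table.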

Now let’s try something a little more interesting. The following BPF expressions can be used to identify initial, intervening, and terminal fragments, respectively.

(ip[6] & 0x20 != 0) && (ip[6:2] & 0x1fff = 0)
(ip[6] & 0x20 != 0) && (ip[6:2] & 0x1fff != 0)
(ip[6] & 0x20 = 0)  && (ip[6:2] & 0x1fff != 0)
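The three expressions differ only in how they combine two tests on the same 16-bit field. As a sketch, the same classification in Python looks like this; the function name classify and the header helper are hypothetical, and the input is assumed to be a raw IPv4 header.

```python
import struct

MF = 0x2000           # More Fragments bit; the same bit as ip[6] & 0x20
OFFSET_MASK = 0x1FFF  # low 13 bits: fragment offset in 8-byte units


def classify(ip_header: bytes) -> str:
    """Classify an IPv4 header by its flags/fragment-offset field."""
    # Bytes 6 and 7 of the IPv4 header, i.e. ip[6:2] in filter syntax.
    (field,) = struct.unpack_from("!H", ip_header, 6)
    more = bool(field & MF)
    offset = field & OFFSET_MASK
    if more and offset == 0:
        return "initial"
    if more:
        return "intervening"
    if offset != 0:
        return "terminal"
    return "unfragmented"


def header(flags_and_offset: int) -> bytes:
    """Hypothetical helper: a zeroed 20-byte header with one field set."""
    return bytes(6) + struct.pack("!H", flags_and_offset) + bytes(12)


for value in (0x2000, 0x2010, 0x0010, 0x4000):
    print(hex(value), classify(header(value)))
```

The fourth case (MF unset, offset zero) is the ordinary unfragmented packet, which is why none of the three filter expressions matches it.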

For the sake of simplicity, let’s just focus on the terminal fragments. One thing to keep in mind is that once a datagram has been fragmented, it may be necessary to reassemble it before performing your analysis. If you’re using Snort or Suricata, then you could always run the packets through a preprocessor (I mean, if you’re into that kind of thing).

(ip[6] & 0x20 = 0)  && (ip[6:2] & 0x1fff != 0)

The expression for matching terminal fragments may look intimidating, but not to worry. It breaks down into two bit-masking operations.

The first component of the expression checks whether the More Fragments (MF) bit is unset. If it’s unset, then we know that we’re not dealing with an initial or intervening fragment.

ip[6] & 0x20

The second component of the expression checks the 13-bit fragment offset (counted in units of 8 bytes), and in this case we check for a non-zero value.

ip[6:2] & 0x1fff

The bytecode that tcpdump generates for the terminal-fragment expression is shown below, and the accompanying control flow graph shows the steps taken by the BPF virtual machine in order to evaluate it.

$ sudo tcpdump -p -i lo -d '(ip[6] & 0x20 = 0)  && (ip[6:2] & 0x1fff != 0)'
(000) ldh [12]
(001) jeq #0x800    jt 2  jf 7
(002) ldb [20]
(003) jset #0x20    jt 7  jf 4
(004) ldh [20]
(005) jset #0x1fff  jt 6  jf 7
(006) ret #65535
(007) ret #0


How does this work?

  • Load a half-word (2 bytes) from the packet at offset 12 (i.e. the EtherType field) into the accumulator
  • Check if the accumulator holds 0x800 (i.e. IPv4)
    • If the accumulator holds 0x800, JMP to instruction (002)
    • Otherwise, JMP to instruction (007) (i.e. return 0, packet does not match)
  • Load a byte from the packet at offset 20 (i.e. ip[6], 14 + 6) into the accumulator and apply a mask of 0x20
    • If any bits match the mask of 0x20 (i.e. MF is set), JMP to instruction (007)
    • Otherwise, JMP to instruction (004)
  • Load a half-word from the packet at offset 20 (i.e. ip[6:2]) into the accumulator and check if any bits match the mask of 0x1fff
    • If any bits match the mask, JMP to instruction (006) (i.e. return 65535, packet matches the rule)
    • Otherwise, JMP to instruction (007) (i.e. return 0, packet does not match)
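To trace those steps by hand, it helps to have something that executes them. Below is a toy interpreter for just the four mnemonics used in this listing (ldh, ldb, jeq, jset, plus ret). It keeps the absolute jump targets that tcpdump -d prints rather than the relative offsets the kernel uses, and it is a sketch for experimentation, not a faithful BPF implementation. The names run and frame are made up.

```python
import struct

# (mnemonic, operand, jt, jf) with absolute targets, as printed by tcpdump -d.
TERMINAL_FRAGMENT = [
    ("ldh", 12, None, None),   # (000)
    ("jeq", 0x800, 2, 7),      # (001)
    ("ldb", 20, None, None),   # (002)
    ("jset", 0x20, 7, 4),      # (003)
    ("ldh", 20, None, None),   # (004)
    ("jset", 0x1FFF, 6, 7),    # (005)
    ("ret", 65535, None, None),  # (006)
    ("ret", 0, None, None),      # (007)
]


def run(program, packet: bytes) -> int:
    """Execute the program against a packet and return the filter result."""
    acc, pc = 0, 0
    while True:
        op, k, jt, jf = program[pc]
        if op == "ret":
            return k
        if op in ("ldh", "ldb"):
            size = 2 if op == "ldh" else 1
            if k + size > len(packet):
                return 0  # out-of-bounds load rejects the packet
            acc = int.from_bytes(packet[k:k + size], "big")
            pc += 1
        elif op == "jeq":
            pc = jt if acc == k else jf
        elif op == "jset":
            pc = jt if acc & k else jf
        else:
            raise ValueError("unsupported instruction: %s" % op)


def frame(flags_and_offset: int, ethertype: int = 0x0800) -> bytes:
    """Hypothetical helper: zeroed Ethernet + IPv4 headers, one field set."""
    ip = bytes(6) + struct.pack("!H", flags_and_offset) + bytes(12)
    return bytes(12) + struct.pack("!H", ethertype) + ip


print(run(TERMINAL_FRAGMENT, frame(0x0010)))  # MF unset, offset 16
print(run(TERMINAL_FRAGMENT, frame(0x2010)))  # MF set, offset 16
```

Only the first frame reaches instruction (006); the second is rejected at (003) because the MF bit is set.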

In order to test the BPF expression for matching terminal fragments, the following quote will be fragmented and sent to localhost on the lo interface.

“In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move.”

Next, we need to tell tcpdump to listen for any packets on the lo interface and dump each one in full, including the link-level header and any application layer data.

# tcpdump -v -i lo -XX

We can use hping3 to place the quote in the data portion of a TCP segment and split the resulting packet into fragments. Since the quote is 124 bytes long (including the trailing newline), we set the data size to 124 bytes using the -d switch.

$ sudo hping3 -d 124 --file payload.txt localhost --frag -c 1
HPING localhost (lo 127.0.0.1): NO FLAGS are set, 40 headers + 124 data bytes
[main] memlockall(): Success
Warning: can't disable memory paging!
...
len=40 ip=127.0.0.1 ttl=64 DF id=44613 sport=0 flags=RA seq=0 win=0 rtt=4.5 ms
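The value passed to -d is easy to get wrong by one, so it is worth double-checking. A small sketch, assuming payload.txt holds the quote as plain ASCII followed by a single newline:

```python
QUOTE = (
    "In the beginning the Universe was created. This has made a lot of "
    "people very angry and been widely regarded as a bad move."
)

# The payload as it would sit in payload.txt: the quote plus one newline.
payload = (QUOTE + "\n").encode("ascii")

TCP_HEADER = 20  # bytes, no options
IP_HEADER = 20   # bytes, no options

print(len(payload))                           # data size for -d
print(len(payload) + TCP_HEADER + IP_HEADER)  # total length before fragmentation
```

The second figure corresponds to the "40 headers + 124 data bytes" line that hping3 prints.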

Neat. Now, what do the packets look like?

tcpdump: listening on lo, link-type EN10MB (Ethernet), capture size 65535 bytes
13:11:44.507988 IP (tos 0x0, ttl 64, id 98, offset 0, flags [none], proto TCP (6), length 164)
    localhost.1867 > localhost.0: Flags [none], cksum 0xb024 (correct), seq 1039926756:1039926880, win 512, length 124
	0x0000:  0000 0000 0000 0000 0000 0000 0800 4500  ..............E.
	0x0010:  00a4 0062 0000 4006 7bf0 7f00 0001 7f00  ...b..@.{.......
	0x0020:  0001 074b 0000 3dfc 05e4 0116 65a4 5000  ...K..=.....e.P.
	0x0030:  0200 b024 0000 496e 2074 6865 2062 6567  ...$..In.the.beg
	0x0040:  696e 6e69 6e67 2074 6865 2055 6e69 7665  inning.the.Unive
	0x0050:  7273 6520 7761 7320 6372 6561 7465 642e  rse.was.created.
	0x0060:  2054 6869 7320 6861 7320 6d61 6465 2061  .This.has.made.a
	0x0070:  206c 6f74 206f 6620 7065 6f70 6c65 2076  .lot.of.people.v
	0x0080:  6572 7920 616e 6772 7920 616e 6420 6265  ery.angry.and.be
	0x0090:  656e 2077 6964 656c 7920 7265 6761 7264  en.widely.regard
	0x00a0:  6564 2061 7320 6120 6261 6420 6d6f 7665  ed.as.a.bad.move
	0x00b0:  0a00                                     ..
13:11:44.508011 IP (tos 0x0, ttl 64, id 42696, offset 0, flags [DF], proto TCP (6), length 40)
    localhost.0 > localhost.1867: Flags [R.], cksum 0x6627 (correct), seq 0, ack 1039926880, win 0, length 0
	0x0000:  0000 0000 0000 0000 0000 0000 0800 4500  ..............E.
	0x0010:  0028 a6c8 4000 4006 9605 7f00 0001 7f00  .(..@.@.........
	0x0020:  0001 0000 074b 0000 0000 3dfc 0660 5014  .....K....=..`P.
	0x0030:  0000 6627 0000                           ..f'..

What about the terminal fragment?

$ snort -i lo '(ip[6] & 0x20 = 0) && (ip[6:2] & 0x1fff != 0)' -deq
08/06-18:08:03.925861 00:00:00:00:00:00 -> 00:00:00:00:00:00 type:0x800 len:0x32
127.0.0.1 -> 127.0.0.1 TCP TTL:64 TOS:0x0 ID:156 IpLen:20 DgmLen:36
Frag Offset: 0x0010   Frag Size: 0x0010
0x0000: 00 00 00 00 00 00 00 00 00 00 00 00 08 00 45 00  ..............E.
0x0010: 00 24 00 9C 00 10 40 06 7C 26 7F 00 00 01 7F 00  .$....@.|&......
0x0020: 00 01 20 61 73 20 61 20 62 61 64 20 6D 6F 76 65  .. as a bad move
0x0030: 2E 0A
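As a sanity check, the bytes from the Snort dump above can be decoded by hand to confirm that this really is the last piece of the original datagram. The sketch below copies the hex dump verbatim and assumes a 20-byte TCP header with no options, as reported by hping3.

```python
import struct

# Bytes copied from the Snort hex dump above.
FRAME = bytes.fromhex(
    "00 00 00 00 00 00 00 00 00 00 00 00 08 00 45 00"
    "00 24 00 9C 00 10 40 06 7C 26 7F 00 00 01 7F 00"
    "00 01 20 61 73 20 61 20 62 61 64 20 6D 6F 76 65"
    "2E 0A"
)

ETH_LEN = 14
ihl = (FRAME[ETH_LEN] & 0x0F) * 4                          # IPv4 header length
(total_len,) = struct.unpack_from("!H", FRAME, ETH_LEN + 2)
(field,) = struct.unpack_from("!H", FRAME, ETH_LEN + 6)    # ip[6:2]

more_fragments = bool(field & 0x2000)    # ip[6] & 0x20
offset_bytes = (field & 0x1FFF) * 8      # offset is stored in 8-byte units
data = FRAME[ETH_LEN + ihl:ETH_LEN + total_len]

print(more_fragments, offset_bytes, data)

# The fragment ends where the original IP payload ends:
# 20 bytes of TCP header plus 124 bytes of data.
print(offset_bytes + len(data) == 20 + 124)
```

MF is unset and the offset is non-zero, so both halves of the filter expression hold, and the 16 bytes of data are the tail of the quote.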

Helpful links

  • BPF: the forgotten bytecode
  • Linux socket filtering aka Berkeley Packet Filter
  • Notes about tcpdump filters
  • BPF man page