

		     P  C  A  P  N  A  V

            A tcpdump tracefile navigation library.

==============================================================

The purpose of this library is to read tcpdump trace files
with the ability to navigate around in the file between reads.
The API is intentionally much like that of the pcap library.
You can navigate in trace files both in time and space: you
can jump to a packet at approximately 2/3 of the trace, or
you can jump as closely as possible to a packet with a given
timestamp, and then read packets from there. In addition, the
API provides convenience functions for manipulating timeval
structures.
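
To illustrate the time and space navigation described above,
here is a minimal sketch in C. It is not pcapnav's own code:
tv_cmp(), tv_sub() and fraction_to_offset() are hypothetical
names made up for this example, and the 24 bytes skipped at
the start are the file header of the standard pcap format.
Note that an offset computed from a fraction will usually
land in the middle of a packet, which is why a valid packet
header has to be searched for afterwards.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical helper (not a pcapnav function): returns <0, 0 or >0
 * if a is earlier than, equal to, or later than b. */
int tv_cmp(const struct timeval *a, const struct timeval *b)
{
    if (a->tv_sec != b->tv_sec)
        return a->tv_sec < b->tv_sec ? -1 : 1;
    if (a->tv_usec != b->tv_usec)
        return a->tv_usec < b->tv_usec ? -1 : 1;
    return 0;
}

/* Hypothetical helper: out = a - b. Assumes a >= b and that both
 * inputs are normalized (0 <= tv_usec < 1000000). */
void tv_sub(const struct timeval *a, const struct timeval *b,
            struct timeval *out)
{
    assert(tv_cmp(a, b) >= 0);
    out->tv_sec = a->tv_sec - b->tv_sec;
    out->tv_usec = a->tv_usec - b->tv_usec;
    if (out->tv_usec < 0) {          /* borrow one second */
        out->tv_sec -= 1;
        out->tv_usec += 1000000;
    }
}

/* Hypothetical helper: map a fraction in [0,1] to a byte offset in
 * the packet area of a trace, i.e. behind the file header. */
uint64_t fraction_to_offset(double fraction, uint64_t file_hdr_len,
                            uint64_t file_size)
{
    assert(file_size >= file_hdr_len);
    if (fraction < 0.0)
        fraction = 0.0;
    if (fraction > 1.0)
        fraction = 1.0;
    return file_hdr_len +
           (uint64_t)(fraction * (double)(file_size - file_hdr_len));
}
```

For a 1024-byte trace, fraction_to_offset(0.5, 24, 1024) gives
byte offset 524, halfway through the 1000 bytes of packet data.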

Like pcap, this library handles things through an opaque
handle struct. For tracefile navigation and reading packets,
this handle is enough. If you need to apply BPF filters or
write packets to disk, you can access the familiar pcap
handle that is used internally by using pcapnav_pcap().

The trace navigation algorithm in the pcapnav library is based
on Vern Paxson's tcpslice tool, with the following changes:

- a buffer abstraction was introduced to help reduce the
  number of local variables and parameters to functions.
  See pcapnav_buf.h.

- the original tcpslice version used the PACKET_HDR_LEN macro,
  yielding the size of a struct pcap_pkthdr, even when the
  trace file at hand actually uses the extended, larger
  patched headers. pcapnav no longer assumes the smaller
  size for such traces.

- pcapnav doesn't use Vern's state-machine approach to
  determine definitive header matches. I've done a lot of my
  testing with a trace that was captured while NFS-copying
  another trace file, thus containing lots of "bogus" headers
  to make things fun. This kind of data causes a number of
  nasty problems, such as:

  - Large snaplens in the captured data, where a single packet
    may contain many smaller packets.

  - One of these payload packets may have a caplen that
    actually leads directly to the next valid header.

  Much of this should be handled through invalid timestamps,
  but this is not 100% reliable.

  To rectify this, pcapnav uses a different approach: once
  a header is found that does not immediately look bogus,
  the chain of packets that it starts is followed, up to a
  maximum number of packets or until we're out of buffer
  space.

  For this, buffers already containing data loaded from disk
  are used as much as possible, but when this buffer doesn't
  suffice, more data is loaded from disk. The hope is that
  most attempts will point to invalid headers anyway so that
  this additional load never happens unless we have good
  reason to believe we've actually found a good header. The
  difference between PCAPNAV_PERHAPS and PCAPNAV_DEFINITELY
  is then based on the length of the chain found.

  While checking headers, the best valid header (i.e. the one
  with the longest chain) is remembered, as well as the offset
  in the trace that will be the successor of this packet, so
  that it isn't confused with a "new" good header.

  The fun part, without doubt, is header clashes. A clash in
  this new system occurs when two headers have the same,
  maximum, chain length and the same level of reliability
  of the chain lengths (e.g., the chain search could have been
  stopped because we were out of buffer space or because we
  have hit the limit of packets we check -- the latter is
  considered more reliable).

  If we hit a clash, we simply forget the old best match and
  keep looking after the clash packet. If we cannot find any
  better headers afterwards, we return a clash; otherwise we
  return the best match found afterwards.

- I've seen traces with rather strange final packet headers,
  containing invalid caplen/len field values and packet data.
  To make sure we don't miss the last few correct packet
  headers, I've added some padding space and thus start
  looking for the last packet in the trace a bit earlier
  in the file. As the last-packet timestamp and offset are
  buffered in the pcapnav_t handle anyway, this performance
  hit is probably negligible.

- To find the last packet in a trace, we now go back a lot
  more from the end of a trace, then find a packet more
  reliably by using the chain approach described above,
  and then use pcap to iterate to the last valid packet.
  Slower, but safer.
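
The chain approach and the clash handling described in the
list above can be sketched as follows. This is a simplified
illustration under stated assumptions, not the library's
implementation: it works on a single in-memory buffer, assumes
unpatched 16-byte record headers in host byte order, and
rejects any chain that ends in an implausible header. All
names as well as the MAX_CHAIN and SURE_CHAIN values are
hypothetical; the MATCH_PERHAPS/MATCH_DEFINITELY split only
mirrors the PCAPNAV_PERHAPS/PCAPNAV_DEFINITELY distinction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REC_HDR_LEN 16u  /* unpatched on-disk record header, 4 x 32 bit */
#define MAX_CHAIN   20u  /* hypothetical cap on packets followed */
#define SURE_CHAIN  3u   /* hypothetical threshold for a definite match */

struct rec_hdr { uint32_t ts_sec, ts_usec, caplen, len; };
struct limits  { uint32_t snaplen, first_sec, last_sec; };

/* Ordered by reliability: hitting the packet limit beats running
 * out of buffer space. */
enum chain_end { END_BROKEN, END_OUT_OF_BUFFER, END_HIT_LIMIT };
enum verdict   { MATCH_NONE, MATCH_PERHAPS, MATCH_DEFINITELY, MATCH_CLASH };

/* Cheap sanity test: could this be a record header at all? */
int hdr_plausible(const struct rec_hdr *h, const struct limits *lim)
{
    return h->caplen > 0 && h->caplen <= lim->snaplen &&
           h->caplen <= h->len && h->ts_usec < 1000000u &&
           h->ts_sec >= lim->first_sec && h->ts_sec <= lim->last_sec;
}

/* Follow the chain that starts at off; return the number of plausible
 * headers seen and, in *why, the reason the walk stopped. */
size_t chain_length(const unsigned char *buf, size_t buflen, size_t off,
                    const struct limits *lim, enum chain_end *why)
{
    size_t n = 0;
    assert(buf != NULL && lim != NULL && why != NULL);
    while (n < MAX_CHAIN) {
        struct rec_hdr h;
        if (off > buflen || buflen - off < REC_HDR_LEN) {
            *why = END_OUT_OF_BUFFER;
            return n;
        }
        memcpy(&h, buf + off, REC_HDR_LEN);
        if (!hdr_plausible(&h, lim)) {
            *why = END_BROKEN;
            return n;
        }
        n++;
        off += (size_t)REC_HDR_LEN + h.caplen;
    }
    *why = END_HIT_LIMIT;
    return n;
}

/* Scan buf byte by byte for the header that starts the best chain. */
enum verdict find_header(const unsigned char *buf, size_t buflen,
                         const struct limits *lim, size_t *found)
{
    size_t best_len = 0, best_off = 0, best_next = 0, off;
    enum chain_end best_why = END_BROKEN;
    int have_best = 0, clash = 0;

    assert(found != NULL);
    for (off = 0; off + REC_HDR_LEN <= buflen; off++) {
        enum chain_end why;
        struct rec_hdr h;
        size_t next, n = chain_length(buf, buflen, off, lim, &why);

        if (n == 0 || why == END_BROKEN)
            continue;                    /* bogus header or chain */
        memcpy(&h, buf + off, REC_HDR_LEN);
        next = off + REC_HDR_LEN + h.caplen;
        if (have_best && off == best_next) {
            best_next = next;            /* successor of the best match, */
            continue;                    /* not a "new" good header      */
        }
        if (n > best_len || (n == best_len && why > best_why)) {
            best_len = n; best_why = why; best_off = off;
            best_next = next; have_best = 1; clash = 0;
        } else if (n == best_len && why == best_why) {
            have_best = 0; clash = 1;    /* forget the old best match */
            off = next - 1;              /* resume after the clash packet */
        }
    }
    if (clash)
        return MATCH_CLASH;
    if (best_len == 0)
        return MATCH_NONE;
    *found = best_off;
    return best_len >= SURE_CHAIN ? MATCH_DEFINITELY : MATCH_PERHAPS;
}
```

The real library additionally loads more data from disk when
the buffer runs out; the sketch simply stops there and treats
such a chain as less reliable than one that hit MAX_CHAIN.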

There are probably still issues to be ironed out since this
is a lot of new code, but I think it's pretty easy to read.
All relevant functions are documented in gtk-doc format.
Hopefully I'll someday have the time to actually write up
more documentation and generate HTML or PDFs from it.

Bug reports, patches, and feedback are appreciated; please
send them to netdude-devel@lists.sourceforge.net.
 
                                            Cheers,
                                            -- Christian.
