-------[  Phrack Magazine --- Vol. 9 | Issue 55 --- 09.09.99 --- 09 of 19  ]


-------------------------[  Distributed Information Gathering  ]


--------[  hybrid <hybrid_@hotmail.com>  ]


----[  Overview


Information gathering refers to the process of determining the characteristics
of one or more remote hosts (and/or networks).  Information gathering can be
used to construct a model of a target host, and to facilitate future
penetration attempts.

This article will discuss and justify a new model for information gathering,
namely: distributed information gathering.

The focus is on eluding detection during the information gathering stage(s) of
an attack, particularly by NIDS (Network Intrusion Detection Systems).

This article is adjunct to the superb work of both Thomas H. Ptacek and Timothy
N. Newsham [1], and to horizon [2].

Please note that I do not claim to have discovered the distributed information
gathering methodology [3]; this article is a consolidation, discussion, and
extrapolation of existing work.


----[  Introduction

The current methods used to perform remote information gathering are well
documented [4], but are reiterated briefly here:

I.     Host Detection

Detection of the availability of a host.  The traditional method is to elicit
an ICMP ECHO_REPLY in response to an ICMP ECHO_REQUEST, using ping(1) or
fping(1).

II.    Service Detection

A.K.A. port scanning.  Detection of the availability of TCP, UDP, or RPC
services, e.g. HTTP, DNS, NIS, etc.  Methods include SYN and FIN scanning, and
variations thereof e.g. fragmentation scanning.

III.   Network Topology Detection

I know of only two methods - TTL modulation (traceroute), and record route
(e.g. ping -R), although classical 'sniffing' is another (non-invasive) method.

IV.    OS Detection

A.K.A TCP/IP stack fingerprinting.  The determination of a remote OS type by
comparison of variations in OS TCP/IP stack implementation behavior; see
nmap(1).


----[  Conventional Information Gathering Paradigm

The conventional method of information gathering is to perform information
gathering techniques with a 'one to one' or 'one to many' model; i.e. an
attacker performs techniques in a (usually) linear way against either one
target host or a logical grouping of target hosts (e.g. a subnet).

Conventional information gathering is often optimized for speed, and often
executed in parallel (e.g. nmap). 


----[  Distributed Information Gathering Paradigm

With a distributed method, information gathering is performed using a 'many to
one' or 'many to many' model.  The attacker utilizes multiple hosts to execute
information gathering techniques in a random, rate-limited, non-linear way.

The meta-goal of distributed information gathering is to avoid detection either
by N-IDS (network intrusion detection systems) or by human analysis (e.g.
system administrators).

Distributed information gathering techniques seek to defeat the attack
detection heuristic employed by N-IDS'; this heuristic is explained below.


----[  N-IDS Attack Detection Heuristic

Many methods exist to perform (pseudo) real-time intrusion detection analysis
of network traffic data, of which the two major categories are M-IDS (misuse
detection) and A-IDS (anomaly detection).  A-IDS exist at present primarily in
the research domain, such as at COAST [5]; M-IDS employ a signature analysis
method (analogous in some respects to virus scanning software), and are in
widespread use in commercial and free N-IDS.

N-IDS signatures can be delineated into two categories - those that use 
composite or atomic signatures.  Atomic signatures relate to a single "event"
(in general, a single packet), e.g. a large packet attack / ping attack.  
Composite signatures comprise multiple events (multiple packets), e.g. a port 
scan or SYN flood.

To detect malicious or anomalous behavior, composite signatures usually employ
a simple equation with THRESHOLD and DELTA components.  A THRESHOLD is a simple
integer count; a DELTA is a time duration, e.g. 6 minutes.

For example, a signature for a SYN flood [6] might be:

   'SYN flood detected if more than 10 SYN packets seen in under 75 seconds'

Therefore in the above example, the THRESHOLD is "10 packets", and the DELTA is
"75 seconds".


----[  N-IDS Subversion

Within each monitoring component of a N-IDS the THRESHOLD and DELTA values
associated with each signature must be carefully configured in order to flag
real attacks, but to explicitly not flag where no attack exists.  A 'false
positive' is defined as the incorrect determination of an attack; a 'false
negative' is defined as the failure to recognize an attack in progress.

This process of configuration is a non-trivial "balancing act" - too little and
the N-IDS will flag unnecessarily often (and likely be ignored), too much and
the N-IDS will miss real attacks.

Using this information, the goal of distributed information gathering is
therefore not only to gather information, but also to induce a false negative
'state' in any N-IDS monitoring a target.

The techniques employed by distributed information gathering to subvert N-IDS
are outlined below.


----[  Distributed Information Gathering Techniques

I.     Co-operation

By employing a 'many to one' or 'many to many' model, multiple hosts can be
used together to perform information gathering.  Multiple source hosts will
make the correlation and detection duties of a N-IDS more complex.

Co-operation seeks to subvert the THRESHOLD component of a N-IDS attack
recognition signature.

II.    Time Dilation

By extending (or 'time stretching') the duration of an attack (particularly
the host and service detection phases), we hope to 'fall below' the DELTA used
by N-IDS' to detect an attack.

III.   Randomization

Packets used to perform information gathering, such as an ICMP datagram or a
SYN packet, should employ randomness where possible (within the constraints of
the relevant RFC definition), e.g. random TCP sequence and acknowledgement
numbers, random source TCP port, random IP id, etc.  Libnet [7] is an excellent
portable packet generation library that includes randomization functionality.

Randomization should also be utilized in the timing between packets sent, and
the order of hosts and/or ports scanned.  For example, a port scan of ports 53,
111, and 23 with non-regular timing between each port probed (e.g. between 6
and 60 minutes) is preferential to a linear, incremental scan, executed within
a few seconds.

In the IP header, I suggest randomization of IP ID and possibly TTL; within the
TCP header the source port, sequence number, and acknowledgement number (where
possible); and within the UDP header the source port.

The algorithm used to perform randomization must be carefully selected, else
the properties of the algorithm may be recordable as a signature themselves!
There are multiple documents which discuss randomization for security, of 
which [8] is a good place to start.


----[  Advantages

The advantages in employing a distributed information gathering methodology are
therefore:

I.     Stealth

By employing co-operation, time dilation, and randomization techniques we hope
to elude N-IDS detection.

II.    Correlation Information

The acquisition of multiple 'points of view' of a target enables a more
complete model of the target to be constructed, including multiple route and
timing information.

III.   Pervasive Information Gathering

The 'r-box' countermeasures (such as dynamic router or firewall configuration)
employed by certain N-IDS becomes less effective when multiple source hosts are
employed.


----[  N-IDS Evolution

How will N-IDS evolve to counter distributed information gathering?  It is
likely that detection of distributed information gathering will be available
only as a retrospective function, opposed to (pseudo) real time.  Logs from
multiple N-IDS agents must be centralized and cross-correlated before
distributed information gathering attacks can be detected.
 
In a large enterprise (for example a military, government, or large corporation
installation) this process of event consolidation must be considered a
non-trivial task.


----[  Commercial Information Gathering Software a.k.a. Vulnerability Scanners


There exists several advantages in using a distributed scanning model for
commercial vendors of network vulnerability scanning technology.  A distributed
model would enable localized 'zones of authority' (i.e. delegation of
authority), could gather information behind NAT (and firewalls, where
configured), and overcome network topology specific bandwidth restrictions.

At this time I am aware of no commercial (or free) vulnerability scanners that
employ a distributed architecture.


----[  Conclusion

Distributed information gathering is an extrapolation and logical evolution of
the existing traditional information gathering paradigm.  It's primary goal
is to elude detection by automated (N-IDS) or human sources.

If you choose to employ distributed information gathering techniques, you
must trade immediacy of results against stealth.


----[  References


  [1] - "Insertion, Evasion, and Denial of Service: Eluding Network Intrusion
        Detection", Thomas H. Ptacek & Timothy N. Newsham, January 1998.

  [2] - "Defeating Sniffers and Intrusion Detection Systems", horizon, Phrack
        Magazine, Volume 8 Issue 54 Article 10 of 12, Dec 25th 1998.

  [3] - "SHADOW Indications Technical Analysis - Coordinated Attacks and
        Probes", Stephen Northcutt & Tim Aldrich, Sep 21 1998.

  [4] - "The Art of Port Scanning", Fyodor, Phrack Magazine, Volume 7 Issue 51
        article 11 of 17, September 01 1997.

  [5] - COAST, http://www.cs.purdue.edu/coast/ids

  [6] - "Project Neptune", daemon9 / route / infinity, Phrack Magazine, Volume
        7 Issue Forty-Eight File 13 of 18.

  [7] - Libnet, route, http://www.packetfactory.net/libnet

  [8] - RFC 1750, "Randomness Recommendations for Security", December 1994.

  [9] - Libpcap, LBNL Network Research Group, http://ee.lbl.gov


----[  EOF