-------[ Phrack Magazine --- Vol. 9 | Issue 55 --- 09.09.99 --- 09 of 19 ]


-------------------------[ Distributed Information Gathering ]


--------[ hybrid ]


----[ Overview

Information gathering refers to the process of determining the
characteristics of one or more remote hosts (and/or networks).  Information
gathering can be used to construct a model of a target host, and to
facilitate future penetration attempts.

This article will discuss and justify a new model for information gathering,
namely: distributed information gathering.  The focus is on eluding detection
during the information gathering stage(s) of an attack, particularly by N-IDS
(Network Intrusion Detection Systems).

This article is an adjunct to the superb work of both Thomas H. Ptacek and
Timothy N. Newsham [1], and to horizon [2].

Please note that I do not claim to have discovered the distributed
information gathering methodology [3]; this article is a consolidation,
discussion, and extrapolation of existing work.


----[ Introduction

The current methods used to perform remote information gathering are well
documented [4], but are reiterated briefly here:

I.   Host Detection

    Detection of the availability of a host.  The traditional method is to
    elicit an ICMP ECHO_REPLY in response to an ICMP ECHO_REQUEST, using
    ping(1) or fping(1).

II.  Service Detection

    A.K.A. port scanning.  Detection of the availability of TCP, UDP, or RPC
    services, e.g. HTTP, DNS, NIS, etc.  Methods include SYN and FIN
    scanning, and variations thereof, e.g. fragmentation scanning.

III. Network Topology Detection

    I know of only two methods - TTL modulation (traceroute), and record
    route (e.g. ping -R), although classical 'sniffing' is another
    (non-invasive) method.

IV.  OS Detection

    A.K.A. TCP/IP stack fingerprinting.  The determination of a remote OS
    type by comparison of variations in OS TCP/IP stack implementation
    behavior; see nmap(1).

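To make item II above concrete, here is a minimal sketch of the simplest form
of service detection: a single full TCP connect() to one port.  This is not
the SYN or FIN scanning mentioned above (those need raw sockets); it is only
the plain connect case.  The function name tcp_port_open and the 2 second
default timeout are my own assumptions for illustration, not something taken
from the referenced work.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a full TCP handshake to (host, port) completes.

    Hypothetical helper for illustration: one connect() attempt, no retries.
    """
    try:
        # create_connection() resolves the host and performs connect().
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat as "not available".
        return False


if __name__ == "__main__":
    # Probe a listener we create ourselves on the loopback interface.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
        srv.listen(1)
        port = srv.getsockname()[1]
        print(port, tcp_port_open("127.0.0.1", port))
```

A completed handshake is the noisiest possible probe, since the target's
application usually sees (and may log) the connection; that is part of why
the half-open variants listed above exist.
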
----[ Conventional Information Gathering Paradigm

Conventional information gathering applies its techniques in a 'one to one'
or 'one to many' model; i.e. an attacker performs techniques in a (usually)
linear way against either one target host or a logical grouping of target
hosts (e.g. a subnet).

Conventional information gathering is often optimized for speed, and often
executed in parallel (e.g. nmap).


----[ Distributed Information Gathering Paradigm

With a distributed method, information gathering is performed using a 'many
to one' or 'many to many' model.  The attacker utilizes multiple hosts to
execute information gathering techniques in a random, rate-limited,
non-linear way.

The meta-goal of distributed information gathering is to avoid detection
either by N-IDS (network intrusion detection systems) or by human analysis
(e.g. system administrators).

Distributed information gathering techniques seek to defeat the attack
detection heuristic employed by N-IDS; this heuristic is explained below.


----[ N-IDS Attack Detection Heuristic

Many methods exist to perform (pseudo) real-time intrusion detection analysis
of network traffic data, of which the two major categories are M-IDS (misuse
detection) and A-IDS (anomaly detection).  A-IDS exist at present primarily
in the research domain, such as at COAST [5]; M-IDS employ a signature
analysis method (analogous in some respects to virus scanning software), and
are in widespread use in commercial and free N-IDS.

N-IDS signatures fall into two categories: atomic and composite.  Atomic
signatures relate to a single "event" (in general, a single packet), e.g. a
large packet attack / ping attack.  Composite signatures comprise multiple
events (multiple packets), e.g. a port scan or SYN flood.

To detect malicious or anomalous behavior, composite signatures usually
employ a simple equation with THRESHOLD and DELTA components.
A THRESHOLD is a simple integer count; a DELTA is a time duration, e.g. 6
minutes.  For example, a signature for a SYN flood [6] might be:

    'SYN flood detected if more than 10 SYN packets seen in under 75 seconds'

Therefore in the above example, the THRESHOLD is "10 packets", and the DELTA
is "75 seconds".


----[ N-IDS Subversion

Within each monitoring component of an N-IDS the THRESHOLD and DELTA values
associated with each signature must be carefully configured in order to flag
real attacks, and not to flag where no attack exists.  A 'false positive' is
defined as the incorrect determination of an attack; a 'false negative' is
defined as the failure to recognize an attack in progress.

This process of configuration is a non-trivial "balancing act" - too
sensitive and the N-IDS will flag unnecessarily often (and likely be
ignored), too lax and the N-IDS will miss real attacks.

Using this information, the goal of distributed information gathering is
therefore not only to gather information, but also to induce a false negative
'state' in any N-IDS monitoring a target.

The techniques employed by distributed information gathering to subvert N-IDS
are outlined below.


----[ Distributed Information Gathering Techniques

I.   Co-operation

    By employing a 'many to one' or 'many to many' model, multiple hosts can
    be used together to perform information gathering.  Multiple source hosts
    will make the correlation and detection duties of an N-IDS more complex.

    Co-operation seeks to subvert the THRESHOLD component of an N-IDS attack
    recognition signature.

II.  Time Dilation

    By extending (or 'time stretching') the duration of an attack
    (particularly the host and service detection phases), we hope to spread
    probes so that too few fall within the DELTA window used by an N-IDS to
    detect an attack.

III. Randomization

    Packets used to perform information gathering, such as an ICMP datagram
    or a SYN packet, should employ randomness where possible (within the
    constraints of the relevant RFC definition), e.g.
    random TCP sequence and acknowledgement numbers, random source TCP port,
    random IP id, etc.  Libnet [7] is an excellent portable packet generation
    library that includes randomization functionality.

    Randomization should also be utilized in the timing between packets sent,
    and the order of hosts and/or ports scanned.  For example, a port scan of
    ports 53, 111, and 23 with non-regular timing between each port probed
    (e.g. between 6 and 60 minutes) is preferable to a linear, incremental
    scan, executed within a few seconds.

    In the IP header, I suggest randomization of IP ID and possibly TTL;
    within the TCP header the source port, sequence number, and
    acknowledgement number (where possible); and within the UDP header the
    source port.

    The algorithm used to perform randomization must be carefully selected,
    else the properties of the algorithm may be recordable as a signature
    themselves!  There are multiple documents which discuss randomization for
    security, of which [8] is a good place to start.


----[ Advantages

The advantages in employing a distributed information gathering methodology
are therefore:

I.   Stealth

    By employing co-operation, time dilation, and randomization techniques we
    hope to elude N-IDS detection.

II.  Correlation Information

    The acquisition of multiple 'points of view' of a target enables a more
    complete model of the target to be constructed, including multiple route
    and timing information.

III. Pervasive Information Gathering

    The 'r-box' countermeasures (such as dynamic router or firewall
    configuration) employed by certain N-IDS become less effective when
    multiple source hosts are employed.


----[ N-IDS Evolution

How will N-IDS evolve to counter distributed information gathering?  It is
likely that detection of distributed information gathering will be available
only as a retrospective function, as opposed to (pseudo) real time.
Logs from multiple N-IDS agents must be centralized and cross-correlated
before distributed information gathering attacks can be detected.  In a large
enterprise (for example a military, government, or large corporate
installation) this process of event consolidation must be considered a
non-trivial task.


----[ Commercial Information Gathering Software a.k.a. Vulnerability Scanners

There exist several advantages for commercial vendors of network
vulnerability scanning technology in using a distributed scanning model.  A
distributed model would enable localized 'zones of authority' (i.e.
delegation of authority), could gather information behind NAT (and firewalls,
where configured), and overcome network topology specific bandwidth
restrictions.

At this time I am aware of no commercial (or free) vulnerability scanners
that employ a distributed architecture.


----[ Conclusion

Distributed information gathering is an extrapolation and logical evolution
of the existing traditional information gathering paradigm.  Its primary goal
is to elude detection by automated (N-IDS) or human sources.

If you choose to employ distributed information gathering techniques, you
must trade immediacy of results against stealth.


----[ References

[1] - "Insertion, Evasion, and Denial of Service: Eluding Network Intrusion
      Detection", Thomas H. Ptacek & Timothy N. Newsham, January 1998.

[2] - "Defeating Sniffers and Intrusion Detection Systems", horizon, Phrack
      Magazine, Volume 8 Issue 54 Article 10 of 12, Dec 25th 1998.

[3] - "SHADOW Indications Technical Analysis - Coordinated Attacks and
      Probes", Stephen Northcutt & Tim Aldrich, Sep 21 1998.

[4] - "The Art of Port Scanning", Fyodor, Phrack Magazine, Volume 7 Issue 51
      article 11 of 17, September 01 1997.

[5] - COAST, http://www.cs.purdue.edu/coast/ids

[6] - "Project Neptune", daemon9 / route / infinity, Phrack Magazine,
      Volume 7 Issue Forty-Eight File 13 of 18.

[7] - Libnet, route, http://www.packetfactory.net/libnet

[8] - RFC 1750, "Randomness Recommendations for Security", December 1994.

[9] - Libpcap, LBNL Network Research Group, http://ee.lbl.gov


----[ EOF