

The most common forensic activity is searching a hard disk for strings of data. The search can target files, slack space, or even unallocated space. As in other forensic tasks, the context of the investigation determines the search type used. There are two primary search methods: index-based searching and bitwise searching.

Index-based searching generates a keyword index on the first pass through a series of files. Bitwise searching performs a full, regular expression-based search on the raw data, whether or not that data belongs to a file. An index-based search may be used to provide quick, repeated searches with new terms on files copied from a shared drive. Conversely, a full bitwise search may be more relevant if a hard disk is being searched for deleted files or residual fragments of their contents. Most of the techniques noted work with re-formatted drives as well. When Windows formats a partition (FAT or NTFS), the actual partition contents are not touched. Only the partition boot sector and file system metafiles are altered.

Index-based searches rely on the creation of an index of keywords based on the contents of files. A search tool generally opens all files on a drive, share, image, or partition, searches them for repeating strings of printable characters, and creates a table of the repeated strings with pointers to the original content. The initial indexing can take hours or days. Once it is complete, however, searching the index is nearly instantaneous.

Index searching has two primary advantages over bitwise searching. First, because it is file-based, index searching can utilize hooks into various file types to index their contents in their native format. This allows the proper searching of contents for applications like Excel (XLS files) and Acrobat (PDF files), which store data in a modified format, and of compressed files such as WinZip (ZIP files). Bitwise searches on these file types are ineffective because the data is not stored in a directly readable ASCII format. Second, after the initial indexing, searching is extremely fast. This makes index searching extremely effective for searching for tangential terms based on initial search results and for browsing the indexed words for similar spellings.

Regular expressions are a symbolic method for representing strings of text for the purposes of pattern matching. A familiar format for anyone who has used Perl or grep (whose name comes from "global regular expression print"), regular expressions can perhaps best be described as wildcards on steroids. Standard wildcard-based text searches are limited in their forensic uses, with basic full-string and substring matches being the most common.
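The bitwise, regular expression-based search described above can be illustrated with a minimal Python sketch. It is not how any particular forensic tool works. The function name `bitwise_search`, the sample pattern, and the in-memory `image` bytes are all hypothetical stand-ins for a raw disk image.

```python
import re

def bitwise_search(raw: bytes, pattern: bytes):
    """Return (byte offset, matched bytes) for every regex hit in the raw data."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [(m.start(), m.group()) for m in regex.finditer(raw)]

# Hypothetical stand-in for a raw image: text fragments surrounded by binary filler.
image = (
    b"\x00\x01Invoice #4821\x00\xff"
    + b"\x00" * 16
    + b"residual invoice #77 fragment\x00"
)

# The pattern is matched against raw bytes, with no notion of files or file systems.
hits = bitwise_search(image, rb"invoice #\d+")
for offset, text in hits:
    print(offset, text)
```

Because the pattern runs over raw bytes, it finds the fragment whether or not it belongs to a live file. A real image would normally be read in overlapping chunks rather than loaded into memory at once.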

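The index-based approach described earlier can be sketched in the same way: one slow pass builds a table of keywords with pointers back to the content, after which each lookup is a dictionary access. This is a simplified illustration under stated assumptions, not any tool's actual design. The names `build_index` and `lookup`, the four-character keyword threshold, and the sample file contents are hypothetical, and no file-type hooks (XLS, PDF, ZIP) are modeled.

```python
import re
from collections import defaultdict

# Assumption: a keyword is a run of four or more ASCII letters or digits.
WORD = re.compile(rb"[A-Za-z0-9]{4,}")

def build_index(files: dict[str, bytes]) -> dict[str, list[tuple[str, int]]]:
    """Map each lowercased keyword to the (file name, byte offset) pairs where it occurs."""
    index = defaultdict(list)
    for name, data in files.items():
        for m in WORD.finditer(data):
            keyword = m.group().decode("ascii").lower()
            index[keyword].append((name, m.start()))
    return dict(index)

def lookup(index, term: str):
    """Return all recorded locations of a term; no file is re-read."""
    return index.get(term.lower(), [])

# Hypothetical contents standing in for files copied from a shared drive.
files = {
    "memo.txt": b"Transfer the payment to account 5501",
    "notes.txt": b"payment received; account closed",
}

index = build_index(files)          # the slow, one-time pass
print(lookup(index, "payment"))     # repeated searches reuse the index
print(lookup(index, "Account"))
print(sorted(index))                # browsing the indexed words
```

The trade-off matches the text: all of the cost sits in `build_index`, and new terms can then be tried repeatedly without touching the original data again.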