Regular Expression Searching
Regular expressions allow forensics analysts to search through
large quantities of text information for patterns of data such as
the following:
Telephone Numbers
Social Security Numbers
Computer IP Addresses
Credit Card Numbers
This data can be extracted because it occurs in known
patterns. For example, credit card numbers are typically
sixteen digits in length and are often stored in the following
pattern or format: xxxx–xxxx–xxxx–xxxx.
This appendix explains the following:
Understanding Regular Expressions
Predefined Regular Expressions
Going Further with Regular Expressions
AccessData Corp.
Understanding Regular Expressions
Forensics analysts specify a desired pattern by composing a
regular expression. These patterns are similar to arithmetic
expressions that have operands, operators, sub-expressions,
and a value. For example, the following table identifies the
mathematical components in the arithmetic expression,
5/((1+2)*3):
Component        Example
Operands         5, 1, 2, 3
Operators        /, ( ), +, *
Sub-Expressions  (1+2), ((1+2)*3)
Value            Approximately 0.556
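The decomposition in the table can be checked with a short Python
sketch that evaluates the sub-expressions from the inside out. The
variable names are illustrative and do not come from the guide.

```python
# Evaluate 5/((1+2)*3) one sub-expression at a time.
inner = 1 + 2        # sub-expression (1+2)
outer = inner * 3    # sub-expression ((1+2)*3)
value = 5 / outer    # value of the whole expression

print(inner, outer, round(value, 3))
```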
Like the arithmetic expression in this example, regular
expressions have operands, operators, sub-expressions, and a
value. How these expressions are created and used is explained
using simple expressions followed by more complex regular
expressions.
Note:
Unlike arithmetic expressions, which can only have numeric
operands, regular expressions can use any character that can be
typed on a keyboard as an operand, including alphabetic, numeric,
and symbolic characters.
Simple Regular Expressions
A simple regular expression can be made up entirely of
operands. For example, the regular expression dress causes the
search engine to return a list of all files that contain the
sequence of characters d r e s s. The regular expression dress
corresponds to a very specific and restricted pattern of text,
that is, sequences of text that contain the sub-string dress.
Files containing the words “dress,” “address,” “dressing,” and
“dresser” are returned in a search for the regular expression
dress.
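As a minimal sketch of this behavior, the Python snippet below uses
the standard re module as a stand-in for the FTK search engine (an
assumption; the two engines are not the same). The word list is
sample data, with "dread" added as a non-matching case.

```python
import re

words = ["dress", "address", "dressing", "dresser", "dread"]
pattern = re.compile("dress")  # operands only, no operators

# search() succeeds when the pattern occurs anywhere in the string.
hits = [w for w in words if pattern.search(w)]
print(hits)  # ['dress', 'address', 'dressing', 'dresser']
```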
FTK Regular Expressions Guide
The search engine searches left to right. So in searching for the
regular expression dress, the search engine opens each file and
scans its contents line by line, looking for a d, followed by an
r, followed by an e, and so on.
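The left-to-right scan described above can be sketched as a naive
character-by-character search. This illustrates the idea only; the
guide does not describe how FTK implements its scan, and the
function name find_literal is hypothetical.

```python
def find_literal(text, pattern):
    """Return the index of the first occurrence of pattern, or -1."""
    for start in range(len(text) - len(pattern) + 1):
        # Compare the pattern one character at a time, left to right.
        for offset, expected in enumerate(pattern):
            if text[start + offset] != expected:
                break  # mismatch: try the next starting position
        else:
            return start  # every character matched
    return -1

print(find_literal("home address", "dress"))
```

At index 6 the scan finds a d but the next character is not an r,
so it moves on and succeeds at the following position.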
Complex Regular Expressions—Visa and MasterCard Numbers
Operators allow regular expressions to search for patterns of data
rather than specific values. For example, the operators in the
following expression enable the FTK search engine to find
all Visa and MasterCard credit card numbers in case evidence
files:
\<((\d\d\d\d)[\- ]){3}\d\d\d\d\>
Without the use of operators, the search engine could look for
only one credit card number at a time.
Note:
The credit card expression discussed in this section is included in
FTK and is used here primarily to explain advanced regular
expressions.
The following table identifies the components in the Visa and
MasterCard regular expression:
Component        Example
Operands         d, \-, spacebar space
Operators        \d, \, <, ( ), [ ], {3}, \>
Sub-Expressions  (\d\d\d\d), ((\d\d\d\d)[\- ])
Value            Any sequence of sixteen decimal digits that is
                 delimited by three hyphens or spaces and bound on
                 both sides by non-word characters
                 (xxxx–xxxx–xxxx–xxxx).
As the regular expression search engine evaluates an
expression in left-to-right order, the first operator it
encounters is the backslash less-than combination (\<). This
combination is also known as the begin-a-word operator. This
operator tells the search engine that the first character in any
search hit immediately follows a non-word character such as
white space or other word delimiter.
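Python's re module does not provide the \< operator, so the sketch
below uses \b (word boundary) as an approximation. Because the
character that follows is a word character, \b can only match here
at the start of a word. This substitution is an assumption made for
illustration, not FTK syntax.

```python
import re

words = ["dress", "address", "dressing", "dresser"]

plain = re.compile(r"dress")            # matches anywhere in a word
at_word_start = re.compile(r"\bdress")  # \b used as a stand-in for \<

plain_hits = [w for w in words if plain.search(w)]
start_hits = [w for w in words if at_word_start.search(w)]

print(plain_hits)  # all four words
print(start_hits)  # "address" is no longer a hit
```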
Tip:
A precise definition of non-word characters and constituent-word
characters in regular expressions is difficult to find. Consequently,
experimentation by FTK users may be the best way to determine if the
backslash less-than (\<) and backslash greater-than (\>) operators
help find the data patterns relevant to a specific searching task. The
hyphen and the period are examples of valid delimiters or non-word
characters.
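The kind of experiment the tip suggests can be sketched in Python,
where \w defines the word-constituent characters. The results apply
to Python's re module only; FTK's engine may classify some
characters (the underscore, for example) differently, so treat this
as a template for experimentation rather than a statement about FTK.

```python
import re

candidates = ["a", "Z", "7", "_", "-", ".", " ", "/"]

# A character is a word character here if \w matches it.
is_word_char = {c: bool(re.fullmatch(r"\w", c)) for c in candidates}

for char, flag in is_word_char.items():
    print(repr(char), "word" if flag else "non-word")
```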
The begin-a-word operator illustrates one of two uses of the
backslash character (\), often called the escape character: the
modification of operands and the modification of operators.
On its own, the left angle bracket (<) would be evaluated as an
operand, requiring the search engine to look next for a left
angle bracket character. However, when the escape character
immediately precedes the (<), the two characters are
interpreted together as the begin-a-word operator by the
search engine. When an escape character precedes a hyphen
(-) character, which is normally considered to be an operator,
the two characters (\-) require the search engine to look next
for a hyphen character and not apply the hyphen operator
(the meaning of the hyphen operator is discussed below).
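Both uses of the escape character can be seen in a small Python
sketch. The sample text is made up, and the behavior shown is that
of Python's re module, where an unescaped hyphen inside square
brackets denotes a range of characters.

```python
import re

text = "a-c b 2d"

# Escape turns an operand into an operator: d is a letter, \d is any digit.
letter_d = re.findall(r"d", text)
any_digit = re.findall(r"\d", text)

# Escape turns an operator into an operand: inside brackets, - is a
# range operator, while \- is a literal hyphen.
range_a_to_c = re.findall(r"[a-c]", text)
a_hyphen_or_c = re.findall(r"[a\-c]", text)

print(letter_d, any_digit)
print(range_a_to_c, a_hyphen_or_c)
```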
The next operator is the parentheses ( ). The parentheses
group together a sub-expression, that is, a sequence of
characters that must be treated as a group and not as
individual operands.
The next operator is the \d. This operator, which is another
instance of an operand being modified by the escape
character, is interpreted by the search engine to mean that the
next character in a search hit must be a decimal digit
character from 0 to 9.
The square brackets ([ ]) indicate that the next character in
the sequence must be one of the characters listed between the
brackets or escaped characters. In the case of the credit card
expression, the backslash-hyphen-spacebar space
([\-spacebar space]) means that the four decimal digits must be
followed by a hyphen or a spacebar space.
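A short Python sketch of this fragment of the expression, using the
re module as a stand-in for the FTK engine (an assumption), shows
which four-character groups and separators are accepted. The sample
strings are invented.

```python
import re

# Four decimal digits followed by a hyphen or a space.
group_then_separator = re.compile(r"\d\d\d\d[\- ]")

samples = ["1234-", "1234 ", "1234/", "12a4-"]
results = {s: bool(group_then_separator.fullmatch(s)) for s in samples}

for sample, matched in results.items():
    print(repr(sample), matched)
```

The slash is rejected because it is not listed between the
brackets, and "12a4-" is rejected because a is not a decimal digit.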
Next, the {3} means that the preceding sub-expression must
repeat three times, back to back. The number in the curly
brackets ({ }) can be any positive integer.
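The effect of the repetition count can be checked with a Python
sketch that tests strings containing two, three, and four copies of
the group. Python's re module stands in for the FTK engine here,
and the sample strings are invented.

```python
import re

# The sub-expression (four digits plus separator) repeated exactly three times.
three_groups = re.compile(r"(\d\d\d\d[\- ]){3}")

samples = {
    "two groups": "1234-5678-",
    "three groups": "1234-5678-9012-",
    "four groups": "1234-5678-9012-3456-",
}
results = {name: bool(three_groups.fullmatch(s)) for name, s in samples.items()}
print(results)
```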
Finally, the backslash greater-than combination (\>), also
known as the end-a-word operator, means that the preceding
expression must be followed by a non-word character.
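Putting the pieces together, the whole expression can be tried in
Python. This is a sketch under one stated assumption: Python's re
module has no \< or \> operators, so \b (word boundary) is
substituted for both. The sample text and card numbers are invented
test values.

```python
import re

# Assumption: \b approximates FTK's \< and \> operators.
card_pattern = re.compile(r"\b((\d\d\d\d)[\- ]){3}\d\d\d\d\b")

text = (
    "Paid with 4111-1111-1111-1111 on Monday; "
    "backup card 5500 0000 0000 0004. "
    "Order ref 12345-1111-1111-1111-11115 is not a card."
)

# finditer is used because findall would return the captured groups
# rather than the full match.
hits = [m.group(0) for m in card_pattern.finditer(text)]
print(hits)
```

The order reference is skipped because its first and last digit
runs are five digits long, so no run of four groups is bounded by
non-word characters on both sides.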
Other Variations on the Same Expression
Sometimes there are ways to search for the same data using
different expressions. It should be noted that there is no one-
to-one correspondence between the expression and the
pattern it is supposed to find. Thus the preceding credit card
regular expression is not the only way to search for Visa or
MasterCard credit card numbers. Because some regular
expression operators have related meanings, there is more
than one way to compose a regular expression to find a specific
pattern of text. For instance, the following regular expression
has the same meaning as the preceding credit card expression:
\<((\d\d\d\d)(\-| )){3}\d\d\d\d\>
The difference here is the use of the pipe ( | ) or union
operator. The union operator means that the next character to
match is either the left operand (the hyphen) or the right
operand (the spacebar space). The similar meaning of the
pipe ( | ) and square bracket ([ ]) operators gives both
expressions equivalent functions.
In addition to the previous two examples, the credit card
regular expression could be composed as follows:
\<\d\d\d\d(\-| )\d\d\d\d(\-| )\d\d\d\d(\-| )\d\d\d\d\>
This expression explicitly states each element of the data
pattern, whereas the {3} operator in the first two examples
provides a type of mathematical shorthand for more succinct
regular expressions.
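The equivalence of the three forms can be spot-checked in Python by
running each one against the same sample strings. As a stated
assumption, \b replaces the \< and \> operators, which Python's re
module does not provide; the variant labels and sample strings are
illustrative. Agreement on a handful of samples is a sanity check,
not a proof of equivalence.

```python
import re

# Assumption: \b approximates FTK's \< and \> operators.
variants = {
    "brackets": r"\b((\d\d\d\d)[\- ]){3}\d\d\d\d\b",
    "union": r"\b((\d\d\d\d)(\-| )){3}\d\d\d\d\b",
    "explicit": r"\b\d\d\d\d(\-| )\d\d\d\d(\-| )\d\d\d\d(\-| )\d\d\d\d\b",
}

samples = [
    "4111-1111-1111-1111",
    "4111 1111 1111 1111",
    "4111-1111 1111-1111",
    "4111/1111/1111/1111",
    "4111-1111-1111",
]

# For each sample, record whether each variant matches the whole string.
results = {
    s: {name: bool(re.fullmatch(p, s)) for name, p in variants.items()}
    for s in samples
}

for sample, outcome in results.items():
    print(sample, outcome)
```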