
SENG 265 Software Development Methods Regular Expressions

University of Victoria


Regular Expressions

– Background
– Sets of strings
– Stating a regular expression (simple)
– Python re module (simple)
– A bit of theory
– Stating a regular expression (more complex)
– Python re module (more complex)
– Using regexes for control flow

String patterns

We all use searches in which we provide strings or substrings to some module or mechanism:
– Google search terms
– Filename completion
– Command-line wildcards
– Browser URL completion
– Python string routines such as find(), index(), etc.
Quite often these searches are simply expressed as a particular pattern:
– An individual word
– Several words, some strictly required and some optional
– The start or end of particular words, or perhaps just the string appearing within a larger string
This works well if strings follow the format we expect. Sometimes, however, we want to express a more complex pattern:
– The set of all files ending with either ".c" or ".h"
– The set of all files starting with "ical"
– The set of all strings in which "FREQ" appears as a whole word (but not within "FREQUENCY" or "INFREQUENT"), in any mix of upper and lower case (so "fReQ" is fine)
– The set of all strings containing dates in MM/DD/YYYY format
Such a variety of patterns used to require language-specific operations (e.g., in SNOBOL or Pascal). More troubling was that most non-trivial patterns required several lines of code to express (i.e., a series of "if-then-else" statements):
– This is a problem, as the resulting code can obscure the patterns for which we are searching.
– Even worse, changing the pattern is tedious and error-prone, as it means changing the structure of already written code.

C code to check for DD/MM/YYYY format

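#include <ctype.h>
#include <string.h>

/* Return 1 if check has the shape DD/MM/YYYY (digits and slashes), 0 otherwise. */
int is_date_format(const char *check) {
    int i;

    /* The string must be exactly ten characters long. */
    if (strlen(check) != 10) {
        return 0;
    }
    /* DD: two digits */
    if (!isdigit((unsigned char)check[0]) || !isdigit((unsigned char)check[1])) {
        return 0;
    }
    /* MM: two digits */
    if (!isdigit((unsigned char)check[3]) || !isdigit((unsigned char)check[4])) {
        return 0;
    }
    /* YYYY: four digits */
    for (i = 6; i < 10; i++) {
        if (!isdigit((unsigned char)check[i])) {
            return 0;
        }
    }
    /* Separators */
    if (check[2] != '/' || check[5] != '/') {
        return 0;
    }

    /* Still haven't even figured out if the DD makes sense, let alone
     * the MM!!!!
     */
    return 1;
}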
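For contrast, here is a sketch of the same shape check written with a regular expression. It assumes a POSIX system that provides `<regex.h>` (`regcomp`, `regexec`, `regfree`); that header is part of POSIX, not standard C. The names `regex_match`, `is_date_format_re`, and `is_plausible_date_re` are hypothetical. The second pattern shows how ranges for DD and MM can be folded into the expression itself, although it still cannot tell that 31/02 is impossible.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if s matches the POSIX extended regex pattern, 0 otherwise.
 * The patterns below carry their own ^ and $ anchors, so the whole
 * string must match. */
static int regex_match(const char *pattern, const char *s) {
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
        return 0; /* treat a pattern that fails to compile as "no match" */
    }
    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* Same check as is_date_format(): two digits, '/', two digits, '/', four digits. */
int is_date_format_re(const char *check) {
    return regex_match("^[0-9]{2}/[0-9]{2}/[0-9]{4}$", check);
}

/* Tighter sketch: DD restricted to 01-31 and MM to 01-12.
 * Still accepts impossible dates such as 31/02/2024. */
int is_plausible_date_re(const char *check) {
    return regex_match("^(0[1-9]|[12][0-9]|3[01])/(0[1-9]|1[0-2])/[0-9]{4}$", check);
}
```

Changing the accepted format now means editing one pattern string rather than restructuring a chain of if statements.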

Regular expressions

Needed: a language-independent approach to expressing such patterns
Solution: a regular expression
– Sometimes called a regex or regexp
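As an illustration, the example patterns listed earlier can each be written as a single POSIX extended regular expression. This sketch assumes `<regex.h>` is available; the helper `matches` and the macro names are hypothetical, and reading the "FREQ" requirement as "whole word, any case" is an interpretation. POSIX EREs have no portable word-boundary escape, so the sketch places a bracket expression on each side of "freq" and passes `REG_ICASE` for case-insensitivity.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical encodings of the example patterns (POSIX ERE syntax). */
#define C_OR_H_FILE "\\.(c|h)$"                              /* ends in .c or .h */
#define ICAL_FILE   "^ical"                                  /* starts with ical */
#define FREQ_WORD   "(^|[^[:alnum:]_])freq([^[:alnum:]_]|$)" /* use with REG_ICASE */
#define MM_DD_YYYY  "[0-9]{2}/[0-9]{2}/[0-9]{4}"             /* date anywhere in text */

/* Return 1 if text contains a match for pattern, 0 if it does not,
 * and -1 if the pattern fails to compile. Extra flags (e.g. REG_ICASE)
 * are OR-ed into the compile flags. */
int matches(const char *pattern, const char *text, int flags) {
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB | flags) != 0) {
        return -1;
    }
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

For example, `matches(FREQ_WORD, "RRULE:FREQ=WEEKLY", REG_ICASE)` succeeds, while the same call on "FREQUENCY" fails because an alphanumeric character follows "freq".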
Regular expressions are written in a formal language and have the property that we can build very fast recognizers for them.
They are part of a hierarchy of languages (the Chomsky hierarchy):
– Type 0: unrestricted grammars
– Type 1: context-sensitive grammars
– Type 2: context-free grammars
– Type 3: regular grammars
Type 2 and Type 3 grammars are the ones most widely used in computer science:
– Type 2 is used in parsers for programming languages (e.g., in compilers)
– Type 3 is used in regular expressions and in the lexical analyzers (scanners) of compilers
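Part of why Type 3 recognizers are fast is that any regular language can be recognized by a finite automaton that reads each character once and does a constant amount of work per character. The sketch below hand-translates the ".c or .h" pattern (the regex `\.(c|h)$`) into such a deterministic automaton; the state names and the function name `ends_in_c_or_h` are hypothetical. In practice many regex libraries build automata from the pattern automatically, although some engines use backtracking instead.

```c
#include <stddef.h>

/* States of a hand-built DFA for "ends in .c or .h". */
enum state { START, SAW_DOT, ACCEPT };

/* Scan s once, left to right, with O(1) work per character.
 * Returns 1 if the final state is ACCEPT, 0 otherwise. */
int ends_in_c_or_h(const char *s) {
    enum state st = START;
    size_t i;

    for (i = 0; s[i] != '\0'; i++) {
        char ch = s[i];
        if (ch == '.') {
            st = SAW_DOT;   /* any '.' may begin the suffix */
        } else if (st == SAW_DOT && (ch == 'c' || ch == 'h')) {
            st = ACCEPT;    /* ".c" or ".h" is the most recent input */
        } else {
            st = START;     /* anything else breaks the suffix */
        }
    }
    return st == ACCEPT;
}
```

Because there is no backtracking, the running time is linear in the length of the string regardless of its contents.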