Beginning Perl for Bioinformatics


B.11 Conditionals and Logical Operators






This section covers conditional statements and logical operators.



B.11.1 true and false

In a conditional test, an expression evaluates to true or false, and based on the result, a statement or block may or may not be executed.

A scalar value can be true or false in a conditional. A string is false if it's the empty string (represented as "" or '') or the single-character string '0'. Any other string is true.

Similarly, an array or a hash is false if empty, and true if nonempty.

A number is false if it's 0; a number is true if it's not 0.
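These rules can be checked directly. The following is a minimal sketch; the helper name truth is made up for illustration and isn't part of Perl. It also shows that the string '0' counts as false, while a string such as '0.0' is true.

```perl
use strict;
use warnings;

# truth() is a hypothetical helper: it reports how Perl treats a value
# when that value is used as a conditional test.
sub truth {
    my ($value) = @_;
    return $value ? 'true' : 'false';
}

my @empty = ();
my @bases = ('A', 'C', 'G', 'T');

print "empty string: ",  truth(''),     "\n";   # false
print "string 'ACGT': ", truth('ACGT'), "\n";   # true
print "number 0: ",      truth(0),      "\n";   # false
print "number 3.2: ",    truth(3.2),    "\n";   # true
print "string '0': ",    truth('0'),    "\n";   # false
print "string '0.0': ",  truth('0.0'),  "\n";   # true

# An array in a conditional is evaluated in scalar context,
# which gives its number of elements.
print "empty array: ", (@empty ? 'true' : 'false'), "\n";   # false
print "four bases: ",  (@bases ? 'true' : 'false'), "\n";   # true
```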

Most things you evaluate in Perl return some value (such as a number from an arithmetic expression or an array returned from a subroutine), so you can use most things in Perl in conditional tests. Sometimes you may get an undefined value, for instance if you try to add a number to a variable that has not been assigned a value. Then things might fail to work as expected. For instance:

use strict;

use warnings;

my $a;

my $b;


$b = $a + 2;

produces the warning output:

Use of uninitialized value in addition (+) at - line 5.

You can test for defined and undefined values with the Perl function defined.
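Here is a short sketch of one way to use defined; the variable names are only for illustration. It tests a variable before using it in arithmetic and supplies a default value, which avoids the warning.

```perl
use strict;
use warnings;

my $count;          # declared but never assigned, so its value is undef

# defined returns true only if the variable holds a value.
if (defined $count) {
    print "count is $count\n";
} else {
    print "count is undefined\n";
}

# Give the variable a default before using it in arithmetic,
# which avoids the "uninitialized value" warning.
$count = 0 unless defined $count;
my $total = $count + 2;

print "total is $total\n";   # total is 2
```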



B.11.2 Logical Operators

There are four logical operators:

not

and


or

xor

not turns true values into false and false values into true. Its use is best illustrated in code:

if(not $done) {...}

This executes the code only if $done is false.

and is a binary operator that returns true if both its operands are true. If one or both of the operands are false, the operator returns false:

1 and 1 returns true

'a' and '' returns false

'' and 0 returns false

or is a binary operator that returns true if one or both of the operands are true. If both operands are false, it returns false:

1 or 1 returns true

'a' or '' returns true

'' or 0 returns false

xor, or exclusive-OR, returns true if one operand is true and the other operand is false; xor returns false if both operands are true or if both operands are false:

1 xor 0 returns true

0 xor 1 returns true

1 xor 1 returns false

0 xor 0 returns false

There are also variants on most of these:

! for not

&& for and

|| for or

These have different precedence but otherwise behave the same. Some older versions of Perl may only have:

!

||


&&

instead of not, or, and and.
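The difference in precedence matters most in assignments. The following sketch (with made-up variable names) shows that || is evaluated before the assignment, while or needs parentheses to get the same result. It also shows that these operators return the last operand they evaluated, not just a bare true or false.

```perl
use strict;
use warnings;

my $empty = '';

# || binds more tightly than =, so the whole expression ($empty || 'N')
# is evaluated first and its result is assigned.
my $base_a = $empty || 'N';

# or binds less tightly than =, so without the parentheses the
# assignment would happen first; the parentheses give the same
# result as with ||.
my $base_b = ($empty or 'N');

# Both forms of "not" give the same answer here.
my $negated_word   = (not $empty) ? 'true' : 'false';
my $negated_symbol = (!$empty)    ? 'true' : 'false';

print "$base_a $base_b $negated_word $negated_symbol\n";   # N N true true
```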



B.11.3 Using Logical Operators for Control Flow

A quick and popular way to take an action depending on the results of a previous action is to chain the statements together with logical operators. For instance, it's common in Perl programs to see the following statement to open a file:

open(FH, $filename) or die "Cannot open file $filename: $!";

The use of or in this statement shows another important thing about the binary logical operators: they evaluate their arguments left to right. In this case, if the open succeeds, the or operator never bothers to check the value of the second operand (die, which exits the program with the message in the string, plus additional messages if $! is included). The or never bothers, because if one operand is true, the or is true, so it doesn't need to check the second operand. However, if the open fails, the or needs to check that the second operand is true or false, so it goes ahead and executes the die statement.

You can use the and operator similarly to evaluate the second operand only if the first operand is true.

xor doesn't work for control flow, since both its arguments have to be evaluated each time.
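To see which operands actually get evaluated, here is a small sketch. The check subroutine is a hypothetical helper that records each call, so the log shows exactly what or and and skipped.

```perl
use strict;
use warnings;

my @log;    # records which operands were actually evaluated

# check() is a hypothetical helper that notes it was called
# and then returns the truth value it was given.
sub check {
    my ($name, $result) = @_;
    push @log, $name;
    return $result;
}

check('first', 1) or  check('second', 1);   # second is skipped
check('third', 0) or  check('fourth', 1);   # fourth is evaluated
check('fifth', 0) and check('sixth', 1);    # sixth is skipped

print "@log\n";   # first third fourth fifth
```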

I haven't used this chaining of logical operators much; I've used if statements instead. This is because I often find that I want to add more statements following a test, and it's easier if the original is written as an if statement with a block, and harder if the original is written as a logical operator.

B.11.4 The if Statement

Conditional tests are commonly found in if statements and in their variants and loops. Here's an example of an if statement:

if ( open(FH, $filename) ) {

print "Hurray, I opened the file.";

}

The if statement is followed by a conditional expression enclosed in parentheses, which is followed by a block enclosed in curly braces { }. When the conditional expression evaluates as true, the statements in the block are executed.


The if statement may optionally be followed by an else, which is executed when the conditional evaluates to false:

if ( open(FH, $filename) ) {

print "Hurray, I opened the file.";

} else {


print "Rats. The file did not open.";

}

The if statement may also optionally include any number of elsif clauses, which check additional conditional statements if none of the preceding conditional statements are true:



if ( open(FH, $file1) ) {

print "Hurray, I opened file 1.";

} elsif ( open(FH, $file2) ) {

print "Hurray, I opened file 2.";

} elsif ( open(FH, $file3) ) {

print "Hurray, I opened file 3.";

} else {

print "None of the dadblasted files would open.";

}

In the preceding example, if file 1 opened successfully, the if statement doesn't try to open additional files.



There is also an unless statement, which is the same as an if statement with the conditional negated. So these two statements are equivalent:

unless ( open(FH, $filename) ) {

print "Rats. The file did not open.";

}

if ( not open(FH, $filename) ) {

print "Rats. The file did not open.";

}
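The same statements work with any conditional test, not just with open. As a sketch, the following classifies DNA bases with if, elsif, and else, and then uses unless to skip the ones it doesn't recognize. The subroutine name classify is made up for this example.

```perl
use strict;
use warnings;

# classify() is a hypothetical helper for illustration.
sub classify {
    my ($base) = @_;
    if ($base eq 'A' or $base eq 'G') {
        return 'purine';
    } elsif ($base eq 'C' or $base eq 'T') {
        return 'pyrimidine';
    } else {
        return 'unknown';
    }
}

foreach my $base ('A', 'C', 'N') {
    my $kind = classify($base);
    # unless runs its block only when the test is false
    unless ($kind eq 'unknown') {
        print "$base is a $kind\n";
    }
}
```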

B.12 Binding Operators

Binding operators are used for pattern matching, substitution, and transliteration on strings. They are used in conjunction with regular expressions that specify the patterns. Here's an example:

'ACGTACGTACGTACGT' =~ /CTA/

The pattern is the string CTA, enclosed by forward slashes //. The string binding operator is =~; it tells the program which string to search, returning true if the pattern appears in the string.

Another string binding operator is !~, which returns true if the pattern isn't in the string:

'ACGTACGTACGTACGT' !~ /CTA/

This is equivalent to:

not 'ACGTACGTACGTACGT' =~ /CTA/
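As a sketch, the two binding operators can be compared on the same string. Note that CTA never actually occurs in this particular string, so the =~ test is false and the !~ test is true; GTA is used here as a pattern that does occur.

```perl
use strict;
use warnings;

my $dna = 'ACGTACGTACGTACGT';

# =~ is true when the pattern occurs somewhere in the string.
my $has_gta = ($dna =~ /GTA/) ? 'yes' : 'no';

# CTA never occurs in this string, so =~ is false and !~ is true.
my $has_cta   = ($dna =~ /CTA/) ? 'yes' : 'no';
my $lacks_cta = ($dna !~ /CTA/) ? 'yes' : 'no';

print "GTA found: $has_gta\n";      # yes
print "CTA found: $has_cta\n";      # no
print "CTA absent: $lacks_cta\n";   # yes
```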

You can substitute one pattern for another using the string binding operator. In the next example, s/thine/nine/ is the substitution command, which substitutes the first occurrence of thine with the string nine:

$poor_richard = 'A stitch in time saves thine.';

$poor_richard =~ s/thine/nine/;

print $poor_richard;

This produces the output:

A stitch in time saves nine.
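Because s/// changes only the first occurrence by default, it's worth seeing what happens when a pattern occurs more than once. This sketch uses the g (global) modifier, which replaces every occurrence; the variable names are just for illustration.

```perl
use strict;
use warnings;

my $first_only = 'ACGTACGT';
my $all        = 'ACGTACGT';

# Without a modifier, only the first T is replaced.
$first_only =~ s/T/U/;

# With the g (global) modifier, every T is replaced.
# The substitution returns the number of replacements made.
my $changed = ($all =~ s/T/U/g);

print "$first_only\n";   # ACGUACGT
print "$all\n";          # ACGUACGU
print "$changed\n";      # 2
```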

Finally, the transliteration (or translate) operator tr substitutes characters in a string. It has several uses, but the two uses I've covered are first, to change bases to their complements (A to T, C to G, G to C, and T to A):

$DNA = 'ACGTTTAA';

$DNA =~ tr/ACGT/TGCA/;

This produces the value:

TGCAAATT


Second, the tr operator counts the number of a particular character in a string, as in this example, which counts the number of As in a string of DNA sequence data:

$DNA = 'ACGTTTAA';

$count = ($DNA =~ tr/A//);

print $count;

This produces the value 3. It shows that tr returns a count of the characters it matched in the string, which is then assigned to the variable $count. Because the replacement list is empty, the string itself is left unchanged.
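The two uses of tr combine naturally. The following sketch counts G and C together and builds a reverse complement; the variable names are made up, and reverse is Perl's built-in function, used here on a string.

```perl
use strict;
use warnings;

my $dna = 'ACGTTTAA';

# Count G and C together; the empty replacement list leaves $dna unchanged.
my $gc = ($dna =~ tr/GC//);

# Reverse complement: reverse a copy of the string, then complement each base.
my $revcom = reverse $dna;
$revcom =~ tr/ACGT/TGCA/;

print "GC count: $gc\n";                 # GC count: 2
print "reverse complement: $revcom\n";   # reverse complement: TTAAACGT
```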

B.13 Loops

Loops repeatedly execute the statements in a block until a conditional test changes value. There are several forms of loops in Perl:

while(CONDITION) {BLOCK}

until(CONDITION) {BLOCK}

for(INITIALIZATION ; CONDITION ; RE-INITIALIZATION ) {BLOCK}

foreach VAR (LIST) {BLOCK}

for VAR (LIST) {BLOCK}

do {BLOCK} while (CONDITION)

do {BLOCK} until (CONDITION)

The while loop first tests if the conditional is true; if so, it executes the block and then returns to the conditional to repeat the process; if false, it does nothing, and the loop is over. For example:

$i = 3;

while ( $i ) {



print "$i\n";

$i--;


}

This produces the output:

3

2

1


Here's how the loop works. The scalar variable $i is first initialized to 3 (this isn't part of the loop). The loop is then entered, and $i is tested to see if it has a true (nonzero) value. It does, so the number 3 is printed, and the decrement operator is applied to $i, which reduces its value to 2. The block is now over, and the loop starts again with the conditional test. It succeeds with the true value 2, which is printed and decremented. The loop restarts with a test of $i, which is now the true value 1; 1 is printed and decremented to 0. The loop starts again; 0 is tested to see if it's true, and it's not, so the loop is now finished.

Loops often follow the same pattern, in which a variable is set, and a loop is called, which tests the variable's value and then executes a block, which includes changing the value of the variable.

The for loop makes this easy by including the variable initialization and the variable change in the loop statement. The following is exactly equivalent to the preceding example and produces the same output:

for ( $i = 3 ; $i ; $i-- ) {

print "$i\n";

}

The foreach loop is a convenient way to iterate through the elements in an array. Here's an example:


@array = ('one', 'two', 'three');
foreach $element (@array) {

print "$element\n";

}

This prints the output:



one

two


three

The foreach loop specifies a scalar variable $element to be set to each element of the array. (You may use any variable name or none, in which case the special variable $_ is used automatically.) The array to be iterated over is then placed in parentheses, followed by the block. You can use for instead of foreach as the name of this loop, with identical behavior.

The first time through the loop, the value of the first element of the array is assigned to the foreach variable $element. On each succeeding pass through the loop, the value of the next element of the array is assigned to the foreach variable $element. The loop exits after it has reached the end of the array.

There is one important point to make, however. If in the block you change the value of the loop variable $element, the array is changed, and the change stays in effect after you've left the foreach loop. For example:

@array = ('one', 'two', 'three');
foreach $element (@array) {

$element = 'four';

}
foreach $element (@array) {

print $element,"\n";

}

produces the output:



four

four


four
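If you don't want the array to change, copy each element before working on it. This is a sketch of both cases, using made-up codon strings and variable names: the first loop changes a copy, and the second changes the loop variable and therefore the array.

```perl
use strict;
use warnings;

my @codons = ('atg', 'gcc', 'taa');

# Copy each element into a separate variable before changing it;
# the original array is then left alone.
my @upper;
foreach my $codon (@codons) {
    my $copy = $codon;
    $copy =~ tr/acgt/ACGT/;
    push @upper, $copy;
}

print "@codons\n";   # atg gcc taa
print "@upper\n";    # ATG GCC TAA

# Changing the loop variable itself changes the array.
foreach my $codon (@codons) {
    $codon =~ tr/acgt/ACGT/;
}

print "@codons\n";   # ATG GCC TAA
```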

In the do-until loop, the block is executed before the conditional test, and the loop repeats until the condition is true:

$i = 3;

do {


print $i,"\n";

$i--;


} until ( $i );

This prints:

3

In the do-while loop, the block is executed before the conditional test, and the loop repeats while the condition is true:



$i = 3;

do {


print $i,"\n";

$i--;

} while ( $i );

This prints:

3

2

1
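The difference between testing first and testing last shows up when the condition is false from the start. In this sketch (the counter names are made up), the while block never runs, but the do-while block runs once.

```perl
use strict;
use warnings;

my $i = 0;
my $while_passes = 0;
my $do_passes    = 0;

# The test comes first, so the block never runs when $i starts at 0.
while ($i) {
    $while_passes++;
    $i--;
}

# The block comes first, so it runs once before $i is ever tested.
do {
    $do_passes++;
} while ($i);

print "while: $while_passes, do-while: $do_passes\n";   # while: 0, do-while: 1
```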



B.14 Input/Output

This section covers getting information into programs and receiving data back from them.



B.14.1 Input from Files

Perl has several convenient ways to get information into a program. In this book, I've emphasized opening files and reading in the information contained in them, because it is frequently used, and because it behaves very much the same way on all different operating systems. You've observed the open and close system calls and how to associate a filehandle with a file when you open it, which then is used to read in the data. As an example:

open(FILEHANDLE, "informationfile");

@data_from_informationfile = <FILEHANDLE>;

close(FILEHANDLE);

This code opens the file informationfile and associates the filehandle FILEHANDLE with it. The filehandle is then used within angle brackets < > to actually read in the contents of the file and store the contents in the array @data_from_informationfile. Finally, the file is closed by referring once again to the opened filehandle.
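Here is a sketch of a slightly more defensive version. It departs from the style above in two ways that are my own choices for this example: it uses a lexical filehandle ($in) with the three-argument form of open, and it reads one line at a time rather than loading the whole file into an array. The temporary file, created with the standard File::Temp module, is only there so the example has something to read.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small temporary file so the example has something to read.
my ($out, $filename) = tempfile(UNLINK => 1);
print $out "ACGT\n", "TTGA\n";
close($out);

# Open the file, stopping with a message if the open fails.
open(my $in, '<', $filename) or die "Cannot open file $filename: $!";

# Read one line at a time instead of loading the whole file into an array.
my @sequences;
while (my $line = <$in>) {
    chomp $line;              # remove the trailing newline
    push @sequences, $line;
}
close($in);

print scalar(@sequences), " lines read\n";   # 2 lines read
```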



B.14.2 Input from STDIN

Perl allows you to read in any input that is automatically sent to your program via standard input (STDIN). STDIN is a filehandle that by default is always open. Your program may be expecting some input that way. For instance, on a Mac, you can drag and drop a file icon onto the Perl applet for your program to make the file's contents appear in STDIN. On Unix systems, you can pipe the output of some other program into the STDIN of your program with shell commands such as:

someprog | my_perl_program

You can also pipe the contents of a file into your program with:

cat file | my_perl_program

or with:

my_perl_program < file

Your program can then read in the data (from program or file) that comes as STDIN just as if it came from a file that you've opened:

@data_from_stdin = <STDIN>;

B.14.3 Input from Files Named on the Command Line

You can name your input files on the command line. <> is shorthand for <ARGV>. The ARGV filehandle treats the array @ARGV as a list of filenames and returns the contents of all those files, one line at a time. Perl places all command-line arguments into the array @ARGV. Some of these may be special flags, which should be read and removed from @ARGV if there will also be datafiles named. Perl assumes that anything in @ARGV refers to an input filename when it reaches a <> operator. The contents of the file or files are then available to the program using the angle brackets <> without a filehandle, like so:

@data_from_files = <>;

For example, on Microsoft Windows, Unix, or Mac OS X, you specify input files at the command line, like so:

% my_program file1 file2 file3
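The following sketch shows a flag being read and removed from @ARGV before <> is used. The -v flag is made up for this example, and the script fills @ARGV itself (using temporary files from the standard File::Temp module) only so that it can run without any real command-line arguments.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build two small files; in real use their names would come
# from the command line.
my @names;
foreach my $text ("ACGT\n", "TTGA\nCCAT\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print $fh $text;
    close($fh);
    push @names, $name;
}

# Simulate the command line: my_program -v file1 file2
# (-v is a made-up flag for this example)
@ARGV = ('-v', @names);

# Read and remove the flag so that only filenames remain in @ARGV.
my $verbose = 0;
if (@ARGV and $ARGV[0] eq '-v') {
    $verbose = 1;
    shift @ARGV;
}

# <> now reads every line of every file named in @ARGV.
my @data_from_files = <>;

print scalar(@data_from_files), " lines read\n";   # 3 lines read
```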

B.14.4 Output Commands

The print statement is the most common way to output data from a Perl program. The print statement takes as arguments a list of scalars separated by commas. An array can be an argument, in which case, the elements of the array are all printed one after the other:

@array = ('DNA', 'RNA', 'Protein');

print @array;

This prints out:

DNARNAProtein

If you want to put spaces between the elements of an array, place it between double quotes in the print statement, like this:

@array = ('DNA', 'RNA', 'Protein');

print "@array";

This prints out:

DNA RNA Protein

The print statement can specify a filehandle as an optional indirect object between the print statement and the arguments, like so:

print FH "@array";

The printf function gives more control over the formatting of the output of numbers. For instance, you can specify field widths; the precision, or number of places after the decimal point; and whether the value is right- or left-justified in the field. I showed the most common options in Chapter 12 and refer you to the Perl documentation that comes with your copy of Perl for all the details.

The sprintf function is related to the printf function; it formats a string instead of printing it out.
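As a brief sketch of the options just mentioned, the following formats a name and a number into fixed-width fields. The gene name and the value are made up for illustration.

```perl
use strict;
use warnings;

my $gene    = 'geneA';     # made-up name
my $gc_frac = 0.41666;     # made-up value

# %-8s : string, left-justified in a field of width 8
# %6.2f: floating-point number, width 6, two places after the decimal point
my $line = sprintf("%-8s|%6.2f|", $gene, $gc_frac * 100);
print "$line\n";        # geneA   | 41.67|

# printf formats in the same way but prints the result directly.
printf("%05d\n", 42);   # 00042
```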

The format and write commands are a way to format a multiline output, as when generating reports. format can be a useful command, but in practice it isn't used much. The full details are available in your Perl documentation, and O'Reilly's Programming Perl contains an entire chapter on format. You can also see format in Chapter 12 of this book.


B.14.4.1 Output to STDOUT, STDERR, and Files

Standard output, with the filehandle STDOUT, is the default destination for output from a Perl program, so it doesn't have to be named. The following two statements are equivalent unless you used select to change the default output filehandle:

print "Hello biology world!\n";

print STDOUT "Hello biology world!\n";

Note that the STDOUT isn't followed by a comma. STDOUT is usually directed to the computer screen, but it may be redirected at the command line to other programs or files. This Unix command pipes the STDOUT of my_program to the STDIN of your_program:

my_program | your_program

This Unix command directs the output of my_program to the file outputfile:

my_program > outputfile

It's also common to direct certain error messages to the predefined standard error filehandle STDERR or to a file you've opened for output and named with a particular filehandle. Here are examples of these two tasks:

print STDERR "If you reached this part of the program, something is terribly wrong!";

open(OUTPUTFD, ">output_file");

print OUTPUTFD "Here is the first line in the output file output_file\n";

STDERR is also usually directed to the computer screen by default, but it can be directed into a file from the command line. This is done differently for different systems, for example, as follows (on Unix with the sh or bash shells):

myprogram 2>myprogram.error

You can also direct STDERR to a file from within your Perl program by including code such as the following before the first output to STDERR. This is the most portable way to redirect STDERR:

open (STDERR, ">myprogram.error")

or die "Cannot open error file myprogram.error:$!\n";

The problem with this is that the original STDERR is lost. This method, taken from Programming Perl, saves and restores the original STDERR:

# send error messages to a file
open ERRORFILE, ">myprogram.error"
    or die "Can't open myprogram.error";

# save a copy of the original STDERR, then point STDERR at the file
open SAVEERR, ">&STDERR";
open STDERR, ">&ERRORFILE";

print STDERR "This will appear in error file myprogram.error\n";

# now, restore STDERR
close STDERR;
open STDERR, ">&SAVEERR";

print STDERR "This will appear on the computer screen\n";

There are a lot of details concerning filehandles not covered in this book, and redirecting one of the predefined filehandles such as STDERR can cause problems, especially as your programs get bigger and rely more on modules and libraries of subroutines. One safe way is to define a new filehandle associated with an error file and to send all your error messages to it:

open (ERRORMESSAGES, ">myprogram.error")

or die "Cannot open myprogram.error:$!\n";


print ERRORMESSAGES "This is an error message\n";

Note that the die function, and the closely related warn function, print their error messages to STDERR.



B.15 Regular Expressions

Regular expressions are, in effect, an extra language that lives inside the Perl language. In Perl, they have quite a lot of features. First, I'll summarize how regular expressions work in Perl; then, I'll present some of their many features.


B.15.1 Overview

Regular expressions describe patterns in strings. The pattern described by a single regular expression may match many different strings.

Regular expressions are used in pattern matching, that is, when you look to see if a certain pattern exists in a string. They can also change strings, as with the s/// operator, which replaces the pattern, if found, with a replacement string. The related tr operator, which transliterates several characters into replacement characters throughout a string, is applied with the same binding operators, although it takes lists of characters rather than full regular expressions. Regular expressions are case-sensitive, unless explicitly told otherwise.

The simplest pattern match is a string that matches itself. For instance, to see if the pattern 'abc' appears in the string 'abcdefghijklmnopqrstuvwxyz', write the following in Perl:

$alphabet = 'abcdefghijklmnopqrstuvwxyz';

if( $alphabet =~ /abc/ ) {

print $&;

}

The =~ operator binds a pattern match to a string. /abc/ is the pattern abc, enclosed in forward slashes // to indicate that it's a regular-expression pattern. $& is set to the matched pattern, if any. In this case, the match succeeds, since 'abc' appears in the string $alphabet, and the code just given prints out abc.


Regular expressions are made from two kinds of characters. Many characters, such as 'a' or 'Z', match themselves. Metacharacters have a special meaning in the regular-expression language. For instance, parentheses ( ) are used to group other characters and don't match themselves. If you want to match a metacharacter such as ( in a string, you have to precede it with the backslash metacharacter, writing \( in the pattern.

There are three basic ideas behind regular expressions. The first is concatenation: two items next to each other in a regular-expression pattern (that's the string between the forward slashes // in the examples) must match two items next to each other in the string being matched (the $alphabet in the examples). So to match 'abc' followed by 'def', concatenate them in the regular expression:

$alphabet = 'abcdefghijklmnopqrstuvwxyz';

if( $alphabet =~ /abcdef/ ) {

print $&;

}

This prints:



abcdef

The second major idea is alternation. Items separated by the | metacharacter match any one of the items. For example:

$alphabet = 'abcdefghijklmnopqrstuvwxyz';

if( $alphabet =~ /a(b|c|d)c/ ) {

print $&;

}

This prints:

abc

The example also shows how parentheses group things in a regular expression. The parentheses are metacharacters that aren't matched in the string; rather, they group the alternation, given as b|c|d, meaning any one of b, c, or d at that position in the pattern. Since b is actually in $alphabet at that position, the alternation, and indeed the entire pattern a(b|c|d)c, matches in the $alphabet. (One additional point: ab|cd means (ab)|(cd), not a(b|c)d.)
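The point about ab|cd is easy to check. In this sketch, the test strings are made up; the same string matches the ungrouped alternation but not the grouped one.

```perl
use strict;
use warnings;

# /ab|cd/ means "ab" or "cd", so it matches a string containing only "cd".
my $loose = ('xxcdxx' =~ /ab|cd/) ? 'match' : 'no match';

# /a(b|c)d/ needs "a", then "b" or "c", then "d": that is, "abd" or "acd".
my $grouped_miss = ('xxcdxx' =~ /a(b|c)d/) ? 'match' : 'no match';
my $grouped_hit  = ('xxacdx' =~ /a(b|c)d/) ? 'match' : 'no match';

print "$loose, $grouped_miss, $grouped_hit\n";   # match, no match, match
```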

The third major idea of regular expressions is repetition (or closure). This is indicated in a pattern with the quantifier metacharacter *, sometimes called the Kleene star after one of the inventors of regular expressions. When * appears after an item, it means that the item may appear 0, 1, or any number of times at that place in the string. So, for example, all of the following pattern matches will succeed:

'AC' =~ /AB*C/;

'ABC' =~ /AB*C/;

'ABBBBBBBBBBBC' =~ /AB*C/;
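It helps to see a failure alongside the successes. This sketch runs the same pattern over several made-up strings; the last one fails because something other than B sits between the A and the C.

```perl
use strict;
use warnings;

my @results;
foreach my $string ('AC', 'ABC', 'ABBBC', 'ADC') {
    if ($string =~ /AB*C/) {
        push @results, "$string matches";
    } else {
        push @results, "$string does not match";
    }
}

# Prints three lines ending in "matches", then "ADC does not match".
print "$_\n" foreach @results;
```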


B.15.2 Metacharacters

The following are metacharacters:

\ | ( ) [ { ^ $ * + ? .

B.15.2.1 Escaping with \

A backslash \ before a metacharacter causes it to match itself; for instance, \\ matches a single \ in the string.
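Here is a sketch of escaping in practice, using made-up strings. The first pattern matches literal parentheses; the second pair shows the difference between an unescaped period, which matches any character, and an escaped one, which matches only a period.

```perl
use strict;
use warnings;

my $annotation = 'kinase (putative)';

# \( and \) match literal parentheses in the string.
my $found = '';
if ($annotation =~ /\(putative\)/) {
    $found = $&;
}
print "$found\n";   # (putative)

# An unescaped . matches any character; \. matches only a period.
my $any_char = ('seq1_fasta' =~ /seq1.fasta/)  ? 'match' : 'no match';
my $period   = ('seq1_fasta' =~ /seq1\.fasta/) ? 'match' : 'no match';

print "$any_char, $period\n";   # match, no match
```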



B.15.2.2 Alternation with |

The pipe | indicates alternation, as described previously.



B.15.2.3 Grouping with ( )

The parentheses ( ) provide grouping, as described previously.



B.15.2.4 Character classes

Square brackets [ ] specify a character class. A character class matches one character, which can be any character specified. For instance, [abc] matches either a, or b, or c at that position (so it's the same as a|b|c). Within a character class, A-Z is a range that matches any uppercase letter, a-z matches any lowercase letter, and 0-9 matches any digit. For instance, [A-Za-z0-9] matches any single letter or digit at that position. If the first character in a character class is ^, any character except those specified matches; for instance, [^0-9] matches any character that isn't a digit.
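A common use of a negated character class is to check that a sequence contains only the expected characters. The following is a sketch with made-up sequences and variable names; it also uses a range to pick the digits out of a name.

```perl
use strict;
use warnings;

# [^ACGT] matches any single character that is not A, C, G, or T,
# so a match means the string is not plain DNA.
my @reports;
foreach my $sequence ('ACGTTGCA', 'ACGNTGCA') {
    if ($sequence =~ /[^ACGT]/) {
        push @reports, "$sequence contains the unexpected character $&";
    } else {
        push @reports, "$sequence contains only A, C, G, and T";
    }
}
print "$_\n" foreach @reports;

# [0-9] matches one digit; [0-9][0-9]* matches a run of one or more digits.
my $digits = '';
if ('contig42' =~ /[0-9][0-9]*/) {
    $digits = $&;
}
print "$digits\n";   # 42
```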


