Beginning Perl for Bioinformatics


Fixing Bugs with Comments and Print Statements




6.6.2 Fixing Bugs with Comments and Print Statements

Sometimes you can identify misbehaving code by selectively commenting out sections of the program until you find the part that seems to cause the problem. You can also add print statements at suspicious parts of a misbehaving program to check what certain variables are doing. Both of these are time-honored programming techniques, and they work well in almost any programming language.

Commenting out sections of code can be particularly helpful when the error messages that you get from Perl don't point you directly at the offending line. This happens occasionally. When it does happen you may, by trial and error, discover that commenting out a small section of code causes the error messages to go away; then you know where the error is occurring.

Adding print statements can also be a quick way to pinpoint a problem, especially if you already have some idea of where the problem is. As a novice programmer, however, you may find that using the Perl debugger is easier than adding print statements. In the debugger, you can easily set print statements at any line. For instance, the following debugger command says to print the values of $i and $k before line 48:

a 48 print "$i $k\n"

Once you learn how to do it, this method is generally faster and easier than editing the Perl program and adding print statements by hand. Using this method is partly a matter of taste, since some extremely good Perl programmers prefer to do it the old-fashioned way, by adding print statements.
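For comparison, here is a minimal sketch of the old-fashioned approach. The $DEBUG switch and the debug_msg helper are hypothetical names invented for this illustration, not something defined earlier in the book. The debugging lines go to STDERR so they don't get mixed in with the program's real output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical switch: set to 0 to silence all debugging output at once.
my $DEBUG = 1;

# Hypothetical helper: build one debugging line of name=value pairs.
sub debug_msg {
    my (%vars) = @_;
    return join(' ', map { "$_=$vars{$_}" } sort keys %vars) . "\n";
}

my @counts = (3, 1, 4);
my $total  = 0;

for my $i (0 .. $#counts) {
    $total += $counts[$i];
    # Debugging print: sent to STDERR, and only when $DEBUG is true.
    print STDERR debug_msg(i => $i, total => $total) if $DEBUG;
}

print "Total: $total\n";
```

Once the bug is found, setting $DEBUG to 0 turns the extra output off without deleting the print statements.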


6.6.3 The Perl Debugger

My favorite way to deal with nonobvious bugs in my programs is to use the Perl debugger. The problem with bugs in code is that once a program starts running, all you can see is the output; you can't see the steps a program is taking. The Perl debugger lets you examine your program in detail, step by step, and almost always can lead you quickly to the problem. You'll also find that it's easy to use with a little practice.

There are situations the Perl debugger can't handle well: interacting processes that depend on timing considerations, for instance. The debugger can examine only one program at a time, and while examining, it stops the program, so timing considerations with other processes go right out the window.

For most purposes, the Perl debugger is a great, essential, programming tool. This section introduces its most important features.


6.6.3.1 A program with bugs

Example 6-4 has some bugs we can examine. It's supposed to take a sequence and two bases, and output everything from those two bases to the end of the sequence (if it can find them in the sequence). The two bases can be given as an argument, or if no argument is given, the program uses the bases TA by default.

There is one new thing in Example 6-4. The next statement affects the control flow in a loop. It immediately returns the control flow to the next iteration of the loop, skipping whatever else would have followed. Also, you may want to recall $_ , which we discussed back in Example 5-5 in the context of a foreach loop.
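Before looking at the buggy program, here is a small sketch of next on its own. The known_bases subroutine is a hypothetical helper written for this illustration; it skips any base recorded as N and keeps the rest.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return only the bases that are not 'N'.
sub known_bases {
    my (@bases) = @_;
    my @kept;
    foreach (@bases) {
        # next jumps straight to the following iteration,
        # so the push below is skipped for every 'N'.
        next if $_ eq 'N';
        push @kept, $_;
    }
    return @kept;
}

print join('', known_bases(split('', 'ACNNGT'))), "\n";
```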



Example 6-4. A program with a bug or two

#!/usr/bin/perl
# A program with a bug or two
#
# An optional argument, for where to start printing the sequence,
#   is a two-base subsequence.
#
# Print everything from the subsequence ( or TA if no subsequence
#   is given as an argument) to the end of the DNA.

# declare and initialize variables
my $dna = 'CGACGTCTTCTAAGGCGA';
my @dna;
my $receivingcommittment;
my $previousbase = '';

my $subsequence = '';

if (@ARGV) {
    my $subsequence = $ARGV[0];
}else{
    $subsequence = 'TA';
}

my $base1 = substr($subsequence, 0, 1);
my $base2 = substr($subsequence, 1, 1);

# explode DNA
@dna = split ( '', $dna );

######### Pseudocode of the following loop:
#
# If you've received a committment, print the base and continue. Otherwise:
#
# If the previous base was $base1, and this base is $base2, print them.
# You have now received a committment to print the rest of the string.
#
# At each loop, save the previous base.

foreach (@dna) {
    if ($receivingcommittment) {
        print;
        next;
    } elsif ($previousbase eq $base1) {
        if ( /$base2/ ) {
            print $base1, $base2;
            $recievingcommitment = 1;
        }
    }
    $previousbase = $_;
}

print "\n";

exit;

Here's the output of two runs of Example 6-4:

$ perl example6-4 AA

$ perl example6-4
TA

Huh? It should have printed out AAGGCGA when called with the argument AA, and TAAGGCGA when called with no arguments. There must be a bug in this program. But, if you look it over, there isn't anything obviously wrong. It's time to fire up the debugger. What follows is an actual debugging session on Example 6-4, interspersed with comments to explain what's happening and why.


6.6.3.2 How to start and stop the debugger

The debugger runs interactively, and you control it from the keyboard.[6] The most common way to start it is by giving the -d switch to Perl at the command line. Since you're using buggy Example 6-4 to demonstrate the debugger, here's how to start that program:


[6] You also can run it automatically to produce a trace of the program in a file.

perl -d example6-4

Alternatively, you could have added a -d flag to the command interpreter line at the top of the program:

#!/usr/bin/perl -d

On systems such as Unix and Linux where command interpretation works, this starts the debugger automatically.

To stop the debugger, simply type q.



6.6.3.3 Debugger command summary

First, let's try to find the bug in Example 6-4 when it's called with no arguments:

$ perl -d example6-4

Default die handler restored.


Loading DB routines from perl5db.pl version 1.07

Editor support available.


Enter h or 'h h' for help, or 'man perldebug' for more help.
main::(example6-4:11): my $dna = 'CGACGTCTTCTAAGGCGA';

DB<1>

Let's stop right here at the beginning and look at a few things. After some messages, which may not mean a whole lot right now, you get the excellent information that the commands h and h h give more help. Let's try h h:

DB<1> h h

List/search source lines:               Control script execution:
  l [ln|sub]  List source code            T           Stack trace
  - or .      List previous/current line  s [expr]    Single step [in expr]
  w [line]    List around line            n [expr]    Next, steps over subs
  f filename  View source in file         <CR/Enter>  Repeat last n or s
  /pattern/ ?patt?   Search forw/backw    r           Return from subroutine
  v           Show versions of modules    c [ln|sub]  Continue until position
Debugger controls:                        L           List break/watch/actions
  O [...]     Set debugger options        t [expr]    Toggle trace [trace expr]
  <[<]|{[{]|>[>] [cmd] Do pre/post-prompt b [ln|event|sub] [cnd] Set breakpoint
  ! [N|pat]   Redo a previous command     d [ln] or D Delete a/all breakpoints
  H [-num]    Display last num commands   a [ln] cmd  Do cmd before line
  = [a val]   Define/list an alias        W expr      Add a watch expression
  h [db_cmd]  Get help on command         A or W      Delete all actions/watch
  |[|]db_cmd  Send output to pager        ![!] syscmd Run cmd in a subprocess
  q or ^D     Quit                        R           Attempt a restart
Data Examination:     expr     Execute perl code, also see: s,n,t expr
  x|m expr       Evals expr in list context, dumps the result or lists methods.
  p expr         Print expression (uses script's current package).
  S [[!]pat]     List subroutine names [not] matching pattern
  V [Pk [Vars]]  List Variables in Package.  Vars can be ~pattern or !pattern.
  X [Vars]       Same as "V current_package [Vars]".
For more help, type h cmd_letter, or run man perldebug for all docs.

DB<2>

It's a bit hard to read, but you have a concise summary of the debugger commands. You can also use the h command, which gives several screens worth of information. The | h command displays those several pages one at a time; the pipe at the beginning of a debugger command pipes the output through a pager, which typically advances a page when you hit the spacebar on your keyboard. You should try those out. Right now, however, let's focus on a few of the most useful commands. But remember that typing h command can give you help about the command.


6.6.3.4 Stepping through statements with the debugger

Back to the immediate problem. When you started up the debugger, you saw that it stopped on the first line of real Perl code:

main::(example6-4:11): my $dna = 'CGACGTCTTCTAAGGCGA';

There's an important point about the debugger you should understand right away. It shows the line it's about to execute, not the line it just executed.

So really, Example 6-4 hasn't done anything yet. You can see from the command summary that p tells the debugger to print out values. If you ask it to print the value of $dna, you'll find:

DB<2> p $dna

DB<3>

It didn't show anything because there's nothing to show; the statement that declares and initializes $dna hasn't been executed yet. So you should execute the statement. There are two commands to use: n and s both execute the statement being displayed. (The difference is that n, or "next", steps over a subroutine call, treating it like a single statement; s, or "single step", enters the subroutine and single-steps you through that code as well.) Once you've given one of these commands, you can just hit Enter to repeat the same command.

Since there aren't any subroutines, you needn't worry about choosing between n and s, so let's use n:

DB<3> n


main::(example6-4:12): my @dna;

DB<3>


This shows the next line (you can see the line number of the Perl program inside the parentheses, right after the program name). If you wish to see more lines, the w or "window" command will serve:

DB<3> w


9

10 # declare and initialize variables

11: my $dna = 'CGACGTCTTCTAAGGCGA';

12==> my @dna;

13: my $receivingcommittment;

14: my $previousbase = '';

15

16: my $subsequence = '';



17

18: if (@ARGV) {

DB<3>

The current line—the line that will be executed next—is highlighted with an arrow (==>).



The w seems like a useful thing. Let's get more information about it with the help command h w:

DB<3> h w

w [line] List window around line.

DB<4>

Actually, there's more—hitting w repeatedly keeps showing more of the program; a minus sign backs up a screen. But enough of that.

Now that $dna has been declared and initialized, let's check whether anything is wrong with that first statement:

DB<4> p $dna

CGACGTCTTCTAAGGCGA

DB<5>

That's exactly what was expected. There's no bug, so let's continue examining the lines, printing out values here and there:


DB<5> n

main::(example6-4:13): my $receivingcommittment;

DB<5> n

main::(example6-4:14): my $previousbase = '';



DB<5> n

main::(example6-4:16): my $subsequence = '';

DB<5> n

main::(example6-4:18): if (@ARGV) {



DB<5> p @ARGV
DB<6> w

15


16: my $subsequence = '';

17


18==> if (@ARGV) {

19: my $subsequence = $ARGV[0];

20 }else{

21: $subsequence = 'TA';

22 }

23


24: my $base1 = substr($subsequence, 0, 1);

DB<6> n


main::(example6-4:21): $subsequence = 'TA';

DB<6> n


main::(example6-4:24): my $base1 = substr($subsequence, 0, 1);

DB<6> p $subsequence

TA

DB<7> n


main::(example6-4:25): my $base2 = substr($subsequence, 1, 1);

DB<7> n


main::(example6-4:28): @dna = split ( '', $dna );

DB<7> p $base1

T

DB<8> p $base2



A

DB<9>


So far, everything is as expected; the default subsequence TA is being used, and the $base1 and $base2 variables are set to T and A, the first and second bases of the subsequence. Let's continue:

DB<9> n


main::(example6-4:39): foreach (@dna) {

DB<9> p @dna

CGACGTCTTCTAAGGCGA

DB<10> p "@dna"

C G A C G T C T T C T A A G G C G A

DB<11>

This shows a trick with Perl and printing arrays: normally they are printed without any spacing between the elements, but enclosing an array in double quotes in a print statement causes it to be displayed with spaces between the elements.
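The same behavior can be seen outside the debugger. The following is a minimal sketch; the as_interpolated subroutine is a hypothetical helper added only so the quoted form can be looked at as a string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return an array as it appears inside double quotes.
sub as_interpolated {
    my (@items) = @_;
    return "@items";
}

my @dna = split('', 'CGAC');

# Without quotes, the elements are printed one after another.
print @dna, "\n";

# Inside double quotes, the elements are separated by spaces.
print as_interpolated(@dna), "\n";

# join gives explicit control over the separator.
print join('-', @dna), "\n";
```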

Again, everything seems okay, and we're about to enter a loop. Let's look at the whole loop first:

DB<11> w

36 #


37 # At each loop, save the previous base.

38


39==> foreach (@dna) {

40: if ($receivingcommittment) {

41: print;

42: next;

43 } elsif ($previousbase eq $base1) {

44: if ( /$base2/ ) {

45: print $base1, $base2;

DB<11> w


43 } elsif ($previousbase eq $base1) {

44: if ( /$base2/ ) {

45: print $base1, $base2;

46: $recievingcommitment = 1;

47 }

48 }


49: $previousbase = $_;

50 }


51

52: print "\n";

DB<11>

Despite the few repeated lines resulting from the w command, you can see the whole loop. Now you know something in here is going wrong: when you tested the program without giving it an argument, as it's running now, it took the default argument TA, and so far it seemed okay. However, all it actually did in your test was to print out the TA when it was supposed to print out everything in the string starting with the first occurrence of TA. What's going wrong?


6.6.3.5 Setting breakpoints

To figure out what's wrong, you can set a breakpoint in your code. A breakpoint is a spot in your program where you tell the debugger to stop execution so you can poke around in the code. The Perl debugger lets you set breakpoints in various ways. They let you run the program, stopping only to examine it when a statement with a breakpoint is reached. That way, you don't have to step through every line of code. (If you have 5,000 lines of code, and the error happens when you hit a line of code that's first used when you're reading the 12,000th line of input, you'll be happy about this feature.)

Notice that the part of this loop that prints out the rest of the string, once the starting two bases have been found, is the if block starting at line 40:

if ($receivingcommittment) {

print;

next;


}

Let's look at that $receivingcommittment variable.

Here's one way to do this. Let's set a breakpoint at line 40. Type b 40 and then c to continue, and the program proceeds until it hits line 40:

DB<11> b 40

DB<12> c

main::(example6-4:40): if ($receivingcommittment) {

DB<12> p

C

DB<12>



The last command, p, prints out the element of the @dna array that the foreach loop has reached. Since you didn't specify a variable for the loop, it uses the default $_ variable. Many Perl operations, such as print or pattern matching, operate on the default $_ variable if no other variable is given. (It's the cousin of @_, the default array that subroutines use to hold their parameters.) So the p debugger command shows that you're operating on C from the @dna array, which is the first character.
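Here is a short sketch of $_ doing the same two jobs in ordinary code. The count_base subroutine is a hypothetical helper for this illustration: the foreach loop names no loop variable, and the pattern match names no string, so both fall back on $_.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count the elements that match a single base.
sub count_base {
    my ($base, @bases) = @_;
    my $count = 0;
    foreach (@bases) {          # no loop variable, so each element is in $_
        $count++ if /$base/;    # no string given, so the match tests $_
    }
    return $count;
}

my @dna = split('', 'CGACGTCTTCTAAGGCGA');
print "A occurs ", count_base('A', @dna), " times\n";
```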

All well and good. But it would be good to have the program break when the variable $receivingcommittment has a change in its value, and then single step from there, to see why the program isn't printing out the rest of the string. Recall that this variable is the flag whose change tells the program to print the rest of the string. First let's delete all other breakpoints:

DB<12> D

Deleting all breakpoints...

You can "watch" the variable with W like so:

DB<12> W $receivingcommittment

DB<13> c

TA

Debugged program terminated. Use q to quit or R to restart,


use O inhibit_exit to avoid stopping after program termination,

h q, h R or h O to get additional info.

DB<13>

Wait a minute! The W command should indicate when $receivingcommittment changes value. But when the program continued running with the c command, it ran to the end, meaning that $receivingcommittment never changed value. So let's start up the program again and break on the line that changes its value:


DB<13> R

Warning: some settings and command-line options may be lost!

Default die handler restored.
Loading DB routines from perl5db.pl version 1.07

Editor support available.


Enter h or 'h h' for help, or 'man perldebug' for more help.
main::(example6-4:11): my $dna = 'CGACGTCTTCTAAGGCGA';

DB<13> w 45

42: next;

43 } elsif ($previousbase eq $base1) {

44: if ( /$base2/ ) {

45: print $base1, $base2;

46: $recievingcommitment = 1;

47 }


48 }

49: $previousbase = $_;

50 }

51


DB<14> b 46

DB<15> c


TAmain::(example6-4:46): $recievingcommitment = 1;

DB<15> n


main::(example6-4:49): $previousbase = $_;

DB<15> p $receivingcommittment


DB<16>

Huh? The code says it's assigning the variable a value of 1, but after you execute the code with n and try to print out the value, it doesn't print anything.

If you stare harder at the program, you see that at line 46 you misspelled $receivingcommittment as $recievingcommitment. That explains everything; fix it and run it again:

$ perl example6-4

TAAGGCGA

Success!


6.6.3.6 Fixing another bug

Now, did that fix the other bug when you ran Example 6-4 with an argument?

$ perl example6-4 AA

GACGTCTTCTAAGGCGA

Again, huh? You expected AAGGCGA. Can there be another bug in the program? Let's try the debugger again:

$ perl -d example6-4 AA

Default die handler restored.

Loading DB routines from perl5db.pl version 1.07

Editor support available.


Enter h or 'h h' for help, or 'man perldebug' for more help.
main::(example6-4:11): my $dna = 'CGACGTCTTCTAAGGCGA';

DB<1> n


main::(example6-4:12): my @dna;

DB<1> n


main::(example6-4:13): my $receivingcommittment;

DB<1> n


main::(example6-4:14): my $previousbase = '';

DB<1> n


main::(example6-4:16): my $subsequence = '';

DB<1> n


main::(example6-4:18): if (@ARGV) {

DB<1> n


main::(example6-4:19): my $subsequence = $ARGV[0];

DB<1> n


main::(example6-4:24): my $base1 = substr($subsequence, 0, 1);

DB<1> n


main::(example6-4:25): my $base2 = substr($subsequence, 1, 1);

DB<1> n


main::(example6-4:28): @dna = split ( '', $dna );

DB<1> p $subsequence


DB<2> p $base1
DB<3> p $base2

DB<4>

Okay, for some reason the $subsequence, and therefore the $base1 and $base2 variables, are not getting set right. How come?

Check out line 19 where you declared a new my variable in the block of the if statement with the same name, $subsequence. That's the variable you're setting, but it's disappearing as soon as the if statement is over, because it's scoped in the block since it's a my variable.
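The effect is easier to see in isolation. In this minimal sketch, shadowed and assigned are hypothetical subroutines written for the illustration; they differ only in the inner my.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper that repeats the mistake from line 19.
sub shadowed {
    my ($arg) = @_;
    my $subsequence = '';
    if (defined $arg) {
        # This my creates a NEW variable that exists only inside the block.
        my $subsequence = $arg;
    }
    return $subsequence;    # the outer variable, still ''
}

# Same logic without the inner my: the outer variable is assigned.
sub assigned {
    my ($arg) = @_;
    my $subsequence = '';
    if (defined $arg) {
        $subsequence = $arg;
    }
    return $subsequence;
}

print "shadowed: '", shadowed('AA'), "'\n";
print "assigned: '", assigned('AA'), "'\n";
```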

So again, you fix that problem by removing the my declaration on line 19 and instead inserting an assignment $subsequence = $ARGV[0]; and run the program again:

$ perl example6-4

TAAGGCGA


$ perl example6-4 AA

AAGGCGA


Here, finally, is success.
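For reference, here is one way the repaired logic might look when packaged as a subroutine. This is a sketch, not the book's listing: the name from_subsequence and the shorter flag name $receiving are hypothetical, the result is returned as a string instead of being printed as the loop runs, and the test against $base2 uses eq in place of the pattern match. It assumes the argument really is two bases long.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return everything from the first occurrence of a
# two-base subsequence to the end of the DNA, or '' if it never occurs.
sub from_subsequence {
    my ($dna, $subsequence) = @_;
    my $base1 = substr($subsequence, 0, 1);
    my $base2 = substr($subsequence, 1, 1);

    my $receiving    = 0;    # flag: set once the two bases have been seen
    my $previousbase = '';
    my $output       = '';

    foreach (split('', $dna)) {
        if ($receiving) {
            $output .= $_;
            next;
        } elsif ($previousbase eq $base1) {
            if ($_ eq $base2) {
                $output .= $base1 . $base2;
                $receiving = 1;
            }
        }
        $previousbase = $_;
    }
    return $output;
}

# Assigned in one statement, so no second my can shadow the variable.
my $subsequence = @ARGV ? $ARGV[0] : 'TA';
print from_subsequence('CGACGTCTTCTAAGGCGA', $subsequence), "\n";
```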

6.6.3.7 use warnings; and use strict; redux

Example 6-4 was somewhat artificial. It turns out that these problems would have been reported easily if warnings had been used. So let's see an actual example of the benefits of use strict; and use warnings;, as discussed earlier in this chapter.

If you go back to the original Example 6-4 and add the use warnings; directive near the top of the program, you get the following output:

$ perl example6-4

Name "main::recievingcommitment" used only once: possible typo at example6-4 line 47.

TA

As you see, the warnings found the first bug immediately. They noticed there was a variable that was used only once, usually a sign of a misspelled variable. (I can never spell "receiving" or "commitment" properly.) So fix the misspelling at line 47 (line 46 of the original listing, before use warnings; was added), and run it again:



$ perl example6-4

TAAGGCGA

$ perl example6-4 AA

substr outside of string at example6-4 line 26.

Use of uninitialized value in regexp compilation at example6-4 line 45.

Use of uninitialized value in print at example6-4 line 46.

GACGTCTTCTAAGGCGA

So, the first bug is fixed. The second bug remains with a few warnings that are, perhaps, hard to understand. But focus on the first error message, and see that it complains about line 26:

my $base2 = substr($subsequence, 1, 1);

So, there's something wrong with $subsequence. Often, error messages will be off by one line, so it may well be that the error starts on the line before, the first time $subsequence is operated on by the substr. But that's not the case here.

Nonetheless, the warnings have pointed directly to the problem. In this case, you still have to take a little initiative; look back at the $subsequence variable and notice the extra my declaration within the if block on line 20 that is preventing the variable from being initialized properly. Declaring a variable scoped within a block that overrides another variable of the same name outside the block is not necessarily a bug. In fact, it's perfectly legal, so the programmers who wrote the warnings did not flag it as an obvious error. However, it seems to have caused a real problem here!

One final point: if you go back to the original, buggy program, notice there's no use strict; in the program. If you add that and run the program without arguments, you get the following:

$ perl example6-4

Global symbol "$recievingcommitment" requires explicit package name at example6-4 line 47.

Execution of example6-4 aborted due to compilation errors.

Fixing the misspelled variable, and running the program with the argument, you get:

$ perl example6-4 AA

GACGTCTTCTAAGGCGA

You can see that use strict; didn't help for the other bug. Remember, it's best to employ both use strict; and use warnings;.
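If you'd like to see the use strict; check in action without breaking a whole program, the following sketch compiles small snippets with a string eval and reports what happened. The strict_error subroutine is a hypothetical helper for this illustration, and the variable names in the snippets are made up; a string eval is used here only because it lets a compile error be caught and printed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: compile and run a snippet of Perl under use strict;
# return the error message, or '' if there was no error.
sub strict_error {
    my ($code) = @_;
    my $ok = eval "use strict; $code; 1";
    return $ok ? '' : $@;
}

# The flag is declared with one spelling and assigned with another.
my $error = strict_error('my $receiving; $recieving = 1');
print "strict caught: $error" if $error;

# The same snippet with consistent spelling compiles without complaint.
print "no error\n" unless strict_error('my $receiving; $receiving = 1');
```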


6.7 Exercises

Exercise 6.1

Write a subroutine to concatenate two strings of DNA.



Exercise 6.2

Write a subroutine to report the percentage of each nucleotide in DNA. You've seen the plus operator +. You will also want to use the divide operator / and the multiply operator *. Count the number of each nucleotide, divide by the total length of the DNA, then multiply by 100 to get the percentage. Your arguments should be the DNA and the nucleotide you want to report on. The int function can be used to discard digits after the decimal point, if needed.



Exercise 6.3

Write a subroutine to prompt a user with any message, and collect the user's answer. The subroutine's argument should be the message, and the return value should be the (one-line) answer.



Exercise 6.4

Write a subroutine to look for command-line arguments such as -help, -h, and --help. Recall that command-line arguments appear in the @ARGV array. Call your subroutine from a main program. If you give the program any of the named command-line arguments, when you pass them into the subroutine it should return a true value. If this is the case, have the program print out a help message in a $USAGE variable and exit.



Exercise 6.5

Write a subroutine to check if a file exists, is a regular file, and is nonzero in size. Use the file test operators (See Appendix B).


Exercise 6.6

Use Exercise 6.3 in a subroutine that keeps prompting until a valid file is entered by the user or until five attempts have failed.


Exercise 6.7

Write a module that contains subroutines that report various statistics on DNA sequences, for instance length, GC content, presence or absence of poly-T sequences (long stretches of mostly T's at the 5' (left) end of many $DNA sequences), or other measures of interest.



Exercise 6.8

Write a subroutine to do something a biologist normally does. (Here's an opportunity to look around the lab and write a useful program!)



Exercise 6.9

Read the documentation about the debugger and become familiar with its use by applying it during your programming.


Exercise 6.10

Write a subroutine that alters an array of lines in a file. Use pass by reference for the array. Pass the subroutine a reference to the array, a regular expression, and a string to replace the regular expression. All the lines of the array should be altered by substituting the matches found for the regular expression by the replacement string.


