PyTexas 2010 - Debugging in Python

A Hands On Exercise in Python Debugging

Before we even begin, here's how look it up for yourself:

Useful Resources

http://docs.python.org/library/pdb.html - the official pdb documentation

http://openp2p.com/pub/a/python/2005/09/01/debugger.html - lengthy O'Reilly article

http://pythonconquerstheuniverse.wordpress.com/category/the-python-debugger/ - concise blog entry about the basics

http://aymanh.com/python-debugging-techniques - broad debugging blog entry

http://www.jcameroncooper.com/pytexas2010 - this document;

http://www.jcameroncooper.com/pytexas - in-class page

see also http://pytexas.org/TalkProposals2010 for original proposal

This is about it! Doesn't feature in Dive Into Python or most other general Python sources.

Who Am I?

J. Cameron Cooper. Long-time Python/Zope/Plone developer, author of "Building Websites with Plone. Independent developer, sometime trainer.

And I started using structured debugging tools way too late; I was intimidated. But you shouldn't be!

How are you debugging?

Staring at the code? Developing or repeating code in the console? Changing things randomly? Throwing in print statements? Throwing in log statements? Those all have their place, but pdb can make it easier.

Here's some very simple code for calculating a mean, with a test at the end:

def mean(first, second):

result = first + second / 2

return result

shouldbe = 6

m = mean(4, 8)

print("mean(4,8) =", m)

print("should be:", shouldbe)

print("correct?", m==shouldbe)

Go ahead and save this locally and run it with your Python. (It's written for Python 3, but just change the prints for Python 2 if that's what you have. Python 3 also has different division rules...)

There's (at least) one problem in here, as the program will tell you when it runs.

This is pretty simple, and you can probably figure it out by staring. You can use 'print', but you might have to make a few runs to figure it out...

def mean(first, second):

result = first + second / 2

print("first", first)

print("second", second)

print("result", result)

print("first+second", a+b

return result

is maybe helpful, but it's also a lot of typing, and other expressions and experiments might tell you more. Plus, it mixes debugging prints with real prints, and you have to sort them visually. Imagine if there were lots of output.

So let's just jump in with pdb and see how it works. The magic words are import pdb; pdb.set_trace()...

def mean(first, second):

result = first + second / 2

import pdb; pbd.set_trace()

return result

Then run it:

> .../mean.py(4)mean()

-> return result

(Pdb) first

4

(Pdb) second

8

(Pdb) result

8.0

(Pdb) first+second

12

(Pdb) first+second/2

8.0

(Pdb) 12/2

6.0

(Pdb) (first+second)/2

6.0

(Pdb) c

mean(4,8) = 8.0

should be: 6

correct? False

Ah, order of operations. Should have parens in there. So, what did we just do?

The Wonderful 'pdb'

We just got into a 'pdb' session, basically an interactive console in the middle of the running program. We were looking at the live variables at that point in time. We can examine other parts of the program state as well, and move around in it.

There's several ways to start a pdb session; the easiest is to pick a point in the code and insert pdb.set_trace(). It's handy to stack the import with it so it's easy to remove. Thus the magic line above. When the interpreter hits this call, it opens a pdb session on the console, right in the middle of the running program; you'll note the output after the call only happens after we quit pdb.

Let's get back in there and look around some more.

It has interactive help if you type "?":

(Pdb) ?

Documented commands (type help <topic>):

========================================

EOF bt cont enable jump pp run unt

a c continue exit l q s until

alias cl d h list quit step up

args clear debug help n r tbreak w

b commands disable ignore next restart u whatis

break condition down j p return unalias where

Miscellaneous help topics:

==========================

exec pdb

Undocumented commands:

======================

print retval rv

You can also use ? for specifics about anything listed. Take the 'q' command. What does it do?

(Pdb) ? q

q(uit) or exit - Quit from the debugger.

The program being executed is aborted.

You get the same for ? exit; this is a thing pdb does; its commands come in word or one-letter format. In fact, the ? is a shorthand for help. Often these are written like h(elp); meaning the bit outside the parens or the whole thing can be used. The single-letter versions are easier to type, so you'll usually see those.

Here's another thing pdb does: hit enter on a blank line.

(Pdb)

q(uit) or exit - Quit from the debugger.

The program being executed is aborted.

Unlike the normal interpreter, it repeats the last command when you return a blank line. In a debugger, we're often running the same command a lot (to go through uninteresting areas of code or pass through loops.)

Besides the commands listed, pdb will also print out values of variables, like we did with second and such. But what if our variables are named like the pdb commands? Just use the bang: !

Exit out of pdb (you know how now), and modify the code to include a retval variable; this is a common name and also a pdb command. (So are many single letters.)

def mean(first, second):

retval = first + second / 2

import pdb; pdb.set_trace()

return retval

Then run it:

(Pdb) retval

*** Not yet returned!

(Pdb) ? retval

*** No help on retval

(Pdb) !retval

8.0

(Pdb) p retval

8.0

(Pdb) print retval

8.0

The bang says "use the variable by this name, not the command." You can also use print/p to do this. Getting into the habit of using one of these for variable inspection can be a good idea if you like one-letter variables.

Let's look at what else we can do.

Examine the Environment

show location with l(ist)

(Pdb) ?l

l(ist) [first [,last]]

List source code for the current file.

Without arguments, list 11 lines around the current line

or continue the previous listing.

With one argument, list 11 lines starting at that line.

With two arguments, list the given range;

if the second argument is less than the first, it is a count.

Helpful for seeing where we are in the code.

(Pdb) l

1 def mean(first, second):

2 retval = first + second / 2

3 import pdb; pdb.set_trace()

4 -> return retval

5

6 shouldbe = 6

7 m = mean(4, 8)

8 print("mean(4,8) =", m)

9 print("should be:", shouldbe)

10 print("correct?", m==shouldbe)

[EOF]

Here, we can see that the next line to execute is #4. Subsequent uses of list will keep listing code; you could hit l, enter, enter, enter to get a long list of code below where you are. If you want to see the same place again, you'll need to use arguments.

(Pdb)

[EOF]

(Pdb) l

[EOF]

(Pdb) l 1

1 def mean(first, second):

2 retval = first + second / 2

3 import pdb; pdb.set_trace()

4 -> return retval

5

6 shouldbe = 6

7 m = mean(4, 8)

8 print("mean(4,8) =", m)

9 print("should be:", shouldbe)

10 print("correct?", m==shouldbe)

[EOF]

see args with a(rgs)

(Pdb) ?a

a(rgs)

Print the arguments of the current function.

A shorthand for printing all the function/method arguments individually.

(Pdb) a

first = 4

second = 8

see return value with retval

Undocumented (it has no help entry.)

Not all that common, and you can do the same thing knowing the name of the return variable. If the return is not a single variable but an expression, it will let you know what is calculated. Let's try this code:

def mean(first, second):

import pdb; pdb.set_trace()

return first + second / 2

The next line will return the value; we can go to it with the n(ext) command (more on that soon) and check what it will return.

> .../mean.py(3)mean()

-> return first + second / 2

(Pdb) next

--Return--

> .../mean.py(3)mean()->8.0

-> return first + second / 2

(Pdb) retval

8.0

Note that the "current line" info also provides the value.

look at stack with w(here)

(Pdb) ?w

w(here)

Print a stack trace, with the most recent frame at the bottom.

An arrow indicates the "current frame", which determines the

context of most commands. 'bt' is an alias for this command.

You've seen stack traces; this creates one without an error; helpful to see who called you.

(Pdb) w

<string>(1)<module>()

.../mean.py(6)<module>()

-> m = mean(4, 8)

> .../mean.py(3)mean()->8.0

-> return first + second / 2

mean.py on line 6 called us, and we're currently in line 3 in the method mean() about to return the value 8.0.

execute statements with ! or p(print)

(Pdb) ? exec

(!) statement

Execute the (one-line) statement in the context of

the current stack frame.

The exclamation point can be omitted unless the first word

of the statement resembles a debugger command.

To assign to a global variable you must always prefix the

command with a 'global' command, e.g.:

(Pdb) global list_options; list_options = ['-l']

We've seen this already, but it does more than just values; you can execute whole (one-line) expressions. And you can do it without the ! if you don't start with a debugger reserved word.

Going back to this code:

def mean(first, second):

retval = first + second / 2

import pdb; pdb.set_trace()

return retval

we can do this:

(Pdb) !retval+first

12.0

(Pdb) first+second

12

You can even change variables before letting the program run on:

(Pdb) !retval = 33

(Pdb) continue

mean(4,8) = 33

should be: 6

correct? False

Neat!

Move Around

You don't have to stay stuck in the spot you put the set_trace.

go forward with s(tep), n(ext)

(Pdb) ? n

n(ext)

Continue execution until the next line in the current function

is reached or it returns.

(Pdb) ? s

s(tep)

Execute the current line, stop at the first possible occasion

(either in a function that is called or in the current function).

These two commands walk us through the program. They only differ when they hit a function/method call.

Let's modify our code so we can look at these better. Let's inspect the "main" part of the program by moving the set_trace into there:

def mean(first, second):

retval = first + second / 2

return retval

import pdb; pdb.set_trace()

shouldbe = 6

m = mean(4, 8)

print("mean(4,8) =", m)

print("should be:", shouldbe)

print("correct?", m==shouldbe)

The next command keeps us at the current level:

> .../mean.py(6)<module>()

-> shouldbe = 6

(Pdb) n

> .../mean.py(7)<module>()

-> m = mean(4, 8)

(Pdb) n

> .../mean.py(8)<module>()

-> print("mean(4,8) =", m)

(Pdb) n

mean(4,8) = 8.0

> .../mean.py(9)<module>()

-> print("should be:", shouldbe)

(Pdb) n

should be: 6

> .../mean.py(10)<module>()

-> print("correct?", m==shouldbe)

(Pdb) n

correct? False

--Return--

> .../mean.py(10)<module>()->None

-> print("correct?", m==shouldbe)

Note how the line number keeps increasing, and when we hit the mean() method, we don't see inside it. We also see the print function printing our values. Of course, we could stop and check on them at any time.

The step command follows into method calls:

> .../mean.py(6)<module>()

-> shouldbe = 6

(Pdb)

(Pdb) s

> .../mean.py(7)<module>()

-> m = mean(4, 8)

(Pdb) s

--Call--

> .../mean.py(1)mean()

-> def mean(first, second):

(Pdb) s

> .../mean.py(2)mean()

-> retval = first + second / 2

(Pdb) s

> .../mean.py(3)mean()

-> return retval

(Pdb) s

--Return--

> .../mean.py(3)mean()->8.0

-> return retval

(Pdb) s

> .../mean.py(8)<module>()

-> print("mean(4,8) =", m)

(Pdb) s

--Call--

> .../python31/lib/idlelib/rpc.py(546)__getattr__()

-> def __getattr__(self, name):

(Pdb)

We went through the details of mean. And then...Oops, we went into the print method. Should've used next for that one. We'll see how to save ourselves from this situation next.

fast-forward with r(eturn), c(ontinue), and unt(il)

(Pdb) ? r

r(eturn)

Continue execution until the current function returns.

(Pdb) ? c

c(ont(inue))

Continue execution, only stop when a breakpoint is encountered.

(Pdb) ? until

unt(il)

Continue execution until the line with a number greater than the current

one is reached or until the current frame returns

Note that "until" has an odd short form unt; u is for "up".

The return command fast forwards to the return statement for this method and then stops, remaining in pdb.

We can escape from the print method (where we got testing step) by returning.

(Pdb) r

--Return--

> c:\program files\python31\lib\idlelib\rpc.py(550)__getattr__()-><idlelib.rpc....at 0x01F43DB0>

-> return MethodProxy(self.sockio, self.oid, name)

(Pdb) r

mean(4,8) = 8.0

> .../mean.py(9)<module>()

-> print("should be:", shouldbe)

(Pdb)

If we set the trace back inside mean and re-run we can return into a more normal result:

(Pdb) r

--Return--

> .../mean.py(4)mean()->8.0

-> return retval

(Pdb)

See how it tells us what it's about to return. Useful if you don't care about the rest of the method.

The continue command doesn't stop at method boundaries; it just lets the program run. If you have another pdb breakpoint (a set_trace, say), you'll go back into pdb; otherwise, it'll go

like normal.

With only one break point:

> .../mean.py(3)mean()

-> retval = first + second / 2

(Pdb) c

mean(4,8) = 8.0

should be: 6

correct? False

Let's add another:

def mean(first, second):

import pdb; pdb.set_trace()

retval = first + second / 2

return retval

shouldbe = 6

m = mean(4, 8)

import pdb; pdb.set_trace()

print("mean(4,8) =", m)

print("should be:", shouldbe)

print("correct?", m==shouldbe)

And we can see:

> .../mean.py(3)mean()

-> retval = first + second / 2

(Pdb) n

> .../mean.py(4)mean()

-> return retval

(Pdb) c

> .../mean.py(9)<module>()

-> print("mean(4,8) =", m)

(Pdb) !m

8.0

(Pdb) c

mean(4,8) = 8.0

should be: 6

correct? False

The until command is rarer, and is fairly specific. Put in this code:

import pdb; pdb.set_trace()

a = 22

for x in range(a): mean(x, x+1)

b = 8

and try to step through it.

> .../mean.py(12)<module>()

-> a = 22

(Pdb) n

> .../mean.py(13)<module>()

-> for x in range(a): mean(x, x+1)

(Pdb) n

> .../mean.py(13)<module>()

-> for x in range(a): mean(x, x+1)

(Pdb) n

> .../mean.py(13)<module>()

-> for x in range(a): mean(x, x+1)

(Pdb)

Hey, that's pretty annoying.

(Pdb) unt

> .../mean.py(14)<module>()

-> b = 8

Ah, that's better.

go back with j(ump)

(Pdb) ? j

j(ump) lineno

Set the next line that will be executed.

You can go backwards, too. Or just plain random access. Use with care!

Note that you can't leave your current scope.

Let's try this code:

print(1)

print(2)

print(3)

import pdb; pdb.set_trace()

print(4)

print(5)

print(6)

print(7)

Jump around!

1

2

3

> .../sequence(5)<module>()

-> print(4)

(Pdb) l

2 print(2)

3 print(3)

4 import pdb; pdb.set_trace()

5 print(4)

6 print(5)

7 -> print(6)

8 print(7)

[EOF]

(Pdb) n

4

> .../sequence(6)<module>()

-> print(5)

(Pdb) j 1

> .../sequence(1)<module>()

-> print(1)

(Pdb) n

1

> .../sequence(2)<module>()

-> print(2)

(Pdb) n

2

> .../sequence(3)<module>()

-> print(3)

(Pdb) j 6

> .../sequence(6)<module>()

-> print(5)

(Pdb) n

5

> .../sequence(7)<module>()

-> print(6)

(Pdb) c

6

7

The Stack

You can also navigate the stack, going into methods that called you to see how they look when they call you.

printing the stack with w(here)... and what it means

We've seen this above, actually, but it's handy to look at the stack to see where we're going. Time to load up some new code:

def top():

a = 12

b = 3

print("Hello")

middle()

print("done!")

def middle():

a = 0

c = 9

import pdb; pdb.set_trace()

print("world")

bottom()

def bottom():

d = 88

print("!")

top()

When we get into it:

Hello

> .../nested(12)middle()

-> print("world")

(Pdb) w

.../nested(19)<module>()

-> top()

.../nested(5)top()

-> middle()

> .../nested(12)middle()

-> print("world")

You can read that stack trace just like any other. We're in middle, on line 12, called from top at line 5, called from the module on line 19.

One unusual thing, which is important: see the right angle next to nested(12)middle()? That's where we are in the stack--the end.

navigating the stack with u(p) and d(own)

We can easily see the value of a in middle:

(Pdb) !a

0

but what does it look like in top? We can go into our caller using up:

(Pdb) u

> .../nested(5)top()

-> middle()

(Pdb) !a

12

(Pdb) w

.../nested(19)<module>()

-> top()

> .../nested(5)top()

-> middle()

.../nested(12)middle()

-> print("world")

The right angle has moved up a line in the trace, and you can see we are looking at the value of a in top.

You should also try list here.

Of course, we get out of there using down (and only after an up.)

(Pdb) d

> .../nested(12)middle()

-> print("world")

(Pdb) !a

0

Moving through the stack is quite useful in circumstances where a failure occurs and you're not quite sure how you get there, or exactly how you were called. You can't really go forwards or backwards from the other frame, though.

Other Ways Into pdb

We've been hard-coding so far with set_trace, used either once or multiple times. There are some other ways that are a bit less intrusive, or useful in other scenarios.

After a crash: post-mortem with 'pm' and 'post_mortem'

You can also debug a crashing program with pdb, and you don't have to guess where to stick in a set_trace. This is "post-mortem" debugging.

Here's some new code. Save it in a file (mean.py is fine), start the interpreter and import it.

def divide(x, y):

a = 3 / x

b = 4 / y

return a + b

pdb.pm() uses the last generated traceback. Run this in the interpreter to create a crash:

>>> divide(3,4)

2.0

>>> divide(4,0)

Traceback (most recent call last):

File "<pyshell#77>", line 1, in <module>

divide(4,0)

File "<pyshell#75>", line 3, in divide

b = 4 / y

ZeroDivisionError: int division or modulo by zero

Then we can pm it:

>>> import pdb; pdb.pm()

> ../mean.py(7)divide()

-> b = 4 / y

(Pdb) !x

4

(Pdb) !y

0

(Pdb) l

2 retval = first + second / 2

3 return retval

4

5 def divide(x, y):

6 a = 3 / x

7 -> b = 4 / y

8 return a + b

9

10

11 shouldbe = 6

12 m = mean(4, 8)

We can grab a specific traceback (or the current one) with pdb.post_mortem():

>>> try:

... divide(4,0)

... except:

... pdb.post_mortem()

> .../mean.py(7)divide()

-> b = 4 / y

(Pdb) c

or

>>> try:

... divide(12,0)

... except:

... import sys

... tb = sys.exc_info()[2]

>>> tb

<traceback object at 0x0198F738>

>>> import pdb; pdb.post_mortem(tb)

> .../mean.py(7)divide()

-> b = 4 / y

(Pdb) c

As you can see, you can run post-mortems automatically in exception handling, or on demand interactively.

Without error or source changes: -m, run, runeval, and runcall

You can start from line 1 in pdb by doing:

$ python -m pdb program.py

This might be useful with a simple file, but is also useful for running without modifying the source. Perhaps you don't have write permission, or just don't want to get in there.

Another way is to open up the interpreter and use one of several methods: run, runeval, and runcall. If you do this, the program has to be importable, so see to your Python path.

Let's say we have the mean-calculating code saved as mean.py.

$ python

>>> import pdb

>>> from mean import mean

>>> pdb.run("mean(1,4)")

> <string>(1)<module>()->None

(Pdb) s

--Call--

> .../mean.py(1)mean()

-> def mean(first, second):

(Pdb) l

1 -> def mean(first, second):

2 retval = first + second / 2

3 return retval

4

5 shouldbe = 6

6 m = mean(4, 8)

7 print("mean(4,8) =", m)

8 print("should be:", shouldbe)

9 print("correct?", m==shouldbe)

[EOF]

(Pdb) c

>>>

You have to step through the eval, but after that you're into the code.

Using run you pass in some statement as a string, as you see above. Returns nothing.

Using runeval you also pass in a string, except it returns the evaluated value.

>>> pdb.runeval("mean(2,4)")

> <string>(1)<module>()->None

(Pdb) c

4.0

>>>

Using runcall you pass in a callable object (the method/function) and parameters, rather than a string.

>>> pdb.runcall(mean, 2, 4)

> c:\users\cameron\documents\pytexas\mean-correct.py(2)mean()

-> retval = first + second / 2

(Pdb) c

4.0

>>>

Which one depends on what you're doing. You can run the whole program, or, cleverly, only the part that's known to fail.

Once you're in pdb

Of course, simply running pdb can be a big pain without the right breakpoint; lots of nexts would be required in some programs. But pdb also lets you set breakpoints without code.

setting breakpoints with b(reak) and tbreak

(Pdb) ?b

b(reak) ([file:]lineno | function) [, condition]

With a line number argument, set a break there in the current

file. With a function name, set a break at first executable line

of that function. Without argument, list all breaks. If a second

argument is present, it is a string specifying an expression

which must evaluate to true before the breakpoint is honored.

The line number may be prefixed with a filename and a colon,

to specify a breakpoint in another file (probably one that

hasn't been loaded yet). The file is searched for on sys.path;

the .py suffix may be omitted.

(Pdb) ?tbreak

tbreak same arguments as break, but breakpoint is

removed when first hit.

Point at a place to make a breakpoint by line number or function name, with a possible condition. The condition lets you skip iterations of a loop that aren't interesting, or other time-wasting scenarios.

We need a new program to see this best; call it callself.py:

def callself(a=0):

a = a + 1

print(a)

if a < 10:

callself(a)

Import it in console:

>>> from callself import callself

>>> import pdb; pdb.run("callself()")

> <string>(1)<module>()

(Pdb) s

--Call--

> .../callself(1)callself()

-> def callself(a=0):

(Pdb) l

1 -> def callself(a=0):

2 a = a + 1

3 print(a)

4 if a < 10:

5 callself(a)

[EOF]

(Pdb) b 3

Breakpoint 1 at .../callself:3

(Pdb) l 1

1 -> def callself(a=0):

2 a = a + 1

3 B print(a)

4 if a < 10:

5 callself(a)

[EOF]

(Pdb) c

> .../callself(3)callself()

-> print(a)

(Pdb) l

1 def callself(a=0):

2 a = a + 1

3 B-> print(a)

4 if a < 10:

5 callself(a)

[EOF]

(Pdb) !a

1

(Pdb) c

1

> .../callself(3)callself()

-> print(a)

(Pdb) !a

2

We've stopped at the temporary breakpoint. Note the B indicating breakpoint. Temporary is similar, but goes away first time by.

managing breakpoints with cl(ear), disable, enable

Also notice above how it gives us a "breakpoint number". You can use this with disable to stop using the breakpoint temporarily.

(Pdb) disable 1

(Pdb) c

2

3

4

5

6

7

8

9

10

You can do the same with clear for permanent removal. Also, enable will undo disable.

skipping breakpoints with condition and ignore

We can do a similar think with ignore; ignore waits for a few times by:

>>> import pdb; pdb.run("callself()")

> <string>(1)<module>()

(Pdb) s

--Call--

> c:\users\cameron\documents\pytexas\callself(1)callself()

-> def callself(a=0):

(Pdb) b 3

Breakpoint 1 at c:\users\cameron\documents\pytexas\callself:3

(Pdb) ignore 1 3

Will ignore next 3 crossings of breakpoint 1.

(Pdb) c

1

2

3

> c:\users\cameron\documents\pytexas\callself(3)callself()

-> print(a)

(Pdb) !a

4

And condition sets a condition (which you can also do on breakpoint set):

(Pdb) condition 1 a==8

(Pdb) c

5

6

7

> c:\users\cameron\documents\pytexas\callself(3)callself()

-> print(a)

(Pdb) !a

8

hooking into breakpoints with 'commands'

You can also specify some commands to run when hitting a breakpoint:

(Pdb) condition 1

Breakpoint 1 is now unconditional.

(Pdb) commands 1

(com) print "I broke"

(com) print a

(com) end

(Pdb) c

8

'I broke'

9

> c:\users\cameron\documents\pytexas\callself(3)callself()

-> print(a)

(Pdb) c

9

'I broke'

10

> c:\users\cameron\documents\pytexas\callself(3)callself()

-> print(a)

On a signal (like Ctrl-C)

Put this code at the top of your program, and you can launch pdb at any point with Ctrl-C.

import signal

import pdb

def int_handler(signal, frame):

pdb.set_trace()

signal.signal(signal.SIGINT, int_handler)

Useful for hanging or tightly looping situations, but somewhat time-dependent.

You'll want to n(ext) to get to the actual code (because it stops in the int_handler.) To continue, use c(ontinue); to stop use q/exit. It'll throw an ugly exception, but don't worry about it. Note that time.sleep has its own problems with being interrupted.

An easy test: put the following code after the signal setup, run, and hit Ctrl-C.

print(1)

while True:

x= 1

print(2222)

Cool, huh?

Logging

Logging still has a place in debugging; if you have lots of tight loops, or want to see a big spew of data, or want end-users to be able to report.

Basic logging is pretty simple:

import logging

logging.debug("asdf")

logging.error("bang")

By default just goes to console, which is usually helpful in debugging, but it's very flexible and easy to send to a file or elsewhere.