PyTexas 2010 - Debugging in Python
A Hands On Exercise in Python Debugging
Before we even begin, here's how look it up for yourself:
Useful Resources
http://docs.python.org/library/pdb.html - the official pdb documentation
http://openp2p.com/pub/a/python/2005/09/01/debugger.html - lengthy O'Reilly article
http://pythonconquerstheuniverse.wordpress.com/category/the-python-debugger/ - concise blog entry about the basics
http://aymanh.com/python-debugging-techniques - broad debugging blog entry
http://www.jcameroncooper.com/pytexas2010 - this document;
http://www.jcameroncooper.com/pytexas - in-class page
see also http://pytexas.org/TalkProposals2010 for original proposal
This is about it! Doesn't feature in Dive Into Python or most other general Python sources.
Who Am I?
J. Cameron Cooper. Long-time Python/Zope/Plone developer, author of "Building Websites with Plone. Independent developer, sometime trainer.
And I started using structured debugging tools way too late; I was intimidated. But you shouldn't be!
How are you debugging?
Staring at the code? Developing or repeating code in the console? Changing things randomly? Throwing in print statements? Throwing in log statements? Those all have their place, but pdb can make it easier.
Here's some very simple code for calculating a mean, with a test at the end:
def mean(first, second):
result = first + second / 2
return result
shouldbe = 6
m = mean(4, 8)
print("mean(4,8) =", m)
print("should be:", shouldbe)
print("correct?", m==shouldbe)
Go ahead and save this locally and run it with your Python. (It's written for Python 3, but just change the prints for Python 2 if that's what you have. Python 3 also has different division rules...)
There's (at least) one problem in here, as the program will tell you when it runs.
This is pretty simple, and you can probably figure it out by staring. You can use 'print', but you might have to make a few runs to figure it out...
def mean(first, second):
result = first + second / 2
print("first", first)
print("second", second)
print("result", result)
print("first+second", a+b
return result
is maybe helpful, but it's also a lot of typing, and other expressions and experiments might tell you more. Plus, it mixes debugging prints with real prints, and you have to sort them visually. Imagine if there were lots of output.
So let's just jump in with pdb and see how it works. The magic words are import pdb; pdb.set_trace()...
def mean(first, second):
result = first + second / 2
import pdb; pbd.set_trace()
return result
Then run it:
> .../mean.py(4)mean()
-> return result
(Pdb) first
4
(Pdb) second
8
(Pdb) result
8.0
(Pdb) first+second
12
(Pdb) first+second/2
8.0
(Pdb) 12/2
6.0
(Pdb) (first+second)/2
6.0
(Pdb) c
mean(4,8) = 8.0
should be: 6
correct? False
Ah, order of operations. Should have parens in there. So, what did we just do?
The Wonderful 'pdb'
We just got into a 'pdb' session, basically an interactive console in the middle of the running program. We were looking at the live variables at that point in time. We can examine other parts of the program state as well, and move around in it.
There's several ways to start a pdb session; the easiest is to pick a point in the code and insert pdb.set_trace(). It's handy to stack the import with it so it's easy to remove. Thus the magic line above. When the interpreter hits this call, it opens a pdb session on the console, right in the middle of the running program; you'll note the output after the call only happens after we quit pdb.
Let's get back in there and look around some more.
It has interactive help if you type "?":
(Pdb) ?
Documented commands (type help <topic>):
========================================
EOF bt cont enable jump pp run unt
a c continue exit l q s until
alias cl d h list quit step up
args clear debug help n r tbreak w
b commands disable ignore next restart u whatis
break condition down j p return unalias where
Miscellaneous help topics:
==========================
exec pdb
Undocumented commands:
======================
print retval rv
You can also use ? for specifics about anything listed. Take the 'q' command. What does it do?
(Pdb) ? q
q(uit) or exit - Quit from the debugger.
The program being executed is aborted.
You get the same for ? exit; this is a thing pdb does; its commands come in word or one-letter format. In fact, the ? is a shorthand for help. Often these are written like h(elp); meaning the bit outside the parens or the whole thing can be used. The single-letter versions are easier to type, so you'll usually see those.
Here's another thing pdb does: hit enter on a blank line.
(Pdb)
q(uit) or exit - Quit from the debugger.
The program being executed is aborted.
Unlike the normal interpreter, it repeats the last command when you return a blank line. In a debugger, we're often running the same command a lot (to go through uninteresting areas of code or pass through loops.)
Besides the commands listed, pdb will also print out values of variables, like we did with second and such. But what if our variables are named like the pdb commands? Just use the bang: !
Exit out of pdb (you know how now), and modify the code to include a retval variable; this is a common name and also a pdb command. (So are many single letters.)
def mean(first, second):
retval = first + second / 2
import pdb; pdb.set_trace()
return retval
Then run it:
(Pdb) retval
*** Not yet returned!
(Pdb) ? retval
*** No help on retval
(Pdb) !retval
8.0
(Pdb) p retval
8.0
(Pdb) print retval
8.0
The bang says "use the variable by this name, not the command." You can also use print/p to do this. Getting into the habit of using one of these for variable inspection can be a good idea if you like one-letter variables.
Let's look at what else we can do.
Examine the Environment
show location with l(ist)
(Pdb) ?l
l(ist) [first [,last]]
List source code for the current file.
Without arguments, list 11 lines around the current line
or continue the previous listing.
With one argument, list 11 lines starting at that line.
With two arguments, list the given range;
if the second argument is less than the first, it is a count.
Helpful for seeing where we are in the code.
(Pdb) l
1 def mean(first, second):
2 retval = first + second / 2
3 import pdb; pdb.set_trace()
4 -> return retval
5
6 shouldbe = 6
7 m = mean(4, 8)
8 print("mean(4,8) =", m)
9 print("should be:", shouldbe)
10 print("correct?", m==shouldbe)
[EOF]
Here, we can see that the next line to execute is #4. Subsequent uses of list will keep listing code; you could hit l, enter, enter, enter to get a long list of code below where you are. If you want to see the same place again, you'll need to use arguments.
(Pdb)
[EOF]
(Pdb) l
[EOF]
(Pdb) l 1
1 def mean(first, second):
2 retval = first + second / 2
3 import pdb; pdb.set_trace()
4 -> return retval
5
6 shouldbe = 6
7 m = mean(4, 8)
8 print("mean(4,8) =", m)
9 print("should be:", shouldbe)
10 print("correct?", m==shouldbe)
[EOF]
see args with a(rgs)
(Pdb) ?a
a(rgs)
Print the arguments of the current function.
A shorthand for printing all the function/method arguments individually.
(Pdb) a
first = 4
second = 8
see return value with retval
Undocumented (it has no help entry.)
Not all that common, and you can do the same thing knowing the name of the return variable. If the return is not a single variable but an expression, it will let you know what is calculated. Let's try this code:
def mean(first, second):
import pdb; pdb.set_trace()
return first + second / 2
The next line will return the value; we can go to it with the n(ext) command (more on that soon) and check what it will return.
> .../mean.py(3)mean()
-> return first + second / 2
(Pdb) next
--Return--
> .../mean.py(3)mean()->8.0
-> return first + second / 2
(Pdb) retval
8.0
Note that the "current line" info also provides the value.
look at stack with w(here)
(Pdb) ?w
w(here)
Print a stack trace, with the most recent frame at the bottom.
An arrow indicates the "current frame", which determines the
context of most commands. 'bt' is an alias for this command.
You've seen stack traces; this creates one without an error; helpful to see who called you.
(Pdb) w
<string>(1)<module>()
.../mean.py(6)<module>()
-> m = mean(4, 8)
> .../mean.py(3)mean()->8.0
-> return first + second / 2
mean.py on line 6 called us, and we're currently in line 3 in the method mean() about to return the value 8.0.
execute statements with ! or p(print)
(Pdb) ? exec
(!) statement
Execute the (one-line) statement in the context of
the current stack frame.
The exclamation point can be omitted unless the first word
of the statement resembles a debugger command.
To assign to a global variable you must always prefix the
command with a 'global' command, e.g.:
(Pdb) global list_options; list_options = ['-l']
We've seen this already, but it does more than just values; you can execute whole (one-line) expressions. And you can do it without the ! if you don't start with a debugger reserved word.
Going back to this code:
def mean(first, second):
retval = first + second / 2
import pdb; pdb.set_trace()
return retval
we can do this:
(Pdb) !retval+first
12.0
(Pdb) first+second
12
You can even change variables before letting the program run on:
(Pdb) !retval = 33
(Pdb) continue
mean(4,8) = 33
should be: 6
correct? False
Neat!
Move Around
You don't have to stay stuck in the spot you put the set_trace.
go forward with s(tep), n(ext)
(Pdb) ? n
n(ext)
Continue execution until the next line in the current function
is reached or it returns.
(Pdb) ? s
s(tep)
Execute the current line, stop at the first possible occasion
(either in a function that is called or in the current function).
These two commands walk us through the program. They only differ when they hit a function/method call.
Let's modify our code so we can look at these better. Let's inspect the "main" part of the program by moving the set_trace into there:
def mean(first, second):
retval = first + second / 2
return retval
import pdb; pdb.set_trace()
shouldbe = 6
m = mean(4, 8)
print("mean(4,8) =", m)
print("should be:", shouldbe)
print("correct?", m==shouldbe)
The next command keeps us at the current level:
> .../mean.py(6)<module>()
-> shouldbe = 6
(Pdb) n
> .../mean.py(7)<module>()
-> m = mean(4, 8)
(Pdb) n
> .../mean.py(8)<module>()
-> print("mean(4,8) =", m)
(Pdb) n
mean(4,8) = 8.0
> .../mean.py(9)<module>()
-> print("should be:", shouldbe)
(Pdb) n
should be: 6
> .../mean.py(10)<module>()
-> print("correct?", m==shouldbe)
(Pdb) n
correct? False
--Return--
> .../mean.py(10)<module>()->None
-> print("correct?", m==shouldbe)
Note how the line number keeps increasing, and when we hit the mean() method, we don't see inside it. We also see the print function printing our values. Of course, we could stop and check on them at any time.
The step command follows into method calls:
> .../mean.py(6)<module>()
-> shouldbe = 6
(Pdb)
(Pdb) s
> .../mean.py(7)<module>()
-> m = mean(4, 8)
(Pdb) s
--Call--
> .../mean.py(1)mean()
-> def mean(first, second):
(Pdb) s
> .../mean.py(2)mean()
-> retval = first + second / 2
(Pdb) s
> .../mean.py(3)mean()
-> return retval
(Pdb) s
--Return--
> .../mean.py(3)mean()->8.0
-> return retval
(Pdb) s
> .../mean.py(8)<module>()
-> print("mean(4,8) =", m)
(Pdb) s
--Call--
> .../python31/lib/idlelib/rpc.py(546)__getattr__()
-> def __getattr__(self, name):
(Pdb)
We went through the details of mean. And then...Oops, we went into the print method. Should've used next for that one. We'll see how to save ourselves from this situation next.
fast-forward with r(eturn), c(ontinue), and unt(il)
(Pdb) ? r
r(eturn)
Continue execution until the current function returns.
(Pdb) ? c
c(ont(inue))
Continue execution, only stop when a breakpoint is encountered.
(Pdb) ? until
unt(il)
Continue execution until the line with a number greater than the current
one is reached or until the current frame returns
Note that "until" has an odd short form unt; u is for "up".
The return command fast forwards to the return statement for this method and then stops, remaining in pdb.
We can escape from the print method (where we got testing step) by returning.
(Pdb) r
--Return--
> c:\program files\python31\lib\idlelib\rpc.py(550)__getattr__()-><idlelib.rpc....at 0x01F43DB0>
-> return MethodProxy(self.sockio, self.oid, name)
(Pdb) r
mean(4,8) = 8.0
> .../mean.py(9)<module>()
-> print("should be:", shouldbe)
(Pdb)
If we set the trace back inside mean and re-run we can return into a more normal result:
(Pdb) r
--Return--
> .../mean.py(4)mean()->8.0
-> return retval
(Pdb)
See how it tells us what it's about to return. Useful if you don't care about the rest of the method.
The continue command doesn't stop at method boundaries; it just lets the program run. If you have another pdb breakpoint (a set_trace, say), you'll go back into pdb; otherwise, it'll go
like normal.
With only one break point:
> .../mean.py(3)mean()
-> retval = first + second / 2
(Pdb) c
mean(4,8) = 8.0
should be: 6
correct? False
Let's add another:
def mean(first, second):
import pdb; pdb.set_trace()
retval = first + second / 2
return retval
shouldbe = 6
m = mean(4, 8)
import pdb; pdb.set_trace()
print("mean(4,8) =", m)
print("should be:", shouldbe)
print("correct?", m==shouldbe)
And we can see:
> .../mean.py(3)mean()
-> retval = first + second / 2
(Pdb) n
> .../mean.py(4)mean()
-> return retval
(Pdb) c
> .../mean.py(9)<module>()
-> print("mean(4,8) =", m)
(Pdb) !m
8.0
(Pdb) c
mean(4,8) = 8.0
should be: 6
correct? False
The until command is rarer, and is fairly specific. Put in this code:
import pdb; pdb.set_trace()
a = 22
for x in range(a): mean(x, x+1)
b = 8
and try to step through it.
> .../mean.py(12)<module>()
-> a = 22
(Pdb) n
> .../mean.py(13)<module>()
-> for x in range(a): mean(x, x+1)
(Pdb) n
> .../mean.py(13)<module>()
-> for x in range(a): mean(x, x+1)
(Pdb) n
> .../mean.py(13)<module>()
-> for x in range(a): mean(x, x+1)
(Pdb)
Hey, that's pretty annoying.
(Pdb) unt
> .../mean.py(14)<module>()
-> b = 8
Ah, that's better.
go back with j(ump)
(Pdb) ? j
j(ump) lineno
Set the next line that will be executed.
You can go backwards, too. Or just plain random access. Use with care!
Note that you can't leave your current scope.
Let's try this code:
print(1)
print(2)
print(3)
import pdb; pdb.set_trace()
print(4)
print(5)
print(6)
print(7)
Jump around!
1
2
3
> .../sequence(5)<module>()
-> print(4)
(Pdb) l
2 print(2)
3 print(3)
4 import pdb; pdb.set_trace()
5 print(4)
6 print(5)
7 -> print(6)
8 print(7)
[EOF]
(Pdb) n
4
> .../sequence(6)<module>()
-> print(5)
(Pdb) j 1
> .../sequence(1)<module>()
-> print(1)
(Pdb) n
1
> .../sequence(2)<module>()
-> print(2)
(Pdb) n
2
> .../sequence(3)<module>()
-> print(3)
(Pdb) j 6
> .../sequence(6)<module>()
-> print(5)
(Pdb) n
5
> .../sequence(7)<module>()
-> print(6)
(Pdb) c
6
7
The Stack
You can also navigate the stack, going into methods that called you to see how they look when they call you.
printing the stack with w(here)... and what it means
We've seen this above, actually, but it's handy to look at the stack to see where we're going. Time to load up some new code:
def top():
a = 12
b = 3
print("Hello")
middle()
print("done!")
def middle():
a = 0
c = 9
import pdb; pdb.set_trace()
print("world")
bottom()
def bottom():
d = 88
print("!")
top()
When we get into it:
Hello
> .../nested(12)middle()
-> print("world")
(Pdb) w
.../nested(19)<module>()
-> top()
.../nested(5)top()
-> middle()
> .../nested(12)middle()
-> print("world")
You can read that stack trace just like any other. We're in middle, on line 12, called from top at line 5, called from the module on line 19.
One unusual thing, which is important: see the right angle next to nested(12)middle()? That's where we are in the stack--the end.
navigating the stack with u(p) and d(own)
We can easily see the value of a in middle:
(Pdb) !a
0
but what does it look like in top? We can go into our caller using up:
(Pdb) u
> .../nested(5)top()
-> middle()
(Pdb) !a
12
(Pdb) w
.../nested(19)<module>()
-> top()
> .../nested(5)top()
-> middle()
.../nested(12)middle()
-> print("world")
The right angle has moved up a line in the trace, and you can see we are looking at the value of a in top.
You should also try list here.
Of course, we get out of there using down (and only after an up.)
(Pdb) d
> .../nested(12)middle()
-> print("world")
(Pdb) !a
0
Moving through the stack is quite useful in circumstances where a failure occurs and you're not quite sure how you get there, or exactly how you were called. You can't really go forwards or backwards from the other frame, though.
Other Ways Into pdb
We've been hard-coding so far with set_trace, used either once or multiple times. There are some other ways that are a bit less intrusive, or useful in other scenarios.
After a crash: post-mortem with 'pm' and 'post_mortem'
You can also debug a crashing program with pdb, and you don't have to guess where to stick in a set_trace. This is "post-mortem" debugging.
Here's some new code. Save it in a file (mean.py is fine), start the interpreter and import it.
def divide(x, y):
a = 3 / x
b = 4 / y
return a + b
pdb.pm() uses the last generated traceback. Run this in the interpreter to create a crash:
>>> divide(3,4)
2.0
>>> divide(4,0)
Traceback (most recent call last):
File "<pyshell#77>", line 1, in <module>
divide(4,0)
File "<pyshell#75>", line 3, in divide
b = 4 / y
ZeroDivisionError: int division or modulo by zero
Then we can pm it:
>>> import pdb; pdb.pm()
> ../mean.py(7)divide()
-> b = 4 / y
(Pdb) !x
4
(Pdb) !y
0
(Pdb) l
2 retval = first + second / 2
3 return retval
4
5 def divide(x, y):
6 a = 3 / x
7 -> b = 4 / y
8 return a + b
9
10
11 shouldbe = 6
12 m = mean(4, 8)
We can grab a specific traceback (or the current one) with pdb.post_mortem():
>>> try:
... divide(4,0)
... except:
... pdb.post_mortem()
> .../mean.py(7)divide()
-> b = 4 / y
(Pdb) c
or
>>> try:
... divide(12,0)
... except:
... import sys
... tb = sys.exc_info()[2]
>>> tb
<traceback object at 0x0198F738>
>>> import pdb; pdb.post_mortem(tb)
> .../mean.py(7)divide()
-> b = 4 / y
(Pdb) c
As you can see, you can run post-mortems automatically in exception handling, or on demand interactively.
Without error or source changes: -m, run, runeval, and runcall
You can start from line 1 in pdb by doing:
$ python -m pdb program.py
This might be useful with a simple file, but is also useful for running without modifying the source. Perhaps you don't have write permission, or just don't want to get in there.
Another way is to open up the interpreter and use one of several methods: run, runeval, and runcall. If you do this, the program has to be importable, so see to your Python path.
Let's say we have the mean-calculating code saved as mean.py.
$ python
>>> import pdb
>>> from mean import mean
>>> pdb.run("mean(1,4)")
> <string>(1)<module>()->None
(Pdb) s
--Call--
> .../mean.py(1)mean()
-> def mean(first, second):
(Pdb) l
1 -> def mean(first, second):
2 retval = first + second / 2
3 return retval
4
5 shouldbe = 6
6 m = mean(4, 8)
7 print("mean(4,8) =", m)
8 print("should be:", shouldbe)
9 print("correct?", m==shouldbe)
[EOF]
(Pdb) c
>>>
You have to step through the eval, but after that you're into the code.
Using run you pass in some statement as a string, as you see above. Returns nothing.
Using runeval you also pass in a string, except it returns the evaluated value.
>>> pdb.runeval("mean(2,4)")
> <string>(1)<module>()->None
(Pdb) c
4.0
>>>
Using runcall you pass in a callable object (the method/function) and parameters, rather than a string.
>>> pdb.runcall(mean, 2, 4)
> c:\users\cameron\documents\pytexas\mean-correct.py(2)mean()
-> retval = first + second / 2
(Pdb) c
4.0
>>>
Which one depends on what you're doing. You can run the whole program, or, cleverly, only the part that's known to fail.
Once you're in pdb
Of course, simply running pdb can be a big pain without the right breakpoint; lots of nexts would be required in some programs. But pdb also lets you set breakpoints without code.
setting breakpoints with b(reak) and tbreak
(Pdb) ?b
b(reak) ([file:]lineno | function) [, condition]
With a line number argument, set a break there in the current
file. With a function name, set a break at first executable line
of that function. Without argument, list all breaks. If a second
argument is present, it is a string specifying an expression
which must evaluate to true before the breakpoint is honored.
The line number may be prefixed with a filename and a colon,
to specify a breakpoint in another file (probably one that
hasn't been loaded yet). The file is searched for on sys.path;
the .py suffix may be omitted.
(Pdb) ?tbreak
tbreak same arguments as break, but breakpoint is
removed when first hit.
Point at a place to make a breakpoint by line number or function name, with a possible condition. The condition lets you skip iterations of a loop that aren't interesting, or other time-wasting scenarios.
We need a new program to see this best; call it callself.py:
def callself(a=0):
a = a + 1
print(a)
if a < 10:
callself(a)
Import it in console:
>>> from callself import callself
>>> import pdb; pdb.run("callself()")
> <string>(1)<module>()
(Pdb) s
--Call--
> .../callself(1)callself()
-> def callself(a=0):
(Pdb) l
1 -> def callself(a=0):
2 a = a + 1
3 print(a)
4 if a < 10:
5 callself(a)
[EOF]
(Pdb) b 3
Breakpoint 1 at .../callself:3
(Pdb) l 1
1 -> def callself(a=0):
2 a = a + 1
3 B print(a)
4 if a < 10:
5 callself(a)
[EOF]
(Pdb) c
> .../callself(3)callself()
-> print(a)
(Pdb) l
1 def callself(a=0):
2 a = a + 1
3 B-> print(a)
4 if a < 10:
5 callself(a)
[EOF]
(Pdb) !a
1
(Pdb) c
1
> .../callself(3)callself()
-> print(a)
(Pdb) !a
2
We've stopped at the temporary breakpoint. Note the B indicating breakpoint. Temporary is similar, but goes away first time by.
managing breakpoints with cl(ear), disable, enable
Also notice above how it gives us a "breakpoint number". You can use this with disable to stop using the breakpoint temporarily.
(Pdb) disable 1
(Pdb) c
2
3
4
5
6
7
8
9
10
You can do the same with clear for permanent removal. Also, enable will undo disable.
skipping breakpoints with condition and ignore
We can do a similar think with ignore; ignore waits for a few times by:
>>> import pdb; pdb.run("callself()")
> <string>(1)<module>()
(Pdb) s
--Call--
> c:\users\cameron\documents\pytexas\callself(1)callself()
-> def callself(a=0):
(Pdb) b 3
Breakpoint 1 at c:\users\cameron\documents\pytexas\callself:3
(Pdb) ignore 1 3
Will ignore next 3 crossings of breakpoint 1.
(Pdb) c
1
2
3
> c:\users\cameron\documents\pytexas\callself(3)callself()
-> print(a)
(Pdb) !a
4
And condition sets a condition (which you can also do on breakpoint set):
(Pdb) condition 1 a==8
(Pdb) c
5
6
7
> c:\users\cameron\documents\pytexas\callself(3)callself()
-> print(a)
(Pdb) !a
8
hooking into breakpoints with 'commands'
You can also specify some commands to run when hitting a breakpoint:
(Pdb) condition 1
Breakpoint 1 is now unconditional.
(Pdb) commands 1
(com) print "I broke"
(com) print a
(com) end
(Pdb) c
8
'I broke'
9
> c:\users\cameron\documents\pytexas\callself(3)callself()
-> print(a)
(Pdb) c
9
'I broke'
10
> c:\users\cameron\documents\pytexas\callself(3)callself()
-> print(a)
On a signal (like Ctrl-C)
Put this code at the top of your program, and you can launch pdb at any point with Ctrl-C.
import signal
import pdb
def int_handler(signal, frame):
pdb.set_trace()
signal.signal(signal.SIGINT, int_handler)
Useful for hanging or tightly looping situations, but somewhat time-dependent.
You'll want to n(ext) to get to the actual code (because it stops in the int_handler.) To continue, use c(ontinue); to stop use q/exit. It'll throw an ugly exception, but don't worry about it. Note that time.sleep has its own problems with being interrupted.
An easy test: put the following code after the signal setup, run, and hit Ctrl-C.
print(1)
while True:
x= 1
print(2222)
Cool, huh?
Logging
Logging still has a place in debugging; if you have lots of tight loops, or want to see a big spew of data, or want end-users to be able to report.
Basic logging is pretty simple:
import logging
logging.debug("asdf")
logging.error("bang")
By default just goes to console, which is usually helpful in debugging, but it's very flexible and easy to send to a file or elsewhere.