Another blog post in which I use sys.settrace, this time to solve a real problem.
When working with new modules, it is sometimes beneficial to get a glimpse of which entities of a module are actually used. I wrote something comparable in my blog post Instrumenting Java Code to Find and Handle Unused Classes, but this time, I need it in Python and with method-level granularity.
TL;DR
Download trace.py from GitHub and use it to print a call tree and a list of used methods and classes to the error output:
import trace
trace.setup(r"MODULE_REGEX", print_location_=True)
Implementation
This could be a hard problem, but it isn’t when we’re using sys.settrace to set a handler for every method and function call, reapplying the knowledge we gained in my Let’s create a debugger together series to develop a small utility.
There are essentially six different types of functions (this sample code is on GitHub):
def log(message: str):
    print(message)


class TestClass:
    # static initializer of the class
    x = 100

    def __init__(self):
        # constructor
        log("instance initializer")

    def instance_method(self):
        # instance method, self is bound to an instance
        log("instance method")

    @staticmethod
    def static_method():
        log("static method")

    @classmethod
    def class_method(cls):
        log("class method")


def free_function():
    log("free function")
This distinction matters because we have to handle each type differently below. But first, let’s define a few helpers and configuration variables:
indent = 0
module_matcher: str = ".*"
print_location: bool = False
We also want to print a method call-tree, so we use indent to track the current indentation level. The module_matcher is the regular expression that we use to determine whether we want to consider a module, its classes, and methods. This could, e.g., be __main__ to only consider the main module. The print_location tells us whether we want to print the path and line location for every element in the call tree.
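As a small illustration of how such a module_matcher behaves, the following sketch applies re.match in the same way as the module check in the trace handler. Note that re.match only anchors at the start of the string, so a pattern like __main__ also accepts a module called __main__helper. The matches function and the module names are made up for this example.

```python
import re


def matches(module_matcher: str, module_name: str) -> bool:
    """Same kind of check as in the handler: re.match anchors only at the start."""
    return re.match(module_matcher, module_name) is not None


print(matches(r"__main__", "__main__"))          # True
print(matches(r"__main__", "__main__helper"))    # True, prefix match
print(matches(r"__main__$", "__main__helper"))   # False, "$" anchors the end
print(matches(r"json(\..*)?$", "json.decoder"))  # True, package and submodules
print(matches(r"json(\..*)?$", "jsonschema"))    # False
```

If you only want exactly one module, ending the pattern with $ is the safer choice.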
Now to the main helper class:
import sys
from dataclasses import dataclass, field
from typing import Dict, Set


def log(message: str):
    print(message, file=sys.stderr)


STATIC_INIT = "<static init>"


@dataclass
class ClassInfo:
    """ Used methods of a class """
    name: str
    used_methods: Set[str] = field(default_factory=set)

    def print(self, indent_: str):
        log(indent_ + self.name)
        for method in sorted(self.used_methods):
            log(indent_ + " " + method)

    def has_only_static_init(self) -> bool:
        # compare instead of calling pop(), which would remove
        # the element from the set as a side effect
        return self.used_methods == {STATIC_INIT}


used_classes: Dict[str, ClassInfo] = {}
free_functions: Set[str] = set()
The ClassInfo stores the used methods of a class. We store the ClassInfo instances of used classes and the free functions in global variables.
Now to the call handler that we pass to sys.settrace:
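The code further down looks up these records through a get_class_info helper that this post does not show. A minimal sketch of what such a helper might look like, assuming it simply creates the entry on first access (the real trace.py may do this differently):

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ClassInfo:
    """Reduced copy of the ClassInfo class, just for this example."""
    name: str
    used_methods: Set[str] = field(default_factory=set)


used_classes: Dict[str, ClassInfo] = {}


def get_class_info(name: str) -> ClassInfo:
    """Hypothetical helper: return the ClassInfo for name, creating it if missing."""
    if name not in used_classes:
        used_classes[name] = ClassInfo(name)
    return used_classes[name]


get_class_info("__main__.TestClass").used_methods.add("__init__")
get_class_info("__main__.TestClass").used_methods.add("instance_method")
print(sorted(used_classes["__main__.TestClass"].used_methods))
# ['__init__', 'instance_method']
```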
import inspect
import re
from types import FrameType


def handler(frame: FrameType, event: str, *args):
    """ Trace handler that prints and tracks called functions """
    # find module name
    module_name: str = mod.__name__ if (
        mod := inspect.getmodule(frame.f_code)) else ""
    # get name of the code object
    func_name = frame.f_code.co_name
    # check that the module matches the defined regexp
    if not re.match(module_matcher, module_name):
        return
    # keep indent in sync
    # this is the only reason why we need
    # the return events and use an inner trace handler
    global indent
    if event == 'return':
        indent -= 2
        return
    if event != "call":
        return
    # insert the current function/method
    name = insert_class_or_function(module_name, func_name, frame)
    # print the current location if necessary
    if print_location:
        do_print_location(frame)
    # print the current function/method
    log(" " * indent + name)
    # keep the indent in sync
    indent += 2
    # return this as the inner handler to get
    # return events
    return handler


def setup(module_matcher_: str = ".*", print_location_: bool = False):
    # ...
    sys.settrace(handler)
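To see the call/return mechanics in isolation, here is a stripped-down sketch of the same idea without any module filtering or bookkeeping. The names trace_calls, outer, and inner are made up for this example. It collects the call tree in a list instead of printing it and restores the previous trace function afterwards.

```python
import sys
from types import FrameType
from typing import Callable, List


def trace_calls(func: Callable[[], None]) -> List[str]:
    """Run func under sys.settrace and return its indented call tree."""
    lines: List[str] = []
    depth = 0

    def handler(frame: FrameType, event: str, arg):
        nonlocal depth
        if event == "call":
            lines.append(" " * depth + frame.f_code.co_name)
            depth += 2
        elif event == "return":
            depth -= 2
        # returning the handler keeps it installed as the local
        # trace function, which is what delivers the return events
        return handler

    previous = sys.gettrace()
    sys.settrace(handler)
    try:
        func()
    finally:
        sys.settrace(previous)
    return lines


def inner():
    pass


def outer():
    inner()
    inner()


print("\n".join(trace_calls(outer)))
# outer
#   inner
#   inner
```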
Now, we “only” have to get the name for the code object and collect it properly in either a ClassInfo instance or the set of free functions. The base case is easy: When the current frame contains a local variable self, we probably have an instance method, and when it contains a cls variable, we have a class method.
def insert_class_or_function(module_name: str, func_name: str,
                             frame: FrameType) -> str:
    """ Insert the code object and return the name to print """
    if "self" in frame.f_locals or "cls" in frame.f_locals:
        return insert_class_or_instance_function(module_name,
                                                 func_name, frame)
    # ...


def insert_class_or_instance_function(module_name: str,
                                      func_name: str,
                                      frame: FrameType) -> str:
    """
    Insert the code object of an instance or class function and
    return the name to print
    """
    class_name = ""
    if "self" in frame.f_locals:
        # instance methods
        class_name = frame.f_locals["self"].__class__.__name__
    elif "cls" in frame.f_locals:
        # class method
        class_name = frame.f_locals["cls"].__name__
        # we prefix the class method name with "<class>"
        func_name = "<class>" + func_name
    # add the module name to class name
    class_name = module_name + "." + class_name
    # get_class_info returns the ClassInfo stored in used_classes
    get_class_info(class_name).used_methods.add(func_name)
    # return the string to print in the class tree
    return class_name + "." + func_name
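The word "probably" in the paragraph above is deliberate: the check only looks at argument names. The following sketch (all names are made up for this example) classifies calls the same way and shows that a free function with a parameter called self is mistaken for an instance method.

```python
import sys
from types import FrameType
from typing import Callable, List, Tuple


def classify_calls(func: Callable[[], None]) -> List[Tuple[str, str]]:
    """Run func under a tracer and classify every call by its local names."""
    seen: List[Tuple[str, str]] = []

    def handler(frame: FrameType, event: str, arg):
        if event == "call":
            # at the call event, f_locals already contains the arguments
            if "self" in frame.f_locals:
                kind = "instance method"
            elif "cls" in frame.f_locals:
                kind = "class method"
            else:
                kind = "other"
            seen.append((frame.f_code.co_name, kind))
        return handler

    previous = sys.gettrace()
    sys.settrace(handler)
    try:
        func()
    finally:
        sys.settrace(previous)
    return seen


class Sample:
    def method(self):
        pass

    @classmethod
    def create(cls):
        pass


def visit(self):
    # free function that merely names its parameter "self"
    pass


def run():
    Sample().method()
    Sample.create()
    visit(None)


for name, kind in classify_calls(run):
    print(f"{name}: {kind}")
# run: other
# method: instance method
# create: class method
# visit: instance method
```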
But how about the other three cases? We use the header line of a method to distinguish between them:
from enum import Enum
from pathlib import Path
from types import CodeType


class StaticFunctionType(Enum):
    INIT = 1
    """ static init """
    STATIC = 2
    """ static function """
    FREE = 3
    """ free function, not related to a class """


def get_static_type(code: CodeType) -> StaticFunctionType:
    file_lines = Path(code.co_filename).read_text().split("\n")
    # co_firstlineno is 1-based; for a decorated function it
    # points at the first decorator line
    line = code.co_firstlineno
    header_line = file_lines[line - 1]
    if "class " in header_line:
        # e.g. "class TestClass"
        return StaticFunctionType.INIT
    if "@staticmethod" in header_line:
        return StaticFunctionType.STATIC
    return StaticFunctionType.FREE
These are, of course, just approximations, but they work well enough for a small utility used for exploration.
If you know any other way that doesn’t involve using the Python AST, feel free to post in a comment below.
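One possible answer to this question, offered as a sketch and not as what trace.py does: in CPython, code objects compiled from function bodies carry the CO_NEWLOCALS flag, while class bodies and module-level code do not. This is a CPython implementation detail, so treat it as an assumption. The name is_class_body is made up for this example.

```python
import inspect
from types import CodeType


def is_class_body(code: CodeType) -> bool:
    """Heuristic: neither a function body (CO_NEWLOCALS) nor module code."""
    is_function = bool(code.co_flags & inspect.CO_NEWLOCALS)
    return not is_function and code.co_name != "<module>"


SOURCE = '''
class TestClass:
    x = 100

def free_function():
    pass
'''

# compile the sample and pick the nested code objects out of the constants
module_code = compile(SOURCE, "<sample>", "exec")
nested = {const.co_name: const
          for const in module_code.co_consts
          if isinstance(const, CodeType)}

for name, code in nested.items():
    print(name, is_class_body(code))
```

This only separates class bodies from functions. Telling a static method apart from a free function would still need something else, for example the qualified name of the code object.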
Using the get_static_type function, we can now finish the insert_class_or_function function:
def insert_class_or_function(module_name: str, func_name: str,
                             frame: FrameType) -> str:
    """ Insert the code object and return the name to print """
    if "self" in frame.f_locals or "cls" in frame.f_locals:
        return insert_class_or_instance_function(module_name,
                                                 func_name, frame)
    # get the type of the current code object
    t = get_static_type(frame.f_code)
    if t == StaticFunctionType.INIT:
        # static initializer, the top level class code
        # func_name is actually the class name here,
        # but classes are technically also callable function
        # objects
        class_name = module_name + "." + func_name
        get_class_info(class_name).used_methods.add(STATIC_INIT)
        return class_name + "." + STATIC_INIT
    elif t == StaticFunctionType.STATIC:
        # @staticmethod
        # the qualname is in our example TestClass.static_method,
        # so we have to drop the last part of the name to get
        # the class name (co_qualname requires Python 3.11+)
        class_name = module_name + "." + frame.f_code.co_qualname[
            :-len(func_name) - 1]
        # we prefix static method names with "<static>"
        func_name = "<static>" + func_name
        get_class_info(class_name).used_methods.add(func_name)
        return class_name + "." + func_name
    free_functions.add(frame.f_code.co_name)
    return module_name + "." + func_name
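The slicing in the static method branch is easy to check in isolation. The sketch below uses the __qualname__ attribute of the function object, which holds the same kind of dotted name, so it also runs on Python versions before 3.11. The helper name class_name_from_qualname and the classes are made up for this example.

```python
def class_name_from_qualname(qualname: str, func_name: str) -> str:
    """Drop the trailing '.<func_name>' from a qualified name."""
    return qualname[:-len(func_name) - 1]


class Outer:
    class Inner:
        @staticmethod
        def static_method():
            pass


func = Outer.Inner.static_method
print(func.__qualname__)
# Outer.Inner.static_method
print(class_name_from_qualname(func.__qualname__, func.__name__))
# Outer.Inner
```

As the example shows, nested classes keep their full dotted path.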
The final thing left to do is to register a teardown handler to print the collected information on exit:
import atexit


def teardown():
    """ Teardown the tracer and print the results """
    sys.settrace(None)
    log("********** Trace Results **********")
    print_info()


# trigger teardown on exit
atexit.register(teardown)
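The print_info function is not shown in this post. As a rough idea of what it has to do, here is a hypothetical stand-in that builds the report lines from the collected data. It uses a plain dictionary from class name to method names instead of ClassInfo objects, and the indentation widths are an assumption.

```python
from typing import Dict, List, Set

STATIC_INIT = "<static init>"


def format_info(used_classes: Dict[str, Set[str]],
                free_functions: Set[str]) -> List[str]:
    """Hypothetical stand-in for print_info: return the report lines."""
    lines = ["Used classes:"]
    groups = (
        ("only static init:", lambda methods: methods == {STATIC_INIT}),
        ("not only static init:", lambda methods: methods != {STATIC_INIT}),
    )
    for title, belongs in groups:
        lines.append("  " + title)
        for name in sorted(used_classes):
            methods = used_classes[name]
            if belongs(methods):
                lines.append("    " + name)
                lines.extend("      " + m for m in sorted(methods))
    lines.append("Free functions:")
    lines.extend("  " + f for f in sorted(free_functions))
    return lines


report = format_info(
    {"__main__.TestClass": {STATIC_INIT, "__init__", "instance_method"},
     "__main__.Unused": {STATIC_INIT}},
    {"log", "free_function"},
)
print("\n".join(report))
```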
Usage
We now prefix our sample program from the beginning with
import trace
trace.setup(r"__main__")
to collect all information for the __main__ module, which is the file passed directly to the Python interpreter.
We also append some code to our program that calls all methods and functions:
def all_methods():
    log("all methods")
    TestClass().instance_method()
    TestClass.static_method()
    TestClass.class_method()
    free_function()


all_methods()
Our utility library then prints the following upon execution:
standard error:
__main__.TestClass.<static init>
__main__.all_methods
  __main__.log
  __main__.TestClass.__init__
    __main__.log
  __main__.TestClass.instance_method
    __main__.log
  __main__.TestClass.<static>static_method
    __main__.log
  __main__.TestClass.<class>class_method
    __main__.log
  __main__.free_function
    __main__.log
********** Trace Results **********
Used classes:
only static init:
not only static init:
__main__.TestClass
<class>class_method
<static init>
<static>static_method
__init__
instance_method
Free functions:
all_methods
free_function
log
standard output:
all methods
instance initializer
instance method
static method
class method
free function
Conclusion
This small utility uses the power of sys.settrace (and some string processing) to find a module’s used classes, methods, and functions and the call tree. The utility is pretty helpful when trying to grasp the inner structure of a module and the module entities used transitively by your own application code.
I published this code under the MIT license on GitHub, so feel free to improve, extend, and modify it. Come back in a few weeks to see why I actually developed this utility…
This article is part of my work in the SapMachine team at SAP, making profiling and debugging easier for everyone.
Johannes Bechberger is a JVM developer working on profilers and their underlying technology in the SapMachine team at SAP. This includes improvements to async-profiler and its ecosystem, a website to view the different JFR event types, and improvements to the FirefoxProfiler, making it usable in the Java world. He started at SAP in 2022 after two years of research studies at the KIT in the field of Java security analyses. His work today is comprised of many open-source contributions and his blog, where he writes regularly on in-depth profiling and debugging topics, and of working on his JEP Candidate 435 to add a new profiling API to the OpenJDK.