Introduction
Python code objects are rarely seen directly, but they are a crucial part of how Python runs your code. A code object contains all the information needed to execute a piece of code, and compiled .pyc files are essentially serialized code objects. Dan Crosta explains the concept well in his blog post, so I recommend reading that for more information.
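As a rough illustration of the link between code objects and .pyc files, the sketch below serializes a code object with the standard marshal module and loads it back. The source string and variable names are made up for this example, and a real .pyc file also has a header in front of the marshalled data, which is skipped here.

```python
import marshal

# Compile a tiny, made-up module into a code object.
code = compile("result = 6 * 7", "<example>", "exec")

# Serialize it to bytes; a .pyc file stores data like this after its header.
data = marshal.dumps(code)

# Load it back and run it in a fresh namespace.
restored = marshal.loads(data)
namespace = {}
exec(restored, namespace)
print(namespace["result"])  # 42
```

Note that marshal data is specific to the Python version that produced it, so this round trip is only expected to work within one interpreter version.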
Attributes
A code object contains the bytecode of your script and more.
    co_argcount
    co_cellvars
    co_code
    co_consts
    co_filename
    co_firstlineno
    co_flags
    co_freevars
    co_kwonlyargcount
    co_lnotab
    co_name
    co_names
    co_nlocals
    co_posonlyargcount
    co_stacksize
    co_varnames
These are all the attributes in Python 3.8. I will go over most of them one by one.
To get the code object from a function, we use its __code__ attribute.

    def foo():
        print("bar")

    print(foo.__code__)

Output:

    <code object foo at 0x000002473D20F940, file "<string>", line 4>
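To see which attributes your own interpreter provides, a small sketch like the following can help. It simply filters dir() for names that start with co_. Newer Python versions will list a few more names than the 3.8 list above, including some methods.

```python
def foo():
    print("bar")

# Collect every attribute of the code object whose name starts with "co_".
attributes = [name for name in dir(foo.__code__) if name.startswith("co_")]
for name in attributes:
    print(name)
```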
co_argcount
co_argcount is, as the name suggests, the number of positional arguments the function can take. Arguments with default values are also counted.

    def foo(bar, fizz="buzz"):
        pass

    print(foo.__code__.co_argcount)  # 2

Arguments after an asterisk (keyword-only arguments) are not counted.

    def foo(bar, *, fizz="buzz"):
        pass

    print(foo.__code__.co_argcount)  # 1
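The attribute list also contains co_posonlyargcount and co_kwonlyargcount, which count the other kinds of parameters. The sketch below uses a made-up signature to show how the three counters relate. It needs Python 3.8 or newer because of the / syntax.

```python
def foo(a, /, b, c=3, *args, d, e=5, **kwargs):
    pass

code = foo.__code__
print(code.co_posonlyargcount)  # 1: only a
print(code.co_argcount)         # 3: a, b and c (positional-only ones included)
print(code.co_kwonlyargcount)   # 2: d and e
```

Neither *args nor **kwargs is counted by any of the three. Their presence is recorded in co_flags instead.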
co_cellvars
co_cellvars is not important for reversing purposes. It is a tuple of the names of local variables that are referenced by nested functions. In this case, bar is a local variable of foo, and the nested function fizz references it.

    def foo(bar):
        def fizz():
            return bar

    print(foo.__code__.co_cellvars)  # ('bar',)
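The counterpart of co_cellvars is co_freevars, which is set on the nested function's code object. As a small sketch, we can pull the code object of fizz out of the constants of foo and compare the two tuples. The variable names outer and inner are just for illustration.

```python
import types

def foo(bar):
    def fizz():
        return bar

outer = foo.__code__
# The code object of the nested function is stored as a constant of foo.
inner = next(c for c in outer.co_consts if isinstance(c, types.CodeType))

print(outer.co_cellvars)  # ('bar',)
print(inner.co_name)      # fizz
print(inner.co_freevars)  # ('bar',)
```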
co_code
co_code is the most important attribute of them all: it contains the bytecode itself, stored as a bytes object. Later on we will look at it more closely and modify it.

    def foo():
        print("bar")

    print(foo.__code__.co_code)

Output:

    b't\x00d\x01\x83\x01\x01\x00d\x00S\x00'
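The raw bytes are hard to read, so a quick way to make sense of them is the standard dis module. The sketch below decodes co_code into named instructions. The exact opcodes you get depend on your Python version, so no output is shown here.

```python
import dis

def foo():
    print("bar")

# dis.get_instructions decodes co_code into Instruction objects.
instructions = list(dis.get_instructions(foo.__code__))
for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)
```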
co_consts
co_consts might be the second most important attribute. As the name implies, it contains all the constants, also referred to as literals. These can be strings, integers, tuples, and other code objects, and maybe even other types that I haven't stumbled upon.

    def foo():
        print("bar", 1)
        def fizz():
            pass

    print(foo.__code__.co_consts)

Output:

    (None, 'bar', 1,
     <code object fizz at 0x000002492F8E98F0,
      file "co_consts.py",
      line 3>,
     'foo.<locals>.fizz')
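Because nested functions live inside co_consts as code objects, you can reach every code object of a program by walking the constants recursively. The helper below is a minimal sketch of that idea. The name iter_code_objects and the sample source are made up for this example.

```python
import types

def iter_code_objects(code):
    """Yield code and every code object nested in its constants."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)

source = """
def outer():
    def inner():
        pass
"""
module_code = compile(source, "<example>", "exec")
names = [c.co_name for c in iter_code_objects(module_code)]
print(names)  # ['<module>', 'outer', 'inner']
```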
co_filename
co_filename is the name of the file from which the code object was compiled.

    def foo():
        pass

    print(foo.__code__.co_filename)

You can also compile code objects yourself, in which case you pass the filename yourself, and it can be anything. By convention, <string> is used for code that does not come from a file.

    code = compile("1+1", "<string>", "exec")
    print(code.co_filename)  # <string>
co_firstlineno
co_firstlineno is the first line number of the function.

    def foo():
        pass

    print(foo.__code__.co_firstlineno)
co_flags
co_flags contains some more information about the code object, for example whether it is optimized. The flags are packed into a single integer, one bit per flag, so multiple flags can be extracted from one value.
PyArmor also uses co_flags: it sets a custom flag to mark whether a code object is still encrypted or has already been decrypted.

    def foo():
        pass

    print(foo.__code__.co_flags)

You can use the pretty_flags function from dis to view the actual meaning of the flags.

    import dis
    print(dis.pretty_flags(foo.__code__.co_flags))

Output:

    OPTIMIZED, NEWLOCALS, NOFREE
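If you want to test for one specific flag rather than print them all, you can mask co_flags with the CO_* constants from the inspect module. The helper has_flag and the two sample functions below are hypothetical names used only for this sketch.

```python
import inspect

def plain():
    pass

def gen(*args):
    yield

def has_flag(func, flag):
    """Return True if the given CO_* bit is set in the function's co_flags."""
    return bool(func.__code__.co_flags & flag)

print(has_flag(plain, inspect.CO_GENERATOR))  # False
print(has_flag(gen, inspect.CO_GENERATOR))    # True
print(has_flag(gen, inspect.CO_VARARGS))      # True
```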
co_lnotab
co_lnotab is used to convert a bytecode offset to the corresponding line number. It is also not important for reversing.

    def foo():
        pass

    print(foo.__code__.co_lnotab)
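The table itself is stored in a compact binary format, and newer Python versions deprecate co_lnotab in favour of other attributes. A version-independent way to get the same mapping is dis.findlinestarts, sketched below on a made-up two-line function. The offsets differ between Python versions, so none are shown.

```python
import dis

source = "def foo():\n    x = 1\n    return x\n"
namespace = {}
exec(compile(source, "<example>", "exec"), namespace)
code = namespace["foo"].__code__

# Each pair is (bytecode offset, line number whose code starts at that offset).
line_starts = list(dis.findlinestarts(code))
for offset, line in line_starts:
    print(offset, line)
```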
co_name
co_name is the name of the code object. In the case of a function, this is the function's name.

    def foo():
        pass

    print(foo.__code__.co_name)  # foo
co_names
co_names is a tuple of all the non-local names used in the bytecode. This goes from global variable names to function names, even built-in functions like print.

    def foo():
        print(a)

    print(foo.__code__.co_names)  # ('print', 'a')
co_nlocals
co_nlocals is the number of local variables. Arguments are also counted:

    def foo(bar):
        return bar

    print(foo.__code__.co_nlocals)  # 1

This is the case even when the argument is never used:

    def foo(bar):
        pass

    print(foo.__code__.co_nlocals)  # 1
co_stacksize
co_stacksize is the maximum stack depth that the code needs while it runs.

    def foo():
        return 1+1

    print(foo.__code__.co_stacksize)
co_varnames
co_varnames is a tuple of all the local variable names, arguments included.

    def foo(bar):
        return fizz

    # fizz is never assigned inside foo, so it is not a local variable
    print(foo.__code__.co_varnames)  # ('bar',)
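The difference between co_varnames and co_names is easy to mix up, so here is a small side-by-side sketch. The function body is made up for illustration: bar and baz are locals, while len and fizz are looked up as global or built-in names.

```python
def foo(bar):
    baz = len(bar)
    return baz + fizz  # fizz is never assigned here, so it is a global name

code = foo.__code__
print(code.co_varnames)  # ('bar', 'baz')
print(code.co_names)     # ('len', 'fizz')
print(code.co_nlocals)   # 2
```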
Why
A thorough understanding of code objects is essential for Python reversing, because most good obfuscators depend on them. Take PyArmor, a popular obfuscator, which has documented how it does its obfuscation. It encrypts the code objects, and at runtime it calls a function in its .dll (or .pyd) file that decrypts the code object and continues the code. After the code is done, it encrypts the code object again, so no code object is left unencrypted in memory. One of the next posts will be about PyArmor, where we will go more in-depth.
Patching!
We will now modify a code object, which is necessary if we want to patch a program, for example. Let's take this really simple authentication system: it takes a password and checks whether it is correct.

    password = input("Password: ")

    if password == "svenskithesource":
        print("Correct!")
    else:
        print("Wrong!")
The disassembled bytecode looks like this:
      1           0 LOAD_NAME                0 (input)
                  2 LOAD_CONST               0 ('Password: ')
                  4 CALL_FUNCTION            1
                  6 STORE_NAME               1 (password)

      3           8 LOAD_NAME                1 (password)
                 10 LOAD_CONST               1 ('svenskithesource')
                 12 COMPARE_OP               2 (==)
                 14 POP_JUMP_IF_FALSE       26

      4          16 LOAD_NAME                2 (print)
                 18 LOAD_CONST               2 ('Correct!')
                 20 CALL_FUNCTION            1
                 22 POP_TOP
                 24 JUMP_FORWARD             8 (to 34)

      6     >>   26 LOAD_NAME                2 (print)
                 28 LOAD_CONST               3 ('Wrong!')
                 30 CALL_FUNCTION            1
                 32 POP_TOP
            >>   34 LOAD_CONST               4 (None)
                 36 RETURN_VALUE
In this case, the POP_JUMP_IF_FALSE opcode at offset 14 pops the top element of the stack, which is the result of comparing the input password with the correct password. If the comparison is false, the code jumps to offset 26. We could change this opcode to POP_JUMP_IF_TRUE, but that only inverts the check, so the correct password would then be rejected. It is better to use the POP_TOP opcode instead: it removes the comparison result from the stack and never jumps, so the code always continues into the "Correct!" branch.
To make the changes, we can use the opcode library to look up the values of the opcodes by name, which is more portable across Python versions than hardcoding numbers. Since Python 3.6, every instruction is 2 bytes long: one byte for the opcode and one byte for its argument. To make working with bytecode easier, we can convert the co_code attribute, which holds the bytecode, to a bytearray. Once we're finished, we can convert it back to raw bytes.
    import opcode

    print(opcode.opname)  # list indexed by opcode number
    print(opcode.opmap)   # dict from opcode name to number

    code = compile("""password = input("Password: ")
    if password == "svenskithesource":
        print("Correct!")
    else:
        print("Wrong!")""", "<string>", "exec")

    co_code = bytearray(code.co_code)

    # Every instruction is two bytes: the opcode, then its argument.
    for i in range(0, len(co_code), 2):
        op = co_code[i]
        arg = co_code[i+1]
        print(opcode.opname[op], arg)

Output:

    ['<0>', 'POP_TOP', 'ROT_TWO', 'ROT_THREE', 'DUP_TOP', 'DUP_TOP_TWO', 'ROT_FOUR', '<7>', '<8>', 'NOP', 'UNARY_POSITIVE', ...]
    {'POP_TOP': 1, 'ROT_TWO': 2, 'ROT_THREE': 3, 'DUP_TOP': 4, 'DUP_TOP_TWO': 5, 'ROT_FOUR': 6, 'NOP': 9, 'UNARY_POSITIVE': 10, ...}
    LOAD_NAME 0
    LOAD_CONST 0
    CALL_FUNCTION 1
    STORE_NAME 1
    LOAD_NAME 1
    LOAD_CONST 1
    COMPARE_OP 2
    POP_JUMP_IF_FALSE 26
    LOAD_NAME 2
    LOAD_CONST 2
    CALL_FUNCTION 1
    POP_TOP 0
    JUMP_FORWARD 8
    LOAD_NAME 2
    LOAD_CONST 3
    CALL_FUNCTION 1
    POP_TOP 0
    LOAD_CONST 4
    RETURN_VALUE 0
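To determine whether an opcode takes an argument, we can check whether its value is at or above opcode.HAVE_ARGUMENT, which is 90 in Python 3.8. Opcodes below that value, like POP_TOP, ignore their argument byte. The Python developers designed it this way to make it easy to determine whether an opcode takes an argument.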
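As a minimal sketch of that rule, the helper below walks co_code two bytes at a time and only keeps the argument when the opcode is at or above HAVE_ARGUMENT. The name decode is made up for this example, and the sketch deliberately ignores EXTENDED_ARG as well as the inline cache entries that newer Python versions place between instructions.

```python
import opcode

def decode(code):
    """Split co_code into (opname, argument) pairs, two bytes at a time."""
    raw = code.co_code
    pairs = []
    for i in range(0, len(raw), 2):
        op, arg = raw[i], raw[i + 1]
        # Opcodes below HAVE_ARGUMENT ignore their argument byte.
        pairs.append((opcode.opname[op], arg if op >= opcode.HAVE_ARGUMENT else None))
    return pairs

code = compile("x = 1", "<string>", "exec")
pairs = decode(code)
for name, arg in pairs:
    print(name, arg)
```

For real work, dis.get_instructions is the safer choice because it already handles those details.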
In the disassembly of the bytecode, we can see that the jump is at offset 14. To modify it, we change the value in the bytearray at that index. Once we're finished, we convert the bytearray back to bytes. Code objects are immutable, so instead of assigning to co_code we use the replace method (available since Python 3.8) to create a new code object with the modified bytecode.

    code = code.replace(co_code=bytes(co_code))
    exec(code)
If we run the code now, we will find that every password is considered valid. This is because the patched code always continues into the "Correct!" branch, regardless of the result of the password comparison.

    import opcode

    code = compile("""password = input("Password: ")
    if password == "svenskithesource":
        print("Correct!")
    else:
        print("Wrong!")""", "<string>", "exec")

    co_code = bytearray(code.co_code)
    # Offset 14 holds POP_JUMP_IF_FALSE in the Python 3.8 disassembly above.
    co_code[14] = opcode.opmap["POP_TOP"]

    code = code.replace(co_code=bytes(co_code))
    exec(code)

Output:

    Password: random password
    Correct!
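The hardcoded offset 14 only matches the Python 3.8 bytecode shown above. A slightly more robust approach is to search for the jump instead of hardcoding its position. The sketch below only locates candidate offsets and does not patch anything. The helper name find_false_jumps is made up, and matching on the opcode name is an assumption, since the name of the conditional jump differs between Python versions.

```python
import dis

source = '''password = input("Password: ")
if password == "svenskithesource":
    print("Correct!")
else:
    print("Wrong!")'''
code = compile(source, "<string>", "exec")

def find_false_jumps(code):
    """Return the offsets of conditional jumps that are taken on a false value."""
    return [
        ins.offset
        for ins in dis.get_instructions(code)
        if ins.opname.startswith("POP_JUMP") and ins.opname.endswith("IF_FALSE")
    ]

jump_offsets = find_false_jumps(code)
print(jump_offsets)
```

Keep in mind that on newer Python versions an instruction may be followed by inline cache entries, so simply overwriting one byte as we did above is not guaranteed to produce valid bytecode there.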