Code Objects

Introduction

Python code objects are rarely seen directly, but they are a crucial part of how Python runs your code. A code object holds everything needed to execute a piece of code, and compiled .pyc files are essentially serialized code objects. Dan Crosta explains the concept well in his blog post, so I recommend reading that for more information.
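To make the link with .pyc files a bit more concrete, here is a minimal sketch that serializes a code object with marshal (the format used for the code object stored inside a .pyc file), loads it back, and runs it. The source string and the `<example>` filename are just placeholders I picked for illustration.

```python
import marshal

source = "result = 6 * 7"
code = compile(source, "<example>", "exec")

# Serialize the code object to bytes and load it back again.
blob = marshal.dumps(code)
restored = marshal.loads(blob)

# The restored code object can be executed like the original one.
namespace = {}
exec(restored, namespace)
print(namespace["result"])
```

A real .pyc file also starts with a small header before the marshalled data, which this sketch leaves out.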

Attributes

A code object contains the bytecode of your script and more.

co_argcount
co_cellvars
co_code
co_consts
co_filename
co_firstlineno
co_flags
co_freevars
co_kwonlyargcount
co_lnotab
co_name
co_names
co_nlocals
co_posonlyargcount
co_stacksize
co_varnames

These are all the attributes as of Python 3.8. I will go over the most relevant ones.

To get the code object from a function we use the __code__ attribute.

def foo():
    print("bar")

print(foo.__code__)

<code object foo at 0x000002473D20F940, file "<string>", line 4>

co_argcount

co_argcount speaks for itself: it's the number of positional arguments the function accepts, including arguments with default values. Keyword-only arguments (those after a bare asterisk) are not counted, and neither are *args and **kwargs.

def foo(bar, fizz="buzz"):
    pass

print(foo.__code__.co_argcount)

2

def foo(bar, *, fizz="buzz"):
    pass

print(foo.__code__.co_argcount)

1
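The attribute list also contains co_posonlyargcount and co_kwonlyargcount, which count the other kinds of parameters. Here is a small sketch showing the three counters side by side; the function signature is made up for illustration, and the positional-only `/` syntax needs Python 3.8 or newer.

```python
def foo(a, b, /, c, d=1, *args, e, f=2, **kwargs):
    pass

code = foo.__code__

# Positional-only parameters: a, b
print(code.co_posonlyargcount)

# All positional parameters, positional-only ones included: a, b, c, d
print(code.co_argcount)

# Keyword-only parameters: e, f
print(code.co_kwonlyargcount)
```

Note that *args and **kwargs are not part of any of these counts.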

co_cellvars

co_cellvars is not important for reversing purposes. It's a tuple of the local variables that are referenced by nested functions. In this case, bar is a local variable of foo, and the nested function fizz references it.

def foo(bar):
    def fizz():
        return bar

print(foo.__code__.co_cellvars)

('bar',)
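The counterpart of co_cellvars is co_freevars, which lives on the nested function's code object and lists the names it takes from the enclosing scope. A minimal sketch, assuming we change foo so that it returns fizz and we can get hold of the inner function:

```python
def foo(bar):
    def fizz():
        return bar
    return fizz

inner = foo("hello")

# Names that foo shares with its nested functions.
print(foo.__code__.co_cellvars)

# Names that fizz takes from the enclosing function.
print(inner.__code__.co_freevars)

# The actual value is stored in a cell on the function object.
print(inner.__closure__[0].cell_contents)
```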

co_code

co_code is the most important one of them all: it contains the actual bytecode, stored as a bytes object. Later on we'll take a closer look at it and modify it.

def foo():
    print("bar")

print(foo.__code__.co_code)

b't\x00d\x01\x83\x01\x01\x00d\x00S\x00'
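The raw bytes are hard to read, so it helps to let the dis module decode them. This is only a quick sketch; the exact instructions you get depend on the Python version you run it on.

```python
import dis

def foo():
    print("bar")

# Each Instruction carries the offset, the opcode name and a readable argument.
for ins in dis.get_instructions(foo):
    print(ins.offset, ins.opname, ins.argrepr)
```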


co_consts

co_consts might be the second most important. As the name implies, it contains all the constants (also referred to as literals) used by the code. These can be strings, integers, tuples, and other code objects, and possibly other types that I haven't stumbled upon.

def foo():
    print("bar", 1)
    def fizz():
        pass

print(foo.__code__.co_consts)

(None, 'bar', 1,
<code object fizz at 0x000002492F8E98F0,
        file "co_consts.py",
        line 3>,
'foo.<locals>.fizz')
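Because nested functions are stored as code objects inside co_consts, you can reach every code object in a script by walking co_consts recursively. Here is a minimal sketch of that idea; iter_code_objects is a helper name I made up, not something from the standard library.

```python
import types

def iter_code_objects(code):
    """Yield the given code object and every code object nested inside it."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)

source = '''
def foo():
    print("bar", 1)
    def fizz():
        pass
'''

module_code = compile(source, "<example>", "exec")
names = [c.co_name for c in iter_code_objects(module_code)]
print(names)
```

This kind of walk is handy when you want to inspect or patch every function in a compiled file.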


co_filename

co_filename is the name of the file from which the code object was compiled. When you compile a code object yourself, you pass the filename yourself, and it can be anything. By convention, <string> is used when the source doesn't come from a file.

def foo():
    pass

print(foo.__code__.co_filename)

C:\path\to\my\file.py

code = compile("1+1", "<string>", "exec")
print(code.co_filename)

<string>

co_firstlineno

co_firstlineno is the line number on which the source of the code object starts. For a function, that is the line of its def statement.

def foo():
    pass

print(foo.__code__.co_firstlineno)

1

co_flags

co_flags contains some more information about the code object, for example whether it is optimized. It is a single integer in which each bit is a separate flag. You can use the pretty_flags function from dis to see which flags are set.
PyArmor also sets a custom flag here to track whether a code object is still encrypted or has already been decrypted.

def foo():
    pass

print(foo.__code__.co_flags)

67

import dis

print(dis.pretty_flags(foo.__code__.co_flags))

OPTIMIZED, NEWLOCALS, NOFREE
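If you want to check one specific flag instead of printing all of them, you can test the bit yourself with the CO_* constants from the inspect module. A minimal sketch, with has_flag and the three example functions being names I made up for illustration:

```python
import inspect

def plain():
    pass

def gen():
    yield 1

def variadic(*args, **kwargs):
    pass

def has_flag(func, flag):
    """Return True if the given CO_* bit is set on the function's code object."""
    return bool(func.__code__.co_flags & flag)

print(has_flag(plain, inspect.CO_GENERATOR))
print(has_flag(gen, inspect.CO_GENERATOR))
print(has_flag(variadic, inspect.CO_VARARGS))
print(has_flag(variadic, inspect.CO_VARKEYWORDS))
```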

co_lnotab

co_lnotab maps bytecode offsets to the corresponding line numbers. It's also not important for reversing.

def foo():
    pass

print(foo.__code__.co_lnotab)

b'\x00\x01'
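Decoding co_lnotab by hand is rarely needed, because dis.findlinestarts gives you the same offset-to-line mapping in a readable form. A small sketch with a made-up three-line source; the exact offsets depend on the Python version.

```python
import dis

source = "a = 1\n\nb = 2\n"
code = compile(source, "<example>", "exec")

# Each pair is (bytecode offset, source line that starts at that offset).
line_starts = list(dis.findlinestarts(code))
for offset, line in line_starts:
    print(offset, line)
```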

co_name

co_name is the name of the code object. In the case of a function, this is the function's name.

def foo():
    pass

print(foo.__code__.co_name)

foo

co_names

co_names contains the names used in the bytecode other than arguments and local variables: global variables, functions, attributes, and even built-in functions like print.

def foo():
    print(a)

print(foo.__code__.co_names)

('print', 'a')

co_nlocals

co_nlocals is the number of local variables. Arguments are counted as well.

def foo(bar):
    return bar

print(foo.__code__.co_nlocals)

1

def foo(bar):
    pass

print(foo.__code__.co_nlocals)

1

co_stacksize

co_stacksize is the maximum stack depth the code needs while it executes.

def foo():
    return 1+1

print(foo.__code__.co_stacksize)

1

co_varnames

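co_varnames contains the names of all local variables, starting with the arguments. In the example, fizz is not a local variable, so it doesn't show up.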

def foo(bar):
    return fizz

print(foo.__code__.co_varnames)

('bar',)
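Since co_names, co_varnames and co_nlocals are easy to mix up, here is a small sketch that shows them together on one function. The function body is made up for illustration; fizz is never defined, which is fine because foo is never called.

```python
def foo(bar):
    baz = len(bar)
    return baz + fizz

code = foo.__code__

# Arguments first, then the other local variables.
print(code.co_varnames)

# Everything that is looked up outside the local scope.
print(code.co_names)

# The number of entries in co_varnames that are locals.
print(code.co_nlocals)
```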

Why

Having a thorough understanding of code objects is essential for Python reversing, because most good obfuscators rely on them. Take PyArmor, a popular obfuscator, whose documentation describes how its obfuscation works. It encrypts the code objects, and at runtime it calls a function in its .dll (or .pyd) that decrypts the code object and continues execution. After the code is done, the code object is encrypted again, so no code object is left unencrypted in memory. In a future post, we will go more in-depth on PyArmor and its obfuscation techniques.

Patching!

Modifying a code object is necessary if, for example, we want to patch a program. Let's take this really simple authentication system: it takes a password and checks whether it is correct.

password = input("Password: ")

if password == "svenskithesource":
    print("Correct!")
else:
    print("Wrong!")

The disassembled bytecode looks like this:

  1           0 LOAD_NAME                0 (input)
              2 LOAD_CONST               0 ('Password: ')
              4 CALL_FUNCTION            1
              6 STORE_NAME               1 (password)

  3           8 LOAD_NAME                1 (password)
             10 LOAD_CONST               1 ('svenskithesource')
             12 COMPARE_OP               2 (==)
             14 POP_JUMP_IF_FALSE       26

  4          16 LOAD_NAME                2 (print)
             18 LOAD_CONST               2 ('Correct!')
             20 CALL_FUNCTION            1
             22 POP_TOP
             24 JUMP_FORWARD             8 (to 34)

  6     >>   26 LOAD_NAME                2 (print)
             28 LOAD_CONST               3 ('Wrong!')
             30 CALL_FUNCTION            1
             32 POP_TOP
        >>   34 LOAD_CONST               4 (None)
             36 RETURN_VALUE

In this case, the POP_JUMP_IF_FALSE opcode at offset 14 pops the top element of the stack, which is the result of comparing the input password with the correct password. If the comparison is false, the code jumps to offset 26. Swapping it for POP_JUMP_IF_TRUE would only invert the check, so wrong passwords would pass and the correct one would fail. What we want is to never jump, while still removing the comparison result from the stack. The POP_TOP opcode does exactly that: it removes the top element of the stack and execution continues with the next instruction.

To make the change, we can use the opcode module to look up the numeric values of the opcodes for the running interpreter, since these values can differ between Python versions. Since Python 3.6, every instruction is 2 bytes long: 1 byte for the opcode and 1 byte for its argument. To make working with bytecode easier, we can convert the co_code attribute, which holds the bytecode, to a bytearray. Once we're finished, we can convert it back to bytes.

import opcode

print(opcode.opname)
print(opcode.opmap)

code = compile("""password = input("Password: ")

if password == "svenskithesource":
    print("Correct!")
else:
    print("Wrong!")""", "<string>", "exec")

co_code = bytearray(code.co_code)

for i in range(0, len(co_code), 2):
    op = co_code[i]
    arg = co_code[i+1]

    print(opcode.opname[op], arg)

['<0>', 'POP_TOP', 'ROT_TWO', 'ROT_THREE', 'DUP_TOP', 'DUP_TOP_TWO', 'ROT_FOUR', '<7>', '<8>', 'NOP', 'UNARY_POSITIVE', ...]
{'POP_TOP': 1, 'ROT_TWO': 2, 'ROT_THREE': 3, 'DUP_TOP': 4, 'DUP_TOP_TWO': 5, 'ROT_FOUR': 6, 'NOP': 9, 'UNARY_POSITIVE': 10, ...}
LOAD_NAME 0
LOAD_CONST 0
CALL_FUNCTION 1
STORE_NAME 1
LOAD_NAME 1
LOAD_CONST 1
COMPARE_OP 2
POP_JUMP_IF_FALSE 26
LOAD_NAME 2
LOAD_CONST 2
CALL_FUNCTION 1
POP_TOP 0
JUMP_FORWARD 8
LOAD_NAME 2
LOAD_CONST 3
CALL_FUNCTION 1
POP_TOP 0
LOAD_CONST 4
RETURN_VALUE 0
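One thing the loop above glosses over: because an argument only gets one byte, values above 255 are built up with EXTENDED_ARG prefix instructions. Here is a minimal sketch of a decoder that handles this. The decode function is a name I made up, and the generated source with 300 assignments only exists to force arguments above 255. On Python 3.11 and newer, a simple 2-byte walk like this will also list inline CACHE entries as if they were instructions.

```python
import opcode

EXTENDED_ARG = opcode.opmap["EXTENDED_ARG"]

def decode(co_code):
    """Yield (offset, opname, arg) for each 2-byte instruction in co_code."""
    extended = 0
    for offset in range(0, len(co_code), 2):
        op, arg = co_code[offset], co_code[offset + 1]
        if op == EXTENDED_ARG:
            # Prefix instruction: its byte becomes the high bits of the next argument.
            extended = (extended | arg) << 8
            continue
        yield offset, opcode.opname[op], extended | arg
        extended = 0

# 300 distinct names, so the later STORE_NAME arguments no longer fit in one byte.
source = "\n".join(f"v{i} = {i}.5" for i in range(300))
code = compile(source, "<example>", "exec")

stores = [arg for _, name, arg in decode(code.co_code) if name == "STORE_NAME"]
print(stores[-1])
```

For opcodes that don't take an argument, the arg value in the result has no meaning and can be ignored.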

To determine whether an opcode takes an argument, we can check whether its value is at least opcode.HAVE_ARGUMENT (90 in Python 3.8). Opcodes below that value ignore their argument byte. The Python developers ordered the opcodes this way to make the check easy.
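As a quick sketch of that check, using the HAVE_ARGUMENT constant from the opcode module instead of a hard-coded number (takes_argument is a helper name I made up):

```python
import opcode

def takes_argument(op):
    """Return True if the opcode with this numeric value uses its argument byte."""
    return op >= opcode.HAVE_ARGUMENT

for name in ("POP_TOP", "RETURN_VALUE", "LOAD_CONST", "LOAD_NAME"):
    print(name, takes_argument(opcode.opmap[name]))
```

Using the constant keeps the check correct if the threshold differs in the Python version you are running.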

In the disassembly of the bytecode, we can see that the jump is at offset 14. To modify it, we change the value in the bytearray at that index. Once we're finished, we convert the bytearray back to bytes and create a new code object with the modified bytecode, since code objects themselves are immutable.

code = code.replace(co_code=bytes(co_code))
exec(code)

If we run the code now, we will find that every password will be considered valid. This is because we have modified the code to always continue regardless of the result of the password comparison.

import opcode

code = compile("""password = input("Password: ")

if password == "svenskithesource":
    print("Correct!")
else:
    print("Wrong!")""", "<string>", "exec")

co_code = bytearray(code.co_code)

co_code[14] = opcode.opmap["POP_TOP"]
code = code.replace(co_code=bytes(co_code))

exec(code)

Password: random password
Correct!
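Keep in mind that the hard-coded offset 14 only matches the bytecode layout of the Python version used here, since other versions lay out the instructions differently. A minimal sketch of how you could look up the offset of the conditional jump with dis instead; it only locates the instruction and doesn't patch anything. The name check is an assumption on my part to also catch variants such as POP_JUMP_FORWARD_IF_FALSE in Python 3.11.

```python
import dis

source = '''password = input("Password: ")

if password == "svenskithesource":
    print("Correct!")
else:
    print("Wrong!")'''

code = compile(source, "<string>", "exec")

# Collect every conditional jump that is taken when the comparison is false.
jumps = [
    (ins.offset, ins.opname)
    for ins in dis.get_instructions(code)
    if ins.opname.startswith("POP_JUMP") and ins.opname.endswith("IF_FALSE")
]
print(jumps)
```

Whether simply overwriting that byte with POP_TOP still works depends on the version, so always check the disassembly of the interpreter you are targeting first.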
This post is licensed under CC BY 4.0 by the author.