The inner workings of RTPy

This document will help you understand the memory layout of the RTPy VM and the decisions the RTPy compiler makes when generating bytecode.

The RTPy stack

The RTPy stack is the memory used to execute a program. It has a few distinct features and is best explained using the included image. The stack is divided into scopes and the heap. Scopes grow from the top while the heap grows from the bottom. Unless lists are allowed to grow dynamically, RTPy does not use heap memory and relies solely on the stack. 
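The two-ended layout described above can be sketched in C as follows. This is a minimal illustration, not the interpreter's actual code: the names Memory, memory_init, scope_alloc and heap_alloc and the size MEMORY_SIZE are hypothetical, and it assumes that "top" means the lowest address of the memory block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEMORY_SIZE 64u /* hypothetical size, chosen for illustration */

/* One memory block shared by the scopes (growing from the top)
 * and the heap (growing from the bottom). Hypothetical type. */
typedef struct {
    uint8_t bytes[MEMORY_SIZE];
    size_t scope_end;  /* index of the first free byte after the scopes */
    size_t heap_start; /* index of the first byte used by the heap */
} Memory;

void memory_init(Memory *m) {
    assert(m != NULL);
    m->scope_end = 0;
    m->heap_start = MEMORY_SIZE;
}

/* Reserve size bytes for a scope; NULL if that would run into the heap. */
uint8_t *scope_alloc(Memory *m, size_t size) {
    assert(m != NULL);
    if (size > m->heap_start - m->scope_end) {
        return NULL;
    }
    uint8_t *p = &m->bytes[m->scope_end];
    m->scope_end += size;
    return p;
}

/* Reserve size bytes on the heap; NULL if that would run into the scopes. */
uint8_t *heap_alloc(Memory *m, size_t size) {
    assert(m != NULL);
    if (size > m->heap_start - m->scope_end) {
        return NULL;
    }
    m->heap_start -= size;
    return &m->bytes[m->heap_start];
}
```

Both allocators check the same gap between the two regions, which is why enabling the heap is what makes an overflow possible at all: with the heap unused, the gap is known at compile time.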

Scopes

In the C interpreter, a scope is represented as follows:

struct Scope {
    Instruction *ptr_to_function; // code of the RTPy function this scope belongs to
    uint8_t *stack_ptr;           // location of this scope on the stack
};

The ptr_to_function field is fixed during execution and points to the code corresponding to an RTPy function. The stack_ptr is set to the end of stack memory when a function is called. Because we remember where each function starts on the stack, nested functions can access variables from their parent scope.

def a():
    var1 = 1
    def b():
        print(var1)
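A rough C sketch of how b() could reach var1 through the scope of a() is shown here. It is an assumption-laden illustration: the helper functions, the 4-byte size of var1 and the byte offsets are all hypothetical, and Instruction is left opaque because bytecode is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct Instruction Instruction; /* opaque: bytecode is not modelled here */

typedef struct {
    Instruction *ptr_to_function; /* code of the RTPy function */
    uint8_t *stack_ptr;           /* where this scope starts on the stack */
} Scope;

/* Hypothetical helper: write a 32-bit variable at a byte offset in a scope. */
void scope_store_u32(Scope *scope, uint32_t offset, uint32_t value) {
    assert(scope != NULL && scope->stack_ptr != NULL);
    memcpy(scope->stack_ptr + offset, &value, sizeof value);
}

/* Hypothetical helper: read a 32-bit variable at a byte offset in a scope. */
uint32_t scope_load_u32(const Scope *scope, uint32_t offset) {
    assert(scope != NULL && scope->stack_ptr != NULL);
    uint32_t value;
    memcpy(&value, scope->stack_ptr + offset, sizeof value);
    return value;
}

/* Models the example: a() stores var1 = 1, then b() reads var1
 * through the remembered start of a()'s scope. */
uint32_t run_nested_example(void) {
    static uint8_t stack[16];
    Scope scope_a = { NULL, &stack[0] }; /* a() owns bytes 0..3 (var1) */
    Scope scope_b = { NULL, &stack[4] }; /* b() starts after a()'s variables */
    (void)scope_b;                       /* b() has no variables of its own */

    scope_store_u32(&scope_a, 0, 1);     /* var1 = 1 */
    return scope_load_u32(&scope_a, 0);  /* the value print(var1) would see */
}
```

The point of the sketch is that b() needs nothing more than the parent's stack_ptr and a fixed offset to find var1.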

There are also five reserved scopes whose location on the stack (stack_ptr) is fixed during execution. These are the Registers, Global variables, Class templates, Literal values and Jump points scopes. Registers are used for arithmetic, literal values store any value that is hard-coded into the program, and jump points store the locations that if-else instructions can jump to.

The size of the stack is determined by statically analyzing the RTPy application. It is as small as possible while guaranteeing that there is no risk of a stack overflow.

Heap

The heap is located at the opposite end of the stack memory. Its size must be set by the programmer. Using the heap introduces the risk of a stack overflow, so use it carefully. The heap can only be used by lists, and only if this is enabled. See the section on lists to learn more.

Variable size, arithmetic and execution speed

The three are related by the fact that arithmetic operations require registers to operate on. Provided that the values are already in registers, an arithmetic operation is very fast. When, however, the operands are two variables stored on the stack, they must first be copied into registers before the operation can be executed. This takes significantly longer.

Since RTPy is a register-based machine, any variable on the stack can be treated as a register. This only holds, however, if the variable is as big as a register. For example, on a 32-bit controller where the register size is also 32 bits, two uint16_t variables must first be moved into registers to perform an ADD operation. A uint32_t variable does not have to be moved; the ADD operation can read it directly from stack memory.
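The difference between the two cases can be illustrated with a small C model. This is only a sketch under assumptions: the Machine type, its copy counter and both function names are hypothetical and exist solely to make the extra copy step visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Reg; /* register size on the assumed 32-bit controller */

/* Hypothetical machine model with a counter for stack-to-register copies. */
typedef struct {
    Reg registers[2];
    unsigned copies;
} Machine;

/* ADD on two register-sized stack variables: operands are read in place. */
Reg add_u32(Machine *vm, const Reg *a, const Reg *b) {
    assert(vm != NULL && a != NULL && b != NULL);
    return *a + *b;
}

/* ADD on two uint16_t stack variables: each operand is first
 * widened into a register, then the registers are added. */
Reg add_u16(Machine *vm, const uint16_t *a, const uint16_t *b) {
    assert(vm != NULL && a != NULL && b != NULL);
    vm->registers[0] = *a;
    vm->copies++;
    vm->registers[1] = *b;
    vm->copies++;
    return vm->registers[0] + vm->registers[1];
}
```

In this model the uint16_t addition costs two extra copies per operation, which is the overhead the paragraph above refers to.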

In RTPy, using the built-in types such as int and float ensures that a variable has the same size as an RTPy register on the target controller. Smaller types such as uint8_t are recommended inside lists, where the memory footprint can grow rapidly. Of course, this trade-off between execution speed and memory footprint is unique to each project.

Classes

In RTPy, classes do not work the same as in regular Python. The key differences are:


Consider the following RTPy example. We are going to work out what happens in memory.

class B:
    varB1: uint32_t = 2

    def __init__(self, varB2: uint32_t):
        self.varB2 = varB2

class A:
    varA1: B
    varA2: &B
    varA3 = B(4)

instanceA = A()

Class templates

The first thing that happens when this program is loaded into the interpreter is the creation of two class templates. Assuming a 32-bit architecture, the size of the templates is 64 bits for B and 160 bits for A.

Class B has one constant defined outside of the __init__ function; this value is included in the Class template. For self.varB2, the default value of uint32_t is loaded, which is 0.

Class A has two instances of Class B, and one reference (borrow) to Class B. The B class instances are loaded as a vector in Class A, giving the Class template a larger size. 

Layout of class template B
Layout of class template A
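The template sizes can be checked with a C model of the two layouts. This is a sketch under assumptions: the type names are hypothetical, and the reference varA2 is modelled as a 32-bit integer (Ref) rather than a C pointer so that the sizes match a 32-bit target on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 32-bit reference, modelled as an integer so the layout does not
 * depend on the pointer size of the machine compiling this sketch. */
typedef uint32_t Ref;

typedef struct {
    uint32_t varB1; /* class-level value, 2 in the template */
    uint32_t varB2; /* set in __init__, 0 in the template */
} TemplateB;

typedef struct {
    TemplateB varA1; /* embedded instance of B */
    Ref varA2;       /* reference (borrow) to a B */
    TemplateB varA3; /* embedded instance of B */
} TemplateA;

/* Compile-time checks of the sizes stated in the text (in bytes). */
static_assert(sizeof(TemplateB) == 8, "template B should be 64 bits");
static_assert(sizeof(TemplateA) == 20, "template A should be 160 bits");

size_t template_b_bits(void) { return sizeof(TemplateB) * 8; }
size_t template_a_bits(void) { return sizeof(TemplateA) * 8; }
```

The 160 bits of template A break down into two embedded B instances of 64 bits each plus one 32-bit reference.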

Runtime template initialisation

So far, the templates have been initialised before the program is run, and you might ask what happens to varA3 = B(4). This line is executed during the first run of the program, in the global scope, which is the same run in which instanceA = A() is executed. When varA3 = B(4) is reached, the __init__ function of Class B runs and changes the class template of Class A to the following:

Layout of class template A after global initialisation

Variable initialisation

When instanceA = A() is called, class template A is copied to the global variable instanceA, for which the compiler has reserved enough bytes to hold the class. If available, the __init__ method is then called, where the self keyword refers to the copied instance of the class template at the location of instanceA. Notice that A.varA2 still refers to the class template of B. Until A.varA2 is properly initialised, any change made through A.varA2 will change the class template of Class B.
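The copy step and the pitfall around varA2 can be sketched in C. The names template_a, template_b and instantiate_a are hypothetical, the reference is modelled as a plain C pointer, and the template values (varA3 holding 2 and 4 after B(4) has run) are inferred from the example rather than taken from the interpreter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t varB1;
    uint32_t varB2;
} B;

typedef struct {
    B varA1;  /* embedded instance */
    B *varA2; /* reference (borrow), here a C pointer */
    B varA3;  /* embedded instance */
} A;

/* Class templates as assumed after global initialisation:
 * varA3 = B(4) has already set varA3.varB2 to 4. */
B template_b = { 2, 0 };
A template_a = { { 2, 0 }, &template_b, { 2, 4 } };

/* instanceA = A(): copy the template into the bytes reserved for the instance. */
void instantiate_a(A *instance) {
    assert(instance != NULL);
    memcpy(instance, &template_a, sizeof *instance);
}
```

After instantiate_a, the embedded fields of the instance are independent copies, but varA2 still holds the address of template_b. A write such as instance.varA2->varB1 = 99 would therefore alter the template of B, which is the behaviour the paragraph above warns about.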


Why class templates?

Class templates ensure that there is always a default value to fall back on for uninitialised classes.

Different kinds of classes

As seen in the example snippet, there are two types of classes.

There is, however, a third, invisible kind used by the compiler: the single instance class. If the compiler detects that only one instance of a specific class is ever created, it optimises the functions of that class to use this instance directly.


Single instance class

Let's consider the addition of an add function to class A:

class A:
    varA1: B

    def add(self, a: int):
        return self.varA1.varB1 + a

Since there is only one instance of class A, the compiler knows that the self keyword will always reference this instance, and thus self.varA1 will always point to the same variable. The compiler removes the self keyword and hard-codes the location of self.varA1 into the bytecode.
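In C terms, the optimisation is roughly the difference between the two functions below. This is an analogy only, since the real compiler emits RTPy bytecode: the names a_add_generic, a_add_single and the_only_a are hypothetical, and the argument a is modelled as uint32_t for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t varB1;
    uint32_t varB2;
} B;

typedef struct {
    B varA1;
} A;

/* General case: the instance arrives as self and is resolved at run time. */
uint32_t a_add_generic(const A *self, uint32_t a) {
    assert(self != NULL);
    return self->varA1.varB1 + a;
}

/* Single instance case: the only instance lives at a known location,
 * so the self parameter disappears and the address is fixed in the code. */
A the_only_a = { { 2, 0 } };

uint32_t a_add_single(uint32_t a) {
    return the_only_a.varA1.varB1 + a;
}
```

Both functions compute the same result for the single instance; the second simply saves passing and dereferencing self.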


Lists

A list in the C interpreter is defined as follows:

typedef struct RTPyList {
    uint16_t length;       // the number of items in the list
    uint16_t capacity;     // the number of items that can be stored in the list
    struct RTPyList *next; // pointer to the next list, if the list is full
    uint8_t items[0];      // the items in the list
} RTPyList;

RTPy requires the user to specify the size of a list in the annotation; this is stored as the capacity. When an item is appended to a list, the length is increased. If the length reaches the capacity of the list, one of two things happens, depending on whether the "extend lists into heap" setting is set to True or False.

Extend lists into heap

A new list is created within the heap. It uses the next free spot that has capacity for at least 6 items. If more memory is available, it allocates at most the same size as the original list.
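A possible shape of such an append is sketched below. It makes several assumptions that are not taken from the interpreter: items are 1 byte each, the RTPy heap is stood in for by malloc plus a byte budget (heap_free), the helper names are hypothetical, and for an original list smaller than 6 items the extension is assumed to get 6 items.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MIN_EXTENSION_ITEMS 6u /* smallest extension, from the text */
#define HEAP_SIZE 64u          /* hypothetical heap size set by the programmer */

typedef struct RTPyList {
    uint16_t length;
    uint16_t capacity;
    struct RTPyList *next;
    uint8_t items[]; /* 1-byte items assumed */
} RTPyList;

size_t heap_free = HEAP_SIZE; /* bytes left in the modelled heap */

/* Allocate an empty list segment; malloc stands in for real placement. */
RTPyList *list_alloc(uint16_t capacity) {
    RTPyList *list = malloc(sizeof *list + capacity);
    if (list != NULL) {
        list->length = 0;
        list->capacity = capacity;
        list->next = NULL;
    }
    return list;
}

/* Create an extension: needs room for at least 6 items and takes
 * at most the capacity of the original list. */
RTPyList *list_extend(uint16_t original_capacity) {
    size_t header = sizeof(RTPyList);
    if (heap_free < header + MIN_EXTENSION_ITEMS) {
        return NULL; /* no free spot that is large enough */
    }
    size_t room = heap_free - header;
    size_t wanted = original_capacity > MIN_EXTENSION_ITEMS
                        ? original_capacity : MIN_EXTENSION_ITEMS;
    uint16_t capacity = (uint16_t)(room < wanted ? room : wanted);
    RTPyList *ext = list_alloc(capacity);
    if (ext != NULL) {
        heap_free -= header + capacity;
    }
    return ext;
}

/* Append to the last segment, extending into the heap when it is full. */
bool list_append(RTPyList *list, uint8_t item) {
    assert(list != NULL);
    const uint16_t original_capacity = list->capacity;
    while (list->length == list->capacity && list->next != NULL) {
        list = list->next;
    }
    if (list->length == list->capacity) {
        list->next = list_extend(original_capacity);
        if (list->next == NULL) {
            return false; /* heap exhausted */
        }
        list = list->next;
    }
    list->items[list->length++] = item;
    return true;
}
```

The sketch never frees segments; how and when the real interpreter releases heap lists is not covered here.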

Do not extend lists into heap

A fully static approach in which a stack overflow is not possible. The standard RTPy interpreter implementation disregards the append and does nothing.

It is, however, also possible to implement a version in which the RTPyList struct has a variable "overflow_length". In that version, append wraps around to the beginning of the list and overwrites elements starting at index 0.
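Both static behaviours can be sketched side by side. This is an illustration under assumptions: StaticList is a simplified stand-in for RTPyList with a fixed item array, the function names are hypothetical, and overflow_length is interpreted here as the index of the next element to overwrite.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CAPACITY 3u /* hypothetical capacity from the list annotation */

typedef struct {
    uint16_t length;
    uint16_t capacity;
    uint16_t overflow_length; /* assumed meaning: next index to overwrite */
    uint8_t items[CAPACITY];
} StaticList;

/* Standard behaviour per the text: an append to a full list is ignored. */
bool append_or_drop(StaticList *list, uint8_t item) {
    assert(list != NULL);
    if (list->length == list->capacity) {
        return false; /* list is full, nothing happens */
    }
    list->items[list->length++] = item;
    return true;
}

/* Alternative: once the list is full, wrap around and overwrite
 * elements starting at index 0. */
void append_or_wrap(StaticList *list, uint8_t item) {
    assert(list != NULL && list->capacity > 0);
    if (list->length < list->capacity) {
        list->items[list->length++] = item;
        return;
    }
    list->items[list->overflow_length] = item;
    list->overflow_length =
        (uint16_t)((list->overflow_length + 1u) % list->capacity);
}
```

With a capacity of 3, appending 1, 2, 3, 4 leaves the dropping variant at 1, 2, 3 and the wrapping variant at 4, 2, 3. Neither variant ever touches memory outside the list.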