Generic function support for LPython - Report

luthfan
Published: 09/08/2022

Overview

Project: Generic function support for LPython (https://github.com/lcompilers/lpython)
Organization: Python Software Foundation (LPython)

Mentors: Ondřej Čertík, Gagandeep Singh, Rohit Goswami
Contributor: Luthfan Anshar Lubis

Project Summary

Generics is a common functionality found in statically-typed programming languages to allow easier maintenance of programs with identical implementations but different type signatures. LPython, a statically-typed Python compiler does not yet support this. In this project, we introduce generic functions as functions with type variables; these variables are statically type-checked and made concrete when lower-level code is generated. To allow type checks for generic function calls, restrictions (similar to traits in Rust) are provided to ensure safety of type parameters.

Background

Programs in LPython are statically-typed, requiring type annotations by users. They are first compiled into Abstract Semantic Representation (ASR), an intermediate representation that carries all the syntactic and semantic (including typing) information of the programs. From this IR, the programs are further compiled to lower level languages such as C and LLVM.

Another important feature of LPython is ensuring that programs for LPython can also be run in CPython.

Implementation

The project will be explained through several phases:

1. Introducing type variables

The first phase of the implementation was adding syntax support for type variable declarations. On the Python level, we rely on TypeVar provided by Python 3. On the ASR level, I added a new kind of type for expression called TypeParameter. This allows the declaration of the following function:

T = TypeVar('T')

def add(x: T, y: T) -> T:
  return x + y

The arguments and return value of the function add are typed with a TypeParameter on the ASR level.

Simple checks on the type variable declarations are also implemented. First, ensuring that a type variable declaration is unique. Second, ensuring that a function can only be typed with an already declared type variable.

Related PR:

https://github.com/lcompilers/lpython/pull/625
https://github.com/lcompilers/lpython/pull/831

Related blog posts:

Adding type variable support to LPython, link: https://blogs.python-gsoc.org/en/luthfans-blog/adding-type-variable-support-to-lpython-1st-week/
Dealing with conflict and ASR, link: https://blogs.python-gsoc.org/en/luthfans-blog/dealing-with-conflict-and-asr-2nd-week/

2. Implementing generic function instantiation

A big leap from the first phase, I implemented a working generic function instantiation that can be compiled to LLVM and can be run, but still without type checks.

Using the example from the previous phase, it is now possible to have the following code:

T = TypeVar('T')

def add(x: T, y: T) -> T:
  return x + y

print(add(1,2))
print(add('a','b'))

The two function calls add(1,2) and add('a','b') instantiate the function add into two different functions on the ASR level, each with different type signatures. If we were to express it with Python syntax, we would have:

def __lpython_generic_add_0(x: i32, y: i32) -> i32: ...
def __lpython_generic_add_1(x: str, y: str) -> str: ...

Here, i32 represents 32-bit integer and str represents string.

At the same time, the function calls themselves are correspondingly renamed into:

print(__lpython_generic_add_0(1,2))
print(__lpython_generic_add_1('a','b'))

At the function call site, the consistence of the type substitution between the arguments types (i32 or str) with the type variables (T) is also checked. The function call add(1,'a') is rejected.

To allow compilation to LLVM, the generic function (add) is ignored while the instantiated functions (__lpython_generic_add_0, __lpython_generic_add_1) and renamed functions calls are compiled normally with the existing compilation.

This step particularly took a long time due to several reasons. First, the lack of familiarity with the already existing tools within LPython that can be utilized for function instantiations. Originally I started with a function instantiator that I personally defined, although later on finding out from one of the mentors about an ASR duplicator. Second, we did not start with a concrete design for the ASR grammar. This resulted in a couple iterations on the grammar for generic functions.

Related PR:

https://github.com/lcompilers/lpython/pull/625
https://github.com/lcompilers/lpython/pull/831

https://github.com/lcompilers/lpython/pull/903

Related blog posts:

Instantiating generic functions Into valid functions in ASR, link: https://blogs.python-gsoc.org/en/luthfans-blog/instantiating-generic-functions-into-valid-functions-in-asr-3rd-week/
Revisiting function generation, link: https://blogs.python-gsoc.org/en/luthfans-blog/revisiting-function-generation-4th-week/

Getting the generated ASR, link: https://blogs.python-gsoc.org/en/luthfans-blog/getting-the-generated-asr-for-generic-functions-5th-week/
Completing the generics early prototype, link: https://blogs.python-gsoc.org/en/luthfans-blog/completing-the-generics-early-prototype-6th-week/

3. Adding necessary functionalities and handling ASR changes

I added support for Array and List types, along with corresponding examples where these types are used.

I also added support for generic Subfunction (functions without return values). However, during the implementation, the main branch of LPython made a change and combined both Function and Subfunction, requiring modifications on the generic functions support.

Related PR:

https://github.com/lcompilers/lpython/pull/907
https://github.com/lcompilers/lpython/pull/989

Related blog posts:

Adding more generic examples, link: https://blogs.python-gsoc.org/en/luthfans-blog/adding-more-generic-examples-7th-week/

4. Adding hard-coded restrictions for type variables

The next step is adding a mechanism to type check generic function calls and operations involving values with type variables. Consider again the add program, there was no check on the operation x+y against the arguments with type T. There was also no type check on add(1,2) to make sure that the arguments 1 and 2 can handle the operation inside add.

As the first approach to this, I added restrictions on the type variable declarations. The restrictions I added were SupportsPlus, SupportsZero, and Divisible. These restrictions are hard-coded in the way that they are hard-coded into the compiler to be associated with specific computations. With restrictions, type variables declarations become:

T = TypeVar('T', bound=SupportsPlus)

The restriction SupportsPlus is associated with addition between integer, real, and string. However, it only supports addition between same typed arguments.

With these restrictions, it is possible to check whether the generic operation x+y is safe by checking the restrictions on the type variables associated with both arguments. It is also possible to check whether add(1,2) is safe by checking that 1 and 2 are typed as integers which does support addition.

Related PR:

https://github.com/lcompilers/lpython/pull/1007

Related blog posts:

Next step, adding restriction, link: https://blogs.python-gsoc.org/en/luthfans-blog/next-step-adding-restriction-8th-week/
Working further on restriction type check, link: https://blogs.python-gsoc.org/en/luthfans-blog/working-further-on-restriction-type-check-9th-week/

Extending restriction and handling ASR changes, link: https://blogs.python-gsoc.org/en/luthfans-blog/extending-restriction-and-handling-asr-changes-10th-week/

5. Designing and implementing user-defined restrictions

The above approach to restrictions does handle type check on operations, but limit them to only those hard-coded into the compilers. We require a mechanism that can allow users to freely define restrictions according to need.

To illustrate this, let us consider a different generic function meanthat takes a list of T as arguments and calculate the mean value of the elements in the list:

def mean(x: list[T]) -> f64:
    k: i32 = len(x)
    if k == 0:
        return 0.0
    res: T
    res = 0.0
    i: i32
    for i in range(k):
        res = res + x[i]
    return res/k

Several computations are made on the variable res with type T. A zero assignment, an addition, and a division. To type check the body of mean, we can abstract the computations as restrictions, expressed as body-less functions.

@restriction
def zero(x: T) -> T:
    pass

@restriction
def add(x: T, y: T) -> T:
    pass

@restriction
def div(x: T, k: i32) -> f64:
    pass

def mean(x: list[T], **kwargs) -> f64:
    k: i32 = len(x)
    if k == 0:
        return 0.0
    res: T
    res = zero(x[0])
    i: i32
    for i in range(k):
        res = plus(res, x[i])
    return div(res/k)

Then, the function calls can assign specific functions that will take over the place of these restrictions (add, zero, div).

print(mean([1,2], zero=int.__zero__, add=int.__add__, div=int.__div__))
print(mean([1.0,2.0], zero=real.__zero__, add=real.__add__, div=real.__div__))

The assignment between a restriction (eg. zero) and a function (eg. int.__zero__) is also checked by making sure if the type substitution remains consistent or not.

Related PR:

https://github.com/lcompilers/lpython/pull/1048 (still open)

Related blog posts:

Designing user-defined restrictions, link: https://blogs.python-gsoc.org/en/luthfans-blog/designing-user-defined-restrictions-11th-week/

Result

I have completed a prototype implementation for LPython. The implementation is strongly type-checked (strong concept) by using restrictions to define computations on type variables. Type-checked generic function calls are compiled into ASR then LLVM.

Future Work

- More complicated generic functions, such as recursive generic functions and nested generic function calls will require some modifications on both type checks and function instantiations
- Designs can be further simplified to avoid defining similar restrictions

DJDT

Versions

Time

Settings from gsoc.settings

Headers

Request

SQL queries from 1 connection

Static files (2312 found, 3 used)

Templates (11 rendered)

Cache calls from 1 backend

Signals

Log messages