luthfan's Blog

Generic function support for LPython - Report

luthfan
Published: 09/08/2022

Overview

Project: Generic function support for LPython (https://github.com/lcompilers/lpython)
Organization: Python Software Foundation (LPython)

Mentors: Ondřej Čertík, Gagandeep Singh, Rohit Goswami
Contributor: Luthfan Anshar Lubis

Project Summary

Generics is a common functionality found in statically-typed programming languages to allow easier maintenance of programs with identical implementations but different type signatures. LPython, a statically-typed Python compiler does not yet support this. In this project, we introduce generic functions as functions with type variables; these variables are statically type-checked and made concrete when lower-level code is generated. To allow type checks for generic function calls, restrictions (similar to traits in Rust) are provided to ensure safety of type parameters.

Background

Programs in LPython are statically-typed, requiring type annotations by users. They are first compiled into Abstract Semantic Representation (ASR), an intermediate representation that carries all the syntactic and semantic (including typing) information of the programs. From this IR, the programs are further compiled to lower level languages such as C and LLVM.

Another important feature of LPython is ensuring that programs for LPython can also be run in CPython.

Implementation

The project will be explained through several phases:

1. Introducing type variables

The first phase of the implementation was adding syntax support for type variable declarations. On the Python level, we rely on TypeVar provided by Python 3. On the ASR level, I added a new kind of type for expression called TypeParameter. This allows the declaration of the following function:

T = TypeVar('T')

def add(x: T, y: T) -> T:
  return x + y

The arguments and return value of the function add are typed with a TypeParameter on the ASR level.

Simple checks on the type variable declarations are also implemented. First, ensuring that a type variable declaration is unique. Second, ensuring that a function can only be typed with an already declared type variable.

Related PR:

https://github.com/lcompilers/lpython/pull/625
https://github.com/lcompilers/lpython/pull/831

Related blog posts:

Adding type variable support to LPython, link: https://blogs.python-gsoc.org/en/luthfans-blog/adding-type-variable-support-to-lpython-1st-week/
Dealing with conflict and ASR, link: https://blogs.python-gsoc.org/en/luthfans-blog/dealing-with-conflict-and-asr-2nd-week/

2. Implementing generic function instantiation

A big leap from the first phase, I implemented a working generic function instantiation that can be compiled to LLVM and can be run, but still without type checks.

Using the example from the previous phase, it is now possible to have the following code:

T = TypeVar('T')

def add(x: T, y: T) -> T:
  return x + y

print(add(1,2))
print(add('a','b'))

The two function calls add(1,2) and add('a','b') instantiate the function add into two different functions on the ASR level, each with different type signatures. If we were to express it with Python syntax, we would have:

def __lpython_generic_add_0(x: i32, y: i32) -> i32: ...
def __lpython_generic_add_1(x: str, y: str) -> str: ...

Here, i32 represents 32-bit integer and str represents string.

At the same time, the function calls themselves are correspondingly renamed into:

print(__lpython_generic_add_0(1,2))
print(__lpython_generic_add_1('a','b'))

At the function call site, the consistence of the type substitution between the arguments types (i32 or str) with the type variables (T) is also checked. The function call add(1,'a') is rejected.

To allow compilation to LLVM, the generic function (add) is ignored while the instantiated functions (__lpython_generic_add_0, __lpython_generic_add_1) and renamed functions calls are compiled normally with the existing compilation.

This step particularly took a long time due to several reasons. First, the lack of familiarity with the already existing tools within LPython that can be utilized for function instantiations. Originally I started with a function instantiator that I personally defined, although later on finding out from one of the mentors about an ASR duplicator. Second, we did not start with a concrete design for the ASR grammar. This resulted in a couple iterations on the grammar for generic functions.

Related PR:

https://github.com/lcompilers/lpython/pull/625
https://github.com/lcompilers/lpython/pull/831

https://github.com/lcompilers/lpython/pull/903

Related blog posts:

Instantiating generic functions Into valid functions in ASR, link: https://blogs.python-gsoc.org/en/luthfans-blog/instantiating-generic-functions-into-valid-functions-in-asr-3rd-week/
Revisiting function generation, link: https://blogs.python-gsoc.org/en/luthfans-blog/revisiting-function-generation-4th-week/

Getting the generated ASR, link: https://blogs.python-gsoc.org/en/luthfans-blog/getting-the-generated-asr-for-generic-functions-5th-week/
Completing the generics early prototype, link: https://blogs.python-gsoc.org/en/luthfans-blog/completing-the-generics-early-prototype-6th-week/

3. Adding necessary functionalities and handling ASR changes

I added support for Array and List types, along with corresponding examples where these types are used.

I also added support for generic Subfunction (functions without return values). However, during the implementation, the main branch of LPython made a change and combined both Function and Subfunction, requiring modifications on the generic functions support.

Related PR:

https://github.com/lcompilers/lpython/pull/907
https://github.com/lcompilers/lpython/pull/989

Related blog posts:

Adding more generic examples, link: https://blogs.python-gsoc.org/en/luthfans-blog/adding-more-generic-examples-7th-week/

4. Adding hard-coded restrictions for type variables

The next step is adding a mechanism to type check generic function calls and operations involving values with type variables. Consider again the add program, there was no check on the operation x+y against the arguments with type T. There was also no type check on add(1,2) to make sure that the arguments 1 and 2 can handle the operation inside add.

As the first approach to this, I added restrictions on the type variable declarations. The restrictions I added were SupportsPlus, SupportsZero, and Divisible. These restrictions are hard-coded in the way that they are hard-coded into the compiler to be associated with specific computations. With restrictions, type variables declarations become:

T = TypeVar('T', bound=SupportsPlus)

The restriction SupportsPlus is associated with addition between integer, real, and string. However, it only supports addition between same typed arguments.

With these restrictions, it is possible to check whether the generic operation x+y is safe by checking the restrictions on the type variables associated with both arguments. It is also possible to check whether add(1,2) is safe by checking that 1 and 2 are typed as integers which does support addition.

Related PR:

https://github.com/lcompilers/lpython/pull/1007

Related blog posts:

Next step, adding restriction, link: https://blogs.python-gsoc.org/en/luthfans-blog/next-step-adding-restriction-8th-week/
Working further on restriction type check, link: https://blogs.python-gsoc.org/en/luthfans-blog/working-further-on-restriction-type-check-9th-week/

Extending restriction and handling ASR changes, link: https://blogs.python-gsoc.org/en/luthfans-blog/extending-restriction-and-handling-asr-changes-10th-week/

5. Designing and implementing user-defined restrictions

The above approach to restrictions does handle type check on operations, but limit them to only those hard-coded into the compilers. We require a mechanism that can allow users to freely define restrictions according to need.

To illustrate this, let us consider a different generic function meanthat takes a list of T as arguments and calculate the mean value of the elements in the list:

def mean(x: list[T]) -> f64:
    k: i32 = len(x)
    if k == 0:
        return 0.0
    res: T
    res = 0.0
    i: i32
    for i in range(k):
        res = res + x[i]
    return res/k

Several computations are made on the variable res with type T. A zero assignment, an addition, and a division. To type check the body of mean, we can abstract the computations as restrictions, expressed as body-less functions.

@restriction
def zero(x: T) -> T:
    pass

@restriction
def add(x: T, y: T) -> T:
    pass

@restriction
def div(x: T, k: i32) -> f64:
    pass

def mean(x: list[T], **kwargs) -> f64:
    k: i32 = len(x)
    if k == 0:
        return 0.0
    res: T
    res = zero(x[0])
    i: i32
    for i in range(k):
        res = plus(res, x[i])
    return div(res/k)

Then, the function calls can assign specific functions that will take over the place of these restrictions (add, zero, div).

print(mean([1,2], zero=int.__zero__, add=int.__add__, div=int.__div__))
print(mean([1.0,2.0], zero=real.__zero__, add=real.__add__, div=real.__div__))

The assignment between a restriction (eg. zero) and a function (eg. int.__zero__) is also checked by making sure if the type substitution remains consistent or not.

Related PR:

https://github.com/lcompilers/lpython/pull/1048 (still open)

Related blog posts:

Designing user-defined restrictions, link: https://blogs.python-gsoc.org/en/luthfans-blog/designing-user-defined-restrictions-11th-week/

Result

I have completed a prototype implementation for LPython. The implementation is strongly type-checked (strong concept) by using restrictions to define computations on type variables. Type-checked generic function calls are compiled into ASR then LLVM.

Future Work

- More complicated generic functions, such as recursive generic functions and nested generic function calls will require some modifications on both type checks and function instantiations
- Designs can be further simplified to avoid defining similar restrictions

View Blog Post

Designing user-defined restrictions (11th week)

luthfan
Published: 08/29/2022

My apologies for posting this very late.

What I did this week

1. Merged the hard-coded restrictions

After making some fixes on the implementation, I had the hard-coded restriction that I explained in the previous post merged.

2. Designed user-defined restrictions

Ultimately, the restrictions on type parameters that we want for LPython is one that users can freely defined according to need. While the hard-coded restrictions can support generics to a certain degree, it cannot handle the various functions that users may define.

Take the mean example from the previous week. With user-defined restrictions, it turns into:

T = TypeVar('T')

@restriction
def zero(x: T) -> T:
    pass

@restriction
def plus(x: T, y: T) -> T:
    pass

@restriction
def div(x: T, k: i32) -> f64:
    pass

def mean(x: list[T]) -> f64:
    k: i32 = len(x)
    if k == 0:
        return 0.0
    res: T
    res = zero(x[0])
    i: i32
    for i in range(k):
        res = plus(res, x[i])
    return div(res, k)

With the hard-coded restrictions, the mean function is limited to handling numbers. However, with user-defined restrictions, it would be possible to pass arguments of any types as long as they have the restriction functions plus, zero, and div provided.

What is coming up next week

I have to conduct a meeting with my mentor and evaluate the current design for user-defined restrictions. Then iterate again on the designs based on the meeting.

Did I get stuck anywhere

I did not have any issue this week as I was freely designing the grammar for the restrictions.

View Blog Post

Extending restriction and handling ASR changes (10th week)

luthfan
Published: 08/21/2022

What I did this week

1. Adding more types of hard-coded restrictions

Again, continuing from last week's work, I added more types of restrictions for type parameters.

Currently there are four restrictions: SupportsZero, SupportsPlus, Divisible, Any. These restrictions are hard-coded into the compiler's frontend, meaning that each restriction is essentially associated with specific basic types available in LPython. For example, a type parameter restricted to SupportsZero has to be of type Integer or Real.

With these restrictions, it is possible to have the following generic function:

T = TypeVar('T', bound=SupportsPlus|SupportsZero|Divisible)

def mean(x: list[T]) -> f64:
    k: i32 = len(x)
    if k == 0:
        return 0.0
    sum: T
    sum = 0
    i: i32
    for i in range(k):
        sum = sum + x[i]
    return sum/k 

print(mean([1,2,3]))
print(mean([1.0,2.0,3.0]))

Given the restrictions, any function instantiation is only required to check against the generic function's signature.

2. Handling ASR changes

The ASR design was recently changed so that Subroutine (function without return values) is merged into Function. Therefore I had to make modifications when adding generic functions and subroutines into the symbol table. This was done in PR #957 and has been subsequently merged.

What is coming up next week

1. Finalize the current hard-coded restrictions and have it merged
2. I did not have the time to handle arrays last week, so hopefully I can work on adding more support for generic arrays next week

Did I get stuck anywhere

This week was smooth sailing, although it took some iterations to have a clean pull request for supporting generic subroutines.

View Blog Post

Working further on restriction type check (9th week)

luthfan
Published: 08/13/2022

What I did this week

Continuing on the restriction's work from last week, I implemented more support for restriction check on both generic function definition and generic function call. So let's say we have the following function:

T = TypeVar('T', bound=SupportsZero)

def f(x: T) -> T
  print(x)
  x = 0
  return x

f(1)
f('a')

For the function definition, the assignment x = 0 would check whether the type of x supports zero assignment (SupportsZero) or not. In this case, it does, so it passes the type check.

For the first function call, because 1 is an integer, it is possible to reassign it with zero. But in the second function call, the character a cannot be reassigned with the number zero. The type check considers this and reject the second function call.

What is coming up next week

1. Access the current restriction design on ASR level
2. Provide more flexible support for generic array

Did I get stuck anywhere

I'm currently a bit confused on how to make a decision for the restriction design on the ASR level. My mentor has provided some high-level design that we want to realize in LPython, but I have not grasped the concrete design yet. I will consult on this matter with my mentor ASAP.

View Blog Post

Next step, adding restriction (8th week)

luthfan
Published: 08/09/2022

What I did this week

1. Merged the generic function implementation

After drafting it for quite sometimes and solving the conflicts with the main branch, we finally have the prototype implementation for generic function merged into LPython (PR #900).

The array support that I mentioned in a previous post was also merged into LPython (PR #907). Unfortunately, although generic arrays work in LLVM, it does not yet work with CPython due to incompatible usage of TypeVar.

2. Worked on restrictions for type variables

The prototype implementation mentioned above would type check every function instantiated by a generic function call, which would slow down the compilation time. Ideally, we want a generics type system that only requires checks on the function call arguments.

To do so, I worked on adding restriction for type variables. An example we have right now is as follows:

from ltypes import TypeVar, SupportsPlus

T = TypeVar('T', bound=SupportsPlus)

def f(x: T, y: T) -> T:
    return x + y

print(f(1,2))
print(f("a","b"))

Where SupportsPlus is a restriction. This denotes that the any argument that will substitute the type variable T must support the plus operation.

The type check is executed in the following two steps:

1. Check the function definition. If a generic parameter (x: T) is used for a certain operation O, then we have to check if the restriction R on T satisfies operation O.
2. Check the function call. If the argument substitutes a parameter with type
T and restriction R, we have to check if the actual type of the argument satisfies that restriction.

What is coming up next week

Continue working on restriction. Currently I have had only implemented a very simple type check on the generic function definition with restriction. Next would be implement type check on the generic function calls.

Did I get stuck anywhere

There were some conflicts with the main branch that I personally could not solve. These conflicts kept my PR from being merged. It turned out to be a section that I changed without properly cross-checking with the main branch. Thankfully, with the help of both my mentors it was solved right away.

View Blog Post
DJDT

Versions

Time

Settings from gsoc.settings

Headers

Request

SQL queries from 1 connection

Static files (2312 found, 3 used)

Templates (28 rendered)

Cache calls from 1 backend

Signals

Log messages