Overview
Project: Generic function support for LPython (https://github.com/lcompilers/lpython)
Organization: Python Software Foundation (LPython)
Mentors: Ondřej Čertík, Gagandeep Singh, Rohit Goswami
Contributor: Luthfan Anshar Lubis
Project Summary
Generics is a common functionality found in statically-typed programming languages to allow easier maintenance of programs with identical implementations but different type signatures. LPython, a statically-typed Python compiler does not yet support this. In this project, we introduce generic functions as functions with type variables; these variables are statically type-checked and made concrete when lower-level code is generated. To allow type checks for generic function calls, restrictions (similar to traits in Rust) are provided to ensure safety of type parameters.
Background
Programs in LPython are statically-typed, requiring type annotations by users. They are first compiled into Abstract Semantic Representation (ASR), an intermediate representation that carries all the syntactic and semantic (including typing) information of the programs. From this IR, the programs are further compiled to lower level languages such as C and LLVM.
Another important feature of LPython is ensuring that programs for LPython can also be run in CPython.
Implementation
The project will be explained through several phases:
1. Introducing type variables
The first phase of the implementation was adding syntax support for type variable declarations. On the Python level, we rely on TypeVar
provided by Python 3. On the ASR level, I added a new kind of type for expression called TypeParameter
. This allows the declaration of the following function:
T = TypeVar('T')
def add(x: T, y: T) -> T:
return x + y
The arguments and return value of the function add
are typed with a TypeParameter
on the ASR level.
Simple checks on the type variable declarations are also implemented. First, ensuring that a type variable declaration is unique. Second, ensuring that a function can only be typed with an already declared type variable.
Related PR:
https://github.com/lcompilers/lpython/pull/625
https://github.com/lcompilers/lpython/pull/831
Related blog posts:
Adding type variable support to LPython, link: https://blogs.python-gsoc.org/en/luthfans-blog/adding-type-variable-support-to-lpython-1st-week/
Dealing with conflict and ASR, link: https://blogs.python-gsoc.org/en/luthfans-blog/dealing-with-conflict-and-asr-2nd-week/
2. Implementing generic function instantiation
A big leap from the first phase, I implemented a working generic function instantiation that can be compiled to LLVM and can be run, but still without type checks.
Using the example from the previous phase, it is now possible to have the following code:
T = TypeVar('T')
def add(x: T, y: T) -> T:
return x + y
print(add(1,2))
print(add('a','b'))
The two function calls add(1,2)
and add('a','b')
instantiate the function add
into two different functions on the ASR level, each with different type signatures. If we were to express it with Python syntax, we would have:
def __lpython_generic_add_0(x: i32, y: i32) -> i32: ...
def __lpython_generic_add_1(x: str, y: str) -> str: ...
Here, i32
represents 32-bit integer and str
represents string.
At the same time, the function calls themselves are correspondingly renamed into:
print(__lpython_generic_add_0(1,2))
print(__lpython_generic_add_1('a','b'))
At the function call site, the consistence of the type substitution between the arguments types (i32
or str
) with the type variables (T
) is also checked. The function call add(1,'a')
is rejected.
To allow compilation to LLVM, the generic function (add
) is ignored while the instantiated functions (__lpython_generic_add_0, __lpython_generic_add_1
) and renamed functions calls are compiled normally with the existing compilation.
This step particularly took a long time due to several reasons. First, the lack of familiarity with the already existing tools within LPython that can be utilized for function instantiations. Originally I started with a function instantiator that I personally defined, although later on finding out from one of the mentors about an ASR duplicator. Second, we did not start with a concrete design for the ASR grammar. This resulted in a couple iterations on the grammar for generic functions.
Related PR:
https://github.com/lcompilers/lpython/pull/625
https://github.com/lcompilers/lpython/pull/831
https://github.com/lcompilers/lpython/pull/903
Related blog posts:
Instantiating generic functions Into valid functions in ASR, link: https://blogs.python-gsoc.org/en/luthfans-blog/instantiating-generic-functions-into-valid-functions-in-asr-3rd-week/
Revisiting function generation, link: https://blogs.python-gsoc.org/en/luthfans-blog/revisiting-function-generation-4th-week/
Getting the generated ASR, link: https://blogs.python-gsoc.org/en/luthfans-blog/getting-the-generated-asr-for-generic-functions-5th-week/
Completing the generics early prototype, link: https://blogs.python-gsoc.org/en/luthfans-blog/completing-the-generics-early-prototype-6th-week/
3. Adding necessary functionalities and handling ASR changes
I added support for Array
and List
types, along with corresponding examples where these types are used.
I also added support for generic Subfunction
(functions without return values). However, during the implementation, the main branch of LPython made a change and combined both Function
and Subfunction
, requiring modifications on the generic functions support.
Related PR:
https://github.com/lcompilers/lpython/pull/907
https://github.com/lcompilers/lpython/pull/989
Related blog posts:
Adding more generic examples, link: https://blogs.python-gsoc.org/en/luthfans-blog/adding-more-generic-examples-7th-week/
4. Adding hard-coded restrictions for type variables
The next step is adding a mechanism to type check generic function calls and operations involving values with type variables. Consider again the add
program, there was no check on the operation x+y
against the arguments with type T
. There was also no type check on add(1,2)
to make sure that the arguments 1
and 2
can handle the operation inside add
.
As the first approach to this, I added restrictions on the type variable declarations. The restrictions I added were SupportsPlus
, SupportsZero
, and Divisible
. These restrictions are hard-coded in the way that they are hard-coded into the compiler to be associated with specific computations. With restrictions, type variables declarations become:
T = TypeVar('T', bound=SupportsPlus)
The restriction SupportsPlus
is associated with addition between integer, real, and string. However, it only supports addition between same typed arguments.
With these restrictions, it is possible to check whether the generic operation x+y
is safe by checking the restrictions on the type variables associated with both arguments. It is also possible to check whether add(1,2)
is safe by checking that 1
and 2
are typed as integers which does support addition.
Related PR:
https://github.com/lcompilers/lpython/pull/1007
Related blog posts:
Next step, adding restriction, link: https://blogs.python-gsoc.org/en/luthfans-blog/next-step-adding-restriction-8th-week/
Working further on restriction type check, link: https://blogs.python-gsoc.org/en/luthfans-blog/working-further-on-restriction-type-check-9th-week/
Extending restriction and handling ASR changes, link: https://blogs.python-gsoc.org/en/luthfans-blog/extending-restriction-and-handling-asr-changes-10th-week/
5. Designing and implementing user-defined restrictions
The above approach to restrictions does handle type check on operations, but limit them to only those hard-coded into the compilers. We require a mechanism that can allow users to freely define restrictions according to need.
To illustrate this, let us consider a different generic function mean
that takes a list of T
as arguments and calculate the mean value of the elements in the list:
def mean(x: list[T]) -> f64:
k: i32 = len(x)
if k == 0:
return 0.0
res: T
res = 0.0
i: i32
for i in range(k):
res = res + x[i]
return res/k
Several computations are made on the variable res
with type T
. A zero assignment, an addition, and a division. To type check the body of mean
, we can abstract the computations as restrictions, expressed as body-less functions.
@restriction
def zero(x: T) -> T:
pass
@restriction
def add(x: T, y: T) -> T:
pass
@restriction
def div(x: T, k: i32) -> f64:
pass
def mean(x: list[T], **kwargs) -> f64:
k: i32 = len(x)
if k == 0:
return 0.0
res: T
res = zero(x[0])
i: i32
for i in range(k):
res = plus(res, x[i])
return div(res/k)
Then, the function calls can assign specific functions that will take over the place of these restrictions (add
, zero
, div
).
print(mean([1,2], zero=int.__zero__, add=int.__add__, div=int.__div__))
print(mean([1.0,2.0], zero=real.__zero__, add=real.__add__, div=real.__div__))
The assignment between a restriction (eg. zero
) and a function (eg. int.__zero__
) is also checked by making sure if the type substitution remains consistent or not.
Related PR:
https://github.com/lcompilers/lpython/pull/1048 (still open)
Related blog posts:
Designing user-defined restrictions, link: https://blogs.python-gsoc.org/en/luthfans-blog/designing-user-defined-restrictions-11th-week/
Result
I have completed a prototype implementation for LPython. The implementation is strongly type-checked (strong concept) by using restrictions to define computations on type variables. Type-checked generic function calls are compiled into ASR then LLVM.
Future Work
- More complicated generic functions, such as recursive generic functions and nested generic function calls will require some modifications on both type checks and function instantiations
- Designs can be further simplified to avoid defining similar restrictions