Summary and Notes:
The first week, I took some notes on the end-goal of the XSparse package. At a higher level, XSparse will eventually integrate with a Python API in a lazy fashion. Similar to dask, we can envision an API such as:
# this is just some tensor operation defined in memory, but X holds the function signature, rather than the values of X X = A + B @ Y / C - D @ E
Where each element in the above equation corresponds to a tensor that is properly shaped. We see that this is a composition of tensor operations and may be stored in a variety of different formats, so we only “evaluate” the expression when a user called “compute(X)”.
# this would actually compute the values of X and store in memory X = sparse.compute(X)
This would be enabled by a runtime compilation e.g. using cppyy, which is a runtime package that automatically generates Python bindings from C++. Thus, we need efficient compiler-time C++ code that will generate efficient code for operating on any combinations of levels and tensor operations. Thus, the focus of this project is to implement such abstractions.
The core part of this project is the idea of the merge lattice, which is essentially coiterating over multiple level formats. Merge lattices is a higher-level construct that begins placing meaning to the levels that are co-iterated. For example, given a tuple of tensors that are joined by different operators (e.g. +, *, /, -), we will have a single tensor output usually. Each input dimension of each tensor will get mapped to an output dimension that we care about.
For example, say we have the following operation written in Einstein notation:
output_ij = input1_ik input2_kj
=> figure out what loops to write in tensor notation
= take Einstein summation as input
-> generate code base off of that
I’ve continued work on my PR https://github.com/hameerabbasi/xsparse/pull/19, where I want to add co-iteration ability for levels with unordered format combined with ordered formats in a conjunctive merge (AND operation; analogous to tensor multiplication). The idea is that 0’s will not contribute to the result, so we only need to iterate over the ordered formats, and locate into the unordered formats. Once the iteration over the ordered formats is complete, then we are good because even if there are non-zeros left in the unordered formats, they are irrelevant.
Clarifying Questions to Ask Mentors:
If we only get unordered formats, is this supported? I would say… no?
If we get an arbitrary combination of ordered and unordered formats, that is fine as long as the unordered formats all have “locate” function right?
Work to do:
Continue reviewing the phd thesis of Fredrik Berg Kjølstad http://tensor-compiler.org/files/kjolstad-phd-thesis-taco-compiler.pdf
Sketch out ideas for how to add co-iteration with a subset of level formats being of the form “locate”
The current co-iteration codebase co-iterates over all passed in levels, which is a special case of what we want.
We need to store the indices of the levels that are ordered.
We then want to co-iterate over the ordered level formats and at each (index, pointer), we want to locate into the unordered formats:
If any of the unordered formats returns NULLOPT (i.e. a value of 0), then note the result will not contain anything at this index. But continue iterating
Once all ordered formats reach the end, then we have reached the end of the co-iteration.