Adam2392's Blog

June 11 - June 18: Week 3

Adam2392
Published: 06/16/2023

This week, I’ve been digging into the details of how best to carry out co-iteration when some of the levels are not ordered. The thinking before was that, for co-iteration under a conjunctive merge, we can make the following changes:

 

  1. Advancing iterators: Advance only the ordered level iterators, since the other iterators can do `locate(PKM1, min_ik)`. 

  2. Dereferencing iterators to get PKs: Since we have locate, we can directly get the PK for an unordered level with the `locate()` function.

 

However, 1. is not as straightforward. Say we have A ∩ B, where A has 1000 non-zero elements and B has 1 non-zero element. If A is ordered and B is not, we have to iterate over the entirety of A when, in reality, we should be able to exit early. The dereferencing part, though, can definitely be implemented and should be changed.

 

Other improvements I have made in the code are:

 

  1. Add `BaseTraits::I i` as an unused parameter in `hashed::iter_helper`

    iteration_helper iter_helper([[maybe_unused]] typename BaseTraits::I i, typename BaseTraits::PKM1 pkm1)

     

  2. In `Coiterate::coiteration_helper`, I changed the initialization of the `iterators` member to

    std::tuple<typename Levels::iteration_helper::iterator...> it) noexcept

    from

    std::tuple<typename Levels::LevelCapabilities::iteration_helper::iterator...> it) noexcept

    where we remove `LevelCapabilities` from the namespace path. This was required since the `iteration_helper` for the hashed level is not implemented as part of the `LevelCapabilities` namespace, so this change allows `Coiterate` to be defined with a set of levels that includes a hashed level.

  3. Rename `Coiterate::coiteration_helper::iterator.locate` to `deref_PKs`, since that is what it is actually doing.
                   

    template <class iter>
    inline auto deref_PKs(iter i) const noexcept
    {
        return (std::get<0>(*i) == min_ik)
                   ? std::optional<std::tuple_element_t<1, decltype(*i)>>(
                         std::get<1>(*i))
                   : std::nullopt;
    }

  4. Also, in the actual `get_PKs` function, we now use locate if the level's iterator has the `locate` function; otherwise we apply dereferencing.

     

    inline auto get_PKs() const noexcept
    {
        /**
         * @brief Return tuple of PKs from each level.
         *
         * @details If the level is ordered, return the PK from the iterator
         * using dereferencing `*iter`. If the level is unordered, return the
         * PK from the iterator using `iter.locate()`.
         */
        return std::apply(
            [&](auto&... args)
            {
                return std::make_tuple((has_locate_v<decltype(args)>
                                            ? args.locate(m_coiterHelper.m_pkm1, min_ik)
                                            : deref_PKs(args))...);
            },
            this->iterators);
    }

The only issue is that this produces a compiler error.
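
For context, a trait like `has_locate_v` can be written with the detection idiom. Here is a minimal sketch under my own assumptions (checking `&T::locate` directly), not necessarily how it will end up in XSparse:

```
#include <type_traits>

// Minimal sketch of a locate-detection trait using the detection idiom.
// Taking the address of `locate` is an assumption that breaks down if
// `locate` is overloaded or templated; in that case a declval-based check
// against the expected argument types would be needed instead.
template <class T, class = void>
struct has_locate : std::false_type
{
};

template <class T>
struct has_locate<T, std::void_t<decltype(&std::remove_reference_t<T>::locate)>>
    : std::true_type
{
};

template <class T>
inline constexpr bool has_locate_v = has_locate<T>::value;
```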

 

Besides these improvements, I started reviewing the MergeLattice implementation inside the existing taco compiler. The implementation there works at run time and uses a builder class to construct a MergeLattice.

 

The MergeLattice at a high level should take in a tuple of levels that are “merge points” on the merge lattice. In addition, it should take in an index expression that dictates how the levels are merged. The index expression in taco uses a set of strings, like `expr = C(i, j) = A(i,j) + B(i,j);`, whereas we would want to define an arbitrarily complex index expression… I still have to do some more reading to get an idea of how this part is implemented.

 

Internally, given the index expression, the MergeLattice will be able to determine which levels are co-iterated. Moreover, it must construct the `F` function that is passed to `Coiterate`.

 

Overall, we would like the following higher-level functions as well:

  1. Construct a union (disjunction) over lattice points

  2. Construct an intersection (conjunction) over lattice points

 

A merge lattice is constructed per index iterator in a tensor operation. For example, say we have:

A_ij = (B_ij + C_ij) @ D_ij

 

To set index i for A, we have to iterate over B_i, C_i and D_i.

To set index j for A, we have to iterate over B_*j, C_*j, and D_*j.

 

Each index constitutes a merge lattice that we need to construct in order to then call co-iteration. We want to extract the operators “+” and “*” to determine addition and multiplication, where addition is converted to a disjunction and multiplication is converted to a conjunction.
 

For example, the expression:

A_i = b_i + c_i d_i

has both a conjunction and a disjunction. We proceed as follows (see the sketch after this list):

 

  1. Create the leaves of the merge lattice from the tensor access rule (?)

  2. Create a merge lattice for c_i d_i by computing the conjunctive merge (c_i ∧ d_i); a_i is computed from this if this lattice point is reached.

  3. Create a merge lattice for b_i, since there is no other conjunctive merge with it.

  4. Create the upper-most merge lattice point for the disjunctive merge (b_i) ∨ (c_i ∧ d_i):

    1. So the merge lattice starts with the whole expression, which is a disjunction among conjunctions.

    2. Then it traverses through each lattice point, which trims away the parts that are not necessary for co-iteration.
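
To make this concrete, here is a hypothetical compile-time sketch of these lattice points. The `LatticePoint` name and representation are my own assumptions for illustration, not the actual XSparse design:

```
#include <tuple>

// Hypothetical sketch: a lattice point holds references to the levels that
// are co-iterated when that point of the merge lattice is reached.
template <class... Levels>
struct LatticePoint
{
    std::tuple<Levels&...> levels;
};

// For a_i = b_i + c_i d_i, the lattice points would be, top-most first:
//   {b, c, d}  -- the full disjunctive merge (b_i) v (c_i ^ d_i)
//   {c, d}     -- the conjunctive sub-merge c_i ^ d_i, once b is exhausted
//   {b}        -- b_i alone, once c or d is exhausted
// with F(b, c, d) = b || (c && d) as the function passed to Coiterate.
```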

 

Questions:

  1. What is the general input for a merge lattice? What is calling it? Will we have to implement, for example, an iteration graph?

  2. Do we implement the “LatticePoint”? 

  3. How do we expect the “tensor operations” to be represented?


Week 2 - Hashed LevelCapabilities and Learning more Compile-time template metaprogramming

Adam2392
Published: 06/08/2023

This week, I’ve finished the PR adding level properties as a public member to the level classes. 

 

However, now I am encountering difficulties in adding the ability for hashed levels to be co-iterated. Currently, the `Coiterate` class implements a `coiteration_helper` class, which in turn relies on initializing iteration helpers from each level.

 

```

// member: a tuple of iteration helpers, one per level
std::tuple<typename Levels::LevelCapabilities::iteration_helper...> m_iterHelpers;

// constructor initializer list: build each level's iter_helper
, m_iterHelpers(std::apply([&](auto&... args)
                           { return std::tuple(args.iter_helper(i, pkm1)...); },
                           coiterate.m_levelsTuple))

```

 

For example, in the above code, we see that `m_iterHelpers` initializes the `iter_helper` inside each level in `m_levelsTuple`. In addition, `m_iterHelpers` is a tuple of iteration helpers, but hashed levels do not have this level capability. I need to modify the design so that `m_iterHelpers` is only defined on the subset of levels that are ordered.

 

However, it turns out that this is not an issue that requires tackling. Upon inspection, and after meeting with my mentor Hameer, I realized that all levels should have an `iteration_helper` defined through the `LevelCapabilities` namespace inside their class. The `hashed` level contains an `iteration_helper`, but not through the `LevelCapabilities` namespace, so the next step is to refactor the existing implementation of the `hashed::iterator` so that it is contained within the `LevelCapabilities` namespace. This is a bit complicated because the hashed level implements a custom iterator and there is some advanced template metaprogramming going on, which I have to figure out. Currently I’m running into two sets of errors that are confusing to me:

 

```

/Users/adam2392/Documents/xsparse/include/xsparse/util/base_traits.hpp:34:9: error: static_assert failed due to requirement 'std::is_convertible_v<xsparse::util::container_traits<std::vector, std::set, std::unordered_map>, unsigned long>' "`PK` must be convertible to uintptr_t."

        static_assert(std::is_convertible_v<PK, uintptr_t>,

        ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

```

 

And

 

```

/Users/adam2392/Documents/xsparse/test/source/hashed_test.cpp:31:34: error: no member named 'iter_helper' in 'xsparse::levels::hashed<std::tuple<>, unsigned long, unsigned long, xsparse::util::container_traits<std::vector, std::set, std::unordered_map>, xsparse::level_properties<false, false, false, false, false>>'

    for (auto const [i2, p2] : h.iter_helper(ZERO))

                               ~ ^

```

 

Future Work - Although this simple PR on improving the co-iteration algorithm has turned into quite a rabbit hole, the bulk of the GSoC is dedicated to implementing and testing a “merge lattice” data structure, which will leverage the complete co-iteration algorithm.

 

We briefly discussed merge lattices. Merge lattices co-iterate over the subset of levels that is necessary based on the properties of the function. For example:

 

A_ij + B_ik

 

When iterating over i, both A and B are iterated. When iterating over j, only A is iterated. When iterating over k, only B is iterated.

 

Review:

  • Taco paper: covers only dense/hashed levels

  • Taco code: note that they implement this at runtime, rather than at compile time

    • We need to re-write things to work at compile time


Week 1 - Coiteration and exposing level properties PR

Adam2392
Published: 06/02/2023

This week, I’ve focused on finishing the PR to add co-iteration of non-ordered levels as long as they are part of a conjunction with ordered levels. That involves some checking of the packed parameter `levels` and the function `f`.

For instance, take F(a, b, c) = a & (b | c); that is, F is a function that takes in boolean values representing a, b, and c and evaluates some boolean expression on them. During initialization of the Coiterate, we can know at compile time which levels are ordered, since these are properties of the levels. So say a and b are ordered and c is not: we would run the function F(false, false, true), which evaluates to false.

More generally, with the ordered levels fixed to false, we check F over all true/false combinations of the unordered levels (here, F(false, false, true) and F(false, false, false)); each must evaluate to false for the co-iteration to be valid.
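
As a concrete sketch of that check for this example (assuming F is available as a constexpr callable):

```
// Minimal sketch: with the ordered levels a, b fixed to false, every
// true/false combination of the unordered level c must leave F false,
// i.e. c can only contribute in conjunction with an ordered level.
constexpr auto F = [](bool a, bool b, bool c) { return a && (b || c); };
static_assert(!F(false, false, false) && !F(false, false, true),
              "unordered level c must not contribute on its own");
```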

For storing which levels are ordered: have a constexpr function that takes the tuple of levels as input and spits out a tuple of true/false values indicating the ordered/unordered elements in levels.
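
A minimal sketch of such a function, assuming each level exposes an `is_ordered` flag through its level properties (the member names here are assumptions, not the actual XSparse interface):

```
#include <array>
#include <tuple>

// Minimal sketch: map a tuple of levels to a compile-time array of booleans
// indicating which levels are ordered. `LevelProperties::is_ordered` is an
// assumed interface.
template <class... Levels>
constexpr auto ordered_mask(std::tuple<Levels...> const&) noexcept
{
    return std::array<bool, sizeof...(Levels)>{
        Levels::LevelProperties::is_ordered...
    };
}
```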

For dereferencing: we have to modify the algorithm to only return the PKs for the ordered levels and then locate into all the unordered levels.

Along the way, I also started adding Doxygen style C++ docstrings to the relevant LOC I’m altering. To build locally, I needed to:

  1. Install bison and link it:
    brew install bison
    brew upgrade bison
    brew link bison --force
     

  2. Install doxygen following the instructions here: https://www.doxygen.nl/download.html
     

  3. Build the documentation site locally:
    cmake -S documentation -B build/doc
    cmake --build build/doc --target GenerateDocs
    # view the docs
    open build/doc/doxygen/html/index.html

This allows me to check the validity of my docstrings locally, rather than pushing to the PR branch and waiting for CI to do so.

Another PR that arose adds the ability for instances of level formats to access their level properties and query whether or not the level is ordered, compact, unique, branchless, and full: https://github.com/hameerabbasi/xsparse/pull/22. This is close to being merged and is waiting on review.


 


Community Bonding and First Week

Adam2392
Published: 05/26/2023

Summary and Notes:

The first week, I took some notes on the end-goal of the XSparse package. At a higher level, XSparse will eventually integrate with a Python API in a lazy fashion. Similar to dask, we can envision an API such as:

```
# this is just some tensor operation defined in memory, but X holds the
# function signature, rather than the values of X
X = A + B @ Y / C - D @ E
```

Each element in the above equation corresponds to a tensor that is properly shaped. We see that this is a composition of tensor operations whose operands may be stored in a variety of different formats, so we only “evaluate” the expression when a user calls `compute(X)`.

```
# this would actually compute the values of X and store in memory
X = sparse.compute(X)
```

This would be enabled by runtime compilation, e.g. using cppyy, a package that automatically generates Python bindings from C++ at runtime. Thus, we need efficient compile-time C++ code that will generate efficient code for operating on any combination of levels and tensor operations. The focus of this project is to implement such abstractions.

The core part of this project is the idea of the merge lattice, which essentially co-iterates over multiple level formats. Merge lattices are a higher-level construct that begins placing meaning on the levels that are co-iterated. For example, given a tuple of tensors that are joined by different operators (e.g. +, *, /, -), we will usually have a single tensor output. Each input dimension of each tensor will get mapped to an output dimension that we care about.

For example, say we have the following operation written in Einstein notation: 

output_ij = input1_ik input2_kj

That is: take the Einstein summation as input, figure out what loops to write in tensor notation, and generate code based off of that.
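
For dense inputs, the loops we would generate look like the following minimal sketch (dense only, for illustration; the point of XSparse is to generate the analogous loops for arbitrary combinations of level formats):

```
#include <cstddef>
#include <vector>

// Minimal dense sketch of output_ij = input1_ik * input2_kj.
void matmul(std::vector<std::vector<double>> const& input1,
            std::vector<std::vector<double>> const& input2,
            std::vector<std::vector<double>>& output)
{
    for (std::size_t i = 0; i < input1.size(); ++i)
    {
        for (std::size_t j = 0; j < input2[0].size(); ++j)
        {
            double acc = 0.0;
            for (std::size_t k = 0; k < input2.size(); ++k)
            {
                acc += input1[i][k] * input2[k][j];
            }
            output[i][j] = acc;
        }
    }
}
```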

Work:

I’ve continued work on my PR https://github.com/hameerabbasi/xsparse/pull/19, where I want to add the ability to co-iterate levels with unordered formats combined with ordered formats in a conjunctive merge (an AND operation, analogous to tensor multiplication). The idea is that zeros will not contribute to the result, so we only need to iterate over the ordered formats and locate into the unordered formats. Once the iteration over the ordered formats is complete, we are done, because even if there are non-zeros left in the unordered formats, they are irrelevant.

 

Clarifying Questions to Ask Mentors:

  1. If we only get unordered formats, is this supported? I would say… no?

  2. If we get an arbitrary combination of ordered and unordered formats, that is fine as long as the unordered formats all have a “locate” function, right?

 

Work to do:

  1. Continue reviewing the phd thesis of Fredrik Berg Kjølstad http://tensor-compiler.org/files/kjolstad-phd-thesis-taco-compiler.pdf

  2. Sketch out ideas for how to add co-iteration with a subset of level formats being of the form “locate”

    1. The current co-iteration codebase co-iterates over all passed-in levels, which is a special case of what we want.

    2. We need to store the indices of the levels that are ordered.

    3. We then want to co-iterate over the ordered level formats, and at each (index, pointer), locate into the unordered formats (see the sketch after this list):

      1. If any of the unordered formats returns `std::nullopt` (i.e. a value of 0), then the result will not contain anything at this index, but we continue iterating.

      2. Once all ordered formats reach the end, we have reached the end of the co-iteration.
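
A minimal sketch of this loop for one ordered and one unordered level (all names hypothetical, assuming `locate` returns a `std::optional`):

```
// Minimal sketch (names hypothetical): conjunctive co-iteration over one
// ordered level and one unordered level that supports locate().
template <class Ordered, class Unordered, class PKM1, class Callback>
void conjunctive_coiterate(Ordered& ordered,
                           Unordered& unordered,
                           PKM1 pkm1,
                           Callback&& process)
{
    for (auto const& [ik, pk_ordered] : ordered.iter_helper(pkm1))
    {
        // Look up the same index in the unordered level.
        if (auto pk_unordered = unordered.locate(pkm1, ik))
        {
            // Both levels are non-zero at `ik`, so it contributes.
            process(ik, pk_ordered, *pk_unordered);
        }
        // Otherwise the conjunction is zero at `ik`; keep iterating.
    }
}
```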


 
