# July 17 - Jul 23, 2023: Week 8

Published: 07/22/2023

`Tensor` would be a tuple of levels and the last level will point into the data vector

• “one Tensor” per tensor involved in the tensor operation. E.g. A+ B= C has A, B and C as a `Tensor`.

• TODO: add some validations and stuff, but initially we can have it as just a lightweight template class to hold a “bunch of levels”.

Coiterate questions:

1. My recursion to check the template is not working properly.

Tensor Questions:

1. What operations should a tensor have?

1. Should it have insert value at indices?

2. No, because it should be defined during compile-time.

2. What checks should it do during construction?

3. How is the template datatype used?

MergeLattice Questions:

1. What operations should a mergeLattice do?

1. How does a mergeLattice work in the context of say (A_ij + B_ij) @ D_i = C_i?

1. In this case, F := (A || B) && D would be our “function passed”

1. When iterating over i, F(A_i, B_i, D_i) gets the booleans based on if values are present

2. When iterating over j for a fixed i, F(A_ij, B_ij, true) gets the booleans based on if values in A_ij, B_ij are present

3. “j” is advanced if “i”

2. Vector of indices for A,B,D, C is (0, 1), (0, 1), (0), (0, 1)

3. Order of iteration = i, then j

4. E.g. merger = MergeLattice(pair((A, (0, 1)), (B, (0,1)), (D, (0)))

5. Since D_ij is not present, we would skip “j” in A and B when iterating

6. merger.merge_iterators()

2. E.g. of something not possible with the constraint on increasing indices in vectors: D_ij + A_ji

`MergeLattice` would take in a tuple of these input Tensors:

• E.g. The input Tensors would be (A, B)

• Tuple of vector of integers of the same length as the tuple of input tensors.

• Can check this during compile-time.

• For each tuple element, i.e. Tensor in `input_tensors`, and vector[int] in `input_indices`, we would check the number of levels associated with that `Tensor` and the size of the vector. They should match.

• We need a tuple of tuple of integers at compile time, rather than a tuple of vectors.

• Or constexpr std::vector

• This can be compile-time check ideally if so.

• For each input indices, they should be strictly increasing (for now)

• We can relax this restriction once workspaces are implemented

•
• We also want the function F

• Note: the coiterator already knows the current start/end when we pass in valid Ik/Pk from the previous levels. I.e. it infers the start/end iterators for the current level.

template <class F, typename… Levels>

class MergeLattice(std::tuple[Levels]& input_levels, int output_dims, std::tuple[std::vector[int]] input_indices)

{

// constexpr check that no input_indices are greater than output_dims

// constexpr check that input_indices lists each has same length as each vector of input levels

// second check: each std::vector[int] should be the size of the input level dimension

// Notes: see above

}

public:

{

inline constexpr auto get_merge_points() const noexcept

{

// return a vector of mergepoints

}

inline constexpr auto merge_iterators() const noexcept

{

// k-way/two-way merge algorithm(?) that is, we merge an iterator into 1

// uses a coiterator over a subset of the iterators that can be co-iterated over

// then iterates over them and assigns them into a new level/iterator(?) e.g. dense?

}

}

If that level exists in upper level, then we would go below and recurse and fix the upper level to true.

If we are doing an outersum. For example:

A_i + B_j = C_ij

Function `F` would be `A || B`.

For A, we would put in `true` for the function whenever A value exists and then loop over B, and when A value doesn’t exist, then put in `false` for the function and loop over B as well.

If `F` was `A && B`, then this would not loop over B. It would be like giving an empty A level fixing A values always to missing for all values of B.

Iterate over each level, find out if it has a value in the upper level. If there is no upper-level, then just check if there's a value now in this level corresponding to `true` as the initial value per input.

Coiterate over everything that exists for that dimension and then for other values put in the false/true that we get.

View Blog Post

# July 10 - Jul 16, 2023: Week 7

Published: 07/12/2023

This week, I continued work on implementing the compile-time check for valid co-iteration.

The algorithm proceeds by calling two functions during instantiation of the Coiterate class.

1. Are all levels ordered, or has the locate function?

1. If this does not return true, then error out

2. Next, check a more complicated compile-time check:

1. Where we use template recursion up to a depth of 1024 to check all possible combinations of unordered levels with “true/false” input to `m_comparisonHelper`

1. We are template recursing over levels. E.g. for each unordered level, we can run 2 static_asserts m_comparisonHelper(..., true,...) and m_comparisonHelper(..., false, …)

2. If the level we are checking is ordered, just static_assert false in the correct place.

3. Note: most likely we will never really hit the 1024 depth limit in practice because the higher-level API will most likely consolidate the tensor operations with temporary workspaces in the middle.

2. E.g.

m_levelsTuple = (A, B, C, D)

And

A, and D are unordered with locate() function defined and B and C are ordered.

We want to check that the output is always false:

1. m_comparisonHelper(false, false, false, false)

2. m_comparisonHelper(true, false, false, true)

3. m_comparisonHelper(false, false, false, true)

4. m_comparisonHelper(true, false, false, false)

I have C++ code that defines a class called Coiterate. Coiterate gets as input a tuple of levels, represented by `m_levelsTuple`. The goal is to perform a compile-time check that goes through all levels in `m_levelsTuple`. Each level must be ordered, or have the `locate` function defined.

To determine if a level is ordered, we can check each level’s `LevelProperties::is_ordered` namespace property. For example, `A::LevelProperties::is_ordered` would determine if `A` is ordered or not. To determine if a level has the `locate` function defined, we have the following `has_locate_v` function.

Merge Lattice:

[Summary] Each input to the merge lattice is a stack of levels (i.e. dimensions), and we need to determine are we actually recursing into this specific input at this dimension or not, and if so, call Coiterate… If not, what do we do?

If a level is broadcasted, then it would not be present. Then since that “level is not there”, we may need to construct helper classes to represent what that level should be. If the level above tells you that the current level is present/not, then you put true/false and fix it in the `m_comparisonHelper`.

Input indices to output indices:

1. Single uint_p index which is number of output dimensions
E.g. 2 in this example

2. For each input’s levels, we have a list of integers corresponding to
[0, 1] and [0, 1] (can use std::vector since it is constexpr as of C++20)

The lists should be the same length as the input they correspond to. Not necessarily the size of the output dimensions (e.g. 2 in this case).

A_ij + B_ij = C_ij

Another example:

A_i + B_j = C_ij

1. Output dimension = 2

2. Input dimension lists: [0] and [1]

TODO:

1. Start thinking about writing unit-tests first regarding merge lattices (keep in mind the code should be compile-time as much as possible)

2. Sketching out the API, possibly from taco:

2. Change string operations to index operations

3. Keep in mind the above input -> output index operations

3. Keep reviewing the first taco paper and Kjolstad’s thesis

View Blog Post

# July 3 - Jul 9, 2023: Week 6

Published: 07/06/2023

I discussed the following debuggers for C++:UBSan and Valgrind. And if I can get my Linux machine working, gdb. I started using LLDB on VSCode C++ and it was wonderful.

I was able to solve with the help of my mentors the compiler-time refactoring of the code. Each functionality within the coiterator now fully unwraps into a level check:

1. Computing the min_ik: we now ignore the unordered levels when we do this.

2. Advance_iter: we now ignore the unordered levels when we do this.

3. Compare: we now only compare the ordered levels and default to ‘true’ for unordered level comparisons.

The coiterator has the implicit assumption that only ordered levels are supported if there is a disjunctive merge. If there is a conjunctive merge, it must contain at least one ordered level and for all other unordered levels, they must have the `locate` function defined.

In order to make sure this is valid during compile-time, we want to implement a compile-time check during the instantiation of the coiterator. This was the initial goal of the PR, but since we uncovered a bunch of problems, we had to table it. Now that we have refactored the code into a more generalized state, we can revisit this compile-time check.

The way this is done is via fold expression most likely:

If:

* 1. the levels are all ordered (i.e. has the `is_ordered == True` property)

* 2. if any of the level are do not have the is_ordered property, it must have the locate

* function, else return False. Then do a check that `m_comparisonHelper` defines

* a conjunctive merge (i.e. AND operation).

* Otherwise, coiteration is not allowed.

The check with respect to `m_comparisonHelper` must be done for all unordered levels.

View Blog Post

# June 25 - Jul 2, 2023: Week 5

Published: 06/28/2023

I discussed the following debuggers for C++:UBSan and Valgrind. And if I can get my Linux machine working, gdb. I started using LLDB on VSCode C++ and it was wonderful.

Refactored the `get_PKs_level` function into two functions now where the `get_PKs_level` helps pass the fold expression and the corresponding iterator to the `get_PK_level` function, which can then use the iterator to perform compile-time if/else functions, such as dereferencing, or locating into the iterator.

Furthermore, I spent this week refactoring the C++ code specifically for calculating the “min_ik”, “advancing the iterators” and “getting the PKs”, which all follow a design that is more readable, yet still compile-time evaluated.

View Blog Post

# June 18 - Jun 25, 2023: Week 4

Published: 06/24/2023

This week, I’ve been continuing the PR https://github.com/hameerabbasi/xsparse/pull/25. I am currently running into a segfault when running the unit-tests of coiterating over a (dense, hashed) levels because the hashed level seems like it is trying to access a previous element after already iterating through it. This is an issue because `min_ik` is getting set to a previous index since hashed level iterates out of order.

My suspicion is that the `min_ik` design needs to be changed to account for co-iterating over unordered levels. I will discuss this with my mentors.

Since I wasn’t able to discuss merge lattices with my mentors last week, here are the same notes for this week’s discussion:

The MergeLattice at a high level should take in a tuple of levels that are “merge points” on the merge lattice. In addition, it should take in a index expression that dictates how the levels are merged.The index expression in taco uses a set of strings like `expr = C(i, j) = A(i,j) + B(i,j);`, whereas we would want to define an arbitrarily complex index expression… I still have to do some more reading to get an idea of how this part is implemented.

Internally, the MergeLattice given the index expression will be able to determine which levels are co-iterated. Moreover, it must construct the `F` function that is passed to `Coiterate`.

Overall, we would like the following higher-level function as well:

1. Construct a union (disjunction) over lattice points

2. Construct an intersection (conjunction) over lattice points

A merge lattice is constructed per index iterator in a tensor operation. For example, say we have:

Aij = (Bij + Cij) @ Dij

To set index i for A, we have to iterate over B_i, C_i and D_i.

To set index j for A, we have to iterate over B_*j, C_*j, and D_*j.

Each index constitutes a merge lattice that we need to construct to then call Coiteration. We want to extract the operators “+” and “*” to determine addition and multiplication, where addition is converted to a disjunction and multiplication is converted to a disjunction.

In the following example:

A_i = b_i + c_i d_i

Has a conjunction and disjunction. We proceed by:

1. Create leaf of the merge lattice from the tensor access rule(?

2. Create merge lattice for c_i d_i by computing conjunctive merge (ciΛ di) a_i = this if this lattice point is reached

3. Create merge lattice for b_i since there is no other conjunctive merge with it.

4. Create upper-most merge latticepoint for disjunctive merge (bi) v (ciΛ di)

1. So the merge lattice points starts with a whole expression that is a disjunction among conjunctions

2. Then it traverses through each lattice point, which trims down parts that are not necessary for co-iteration

Questions:

1. What is the general input for a merge lattice? What is calling it? Will we have to implement, for example, an iteration graph?

2. Do we implement the “LatticePoint”?

3. How do we expect the “tensor operations” to be represented?

View Blog Post