Most of my time was spent coming up with good implementations of PCA for LiberTEM. To make full use of LiberTEM's architecture, I needed to make sure that 1) the algorithm can be computed in parallel and 2) two PCA results computed on disjoint parts of the data can be merged into a single PCA result. I have been exploring the Incremental PCA algorithm, which allows adding a new batch of data to an already computed PCA result, and distributed PCA, which computes PCA on disjoint subsets of the data in parallel and merges the results by keeping only the first n components. By design, most of these parallel algorithms are at best approximations of the true principal components, so I need to come up with a sound way of testing them. At the same time, the dataset LiberTEM deals with is a 4D array (or 3D, depending on the application), where the first two dimensions represent the scan position and the last two represent the diffraction pattern image. Thus, I had to find a way of computing PCA over this dataset. One way is to 'flatten' each diffraction pattern into a 1D vector and stack these vectors so that each row constitutes a single 'observation.' Currently, I'm trying to preprocess the data using radial binning so that PCA can be applied to a smaller number of variables.
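The flattening and radial-binning preprocessing can be sketched in NumPy roughly as follows. This is a minimal sketch under stated assumptions, not LiberTEM's API: the function name `radial_bin`, the use of equal-width rings, and the default center (the middle of the detector) are all hypothetical choices for illustration.

```python
import numpy as np

def radial_bin(data4d, n_bins, center=None):
    """Flatten a (scan_y, scan_x, sig_y, sig_x) array into one row per scan position,
    then average each diffraction pattern over n_bins equal-width rings.

    Assumption: the pattern center defaults to the middle of the detector."""
    sy, sx, py, px = data4d.shape
    if center is None:
        center = ((py - 1) / 2, (px - 1) / 2)
    yy, xx = np.mgrid[0:py, 0:px]
    r = np.hypot(yy - center[0], xx - center[1]).ravel()
    # Assign every detector pixel to one ring index in [0, n_bins)
    labels = np.minimum((r / r.max() * n_bins).astype(int), n_bins - 1)
    onehot = np.zeros((py * px, n_bins))
    onehot[np.arange(py * px), labels] = 1.0
    counts = onehot.sum(axis=0)
    # Each row is one flattened diffraction pattern, i.e. one 'observation'
    frames = data4d.reshape(sy * sx, py * px)
    # Mean intensity per ring; empty rings (if any) come out as 0
    return frames @ onehot / np.where(counts > 0, counts, 1)
```

The result has shape `(scan_y * scan_x, n_bins)`, so PCA then runs on `n_bins` variables instead of `sig_y * sig_x`. For example, a 256×256 detector binned into 64 rings goes from 65,536 variables per observation to 64. Radial binning discards angular information, so it fits best when the features of interest are roughly rotationally symmetric.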
What did I do this week?
Continuing what I did during the community bonding period, I researched online algorithms for dimensionality reduction further. I've implemented the skeleton code for online Principal Component Analysis (PCA) using an incremental algorithm and submitted it as a PR.
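Adding a new batch to an existing PCA result and merging two results computed on disjoint data can share one update rule. Below is a minimal NumPy sketch, assuming each partial result is stored as a tuple of (row count, mean, components scaled by their singular values); the names `partial_pca`, `merge_pca` and `components` are hypothetical and not part of LiberTEM. The single mean-correction row follows the same idea scikit-learn's `IncrementalPCA` uses.

```python
import numpy as np

def partial_pca(X, n_components):
    """Summarize one partition (rows = observations) as (count, mean, scaled components)."""
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    # If nothing is truncated, scaled.T @ scaled equals the centered scatter matrix of X
    return X.shape[0], mean, s[:n_components, None] * vt[:n_components]

def merge_pca(a, b, n_components):
    """Merge two partial results: an old result plus a new batch, or two disjoint partitions."""
    n_a, mean_a, comp_a = a
    n_b, mean_b, comp_b = b
    n = n_a + n_b
    mean = (n_a * mean_a + n_b * mean_b) / n
    # One extra row accounts for the shift between the two partition means
    correction = np.sqrt(n_a * n_b / n) * (mean_a - mean_b)
    stacked = np.vstack([comp_a, comp_b, correction[None, :]])
    _, s, vt = np.linalg.svd(stacked, full_matrices=False)
    # Keeping only the first n components is where the approximation can come in
    return n, mean, s[:n_components, None] * vt[:n_components]

def components(result):
    """Unit-length principal axes and singular values (assumes no zero singular values)."""
    _, _, scaled = result
    s = np.linalg.norm(scaled, axis=1)
    return scaled / s[:, None], s
```

When `n_components` is at least the rank of each partition, the merge reproduces the exact PCA of the combined data, because stacking the scaled components (plus the correction row) reproduces the total scatter matrix. Truncating each partition to its first n components is what turns this into an approximation, which is exactly what makes testing harder.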
What is coming up next?
I'm currently trying to construct good test cases for the PCA code. On the side, I have been writing code for parallel Non-negative Matrix Factorization (NNMF), and I plan to submit a PR for it soon as well.
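One way parallel NNMF can fit the same partition-and-merge structure is the classic Lee-Seung multiplicative-update rule applied to row partitions. This is a swapped-in illustration, not necessarily what the upcoming PR implements: the function name `nmf_partitioned`, the row-partitioned layout and the fixed iteration count are all assumptions. The update of the shared factor `H` only needs sums of per-partition products (a reduce step), while each partition updates its own rows of `W` locally.

```python
import numpy as np

def nmf_partitioned(partitions, rank, n_iter=200, eps=1e-10, seed=0):
    """Multiplicative-update NMF with V_i ~= W_i @ H for each row partition V_i.

    Hypothetical sketch: partitions are non-negative 2D arrays with the same
    number of columns; eps guards against division by zero."""
    rng = np.random.default_rng(seed)
    n_features = partitions[0].shape[1]
    H = rng.random((rank, n_features))
    Ws = [rng.random((V.shape[0], rank)) for V in partitions]
    for _ in range(n_iter):
        # H update: numerator and denominator are sums over partitions (reduce step)
        num = sum(W.T @ V for W, V in zip(Ws, partitions))
        den = sum(W.T @ W for W in Ws) @ H + eps
        H *= num / den
        # W update: each partition needs only its own rows and the shared H (map step)
        HHt = H @ H.T
        Ws = [W * (V @ H.T) / (W @ HHt + eps) for W, V in zip(Ws, partitions)]
    return Ws, H
```

Multiplicative updates keep every entry non-negative as long as the initialization is non-negative, and they do not increase the Frobenius reconstruction error, but they can converge slowly; like the PCA variants, the result depends on the random initialization.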
Did you get stuck anywhere?
At the moment, I'm very much stuck on constructing test cases. Since most online algorithms are at best approximations, and some produce nondeterministic output, it is quite hard to come up with sound test cases.
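One possible way to make such tests robust is to compare subspaces rather than individual vectors, and to test on data where the answer is known in advance. The sketch below, with hypothetical helper names, measures the largest principal angle between the span of approximate components and an exact SVD reference; this ignores sign flips and rotations within the subspace, which an approximate or nondeterministic PCA is free to produce. Fixing the random seed keeps the synthetic data itself deterministic, and the tolerance is a choice the tester has to make.

```python
import numpy as np

def max_subspace_angle(A, B):
    """Largest principal angle (radians) between the row spaces of A and B (k x d each).

    Invariant to sign flips and to rotations within the subspace."""
    qa, _ = np.linalg.qr(A.T)
    qb, _ = np.linalg.qr(B.T)
    cosines = np.linalg.svd(qa.T @ qb, compute_uv=False)
    return float(np.arccos(np.clip(cosines.min(), 0.0, 1.0)))

def exact_components(X, k):
    """Reference answer: top-k principal axes from a full SVD of the centered data."""
    _, _, vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return vt[:k]

def planted_data(n, d, scales, noise, seed):
    """Synthetic data whose true k-dim subspace is known (k = len(scales)), plus small noise."""
    rng = np.random.default_rng(seed)
    basis, _ = np.linalg.qr(rng.normal(size=(d, len(scales))))
    latent = rng.normal(size=(n, len(scales))) * np.asarray(scales)
    X = latent @ basis.T + noise * rng.normal(size=(n, d))
    return X, basis.T
```

A test would then assert that `max_subspace_angle(approx, exact_components(X, k))` stays below a chosen tolerance (for example 0.05 radians) for the output `approx` of the approximate algorithm, on data from `planted_data` with well-separated scales so that the reference itself is stable.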
Hi all. For the summer, I will be working on implementing various dimensionality reduction techniques in LiberTEM. During the community bonding period, I've been trying to understand the codebase better and researching existing methods for distributed implementations of dimensionality reduction techniques, namely PCA, ICA and UMAP. As I enter the official coding period, the first part of the project will combine implementing the interface for User-Defined Functions (UDFs) in LiberTEM with a distributed implementation of PCA. More about this in the upcoming blog post.