Week #1 Blog post (in detail)

Published: 06/04/2019

Most of my time was spent on coming up with good implementations of PCA on LiberTEM. To fully utilize LiberTEM's architectural benefits, I needed to make sure that the algorithm 1) can be computed in parallel and 2) allows two disjoint PCA results to be merged into a single PCA result. I have been exploring the Incremental PCA algorithm, which allows adding a batch of data to an already computed PCA result, and distributed PCA, which runs PCA on disjoint subsets of the data in parallel and merges the results by keeping only the first n components. Most such parallel algorithms are, by design, approximations of the true PCA vectors at best, so I need to come up with a sound way of testing these functions.

At the same time, the dataset that LiberTEM deals with is a 4D array (or 3D, depending on the application), where the first two dimensions represent the scan position and the last two represent the diffraction pattern image. Thus, I had to come up with a way of computing PCA over this dataset. One way is to 'flatten' each diffraction pattern into a 1D vector and stack the vectors so that each row constitutes a single 'observation.' Currently, I'm trying to preprocess the data using radial binning so that PCA can be applied to a smaller number of variables.
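
To make the flattening and merging ideas above concrete, here is a minimal sketch in plain NumPy. It does not use LiberTEM's API. The function names (`flatten_frames`, `partial_pca`, `merge_pca`) and the random toy data are assumptions made up for illustration. The merge shown here (stack the scaled components of both partitions and redo the SVD) is one common way to combine truncated results, not necessarily the method that will end up in LiberTEM. It also assumes the data has already been centered with a global mean.

```python
import numpy as np


def flatten_frames(data):
    """Reshape (scan_y, scan_x, det_y, det_x) into (n_observations, n_variables).

    Each row of the result is one flattened diffraction pattern.
    """
    scan_y, scan_x, det_y, det_x = data.shape
    return data.reshape(scan_y * scan_x, det_y * det_x)


def partial_pca(X, n_components):
    """PCA of one already-centered partition via SVD.

    Returns the leading singular values and the matching components (rows).
    """
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    return s[:n_components], vt[:n_components]


def merge_pca(a, b, n_components):
    """Merge two partial results (hypothetical helper).

    Each partition is summarized by diag(s) @ Vt; stacking the two summaries
    and redoing the SVD gives an approximation of the PCA of the full data.
    """
    stacked = np.vstack([a[0][:, None] * a[1], b[0][:, None] * b[1]])
    _, s, vt = np.linalg.svd(stacked, full_matrices=False)
    return s[:n_components], vt[:n_components]


# Toy 4D dataset: 4 x 6 scan positions, 8 x 8 pixel diffraction patterns.
rng = np.random.default_rng(0)
data = rng.normal(size=(4, 6, 8, 8))

X = flatten_frames(data)      # shape (24, 64): 24 observations, 64 variables
X = X - X.mean(axis=0)        # assumption: global mean is known up front

k = 5
top, bottom = X[:12], X[12:]  # two disjoint partitions of the observations

merged_s, merged_vt = merge_pca(partial_pca(top, k), partial_pca(bottom, k), k)
exact_s, exact_vt = partial_pca(X, k)

print("merged singular values:", merged_s)
print("exact singular values: ", exact_s)
```

Because each partition throws away everything beyond its first `k` components, the merged singular values can only underestimate the exact ones. If every partition keeps all of its components, the merge reproduces the exact singular values, which gives one possible sanity check for testing an approximate implementation.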