This week, I continued working on Principal Component Analysis. To cope with LiberTEM's architecture, I separated the algorithm into two parts, where the first part concerns the local computation of processing individual frames and the second part concerns the global computation of merging the outputs from the first part. For the first part, I used the algorithm from "Candid Covariance-Free Incremental PCA" paper. For the second part, I used the algorithm from a ph.d thesis that introduces efficient merging of SVDs. To test the result, I was trying to use frobenius norm error to measure the similarity between two eigenspace/eigenvector matrices, one that was approximated by the algorithms that I used and the other that was computed using full-batch PCA (i.e., exact solution). One trouble I had was how to set the reasonable error bound. In other words, what is a reasonable frobenius norm error bound to say that the current method "well-approximates" the true solution? I opened up an issue to document the researches that I have done related to PCA and NNMF, as well as my current progress on the subjects. While working on this issue, I also tried working a bit on documentation for UDF and submitted a PR.
What did I do this week?
I worked on documentation for UDF interface and continued working on researching/implementing Principal Component Analysis
Did I get stuck anywhere?
I implemented PCA and it ran fine on a small toy data, but had two major problems. First, it gave me a memory error on a real data, implying that the matrix that I'm using is probably too big. Also, I have yet to formulate testing scheme for the algorithm
What's coming up next week?
I will continue to work on PCA. Also, I will research into NNMF and if it can be more easily implemented in LiberTEM