Week #12
js94
Published: 08/24/2019
I tried, unsuccessfully, to resolve the dimension issue in NMF. The main problem was specifying the correct dimensions for the matrix operations within the NMF algorithm, and I have yet to identify them. Furthermore, there are no incremental algorithms for NMF as there are for PCA, so NMF becomes infeasible when a large dataset is given. There are several heuristics for working around this scalability issue, notably parallel NMF, which divides the columns (i.e., features) of the data matrix into several subsets, performs NMF independently on each column subset, and then joins the results at the end. Unfortunately, this algorithm was not feasible for LiberTEM because, with LiberTEM, one only has access to the rows of the data matrix (i.e., images) and not to the full columns (i.e., feature vectors) at each step. I also tried to clean up and reorganize the Jupyter notebook.
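To make the dimension bookkeeping concrete, here is a minimal sketch of NMF using the standard multiplicative update rules, with the shape of each product noted in comments. This is not the LiberTEM code; the function name `nmf_multiplicative`, its parameters, and the toy data are assumptions chosen only for illustration, with rows as images and columns as features.

```python
import numpy as np

def nmf_multiplicative(X, k, n_iter=200, eps=1e-9, seed=0):
    """Hypothetical helper: factor X (n x m) into W (n x k) and H (k x m)."""
    n, m = X.shape
    rng = np.random.default_rng(seed)
    W = rng.random((n, k)) + eps  # (n, k), non-negative
    H = rng.random((k, m)) + eps  # (k, m), non-negative
    for _ in range(n_iter):
        # (k, n) @ (n, m) -> (k, m), divided by (k, k) @ (k, m) -> (k, m)
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        # (n, m) @ (m, k) -> (n, k), divided by (n, k) @ (k, k) -> (n, k)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
        # The factors must keep their shapes after every update.
        assert W.shape == (n, k) and H.shape == (k, m)
    return W, H

# Toy non-negative data of exact rank 4 (assumed sizes, not real images).
rng = np.random.default_rng(1)
X = rng.random((30, 4)) @ rng.random((4, 20))
W, H = nmf_multiplicative(X, k=4)
rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(W.shape, H.shape, rel_error)
```

Note that both updates need the full matrix `X`, which is part of why a row-by-row (image-by-image) scheme is not straightforward here.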
What did I do this week?
Work on NMF
What will I work on next week?
Write documentation
Week #11
js94
Published: 08/12/2019
Again, I mostly focused on designing edge cases, and here are some of the approaches I tried for PCA. First, I checked whether the component matrices returned by the standard PCA and the implemented PCA are approximately the same. After adjusting for the signs (in PCA, signs are non-deterministic unless additional measures are taken), I confirmed that the component matrices returned by the two algorithms are almost identical, which adds credibility to the implemented PCA. I also checked the performance of the implemented PCA by changing the number of partitions and by running it on a synthetic data matrix generated from collinear vectors. In both cases, the performance of the implemented PCA was on par with the standard PCA. Then I tried different approaches related to the hyperbox method by designing special cases for the loading and component matrices. So far, no noticeable differences have appeared, and in the coming week I need to clear up the conceptual confusion that I have on this issue.
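The sign-adjusted comparison described above can be sketched as follows. Since the implemented PCA is not shown here, this example compares two stand-ins: PCA via SVD and PCA via eigendecomposition of the covariance matrix. The helper names (`pca_svd`, `pca_eig`, `align_signs`) and the sign convention (largest-magnitude entry of each component made positive) are assumptions for illustration.

```python
import numpy as np

def pca_svd(X, k):
    """Top-k principal components (rows) via SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:k]

def pca_eig(X, k):
    """Top-k principal components (rows) via the covariance eigenvectors."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (Xc.shape[0] - 1)
    vals, vecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]     # indices of the k largest
    return vecs[:, order].T

def align_signs(components):
    """Flip each component so its largest-magnitude entry is positive."""
    idx = np.argmax(np.abs(components), axis=1)
    signs = np.sign(components[np.arange(components.shape[0]), idx])
    return components * signs[:, None]

# Synthetic data with clearly separated variances per feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([5.0, 4.0, 3.0, 2.0, 1.0])

a = align_signs(pca_svd(X, 3))
b = align_signs(pca_eig(X, 3))
max_diff = np.max(np.abs(a - b))
print(max_diff)
```

Without the `align_signs` step, the two component matrices can differ by a factor of -1 per row even when they describe the same subspace.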
What did I do this week?
Design test cases to differentiate between standard PCA and implemented PCA
What will I work on next week?
Further work on testing framework for PCA. Continue implementing NNMF
Week #10
js94
Published: 08/04/2019
I continued developing test cases. I tried to write up an overview of the testing framework, since such a framework could be useful for methods beyond PCA. I still haven't found a test where the implemented PCA falls short of the standard PCA, which is good in the sense that its performance is on par with the standard PCA, but potentially harmful since we don't know its vulnerabilities. Meanwhile, I'm in the process of implementing NMF. I'm currently trying to use the hyperbox method that I used for PCA. Unlike PCA, however, it is generally not possible to apply NMF in an incremental manner, so I'm reading papers to resolve this issue.
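For contrast, here is one generic way a PCA-style decomposition can be built up from row partitions. It is only a sketch and not necessarily the scheme used in LiberTEM: each partition is reduced to a small summary (singular values times right singular vectors), and the stacked summaries are decomposed again. The function name `merge_partitions` is hypothetical, and centering is omitted for brevity.

```python
import numpy as np

def merge_partitions(partitions, k):
    """Hypothetical helper: top-k singular values/vectors from row blocks."""
    summaries = []
    for part in partitions:
        # Each block only needs its own rows, never the full matrix.
        _, s, vt = np.linalg.svd(part, full_matrices=False)
        summaries.append(s[:, None] * vt)  # (d, d) summary of the block
    stacked = np.vstack(summaries)
    # stacked.T @ stacked equals X.T @ X, so the right singular vectors
    # and singular values agree with those of the full matrix X.
    _, s, vt = np.linalg.svd(stacked, full_matrices=False)
    return s[:k], vt[:k]

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 6)) * np.array([6.0, 5.0, 4.0, 3.0, 2.0, 1.0])

s_inc, vt_inc = merge_partitions(np.array_split(X, 4), k=3)
_, s_full, _ = np.linalg.svd(X, full_matrices=False)
print(s_inc, s_full[:3])
```

This works because the sum of the per-block Gram matrices equals the Gram matrix of the whole data. The non-negativity constraints in NMF offer no equally simple merge step.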
What did I do this week?
Design test cases for PCA. Partially implemented code for NNMF.
What will I work on next week?
Testing framework for PCA and other methods. Continue implementing NNMF
Week #9
js94
Published: 08/04/2019
I explored different ways in which PCA can be tested. More specifically, I tried to find testing schemes under which the PCA I developed fails while the standard PCA method works. The aim is to help me understand the potential limitations of the implemented PCA. Unfortunately, I have yet to find a case where it fails. So far, it appears that the standard PCA and the implemented PCA perform more or less on the same level. Currently, I'm trying to exploit the fact that the "hyperbox" method was used, by drawing the loading matrix from different heavy-tailed distributions.
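A test of this kind might be set up roughly as below: a synthetic data matrix whose loading matrix is drawn from a heavy-tailed distribution, followed by a rank-k reconstruction error check. The Student-t distribution with 2 degrees of freedom, the matrix sizes, and the noise level are all assumptions for illustration, and plain SVD-based PCA stands in for the implementation under test.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 500, 10, 3  # samples, features, components (assumed sizes)

# Heavy-tailed loading matrix (n x k): Student-t with 2 degrees of freedom.
loading = rng.standard_t(df=2, size=(n, k))
# Component matrix with orthonormal columns (d x k), via QR decomposition.
components, _ = np.linalg.qr(rng.normal(size=(d, k)))
# Low-rank signal plus a small amount of Gaussian noise.
X = loading @ components.T + 0.01 * rng.normal(size=(n, d))

# Rank-k PCA reconstruction of the centered data via SVD.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Xk = (U[:, :k] * S[:k]) @ Vt[:k]

rel_error = np.linalg.norm(Xc - Xk) / np.linalg.norm(Xc)
print(rel_error)
```

Running the same data through both the standard and the implemented PCA and comparing `rel_error` would show whether extreme loading values affect one more than the other.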
What did I do this week?
Design test cases for PCA
What will I work on next week?
Devising test cases for PCA. Implement NNMF
Week #8
js94
Published: 07/22/2019
For the most part, I spent time reading about parallel Non-negative Matrix Factorization (NMF), as this will most likely be the next step. Unlike PCA, which has many scalable, parallel implementations available, NMF seems to be less well studied, and I could not find an implementation suitable for reference. At the same time, I fixed a few things and the running time returned to ~15 minutes. I also pushed a prototype Jupyter notebook that replicates what the current PCA implementation in LiberTEM does using only standard Python libraries (e.g., sklearn, fbpca), and obtained satisfactory reconstruction error. Following my mentor's suggestion, I will look into edge cases and test whether the current PCA works under them.
What did I do this week?
Improved overall PCA performance
What will I work on next week?
Devising edge test cases for PCA. Study Parallel NMF in depth