Week #11

js94
Published: 08/12/2019

 

Again, I mostly focused on designing edge cases. Here are some of the approaches I tried for PCA. First, I checked whether the component matrices returned by the standard PCA and the implemented PCA are approximately the same. After adjusting for signs (in PCA, the signs of the components are not deterministic unless extra measures are taken), the component matrices returned by the two algorithms were almost identical, which adds credibility to the implemented PCA. I also checked the performance of the implemented PCA while varying the number of partitions, and on a synthetic data matrix generated from collinear vectors. In both cases, the implemented PCA performed on par with the standard PCA. I then tried several tests related to the hyperbox method by designing special cases for the loading and component matrices. So far, no noticeable differences have appeared. In the coming week, I need to clear up the conceptual confusion I still have on this issue.
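To make the sign-adjusted comparison concrete, here is a minimal sketch using only NumPy. It is not the LiberTEM code: `svd_pca` stands in for the standard PCA, `eigh_pca` stands in for an independent implementation, and `align_signs` is a hypothetical helper name. The synthetic data is built from two base vectors, so its columns are strongly collinear.

```python
import numpy as np

def svd_pca(data, n_components):
    """Reference PCA: right singular vectors of the centered data (rows = components)."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_components]

def eigh_pca(data, n_components):
    """Stand-in for a second implementation: eigenvectors of the covariance matrix."""
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / (len(data) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues come back in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order].T

def align_signs(components, reference):
    """Flip each component so that it points the same way as the reference component."""
    signs = np.sign(np.sum(components * reference, axis=1))
    signs[signs == 0] = 1.0
    return components * signs[:, None]

rng = np.random.default_rng(0)
# Every row is a combination of two base vectors plus a little noise.
base = rng.normal(size=(2, 6))
coeffs = rng.normal(size=(200, 2)) * np.array([3.0, 1.0])
data = coeffs @ base + 1e-3 * rng.normal(size=(200, 6))

reference = svd_pca(data, 2)
candidate = align_signs(eigh_pca(data, 2), reference)
max_diff = np.max(np.abs(candidate - reference))
print("largest entry-wise difference after sign alignment:", max_diff)
```

Without the `align_signs` step, the two component matrices can differ by a factor of -1 in any row even when both implementations are correct, so an entry-wise comparison would report a failure that is not one.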

What did I do this week?

Designed test cases to differentiate between the standard PCA and the implemented PCA

What will I work on next week?

Further work on the testing framework for PCA. Continue implementing NNMF


Week #10

js94
Published: 08/04/2019

I continued developing test cases. I also tried to write up an overview of the testing framework, since such a framework could be useful for methods beyond PCA. I still haven't found a test where the implemented PCA falls short of the standard PCA. This is good in the sense that its performance is on par with the standard PCA, but it is also a potential problem, since we still don't know its vulnerabilities. Meanwhile, I'm in the process of implementing NMF. I'm currently trying to use the hyperbox method that I used for PCA. Unlike PCA, however, NMF generally cannot be applied incrementally, so I'm reading papers to resolve this issue.
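For context on what "incremental" means for PCA, here is a small NumPy sketch of one standard way to combine per-partition results. It is not necessarily how the LiberTEM implementation does it, and `merged_pca` is a hypothetical name. It works because the Gram matrix of the full data is the sum of the Gram matrices of the row partitions, so each partition can be summarized by its singular values and right singular vectors. It assumes the data has already been centered with the global mean.

```python
import numpy as np

def merged_pca(partitions, n_components):
    """PCA over row-wise partitions by merging per-partition SVD summaries.

    Assumes every partition was centered with the global mean beforehand.
    """
    summaries = []
    for part in partitions:
        _, s, vt = np.linalg.svd(part, full_matrices=False)
        # s[:, None] * vt has the same Gram matrix as `part`, but fewer rows.
        summaries.append(s[:, None] * vt)
    _, s, vt = np.linalg.svd(np.vstack(summaries), full_matrices=False)
    return s[:n_components], vt[:n_components]

rng = np.random.default_rng(0)
data = rng.normal(size=(120, 5))
data -= data.mean(axis=0)

# Singular values from one SVD of the whole matrix, for comparison.
full = np.linalg.svd(data, compute_uv=False)[:3]

# The merged result should not depend on how many partitions are used.
results = {n: merged_pca(np.array_split(data, n), 3)[0] for n in (1, 3, 8)}
for n_parts, singular_values in results.items():
    print(n_parts, np.allclose(singular_values, full))
```

NMF has no comparable closed-form summary that can simply be stacked and refactorized, which is one way to see why the same trick does not carry over directly.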

What did I do this week?

Designed test cases for PCA. Partially implemented NNMF.

What will I work on next week?

Testing framework for PCA and other methods. Continue implementing NNMF


Week #9

js94
Published: 08/04/2019

I explored different ways in which PCA can be tested. More specifically, I tried to find testing schemes under which the PCA I developed fails while the standard PCA method works. The goal is to understand the potential limitations of the implemented PCA. Unfortunately, I have yet to find a case where it fails. So far, the standard PCA and the implemented PCA appear to perform at more or less the same level. Currently, I'm trying to exploit the fact that the "hyperbox" method is used, by drawing the loading matrix from different heavy-tailed distributions.
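A test of this kind could be set up roughly as follows. This is only a sketch under my own assumptions: the data is modeled as loading matrix times component matrix plus a little noise, the sizes and distributions are arbitrary choices, and a plain SVD-based PCA stands in for whichever implementation is being tested.

```python
import numpy as np

def reconstruction_error(data, n_components):
    """Relative Frobenius error after projecting onto the top principal components."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    reconstructed = centered @ components.T @ components
    return np.linalg.norm(centered - reconstructed) / np.linalg.norm(centered)

rng = np.random.default_rng(0)
n_samples, n_features, rank = 500, 20, 3
components = rng.normal(size=(rank, n_features))

# Loading matrices with increasingly heavy tails.
loadings = {
    "normal": rng.normal(size=(n_samples, rank)),
    "student_t_df2": rng.standard_t(df=2, size=(n_samples, rank)),
    "cauchy": rng.standard_cauchy(size=(n_samples, rank)),
}

errors = {}
for name, loading in loadings.items():
    noise = 0.01 * rng.normal(size=(n_samples, n_features))
    data = loading @ components + noise
    errors[name] = reconstruction_error(data, rank)

for name, err in errors.items():
    print(f"{name}: relative reconstruction error = {err:.2e}")
```

To compare two implementations, the same `data` matrices would be passed to both and the resulting errors compared. A method that subsamples based on extreme values is the one most likely to react to the heavy-tailed cases.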

What did I do this week?

Design test cases for PCA

What will I work on next week?

Devising test cases for PCA. Implement NNMF


Week #8

js94
Published: 07/22/2019

I spent most of my time reading about parallel non-negative matrix factorization (NMF), as this will most likely be the next step. Unlike PCA, which has many scalable, parallel implementations available, NMF seems to be less well studied, and I could not find an implementation suitable as a reference. At the same time, I fixed some things and the run time went back up to ~15 minutes. I also pushed a prototype Jupyter notebook that replicates what the current PCA implementation in LiberTEM does using only standard Python libraries (e.g., sklearn, fbpca), and obtained a satisfactory reconstruction error. Following my mentor's suggestion, I will look into edge cases and test whether the current PCA works under them.

What did I do this week?

Improved overall PCA performance

What will I work on next week?

Devising edge test cases for PCA. Study Parallel NMF in depth


Week #7

js94
Published: 07/15/2019

I managed to reduce the run time from 26 minutes to ~100 seconds by fetching a bigger chunk of data at a time. Furthermore, I successfully implemented the hyperbox method for the loading matrix, which is essentially a way of subsampling the loading matrix so that the data is well represented with fewer samples. Although this introduces additional loss, I'm inclined to conclude that the cost is manageable and is outweighed by the speedup. In fact, since I'm using an approximate PCA method, some degree of loss is expected anyway.
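As a rough illustration of the subsampling idea, here is one possible reading of a hyperbox-style selection. I am assuming it means keeping only the rows of the loading matrix that touch its axis-aligned bounding box. This is a hypothetical sketch, not the code used in the project, and `hyperbox_subsample` is a made-up name.

```python
import numpy as np

def hyperbox_subsample(loading):
    """Hypothetical sketch: keep the rows that attain the minimum or maximum
    of at least one column, i.e. the rows touching the bounding hyperbox."""
    extreme_rows = np.concatenate([loading.argmin(axis=0), loading.argmax(axis=0)])
    return loading[np.unique(extreme_rows)]

rng = np.random.default_rng(0)
loading = rng.normal(size=(1000, 4))
subset = hyperbox_subsample(loading)
print(loading.shape, "->", subset.shape)
```

With `k` columns this keeps at most `2 * k` rows regardless of how many rows the loading matrix has, which is where the speedup would come from. The rows that are dropped are also where the additional loss would come from.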

What did I do this week?

Improved overall PCA performance

Did I get stuck anywhere?

I was getting errors because I miscalculated the dimensions needed for reconstruction.

What will I work on next week?

It seems like the PCA work is almost finished (?). Once I clean up the code and produce some example notebooks for reference, and with my mentor's approval, I will probably begin working on non-negative matrix factorization, if not this week then next week.
