Final work submission and future work
yashlamba
Published: 08/26/2019
With the final week over, we had to submit a summary of the work we did during GSoC. Here is an overview:
What was the original aim?
The aim was to add new machine learning models to DFFML. The proposed models were:
- Model 1: Ordinary Least Squares Regression (OLSR)
- Model 2: Logistic Regression
- Model 3: k-Nearest Neighbour (kNN)
- Model 4: Naive Bayes
Modifications decided during community bonding:
During the community bonding period, the proposed work was modified to get the most out of the summer. The finalized plan was:
- Adding Linear Regression Model from scratch
- Adding Linear Regression and other proposed models using scikit-learn
- Adding tests for the added models
- Documenting the models
Tasks Completed:
- Added Linear Regression model from scratch with tests
  A simple linear regression model implemented from scratch. This was completed with tests and documentation, and was also released on PyPI.
- Added scikit models with dynamic support
  Initially, the plan was to add a fixed number of models from scikit. After I did it for one model (Multiple Linear Regression with scikit), we decided to extend the idea: build a common base for all scikit models and create the other model classes dynamically. This was successful, and adding a scikit model to DFFML is now as easy as appending the model name to a Python dictionary. The tests are complete and the documentation material is ready, but we are still working out a more understandable way of documenting this before release.
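The dictionary-driven approach from the last item can be sketched roughly as follows. This is only an illustration of the general technique (building classes at runtime with `type()`), not the actual DFFML code. All names here (`ModelBase`, `ESTIMATORS`, `MODELS`, and the two stand-in estimators) are hypothetical, and the stand-ins take the place of real scikit-learn estimator classes so that the sketch runs without any dependencies.

```python
class MeanRegressor:
    """Stand-in estimator (hypothetical) with a scikit-learn style fit/predict."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class MajorityClassifier:
    """Stand-in estimator (hypothetical) that always predicts the most common label."""

    def fit(self, X, y):
        labels = list(y)
        self.label_ = max(set(labels), key=labels.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class ModelBase:
    """Shared train/predict logic; generated subclasses only set ESTIMATOR."""

    ESTIMATOR = None

    def __init__(self, **params):
        self.estimator = self.ESTIMATOR(**params)

    def train(self, X, y):
        self.estimator.fit(X, y)

    def predict(self, X):
        return self.estimator.predict(X)


# Adding a new model means adding one entry to this dictionary.
ESTIMATORS = {
    "MeanRegressor": MeanRegressor,
    "MajorityClassifier": MajorityClassifier,
}

# Build one model class per entry at import time using type().
MODELS = {
    name: type(name + "Model", (ModelBase,), {"ESTIMATOR": estimator})
    for name, estimator in ESTIMATORS.items()
}

model = MODELS["MeanRegressor"]()
model.train([[1], [2], [3]], [2.0, 4.0, 6.0])
prediction = model.predict([[10]])
```

In a real setup the dictionary values would presumably be scikit-learn estimator classes such as `sklearn.linear_model.LinearRegression`. The point of the design is that the shared logic lives in one base class, so each additional model costs a single dictionary entry.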
Future Work:
The project was started just before GSoC'19 and it has come a long way since. I plan on contributing significantly to the project after GSoC'19. Some of the planned work:
- Adding more scikit models
- Working with more machine learning libraries and adding models from them
- Constructing the DFFML Web UI from scratch, which was conceptualized during the summer, and much more.
More detailed report: https://gist.github.com/yashlamba/5e0845a6cd5a1198f166ddedfba78802
Final week check-in
yashlamba
Published: 08/22/2019
Targets for this week:
Almost all my targets and extended goals were achieved before 19th August. This week I have been working on documentation for the scikit models in dffml. We want to make the package's docs as clear as possible before release.
About the final evaluation:
I haven't yet submitted the final evaluation at the time of writing this blog. I am waiting for the documentation to be finished in the next couple of days so that I can add it to the work chart too. I think I have done my work well, and I have bonded with the project so much that I expect to keep contributing to it for a long time.
Challenges:
The challenge right now is finding the best way to document the scikit models, something we have been brute-forcing in recent discussions. I'll post a couple more final blogs soon: one about contributing to larger projects as a beginner, and probably one more about how to document your code.
Final Coding Week - Covering whatever is left
yashlamba
Published: 08/14/2019
Completed Work:
Most of what I had proposed is now complete. I have successfully added the following models to dffml:
- Linear Regression from Scratch (Released with complete documentation)
- Scikit Models (Merged and waiting for documentation):
- https://github.com/intel/dffml/blob/3d04591dc664fcde1b9a95650b40fb76b6569abf/model/scikit/dffml_model_scikit/scikit_models.py#L46-L87
Targets for this week:
Complete the documentation for scikit models and fix a few issues.
Challenges:
I anticipate quite a few challenges this week. As the model class creation for scikit is dynamic, the documentation in particular is going to be a tricky task. Making it understandable and readable for new users is a priority. I'll try to fill it with examples for better understanding.
Contributing to DFFML, A guide to new contributors
yashlamba
Published: 08/06/2019
I wasn't a Python expert or a machine learning expert when I started. All you need is some patience before contributing to any open source project. (I'll use dffml as a reference.)
First, browse for a project that interests you; this is the most important step. You should understand why you want to contribute: is it something you have been using, does it do something you want to learn, is it a project assigned to you, or something similar? Then start with the README: read how to set up the project, go through the guidelines, and set it up locally. This can be difficult. If you run into problems, ask on the project's channel (IRC, Slack, Gitter, or whatever the organisation uses) without hesitating. Open source is open, so ask without fear.
Once you are set up, go through the issues. If you are a beginner, look for a label such as 'good first issue', or find something you can fix in the docs. Fix it according to the guidelines and open a pull request. You might have to wait a long time for a review, so be patient. Make changes if requested and, boom, you have made a contribution.
This was a very basic guide for beginners, and I'll make sure to write a more advanced contribution guide.
Final Sprint to finish scikit and make it even easier to add models
yashlamba
Published: 08/01/2019
So far I have successfully added 2 models to dffml, out of which dffml-model-scratch (Linear Regression) is already released and available on PyPI. The other, a scikit model, has been merged and is awaiting release.
Working on the scikit model made us realize that we can do it in a better and much faster way. We have now decided on a different procedure for adding scikit classifiers, so that repeated code is kept to a minimum and adding a classifier is as simple as appending it to a dictionary.
We have just planned this and I can't really assess how much time it would take, but I hope to complete it before the final evaluation. The models I would be adding can be found here: https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html
Link to my first authored pypi package: https://pypi.org/project/dffml-model-scratch/
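For readers unfamiliar with the technique behind that package, simple linear regression by ordinary least squares can be sketched in a few lines. This is a minimal illustration written for this post, not the code shipped in dffml-model-scratch; the function names and the sample data are made up for the example.

```python
from statistics import mean


def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean, y_mean = mean(xs), mean(ys)
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    # The fitted line always passes through the point (x_mean, y_mean).
    intercept = y_mean - slope * x_mean
    return slope, intercept


def predict(x, slope, intercept):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Made-up data lying exactly on the line y = 2x + 1.
slope, intercept = fit_simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
```

With the sample data above, the fitted slope and intercept recover the line the points were generated from, and `predict(4, slope, intercept)` extends that line to a new input.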