Articles on sudharsana-kjl's Bloghttps://blogs.python-gsoc.orgUpdates on different articles published on sudharsana-kjl's BlogenMon, 26 Aug 2019 14:48:10 +0000Final Weekly Check-inhttps://blogs.python-gsoc.org/en/sudharsana-kjls-blog/final-weekly-check-in/<p>In the final week of coding, I was refining the Hadoop source PR.</p> <h2 class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-409 cms-render-model cms-plugin-aldryn_newsblog-article-lead_in-462">What did I do this week?</h2> <p>The Dockerfile is finally working. We are now able to set up Hadoop using the Dockerfile, and the connection setup is well established in the application.</p> <h2 class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-409 cms-render-model cms-plugin-aldryn_newsblog-article-lead_in-462">What is coming up next?</h2> <p>There are a few bug fixes to be done. The Hadoop and MySQL features are also going to be packaged and uploaded to PyPI, similar to the models. I will be working on that as well.</p> <h2 class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-409 cms-render-model cms-plugin-aldryn_newsblog-article-lead_in-462">Did you get stuck anywhere?</h2> <p>Fixing the Hadoop source connection and writing data into an HDFS stream was an issue. My mentor and I had another meeting this week and we fixed it.</p>kjlsudharsana@gmail.com (sudharsana-kjl)Mon, 26 Aug 2019 14:48:10 +0000https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/final-weekly-check-in/Blog #6https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/blog-6/<p>In the past week, my mentor and I tried to fix the Dockerfile that sets up Hadoop in an Ubuntu container from scratch. Since that was becoming tedious, we tried setting up a mini Hadoop cluster instead.</p> <p>Apache provides a <a href="https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CLIMiniCluster.html">mini Hadoop cluster</a> setup that gives a single-node cluster. I tried building this using a Maven Docker image. 
The documentation has very little information on where Hadoop actually gets downloaded and which ports it connects to by default. My mentor and I debugged the Dockerfile and tried to get this up and running, but there is still a problem with the ports and I'm working on it. We also figured out how to get files from HDFS, which can be either CSV or JSON files. I have implemented those changes as well.</p> <p>Hopefully by next week I can finish this project.</p>kjlsudharsana@gmail.com (sudharsana-kjl)Thu, 22 Aug 2019 02:17:20 +0000https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/blog-6/Weekly Check-in #10https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/weekly-check-in-10/<p>In the past week, I was trying to set up Hadoop using a Dockerfile.</p> <h2 class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-409 cms-render-model">What did I do this week?</h2> <p>With my limited knowledge of both Hadoop and Docker, setting up Hadoop in Docker is becoming a difficult task, because there are few resources on how to set it up specifically with a Dockerfile. Also, every time I have to build the container from scratch, downloading all the files again and setting everything up is a time-consuming process. I tried an approach this week that got most of the instructions in the Dockerfile working, yet there is still an issue with starting the containers. I have added the corresponding config files that Docker will use, and also a start-up shell script that is run while building the container to start Hadoop after installing it. 
</p> <h2 class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-409 cms-render-model">What is coming up next?</h2> <p>I need to get the Dockerfile working this week so that I can move ahead, refine the Hadoop source classes, and add more tests if possible.</p> <h2 class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-409 cms-render-model">Did you get stuck anywhere?</h2> <p>Debugging the Dockerfile was a difficult task for me. My mentor was very understanding and helped me fix it.</p>kjlsudharsana@gmail.com (sudharsana-kjl)Thu, 22 Aug 2019 01:25:51 +0000https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/weekly-check-in-10/Blog #5https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/blog-5/<p>This week I tried my hand at setting up Hadoop in Docker.</p> <p>The next phase of our project involves making it compatible with input from a Hadoop data source. With my limited knowledge of Hadoop and Docker, I was trying to set it up. First I set it up on my local computer and made it work. I wrote the basic classes needed to establish a connection and successfully set up a connection.</p> <p>I also added config() and args() methods that can be used to fetch the arguments specific to the Hadoop source and their corresponding values. In Hadoop, the challenging part is handling the files from HDFS. These files can be either CSV or JSON files, so I have to discuss with my mentor how to handle this.</p>kjlsudharsana@gmail.com (sudharsana-kjl)Thu, 22 Aug 2019 01:06:05 +0000https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/blog-5/Weekly Check-in #9https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/weekly-check-in-9/<p>In the past week, I was working on making certain changes in my previous PRs to make them ready for the upcoming release.</p> <h2>What did I do this week?</h2> <p>For the MySQL source, I was trying to set up the Travis build. 
This was my first time working with Travis CI for this project and it was a good learning experience. The merge test for the CSV source label was done by my mentor; certain tests are failing now, and I was trying to fix that as well. I tried setting up Hadoop in Docker. Initially my approach was to build a Hadoop container and push it to Docker Hub so that anyone can fetch it from there. After discussing with my mentor, I'm going to set up a Dockerfile instead, so that people can see the commands that are being run and it'll be easier to set up as well.</p> <h2>What is coming up next?</h2> <p>I'll try to set up the Hadoop connection within a week. I have to speed things up a bit so that I can focus on the other things to be done once the connection is set up.</p> <h2>Did you get stuck anywhere?</h2> <p>I did get stuck on setting up Hadoop. After discussing with my mentor, I have a clear idea of how to approach it now.</p>kjlsudharsana@gmail.com (sudharsana-kjl)Tue, 06 Aug 2019 17:03:38 +0000https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/weekly-check-in-9/Weekly Check-in #8https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/weekly-check-in-8-1/<p>In the past week, I was working on setting up Hadoop and trying to import data from it. I got my PRs reviewed by my mentor and am working on the changes he suggested.</p> <h2>What did I do this week?</h2> <p>I had initially set up Hadoop on my Ubuntu system, but setting this up would be difficult in Travis CI, so I was exploring other options. The easy way to do this is through Docker, but there is no official Hadoop distribution in Docker. I was checking out Cloudera's QuickStart VM, but when I tried to set it up my laptop started to hang. I will continue to look into other options. My mentor also reviewed my HDFS source PR and guided me on how to proceed further. </p> <h2>What is coming up next?</h2> <p>I'll have to work on the Docker setup for the HDFS source. I'll probably have to write a script or a docker-compose file. 
The MySQL PR had an issue while my mentor was adding a merge test, so I will work on that as well. We'll be preparing for a release soon.</p> <h2>Did you get stuck anywhere?</h2> <p>I struggled a bit with the Hadoop setup. My mentor gave me some input on this and hopefully I'll be able to create a Docker setup by next week.</p> <p> </p>kjlsudharsana@gmail.com (sudharsana-kjl)Thu, 01 Aug 2019 04:20:27 +0000https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/weekly-check-in-8-1/Blog #4https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/blog-4/<div class="lead"> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-270 cms-render-model">In the past week, I was working on the second phase of the project. My mentor reviewed the PR and suggested changes.</p> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-270 cms-render-model">For phase 2 of my project, I set up a MySQL Docker container. Data from MySQL databases can be sent to the models so that they can be trained, and the trained data can then be dumped back into the database. My mentor suggested changes to make it more optimised and to cover all the edge cases. One such change was adding an argument that specifies the order in which the columns are arranged, so that we maintain the same order while getting queries from the user. I also had to make sure that SQL injection is not possible. For more security, I added an SSL parameter when making the connection to the database using aiomysql. For testing purposes, I added a self-signed certificate. We need to improve on this so that we can ask the user to input the path to the certificate.</p> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-270 cms-render-model">Once the MySQL source is done, I'll be exploring the Hadoop source. Looking forward to it! 
</p> </div>kjlsudharsana@gmail.com (sudharsana-kjl)Mon, 22 Jul 2019 07:19:14 +0000https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/blog-4/Weekly Check-in #7https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/weekly-check-in-7-1/<p>This week, I made some progress on the second phase of the project. With the database established, I made a PR.</p> <h2 class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-269 cms-render-model cms-plugin-aldryn_newsblog-article-lead_in-271">What did I do this week?</h2> <p>To make sure I'm heading in the right direction, I made a work-in-progress PR and got it reviewed by my mentor. He suggested some changes and I've been working on them.</p> <h2 class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-269 cms-render-model cms-plugin-aldryn_newsblog-article-lead_in-271">What is coming up next?</h2> <p>I still have to refine those changes and get them reviewed again. Once that is done, I can move on to a NoSQL source.</p> <h2 class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-269 cms-render-model cms-plugin-aldryn_newsblog-article-lead_in-271">Did you get stuck anywhere?</h2> <p>It was initially difficult to understand how the application interacts with the database and how I should define my database. With help from my mentor, I was able to clarify it.</p>kjlsudharsana@gmail.com (sudharsana-kjl)Tue, 16 Jul 2019 02:10:49 +0000https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/weekly-check-in-7-1/Weekly Check-in #6https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/weekly-check-in-6/<p>This week, I started working on the second phase of the project.</p> <h2 class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-269 cms-render-model">What did I do this week?</h2> <p>My project has an example demo app that makes the necessary connections and interacts with the database. I went through the code base and added the required config classes for setting up the MySQL source. 
I have yet to refine it, make it more efficient, and work towards testing this feature.</p> <h2 class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-269 cms-render-model">What is coming up next?</h2> <p>I'll work on adding test cases for this feature. Once my mentor reviews it, I'll work on the changes requested, if any.</p> <h2 class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-269 cms-render-model">Did you get stuck anywhere?</h2> <p>This week has been pretty smooth.</p>kjlsudharsana@gmail.com (sudharsana-kjl)Tue, 09 Jul 2019 14:30:26 +0000https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/weekly-check-in-6/Blog #3https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/blog-3/<p>In the past week, I started working on phase 2 of my project while wrapping up phase 1.</p> <p>For phase 2 of my project, after a discussion with my mentor, I learnt Docker and MySQL and set up a MySQL Docker container. Once that was done, I went through the example in the code base, which has a demo app that interacts with MariaDB, which is similar to MySQL. I tried to understand the code and planned how to approach the next phase.</p> <p>For my previous phase, implementing Label for CSVSource caused a problem in merging, so my mentor and I discussed it and I tuned the logic a bit. Once test_merge is fixed, I'll have to implement the same for CSVSource again.</p> <p>This week also had the first evaluation. I passed it and received good feedback from my mentor. I'll work on that feedback and I'm hoping to add more test cases in future!</p>kjlsudharsana@gmail.com (sudharsana-kjl)Tue, 09 Jul 2019 14:22:30 +0000https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/blog-3/Weekly Check-in #5https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/weekly-check-in-5-2/<p>In the fifth week of coding, I spent time learning the requirements for the next phase of my project. 
I also worked on modifying the last part of phase 1 by tweaking it a bit.</p> <h2>What did I do this week?</h2> <p>The second part of my project involves Docker and MySQL. I'll be trying to set up a connection to MySQL from our application so that it can import data from a database, train on it, and dump the results back into the database. This week, I learnt Docker, set up a MySQL container, and tried to run one of the examples specified in the project. I also worked on modifying the code in my previous PR according to the logic discussed with my mentor.</p> <p> </p> <h2>What is coming up next?</h2> <p>I'll start working on phase 2 of my proposal, where I'll first try to set up a connection to the database. I'll be adding MySQL config-related classes and will try to implement importing data from a MySQL database. I'm also working on adding extra test cases to my previous PR.</p> <p> </p> <h2>Did you get stuck anywhere?</h2> <p>I was quite confused about the logic in my previous PR. After discussing with my mentor, it became clear and I made the corresponding changes.</p>kjlsudharsana@gmail.com (sudharsana-kjl)Tue, 09 Jul 2019 14:07:03 +0000https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/weekly-check-in-5-2/Weekly Check-in #4https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/weekly-check-in-4/In the fourth week of coding, I worked on the changes requested by my mentor in my previous PR and also on implementing the label feature for the next type of source. What did I do this week? This week, my previous PR was reviewed and my mentor requested changes. I worked on a different approach to solve that, as suggested by my mentor, submitted a PR, and got it reviewed. After that, I started working on implementing labels for another type of FileSource, namely CSVSource. I have submitted a PR for this and I'm waiting for it to get reviewed. What is coming up next? 
Once my PR gets merged, I will start working on phase two of my project, which involves experimenting with different databases for our application and improving the storage. Did you get stuck anywhere? The task at hand was well explained, because of which I was able to finish it smoothly.kjlsudharsana@gmail.com (sudharsana-kjl)Mon, 24 Jun 2019 09:45:25 +0000https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/weekly-check-in-4/Blog #2https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/blog-2/This post describes in detail what I've been working on in the past few days. In my previous blog posts and weekly check-ins, I have written about how I've been contributing to the project. This week saw good improvement. As per my schedule, I'm a bit ahead. I've already implemented the label feature for JSONSource, which was successfully merged and closed. I was working on solving issue #48. After that I was working on implementing the label feature for CSVSource as well. I have submitted a PR and I'm hoping that it gets reviewed and merged this week. I have also added test cases for this feature. Looking forward to wrapping this up soon so that I can start working on the next phase of my project!kjlsudharsana@gmail.com (sudharsana-kjl)Mon, 24 Jun 2019 09:35:11 +0000https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/blog-2/Weekly Check-in #3https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/weekly-check-in-3-2/<p>In the third week of coding, I worked on fixing the PR and moved on to work on the next type of source. </p> <h3>What did I do this week?</h3> <p>This week, my PR was successfully merged and closed. Once that was done, I started working on implementing labels for another type of FileSource, namely CSVSource. Before I could start working on the feature directly, I had to fix an issue related to it. The issue can be found <a href="https://github.com/intel/dffml/issues/48">here</a>. 
I submitted a PR to fix this issue, which can be found <a href="https://github.com/intel/dffml/pull/99">here</a>. The PR was successfully merged and closed. </p> <h3>What is coming up next?</h3> <p>My mentor decided to take a different approach to the issue I had been working on, since the current approach required FileSource to be changed directly. I'm working on it.</p> <h3>Did you get stuck anywhere?</h3> <p>This week was quite smooth. Now that I have a good understanding of the code base and what has to be done to fix the issue, I didn't face any blockers as such.</p>kjlsudharsana@gmail.com (sudharsana-kjl)Wed, 19 Jun 2019 08:02:48 +0000https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/weekly-check-in-3-2/Weekly Check-in #2https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/weekly-check-in-2/<p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-70 cms-render-model">In the second week of coding, my mentor reviewed my PR and helped in debugging the code. We were having problems passing the test cases, and fixing them required changes outside the expected part of the code. More details about the PR can be found <a href="https://github.com/intel/dffml/pull/80">here</a>.</p> <h2 class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-25 cms-render-model cms-plugin-aldryn_newsblog-article-lead_in-70"><span style="font-family: Verdana,Geneva,sans-serif;">What did I do this week?</span></h2> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-70 cms-render-model">The entire week was spent debugging the code. In between, we tried a different approach to the problem. 
We also discussed this during the weekly catch-up call and, after a lot of effort, we finally debugged it.</p> <h2 class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-70 cms-render-model">What is coming up next?</h2> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-70 cms-render-model">I'll be adding more tests that open and update data simultaneously, as this will be required.</p> <h2 class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-25 cms-render-model cms-plugin-aldryn_newsblog-article-lead_in-70">Did you get stuck anywhere?</h2> <p class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-70 cms-render-model">Thankfully I didn't get stuck anywhere this week, thanks to the efforts of my mentor, who clearly instructed me on what needed to be done.</p>kjlsudharsana@gmail.com (sudharsana-kjl)Mon, 10 Jun 2019 10:11:52 +0000https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/weekly-check-in-2/Blog #1https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/blog-1/<p>This is a detailed blog post regarding my first week of code.</p> <p>My work involves adding a label feature to tag datasets, so that when I train multiple models on the same datasets I'm able to distinguish between them. This would be very helpful while training multiple models simultaneously.</p> <p>As a stepping stone, my mentor asked me to implement it for a specific type of source, and hence JSONSource was chosen. The codebase has grown a lot since I wrote my proposal, so most of the time was spent understanding the workflow so that I could analyse where the changes have to be made. After trying to understand the code by adding many print statements between lines, I finally got an idea of how I'm supposed to achieve my goal.</p> <p>Now that I knew what changes had to be made, the challenge was not only making those changes but also testing them. While making the changes, there were instances where we unexpectedly saw some whacky behaviour. 
This led to code changes in places we didn't expect in the beginning. Thanks to this, we have now tightened up a few loose ends. After making the necessary changes, my mentor and I worked on testing them.</p> <p>Initial testing was a failure and there were many unexpected errors. Since the changes were made in a function that dumps the data into a file, all the tests that were directly or indirectly using it were failing. We figured out everything except for a JSONDecodeError that occurred when we tried to call json.load() in dump_fd() in JSONSource.</p> <p>We eventually figured out why this error occurred and fixed it the following week. See the next blog post for updates on this!</p>kjlsudharsana@gmail.com (sudharsana-kjl)Mon, 10 Jun 2019 08:32:07 +0000https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/blog-1/Weekly Check-in #1https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/weekly-check-in-1/<p>In the first week of coding, I worked on implementing a part of the feature in the project. More details on what I'll be working on throughout GSoC can be found <a href="https://github.com/intel/dffml/issues/9">here</a>.</p> <h2 class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-25 cms-render-model"><span style="font-family: Verdana,Geneva,sans-serif;">What did I do this week?</span></h2> <p><span style="font-family: Verdana,Geneva,sans-serif;">I submitted a PR for the part I was working on. My mentor reviewed my code and requested changes, and I have modified it accordingly. After that, we have been working on debugging the errors that occur while running tests. </span></p> <h2>What is coming up next?</h2> <p>I'll be working on debugging and adding a few more tests for this feature. 
If I'm able to finish this before next weekend, I'll work on implementing the same feature for a different type of FileSource.</p> <h2 class="cms-plugin cms-plugin-aldryn_newsblog-article-lead_in-25 cms-render-model">Did you get stuck anywhere?</h2> <p>Currently, my mentor and I are working on debugging the errors. The code shows some whacky behaviour, so we are trying to fix it.</p> <p> </p>kjlsudharsana@gmail.com (sudharsana-kjl)Sun, 02 Jun 2019 03:37:56 +0000https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/weekly-check-in-1/Weekly Check-in #0https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/weekly-check-in-0/<p>Hola Amigos! I'm Sudharsana K J L from Tamil Nadu, India. I'll be contributing to DFFML. To know more about this cool project visit <a href="https://intel.github.io/dffml/">here.</a></p> <h1><span style="font-family: Verdana,Geneva,sans-serif;">What did I do this week?</span></h1> <p><span style="font-family: Verdana,Geneva,sans-serif;">During the community bonding period, the timing for our weekly catch-up meetings was decided, and I've already had two calls with my mentor in which we discussed the project, my proposal, and how it could be achieved. A lot of things have changed in the code base since I wrote the proposal, so I spent my time this week trying to understand the code changes. I've also moved to a new house, so things have been pretty tight with the packing and unpacking. </span><span style="font-family: Verdana,Geneva,sans-serif;">During the sessions with my mentor, he explained to me and my fellow GSoC'er Yash the new additions in the code and how they help improve DFFML. The past week, I've been trying to understand and figure out how to add the feature I'm gonna work on.</span></p> <h1>What is coming up next?</h1> <p>With the coding session starting next week, I'll be working on adding the initial part of the feature. 
This will help me get a clear idea of how to proceed with the other features I'm hoping to add.</p> <h1>Did you get stuck anywhere?</h1> <p>I was stuck on understanding the new code and the workings of a few classes, but my mentor is very active, responds to my silly queries patiently, and helps me get to know the project better.</p> <p> </p> <p> </p>kjlsudharsana@gmail.com (sudharsana-kjl)Fri, 24 May 2019 13:48:02 +0000https://blogs.python-gsoc.org/en/sudharsana-kjls-blog/weekly-check-in-0/