Articles on programmer290399's Bloghttps://blogs.python-gsoc.orgUpdates on different articles published on programmer290399's BlogenMon, 23 Aug 2021 05:55:33 +0000Weekly Blog Post #12 [Aug. 23, 2021]https://blogs.python-gsoc.org/en/programmer290399s-blog/weekly-blog-post-12-aug-23-2021/<h3><u>What did you do this week?</u></h3> <ul> <li>This week I <a href="https://github.com/intel/dffml/pull/1174/commits/22e477d6a70a2ba1f18c907a8adf3aaa46ac2dfe">streamlined</a> the <code>SimpleModel</code> class which derives from the <code>Model</code> base class in which I recently added archive support code. </li> <li>I also made multiple minor changes [<a href="https://github.com/intel/dffml/pull/1174/commits/3bf07e8efc7a5d91ab620f3a0dc912cb5b90632e">1</a>,<a href="https://github.com/intel/dffml/pull/1174/commits/780183e2e1b517059bc497f44b54330b81a7f88d">2</a>] as requested by my mentor.</li> <li>Other than that I updated all the models to reduce code duplication and support archive storage.</li> <li>While I was updating all the models I noticed that I was having problems in saving and loading trained models for the spaCy model, I discovered that there was an <a href="https://github.com/intel/dffml/issues/1198">issue</a> in the archive creation code. 
</li> <li>I quickly fixed the problem in a <a href="https://github.com/intel/dffml/pull/1199">subsequent PR</a>.</li> <li>I also <a href="https://github.com/intel/dffml/pull/1174/commits/794c99caa78844e83bf9efa86f6d37f10e45d7d6">added a tutorial</a> on how to save/load models as archives.</li> <li>Last but not least, I <a href="https://github.com/intel/dffml/pull/1174/commits/a9fdb498112909a4fc473ebeb63d16c8fbc84217">added test cases</a> to cover <code>dffml.df.archive</code>.</li> </ul> <h3><u>What is coming up next?</u></h3> <ul> <li>This week marks the end of my GSoC'21 journey, but I will continue contributing to DFFML in the future.</li> </ul> <h3><u> Did you get stuck anywhere?</u></h3> <ul> <li>Not really. I was stuck for some time figuring out why the spaCy model was not working, but I was able to spot the <a href="https://github.com/intel/dffml/issues/1198">issue</a> and <a href="https://github.com/intel/dffml/pull/1199">fix it</a> pretty quickly.</li> </ul>programmer290399@gmail.com (programmer290399)Mon, 23 Aug 2021 05:55:33 +0000https://blogs.python-gsoc.org/en/programmer290399s-blog/weekly-blog-post-12-aug-23-2021/Weekly Check-In #11 [Aug. 
16, 2021]https://blogs.python-gsoc.org/en/programmer290399s-blog/weekly-check-in-11-aug-16-2021-1/<h3><u>What did you do this week?</u></h3> <ul> <li>At the beginning of this week, I received some feedback from my mentor regarding the flow and some other stuff[<a href="https://github.com/intel/dffml/pull/1174#discussion_r686169181">1</a>,<a href="https://github.com/intel/dffml/pull/1174#discussion_r686189809">2</a>].</li> <li>I worked on the changes requested by my mentor and fixed various bugs [<a href="https://github.com/intel/dffml/pull/1174/commits/af4eb3b401fb46a3a9af48f3075d65db994ea5e9">1</a>,<a href="https://github.com/intel/dffml/pull/1174/commits/4ad7f6e8981f3624bb17c4518014319125e6514d">2</a>,<a href="https://github.com/intel/dffml/pull/1174/commits/f57ebf157620b2ac65edd0670d600933b8a7f417">3</a>,<a href="https://github.com/intel/dffml/pull/1174/commits/33f17b90f3b688e62ce6a0046d3eecc257fc8354">4</a>,<a href="https://github.com/intel/dffml/pull/1174/commits/f6a4a69eb69e93877d988cd61f1ea4c21fec3dd4">5</a>].</li> <li>Other than that I also fixed some test cases that were failing after I made the changes last week [<a href="https://github.com/intel/dffml/pull/1174/commits/e4be924af3fe410dc255aa28b04d9d59e348da5d">1</a>,<a href="https://github.com/intel/dffml/pull/1174/commits/4ad7f6e8981f3624bb17c4518014319125e6514d">2</a>].</li> </ul> <h3><u>What is coming up next?</u></h3> <ul> <li>Testing and implementing/fixing archive storage support in all 11 models listed in <a href="https://github.com/intel/dffml/pull/1174">this PR</a>.</li> <li>Other than I have noticed that some models have repetitive code blocks, I would try to eliminate any such duplication. 
</li> </ul> <h3><u> Did you get stuck anywhere?</u></h3> <ul> <li>Nope, this week I spent most of my time reading the model code and analyzing where and what changes would be needed.</li> </ul>programmer290399@gmail.com (programmer290399)Mon, 16 Aug 2021 05:33:31 +0000https://blogs.python-gsoc.org/en/programmer290399s-blog/weekly-check-in-11-aug-16-2021-1/Weekly Blog Post #10 [Aug. 9, 2021]https://blogs.python-gsoc.org/en/programmer290399s-blog/weekly-blog-post-10-aug-9-2021/<h3><u>What did you do this week?</u></h3> <ul> <li>This week I worked upon suggestions [<a href="https://github.com/intel/dffml/pull/1174#discussion_r677663867">1</a>,<a href="https://github.com/intel/dffml/pull/1174#discussion_r677673301">2</a>,<a href="https://github.com/intel/dffml/pull/1174#discussion_r677678944">3</a>] from my mentor related to <a href="https://github.com/intel/dffml/pull/1174">archive support PR</a>. </li> <li>I also came up with <a href="https://github.com/intel/dffml/pull/1174#issuecomment-893176136">this</a> flow to make sure that I and my mentor(s) are on the same page. </li> <li>There was <a href="https://github.com/intel/dffml/pull/1174#discussion_r682608990">some confusion</a> around how configs could be restored which I <a href="https://github.com/intel/dffml/pull/1174#discussion_r686517568">resolved</a> with my mentor in this week's meet. </li> <li>Other than that I <a href="https://github.com/intel/dffml/pull/1174/commits/d8f8433f81d8438dec1a798fa9ae2b18090c086e">removed the dataflow creation code</a> for saving and loading from archives from the Model base class. 
</li> <li>And <a href="https://github.com/intel/dffml/pull/1174/commits/243e088612e34d28c61cf1d5c73a3775b5690be4">added a much less verbose helper function</a> to create those dataflows in <a href="https://github.com/intel/dffml/blob/master/dffml/df/archive.py"><code>dffml.df.archive</code></a>.</li> <li>Also added <a href="https://github.com/intel/dffml/pull/1174/commits/a38fd1574d083446db51106fe43deb4e18d777dc">config loading code</a> which I am planning to update in the coming week, alongside adding standard <a href="https://github.com/intel/dffml/pull/1174/commits/cce31d5070ec91619cae84e412527dbc03bff43b">definitions for loading/saving flows</a> for models.</li> </ul> <h3><u>What is coming up next?</u></h3> <ul> <li>I would be working on the flow linked above and would be streamlining various other parts of code to work properly with archive code.</li> <li>I would be perusing through the code for various models to understand and plan where &amp; what all changes would be needed to properly support archive storage. </li> <li>Other than that I'd be fixing any bugs that might have been introduced in my recent commits.</li> </ul> <h3><u> Did you get stuck anywhere?</u></h3> <ul> <li>I got stuck when I was trying to condense the code for archive dataflow creation, but with some time, thought and basic math, I was able to significantly reduce the code's verbosity while improving its readability. </li> </ul>programmer290399@gmail.com (programmer290399)Wed, 11 Aug 2021 07:04:13 +0000https://blogs.python-gsoc.org/en/programmer290399s-blog/weekly-blog-post-10-aug-9-2021/Weekly Check-In #9 [Aug. 
2, 2021]https://blogs.python-gsoc.org/en/programmer290399s-blog/weekly-check-in-9-aug-2-2021/<h3><u>What did you do this week?</u></h3> <ul> <li>This week I got two of my PRs merged after a long time:<ul> <li>ci : lint : commits : Adding ci job to validate commit message format <a href="https://github.com/intel/dffml/pull/1076">#1076</a></li> <li>util:log: Added log_time decorator <a href="https://github.com/intel/dffml/pull/1101">#1101</a></li> </ul> </li> <li>However merging <a href="https://github.com/intel/dffml/pull/1076">PR #1076</a> introduced an <a href="https://github.com/intel/dffml/runs/3173950826">issue in master</a> in which the lint commit command kept running forever and was killed by Github Actions for exceeding run time limits. </li> <li>I fixed this issue quickly in <a href="https://github.com/intel/dffml/pull/1177">another PR</a> which simply skipped the linting test on master branch.</li> <li>Other than that I continued working on my possibly <a href="https://github.com/intel/dffml/pull/1174">final PR</a> for my GSoC project based on the inputs [<a href="https://github.com/intel/dffml/pull/1174#discussion_r677663867">1</a>,<a href="https://github.com/intel/dffml/pull/1174#discussion_r677673301">2</a>,<a href="https://github.com/intel/dffml/pull/1174#discussion_r677678944">3</a>] received from my mentor.</li> </ul> <h3><u>What is coming up next?</u></h3> <ul> <li>As per the feedback received from my mentor on archive support part I have to make some changes which I'd be picking up in coming week. 
</li> <li>Also, currently the archive dataflow creation code is very verbose and I will try to improve upon it.</li> </ul> <h3><u> Did you get stuck anywhere?</u></h3> <ul> <li>Not really, I have been thinking a lot about how the verbosity of dataflow creation code can be reduced, how various blocks of code which are pretty similar can be removed and how the overall mechanism can be made to look more pythonic and clean.</li> </ul>programmer290399@gmail.com (programmer290399)Mon, 02 Aug 2021 13:09:05 +0000https://blogs.python-gsoc.org/en/programmer290399s-blog/weekly-check-in-9-aug-2-2021/Weekly Blog Post #8 [July 26, 2021]https://blogs.python-gsoc.org/en/programmer290399s-blog/weekly-blog-post-8-july-26-2021/<h3><u>What did you do this week?</u></h3> <ul> <li>This week I worked on two PRs:<ol> <li>high_level: Move code into a directory and splitting out into files <a href="https://github.com/intel/dffml/pull/1172">#1172</a></li> <li>WIP: model : Add Support for Archive Storage of Models <a href="https://github.com/intel/dffml/pull/1174">#1174</a></li> </ol> </li> <li>The first one was related to <a href="https://github.com/intel/dffml/pull/1155#discussion_r667428972">an issue I faced</a> earlier and the second PR was dependent on the first one. </li> <li>So before making the first PR I opened up an <a href="https://github.com/intel/dffml/issues/1170">issue</a> to get feedback from my mentor before I proceeded with the implementation. </li> <li>After the first PR was <a href="https://github.com/intel/dffml/commit/dcc64122ac1ec1616020ce3c43dfd4cd67c74ec4">merged</a> I continued working on the second one. 
</li> <li>I faced some issues in both of these PRs, which I discuss later in this post.</li> <li>Other than that, I rebased my other two PRs, which are ready to merge with all relevant CI tests passing.<ol> <li>util:log: Added log_time decorator <a href="https://github.com/intel/dffml/pull/1101">#1101</a></li> <li>WIP : ci : lint : commits : Adding ci job to validate commit message format <a href="https://github.com/intel/dffml/pull/1076">#1076</a></li> </ol> </li> </ul> <h3><u>What is coming up next?</u></h3> <ul> <li>I will be working on completing all the tasks in what is possibly my <a href="https://github.com/intel/dffml/pull/1174">final PR</a>, which fixes issue <a href="https://github.com/intel/dffml/issues/662">#662</a>.</li> <li>This will take a long time, as I have to make changes in all the models and update their respective tests and docs.</li> <li>Other than that, I will be fixing a <a href="https://github.com/intel/dffml/issues/1167">logging issue</a>; I still have to think about how it can be fixed.</li> </ul> <h3><u> Did you get stuck anywhere?</u></h3> <ul> <li>Yes. Starting with the <code>high_level</code> splitting <a href="https://github.com/intel/dffml/pull/1172">PR</a>, I was stuck on a failing <a href="https://github.com/intel/dffml/runs/3112289129">docstring test</a>. I don't know how it was passing before I made this change; after a ton of debugging I gave up on finding the cause and solved it with a <a href="https://github.com/intel/dffml/pull/1172/commits/8c89720c21e5435bd16523edbd6da4c5eb0c4dd0">pretty trivial solution</a>. </li> <li>Other than that, I was not really stuck, but I spent quite a lot of time thinking about how tar support should be implemented in the <code>Model</code> class. I have <a href="https://github.com/intel/dffml/pull/1174/commits/8078c0d26b4b79c706427dd037ac676bd931713a">pushed</a> a rough implementation to get feedback from my mentor(s). 
</li> </ul>programmer290399@gmail.com (programmer290399)Tue, 27 Jul 2021 10:53:00 +0000https://blogs.python-gsoc.org/en/programmer290399s-blog/weekly-blog-post-8-july-26-2021/Weekly Check-In #7 [July 19, 2021]https://blogs.python-gsoc.org/en/programmer290399s-blog/weekly-check-in-7-july-19-2021/<h3><u>What did you do this week?</u></h3> <ul> <li>Finally got my directory property to location renaming <a href="https://github.com/intel/dffml/pull/1155">PR</a> <a href="https://github.com/intel/dffml/commit/5c1d437eac2b2e48a20866664faef607a42fe2d9">merged</a>. </li> <li>Other than that I closed two similar issues [<a href="https://github.com/intel/dffml/issues/1159">1</a>,<a href="https://github.com/intel/dffml/issues/1160">2</a>] with a PR, this change now returns an output for <a href="https://github.com/intel/dffml/commit/c16f6fae841c4ba961d8d9893529c1c7813fef79">archive</a> and <a href="https://github.com/intel/dffml/commit/5edb093780f8ae9c76f34336f188482d9efc70d2">compression</a> operations merged earlier.</li> <li>I also made a small fix for a <a href="https://github.com/intel/dffml/actions/runs/1036147649">docstring test</a> that was failing in <a href="https://github.com/intel/dffml/pull/1165">this PR</a>.</li> <li><p>Also fixed some typos and made some housekeeping related changes here and there [<a href="https://github.com/intel/dffml/commit/af43c314f467b06082e96892fc6578d423a203f6">1</a>,<a href="https://github.com/intel/dffml/commit/0d19fe6c4d06f366e6afdf826b1023b6e01b5244">2</a>,<a href="https://github.com/intel/dffml/commit/5f001ba68b3314a7771a4887060811694d3f7c5b">3</a>].</p> </li> <li><p>Other than that I worked on rebasing my open PRs [<a href="https://github.com/intel/dffml/pull/1101">1</a>,<a href="https://github.com/intel/dffml/pull/1076">2</a>] with master because Accuracy scorers have been merged into master branch. 
</p> </li> </ul> <h3><u>What is coming up next?</u></h3> <ul> <li>I hope to finally start implementing archive support in the Model base class.</li> <li>But before that, I will be fixing a <a href="https://github.com/intel/dffml/issues/1170">related issue</a> which is blocking the implementation of the DataFlows required to perform archiving ops. </li> <li>This in itself is going to be a big change that will take a considerable amount of time.</li> </ul> <h3><u> Did you get stuck anywhere?</u></h3> <ul> <li>Nope, this week I mostly fixed simple stuff, so there was nothing that bothered me.</li> </ul>programmer290399@gmail.com (programmer290399)Sun, 18 Jul 2021 15:26:39 +0000https://blogs.python-gsoc.org/en/programmer290399s-blog/weekly-check-in-7-july-19-2021/Weekly Blog Post #6 [July 12, 2021]https://blogs.python-gsoc.org/en/programmer290399s-blog/weekly-blog-post-6-july-12-2021/<h3><u>What did you do this week?</u></h3> <ul> <li>I worked on implementing archive support in the Model base class using the Operations I implemented earlier and <a href="https://github.com/intel/dffml/pull/1155">created a PR</a>. 
</li> <li>Then I requested feedback from my mentors on the <a href="https://github.com/intel/dffml/pull/1155/commits/46a90399d6bad13e1f5e7573c60dd695b6993aec">changes I made</a> in the Model base class to make sure that I was working in the right direction.</li> <li>As per the <a href="https://github.com/intel/dffml/pull/1155#issuecomment-877750043">feedback</a> from my mentor I rolled back the changes related to archive support.[<a href="https://github.com/intel/dffml/pull/1155/commits/a24def9302ae6478d1973fe6af8d8f6b902b2d2d">1</a>,<a href="https://github.com/intel/dffml/pull/1155/commits/1b82ff23aaf53109465ddb3bb97c7c2ca8149532">2</a>]</li> <li><a href="https://github.com/intel/dffml/pull/1155/commits/656434b735ff456973c07e3c73fd5da12bc906bb">Fixed some typos</a> here and there as well.</li> <li>After the receiving the feedback, this week I worked hard on renaming the <code>directory</code> property to <code>location</code> and made changes to <a href="https://github.com/intel/dffml/pull/1155/files"><strong>144 files</strong></a> in the codebase to make sure that all the tests pass locally as well as in CI. </li> <li>However, still <a href="https://github.com/intel/dffml/pull/1155#issuecomment-877896250">one test is failing</a> and perhaps it is not an issue. 
</li> <li>I have requested another review of my <a href="https://github.com/intel/dffml/pull/1155">PR related to renaming directory property to location</a> after making all the requested changes to code, docs and tests for all models.</li> <li><p>Other than that, my other two open PRs: </p> <ul> <li><a href="https://github.com/intel/dffml/pull/1101">util:log: Added log_time decorator #1101</a></li> <li><p><a href="https://github.com/intel/dffml/pull/1076">ci : lint : commits : Adding ci job to validate commit message format #1076</a> </p> <p>are also ready to merge with all the CI tests passing 🎉.</p> </li> </ul> </li> </ul> <h3><u>What is coming up next?</u></h3> <ul> <li>After receiving feedback on my current changes, I will work on any other changes that my mentor(s) might request. </li> <li>Other than that, I will also discuss the archive support implementation details with my mentor in the upcoming weekly sync and work on a new PR for it.</li> <li>I will also work on the <a href="https://github.com/intel/dffml/pull/1076">Commit Linting Issue's PR</a> to cover more <a href="https://github.com/intel/dffml/issues/1136">enhancement points</a>, depending on the bandwidth I have this week. </li> </ul> <h3><u> Did you get stuck anywhere?</u></h3> <ul> <li>Yes and no. I spent a considerable amount of time over the past couple of weeks reading the code base to get a good idea of where the renaming had to be done. It is not a simple find and replace: there are a lot of places where a directory property is not even related to a model, so it was not as simple as it might seem. </li> <li>Making the changes also broke a lot of stuff: almost all model tests and various other seemingly unrelated tests as well. So I was stuck on certain errors in the CI for some time, but eventually I was able to sort all of them out. 
</li> <li>However, I cannot really say I was stuck all the time in the above listed things, as it was more about understanding and spotting out sources of problems which were causing various tests to fail.</li> </ul>programmer290399@gmail.com (programmer290399)Mon, 12 Jul 2021 02:17:17 +0000https://blogs.python-gsoc.org/en/programmer290399s-blog/weekly-blog-post-6-july-12-2021/Weekly Check-In #5 [July 5, 2021]https://blogs.python-gsoc.org/en/programmer290399s-blog/weekly-check-in-5-july-5-2021/<h3><u>What did you do this week?</u></h3> <ul> <li>I continued working on the update for Model base class to support archive. </li> <li>I also made all the requested changes for Archive and Compression related Operations and got my <a href="https://github.com/intel/dffml/pull/1128">PR merged</a>.</li> <li>Other than that I completed the implementation of a couple of <a href="https://github.com/intel/dffml/issues/1136">enhancement points</a> for <a href="https://github.com/intel/dffml/issues/1040">Commit Linting Issue</a> and updated the tests as well. This also increased coverage of master commits by ≈ 2%.</li> <li>I also finally <a href="https://github.com/intel/dffml/pull/1076/commits/98b337cf56a0f02c697fbed91dd1263f9b66fe95">Fixed the MacOS error</a> in the CI by refactoring the test case for the <a href="https://github.com/intel/dffml/issues/1040">Commit Linting Issue</a>.</li> </ul> <h3><u>What is coming up next?</u></h3> <ul> <li>My main focus this week would be on Updating the <a href="https://github.com/intel/dffml/blob/master/dffml/model/model.py#L70">Model base class</a> to support archive storage, as I have also mentioned in my previous blog post that it is a bit time consuming and thus I might not be able to push working changes very soon. 
</li> <li>Other than that, I'd be looking into the code for the other models as well, to foresee where changes would be required to adapt to the changes I've made in the Model base class.</li> </ul> <h3><u> Did you get stuck anywhere?</u></h3> <ul> <li>Yes, for some time I was stuck while implementing the body mutation for the <a href="https://github.com/intel/dffml/issues/1040">Commit Linting Issue</a>, as conditional mutations would fail to catch some cases if <code>no_mutation</code> was added to the list, but not adding <code>no_mutation</code> would lead to failures in other common cases. </li> <li>I solved the issue by implementing a <a href="https://github.com/intel/dffml/pull/1076/files#diff-022f6b5bafb16719a49f41905bfffa4e18cc1454ccebc3f521666634912a34f1R779-R783">composition function generation method</a> and <a href="https://github.com/intel/dffml/pull/1076/files#diff-022f6b5bafb16719a49f41905bfffa4e18cc1454ccebc3f521666634912a34f1R785-R788">making the conditional body mutation act as <code>no_mutation</code> if the condition was not met</a>.</li> </ul>programmer290399@gmail.com (programmer290399)Thu, 08 Jul 2021 02:28:32 +0000https://blogs.python-gsoc.org/en/programmer290399s-blog/weekly-check-in-5-july-5-2021/Weekly Blog Post #4 [June 28, 2021]https://blogs.python-gsoc.org/en/programmer290399s-blog/weekly-blog-post-4-june-28-2021/<h3><u>What did you do this week?</u></h3> <ul> <li>I worked on two issues this week:<ol> <li><a href="https://github.com/intel/dffml/pull/1128"><code>Adding Operations for Archives and Compression</code></a></li> <li><a href="https://github.com/intel/dffml/pull/1076"><code>Commit Linting Issue</code></a></li> </ol> </li> <li>I removed the <code>options</code> parameter from the existing zip operations and updated their definitions, as per the feedback from my mentor. 
[<a href="https://github.com/intel/dffml/pull/1128#discussion_r656417180">1</a>,<a href="https://github.com/intel/dffml/pull/1128#discussion_r656416969">2</a>]</li> <li>Other than that I cleaned up and also optimized some code here and there. [<a href="https://github.com/intel/dffml/pull/1128/commits/4e6150160054331eb50fd489575a4847210b5798">1</a>,<a href="https://github.com/intel/dffml/pull/1128/commits/d0938122a7542859d66044e51e9a55c682825697">2</a>,<a href="https://github.com/intel/dffml/pull/1128/commits/9e3a15df44f66319fd807927f31055cf401ee2bf">3</a>,<a href="https://github.com/intel/dffml/pull/1128/commits/564c2a7a52cb8023d6875d21b84fd501ea6c6444">4</a>,<a href="https://github.com/intel/dffml/pull/1128/commits/425506430c14fb2ab8eee8e8952ebb87f742d147">5</a>,<a href="https://github.com/intel/dffml/pull/1076/commits/d5c0e2f91dbe76009d915e59cbe6d4e901944d88">6</a> &amp; <a href="https://github.com/intel/dffml/pull/1076/commits/12bbd06f8e98a2dc8f203cfa1c66ad0d4d4699eb">7</a>]</li> <li>Starting with <strong>Point 1</strong>, I implemented operations for creating and inflation of tar archives, also implemented their respective test cases. </li> <li>The test for <code>extract_tar_archive</code> was especially a bit too tricky, writing it correctly took a considerable amount of time (more on that later).</li> <li>Also added compression operations to support <code>.gz</code>, <code>.bz2</code> &amp; <code>.xz</code> formats with their respective test cases.</li> <li>Moving on to <strong>Point 2</strong>, I implemented one of the five parts of <a href="https://github.com/intel/dffml/issues/1136">enhancement issue</a> related to it, i.e. added support for commits related to tests by implementing a body mutation.</li> <li>I also optimized the code to avoid multiple calls to <code>_get_all_exts()</code>, by making a class attribute for it's output. 
</li> <li>Last but not least, I updated the test cases to make sure that the linting support for the test-related body mutation is working correctly. </li> </ul> <h3><u>What is coming up next?</u></h3> <ul> <li><a href="https://github.com/intel/dffml/pull/1128#discussion_r652835527">Updating the Model base class</a> to support archive storage using these operations will be my top priority for the coming week. </li> <li>Obviously this will break a considerable number of models and will also call for new test cases and a number of small tweaks. </li> <li>Other than that, I will continue to work on the commit linting issue as per the enhancement checklist. </li> </ul> <h3><u> Did you get stuck anywhere?</u></h3> <ul> <li>Yes, while implementing the <a href="https://github.com/intel/dffml/pull/1128/commits/d8a3f5877402dd66f218511e6f4aca323412f1a6">test case for <code>extract_tar_archive</code></a>, I ran into a lot of issues, mostly faulty patching and trouble getting the right mock calls in place. However, I was able to solve the problem after investing almost a complete day in it. </li> <li>Deciding how to implement the compression algorithms also took some time. I wasn't really stuck on it, but it took some time and discussion with my mentors to come to a conclusion. </li> <li>Initially I thought I would offer a compression option within the archive operations themselves, but as per <a href="https://github.com/intel/dffml/pull/1128#discussion_r656417180">the feedback from my mentor</a> I wasn't supposed to implement options for now. </li> <li>Also, binding compression to the archiving operations would not have been the best thing to do. Implementing compression as a separate operation was the approach that both of my mentors and I agreed on, as it is a comparatively more modular and flexible way of solving this issue. 
</li> <li>I still have not been able to resolve the MacOS related <a href="https://github.com/intel/dffml/pull/1076/checks?check_run_id=2937271300">CI error</a> in the commit linting issue, I would be discussing that with my mentor in the upcoming weekly sync. </li> </ul>programmer290399@gmail.com (programmer290399)Tue, 29 Jun 2021 00:30:06 +0000https://blogs.python-gsoc.org/en/programmer290399s-blog/weekly-blog-post-4-june-28-2021/Weekly Check-In #3 [June 21, 2021]https://blogs.python-gsoc.org/en/programmer290399s-blog/weekly-check-in-3-june-21-2021-1/<h3><u>What did you do this week?</u></h3> <ul> <li>I worked on two things this week:<ul> <li><a href="https://github.com/intel/dffml/issues/1040">Commit Linting Issue</a> </li> <li><a href="https://github.com/intel/dffml/pull/1128">Archive manager Implementation</a> </li> </ul> </li> <li>The commit linting issue was open from a long time and I had to debug some <a href="https://github.com/intel/dffml/runs/2715561947">CI errors</a> and I finally was able to get it up and running on the CI. </li> <li>I also worked on the Archive Manager implementation, which is the first part of my project after taking <a href="https://github.com/intel/dffml/pull/1128#discussion_r652835527">input from my mentor</a>.</li> </ul> <h3><u>What is coming up next?</u></h3> <ul> <li>A couple of <a href="https://github.com/intel/dffml/pull/1076#issuecomment-864342781">CI errors</a> are still left in the commit linting issue, would be fixing them. </li> <li>Would be brushing up and finalising the archive manager stuff. 
</li> <li>Would make the necessary changes to the model base class.</li> </ul> <h3><u>Did you get stuck anywhere?</u></h3> <ul> <li>Yes, I was stuck on one thing for a long time: getting the current branch name in the CI. It was not that easy for me because I wasn't aware that some env variables hold it and that the actual PR branch name is different. In the end it didn't solve the issue anyway, and I had to change the code to not require the current branch name. More on that whole thing <a href="https://github.com/intel/dffml/pull/1076#discussion_r654551833">here</a>.</li> <li>Also, some <a href="https://github.com/intel/dffml/pull/1076#issuecomment-864342781">CI errors</a> bothered me this week. I haven't been able to fix them yet, and I'd be discussing them with my mentor soon. </li> </ul>programmer290399@gmail.com (programmer290399)Tue, 22 Jun 2021 06:38:59 +0000https://blogs.python-gsoc.org/en/programmer290399s-blog/weekly-check-in-3-june-21-2021-1/Weekly Blog Post #2 [June 14, 2021]https://blogs.python-gsoc.org/en/programmer290399s-blog/weekly-blog-post-2-june-14-2021/<h3><u>What did you do this week?</u></h3> <ul> <li><p>This week I spent most of my time reading the docs, Stack Overflow answers and blog posts about the various archive handling methods built into the Python standard library. </p> </li> <li><p>The goal of reading all this was to come up with a design for an archive manipulation module that can be format agnostic. 
</p> </li> <li><p>First I thought I should use <a href="https://docs.python.org/3.7/library/shutil.html?highlight=shutil%20unpack%20archive#shutil.unpack_archive"><code>shutil.unpack_archive</code></a> as it handles most of the common formats, and it would've been easier and simpler to implement in tandem with <a href="https://docs.python.org/3.7/library/shutil.html?highlight=shutil%20unpack%20archive#shutil.make_archive"><code>shutil.make_archive</code></a>, essentially completing the functionality of the archiving module I want to implement.</p> </li> <li><p>But I didn't use the <code>shutil</code> methods because they are limited to a small number of formats, and people may need other popular formats which these methods don't support. Had I written this module using these methods, it would've been very difficult to extend it to those formats.</p> </li> <li>This was my first iteration and it may be a good idea, but I felt that we need something more maintainable and easier to extend. Thus I came up with the idea of using a simple dictionary which would map format names to their respective methods from the Python standard library, something like:<pre><code class="lang-py"> <span class="hljs-symbol">SUPPORTED_ARCHIVE_FORMATS</span> = { <span class="hljs-string">'format_name'</span> : class_that_handles_it . . . } </code></pre> </li> <li><p>This initially looked like a promising solution to me, as new formats could be added directly to this dictionary and then used in the relevant methods. </p> </li> <li><p>But this was not the case. It turns out that the interfaces of the various format-specific archive handling methods are not very consistent. For example, the method used for writing to a zip file is <code>zip.write(file)</code>, but the same job in tarfile is done by the <code>tarfile.add(file)</code> method. 
</p> </li> <li><p>All in all, these little inconsistencies in the interface led to the current design, which uses a helper class for each format. Each helper class is registered in that common dictionary and inherits from an abstract base class, which defines how the class should be implemented and also provides some helper functions. </p> </li> <li><p>This way all the archive handling methods can be brought down to a consistent interface and used in related methods. Extending to new formats would also be easy, and they should ideally work with the existing code like a charm. </p> </li> </ul> <h3><u>What is coming up next?</u></h3> <ul> <li><p>I have made a rough implementation to get inputs from my mentor and to improve upon this.</p> </li> <li><p>I would need to implement test cases for this and more helper classes. The current state of the PR can be seen <a href="https://github.com/intel/dffml/pull/1128">here.</a></p> </li> <li><p>Then I will add this to the model's base class and proceed to update the test cases for other models and make sure that they work with archives. </p> </li> <li><p>I would also need to update the <a href="https://github.com/intel/dffml/blob/master/dffml/source/file.py#L28">file source</a>, as it has some archive handling code which should be removed and updated to use this module.</p> </li> </ul> <h3><u>Did you get stuck anywhere?</u></h3> <ul> <li><p>Not really, this week was more about thinking, trying out various implementations, and evaluating them based on extensibility and maintainability, as I have discussed above. 
</p> </li> <li><p>I was a bit confused about how I should implement the tests for this, but I think it would be better to take input from the mentors first and then put effort into covering it in the tests.</p> </li> </ul>programmer290399@gmail.com (programmer290399)Mon, 14 Jun 2021 02:38:10 +0000https://blogs.python-gsoc.org/en/programmer290399s-blog/weekly-blog-post-2-june-14-2021/Weekly Check-in #1 [June 7, 2021]https://blogs.python-gsoc.org/en/programmer290399s-blog/weekly-check-in-1-june-7-2021/Hello everyone !! <p>My name is Saahil Ali and I am currently a 3rd year student at IIPS, DAVV pursuing my integrated M.Tech (I.T.). I have been working on DFFML for a few months now and I will be working on <b>Adding archive support in DFFML</b> this summer.</p> <h3><u>What did you do this week?</u></h3> <p>I mostly focused on finishing any open issues and requesting reviews for them so that I can completely focus on my project in the coming week. I also had a 1:1 meeting with my mentor, and yes, I also wrote this blog beforehand and waited for the last date to publish, only to find out that this site was down :P</p> <h3><u>What is coming up next?</u></h3> <p>I am planning to come up with a rough design of the archiving module, discuss and refine it with the help of my mentors, and then implement it along with the test cases and documentation.</p> <h3><u>Did you get stuck anywhere?</u></h3> <p>Aah yes !! 
I have been working on a <a href="https://github.com/intel/dffml/issues/1040">commit linting issue</a> for a long time, and now that it is almost ready to be merged, I am getting some errors in the CI [<a href="https://github.com/intel/dffml/pull/1076/checks?check_run_id=2718732369">1</a>, <a href="https://github.com/intel/dffml/pull/1076/checks?check_run_id=2718730359">2</a>] that I haven't been able to debug yet. I have also asked my mentor about those errors and will continue to debug them until I eventually fix 'em all.</p>programmer290399@gmail.com (programmer290399)Wed, 09 Jun 2021 01:36:05 +0000https://blogs.python-gsoc.org/en/programmer290399s-blog/weekly-check-in-1-june-7-2021/
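<p>The helper-class design described in Weekly Blog Post #2 (one helper per format, a shared abstract base class, and a registry dictionary) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the DFFML PR: the names <code>ArchiveHelper</code>, <code>ZipHelper</code>, <code>TarHelper</code> and <code>round_trip</code> are hypothetical, and only <code>SUPPORTED_ARCHIVE_FORMATS</code> is taken from the post. It relies solely on the standard library's <code>zipfile</code> and <code>tarfile</code> modules.</p>

```python
import tarfile
import tempfile
import zipfile
from abc import ABC, abstractmethod
from pathlib import Path


class ArchiveHelper(ABC):
    """Hypothetical common interface; not DFFML's actual class."""

    @abstractmethod
    def create(self, archive_path, files):
        """Write the given files into a new archive at archive_path."""

    @abstractmethod
    def extract(self, archive_path, dest):
        """Unpack the archive at archive_path into the directory dest."""


class ZipHelper(ArchiveHelper):
    def create(self, archive_path, files):
        with zipfile.ZipFile(archive_path, "w") as archive:
            for file in files:
                # zipfile calls this operation write()
                archive.write(file, arcname=Path(file).name)

    def extract(self, archive_path, dest):
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(dest)


class TarHelper(ArchiveHelper):
    def create(self, archive_path, files):
        with tarfile.open(archive_path, "w") as archive:
            for file in files:
                # tarfile calls the same operation add()
                archive.add(file, arcname=Path(file).name)

    def extract(self, archive_path, dest):
        with tarfile.open(archive_path) as archive:
            archive.extractall(dest)


# Registry mapping format names to the helper class that handles them.
SUPPORTED_ARCHIVE_FORMATS = {"zip": ZipHelper, "tar": TarHelper}


def round_trip(format_name, text="hello"):
    """Archive one small file, extract it again, and return its contents."""
    helper = SUPPORTED_ARCHIVE_FORMATS[format_name]()
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp, "model.txt")
        source.write_text(text)
        archive_path = Path(tmp, "model." + format_name)
        out_dir = Path(tmp, "out")
        out_dir.mkdir()
        helper.create(archive_path, [source])
        helper.extract(archive_path, out_dir)
        return (out_dir / "model.txt").read_text()
```

<p>Because callers only ever see <code>create</code> and <code>extract</code>, the <code>write()</code> versus <code>add()</code> mismatch stays inside the helpers, and supporting another format would, under these assumptions, only mean adding one more class and one more dictionary entry.</p>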