Articles on gavishpoddar's Bloghttps://blogs.python-gsoc.orgUpdates on different articles published on gavishpoddar's BlogenTue, 24 Aug 2021 05:11:25 +0000Part of the journey is the end | Final Reporthttps://blogs.python-gsoc.org/en/gavishpoddars-blog/part-of-the-journey-is-the-end-final-report/<h3> Hi, community</h3> <p> Part of the journey is the end. It is time for me to write my final work report for the final evaluation of Google Summer of Code 2021. This week, I will devote my time mainly to writing this report. </p> <h1>Final Work Submission Report</h1> <ul> <li><strong>Name:</strong> Gavish Poddar</li> <li><strong>Organisation:</strong> Python Software Foundation</li> <li><strong>Sub-Organisation:</strong> Zyte</li> <li><strong>Project:</strong> <a href="https://github.com/scrapinghub/dateparser"><code>dateparser</code></a>: better language detection &amp; reimplementing <code>search_dates</code> </li> <li><strong>Proposal:</strong> <a href="https://blogs.python-gsoc.org/media/proposals/Dateparser_Better_Language_Detection_1.pdf">dateparser - better language detection</a></li> </ul> <p>Hello Everyone! My name is Gavish Poddar and I'm excited to tell you about my GSoC journey. For the past couple of months, I have been working on an awesome project, <code>dateparser</code>. The dateparser library aims to parse a <code>datetime</code> from a string.</p> <p>My GSoC journey would not have been successful without the guidance of my mentors <a href="https://github.com/noviluni">Marc Hernández</a>, <a href="https://github.com/lopuhin">Konstantin Lopuhin</a> and <a href="https://github.com/kishan3">Kishan Mehta</a>.</p> <h2>What I Have Learned</h2> <p>The whole GSoC journey was full of learning, thanks to my mentors. I learned how to find good open source dependencies to include in our project. I tried my hand at improving code coverage and writing tests for the code. 
I also learned how to optimize code and why extensive research is needed before adding a feature.</p> <h2>What I Have Contributed</h2> <p>As mentioned in my proposal, I worked on implementing optional language detection for dateparser and on fixing as many issues as possible in the <code>search_dates</code> function of <a href="https://github.com/scrapinghub/dateparser"><code>dateparser</code></a>.</p> <h3>Optional Language Detection</h3> <a href="https://github.com/scrapinghub/dateparser/pull/932"><img src="https://img.shields.io/github/pulls/detail/state/scrapinghub/dateparser/932?label=Language%20Detection"></a> <p>PR - <a href="https://github.com/scrapinghub/dateparser/pull/932">Optional Language Detection</a></p> <p>I implemented optional language detection to improve how dateparser detects languages. It allows any language detection library to be plugged into dateparser. Out of the box, dateparser supports two libraries: <a href="https://github.com/facebookresearch/fastText"><code>fasttext</code></a> and <a href="https://github.com/Mimino666/langdetect"><code>langdetect</code></a>. The optional language detection works with both <code>parse</code> and <code>search_dates</code>. This PR also introduces a new setting, <code>DEFAULT_LANGUAGES</code>, which is used if neither the default language detection nor the optional language detection detects a language.</p> <h3>Reimplementing <code>search_dates</code> (extended goal)</h3> <a href="https://github.com/scrapinghub/dateparser/pull/945"><img src="https://img.shields.io/github/pulls/detail/state/scrapinghub/dateparser/945?label=Reimplementing search_dates"></a> <p>PR - <a href="https://github.com/scrapinghub/dateparser/pull/945">Reimplementing <code>search_dates</code></a></p> <p>A reimplemented and simplified <code>search_dates</code> improves the results and fixes many issues. The entire <code>search_dates</code> function is newly implemented and should be easier to maintain. 
This PR introduces a new feature, <code>search_first_date</code>, which returns the first date in the given string. This PR also fixes around <strong>13 issues</strong>.</p> <h3>Other <code>search_dates</code> improvements</h3> <p><br></p> <p>Added support for date-related expressions such as <code>last decade</code> and <code>next decade</code> in <code>search_dates</code>. This PR fixes <strong>1 issue</strong>.</p> <a href="https://github.com/scrapinghub/dateparser/pull/953"><img src="https://img.shields.io/github/pulls/detail/state/scrapinghub/dateparser/953"></a> <p>PR - <a href="https://github.com/scrapinghub/dateparser/pull/953">Improvements in locale:translate_search fixes</a></p> <p><br></p> <p>Added period separator support to <code>search_dates</code>. Date strings like <code>23.12.2000</code> can now be parsed. This PR fixes <strong>5 issues</strong>.</p> <a href="https://github.com/scrapinghub/dateparser/pull/963"><img src="https://img.shields.io/github/pulls/detail/state/scrapinghub/dateparser/963"></a> <p>PR - <a href="https://github.com/scrapinghub/dateparser/pull/963"><code>search_date</code> period separator support</a></p> <p><br></p> <h2>Other Important Details</h2> <p>As part of our GSoC project, the Python Software Foundation requires us to post a weekly blog where we describe what we have done during the week and what is coming up next. We can also write about any blockers or issues we are facing. I have written these weekly blogs too, so if you want to know the week-by-week details of my project you can find them here.</p> <p><a href="https://blogs.python-gsoc.org/en/gavishpoddars-blog/"> Weekly Blogs</a></p> <h2>Future Work and Final Note</h2> <p>The project is very actively maintained, and the new <code>search_dates</code> and my other contributions should improve the library. The main goal of the proposal is achieved with the implementation of the optional language detection, and the PR is mergeable. I plan to keep working on the project and contribute as much as I can. 
Contributing to the <code>search_dates</code> function of the library (<a href="https://github.com/scrapinghub/dateparser"><code>dateparser</code></a>) will be my primary goal.</p> <h2>Overall, it was a wonderful experience and I learned a lot.</h2> <p>I would like to thank Google, the Python Software Foundation and Zyte for providing me with the opportunity, and my mentors <a href="https://github.com/noviluni">Marc Hernández</a>, <a href="https://github.com/lopuhin">Konstantin Lopuhin</a> and <a href="https://github.com/kishan3">Kishan Mehta</a>.</p> <p><b>Thank you for reading!</b></p>gavishpoddar@hotmail.com (gavishpoddar)Tue, 24 Aug 2021 05:11:25 +0000https://blogs.python-gsoc.org/en/gavishpoddars-blog/part-of-the-journey-is-the-end-final-report/GSoC Weekly Check-In #6https://blogs.python-gsoc.org/en/gavishpoddars-blog/gsoc-weekly-check-in-6-1/<h3> Hi, community</h3> <h3> 1. What did you do this week? </h3> <p>This week involved lots of coding, finalising the PRs and addressing reviews from mentors.</p> <p>This week I also created 1 new PR which solves 5 issues and extends what the search_dates function supports.</p> <h3> 2. What is coming up next? </h3> This week I look forward to resolving the issues raised in code reviews and wrapping up all the work done during GSoC. <h3> 3. Did you get stuck anywhere? </h3> Nothing major, it was a great week with lots of coding. <p><b>Thank you for reading!</b></p>gavishpoddar@hotmail.com (gavishpoddar)Tue, 17 Aug 2021 20:56:52 +0000https://blogs.python-gsoc.org/en/gavishpoddars-blog/gsoc-weekly-check-in-6-1/GSoC Blog Post #5https://blogs.python-gsoc.org/en/gavishpoddars-blog/gsoc-blog-post-5/<h3> Hi, community</h3> <h3> 1. What did you do this week? </h3> <p>This week I worked towards completion, and the PR is now mergeable. We made lots of micro improvements to the PR, with lots of reviews, suggestions and discussions. The PR with the reimplementation of search_dates is also complete.</p> <h3> 2. What is coming up next? 
</h3> This week I will wrap up the Optional Language Detection PR and prepare for the final submission. <h3> 3. Did you get stuck anywhere? </h3> No. <p><b>Thank you for reading!</b></p>gavishpoddar@hotmail.com (gavishpoddar)Tue, 10 Aug 2021 15:52:22 +0000https://blogs.python-gsoc.org/en/gavishpoddars-blog/gsoc-blog-post-5/GSoC Weekly Check-In #5https://blogs.python-gsoc.org/en/gavishpoddars-blog/gsoc-weekly-check-in-5-1/<h3> Hi, community</h3> <h3> 1. What did you do this week? </h3> <p>This week we did lots of code review and testing of the language detection PR and improved the search_dates function, along with adding a few new functions to the new search_dates. In the language detection function, we tested most lines of code and checked their practicality to make sure that they do not cause other issues. This week I also created 3 micro PRs.</p> <h3> 2. What is coming up next? </h3> This week I look forward to completing the implementation of the new search_dates and getting reviews from my mentors for improvements, so that both major PRs can be approved within the GSoC timeline. <h3> 3. Did you get stuck anywhere? </h3> I didn't get stuck on anything, but this week involved lots of coding and feedback from my mentors, which took a lot of time. A few test cases are still failing, but I expect to fix them within this week. <p><b>Thank you for reading!</b></p>gavishpoddar@hotmail.com (gavishpoddar)Mon, 02 Aug 2021 19:07:21 +0000https://blogs.python-gsoc.org/en/gavishpoddars-blog/gsoc-weekly-check-in-5-1/GSoC Blog Post #4https://blogs.python-gsoc.org/en/gavishpoddars-blog/gsoc-blog-post-4/<h3> Hi, community</h3> <h3> 1. What did you do this week? </h3> <p>My work on search_dates this week included creating many files and new supporting functions, and making the previous tests work with the new implementation.</p> <p>I debugged various issues and checked all settings. I am also creating docs and tests for language detection.</p> <h3> 2. What is coming up next? 
</h3> This week, I will try to complete the implementation of the language detection PR and complete the implementation of the new search_dates. <h3> 3. Did you get stuck anywhere? </h3> I got stuck with a tox test (Python segmentation fault); it is now fixed. <p><b>Thank you for reading!</b></p>gavishpoddar@hotmail.com (gavishpoddar)Mon, 26 Jul 2021 18:41:38 +0000https://blogs.python-gsoc.org/en/gavishpoddars-blog/gsoc-blog-post-4/Weekly Check-In #4https://blogs.python-gsoc.org/en/gavishpoddars-blog/weekly-check-in-4-20/<h3> Hi, community</h3> <h3> 1. What did you do this week? </h3> <p>I started this week working on search_dates, reimplementing and simplifying the current implementation.</p> <p>The goal is to make it maintainable and to solve many issues with search_dates. I created a PR for the implementation and tested the results.</p> <p>I am also trying to address many issues with the search_dates function.</p> <h3> 2. What is coming up next? </h3> This week, I will try to complete the implementation of search_dates and write tests for the newly implemented function find_first_date, along with a few fixes in the implemented language detection. <h3> 3. Did you get stuck anywhere? </h3> I got stuck on a few questions, but my mentors helped me understand and solve the issues, and they are now resolved. <p><b>Thank you for reading!</b></p>gavishpoddar@hotmail.com (gavishpoddar)Tue, 20 Jul 2021 08:46:58 +0000https://blogs.python-gsoc.org/en/gavishpoddars-blog/weekly-check-in-4-20/GSoC Blog Post #3https://blogs.python-gsoc.org/en/gavishpoddars-blog/gsoc-blog-post-3-1/<h3> Hello Everyone!</h3> <h3> 1. What did you do this week? 
</h3> <p>I worked mostly on dateparser-download and on preventing breaking changes.</p> <p>Other work included setting a default caching folder for dateparser-download, researching search_dates and how to improve it along with some code, updating CLDR data, reviewing the changes and improving the tests.</p> <h3> 2. What is coming up next? </h3> This week, I will try to complete the language detection PR and make it mergeable with master, and re-implement search_dates with the target of increasing accuracy and performance. <h3> 3. Did you get stuck anywhere? </h3> This week was great and I didn't get stuck. However, I spent significant time reviewing all the changes I made in the PR and improved the code as much as possible. <p><b>Thank you for reading!</b></p>gavishpoddar@hotmail.com (gavishpoddar)Tue, 13 Jul 2021 14:30:50 +0000https://blogs.python-gsoc.org/en/gavishpoddars-blog/gsoc-blog-post-3-1/Weekly Check-In #3https://blogs.python-gsoc.org/en/gavishpoddars-blog/weekly-check-in-3-26/<h3> Hi, community, I am back with the third check-in.</h3> <h3> 1. What did you do this week? </h3> <p>This was a long week and I learned many things.</p> <p> The biggest part was implementing dateparser-download entry points to download models from the CLI.</p> <p> Other tasks included shrinking and organising language data, setting a default DetectorFactory for langdetect without changing global state, setting the fasttext default language, downloading the model if it is not already cached, and improving the docs.</p> <h3> 2. What is coming up next? </h3> This week, I intend to fix all issues from code review, finish the docs and make changes to prevent breaking changes relative to the previous major version. <h3> 3. Did you get stuck anywhere? </h3> I had some problems with how to cache the downloaded fastText language detection models in the package. It is now implemented and fixed. 
<p><b>Thank you for reading!</b></p>gavishpoddar@hotmail.com (gavishpoddar)Mon, 05 Jul 2021 20:41:44 +0000https://blogs.python-gsoc.org/en/gavishpoddars-blog/weekly-check-in-3-26/GSoC Blog Post #2https://blogs.python-gsoc.org/en/gavishpoddars-blog/gsoc-blog-post-2/<h2> Weekly Update </h2> <p>Hello,</p> <p>It was a really fun and productive week.</p> <p>A few major milestones achieved this week were:</p> <ul> <li>Writing tests and docs.</li> <li>Improving the code and making the language detection function easier to use by removing the requirement of apply_setting.</li> <li>Language mapping (CLDR &amp; ISO 639).</li> <li>Checking that it works with other settings along with the current ones.</li> <li>Removing unsupported locales from language detection.</li> <li>Functionality testing of the implemented library to measure its speed and accuracy.</li> </ul> <h3> Next Week </h3> <p>The plan for next week is to do more robust testing of the implementation and its settings, and then move on to updating the docs. I also plan to set up default language detection and update the language translation data.</p> <p><b>Thanks for reading</b></p>gavishpoddar@hotmail.com (gavishpoddar)Mon, 28 Jun 2021 18:56:15 +0000https://blogs.python-gsoc.org/en/gavishpoddars-blog/gsoc-blog-post-2/Weekly Check-In #2https://blogs.python-gsoc.org/en/gavishpoddars-blog/weekly-check-in-2-17/<h2> Hi, community, I am back with the second check-in.</h2> <h3> 1. What did you do this week? </h3> Fixes and Tests - This week I spent most of my time writing tests and settings, making the implemented functions simpler for the user and updating setup.py. <h3> 2. What is coming up next? </h3> Next week, I am looking forward to writing documentation, working on language mapping, creating tests for the settings and checking how they work with other settings. <h3> 3. Did you get stuck anywhere? </h3> Nothing major as such; I had some minor problems with the tests. 
<p><b>Thank you for reading!</b></p>gavishpoddar@hotmail.com (gavishpoddar)Tue, 22 Jun 2021 18:14:14 +0000https://blogs.python-gsoc.org/en/gavishpoddars-blog/weekly-check-in-2-17/Weekly Blog #1https://blogs.python-gsoc.org/en/gavishpoddars-blog/weekly-blog-1-1/<h2> Weekly Update </h2> <p>Hello,</p> <p>We have had 1 week of coding so far, and overall I am pleased with the progress of the project. It has been going smoothly without too many major issues. The frequent feedback from mentors has been a great help for the development.</p> <p>This week was mostly spent on implementing the functions for language detection and creating additional settings.</p> <p> I implemented an optional language detection function for the <code>dateparser</code> library, which allows plugging in any language detection library, along with a few selected internally supported libraries. </p><p> The implemented function takes <code>text</code> and settings as parameters and returns a list of language codes, which dateparser then uses to translate the date-time string. The function is passed into dateparser as a parameter. Anyone may plug their own function into dateparser for language detection.</p> <p> The biggest issue we faced this week was how to minimise the file import requirements and allow easier use in the CLI and Python notebooks. 
We solved the issue by passing the function as a parameter to dateparser's parse function.</p> <p>Implemented settings include: </p><ol> <li><code>DEFAULT_LANGUAGE</code> : Serves as the default language if no language is detected.</li> <li><code>LANGUAGE_DETECTION_STRICT_USE</code> : Enforces strict use of the parsed languages only.</li> <li><code>LANGUAGE_DETECTION_CONFIDENCE_THRESHOLD</code> : Sets the minimum confidence score for detected languages.</li> </ol> <h3> Next Week </h3> <ol> <li>Functionality analysis of the implemented language detection.</li> <li>Mock tests of <code>parse</code>, <code>search_dates</code> and <code>DateDataParser</code>.</li> <li>Unit tests for language detection functions.</li> <li>Updating <code>setup.py</code> for optional language detection.</li> </ol> <p><b>Thanks for reading</b></p>gavishpoddar@hotmail.com (gavishpoddar)Tue, 15 Jun 2021 21:08:04 +0000https://blogs.python-gsoc.org/en/gavishpoddars-blog/weekly-blog-1-1/Weekly Post #1 ( 1st June - 7th June)https://blogs.python-gsoc.org/en/gavishpoddars-blog/weekly-post-1-1st-june-7th-june-1/gavishpoddar@hotmail.com (gavishpoddar)Tue, 15 Jun 2021 20:12:52 +0000https://blogs.python-gsoc.org/en/gavishpoddars-blog/weekly-post-1-1st-june-7th-june-1/Weekly Check-in #1https://blogs.python-gsoc.org/en/gavishpoddars-blog/weekly-check-in-1-14/<h2> Greetings, Python community! </h2> <p>Hi, I am Gavish, and this summer I am working on implementing optional language detection in dateparser and fixing issues related to search_dates. This is the first of many upcoming weekly blog posts where I will briefly describe the work I have done in the previous week and my plans for the upcoming week.</p> <h3> 1. What did you do this week? </h3> Since the beginning of the community bonding period, I have been engaging with the community and exploring many different open-source libraries that could be integrated with the dateparser library. 
The rest of the time was spent on configuring a local development environment, exploring the library in depth and understanding tox. <h3> 2. What is coming up next? </h3> Next week, I am looking to create the preview PR for the optional language detection. I will also work on tests for the optional language detection and, if time permits, implement the settings for the language detection. <h3> 3. Did you get stuck anywhere? </h3> Nope. I learned a lot from constant feedback from my mentors. It was an awesome week 😃. <p><b>Thank you for reading!</b></p>gavishpoddar@hotmail.com (gavishpoddar)Tue, 08 Jun 2021 06:47:55 +0000https://blogs.python-gsoc.org/en/gavishpoddars-blog/weekly-check-in-1-14/