Hi, community
Part of the journey is the end. It is time for me to write my final work report for the final evaluation of Google Summer of Code 2021, and I will devote most of this week to it.
Final Work Submission Report
- Name: Gavish Poddar
- Organisation: Python Software Foundation
- Sub-Organisation: Zyte
- Project: dateparser: Better language detection & reimplementing search_dates
- Proposal: dateparser - better language detection
Hello Everyone! My name is Gavish Poddar and I'm excited to tell you about my GSoC journey. For the past couple of months, I have been working on an awesome project, dateparser. dateparser aims to parse a datetime from a string.
My GSoC journey would not have been successful without the guidance of my mentors Marc Hernández, Konstantin Lopuhin and Kishan Mehta.
What I Have Learned?
The whole GSoC journey was full of learning, thanks to my mentors. I learned how to find good open source dependencies to include in our project. I tried my hand at improving code coverage and writing tests for the code. I learned how to optimize code, and why extensive research is needed before adding a feature.
What I Have Contributed?
As mentioned in my proposal, I worked on the implementation of Optional Language Detection for dateparser and on fixing as many issues as possible in the search_dates function of dateparser.
Optional Language Detection
PR - Optional Language Detection
Implemented optional language detection to improve how dateparser detects languages. This makes it possible to plug any language detection library into dateparser. Out of the box, dateparser supports two libraries, fasttext and langdetect. The optional language detection works with both parse and search_dates. This PR also introduces a new setting, DEFAULT_LANGUAGES, which is used if no language is detected by either the default language detection or the optional language detection.
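The fallback behaviour described above can be illustrated with a small standalone sketch. It does not use dateparser itself: the detector signature (text and a confidence threshold in, a list of language codes out), the helper names keyword_detector and resolve_languages, and the 0.5 threshold are all assumptions made for illustration, and the built-in detection step is left out for brevity.

```python
from typing import Callable, List, Optional

# Assumed shape of a pluggable detector: text plus a confidence threshold in,
# a list of language codes out, best match first.
Detector = Callable[[str, float], List[str]]


def keyword_detector(text: str, confidence_threshold: float) -> List[str]:
    """Toy stand-in for a real library such as fasttext or langdetect."""
    hints = {"es": ("enero", "de"), "fr": ("janvier", "le")}
    words = text.lower().split()
    scores = {}
    for lang, keywords in hints.items():
        hits = sum(word in keywords for word in words)
        # Score is the share of words that look like they belong to the language.
        scores[lang] = hits / len(words) if words else 0.0
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [lang for lang in ranked if scores[lang] >= confidence_threshold]


def resolve_languages(
    text: str,
    detector: Optional[Detector] = None,
    default_languages: Optional[List[str]] = None,
    confidence_threshold: float = 0.5,
) -> List[str]:
    """Use the plugged-in detector first, then fall back to the defaults."""
    detected = detector(text, confidence_threshold) if detector else []
    return detected or list(default_languages or [])


print(resolve_languages("12 de enero", keyword_detector, ["en"]))
print(resolve_languages("12/12/12", keyword_detector, ["en"]))
```

The first call resolves to Spanish because the toy detector is confident enough; the second contains no language hints at all, so the default languages are used instead, which is the role DEFAULT_LANGUAGES plays in the description above.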
Reimplementing search_dates (extended goal)
PR - Reimplementing search_dates
The reimplemented and simplified search_dates improves results and fixes many issues. The entire search_dates function is newly implemented and should be easier to maintain. This PR introduces a new feature, search_first_date, which returns the first date in the given string. This PR also fixes around 13 issues.
Other search_dates improvements
Added support for date-related expressions such as last decade, next decade, etc. in search_dates. This PR fixes 1 issue.
PR - Improvements in locale:translate_search fixes
Added period separator support to search_dates. Date strings like 23.12.2000 can now be parsed. This PR fixes 5 issues.
PR - search_date period separator support
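To make the period separator case concrete, here is a minimal standard-library sketch of what finding a date like 23.12.2000 in free text involves. It is not dateparser's implementation: the pattern, the helper name find_period_dates, and the day-first reading of the digits are assumptions made for illustration.

```python
import re
from datetime import datetime

# Assumed pattern: day.month.year with a four-digit year, read day-first.
PERIOD_DATE = re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b")


def find_period_dates(text):
    """Return (matched substring, datetime) pairs for period-separated dates."""
    found = []
    for match in PERIOD_DATE.finditer(text):
        day, month, year = (int(group) for group in match.groups())
        try:
            found.append((match.group(0), datetime(year, month, day)))
        except ValueError:
            # Digits that do not form a real calendar date are skipped.
            continue
    return found


print(find_period_dates("Released on 23.12.2000 in Berlin."))
```

Each result pairs the matched substring with the parsed datetime, so the caller can see both what was found in the text and how it was interpreted.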
Other Important Details
As part of our GSoC project, Python Software Foundation requires us to post a weekly blog about what we have done during the week and what is coming up next. We can also write about any blockers or issues we are facing. I have written these weekly blogs too, so if you want the week-by-week details of my project, you can refer to them here.
Future Work and Final Note
The project is very actively maintained, and the new search_dates and my other contributions should improve the library. The main goal of the proposal has been achieved with the implementation of optional language detection, and the PR is mergeable. I plan to keep working on the project and contribute as much as I can. Contributing to the search_dates function of dateparser will be my primary goal.
It was overall a wonderful experience and I learned a lot.
I would like to thank Google, Python Software Foundation and Zyte for providing me with the opportunity and my mentors Marc Hernández, Konstantin Lopuhin and Kishan Mehta.
Thank you for reading!