arnav_k's Blog

Weekly Blog #2 (15th June - 22nd June)

arnav_k
Published: 06/21/2020

Hi all, we have had three weeks of coding so far and overall I am pleased with the progress of the project. It has been going smoothly, without any major issues.
The three major milestones achieved this week were:
1. Created a draft PR for integration with date-parser. The number-parser library is constantly improving, but one of the major goals was also to integrate it with other Scrapy libraries (primarily date-parser and price-parser). The integration needed to be seamless, without too much modification to the date-parser code base. Once the number-parser library matures, we can add it as a dependency.
2. Issue creation and resolution on the number-parser library. Since we are done with the MVP, we can now move to a more organized workflow in which bugs and features are discussed as issues and I create PRs that target specific issues.
3. Implemented a parse_number feature that parses a single number written in natural language, e.g. 'fifty seven' -> 57, 'cats' -> None.
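
To make the third point concrete, here is a minimal sketch of how such a single-number parse could behave. It is not the library's actual code: the function name parse_number_sketch and the two small word tables are assumptions made up for this illustration, and only a handful of English words are covered.

```python
# Hypothetical word tables: a small subset chosen for illustration only.
NUMBERS = {
    "one": 1, "two": 2, "three": 3, "seven": 7, "nine": 9,
    "twelve": 12, "twenty": 20, "fifty": 50,
}
MULTIPLIERS = {"hundred": 100, "thousand": 1000, "million": 1000000}


def parse_number_sketch(text):
    """Return the integer written out in `text`, or None if any word is not a number word."""
    value = 0
    seen_number_word = False
    for word in text.lower().split():
        if word in NUMBERS:
            value += NUMBERS[word]
            seen_number_word = True
        elif word in MULTIPLIERS:
            # A bare multiplier such as "hundred" counts as 1 * 100.
            value = max(value, 1) * MULTIPLIERS[word]
            seen_number_word = True
        elif word == "and":
            continue  # filler word, contributes nothing
        else:
            return None  # the input is not a single number
    return value if seen_number_word else None


print(parse_number_sketch("fifty seven"))  # 57
print(parse_number_sketch("cats"))         # None
```

Returning None rather than raising keeps the call easy to use as a quick "is this a number?" check.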

The plan for next week is to tackle the multi-language feature (starting with Spanish, Hindi and Russian). Hopefully, by the end of the week, the pipeline for incorporating multiple languages will be in place.


Weekly Check-In #2 (7th June - 14th June)

arnav_k
Published: 06/14/2020

Hey, I am back with the second check-in blog, covering week 2.

What did you do this week?
Fixes and features: I am still tweaking the number-parser library. It now handles multiple numbers that are not separated by a delimiter, returning a set of words rather than a single word. I also experimented with the date-parser library and looked into the integration.

Did you get stuck anywhere?
Nothing major as such. I was modifying and updating the overall structure of the library, which needed some research into Python best practices.

What is coming up next?
The next week involves completing the integration with date-parser and price-parser. Additionally, I am hoping to handle date-specific years.


Weekly Post #1 (1st June - 7th June)

arnav_k
Published: 06/07/2020

Weekly Update

Hey everyone, number-parser is up and running (number parser). Do check it out and raise issues, feature requests, etc.

It was a really fun and productive first week. I have the basic version done and will keep refining it in the upcoming weeks.
The procedure for the parser is as follows:

  • Identify all the words that are numbers or parts of a number in natural language (hundred, twelve, seven, million).
  • This list of tokens is passed to a number builder.
  • The number is built by putting the appropriate operators between the tokens (the current value is multiplied on encountering a multiplier such as hundred, thousand or million).
    • [nine, hundred, and, seven, thousand] - the parser would treat it as (9 * 100 + 7) * 1000 = 907000
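
The steps above can be sketched roughly as follows. This is only an illustration of the rule as described in this post, not the library's actual code: the build_number name and the UNITS, MULTIPLIERS and IGNORED tables are assumptions invented for the example, and the tables cover just a few words.

```python
# Hypothetical lookup tables: a small subset, for illustration only.
UNITS = {
    "one": 1, "two": 2, "three": 3, "seven": 7, "nine": 9,
    "twelve": 12, "twenty": 20, "fifty": 50,
}
MULTIPLIERS = {"hundred": 100, "thousand": 1000, "million": 1000000}
IGNORED = {"and"}  # filler words that carry no value


def build_number(tokens):
    """Fold a list of number tokens into one integer, left to right."""
    value = 0
    for token in tokens:
        word = token.lower()
        if word in IGNORED:
            continue
        if word in MULTIPLIERS:
            # Multiply everything accumulated so far; a bare "hundred" means 1 * 100.
            value = max(value, 1) * MULTIPLIERS[word]
        elif word in UNITS:
            value += UNITS[word]
        else:
            raise ValueError(f"not a number token: {token!r}")
    return value


# (9 * 100 + 7) * 1000
print(build_number(["nine", "hundred", "and", "seven", "thousand"]))  # 907000
```

Because the sketch multiplies the whole running value, inputs where a smaller multiplier follows a larger one (for example "two thousand three hundred") would need extra handling that is left out here.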

The parser takes a string as input and changes only the words that are numbers; non-number words are left untouched.
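
A rough sketch of this string-level behaviour is given below. It is an assumption-laden illustration rather than the library's implementation: replace_numbers, build_number and the word tables are hypothetical names, words are split on whitespace only, and punctuation is not handled.

```python
# Hypothetical word tables: a small subset, for illustration only.
NUMBERS = {"one": 1, "two": 2, "three": 3, "seven": 7, "nine": 9, "twelve": 12, "fifty": 50}
MULTIPLIERS = {"hundred": 100, "thousand": 1000, "million": 1000000}


def build_number(words):
    """Fold lower-cased number words into one integer ("and" adds nothing)."""
    value = 0
    for word in words:
        if word in MULTIPLIERS:
            value = max(value, 1) * MULTIPLIERS[word]
        elif word in NUMBERS:
            value += NUMBERS[word]
    return value


def replace_numbers(text):
    """Replace each run of number words with digits; keep every other word as is."""
    output, run = [], []

    def flush():
        # Hand back any trailing "and" that did not join two number words.
        trailing = []
        while run and run[-1].lower() == "and":
            trailing.insert(0, run.pop())
        if run:
            output.append(str(build_number([w.lower() for w in run])))
        output.extend(trailing)
        run.clear()

    for word in text.split():
        key = word.lower()
        is_number_word = key in NUMBERS or key in MULTIPLIERS
        # "and" only belongs to a number when it follows a number word.
        if is_number_word or (key == "and" and run):
            run.append(word)
        else:
            flush()
            output.append(word)
    flush()
    return " ".join(output)


print(replace_numbers("I have nine hundred and seven cats"))  # I have 907 cats
print(replace_numbers("I have seven and he has two"))         # I have 7 and he has 2
```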

Most of the learning was in the setup needed to create the library. This involved configuring setup.py and setting up a robust testing environment (tox). The mentors were really helpful: they reviewed the code mid-week and gave important insights. Overall it was a smooth first week of coding with no major issues.

Next Week

The plan for next week is to test the parser more thoroughly (adding more test cases) and then move on to integrating the current number-parser work with the date-parser library.


Weekly Check-In #1 - Community Bonding (4th May - 31st May)

arnav_k
Published: 05/30/2020

Hi, I am Arnav Kapoor, a 3rd-year undergraduate student from IIIT-Hyderabad, and I will be working with the Scrapinghub sub-org this summer. The project goal is to create a number-parser library that parses numbers written in natural language and to integrate it with existing libraries.

What did you do this week?
The community bonding phase mostly involved researching the existing solutions and understanding the pros and cons of each. I also got to know the mentors, and we have set up weekly meetings for the duration of the program.

Did you get stuck anywhere?
No, there weren't any hurdles as such.

What is coming up next?
The next week involves creating a basic English-only version, which will gradually be built upon. It's time to begin coding and face the challenges as and when they come.
