Hi all, we have now had three weeks of coding, and overall I am pleased with the progress of the project. It has been going smoothly without too many major issues.
The three major milestones achieved this week were:
1. Created a draft PR for integration with date-parser. The number-parser library is constantly improving, but one of the major goals was also to integrate it with other Scrapy libraries (primarily date-parser and price-parser). The integration needed to be seamless, without too much modification to the date-parser base code. Once the number-parser library improves, we can add it as a dependency.
2. Issue creation and resolution on the number-parser library. Since we are done with the MVP, we can now move to a more organized workflow in which we discuss bugs and features as issues, and I create PRs that target specific issues.
3. Implemented a parse_number feature that parses a single number written in natural language, e.g. 'fifty seven' -> 57, 'cats' -> None.
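To make the parse_number behaviour in item 3 concrete, here is a minimal, self-contained sketch. It is not the library's actual implementation: the name parse_number_sketch, the small English vocabulary tables, and the accumulation rule are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical, deliberately small English vocabulary for this sketch.
NUMBERS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19, "twenty": 20, "thirty": 30, "forty": 40,
    "fifty": 50, "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
MULTIPLIERS = {"hundred": 100, "thousand": 1_000, "million": 1_000_000}


def parse_number_sketch(text: str) -> Optional[int]:
    """Return the integer for a single spelled-out number, else None."""
    value = 0
    seen_number_word = False
    for token in text.lower().replace("-", " ").split():
        if token == "and":
            continue  # joiner, contributes nothing
        if token in NUMBERS:
            value += NUMBERS[token]
        elif token in MULTIPLIERS:
            # A bare multiplier such as "hundred" is read as 1 * 100.
            value = max(value, 1) * MULTIPLIERS[token]
        else:
            return None  # any non-number word means "not a single number"
        seen_number_word = True
    return value if seen_number_word else None


print(parse_number_sketch("fifty seven"))  # 57
print(parse_number_sketch("cats"))         # None
```

Returning None (rather than raising) keeps the "not a number" case cheap to check for callers, which matches the 'cats' -> None example above.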
The plan for next week is to tackle the multi-language feature (starting with Spanish, Hindi, and Russian). Hopefully, by the end of the week, the pipeline for incorporating multiple languages will be in place.
arnav_k's Blog
Hey, I am back with the second check-in blog, covering week 2.
What did you do this week?
Fixes and features. I am still tweaking the number-parser library; it now handles multiple numbers that are not separated by a delimiter, and returns a set of words as opposed to a single word. I also experimented with the date-parser library and looked into the integration.
Did you get stuck anywhere?
Nothing major as such. I was modifying and updating the overall structure of the library, which needed some research into Python best practices.
What is coming up next?
The next week involves completing the integration with date-parser and price-parser. Additionally, I am hoping to handle date-specific years.
Weekly Update
Hey everyone, number-parser is up and running. Do check it out and raise issues, feature requests, etc.
It was a really fun and productive first week. I have got the basic version done and will keep refining it in the upcoming weeks.
The procedure for the parser is as follows:
- Identify all the words that are numbers, or parts of a number, in natural language (hundred, twelve, seven, million).
- This list of tokens is passed to a number builder.
- The number is built by putting the appropriate operators between the tokens (the current value is multiplied on encountering a multiplier like hundred, thousand, million, etc.).
- For [nine, hundred, and, seven, thousand], the parser would compute (9 * 100 + 7) * 1000 = 907000.
The parser takes a string as input and only changes the words that are numbers; non-number words are left untouched.
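The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the library's code: the names build_number and parse_sketch, the tiny vocabulary, and the whitespace-only tokenisation are all hypothetical.

```python
# Hypothetical, partial vocabulary; a real parser would need far more words.
NUMBERS = {"one": 1, "two": 2, "three": 3, "seven": 7, "nine": 9,
           "twelve": 12, "twenty": 20, "fifty": 50}
MULTIPLIERS = {"hundred": 100, "thousand": 1_000, "million": 1_000_000}
JOINERS = {"and"}


def build_number(tokens):
    """Add plain number words; multiply the running value on a multiplier."""
    value = 0
    for token in tokens:
        key = token.lower()
        if key in MULTIPLIERS:
            value = max(value, 1) * MULTIPLIERS[key]
        elif key in NUMBERS:
            value += NUMBERS[key]
        # Joiners such as "and" contribute nothing.
    return value


def _flush(run, output):
    """Emit the pending run of number words as digits, then reset it."""
    trailing = []
    while run and run[-1].lower() in JOINERS:
        trailing.insert(0, run.pop())  # a trailing "and" is ordinary text
    if run:
        output.append(str(build_number(run)))
    output.extend(trailing)
    run.clear()


def parse_sketch(text):
    """Replace runs of number words in text; leave every other word as is."""
    output, run = [], []
    for word in text.split():
        key = word.lower()
        is_number_word = key in NUMBERS or key in MULTIPLIERS
        if is_number_word or (key in JOINERS and run):
            run.append(word)
        else:
            _flush(run, output)
            output.append(word)
    _flush(run, output)
    return " ".join(output)


print(build_number(["nine", "hundred", "and", "seven", "thousand"]))  # 907000
print(parse_sketch("I have nine hundred and seven thousand cats"))
# I have 907000 cats
```

Note that this naive left-to-right rule only mirrors the description above; an input such as "two thousand three hundred" would come out wrong here and needs extra handling in a real builder.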
Most of the learning was in the setup needed to create the library. This involved configuring setup.py and setting up a robust testing environment (tox). The mentors were really helpful: they reviewed the code mid-week and gave important insights. Overall, it was a smooth first week of coding with no major issues.
Next Week
The plan for next week is to test the parser more thoroughly (adding more test cases) and then move on to integrating the current number-parser work with the date-parser library.
Hi, I am Arnav Kapoor, a 3rd-year undergraduate student from IIIT-Hyderabad, and I will be working with the Scrapinghub sub-org this summer. The project goal is to create a number-parser library that parses numbers written in natural language, and to integrate it with existing libraries.
What did you do this week?
The community bonding phase mostly involved researching the existing solutions and understanding the pros and cons of each. I also got to know the mentors, and we have set up weekly meetings for the duration of the program.
Did you get stuck anywhere?
No, there weren't any hurdles as such.
What is coming up next?
The next week involves creating a basic English-only version, which will gradually be built upon. It's time to begin coding and face the challenges as and when they come.