Weekly Check-in #4: (14 Jun - 20 Jun)

anubhavp
Published: 06/18/2019

Hello! The third week of the GSoC coding period is coming to an end. Here is an update on what I achieved in the past week and what I am looking forward to.

 

What did you do this week?

  • Created separate tox testing environments for testing integration with third-party parsers like Robotexclusionrulesparser and Reppy (see the sketch after this list).

  • Made Travis use the new tox environments.

  • Described these parsers in Scrapy documentation.

  • Got Robotexclusionrulesparser to work with unicode user agents.
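For a rough idea of what these environments look like, here is a minimal sketch; the environment names, dependency list, and test file path are illustrative, not the exact ones from the pull request:

```ini
# Illustrative tox environments for exercising the interface on top
# of each third-party parser; names and paths are assumptions.
[testenv:robotexclusionrulesparser]
deps =
    {[testenv]deps}
    robotexclusionrulesparser
commands =
    pytest tests/test_robotstxt_interface.py

[testenv:reppy]
deps =
    {[testenv]deps}
    reppy
commands =
    pytest tests/test_robotstxt_interface.py
```

On Travis, each environment can then run as its own job via `tox -e <envname>`.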

 

What is coming up next?

I will be working on creating a Python-based robots.txt parser that is compliant with the spec and supports modern conventions.
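To illustrate the kind of logic involved, here is a deliberately minimal sketch of grouping robots.txt records; it is an illustration of mine, not the planned implementation, and it ignores wildcards, Allow/Disallow precedence, `Crawl-delay`, `Sitemap`, and other conventions the real parser will need to handle:

```python
def parse_robotstxt(content):
    """Minimal sketch: group robots.txt lines into
    (user_agents, rules) records. Not spec-complete."""
    groups, agents, rules = [], [], []
    for line in content.splitlines():
        line = line.split("#", 1)[0].strip()   # strip comments
        if ":" not in line:                    # skip blanks / junk
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if rules:                          # a new record starts
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            rules.append((field, value))
    if agents or rules:
        groups.append((agents, rules))
    return groups
```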

 

Did you get stuck anywhere?

Nothing major.  


Weekly Check-in #3: (7 Jun - 13 Jun)

anubhavp
Published: 06/08/2019

Hello, wandering Pythonistas! The second week of the GSoC coding period is coming to an end. Here is an update on what I achieved in the past week and what I am looking forward to.

 

What did you do this week?

  • I made a few changes to the interface based on the feedback received from my mentors.

  • I implemented the interface on top of third-party parsers like Robotexclusionrulesparser and Reppy (a rough sketch of the interface follows this list).

  • Wrote tests for the implementation of the interface on top of the two parsers. The tricky part was reducing code duplication and keeping the tests maintainable.

  • Modified Scrapy to use the new interface (instead of directly calling Python’s built-in RobotFileParser).

  • I had the weekly meeting with my mentors, where we discussed new stretch goals for the project.
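To give a flavour of the design, here is a rough sketch of such an interface; the class and method names are illustrative, since the exact signatures were still being discussed with my mentors:

```python
from abc import ABCMeta, abstractmethod

class RobotParser(metaclass=ABCMeta):
    """Sketch of a common interface that each robots.txt parser
    (built-in RobotFileParser, Reppy, Robotexclusionrulesparser)
    can be wrapped to implement. Names are illustrative."""

    @classmethod
    @abstractmethod
    def from_robotstxt_body(cls, robotstxt_body):
        """Parse the raw robots.txt content and return a parser instance."""

    @abstractmethod
    def allowed(self, url, user_agent):
        """Return True if the given user agent may fetch the URL."""
```

With every parser hidden behind the same two methods, the code in Scrapy only has to be written once against the interface rather than once per parser.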

 

What is coming up next?

It will depend on the feedback from my mentors. If everything looks good to them, I will focus my attention on writing a pure Python robots.txt parser.

 

Did you get stuck anywhere?

Nothing major, though I had a little difficulty due to my lack of knowledge of the differences between Python 2 and Python 3. I knew Python 3 uses Unicode strings by default; what I didn’t know is that in Python 3 the `bytes` and `str` types are distinct, so encoding a string produces an object of type `bytes`. This actually makes sense: having separate types for text and arbitrary binary data.
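A quick illustration of the distinction (the user agent string here is just an example):

```python
# Python 3: str holds text, bytes holds arbitrary binary data.
text = "ExampleBot/1.0"
data = text.encode("utf-8")            # encoding a str yields bytes
print(type(text))                      # <class 'str'>
print(type(data))                      # <class 'bytes'>
print(data.decode("utf-8") == text)    # True: decoding restores the str
```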

 


Weekly Check-in #2 [31 May - 6 Jun]

anubhavp
Published: 06/02/2019

Hello everyone. The first week of the GSoC coding period is coming to an end. Here is an update on what I achieved in the past week and what I am looking forward to.

 

What did you do this week?

I submitted my first pull request related to GSoC. This week mostly involved discussion on the interface specification. I learned that designing an interface involves considering several small but quite important details, and that a good practice is to question every choice you make. Also, I had a meeting with my mentors where we discussed weekly milestones and decided to have weekly meetings, every Tuesday. I implemented the interface on top of Python’s built-in robots.txt parser and worked on documentation related to the interface.
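As a sketch of what wrapping the built-in parser can look like (class and method names here are my illustrative assumptions, not the final implementation):

```python
from urllib.robotparser import RobotFileParser  # Python 3 module path

class PythonRobotParser:
    """Illustrative adapter exposing Python's built-in
    RobotFileParser through the new interface."""

    def __init__(self, robotstxt_body):
        self._parser = RobotFileParser()
        # RobotFileParser.parse() expects an iterable of text lines.
        self._parser.parse(robotstxt_body.splitlines())

    def allowed(self, url, user_agent):
        return self._parser.can_fetch(user_agent, url)
```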

I got an opportunity to dive deep into the source code of Python’s built-in robots.txt parser. For some reason, I always believed that reading through the implementation of Python (or any language) and its built-in modules would be difficult and not really useful, and that the code would mostly be complex and cryptic (to a beginner like me). This doesn’t seem to be the case (at least with Python). I should do more of this, looking at a module’s implementation for fun •ᴗ• .

What is coming up next?

In the next week, I am looking to finalize the interface and modify Scrapy to use it to communicate with the parsers. I will also work on documenting the interface and, if time permits, implement it on top of a few other parsers.

Did you get stuck anywhere?

Nope. I learned a lot from the constant feedback from my mentors. It was an awesome week •ᴗ•


Weekly Check-in #1 [24 May - 30 May]

anubhavp
Published: 05/23/2019

Hey everyone. I am Anubhav, and this summer I am working to implement an interface for robots.txt parsers in Scrapy. This is the first of many weekly blog posts where I will briefly describe the work I have done in the previous week and my plans for the upcoming week. So, let's get started.

What did you do this week?

Most of my time was spent configuring a local development environment and learning to use tox and run tests locally. For the patches I had submitted before, I didn't run tests locally beforehand and relied solely on CI to do it; running tests locally could have saved a lot of time. Also, I went through the Scrapy contribution guide, learned about Twisted (Scrapy uses it heavily) and PEP 8, and worked on a pull request I had opened before.

What is coming up next?

  • I will have my first meeting with mentors of the project.
  • I will work on a few pull requests I had opened before.
  • Since this is the last week of the community bonding period, I may also have a discussion with my mentors regarding the interface specification.

Did you get stuck anywhere?

I had minor difficulties understanding how to run tests using tox. When I followed the instructions given in the Scrapy documentation, I could only run tests in a Python 2.7 environment. Thankfully, tox has excellent documentation, which allowed me to understand the settings inside the tox.ini config file. In the end, I just had to make a few edits to my tox.ini file, and I was able to run tests in a Python 3 environment as well.
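For illustration, the fix amounts to something along these lines (a minimal sketch; the environment names and deps are assumptions, not Scrapy's actual tox.ini):

```ini
# Hypothetical minimal tox.ini: list both interpreters in envlist
# so `tox` runs the suite under Python 2.7 and Python 3.7.
[tox]
envlist = py27, py37

[testenv]
deps =
    pytest
commands =
    pytest {posargs:tests}
```

Running `tox -e py37` then executes the tests in just the Python 3.7 environment.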
