Hello, wandering pythonistas! The second week of GSoC coding period is coming to an end. Here is an update on what I achieved in the past week and what I am looking forward to.
What did you do this week?
I made few changes to the interface according to the feedback received from the mentors.
I implemented the interface on top of third party parsers like Robotexclusionrulesparser and Reppy.
Wrote tests for testing the implementation of interface on top of the two parsers. The tricky part was reducing duplication of code and keeping the test maintainable.
Modified Scrapy to use the new interface (instead of directly calling Python’s inbuilt RobotFileParser).
I had the weekly meeting with my mentors, where we discussed new stretch goals for the project.
What is coming up next?
It will depend on the feedback of the mentors. If everything seems good to them, I will focus my attention on writing a pure python robots.txt parser.
Did you get stuck anywhere?
Nothing major, though I had little difficulty due my lack of knowledge of difference between Python 2 and Python 3. I knew Python 3 uses unicode string by default, what I didn’t know is that in Python 3 `bytes` and `str` type are different. Hence, encoding a string produces an object of type `bytes`. This actually makes sense, having different types for string and arbitrary binary data.