[Blog #4] Need For Speed

anubhavp
Published: 07/24/2019

Most of week 7 was spent making Protego compatible with Google's parser. I also worked on the documentation; since the Protego codebase is small, proper comments and a good README were sufficient. I uploaded Protego to PyPI: `pip install Protego` is all it takes to install it.

Week 8 was quite interesting. For Protego to become the default in Scrapy, it must not throw any kind of error while parsing `robots.txt` files. To make sure of that, I decided to download `robots.txt` from the top 10,000 websites. I added tests to see if Protego throws any exceptions while parsing the downloaded `robots.txt` files. I also benchmarked Protego, and the results were quite disappointing. You can see the result here.
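To give an idea of what such a test looks like, here is a minimal sketch, not the actual test from the Protego repository. It assumes the downloaded files are stored as `*.txt` in a single directory; the helper name `find_parse_failures` and that layout are hypothetical.

```python
from pathlib import Path


def find_parse_failures(parse, directory):
    """Return (filename, exception) pairs for every file that `parse` rejects.

    `parse` is any callable that takes the robots.txt content as a string.
    """
    failures = []
    for path in sorted(Path(directory).glob("*.txt")):
        # robots.txt files in the wild are not always valid UTF-8,
        # so undecodable bytes are replaced instead of raising here.
        content = path.read_text(encoding="utf-8", errors="replace")
        try:
            parse(content)
        except Exception as exc:  # any exception counts as a failure
            failures.append((path.name, exc))
    return failures
```

With Protego installed, passing `Protego.parse` as the `parse` argument and asserting that the returned list is empty would give a test along the lines described above.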

We decided to spend the next week improving the performance of Protego. I am going to try profiling and heuristics to see if the performance can be improved.
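For the profiling part, one possible starting point is the standard library's `cProfile`. The sketch below is only an assumption about how this could be done, and the helper name `profile_call` is hypothetical. It runs a single call under the profiler and returns a report of the most expensive functions by cumulative time.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10):
    """Run func(*args) under cProfile and return the report as a string."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()

    # Collect the report in memory instead of printing to stdout.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return stream.getvalue()
```

Calling it with the parser and the content of a large `robots.txt` file should show which functions dominate the parsing time.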
