Weekly Check-in #10 (26 July - 1 Aug)

anubhavp
Published: 07/30/2019

What did you do this week?

  • Improved performance of Protego by implementing lazy regex compilation.
  • Benchmark Results :
    • Time taken to parse 570 `robots.txt` files :

Protego : 
25th percentile : 0.000134
50th percentile : 0.000340
75th percentile : 0.000911
100th percentile : 0.345727
Total Time : 0.999360

Rerp : 
25th percentile : 0.000066
50th percentile : 0.000123
75th percentile : 0.000279
100th percentile : 0.101409
Total Time : 0.317715

Reppy : 
25th percentile : 0.000028
50th percentile : 0.000038
75th percentile : 0.000063
100th percentile : 0.015579
Total Time : 0.055850

    • Time taken to parse 570 `robots.txt` files and answer 1000 queries per file :

Protego : 
25th percentile : 0.009057
50th percentile : 0.012806
75th percentile : 0.023660
100th percentile : 9.033481
Total Time : 21.999680

Rerp : 
25th percentile : 0.006096
50th percentile : 0.011864
75th percentile : 0.041876
100th percentile : 35.027233
Total Time : 68.811635

Reppy : 
25th percentile : 0.000858
50th percentile : 0.001018
75th percentile : 0.001472
100th percentile : 0.236081
Total Time : 1.132098
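
The lazy regex compilation mentioned at the top of this list can be illustrated with a minimal sketch. This is not Protego's actual code: the `LazyPattern` class, its `compiled` property, and the wildcard translation are hypothetical and exist only to show the idea. A rule keeps its raw pattern string and builds the regex only when it is first matched against a path, so rules that are never queried add no compilation cost at parse time.

```python
import re


class LazyPattern:
    """Hypothetical helper: stores a robots.txt path pattern and
    compiles it to a regex only on the first call to match()."""

    def __init__(self, pattern):
        self._pattern = pattern
        self._regex = None  # nothing is compiled at parse time

    @property
    def compiled(self):
        # True once the regex has been built.
        return self._regex is not None

    @staticmethod
    def _to_regex(pattern):
        # Assumed translation: '*' matches any character sequence and
        # a trailing '$' anchors the match at the end of the path.
        anchored = pattern.endswith("$")
        if anchored:
            pattern = pattern[:-1]
        body = ".*".join(re.escape(part) for part in pattern.split("*"))
        return body + ("$" if anchored else "")

    def match(self, path):
        if self._regex is None:
            # First use: compile once and cache for later queries.
            self._regex = re.compile(self._to_regex(self._pattern))
        return self._regex.match(path) is not None


rule = LazyPattern("/private/*.html$")
before = rule.compiled                  # regex not built yet
hit = rule.match("/private/a/b.html")   # triggers compilation
after = rule.compiled
```

The trade-off is that the first query against each rule pays the compilation cost, which moves time out of the parse-only benchmark and into the parse-and-query one.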

What is coming up next?

  • This depends on the mentors' review. If everything looks good to them, I will shift my focus back to Scrapy.

Did you get stuck anywhere?

  • Nothing major.