siddharthadr's Blog

GSoC Final Report

siddharthadr
Published: 08/18/2021

Summary

GSoC comes to an end this week. Here I present the contributions I made to the Scrapy project this summer. All the features that I proposed were implemented. The last feature had to be divided into 2 parts: soft limits and hard limits. The soft limits part has been implemented, but the hard limits part is yet to be completed.

Contributions

1. Issues

2. Pull Requests

3. Blog Posts: All my blog posts can be found here.

Future Work

Though the features have been implemented, there is scope for improvement:

  • Making hard limits for batch delivery triggers
  • Switching batches based on item content
  • Scheduling batches based on cron expressions

View Blog Post

Blog Post #5

siddharthadr
Published: 08/15/2021

The Past Week

Last week was spent improving the documentation for the final feature, batch triggers. Hopefully the feature is now in a mergeable state. It can still be improved a lot, though, so I'll continue to work on it.

Phasing Out..

This week brings an end to GSoC. It has been a wonderful experience for me! I'd like to thank my mentors Adrián Chaves and Aditya Kumar for their support and help. Scrapy has been a cool project to work on and I hope I continue to contribute to it.

View Blog Post

Weekly Check-in #6

siddharthadr
Published: 08/08/2021

What did you do this week?
A. I am almost done with soft limits for batch delivery triggers.

What is coming up next?
A. Up next is converting the soft limits to hard limits.

Did you get stuck anywhere?
A. Nothing major.

View Blog Post

Blog Post #4

siddharthadr
Published: 08/01/2021

The Past Week

My university classes have finally begun, and placement season with them, so I am falling a bit behind schedule. Still, I managed to write tests for the file-size and duration batch delivery triggers. As these are soft limits, creating tests for them was tricky.

Work Done So Far

The added triggers are called soft limits because the limits are only checked when an item is scraped, so a batch will not always be cut exactly when its limit is reached. I was a little unsure how to proceed with the tests, but I went with my mentor's suggestion. Tests for soft limits cover these general cases (presuming we have 2 items in total): i) the limit is zero (so no limit is imposed), ii) the limit is the smallest unit (so only 1 item is accepted), iii) the limit causes at most 1 item per batch, iv) the limit causes at most 2 items per batch.
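To make the "soft" part concrete, here is a minimal sketch of a duration trigger that is only consulted when an item arrives. It is not the code from my pull request and not Scrapy's API: `SoftDurationTrigger`, `FakeClock`, and `split_into_batches` are hypothetical names made up for illustration, and I am assuming that the item which trips the limit stays in the batch being closed.

```python
import time


class SoftDurationTrigger:
    """Hypothetical soft limit: closes a batch once limit_seconds have elapsed."""

    def __init__(self, limit_seconds, clock=time.monotonic):
        self.limit_seconds = limit_seconds
        self.clock = clock
        self.batch_started_at = clock()

    def item_scraped(self):
        """Return True if the current batch should be closed after this item."""
        if self.limit_seconds <= 0:  # a limit of zero means no limit is imposed
            return False
        if self.clock() - self.batch_started_at >= self.limit_seconds:
            self.batch_started_at = self.clock()  # the next batch starts now
            return True
        return False


class FakeClock:
    """Controllable clock so the example does not depend on real time."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


def split_into_batches(arrival_times, limit_seconds):
    """Group item arrival times into batches using the soft trigger."""
    clock = FakeClock()
    trigger = SoftDurationTrigger(limit_seconds, clock=clock)
    batches, current = [], []
    for t in arrival_times:
        clock.now = t  # the limit is only looked at when an item arrives
        current.append(t)
        if trigger.item_scraped():
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches


print(split_into_batches([6, 12], 0))  # no limit: both items share one batch
print(split_into_batches([6, 12], 5))  # each item lands in its own batch
print(split_into_batches([1, 10], 5))  # the batch stays open well past 5 seconds
```

In the last call the 5 second limit is crossed at t=5, but nothing happens until the second item shows up at t=10, so the batch ends up covering 10 seconds. A hard limit would need a check that runs independently of item arrival.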

Work Ahead

I have yet to add tests for custom batch triggers. Once the documentation is added, I will finalize my soft limit triggers. After that I will have to convert the soft limits to hard ones.

View Blog Post

Weekly Check-in #5

siddharthadr
Published: 07/25/2021

What did you do this week?
A. I added soft limits for file size and time duration as batch triggers. I tried figuring out ways to find hard limits for them as well.

What is coming up next?
A. Finalizing plans for hard limits and starting their implementation.

Did you get stuck anywhere?
A. I am still in the middle of figuring out hard limits for time duration and file size.

View Blog Post