vipulgupta2048's Blog

Return GSoC; // Week that has been @ 2048

vipulgupta2048
Published: 08/23/2019

We are done.

First, check out the pull request for the work product - https://github.com/scrapinghub/spidermon/pull/201

Project Repo - https://github.com/vipulgupta2048/mygsoc

All tasks have been completed as per project proposal. 

The Cerberus validation library has now been integrated with Spidermon and its validation pipelines. Users can validate their data items against custom schemas they define, easily and with little or no configuration.

It brings me great joy to end on a fulfilling note, having contributed to Spidermon and the Scrapy Project as part of Google Summer of Code 2019. I am happy and content with the work produced.

The PR includes:

  • CerberusValidator() class for item validation through Cerberus.
  • Translator for translating errors for a better, unified system working with other validation methods.
  • Complete integration with Scrapy pipelines, working with raw schemas, URLs, and paths.
  • Unit + integration tests for each component in place.
  • Documentation for the Cerberus validation method.
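
To give a feel for what the translator in the list above does, here is a minimal sketch, not the code from the PR. The function name, the patterns, and the unified labels are all hypothetical, and the raw messages are only assumed to resemble what a validator such as Cerberus reports per field.

```python
import re

# Hypothetical (pattern, unified label) pairs. Neither side is copied from
# Spidermon; they only illustrate the idea of a shared error vocabulary.
TRANSLATIONS = [
    (re.compile(r"^required field$"), "Missing required field"),
    (re.compile(r"^must be of \w+ type$"), "Invalid type"),
    (re.compile(r"^unknown field$"), "Unexpected field"),
]


def translate_errors(errors):
    """Map {field: [raw messages]} to {field: [unified labels]}."""
    translated = {}
    for field, messages in errors.items():
        labels = []
        for message in messages:
            for pattern, label in TRANSLATIONS:
                # Only plain strings are matched; anything else is kept as-is.
                if isinstance(message, str) and pattern.match(message):
                    labels.append(label)
                    break
            else:
                labels.append(message)  # no translation known, keep original
        translated[field] = labels
    return translated


raw = {"author": ["required field"], "text": ["must be of string type"]}
print(translate_errors(raw))
```

The point of such a layer is that monitors downstream can count errors by one set of labels, whichever validation method produced them.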

For system testing, one could use the pre-configured Quotes spider https://github.com/vipulgupta2048/testing_quotes and install Spidermon from the master branch of my fork.

This project was completed through long nights of reading and writing code, learning new concepts on the fly, and asking hundreds of pop questions on Slack, all duly answered by my mentors @ejulio and @rennerocha. Without their constant help, motivation, and guidance, completing this uphill task would never have been possible.

Thank you all for reading.

You can check out more blogs here - https://mixstersite.wordpress.com/gsoc/


The week that has been @ 2048

vipulgupta2048
Published: 08/12/2019

Week #12 07/08 to 13/08

Well, as far as the flow goes, CerberusValidator works with schemas that have a Mapping structure: basically any dict whose values are themselves dicts describing the rules (such as the type) for each field. If you don’t get it, then check this out https://docs.python-cerberus.org/en/stable/
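
Here is a small sketch of what such a schema looks like. The quote fields and the helper `is_mapping_schema` are made up for illustration and are not part of Spidermon; the Cerberus part is guarded so the snippet still runs where Cerberus is not installed.

```python
from collections.abc import Mapping

# Hypothetical schema for a quotes item: each field name maps to a dict of rules.
quote_schema = {
    "text": {"type": "string", "required": True},
    "author": {"type": "string", "required": True},
    "tags": {"type": "list", "schema": {"type": "string"}},
}


def is_mapping_schema(schema):
    """Return True if schema is a mapping whose values are all mappings of rules."""
    return isinstance(schema, Mapping) and all(
        isinstance(rules, Mapping) for rules in schema.values()
    )


print(is_mapping_schema(quote_schema))

try:
    from cerberus import Validator
except ImportError:
    Validator = None  # Cerberus not installed; skip the live validation below.

if Validator is not None:
    validator = Validator(quote_schema)
    # validate() returns a bool; details of any failure land in validator.errors.
    print(validator.validate({"text": "Hello", "author": "Someone", "tags": ["a"]}))
    print(validator.validate({"text": 42}))
    print(validator.errors)
```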

But Cerberus only cares about the schema and data it’s getting from the user, not where they come from. Most of our users will be giving the schema in the form of either URLs or paths to files. Which is fine by us, until the point somewhere in week 12 where I realized I had forgotten to code that properly. Nothing to be afraid of; I had to redo some old functions and actually improved a lot of old code in the process. How time flies by. Damn.

Not much is left to be done, except writing a few more tests, a lot of testing, and merging it to master. I am confident we can make it before August 19. Let’s see. Fingers crossed. This is vipulgupta2048 signing off for the second-last time here. I won’t be going anywhere, in case you were wondering.
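
A rough sketch of that idea, accepting a raw schema, a URL, or a file path, could look like the following. This is not the function from the PR: the name `load_schema` is hypothetical, and it assumes the schema is stored as JSON.

```python
import json
from collections.abc import Mapping
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def load_schema(source):
    """Return a schema dict from a raw mapping, an http(s) URL, or a file path."""
    # A raw schema is already usable; copy it so the caller's dict is untouched.
    if isinstance(source, Mapping):
        return dict(source)
    # URLs are fetched and parsed as JSON (assumption: the schema is JSON).
    if urlparse(str(source)).scheme in ("http", "https"):
        with urlopen(str(source)) as response:
            return json.loads(response.read().decode("utf-8"))
    # Anything else is treated as a path to a JSON file on disk.
    return json.loads(Path(source).read_text(encoding="utf-8"))


print(load_schema({"text": {"type": "string"}}))
```

Whatever the source, the validator only ever sees a plain dict, which is all Cerberus cares about.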
 

There is a lot of work to be done at ScrapingHub x The Scrapy Project. 
Looking forward to new challenges. 
Check progress on --> 
https://github.com/vipulgupta2048/spidermon/projects/1


The week that has been @ 2048

vipulgupta2048
Published: 08/06/2019

Week #11 31/07 to 06/08

 

What did you do this week?

I wrote my docs. I broke all the tests. I shifted quite a lot of code around. 

And now, I am fixing it all up. Thank God for git!

 

Documentation is critical for any open-source project. And as an avid documentation writer, I have a lot of experience writing docs. It’s something I feel good doing. Cerberus docs are no different. I worked on 3 PRs this week:

 

#5 being the old Cerberus Integration PR, whose tests are still being written - https://github.com/vipulgupta2048/spidermon/pull/5 

#6 being the Docs PR - https://github.com/vipulgupta2048/spidermon/pull/6

#500 being the Cerberus PR which I opened long ago to add new examples to the Cerberus documentation - https://github.com/pyeve/cerberus/issues/500

 

Working at full steam for the last week.

 

What is coming up next? 

Thankfully, just 13 more tasks. Well, I am somewhat of an over-enthusiastic person when it comes to opening project cards. So, I have a lot of personal work to be done.

 

Get all the latest updates here - https://github.com/vipulgupta2048/spidermon/projects/1

 

Did you get stuck anywhere?

Strangely, pytest-mock took a lot of effort to understand. I still don’t get it. Not for long, not for long.

 


We are in the endgame NOW @ 2048

vipulgupta2048
Published: 07/30/2019

 


Week #10 24/07 to 30/07

Well, only 2 weeks and some days left to go. Oh boy, the time it has been. I wish to keep working if they let me. 
 

What did you do this week?

Integration finally worked out!! You know what that means? That means my project is almost complete. Here’s an informal take on how the week went; it was bumpy codewise, but we made it through to this outcome.

To be very frank, Julio, I haven't had my fair share of practice with comprehensions in Python, and this took a minute to figure out, as did the entire test_pipelines.py and pipelines.py, which took days to get through. This isn't complex Python; it's good code, but there is just so much going on. I am not sure the tests I created are the best possible, because I kept going back and forth in the code, unable to figure out which output came from which function in this part of the code. One can't just throw in logging statements and run the file or project as we normally do. And I just wanted to do it on my own at that point, because I thought that with a bit more effort on this last bit, things might get clearer. And they did. I am happy that I did the work that was needed.

At one point on Sunday night, I just gave up, initialized the ItemValidationPipeline(), and imported everything just to see what was going on, line by line. Good hunting. I am happy that it worked out (Cerberus Integration), but I am not happy with the tests and would like to make them better, codewise.

 

What is coming up next? 

We are left with unit tests for the Cerberus-integrated part of the pipelines, documentation for the features and, last but not least, system testing.

 

Did you get stuck anywhere?

I am not sure this question brings me joy to answer. 

So, I say yes!! I got stuck in a lot of places around this weekend. But I am proud to say that, with the guidance of my mentors and some of my own will, the confidence to debug the lines of code written this week was never broken, and never will be. Thank you to everyone who helped!


The week that has been @ 2048

vipulgupta2048
Published: 07/23/2019

 


Week #9 17/07 to 23/07

 

Well, integration isn’t working out, and neither am I giving up. Also, Evaluation 2 is coming up!

What did you do this week?

Well, another week, another PR merged. Do PRs that you merge yourself count? - https://github.com/vipulgupta2048/spidermon/pull/4. The Translator has now been officially completed.

Over to integration: the ride has been quite bumpy, as Cerberus is not being detected by the pipeline. No worries, we have settled on a methodology to solve this problem. We will first check whether CerberusValidator works in the ItemValidationPipeline; if it does, then Spidermon works. Then we will start worrying about why it doesn’t work in other places.

Oh and I found a bug - https://github.com/scrapinghub/spidermon/issues/192

 

What is coming up next? 

For now, if you’d like to know, we will be completing the ItemValidationPipeline, then moving on to integration, which rounds this project up successfully.

 

Did you get stuck anywhere?

Don’t even ask, I somehow wasn’t able to install Scrapy in my first go (didn’t read the docs) and couldn’t implement the JSONSchemaValidator (didn’t read the docs enough times, with a magnifying glass). So yeah, bumpy.
