The week that has been @ 2048

vipulgupta2048
Published: 08/12/2019

Week #12 07/08 to 13/08

As far as the flow goes, CerberusValidator works with schemas that are mappings: basically, any dict whose values are themselves dicts describing the rules (such as the type) for each field. If that doesn’t make sense yet, check out the Cerberus docs: https://docs.python-cerberus.org/en/stable/
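To make the "Mapping structure" concrete, here is a minimal sketch using only the standard library. The `is_schema_shaped` helper and the example field names are hypothetical and are not part of Cerberus or Spidermon; only the rule names `type`, `required` and `schema` come from Cerberus itself.

```python
from collections.abc import Mapping

# Example schema: each field name maps to a dict of rules for that field.
schema = {
    "title": {"type": "string", "required": True},
    "tags": {"type": "list", "schema": {"type": "string"}},
}


def is_schema_shaped(candidate):
    """Return True if candidate is a mapping whose values are all mappings.

    Hypothetical helper for illustration only; Cerberus runs its own,
    much stricter schema validation.
    """
    return isinstance(candidate, Mapping) and all(
        isinstance(rules, Mapping) for rules in candidate.values()
    )
```

A dict such as `{"title": "string"}` would fail this check, because the value is a plain string rather than a dict of rules.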

But Cerberus only cares about the schema and the data it gets from the user, not where they come from. Almost all of our users will supply the schema as either a URL or a path to a file. That was fine by us until somewhere in week 12, when I realised I had forgotten to handle that properly in the code. Nothing to be afraid of; I had to redo some old functions, and actually improved a lot of old code in the process. How time flies. Damn.
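The idea of accepting a schema from a URL or a file path can be sketched roughly like this. The `load_schema` name, the JSON format and the http(s)-only URL check are assumptions made for this post, not the code that lives in Spidermon.

```python
import json
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def load_schema(source):
    """Return a schema dict from a dict, an http(s) URL, or a file path.

    Hypothetical helper: the name and behaviour are assumptions for
    illustration, not Spidermon's implementation.
    """
    if isinstance(source, dict):
        return source  # already a schema, nothing to load
    if urlparse(str(source)).scheme in ("http", "https"):
        with urlopen(str(source)) as response:
            return json.loads(response.read().decode("utf-8"))
    # Anything else is treated as a path to a JSON file.
    return json.loads(Path(source).read_text(encoding="utf-8"))
```

Whatever the source, the validator ends up with a plain dict, so it never needs to know where the schema came from.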

Not much is left to be done, except writing a few more tests, a lot of testing, and merging it all to master. I am confident we can make it before August 19. Let’s see. Fingers crossed. This is vipulgupta2048 signing off for the second-to-last time here. I won’t be going anywhere, in case you were wondering.
 

There is a lot of work to be done at ScrapingHub x The Scrapy Project. 
Looking forward to new challenges. 
Check progress on --> 
https://github.com/vipulgupta2048/spidermon/projects/1


The week that has been @ 2048

vipulgupta2048
Published: 08/06/2019


Week #11 31/07 to 06/08

 

What did you do this week?

I wrote my docs. I broke all the tests. I shifted quite a lot of code around. 

And now, I am fixing it all up. Thank God for git!

 

Documentation is critical for any open-source project, and as an avid documentation writer, I have a lot of experience writing docs. It’s something I feel good doing, and the Cerberus docs are no different. I worked on 3 PRs this week:

 

#5 being the old Cerberus Integration PR, whose tests are still being written - https://github.com/vipulgupta2048/spidermon/pull/5 

#6 being the Docs PR - https://github.com/vipulgupta2048/spidermon/pull/6

#500 being the Cerberus PR which I opened long ago to add new examples to the Cerberus documentation - https://github.com/pyeve/cerberus/issues/500

 

Working full steam ahead for the last week.

 

What is coming up next? 

Thankfully, just 13 more tasks. Well, I am somewhat of an over-enthusiastic person when it comes to opening project cards, so I have a lot of personal work to get done.

 

Get all the latest updates here - https://github.com/vipulgupta2048/spidermon/projects/1

 

Did you get stuck anywhere?

Strangely, pytest-mock took a lot of effort to understand. I still don’t get it. Not for long, not for long.

 


We are in the endgame NOW @ 2048

vipulgupta2048
Published: 07/30/2019

 


Week #10 24/07 to 30/07

Well, only 2 weeks and some days left to go. Oh boy, what a time it has been. I wish to keep working if they let me.
 

What did you do this week?

Integration finally worked out!! You know what that means? That means my project is almost complete. Here’s an informal take on how the week went; it was bumpy codewise, but we made it through to this outcome.

To be very frank, Julio, I haven't had my fair share of practice with comprehensions in Python, and this took a minute to figure out, as did the whole of test_pipelines.py and pipelines.py, which took days to get through. This isn't complex Python; it's good code, but there is just so much going on. I am not sure the tests I created are the best possible, because I kept going back and forth through the code, unable to figure out which function produced which output. One can't just throw in logging statements and run the file or project as we normally do. I also wanted to do it on my own at that point, because I thought that with a bit more effort on this last bit, things might get clearer. And they did. I am happy that I did the work that was needed.

At one point on Sunday night, I just gave up, initialized the ItemValidationPipeline(), and imported everything just to see what was going on line by line. Good hunting. I am happy that the Cerberus integration worked out, but I am not happy with the tests and would like to make them better, codewise.

 

What is coming up next? 

We are left with unit tests for the Cerberus-integrated part of the pipelines, documentation for the features and, last but not least, system testing.

 

Did you get stuck anywhere?

I am not sure this question brings me joy to answer. 

So, I say yes!! I got stuck in a lot of places around this weekend. But I am proud to say that, with the guidance of my mentors and some of my own will, the confidence to debug the lines of code written this week was never broken, and never will be. Thank you to everyone who helped!


The week that has been @ 2048

vipulgupta2048
Published: 07/23/2019

 


Week #9 17/07 to 23/07

 

Well, integration isn’t working out, but I am not giving up either. Also, Evaluation 2 is coming up!

What did you do this week?

Well, another week, another PR merged. Do PRs that you merge yourself count? https://github.com/vipulgupta2048/spidermon/pull/4. The Translator has now been officially completed.

Over to integration, where the ride has been quite bumpy, as Cerberus is not being detected by the pipeline. No worries, though: we have settled on a methodology to solve this problem. We will first check whether CerberusValidator works in the ItemValidationPipeline; if it does, then Spidermon works. After that, we will start worrying about why it doesn’t work in other places.

Oh and I found a bug - https://github.com/scrapinghub/spidermon/issues/192

 

What is coming up next? 

For now, if you would like to know, we will be completing the ItemValidationPipeline, then moving on to integration, which rounds this project off successfully.

 

Did you get stuck anywhere?

Don’t even ask. I somehow wasn’t able to install Scrapy on my first go (didn’t read the docs) and couldn’t implement the JSONSchemaValidator (didn’t read the docs enough times, with a magnifying glass). So yeah, bumpy.


The week that has been @ 2048

vipulgupta2048
Published: 07/16/2019

Week #8 10/07 to 16/07

I just realised that there aren’t many weeks left. Good times like these should never end. 

 

What did you do this week?

Good news, my PR for the validator has finally been merged. I am proud of it, great things coming forward → https://github.com/vipulgupta2048/spidermon/pull/2

I also worked on finishing up the Translator; we had a change of direction in how we write the tests for that class. Julio's guidance, especially on how to think about writing better tests, really helped me out. Also, something extremely useful that I realized from using TDD in my thinking and coding is that it is only while testing that I find several edge cases I never would have thought about otherwise. Check this out.

 

*After testing*

> r"^required field$":messages.MISSING_REQUIRED_FIELD,

This pattern allows only the exact string "required field" to match, which is what is needed, and it works great. Here's the catch:

 

*Earlier before testing I had,*

> r"required field":messages.MISSING_REQUIRED_FIELD,

Which let all of these strings through as well:

- "not found required field"

- "aa required field aa"

- "required field almost anything here"

And they were all getting translated.
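The difference between the two patterns can be reproduced with the standard `re` module. This is only a sketch: it assumes the translator looks messages up with a search-style match (`re.search`), which is a guess based on the behaviour described above, not a statement about how the translator is implemented.

```python
import re

candidates = [
    "required field",
    "not found required field",
    "aa required field aa",
    "required field almost anything here",
]

unanchored = re.compile(r"required field")
anchored = re.compile(r"^required field$")

# Assumption: messages are matched with a search-style lookup.
loose_matches = [text for text in candidates if unanchored.search(text)]
strict_matches = [text for text in candidates if anchored.search(text)]

print(loose_matches)   # all four strings
print(strict_matches)  # ['required field']
```

The `^` and `$` anchors are what stop the pattern from matching in the middle of a longer message.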

 

Without testing, this would have led to all kinds of trouble and TypeErrors. I am thankful, to say the least, that testing has now become an integral part of my development work. Hence, the quote:

 

Good things happen when we test.

                                             - Vipul Gupta (2019-20)

 

What is coming up next? 

Start with the refactoring of the item validation pipelines, since that’s the more important task at hand and is now priority one for Team Cerberus.

Here’s the big missing feature that I will also be tackling.

 

Schema = {'quotes': {'type': ['string', 'list'], 'schema': {'type': 'string'}}}

Data = {'quotes': [1, 'Heureka!']}

Error found while testing - 

TypeError: {'quotes': [{0: ['must be of string type']}]}

 

About this, this is _something special_ with Cerberus.

- *Reference* - https://docs.python-cerberus.org/en/stable/validation-rules.html#type 

- *Context* - To introduce some diversity into the tests, I added this kind of schema, where a value can have multiple types set under `type`, while the `schema` key governs what the type of the inner values actually would be.

- *About the Error* - I actually added the comments there, because the error we are getting is actually a parsing problem with the `Validator.py` parent class.

Usually, the errors we get and parse are in the form {field_name: message}, but here we are getting {field_name: {array_element: message}}, which I think is causing the TypeError. It is something previous developers didn't account for, since they never saw it coming with Cerberus. Cerberus is pretty good at showing detailed errors, which is why I mentioned not adding all the messages to the translator. But it is good that we caught this here, as it would never have fit our use case in the future... Well, at least that's my theory.
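One way to cope with the nested shape would be to flatten it before parsing. The sketch below is only an illustration of that idea: `flatten_errors` and the dotted-path convention are hypothetical, and neither Cerberus nor Spidermon provides this function.

```python
def flatten_errors(errors, prefix=""):
    """Flatten Cerberus-style nested errors into {dotted_path: [messages]}.

    Hypothetical helper, not part of Cerberus or Spidermon.
    """
    flat = {}
    for field, problems in errors.items():
        path = f"{prefix}.{field}" if prefix else str(field)
        for problem in problems:
            if isinstance(problem, dict):
                # Nested errors, e.g. {0: [...]} for the first list element.
                for sub_path, messages in flatten_errors(problem, path).items():
                    flat.setdefault(sub_path, []).extend(messages)
            else:
                flat.setdefault(path, []).append(problem)
    return flat


errors = {"quotes": [{0: ["must be of string type"]}]}
print(flatten_errors(errors))
# {'quotes.0': ['must be of string type']}
```

Flat errors such as `{"name": ["required field"]}` pass through unchanged, so a parser that expects one level of nesting keeps working.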

 

Did you get stuck anywhere?

Yep, and I have been communicating a lot more with my nimble questions. I feel much better about asking, answering and discussing problems. Glad I figured that one out from my 1st eval. Quite happy with the work that’s happening as well.

 
