vipulgupta2048's Blog

The week that has been @ 2048

vipulgupta2048
Published: 07/16/2019

Week #8 10/07 to 16/07

I just realised that there aren’t many weeks left. Good times like these should never end. 

 

What did you do this week?

Good news, my PR for the validator has finally been merged. I am proud of it, great things coming forward → https://github.com/vipulgupta2048/spidermon/pull/2

I also worked on finishing up the Translator, where we changed direction on how we write the tests for that class. Julio's guidance, especially on how to think about writing better tests, really helped me out. Something extremely useful that I realized from using TDD in my thinking and coding is that it is only while testing that I find several edge cases I never would have thought about. Check this out.

 

*After testing*

> r"^required field$":messages.MISSING_REQUIRED_FIELD,

This pattern lets only the exact string "required field" through, which is what is needed, and it works great. Here's the catch.

 

*Earlier before testing I had,*

> r"required field":messages.MISSING_REQUIRED_FIELD,

This let all of these strings through as well.

-  "not found required field"

- "aa required field aa"

- "required field almost anything here" and they all were getting translated.
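A minimal sketch of the difference between the two patterns. It assumes the translator checks each pattern with something like `re.search`, which is a guess on my part since the matching call is not shown here; the variable names are only for illustration.

```python
import re

# The pattern after testing (anchored) and the pattern before testing (unanchored).
ANCHORED = re.compile(r"^required field$")
UNANCHORED = re.compile(r"required field")

samples = [
    "required field",
    "not found required field",
    "aa required field aa",
    "required field almost anything here",
]

# Assumption: matching is done with re.search, which scans the whole string,
# so only the ^ and $ anchors restrict where the pattern may match.
anchored_hits = [s for s in samples if ANCHORED.search(s)]
unanchored_hits = [s for s in samples if UNANCHORED.search(s)]

print(anchored_hits)
print(unanchored_hits)
```

With the anchors in place only the first sample is accepted, while the unanchored pattern accepts all four.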

 

Without testing, this would have led to all kinds of trouble and TypeErrors. I am thankful, to say the least, that testing has now become an integral part of my development work. Hence, the quote:

 

Good things happen when we test.

                                             - Vipul Gupta (2019-20)

 

What is coming up next? 

Start with the refactoring of the itemvalidation pipelines, since that’s the more important task at hand and is now priority one for Team Cerberus.

Here’s the big missing feature that I will also be tackling.

 

Schema = {'quotes': {'type': ['string', 'list'], 'schema': {'type': 'string'}}}

Data = {'quotes': [1, 'Heureka!']}

Error found while testing - 

TypeError: {'quotes': [{0: ['must be of string type']}]}

 

This is _something special_ about Cerberus.

- *Reference* - https://docs.python-cerberus.org/en/stable/validation-rules.html#type 

- *Context* - To introduce some diversity into the tests, I added this kind of schema, where `type` lists several allowed types for a value and the `schema` key governs what type the elements inside that value must actually have.

- *About the Error* - I added comments there, because the error we are getting is actually a parsing problem in the `Validator.py` parent class.

Usually, the errors we get and parse are in the form {field_name: message}, but here we are getting {field_name: {Array_element: message}}. I think this is what causes the TypeError, and it is something the previous developers didn't account for since they never saw it coming with Cerberus. Cerberus is pretty good at showing detailed errors, hence I mentioned something about not adding all the messages into the translator. Still, it is good that we caught this here, as the current parsing would never have fit our use case in the future... Well, at least that's my theory.
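One possible way to handle both shapes is to flatten the nested errors before parsing them. This is only a sketch of the idea, not the fix that went into spidermon: `flatten_errors` is a hypothetical helper, and the dotted-path key format is my own assumption. The input dictionaries are copied from the error shapes described above.

```python
def flatten_errors(errors, prefix=""):
    """Flatten Cerberus-style errors into {dotted_path: [messages]}.

    Hypothetical helper for illustration; it is not part of spidermon or Cerberus.
    """
    flat = {}
    for key, values in errors.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        for value in values:
            if isinstance(value, dict):
                # Nested errors, e.g. {0: ['must be of string type']} for list item 0.
                for sub_path, messages in flatten_errors(value, path).items():
                    flat.setdefault(sub_path, []).extend(messages)
            else:
                # Plain message string attached directly to this field.
                flat.setdefault(path, []).append(value)
    return flat


# The usual flat shape and the nested shape from the failing test.
flat_case = flatten_errors({"name": ["must be of string type"]})
nested_case = flatten_errors({"quotes": [{0: ["must be of string type"]}]})

print(flat_case)
print(nested_case)
```

After flattening, every value is a plain list of message strings, so code that expects {field_name: messages} can treat `quotes.0` like any other field.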

 

Did you get stuck anywhere?

Yep, and I have been communicating a lot more with my nimble questions. I feel much better about asking, answering and discussing problems. Glad to have figured that one out from my 1st eval. Quite happy with the work that’s happening as well.

 


The week that has been @ 2048

vipulgupta2048
Published: 07/09/2019

 

Week #7 03/07 to 09/07

What did you do this week?

I tested. A LOT!

Well, this week I have been testing, refactoring and rethinking quite a lot of components for both the validator and the translator. By rethinking components, I mean that I have rewritten the same 100 lines of validator code about three times now, improving it so much that Git shows changes upward of 70% on almost every commit. Some great changes are being suggested in the detailed reviews from Renne and Julio on my pull request. I feel that I now know quite a lot of new things about Cerberus and its workings.

(Vipul is content with the progress, and lately, the mentors are too so everyone is happy.)

We are getting close with the validator, almost mergeable. Check out the PR here, let us know what more we could be doing - https://github.com/vipulgupta2048/spidermon/pull/2 

The translator is as ready as it can get. We just have to keep writing unit tests for it and adding new messages for the errors that will be passed through it.

 

What is coming up next? 

We will be finishing the translator this week and starting the refactoring of the itemvalidation pipelines as soon as possible, since that’s the more important task at hand.

 

Did you get stuck anywhere?

I have been asking my mentors several mini-questions regarding code, best practices, and the best way to get X done. I aim for them to take as little time as possible. I think it's working, because nowadays I feel I am more committed to the project and able to get a lot more done. And that’s a good thing. At least for me.

 

That’s about it, thank you for reading. How about this time, we have some comments to see if these small posts are even read to the end. I do make a good effort in making them fun. Writing is something that I enjoy doing. 

 

This is vipulgupta2048 signing out, don’t forget to comment!

 


1st Eval, Mistake working remotely, and the special week that has been @ 2048

vipulgupta2048
Published: 07/02/2019

 

Week #6 26/06 to 02/07

Well, I survived the first evaluation as you can all see. Made some mistakes along the way, recovered with the advice from my mentors and hopefully going strong into work period 2. Let’s talk shop, yes. 

Since this is a special blog, I will be asking the questions... and answering them.

 

What did you do this week and what is coming up next? 

I worked. Most people take the week off during evaluation week. But I know one thing for certain: when college reopens in July, the pressure will start to pile up a bit. These days can really make the difference between tough weeks ahead and easy sailing. That is some experience I am carrying over from GSoC 2018.

If you take a look here - https://github.com/vipulgupta2048/spidermon/projects/1

One of the main components of the Cerberus work, the validator, is now in review, thoroughly tested and extremely powerful. I think Cerberus would be a worthy addition to the validation pipeline. The other component, the translator, has also been created and will be finished as we go along. The next major task that I would like to take on is getting Cerberus to play nice with the other pipelines, Schematics and JSONSchema. Most of that work has been done as well, but it doesn’t work yet, so it just needs debugging. So, all in all, good work. In the next meeting, my mentors and I will assess and review the milestones for the next period.

 

What did you love about working with ScrapingHub?

The thing that I truly loved about ScrapingHub is the feeling of working remotely, with a good amount of discipline and commitment. Google Summer of Code provides us with a great opportunity to truly improve upon our work and skills, and pushes us out of our comfort zone. I feel great being able to learn so many things on the fly as well as getting guidance from my awesome mentors. But it doesn’t push us into a working schedule. There is work that needs to be done for the week, and as someone who loves chasing deadlines under pressure, I was usually doing work only around the weekends. Well, until I started working with ScrapingHub.

The Scrapy Project and ScrapingHub have been great. I have been getting some good challenges to work towards and, lately, have resolved my shortcomings related to communication as well as the work that needed to be done. I feel that a good change has come in: I don’t work that hard. I distribute time evenly over the day, still write a lot of blogs, break down my tasks into smaller bits and look for feedback wherever possible. Life’s good working with ScrapingHub.
 

What has the 1st work period taught you, professionally as well as mentally?

The 1st work period helped me realize that things are almost never as simple as they seem. The more time I spent reading the code and documentation and trying to build a bigger picture in my head, the more I understood how big a task I am undertaking. This also helped me reassess the time and recalibrate the effort being put into it. I learned about debugging, testing, documentation, module management, Python packaging, absolute and relative imports, defaultdicts, __new__, list comprehensions, code readability, code coverage, logging, and tons of best practices. I am looking forward to learning even more, faster. Leveling up my Python, one step at a time.
 

That’s about it for the time, folks. 

Live in the mix, this is vipulgupta2048 signing off.


[#5] The week that has been @ 2048

vipulgupta2048
Published: 06/26/2019

 

Week #5 19/06 to 25/06

The first evaluation is here, got done with a milestone and took a small break for a personal event. 

What did you do this week?

This week I had to attend a wedding and hence took leave from work. I informed my mentors early of my absence from 23rd to 25th June, did the work for the week early and am now writing the blog post. This week, I finally finished implementing the validate method of Cerberus. Previously I made the mistake of not implementing it through the existing pipeline, hence it was returning the wrong output. Here’s a snippet of it working correctly.

>>> from spidermon.contrib.validation.cerberus.validator import CerberusValidator

>>> validator = CerberusValidator({'number': {'type': 'number'}, 'name': {'type': 'string'}})

>>> validator.validate({"name": "sda","number":9})
(True, defaultdict(<class 'list'>, {}))

>>> validator.validate({"price":59,"name": 7,"number":"This is cool"})
(False, defaultdict(<class 'list'>, {'name': ['must be of string type'], 'number': ['must be of number type'], 'price': ['unknown field']}))

I learned about defaultdict and @property decorators as well as several things about the existing validator pipeline. Kudos to Renne for having the patience to help me understand it.
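To show why these two features fit together, here is a small sketch of a class that collects errors in a `defaultdict(list)` and exposes them through a read-only `@property`. `SketchValidator` and `add_error` are hypothetical names made up for this example; this is not spidermon's actual validator code.

```python
from collections import defaultdict


class SketchValidator:
    """Hypothetical stand-in for illustration, not spidermon's CerberusValidator."""

    def __init__(self):
        # Missing keys get an empty list automatically on first access.
        self._errors = defaultdict(list)

    @property
    def errors(self):
        # Accessed as validator.errors (no parentheses), like an attribute.
        return self._errors

    def add_error(self, field, message):
        # No "if field not in errors" check needed; defaultdict creates the list.
        self._errors[field].append(message)


validator = SketchValidator()
validator.add_error("name", "must be of string type")
validator.add_error("price", "unknown field")
result = dict(validator.errors)
print(result)
```

The `defaultdict` removes the key-existence check when appending messages, and the property keeps `_errors` private while still letting callers read it.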


What is coming up next? 

Now, we write unit tests for the validator following a simple yet effective TDD approach and work towards making the translator for the validator. My college is opening soon, and hence I would like to get more work done. Next up, I am writing a special post about defaultdict as well as Python decorators.

Did you get stuck anywhere?

Yes, working remotely is quite a new experience for me. With GSoC, I often try to make the most of it. Somewhere I feel I am lacking, and need to be more disciplined. I thought that in this section I should, at least once, talk about things psychologically rather than about the problems I am having in my code, of which there is no shortage at any given moment.

See you all next week if I get through the evaluation, fingers crossed. This is Vipul Gupta signing out!


[#4] The week that has been @ 2048

vipulgupta2048
Published: 06/19/2019

 

Week #4 12/06 to 18/06

Well, this has been another rather testing week.

What did you do this week?

We are trying to complete the Validate function for Cerberus and get it tested for integration. It’s all coming along really well. I got to learn several new tools and services such as IPDB and PyTest, and I am working on adding logging for errors in my codebase. I think things could be better, and that’s what I will be working hard on over the next week.

Here’s the project we are going to follow - https://github.com/vipulgupta2048/spidermon/projects/1

What is coming up next?

I will be preparing to get into the best shape possible for the first evaluation. I have set some goals for myself regarding the integration of Cerberus. I would like to work hard towards completing each and every one of them to the best of my knowledge.

I wrote a blog about the Spidermon validation pipeline and how it works - https://mixstersite.wordpress.com/2019/06/16/sprinkling-some-insight-into-how-spidermon-validation-pipelines-works/

Did you get stuck anywhere?

Several times, while debugging the code with IPDB, but Julio helped me fix and validate the process I was applying, including suggesting that I go for PyTest.
