vipulgupta2048's Blog

[#3] The Week that has been @ 2048

vipulgupta2048
Published: 06/11/2019

If distance is the path traveled between 2 points, and displacement is the shortest path you can take from the initial point to your destination, then I say that after making full circles this week my overall displacement is 0. But I sure as hell have come a long way in learning more about Python as a programming language, more than ever before, just by reading, understanding and implementing new code concepts.

What did you do this week?

I still worked on implementing the validate() feature of the Cerberus pipeline. I struggled with some errors, but Stack Overflow along with some awesome Python Packaging docs were always there for me. This is taking a bit longer than I expected: even though I now understand the code and have written the implementation (which can be found here - https://github.com/vipulgupta2048/spidermon/tree/cerberus), several bugs stand in my way of perfecting it. Hence, I will continue working on that.


What is coming up next?

Next up, the most immediate task is to fix, refactor and get the basic Cerberus validation up and running. This is a priority task for me, as getting it done is critical to putting me in a better position to pass the Round 1 evals. I plan to give my project more focus this coming week. I also plan to finish my other tasks and my side project, and solve a PR as well. Things should be looking way better in the next report.


Did you get stuck anywhere?

Oh! Tons of bugs, mistakes, and errors were encountered this week. A lot of time went into figuring out Python packaging and how to install local packages, as I took the rudimentary approach of repackaging Spidermon every time I made a change to it. To my surprise, the -e flag in pip install lets us install local packages without the need to re-package. Kudos to Renne for his guidance. Without it, I would never have made sense of the error that I was getting.
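For anyone who hasn't used it, here is a minimal sketch of what that editable install looks like when driven from Python. The helper name and the path "./spidermon" are made up for illustration; the only real parts are pip's -e flag and running pip through the current interpreter.

```python
import sys


def editable_install_command(project_dir):
    """Build the pip command for an editable (development mode) install."""
    # sys.executable makes sure pip runs for the interpreter in use.
    return [sys.executable, "-m", "pip", "install", "-e", project_dir]


# "./spidermon" is a hypothetical path to a local checkout of the project.
command = editable_install_command("./spidermon")
print(" ".join(command))
```

Passing this list to subprocess.run(command, check=True) would actually run the install. After that, changes to the Python source in the checkout should be picked up without repackaging.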


[#2] The week that has been @ 2048

vipulgupta2048
Published: 06/04/2019

 


Week #2 - 28/05 to 04/06

Well, this has been a good week of learning about new things, revising old concepts and reading the implementation of one of the oldest modules in Python to understand the idea behind Python Packaging. I feel bad about not being able to write a lot of code, but I think without understanding the existing code base the way forward would have been fruitless and more disappointing. So, let’s start by answering our 3 infamous questions, and later I will give you a broad picture of Python Packaging and try to explain it to you like you are a 5-year-old.
 

What did you do this week?

Well, the PRs that I was working on regarding some critical fixes to the docs have been merged. Thanks to Renne, Adrian and Julio for getting my first PR merged. I also worked on creating a draft pipeline for validating data through Cerberus. I got completely sidetracked in that regard, as I got busy trying to make sense of the entire code flow and answering some serious questions about Python packaging and the Spidermon directory structure, as well as new things such as:

  1. Absolute vs relative imports - why do we use them?

  2. Code Quality, linting, testing, deployment, and best practices to make it better.

  3. Decorators in Python

  4. Access modifiers and real use cases in production-ready code

  5. Just for kicks, I created my own package. It was fun as well as quite a learning experience for me.
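To give a flavour of the decorators item from the list above, here is a small sketch of the kind of thing I mean. The names count_calls and add are invented for this example and are not taken from Spidermon.

```python
import functools


def count_calls(func):
    """Decorator that counts how many times the wrapped function is called."""

    @functools.wraps(func)  # keeps the original name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


@count_calls
def add(a, b):
    """Add two numbers."""
    return a + b


result = add(1, 2)
print(result, add.calls, add.__name__)
```

The @count_calls line is just shorthand for add = count_calls(add), which is the bit that finally made decorators click for me.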

 

Status of the mini-project: Well, once the validation bit was completed, it had served its purpose and that’s where I left it. I will start work on the PostgreSQL pipelines next, whenever I feel like it.

 

Otherwise, a good week nonetheless.

Issue tracking is now set up with GitHub Projects. Check it out here - https://github.com/vipulgupta2048/spidermon/projects/1

 

What is coming up next?

Well, in the recent meeting, my mentors Renne and Julio did me a solid by helping me figure out the real picture of how Spidermon works, from start to validate to finish. I am not sure how I would have connected all the bits and pieces involved in this project otherwise. As far as milestones go, coming up next is a basic validate method for Cerberus. Equipped with knowledge of the gears, wheels and screws that work behind Spidermon, I feel much more confident about it than last week. I think this is the beauty of GSoC. You put in a week of struggle, the next week the struggle doubles, but the peak you are climbing starts to look a bit closer.

 

Also in the pipeline is my personal blog on Mixster. Last time we talked about my community; I would like to talk about Python Packaging next. As for extra work that I want to take up, there are the extra issues and features around the Slack action, as well as docs. Let’s see if we can get that done as well.

 

Did you get stuck anywhere?

Oh, I did. I got badly stuck, but regular visits to Stack Overflow, the Python documentation and my mentors’ Slack channel helped me get over my troubles. Gotta go, gotta catch up!

 

This is Vipul Gupta, signing out.


[#1] The Week That has been @ 2048

vipulgupta2048
Published: 05/28/2019


Week #1 - 21/05 to 27/05

 

In the last meeting, my mentors and I decided upon the mini-project that I suggested. Here’s a brief overview of what I decided to work on over the course of the last week:

 

Main Steps description

 

  1. Use Scrapy to scrape data from the given website - https://amity.edu/placement

data = {
   'Link': 'https://amity.edu/placement/Popup.asp?Eid=3895',
   'name': 'Impetus - Recruitment Opportunity For 2019 Batch (Apply Now) ',
   'year': 2019
}

 

2. Use data validation tools to filter out data

  • Schematics

  • JSON Schema

  • Cerberus

 

3. Use the same project spec for each tool to fully try them out

4. Store data in PostgreSQL, for the sake of completion

5. Present data on a website, possibly using ReactJS
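To make the validation step a bit more concrete, here is a rough sketch of what schema-driven validation of a scraped item boils down to. It is a hand-rolled stand-in written with plain Python, not the actual API of Schematics, JSON Schema or Cerberus, and the SCHEMA layout and validate helper are my own assumptions for illustration.

```python
# Hypothetical schema: every field of the scraped item, its type, and
# whether it must be present. This is NOT the format of any real library.
SCHEMA = {
    "Link": {"type": str, "required": True},
    "name": {"type": str, "required": True},
    "year": {"type": int, "required": True},
}


def validate(item, schema):
    """Return a dict mapping field names to a list of error messages."""
    errors = {}
    for field, rules in schema.items():
        if field not in item:
            if rules.get("required"):
                errors.setdefault(field, []).append("required field")
            continue
        if not isinstance(item[field], rules["type"]):
            message = "must be of type " + rules["type"].__name__
            errors.setdefault(field, []).append(message)
    # Flag anything the schema does not know about.
    for field in item:
        if field not in schema:
            errors.setdefault(field, []).append("unknown field")
    return errors


good_item = {
    "Link": "https://amity.edu/placement/Popup.asp?Eid=3895",
    "name": "Impetus - Recruitment Opportunity For 2019 Batch (Apply Now) ",
    "year": 2019,
}
# Missing "name", and "year" scraped as a string instead of a number.
bad_item = {"Link": "https://amity.edu/placement/Popup.asp?Eid=3895", "year": "2019"}

print(validate(good_item, SCHEMA))
print(validate(bad_item, SCHEMA))
```

An empty result means the item passed. The real tools do much more than this (nested fields, value constraints, type coercion), which is exactly why I want to try each of them against the same spec.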
 

All in all, this project will greatly help me develop some good insight into the validation tools popularly used at ScrapingHub and how they work. Coming back to the 3 main questions that we have.

 

What did you do this week?

Well, for starters, I am writing this blog post again. Due to some bug, my original post wasn’t saved. But, no regrets. These blogs are important and should be written even if I have to write them again.

 

My week has been busy with this mini project:

  • Studied the Schematics validation pipeline, implemented it in my mini project and worked out a small bug in the documentation. So, good progress.

  • Implemented the JSON Schema validation and ran through the tutorial to understand the various properties and features. Quite powerful.

  • Cerberus will take some time to implement; I still need to research the best way to go about it.

 

This project is an ongoing thing. Once it is finished, it should really help me with the development of the Cerberus pipeline. I have also been reading about PostgreSQL pipelines for Scrapy and learned new things.

 

I also went to the Google Summer of Code meetup in New Delhi to meet and network with other GSoC’ers here. It was a good time.

 

What is coming up next?

 

Next up, I am working on 2 PRs and fixing an issue related to the Slack actions that has been open for quite a while. I will also be coding a draft pipeline for Cerberus, to figure out what goes where. This will be a big Lego project with small parts that need to be stuck together to give a better picture. Looking forward to it.

 

I am also working towards better issue tracking for my project through GitHub Projects and improving the documentation of Spidermon.

 

Did you get stuck anywhere?

I did, regarding the JSON Schema validation implementation. I researched the issue, found several solutions and ran them by my mentors. Turns out the implementation is not covered in the documentation. Will add that too. Busy week ahead.

 

That’s that from my side. This is Vipul Gupta, signing out.


2048's Weekly Check-in #0

vipulgupta2048
Published: 05/21/2019

Weekly check-in #1: 13/05 to 20/05

Hello everyone, hope you all are doing great. I am Vipul Gupta (goes by vipulgupta2048 all over the web), checking in for the first time under the Scrapy Project. I will be working towards integrating Cerberus into the prime data validation library for our spiders, called Spidermon. You can read all about it here.

What did you do this week?

Due to my university, I couldn't accomplish much this week. My college's summer holidays began on 17th May 2019, hence most of the week was exhausted there. I had a call with my mentors, Renne and Julio, who are maintainers of Spidermon and employees of ScrapingHub. It was nice to e-meet them. We discussed summer plans and the problems the project is facing that we will be solving over the course of the summer. We also set our weekly meeting times, methods to prepare our blogs, ways to handle pull requests, documentation, code linting, etc. Moreover, I decided to understand the working of the 2 validation techniques presently integrated with Spidermon, which should help me understand the importance of pipelines and contribute towards the bigger picture. I thought of a mini-project idea to implement the same, and will discuss more about it in the weekly blog.

What is coming up next?

Well, in the next meeting we will be deciding the evaluation-by-evaluation goals that need to be completed for the project. This will help both my mentors and me to track my progress and work accordingly. I will also set up the recommended environment on my system and start documenting whatever I am doing. There ain't much to set up now, but I would like to be thorough. I am also working on some documentation for the actions of Spidermon as part of issue #141.
I will also be researching possible ways to integrate Cerberus into Spidermon. Quite excited for it.

Did you get stuck anywhere?

No specific issues yet. I am trying to get a bigger picture of what we are trying to do here.

Thank you for reading!

Vipul Gupta
Would love to connect - Twitter? @vipulgupta2048 all over the web.
