DebadityaPal's Blog

Final Blog Post: The Finished Product ⭐

DebadityaPal
Published: 08/26/2021

Whenever someone learns a new package, the first stop is usually the documentation. But documentation is often long and boring to read through. Furthermore, when it is auto-generated, it tends to contain a plethora of arguments, most of which are internal and of no use to the end user. Data science is an interactive field by nature, so packages for data science should follow suit.

Hub is a data optimization package that enables users to stream unlimited amounts of data from the cloud to any machine without sacrificing performance compared to local storage.

It is a standalone package with its own set of APIs. Therefore, a user who wants to incorporate Hub into their project must first get comfortable with these APIs and understand how to use them.

Users bring libraries and packages into their projects primarily to simplify the coding process and make it faster. A package exists so that you can skip implementing everything by hand. Hence, learning how to use Hub should take less time than finishing the project without it.

The idea of “Learn” is to provide a much more interesting and faster way of learning how to use Hub. The goal is achieved by serving interactive code-along tutorials, much like DataCamp, that the user can take from the comfort of their local terminals. “Learn” comes with a single command that starts the whole course engine and the rest works based on user feedback.

The way it works is that a course library holds all of the course content in YAML files. The content is divided into small bits of information we call “Snippets”. The course engine contains a YAML parser that reads these files and presents the content to the user one snippet at a time. Currently, there are 3 types of snippets, which add variety to the way information is presented:
  1. Text Snippet: Purely meant for reading, does not expect user feedback.
  2. MCQ Snippet: Poses a multiple choice question for the user and expects them to enter an answer.
  3. Code Snippet: Provides a prompt and expects the user to code along.


The API has been designed in such a way that more types of Snippets can easily be added.
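To give a feel for how an extensible snippet API could look, here is a minimal sketch. It is not the actual “Learn” code: the registry, the class names, the field names (`type`, `body`, `options`, `answer`) and the sample course content are all made up for illustration, and the entries are assumed to have already been parsed from YAML into plain dictionaries.

```python
from dataclasses import dataclass

# Hypothetical registry: maps the "type" field of an entry to a snippet class.
SNIPPET_TYPES = {}


def register(kind):
    """Class decorator that adds a snippet class to the registry."""
    def wrap(cls):
        SNIPPET_TYPES[kind] = cls
        return cls
    return wrap


@register("text")
@dataclass
class TextSnippet:
    body: str

    def render(self):
        return self.body

    def check(self, answer=None):
        return True  # purely for reading, nothing to answer


@register("mcq")
@dataclass
class MCQSnippet:
    body: str
    options: list
    answer: int  # 1-based index of the correct option

    def render(self):
        numbered = [f"  {i}. {opt}" for i, opt in enumerate(self.options, 1)]
        return "\n".join([self.body] + numbered)

    def check(self, answer=None):
        return answer == self.answer


def build_snippet(entry):
    """Turn one parsed entry (a dict) into the matching snippet object."""
    fields = dict(entry)
    kind = fields.pop("type")
    return SNIPPET_TYPES[kind](**fields)


# Made-up course content, standing in for what a YAML parser would return.
course = [
    {"type": "text", "body": "Welcome to the course."},
    {
        "type": "mcq",
        "body": "What is Hub used for?",
        "options": ["Streaming datasets", "Drawing plots"],
        "answer": 1,
    },
]
snippets = [build_snippet(entry) for entry in course]
```

With a layout like this, adding a new snippet type only means writing one more class and decorating it with `@register(...)`; the loop that presents snippets does not have to change.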

Writing new courses is also really simple, should a user want to add their own. A full guide can be found at https://learn-hub.readthedocs.io/en/latest/course.html. In short, it just involves writing one Snippet at a time following a particular format. With this package, learning how to use Hub is now a much faster process, and we hope to make it as easy as possible for newcomers to start using Hub. Moreover, the package is completely extensible and community-driven. If you feel like writing a course on a topic, feel free to do so!

Blog Post #5: Colors 🔴 🟠 🟡 🟢

DebadityaPal
Published: 08/14/2021

This week was all about colors. The standard monochrome of the terminal seems somewhat boring, so to make the course more palatable I wanted to add different colors. An added benefit of colors is that they add structure and hierarchy to the snippets. Different areas are highlighted differently, making the whole thing easier to read and digest.

So the question is, how do we color text in the console? The way I did it was to use `colorama`, a Python package that does just this. The syntax and API for colorama are fairly simple. One just has to add the color they want to the print statement, so that is what I did. However, this wasn't working on Command Prompt. For some reason CMD was printing a weird code instead of the color.

So, I turned to my mentor to gain some insight into what was going on. He recommended I use the init function in colorama. After doing so, it started working! The issue was with escape sequences. For some reason CMD can't handle them properly and ends up printing them as text. The init function takes care of that by converting the sequences into something CMD understands.
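For anyone curious, the pattern looks roughly like the sketch below. The `highlight` helper and the choice of green are just for illustration and are not taken from the project; the `except` branch is only there so the snippet still runs where colorama isn't installed, using the same raw escape codes colorama would emit.

```python
try:
    from colorama import Fore, Style, init

    # Call init() once at startup, before printing any colored text.
    init()
    GREEN, RESET = Fore.GREEN, Style.RESET_ALL
except ImportError:
    # Fallback (assumption for this sketch): the raw ANSI escape sequences.
    GREEN, RESET = "\033[32m", "\033[0m"


def highlight(text, color=GREEN):
    """Wrap text in a color code and reset the style afterwards."""
    return f"{color}{text}{RESET}"


print(highlight("Correct!"))
# The "weird code" from earlier is the escape sequence itself:
print(repr(highlight("Correct!")))
```

The second print shows the sequence that a console prints literally when it doesn't interpret escape codes.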

Weekly Check-In #5: Courses and Feedback 👨🏼‍💻

DebadityaPal
Published: 08/07/2021

What did I do this week?

This week was spent away from the IDE. The majority of it was spent on Google Docs, where I wrote the 2 courses that I wanted to and shared the links with my mentor and the devs at Activeloop. Everyone was super helpful and gave a lot of feedback, which shaped the way the courses turned out. It was an iterative process but a necessary one. The main goal of the week was to write 2 courses, one on the basic topics of Hub and the other on the more advanced topic of parallel computation.

What is coming next?

This week essentially completes my project, so next week I will be working on a stretch goal. The idea is to introduce colors to the courses so that the interface is easier on the eyes for the user. Moreover, there is some documentation that needs to be written and I will try to optimize the whole project as well.

Did I get stuck anywhere?

This week was not technically challenging at all. It just involved writing courses and incorporating the feedback. So I did not get stuck anywhere.

Blog Post #4: Course Writing Begins 📝

DebadityaPal
Published: 08/03/2021

The last week was a bit intense as it involved learning a lot of topics really fast. Since Code Snippets are the star of my project, I wanted to settle for nothing less than perfection. But now that they have been successfully implemented, I can breathe a sigh of relief. The main technical part of my project is over. I have a fully functional course engine armed with 3 types of snippets, ready to be deployed.

Now begins the second phase of my project: course writing. I have divided the entirety of Hub's features into two courses. The first one will cover the basics of Hub and enable everyone to use all of its main features, like accessing and uploading datasets, along with links to the visualization platform. The second one will cover the more advanced topic of parallel computing and how to use it to speed up overall performance.

I have decided to write the basic script for my courses on Google Docs and share it with multiple people on the core team of Hub. Feedback is going to be crucial for this step as it will allow me to shape the course content much better.

Weekly Check-In #4: ASTs and Code 💻

DebadityaPal
Published: 07/22/2021

What did I do this week?

This week was spent on the implementation of Code Type Snippets, which is probably the most important part of my entire project. There were a lot of roadblocks, but it has finally been implemented. The solution I used was to hack Python's code.InteractiveConsole and tweak it to do what I needed.

The output of the interactive console was then parsed into an AST (Abstract Syntax Tree). I decided to go with ASTs instead of string matching because the same code can be written in different ways. ASTs allow the user to do that and make the check more robust.
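Here is a small sketch of why ASTs are more forgiving than string matching. It only shows the general idea using the standard `ast` module; the `same_code` helper is a name I made up for this example and the real check in the project may differ.

```python
import ast


def same_code(user_source, expected_source):
    """Return True when both sources parse to the same syntax tree."""
    try:
        user_tree = ast.parse(user_source)
    except SyntaxError:
        return False  # the user's input is not valid Python
    expected_tree = ast.parse(expected_source)
    # ast.dump leaves out line and column numbers by default, so
    # differences in spacing or quote style do not affect the result.
    return ast.dump(user_tree) == ast.dump(expected_tree)


print(same_code("x = 1+2", "x = 1 + 2"))
print(same_code("name = 'hub'", 'name = "hub"'))
print(same_code("x = 3", "x = 1 + 2"))
```

The first two comparisons match even though the strings differ, while the third does not, because the tree for `3` is not the tree for `1 + 2`.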

code.InteractiveConsole emulates a REPL, so I had to hack it to stop it from running endlessly. That way I could evaluate the code from the user input after every step.
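One way to get step-by-step control, sketched below, is to skip the console's endless `interact()` loop and feed it lines through `push()` instead. This is not necessarily how my implementation does it, and `run_step` is a hypothetical helper written only for this example.

```python
import code

# A console with its own namespace; we never call interact(), so there is
# no endless loop to break out of.
console = code.InteractiveConsole(locals={})


def run_step(line):
    """Feed one line to the console.

    Returns True when a complete statement was executed and False when
    the console is still waiting for more input (e.g. inside a loop body).
    """
    needs_more = console.push(line)
    return not needs_more


run_step("total = 0")
run_step("for i in range(4):")  # incomplete: the loop body is missing
run_step("    total += i")      # still inside the block
run_step("")                    # a blank line ends the block and runs it

print(console.locals["total"])
```

After each completed step, the namespace in `console.locals` can be inspected, which is the hook needed to evaluate what the user typed.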

What is coming next?

Next, I will be working on creating the different courses for Hub. The main technical part of the project is over. I will also write the necessary documentation and add the required code optimizations. But the most important job at hand right now is to talk to my mentor, and the community in general, to figure out what kind of course content they would prefer.

Did I get stuck anywhere?

There were many areas where I got stuck. The biggest one was that I was not able to import modules into my custom InteractiveConsole. I tried to tweak things to get it to work but absolutely couldn't. So I immediately approached my mentor, David. He responded with a code snippet of the actual code.InteractiveConsole where he got it to work. From there, I dissected my code, figured out which part was throwing errors, and changed it accordingly. To my great surprise, the error was coming from a deepcopy function, so I chucked it out and did the deep copy manually.
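As a rough illustration of what a manual deep copy can look like, here is a sketch. It assumes the thing being copied is the console's namespace dictionary and that the trouble comes from module objects, which `copy.deepcopy` generally cannot copy; neither detail is spelled out above, and `copy_namespace` is a hypothetical helper rather than my actual code.

```python
import copy
import math
import types


def copy_namespace(namespace):
    """Copy a namespace dict, sharing modules instead of deep-copying them."""
    result = {}
    for name, value in namespace.items():
        if isinstance(value, types.ModuleType):
            # Modules are kept by reference; they are not copied.
            result[name] = value
        else:
            result[name] = copy.deepcopy(value)
    return result


namespace = {"math": math, "data": [1, 2, 3]}
snapshot = copy_namespace(namespace)

# The snapshot is independent of later changes to ordinary values.
namespace["data"].append(4)
print(snapshot["data"])
```

The imported module stays usable in the copy, while ordinary values such as lists are fully duplicated.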