Akshay_Sharma's Blog

GSoC Blog Post #3

Akshay_Sharma
Published: 07/13/2021

Hello All!

The first evaluation week is here, and so far the GSoC program has been both challenging and exciting for me. With the persistent support of my mentors, I have been able to complete the implementation of the MIME sniffing library, and I hope to pass the first evaluation at the end of this week :)

What did you do this week?

Last week I finalized the implementation of the MIME standards up to section 7, with proper development and testing. So far I have achieved 100% code coverage using "pytest" and "pytest-cov", and the code has been merged into the main branch of the repo on GitHub. Thanks to my mentors!

What is coming up next?

This week and in the coming weeks, I will try to integrate my library into the Scrapy framework so that it resolves the issue "Wrong type(response) for binary responses #4240", which is where the main problem originated.

Did you get stuck anywhere?

Deciding on an input type for the Content-Type parameter in the main function of the library was a little confusing. There were two options: first, allow users to pass either a byte string or a regular string; second, restrict users to byte strings only. After a discussion with my mentors, we chose the second option, which was much easier to implement and easier for users to understand. Otherwise, this week went quite smoothly, apart from some minor problems with the testing.
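
As a rough illustration of the bytes-only choice, the main function could reject anything that is not a byte string up front. This is only a sketch: the name `check_content_type` is hypothetical and is not the library's actual API.

```python
from typing import Optional


def check_content_type(content_type: Optional[bytes]) -> Optional[bytes]:
    """Hypothetical helper: accept only bytes (or None) as a Content-Type value."""
    if content_type is None:
        # No Content-Type header was supplied at all.
        return None
    if not isinstance(content_type, bytes):
        # A plain str is rejected instead of being silently encoded.
        raise TypeError(
            f"content_type must be bytes, got {type(content_type).__name__}"
        )
    # Surrounding whitespace carries no meaning in a header value.
    return content_type.strip()
```

With this design a caller passes `b"text/html"`, and passing `"text/html"` fails immediately with a clear error instead of behaving differently later on.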

GSoC Weekly Check-In #3

Akshay_Sharma
Published: 07/07/2021

Hello Everyone!

The first phase of this year's GSoC program is approaching its end, with the first evaluation next week, and I am trying my best to finalize the implementation of the MIME sniffing library with proper development and testing.

What did you do this week?

I have implemented section 7, "Determining the computed MIME type of a resource". This section covers the main sniffing functions of the library, including sniffing rules such as "Identifying a resource with an unknown MIME type", "Sniffing a mislabeled binary resource", and "Sniffing a mislabeled RSS XML feed".
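
To give an idea of what one of these rules looks like, here is a minimal sketch of "Sniffing a mislabeled binary resource" as I read it in the standard. The function name, the `str` return type, and the constant names are my own assumptions for illustration, not the library's actual code, and the byte ranges should be checked against the standard itself.

```python
# Byte-order marks that identify the resource as text.
UTF16_BOMS = (b"\xfe\xff", b"\xff\xfe")
UTF8_BOM = b"\xef\xbb\xbf"

# "Binary data bytes" as I read them in the standard:
# 0x00-0x08, 0x0B, 0x0E-0x1A and 0x1C-0x1F.
BINARY_DATA_BYTES = frozenset(
    list(range(0x00, 0x09))
    + [0x0B]
    + list(range(0x0E, 0x1B))
    + list(range(0x1C, 0x20))
)


def sniff_mislabeled_binary(resource_header: bytes) -> str:
    """Hypothetical helper: decide between text/plain and octet-stream."""
    # A UTF-16 or UTF-8 byte-order mark means the resource is text.
    if resource_header[:2] in UTF16_BOMS:
        return "text/plain"
    if resource_header[:3] == UTF8_BOM:
        return "text/plain"
    # Without any binary data byte, the resource is still treated as text.
    if not any(byte in BINARY_DATA_BYTES for byte in resource_header):
        return "text/plain"
    return "application/octet-stream"
```

For example, `sniff_mislabeled_binary(b"hello\n")` falls through to the "no binary data bytes" rule, while a header containing a NUL byte and no byte-order mark ends up as `application/octet-stream`.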

What is coming up next?

This week I will write tests for section 7, covering all possible test cases to reach 100% coverage. I will also start integrating my library as soon as it is finalized.

Did you get stuck anywhere?

Section 7.3, i.e. "Sniffing a mislabeled RSS XML feed", was a bit confusing and complicated because of the way the standard presents its pseudocode, but my mentors were there to help me with that. Other than this, there were no major problems last week.

GSoC Blog Post #2

Akshay_Sharma
Published: 06/29/2021

Hi All,

The fourth week of this year's GSoC program is complete, and I have implemented most of the major parts of the MIME standards in my Python library, including sections 4, 5, and 6 and some parts of section 7.

What did you do this week?

I spent the last week fixing some major issues in the library. The issue that took most of my time was the implementation of the algorithm for matching the MIME type pattern of an MP3 file without ID3 tags. ID3 tags hold metadata such as artist name, album name, genre, and more. The algorithm described in the standards has various problems, which are mentioned in the issue here. I was finally able to fix the problems with the algorithm by taking reference from Mozilla's implementation for MP3 files. I also worked on my coding style, thanks to my mentor Adrian Chaves and his extremely helpful reviews and suggestions, and I learned a lot of interesting things too.

What is coming up next?

I already started on section 7 last week, but there is still a lot to cover, including the tests, which I will try to complete this week.

Did you get stuck anywhere?

Apart from fixing the implementation of the algorithm for matching the MIME pattern of an MP3 file without ID3 tags, last week was interesting and went smoothly.

GSoC Weekly Check-In #2

Akshay_Sharma
Published: 06/22/2021

Hey Everyone!

Last week was a bit tiring as well as exciting. I made quite a bit of progress on my Python library for MIME sniffing and learned a lot of new things about universal clean-coding conventions.

What did you do this week?

I focused mainly on implementing section 6 of the MIME standards in the library. This section covers the MIME matching algorithm, which determines the type of a file by matching its initial bytes against predefined patterns. The standards define numerous predefined patterns for image, audio or video, text, and archive files. Some audio and video formats require special matching rules, e.g. matching the signatures of MP4, WebM, and MP3 files. I also worked on adding unit tests for the above algorithms that cover every possible test case. One of my mentors also added continuous integration support to the GitHub repository, which will help keep an eye on the health of the library and make it much easier to debug any issues.
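
The core of the matching can be sketched in a few lines. This is a simplified illustration of the pattern matching algorithm as I understand it from the standard, not the library's actual code: the function name and signature are assumptions, the extra bounds check after skipping ignored bytes is my own addition, and the HTML-like pattern in the usage note is shortened for the example.

```python
def pattern_matches(
    data: bytes, pattern: bytes, mask: bytes, ignored: bytes = b""
) -> bool:
    """Hypothetical helper: match the start of `data` against a byte pattern."""
    if len(pattern) != len(mask):
        raise ValueError("pattern and mask must have the same length")
    if len(data) < len(pattern):
        return False

    # Skip leading bytes that the pattern allows us to ignore (e.g. whitespace).
    start = 0
    while start < len(data) and data[start] in ignored:
        start += 1

    # Guard (my addition): enough bytes must remain after skipping.
    if len(data) - start < len(pattern):
        return False

    # Compare byte by byte, applying the mask to the input first.
    for offset in range(len(pattern)):
        if data[start + offset] & mask[offset] != pattern[offset]:
            return False
    return True
```

For instance, the PNG signature `b"\x89PNG\r\n\x1a\n"` with an all-`0xFF` mask matches only that exact prefix, while a mask byte of `0xDF` clears the ASCII case bit, so the pattern `b"<HTML"` with mask `b"\xff\xdf\xdf\xdf\xdf"` also matches `b"<html"`.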

What is coming up next?

Coming up next is the main algorithm of the library, "computing the final MIME type". The rules for this algorithm are given in section 7 of the standards, and I will try to implement it fully, including all possible tests.

Did you get stuck anywhere?

Yes, the algorithm for matching the signature of WebM files given in the standards was a bit ambiguous. I tried many possible changes to the algorithm, some of them suggested by my mentors, and finally it worked, but I am not 100% sure whether the change I made is correct. I have left it as it is for now since it works perfectly fine, but if something goes wrong in the future I will try to fix it.

GSoC Blog Post #1

Akshay_Sharma
Published: 06/15/2021

Hey All,

It has already been a week since the GSoC coding period began and I started working on my project.

What did you do this week?

As I mentioned in an earlier post, this week I designed a high-level API for the Python library and started to implement the rules in the MIME sniffing standards. I worked on section 5 of the standards, i.e. "Handling the resource metadata and headers". One of my mentors suggested creating a template for the project before moving on to further coding. Therefore, I set up a template for the library with a setup.py file, added a BSD license file, and configured tox environments for various checks such as flake8, typing, py, and black.

What is coming up next?

I will start with the implementation of section 6, i.e. "Matching a MIME type pattern", and will try to add some tests. Currently I am using a simple hard-coded test for the library, but this week I will try to automate the tests using Python unit tests and add more tests as I build the library.

Did you get stuck anywhere?

No, last week went quite seamlessly since I have done similar work before, and my mentors were always there to give me the best suggestions.