GSoC Weekly Check-In #2

Published: 06/22/2021

Hey Everyone!

Last week was a bit tiring but also exciting. I made quite a bit of progress on my Python library for MIME sniffing and learned a lot of new things about clean-coding conventions.

What did you do this week?

I focused mainly on implementing section 6 of the MIME sniffing standard in the library. This section covers the MIME type pattern matching algorithm, which determines the type of a file by comparing its initial bytes against predefined patterns. The standard defines numerous patterns for image, audio or video, text, and archive files. Some audio and video formats need their own matching rules, for example the signatures of MP4, WebM, and MP3 files. I also added unit tests for these algorithms, aiming to cover every case I could think of. One of my mentors added continuous integration to the GitHub repository, which will help us keep an eye on whether the library is working and make any issues much easier to debug.
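To give an idea of what byte-pattern matching looks like, here is a minimal sketch modelled on the pattern matching algorithm described in the standard. It is not the code from my library: the names `pattern_matches`, `sniff_image`, and `IMAGE_SIGNATURES` are made up for this post, and the signature table holds only a few well-known examples rather than the full list. I also added an extra bounds check after skipping ignored bytes, which is my own assumption.

```python
from typing import Optional


def pattern_matches(data: bytes, pattern: bytes, mask: bytes,
                    ignored: bytes = b"") -> bool:
    """Return True if the start of `data` matches `pattern` under `mask`.

    Leading bytes of `data` that appear in `ignored` are skipped first.
    """
    if len(pattern) != len(mask):
        raise ValueError("pattern and mask must have the same length")
    if len(data) < len(pattern):
        return False

    # Skip leading bytes that should be ignored (e.g. whitespace).
    start = 0
    while start < len(data) and data[start] in ignored:
        start += 1

    # Assumption: bail out if too few bytes remain after skipping.
    if len(data) - start < len(pattern):
        return False

    # Compare each masked input byte with the pattern byte.
    for offset in range(len(pattern)):
        if data[start + offset] & mask[offset] != pattern[offset]:
            return False
    return True


# A few illustrative signatures: (pattern, mask, MIME type).
IMAGE_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", b"\xff" * 8, "image/png"),
    (b"GIF87a", b"\xff" * 6, "image/gif"),
    (b"GIF89a", b"\xff" * 6, "image/gif"),
    (b"\xff\xd8\xff", b"\xff" * 3, "image/jpeg"),
]


def sniff_image(data: bytes) -> Optional[str]:
    """Return the first matching image MIME type, or None."""
    for pattern, mask, mime_type in IMAGE_SIGNATURES:
        if pattern_matches(data, pattern, mask):
            return mime_type
    return None
```

The mask is what makes case-insensitive matches possible: a mask byte of 0xDF clears the bit that separates lowercase from uppercase ASCII letters, so the pattern `<HTML` can match `<html` as well.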

What is coming up next?

Coming up next is the main algorithm of the library: "computing the final MIME type". The rules for this algorithm are given in section 7 of the standard, and I will try to implement it fully, along with all the tests I can come up with.

Did you get stuck anywhere?

Yes, the algorithm in the standard for matching the signature of WebM files was a bit ambiguous. I tried many possible changes to the algorithm, some of them suggested by my mentors, and it finally worked, but I am not 100% sure whether the change I made is correct. I have left it as it is for now since it works fine, but if something goes wrong in the future I will fix it.