GSoC Blog Post #4

Published: 07/29/2021

Hey All!

What did you do this week?

Last week I finalized the MIME group functionality in the xtractmime library. I tried to cover all the MIME types mentioned in the MIME standards with unit tests, but many MIME types that are not in the standards are still uncovered, for instance those listed by Mozilla, Wikipedia, or the IANA registry. I also made some progress on integrating xtractmime into Scrapy.
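To give a rough idea of what a MIME group check looks like, here is a minimal sketch in plain Python. The function names and the use of byte strings are assumptions for illustration and may not match xtractmime's actual API. The group definitions follow my reading of the MIME Sniffing standard (image, audio or video, and JSON types).

```python
def _essence(mime: bytes) -> bytes:
    """Return the lowercased type/subtype, dropping any parameters."""
    return mime.split(b";", 1)[0].strip().lower()


def is_image_mime(mime: bytes) -> bool:
    # Hypothetical helper: an image MIME type has the type "image".
    return _essence(mime).startswith(b"image/")


def is_audio_video_mime(mime: bytes) -> bool:
    # Hypothetical helper: type "audio" or "video", or the essence "application/ogg".
    essence = _essence(mime)
    return essence.startswith((b"audio/", b"video/")) or essence == b"application/ogg"


def is_json_mime(mime: bytes) -> bool:
    # Hypothetical helper: subtype ends in "+json", or a known JSON essence.
    essence = _essence(mime)
    return essence.endswith(b"+json") or essence in (b"application/json", b"text/json")
```

MIME types registered elsewhere (for example in the IANA registry) would still fall through checks like these unless they happen to match a prefix or suffix rule, which is why the extra lists matter.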

What is coming up next?

I will try to finalize the integration in the coming weeks. Once we have finalized xtractmime, I will start refactoring the current method that determines response classes in Scrapy so that it uses xtractmime's functionality.

Did you get stuck anywhere?

Some test cases related to the integration were failing. The current implementation of MIME sniffing in Scrapy considers several inputs, such as the body, URL, HTTP headers, and filename, whereas xtractmime is reliable only when the body is not NULL or a Content-Type header is available. This requires further discussion with my mentors.
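One way to picture the gap is a fallback chain: use the body or the Content-Type header when they are present, and only then fall back to the filename or URL. The sketch below is purely illustrative and uses only the standard library. `guess_mime`, `_sniff_body`, the tiny signature table, and the order of the steps are all assumptions of mine, not how Scrapy or xtractmime actually work.

```python
import mimetypes
from typing import Optional
from urllib.parse import urlparse

# Toy signature table (assumption): a real sniffer checks many more patterns.
_SIGNATURES = {
    b"%PDF-": "application/pdf",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
}


def _sniff_body(body: bytes) -> Optional[str]:
    """Return a MIME type if the body starts with a known signature."""
    for signature, mime in _SIGNATURES.items():
        if body.startswith(signature):
            return mime
    return None


def guess_mime(
    body: Optional[bytes] = None,
    content_type: Optional[str] = None,
    url: Optional[str] = None,
    filename: Optional[str] = None,
) -> Optional[str]:
    # 1. Try the body first, if there is one.
    if body:
        sniffed = _sniff_body(body)
        if sniffed:
            return sniffed
    # 2. Otherwise trust the declared Content-Type, minus its parameters.
    if content_type:
        return content_type.split(";", 1)[0].strip().lower()
    # 3. Last resort: guess from the filename or the URL path extension.
    for name in (filename, urlparse(url).path if url else None):
        if name:
            guessed, _encoding = mimetypes.guess_type(name)
            if guessed:
                return guessed
    return None
```

With no body and no header, only step 3 can produce an answer, which is the case the failing tests exercise.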