Weekly Blog Post #2 [June 14, 2021]

Published: 06/14/2021

What did you do this week?

  • This week I spent most of my time reading the docs, stack-overflow answers and blog-posts of various archive handling methods and all which is built-in the python standard library.

  • The goal of reading all this was to come up with a design of archive manipulation module which can be format agnostic.

  • First I thought I should use shutil.unpack_archive as it handles most of the common formats, and it would've been easier and simpler to implement in tandem with shutil.make_archive, essentially completing the functionality of the archiving module I want to implement.

  • But I didn't use the shutil methods because they are limited to a small number of formats and people may need to use other formats, in that case if I would've written this module using these methods, it would've been very difficult to extend it to other formats, as there are other popular formats which these methods don't support.

  • This was my first iteration on this and it may be a good idea but I felt that we need something more maintainable and easy to extend, thus I came up with the idea of using a simple dictionary which would map format names to their respective methods from python standard library. something like:
      'format_name' : class_that_handles_it
  • This looked a promising solution to me initially as new formats can be added directly to this dictionary and then used in the relevant methods.

  • But this was not the case, it turns out that the interface of various format specific archive handling methods are not very consistent, for example : For writing to a zip file the method used is zip.write(file) but the same function in tarfile is performed by tarfile.add(file) method.

  • All in all these little inconsistencies in the interface lead to the current design, which uses a helper class for each format and that class is registered in that common dictionary and each helper class inherits from an abstract base class which basically defines how the class should be implemented, and also provides some helper functions.

  • This way all the archive handling methods could be brought down to a consistent interface and can be used in related methods, also extending to new methods would be easy and they all should ideally work with the existing code like a charm.

What is coming up next?

  • I have made a rough implementation to get inputs from my mentor and to improve upon this.

  • Would need to implement test cases for this and more helper classes the current state of the PR can be seen here.

  • And then when I will add this to the model's base class and proceed to update tests cases for other models and make sure that they work with archives.

  • Also would need to update the file source as it has some archive handling code which should be removed and updated to use this module.

Did you get stuck anywhere?

  • Not really, this week was more about thinking and trying out various implementations,and evaluating them based on extensibility and maintainability, as I have discussed above.

  • I was a bit confused on how shall I implement the tests for this, but I think it would be better to take input from the mentors first and then put in effort in covering it in the tests.