After implementing strings, I worked on implementing extractors. I found that Python already provides several modules for extracting files. For example, by instantiating a 'tarfile' object, we can extract archives ending in .gz, .bz... Therefore, I modified the extractor so that, if the system doesn't have commands like 'tar' or 'unzip', it falls back to the Python implementation to extract the target. This works for tar and zip, exe... Besides, for rpm and deb, I found a package called libarchive, so maybe we could use it. In the remaining days of this week, I will start implementing the other extractors, for formats that no Python module handles.
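A minimal sketch of the fallback idea, assuming the extractor shells out to tar and unzip only when they are found on the PATH. The function names extract_tar and extract_zip are hypothetical and not the project's actual code; shutil.which, tarfile and zipfile are standard-library modules.

```python
import shutil
import subprocess
import tarfile
import zipfile


def extract_tar(archive: str, dest: str) -> None:
    """Hypothetical helper: extract .tar, .tar.gz, .tar.bz2, .tar.xz, ... into dest."""
    if shutil.which("tar"):
        # Prefer the system tool when it is installed.
        subprocess.run(["tar", "-xf", archive, "-C", dest], check=True)
    else:
        # Fallback: mode "r:*" lets tarfile detect the compression itself.
        with tarfile.open(archive, "r:*") as tar:
            tar.extractall(dest)


def extract_zip(archive: str, dest: str) -> None:
    """Hypothetical helper: extract a zip archive into dest."""
    if shutil.which("unzip"):
        # -qq: very quiet, -o: overwrite without prompting, -d: target directory.
        subprocess.run(["unzip", "-qq", "-o", archive, "-d", dest], check=True)
    else:
        # Fallback: pure-Python extraction.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
```

For untrusted archives, newer Python releases also accept a filter argument to extractall (for example filter="data") that rejects unsafe member paths, which is worth considering for a scanner that opens arbitrary files.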
Right now I'm working on extracting cab files. I saw that people on GitHub have already implemented such an extractor in Python, so I will study their code first and then decide what to do with it (use it, rewrite it in C, etc.).
Week 3 (P2)
Continuing from last time, I decided to implement strings and file in C on Linux first, and I learnt a lot from this.
The first lesson is how to extend Python with C. The entry point of the C code must be PyMODINIT_FUNC PyInit_modulename(void): setuptools compiles the source into a shared library, and when the module is imported, the interpreter looks up and calls this function. Inside it, you return the result of PyModule_Create(&modulename), where the module is described by a static struct PyModuleDef. In that struct you define the Python-facing wrapper, including the function table (similar to the system call table in the Linux kernel). This is a brilliant design, since it reflects the main object-oriented ideas: encapsulation, inheritance and polymorphism. One thing to note, though, is that this API differs significantly between Python 2 and Python 3, so I decided to implement the Python 3 version first, since Python 2 is almost deprecated.
The second lesson is the actual implementation of strings and file. There were two difficulties that I addressed. The first one is reading the file, which works slightly differently in Python and C. In C, you first need to get the file size and allocate memory for the buffer, which can be dangerous because the memory leaks if you forget to free it after use. The second one is how to iterate over the buffer after reading. This is trivial but important. At first, I simply used strlen(buffer), but this is not correct: binary files contain NUL bytes, and strlen stops at the first one. The loop bound should be the file length obtained earlier. Another important thing I learnt is the difference between a char * and a char array. You can modify the contents of a char array, but not the string literal a char * points to. The reason is that char s[] = "..." copies the string into an array (on the stack, for a local variable), while char *p = "..." only creates a pointer on the stack, and the string itself is stored in a read-only data segment, so writing through p is undefined behavior.
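To illustrate why the loop must run over the file length rather than strlen, here is a rough Python sketch of the strings logic (not the C code from this post). It assumes the usual defaults of the strings utility: runs of at least 4 printable ASCII characters, with tab counted as printable. extract_strings and strings_of_file are hypothetical names.

```python
from pathlib import Path
from typing import List

MIN_LEN = 4  # default minimum run length of the strings utility


def is_printable(byte: int) -> bool:
    # Printable ASCII (space through '~') plus tab.
    return 0x20 <= byte <= 0x7E or byte == 0x09


def extract_strings(data: bytes, min_len: int = MIN_LEN) -> List[str]:
    """Return every run of at least min_len printable bytes in data."""
    found = []
    start = None  # index where the current printable run began
    # Walk the whole buffer (its real length), not just up to the first NUL:
    # binaries are full of NUL bytes, which is exactly where strlen() stops.
    for i, byte in enumerate(data):
        if is_printable(byte):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                found.append(data[start:i].decode("ascii"))
            start = None
    # A run may extend to the very end of the buffer.
    if start is not None and len(data) - start >= min_len:
        found.append(data[start:].decode("ascii"))
    return found


def strings_of_file(path: str) -> List[str]:
    # Python reads the whole file in one call; in C this step means finding the
    # file size, malloc()-ing a buffer of that size and free()-ing it afterwards.
    return extract_strings(Path(path).read_bytes())
```

For example, extract_strings(b"\x00\x00hello\x00ab\x00world!\x01") yields ["hello", "world!"]; a strlen-based loop would have stopped at the very first byte.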
Currently, I'm still working on implementing the C extension.
The problem I ran into is that building the extension differs significantly between Windows and Linux, while the PSF instructions mainly cover Linux. For example, in the new .c file, on Linux I only needed to declare and implement the functions, but on Windows I also needed an entry function named PyInit_modulename, which the Windows compiler uses to identify the Python extension.
Besides, since I'm developing in Visual Studio, I ran into some path issues. In the end, I found the official Visual Studio guide very helpful: https://docs.microsoft.com/en-us/visualstudio/python/working-with-c-cpp-python-in-visual-studio?view=vs-2019
I haven't talked to Terri and John yet, so I will leave this as it is for now and update it after our meeting.
As described last week, this week I mainly focused on the environment setup and on enhancing the test files to support Windows commands. I have made a pull request at https://github.com/intel/cve-bin-tool/pull/146.
The change I made is this: under Windows, the header (magic bytes) of a compiled binary file differs from the one on Linux. It is 'MZ\x90\x00' instead of the ELF header '\x7fELF'. Besides, the "rm" command has to be replaced with "erase" under Windows.
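A small sketch of both changes, assuming the tests only need to tell ELF and PE binaries apart and clean up build artifacts. Only the first two bytes, 'MZ', are the fixed DOS/PE signature; the '\x90\x00' that follows is common compiler output but not guaranteed. binary_format and remove_file are hypothetical helpers. Calling os.remove is shown as an alternative to swapping "rm" for "erase", since it works on both systems without a subprocess.

```python
import os

ELF_MAGIC = b"\x7fELF"  # start of every ELF file (Linux executables, objects, .so)
PE_MAGIC = b"MZ"        # DOS header signature that starts every Windows PE executable


def binary_format(path: str) -> str:
    """Hypothetical helper: classify a compiled binary by its leading magic bytes."""
    with open(path, "rb") as f:
        header = f.read(4)
    if header.startswith(ELF_MAGIC):
        return "elf"
    if header.startswith(PE_MAGIC):
        return "pe"
    return "unknown"


def remove_file(path: str) -> None:
    """Hypothetical helper: delete a test artifact without shelling out to rm or erase."""
    if os.path.exists(path):
        os.remove(path)
```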
This week, I will try to rewrite the "strings" and "file" commands in C and test them on both Linux and Windows. On Linux, I hope it can run the whole process and scan different types of files like rpm, tar... On Windows, since the extractor is not implemented yet, I will only test its parsing functionality. After that, I will start implementing the extractor under Windows; the first goal will be zip. Some test cases are needed, ideally curl. The problem I might encounter is how to extend Python with C. There are two ways, but I'm not sure which one would be ideal. I asked John for help, so I may wait for his reply first.
It's a great pleasure to work on this project! I talked to Terri and we settled this week's plan.
Earlier this week, I set up the Windows environment for the binary tool, such as installing MinGW for gcc and make, plus some Python modules. In the following days, I will work on coming up with Windows test cases. In addition, while testing the program, I found that it uses many subprocess calls like "rm", which are not available on Windows. Therefore, it would be ideal to modify those test files. My idea is to make an abstract test class first; then, when the user runs the tests, it checks the operating system and decides which subclass to use, which is exactly the factory pattern from design patterns.
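One way to realize this idea, sketched under assumptions: instead of one unittest.TestCase subclass per OS, the OS-specific details live in small helper classes, and a factory function picks one based on platform.system(). PlatformCommands, LinuxCommands, WindowsCommands and make_platform_commands are hypothetical names, not the project's code.

```python
import abc
import platform
from typing import List, Optional


class PlatformCommands(abc.ABC):
    """Hypothetical abstraction over the OS-specific bits the tests need."""

    @abc.abstractmethod
    def delete_command(self, path: str) -> List[str]:
        """Command line that deletes a file on this OS."""

    @abc.abstractmethod
    def executable_magic(self) -> bytes:
        """Magic bytes a freshly compiled test binary should start with."""


class LinuxCommands(PlatformCommands):
    def delete_command(self, path: str) -> List[str]:
        return ["rm", "-f", path]

    def executable_magic(self) -> bytes:
        return b"\x7fELF"


class WindowsCommands(PlatformCommands):
    def delete_command(self, path: str) -> List[str]:
        # "erase" is a cmd.exe built-in, so it must be run through cmd /c.
        return ["cmd", "/c", "erase", path]

    def executable_magic(self) -> bytes:
        return b"MZ"


def make_platform_commands(system: Optional[str] = None) -> PlatformCommands:
    """Factory: return the subclass matching the running (or given) OS."""
    system = system or platform.system()
    if system == "Windows":
        return WindowsCommands()
    if system == "Linux":
        return LinuxCommands()
    raise NotImplementedError(f"unsupported platform: {system}")
```

A test would call make_platform_commands() once in setUp and use the returned object, so no individual test needs its own if/else on the operating system.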
Besides, we talked about future plans. We both decided to keep the plan from the proposal for now and adjust it as needed. The priorities of some tasks might change, since some of them are less important, like extracting an rpm file on Windows.
In a nutshell, I'm really excited! Definitely going to do a good job on this!