Last week, I implemented multithreaded scanning with John's help. At first, I thought the logic would be to create a function that opens a database connection every time and closes it at the end, but that would be too inefficient: if each thread called that function for each file, connecting to and disconnecting from the database over and over could take a long time. Instead, we use a queue to hold all the files to be scanned, and each thread opens the database once at the start and closes it only when there are no jobs left. We also don't need to worry about thread safety, since Python's queue is already thread safe. Besides that, I added a flag to enable/disable updating the database, so users can save time when testing or running the tool.
Compared with C, I think it is easier to implement multithreading/multiprocessing in Python. For example, there are more ways for processes and threads to communicate; in C we are mostly limited to signals, shared memory, pipes, and message queues. In addition, in Python any thread can call `join()` on another, which is like `wait()` in C. But in C only the parent process can `wait()` on its children, so in terms of coding, we have to implement the parent and child processes separately.
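A small example of that symmetry (the `task` function and `results` list are just stand-ins):

```python
import threading

results = []


def task(name):
    results.append(name)  # simulate some work finishing


# Any thread can join() any other: there is no special parent role,
# unlike fork()/wait() in C, where only the parent reaps its children.
workers = [threading.Thread(target=task, args=(f"worker-{i}",))
           for i in range(3)]
for t in workers:
    t.start()
for t in workers:
    t.join()  # blocks until that thread finishes, much like wait()

print(len(results))  # → 3: every task completed before join() returned
```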
The other thing that I learnt about is code coverage. Multithreaded code is hard to debug because it is difficult to track every thread at the same time. With code coverage's help, we can see a report of which parts of the code were not covered during the test, and then work out why they were never entered.
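coverage.py is the usual tool for this; just as an illustration of the idea, the standard-library `trace` module can produce the same kind of per-line information (the `classify` function here is invented for the example):

```python
import trace


def classify(n):
    if n % 2 == 0:
        return "even"
    return "odd"  # the branch our (imaginary) tests never exercise


# Run only an even input under line counting.
tracer = trace.Trace(count=1, trace=0)
tracer.runfunc(classify, 4)

# Lines that never ran are simply absent from the counts, so the
# missed "odd" branch stands out in the report.
executed = {lineno for (_filename, lineno) in tracer.results().counts}
print(sorted(executed))
```

The same principle applies to threads: a line that no thread ever reached shows up as uncovered, which is a strong hint about where a worker got stuck or bailed out early.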