Week 7

Published: 07/19/2019

This week I'm working on implementing mutithread extracting and scaning. So far the extracting works. From this week I acquired how does multithread pool work in Python. The pool has a method called `map`, which is basically same as `map` in Python: taking a function and an iterable as input, apply the function to the iterable in a multithread way. For extracting, this is easy to implement because we don't need to worry about thread-safe issue. However, for scanning, this could be a problem. We know that multiprocessing in operating system is really hard because we need to keep all processes/threads synchronized by using locks/mutex/semaphore. At first I thought this is okay because in scanning function we are supposed to just run mutiple queries at the same time, so we don't need lock.

But acutally this becomes a problem. I will explain later.

Another problem is the difference of pool.map() between python2 and python3. In python3, pool is supported as a context manager, thus we could use something like "*pool.map()", while this is invalid in python2. So we had to go back to use try/finally or the pool will not terminate correctly.