WangJL's Blog

Week 3 Check-in

Published: 06/28/2020

What I have done this week

Now that we can parse a shell script into statements, we need to filter out the install commands and extract what each of them will install.

I use filter_install_command() to get the packages installed by each command. It works well with package-manager commands such as apt-get.

Commands like wget, however, are download or file-management commands, and we still need to figure out how to handle them.
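To illustrate the difference, here is a minimal sketch of what extracting packages from a package-manager command can look like. `packages_from_apt_get()` is a hypothetical helper, not the project's filter_install_command(), and it assumes a plain `apt-get install` command with no variables or line continuations:

```python
import shlex


# Hypothetical helper for illustration only; it is NOT filter_install_command().
def packages_from_apt_get(command):
    tokens = shlex.split(command)
    # Only handle commands of the form "apt-get ... install ..."
    if not tokens or tokens[0] != "apt-get" or "install" not in tokens:
        return []
    args = tokens[tokens.index("install") + 1:]
    # Options such as -y or --no-install-recommends are not package names
    return [arg for arg in args if not arg.startswith("-")]


print(packages_from_apt_get("apt-get install -y --no-install-recommends git curl"))  # ['git', 'curl']
print(packages_from_apt_get("wget https://example.com/tool.tar.gz"))  # []
```

For the wget command the helper has nothing to report, because the command names a URL rather than a package; this is exactly the gap described above.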


What is coming up next?

Modify the draft PR so that it can be merged. Try parsing commands like wget and tar, probably by adding these commands to snippets.yml.

Did I get stuck somewhere?

Not yet.


Week 2 Blog Post

Published: 06/19/2020

What I have done this week

1. Used clean_command() to remove tabs and line indentation.

2. Added code to extract the statements inside a loop in parse_shell_loop_and_branch().

3. After a discussion with my mentor, I removed the pipe symbol '|' from the separators and now treat the 'export' command as a variable assignment.
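A minimal sketch of these two clean-up rules, assuming statements are plain strings. Both `clean_command()` and `is_variable_assignment()` below are hypothetical re-implementations for illustration and may differ from the real code:

```python
import re


# Hypothetical re-implementation: turn tabs into spaces and strip each line's indentation.
def clean_command(command):
    lines = command.replace("\t", " ").splitlines()
    return "\n".join(line.strip() for line in lines)


# Matches "NAME=value" or "export NAME=value" at the start of a statement.
ASSIGNMENT = re.compile(r"^(?:export\s+)?[A-Za-z_][A-Za-z0-9_]*=")


# Hypothetical helper: 'export' statements count as variable assignments.
def is_variable_assignment(statement):
    return bool(ASSIGNMENT.match(statement))


print(clean_command("\tapt-get update\n    apt-get install -y git"))
print(is_variable_assignment("export PATH=/usr/local/bin"))  # True
```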

This should complete the first part of my proposal: once the concatenated commands are split, we can parse them in the next step.

One open issue is substituting the values of variables into our commands. I have put this aside and will handle it after we finish the whole proposal.


What is coming up next?

Make a plan for parsing the concatenated commands. I will try out the current filter_install_command() function and give some feedback on it.

Did I get stuck somewhere?

How to handle commands like `wget` and `curl`: they are download commands, so how can we tell whether they are installing something? This remains to be discussed.


Week 2 Check-in

Published: 06/14/2020

What I have done this week

This week, I finished splitting the concatenated commands into variables, commands, branches and loops, and prepared the code for further review.

The splitting works by matching keywords:

1. loop: (a) for ...... done; (b) while ...... done; (c) until ...... done;

2. branch: (a) case ...... esac; (b) if ...... fi;

3. variable: find '=' in a one-line command.

4. command: everything else is a command.

This is done with the Python string methods startswith() and endswith(); finding '=' can be done with a regex.
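The matching rules above can be sketched as follows. The `classify()` function is hypothetical and assumes each statement is one complete, already-cleaned construct; it also assumes a variable assignment starts with a valid shell name followed by '=', so that something like `echo a=b` is still treated as a command:

```python
import re

# (start keyword, end keyword) pairs from the rules above
LOOP_KEYWORDS = [("for ", "done"), ("while ", "done"), ("until ", "done")]
BRANCH_KEYWORDS = [("case ", "esac"), ("if ", "fi")]
# Assumption: an assignment is NAME=... at the very start of the statement
VARIABLE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*=")


def classify(statement):
    s = statement.strip()
    if any(s.startswith(start) and s.endswith(end) for start, end in LOOP_KEYWORDS):
        return "loop"
    if any(s.startswith(start) and s.endswith(end) for start, end in BRANCH_KEYWORDS):
        return "branch"
    if VARIABLE.match(s):
        return "variable"
    # Everything else is a command
    return "command"


print(classify("for i in 1 2; do echo $i; done"))  # loop
print(classify("apt-get install -y git"))  # command
```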

During testing, I ran into some corner cases where the command contains '\'. In a shell script, a '\' at the end of a line means the command continues on the next line. Our parser cannot remove it yet, so it may cause errors when finding the software that might be installed.

After discussing this with my mentor, we decided to put it aside and move on to parsing the commands. Most commands do not contain '\', and so far none of the commands containing '\' in our test file is an install command.

We will consider this corner case after we finish the main functionality.
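For reference, one possible way to handle this corner case later is to join continued lines before splitting. The sketch below is hypothetical and ignores quoting (inside single quotes a backslash-newline is literal in the shell, but this sketch would still join it):

```python
import re


# Hypothetical pre-processing step: replace "\" + newline (and the surrounding
# spaces or tabs) with a single space so a continued command becomes one line.
def join_continued_lines(script):
    return re.sub(r"[ \t]*\\\n[ \t]*", " ", script)


print(join_continued_lines("apt-get install -y \\\n    git \\\n    curl"))
# apt-get install -y git curl
```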



What is coming up next?

Make the necessary modifications to the code with my mentor, and move on to parsing the commands.

Hopefully we can use the existing function to finish the parsing.



GSoC is a great opportunity for me to be part of an open source project. This project is very cool and I am enjoying it.


Week 1 Blog Post

Published: 06/07/2020

What I have done this week

To split a shell script into its concatenated commands, I need to find the separators, which are:

1. ; ("A ; B": run A and then B, regardless of whether A succeeded)

2. && ("A && B": run B only if A succeeded)

3. || ("A || B": run B only if A failed)

In addition, I added more separators when splitting:

1. :; (':' stands for TRUE and can be skipped while splitting)

2. | (pipe)
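A first, quote-unaware version of the split over all of these separators could look like the sketch below (the name `split_commands()` is hypothetical). Note that `||` has to come before `|` in the alternation, otherwise a logical OR would be cut into two pipes:

```python
import re

# Separators from the lists above; "||" is tried before "|".
SEPARATORS = re.compile(r"&&|\|\||;|\|")


def split_commands(script):
    parts = (part.strip() for part in SEPARATORS.split(script))
    # ":" (left over from ":;") always succeeds and does nothing, so drop it
    return [part for part in parts if part and part != ":"]


print(split_commands(":; apt-get update && apt-get install -y git || echo failed; ls | wc -l"))
# ['apt-get update', 'apt-get install -y git', 'echo failed', 'ls', 'wc -l']
```

This version does not yet skip quoted strings, so a separator inside quotes would still cause a split.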

During parsing, I ran into cases where a separator appears inside a quoted string. Such a separator must not be split on, so we need to skip single- and double-quoted strings in the script.

I use backtracking control verbs to solve this issue; they can be used to skip certain patterns while matching. Here is the general template: `unwanted_pattern(*SKIP)(*F)|wanted_pattern`


The regex engine first matches the unwanted part. It then reaches (*F), which forces the match to fail, and while backtracking it crosses (*SKIP), which tells the engine to abandon this match attempt and resume matching from the end of the unwanted part instead of from the next character. So we can use this to skip single- and double-quoted strings, and combine it with the separators to split the shell script.
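Python's built-in `re` module does not support (*SKIP) or (*F) (the third-party `regex` package does). The sketch below is therefore a swapped-in technique with the same effect using only `re`: quoted strings are matched by the first alternatives and ignored, and only the captured group is treated as a separator. `split_outside_quotes()` is a hypothetical name:

```python
import re

# Alternatives 1 and 2 consume single- and double-quoted strings (the "unwanted" part);
# group 1 captures a real separator (the "wanted" part).
TOKEN = re.compile(r"""'[^']*'|"(?:\\.|[^"\\])*"|(&&|\|\||;|\|)""")


def split_outside_quotes(script):
    commands, start = [], 0
    for match in TOKEN.finditer(script):
        if match.group(1) is None:
            continue  # a quoted string: skip over it, like (*SKIP)(*F)
        commands.append(script[start:match.start()].strip())
        start = match.end()
    commands.append(script[start:].strip())
    return [command for command in commands if command]
```

For example, in `echo 'a; b' && ls` the `;` inside the single quotes is consumed as part of the quoted string, so only `&&` splits the script and the result is two commands.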


What is coming up next?

1. Split the concatenated commands into variables, commands, branches and loops.

2. Extract information from loops.

3. Parse the commands.


Week 1 Check-in

Published: 05/29/2020

Completed tasks

During the community bonding period, I worked on the first step of my proposal. I used shlex to split the shell script into tokens and then looked for the separators (&& and ;) to delimit the concatenated commands. After my mentor's review, we found that the code could be improved: we do not need to split the script into tokens first. Instead, we can look for the separators (&& and ;) directly to separate the commands. This saves a lot of time, since we no longer go through every word in the shell script.
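To make the difference concrete, here is a minimal sketch of both approaches, assuming a simple script with no quoting subtleties. Both function names are hypothetical; `punctuation_chars=True` is the standard-library `shlex` option that makes `;` and `&&` come out as separate tokens:

```python
import re
import shlex


# Token-first approach: tokenize the whole script, then group tokens between separators.
def split_with_tokens(script):
    lexer = shlex.shlex(script, posix=True, punctuation_chars=True)
    commands, current = [], []
    for token in lexer:
        if token in ("&&", ";"):
            if current:
                commands.append(" ".join(current))
            current = []
        else:
            current.append(token)
    if current:
        commands.append(" ".join(current))
    return commands


# Direct approach: search for the separators in the text without visiting every word.
def split_directly(script):
    return [part.strip() for part in re.split(r"&&|;", script) if part.strip()]


script = "apt-get update && apt-get install -y git; echo done"
print(split_with_tokens(script))
print(split_directly(script))
```

Besides the extra pass over every word, the token-based version rebuilds each command by joining tokens with spaces, so the original quoting is lost; the direct version keeps each command as it was written.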

To do

Use the split_command function and add functions to split branches (if and case) and loops (for and while). We will leave branches unparsed, since we do not know which branch will be executed. For loops, we will further develop a function to extract information from them. Here are the two small steps I am going to take:

1. Split commands (using the separators && and ;), and split branches and loops (by finding keywords).

2. Extract loops. This is a simple version: we will just extract the commands in the loop body, ignoring the loop logic itself.
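A minimal sketch of this simple version, assuming the loop is a single string ending in `done`. `extract_loop_commands()` is hypothetical: it takes whatever lies between the first `do` and the final `done`, splits it on `&&` and `;`, and ignores the loop variable and the number of iterations:

```python
import re

# First standalone "do" ... last "done" at the end of the string.
LOOP_BODY = re.compile(r"\bdo\b(.*)\bdone\b\s*$", re.DOTALL)


def extract_loop_commands(loop):
    match = LOOP_BODY.search(loop)
    if match is None:
        return []
    body = match.group(1)
    return [part.strip() for part in re.split(r"&&|;", body) if part.strip()]


print(extract_loop_commands("for pkg in git curl; do apt-get install -y $pkg; echo ok; done"))
# ['apt-get install -y $pkg', 'echo ok']
```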


1. Command parsing has already been implemented, so this step is the key point: once the two small steps above are finished, we should be able to parse the shell script in a Dockerfile RUN command to find out what software may be installed.

2. Try to break big issues into small, independent ones.
