WangJL's Blog

Week 3 Blog Post

Published: 07/09/2020

Sorry for the late post.

What I have done this week

While testing the command-parsing function, I found that long runs of whitespace were being parsed as package names, so I use shlex to remove them.

First, use shlex to split the command into separate words, then join them with single spaces (' '.join(shlex.split(command))). This removes long runs of whitespace, tabs and line indentation.
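As a minimal sketch of the step above (the helper name normalize_whitespace is only for illustration and is not part of Tern):

```python
import shlex


def normalize_whitespace(command):
    """Collapse runs of spaces, tabs and newlines into single spaces."""
    # shlex.split() tokenizes the command the way a POSIX shell would,
    # so any amount of whitespace between words is dropped.
    return ' '.join(shlex.split(command))


print(normalize_whitespace("apt-get   install -y \t curl \n    git"))
# prints: apt-get install -y curl git
```

One thing to keep in mind: shlex.split() also consumes quote characters, so a quoted argument comes back without its quotes after the join.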


After discussion with my mentor, we are now at the final step of the shell script parser: updating the functions to enable this parser in Tern.

Did I get stuck somewhere?



Week 3 Check-in

Published: 06/28/2020

What I have done this week

Now that we can parse a shell script into statements, we need to filter the install commands and extract what each command installs.

I use filter_install_command() to get the installed packages from the commands. It works well with package-manager commands like apt-get.

But commands like wget are download or file-management commands, and we still need to figure out how to deal with them.
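To illustrate the difference between the two kinds of commands, here is a rough sketch. It is not Tern's filter_install_command(); the helper packages_from_install is hypothetical and only assumes that package names are the non-option words after the word "install".

```python
import shlex


def packages_from_install(command):
    """Return the package names of an 'apt-get install' style command.

    Hypothetical helper for illustration only.
    """
    tokens = shlex.split(command)
    if 'install' not in tokens:
        # Commands such as wget or tar have no install subcommand,
        # so this simple rule finds nothing for them.
        return []
    after_install = tokens[tokens.index('install') + 1:]
    # Skip options such as -y or --no-install-recommends.
    return [tok for tok in after_install if not tok.startswith('-')]


print(packages_from_install("apt-get install -y curl git"))
# prints: ['curl', 'git']
print(packages_from_install("wget http://example.com/x.tar.gz"))
# prints: []
```

The sketch treats every non-option word as a package, so options that take a separate argument would be misread. It is only meant to show why download commands need different handling.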


Next, I will modify the draft PR so that it can be merged, and try parsing commands like wget and tar, probably by adding these commands to snippets.yml.

Did I get stuck somewhere?

Not yet.


Week 2 Blog Post

Published: 06/19/2020

What I have done this week

1. Use clean_command() to remove tabs and line indentations.

2. Add code to extract statements for a loop in parse_shell_loop_and_branch().

3. After discussion with my mentor, I removed the pipe symbol '|' from the separators and added the 'export' command as a variable assignment.

This should finish the first part of my proposal; with the concatenated commands split, we can parse them in the next step.

There is an issue about replacing variables in our commands. I put this aside and will handle it after we finish the whole proposal.


Next, I will make a plan for how to parse the concatenated commands. I will try the current filter_install_command() function and give some feedback on it.

Did I get stuck somewhere?

How should we handle commands like `wget` and `curl`? They are download commands, but how can we tell whether they are installing something? This remains to be discussed.


Week 2 Check-in

Published: 06/14/2020

What I have done this week

This week, I finished the part that splits the concatenated commands into variable, command, branch and loop, and I am preparing it for further review.

In the splitting part, the idea is to match keywords:

1. loop: (a) for ...... done; (b) while ...... done; (c) until ...... done;

2. branch: (a) case ...... esac; (b) if ...... fi;

3. variable: find '=' in a one-line command.

4. command: the rest are commands.

This is done with the Python string methods startswith() and endswith(); finding '=' can be done with a regex.
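The keyword matching above could be sketched roughly as follows. The function name classify and the exact regex are assumptions for illustration, not the code in the PR.

```python
import re

# A variable assignment is assumed to look like NAME=... at the start of the line.
ASSIGNMENT = re.compile(r'^[A-Za-z_][A-Za-z0-9_]*=')

LOOP_OPENERS = ('for ', 'while ', 'until ')
BRANCH_PAIRS = {'if ': 'fi', 'case ': 'esac'}


def classify(statement):
    """Label a statement as 'loop', 'branch', 'variable' or 'command'."""
    s = statement.strip()
    # str.startswith() accepts a tuple of prefixes.
    if s.startswith(LOOP_OPENERS) and s.endswith('done'):
        return 'loop'
    for opener, closer in BRANCH_PAIRS.items():
        if s.startswith(opener) and s.endswith(closer):
            return 'branch'
    if ASSIGNMENT.match(s):
        return 'variable'
    # Everything else is treated as a plain command.
    return 'command'


print(classify("for i in 1 2; do echo $i; done"))  # prints: loop
print(classify("if [ -f x ]; then rm x; fi"))      # prints: branch
print(classify("VERSION=1.2"))                     # prints: variable
print(classify("apt-get install -y curl"))         # prints: command
```

Anchoring the regex at the start of the line keeps a command such as ./configure --prefix=/usr from being labeled as a variable just because it contains '='.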

During testing, I met some corner cases where the command contains '\'. '\' marks a line continuation in a shell script; our parser cannot remove it, so it may cause errors when finding the possibly installed software.

After discussion with my mentor, we will put this aside and move on to parsing the commands. Most commands do not contain '\', and so far none of the commands containing '\' in our test file is an install command.

We will consider this corner case after we finish the main function.



Next, I will make the necessary modifications to the code with my mentor and move on to parsing the commands.

Hopefully we can use the existing function to finish the parsing.



GSoC is a very good opportunity for me to be part of an open source project. This project is very cool and I am enjoying it.


Week 1 Blog Post

Published: 06/07/2020

What I have done this week

To split a shell script into concatenated commands, I need to find the separators, which are:

1. ; ("A ; B": run A and then B, regardless of whether A succeeded)

2. && ("A && B": run B if A succeeded)

3. || ("A || B": run B if A failed)

Besides, I added more separators while splitting:

1. :; (stands for TRUE, can be skipped while splitting)

2. | (pipe)
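A naive version of this split can be sketched with the standard re module. The name split_commands is hypothetical, and the sketch does not yet account for separators inside quoted strings.

```python
import re

# '||' is listed before '|' so that the two-character operator is tried first.
SEPARATORS = re.compile(r'\s*(?:&&|\|\||;|\|)\s*')


def split_commands(script):
    """Split a one-line script at ;, &&, || and |."""
    parts = SEPARATORS.split(script)
    # Drop empty pieces and the ':' no-op that is left over from ':;'.
    return [p for p in parts if p and p != ':']


print(split_commands("apt-get update && apt-get install -y curl; echo ok || exit 1"))
# prints: ['apt-get update', 'apt-get install -y curl', 'echo ok', 'exit 1']
print(split_commands(":; cat f | grep x"))
# prints: ['cat f', 'grep x']
```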

During parsing, I met a situation where the separator is inside a quoted string. In this case, the script should not be split at that separator, so we need to skip single- and double-quoted strings in the script.

I use Backtracking Control Verbs to solve this issue. They can be used to skip a certain pattern while matching. The template has the form `unwanted(*SKIP)(*F)|wanted`.


The regex engine first matches the unwanted part and then reaches (*SKIP)(*F), which tells it to abandon the match and continue matching from the rest of the text instead of backtracking into the skipped part. We can use this to skip single- and double-quoted strings and, combined with the separators, to split the shell script.
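Python's standard re module does not support (*SKIP)(*F); the verbs are available in PCRE and in the third-party regex module. As a sketch of the same idea using only the standard library, the swapped-in technique below puts the quoted strings first in an alternation and captures only the separators, so a separator inside quotes is consumed as part of the quoted string and never seen as a split point. The name split_outside_quotes is hypothetical.

```python
import re

TOKEN = re.compile(r'''
    '[^']*'                 # single-quoted string: matched but not captured
  | "(?:\\.|[^"\\])*"       # double-quoted string, allowing backslash escapes
  | (&&|\|\||;|\|)          # group 1: a separator outside any quotes
''', re.VERBOSE)


def split_outside_quotes(script):
    """Split at separators, ignoring separators inside quoted strings."""
    parts, start = [], 0
    for match in TOKEN.finditer(script):
        if match.group(1) is None:
            continue  # a quoted string was matched; leave it in place
        parts.append(script[start:match.start()].strip())
        start = match.end()
    parts.append(script[start:].strip())
    return [p for p in parts if p]


script = 'echo "a && b; c" && apt-get install -y curl; echo ' + "'x || y'"
for part in split_outside_quotes(script):
    print(part)
```

Here the script is split into three commands, and the separators inside the two quoted strings are left untouched.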


Next steps:

1. Split the concatenated commands into variable, command, branch and loop.

2. Extract info from loop.

3. Parse the command.
