Articles on WangJL's Blog https://blogs.python-gsoc.org Updates on different articles published on WangJL's Blog en Sat, 29 Aug 2020 11:32:48 +0000 Week 7 Blog Post https://blogs.python-gsoc.org/en/wangjls-blog/week-7-blog-post/ <p>This week I summarized my work during GSoC 2020. This is the last week of GSoC, and I have had a great summer.</p> <p>I will continue working on the Tern project. I am now focusing on multistage dockerfile analysis, and we have completed the first step.</p> <p>However, I ran into an issue with building the multistage dockerfile. It is a tricky issue, and I am working on it.</p> <p>Thanks to my mentors and the Python Software Foundation!</p> Hazard15020@gmail.com (WangJL) Sat, 29 Aug 2020 11:32:48 +0000 https://blogs.python-gsoc.org/en/wangjls-blog/week-7-blog-post/ GSoC 2020: Final Code Submission https://blogs.python-gsoc.org/en/wangjls-blog/gsoc-2020-final-code-submission/ <p><strong>My proposal</strong></p> <p>Organization: Python Software Foundation</p> <p>Tag: Tern</p> <p>Repository link: <a href="https://github.com/tern-tools/tern">https://github.com/tern-tools/tern</a>.</p> <p>Title: Use shlex to parse Dockerfile RUN instruction commands.</p> <p>Details: Use shlex to parse Dockerfile RUN instruction commands. On looking at the type of parsing needed for full shell scripts embedded in the RUN command, we may need to develop a shell script parser to catch all places where software could have been installed.</p> <p> </p> <p><strong>Previous work on the project</strong></p> <p><u>Work towards docs:</u></p> <p>1. Document YAML data output that Tern produces. <a href="https://github.com/tern-tools/tern/pull/561">https://github.com/tern-tools/tern/pull/561</a>.</p> <p><u>Work towards dockerfile analysis:</u></p> <p>1. Record git project name and sha. <a href="https://github.com/tern-tools/tern/pull/571">https://github.com/tern-tools/tern/pull/571</a>.</p> <p>2. Parsing ARG variables. 
<a href="https://github.com/tern-tools/tern/pull/580">https://github.com/tern-tools/tern/pull/580</a>.</p> <p>3. Find Git Project URL. <a href="https://github.com/tern-tools/tern/pull/606">https://github.com/tern-tools/tern/pull/606</a>.</p> <p><u>Work towards bug fixing:</u></p> <p>1. Fix linting error for helper.py. <a href="https://github.com/tern-tools/tern/pull/650">https://github.com/tern-tools/tern/pull/650</a>.</p> <p>2. Fix linting error for generator.py. <a href="https://github.com/tern-tools/tern/pull/651">https://github.com/tern-tools/tern/pull/651</a>.</p> <p> </p> <p><strong>Work during GSoC</strong></p> <p><u>Work towards shell script parser:</u></p> <p>1. Using Regex to split shell script. <a href="https://github.com/tern-tools/tern/pull/717">https://github.com/tern-tools/tern/pull/717</a>.</p> <p>2. Add test dockerfiles for split shell script. <a href="https://github.com/tern-tools/tern/pull/718">https://github.com/tern-tools/tern/pull/718</a>.</p> <p>3. Update functions to use the shell script parser. <a href="https://github.com/tern-tools/tern/pull/756">https://github.com/tern-tools/tern/pull/756</a>.</p> <p>4. Add report for branch statement. <a href="https://github.com/tern-tools/tern/pull/764">https://github.com/tern-tools/tern/pull/764</a>.</p> <p><u>Work towards analysis on multistage dockerfile:</u></p> <p>1. Split multistage dockerfile by stage. <a href="https://github.com/tern-tools/tern/pull/774">https://github.com/tern-tools/tern/pull/774</a>.</p> <p>2. Analyze multistage. <a href="https://github.com/tern-tools/tern/pull/786">https://github.com/tern-tools/tern/pull/786</a>. (Still a work in progress).</p> <p><u>Work towards bug fixing:</u></p> <p>1. Bug fix with Dockerfile RUN parsing. 
<a href="https://github.com/tern-tools/tern/pull/773">https://github.com/tern-tools/tern/pull/773</a>.</p> <p> </p> <p><b>Progress on my GSoC 2020</b></p> <p>During the preparation period of GSoC 2020, I got familiar with the basic operations on GitHub (filing an issue, how to commit) by working on the docs (PR #561). Then I picked up the issue on dockerfile analysis. I quickly read through the code and tried making changes to it. My mentors were very helpful and taught me a lot about coding style and logic.</p> <p>I chose the proposal on the shell script parser. It seemed tricky at first, but I expected it to work in the end if I took the plan step by step. The first step was to separate the commands, the next was to pick out the keywords, and the last was to analyze each command. We had a weekly meeting over Zoom to keep track of my progress and resolve problems. I finished my proposal by the beginning of August.</p> <p>Next, I began to work on multistage dockerfile analysis, which is related to my previous work. My plan is to split the multistage dockerfile, build the image and analyze the image. I have now finished the first step and will keep working on the following steps.</p> <p>Thanks to my mentors; your help was greatly appreciated. Tern is awesome, and I like it! This is the first time I have participated in an open source project, and I have had a great summer in GSoC!</p> Hazard15020@gmail.com (WangJL) Mon, 24 Aug 2020 15:37:45 +0000 https://blogs.python-gsoc.org/en/wangjls-blog/gsoc-2020-final-code-submission/ Week 7 Check-in https://blogs.python-gsoc.org/en/wangjls-blog/week-7-check-in-4/ <p><strong>What I have done this week</strong></p> <p>1. Made changes to the PR (Split the multistage dockerfile for build), and after discussion with my mentor, the PR was merged.</p> <p>2. Rebased the PR (Analyze multistage dockerfile) onto the latest branch. 
This draft PR is used to test the functionality; if it works, I will split it into small steps to implement it. So I will keep working on this issue after GSoC.</p> <p>3. Discussed the final evaluation with my mentor and prepared for it. I will write a blog post here to present my work this summer.</p> <p><strong>Next week</strong></p> <p>Finish my final evaluation.</p> Hazard15020@gmail.com (WangJL) Sun, 23 Aug 2020 03:37:48 +0000 https://blogs.python-gsoc.org/en/wangjls-blog/week-7-check-in-4/ Week 6 Blog Post https://blogs.python-gsoc.org/en/wangjls-blog/week-6-blog-post/ <p><strong>What I have done this week</strong></p> <p>1. Modified the PR on splitting the multistage dockerfile. The function works fine so far.</p> <p>2. Filed a draft PR on building and analyzing the multistage dockerfile. This PR is used to test feasibility and needs modifications. So far we can get the report on each stage. More tests will be run on this function.</p> <p><strong>Plan for next week</strong></p> <p>1. Run more tests on the draft PR and send feedback to my mentors.</p> <p>2. Try dockerfile lock.</p> <p><strong>Thoughts</strong></p> <p>I am not sure if this is the best way to implement analysis of a multistage dockerfile, but at least it should work.</p> Hazard15020@gmail.com (WangJL) Fri, 14 Aug 2020 01:39:18 +0000 https://blogs.python-gsoc.org/en/wangjls-blog/week-6-blog-post/ Week 6 Check-in https://blogs.python-gsoc.org/en/wangjls-blog/week-6-check-in-3/ <p><strong>What I have done this week</strong></p> <p>Work towards analyzing the multistage dockerfile. I combined the draft PR with the review from my mentors; the new commit is the first step of my plan. We split the multistage dockerfile into separate dockerfiles for build. Here are the changes in the new commit.</p> <p>1. Modified function check_multistage_dockerfile() to return.</p> <p>2. Removed function split_multistage_dockerfile() since we are working on the building stage. 
split_multistage_dockerfile() can be improved in the analysis stage.</p> <p><strong>To Do</strong></p> <p>1. Improve the readability of function check_multistage_dockerfile().</p> <p>2. Try building images and analyzing them.</p> <p><strong>Did I get stuck somewhere?</strong></p> <p>Not yet.</p> <p> </p> <p> </p> Hazard15020@gmail.com (WangJL) Sat, 08 Aug 2020 15:02:56 +0000 https://blogs.python-gsoc.org/en/wangjls-blog/week-6-check-in-3/ Week 5 Blog Post https://blogs.python-gsoc.org/en/wangjls-blog/week-5-blog-post/ <p>I am not feeling well this week and have asked my mentors for leave. I will catch up with my plan this weekend or next week.</p> Hazard15020@gmail.com (WangJL) Fri, 31 Jul 2020 12:06:30 +0000 https://blogs.python-gsoc.org/en/wangjls-blog/week-5-blog-post/ Week 5 Check-in https://blogs.python-gsoc.org/en/wangjls-blog/week-5-check-in-3/ <p><strong>What I have done this week</strong></p> <p>1. Fixed the bug. The shell script parser exited while parsing a complex RUN command. After debugging, I found that function <em>consolidate_commands()</em> entered an infinite loop in this case, so I updated the logic in this function to fix it.</p> <p>2. Build multistage dockerfile. I split the multistage dockerfile into separate dockerfiles for building. My mentor has reviewed my draft code; the logic is right but the implementation should be better.</p> <p><strong>To Do</strong></p> <p>1. Finish the multistage dockerfile building part. Improve the draft code.</p> <p>2. Recognize install snippets that follow environment variable declarations. <a href="https://github.com/tern-tools/tern/issues/770">Issue 770</a>.</p> <p><strong>Did I get stuck somewhere?</strong></p> <p>To perform the analysis on a multistage dockerfile, I need to learn how Tern deals with a simple Dockerfile. This is relevant to how I design the functions for the analysis of a multistage dockerfile. 
For now, I have a basic understanding; I need to go deeper while working on the analysis of the multistage dockerfile.</p> Hazard15020@gmail.com (WangJL) Sat, 25 Jul 2020 15:14:20 +0000 https://blogs.python-gsoc.org/en/wangjls-blog/week-5-check-in-3/ Week 4 Blog Post https://blogs.python-gsoc.org/en/wangjls-blog/week-4-blog-post/ <p><strong>What I have done this week</strong></p> <p>1. Gave a detailed plan for analyzing a multistage dockerfile and worked on step 1, which is splitting a multistage dockerfile by 'FROM'.</p> <p>2. Added a report on branching commands.</p> <p><strong>TO DO</strong></p> <p>1. Issue #772 reports a bug: tern exits with an error while parsing a complicated RUN command. Fix this issue.</p> <p>2. Work on analyzing the multistage dockerfile.</p> <p> </p> Hazard15020@gmail.com (WangJL) Sat, 18 Jul 2020 15:29:12 +0000 https://blogs.python-gsoc.org/en/wangjls-blog/week-4-blog-post/ Week 4 Check-in https://blogs.python-gsoc.org/en/wangjls-blog/week-4-check-in-2/ <p><strong>What I have done this week</strong></p> <p>Updated functions to implement a shell script parser for the RUN command.</p> <p>1. split_command(): Split a shell script into statements: variable, command, branch and loop.</p> <p>2. get_shell_commands(): Traverse the statements to pick out commands, including commands inside a loop.</p> <p>3. clean_command(): Remove tabs, line indentation and long whitespace in the command.</p> <p>This PR has been merged, and at the end of the week I added a report for branch statements, which were previously skipped during parsing.</p> <p>After discussion with my mentor, we are now done with the main part of the shell script parser, and next week we will move on to another issue.</p> <p><strong>To do</strong></p> <p>Perform analysis on multistage Dockerfiles, issue #612.</p> <p>Here are the steps:</p> <p>1. Split the Dockerfile by stage, making a single Dockerfile for each stage</p> <p>2. Build and analyze each stage</p> <p>3. 
For reporting, perhaps organize each stage as a different section and indicate that each is a build stage of the next. Pinning a multistage Dockerfile is straightforward.</p> <p>Implement the first step this week.</p> <p><strong>Did I get stuck somewhere?</strong></p> <p>No.</p> Hazard15020@gmail.com (WangJL) Sun, 12 Jul 2020 14:51:10 +0000 https://blogs.python-gsoc.org/en/wangjls-blog/week-4-check-in-2/ Week 3 Blog Post https://blogs.python-gsoc.org/en/wangjls-blog/week-3-blog-post/ <p>Sorry for the late post.</p> <p><strong>What I have done this week</strong></p> <p>While testing the parse command function, I found that long runs of whitespace were being parsed as package names, so I used shlex to remove them.</p> <p>First use shlex to split the command into separate words, and then join them with a single space ( ' '.join(shlex.split(command)) ); this removes long whitespace, tabs and line indentation.</p> <p><strong>TO DO</strong></p> <p>After discussion with my mentor, we are now at the final step of the shell script parser, which is updating the functions to enable this parser in Tern.</p> <p><strong>Did I get stuck somewhere?</strong></p> <p>No.</p> Hazard15020@gmail.com (WangJL) Thu, 09 Jul 2020 00:23:43 +0000 https://blogs.python-gsoc.org/en/wangjls-blog/week-3-blog-post/ Week 3 Check-in https://blogs.python-gsoc.org/en/wangjls-blog/week-3-check-in-3/ <p><strong>What I have done this week</strong></p> <p>Now that we can parse a shell script into statements, we need to filter the install commands and extract what each command installs.</p> <p>I use filter_install_command() to get the installed packages in the commands. It works well with package-manager commands like apt-get.</p> <p>But commands like wget are download or file-manager commands, and we need to figure out how to deal with them.</p> <p><strong>TO DO</strong></p> <p>Modify the draft PR so that it can be merged. Try parsing commands like wget and tar. 
Possibly add these commands to snippets.yml.</p> <p><strong>Did I get stuck somewhere?</strong></p> <p>Not yet.</p> Hazard15020@gmail.com (WangJL) Sun, 28 Jun 2020 09:07:45 +0000 https://blogs.python-gsoc.org/en/wangjls-blog/week-3-check-in-3/ Week 2 Blog Post https://blogs.python-gsoc.org/en/wangjls-blog/week-2-blog-post/ <p><strong>What I have done this week</strong></p> <p>1. Used clean_command() to remove tabs and line indentation.</p> <p>2. Added code to extract statements from a loop in parse_shell_loop_and_branch().</p> <p>3. After discussion with my mentor, I removed the pipe symbol '|' from the separators and added the command 'export' as a variable assignment.</p> <p>This should finish the first part of my proposal; with the concatenated commands split, we can parse them in the next step.</p> <p>There is also the issue of replacing variables in our commands; I have put this aside and will handle it after we finish the whole proposal.</p> <p><strong>TO DO</strong></p> <p>Make a plan for parsing the concatenated commands. I will try the current filter_install_command() function and give some feedback on it.</p> <p><strong>Did I get stuck somewhere?</strong></p> <p>How should we handle commands like `wget` and `curl`? They are download commands, but how can we tell whether they are installing something? This remains to be discussed.</p> Hazard15020@gmail.com (WangJL) Fri, 19 Jun 2020 12:25:50 +0000 https://blogs.python-gsoc.org/en/wangjls-blog/week-2-blog-post/ Week 2 Check-in https://blogs.python-gsoc.org/en/wangjls-blog/week-2-check-in-5/ <p><strong>What I have done this week</strong></p> <p>This week, I finished splitting the concatenated commands into variable, command, branch and loop statements, and I am preparing for further review.</p> <p>In the splitting part, the idea is to match keywords:</p> <p>1. loop: (a) for ...... done; (b) while ...... done; (c) until ...... done;</p> <p>2. branch: (a) case ...... 
esac; (b) if ...... fi;</p> <p>3. variable: find '=' in a one-line command.</p> <p>4. command: the rest are commands.</p> <p>This is done using the Python string methods startswith() and endswith(); finding '=' can be done with a regex.</p> <p>During testing, I met some corner cases where the command contains '\'. '\' marks a line continuation in a shell script; our parser cannot remove it, so it may cause errors when finding the possibly installed software.</p> <p>After discussion with my mentor, we will put this aside and move on to parsing the command. Most commands do not contain '\', and so far no command containing '\' in our test file is an install command.</p> <p>We will consider this corner case after we finish the main function.</p> <p> </p> <p><strong>TO DO</strong></p> <p>Make the necessary modifications to the code with my mentor, and move on to parsing the command.</p> <p>Hopefully we can use the existing function to finish the parsing.</p> <p> </p> <p><strong>Thoughts</strong></p> <p>GSoC is a very good opportunity for me to be part of an open source project. This project is very cool and I am enjoying it.</p> Hazard15020@gmail.com (WangJL) Sun, 14 Jun 2020 13:41:54 +0000 https://blogs.python-gsoc.org/en/wangjls-blog/week-2-check-in-5/ Week 1 Blog Post https://blogs.python-gsoc.org/en/wangjls-blog/week-1-blog-post/ <p><strong>What I have done this week</strong></p> <p>To split a shell script into concatenated commands, I need to find the separators, which are</p> <p>1. ; ("A ; B" Run A and then B, regardless of success of A)</p> <p>2. &amp;&amp; ("A &amp;&amp; B" Run B if A succeeded)</p> <p>3. || ("A || B" Run B if A failed)</p> <p>Besides these, I added more separators while splitting:</p> <p>1. :; (stands for TRUE, can be skipped while splitting)</p> <p>2. | (pipe)</p> <p>During parsing, I met a situation where the separator is inside a quoted string. 
In this case, the script should not be split at the separator, so we need to skip single- and double-quoted strings in the script.</p> <p>I use Backtracking Control Verbs to solve this issue. They can be used to skip a certain pattern while matching. Here is the template:</p> <p><u>unwanted_pattern(*SKIP)(*F)|wanted_pattern</u></p> <p>The regex engine first matches the unwanted part. It then reaches (*SKIP)(*F): (*F) forces a failure, and when the engine backtracks, (*SKIP) makes it abandon the matched text and jump to the rest of the text to continue matching. So we can use this to skip single- and double-quoted strings while splitting the shell script.</p> <p><strong>TO DO</strong></p> <p>1. Split the concatenated commands into variable, command, branch and loop.</p> <p>2. Extract info from loop.</p> <p>3. Parse the command.</p> Hazard15020@gmail.com (WangJL) Sun, 07 Jun 2020 05:47:38 +0000 https://blogs.python-gsoc.org/en/wangjls-blog/week-1-blog-post/ Week 1 Check-in https://blogs.python-gsoc.org/en/wangjls-blog/week-1-check-in-2/ <p><strong>Completed tasks</strong></p> <p>During the community bonding period, I worked on the first step of my proposal. I used shlex to split the shell script into tokens, and then found the separators (&amp;&amp;|;) to concatenate the commands. After the review from my mentor, we found that we could improve the code. We do not need to split into tokens first. Instead, we can directly find the separators (&amp;&amp;|;) to separate the commands. This saves a lot of time, since we do not go through every word in the shell script.</p> <p><strong>To do</strong></p> <p>Use the split_command function and add functions to split branches (if and case) and loops (for and while). We will leave branches unparsed since we do not know which branch will be executed. For loops, we will further develop a function to extract info from them. Here are the 2 small steps I am going to do.</p> <p>1. 
Split commands (using the separators (&amp;&amp;|;)), and split branches and loops (by finding keywords).</p> <p>2. Extract loops. This is a simple version: we will just extract the commands without considering the loop.</p> <p><strong>Notes</strong></p> <p>1. Parsing commands has already been implemented, so this part is the key point. After these 2 small steps are finished, we should be able to parse the shell script in a Dockerfile RUN command to find what software may be installed.</p> <p>2. Try to break a big issue into small independent ones.</p> Hazard15020@gmail.com (WangJL) Fri, 29 May 2020 05:02:51 +0000 https://blogs.python-gsoc.org/en/wangjls-blog/week-1-check-in-2/