Niraj-Kamdar's Blog

GSoC: Week 13: Create GitHub Action

Niraj-Kamdar
Published: 08/24/2020

What did I do this week?

I worked on documentation this week. I added an example GitHub Actions workflow so that users can easily integrate CVE Binary Tool into their CI/CD pipelines. The workflow uses actions/setup-python to set up Python for CVE Binary Tool and actions/cache to cache the database and dependencies, which decreases CI runtime. The example uses the latest version of CVE Binary Tool because the current stable version lacks many features, such as config files and HTML reports. It uses actions/upload-artifact to upload the generated report as a GitHub artifact, which can be downloaded later.

I also opened a pull request to integrate caching into our CI, which should reduce CI runtime a little.

What am I doing this week? 

I am going to start writing the final project report this week, and I will complete it before 31st August.

Have I got stuck anywhere?

No, I didn't get stuck this week.


GSoC: Week 12: Scanning docker

Niraj-Kamdar
Published: 08/17/2020

What did I do this week?

I worked on documentation this week. I added a how-to guide for scanning a Docker image, which was requested by one of our users. I listed 2 different ways to scan a Docker image:

  1. Install cve-bin-tool inside a Docker container, scan the directory just as you normally would, and export the report to the host.
  2. Export the directory you want to scan from the container to the host and scan it on the host.

I also discussed the pros and cons of both methods. I found that when multiple files contain the same product, CVEScanner performs unnecessary database I/O, which can become a performance bottleneck. So, I short-circuited the flow when a product has already been scanned. I also fixed the filename generation bugs reported by Harmandeep Singh and reviewed the exclude path PR.
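The short-circuit idea can be sketched roughly as follows. This is only an illustration and not the actual CVEScanner code: scan_products, query_db and fake_query are hypothetical names, and the product tuples and the CVE placeholder are made up.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Product = Tuple[str, str, str]  # (vendor, product, version)


def scan_products(
    products: Iterable[Product],
    query_db: Callable[[Product], List[str]],
) -> Dict[Product, List[str]]:
    """Look up CVEs for each product, hitting the database once per product."""
    results: Dict[Product, List[str]] = {}
    for product in products:
        if product in results:
            # Short-circuit: this product was already scanned,
            # so skip the database lookup entirely.
            continue
        results[product] = query_db(product)
    return results


# Hypothetical stand-in for the database lookup; it records every call.
calls: List[Product] = []


def fake_query(product: Product) -> List[str]:
    calls.append(product)
    return ["CVE-XXXX-0001"]  # placeholder, not a real CVE


curl = ("haxx", "curl", "7.34.0")
binutils = ("gnu", "binutils", "2.31")
# curl is detected in two different files, but is only looked up once.
results = scan_products([curl, binutils, curl], fake_query)
```

With a dictionary keyed by product, the "already scanned?" check is a constant-time lookup, so repeated detections cost almost nothing.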

What am I doing this week? 

I still have some documentation left to do. In these last 2 weeks, I am also going to improve the tests for the modules I have created, go through the entire code base, and add appropriate comments and docstrings for new contributors.

Have I got stuck anywhere?

No, I didn't get stuck this week.


GSoC: Week 11: InputEngine.add(paths)

Niraj-Kamdar
Published: 08/11/2020

Hello guys, 

What did I do this week?

After we added support for file paths in the output, I found a bug that broke cve_scanner whenever the --input-file flag was used to scan CVEs from a CSV or JSON file. I also found several other issues in the previous data structures, listed below:

  1. The old CVEData was a NamedTuple, and since the newly added path attribute was mutable, it could create hard-to-find bugs.
  2. To update a path, we needed to scan all_cve_data to find the product whose paths we wanted to append to.
    Time complexity: O(n**2), which can be reduced to O(n) with a better structure.
  3. Passing vendor, product and version separately to different functions decreased readability. A ProductInfo structure packs this data together nicely, since we never need any of these values alone.
  4. The TriageData structure wasn't in sync with the old CVEData, so csv2cve or input_engine would break.

So, I decided to change the current structure to handle all of these issues. Previously, all_cve_data was a Set[CVEData], which was sufficient at the time because all attributes of CVEData were immutable and we only used the set to remove duplicates from the output. But once we introduced the paths attribute, we needed to update paths every time we detected the same product again, and a set has no easy way to retrieve a value stored in it (sets aren't made for storing mutable types) apart from looping over the whole set to find what we are looking for. So, I refactored the structure into two parts: 1) an immutable ProductInfo(vendor, product, version) and 2) a mutable CVEData(list_of_cves, paths_of_cves). all_cve_data now stores a mapping from ProductInfo to CVEData, so we can access the CVEData of a product without traversing the whole of all_cve_data. I also moved all data structures into utils to avoid circular imports and added a test for paths.
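A minimal sketch of this two-part structure is shown below. It is not the actual code from utils: the field names cves and paths, the add_path helper and the sample product are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, List, NamedTuple, Set


class ProductInfo(NamedTuple):
    """Immutable and hashable, so it can be used as a dictionary key."""

    vendor: str
    product: str
    version: str


@dataclass
class CVEData:
    """Mutable record that grows as the same product is found again."""

    cves: List[str] = field(default_factory=list)
    paths: Set[str] = field(default_factory=set)


# Mapping from ProductInfo to CVEData; missing keys get an empty CVEData.
all_cve_data: DefaultDict[ProductInfo, CVEData] = defaultdict(CVEData)


def add_path(product_info: ProductInfo, path: str) -> None:
    """Record one more file path for a product via a direct key lookup."""
    all_cve_data[product_info].paths.add(path)


curl = ProductInfo("haxx", "curl", "7.34.0")
add_path(curl, "/usr/bin/curl")
# An equal ProductInfo built elsewhere maps to the same entry.
add_path(ProductInfo("haxx", "curl", "7.34.0"), "/opt/tools/curl")
```

Because the key is immutable and the value is mutable, updating paths for an already-seen product no longer requires looping over every stored entry.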

What am I doing this week? 

I will continue to improve the documentation of the code I wrote, for example by adding docstrings and comments. I am also going to add the requested how-to guides to improve the user experience.

Have I got stuck anywhere?

No, I didn't get stuck this week.


GSoC: Week 10: ''' Documentation '''

Niraj-Kamdar
Published: 08/03/2020

Hello guys, 

I hope you are all doing great. Today, I am going to talk about what I did this week.

What did I do this week?

I am working on documentation for the code I produced during the first two phases. I have updated the user manual and README, and I am going to update the other documentation as well. I have also written user manual sections for the new input engine features and the config file feature.

What am I doing this week? 

I talked with a user and we came to the conclusion that our documentation lacks some important how-to guides, which are necessary as Daniele Procida explains in his amazing PyCon talk. So, I am going to create a how-to directory inside our doc folder that will contain recipes for different use cases. For example:

  1. How to change the theme of the HTML report?
  2. How to add a custom checker (out-of-tree checker)?
  3. How to scan a Docker image?
  4. How to scan in parallel?

Have I got stuck anywhere?

No, I didn't get stuck anywhere this week.

 


GSoC: Week 9: ConfigParser()

Niraj-Kamdar
Published: 07/26/2020

What did I do this week?

I researched various configuration file formats and compiled the outcomes in an issue: Discussion: Configuration file format. Some users recommended INI files because the format is very old and still widely used, but INI has no built-in type support and lacks a formal specification. configparser parses everything as a string, so we have to post-process the parsed data to convert it into something usable.
With configparser, our example data is parsed into the following dictionary:

{
    "checker": {
        "runs": "[curl,binutils]",  # This has to be transformed into list 
        "skips": "[python,bzip2]"
    },
    "input": {
        "directory": "test/assets",
        "input_file": "test/csv/triage.csv"
    },
}

So, parsing an INI file won't be as easy as parsing TOML or YAML, which support complex data types by default. Other data types, such as integers and floats, are not converted automatically either.
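The extra processing step can be illustrated with the standard library's configparser. The INI text below is an assumption, reconstructed to match the dictionary above, and parse_list is a hypothetical helper rather than code from cve-bin-tool.

```python
import configparser
from typing import List

# Assumed INI input that would produce the dictionary shown above.
INI_TEXT = """
[checker]
runs = [curl,binutils]
skips = [python,bzip2]

[input]
directory = test/assets
input_file = test/csv/triage.csv
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# configparser hands back plain strings, brackets included.
raw_runs = parser["checker"]["runs"]


def parse_list(value: str) -> List[str]:
    """Turn a string such as "[curl,binutils]" into a list of names."""
    inner = value.strip().strip("[]")
    return [item.strip() for item in inner.split(",") if item.strip()]


runs = parse_list(raw_runs)
skips = parse_list(parser["checker"]["skips"])
```

Every option that is not a plain string would need a converter like this, which is the overhead that a typed format avoids.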

TOML is very similar to INI, and it supports complex data types by default. The same data, parsed from TOML:

{
    'checker': {
        'runs': ['curl', 'binutils'],  # this is correctly parsed as list
        'skips': ['python', 'bzip2']
    },
    'input': {
        'directory': 'test/assets',
        'input_file': 'test/csv/triage.csv'
    },
}

I concluded that TOML and YAML are both very easy to read and write for both machines and humans, so we should use one of them. We discussed which format to use in our meeting, and my mentors had various opinions on it. The summary of our discussion was: "The top contenders among our team seem to be TOML (readable, familiar to python folk and close enough to INI for skill transfer for windows folk) and YAML (which might be a better fit for the dev-ops community that we hope will be among the biggest users of cve-bin-tool)."

Since parsers for both formats produce similar Python structures, I created a ConfigParser class that can parse both YAML and TOML files, and I added basic tests for it. I also changed the architecture of the main function in cli.py to add support for config files, and I made sure that options given on the command line take precedence over options from the config file. I am going to add tests for this as well. I also fixed the quiet mode bugs.
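The precedence rule can be sketched as a simple layered merge. This is an illustration under assumptions and not the code in cli.py: merge_options and the option names are hypothetical, and None is assumed to mean "not given".

```python
from typing import Any, Dict


def merge_options(
    defaults: Dict[str, Any],
    config: Dict[str, Any],
    cli: Dict[str, Any],
) -> Dict[str, Any]:
    """Merge option sources; later sources win: defaults < config < CLI."""
    merged = dict(defaults)
    for source in (config, cli):
        for key, value in source.items():
            # None means the option was not supplied by this source.
            if value is not None:
                merged[key] = value
    return merged


defaults = {"quiet": False, "format": "console"}
config_file = {"format": "html"}            # values read from a config file
cli_args = {"quiet": True, "format": None}  # format not given on the CLI

options = merge_options(defaults, config_file, cli_args)
overridden = merge_options(defaults, config_file, {"format": "csv"})
```

Here options keeps the format from the config file because the command line did not set it, while overridden takes the command-line value.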

What am I doing this week? 

I am going to write tests for config files in test_cli.py. Since I have completed almost all of the work related to InputEngine, I think it's a good time to document it.

Have I got stuck anywhere?

Yes. I need my quiet mode bug fix PR to be merged, since I changed TestCLI in it and I need the latest TestCLI to test ConfigParser.

 
