This article has been indexed from Security Boulevard
This post will shed some light on how we were able to optimise one of our frontends, reducing the typical project’s run time by half. We’ll also take a look at some of the pitfalls we encountered and how we can apply our changes to other projects as well.
Background
We maintain multiple frontends for converting source code and binary applications into their corresponding CPG representation. One frontend in particular, go2cpg, which converts Go source code, turned out to be slower than expected on certain larger repositories.
What we did
Since we generally have to test language frontends on publicly available code, all steps described here are based on the open-policy-agent/opa repository, which is big enough that the conversion takes a few seconds even on a beefy machine. All numbers reported below come from an 8-core / 16-thread CPU with 64 GB of RAM available and a fairly fast M.2 NVMe SSD.
The initial state was this: running the go2cpg generate command (which you’ll generally not see in this form, since our command line tool wraps it when you invoke sl analyze --go …) wasn’t as quick as we’d hoped:
go2cpg generate ./...
41.47s user 1.20s system 270% cpu 15.774 total
41.69s user 1.06s system 277% cpu 15.405 total
41.86s user 1.20s system 271% cpu 15.884 total
42.07s user 1.16s system 271% cpu 15.910 total
42.11s user 1.15s system 273% cpu 15.805 total
avg. 41.84s user 15.756s total
These measurements were taken by hand for illustration purposes, but for benchmarking we’ve adopted hyperfine, which in our case was then invoked like this:
hyperfine --warmup 3 "go2cpg generate -o /tmp/cpg.bin.zip ./..."
Note that we add a few warm-up runs to reduce the chance of disk caching, or perhaps some operation of the Go toolchain, interfering with the measurements. The /tmp directory was also set up as a RAM disk. Unfortunately /dev/null can’t be used because of how the ZIP archive gets written.
Our baseline was therefore roughly sixteen seconds in total.
Because we had already investigated this topic earlier, it was clear there were still a few areas to investigate, in particular further reducing the impact of compression and Prot […]