How much code do you need? (and can you debloat the rest?)

Benoit Baudry

Summer 2019

50th anniversary of Apollo 11

The Apollo 11 AGC
An iconic, small program

Small programs after 1969?

today we don't write much code, we reuse a lot
SQLite driver
microcontroller code
Commodore 64

Small Programs after 1969

The line mode browser was pretty small
Competition code, e.g.,
Flip dots with feelings in 1021 bytes
A tiny C program
Programs for very small things

Application code is big

Firefox is big
The Java Virtual Machine is big
ffmpeg is big (and very customizable :)
GAFAs are big

Why is code growth a problem?

code growth is an issue for code quality
big code is difficult to maintain
code growth can be a result of making the code easier to maintain
bandwidth issues when the code is downloaded
more difficult to understand and thus more difficult to contribute to
I'd add psychological aspects: developing new code is more fun than maintaining it, so it grows (because many work on development), but nobody (or few) want to maintain it
building the project (compile, test, package) becomes a problem for large code bases (Cf. the AMA session of yesterday ;-)

Why is code growth a problem?

Wikipedia's JavaScript initialisation on a budget
Stripping dependency bloat in VictoriaMetrics Docker image
Removing Kode
Reduce attack surfaces

Why does code size grow?

refactoring can make code size grow
reusing libraries that have a larger API than the one we actually need
adding new features
adding new platform support
obfuscation / making your code look complicated
keep all versions live / backward compatibility
code cloning
we don't remove old code that we don't use anymore (just in case :)
some managers expect code growth, it's a sign of progress :)
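
One item above, reusing libraries that have a larger API than the one we actually need, can be made concrete with a rough measurement. The sketch below is an illustration only, not a technique from the talk: it parses a piece of client code, collects the names accessed directly on one module, and compares them with the module's public names. The helper names `used_api` and `api_usage_ratio` and the sample `client_code` are hypothetical, and the analysis deliberately ignores aliased imports (`import json as j`) and `from json import ...`.

```python
import ast
import importlib


def used_api(source: str, module_name: str) -> set:
    """Collect attribute names accessed directly on `module_name` in `source`."""
    used = set()
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id == module_name
        ):
            used.add(node.attr)
    return used


def api_usage_ratio(source: str, module_name: str) -> float:
    """Fraction of the module's public names that `source` actually touches."""
    module = importlib.import_module(module_name)
    # Assumption: "public API" means __all__ if defined, else non-underscore names.
    fallback = [name for name in dir(module) if not name.startswith("_")]
    public = set(getattr(module, "__all__", fallback))
    if not public:
        return 0.0
    return len(used_api(source, module_name) & public) / len(public)


# Hypothetical client code: it is only parsed here, never executed.
client_code = """
import json
data = json.loads('{"a": 1}')
print(json.dumps(data))
"""

print(sorted(used_api(client_code, "json")))   # names the client really uses
print(api_usage_ratio(client_code, "json"))    # share of the public API in use
```

A low ratio does not prove the rest of the library is removable, since the used functions may rely on the unused ones internally; it only hints at how much of the reused API is there "for free".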

Code debloating techniques

Cimplifier: Automatically Debloating Containers. ESEC/FSE 2017.
Binary Control-Flow Trimming. CCS 2019.
Is Static Analysis Able to Identify Unnecessary Source Code? TOSEM 2020.
Slimium: Debloating the Chromium Browser with Feature Subsetting. CCS 2020.

A Comprehensive Study of Bloated Dependencies in the Maven Ecosystem

Intuition: package managers and automated builds encourage software reuse, and in doing so introduce bloated dependencies
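
To make the intuition concrete, here is a minimal sketch of what detecting a bloated dependency could look like. It assumes that a dependency counts as bloated when none of the classes it provides is referenced by the project; the `Dependency` model, the `find_bloated` helper and the sample data are all hypothetical, and this is not how DepClean itself is implemented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dependency:
    """A declared dependency and the classes it provides (hypothetical model)."""
    name: str
    provided_classes: set[str] = field(default_factory=set)


def find_bloated(dependencies: list[Dependency], referenced: set[str]) -> list[str]:
    """Return the names of dependencies none of whose classes are referenced."""
    return [
        dep.name
        for dep in dependencies
        if not (dep.provided_classes & referenced)
    ]


# Hypothetical project: three declared dependencies, only one actually used.
declared = [
    Dependency("lib-json", {"json.Parser", "json.Writer"}),
    Dependency("lib-xml", {"xml.Reader"}),
    Dependency("lib-log", {"log.Logger"}),
]
used = {"json.Parser", "app.Main"}

bloated = find_bloated(declared, used)
print(bloated)  # ['lib-xml', 'lib-log']
```

In a real build the hard part is computing the set of referenced classes, for example from compiled bytecode, and accounting for reflection or other dynamic loading that a static check like this one cannot see.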

A Comprehensive Study of Bloated Dependencies in the Maven Ecosystem

Let's look at one build file, for the jxls library
as well as a build file for a dependency of jxls

A Comprehensive Study of Bloated Dependencies in the Maven Ecosystem

9K artefacts and 700K dependencies
75% of dependencies are bloated
Developers care
removed 131 dependencies in 30 projects
experiments at SAP and Ericsson ongoing
DepClean Maven dependency debloating tool
To appear in EMSE journal, 2020.

Conclusion

There is lots of code bloat
from libc to Chrome
caused by reuse, feature creep, usage, etc.
Software developers care
for security
for performance
It is a relevant research topic
that is hard
that matters

Thank you!

This work is a collaboration with César Soto-Valero, Thomas Durieux, Nicolas Harrand, Martin Monperrus, at the KTH Royal Institute of Technology and is supported by the WASP program.