How much code do you need?
===

(and can you debloat the rest?)

[Benoit Baudry](https://softwarediversity.eu/)
baudry@kth.se

---

![winter](https://softwarediversity.eu/kth-benevol.jpg)

---

Summer 2019
---

![summer](https://softwarediversity.eu/summer2019.jpg =1000x580)

---

Summer 2019
---

![apollo](https://www.archives.gov/files/news/images/apollo-11-aldrin-on-moon-banner.jpg)

---

50th anniversary of Apollo 11
---

- The Apollo 11 [AGC](https://github.com/chrislgarry/Apollo-11)
- An iconic, [small](https://fedtechmagazine.com/article/2019/07/tech-behind-apollo-11s-guidance-computer) program

---

Small programs after 1969?
---

* today we don't write much code, we reuse a lot
* SQLite driver
* microcontroller code
* Commodore 64

---

Small programs after 1969
---

- The [line mode browser](https://line-mode.cern.ch/) was [pretty small](https://github.com/w3c/libwww)
- Competition code, e.g.,
  - [Flip dots with feelings in 1021 bytes](http://www.p01.org/MONOSPACE/)
  - [A tiny C program](https://bellard.org/mersenne.html)
- Programs for _very_ small things

---

Application code is big
---

* [Firefox](https://www.openhub.net/p/firefox) is big
* The [Java Virtual Machine](https://www.openhub.net/p/openjdk) is big
* [ffmpeg](https://www.openhub.net/p/ffmpeg) is big (and very [customizable](https://gist.github.com/tayvano/6e2d456a9897f55025e25035478a3a50) :)
* [GAFAs are big](https://www.visualcapitalist.com/millions-lines-of-code/)

---

Why is code growth a problem?
---

* code growth is an issue for code quality
* big code is difficult to maintain
* code growth can be a result of making the code easier to maintain
* bandwidth issues when the code is downloaded
* more difficult to understand and thus more difficult to contribute to
* I'd add psychological aspects: developing new code is more fun than maintaining it... so it grows (because many work on development), but nobody (or few) wants to maintain it
* building the project (compile, test, package) becomes a problem for large code bases (Cf. the AMA session of yesterday ;-) )

---

Why is code growth a problem?
---

* [Wikipedia's JavaScript initialisation on a budget](https://phabricator.wikimedia.org/phame/post/view/175/wikipedia_s_javascript_initialisation_on_a_budget/)
* [Stripping dependency bloat in VictoriaMetrics Docker image](https://valyala.medium.com/stripping-dependency-bloat-in-victoriametrics-docker-image-983fb5912b0d)
* [Removing Kode](https://cacm.acm.org/magazines/2020/12/248794-removing-kode/fulltext)
* [Reduce attack surfaces](https://www.onr.navy.mil/en/Media-Center/Press-Releases/2016/Software-Bloat)

---

Why does code size grow?
---

* refactoring can make code size grow
* reusing libraries that have a larger API than the one we actually need
* adding new features
* adding new platform support
* obfuscation / making your code look complicated
* keeping all versions live / backward compatibility
* code cloning
* we don't remove old code that we don't use anymore (just in case :)
* some managers expect code growth, it's a sign of progress :)

---

Code debloating techniques
---

* [Cimplifier: Automatically Debloating Containers](http://pages.cs.wisc.edu/~vrastogi/static/papers/rddjm17.pdf). ESEC/FSE 2017.
* [Binary Control-Flow Trimming](https://www.researchgate.net/profile/Kevin_Hamlen/publication/334735194_Binary_Control-Flow_Trimming/links/5d3e5a52299bf1995b53cf27/Binary-Control-Flow-Trimming.pdf). CCS 2019.
* [Is Static Analysis Able to Identify Unnecessary Source Code?](https://www.cqse.eu/publications/2020-unnecessary-code-tosem.pdf). TOSEM 2020.
* [Slimium: Debloating the Chromium Browser with Feature Subsetting](https://gts3.org/assets/papers/2020/qian:slimium.pdf). CCS 2020.
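
---

Sketch: flagging unused dependencies
---

One growth cause listed earlier is reusing libraries that have a larger API than the one we actually need. As a rough illustration of the simplest form of debloating, the sketch below flags declared dependencies that the application never references. It is a toy model under stated assumptions, not the method of any tool or paper cited in this talk: the `Dependency` class, the library names, and the package names are all hypothetical, and a real analysis would work on bytecode or source rather than on hand-written sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    """A declared dependency and the packages it provides (hypothetical model)."""
    name: str
    provided_packages: frozenset


def find_unused_dependencies(declared, imported_packages):
    """Return the sorted names of declared dependencies that provide
    none of the packages the application actually imports."""
    imported = set(imported_packages)
    return sorted(
        dep.name
        for dep in declared
        if not (dep.provided_packages & imported)
    )


# Hypothetical build declaration: three libraries are declared...
declared = [
    Dependency("json-lib", frozenset({"org.example.json"})),
    Dependency("xml-lib", frozenset({"org.example.xml"})),
    Dependency("logging-lib", frozenset({"org.example.log"})),
]
# ...but the application code only imports from two of them.
imported = {"org.example.json", "org.example.log"}

unused = find_unused_dependencies(declared, imported)
print(unused)  # ['xml-lib']
```

Set intersection keeps the check simple, but it also shows the limits of the sketch: it ignores transitive dependencies, reflection, and dynamic loading, which are the cases that make debloating hard in practice.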

---

[A Comprehensive Study of Bloated Dependencies in the Maven Ecosystem](https://arxiv.org/pdf/2001.07808)
---

* Intuition: package managers and automated builds encourage software reuse and introduce bloated dependencies

---

[A Comprehensive Study of Bloated Dependencies in the Maven Ecosystem](https://arxiv.org/pdf/2001.07808)
---

* Let's look at one [build file](https://repo1.maven.org/maven2/org/jxls/jxls-poi/1.0.15/jxls-poi-1.0.15.pom), for the [jxls library](https://github.com/jxlsteam/jxls/)
* as well as a [build file for a dependency of jxls](https://repo1.maven.org/maven2/org/apache/poi/poi/3.17/poi-3.17.pom)

---

[A Comprehensive Study of Bloated Dependencies in the Maven Ecosystem](https://arxiv.org/pdf/2001.07808)
---

* 9K artefacts and 700K dependencies
* 75% of dependencies are bloated
* Developers care
  * removed 131 dependencies in 30 projects
  * experiments at SAP and Ericsson ongoing
* [DepClean Maven dependency debloating tool](https://github.com/castor-software/depclean)
* To appear in EMSE journal, 2020.

---

Conclusion
---

* There is a lot of code bloat
  * from libc to Chrome
  * caused by reuse, feature creep, usage, etc.
* Software developers care
  * for security
  * for performance
* It is a relevant research topic
  * that is hard
  * that matters

---

# Thank you!
This work is a collaboration with [César Soto-Valero](https://www.cesarsotovalero.net/), [Thomas Durieux](https://durieux.me/), [Nicolas Harrand](https://nharrand.github.io/), and [Martin Monperrus](https://www.monperrus.net/martin/) at the [KTH Royal Institute of Technology](https://www.kth.se/en), and is supported by the [WASP program](https://wasp-sweden.org/).

---

More reads
---

* [Living review on code debloat](https://www.cesarsotovalero.net/software-debloating-papers)
* [Removing code not covered in production](https://carlosbecker.com/posts/production-code-coverage-jacoco)
* [Shrinking a Kotlin binary by 99.2%](https://jakewharton.com/shrinking-a-kotlin-binary/)
* [unikernels](https://people.cs.pitt.edu/~babay/courses/cs3551/papers/asplos13-unikernels.pdf)