On Impact in Software Engineering Research (HU Berlin 2021)

@AndreasZeller ON IMPACT IN SOFTWARE ENGINEERING RESEARCH ANDREAS ZELLER, CISPA HELMHOLTZ CENTER FOR INFORMATION SECURITY HU BERLIN WORKSHOP "SE FORSCHUNGSMETHODENTRAINING" MARCH 22, 2021

ANDREAS ZELLER: KEY FACTS • PhD in 1997 on Configuration Management with Feature Logic • Since 2001, Professor at Saarland University, Saarbrücken, Germany • Since 2019, Faculty at CISPA Helmholtz Center for Information Security • ACM Fellow in 2010 • ERC Advanced Grant in 2011 • SIGSOFT Outstanding Research Award in 2018 • Last week, got my sixth 10-year impact award

WHAT IS IMPACT? • How do your actions change the world? • We want to make the world a better place • Gives meaning and purpose to our (professional) life

WHAT MAKES IMPACTFUL RESEARCH? • Intellectual challenge – was it hard, or could anyone have done this? • Elegance – is your research specific to a context, or can it be reused again and again? • Usefulness – can someone make money with it? • Innovation is the delta in any of these metrics

SOFTWARE ENGINEERING RESEARCH AS SEEN FROM OUTSIDE • From a Programming Languages perspective, Software Engineering lacks intellectual challenge • From a Security perspective, Software Engineering is all theory • From an Industry perspective, Software Engineering lacks immediate usefulness • From a Formal Methods perspective, Software Engineering is inelegant and full of uncontrollable details

MY PATH TO IMPACT • Life can only be understood backwards; but it must be lived forwards (Søren Kierkegaard)

CONFIGURATION MANAGEMENT WITH FEATURE LOGIC (1991–1997) • Topic defined by my PhD advisor Gregor Snelting • Idea: Formally describe variants and revisions with feature logic • “A unified model for configuration management”
[Paper excerpt shown on slide: Section 3.3, “Combining Delta Features and other Features”, with Figure 8: Delta features and other features]

FEATURE LOGIC: LESSONS LEARNED • You can get plenty of papers accepted • even if you miss the problem • even if you neither prove nor evaluate • “Modeling for the sake of modeling” • Enabled much of my later work, though

WHAT TO DO AFTER PHD • During PhD, found standards and topics at German IT companies disappointing • Academia seemed good alternative • Socialized by open source development

DDD (1994–1999) • During PhD, programmed a lot • Debugging was hard! • Built the DDD debugger GUI with Dorothea Lütkehaus • Welcome change from formal work

DDD (1994–1999) • DDD was among the first dev tools with a “professional” GUI • Downloaded by the tens of thousands • Adopted as a GNU project: Street credibility with developers • Impact through usefulness

DDD: LESSONS LEARNED • Work on a real problem • Assume as little as possible • Keep things simple – “real” as in “real world”, not “real papers” – make things fit into real processes – simple to use + deploy = impact

DELTA DEBUGGING (1999–2003) • After PhD, looking for new topic • Delta Debugging brought together debugging and version control • Isolate failure causes through repeated experiments

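DELTA DEBUGGING (1999–2003) • Delta debugging was a bomb • Easy to teach + understand • 7 lines of algorithm (and 25 lines of Python) • Spent two years on these
$$
dd(c_{\text{✔}}, c_{\text{✘}}) = dd'(c_{\text{✔}}, c_{\text{✘}}, 2)
$$
$$
dd'(c'_{\text{✔}}, c'_{\text{✘}}, n) =
\begin{cases}
(c'_{\text{✔}}, c'_{\text{✘}}) & \text{if } |\Delta| = 1 \\
dd'(c'_{\text{✘}} \setminus \Delta_i,\; c'_{\text{✘}},\; 2) & \text{if } \exists i \in \{1, \dots, n\} \cdot test(c'_{\text{✘}} \setminus \Delta_i) = \text{✔} \\
dd'(c'_{\text{✔}},\; c'_{\text{✔}} \cup \Delta_i,\; 2) & \text{if } \exists i \in \{1, \dots, n\} \cdot test(c'_{\text{✔}} \cup \Delta_i) = \text{✘} \\
dd'(c'_{\text{✔}} \cup \Delta_i,\; c'_{\text{✘}},\; \max(n - 1, 2)) & \text{else if } \exists i \in \{1, \dots, n\} \cdot test(c'_{\text{✔}} \cup \Delta_i) = \text{✔} \\
dd'(c'_{\text{✔}},\; c'_{\text{✘}} \setminus \Delta_i,\; \max(n - 1, 2)) & \text{else if } \exists i \in \{1, \dots, n\} \cdot test(c'_{\text{✘}} \setminus \Delta_i) = \text{✘} \\
dd'(c'_{\text{✔}},\; c'_{\text{✘}},\; \min(2n, |\Delta|)) & \text{else if } n < |\Delta| \text{ (“increase granularity”)} \\
(c'_{\text{✔}}, c'_{\text{✘}}) & \text{otherwise}
\end{cases}
$$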
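
The seven cases of the algorithm on this slide can be sketched in Python roughly as follows. This is an illustrative sketch, not the 25-line implementation the slide mentions. It assumes (the slide does not spell this out) that Δ is the difference between the failing and the passing configuration, split into n subsets Δ1..Δn, and that `test` is a user-supplied function. The names `PASS`, `FAIL`, `UNRESOLVED`, `split`, `first`, `dd`, and `fails_with_3_and_6` are all made up for this example.

```python
PASS, FAIL, UNRESOLVED = "PASS", "FAIL", "UNRESOLVED"  # hypothetical outcome labels


def split(items, n):
    """Split a list into n contiguous, non-empty chunks (requires n <= len(items))."""
    chunks, start = [], 0
    for i in range(n):
        end = start + (len(items) - start) // (n - i)
        chunks.append(items[start:end])
        start = end
    return chunks


def first(results, outcome):
    """Return the first chunk whose test produced `outcome`, or None."""
    return next((chunk for chunk, result in results if result == outcome), None)


def dd(c_pass, c_fail, test):
    """Narrow down the difference between a passing and a failing configuration.

    Assumes c_pass is a subset of c_fail, test(c_pass) == PASS, test(c_fail) == FAIL.
    Returns (c_pass', c_fail') whose difference cannot be narrowed further.
    """
    c_pass, c_fail, n = set(c_pass), set(c_fail), 2
    while True:
        delta = sorted(c_fail - c_pass)
        if len(delta) <= 1:                      # case: |Δ| = 1 (or nothing left)
            return c_pass, c_fail
        n = min(n, len(delta))
        chunks = [set(chunk) for chunk in split(delta, n)]
        # For simplicity, run both experiments for every chunk up front;
        # this costs more tests than a lazy implementation would.
        removed = [(c, test(c_fail - c)) for c in chunks]
        added = [(c, test(c_pass | c)) for c in chunks]
        if (c := first(removed, PASS)) is not None:
            c_pass, n = c_fail - c, 2            # removing Δi makes the failure vanish
        elif (c := first(added, FAIL)) is not None:
            c_fail, n = c_pass | c, 2            # adding Δi alone triggers the failure
        elif (c := first(added, PASS)) is not None:
            c_pass, n = c_pass | c, max(n - 1, 2)
        elif (c := first(removed, FAIL)) is not None:
            c_fail, n = c_fail - c, max(n - 1, 2)
        elif n < len(delta):
            n = min(2 * n, len(delta))           # "increase granularity"
        else:
            return c_pass, c_fail


def fails_with_3_and_6(config):
    """Toy test: the failure needs changes 3 and 6 together (made-up example)."""
    return FAIL if {3, 6} <= set(config) else PASS


if __name__ == "__main__":
    passing, failing = dd(set(), set(range(1, 9)), fails_with_3_and_6)
    print("failure-inducing difference:", failing - passing)
```

In this toy run, the returned pair differs in a single change: applying it to the passing configuration turns the outcome into a failure, which is the "isolate failure causes through repeated experiments" idea in miniature.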

DELTA DEBUGGING: LESSONS LEARNED • Work on a real problem • Assume as little as possible • Keep things simple • Have a sound model – Version control? tests? Never heard of it – simple to build + teach = impact – DD was my version model reborn – Why debug? We build correct software

MINING SOFTWARE ARCHIVES (2003–2010) • In the early 2000s, open-source version repositories became available • Stephan Diehl saw an opportunity for visualization and approached me • Quickly expanded into data mining • Tom Zimmermann: our MSc student • Work of a research team

MINING SOFTWARE ARCHIVES (2003–2010) • Our 2004 paper was the first ICSE paper on mining software archives • Handful of competing groups; instant hit • MSR now a conference on its own • Paper has ≥1,500 citations so far • Impact at Microsoft, Google, SAP…

MINING SOFTWARE REPOSITORIES: LESSONS LEARNED • Work on a real problem • Assume as little as possible • Keep things simple • Have a sound model • Keep on learning – Empirical research is core field of SE – simple parsers for multiple languages – essence of 2004 paper is one line of SQL – retrieval, precision, recall, etc, etc – NLP, data mining, machine learning
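
To give a feel for how much a single SQL query over a version archive can do, here is a small sketch. The slide does not show the actual query; everything below is an assumption made for illustration: the `changes(txn, file)` table, the co-change question ("which files were committed together with this one?"), the function names, and the made-up history.

```python
import sqlite3

# Hypothetical schema: one row per (commit transaction, changed file).
# The query counts how often each other file was changed in the same
# transaction as the given file; it is a guess, not the query from the paper.
CO_CHANGE_QUERY = (
    "SELECT b.file, COUNT(*) AS n "
    "FROM changes a JOIN changes b ON a.txn = b.txn "
    "WHERE a.file = ? AND b.file <> a.file "
    "GROUP BY b.file ORDER BY n DESC, b.file"
)


def co_changed(conn, path):
    """Return (file, count) pairs for files committed together with `path`."""
    return conn.execute(CO_CHANGE_QUERY, (path,)).fetchall()


def demo_db():
    """Build a tiny in-memory history (made-up data, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE changes (txn INTEGER, file TEXT)")
    history = {
        1: ["a.c", "b.c"],
        2: ["a.c", "b.c", "c.c"],
        3: ["a.c", "c.c"],
        4: ["b.c"],
        5: ["a.c", "b.c"],
    }
    conn.executemany(
        "INSERT INTO changes VALUES (?, ?)",
        [(txn, name) for txn, files in history.items() for name in files],
    )
    return conn


if __name__ == "__main__":
    for name, count in co_changed(demo_db(), "a.c"):
        print(f"{name}: changed together with a.c in {count} transactions")
```

The point of the sketch is the "keep things simple" lesson: once the archive is in a relational table, the mining step itself can be very short.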

MINING SOFTWARE ARCHIVES (2003–2010) • We are now after the gold rush • Data still exciting (if you have some) • Few new insights on old data • Get out of a field when too crowded
[Paper excerpt shown on slide:]
3.5 Programmer Actions and Defects
Now that we know how to predict defects, can we actually prevent them? Of course, we could focus quality assurance on those files predicted as most defect-prone. But are there also constructive ways to avoid these defects? Is there a general rule to learn? For this purpose, let us now focus on H2: Is there a correlation between individual actions (= keystrokes) and defects? For this purpose, we would search for correlations between the count of the 256 characters and the overall post-defect count per file; our null hypothesis would be: H0. There is no correlation between character distribution and defect-proneness. After a number of preliminary experiments, we focused on the Eclipse 3.0 dataset. It is well known that most metrics of software do not follow a normal distribution and our measures of keystrokes are no exception. The distributions of characters appear to have an exponential rather than a power-law character. Nonetheless, due to the heavily skewed distribution, we used a standard non-parametric approach with the Spearman rank correlation. Of course, with so many metrics (one for each character), we run the risk of identifying spurious correlations, and we thus employed p-
3.6 Preventing Defects
Our results show a strong correlation between specific programmer actions (keystrokes I, R, O, and P) and defects.
Figure 2: Color-coding keys by their defect correlation; (red = strong). The five strongest correlations are highlighted.
Figure 3: Defect correlation for the 26 lower-case letters.

MUTATION TESTING (2008–2011) • Is bug density related to test quality? • Mutation testing gave us the big problem of equivalent mutants • Built a solution for it (Javalanche) • In our 2011 paper, we even nixed the need for mutation testing • 10-year impact award last week

MUTATION TESTING: LESSONS LEARNED • Work on a real problem • Assume as little as possible • Keep things simple • Have a sound model • Keep on learning • Keep on moving – Mutation testing did not work for us – Assume we can do dynamic slices – Build slices for assertions, mutations – Impact through dependencies – Mutations, Java semantics – Where's the pain? What can you do?

MORE THINGS I DID (AND DO!) • Automatic repair • Automatic parallelization • Automatic website testing • App mining and analysis • Fuzzing and test generation • Literate programming – Wesley + Claire beat us to it – Struggled with complexity – Built a company for that – Mining grammars for 1,000x faster fuzzing – Rethinking the ways we build software – Deployed at Google, Microsoft, …

TEACHING = CODING = EXPERIMENTING • I teach automated testing + debugging through annotated code • With Jupyter, I can build state-of-the-art techniques within hours • Along the way, I learn their potential + their issues (= new research!) • Side effect: My students get blueprints for further prototyping • Interactive textbooks: fuzzingbook.org, debuggingbook.org

PROTOTYPING: LESSONS LEARNED • Work on a real problem • Assume as little as possible • Keep things simple • Have a sound model • Keep on learning • Keep on moving • Build prototypes – Organize the research process – Python is like pseudocode – Simple also communicates better – Teach through interactive examples – How to teach through coding – Literate programming is fun – Learn pain points + get algorithms right first

THINGS I STAYED AWAY FROM • Software processes • Formal methods • Modeling • Architecture • Work on a real problem • Assume as little as possible • Keep things simple • Have a sound model • Keep on learning • Keep on moving • Build prototypes

THINGS I STAYED AWAY FROM • Software processes • Formal methods • Modeling • Architecture • What is the problem? • How can you have impact? • How do you measure your impact? – The problems are huge and there's so little I can do

MEASURING IMPACT • How do your actions change the world? • Hard to measure in advance, so our employers go for bean counting (publications, money, students, citations, …) – from @measuremania [Figure labels: “What actually matters”, “What can be”]

• But you want to be known for your tool, your algorithm, your book • You will not be remembered for doing well in a metric – please cite this frequently [Figure labels: “What actually matters”, “What can be”]

• There are many people who have done the same or better – but with less success • We know too little about these

IMPACT AS A RESEARCHER • Society funds research to take risks that no one else does • Research is risky by construction – you should expect to fail, and fail again • Tenure is meant to allow you to take arbitrarily grand challenges – so work on the grand stuff • Build things! Try out things! Find the pain points! What can you do? • If you lack resources: abstract + prototype things

IMPACT AS A TEACHER • Teaching can be a great way to multiply your message • Not only focus on teaching the standards, but also your research • Teaching your research helps to propagate it and make it accessible • Turn your research into well-annotated code you can use in teaching • Engage students on topics dear to you

IMPACT WITH INDUSTRY • Do work with industry to find problems and frame your work • Do not work with industry to solve (their) concrete problems • Your role as researcher is more than a cheap consulting tool • Many “research” funding schemes are there to subsidize industry

IMPACT THROUGH TOOLS • Getting your technique out as a tool is a great way to have impact! • Also lets you check what actual users need (and if they exist) • A tool can have far more impact than a paper • Funding agencies and hiring committees begin to realize this

IMPACT AS FOUNDER • Creating a company out of your research can be great fun! • Allows you to push your research and ideas into practice • Again, shows you what the market wants (and what not) • Plenty of monetary and consultancy support available

IMPACT AS MENTOR • Working with advanced students (MSc, PhD, PostDoc) can be the most satisfying part of your job • The variety of SE research needs universal problem solving skills • Find such skills besides good grades

A GREAT ENVIRONMENT • My university (Saarland / Saarbrücken) hired me for a tenured position although I was the candidate with the fewest publications • But they liked the papers, so they hired me • No pressure or incentives on papers, citations, funding, etc. • One single expectation: long-term impact • Worked. – CISPA aims for long-term impact too, btw

• Work on a real problem • Assume as little as possible • Keep things simple • Have a sound model • Keep on learning • Keep on moving • Build prototypes – Try out new stuff, again and again – Learn pain points + get algorithms right first – “real” as in “real world”, not “real papers” – Make things fit into real processes – Complexity impresses, but prevents impact – Logics, retrieval, formal languages, etc etc – Classifiers, statistics, user studies, …