Posted on 2023-11-29, 18:07. Authored by Ezekiel Soremekun, Esteban Pavese, Nikolas Havrikov, Lars Grunske, Andreas Zeller
Grammars can serve as producers for structured test inputs that are
syntactically correct by construction. A probabilistic grammar assigns
probabilities to individual productions, thus controlling the
distribution of input elements. Using the grammars as input parsers, we
show how to learn input distributions from input samples, which lets us
create inputs that are similar to the sample; by inverting the
probabilities, we can create inputs that are dissimilar to the sample.
This allows for three test generation strategies:
1) “Common inputs” – by learning from common inputs, we can create
inputs that are similar to the sample; this is useful for regression
testing.
2) “Uncommon inputs” – learning from common inputs and inverting
probabilities yields inputs that are strongly dissimilar to the sample;
this is useful for completing a test suite with “inputs from hell” that
test uncommon features, yet are syntactically valid.
3) “Failure-inducing inputs” – learning from inputs that caused failures
in the past gives us inputs that share similar features and thus also
have a high chance of triggering bugs; this is useful for testing the
completeness of fixes.
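
The learn, invert, and generate steps behind these strategies can be illustrated with a minimal sketch. It is not the authors' implementation. The toy grammar, the usage counts, and the function names `learn`, `invert`, and `generate` are all hypothetical. The sketch assumes a parser has already counted how often each production was used in the samples, and it uses one simple inversion rule (reversing the rank order of the probabilities within each nonterminal) purely as an illustrative choice.

```python
import random
from collections import Counter

# Hypothetical toy grammar: nonterminal -> list of alternatives (tuples of symbols).
# By construction, the first alternative of every nonterminal is non-recursive.
GRAMMAR = {
    "<value>": [("<number>",), ("<list>",)],
    "<list>": [("[", "]"), ("[", "<value>", "]")],
    "<number>": [("0",), ("1",)],
}

def learn(counts):
    """Turn usage counts keyed by (nonterminal, alternative index) into probabilities."""
    probs = {}
    for nt, alts in GRAMMAR.items():
        total = sum(counts[(nt, i)] for i in range(len(alts)))
        if total == 0:
            # Nonterminal never seen in the samples: fall back to uniform.
            probs[nt] = [1 / len(alts)] * len(alts)
        else:
            probs[nt] = [counts[(nt, i)] / total for i in range(len(alts))]
    return probs

def invert(probs):
    """Assumed inversion rule: the least likely alternative receives the highest
    probability, the second least likely the second highest, and so on."""
    inverted = {}
    for nt, ps in probs.items():
        least_likely_first = sorted(range(len(ps)), key=lambda i: ps[i])
        highest_first = sorted(ps, reverse=True)
        new = [0.0] * len(ps)
        for rank, i in enumerate(least_likely_first):
            new[i] = highest_first[rank]
        inverted[nt] = new
    return inverted

def generate(probs, symbol="<value>", depth=0, max_depth=8, rng=random):
    """Expand `symbol` by sampling alternatives according to `probs`."""
    if symbol not in GRAMMAR:
        return symbol  # terminal symbol
    alts = GRAMMAR[symbol]
    if depth >= max_depth:
        alt = alts[0]  # force the non-recursive alternative so expansion terminates
    else:
        alt = rng.choices(alts, weights=probs[symbol])[0]
    return "".join(generate(probs, s, depth + 1, max_depth, rng) for s in alt)

# Hypothetical counts, as a parser might report for the samples "0", "1", "0", "[]".
counts = Counter({
    ("<value>", 0): 3, ("<value>", 1): 1,
    ("<list>", 0): 1,
    ("<number>", 0): 2, ("<number>", 1): 1,
})

common = learn(counts)     # favours what the samples contain
uncommon = invert(common)  # favours what the samples rarely or never contain

rng = random.Random(0)
print("common:  ", [generate(common, rng=rng) for _ in range(5)])
print("uncommon:", [generate(uncommon, rng=rng) for _ in range(5)])
```

In this sketch the learned distribution never produces nested lists, because the samples contain none, while the inverted distribution nests lists until the depth bound stops it. Learning from failure-inducing samples would reuse `learn` unchanged, with counts taken from the failing inputs only.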
Our evaluation on three common input formats (JSON, JavaScript, CSS)
shows the effectiveness of these approaches. Results show that “common
inputs” reproduced 96% of the methods induced by the samples. In
contrast, for almost all subjects (95%), the “uncommon inputs” covered
significantly different methods from the samples. Learning from
failure-inducing samples reproduced all exceptions (100%) triggered by
the failure-inducing samples and discovered new exceptions not found in
any of the samples learned from.
History
Preferred Citation
Ezekiel Soremekun, Esteban Pavese, Nikolas Havrikov, Lars Grunske and Andreas Zeller. Inputs from Hell: Learning Input Distributions for Grammar-Based Test Generation. In: IEEE Transactions on Software Engineering. 2020.
Primary Research Area
Secure Connected and Mobile Systems
Legacy Posted Date
2020-08-03
Journal
IEEE Transactions on Software Engineering
Open Access Type
Green
Sub Type
Article
BibTeX
@article{cispa_all_3167,
  title = "Inputs from Hell: Learning Input Distributions for Grammar-Based Test Generation",
  author = "Soremekun, Ezekiel and Pavese, Esteban and Havrikov, Nikolas and Grunske, Lars and Zeller, Andreas",
  journal = "{IEEE Transactions on Software Engineering}",
  year = "2020",
}