One of the key properties of a program is its input specification. Having a formal input specification can be critical in fields such as vulnerability analysis, reverse engineering, software testing, clone detection, or refactoring. Unfortunately, accurate input specifications for typical programs are often unavailable or out of date.
In this paper, we present a general algorithm that takes a program and a small set of sample inputs and automatically infers a readable context-free grammar capturing the input language of the program. We infer the syntactic input structure solely by observing accesses to input characters at different locations in the input parser. The approach works on all stack-based recursive descent input parsers, including parser combinators, and requires no program-specific heuristics. Our Mimid prototype produced accurate and readable grammars for a variety of evaluation subjects, including complex languages such as JSON, TinyC, and JavaScript.
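The core idea of recording which parser location reads which input character can be illustrated with a small sketch. This is not the Mimid implementation: the toy parser (`value`, `lst`, `number`), the `TracedInput` wrapper, the `traced` decorator, and `mine_grammar` are all hypothetical names introduced here, and the sketch tracks only function calls, with no generalization step. It assumes a hand-written recursive descent parser for nested lists of numbers such as `(1,(2,3))`.

```python
from collections import defaultdict

class TracedInput:
    """Wraps the input and logs which parser call consumed each character."""
    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.calls = 0       # counter used to give every call a unique id
        self.stack = []      # active parser calls as (function name, call id)
        self.accesses = []   # (input index, snapshot of the call stack)

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def consume(self):
        self.accesses.append((self.pos, tuple(self.stack)))
        self.pos += 1
        return self.text[self.pos - 1]

def traced(fn):
    """Pushes the parser function onto the tracked stack while it runs."""
    def wrapper(inp):
        inp.calls += 1
        inp.stack.append((fn.__name__, inp.calls))
        try:
            return fn(inp)
        finally:
            inp.stack.pop()
    return wrapper

# Toy recursive descent parser (hypothetical subject program).
@traced
def value(inp):
    if inp.peek() == "(":
        lst(inp)
    else:
        number(inp)

@traced
def lst(inp):
    inp.consume()                 # opening "("
    value(inp)
    while inp.peek() == ",":
        inp.consume()             # separator ","
        value(inp)
    inp.consume()                 # closing ")"

@traced
def number(inp):
    if not inp.peek().isdigit():
        raise SyntaxError("digit expected at %d" % inp.pos)
    while inp.peek().isdigit():
        inp.consume()

def mine_grammar(inp):
    """Turns the access log into rules: one nonterminal per parser function."""
    names = {}                     # call id -> function name
    children = defaultdict(list)   # call id -> ordered child symbols
    for index, stack in inp.accesses:
        for depth, (name, call_id) in enumerate(stack):
            if call_id not in names:          # first character seen in this call
                names[call_id] = name
                if depth > 0:                 # attach the call to its parent
                    children[stack[depth - 1][1]].append("<%s>" % name)
        # the innermost call consumed the character itself
        children[stack[-1][1]].append(inp.text[index])
    grammar = defaultdict(set)
    for call_id, symbols in children.items():
        grammar["<%s>" % names[call_id]].add("".join(symbols))
    return grammar

if __name__ == "__main__":
    sample = TracedInput("(1,(2,3))")
    value(sample)
    mined = mine_grammar(sample)
    for nonterminal in sorted(mined):
        print(nonterminal, "::=", " | ".join(sorted(mined[nonterminal])))
```

On the single sample `(1,(2,3))`, this yields the rules `<lst> ::= (<value>,<value>)`, `<number> ::= 1 | 2 | 3`, and `<value> ::= <lst> | <number>`. The sketch stops at the raw rules seen in the samples. It makes no attempt to merge alternatives or to generalize repetition.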
Preferred Citation
Rahul Gopinath, Björn Mathis and Andreas Zeller. Mining Input Grammars from Dynamic Control Flow. In: European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE). 2020.
Primary Research Area
Secure Connected and Mobile Systems
Name of Conference
European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE)
Legacy Posted Date
2020-06-09
Open Access Type
Unknown
BibTeX
@inproceedings{cispa_all_3101,
  title     = "Mining Input Grammars from Dynamic Control Flow",
  author    = "Gopinath, Rahul and Mathis, Björn and Zeller, Andreas",
  booktitle = "{European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE)}",
  year      = "2020",
}