
Look Ma, No Input Samples! Mining Input Grammars from Code with Symbolic Parsing

conference contribution
posted on 2024-07-25, 12:36 authored by Leon Bettscheider, Andreas Zeller
Generating test inputs at the system level (“fuzzing”) is most effective if one has a complete specification (such as a grammar) of the input language. In the absence of a specification, all known fuzzing approaches rely on a set of input samples to infer input properties and guide test generation. If the set of inputs is incomplete, however, so will be the resulting test cases; if one has no input samples, meaningful test generation so far has been hard to impossible. In this paper, we introduce a means to determine the input language of a program from the program code alone, opening several new possibilities for comprehensive testing of a wide range of programs. Our symbolic parsing approach first transforms the program such that (1) calls to parsing functions are abstracted into parsing corresponding symbolic nonterminals, and (2) loops and recursions are limited such that the transformed parser then has a finite set of paths. Symbolic testing then associates each path with a sequence of symbolic nonterminals and terminals, which form a grammar. First grammars extracted from nontrivial C subjects by our prototype show very high recall and precision, enabling new levels of effectiveness, efficiency, and applicability in test generators.
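The two transformation steps named in the abstract can be illustrated with a toy sketch. This is not the authors' prototype, which targets C programs and uses symbolic testing; here, each branch on the input is replaced by a scripted decision, and all names (`Recorder`, `explore`, `mine_grammar`, `parse_expr`, `parse_term`, `max_branches`) are hypothetical. Calls to other parsing functions are recorded as nonterminals instead of being executed, and a cap on the number of branch decisions keeps the set of paths finite.

```python
class NeedDecision(Exception):
    """Raised when a path needs a branch decision that is not scripted yet."""


class Recorder:
    """Stands in for the symbolic input: replays decisions, records symbols."""

    def __init__(self, decisions):
        self.decisions = decisions
        self.used = 0
        self.symbols = []

    def branch(self):
        # A branch on the next input character; the outcome is scripted.
        if self.used == len(self.decisions):
            raise NeedDecision
        taken = self.decisions[self.used]
        self.used += 1
        return taken

    def terminal(self, text):
        self.symbols.append(text)

    def nonterminal(self, name):
        # Abstraction step: a call to another parsing function becomes a symbol.
        self.symbols.append(f"<{name}>")


# Toy parsing functions (assumed for illustration, not taken from the paper).
def parse_expr(p):
    p.nonterminal("term")
    while p.branch():          # "is the next character '+'?"
        p.terminal("+")
        p.nonterminal("term")


def parse_term(p):
    if p.branch():             # "is the next character '('?"
        p.terminal("(")
        p.nonterminal("expr")
        p.terminal(")")
    else:
        p.nonterminal("number")


def explore(parse_fn, max_branches):
    """Enumerate all paths that finish within max_branches decisions."""
    paths, worklist = [], [()]
    while worklist:
        prefix = worklist.pop()
        rec = Recorder(prefix)
        try:
            parse_fn(rec)
        except NeedDecision:
            # Bounding step: stop extending once the decision budget is spent.
            if len(prefix) < max_branches:
                worklist.append(prefix + (False,))
                worklist.append(prefix + (True,))
            continue
        paths.append(tuple(rec.symbols))
    return paths


def mine_grammar(parsers, max_branches=2):
    """Map each parsing function to one grammar alternative per finished path."""
    return {
        f"<{name}>": sorted(set(explore(fn, max_branches)))
        for name, fn in parsers.items()
    }


if __name__ == "__main__":
    grammar = mine_grammar({"expr": parse_expr, "term": parse_term})
    for nonterminal, alternatives in grammar.items():
        print(nonterminal, "::=", " | ".join(" ".join(a) for a in alternatives))
```

Because the loop in `parse_expr` is cut off by the decision budget, the sketch yields a finite approximation of the repetition (a fixed number of `+ <term>` repetitions) rather than a recursive rule; how the paper's approach treats bounded loops and recursion is described in the paper itself.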

Primary Research Area

  • Threat Detection and Defenses

Name of Conference

European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE)

Page Range

522-526

Publisher

Association for Computing Machinery (ACM)

Open Access Type

  • Not Open Access

BibTeX

@conference{Bettscheider:Zeller:2024,
  title = "Look Ma, No Input Samples! Mining Input Grammars from Code with Symbolic Parsing",
  author = "Bettscheider, Leon and Zeller, Andreas",
  booktitle = "European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE)",
  year = 2024,
  month = 7,
  pages = "522--526",
  publisher = "Association for Computing Machinery (ACM)",
  doi = "10.1145/3663529.3663790"
}
