posted on 2024-07-17, 11:28, authored by Tural Mammadov, Dietrich Klakow, Alexander Koller, Andreas Zeller
We introduce Modelizer, a novel framework that, given a black-box program, learns a model from its input/output behavior using neural machine translation. The resulting model mocks the original program: given an input, the model predicts the output that the program would have produced. The model is also reversible: it can predict the input that would have produced a given output. Finally, the model is differentiable and can be efficiently restricted to predict only a certain aspect of the program's behavior. Modelizer uses grammars to synthesize inputs and to parse the resulting outputs, allowing it to learn sequence-to-sequence associations between token streams. Beyond input and output grammars, Modelizer requires only the ability to execute the program. The resulting models are small, requiring fewer than 6.3 million parameters for languages such as Markdown or HTML; and they are accurate, achieving up to 95.4% accuracy and a BLEU score of 0.98 with a standard error of 0.04 when mocking real-world applications. We foresee several applications for these models, especially since the "output" of a program can be any aspect of its behavior. Besides mocking and predicting program behavior, the model can also synthesize inputs that are likely to produce a particular behavior, such as failures or coverage.
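The pipeline the abstract compresses into a few sentences can be made concrete with a short sketch: synthesize inputs from a grammar, execute the black-box program on them, and tokenize both sides into (source, target) pairs for sequence-to-sequence training. The Python below is a minimal illustration under stated assumptions, not the paper's implementation: GRAMMAR, run_program, and tokenize are hypothetical stand-ins for the full Markdown input grammar, the actual subject program, and grammar-based output parsing, and the model-training step itself is omitted.

import random
import re

# Toy input grammar for grammar-based input synthesis.
# Hypothetical stand-in for the paper's full Markdown grammar.
GRAMMAR = {
    "<start>":   [["<element>"], ["<element>", "<start>"]],
    "<element>": [["# ", "<word>", "\n"],
                  ["*", "<word>", "* "],
                  ["<word>", " "]],
    "<word>":    [["hello"], ["world"], ["modelizer"]],
}

def synthesize(symbol="<start>", depth=0, max_depth=8):
    """Randomly expand a nonterminal into a concrete input string."""
    if symbol not in GRAMMAR:
        return symbol  # terminal: emit as-is
    expansions = GRAMMAR[symbol]
    # Past max_depth, take the first (shortest) expansion to terminate.
    chosen = expansions[0] if depth >= max_depth else random.choice(expansions)
    return "".join(synthesize(s, depth + 1, max_depth) for s in chosen)

def run_program(markdown: str) -> str:
    """Stand-in for the black-box program (a tiny Markdown-to-HTML
    converter). Modelizer only needs the ability to execute it."""
    html_lines = []
    for line in markdown.splitlines():
        if line.startswith("# "):
            html_lines.append(f"<h1>{line[2:]}</h1>")
        else:
            emphasized = re.sub(r"\*(\S+)\*", r"<em>\1</em>", line)
            html_lines.append(f"<p>{emphasized}</p>")
    return "\n".join(html_lines)

def tokenize(text: str) -> list[str]:
    """Crude regex tokenizer; the paper instead parses outputs with an
    output grammar to obtain token streams."""
    return re.findall(r"</?\w+>|\w+|\S", text)

def collect_pairs(n: int = 5):
    """Synthesize inputs, execute the program, and pair up the token
    streams. These pairs would train a sequence-to-sequence model."""
    pairs = []
    for _ in range(n):
        source = synthesize()
        target = run_program(source)
        pairs.append((tokenize(source), tokenize(target)))
    return pairs

if __name__ == "__main__":
    for src, tgt in collect_pairs():
        print(src, "->", tgt)

Swapping the source and target sides of these pairs at training time is what yields the reversible, output-to-input direction the abstract mentions.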
Primary Research Area
Threat Detection and Defenses
BibTeX
@misc{Mammadov:Klakow:Koller:Zeller:2024,
title = "Learning Program Behavioral Models from Synthesized Input-Output Pairs",
author = "Mammadov, Tural and Klakow, Dietrich and Koller, Alexander and Zeller, Andreas",
year = 2024,
month = 7,
doi = "10.48550/arXiv.2407.08597"
}