CISPA
Browse

SoK: State of the Krawlers - Evaluating the Effectiveness of Crawling Algorithms for Web Security Measurements

Download (634.67 kB)
conference contribution
posted on 2024-03-26, 14:01 authored by Aleksei StafeevAleksei Stafeev, Giancarlo PellegrinoGiancarlo Pellegrino

Web crawlers are tools widely used in web security measurements whose performance and impact have been limitedly studied so far. In this paper, we bridge this gap. Starting from the past 12 years of the top security, web measurement, and software engineering literature, we categorize and decompose in building blocks crawling techniques and methodologic choices. We then reimplement and patch crawling techniques and integrate them into Arachnarium, a framework for comparative evaluations, which we use to run one of the most comprehensive experimental evaluations against nine real and two benchmark web applications and top 10K CrUX websites to assess the performance and adequacy of algorithms across three metrics (code, link, and JavaScript source coverage). Finally, we distill 14 insights and lessons learned. Our results show that despite a lack of clear and homogeneous descriptions hindering reimplementations, proposed and commonly used crawling algorithms offer a lower coverage than randomized ones, indicating room for improvement. Also, our results show a complex relationship between experiment parameters, the study's domain, and the available computing resources, where no single best-performing crawler configuration exists. We hope our results will guide future researchers when setting up their studies.

Funding

TEStabiliTy pAttern-driven weB appLication sEcurity and privacy testing

European Commission

Find out more...

Semantic Models and Agents for Security Testing of WebApplications

Deutsche Forschungsgemeinschaft

Find out more...

History

Primary Research Area

  • Empirical and Behavioral Security

Name of Conference

Usenix Security Symposium (USENIX-Security)

Usage metrics

    Categories

    No categories selected

    Licence

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC