Exploring accidental triggers of smart speakers

Schönherr, Lea; Golla, Maximilian; Eisenhofer, Thorsten; Wiele, Jan; Kolossa, Dorothea; Holz, Thorsten

doi:10.60882/cispa.24612429.v2

journal contribution

posted on 2024-04-29, 07:32 authored by Lea Schönherr, Maximilian Golla, Thorsten Eisenhofer, Jan Wiele, Dorothea Kolossa, Thorsten Holz

Voice assistants like Amazon’s Alexa, Google’s Assistant, Tencent’s Xiaowei, or Apple’s Siri have become the primary (voice) interface in smart speakers that can be found in millions of households. For privacy reasons, these speakers analyze every sound in their environment for their respective wake word, such as “Alexa,” “Jiǔsì’èr líng,” or “Hey Siri,” before uploading the audio stream to the cloud for further processing. Previous work reported examples of inaccurate wake word detection, which can be tricked using similar words or sounds like “cocaine noodles” instead of “OK Google.” In this paper, we perform a comprehensive analysis of such accidental triggers, i.e., sounds that should not have triggered the voice assistant, but did. More specifically, we automate the process of finding accidental triggers and measure their prevalence across 11 smart speakers from 8 different manufacturers using everyday media such as TV shows, news, and other kinds of audio datasets. To systematically detect accidental triggers, we describe a method to artificially craft such triggers using a pronouncing dictionary and a weighted, phone-based Levenshtein distance. In total, we have found hundreds of accidental triggers. Moreover, we explore potential gender and language biases and analyze the reproducibility of the triggers. Finally, we discuss the resulting privacy implications of accidental triggers and explore countermeasures to reduce and limit their impact on users’ privacy. To foster additional research on these sounds that mislead machine learning models, we publish a dataset of more than 350 verified triggers as a research artifact.
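The abstract mentions crafting candidate triggers with a pronouncing dictionary and a weighted, phone-based Levenshtein distance. The following is a minimal sketch of that general idea, not the authors' implementation. The tiny dictionary, the ARPAbet-style transcriptions, the vowel set, and the cost values (0.5 for a vowel-for-vowel or consonant-for-consonant substitution, 1.0 otherwise) are all assumptions chosen for illustration; the weights used in the paper are not given on this page.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical mini pronouncing dictionary (ARPAbet-style phones, stress omitted).
# A real experiment would load a full pronouncing dictionary instead.
PRONUNCIATIONS: Dict[str, List[str]] = {
    "alexa": ["AH", "L", "EH", "K", "S", "AH"],
    "alexis": ["AH", "L", "EH", "K", "S", "IH", "S"],
    "election": ["IH", "L", "EH", "K", "SH", "AH", "N"],
    "banana": ["B", "AH", "N", "AE", "N", "AH"],
}

# Assumed phone classes: only the vowels that occur in the toy dictionary.
VOWELS = {"AH", "EH", "IH", "AE"}


def substitution_cost(a: str, b: str) -> float:
    """Illustrative weighting: swapping phones of the same class is cheaper."""
    if a == b:
        return 0.0
    if (a in VOWELS) == (b in VOWELS):
        return 0.5  # vowel <-> vowel or consonant <-> consonant
    return 1.0      # vowel <-> consonant


def weighted_phone_distance(
    source: Sequence[str],
    target: Sequence[str],
    insert_cost: float = 1.0,
    delete_cost: float = 1.0,
) -> float:
    """Levenshtein distance over phone sequences with weighted substitutions."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + delete_cost
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + insert_cost
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + delete_cost,
                dist[i][j - 1] + insert_cost,
                dist[i - 1][j - 1] + substitution_cost(source[i - 1], target[j - 1]),
            )
    return dist[-1][-1]


def rank_candidates(
    wake_word: str, dictionary: Dict[str, List[str]]
) -> List[Tuple[str, float]]:
    """Sort all other dictionary words by phone distance to the wake word."""
    target = dictionary[wake_word]
    scored = [
        (word, weighted_phone_distance(phones, target))
        for word, phones in dictionary.items()
        if word != wake_word
    ]
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    for word, score in rank_candidates("alexa", PRONUNCIATIONS):
        print(f"{word}: {score}")
```

Words with a small distance to the wake word's phone sequence would be the candidates worth synthesizing and playing back to a device; whether they actually trigger it still has to be verified empirically.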

History

Preferred Citation

Lea Schönherr, Maximilian Golla, Thorsten Eisenhofer, Jan Wiele, Dorothea Kolossa and Thorsten Holz. Exploring accidental triggers of smart speakers. In: Computer Speech & Language. 2022.

Primary Research Area

Secure Connected and Mobile Systems

Legacy Posted Date

2021-12-14

Journal

Computer Speech & Language

Open Access Type

Unknown

Sub Type

Article

BibTeX

@article{cispa_all_3543,
  title   = "Exploring accidental triggers of smart speakers",
  author  = "Schönherr, Lea and Golla, Maximilian and Eisenhofer, Thorsten and Wiele, Jan and Kolossa, Dorothea and Holz, Thorsten",
  journal = "{Computer Speech \& Language}",
  year    = "2022",
}
