
Unraveling the Bluetooth Enigma: Extracting the Flag from BSides TLV 2022 CTF 'Handsfree'



BSidesTLV is Israel's largest security research community event, part of the global Security BSides.


For the BSides TLV 2022 CTF, we at Botanica Software Labs wanted to contribute a forensics challenge that touches upon some of the real-life technical challenges we encounter during our research projects.

More specifically, we chose the Bluetooth protocol. The challenge we wrote deals with extracting actual, usable data from a capture of Bluetooth traffic. Eventually, the challenge was solved by one team, which unfortunately did not provide a write-up of their solution. As such, we wanted to provide our own write-up, written from a solver's perspective.

The challenge

First things first - let's look at the description.

I was walking by the flag factory when my handsfree headset picked something up. Seems advanced...

So we have a few pieces of information, but nothing concrete or substantial. The only additional resource we have is a pcap file. Let's open it and take a look.

pcap file content

Hmm, so we aren't looking at a run-of-the-mill network capture. Looking at the protocol hierarchy window, we can see that just a few protocols are present - "HCI" (H4, Event and ACL) and "L2CAP" - all under Bluetooth. There are no addresses per se, but instead we see "remote", "localhost", "controller" and "host".

To figure out what we need to do next, we need to read up a little on these.

Bluetooth terminology

Bluetooth is an extremely complex communications protocol. It has its own physical and link layers, and its own transports. To avoid going into too much detail here, we will describe only the main elements you need to know to understand the solution to this challenge.

  1. HCI - HCI stands for Host Controller Interface, and is in fact an internal communications bus that connects the host to the Bluetooth controller. HCI commands and events are never transmitted over the air and carry no relevant information in this context.

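  2. ACL - ACL stands for Asynchronous Connection-oriented Logical transport. To make things easier - we can think of it as the general-purpose, packet-based data link between two devices. Packets that arrive corrupted are normally retransmitted.

  3. SCO - SCO stands for Synchronous Connection-Oriented link, a symmetric link that reserves fixed time slots, which suits real-time voice. SCO packets are not retransmitted. Broadly generalized - if one of the two resembles TCP, it is ACL, while SCO is closer to UDP.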

  4. L2CAP - L2CAP stands for Logical Link Control and Adaptation Protocol, and is the protocol which actually carries data in this capture. You can also see this by looking at the protocol hierarchy, where the majority of the content (Percent Bytes) is contained within that layer:

Bluetooth HCI ACL Packet
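To make the layering a little more concrete, here is a minimal Python sketch of how the headers nest in a single H4-framed ACL packet. The function and class names are ours, purely for illustration, and the sketch assumes the packet is the first fragment of an L2CAP PDU (continuation fragments carry no L2CAP header).

```python
import struct
from typing import NamedTuple


class AclL2cap(NamedTuple):
    handle: int        # 12-bit HCI connection handle
    pb_flag: int       # packet boundary flag (2 bits)
    bc_flag: int       # broadcast flag (2 bits)
    l2cap_length: int  # payload length declared in the L2CAP header
    cid: int           # L2CAP channel identifier
    payload: bytes     # bytes that follow the L2CAP header


def parse_h4_acl(packet: bytes) -> AclL2cap:
    """Parse an H4-framed HCI ACL packet that starts an L2CAP PDU."""
    if packet[0] != 0x02:  # H4 packet type 0x02 means "ACL data"
        raise ValueError("not an HCI ACL data packet")
    # HCI ACL header: handle + flags (16 bits LE), data length (16 bits LE)
    handle_flags, acl_len = struct.unpack_from("<HH", packet, 1)
    # Basic L2CAP header: payload length (16 bits LE), channel id (16 bits LE)
    l2cap_len, cid = struct.unpack_from("<HH", packet, 5)
    return AclL2cap(
        handle=handle_flags & 0x0FFF,
        pb_flag=(handle_flags >> 12) & 0x3,
        bc_flag=(handle_flags >> 14) & 0x3,
        l2cap_length=l2cap_len,
        cid=cid,
        payload=packet[9:1 + 4 + acl_len],
    )
```

The channel identifier is the interesting field here: it tells us which logical channel a packet belongs to, but not which profile is running on top of it.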

So we have a capture in which most of the data is contained in L2CAP packets. To contextualize the data, we would have to know what is being carried on the channel, which unfortunately isn't immediately obvious from looking at the data itself. More precisely, we have no idea which L2CAP service or profile is in use. Is it file transfer? Audio?

More recon

If we look at the description of the challenge, we see that the data was purportedly picked up by a handsfree headset. That means we're most likely dealing with audio, or some related metadata. (Note - in reality the capture was made from one of the participating endpoints, not sniffed from the air.)

There isn't a lot to work with here, so let's just start by reading up a bit on Bluetooth profiles, which are in a sense the application-level protocols:


... Bluetooth Profiles are defined wireless interface specifications which allows Bluetooth devices effectively communicate with each other. Ever wondered how you are able to wirelessly stream audio from your smartphone to your Bluetooth speakers...

1. Advanced Audio Distribution Profile (A2DP)

Shortly known as A2DP, this profile is responsible for the transmission of high-quality stereo audio from a source (SRC) to a sink (SNK)

After reading up a little, you may come to the conclusion that A2DP (Advanced Audio Distribution Profile) is the most common way to transfer audio over Bluetooth. We may also recall the "Seems advanced..." bit of the description, which hints in that direction. Worth a shot!

So, let's try to decode the L2CAP packets based on this profile by right-clicking one of the packets and choosing "Decode As > BT A2DP". A-ha! The protocol now changes to SBC:

BT A2DP decoded packet

Another way to discover this would be to simply experiment with the various decode targets until one makes sense. A strong indication that we have the correct profile is that the various layers (A2DP, RTP and then SBC) are decoded properly, and that the RTP timestamp and sequence number fields, which we expect to increase steadily, indeed do.
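The same sanity check can be sketched in a few lines of Python. This is only an illustration with helper names of our own: it reads the 12-byte fixed RTP header and checks that a list of sequence numbers advances by one each time, allowing for 16-bit wraparound. A capture with dropped packets would fail the check even when the guess is right, so treat it as a hint rather than proof.

```python
import struct
from typing import Dict, Sequence


def parse_rtp_header(packet: bytes) -> Dict[str, int]:
    """Decode the fixed 12-byte RTP header (network byte order)."""
    first, second, seq, timestamp, ssrc = struct.unpack_from("!BBHII", packet, 0)
    return {
        "version": first >> 6,         # should be 2 for RTP
        "csrc_count": first & 0x0F,    # number of 4-byte CSRC entries that follow
        "payload_type": second & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


def sequence_is_contiguous(seqs: Sequence[int]) -> bool:
    """True if each sequence number is the previous one plus 1, modulo 2**16."""
    return all((b - a) & 0xFFFF == 1 for a, b in zip(seqs, seqs[1:]))
```

If a wrong "Decode As" target is picked, the bytes interpreted as the version and sequence number are effectively random, so a version other than 2 or a jumpy sequence is a quick sign to try another profile.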

SBC

SBC stands for low-complexity subband codec - and it is indeed a very simple codec designed for freely-licensed Bluetooth audio applications. So it seems our goal is to extract the audio from this capture and listen to it. This is easier said than done, since there does not seem to be any information online on how to do this. Wireshark does not offer any option to extract the audio, and it's not quite clear what its format is.

We could do a deep dive into the SBC format, but after a bit of googling we can see that ffmpeg supports the SBC format out of the box. What if we take a sample .wav file, convert it to SBC, and then see how this compares to the data in the capture?

First - the ffmpeg output format

ffmpeg output format

And then the wire format:

wire format

Hmm, so the Wireshark decoder shows 8 bits of flags, followed by several "frames". Each frame starts with a constant sync word, 0x9c.
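For reference, those 8 bits can be unpacked as in the sketch below. The bit layout follows our reading of the A2DP media payload format for SBC (three fragmentation flags, one reserved bit, and a 4-bit frame count); the function name and dictionary keys are ours.

```python
from typing import Dict, Union


def parse_sbc_payload_header(byte: int) -> Dict[str, Union[bool, int]]:
    """Split the 1-byte header that precedes the SBC frames in an A2DP packet."""
    return {
        "fragmented": bool(byte & 0x80),  # F: the SBC frame is split across packets
        "start": bool(byte & 0x40),       # S: first packet of a fragmented frame
        "last": bool(byte & 0x20),        # L: last packet of a fragmented frame
        "reserved": bool(byte & 0x10),    # RFA: reserved for future use
        "frame_count": byte & 0x0F,       # number of SBC frames in this packet
    }
```

For example, a header byte of 0x05 describes an unfragmented packet that carries five complete SBC frames.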

The sync word 0x9c is clearly visible in the ffmpeg output, every few dozen bytes or so. Based on this we can deduce the ffmpeg format isn't complicated - it's just the frames appended one after the other.
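Rather than eyeballing the sync words, we can check this deduction with a short script. The sketch below is our own illustration, not part of the original solution: it computes each frame's length from its header using the frame length formula from the A2DP specification, then walks the byte stream frame by frame. If the file really is just frames appended one after another, every step lands on another 0x9c.

```python
from typing import List

SYNC_WORD = 0x9C
BLOCKS = (4, 8, 12, 16)
CHANNEL_MODES = ("mono", "dual_channel", "stereo", "joint_stereo")


def sbc_frame_length(header: bytes) -> int:
    """Return the total length in bytes of the SBC frame starting at header[0]."""
    if len(header) < 3 or header[0] != SYNC_WORD:
        raise ValueError("missing SBC sync word")
    blocks = BLOCKS[(header[1] >> 4) & 0x3]
    mode = CHANNEL_MODES[(header[1] >> 2) & 0x3]
    subbands = 8 if header[1] & 0x1 else 4
    bitpool = header[2]
    channels = 1 if mode == "mono" else 2
    # 4 header bytes, then one 4-bit scale factor per subband per channel
    length = 4 + (4 * subbands * channels) // 8
    if mode in ("mono", "dual_channel"):
        audio_bits = blocks * channels * bitpool
    else:
        # joint stereo spends one extra "join" bit per subband
        join_bits = subbands if mode == "joint_stereo" else 0
        audio_bits = join_bits + blocks * bitpool
    return length + (audio_bits + 7) // 8  # round the bit count up to whole bytes


def split_sbc_frames(stream: bytes) -> List[bytes]:
    """Split a raw .sbc byte stream into frames; raises if a sync word is missing."""
    frames, offset = [], 0
    while offset < len(stream):
        size = sbc_frame_length(stream[offset:offset + 3])
        frames.append(stream[offset:offset + size])
        offset += size
    return frames
```

As a worked example, a frame starting with 9c bd 35 (16 blocks, joint stereo, 8 subbands, bitpool 53) comes out at 4 + 8 + 107 = 119 bytes.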

Solving

So we have our capture, which is a collection of SBC frames, with each A2DP packet prefixed by a 1-byte header. Based on what we now know about the SBC format, we can simply extract the data from the pcap file and concatenate it in the correct format. One good way to extract the data is to first use tshark to filter for the correct connection id and export it to JSON, making sure to produce raw hex output with the -x option.

tshark extract command
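A minimal sketch of such an export step, driven from Python, could look like the following. The flags -r, -T json, -x, -Y and -d are standard tshark options, but everything else is an assumption made for illustration: the file names, the display filter, and in particular the "Decode As" string (both the btl2cap.cid table name and the channel value are guesses that should be checked against the actual capture).

```python
import subprocess
from typing import List, Optional


def build_tshark_command(pcap_path: str, display_filter: str,
                         decode_as: Optional[str] = None) -> List[str]:
    """Build a tshark command line that dumps matching packets as JSON with raw hex."""
    cmd = ["tshark", "-r", pcap_path, "-T", "json", "-x"]
    if decode_as:
        # Command-line equivalent of the "Decode As" dialog (value is an assumption)
        cmd += ["-d", decode_as]
    cmd += ["-Y", display_filter]
    return cmd


def export_json(pcap_path: str, display_filter: str, out_path: str,
                decode_as: Optional[str] = None) -> None:
    """Run tshark and write its JSON output to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(build_tshark_command(pcap_path, display_filter, decode_as),
                       stdout=out, check=True)


# Hypothetical usage; file names, filter and channel id are placeholders:
# export_json("handsfree.pcap", "sbc", "handsfree.json",
#             decode_as="btl2cap.cid==65,bta2dp")
```

Passing the arguments as a list avoids shell quoting problems with display filters that contain spaces or comparison operators.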

Then - using Python to read the data - we can filter the SBC packets and extract the raw frame data, omitting the header of each packet:


 extracting the raw frame data
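The sketch below shows one way this step might be written; it is an approximation under stated assumptions rather than the exact script. It assumes the JSON comes from tshark's -T json -x output, where raw layer entries are lists whose first element is a hex string, and that the entry named by raw_key (here "rtp_raw", an assumption to verify against your own JSON) spans the whole RTP packet including its payload. It also ignores RTP header extensions.

```python
import json
from typing import Iterable, Iterator

RTP_FIXED_HEADER = 12   # bytes in the fixed RTP header
SBC_PAYLOAD_HEADER = 1  # the per-packet flags / frame-count byte


def raw_layer_bytes(packets: Iterable[dict], raw_key: str) -> Iterator[bytes]:
    """Yield the raw bytes recorded under raw_key for each packet that has it."""
    for packet in packets:
        entry = packet.get("_source", {}).get("layers", {}).get(raw_key)
        if entry:  # with -x, entries look like [hex_string, offset, length, ...]
            yield bytes.fromhex(entry[0])


def sbc_frames_from_rtp(rtp_packet: bytes) -> bytes:
    """Drop the RTP header (plus any CSRC entries) and the 1-byte SBC payload header."""
    csrc_count = rtp_packet[0] & 0x0F
    offset = RTP_FIXED_HEADER + 4 * csrc_count + SBC_PAYLOAD_HEADER
    return rtp_packet[offset:]


def extract_sbc(json_path: str, out_path: str, raw_key: str = "rtp_raw") -> None:
    """Concatenate the SBC frames of every matching packet into one .sbc file."""
    with open(json_path) as f:
        packets = json.load(f)
    with open(out_path, "wb") as out:
        for rtp in raw_layer_bytes(packets, raw_key):
            out.write(sbc_frames_from_rtp(rtp))
```

A quick check on the result is that the output file starts with the sync word 0x9c; if it does not, the offsets or the chosen layer key are probably off.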

Finally, we can convert the resulting .sbc file to a .wav file using ffmpeg:

converting the resulting .sbc file to a .wav file
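For completeness, here is a small sketch of that conversion, again driven from Python. It relies on ffmpeg's raw SBC demuxer (selected with -f sbc); the file names are placeholders of our own.

```python
import subprocess
from typing import List


def build_ffmpeg_command(sbc_path: str, wav_path: str) -> List[str]:
    """Build an ffmpeg command line that decodes a raw SBC stream into a WAV file."""
    # -y overwrites an existing output file; -f sbc forces the raw SBC demuxer
    return ["ffmpeg", "-y", "-f", "sbc", "-i", sbc_path, wav_path]


def convert_sbc_to_wav(sbc_path: str, wav_path: str) -> None:
    """Run ffmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(sbc_path, wav_path), check=True)


# Hypothetical usage with placeholder file names:
# convert_sbc_to_wav("flag.sbc", "flag.wav")
```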

Upon listening to the resulting wav file, we can hear the flag!

BSidesTLV2022{th3-fl4g-15-mu5ic-t0-my-e4rz}





