
Use your own audio data in NengoEdge

March 10, 2023
Trevor Bekolay

If you want to train a model to detect custom keywords, you can upload your own labelled audio datasets to NengoEdge.

Preparing your data

Labelled data in NengoEdge is stored as .wav files organized in folders, where each folder's name is the keyword label. Place all samples for a given keyword in the folder with that name, then create a .zip or .tar.gz archive containing those folders.
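Before creating the archive, a quick check that your folders match this layout can catch mistakes early. The sketch below uses only the Python standard library. The function name `summarize_dataset` and the choice to flag anything other than .wav files inside label folders are our own assumptions, not NengoEdge requirements stated here.

```python
from pathlib import Path


def summarize_dataset(root):
    """Count .wav files per label folder under `root`.

    Returns (counts, problems): `counts` maps each folder name (the keyword
    label) to its number of .wav files; `problems` lists paths that do not
    fit the one-folder-per-label layout.
    """
    root = Path(root)
    counts = {}
    problems = []
    for entry in sorted(root.iterdir()):
        if not entry.is_dir():
            # A file at the top level has no label folder
            problems.append(entry)
            continue
        n_wav = 0
        for item in sorted(entry.iterdir()):
            if item.is_file() and item.suffix.lower() == ".wav":
                n_wav += 1
            else:
                # Nested folders or non-.wav files inside a label folder
                problems.append(item)
        counts[entry.name] = n_wav
    return counts, problems
```

Calling `summarize_dataset("my-data")` on a hypothetical `my-data` folder gives you per-keyword sample counts plus a list of anything that looks out of place.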

Example

In this example, we will use part of the dataset collected for training the Loihi keyword spotter.

This dataset was originally organized into test and train folders, with subfolders for each speaker. Each subfolder contains .wav files with structured filenames containing the speaker ID, keyword, and time at which the sample was recorded.
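The exact filename format isn't shown here, so the sketch below assumes a hypothetical `<speaker>_<keyword>_<time>.wav` pattern to illustrate how a keyword label can be pulled out of structured filenames. The `SampleInfo` type and `parse_sample_name` function are our own names; adapt the splitting logic to whatever scheme your files actually use.

```python
from pathlib import Path
from typing import NamedTuple


class SampleInfo(NamedTuple):
    speaker: str
    keyword: str
    time: str


def parse_sample_name(path):
    """Split a filename like 'spk01_aloha_1650000000.wav' into its fields.

    ASSUMPTION: fields are separated by underscores, with the speaker ID
    first and the recording time last.
    """
    parts = Path(path).stem.split("_")
    if len(parts) < 3:
        raise ValueError(f"unexpected filename: {path}")
    # Everything between the speaker ID and the time is the keyword, so
    # keywords that themselves contain underscores stay intact
    return SampleInfo(speaker=parts[0], keyword="_".join(parts[1:-1]), time=parts[-1])
```

For example, `parse_sample_name("train/spk01/spk01_aloha_1650000000.wav")` returns `SampleInfo(speaker='spk01', keyword='aloha', time='1650000000')`.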

Loihi dataset organization
NengoEdge splits your data into training, validation, and testing sets automatically, so the original test and train folders do not need to be preserved.

To make this dataset compatible with NengoEdge, we wrote a short Python script (organize-data.py) that copies the files into the NengoEdge folder structure and checks that every .wav file can be loaded correctly. Feel free to use this script as a starting point for organizing your own data!
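We don't reproduce organize-data.py here, but the sketch below outlines the same two jobs: copying files into keyword folders and checking that each file loads. It is our own illustration, not the script itself. The function name `organize` and its `get_keyword` parameter are hypothetical, and "can be loaded" is approximated by whether Python's standard `wave` module can parse the file. That module only reads uncompressed PCM WAV, so its verdict may not match what NengoEdge accepts.

```python
import shutil
import wave
from pathlib import Path


def organize(src, dst, get_keyword):
    """Copy every readable .wav file under `src` into `dst/<keyword>/`.

    `get_keyword` is a caller-supplied function mapping a source path to
    its keyword label. Returns the files skipped because Python's `wave`
    module could not parse them.
    """
    src, dst = Path(src), Path(dst)
    skipped = []
    for path in sorted(src.rglob("*.wav")):
        try:
            # Parsing the header is a basic check that the file is a valid WAV
            with wave.open(str(path), "rb") as w:
                w.getnframes()
        except (wave.Error, EOFError):
            skipped.append(path)
            continue
        label_dir = dst / get_keyword(path)
        label_dir.mkdir(parents=True, exist_ok=True)
        # Build the new name from the path relative to `src` so that files
        # sharing a name in different source folders do not overwrite each other
        new_name = "_".join(path.relative_to(src).parts)
        shutil.copy2(path, label_dir / new_name)
    return skipped
```

For example, `organize("loihi-data", "nengoedge-data", lambda p: p.stem.split("_")[1])` would take the keyword from the second underscore-separated field of each filename, assuming your files follow such a pattern (both folder names are placeholders).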

After running this script on the data above, we have the directory structure that NengoEdge expects.

NengoEdge data file organization

We then create a compressed archive of this directory structure. Many tools will work for this, but we'll use the tar utility as it is usually available on Linux systems.

Creating a tar.gz archive of the data
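If you would rather stay in Python than use the tar command, the standard library's `tarfile` module can build an equivalent archive. This sketch assumes, following the description at the top of this post, that the label folders should sit at the top level of the archive; we have not confirmed whether NengoEdge also accepts a single enclosing folder. The function name `make_archive` is our own.

```python
import tarfile
from pathlib import Path


def make_archive(root, out_path):
    """Write a .tar.gz whose top-level entries are the label folders in `root`."""
    root = Path(root)
    with tarfile.open(out_path, "w:gz") as tar:
        for label_dir in sorted(p for p in root.iterdir() if p.is_dir()):
            # arcname places each label folder (and its files) at the archive root
            tar.add(label_dir, arcname=label_dir.name)
    return out_path
```

Writing the archive outside `root` keeps it from ending up inside itself if you later change the loop to include files as well as folders.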

We are now ready to upload this archive to NengoEdge.

Uploading your data

To upload your data, navigate to the Datasets page, accessible from the top navbar.

Datasets page

Click the Upload new data button to open the upload window.

Upload new data modal

Give your data a name and select the file on your hard drive. Click the Upload button.

A box with a progress bar will appear in the bottom-right corner of NengoEdge.

Uploading data progress bar

Once the upload is complete, your data will appear in the datasets list, though there is no dataset associated with it yet.

New data successfully uploaded

Creating new datasets from your data

Uploading raw data lets you create multiple datasets from it for use in training runs. To create a new dataset, click the New dataset button in the section for your raw data on the Datasets page.

Using the example data from above, we created a dataset to classify the "aloha" keyword. All keywords that are not selected are grouped under the "unknown" label. Notice that the sample rate and clip duration are filled in automatically based on the data, but can be changed if desired. Similarly, you can change the percentages that control the amount of silence, the amount of unknown samples, and the share of samples used for validation and testing.

Configure dataset modal
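To preview how class sizes change when unselected keywords are merged into "unknown", a few lines of Python are enough. This is a hypothetical helper that mirrors the grouping described above, not NengoEdge's implementation. It reports counts before NengoEdge applies the unknown-sample percentage, whose exact effect we do not model here.

```python
from collections import Counter


def preview_classes(label_counts, selected):
    """Return per-class sample counts after non-selected keywords become 'unknown'.

    `label_counts` maps each keyword (folder) name to its number of samples.
    """
    selected = set(selected)
    result = Counter()
    for label, n in label_counts.items():
        result[label if label in selected else "unknown"] += n
    return dict(result)
```

With made-up counts, `preview_classes({"aloha": 120, "hello": 80, "goodbye": 50}, ["aloha"])` returns `{"aloha": 120, "unknown": 130}`, which makes any imbalance between the target keyword and the "unknown" class easy to spot.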

The dataset you created can now be used in a run. See our other tutorials, like this one, to learn how to use your dataset to train a model.
