[PDF] Engineering Degree Project Real-time Audio Classification on an

auDeep is a Python toolkit for deep unsupervised representation learning from acoustic data ... competitive with state-of-the-art audio classification.

pyAudioProcessing: Audio Processing Feature Extraction

https://conference.scipy.org/proceedings/scipy2022/pdfs/jyotika_singh.pdf

Learning audio descriptors through deep learning, application

Abstract. This end-of-internship report aims to explore the use of deep neural networks for music genre classification.

Masters Thesis

24 Jun 2019 · Learn about Deep Learning audio classification methods. ... Learning and how it works. The system is developed in Python using PyTorch.

auDeep: Unsupervised Learning of Representations from Audio with

22 Dec 2017 · competitive with state-of-the-art audio classification. Keywords: deep feature learning, sequence to sequence learning

DCASE-MODELS: A PYTHON LIBRARY FOR COMPUTATIONAL

2 Nov 2020 · Detection and Classification of Acoustic Scenes and Events 2020 ... Index Terms: Python library, deep learning

Urban Sound Event Classification for Audio-Based Surveillance

The machine learning algorithms used are Logistic Regression, Support Vector Machines ...

A Robust Approach for Securing Audio Classification Against

Environmental sound classification has been a challenging problem in machine learning research [5]. Both shallow and deep neural networks (DNNs) have ...

Deep Learning Based Audio Classifier for Bird Species

2013 and the Machine Learning for Signal Processing (MLSP) 2013 Bird Classification Simple Minded Audio Classifier in Python (SMACPY) train the set of ...

Audio classification: Environmental sound classification

Index Terms: audio classification, environmental sound classification

(PDF) Sound Classification Using Python - ResearchGate

7 Mar 2023 · We are going to work on it using the Python programming language and some deep learning techniques. It's a basic model that we are trying to develop

[PDF] Sound Classification Using Python - ITM Web of Conferences

It is a very difficult task to recognize audio or sound events systematically and work on it for identification and give output. We are going to work on it

[PDF] Engineering Degree Project Real-time Audio Classification on an

Training machine learning models to detect the sound of gunshots, human speech, and glass shattering. 2. Optimize & Deploy models onto the edge device (Jetson

[PDF] Audio classification with deep learning on limited data sets

The main aim of this work is to research new approaches to deep-learning-based predictive modeling using limited audio data sets focusing especially on voice

[PDF] Sound Classification and Processing of Urban Environments - MDPI

8 Nov 2022 · Keywords: audio classification; audio processing; deep learning; Convolutional Neural Networks; Transformers; attention mechanisms

[PDF] auDeep: Unsupervised Learning of Representations from Audio with

auDeep is a Python toolkit for deep unsupervised representation learning from acoustic data. It is based on a recurrent sequence to sequence autoencoder

[PDF] Using Transfer Learning Spectrogram Audio Classification and MIT

27 Jul 2020 · focuses on applying transfer learning and spectrogram audio classification methods to teach basic machine learning concepts to students

[PDF] A Classical Machine Learning Multi-Classifier Based Approach

10 Sep 2021 · In this paper, a classical machine learning based classifier called MosAIc and a lighter Convolutional Neural Network model for environmental

Build a Deep Audio Classifier with Python and Tensorflow - YouTube

15 Apr 2022 · In this tutorial you'll learn how to build a Deep Audio Classification model with Tensorflow and ... Duration: 1:17:11 · Posted: 15 Apr 2022

[PDF] Audio Event Classification using Deep Learning in an End-to-End

16 jui 2017 · The goal of the master thesis is to study the task of Sound Event Classification using Deep Neural Networks in an end-to-end approach

Which algorithm is best for audio classification?
Data preprocessing
To extract the features, we will be using the Mel-Frequency Cepstral Coefficients (MFCC) algorithm. This algorithm has been widely used in automatic speech and speaker recognition since the 1980s.
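The answer above names MFCCs without showing how they are computed. What follows is a minimal NumPy sketch of the classic pipeline: pre-emphasis, framing, Hamming windowing, power spectrum, a mel filterbank, log compression, and a DCT. The parameter values (25 ms windows, 10 ms hop at 16 kHz, 26 mel filters, 13 coefficients) are common defaults chosen for illustration and are not taken from any of the sources listed here. Libraries such as librosa (`librosa.feature.mfcc`) provide tuned implementations.

```python
import numpy as np


def hz_to_mel(f):
    """HTK-style mel scale."""
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge of the triangle
            fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            fb[i - 1, k] = (right - k) / (right - center)
    return fb


def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n)) * np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)  # scale the DC row so the basis is orthonormal
    return x @ basis.T


def mfcc(signal, sr=16000, n_fft=512, hop=160, win=400, n_mels=26, n_mfcc=13):
    # 1. Pre-emphasis boosts high frequencies.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # 2. Split into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(win)
    # 3. Power spectrum of each frame (zero-padded to n_fft).
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # 4. Mel filterbank energies, log-compressed (epsilon avoids log(0)).
    log_mel = np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
    # 5. The DCT decorrelates the log energies; keep the first n_mfcc coefficients.
    return dct_ii(log_mel, n_mfcc)
```

For example, one second of a 440 Hz tone sampled at 16 kHz yields a (98, 13) matrix: 98 frames of 13 coefficients each, which is what lets a classifier see both the frequency and the time structure of a sound.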
What is audio classification in ML?
Audio Classification is a machine learning task that involves identifying and tagging audio signals into different classes or categories. The goal of audio classification is to enable machines to automatically recognize and distinguish between different types of audio, such as music, speech, and environmental sounds.
Which deep learning model is best for audio classification?
MFCCs: the MFCCs summarize the frequency distribution within each analysis window, so computing them over successive windows makes it possible to analyze both the frequency and the time characteristics of the sound. This audio representation allows us to identify features for classification.
Audio classification can take multiple forms, such as acoustic data classification or acoustic event detection, music classification, natural language classification, and environmental sound classification.

Author: Christoffer Malmberg

Supervisor: David Radszuweit

Lnu Supervisor: Tobias Ohlsson

Semester: Spring 2021

Subject: Computer Science

Engineering Degree Project

Real-time Audio Classification on an Edge Device - Using YAMNet and TensorFlow Lite

Abstract

Edge computing is the idea of moving computations away from the cloud and instead performing them at the edge of the network. The benefits of edge computing are reduced latency, improved privacy, and less strain on networks. Edge AI is the practice of deploying machine learning algorithms to perform computations on the edge. In this project, a pre-trained model, YAMNet, is retrained and used to perform audio classification in real time to detect gunshots, glass shattering, and speech. The model is deployed onto the edge device both as a full TensorFlow model and as TensorFlow Lite models. Accuracy, inference time, and memory allocation are compared for the full TensorFlow and TensorFlow Lite models, with and without optimization. The results show that both TensorFlow and TensorFlow Lite are valid options, but that there is a lot of performance to gain by using TensorFlow Lite, with little downside.

Keywords: audio event classification, edge device audio classification, YAMNet, TensorFlow Lite comparison
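The abstract says YAMNet is retrained for three target sounds but does not say how. One common way to retrain a pretrained audio network for new classes is transfer learning: keep YAMNet frozen as a feature extractor (it outputs a 1024-dimensional embedding per audio frame) and train a small classifier on those embeddings. The sketch below assumes that approach, which may differ from the thesis. It uses synthetic vectors in place of real YAMNet embeddings so it runs without TensorFlow, and the class names, learning rate, and epoch count are illustrative assumptions.

```python
import numpy as np

EMBED_DIM = 1024  # size of one YAMNet embedding vector
CLASSES = ["gunshot", "glass_shattering", "speech"]  # illustrative label names


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_head(x, y, n_classes, lr=1e-3, epochs=200, l2=1e-4):
    """Multinomial logistic regression on frozen embeddings, full-batch gradient descent."""
    n, d = x.shape
    w = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        p = softmax(x @ w + b)
        grad = (p - onehot) / n  # gradient of mean cross-entropy w.r.t. the logits
        w -= lr * (x.T @ grad + l2 * w)  # l2 * w is the gradient of (l2 / 2) * ||w||^2
        b -= lr * grad.sum(axis=0)
    return w, b


def predict(x, w, b):
    return np.argmax(x @ w + b, axis=1)


# Synthetic stand-ins for YAMNet embeddings: one noisy cluster per class.
rng = np.random.default_rng(0)
centers = rng.normal(size=(len(CLASSES), EMBED_DIM))
y = rng.integers(0, len(CLASSES), size=300)
x = centers[y] + 0.5 * rng.normal(size=(300, EMBED_DIM))
w, b = train_head(x, y, len(CLASSES))
train_acc = float(np.mean(predict(x, w, b) == y))
```

The design keeps the expensive convolutional network untouched and trains only a 1024 x 3 weight matrix plus a bias, which needs little labeled data. In practice a neural-network head in Keras would play the same role.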

1 Introduction

1.1 Background

1.2 Problem Statement

1.3 Motivation

1.4 Scope/Limitation

1.5 Target Group

1.6 Outline

2 Theory

2.1 Edge Computing

2.2 Convolutional Neural Network (CNN)

2.3 Depthwise Separable Convolution

2.4 YAMNet

2.5 Data Pre-Processing

2.6 TensorFlow And TensorFlow Lite

2.7 Post-training Quantization In TensorFlow

2.8 Model Evaluation

2.8.1 Model Accuracy

2.8.2 Inference Time

2.8.3 Memory Allocation For Loading Model

3 Related Work

4 Method

4.1 Research Project

4.2 Literature Review

4.3 Controlled Experiment / Model Evaluation

4.4 Dataset

4.5 Reliability And Validity

4.6 Ethical Considerations

5 Choices And Implementation

5.1 Technology choices

5.2 Implementation

6 Experimental Setup And Results

6.1 Accuracy Experiment

6.2 RAM Usage Model Loading

6.3 Real-Time Inference Experiment

6.4 Loading Time For Models

6.5 RAM Usage For TensorFlow Package Loading

7 Analysis

8 Discussion

9 Conclusion

9.1 Future Work

References

A Appendix

1 Introduction

1.1 Background

Public safety and security are, and will always be, important for our society, and today a range of surveillance systems is available, for example security cameras and motion sensors. Cloud computing has been a good way to compensate for the limited computing power of Internet of Things (IoT) devices. However, more and more IoT devices are added and connected to networks every year, and within a few years there might be billions of these devices running [1]. Since machine learning is computationally heavy, these devices often rely on cloud services to perform the machine learning (ML) computations, which can cause problems such as latency, safety and privacy concerns about the transmitted data, and network costs. Edge devices can help address these concerns: an edge device handles the computations for ML models on the device itself instead of relying on external resources to handle them [2]. Products like Nvidia Jetson and Google Coral, and frameworks like TensorFlow and TensorFlow Lite, make it easier to develop and deploy machine learning models and to handle computation on the edge. This project is done in collaboration with OCCDEC, a start-up company based in Kalmar, Sweden, which aims to develop an audio classification security system using machine learning on edge devices. The focus of this project is therefore to use ML to recognize sounds that could be of interest for surveillance purposes, such as gunshots, glass shattering, and people talking. The trained models should then run efficiently on edge devices without losing much accuracy.

1.2 Problem Statement

The collaborating company has set a few goals for this project. Since the model is later supposed to run in real time, its inference time needs to be as low as possible, and the company's goal for this project is an inference time of under 0.2 seconds. The model should also achieve an accuracy of at least 70%.
This leads to the following problem statement: can a model deployed on a Jetson Nano and trained to recognise gunshots, speech, and glass shattering achieve an accuracy of 70% and an inference time of under 0.2 seconds? The expected outcome of this research is a model that can be deployed onto the Jetson Nano and perform audio recognition in real time without the support of any cloud services. The edge device is also expected to run inference through the model in under 0.2 seconds while receiving real-time sound input from a microphone connected to the Jetson Nano. Lastly, the trained model should achieve an accuracy of at least 70% when making predictions on a test dataset containing data that the model has never seen during training. The expected goals for this project are summarized below.

1. Train machine learning models to detect the sound of gunshots, human speech, and glass shattering.
2. Optimize and deploy the models onto the edge device (Jetson Nano) and make sure they achieve an accuracy of 70% when running real-time audio detection on the edge device.
3. The edge device should achieve an inference time of under 0.2 seconds to meet the requirements for real-time audio classification.
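The two numeric goals above (accuracy of at least 70%, inference under 0.2 seconds) can be checked mechanically once a model is deployed. The sketch below times any prediction callable with `time.perf_counter` and compares the results against both thresholds. The function names, the one-second 16 kHz windows, and the choice to judge latency by the worst case rather than the mean are assumptions made for illustration; the thesis may measure inference time differently.

```python
import time

import numpy as np

MAX_LATENCY_S = 0.2  # inference-time goal from the problem statement
MIN_ACCURACY = 0.70  # accuracy goal from the problem statement


def measure_latency(predict_fn, inputs, warmup=3):
    """Return per-call wall-clock latencies in seconds, after a few warm-up calls."""
    for x in inputs[:warmup]:
        predict_fn(x)  # first calls often include one-off setup costs
    latencies = []
    for x in inputs:
        start = time.perf_counter()
        predict_fn(x)
        latencies.append(time.perf_counter() - start)
    return np.array(latencies)


def meets_goals(latencies, y_true, y_pred):
    """Summarize accuracy and latency; 'passes' uses the stricter worst-case latency."""
    accuracy = float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
    worst_case = float(np.max(latencies))
    return {
        "accuracy": accuracy,
        "mean_latency_s": float(np.mean(latencies)),
        "max_latency_s": worst_case,
        "passes": accuracy >= MIN_ACCURACY and worst_case < MAX_LATENCY_S,
    }


# Hypothetical stand-in for the deployed model: any callable taking one audio window.
def dummy_predict(window):
    return int(window.mean() > 0)


rng = np.random.default_rng(0)
windows = [rng.normal(size=16000) for _ in range(20)]  # twenty 1 s windows at 16 kHz
latencies = measure_latency(dummy_predict, windows)
```

Using the maximum rather than the mean is the conservative choice for real-time audio: a single slow inference can cause the system to fall behind the microphone stream even when the average looks fine.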

1.3 Motivation

This study could help the development of AI security devices that use sound surveillance. Since edge deployment does not rely on cloud computing, data captured on the device never leaves it, which increases both the security and the privacy of people in society. Such a device could also notify authorities faster than humans could, and it could be placed in areas where few people are present to increase surveillance there. This research could also increase knowledge of deploying deep learning models onto edge devices, which would aid "smart" device development.

Milestones

M1 Study audio recognition techniques
M1.1 Different models available
M1.2 Determine best solutions for edge device

M2 Setup and prepare for training
M2.1 Gather information on how to train selected models or own model for sound recognition

M3 Train and evaluate models
M3.1 Start training models
M3.2 Evaluate the trained models to make sure they are running fine
M3.3 Gather information about model conversion

M4 Convert and setup models on edge device
M4.1 Setup the Jetson Nano
M4.2 Convert model to run on edge device
M4.3 Write script to run inference, deploy model onto Jetson Nano and evaluate

1.4 Scope/Limitation

This study has the following limitations. TensorFlow is the machine learning library of choice; this choice is influenced by the collaborating company, which currently works with TensorFlow. The Jetson Nano is the device the model is deployed to, as this is the hardware the company is going to use to run the models. Since this project focuses on a deep learning model running on the edge device, it compares different variations of the trained model: the full TensorFlow model, a TensorFlow Lite model that does not use the optimization features of TensorFlow Lite, a TensorFlow Lite model that uses float16 quantization, and a TensorFlow Lite model that uses dynamic range quantization. These variants were chosen to compare the trade-offs and benefits of lighter models against the full model in terms of accuracy, memory usage, and inference time.
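To make the difference between the two quantized variants concrete: in TensorFlow Lite they are produced with `tf.lite.TFLiteConverter`, where setting `converter.optimizations = [tf.lite.Optimize.DEFAULT]` gives dynamic range quantization (weights stored as 8-bit integers), and additionally setting `converter.target_spec.supported_types = [tf.float16]` gives float16 quantization. The NumPy sketch below does not use TensorFlow; it only illustrates what happens to a weight tensor's storage size and rounding error under each scheme. The symmetric per-tensor int8 scheme is a simplification assumed here, not TensorFlow Lite's exact implementation, and the 1024 x 3 tensor is a hypothetical layer.

```python
import numpy as np


def to_float16(w):
    """Float16 quantization: cast weights to half precision, halving storage."""
    return w.astype(np.float16)


def quantize_int8(w):
    """Symmetric per-tensor int8 quantization (a simplified dynamic-range scheme)."""
    scale = float(np.max(np.abs(w))) / 127.0
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any scale works
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float32 weights for computation."""
    return q.astype(np.float32) * scale


# Hypothetical dense-layer weights, e.g. a 1024-input, 3-class layer.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.05, size=(1024, 3)).astype(np.float32)
w16 = to_float16(w)
q, s = quantize_int8(w)
sizes = (w.nbytes, w16.nbytes, q.nbytes)  # (12288, 6144, 3072) bytes
int8_error = float(np.max(np.abs(w - dequantize_int8(q, s))))
fp16_error = float(np.max(np.abs(w - w16.astype(np.float32))))
```

The trade-off this illustrates is the one the thesis measures: int8 storage is four times smaller than float32 but introduces a rounding error of up to half a quantization step, while float16 halves the size with a much smaller error.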

1.5 Target Group

Groups that may be interested in this project include people interested in AI security and surveillance, and developers building devices that use audio recognition on the edge. This is not limited to security projects; the work can also help other areas of edge audio recognition where a custom dataset is needed. The research may also interest groups using or planning to use TensorFlow Lite and its optimizations, who want to check the trade-offs of the different optimizations and which one fits their projects.

1.6 Outline

The report is organized as follows. Chapter 2 explains the theory used to carry out this project, as well as other theory needed to properly understand the results of this work. Chapter 3 presents and summarizes previous work related to this project. Chapter 4 explains the methods this project uses to gather information and to test solutions to the problem statement. Chapter 5 discusses the choices made for the project and how the system is implemented. Chapter 6 presents the results, Chapter 7 analyzes them, Chapter 8 discusses them, and Chapter 9 draws conclusions and discusses possible future work.

2 Theory

2.1 Edge Computing

Edge computing is the practice of moving computations that would normally be performed in the cloud closer to the edge of the network, onto edge devices. Edge devices could, for example, be local servers or a Single Board Computer (SBC) like the Raspberry Pi.
