Audio Activity Recognition System: Real-time Audio Classification with Machine Learning
A real-time audio activity recognition system combining classical machine learning (Random Forest) with deep learning models. It performs live inference and visualization of audio activity from a microphone, with modular code for feature extraction, model training, and extensibility.
System Capabilities
Real-time Audio Classification
Pre-trained Random Forest model for low-latency audio activity recognition from live microphone input.
Live Waveform Visualization
Real-time display of audio waveforms with prediction results, confidence scores, and latency metrics.
Modular Feature Extraction
Comprehensive feature extraction pipeline including FFT, MFCC, RMS, Mel spectrogram, and VGGish features.
Microphone Management
Intelligent microphone selection and management with support for multiple audio input devices.
Deep Learning Ready
Extensible architecture supporting advanced models like Wav2Vec2 and VGGish for enhanced recognition.
High Performance
Optimized for low-latency real-time processing with efficient feature extraction and model inference.
System Architecture
Audio Input
Real-time microphone capture using PyAudio with configurable sample rates and buffer sizes.
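The capture loop reads fixed-size buffers from the stream; the buffer size and sample rate together set the minimum per-chunk latency. A minimal sketch of that chunked-read pattern, using a simulated signal in place of a PyAudio stream (the specific `SAMPLE_RATE` and `CHUNK` values here are assumptions, not the project's actual configuration):

```python
import numpy as np

# Typical capture parameters (assumed; the project may use different values)
SAMPLE_RATE = 16000      # samples per second
CHUNK = 1024             # frames per buffer

# Per-chunk latency floor: the time span covered by one buffer
chunk_seconds = CHUNK / SAMPLE_RATE   # 1024 / 16000 = 0.064 s

def read_chunks(signal, chunk=CHUNK):
    """Yield fixed-size buffers from a 1-D signal, mimicking a
    blocking read loop such as PyAudio's stream.read(CHUNK)."""
    for start in range(0, len(signal) - chunk + 1, chunk):
        yield signal[start:start + chunk]

# Simulated one-second microphone signal (a 440 Hz tone)
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
mic = np.sin(2 * np.pi * 440 * t).astype(np.float32)

buffers = list(read_chunks(mic))
```

In the real pipeline each buffer would be handed to the feature extractor as soon as it arrives, so a smaller CHUNK lowers latency at the cost of more frequent inference calls.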
Feature Extraction
Multi-modal feature extraction including FFT, MFCC, RMS, and Mel spectrogram analysis.
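A dependency-free sketch of two of these features, RMS energy and an FFT-based spectral centroid (MFCC and Mel spectrograms, which the project also extracts, would typically come from librosa and are omitted here):

```python
import numpy as np

def extract_features(frame, sample_rate=16000):
    """Frame-level features: RMS energy and spectral centroid
    computed from the magnitude spectrum of a real FFT."""
    frame = np.asarray(frame, dtype=np.float64)
    rms = np.sqrt(np.mean(frame ** 2))                 # loudness proxy
    spectrum = np.abs(np.fft.rfft(frame))              # magnitude spectrum
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    # Centroid: magnitude-weighted mean frequency of the spectrum
    centroid = (freqs * spectrum).sum() / max(spectrum.sum(), 1e-12)
    return rms, centroid, spectrum

# Sanity check: a pure 1 kHz tone should have its centroid near 1 kHz
sr = 16000
t = np.arange(2048) / sr
rms, centroid, spec = extract_features(np.sin(2 * np.pi * 1000 * t), sr)
```

Concatenating such per-frame scalars with MFCC coefficients yields the fixed-length vector the classifier consumes.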
Model Inference
Random Forest classification with confidence scoring and prediction latency tracking.
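A sketch of the inference step with confidence scoring and latency tracking, using a toy classifier trained on synthetic data in place of the project's pre-trained model (the class labels and feature dimensions here are illustrative assumptions):

```python
import time
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy training set standing in for the pre-trained model:
# two synthetic classes separated by mean feature value.
rng = np.random.default_rng(0)
X_quiet = rng.normal(0.0, 0.1, size=(50, 8))
X_loud = rng.normal(1.0, 0.1, size=(50, 8))
X = np.vstack([X_quiet, X_loud])
y = np.array(["silence"] * 50 + ["speech"] * 50)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

def classify(feature_vector):
    """Return (label, confidence, latency_ms): the top class, its
    predicted probability, and the wall-clock inference time."""
    start = time.perf_counter()
    probs = clf.predict_proba([feature_vector])[0]
    latency_ms = (time.perf_counter() - start) * 1000.0
    best = int(np.argmax(probs))
    return clf.classes_[best], float(probs[best]), latency_ms

label, confidence, latency_ms = classify(np.full(8, 1.0))
```

`predict_proba` gives the per-class vote fractions of the forest, so the top probability doubles as the confidence score shown in the GUI.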
Visualization
Tkinter/Matplotlib GUI displaying live waveforms, predictions, and system metrics.
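A minimal sketch of rendering one waveform frame with its prediction in the title (in the actual GUI the figure would be embedded in Tkinter via FigureCanvasTkAgg and updated in place rather than recreated; the headless Agg backend is used here only so the sketch runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; the real GUI uses a Tk canvas
import matplotlib.pyplot as plt
import numpy as np

def draw_frame(samples, label, confidence):
    """Render one waveform buffer with the current prediction
    and confidence shown in the plot title."""
    fig, ax = plt.subplots(figsize=(6, 2))
    ax.plot(samples, linewidth=0.8)
    ax.set_ylim(-1.1, 1.1)
    ax.set_xlabel("sample")
    ax.set_ylabel("amplitude")
    ax.set_title(f"prediction: {label} ({confidence:.0%})")
    return fig

fig = draw_frame(np.sin(np.linspace(0, 8 * np.pi, 1024)), "speech", 0.93)
```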
Core Modules
final_rf_featurse.py
Main entry point handling audio streaming, feature extraction, model inference, and visualization coordination.
rf_features.py
Feature extraction utilities for FFT, MFCC, and RMS analysis from audio input streams.
visualizer2.py
Tkinter/Matplotlib GUI implementation for real-time waveform and prediction display.
microphones.py
Microphone selection utility for listing and managing available audio input devices.
models/
Model definitions including Wav2Vec2-based classifier, pre-trained Random Forest, and feature extraction modules.
ubicoustics/
Label definitions, context mappings, and helper functions for advanced audio recognition tasks.
Technical Implementation
Audio Processing
Real-time audio capture and processing with configurable sample rates, buffer management, and noise reduction.
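Two common lightweight preprocessing steps can illustrate this stage: a noise gate that zeroes frames below an RMS threshold, and a pre-emphasis filter that boosts high frequencies before feature extraction. This is a sketch under those assumptions; the project's exact preprocessing chain is not specified here:

```python
import numpy as np

def preprocess(frame, gate_rms=0.01, preemph=0.97):
    """Gate near-silent frames to zeros, then apply pre-emphasis
    y[n] = x[n] - a * x[n-1] to the frames that pass the gate."""
    frame = np.asarray(frame, dtype=np.float64)
    if np.sqrt(np.mean(frame ** 2)) < gate_rms:
        return np.zeros_like(frame)             # treat as silence
    out = np.copy(frame)
    out[1:] = frame[1:] - preemph * frame[:-1]  # high-frequency boost
    return out

quiet = preprocess(np.full(256, 1e-4))   # below threshold: gated to zeros
loud = preprocess(np.full(256, 0.5))     # above threshold: pre-emphasized
```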
Feature Engineering
Comprehensive feature extraction pipeline including spectral analysis, statistical measures, and frequency domain features.
Machine Learning
Random Forest classification with support for deep learning models like Wav2Vec2 and VGGish for enhanced recognition.
User Interface
Real-time visualization using Tkinter and Matplotlib with live waveform display and prediction metrics.
Setup & Usage
Installation
# Install dependencies (Python 3.8+ recommended)
pip install -r requirements.txt
# Or, for an alternate dependency set:
pip install -r requirements2.txt
# Ensure working microphone and PyAudio drivers
Usage
# Run the main script for real-time audio recognition
python final_rf_featurse.py
# System will prompt for microphone selection
# Displays live waveform with predictions and confidence
Extensibility & Future Enhancements
System Extensions
- Add new models to models/ directory
- Update main script to integrate new classification algorithms
- Extend feature extraction in rf_features.py
- Use ubicoustics/ for advanced label sets
- Implement context-aware recognition systems
Planned Enhancements
- Support for additional deep learning models (Wav2Vec2, VGGish)
- Enhanced real-time visualization with multiple views
- Advanced audio preprocessing and noise reduction
- Multi-channel audio support for complex environments
- Cloud-based model deployment and inference
Interested in this project?
Feel free to reach out to discuss collaboration opportunities or ask any questions about the audio recognition system.