DeepSpeech on Raspberry Pi
Requirements: have python3 installed with pip3
https://github.com/mozilla/DeepSpeech#using-the-python-package
Run DeepSpeech with a Pre-trained Model
(using the Python deepspeech package)
WARNING: this model is really big (1.6 GB), so you cannot run it on the Raspberry Pi
Follow the steps under Using a Pre-trained Model on the GitHub page (https://github.com/mozilla/DeepSpeech#using-the-python-package), using the Python package, which are:
Make a virtual environment:
- pip3 install virtualenv (or pip install virtualenv) if you don’t have the virtualenv python package yet
virtualenv -p python3 $HOME/tmp/deepspeech-venv/
- Instead of $HOME/tmp/deepspeech-venv, put the path of where you want the virtual environment to be made
- deepspeech-venv will be the name of the environment so change that if you want a different name
- Or just make a virtualenv how you normally do
Activate the virtual environment
- Now the virtual environment is created, with a bin folder containing an activate script
source $HOME/tmp/deepspeech-venv/bin/activate
- This activates the virtual environment, where you can install deepspeech-related dependencies
- Now install the deepspeech package in the virtual environment
pip3 install deepspeech
Using this: https://github.com/mozilla/DeepSpeech#getting-the-pre-trained-model, download the latest pre-trained deepspeech model (you can use an older one if you want to):
- Linux: run this command in the directory you want to put the file:
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.5.0/deepspeech-0.5.0-models.tar.gz
- On other platforms, just enter the link into a web browser; this will download the file. Then manually move the file to your preferred directory
- Then, unzip the file using the tar command:
tar xvfz deepspeech-0.5.0-models.tar.gz
- This creates a folder called deepspeech-0.5.0-models
- Now download an audio file you want the model to transcribe (speech-to-text)
- Put this audio file in your preferred directory
- Go to that directory on the command line and run this command:
deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio my_audio_file.wav
- EXCEPT: replace my_audio_file.wav with your audio file; the --lm and --trie flags are optional
- Replace models with deepspeech-0.5.0-models, or with the name of the folder created from the download
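If you would rather call the model from a Python script instead of the deepspeech command-line tool, a minimal sketch looks roughly like this (it assumes the v0.5.0 Python API, where Model takes feature, context, alphabet, and beam-width arguments and stt takes the sample rate; later releases changed these signatures, so check the client.py example that ships with your version):

import wave
import numpy as np
from deepspeech import Model

# Values taken from the v0.5.0 example client
N_FEATURES = 26
N_CONTEXT = 9
BEAM_WIDTH = 500

# Paths assume the extracted deepspeech-0.5.0-models folder
ds = Model('deepspeech-0.5.0-models/output_graph.pbmm', N_FEATURES, N_CONTEXT,
           'deepspeech-0.5.0-models/alphabet.txt', BEAM_WIDTH)

# The audio should be a 16 kHz, mono, 16-bit WAV file
with wave.open('my_audio_file.wav', 'rb') as w:
    fs = w.getframerate()
    audio = np.frombuffer(w.readframes(w.getnframes()), np.int16)

print(ds.stt(audio, fs))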
Making Your Own Model
Next we tried to make our own model to see if we could reduce the model size:
1.) When running on a Raspberry Pi, go to the "connecting to the raspberry pi" docs to connect
- You would have to scp the newly trained model to the Raspberry Pi, assuming the trained model is small enough
2.) If you want to use a GPU, follow the directions from the gpu Slack channel for connection
- Using steps from https://github.com/mozilla/DeepSpeech#training-your-own-model:
- Make or activate your virtualenv for deepspeech
- git clone DeepSpeech from GitHub:
git clone https://github.com/mozilla/DeepSpeech.git
- Install the required dependencies from the requirements.txt file; run these commands:
cd DeepSpeech
pip3 install -r requirements.txt
- If you are using a GPU, install the GPU build of TensorFlow instead:
pip3 uninstall tensorflow
pip3 install 'tensorflow-gpu==1.13.1'
Download voice training data from Common Voice: https://voice.mozilla.org/en/datasets
- Download the Tatoeba dataset
- Go to the link, scroll down to the Tatoeba dataset, press more, and press download
- Move it to your preferred directory
- Unzip the file
The data needs to be converted to WAV files. The data needs to be split into train, test, and dev sets, and 3 csv files need to be created (one for each split) which store the wav_filename, wav_filesize, and transcript
- Use import.py and untilA.csv to convert the MP3 files to WAV files while creating train.csv, dev.csv, and test.csv (the untilA.csv file tells where all the mp3 files are located); a rough sketch of what this conversion does is shown after the output list below
- Put ‘import.py’ and ‘untilA.csv’ in the same folder
- Install pydub (pydub will help convert MP3 to WAV)
pip3 install pydub
- (Optional) apt-get install ffmpeg (pydub uses ffmpeg to decode MP3)
- Edit import.py before you start running the code
- Change the fullpath variable to the directory that has the audio files
- For example, fullpath = ‘/home/user/Download/tatoeba_audio_eng/tatoeba_audio_eng/audio’
- Now, run import.py:
python3 import.py
- As a result, you will have the following files:
new_names.csv
train.csv
dev.csv
test.csv
‘new_names.csv’ is just a file that contains the paths of all the wav files
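For reference, the conversion that import.py performs amounts to roughly the sketch below. This is a hypothetical reconstruction, not the actual script: the column layout assumed for untilA.csv (one MP3 filename plus its transcript per row) and the 80/10/10 split are assumptions, so adapt them to the real files.

import csv
import os
import random
from pydub import AudioSegment

# Edit this to the directory that holds the Tatoeba MP3 files
fullpath = '/home/user/Download/tatoeba_audio_eng/tatoeba_audio_eng/audio'

rows = []
with open('untilA.csv') as f:
    # Assumed layout: each row is "mp3_filename,transcript"
    for mp3_name, transcript in csv.reader(f):
        mp3_path = os.path.join(fullpath, mp3_name)
        wav_path = os.path.splitext(mp3_path)[0] + '.wav'
        # DeepSpeech expects 16 kHz, mono, 16-bit WAV input
        AudioSegment.from_mp3(mp3_path).set_frame_rate(16000).set_channels(1).export(wav_path, format='wav')
        rows.append((wav_path, os.path.getsize(wav_path), transcript))

# new_names.csv just records where all the converted wav files ended up
with open('new_names.csv', 'w', newline='') as out:
    csv.writer(out).writerows((r[0],) for r in rows)

# Split roughly 80/10/10 into train/dev/test and write the three csv files
random.shuffle(rows)
n = len(rows)
splits = {'train.csv': rows[:int(0.8 * n)],
          'dev.csv': rows[int(0.8 * n):int(0.9 * n)],
          'test.csv': rows[int(0.9 * n):]}
for name, subset in splits.items():
    with open(name, 'w', newline='') as out:
        writer = csv.writer(out)
        writer.writerow(['wav_filename', 'wav_filesize', 'transcript'])
        writer.writerows(subset)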
- Use ./DeepSpeech.py to create your own model:
./DeepSpeech.py --train_files /locate/directory/here/train.csv --dev_files /locate/directory/here/dev.csv --test_files /locate/directory/here/test.csv
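- To get a model file you can scp to the Raspberry Pi afterwards, you will probably also want to pass an export directory. DeepSpeech.py has an --export_dir flag for this (flag names can change between releases, so check the flag definitions in your checkout if it complains), for example:
./DeepSpeech.py --train_files /locate/directory/here/train.csv --dev_files /locate/directory/here/dev.csv --test_files /locate/directory/here/test.csv --export_dir /locate/directory/here/export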