You might have many reasons to do speech-to-text (STT) transformations locally - privacy, custom-trained models, or maybe you just don't want the latency that comes with online services. I have a podcast that I want to transcribe and generate captions for, and I wanted to do that blazingly fast.

One of the choices for STT might be DeepSpeech - a library developed by Mozilla that does just that. More than that, it comes with a pre-trained English speech model that you can start using right away.

As I started exploring the library, I realized that it had Windows builds, but no concrete instructions on how to get things running on the OS. My primary machine is no longer UNIX-based, so I had a personal interest in getting it working properly - I could finally put my RTX 2080 to good use.

It all starts pretty trivially, as outlined in the official instructions:

- Install the deepspeech-gpu package (if you don't have a beefy GPU, no worries - just use deepspeech).
- Make sure that the right version of CUDA and the associated CuDNN are installed.

Easy, right? Or so I thought.

When I fed my WAV file through DeepSpeech, my C# snippet started as follows:

```csharp
using DeepSpeechClient;
using System;
using System.IO;

namespace ds_dotnet
```

With the CPU (AMD Ryzen 9 3900XT 12-Core Processor) library, it took about 24 minutes to generate the transcript. With the GPU (NVIDIA GeForce RTX 2080 SUPER) library - 27 minutes.

That was my adventure running DeepSpeech on Windows. I do need to write a better sample snippet and explain it in a future blog post, because taking one giant file and waiting for it to complete is not really sustainable, but it works for now.
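For reference, the setup steps above can be sketched as shell commands. The model and scorer file names below assume the 0.9.3 release of DeepSpeech and are my addition, not something stated in the post:

```shell
# Install the GPU-enabled package.
# If you don't have a CUDA-capable GPU, install "deepspeech" instead.
pip install deepspeech-gpu

# Fetch the pre-trained English model and scorer
# (assuming the 0.9.3 release; adjust the version to match your package).
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.pbmm
curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.scorer
```

Note that the CUDA and cuDNN versions must match what the installed deepspeech-gpu build was compiled against, which is exactly the part that tends to go wrong on Windows.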
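If you just want a transcript and don't need the .NET client, the deepspeech package also installs a command-line tool; a rough sketch of feeding a WAV file through it (file names are assumptions on my part):

```shell
# Transcribe a 16 kHz mono WAV file; the transcript is printed to stdout.
deepspeech --model deepspeech-0.9.3-models.pbmm \
           --scorer deepspeech-0.9.3-models.scorer \
           --audio podcast-episode.wav > transcript.txt
```

This runs the whole file in one shot, so it has the same "one giant file" limitation described above.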