Transforming Video and Audio Files for Qlik Answers

If you've been following my articles and videos here at Data Voyagers or catching up with my posts on LinkedIn, you know that lately I've been on a mission to test the boundaries of what we can achieve with Qlik Answers. Since my first impressions of Qlik Answers, I've been pushing its limits—teaching my Qlik AI Assistant to program in R and Python and even finding ways to use the Force (yes, that Force) to turn the impossible into possible.

Typically, Qlik Answers is introduced as an AI chatbot that answers questions based on unstructured data. While I'm still conducting more complex and advanced tests, I've successfully managed to read tables (structured data) in Qlik Answers, as I detailed in my previous article. However, when we talk about unstructured data, we shouldn't just consider textual files like TXT, PDF, DOCX, MD, HTML, and others. Video and audio files are treasure troves of knowledge, and while Qlik Answers doesn't currently support them directly, we shouldn't overlook their potential.

That's when I donned my Data Explorer hat and developed a solution to bridge this gap. In this article, I'll share how I made the impossible possible by transforming audio and video files into formats that Qlik Answers can digest.

Making the Impossible Possible: How did I do it?

The goal was to extract the text from audio and video files and make it accessible to Qlik Answers. Here's how I did it. I won't overwhelm you with the technical details here; I'll cover those in a video soon.

1. Making the Video Files Accessible

First things first, we need to have our video files ready for processing. If your files are stored externally, like in an AWS S3 bucket, Google Drive, Azure, or even on a company network drive, it's best to download or copy them to a local or nearby location. This ensures faster processing and avoids any network latency issues.
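If the files live on a network drive or a synced cloud folder, staging them locally can be as simple as a copy loop. The sketch below is an illustration under assumptions: `stage_media`, the folder layout, and the extension list are hypothetical. It skips files whose local copy already matches in size and modification time, so repeated runs stay cheap. For an S3 bucket you would replace the copy with a download, for example boto3's `download_file`.

```python
import shutil
from pathlib import Path

# Hypothetical list of media types worth staging.
MEDIA_EXTENSIONS = {".mp4", ".mov", ".mkv", ".avi", ".wav", ".mp3", ".m4a"}


def stage_media(source_dir, work_dir):
    """Copy media files from a mounted/synced folder into a local working folder.

    Files whose local copy already has the same size and an equal or newer
    modification time are not copied again.
    """
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    staged = []
    for src in sorted(Path(source_dir).iterdir()):
        if not src.is_file() or src.suffix.lower() not in MEDIA_EXTENSIONS:
            continue  # ignore folders and non-media files
        dst = work / src.name
        src_stat = src.stat()
        if dst.exists():
            dst_stat = dst.stat()
            if dst_stat.st_size == src_stat.st_size and dst_stat.st_mtime >= src_stat.st_mtime:
                staged.append(dst)  # local copy is already up to date
                continue
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        staged.append(dst)
    return staged
```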

2. Extracting Audio from Video

Since we're dealing with video files, the next step is to extract the audio component. If you're processing audio files, you can skip this step. I wrote a Python script using the MoviePy library—a free and relatively fast tool—to extract the audio from the video files. MoviePy makes it easy to handle multimedia files, and it's perfect for this task.
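A minimal sketch of this step, assuming MoviePy is installed (it drives FFmpeg under the hood). `needs_audio_extraction` and `extract_audio` are hypothetical helper names, and the extension lists are assumptions. The import falls back between the MoviePy 2.x and 1.x module layouts; the `fps`, `nbytes`, `codec`, and `ffmpeg_params` arguments of `write_audiofile` ask for 16-bit PCM output, with FFmpeg's `-ac 1` option requesting a mono file.

```python
from pathlib import Path

# Hypothetical extension lists used to route files.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".avi", ".webm"}
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".ogg"}


def needs_audio_extraction(path):
    """True for video files; False for files that are already audio."""
    ext = Path(path).suffix.lower()
    if ext in VIDEO_EXTENSIONS:
        return True
    if ext in AUDIO_EXTENSIONS:
        return False
    raise ValueError(f"unsupported media type: {ext or '(no extension)'}")


def extract_audio(video_path, wav_path, sample_rate=16000):
    """Write a video's soundtrack to a 16-bit PCM WAV file using MoviePy."""
    try:
        from moviepy import VideoFileClip  # MoviePy 2.x layout
    except ImportError:
        from moviepy.editor import VideoFileClip  # MoviePy 1.x layout
    clip = VideoFileClip(str(video_path))
    try:
        if clip.audio is None:
            raise ValueError(f"{video_path} has no audio track")
        clip.audio.write_audiofile(
            str(wav_path),
            fps=sample_rate,             # output sample rate
            nbytes=2,                    # 16-bit samples
            codec="pcm_s16le",           # uncompressed WAV
            ffmpeg_params=["-ac", "1"],  # ask FFmpeg for a mono output file
            logger=None,                 # no progress bar
        )
    finally:
        clip.close()
    return Path(wav_path)
```

Routing first means audio-only inputs skip the MoviePy call entirely, which matches the "you can skip this step" case above.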

3. Saving Audio as WAV Files

Once we have the audio, we need to save it in a format suitable for speech recognition. Saving the audio as a WAV file is straightforward in most cases. However, very long recordings can cause errors. To get around this, I split the audio into 30-second chunks, processed each chunk separately, and then saved them all into a single WAV file. During this phase, it also helps to simplify the audio by converting it to 16 kHz mono, the format many speech recognition models are trained on, which helps the model perform better.
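The chunking idea can be sketched with Python's standard `wave` module. This is an assumption-laden illustration rather than the article's script: it expects 16-bit PCM input, downmixes to mono by averaging channels, does not resample (reaching 16 kHz is assumed to happen upstream, for example at extraction time), and writes each 30-second chunk to its own file so each can be transcribed independently instead of recombining them. `split_wav` and `downmix_to_mono` are hypothetical names.

```python
import sys
import wave
from array import array
from pathlib import Path


def downmix_to_mono(frames, nchannels):
    """Average interleaved 16-bit little-endian PCM channels into one channel."""
    if nchannels == 1:
        return frames
    samples = array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    mono = array("h", (sum(samples[i:i + nchannels]) // nchannels
                       for i in range(0, len(samples), nchannels)))
    if sys.byteorder == "big":
        mono.byteswap()
    return mono.tobytes()


def split_wav(src, out_dir, chunk_seconds=30):
    """Split a 16-bit PCM WAV into mono chunk files of at most chunk_seconds each."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    with wave.open(str(src), "rb") as reader:
        if reader.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        nchannels = reader.getnchannels()
        rate = reader.getframerate()
        frames_per_chunk = rate * chunk_seconds
        index = 0
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break  # end of file
            chunk_path = out_dir / f"{Path(src).stem}_{index:04d}.wav"
            with wave.open(str(chunk_path), "wb") as writer:
                writer.setnchannels(1)
                writer.setsampwidth(2)
                writer.setframerate(rate)
                writer.writeframes(downmix_to_mono(frames, nchannels))
            chunk_paths.append(chunk_path)
            index += 1
    return chunk_paths
```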

4. Extracting Text from Audio

Now comes the exciting part: transcribing the audio into text. This is a speech recognition task, and there are several ways to approach it. If you browse Hugging Face, you'll find numerous models that do exactly this. My go-to option is the Google Speech Recognition library. It's fast, reliable, and free! However, it works by sending your audio to a Google server through a RESTful API request, which might not be ideal for everyone, especially users with strict data privacy requirements.

For those who prefer to run everything locally, I also implemented an option using Vosk—a free and nearly as reliable alternative, albeit slightly slower. Vosk allows you to download neural network models and run speech recognition offline. In my solution, I've included both options, and you can choose which one to use by setting a simple flag variable in the script.
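Here is one way the two-engine setup with a flag could look. It is a sketch under assumptions: the online path uses the SpeechRecognition package's `recognize_google` (a common way to reach Google's free web speech endpoint), the offline path uses the `vosk` package with a model folder you have downloaded, and `USE_GOOGLE`, `transcribe`, and the other helper names are hypothetical. Both imports happen inside the functions, so only the engine you select needs to be installed.

```python
import json
import wave
from functools import lru_cache

USE_GOOGLE = True  # hypothetical flag: True = Google (online), False = Vosk (offline)


def transcribe_google(wav_path, language="en-US"):
    """Online option: sends the audio chunk to Google via SpeechRecognition."""
    import speech_recognition as sr  # pip install SpeechRecognition
    recognizer = sr.Recognizer()
    with sr.AudioFile(str(wav_path)) as source:
        audio = recognizer.record(source)  # read the whole (short) chunk
    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return ""  # no intelligible speech in this chunk


@lru_cache(maxsize=None)
def load_vosk_model(model_dir):
    """Load a Vosk model once and reuse it for every chunk."""
    from vosk import Model  # pip install vosk
    return Model(model_dir)


def vosk_text(result_json):
    """Vosk results are JSON strings such as '{"text": "..."}'."""
    return json.loads(result_json).get("text", "")


def transcribe_vosk(wav_path, model_dir):
    """Offline option: expects 16-bit mono PCM WAV and a downloaded model folder."""
    from vosk import KaldiRecognizer
    model = load_vosk_model(str(model_dir))
    parts = []
    with wave.open(str(wav_path), "rb") as wf:
        recognizer = KaldiRecognizer(model, wf.getframerate())
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            if recognizer.AcceptWaveform(data):  # True when a phrase is finalized
                parts.append(vosk_text(recognizer.Result()))
        parts.append(vosk_text(recognizer.FinalResult()))
    return " ".join(p for p in parts if p)


def transcribe(wav_path, use_google=USE_GOOGLE, model_dir=None):
    """Dispatch to the engine selected by the flag."""
    if use_google:
        return transcribe_google(wav_path)
    if model_dir is None:
        raise ValueError("the Vosk path needs a downloaded model directory")
    return transcribe_vosk(wav_path, model_dir)
```

Caching the Vosk model matters when transcribing many 30-second chunks, since loading a model is far slower than recognizing a single chunk.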

5. Saving and Uploading the Text

With the transcription complete, the next step is to save the extracted text. You can choose any format you like—PDF, Markdown, or even a simple TXT file, which is what I opted for in this case. You can also include references or metadata within the text if you wish.
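A small sketch of saving a transcript as TXT with a metadata header, using only the standard library. The header fields and the `save_transcript` name are hypothetical; the point is that a line naming the source file gives Qlik Answers something to cite when it answers from the transcript.

```python
from datetime import datetime, timezone
from pathlib import Path


def save_transcript(text, source_file, out_dir, created=None):
    """Write text to <out_dir>/<source stem>.txt with a short metadata header."""
    if created is None:
        created = datetime.now(timezone.utc)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    source = Path(source_file)
    out_path = out_dir / f"{source.stem}.txt"
    header = (
        f"Source file: {source.name}\n"
        f"Transcribed: {created.isoformat(timespec='seconds')}\n"
        "\n"
    )
    out_path.write_text(header + text.strip() + "\n", encoding="utf-8")
    return out_path
```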

Once saved, upload the text files to a location accessible by your Qlik Cloud environment. I chose an AWS S3 bucket, but you could use Google Drive, Azure, Dropbox, or even save them directly in a Qlik Cloud Space.
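For the S3 route, a minimal sketch assuming boto3 is installed and AWS credentials are configured in the usual way (environment variables, `~/.aws/credentials`, or an IAM role). The bucket name is yours to supply, the `transcripts` prefix is a placeholder, and `s3_key_for` and `upload_transcripts` are hypothetical helpers built on boto3's `upload_file(Filename, Bucket, Key)`.

```python
from pathlib import Path


def s3_key_for(transcript_path, prefix="transcripts"):
    """Build an object key such as 'transcripts/talk.txt' (prefix is a placeholder)."""
    name = Path(transcript_path).name
    prefix = prefix.strip("/")
    return f"{prefix}/{name}" if prefix else name


def upload_transcripts(folder, bucket, prefix="transcripts"):
    """Upload every .txt file in folder to the given S3 bucket and prefix."""
    import boto3  # pip install boto3
    s3 = boto3.client("s3")
    uploaded = []
    for path in sorted(Path(folder).glob("*.txt")):
        key = s3_key_for(path, prefix)
        s3.upload_file(str(path), bucket, key)
        uploaded.append(key)
    return uploaded
```

Keeping all transcripts under one prefix makes it easy to point the Qlik Cloud data connection at that single location.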

To automate the process, schedule the Python script to run at your desired frequency. For me, running it once a week is sufficient.
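Because the script reruns on a schedule, it helps if each run only processes new files. One way to do that, sketched here under the assumption that each transcript is named after its source file (`talk.mp4` → `talk.txt`), is to compare file stems. `pending_media` and the extension list are hypothetical.

```python
from pathlib import Path

# Hypothetical set of media types the pipeline accepts.
MEDIA_EXTENSIONS = {".mp4", ".mov", ".mkv", ".wav", ".mp3", ".m4a"}


def pending_media(media_dir, transcript_dir):
    """Return media files that do not yet have a matching <stem>.txt transcript."""
    transcripts = Path(transcript_dir)
    done = {p.stem for p in transcripts.glob("*.txt")} if transcripts.is_dir() else set()
    return sorted(
        p for p in Path(media_dir).iterdir()
        if p.is_file() and p.suffix.lower() in MEDIA_EXTENSIONS and p.stem not in done
    )
```

The script itself can then be triggered weekly by cron, Windows Task Scheduler, or whichever scheduler you already use; runs with no new files finish almost immediately.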

Integrating with Qlik Cloud

The work in Qlik Cloud is the easiest part. Here's how to set it up:

Create a Knowledge Base: Set up a new Knowledge Base in Qlik Answers for your transcriptions.

Connect Your Data: When choosing how to feed data to the Knowledge Base, use a data connection. This ensures that any new files added will be processed automatically based on your indexing schedule.

Image 03: Define a Connection for the Knowledge Base

Configure the Connection: In my example, I created an S3 connection and selected the dynamic definition of files. Save the configuration and set up an indexing schedule.

Create an Assistant: With your Knowledge Base ready and data connection configured, simply create an assistant from that Knowledge Base.

And that's it! Your solution is complete.

Conclusion

By converting audio and video files into text, we've unlocked a wealth of unstructured data that was previously inaccessible to Qlik Answers. This approach not only broadens the horizons of what we can achieve with AI chatbots but also reinforces the idea that with a bit of creativity and the right tools, we can turn the impossible into possible.

For those interested in the technical details, I'll be sharing a video at the end of this article where I dive deeper into the code and the process. Feel free to check it out if you're eager to get your hands dirty!

Stay tuned for more insights and adventures in data exploration. Until next time, keep pushing the boundaries!