Justinaquino: Add Buzz meeting transcription guide + Vulkan GPU & OOM investigation (2026-04-25)

2026-04-24T18:43:50Z

Buzz is an open-source, offline-capable audio/meeting transcription tool powered by OpenAI Whisper (via whisper.cpp). On this system it is installed as a Flatpak and runs with Vulkan GPU acceleration on the AMD RX 7600.

== How to Transcribe a Meeting ==

=== Prerequisites ===
* Buzz installed (Flatpak: <code>io.github.chidiwilliams.Buzz</code>)
* A model downloaded — recommended: <code>ggml-large-v3-turbo</code> for quality, <code>ggml-tiny</code> for speed
* Models are stored at <code>~/.var/app/io.github.chidiwilliams.Buzz/cache/Buzz/models/</code>
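
To confirm that a model is actually present before starting, the model cache can be listed from a terminal. This is a minimal sketch: <code>list_buzz_models</code> is a hypothetical helper name, and it searches recursively because the directory layout inside the cache is not assumed here.

```shell
# List files in the Buzz model cache with their sizes.
# Optional argument: a different model directory (default: the Flatpak path above).
list_buzz_models() {
    local model_dir="${1:-$HOME/.var/app/io.github.chidiwilliams.Buzz/cache/Buzz/models}"
    if [ ! -d "$model_dir" ]; then
        echo "no model directory at $model_dir" >&2
        return 1
    fi
    # The layout inside the cache is not assumed, so search recursively.
    find "$model_dir" -type f -exec du -h {} +
}
```

Run <code>list_buzz_models</code> with no arguments to check the default location. An empty result suggests that no model has been downloaded yet.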

=== File Transcription (Meeting Recording) ===
# Launch Buzz from the application menu or run: <code>flatpak run io.github.chidiwilliams.Buzz</code>
# Go to '''File → Import Media''' and select your meeting recording (MP3, WAV, M4A, etc.)
# In the transcription dialog:
#* '''Model''': Select <code>large-v3-turbo</code> (best quality for meetings)
#* '''Task''': Transcribe
#* '''Language''': Select your language or leave on Auto
#* '''Extract Speech''': '''Leave unchecked''' for files longer than ~30 minutes (see [[#OOM Crash with Large Files|OOM Crash]] below)
#* '''Output''': SRT, VTT, or TXT depending on your need
# Click '''Transcribe'''
# When complete, the transcript appears in the main window and is saved alongside the source file
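
The ~30 minute threshold for Extract Speech can be checked from a terminal before importing a file. The sketch below assumes <code>ffprobe</code> (part of FFmpeg) is installed on the host. The helper names and the 1800-second constant are illustrative, not part of Buzz.

```shell
# Threshold from this guide: Extract Speech is risky beyond ~30 minutes.
EXTRACT_SPEECH_MAX_SECONDS=1800

# Print the duration of a media file in whole seconds (requires ffprobe).
media_duration_seconds() {
    ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "$1" | cut -d. -f1
}

# Print "optional" if Extract Speech may be used for a file of the given
# duration in seconds, or "off" if it should stay unchecked.
extract_speech_advice() {
    if [ "$1" -le "$EXTRACT_SPEECH_MAX_SECONDS" ]; then
        echo "optional"
    else
        echo "off"
    fi
}
```

For example, <code>extract_speech_advice "$(media_duration_seconds meeting.mp3)"</code>, where <code>meeting.mp3</code> is a placeholder file name.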

=== Live Transcription (Real-Time) ===
# Go to '''File → Record and Transcribe'''
# Select your microphone input
# Choose model and language
# Click '''Record''' — transcription appears live on screen
# Stop recording when done; export via '''File → Export'''

=== Recommended Settings for Meetings ===
{| class="wikitable"
|-
! Setting !! Value !! Reason
|-
| Model || large-v3-turbo || Best accuracy; uses about 1.6 GB of VRAM, which fits on the RX 7600
|-
| Extract Speech || '''Off''' for files >30 min || Demucs uses 8–15 GB RAM for long files (OOM risk)
|-
| Language || Set explicitly || Faster than auto-detect
|-
| Task || Transcribe || Use Translate only if you need English output from another language
|}

=== Before Each Session: Clear Swap ===
Prior transcription sessions can leave stale pages in swap. Run this before starting a long transcription:
<pre>sudo swapoff -a && sudo swapon -a</pre>
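<code>swapoff</code> moves every swapped-out page back into RAM, so this is only safe when available RAM exceeds the swap currently in use (typical on this system: 15+ GB free against 8 GB of swap).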
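
The safety condition can be checked automatically before clearing swap. This is a minimal sketch that reads the standard <code>MemAvailable</code>, <code>SwapTotal</code> and <code>SwapFree</code> fields of <code>/proc/meminfo</code>. The function names are hypothetical.

```shell
# Print one field (in kB) from a meminfo-style file.
# Arguments: field name, optional file (default /proc/meminfo).
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Succeed only if available RAM exceeds the swap currently in use.
swap_clear_is_safe() {
    local file="${1:-/proc/meminfo}"
    local avail swap_total swap_free swap_used
    avail=$(meminfo_kb MemAvailable "$file")
    swap_total=$(meminfo_kb SwapTotal "$file")
    swap_free=$(meminfo_kb SwapFree "$file")
    # Refuse if any field could not be read.
    if [ -z "$avail" ] || [ -z "$swap_total" ] || [ -z "$swap_free" ]; then
        return 1
    fi
    swap_used=$(( swap_total - swap_free ))
    [ "$avail" -gt "$swap_used" ]
}
```

Used as a guard: <code>swap_clear_is_safe && sudo swapoff -a && sudo swapon -a</code>.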

== GPU Acceleration ==

Buzz's Flatpak installation includes a Vulkan-enabled <code>whisper-cli</code>. On this system, GPU inference is automatically active — no configuration needed.

* GPU in use: '''AMD RX 7600 (RDNA3, RADV NAVI33)'''
* Confirmed by: <code>whisper_backend_init_gpu: using Vulkan0 backend</code> in runtime output
* Buzz checks for Vulkan at startup (<code>IS_VULKAN_SUPPORTED</code>) and omits the <code>--no-gpu</code> flag when found
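
If the runtime output of <code>whisper-cli</code> has been saved to a file, the backend line quoted above can be checked with a small filter. This is only a sketch: how the output is captured is left open, and <code>reports_vulkan_backend</code> is a hypothetical name.

```shell
# Succeed if the whisper-cli output on stdin reports a Vulkan backend.
# The marker is the message quoted in this guide, without the device index.
reports_vulkan_backend() {
    grep -q 'whisper_backend_init_gpu: using Vulkan'
}
```

For example, <code>reports_vulkan_backend < whisper-output.txt && echo "GPU in use"</code>, where <code>whisper-output.txt</code> is a placeholder for the captured output.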

To force CPU inference (for debugging):
<pre>flatpak override --user io.github.chidiwilliams.Buzz --env=BUZZ_FORCE_CPU=true</pre>

To revert (this clears all user-level overrides for Buzz, not only <code>BUZZ_FORCE_CPU</code>):
<pre>flatpak override --user --reset io.github.chidiwilliams.Buzz</pre>

== OOM Crash with Large Files ==

=== Symptom ===
When selecting <code>ggml-large-v3-turbo</code> in Buzz and starting a transcription on a long file, the application crashes. VRAM usage never visibly climbs before the crash.

=== Root Cause ===
The crash is caused by the '''Extract Speech (Demucs)''' pre-processing step, not the large model.

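Demucs is a PyTorch-based music source separation model that Buzz uses to isolate speech from background noise before passing the audio to the transcription engine. It processes the decoded raw PCM audio at full float32 precision entirely in RAM, so memory use scales with the duration of the recording rather than with the compressed file size: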

{| class="wikitable"
|-
! Audio Duration !! Compressed Size !! RAM Required (Demucs)
|-
| 1 hour MP3 || ~60 MB || 500 MB – 1 GB
|-
| 3–4 hour session || ~220 MB || 8–15 GB peak
|}
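
A back-of-envelope estimate shows why duration matters more than file size. One decoded float32 copy of the audio takes duration × sample rate × channels × 4 bytes. The sketch below assumes 44.1 kHz stereo, which is an assumption rather than a measured value, and the function name is hypothetical. Under those assumptions a 3.5 hour recording needs roughly 4 GiB per copy, so a few simultaneous copies (input plus separated stems) would land in the range of the peak figures above.

```shell
# Estimate the MiB needed to hold one decoded float32 copy of the audio.
# Arguments: duration in seconds, sample rate in Hz, channel count.
# The defaults (44100 Hz, 2 channels) are assumptions, not measurements.
pcm_float32_mib() {
    local seconds="$1"
    local rate="${2:-44100}"
    local channels="${3:-2}"
    # 4 bytes per float32 sample; 1048576 bytes per MiB.
    echo $(( seconds * rate * channels * 4 / 1048576 ))
}
```

For example, <code>pcm_float32_mib 12600</code> estimates a 3.5 hour file. Actual Demucs memory use also depends on how it chunks and buffers the audio, which this estimate does not model.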

The combination that caused the crash:
# Swap already exhausted from previous sessions
# Extract Speech (Demucs) triggered on a ~220 MB MP3 (~3–4 hours of audio)
# The Python process's memory use exceeded the available RAM plus swap
# OOM killer terminated the process at 22 GB RSS
# <code>whisper-cli</code> was never launched → no VRAM activity observed
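
An OOM kill like the one in this sequence can be confirmed from the kernel log. The filter below is a sketch that matches the usual kernel OOM messages. The function name is hypothetical, and the exact wording of kernel messages can vary between kernel versions.

```shell
# Filter kernel log lines on stdin down to OOM-killer events.
oom_kill_lines() {
    grep -iE 'out of memory|oom-kill|oom_reaper'
}
```

Typical use is <code>journalctl -k -b | oom_kill_lines</code> for the current boot, or <code>sudo dmesg | oom_kill_lines</code>.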

=== Why VRAM Never Climbed ===
<code>whisper-cli</code> runs as a subprocess of the Python GUI. The OOM kill happened during Demucs pre-processing — <code>whisper-cli</code> was never started, so no VRAM allocation occurred.

The Buzz log confirms this. The log file is at:
<pre>~/.var/app/io.github.chidiwilliams.Buzz/.local/state/Buzz/log/logs.txt</pre>
At crash time the last entry was <code>Will extract speech</code>; the log never reached <code>Starting whisper file transcription</code>.
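
The same check can be repeated after any future crash. This sketch relies only on the two log messages quoted above; <code>last_logged_stage</code> is a hypothetical helper, and it reports on the most recent matching entry in the file, which may belong to an earlier session if the log has not been rotated.

```shell
# Print which stage the Buzz log reached last.
# Argument: path to logs.txt.
last_logged_stage() {
    local marker
    marker=$(grep -E 'Will extract speech|Starting whisper file transcription' "$1" | tail -n 1)
    case "$marker" in
        *'Starting whisper file transcription'*) echo "whisper" ;;
        *'Will extract speech'*)                 echo "extract-speech" ;;
        *)                                       echo "unknown" ;;
    esac
}
```

A result of <code>extract-speech</code> after a crash points at Demucs pre-processing rather than at <code>whisper-cli</code>.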

=== Fix ===
'''Uncheck "Extract Speech"''' in the Buzz transcription dialog for any file longer than ~30 minutes. The <code>large-v3-turbo</code> model tolerates background noise well enough that Demucs pre-processing is unnecessary in most cases.

Optionally, grow the swapfile to 16 GB if you still need Extract Speech for shorter files:
<pre>sudo swapoff /swap.img
sudo fallocate -l 16G /swap.img
sudo chmod 600 /swap.img   # swap files should be readable by root only
sudo mkswap /swap.img
sudo swapon /swap.img</pre>

== Environment Variables ==

{| class="wikitable"
|-
! Variable !! Default !! Effect
|-
| <code>BUZZ_FORCE_CPU</code> || false || Set to <code>true</code> to disable GPU inference
|-
| <code>BUZZ_WHISPERCPP_N_THREADS</code> || cpu_count / 2 || Override thread count for whisper-cli
|}

Set via: <code>flatpak override --user io.github.chidiwilliams.Buzz --env=VAR=value</code>
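
As an illustration, the helper below prints (without running) the override command for the thread count. With no argument it uses half of the value reported by <code>nproc</code>, mirroring the default in the table. The function name is hypothetical.

```shell
# Print the flatpak command that pins the whisper-cli thread count.
# Optional argument: thread count (default: half of nproc, as in the table).
buzz_threads_override_cmd() {
    local threads="${1:-}"
    if [ -z "$threads" ]; then
        threads=$(( $(nproc) / 2 ))
    fi
    echo "flatpak override --user --env=BUZZ_WHISPERCPP_N_THREADS=$threads io.github.chidiwilliams.Buzz"
}
```

Printing the command first makes it easy to review before applying it. The overrides currently in effect can be inspected with <code>flatpak override --user --show io.github.chidiwilliams.Buzz</code>.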

== Key File Paths ==

{| class="wikitable"
|-
! Path !! Description
|-
| <code>/var/lib/flatpak/app/io.github.chidiwilliams.Buzz/.../buzz/whisper_cpp/whisper-cli</code> || Bundled Vulkan whisper-cli (libwhisper 1.8.3)
|-
| <code>~/.var/app/io.github.chidiwilliams.Buzz/cache/Buzz/models/</code> || Buzz model cache (ggml binaries)
|-
| <code>~/.var/app/io.github.chidiwilliams.Buzz/.local/state/Buzz/log/logs.txt</code> || Buzz application log
|-
| <code>~/.var/app/io.github.chidiwilliams.Buzz/config/Buzz.conf</code> || Buzz settings
|-
| <code>~/whisper.cpp/build/bin/whisper-cli</code> || Custom Vulkan-enabled whisper.cpp build
|-
| <code>~/bin/whisper-cli-vulkan</code> || Wrapper script for custom build (sets <code>LD_LIBRARY_PATH</code>)
|-
| <code>~/.local/share/pipx/venvs/buzz-captions/lib/python3.12/site-packages/buzz/whisper_cpp/</code> || pipx install whisper_cpp directory (directly modifiable)
|}
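
To see which of these locations exist on a given machine, a small loop is enough. This is a sketch with a hypothetical function name. The first path in the table contains an elided segment, so it is not usable as written and should be left out or completed by hand.

```shell
# Report which of the given paths exist.
# "~" is not expanded inside quotes, so pass paths built from $HOME.
check_paths() {
    local p
    for p in "$@"; do
        if [ -e "$p" ]; then
            echo "ok       $p"
        else
            echo "missing  $p"
        fi
    done
}
```

For example, <code>check_paths "$HOME/.var/app/io.github.chidiwilliams.Buzz/config/Buzz.conf" "$HOME/bin/whisper-cli-vulkan"</code>.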

== Investigation Summary ==

{| class="wikitable"
|-
! Question !! Finding
|-
| Is Buzz using the RX 7600? || '''Yes.''' <code>using Vulkan0 backend</code> confirmed inside the Flatpak sandbox.
|-
| Was <code>--no-gpu</code> being added? || '''No.''' <code>IS_VULKAN_SUPPORTED = True</code> in the sandbox; flag is never appended.
|-
| Does the large model load correctly? || '''Yes.''' It loads 1.6 GB into VRAM and transcribes a short clip in ~1.25 s.
|-
| What caused the crash? || '''Extract Speech (Demucs) OOM.''' Python killed at 22 GB RSS before whisper-cli launched.
|-
| Why was VRAM not climbing? || <code>whisper-cli</code> was never started — process died during Demucs pre-processing.
|-
| Fix? || '''Uncheck Extract Speech''' for long files. Clear swap before sessions.
|}

''Investigation date: 2026-04-25. System: Ubuntu 24.04, AMD RX 7600, Buzz 1.4.4 (Flatpak).''
