
Buzz Transcription

From MediawikiCIT
Revision as of 18:43, 24 April 2026 by Justinaquino (Add Buzz meeting transcription guide + Vulkan GPU & OOM investigation (2026-04-25))

Buzz is an open-source, offline-capable audio/meeting transcription tool powered by OpenAI Whisper (via whisper.cpp). On this system it is installed as a Flatpak and runs with Vulkan GPU acceleration on the AMD RX 7600.

How to Transcribe a Meeting

Prerequisites

  • Buzz installed (Flatpak: io.github.chidiwilliams.Buzz)
  • A downloaded model: ggml-large-v3-turbo is recommended for quality, ggml-tiny for speed
  • Models are stored at ~/.var/app/io.github.chidiwilliams.Buzz/cache/Buzz/models/
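To check which models are already downloaded, you can list the cache directory above. This is a minimal sketch, assuming each model is a single file directly inside that directory (the exact file names Buzz uses are not recorded here, so the script lists whatever it finds). `list_models` and the `MODEL_DIR` override are hypothetical names, not part of Buzz.

```shell
#!/bin/sh
# List downloaded models in Buzz's Flatpak model cache with their sizes.
# Assumption: each model is stored as a regular file directly in this directory.
MODEL_DIR="${MODEL_DIR:-$HOME/.var/app/io.github.chidiwilliams.Buzz/cache/Buzz/models}"

# list_models DIR: print "size<TAB>name" for each file; return 1 if there are none.
list_models() {
    found=1
    for f in "$1"/*; do
        [ -f "$f" ] || continue   # skips subdirectories and an unmatched glob
        printf '%s\t%s\n' "$(du -h "$f" | cut -f1)" "$(basename "$f")"
        found=0
    done
    return "$found"
}

list_models "$MODEL_DIR" || echo "No models found in $MODEL_DIR" >&2
```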

File Transcription (Meeting Recording)

  1. Launch Buzz from the application menu or run: flatpak run io.github.chidiwilliams.Buzz
  2. Go to File → Import Media and select your meeting recording (MP3, WAV, M4A, etc.)
  3. In the transcription dialog:
    • Model: Select large-v3-turbo (best quality for meetings)
    • Task: Transcribe
    • Language: Select your language or leave on Auto
    • Extract Speech: Leave unchecked for files longer than ~30 minutes (see OOM Crash below)
    • Output: SRT, VTT, or TXT depending on your needs
  4. Click Transcribe
  5. When complete, the transcript appears in the main window and is saved alongside the source file

Live Transcription (Real-Time)

  1. Go to File → Record and Transcribe
  2. Select your microphone input
  3. Choose model and language
  4. Click Record — transcription appears live on screen
  5. Stop recording when done; export via File → Export

Recommended Settings

Setting | Value | Reason
Model | large-v3-turbo | Best accuracy; about 1.6 GB of VRAM, fits on the RX 7600
Extract Speech | Off for files longer than 30 minutes | Demucs uses 8–15 GB of RAM on long files (OOM risk)
Language | Set explicitly | Skips the auto-detection pass (slightly faster) and avoids misdetection
Task | Transcribe | Use Translate only if you need English output from non-English audio

Before Each Session: Clear Swap

Prior transcription sessions can leave stale pages in swap. Run this before starting a long transcription:

sudo swapoff -a && sudo swapon -a

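This is safe only when available RAM exceeds the swap currently in use (typical on this system: 15+ GB free RAM, 8 GB swap). Otherwise swapoff cannot move every swapped page back into RAM and may fail or trigger the OOM killer.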
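The rule above can be checked automatically before cycling swap. This is a minimal sketch, assuming a Linux /proc/meminfo: MemAvailable is used as the measure of free RAM, and the 1 GiB safety margin and the `--apply` switch are illustrative choices, not anything Buzz requires.

```shell
#!/bin/sh
# Clear swap only if available RAM can absorb every page currently swapped out.
# Values come from /proc/meminfo (in kB). The 1 GiB safety margin is an
# arbitrary assumption, not a documented requirement.
MARGIN_KB=$((1024 * 1024))

# meminfo_kb FIELD: print one /proc/meminfo value in kB (e.g. MemAvailable).
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' /proc/meminfo
}

# swap_clear_is_safe AVAILABLE_KB SWAP_USED_KB: succeed if RAM covers swap + margin.
swap_clear_is_safe() {
    [ "$1" -ge $(( $2 + MARGIN_KB )) ]
}

mem_available=$(meminfo_kb MemAvailable)
swap_used=$(( $(meminfo_kb SwapTotal) - $(meminfo_kb SwapFree) ))
echo "MemAvailable: $((mem_available / 1024)) MiB, swap in use: $((swap_used / 1024)) MiB"

if swap_clear_is_safe "$mem_available" "$swap_used"; then
    echo "Safe to clear swap."
    # Dry run by default; pass --apply to actually cycle swap.
    if [ "${1:-}" = "--apply" ]; then
        sudo swapoff -a && sudo swapon -a
    fi
else
    echo "Not enough available RAM to absorb swapped pages; skipping." >&2
fi
```

Run it once without arguments to see the numbers, then with `--apply` to clear swap. MemAvailable is used rather than MemFree because it also counts reclaimable page cache, which the kernel can drop to make room for pages coming back from swap.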

GPU Acceleration

Buzz's Flatpak installation includes a Vulkan-enabled whisper-cli. On this system, GPU inference is active automatically; no configuration is needed.

  • GPU in use: AMD RX 7600 (RDNA3, RADV NAVI33)
  • Confirmed by: the line "whisper_backend_init_gpu: using Vulkan0 backend" in the runtime output
  • Buzz checks for Vulkan at startup (IS_VULKAN_SUPPORTED) and omits the --no-gpu flag when Vulkan is found
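The flag selection described above can be sketched as follows. This is a hypothetical shell re-expression, not Buzz's actual code (Buzz itself is written in Python): the first argument stands in for IS_VULKAN_SUPPORTED, and treating BUZZ_FORCE_CPU=true as a reason to add --no-gpu is an assumption based on that variable's documented effect. --no-gpu itself is a real whisper-cli option.

```shell
#!/bin/sh
# Hypothetical re-expression of how the whisper-cli GPU flag is chosen.
# $1: "true" if Vulkan was detected at startup (stand-in for IS_VULKAN_SUPPORTED)
# BUZZ_FORCE_CPU: environment override; "true" forces CPU inference (assumed behaviour)
gpu_args() {
    if [ "$1" = "true" ] && [ "${BUZZ_FORCE_CPU:-false}" != "true" ]; then
        :  # Vulkan found and not overridden: add nothing, so GPU offload stays on
    else
        printf '%s' '--no-gpu'  # no Vulkan, or CPU forced: disable GPU offload
    fi
}

echo "extra whisper-cli args: '$(gpu_args true)'"
```

The point of the sketch is that GPU use is the default: the only lever is adding --no-gpu, which is why no configuration is needed when Vulkan is present.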

To force CPU inference (for debugging):

flatpak override --user io.github.chidiwilliams.Buzz --env=BUZZ_FORCE_CPU=true

To revert (this removes all of your user overrides for Buzz, not only BUZZ_FORCE_CPU):

flatpak override --user --reset io.github.chidiwilliams.Buzz
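To see which overrides are currently active, run flatpak override --user --show io.github.chidiwilliams.Buzz. User overrides are stored as a keyfile (on a typical setup, ~/.local/share/flatpak/overrides/io.github.chidiwilliams.Buzz), so after the BUZZ_FORCE_CPU override above the output should look roughly like this. It is illustrative only; the exact contents depend on any other overrides you have set.

```ini
[Environment]
BUZZ_FORCE_CPU=true
```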

OOM Crash with Large Files

Symptom

With ggml-large-v3-turbo selected, starting a transcription of a long file crashes Buzz. VRAM usage never visibly climbs before the crash.

Root Cause

The crash is caused by the Extract Speech (Demucs) pre-processing step, not the large model.

Demucs is a PyTorch-based music source separation model. Buzz uses it to isolate the voice track and strip background sound before passing the audio to the transcription engine. It processes decoded PCM audio as full-precision float32 data held entirely in RAM:

Audio duration | Compressed size (MP3) | RAM required by Demucs
1 hour | ~60 MB | 500 MB – 1 GB
3–4 hour session | ~220 MB | 8–15 GB peak

The combination that caused the crash:

  1. Swap already exhausted from previous sessions
  2. Extract Speech (Demucs) triggered on a ~220 MB MP3 (~3–4 hours of audio)
  3. Python RAM usage exceeded available RAM + swap ceiling
  4. OOM killer terminated the process at 22 GB RSS
  5. whisper-cli was never launched → no VRAM activity observed
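To confirm that the kernel's OOM killer (rather than a Buzz bug) ended the process, look in the kernel log for an "Out of memory: Killed process" line and read its anon-rss field. This is a minimal sketch, assuming the standard kernel message format (... total-vm:NkB, anon-rss:NkB, ...) and access to the journal via journalctl -k; the helper name oom_rss_gib is hypothetical.

```shell
#!/bin/sh
# Report the resident memory of processes killed by the kernel OOM killer.
# Assumes kernel lines of the form:
#   Out of memory: Killed process PID (NAME) total-vm:NkB, anon-rss:NkB, ...

# oom_rss_gib: read kernel log lines on stdin, print "NAME <anon-rss in GiB>".
oom_rss_gib() {
    awk '/Out of memory: Killed process/ {
        name = ""; rss = 0
        if (match($0, /\([^)]*\)/)) name = substr($0, RSTART + 1, RLENGTH - 2)
        if (match($0, /anon-rss:[0-9]+kB/)) rss = substr($0, RSTART + 9, RLENGTH - 11)
        printf "%s %.1f GiB\n", name, rss / 1048576
    }'
}

# Kernel messages from the current boot; may need sudo or journal group membership.
journalctl -k -b --no-pager 2>/dev/null | oom_rss_gib
```

If nothing is printed, no process was OOM-killed during the current boot; replace -b with -b -1 to check the previous boot (this requires a persistent journal).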

Why VRAM Never Climbed

whisper-cli runs as a subprocess of the Python GUI. The OOM kill happened during Demucs pre-processing, before whisper-cli was started, so no VRAM was ever allocated.

From the Buzz log at crash time:

~/.var/app/io.github.chidiwilliams.Buzz/.local/state/Buzz/log/logs.txt

The last entry was "Will extract speech"; the log never reached "Starting whisper file transcription".
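The same log check can be scripted. This is a minimal sketch, assuming the two messages quoted above appear verbatim in logs.txt. Because the log may hold several sessions, it looks only at the last of the two markers. The helper name last_stage and the LOG override are hypothetical.

```shell
#!/bin/sh
# Report how far Buzz got before it stopped, based on the last marker in its log.
LOG="${LOG:-$HOME/.var/app/io.github.chidiwilliams.Buzz/.local/state/Buzz/log/logs.txt}"

# last_stage FILE: print "demucs", "whisper", or "none" for the last logged stage.
last_stage() {
    last=$(grep -E 'Will extract speech|Starting whisper file transcription' "$1" | tail -n 1)
    case "$last" in
        *"Starting whisper file transcription"*) echo "whisper" ;;
        *"Will extract speech"*) echo "demucs" ;;
        *) echo "none" ;;
    esac
}

case "$(last_stage "$LOG" 2>/dev/null)" in
    demucs) echo "Stopped during Extract Speech (Demucs); whisper-cli never started." ;;
    whisper) echo "Reached whisper-cli; the failure happened later." ;;
    *) echo "No transcription markers found in $LOG." ;;
esac
```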

Fix

Uncheck "Extract Speech" in the Buzz transcription dialog for any file longer than ~30 minutes. Whisper models, including large-v3-turbo, are trained on large amounts of real-world audio and are fairly robust to background noise, so Demucs pre-processing is unnecessary in most cases.

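Optionally, grow the swapfile if you need Extract Speech for shorter files. The first command, swapoff, has the same free-RAM requirement described under "Before Each Session: Clear Swap":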

sudo swapoff /swap.img
sudo fallocate -l 16G /swap.img
sudo mkswap /swap.img
sudo swapon /swap.img

Environment Variables

Variable | Default | Effect
BUZZ_FORCE_CPU | false | Set to true to disable GPU inference
BUZZ_WHISPERCPP_N_THREADS | cpu_count / 2 | Override the thread count for whisper-cli

Set via: flatpak override --user io.github.chidiwilliams.Buzz --env=VAR=value
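For example, to pin whisper-cli to an explicit thread count, you can compute a value and pass it through the same override. This is a sketch, assuming nproc reports the same logical CPU count Buzz uses for its cpu_count / 2 default and that rounding down is how that default behaves; default_threads is a hypothetical helper.

```shell
#!/bin/sh
# default_threads CPUS: half the logical CPUs, never less than 1
# (mirrors the cpu_count / 2 default in the table; rounding down is an assumption).
default_threads() {
    t=$(( $1 / 2 ))
    [ "$t" -ge 1 ] || t=1
    echo "$t"
}

threads=$(default_threads "$(nproc)")
echo "Default is roughly $threads threads."

# To override (example: use every logical CPU instead), uncomment:
# flatpak override --user io.github.chidiwilliams.Buzz --env=BUZZ_WHISPERCPP_N_THREADS="$(nproc)"
```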

Key File Paths

Path | Description
/var/lib/flatpak/app/io.github.chidiwilliams.Buzz/.../buzz/whisper_cpp/whisper-cli | Bundled Vulkan-enabled whisper-cli (libwhisper 1.8.3)
~/.var/app/io.github.chidiwilliams.Buzz/cache/Buzz/models/ | Buzz model cache (ggml binaries)
~/.var/app/io.github.chidiwilliams.Buzz/.local/state/Buzz/log/logs.txt | Buzz application log
~/.var/app/io.github.chidiwilliams.Buzz/config/Buzz.conf | Buzz settings
~/whisper.cpp/build/bin/whisper-cli | Custom Vulkan-enabled whisper.cpp build
~/bin/whisper-cli-vulkan | Wrapper script for the custom build (sets LD_LIBRARY_PATH)
~/.local/share/pipx/venvs/buzz-captions/lib/python3.12/site-packages/buzz/whisper_cpp/ | whisper_cpp directory of the pipx install (directly modifiable)

Investigation Summary

Question | Finding
Is Buzz using the RX 7600? | Yes. "using Vulkan0 backend" confirmed inside the Flatpak sandbox.
Was --no-gpu being added? | No. IS_VULKAN_SUPPORTED = True in the sandbox, so the flag is never appended.
Does the large model load correctly? | Yes. It loads 1.6 GB into VRAM and transcribes a short clip in ~1.25 s.
What caused the crash? | Extract Speech (Demucs) OOM. The Python process was killed at 22 GB RSS before whisper-cli launched.
Why was VRAM not climbing? | whisper-cli was never started; the process died during Demucs pre-processing.
Fix? | Uncheck Extract Speech for long files. Clear swap before sessions.

Investigation date: 2026-04-25. System: Ubuntu 24.04, AMD RX 7600, Buzz 1.4.4 (Flatpak).