Buzz Transcription

Buzz is an open-source, offline-capable audio/meeting transcription tool powered by OpenAI Whisper (via whisper.cpp). On this system it is installed as a Flatpak and runs with Vulkan GPU acceleration on the AMD RX 7600.

How to Transcribe a Meeting

Prerequisites

- Buzz installed (Flatpak: io.github.chidiwilliams.Buzz)
- A model downloaded (recommended: ggml-large-v3-turbo for quality, ggml-tiny for speed)
- Models are stored at ~/.var/app/io.github.chidiwilliams.Buzz/cache/Buzz/models/

File Transcription (Meeting Recording)

Launch Buzz from the application menu or run: flatpak run io.github.chidiwilliams.Buzz
Go to File → Import Media and select your meeting recording (MP3, WAV, M4A, etc.)
In the transcription dialog:
- Model: Select large-v3-turbo (best quality for meetings)
- Task: Transcribe
- Language: Select your language or leave on Auto
- Extract Speech: Leave unchecked for files longer than ~30 minutes (see OOM Crash with Large Files below)
- Output: SRT, VTT, or TXT depending on your need
Click Transcribe
When complete, the transcript appears in the main window and is saved alongside the source file

Live Transcription (Real-Time)

Go to File → Record and Transcribe
Select your microphone input
Choose model and language
Click Record — transcription appears live on screen
Stop recording when done; export via File → Export

Recommended Settings for Meetings

Setting	Value	Reason
Model	large-v3-turbo	Best accuracy; 1.6 GB VRAM, runs on RX 7600
Extract Speech	Off for files >30 min	Demucs uses 8–15 GB RAM for long files (OOM risk)
Language	Set explicitly	Faster than auto-detect
Task	Transcribe	Use Translate only if you need English output from another language

Before Each Session: Clear Swap

Prior transcription sessions can leave stale pages in swap. Run this before starting a long transcription:

sudo swapoff -a && sudo swapon -a

swapoff has to move every swapped-out page back into RAM, so this is safe only when free RAM exceeds the swap currently in use (typical on this system: 15+ GB free, 8 GB swap).
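The safety condition above can be checked before cycling swap. The following is a minimal sketch, not part of Buzz: `swap_clear_is_safe` is a hypothetical helper name, and it compares `MemAvailable` (the kernel's estimate of reclaimable RAM, used here as a stand-in for "free RAM") against swap in use.

```shell
# Sketch: only cycle swap when available RAM can absorb the swapped-out pages.
# swap_clear_is_safe is a hypothetical helper; it reads a meminfo-style file
# (default /proc/meminfo) so it can also be tried against a saved copy.
swap_clear_is_safe() {
    awk '
        /^MemAvailable:/ { avail = $2 }   # kB of RAM the kernel can hand out
        /^SwapTotal:/    { total = $2 }
        /^SwapFree:/     { free  = $2 }
        END {
            used = total - free           # kB currently held in swap
            printf "available=%d kB, swap used=%d kB\n", avail, used
            exit (avail > used) ? 0 : 1   # success only if RAM can take it all
        }
    ' "${1:-/proc/meminfo}"
}

# Usage (not run automatically):
#   swap_clear_is_safe && sudo swapoff -a && sudo swapon -a
```

The function only reports and returns a status; the privileged commands stay in the caller's hands.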

GPU Acceleration

Buzz's Flatpak installation includes a Vulkan-enabled whisper-cli. On this system, GPU inference is automatically active — no configuration needed.

- GPU in use: AMD RX 7600 (RDNA3, RADV NAVI33)
- Confirmed by the line `whisper_backend_init_gpu: using Vulkan0 backend` in the runtime output
- Buzz checks for Vulkan at startup (`IS_VULKAN_SUPPORTED`) and omits the `--no-gpu` flag when Vulkan is found

To force CPU inference (for debugging):

flatpak override --user io.github.chidiwilliams.Buzz --env=BUZZ_FORCE_CPU=true

To revert (this resets all user-level overrides for Buzz, not only BUZZ_FORCE_CPU):

flatpak override --user --reset io.github.chidiwilliams.Buzz
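To see whether the CPU override is currently in effect, the user-level overrides can be inspected with `flatpak override --show`. The sketch below assumes that command prints the overrides in key-file form (a `VAR=value` line per environment variable); `buzz_force_cpu_state` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `flatpak override --show` output on stdin and
# prints the value of BUZZ_FORCE_CPU, or "unset" if no override is present.
# Assumes one VAR=value line per overridden environment variable.
buzz_force_cpu_state() {
    awk -F= '
        $1 == "BUZZ_FORCE_CPU" { value = $2; found = 1 }
        END { print (found ? value : "unset") }
    '
}

# Usage (not run automatically):
#   flatpak override --user --show io.github.chidiwilliams.Buzz | buzz_force_cpu_state
```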

OOM Crash with Large Files

Symptom

When selecting ggml-large-v3-turbo in Buzz and starting a transcription on a long file, the application crashes. VRAM usage never visibly climbs before the crash.

Root Cause

The crash is caused by the Extract Speech (Demucs) pre-processing step, not the large model.

Demucs is a PyTorch-based music source separation model that strips background noise from audio before passing it to the transcription engine. It processes raw PCM audio at full float32 precision entirely in RAM:

Audio Duration	Compressed Size	RAM Required (Demucs)
1 hour MP3	~60 MB	500 MB – 1 GB
3–4 hour session	~220 MB	8–15 GB peak

The combination that caused the crash:

- Swap was already exhausted from previous sessions
- Extract Speech (Demucs) ran on a ~220 MB MP3 (~3–4 hours of audio)
- Python memory usage exceeded the combined RAM and swap available
- The OOM killer terminated the process at 22 GB RSS
- whisper-cli was never launched, so no VRAM activity was observed
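An OOM kill like this one can usually be confirmed from the kernel log. The sketch below is an illustration rather than the method used in the original investigation: `oom_kills` is a hypothetical helper, and it assumes the usual kernel wording for kill records. If systemd-oomd performed the kill instead of the kernel, the record looks different and this filter will not match.

```shell
# Hypothetical helper: filters kernel log text on stdin for OOM-kill records
# and prints the process name and its anonymous RSS in GiB for each one.
# Assumes the usual kernel wording:
#   "Out of memory: Killed process <pid> (<name>) ... anon-rss:<n>kB ..."
oom_kills() {
    sed -n 's/.*Out of memory: Killed process [0-9]* (\([^)]*\)).*anon-rss:\([0-9]*\)kB.*/\1|\2/p' |
        awk -F'|' '{ printf "%s %.1f GiB\n", $1, $2 / 1048576 }'
}

# Usage (not run automatically; add -b -1 to read the previous boot):
#   journalctl -k --no-pager | oom_kills
```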

Why VRAM Never Climbed

whisper-cli runs as a subprocess of the Python GUI. The OOM kill happened during Demucs pre-processing — whisper-cli was never started, so no VRAM allocation occurred.

The Buzz log at crash time confirms this. The log is at:

~/.var/app/io.github.chidiwilliams.Buzz/.local/state/Buzz/log/logs.txt

The last entry was "Will extract speech"; the log never reached "Starting whisper file transcription".
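The same check can be repeated after any future crash by looking at which of those two messages appears last in the log. This is a minimal sketch: `buzz_last_stage` is a hypothetical helper, and it relies only on the two message strings quoted above.

```shell
# Hypothetical helper: reports which stage a Buzz log reached last, based on
# the two log messages quoted above. Takes the log path as its only argument.
buzz_last_stage() {
    awk '
        /Will extract speech/                 { stage = "extract-speech" }
        /Starting whisper file transcription/ { stage = "whisper" }
        END { print (stage == "" ? "unknown" : stage) }
    ' "$1"
}

# Usage (not run automatically):
#   buzz_last_stage ~/.var/app/io.github.chidiwilliams.Buzz/.local/state/Buzz/log/logs.txt
```

A result of `extract-speech` after a crash points at the Demucs step; `whisper` means the transcription engine was reached.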

Fix

Uncheck "Extract Speech" in the Buzz transcription dialog for any file longer than ~30 minutes. The large-v3-turbo model has built-in noise tolerance that makes Demucs pre-processing unnecessary in most cases.
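The ~30 minute guideline can be applied mechanically before opening a file in Buzz. The sketch below is illustrative only: `extract_speech_advice` is a hypothetical helper, the 1800-second limit is simply the guideline above expressed in seconds, and the usage line assumes ffprobe (from FFmpeg) is installed on the host.

```shell
# Hypothetical helper: decide whether Extract Speech is advisable for a
# recording, using the ~30 minute guideline. Takes a duration in seconds
# (fractions allowed, as printed by ffprobe) and prints "off" or "ok".
extract_speech_advice() {
    awk -v secs="$1" -v limit=1800 'BEGIN {
        print (secs + 0 > limit ? "off" : "ok")
    }'
}

# Usage (not run automatically; meeting.mp3 is a placeholder file name):
#   secs=$(ffprobe -v error -show_entries format=duration \
#                  -of default=noprint_wrappers=1:nokey=1 meeting.mp3)
#   extract_speech_advice "$secs"
```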

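Optionally, grow the swapfile if you need Extract Speech for shorter files. swapoff needs enough free RAM to absorb whatever is currently swapped out, and the final command confirms the new size:

sudo swapoff /swap.img
sudo fallocate -l 16G /swap.img
sudo chmod 600 /swap.img
sudo mkswap /swap.img
sudo swapon /swap.img
swapon --show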

Environment Variables

Variable	Default	Effect
`BUZZ_FORCE_CPU`	false	Set to `true` to disable GPU inference
`BUZZ_WHISPERCPP_N_THREADS`	cpu_count / 2	Override thread count for whisper-cli

Set via: flatpak override --user io.github.chidiwilliams.Buzz --env=VAR=value
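As an example of the pattern above, the thread-count override could be generated with a small wrapper that validates its input first. This is a sketch under assumptions: `buzz_threads_cmd` is a hypothetical helper, and it only prints the command rather than running it.

```shell
# Hypothetical helper: prints (does not run) the override command for a given
# whisper-cli thread count, rejecting anything that is not a positive integer.
buzz_threads_cmd() {
    case "$1" in
        ''|*[!0-9]*|0)
            echo "thread count must be a positive integer" >&2
            return 1 ;;
    esac
    printf 'flatpak override --user io.github.chidiwilliams.Buzz --env=BUZZ_WHISPERCPP_N_THREADS=%s\n' "$1"
}

# Usage (not run automatically): print the command for every core nproc reports
#   buzz_threads_cmd "$(nproc)"
```

Printing instead of executing keeps the change reviewable before it is applied to the Flatpak.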

Key File Paths

Path	Description
`/var/lib/flatpak/app/io.github.chidiwilliams.Buzz/.../buzz/whisper_cpp/whisper-cli`	Bundled Vulkan whisper-cli (libwhisper 1.8.3)
`~/.var/app/io.github.chidiwilliams.Buzz/cache/Buzz/models/`	Buzz model cache (ggml binaries)
`~/.var/app/io.github.chidiwilliams.Buzz/.local/state/Buzz/log/logs.txt`	Buzz application log
`~/.var/app/io.github.chidiwilliams.Buzz/config/Buzz.conf`	Buzz settings
`~/whisper.cpp/build/bin/whisper-cli`	Custom Vulkan-enabled whisper.cpp build
`~/bin/whisper-cli-vulkan`	Wrapper script for custom build (sets `LD_LIBRARY_PATH`)
`~/.local/share/pipx/venvs/buzz-captions/lib/python3.12/site-packages/buzz/whisper_cpp/`	pipx install whisper_cpp directory (directly modifiable)

Investigation Summary

Question	Finding
Is Buzz using the RX 7600?	Yes. `using Vulkan0 backend` confirmed inside the Flatpak sandbox.
Was `--no-gpu` being added?	No. `IS_VULKAN_SUPPORTED = True` in the sandbox; flag is never appended.
Does the large model load correctly?	Yes. 1.6 GB to VRAM, transcribes in ~1.25s on a short clip.
What caused the crash?	Extract Speech (Demucs) OOM. Python killed at 22 GB RSS before whisper-cli launched.
Why was VRAM not climbing?	`whisper-cli` was never started — process died during Demucs pre-processing.
Fix?	Uncheck Extract Speech for long files. Clear swap before sessions.

Investigation date: 2026-04-25. System: Ubuntu 24.04, AMD RX 7600, Buzz 1.4.4 (Flatpak).