<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://mediawiki.comfac.net/index.php?action=history&amp;feed=atom&amp;title=Buzz_Transcription</id>
	<title>Buzz Transcription - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://mediawiki.comfac.net/index.php?action=history&amp;feed=atom&amp;title=Buzz_Transcription"/>
	<link rel="alternate" type="text/html" href="https://mediawiki.comfac.net/index.php?title=Buzz_Transcription&amp;action=history"/>
	<updated>2026-06-05T11:02:43Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.45.1</generator>
	<entry>
		<id>https://mediawiki.comfac.net/index.php?title=Buzz_Transcription&amp;diff=254&amp;oldid=prev</id>
		<title>Justinaquino: Add Buzz meeting transcription guide + Vulkan GPU &amp; OOM investigation (2026-04-25)</title>
		<link rel="alternate" type="text/html" href="https://mediawiki.comfac.net/index.php?title=Buzz_Transcription&amp;diff=254&amp;oldid=prev"/>
		<updated>2026-04-24T18:43:50Z</updated>

		<summary type="html">&lt;p&gt;Add Buzz meeting transcription guide + Vulkan GPU &amp;amp; OOM investigation (2026-04-25)&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;Buzz is an open-source, offline-capable audio/meeting transcription tool powered by OpenAI Whisper (via whisper.cpp). On this system it is installed as a Flatpak and runs with Vulkan GPU acceleration on the AMD RX 7600.&lt;br /&gt;
&lt;br /&gt;
== How to Transcribe a Meeting ==&lt;br /&gt;
&lt;br /&gt;
=== Prerequisites ===&lt;br /&gt;
* Buzz installed (Flatpak: &amp;lt;code&amp;gt;io.github.chidiwilliams.Buzz&amp;lt;/code&amp;gt;)&lt;br /&gt;
* A model downloaded — recommended: &amp;lt;code&amp;gt;ggml-large-v3-turbo&amp;lt;/code&amp;gt; for quality, &amp;lt;code&amp;gt;ggml-tiny&amp;lt;/code&amp;gt; for speed&lt;br /&gt;
* Models are stored at &amp;lt;code&amp;gt;~/.var/app/io.github.chidiwilliams.Buzz/cache/Buzz/models/&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== File Transcription (Meeting Recording) ===&lt;br /&gt;
# Launch Buzz from the application menu or run: &amp;lt;code&amp;gt;flatpak run io.github.chidiwilliams.Buzz&amp;lt;/code&amp;gt;&lt;br /&gt;
# Go to &amp;#039;&amp;#039;&amp;#039;File → Import Media&amp;#039;&amp;#039;&amp;#039; and select your meeting recording (MP3, WAV, M4A, etc.)&lt;br /&gt;
# In the transcription dialog:&lt;br /&gt;
#* &amp;#039;&amp;#039;&amp;#039;Model&amp;#039;&amp;#039;&amp;#039;: Select &amp;lt;code&amp;gt;large-v3-turbo&amp;lt;/code&amp;gt; (best quality for meetings)&lt;br /&gt;
#* &amp;#039;&amp;#039;&amp;#039;Task&amp;#039;&amp;#039;&amp;#039;: Transcribe&lt;br /&gt;
#* &amp;#039;&amp;#039;&amp;#039;Language&amp;#039;&amp;#039;&amp;#039;: Select your language or leave on Auto&lt;br /&gt;
#* &amp;#039;&amp;#039;&amp;#039;Extract Speech&amp;#039;&amp;#039;&amp;#039;: &amp;#039;&amp;#039;&amp;#039;Leave unchecked&amp;#039;&amp;#039;&amp;#039; for files longer than ~30 minutes (see [[#OOM Crash with Large Files|OOM Crash]] below)&lt;br /&gt;
#* &amp;#039;&amp;#039;&amp;#039;Output&amp;#039;&amp;#039;&amp;#039;: SRT, VTT, or TXT depending on your need&lt;br /&gt;
# Click &amp;#039;&amp;#039;&amp;#039;Transcribe&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
# When complete, the transcript appears in the main window and is saved alongside the source file&lt;br /&gt;
&lt;br /&gt;
=== Live Transcription (Real-Time) ===&lt;br /&gt;
# Go to &amp;#039;&amp;#039;&amp;#039;File → Record and Transcribe&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
# Select your microphone input&lt;br /&gt;
# Choose model and language&lt;br /&gt;
# Click &amp;#039;&amp;#039;&amp;#039;Record&amp;#039;&amp;#039;&amp;#039; — transcription appears live on screen&lt;br /&gt;
# Stop recording when done; export via &amp;#039;&amp;#039;&amp;#039;File → Export&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
=== Recommended Settings for Meetings ===&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Setting !! Value !! Reason&lt;br /&gt;
|-&lt;br /&gt;
| Model || large-v3-turbo || Best accuracy; 1.6 GB VRAM, runs on RX 7600&lt;br /&gt;
|-&lt;br /&gt;
| Extract Speech || &amp;#039;&amp;#039;&amp;#039;Off&amp;#039;&amp;#039;&amp;#039; for files &amp;amp;gt;30 min || Demucs uses 8–15 GB RAM for long files (OOM risk)&lt;br /&gt;
|-&lt;br /&gt;
| Language || Set explicitly || Faster than auto-detect&lt;br /&gt;
|-&lt;br /&gt;
| Task || Transcribe || Use Translate only if you need English output from another language&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Before Each Session: Clear Swap ===&lt;br /&gt;
Prior transcription sessions can leave stale pages in swap. Run this before starting a long transcription:&lt;br /&gt;
&amp;lt;pre&amp;gt;sudo swapoff -a &amp;amp;&amp;amp; sudo swapon -a&amp;lt;/pre&amp;gt;&lt;br /&gt;
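This is safe only when available RAM exceeds the swap currently in use, because &amp;lt;code&amp;gt;swapoff&amp;lt;/code&amp;gt; moves every swapped page back into RAM (typical on this system: 15+ GB free, 8 GB swap).&lt;br /&gt;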
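&lt;br /&gt;
The safety condition above can be checked before cycling swap. The following is a minimal sketch, not part of Buzz: the function name &amp;lt;code&amp;gt;swap_clear_is_safe&amp;lt;/code&amp;gt; and its optional file argument are hypothetical, and it assumes the standard &amp;lt;code&amp;gt;MemAvailable&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;SwapTotal&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;SwapFree&amp;lt;/code&amp;gt; fields of &amp;lt;code&amp;gt;/proc/meminfo&amp;lt;/code&amp;gt;.&lt;br /&gt;

```shell
# Succeeds (exit 0) when MemAvailable exceeds the swap currently in use.
# The optional first argument (a meminfo-format file) is an assumption of
# this sketch, useful for dry runs; it defaults to /proc/meminfo.
swap_clear_is_safe() {
    meminfo="${1:-/proc/meminfo}"
    awk '
        $1 == "MemAvailable:" { avail = $2 }   # values are in kB
        $1 == "SwapTotal:"    { total = $2 }
        $1 == "SwapFree:"     { free  = $2 }
        END { exit !(avail > total - free) }   # swap used = total - free
    ' "$meminfo"
}
```

Run the swap cycle only if &amp;lt;code&amp;gt;swap_clear_is_safe&amp;lt;/code&amp;gt; exits with status 0; a non-zero status means the pages held in swap may not fit back into RAM.&lt;br /&gt;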
&lt;br /&gt;
== GPU Acceleration ==&lt;br /&gt;
&lt;br /&gt;
Buzz&amp;#039;s Flatpak installation includes a Vulkan-enabled &amp;lt;code&amp;gt;whisper-cli&amp;lt;/code&amp;gt;. On this system, GPU inference is automatically active — no configuration needed.&lt;br /&gt;
&lt;br /&gt;
* GPU in use: &amp;#039;&amp;#039;&amp;#039;AMD RX 7600 (RDNA3, RADV NAVI33)&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
* Confirmed by: &amp;lt;code&amp;gt;whisper_backend_init_gpu: using Vulkan0 backend&amp;lt;/code&amp;gt; in runtime output&lt;br /&gt;
* Buzz checks for Vulkan at startup (&amp;lt;code&amp;gt;IS_VULKAN_SUPPORTED&amp;lt;/code&amp;gt;) and omits the &amp;lt;code&amp;gt;--no-gpu&amp;lt;/code&amp;gt; flag when found&lt;br /&gt;
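&lt;br /&gt;
One way to repeat this confirmation is to search saved runtime output for the backend line quoted above. This is a minimal sketch that assumes the output of a run has already been captured to a file; the function name and the capture file are hypothetical and not part of Buzz.&lt;br /&gt;

```shell
# Succeeds (exit 0) when the given file contains the Vulkan backend line.
# The file is a hypothetical capture of whisper-cli runtime output.
uses_vulkan_backend() {
    grep -q 'whisper_backend_init_gpu: using Vulkan' "$1"
}
```

Exit status 0 means a line such as &amp;lt;code&amp;gt;using Vulkan0 backend&amp;lt;/code&amp;gt; was found; any other status means the capture shows no Vulkan backend.&lt;br /&gt;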
&lt;br /&gt;
To force CPU inference (for debugging):&lt;br /&gt;
&amp;lt;pre&amp;gt;flatpak override --user io.github.chidiwilliams.Buzz --env=BUZZ_FORCE_CPU=true&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To revert:&lt;br /&gt;
&amp;lt;pre&amp;gt;flatpak override --user --reset io.github.chidiwilliams.Buzz&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== OOM Crash with Large Files ==&lt;br /&gt;
&lt;br /&gt;
=== Symptom ===&lt;br /&gt;
Buzz crashes when a transcription is started on a long file with &amp;lt;code&amp;gt;ggml-large-v3-turbo&amp;lt;/code&amp;gt; selected. VRAM usage never visibly climbs before the crash.&lt;br /&gt;
&lt;br /&gt;
=== Root Cause ===&lt;br /&gt;
The crash is caused by the &amp;#039;&amp;#039;&amp;#039;Extract Speech (Demucs)&amp;#039;&amp;#039;&amp;#039; pre-processing step, not the large model.&lt;br /&gt;
&lt;br /&gt;
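Demucs is a PyTorch-based music source separation model that Buzz uses to isolate speech from background noise before passing the audio to the transcription engine. It processes decoded PCM audio at full float32 precision entirely in RAM, so memory use grows with the duration of the recording rather than with its compressed size:&lt;br /&gt;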
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Audio Duration !! Compressed Size !! RAM Required (Demucs)&lt;br /&gt;
|-&lt;br /&gt;
| 1 hour MP3 || ~60 MB || 500 MB – 1 GB&lt;br /&gt;
|-&lt;br /&gt;
| 3–4 hour session || ~220 MB || 8–15 GB peak&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The combination that caused the crash:&lt;br /&gt;
# Swap already exhausted from previous sessions&lt;br /&gt;
# Extract Speech (Demucs) triggered on a ~220 MB MP3 (~3–4 hours of audio)&lt;br /&gt;
# Memory use of the Python process exceeded available RAM plus swap&lt;br /&gt;
# OOM killer terminated the process at 22 GB RSS&lt;br /&gt;
# &amp;lt;code&amp;gt;whisper-cli&amp;lt;/code&amp;gt; was never launched → no VRAM activity observed&lt;br /&gt;
&lt;br /&gt;
=== Why VRAM Never Climbed ===&lt;br /&gt;
&amp;lt;code&amp;gt;whisper-cli&amp;lt;/code&amp;gt; runs as a subprocess of the Python GUI. The OOM kill happened during Demucs pre-processing — &amp;lt;code&amp;gt;whisper-cli&amp;lt;/code&amp;gt; was never started, so no VRAM allocation occurred.&lt;br /&gt;
&lt;br /&gt;
The Buzz log confirms this. The log file is located at:&lt;br /&gt;
&amp;lt;pre&amp;gt;~/.var/app/io.github.chidiwilliams.Buzz/.local/state/Buzz/log/logs.txt&amp;lt;/pre&amp;gt;&lt;br /&gt;
The last entry was &amp;lt;code&amp;gt;Will extract speech&amp;lt;/code&amp;gt; — the log never reached &amp;lt;code&amp;gt;Starting whisper file transcription&amp;lt;/code&amp;gt;.&lt;br /&gt;
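&lt;br /&gt;
The same log check can be scripted. The following is a minimal sketch that relies only on the two log messages quoted above; the function name is hypothetical, and the default path is the log location given in this section.&lt;br /&gt;

```shell
# Succeeds (exit 0) when the most recent of the two markers in the log is
# "Will extract speech", i.e. no transcription start was logged after it.
died_during_extract_speech() {
    log="${1:-$HOME/.var/app/io.github.chidiwilliams.Buzz/.local/state/Buzz/log/logs.txt}"
    last=$(grep -E 'Will extract speech|Starting whisper file transcription' "$log" | tail -n 1)
    case "$last" in
        *"Will extract speech"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

The check only looks at the most recent marker, so it also succeeds while Extract Speech is still running; use it after a crash, not during a transcription.&lt;br /&gt;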
&lt;br /&gt;
=== Fix ===&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Uncheck &amp;quot;Extract Speech&amp;quot;&amp;#039;&amp;#039;&amp;#039; in the Buzz transcription dialog for any file longer than ~30 minutes. The &amp;lt;code&amp;gt;large-v3-turbo&amp;lt;/code&amp;gt; model has built-in noise tolerance that makes Demucs pre-processing unnecessary in most cases.&lt;br /&gt;
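&lt;br /&gt;
The ~30 minute guideline can be turned into a quick pre-flight check. This is a minimal sketch: the function name is hypothetical, and the 1800-second threshold is simply the guideline above expressed in seconds, not a limit enforced by Buzz.&lt;br /&gt;

```shell
# Succeeds (exit 0) when a duration in seconds exceeds the ~30 minute
# guideline. Accepts fractional values such as 5400.25.
should_disable_extract_speech() {
    seconds="${1%.*}"                 # drop any fractional part
    [ "${seconds:-0}" -gt 1800 ]      # 30 minutes = 1800 seconds
}
```

Assuming &amp;lt;code&amp;gt;ffprobe&amp;lt;/code&amp;gt; (part of FFmpeg) is installed, the duration of a recording can be read with &amp;lt;code&amp;gt;ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 recording.mp3&amp;lt;/code&amp;gt;, where &amp;lt;code&amp;gt;recording.mp3&amp;lt;/code&amp;gt; is a placeholder file name, and the printed value passed to the function.&lt;br /&gt;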
&lt;br /&gt;
Optionally, grow the swapfile if you need Extract Speech for shorter files:&lt;br /&gt;
&amp;lt;pre&amp;gt;sudo swapoff /swap.img&lt;br /&gt;
sudo fallocate -l 16G /swap.img&lt;br /&gt;
sudo mkswap /swap.img&lt;br /&gt;
sudo swapon /swap.img&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Environment Variables ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Variable !! Default !! Effect&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;BUZZ_FORCE_CPU&amp;lt;/code&amp;gt; || false || Set to &amp;lt;code&amp;gt;true&amp;lt;/code&amp;gt; to disable GPU inference&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;BUZZ_WHISPERCPP_N_THREADS&amp;lt;/code&amp;gt; || cpu_count / 2 || Override thread count for whisper-cli&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Set via: &amp;lt;code&amp;gt;flatpak override --user io.github.chidiwilliams.Buzz --env=VAR=value&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Key File Paths ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Path !! Description&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;/var/lib/flatpak/app/io.github.chidiwilliams.Buzz/.../buzz/whisper_cpp/whisper-cli&amp;lt;/code&amp;gt; || Bundled Vulkan whisper-cli (libwhisper 1.8.3)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;~/.var/app/io.github.chidiwilliams.Buzz/cache/Buzz/models/&amp;lt;/code&amp;gt; || Buzz model cache (ggml binaries)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;~/.var/app/io.github.chidiwilliams.Buzz/.local/state/Buzz/log/logs.txt&amp;lt;/code&amp;gt; || Buzz application log&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;~/.var/app/io.github.chidiwilliams.Buzz/config/Buzz.conf&amp;lt;/code&amp;gt; || Buzz settings&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;~/whisper.cpp/build/bin/whisper-cli&amp;lt;/code&amp;gt; || Custom Vulkan-enabled whisper.cpp build&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;~/bin/whisper-cli-vulkan&amp;lt;/code&amp;gt; || Wrapper script for custom build (sets &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;~/.local/share/pipx/venvs/buzz-captions/lib/python3.12/site-packages/buzz/whisper_cpp/&amp;lt;/code&amp;gt; || pipx install whisper_cpp directory (directly modifiable)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Investigation Summary ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Question !! Finding&lt;br /&gt;
|-&lt;br /&gt;
| Is Buzz using the RX 7600? || &amp;#039;&amp;#039;&amp;#039;Yes.&amp;#039;&amp;#039;&amp;#039; &amp;lt;code&amp;gt;using Vulkan0 backend&amp;lt;/code&amp;gt; confirmed inside the Flatpak sandbox.&lt;br /&gt;
|-&lt;br /&gt;
| Was &amp;lt;code&amp;gt;--no-gpu&amp;lt;/code&amp;gt; being added? || &amp;#039;&amp;#039;&amp;#039;No.&amp;#039;&amp;#039;&amp;#039; &amp;lt;code&amp;gt;IS_VULKAN_SUPPORTED = True&amp;lt;/code&amp;gt; in the sandbox; flag is never appended.&lt;br /&gt;
|-&lt;br /&gt;
| Does the large model load correctly? || &amp;#039;&amp;#039;&amp;#039;Yes.&amp;#039;&amp;#039;&amp;#039; It loads 1.6 GB into VRAM and transcribes a short clip in ~1.25 s.&lt;br /&gt;
|-&lt;br /&gt;
| What caused the crash? || &amp;#039;&amp;#039;&amp;#039;Extract Speech (Demucs) OOM.&amp;#039;&amp;#039;&amp;#039; The Python process was killed at 22 GB RSS before whisper-cli launched.&lt;br /&gt;
|-&lt;br /&gt;
| Why was VRAM not climbing? || &amp;lt;code&amp;gt;whisper-cli&amp;lt;/code&amp;gt; was never started — process died during Demucs pre-processing.&lt;br /&gt;
|-&lt;br /&gt;
| Fix? || &amp;#039;&amp;#039;&amp;#039;Uncheck Extract Speech&amp;#039;&amp;#039;&amp;#039; for long files. Clear swap before sessions.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;Investigation date: 2026-04-25. System: Ubuntu 24.04, AMD RX 7600, Buzz 1.4.4 (Flatpak).&amp;#039;&amp;#039;&lt;/div&gt;</summary>
		<author><name>Justinaquino</name></author>
	</entry>
</feed>