Comfac GPU Scaling and AI Research Goals

Objective

Develop and scale a high-performance AMD-based AI compute cluster capable of running large-scale models (e.g., Qwen3 235B) and of supporting educational and R&D initiatives through open collaboration with partner schools.

Goals and Steps

1. Platform and Motherboard Selection

Identify and procure a motherboard or server platform with enough PCIe lanes and bifurcation support to host many GPUs (similar to the setup demonstrated by PewDiePie).
Ensure compatibility with ROCm and vLLM for distributed inference and multi-GPU coordination.
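
How many GPU slots the platform must expose depends largely on the memory footprint of the target model. The following is a rough sizing sketch, not a definitive method: it counts weight memory only and ignores KV cache and activations. The function names, the 0.8 usable-VRAM fraction, and the per-card VRAM figures in the usage note are illustrative assumptions to be checked against vendor specifications.

```python
import math


def weight_memory_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory needed for model weights alone, in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def min_gpus(params_billions: float, bits_per_weight: float,
             vram_gib: float, usable_fraction: float = 0.8) -> int:
    """Smallest GPU count whose combined usable VRAM holds the weights.

    usable_fraction is an assumed allowance for runtime overhead;
    it is a placeholder, not a measured value.
    """
    needed = weight_memory_gib(params_billions, bits_per_weight)
    usable_per_gpu = vram_gib * usable_fraction
    return math.ceil(needed / usable_per_gpu)


if __name__ == "__main__":
    # Assumed example: a 235B-parameter model quantized to 4 bits per weight
    # on cards with an assumed 32 GiB of VRAM each.
    print(round(weight_memory_gib(235, 4), 1), "GiB of weights")
    print(min_gpus(235, 4, vram_gib=32), "GPUs at minimum")
```

Under these assumptions, a 4-bit 235B model needs about 109 GiB for weights, or at least five 32 GiB cards. In practice the tensor-parallel size usually has to divide the model's attention head count evenly, so the real count may round up further.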

2. Initial Scaling (Pilot Models)

Begin with well-known, stable models to validate infrastructure performance and reliability.
Pilot hardware: AMD Radeon AI PRO R9700 or an equivalent AI-focused GPU.
Validate thermal performance, power delivery, and driver stability for continuous inference workloads.
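
One way to turn the thermal and power validation into a pass/fail check is to log readings during a long inference run and look for sustained limit violations rather than single spikes. The sketch below assumes the readings have already been collected into plain lists (for example from the ROCm monitoring tools). The function names, the limits, and the three-sample window are hypothetical placeholders, not vendor-recommended values.

```python
def longest_run_above(readings, limit):
    """Length of the longest streak of consecutive readings above limit."""
    longest = current = 0
    for value in readings:
        current = current + 1 if value > limit else 0
        longest = max(longest, current)
    return longest


def soak_test_passes(temps_c, power_w, max_temp_c, max_power_w,
                     max_consecutive=3):
    """Pass if neither temperature nor power stays above its limit
    for max_consecutive samples in a row."""
    return (longest_run_above(temps_c, max_temp_c) < max_consecutive
            and longest_run_above(power_w, max_power_w) < max_consecutive)


if __name__ == "__main__":
    # Hypothetical readings sampled at a fixed interval during inference.
    temps = [70, 92, 95, 91, 80, 93]
    power = [250, 280, 290, 285, 260, 270]
    print(soak_test_passes(temps, power, max_temp_c=90, max_power_w=300))
```

Checking for streaks instead of single samples keeps a brief spike from failing a run while still catching a card that stays above its limit under continuous load.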

3. Progressive Hardware Replication

Once the Radeon AI PRO R9700 produces stable results, replicate the same environment on the Radeon RX 7900 XTX and other AMD GPUs to benchmark performance scaling.
Document compatibility issues, driver updates, and quantization performance metrics.
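
When benchmarking performance scaling, a single comparable figure helps: scaling efficiency, the measured speedup divided by the ideal linear speedup. The sketch below is one possible way to compute it from throughput measurements. The function name and the tokens-per-second numbers are hypothetical, not measured results.

```python
def scaling_efficiency(baseline_tps: float, baseline_gpus: int,
                       scaled_tps: float, scaled_gpus: int) -> float:
    """Measured speedup divided by ideal linear speedup.

    1.0 means perfect linear scaling; lower values indicate overhead
    from communication, synchronization, or memory limits.
    """
    speedup = scaled_tps / baseline_tps
    ideal = scaled_gpus / baseline_gpus
    return speedup / ideal


if __name__ == "__main__":
    # Hypothetical throughput figures in tokens per second.
    eff = scaling_efficiency(baseline_tps=100, baseline_gpus=1,
                             scaled_tps=340, scaled_gpus=4)
    print(f"scaling efficiency: {eff:.0%}")
```

Recording this value per GPU model, driver version, and quantization level gives the documentation a consistent metric across hardware generations.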

4. Cluster and Swarm Development

Establish a Cluster System for large-model distributed inference and training.
Build a Swarm System that runs many smaller AI instances in parallel on modest nodes (e.g., Radeon RX 7700-class and lower-end GPUs) for local and academic deployment.
Optimize inter-node communication, synchronization, and monitoring tools for mixed hardware setups.
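
For the Swarm System, one open design question is how to assign small model instances to nodes with different amounts of VRAM. The sketch below shows one simple option, a best-fit-decreasing heuristic: place the largest instances first, each on the node with the least free memory that still fits it. This is an illustrative assumption, not the project's chosen scheduler, and all node names, instance names, and memory figures are hypothetical.

```python
def place_instances(instances, nodes):
    """Assign model instances to nodes by free VRAM.

    instances: dict of instance name -> required memory (GiB)
    nodes:     dict of node name -> free memory (GiB)
    Returns (placement, unplaced): a dict of instance -> node and a list
    of instances that fit on no node.
    """
    free = dict(nodes)  # copy so the caller's dict is not modified
    placement = {}
    unplaced = []
    # Largest instances first, so they are not crowded out by small ones.
    by_size = sorted(instances.items(), key=lambda kv: kv[1], reverse=True)
    for name, need in by_size:
        candidates = [n for n, avail in free.items() if avail >= need]
        if not candidates:
            unplaced.append(name)
            continue
        # Best fit: the tightest node that still has room.
        target = min(candidates, key=lambda n: free[n])
        placement[name] = target
        free[target] -= need
    return placement, unplaced


if __name__ == "__main__":
    nodes = {"node-a": 12, "node-b": 8, "node-c": 8}
    instances = {"chat-7b": 6, "embed": 2, "code-7b": 6,
                 "vision": 10, "big": 14}
    placement, unplaced = place_instances(instances, nodes)
    print(placement)
    print("unplaced:", unplaced)
```

Instances that fit on no swarm node, such as the 14 GiB one in this example, are natural candidates for the Cluster System instead.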

5. Funding and Laboratory Deployment

Fund the creation of a dedicated AI Lab focused on testing, documentation, and educational use.
Provide access to partner schools for research, benchmarking, and AI model fine-tuning.

6. Open Compute and Tokenization Participation

Study and participate in open-source projects that allow community-based compute contributions (similar to Folding@home).
Learn and experiment with decentralized compute-sharing models that enable contributors to sell tokens or compute time securely and transparently.

Reference

Inspirational video: Watch on YouTube

End Goal

To make Comfac and its academic partners a recognized hub for open, scalable, and sustainable AI research using AMD technologies.