
Comfac GPU Scaling and AI Research Goals

From MediawikiCIT
Revision as of 06:59, 25 February 2026 by BabiSender


Objective

To develop and scale a high-performance AMD-based AI compute cluster capable of running large-scale models (e.g., Qwen3-235B-A22B) and of supporting educational and R&D initiatives through open collaboration with partner schools.
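As a rough feasibility check for this objective, the sketch below estimates how many GPUs are needed just to hold the weights of a model with roughly 235 billion total parameters at different quantization levels. It assumes 32 GB of VRAM per card (the Radeon AI PRO R9700's capacity) or 24 GB (the RX 7900 XTX's), and reserves a hypothetical 20% of each card for KV cache, activations, and runtime overhead; the real overhead depends on context length, batch size, and the inference engine. For a mixture-of-experts model, all expert weights still have to be resident in memory even though only a fraction are active per token.

```python
import math


def weight_bytes(num_params: float, bits_per_param: float) -> float:
    """Bytes needed to store the model weights alone."""
    return num_params * bits_per_param / 8


def min_gpus(num_params: float, bits_per_param: float,
             vram_gb: float, headroom: float = 0.20) -> int:
    """Smallest GPU count whose usable VRAM can hold the weights.

    `headroom` is the fraction of each card reserved for KV cache,
    activations and runtime overhead (an assumed value, not measured).
    """
    usable_per_gpu = vram_gb * 1e9 * (1 - headroom)
    return math.ceil(weight_bytes(num_params, bits_per_param) / usable_per_gpu)


if __name__ == "__main__":
    params = 235e9  # ~235B total parameters
    for bits in (16, 8, 4):
        for name, vram in (("32 GB card", 32), ("24 GB card", 24)):
            print(f"{bits:>2}-bit weights on {name}: "
                  f"{min_gpus(params, bits, vram)} GPUs minimum")
```

These are lower bounds for the weights only; a practical deployment would round up to a GPU count that the chosen parallelism scheme supports.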


Goals and Steps

1. Platform and Motherboard Selection

  • Identify and procure a motherboard or server platform with enough PCIe lanes and slots for extensive GPU scaling, including PCIe bifurcation support (similar to the multi-GPU setup demonstrated by PewDiePie).
  • Ensure compatibility with ROCm and vLLM for distributed inference and multi-GPU coordination.
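To make the ROCm and vLLM requirement concrete, here is a minimal sketch that builds the environment and command line for a multi-GPU `vllm serve` run. It uses vLLM's `--tensor-parallel-size`, `--pipeline-parallel-size`, and `--gpu-memory-utilization` options and ROCm's `HIP_VISIBLE_DEVICES` variable; the helper name `vllm_serve_command`, the 0.90 memory fraction, and the example model ID are illustrative assumptions. It only checks that the tensor and pipeline sizes multiply to the GPU count; vLLM imposes further constraints (for example, the model's attention heads must divide evenly across tensor-parallel ranks) that this sketch does not check.

```python
import shlex


def vllm_serve_command(model: str, gpu_ids: list[int],
                       pipeline_parallel: int = 1,
                       gpu_memory_utilization: float = 0.90) -> tuple[dict, list[str]]:
    """Build environment and argv for a multi-GPU `vllm serve` run on ROCm.

    Tensor parallelism splits each layer across the GPUs of a stage;
    pipeline parallelism splits the layer stack into stages. Their
    product must equal the number of GPUs used.
    """
    n = len(gpu_ids)
    if pipeline_parallel < 1 or n == 0 or n % pipeline_parallel != 0:
        raise ValueError(f"{n} GPUs cannot be split into {pipeline_parallel} pipeline stages")
    tensor_parallel = n // pipeline_parallel
    # Restrict the process to the selected AMD GPUs.
    env = {"HIP_VISIBLE_DEVICES": ",".join(str(i) for i in gpu_ids)}
    argv = [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tensor_parallel),
        "--pipeline-parallel-size", str(pipeline_parallel),
        "--gpu-memory-utilization", f"{gpu_memory_utilization:.2f}",
    ]
    return env, argv


if __name__ == "__main__":
    # Hypothetical pilot: a small, well-known model on four GPUs.
    env, argv = vllm_serve_command("Qwen/Qwen2.5-7B-Instruct", [0, 1, 2, 3])
    print(" ".join(f"{k}={v}" for k, v in env.items()), shlex.join(argv))
```

Building the command as a list rather than a string keeps it safe to pass to `subprocess.run` later without shell quoting issues.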

2. Initial Scaling (Pilot Models)

  • Begin with well-known, stable models to validate infrastructure performance and reliability.
  • Pilot hardware: AMD Radeon AI PRO R9700 or an equivalent AI-focused GPU.
  • Validate thermal performance, power delivery, and driver stability for continuous inference workloads.
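The validation step above can be automated by checking sensor samples collected during a long inference run. The sketch below assumes the samples have already been gathered by whatever monitoring the lab uses (for example, periodic `rocm-smi` or `amd-smi` polling) and parsed into the hypothetical `Sample` record; the 95 °C and 300 W limits are placeholders, and real limits should come from the card's specifications and the chassis airflow design.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One sensor reading from a GPU during a long inference run."""
    gpu: int
    temp_c: float       # temperature reading, whichever sensor is logged
    power_w: float      # board power draw
    driver_errors: int  # cumulative driver/ECC error count reported so far


def soak_test_report(samples, max_temp_c=95.0, max_power_w=300.0):
    """Return {gpu: [problems]} for every GPU that broke a limit.

    Thresholds are placeholders, not vendor specifications.
    """
    problems: dict[int, list[str]] = {}
    last_errors: dict[int, int] = {}
    for s in samples:
        issues = problems.setdefault(s.gpu, [])
        if s.temp_c > max_temp_c:
            issues.append(f"temperature {s.temp_c:.0f} C over {max_temp_c:.0f} C")
        if s.power_w > max_power_w:
            issues.append(f"power {s.power_w:.0f} W over {max_power_w:.0f} W")
        # Any increase in the cumulative error count is worth flagging.
        prev = last_errors.get(s.gpu, s.driver_errors)
        if s.driver_errors > prev:
            issues.append(f"driver errors rose from {prev} to {s.driver_errors}")
        last_errors[s.gpu] = s.driver_errors
    # Drop GPUs that had no problems at all.
    return {gpu: issues for gpu, issues in problems.items() if issues}
```

Tracking the change in the error counter, rather than its absolute value, avoids flagging errors that were already present before the run started.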

3. Progressive Hardware Replication

  • Once stable results are achieved with the Radeon AI PRO R9700, replicate the same environment on the Radeon RX 7900 XTX and other AMD GPUs to benchmark performance scaling.
  • Document compatibility issues, driver updates, and quantization performance metrics.
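One simple way to report "performance scaling" consistently across cards and quantization levels is speedup and parallel efficiency relative to the smallest configuration measured. The helper below is a sketch under that assumption; the tokens-per-second figures in the usage example are placeholders for illustration, not measurements.

```python
def scaling_table(throughput: dict[int, float]) -> list[tuple[int, float, float, float]]:
    """Speedup and efficiency relative to the smallest GPU count measured.

    `throughput` maps GPU count -> measured tokens per second.
    Returns (gpus, tokens_per_s, speedup, efficiency) rows sorted by GPU count.
    Efficiency 1.0 means perfectly linear scaling.
    """
    base_gpus = min(throughput)
    base_tps = throughput[base_gpus]
    rows = []
    for gpus in sorted(throughput):
        tps = throughput[gpus]
        speedup = tps / base_tps
        ideal = gpus / base_gpus
        rows.append((gpus, tps, speedup, speedup / ideal))
    return rows


def format_table(label: str, rows) -> str:
    """Render rows as a plain-text table for the lab's benchmark notes."""
    lines = [label, "GPUs  tok/s    speedup  efficiency"]
    for gpus, tps, speedup, eff in rows:
        lines.append(f"{gpus:>4}  {tps:>7.1f}  {speedup:>7.2f}  {eff:>9.0%}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder numbers only; replace with real benchmark results.
    print(format_table("example card, 4-bit", scaling_table({1: 40.0, 2: 72.0, 4: 128.0})))
```

Recording one table per card and quantization level keeps the R9700 and RX 7900 XTX results directly comparable.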

4. Cluster and Swarm Development

  • Establish a Cluster System for large-model distributed inference and training.
  • Build a Swarm System that runs many smaller AI instances in parallel on Radeon RX 7700-class and lower-end GPU nodes, for local and academic deployment.
  • Optimize inter-node communication, synchronization, and monitoring tools for mixed hardware setups.
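For the Swarm System on mixed hardware, requests should go to nodes in proportion to what each node can handle. The sketch below uses smooth weighted round-robin (the scheme nginx uses for weighted upstreams) as one possible dispatcher; the node names and integer weights are hypothetical, and in practice the weights could be derived from each node's benchmarked throughput.

```python
class SmoothWeightedRoundRobin:
    """Spread requests across nodes in proportion to integer weights.

    Over every run of sum(weights) picks, each node is chosen exactly
    `weight` times, and heavy nodes are interleaved rather than bunched.
    """

    def __init__(self, weights: dict[str, int]):
        if not weights or any(w <= 0 for w in weights.values()):
            raise ValueError("weights must be a non-empty map of positive integers")
        self.weights = dict(weights)
        self.current = {name: 0 for name in weights}
        self.total = sum(weights.values())

    def pick(self) -> str:
        # Every node earns its weight; the richest node is chosen and pays the total.
        for name, w in self.weights.items():
            self.current[name] += w
        chosen = max(self.current, key=self.current.get)
        self.current[chosen] -= self.total
        return chosen


if __name__ == "__main__":
    # Hypothetical nodes weighted by relative capacity.
    rr = SmoothWeightedRoundRobin({"r9700-node": 4, "xtx-node": 3, "rx7700-node": 1})
    print([rr.pick() for _ in range(8)])
```

Compared with plain weighted round-robin, the smooth variant avoids sending long bursts of consecutive requests to the strongest node, which keeps queue lengths on the smaller nodes more even.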

5. Funding and Laboratory Deployment

  • Fund the creation of a dedicated AI Lab focused on testing, documentation, and educational use.
  • Provide access to partner schools for research, benchmarking, and AI model fine-tuning.

6. Open Compute and Tokenization Participation

  • Study and participate in open-source projects that allow community-based compute contributions (similar to Folding@home).
  • Experiment with decentralized compute-sharing models in which contributors can sell tokens or compute time securely and transparently.
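The "transparent" part of compute sharing usually rests on a tamper-evident record of who contributed what. The sketch below shows only that idea: a hash-chained, append-only log of contributions using SHA-256. The class and field names (`ContributionLog`, `gpu_hours`) are hypothetical, and this is not the protocol of any particular project; a real network would also need signatures, verification that the work was actually done, and a payment or token mechanism.

```python
import hashlib
import json


def _digest(prev_hash: str, record: dict) -> str:
    """Hash a record together with the previous entry's hash."""
    payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ContributionLog:
    """Append-only log where each entry commits to the one before it."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries: list[dict] = []

    def append(self, contributor: str, gpu_hours: float) -> str:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        record = {"contributor": contributor, "gpu_hours": gpu_hours}
        h = _digest(prev, record)
        self.entries.append({"prev": prev, "record": record, "hash": h})
        return h

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = self.GENESIS
        for e in self.entries:
            if e["prev"] != prev or _digest(prev, e["record"]) != e["hash"]:
                return False
            prev = e["hash"]
        return True
```

Because each hash covers the previous one, changing any past record invalidates every later entry, so anyone holding a copy of the latest hash can detect tampering.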


End Goal

To make Comfac and its academic partners a recognized hub for open, scalable, and sustainable AI research using AMD technologies.