Architecture¶
This page summarizes how the ProgramAsWeights production system is structured end to end.
Components¶
| Layer | Technology | Role |
|---|---|---|
| Frontend | React, Vite | Web UI; static assets served by nginx |
| API | FastAPI, uvicorn | REST endpoints, orchestration, auth integration |
| Database | PostgreSQL | Users, programs, aliases, votes, cases, operational logs |
| GPU services | Three vLLM instances | Pseudo-program generation, compiler (including hidden-state work), inference |
| Storage | Hugging Face, local disk | .paw bundles on Hugging Face; PEFT adapter artifacts on server disk |
| Auth | GitHub OAuth | Sign-in; session backed by HTTP cookies |
GPU layout¶
Typical allocation:
- GPU 0 — pseudo-program generation
- GPU 1 — compiler workload (including pooling / hidden-state extraction used in the pipeline)
- GPU 2 — multi-LoRA inference
Exact mapping may vary by deployment; the important split is dedicated vLLM roles per stage.
Compile pipeline¶
High-level flow:
- Pseudo-generation (vLLM) — turn the natural-language spec into a pseudo-program representation.
- LoRA extraction — derive adapter weights from vLLM hidden states / pooling as implemented in the compiler stack.
- Quantization — convert adapters to Q4_0 GGUF for the bundle format used by the runtime.
- Bundle — assemble the
.pawpackage with metadata and weights. - Upload — publish the
.pawartifact to Hugging Face for CDN-backed distribution.
Caching¶
The system uses two-level caching:
- Pseudo-generation cache — avoid recomputing pseudo-programs for identical or equivalent spec inputs where the cache key applies.
- Program-level disk cache — reuse compiled artifacts and intermediate state on the server when the same content-addressed program is requested again.
Together these reduce redundant GPU work and speed up repeat compiles.
Downloads and the SDK¶
The Python SDK downloads .paw files from the Hugging Face CDN (or equivalent object storage fronted as a CDN). Programs are not served as large binary payloads from the ProgramAsWeights API host, which keeps the API focused on metadata, auth, and orchestration.