A production-grade AI teaching platform that delivers lectures, tutoring, and adaptive feedback at university quality, across thousands of students and dozens of programs, on any device, at any hour.
Creative Integrations proposes to design, build, and operate the platform as a managed, multi-tenant system, not a one-off internal project. This document covers our approach, the full avatar-platform landscape, transparent token and cost economics, the savings model, and a phased plan to scale from a single pilot to 25,000+ students.
Most "AI professor" concepts are framed as a single course experiment that someone wires together by hand. That works for a demo and collapses at scale. We propose the opposite: a multi-tenant platform with the scaling assumptions designed in from day one, delivered and run by a partner who does this for a living.
The AI delivers the teaching: lectures, Q&A, tutoring, and formative feedback. The university keeps academic ownership and signs off all graded assessment. This is the fastest, most defensible regulatory path for HEP's partner institutions.
One platform, many programs and many universities, each isolated, each with its own content corpus, branding, and avatar. Adding the 12th program is a configuration, not a rebuild. This is the difference between a pilot and a business.
LLM, voice, and avatar providers sit behind an abstraction layer. Swapping one model or avatar provider for another is a config change, protecting HEP from any single vendor's pricing, terms, or capability shifts.
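One way to picture "swapping a provider is a config change" is a small registry that resolves a vendor adapter from tenant configuration. The sketch below is illustrative only: `AvatarProvider`, `VendorA`, `VendorB`, and the `avatar_provider` config key are hypothetical names, not a committed design or any real vendor's API.

```python
from typing import Callable, Dict, Protocol


class AvatarProvider(Protocol):
    """Minimal interface every avatar vendor adapter would implement (hypothetical)."""

    def start_session(self, tenant_id: str) -> str: ...


class VendorA:
    """Placeholder adapter; a real one would wrap a vendor SDK."""

    def start_session(self, tenant_id: str) -> str:
        return f"vendor-a:{tenant_id}"


class VendorB:
    """Second placeholder adapter with the same interface."""

    def start_session(self, tenant_id: str) -> str:
        return f"vendor-b:{tenant_id}"


# Registry keyed by the name that appears in tenant configuration.
AVATAR_PROVIDERS: Dict[str, Callable[[], AvatarProvider]] = {
    "vendor_a": VendorA,
    "vendor_b": VendorB,
}


def avatar_for(config: dict) -> AvatarProvider:
    """Resolve the adapter named in config; callers never import a vendor directly."""
    name = config["avatar_provider"]
    try:
        return AVATAR_PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"unknown avatar provider: {name!r}") from None
```

Under this assumption, moving a tenant from one vendor to another means changing `{"avatar_provider": "vendor_a"}` to `{"avatar_provider": "vendor_b"}`; the calling code is untouched. The same pattern would apply to LLM and voice providers.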
The core insight that drives the economics: the "thinking" layer of an AI professor is now a near-commodity and a negligible part of the cost. Almost all of the spend lives in one place, the real-time rendering of the on-screen professor. That is the honest headline: get the rendering approach right and the rest is rounding. We are transparent about that shape now and will put firm numbers on it once the rendering platform is chosen. The story isn't a cheaper professor; it's a delivery model whose cost is a dial you control.
The retrieval-grounded AI that decides what to teach is, in 2026, a known quantity: proven components, commodity pricing, and a few dollars of tokens per student per year. We treat it as a solved, swappable layer and don't dwell on it. What actually determines whether this works for HEP is harder and less glamorous: streaming a real-time video professor to thousands of concurrent learners, close to them, reliably, at a cost you control. That is where the real engineering, and our value, lives.
The hard problem. Hundreds or thousands of live, lip-synced avatar sessions run at once, each responding in under a second and none degrading under load. That takes session brokering, stream capacity, autoscaling, and graceful fallback to voice/text when demand spikes. This is the part that separates a demo from a platform.
A live professor only feels real if it responds instantly. The streaming compute is placed in regions near the student base, so an avatar feels as immediate in Lagos or Nairobi as in Frankfurt, while pre-rendered lectures cache on the global edge.
Real-time video is the one line that scales with usage, and it is the platform's dominant cost, so we govern it directly with per-minute metering plus hard budget caps per tenant and per student. The live professor stays the default experience; text and voice remain available as a graceful fallback under demand spikes. The volume of live-tutorial minutes per student is the single biggest cost lever, which is why we cap and meter it rather than leave it open.
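The metering-plus-caps idea can be sketched as a small governor that records live minutes and decides whether the next session gets video or the voice/text fallback. This is a minimal in-memory illustration; the class name, the cap values, and the `"video"` / `"voice_text"` labels are assumptions for the example, and a production version would persist usage and reset it per billing period.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MinuteGovernor:
    student_cap: float  # live minutes allowed per student per period (assumed value)
    tenant_cap: float   # live minutes allowed per tenant per period (assumed value)
    student_used: dict = field(default_factory=lambda: defaultdict(float))
    tenant_used: dict = field(default_factory=lambda: defaultdict(float))

    def mode_for(self, tenant: str, student: str) -> str:
        """Return 'video' while both budgets have headroom, else 'voice_text'."""
        if self.tenant_used[tenant] >= self.tenant_cap:
            return "voice_text"
        if self.student_used[(tenant, student)] >= self.student_cap:
            return "voice_text"
        return "video"

    def record(self, tenant: str, student: str, minutes: float) -> None:
        """Meter live minutes against both the student and the tenant."""
        self.student_used[(tenant, student)] += minutes
        self.tenant_used[tenant] += minutes
```

For example, with `MinuteGovernor(student_cap=10, tenant_cap=15)`, a student who has used 10 minutes is served voice/text on the next request, while a classmate still gets video until the tenant total reaches 15.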
Many programs and many universities on one platform, each isolated, each with its own avatar, branding, and content. Adding the next program is configuration, not a rebuild.
Automated guardrails plus human-in-the-loop review: sampled session audits, faculty sign-off workflows, region-aware data isolation aligned to GDPR, POPIA, FERPA and local rules, and a full paper trail for accreditor scrutiny.
The intelligence (retrieval-grounded, smart-routed LLM + Socratic teaching logic) is proven and vendor-swappable, at a few dollars per student per year. Important, but not where the difficulty or the cost is, so we keep it brief and move on.
We deliberately separate the two things a professor does. The lecture is a pre-rendered element, produced once and delivered to everyone. The interactive tutorial is a live, real-time professor: a student talks to it and it talks back. Splitting them isn't cosmetic: they have opposite cost, latency, and hosting profiles, so we build and serve each one differently. Specific providers are finalized in scoping and kept swappable; the architecture is deliberately vendor-agnostic.
Core lecture content is produced once as polished talking-head video and delivered to every student. Because a program is only a handful of hours of lecture video, rendered once a year at commodity per-minute rates, the production cost is small, one-time, and shared across the whole cohort, so it comes to pennies per student at scale. It is a static file: broadcast-quality, cached at the edge, fast in every region with no live compute, and not latency-sensitive.
The interactive tutorial is a live, conversational professor (avatar, voice, and LLM in one, ~600ms response) that the student can interrupt and question. This is the standard interactive experience, not an add-on. It anchors the live experience, and it is the platform's dominant cost, a usage-driven one (live minutes of tutorial per student) that we govern with per-student and per-tenant budget caps and live metering. It is the latency-critical leg, so it runs close to the student (see hosting). Text and voice stay available underneath for quick questions.
Why this is the right call: pre-rendered video is cheap and shared, so it carries the heavy, repeatable lecture content; the real-time professor is the live, default experience for tutorials, and because it is per-minute we govern its budget with live metering and caps. That split is what keeps the platform feeling high-touch while concentrating spend only where it earns its keep. Providers are finalized with HEP in scoping and kept behind an abstraction layer, so a pricing or capability change is a config update, never a rebuild.
This is not a mockup. A live, real-time AI instructor runs right here on the page: click to activate the professor below, then speak or type, and it answers in real time, grounded in whatever knowledge base, voice, and personality the agent is given.
Note: the live instructor streams real-time video and is metered per minute. For a public page we recommend capping demo minutes and rate-limiting so traffic can't run up cost. The embed manages its own session, no API keys are exposed in the page.
A fair challenge to any AI proposal: you can't forecast exactly how much a student will use the system. We don't claim to. What we can do is build the cost from the bottom up, separate what's engineering-controlled from what's a genuine assumption, and show how sensitive the answer is to the one variable we can't know in advance. This is purely the AI brain's usage; the cost that dominates the platform is rendering, covered under Cost Structure.
Tokens per turn (~7,750). We set the size of the system prompt, retrieved context, and history window. This barely moves: it's a design choice, not a guess.
Inference is a rounding error. Token usage is small and the rates are public; whichever model we route to, the brain is a tiny fraction of platform cost. The money is in rendering.
Turns per student per year. Nobody can know this up front. We assume 300. Everything uncertain about the cost lives in this single number, so we test it below.
Every student message triggers one LLM call carrying:
| Component | Tokens | Billing |
|---|---|---|
| System / pedagogy prompt | 2,000 | cached (90% off) |
| Retrieved course context (RAG) | 3,000 | input |
| Conversation history | 2,000 | input |
| Student question | 150 | input |
| Teaching response | 600 | output |
| Per turn | ~7,750 | tokens |
The platform routes each turn to the cheapest capable model and caches the system prompt, so token usage stays minimal. The exact provider mix is finalized later; it barely moves the total.
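"Cheapest capable model" routing can be sketched as a filter followed by a minimum. The catalog below is entirely hypothetical: the model names, the integer capability tiers, and the unitless `relative_cost` values are placeholders for illustration, not real prices or a chosen provider mix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    capability: int       # hypothetical tier: higher handles harder turns
    relative_cost: float  # unitless placeholder, not a real price


CATALOG = [
    Model("small", capability=1, relative_cost=1.0),
    Model("medium", capability=2, relative_cost=4.0),
    Model("large", capability=3, relative_cost=15.0),
]


def route(required_capability: int, catalog=CATALOG) -> Model:
    """Pick the cheapest model whose capability meets the turn's requirement."""
    capable = [m for m in catalog if m.capability >= required_capability]
    if not capable:
        raise ValueError("no model meets the required capability")
    return min(capable, key=lambda m: m.relative_cost)
```

How a turn's required capability is estimated (for instance, from question type or course level) is a separate design decision and is not shown here.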
Tokens per turn are fixed by design, so a student's annual usage comes down to one variable, turns per year. The whole range:
| Turns / student / yr | Tokens / yr |
|---|---|
| 150 (light) | 1.16 M |
| 300 (our baseline) | 2.33 M |
| 600 (heavy) | 4.65 M |
| 1,000 (power user) | 7.75 M |
The point: even if our usage assumption is off by 3×, the brain's footprint stays tiny. Token inference simply isn't where the money is, which is exactly why it can't sink the business case. Rendering can, which is why we focus there.
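The figures in the two tables above can be reproduced with a few lines of arithmetic. The sketch below uses only the component sizes and turn counts stated in this section; the dictionary keys and function names are illustrative.

```python
# Per-turn token components, taken from the per-turn table in this section.
TOKENS_PER_TURN = {
    "system_prompt": 2_000,
    "retrieved_context": 3_000,
    "history": 2_000,
    "question": 150,
    "response": 600,
}


def tokens_per_turn() -> int:
    """Total tokens carried by one LLM call."""
    return sum(TOKENS_PER_TURN.values())


def tokens_per_year(turns: int) -> int:
    """Annual tokens for one student at a given number of turns."""
    return turns * tokens_per_turn()


if __name__ == "__main__":
    for turns in (150, 300, 600, 1_000):
        print(f"{turns:>5} turns/yr: {tokens_per_year(turns):,} tokens")
```

Because the per-turn total is fixed, annual usage is strictly linear in turns per year, which is why a single assumption carries all of the uncertainty.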
How the estimate becomes fact: from day one the platform meters real token usage per student, so within the first few weeks of the pilot the assumption is replaced by HEP's actual numbers. And because we can enforce per-student and per-tenant token budgets in-platform, cost cannot run away even if a cohort is unexpectedly heavy. It's a dial we control and monitor, not a forecast we hope holds.
The brain runs on commodity LLMs (Claude, GPT, Gemini), selected by capability and cost and swappable behind an abstraction layer. Caching and batching reduce usage further. Token counts are verified programmatically; we are intentionally not attaching dollar figures until the platform path is set.
We are deliberately not putting dollar figures on the platform yet. The cost is dominated by one decision we have not made with HEP, namely how we source the real-time rendering: build it, license it, or partner for it. What we can be completely transparent about today is the shape of the cost, so nothing surprises anyone when firm pricing follows that decision.
Real-time rendering is the heavy lift, and where essentially all the spend concentrates. It scales with every minute of live tutorial and with concurrency. This is exactly why we will not quote a number until the rendering platform is decided: build, license, or partner.
Hosting, vector database, embeddings. Modest and mostly fixed; it scales gently with programs and regions, not with every interaction.
The retrieval-grounded LLM. Usage is small and known; whichever model we route to, inference is a rounding error next to rendering.
Pre-rendered lecture video is produced once, shared across the whole cohort, and cached at the edge. A small, one-time line that becomes pennies per student at scale.
The lever: because rendering is the cost, we govern it directly, with per-student and per-tenant minute caps, live metering, and routing routine questions to near-free text/voice. Two structural realities shape the eventual number: concurrency (live-video capacity is capped per provider, so real scale runs on an enterprise or self-hosted footprint) and usage (total live minutes per student). Both are levers we manage with HEP.
Why no dollar figures yet: the rendering decision, build our own pipeline, license a model, or partner with a provider, swings the number more than anything else. Quoting before that decision would be guesswork. We model it precisely with HEP once the path is chosen, and it lands in the Statement of Work, not in this document.
A note on academic oversight: accreditor-required faculty sign-off and QA is a real, ongoing cost, but it sits on the institution / HEP side, not in the platform itself. It scales with enrollment, and we size it with you against your partner-institution staffing rather than assuming a fixed headcount.
Each phase de-risks the next. The architecture never changes; we simply switch on capacity, programs, languages, and presentation tiers. This is how a pilot becomes a defensible platform business rather than a series of bespoke projects.
What "built for scale from the beginning" actually means here: stateless services behind a load balancer, per-tenant data isolation, queue-based avatar rendering, cached and batched LLM calls, an abstraction layer over every external vendor, and observability/cost-monitoring instrumented from day one. We do not retrofit scale; we assume it, so Phase 4's 25,000 students run on the same codebase that served Phase 1's first cohort.
The alternative approaches we've seen ask HEP (or an internal champion) to direct a hand-built system through 700+ hours of personal effort and then own its operation forever. That is real key-person risk on a system students depend on. Here's the difference in what we bring.
The bet we're making with HEP: the institutions that win the next decade of online education won't be the ones that ran an AI experiment. They'll be the ones that operationalized it at scale, with quality and compliance intact. We want to build and run that platform with you.
| Risk | Mitigation |
|---|---|
| U.S. accreditors and the Dept. of Education require meaningful faculty involvement in credit-bearing instruction. AI-delivered teaching is a developing area. | University owns accreditation; a human academic signs off all graded assessment. We build the compliance paper trail before launch, not after, and structure each tenant to its accreditor's expectations. |
| An AI teaching incorrect content is a reputational and academic risk, especially in professional fields. | RAG grounds every answer in verified course materials, not the open web. Automated source-deviation flagging plus weekly sampled human QA audits catch drift early. |
| Students may perceive AI delivery as lower quality; faculty may see it as a threat. | Positioned as AI-enhanced delivery that frees faculty for high-value work. First-cohort feedback shapes the rollout; 24/7 availability tends to win students over quickly. |
| Student data is regulated differently in every market, and most students sit outside the US. | Per-tenant, in-region data isolation, encryption, data-processing agreements with every vendor, and a posture aligned to GDPR, POPIA, FERPA, and local data-protection rules from Phase 1. |
| Per-minute conversational video is the one cost that scales with usage. | The real-time professor stays the standard experience; we control spend with per-tenant and per-student budget caps, live metering, and a graceful text/voice fallback under demand spikes, never by gating it behind an opt-in. |
| Any LLM, voice, or avatar provider could change pricing or terms. | Model- and vendor-agnostic architecture makes switching a config change. Open-source models (Llama, Mistral) provide a cost floor; multi-vendor routing removes single-point dependency. |
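The source-deviation flagging and sampled QA audits mentioned among the mitigations above could take many forms. The sketch below shows one deliberately crude version, assuming a lexical-overlap score as a stand-in for whatever grounding check is eventually chosen; the function names, the 0.6 threshold, and the 5% sampling rate are illustrative assumptions, not the platform's actual method.

```python
import random
import re


def _words(text: str) -> set:
    """Lowercased word set; a crude proxy for the content of a passage."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def support_score(answer: str, sources: list) -> float:
    """Fraction of the answer's distinct words that appear in any source passage."""
    answer_words = _words(answer)
    if not answer_words:
        return 1.0
    source_words = set().union(*(_words(s) for s in sources))
    return len(answer_words & source_words) / len(answer_words)


def flag_deviation(answer: str, sources: list, threshold: float = 0.6) -> bool:
    """Flag an answer for review when too little of it is supported by sources."""
    return support_score(answer, sources) < threshold


def sample_for_audit(session_ids: list, rate: float = 0.05, seed=None) -> list:
    """Draw a random sample of sessions for human QA (at least one if any exist)."""
    rng = random.Random(seed)
    k = max(1, round(len(session_ids) * rate)) if session_ids else 0
    return rng.sample(list(session_ids), k)
```

A real deployment would more likely use an entailment or citation-checking model than word overlap, but the shape is the same: an automated score routes suspect answers to reviewers, and random sampling covers whatever the score misses.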
Rather than ask HEP to commit on the strength of a deck, we would rather show it working. The fastest path is a short, low-risk engagement where we stand up the interactive professor on a real program and let HEP and the partner institution judge it directly.
Proposed proof engagement: (1) stand up a working interactive professor on a slice of real course content, so HEP can see it first; (2) select the flagship program and target cohort; (3) build the curriculum and assessment inventory; (4) confirm the accreditation posture with the partner institution; (5) finalize the avatar tier and a fixed-fee pilot SOW.
Scope at a glance. Creative Integrations builds and operates: the platform, avatar delivery at scale, integrations, hosting, and QA tooling. HEP and the institution provide: course materials, faculty sign-off, and academic oversight. Detailed deliverables, milestones, and fixed-fee pricing live in a separate Statement of Work from the proof engagement, not in this document.