The LLM Pricing War Is Hurting Education—and Startups

Ethan McGowan is a Professor of Financial Technology and Legal Analytics at the Gordon School of Business, SIAI. Originally from the United Kingdom, he works at the frontier of AI applications in financial regulation and institutional strategy, advising on governance and legal frameworks for next-generation investment vehicles. McGowan plays a key role in SIAI’s expansion into global finance hubs, including oversight of the institute’s initiatives in the Middle East and its emerging hedge fund operations.

Cheaper tokens made bigger bills 
The LLM pricing war squeezes startups and campuses 
Buy outcomes, route to small models, and cap reasoning

A single number illustrates the challenge we face: $0.07. That is the lowest cost per million tokens some lightweight models reached in late 2024, down from about $20 just 18 months earlier. Yet even as unit prices collapsed, technology leaders report cloud bills that spike unexpectedly, and university pilots that initially seemed inexpensive now come with costs that feel open-ended. The paradox is straightforward. The LLM pricing war made tokens cheaper, but it also made it easy to use many more tokens, especially with reasoning-style models that think before they answer. Costs fell per unit but rose in total. Education buyers and AI startups are caught on the wrong side: variable usage, limited pricing power, and worried boards watching budgets. Unless we change how we purchase and use AI, lower token prices will keep producing higher bills.
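
To make the arithmetic concrete, here is a minimal sketch. The prices and volumes are purely illustrative assumptions, not figures from the sources cited below; the point is that a unit price falling by two orders of magnitude can still leave the monthly bill higher once request volume and reasoning tokens grow.

```python
# Illustrative arithmetic only: the prices and volumes below are assumptions, not
# figures from the sources cited here, chosen to show how a large per-token price
# cut can coexist with a bigger total bill once volume and reasoning tokens grow.

def monthly_bill(price_per_million_tokens: float, requests: int, tokens_per_request: int) -> float:
    """Total monthly spend in dollars for a given unit price and usage level."""
    return price_per_million_tokens * requests * tokens_per_request / 1_000_000

# An older-style workload: expensive tokens, short answers, modest volume.
before = monthly_bill(price_per_million_tokens=20.0, requests=50_000, tokens_per_request=800)

# A newer workload: tokens are far cheaper, but the assistant now runs everywhere
# and "thinks" before answering, multiplying the tokens consumed per request.
after = monthly_bill(price_per_million_tokens=0.07, requests=2_000_000, tokens_per_request=6_000)

print(f"before: ${before:,.0f} per month")   # ~$800
print(f"after:  ${after:,.0f} per month")    # ~$840: cheaper tokens, bigger bill
```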

We need to rethink the problem. The question is not “What is the lowest price per million tokens?” It is “How many tokens will the workflow use, who controls that number, and what happens when the model decides to think longer?” The LLM pricing war has shifted competition from price tags to hidden consumption. That is why techniques like prompt caching and model routing are now more critical than the initial price of a flagship model.

There’s another twist. Significant price cuts often conceal a shift in model behavior. New reasoning models add internal steps and reasoning tokens, with developer-set effort levels that can increase costs for identical prompts. Per-token fees may look stable, but total token counts are not. The billing line grows with every additional step.

The LLM pricing war: cheaper tokens, bigger bills

First, let’s look at the numbers. The Stanford AI Index reports a significant drop in inference prices for smaller, more efficient models, with prices as low as cents per million tokens by late 2024. However, campus and enterprise costs are trending in the opposite direction: surveys show a sharp increase in cloud spending as generative AI moves into production, with many IT leaders struggling to manage and control these costs. Both things are true. Prices fell, but bills grew. The cause is volume. As models become faster and cheaper, we give them more work. When we add reasoning, they generate many more tokens for each task. The curve rises again.

Figure 1: Prices per million tokens collapsed for GPT-3.5-equivalent tasks, yet total spend rose because volume and “reasoning” steps surged.

The mechanics turn this curve into a budget issue. Prompt caching can reduce input token costs by half when prompts are repeated, provided the cache hits, and only for the cached span. Reasoning models offer effort controls—low, medium, high—that change the hidden thought processes and, therefore, the bill. Providers now offer routers that select a more cost-effective model for simple tasks and a more robust one for more complex tasks. This represents progress, but it also serves as a reminder: governance is crucial. Without strong safeguards, the LLM pricing war leads to increased usage at the expense of efficiency and effectiveness. Cost figures mentioned are sourced from public pricing pages and documents; unit prices vary by region and date, so we use the posted figures as reference points.
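
The mechanics above can be folded into a rough per-request cost model. This is a sketch, not any provider's billing formula: the 50% cached-input discount mirrors the caching behavior described here, while the cache hit rate and the reasoning-token multipliers per effort level are assumptions you would replace with your own measurements.

```python
# A rough per-request cost model, not any provider's billing formula. The 50%
# cached-input discount mirrors the caching behavior described above; the cache
# hit rate and the reasoning-token multipliers per effort level are assumptions.

EFFORT_OUTPUT_MULTIPLIER = {"low": 1.5, "medium": 4.0, "high": 10.0}  # hidden reasoning tokens

def estimated_cost(input_tokens: int, output_tokens: int, *,
                   in_price: float, out_price: float,        # dollars per million tokens
                   cache_hit_rate: float = 0.0,              # share of input served from cache
                   cached_discount: float = 0.5,             # cached input billed at half price
                   effort: str = "low") -> float:
    cached = input_tokens * cache_hit_rate
    fresh = input_tokens - cached
    billed_input = fresh + cached * cached_discount
    billed_output = output_tokens * EFFORT_OUTPUT_MULTIPLIER[effort]
    return (billed_input * in_price + billed_output * out_price) / 1_000_000

low = estimated_cost(3_000, 500, in_price=0.15, out_price=0.60, cache_hit_rate=0.7, effort="low")
high = estimated_cost(3_000, 500, in_price=0.15, out_price=0.60, cache_hit_rate=0.0, effort="high")
print(f"low effort, warm cache:  ${low:.6f} per request")
print(f"high effort, cold cache: ${high:.6f} per request")
```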

How the LLM pricing war squeezes startups and campuses

Startups face the harshest math. Flat-rate consumer plans disappeared once users automated agents to operate continuously. Venture-backed firms that priced at cost to grow encountered runaway token burn as reasoning became popular. The result was consolidation: Inflection’s leaders and a significant portion of its team joined Microsoft, and Adept’s founders and team moved to Amazon. These are not failures of science; they are failures of unit economics in a market where established companies can subsidize and manage workloads at scale. The LLM pricing war turns into a funding war the moment usage spikes.

Education buyers experience a similar squeeze. Many pilots initially experience limited and uneven financial impacts, while cloud costs remain unpredictable. Some industry surveys tout strong ROI, while others, including reports linked to MIT, find that most early deployments demonstrate little to no measurable benefit. Both can be accurate. Individual function-level successes exist, but overall value requires redesigning work, not just switching tools. For universities, this means aligning use cases with metrics such as “cost per graded assignment” or “cost per advising hour saved,” rather than just cost per token. We analyze various surveys and treat marketing-sponsored studies as directional while relying on neutral sources for trend confirmation.

The competitive landscape is shifting. Leaderboards now show open and regional players trading positions rapidly as Chinese and U.S. labs cut prices and release new products. Champions change from quarter to quarter. Even well-funded European players must secure large funding rounds to stay competitive. The LLM pricing war involves more than just price; it encompasses computing access, distribution, and time-to-market. For a university CIO, this constant change means procurement must assume switching—both technically and contractually—from the beginning.

Escaping the LLM pricing war: a policy playbook for education

The way out is governance, not heroics. First, purchase outcomes, not tokens. Contracts should link spending to specific services—such as graded documents, redlined pages, or resolved tickets—rather than raw usage. A writing assistant that charges per edited page aligns incentives; a metered chat endpoint does not. Second, demand transparency in routing. If a vendor automatically switches to a more cost-effective model, that is acceptable, but the contract must detail the baseline model, audit logs, and limits for reasoning effort. This turns “smart” routing into a controllable dial rather than a black box. Third, make cache efficiency a key performance indicator. If the average cache hit rate falls below an agreed threshold, renegotiate or switch providers. These steps transform the LLM pricing war from a hidden consumption issue into a manageable service.
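
A minimal sketch of the outcome and cache checks is below. Field names, the threshold, and the log format are hypothetical stand-ins for whatever the vendor's audit logs and your RFP actually specify.

```python
# A minimal sketch of the outcome and cache checks above. Field names, the threshold,
# and the log format are hypothetical stand-ins for whatever the vendor's audit logs
# and your RFP actually specify.

def cost_per_outcome(total_spend: float, accepted_outcomes: int) -> float:
    """Spend divided by delivered units (graded documents, resolved tickets, ...)."""
    return total_spend / max(accepted_outcomes, 1)

def cache_sla_met(usage_log: list, threshold: float = 0.6) -> bool:
    """Each log entry is assumed to carry 'cached_input_tokens' and 'input_tokens'."""
    cached = sum(r["cached_input_tokens"] for r in usage_log)
    total = sum(r["input_tokens"] for r in usage_log)
    return total > 0 and cached / total >= threshold

monthly_log = [
    {"input_tokens": 4_000, "cached_input_tokens": 3_200},
    {"input_tokens": 2_500, "cached_input_tokens": 900},
]
print("cost per graded assignment:", cost_per_outcome(total_spend=1_800.0, accepted_outcomes=12_000))
print("cache SLA met:", cache_sla_met(monthly_log))
```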

Now for the implementation side. Universities should stick to small models and only upgrade when tests prove the need for it. For tutoring, classification, rubric-based grading, and basic drafting, the standard should be compact models, with strict budgets for more complex reasoning. A router that you control should enforce this standard. Cloud vendors now offer native prompt-routing that balances cost and quality; adopt it, but require model lists, thresholds, and logs. Pair this with a simple abstraction layer, allowing you to switch providers without rewriting all your applications. Recommendations for routing align with vendor documents and general financial operations principles; specific parameters depend on your technology stack.
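
As one illustration of an institution-controlled router, the sketch below defaults to a small model for routine task types, escalates only on a simple complexity heuristic, caps reasoning tokens, and logs every decision. The model names, task categories, thresholds, and cap are placeholders, not recommendations from any vendor document.

```python
# A governance-first routing sketch. The model names, task categories, complexity
# heuristic, and reasoning cap are placeholders, not vendor recommendations; the
# point is that the institution owns the defaults, the thresholds, and the log.

import logging

logging.basicConfig(level=logging.INFO)
SMALL_MODEL, LARGE_MODEL = "small-default", "large-escalation"   # placeholder names
MAX_REASONING_TOKENS = 2_000                                      # hard cap per request

def route(task_type: str, prompt: str) -> dict:
    routine = task_type in {"classification", "rubric_grading", "drafting", "tutoring"}
    looks_simple = len(prompt) < 4_000        # crude proxy for task complexity
    model = SMALL_MODEL if (routine and looks_simple) else LARGE_MODEL
    decision = {
        "model": model,
        "reasoning_effort": "low" if model == SMALL_MODEL else "medium",
        "max_reasoning_tokens": MAX_REASONING_TOKENS,
    }
    logging.info("route: task=%s model=%s", task_type, model)    # audit trail for the contract
    return decision

print(route("rubric_grading", "Grade this 300-word essay against the attached rubric."))
```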

Figure 2: Smart defaults (route down; cap “reasoning”; cache inputs) cut bills by ~77% versus sending everything to a reasoning model—without blocking occasional complex cases.

A narrow path to durable value

This situation is also a talent issue. Schools need a small FinOps-for-AI team that can enforce cost policies and stop unsafe routing. This team should operate between academic units and vendors, publish monthly cost/benefit reports, and manage cache and router metrics. Simple changes can help: lock prompt templates, condense context, favor retrieval over long histories, and establish strict limits on the number of tokens per session. These measures may seem mundane, but they save real money. They also make value measurable in ways that boards can trust.

On the vendor side, we should stop rewarding unsustainable pricing. If a startup’s quote seems “too good,” assume someone else is covering the costs. Inquire about how long the subsidy lasts, how the routing operates under load, and what happens when a leading model becomes obsolete. Include “time-to-switch” in the RFP and score it. Require escrowed red-team prompts and regression tests to ensure switching is possible without sacrificing safety or quality. For research labs, funders should allocate budget lines for test-time computing and caching, so teams do not conceal usage in student hours or through shadow IT.

There is reason for optimism. Some model families provide excellent value at a low cost per token, and the market is improving at directing simple prompts to smaller models. OpenAI, Anthropic, and others offer “effort” controls; when campuses set them to “low” by default, they reduce waste without compromising learning outcomes. The message is clear: the most significant savings do not come from waiting for the next price cut; they come from saying “no” to unbounded reasoning for routine tasks.

The final change is cultural. Faculty need guidance on when not to use AI. A course that grades with rubrics and short answers can function well with small models and concise prompts. An advanced coding lab may only require a heavier model for a few steps. A registrar’s chatbot needs to rely on cached flows before escalating to human staff. The goal is not to hinder innovation. It is to treat reasoning time like lab time—scheduled, capped, and justified by outcomes.

Returning to that initial number—$0.07 per million tokens—reveals the illusion it created. The LLM pricing war provided a headline that every CFO wanted to see. But the details reveal usage, and usage is elastic. If we continue to buy tokens instead of outcomes, budgets will continue to break as models think longer by design. Education leaders should adopt a new approach: prioritize cost control by default, manage reasoning effectively, cache resources, and contract for results. Startups should price transparently, resist flat-rate traps, and focus on service quality rather than subsidies. Following this strategy will help eliminate the paradox. Cheap tokens can transform from a trap into the foundation for affordable, equitable, and sustainable AI in our classrooms and labs.


The views expressed in this article are those of the author(s) and do not necessarily reflect the official position of the Swiss Institute of Artificial Intelligence (SIAI) or its affiliates.


References

Anthropic. (2024). Introducing Claude 3.5 Sonnet. Pricing noted at $3/M input and $15/M output tokens.
Artificial Analysis. (2025). LLM Leaderboard—Model rankings and price/performance.
AWS. (2025). Understanding intelligent prompt routing in Amazon Bedrock.
Axios. (2025). OpenAI releases o3-mini reasoning model.
Ikangai. (2025). The LLM Cost Paradox: How “Cheaper” AI Models Are Breaking Budgets.
Medium (Downes, J.). (2025). AI Is Getting Cheaper.
McKinsey & Company. (2025). Gen AI’s ROI (Week in Charts).
Microsoft Azure. (2024). Prompt caching—reduce cost and latency.
Microsoft Azure. (2025). Reasoning models—effort and reasoning tokens.
Mistral AI. (2025). Raises €1.7B Series C (post-money €11.7B).
Okoone. (2025). Why AI is making IT budgets harder to control.
OpenAI. (2024). API Prompt Caching—pricing overview.
OpenAI. (2025). API Pricing (fine-tuning and cached input rates).
OpenRouter. (2025). Provider routing—intelligent multi-provider request routing.
Stanford HAI. (2025). AI Index 2025, Chapter 1—Inference cost declines to ~$0.07/M tokens for some models.
Tangoe (press). (2024). GenAI drives cloud expenses 30% higher; 72% say spending is unmanageable.
TechCrunch. (2025). OpenAI launches o3-mini; reasoning effort controls.
TechRadar. (2025). 94% of ITDMs struggle with cloud costs; AI adds pressure.
The Verge. (2025). OpenAI’s upgraded o3/o4-mini can reason with images.
Tom’s Hardware. (2025). MIT study: 95% of enterprise gen-AI implementations show no P&L impact.


AI Labor Cost Is the New Productivity Shock in Education

Keith Lee is a Professor of AI and Data Science at the Gordon School of Business, part of the Swiss Institute of Artificial Intelligence (SIAI), where he leads research and teaching on AI-driven finance and data science. He is also a Senior Research Fellow with the GIAI Council, advising on the institute’s global research and financial strategy, including initiatives in Asia and the Middle East.

AI labor cost has collapsed, making routine knowledge work pennies
Schools should meter tokens, track accepted outputs, and redirect savings to student time
Contract for pass-through price drops and keep human judgment tasks off-limits

The price of machine work has dropped faster than most education leaders realize. In 2024, many firms paid around $10 per million tokens to automate text tasks with AI. By March 2025, rates of about $2.50 were typical, a 75% decrease. On some major platforms, the price is now as low as $0.10 per million input tokens and $0.40 per million output tokens. At those rates, a wide range of routine writing, summarizing, and coding tasks can be completed for a few cents each, at scale. This is not about impressive demonstrations; it’s about costs. When a fundamental input for white-collar work becomes this inexpensive, it acts like a sudden wage cut for specific tasks across the economy. This sudden, significant decrease in the cost of AI labor is what we call the 'AI labor cost shock'. For education systems that spend heavily on knowledge work—such as curriculum development, administrative services, IT support, and student services—the budget is affected first, well before teaching methods catch up.
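
A quick back-of-the-envelope check shows what those listed rates mean per task. The token counts below are assumptions chosen to represent typical routine jobs, not measurements.

```python
# Back-of-the-envelope task costing at the listed low-tier rates. The token counts
# per task are assumptions chosen to represent typical routine jobs.

IN_PRICE, OUT_PRICE = 0.10, 0.40   # dollars per million tokens

def task_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * IN_PRICE + output_tokens * OUT_PRICE) / 1_000_000

for name, tokens_in, tokens_out in [
    ("summarize a three-page memo", 2_500, 400),
    ("draft a routine policy email", 800, 350),
    ("first-pass code for a dashboard query", 1_200, 600),
]:
    print(f"{name}: ${task_cost(tokens_in, tokens_out):.5f}")   # fractions of a cent each
```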

AI labor cost is the productivity shock we’re overlooking

Macroeconomists have observed that AI innovation operates as a supply push, increasing output and lowering prices as total productivity improves over several years. This macro view matters for schools, colleges, and education vendors because it shows the connection: productivity gains involve not just smarter tools but also cheaper task hours. The key mechanism is what we call the AI labor cost channel: the direct effect of AI on the cost of routine, text-based tasks such as drafting policies, answering tickets, cleaning data, writing job postings, or generating preliminary code. Recent studies show what happens when these tools are applied in real work settings. In customer support, a generative AI assistant improved the number of issues resolved per hour by about 15% on average, with gains exceeding 30% for less-experienced staff. In controlled writing assignments, time fell by approximately 40% while quality improved. These findings are not isolated cases; they show that specific, clearly defined tasks are already seeing wage-like declines in cost.

Figure 1: AI innovation behaves like a positive supply shock: industrial output and TFP rise over several years while price pressure eases, consistent with falling unit task costs.

Examining costs also informs the discussion of equity. Labor is the primary input for knowledge production. In sectors that rely heavily on research and development, over two-thirds of expenses are allocated to labor compensation; in the broader U.S. nonfarm business sector, labor’s share of income remained close to its long-term average through mid-2025. If AI labor costs rapidly decline for our most common tasks—editing, synthesizing, answering questions, coding—the basic expectation is that early adopters will see profit margins expand, followed by price pressure as competitors catch up. Educational institutions serve as both buyers and producers, purchasing services and creating curricula, assessments, and large-scale student support. Being mindful of costs is essential; it determines whether AI helps expand quality and access or whether savings are lost through widespread discounts.

AI labor cost in classrooms, back offices, and budgets

The first benefits show up where outputs can be standardized and reviewed. Support chats for students, financial aid Q&A, updates to IT knowledge bases, drafting syllabus templates, creating boilerplate for grants, and generating initial code for data dashboards all fit this mold. Here, the AI labor cost story is straightforward: pay per token, track usage, and measure costs per accepted output. Public pricing makes budgeting manageable. One major vendor currently lists $0.15 per million input tokens in a low-cost tier; another offers $0.10 per million input tokens in an even cheaper tier. With the use of prompt libraries and caching, marginal costs can be further reduced. A practical note: track three metrics for each case—tokens for accepted outputs, acceptance rates after human review, and staff time saved compared to the baseline. The policy shift should move from “hours budgeted” to “accepted outputs per euro,” allowing humans to focus on exceptions and judgments.
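
The three metrics can be computed from ordinary usage logs. The record structure and numbers below are hypothetical; the point is that "accepted outputs per euro" and time saved against a baseline fall out of a few fields you can start capturing today.

```python
# The three per-use-case metrics above, computed from ordinary usage records. The
# record structure and numbers are hypothetical; substitute your own logs and baseline.

def unit_metrics(records: list, baseline_minutes_per_item: float, spend_eur: float) -> dict:
    accepted = [r for r in records if r["accepted"]]
    tokens_accepted = sum(r["tokens"] for r in accepted)
    time_saved_min = sum(baseline_minutes_per_item - r["review_minutes"] for r in accepted)
    return {
        "acceptance_rate": len(accepted) / len(records) if records else 0.0,
        "tokens_per_accepted_output": tokens_accepted / max(len(accepted), 1),
        "accepted_outputs_per_euro": len(accepted) / spend_eur if spend_eur else 0.0,
        "staff_hours_saved": time_saved_min / 60,
    }

records = [
    {"tokens": 1_800, "accepted": True, "review_minutes": 4},
    {"tokens": 2_200, "accepted": False, "review_minutes": 9},
    {"tokens": 1_500, "accepted": True, "review_minutes": 3},
]
print(unit_metrics(records, baseline_minutes_per_item=20, spend_eur=0.02))
```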

However, not every human hour can be easily replaced. New evidence from Carnegie Mellon in 2025 highlights the limits of substituting language models for humans in qualitative research roles. When researchers attempted to use models as study participants, the results lacked clarity, omitted context, and raised concerns about consent. In software engineering, research has also shown that models can mimic human reviewers on specific coding tasks, but only in tightly controlled situations with clear guidelines. The lesson for education is clear: AI labor cost can take over routine, defined tasks that fit templates, but it should not replace student voices, personal experiences, or ethical inquiry. Procurement policies must draw clear boundaries around the tasks that depend on human judgment.

Budgets should also account for price fluctuations. A price war is already underway: one major competitor cut off-peak API rates by up to 75% in 2025, prompting established companies to respond with cheaper “flash” or “mini” tiers and larger context windows. Yet costs don’t only decrease. As workflows become more automated, usage can increase significantly, and heavy users may exceed their flat-rate plans. For universities testing automated coding tutors or bulk document processing, this means two controls are crucial: caps on usage at the account level and policies for what a workflow should do when those caps are reached. Treat AI labor cost as a market rate that can rise or fall with features, not as a permanent discount.

AI labor cost, prices, and what’s next

If education vendors experience significant increases in profit margins, will prices for services drop? Macro evidence indicates that AI innovation leads to decreases in consumer prices over time as productivity increases take effect. However, the timing hinges on market structure. In competitive areas, such as content localization, transcription, and large-scale assessments, price cuts are likely to occur sooner. In concentrated markets, savings may be redirected to product development before they reach buyers. For public systems, a more effective approach is to include AI labor cost metrics in contracts that specify prices for accepted items, allowed model types, cache hit ratios, and clauses to pass through decreases in token prices. This turns unpredictable technology shifts into manageable economic terms.

Finally, let’s consider the workforce. Most productivity gains so far have benefited less-experienced workers who adopt AI tools, consistent with a catch-up narrative. This supports a training strategy that targets the first two years on the job, focusing on training in prompt patterns, review checklists, and judgment exercises that enhance tool output to meet institutional standards. However, the risks associated with exposure are uneven. Analyses from the OECD and the ILO indicate that lower-education jobs and administrative roles, which women disproportionately hold, are at a higher risk of automation. Responsible adoption means redeploying staff instead of discarding them: retaining human-centered work where empathy, discretion, and context are essential, and supporting these positions with savings from tasks that AI can automate.

Figure 2: Without policy, AI gains skew wealth upward—top 10% share rises as the bottom 50% slips—so contracts should pass savings through to wages, training, and student services.

Toward a practical cost strategy

The shift in perspective is clear: stop questioning whether AI is “good for education” in general and start examining where AI labor cost can enhance access and quality for every euro spent. Begin with three immediate actions. First, redesign workflows so that models handle the routine tasks while people provide oversight. Use the evidence from writing and support as a benchmark. If a pilot isn’t demonstrating double-digit time savings or quality improvements upon review, adjust the workflow or terminate the pilot. Create dashboards that track accepted outputs per 1,000 tokens and the time saved through human review for each unit. Always compare these numbers to a consistent pre-AI baseline to avoid shifting targets.

Second, approach purchases like a CFO rather than a lab. Set maximum limits on monthly tokens, require vendors to disclose which model families and pricing tiers they offer, and automatically review prices when public rates drop by a specified amount. This makes enforcing contracts easier. Combine prompt caching with lower-tier models for drafts and higher-tier reviews for final outputs; this blended AI labor cost will outperform single-tier spending while maintaining quality. Include limits for any workflow that begins to make too many calls and risks exceeding budget limits.
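
As a rough illustration of the blended approach, the sketch below compares sending everything to a premium tier with drafting on a low-cost tier and reserving the premium tier for a shorter review pass. The prices are example list rates per million tokens and the token counts per document are assumptions.

```python
# An illustrative comparison of single-tier versus blended spending. Prices are
# example list rates per million tokens; token counts per document are assumptions.

def per_doc_cost(tokens_in: int, tokens_out: int, in_p: float, out_p: float) -> float:
    return (tokens_in * in_p + tokens_out * out_p) / 1_000_000

DRAFT = {"in_p": 0.10, "out_p": 0.40}     # low-cost tier for first drafts
REVIEW = {"in_p": 2.50, "out_p": 10.00}   # higher tier reserved for the final pass

all_premium = per_doc_cost(3_000, 1_200, **REVIEW)
blended = per_doc_cost(3_000, 1_200, **DRAFT) + per_doc_cost(1_500, 300, **REVIEW)  # short review pass
print(f"all premium: ${all_premium:.4f} per document")
print(f"blended:     ${blended:.4f} per document")
```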

Third, draw clear lines on tasks that cannot be replaced. The findings from Carnegie Mellon serve as a cautionary example: using language models in place of human participants muddies what we value. In schools, this applies to counseling, providing qualitative feedback on assignments connected to identity, and engaging with the community. Keep these human. Assign AI to logistics, drafts, and data preparation. In software education, models can act as code reviewers under established guidelines. However, students still need to articulate their intent and rationale verbally. The guiding principle should be that when the task requires judgment, AI labor cost should not dictate your purchasing decisions.

These decisions are made within a broader macro context. As AI innovation increases productivity and lowers prices, specific sectors are expected to witness higher wages and increased hiring. In contrast, others will experience higher turnover rates. For public education systems, this is a design decision. Use contracts and budgets to prioritize savings for teaching time, tutoring services, and student support. Allocate funds for small-group instruction by utilizing the hours saved from paperwork handled by AI. Invest in staff training so that the most significant gains—those benefiting new workers who access practical tools—also support early-career teachers and advisors rather than just central offices.

A budget is a moral document. Use the savings for students

We return to the initial insight. Prices for machine text work have plummeted at key tiers, and the typical effort required for white-collar tasks—like editing, summarizing, or drafting—now costs mere pennies at scale. This is the AI labor cost shock. Macro data indicate that productivity improvements can lead to increased output and lower prices over time; micro studies reveal that targeted task substitutions already save time and enhance quality; ethical research notes that substitutions have firm limits where human voices and consent are concerned. Taken together, the policy is clear. Treat AI as a measured labor input. Track accepted outputs instead of hype. Include clauses to capture price declines in contracts. Safeguard tasks that require judgment. And focus the saved resources where they matter most: human attention on learning. If done correctly, education can transform a groundbreaking technology into a quiet revolution in costs, access, and quality—one accepted output at a time.


The views expressed in this article are those of the author(s) and do not necessarily reflect the official position of the Swiss Institute of Artificial Intelligence (SIAI) or its affiliates.


References

Acemoglu, D. (2024). The Simple Macroeconomics of AI. NBER Working Paper 32487.
Anthropic. (2025a). Pricing.
Anthropic. (2025b). Web search on the Anthropic API.
Business Insider. (2025, Aug.). ‘Inference whales’ are eating into AI coding startups’ business model. (accessed 1 Oct 2025).
Carnegie Mellon University, School of Computer Science. (2025, May 6). Can Generative AI Replace Humans in Qualitative Research Studies? News release.
Federal Reserve Bank of St. Louis (FRED). (2025). Nonfarm Business Sector: Labor Share for All Workers (Index 2017=100). (updated Sept. 4, 2025).
Gazzani, A., & Natoli, F. (2024, Oct. 18). The macroeconomic effects of AI innovation. VoxEU (CEPR).
Google. (2025). Gemini 2.5 pricing overview.
International Labour Organization. (2025, May 20). Generative AI and Jobs: A Refined Global Index of Occupational Exposure. (accessed 1 Oct 2025).
Noy, S., & Zhang, W. (2023). Experimental evidence on the productivity effects of generative artificial intelligence. Science, 381(6654).
OpenAI. (2025a). Platform pricing.
OpenAI. (2025b). API pricing (fine-tuning and scale tier details). (accessed 1 Oct 2025).
Ramp. (2025, Apr. 15). AI is getting cheaper. Velocity blog. (accessed 1 Oct 2025).
Reuters. (2025, Feb. 26). DeepSeek cuts off-peak pricing for developers by up to 75%. (accessed 1 Oct 2025).
U.S. Bureau of Labor Statistics. (2025, Mar. 21). Total factor productivity increased 1.3% in 2024. Productivity program highlights. (accessed 1 Oct 2025).
Wang, R., Guo, J., Gao, C., Fan, G., Chong, C. Y., & Xia, X. (2025). Can LLMs Replace Human Evaluators? An Empirical Study of LLM-as-a-Judge in Software Engineering. arXiv:2502.06193.


AI Productivity in Education: Real Gains, Costs, and What to Do Next

Keith Lee is a Professor of AI and Data Science at the Gordon School of Business, part of the Swiss Institute of Artificial Intelligence (SIAI), where he leads research and teaching on AI-driven finance and data science. He is also a Senior Research Fellow with the GIAI Council, advising on the institute’s global research and financial strategy, including initiatives in Asia and the Middle East.

AI productivity in education is real but uneven and adoption is shallow
Novices gain most; net gains require workflow redesign, training, and guardrails
Measure time returned and learning outcomes—not hype—and scale targeted pilots

The most relevant number we have right now is small but significant. In late 2024 surveys, U.S. workers who used generative AI saved about 5.4% of their weekly hours. Researchers estimate this translates to approximately a 1.1% increase in productivity across the entire workforce. This is not a breakthrough, but it is also not insignificant. For a teacher or instructional designer working a 40-hour week, this saving amounts to just over two hours weekly, assuming similar patterns continue. The key question for AI productivity in education is not whether the tools can create rubrics or outline lessons, as they can. Instead, it's whether institutions will change their processes so those regained hours lead to better feedback, stronger curricula, and fairer outcomes, without introducing new risks that offset the gains. The answer depends on where we look, how we measure, and what we decide to focus on first.

AI productivity in education is inconsistent and not straightforward

Most headlines suggest that advancements benefit everyone. Early evidence, however, points to a bumpier road. In a randomized rollout at an extensive customer support operation, access to a generative AI assistant increased agent productivity by approximately 14% to 15% on average, with the most significant improvements observed among less-experienced workers. This pattern is essential for AI productivity in education. When novice teachers, new TAs, or early-career instructional staff have structured AI support, their performance aligns more closely with that of experienced educators. But in areas outside the model's strengths—tasks that require judgment, unique contexts, and local nuances—AI can mislead or even hinder performance. Field experiments with consultants show the same inconsistent results: strong improvements on well-defined tasks, and weaker or adverse effects on more complex problems. The takeaway is clear. We will see significant wins in specific workflows, not universally, and the largest initial benefits will go to "upper-junior" staff and students who need the most support.

The extent of adoption is another barrier. U.S. survey data indicate that generative AI is spreading quickly overall. Still, only a portion of workers use it regularly for their jobs. One national study found that 23% of employed adults used it for work at least once in the previous week. OpenAI's analysis suggests that about 30% of all ChatGPT usage is work-related, with the remainder being personal. In educational settings, this divide is evident as faculty and students test tools for minor tasks. At the same time, core course design and assessment remain unchanged. If only a minority use AI at work and even fewer engage deeply, system-wide productivity barely shifts. This isn't a failure of the technology; it signals that policy should focus on encouraging deeper use in the workflows that matter most for learning and development.

Figure 1: Adoption is broad but shallow: only 28% used generative AI for work last week, and daily work users are just 10.5%—depth, not headlines, will move campus productivity.

Improving AI productivity in education needs more than tools

The basic technology is advancing rapidly, but AI productivity in education relies on several key factors: high-quality data, redesigned workflows, practical training, and robust safeguards. The Conversation's review of public-sector implementations is clear: productivity gains exist, but they require significant effort and resources to achieve. Integration costs, oversight, security, and managing change consume time and funds. These aren't extras; they determine whether saved minutes translate into better teaching or are lost to additional work. In software development, controlled studies have shown significant time savings—developers complete tasks approximately 55% faster with AI pair programmers when tasks are well-defined and structured. However, organizations only realize these gains when they standardize processes, document prompts, and improve code review. Education is no different. To turn drafts into tangible outcomes, institutions need shared templates, model "playbooks," and clear guidelines for uncertain situations.

Figure 2: The gains cluster in routinized tasks—writing, search, documentation—pointing schools to target formative feedback, item banks, and admin triage where AI complements judgment.

Costs and risk management also influence the rate of adoption. Hallucinations can be reduced with careful retrieval and structured prompts, but they won't disappear completely. Privacy regulations limit what student data can be processed by a model. Aligning curricula takes time and careful design. These challenges help explain why national productivity hasn't surged despite noticeable AI adoption. In the U.S., labor productivity grew about 2.3% in 2024 and 1.5% year-over-year by Q2 2025—an encouraging uptick after a downturn, but far from a substantial AI-driven change. This isn't a judgment on the future of education with AI; it reflects the context. The macro trend is improving, but significant gains will come from targeted, well-managed deployments in key educational processes, rather than blanket approaches.

Assess AI productivity in education by meaningful outcomes, not hype

We should rethink the main question. Instead of asking, "Has productivity really increased?", we should ask, "Where, for whom, and at what total cost?" For AI productivity in education, three outcome areas matter most. First, time is saved on low-stakes tasks that can be redirected toward feedback and student interaction. Second, measurable improvements in assessment quality and course completion rates for at-risk learners. Third, institutional resilience: fewer bottlenecks in student services, less variability across sections, and shorter times from evidence to course updates. The best evidence we have suggests that when AI assists novices, the performance gap decreases. This presents a policy opportunity: target AI at bottlenecks for early-career instructors and first-generation students, and design interventions that allow the "easy" time savings to offset the "hard" redesign work that follows.

Forecasts should be approached cautiously. The Penn Wharton Budget Model predicts modest, non-linear gains from generative AI for the broader economy, with stronger effects expected in the early 2030s before diminishing as structures adapt. Applied to campuses, the lesson is clear. Early adopters who redesign workflows will capture significant benefits first; those who lag will experience smaller, delayed returns and may end up paying more for retrofits. That's why it's essential to measure outcomes: hours returned to instruction, reductions in grading variability, faster support for students who fall behind, and documented error rates in AI-assisted outputs. If we can't track these, we're not managing productivity; we're just guessing.

A practical agenda for the next 18 months

The way forward begins with focus. Identify three workflows where AI productivity in education can increase both time and quality: formative feedback on drafts, generating aligned practice items with explanations, and triaging student services. In each, establish what the "gold standard" looks like without AI, then insert the model where it can take over repetitive steps and support decision-making rather than replace it. Use targeted retrieval over course-related content to minimize hallucinations. Establish a firm guideline: anything high-stakes—such as final grades or progression decisions—requires human review. Document this and provide training. Report the first improvements as time returned to instructors and faster responses for students. Evidence, not excitement, should guide the next wave of AI use.

Procurement should reward complementary tools. Licenses must include organized training, prompt libraries linked to the learning management system, and APIs for safe retrieval from approved course repositories. Create incentives for teams to share their workflows—how they prompt, review, and what they reject—so that knowledge builds across departments. Start with small, cross-functional pilot projects: a program lead, a data steward, two instructors, a student representative, and an IT partner. Treat each pilot as a mini-randomized controlled trial: define the target metric, gather a baseline, run it for a term, and publish a brief report on methods. This is how AI productivity in education transforms from a vague promise into a manageable, repeatable process.
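
A pilot report of this kind can stay very simple. The sketch below compares weekly instructor hours on the target task before and during the pilot and reports a crude standardized difference; the numbers are invented, and a real pilot would also track error rates and cost per accepted output.

```python
# A minimal pilot summary: weekly instructor hours on the target task, before and
# during the pilot. Values are invented; the standardized figure is a crude proxy
# (mean difference over the spread of all observations), not a formal effect size.

from statistics import mean, stdev

def pilot_summary(baseline_hours: list, pilot_hours: list) -> dict:
    diff = mean(baseline_hours) - mean(pilot_hours)
    spread = stdev(baseline_hours + pilot_hours)
    return {
        "hours_saved_per_week": round(diff, 2),
        "standardized_difference": round(diff / spread, 2) if spread else None,
    }

baseline = [6.5, 7.0, 6.0, 7.5, 6.8]   # weeks before the tool
pilot = [4.5, 5.0, 4.8, 5.5, 4.2]      # weeks with the tool
print(pilot_summary(baseline, pilot))
```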

Measurement must accurately reflect costs—track computing and licensing expenses, as well as the "hidden" labor involved in redesigning and reviewing. If a course saves ten instructor hours per week on drafting but adds six hours for quality control because prompts deviate, the net gain is four hours. That is still a win, but smaller, and it points to the following fix: stabilize prompts, use drafts to teach students to critique AI outputs, and automate permitted checks. Where effect sizes are uncertain, borrow from labor-market studies by measuring not only the outputs created but also the hours saved and reductions in variability. Suppose novices close the gap with experts in rubric-based grading or writing accuracy. In that case, the benefits will be seen in more consistent learning experiences and higher progression rates for historically struggling students.

Finally, maintain control of the narrative while grounding it in reality. Macro numbers will fluctuate—quarterly productivity does this—and bold claims will continue to emerge. Keep campus evidence and policy closely connected: if pilot projects show a steady two-hour weekly return per instructor without a decline in quality, scale that up. If error rates increase in certain classes, pause to address retrieval or assessment design issues before expanding the scope of the intervention. Use clear method notes in your reports. If adoption lags, don't blame reluctance; instead, look for gaps in workflows and training. The economies that benefit most from AI are not the loudest; they are the ones that effectively pair technology with process and people, all while learning in public. This is how AI productivity in education becomes real and lasting.

We started with a modest figure: a 1.1% productivity boost at the workforce level, driven by a 5.4% time savings among users. Detractors might view this as lackluster. However, in education, it is enough to alter the baseline if we consider it working capital—time we reinvest into providing feedback, improving course clarity, and enhancing student support. The evidence shows us where the gains begin: at the "upper-junior" level, in routine tasks that free up expert time, and in redesigns that establish strong practices as standard. The risks are real, and the costs are not trivial. But we can set the curve. If we align incentives to deepen use in a few impactful workflows, purchase complementary tools instead of just licenses, and measure what students and instructors truly gain, the small increases will add up. That is the vital productivity story of the day. It's not about a headline figure. It's about the week-by-week time returned to the work that only educators can do.


The views expressed in this article are those of the author(s) and do not necessarily reflect the official position of the Swiss Institute of Artificial Intelligence (SIAI) or its affiliates.


References

Bick, A., Blandin, A., & Mertens, K. (2024). The Rapid Adoption of Generative AI. NBER Working Paper w32966 / PDF. (Adoption levels; work-use share.)
Bureau of Labor Statistics. (2025). Productivity and Costs, Second Quarter 2025, Revised; and related Productivity home pages. (U.S. productivity growth, 2024–2025.)
Brynjolfsson, E., Li, D., & Raymond, L. (2025). Generative AI at Work. Quarterly Journal of Economics 140(2), 889–944; and prior working papers. (14–15% productivity gains; largest effects for less-experienced workers.)
Dell’Acqua, F., McFowland III, E., Mollick, E. R., et al. (2023). Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality. HBS Working Paper; PDF. (Heterogeneous effects; frontier concept.)
Noy, S., & Zhang, W. (2023). Experimental Evidence on the Productivity Effects of Generative Artificial Intelligence. Science (2023) and working paper versions. (Writing task productivity, quality effects.)
OpenAI. (2025). How people are using ChatGPT. (Share of work-related usage ~30%.)
Penn Wharton Budget Model. (2025). The Projected Impact of Generative AI on Future Productivity Growth. (Modest, non-linear macro effects over time.)
St. Louis Fed. (2025). The Impact of Generative AI on Work Productivity. (Users save 5.4% of hours; ~1.1% workforce productivity.)
The Conversation / University of Melbourne ADM+S. (2025). Does AI really boost productivity at work? Research shows gains don’t come cheap or easy. (Integration costs, governance, and risk.)
GitHub / Research. (2023–2024). The Impact of AI on Developer Productivity. (Task completion speedups around 55% in bounded tasks.)


Education and the AI Bubble: Talk Isn't Transformation

Keith Lee is a Professor of AI and Data Science at the Gordon School of Business, part of the Swiss Institute of Artificial Intelligence (SIAI), where he leads research and teaching on AI-driven finance and data science. He is also a Senior Research Fellow with the GIAI Council, advising on the institute’s global research and financial strategy, including initiatives in Asia and the Middle East.

The AI bubble rewards talk more than results
Schools should pilot, verify, and buy only proven gains using LRAS and total-cost checks
Train teachers, price energy and privacy, and pay only for results that replicate

A single number should make us pause: 287. That's how many S&P 500 earnings calls in one quarter mentioned AI, the highest in a decade and more than double the five-year average. However, analysts note that for most companies, profits directly linked to AI are rare. This is a classic sign of an AI bubble, and education sits right in the middle of it. Districts are getting pitches that reflect market excitement. If stock prices can rise on words alone, so can school budgets. We cannot let talk replace real change. The AI bubble must not become our spending plan. The first rule is simple: talk does not equal transformation; improved learning outcomes do. The second is urgent: establish strict criteria before making large expenditures. If we get this wrong, we will trade valuable resources for headlines and later have to explain to parents why the results never materialized. The AI bubble is real, and schools must avoid inflating it.

The AI bubble intersects with the classroom

We need to rethink the discussion around incentives—markets reward mentions of AI. Schools might imitate this behavior, focusing on flashy announcements instead of steady progress. It's easy to find evidence of hype. FactSet shows a record number of AI references on earnings calls. The Financial Times and other sources report that many firms still struggle to clearly articulate the benefits in their filings, despite rising capital spending. At the same time, the demand for power in AI data centers is expected to more than double by 2030, with the IEA estimating global data-center electricity use to approach 945 TWh by the end of the decade. These are the real costs of pursuing uncertain benefits. When budgets tighten, education is often the first to cut long-term investments, such as teacher development and student support, in favor of short-term solutions. That is the bubble's logic. It rewards talk while postponing proof.

Figure 1: Post-ChatGPT, AI exposure and upbeat sentiment spike—especially in IT—while “risk” barely moves. Talk outruns evidence.

But schools are not standing still. In the United States, the number of districts training teachers to use AI nearly doubled in a year, from 23% to 48%. However, use among teachers is still uneven: only about one in four teachers reported using AI tools for planning or instruction in the 2023-24 school year. In the UK, the Department for Education acknowledges the potential of AI but warns that the evidence base is still developing and that adoption must ensure safety, reliability, and teacher support. UNESCO's global guidance offers a broader perspective: proceed cautiously, involve human judgment, protect privacy, and demand proof of effectiveness. This is the right posture for a bubble. Strengthen teacher capacity and establish clear boundaries before scaling up purchases. Do not let vendor presentations replace classroom trials. Do not invest in "AI alignment" if it doesn't align with your curriculum.

The macro signals send mixed messages. On the one hand, investors are pouring money into infrastructure, while the press speculates about a potential AI bubble bursting. On the other hand, careful studies report productivity gains under the right conditions. A significant field experiment involving 758 BCG consultants found that access to GPT-4 improved output and quality for tasks within the model's capabilities, but performance declined on tasks beyond its capabilities. MIT and other teams report faster, better writing on mid-level tasks; GitHub states that completion times with Copilot are 55% faster in controlled tests. Education must navigate both truths. Gains are real when tasks fit the tool and the training is robust. Serious risks arise if errors go unchecked or when the task is inappropriate. The bubble grows when we generalize from narrow successes to broad changes without verifying whether the tasks align with schoolwork.

From hype to hard metrics: measuring the AI bubble's learning ROI

The main policy mistake is treating AI like a trend rather than a learning tool. We should approach it the way we approach any educational resource. First, define the learning return on AI spend (LRAS) as the expected learning gains or verified teacher hours saved per euro, minus the costs of training and integration. Keep it straightforward. Imagine a district is considering a €30 monthly license per teacher. If the tool reliably saves three teacher hours each week and the loaded hourly cost is €25, the time savings alone amount to roughly €300 per teacher per month. This looks promising—if it's validated within your context, rather than based on vendor case studies. Measurement method: track time saved with basic time-motion logs and random spot checks; compare with student outcomes where relevant; and discount self-reports by 25% to account for optimism bias.
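
A worked version of the LRAS arithmetic is below. The €30 license, three hours saved per week, €25 loaded hourly cost, and 25% optimism discount come from the paragraph above; the four-week month and zero training cost are added assumptions to make the calculation run.

```python
# A worked LRAS calculation. The EUR 30 license, three hours saved per week, EUR 25
# loaded hourly cost, and 25% optimism discount come from the text; the four-week
# month and zero training cost are added assumptions to make the example run.

def lras_per_teacher(license_eur_month: float, self_reported_hours_saved_week: float,
                     loaded_hourly_cost_eur: float, optimism_discount: float = 0.25,
                     weeks_per_month: float = 4.0, training_eur_month: float = 0.0) -> float:
    """Verified value of time saved minus license and training cost, per teacher per month."""
    verified_hours = self_reported_hours_saved_week * (1 - optimism_discount) * weeks_per_month
    return verified_hours * loaded_hourly_cost_eur - license_eur_month - training_eur_month

net = lras_per_teacher(license_eur_month=30, self_reported_hours_saved_week=3,
                       loaded_hourly_cost_eur=25)
print(f"net value per teacher per month: EUR {net:.0f}")   # about EUR 195 after the discount
```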

This approach also applies to student learning. A growing body of literature suggests that well-structured AI tutors can enhance outcomes. Brookings highlights randomized studies showing that AI support led to doubled learning gains compared to strong classroom models; other trials indicate that large language model assistants help novice tutors provide better math instruction. However, the landscape is uneven. The BCG field experiment cautions that performance declines when tasks exceed the model's strengths. In a school context, utilize AI for drafting rubrics, generating diverse practice problems, and identifying misunderstandings; however, verify every aspect related to grading and core content. Require specific outcome measures for pilot programs—such as effect sizes on unit tests or reductions in regrade requests—and only scale up if the improvements are consistent across schools.

Now consider the system costs. Data centers consume power; power costs money. The IEA forecasts that global data-center electricity use could more than double by 2030, with AI driving much of this growth. Local impacts are significant. If your region faces energy limitations or rising costs, AI services might carry hidden "energy taxes" reflected in subscription fees. The Uptime Institute reports that operators are already encountering power limitations and rising costs due to AI demand. A district that commits to multi-year contracts during an excitement phase could lock in higher prices just as the market settles.

Finally, compare market signals with what's happening on the ground. FactSet indicates a record-high number of AI mentions; Goldman Sachs notes a limited direct profit impact so far; The Guardian raises questions about the dynamics of the bubble. In education, HolonIQ is tracking a decline in ed-tech venture funding in 2024, the lowest since 2014, despite an increase in AI discussions. This disparity illustrates a clear point. Talk is inexpensive; solid evidence is costly. If investment follows the loudest trends while schools chase the noisiest demos, we deepen the mistake. A better approach is to conduct narrow pilots, evaluate quickly, and scale carefully.

Figure 2: 2023 gains concentrate in early-exposed firms; GenAI explains ~40% of their returns—narrow breadth beneath the hype.

A better approach than riding the AI bubble

Prioritize outcomes in procurement. Use request-for-proposal templates that require vendors to clearly define the outcome they aim to achieve, specify the unit of measurement they will use, and outline the timeline they will follow. Implement a step-by-step rollout across schools: some classrooms utilize the tool while others serve as controls, then rotate. Keep the test short, transparent, and equitable. Insist that vendors provide raw, verifiable data and accept external evaluations. Consider dashboards as evidence only if they align with independently verified metrics. This is not red tape; it's protection against hype. UK policy experiments are shifting towards this approach, emphasizing a stronger evidence base and guidelines that prioritize safety and reliability. UNESCO's guidance is explicit: human-centered, rights-based, evidence-driven. Include that in every contract.
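
For the rotation idea, a toy schedule generator is sketched below: clusters begin as controls and switch to the tool in staggered periods, so every classroom eventually gets access while each period keeps a contemporaneous comparison group. Cluster names and the number of periods are arbitrary placeholders.

```python
# A toy stepped-wedge schedule: clusters start as controls and switch to the tool in
# staggered periods, so every classroom eventually gets access while each period keeps
# a contemporaneous comparison group. Cluster names and period count are arbitrary.

def stepped_wedge(clusters: list, periods: int) -> dict:
    """Map each cluster to its condition ('control' or 'tool') in each period."""
    schedule = {}
    for i, cluster in enumerate(clusters):
        switch_at = 1 + i * (periods - 1) // len(clusters)   # staggered crossover points
        schedule[cluster] = ["control" if p < switch_at else "tool" for p in range(periods)]
    return schedule

for cluster, plan in stepped_wedge(["School A", "School B", "School C", "School D"], periods=5).items():
    print(cluster, plan)
```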

Prepare teachers before expanding tool usage. RAND surveys indicate forward movement alongside gaps. Districts have doubled their training rates year over year, but teacher use remains uneven, and many schools lack clear policies. The solution is practical. Provide short, scenario-based workshops linked to essential routines, including planning, feedback, retrieval practice, and formative assessments. Connect each scenario to what AI excels at, what it struggles with, and what human intervention is necessary. Use insights from the BCG framework: workers performed best with coaching, guidelines, and explicit prompts. Include a "do not do this" list on the same page. Then align incentives. Acknowledge teams that achieve measurable improvements and simplify their templates for others to follow.

Address energy and privacy concerns from the outset. Require vendors to disclose their data retention practices, training usage, and model development; select options that allow for local or regional processing and provide clear procedures for data deletion. Include energy-related costs in your total cost of ownership, because the IEA and others anticipate surging demand for data centers, and operators are already reporting energy constraints. This risk might manifest as higher costs or service limitations. Procurement should factor this in. For schools with limited bandwidth or unreliable power, offline-first tools and edge computing can be more reliable than always-online chatbots. If a tool needs live connections and heavy computing, prepare fallback lessons in advance.

A steady transformation

Anticipate the main critique. Some may argue we're underestimating the potential benefits of AI and that it could enhance productivity growth across the economy. The OECD's 2024 analysis estimates AI could raise aggregate TFP by 0.25-0.6 percentage points a year in the coming years, with labor productivity gains being somewhat higher. This is not bubble talk; it represents real potential. Our response is not to slow down unnecessarily but to speed up in evaluating what works. When reliable evidence emerges—such as an AI assistant that consistently reduces grading time by a third without increasing errors, or a tutor that achieves a 0.2-0.3 effect size over a term—we should adopt it, support it, and protect the time it saves. We aim for acceleration, not stagnation.

A second critique suggests schools fall behind if they wait for perfect evidence. That is true, but it doesn't represent our proposal. The approach is to pilot, validate, and then expand. A four-week stepped-wedge trial doesn't indicate paralysis; it shows momentum while retaining lessons learned. It reveals where the frontier lies in our own context. The findings on the "jagged frontier" illustrate why this is crucial: outputs improve when tasks align with the tool, and fall short when they don't. The more quickly we identify what works for each subject and grade, the more rapidly we can expand successes and eliminate failures. This is how we prevent investing in speed without direction.

A third critique may assert that the market will resolve these issues. That is wishful thinking within a bubble. In public services, the costs of mistakes are shared, and the benefits are localized. If markets reward mentions of AI regardless of the outcome, schools must do the opposite: reward outcomes, regardless of how much they are discussed. Ed-tech funding trends have already decreased since the peak in 2021, even as conversations about AI grow louder. This discrepancy serves as a warning. Build capacity now. Train teachers now. Write contracts that pay only for measured improvements, and design audits that can verify them. The bubble may either burst or mature. In either case, schools that focus on outcomes will be fine. Those that do not will be left with bills and no visible gains.

Let's return to the initial number. Two hundred eighty-seven companies discussed AI in one quarter. Talk is effortless. Education requires genuine effort. The goal is to convert tools into time and time into learning. This means we must set high standards while keeping it straightforward: establish clear outcomes, conduct short trials, ensure accessible data, provide teacher training, and account for total costs, including energy and privacy considerations. We must align the jagged frontier with classroom tasks and resist broad claims. We need to build systems that develop slowly but scale quickly when proof arrives. The AI bubble invites us to purchase confidence. Our students need fundamental skills.

So, we change how we buy. We pay for results. We connect teachers with tools that demonstrate value. We do not block experimentation, but we are strict about what we keep. If the market prices words, schools must price evidence. The measure of our AI decisions will not be the number of mentions in reports or speeches. It will be the quiet improvement in a student's skills, the extra minutes a teacher gets back, and the budget lines that actually buy learning. Talk is not transformation. Let's make transformation the only thing we pay for.


The views expressed in this article are those of the author(s) and do not necessarily reflect the official position of the Swiss Institute of Artificial Intelligence (SIAI) or its affiliates.


References

Aldasoro, I., Doerr, S., Gambacorta, L., & Rees, D. (2024). The impact of artificial intelligence on output and inflation. Bank for International Settlements/ECB Research Network.
Boston Consulting Group (2024). GenAI increases productivity & expands capabilities. BCG Henderson Institute.
Business Insider (2025). Everybody's talking about AI, but Goldman Sachs says it's still not showing up in companies' bottom lines.
Carbon Brief (2025). AI: Five charts that put data-centre energy use and emissions into context.
FactSet (2025). Highest number of S&P 500 earnings calls citing "AI" over the past 10 years.
GitHub (2022/2024). Measuring the impact of GitHub Copilot.
Guardian (2025). Is the AI bubble about to burst – and send the stock market into freefall?
HolonIQ (2025). 2025 Global Education Outlook.
IEA (2025). Energy and AI: Energy demand from AI; AI is set to drive surging electricity demand from data centres.
Noy, S., & Zhang, W. (2023). Experimental evidence on the productivity effects of generative AI. Science (MIT Economics working paper).
OECD (2024). Miracle or myth? Assessing the macroeconomic productivity gains from AI.
RAND (2025). Uneven adoption of AI tools among U.S. teachers; More districts are training teachers on AI.
UK Department for Education (2025). Generative AI in education: Guidance.
UNESCO (2023, updated 2025). Guidance for generative AI in education and research.
Uptime Institute (2025). Global Data Center Survey 2025 (executive summaries and coverage).
Dell'Acqua, F., et al. (2023). Navigating the jagged technological frontier: Field experimental evidence… Harvard Business School / BCG (working paper).

Picture

Member for

1 year 2 months
Real name
Keith Lee
Bio
Keith Lee is a Professor of AI and Data Science at the Gordon School of Business, part of the Swiss Institute of Artificial Intelligence (SIAI), where he leads research and teaching on AI-driven finance and data science. He is also a Senior Research Fellow with the GIAI Council, advising on the institute’s global research and financial strategy, including initiatives in Asia and the Middle East.

AI Energy Efficiency in Education: The Policy Lever to Bend the Power Curve

AI Energy Efficiency in Education: The Policy Lever to Bend the Power Curve

Picture

Member for

1 year 1 month
Real name
Catherine Maguire
Bio
Catherine Maguire is a Professor of Computer Science and AI Systems at the Gordon School of Business, part of the Swiss Institute of Artificial Intelligence (SIAI). She specializes in machine learning infrastructure and applied data engineering, with a focus on bridging research and large-scale deployment of AI tools in financial and policy contexts. Based in the United States (with summer in Berlin and Zurich), she co-leads SIAI’s technical operations, overseeing the institute’s IT architecture and supporting its research-to-production pipeline for AI-driven finance.

Modified

AI energy use is rising, but efficiency per task is collapsing
Education improves outcomes by optimizing energy usage and focusing on small models
Do this, and costs and emissions fall while learning quality holds

The key figure in today's discussion about AI and the grid isn't a terawatt-hour forecast; it is 0.4 joules per token, a number that reframes AI energy efficiency in education. That is the energy cost NVIDIA now reports for advanced inference on its latest accelerator stack, and, by the company's own long-run data, it marks roughly a 100,000-fold efficiency improvement in large-model inference over the past decade. The number can mislead if read alone: total electricity demand from data centers is still expected to rise sharply in the United States, China, and Europe as AI scales. But it changes the perspective. If energy use per unit of practical AI work is falling, total demand is not fixed; it is something policy can shape. Education systems, which are major buyers of edtech, cloud services, and campus computing, can set rules and incentives that turn rapid efficiency gains into lower bills, lower emissions, and better learning outcomes. The choice is not between growth and restraint; it is between unmanaged growth and efficiency-first growth that also widens access.
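To make the headline unit concrete, the short conversion below restates 0.4 J/token in campus-friendly terms. It is a pure unit check before PUE or any other overhead is applied.

```python
# Unit check on the 0.4 J/token figure: what it implies per million tokens,
# before PUE or networking overheads. Figures are for intuition only.
JOULES_PER_KWH = 3.6e6

j_per_token = 0.4
kwh_per_million_tokens = j_per_token * 1e6 / JOULES_PER_KWH
print(f"{kwh_per_million_tokens:.2f} kWh per million tokens")  # ~0.11 kWh
```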

AI Energy Efficiency in Education: From More Power to Better Power

The common belief is that AI will strain grids and increase emissions. Critical analyses warn that, without changes to current policies, AI-driven electricity use could significantly increase global greenhouse gas emissions through 2030. Regional models forecast data-center electricity use rising by roughly 240 TWh in the United States, 175 TWh in China, and 45 TWh in Europe over 2024 levels by the end of the decade. These numbers are concerning and highlight the need for investment in generation, transmission, and storage. Yet the same assessments acknowledge considerable uncertainty, much of it tied to efficiency: how quickly compute per watt improves, how widely those improvements spread, and how much software and operational practice can cut energy per task. The risk is real, but the slope of the curve is not set in stone.

The technical case for a flatter curve is increasingly evident. Mixture-of-Experts (MoE) architectures now utilize only a small portion of parameters for each token, thereby decreasing the number of floating-point operations (flops) without compromising quality. A notable example processes tokens by activating approximately 37 billion out of 671 billion parameters, which significantly reduces computing needs per token, supported by distillation that transfers reasoning skills from larger models to smaller ones for everyday tasks. At the system level, techniques such as speculative decoding, KV-cache reuse, quantization to 4–8 bits, and improved batch scheduling all further reduce energy use per request. On the hardware front, the transition from previous GPU generations to Blackwell-class accelerators delivers significant speed gains while using far fewer joules per token. Internal benchmarks indicate substantial improvements in inference speed, accompanied by only moderate increases in total power.
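A rough back-of-the-envelope sketch shows why sparse activation matters. It uses the commonly cited approximation of about two floating-point operations per active parameter per generated token, which is a simplification rather than a benchmark, together with the 37-billion-of-671-billion figure cited above; the dense baseline is hypothetical.

```python
# Rough sketch: why sparse (MoE) activation cuts compute per token.
# Assumes the common ~2 FLOPs per active parameter per token approximation.

TOTAL_PARAMS = 671e9      # total parameters in the MoE example cited above
ACTIVE_PARAMS = 37e9      # parameters activated per token
DENSE_BASELINE = 671e9    # hypothetical dense model of the same size

def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs for one generated token."""
    return 2.0 * active_params

moe = flops_per_token(ACTIVE_PARAMS)
dense = flops_per_token(DENSE_BASELINE)

print(f"Active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")       # ~5.5%
print(f"Compute per token vs. dense baseline: {moe / dense:.1%}")   # ~5.5%
```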

Additionally, major cloud providers now report fleet-wide Power Usage Effectiveness (PUE) of nearly 1.1, meaning most of the overhead beyond the IT equipment itself, chiefly cooling and power conversion, has already been squeezed out. Taken together, this is an ongoing optimization pipeline, from algorithms to silicon to cooling, that keeps driving down energy per useful outcome. Policy can determine whether these savings are realized.

Figure 1: New GPUs cut energy per token by ~25×, turning efficiency into the main policy lever for campuses.

AI Energy Efficiency in Education: What It Means for Classrooms and Campuses

Education budgets are feeling the impact of AI, with expenses including cloud bills, device updates, and hidden costs such as latency and downtime. An efficiency-first approach can make these bills smaller and more predictable while widening access to AI support, feedback, and research tools. The first step is to establish procurement metrics that track energy per unit of learning value. Instead of simply purchasing "AI capacity," ministries and universities should buy on tokens per joule or joules per graded essay, with vendors required to disclose model details, precision, and routing strategies. When privacy and timing allow, default to smaller distilled models for routine tasks such as summaries, grammar checks, and rubric-aligned feedback, reserving larger models for the cases that need them. This does not compromise quality; it mirrors how MoE systems work internally. With effective routing, a campus can serve 80–90% of requests with smaller models and escalate only when necessary, cutting energy use sharply while preserving quality where it matters. A simple calculation using published energy figures for new accelerators shows that moving a billion-token daily workload from a 5 J/token baseline to 0.5 J/token, through distillation, quantization, and hardware upgrades, saves about 4.5 GJ, roughly 1.25 MWh, per day before PUE adjustments; the sketch below makes the arithmetic explicit. Even at an already efficient ~1.1 PUE, that is meaningful budget relief and a measurable cut in carbon emissions.

Figure 2: Moving from a typical 1.58 PUE to ~1.09 saves ~49 MWh for every 100 MWh of IT work—money and carbon you can bank.
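A minimal sketch of that arithmetic, assuming an illustrative one-billion-token daily workload and the J/token and PUE figures used above, looks like this.

```python
# Hedged sketch of the savings arithmetic above. The J/token figures and the
# one-billion-token workload are illustrative assumptions, not measurements.

JOULES_PER_KWH = 3.6e6

def daily_energy_kwh(tokens_per_day: float, joules_per_token: float,
                     pue: float = 1.0) -> float:
    """Facility energy (kWh/day) for an inference workload at a given PUE."""
    return tokens_per_day * joules_per_token * pue / JOULES_PER_KWH

tokens = 1e9                                          # assumed daily token volume
baseline = daily_energy_kwh(tokens, 5.0, pue=1.1)     # legacy stack
optimized = daily_energy_kwh(tokens, 0.5, pue=1.1)    # distilled + quantized + new hardware

print(f"Baseline:  {baseline:,.0f} kWh/day")
print(f"Optimized: {optimized:,.0f} kWh/day")
print(f"Savings:   {baseline - optimized:,.0f} kWh/day")  # ~1,375 kWh/day at PUE 1.1
```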

The second step is to build workload management into edtech implementation guides. Many uses of generative AI in education run asynchronously, such as grading batches, generating prompts, and cleaning datasets, so grouping these tasks and scheduling them off-peak can cut load without affecting users. Retrieval-augmented generation (RAG) trims token counts by supplying relevant snippets rather than asking models to reconstruct context from scratch. Speculative decoding lets a lighter model draft tokens that a heavier model verifies, boosting throughput and lowering energy per output. Caching avoids resending the same system prompts and instructions across different groups. None of this requires the newest models; it requires contracts that demand efficiency. Partnering with cloud providers that post best-in-class PUE, and running on-prem servers only when necessary, turns technical efficiency into policy efficiency: lower total energy for the same or better learning outcomes.
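As a sketch of the scheduling idea, the snippet below defers deferrable jobs to an assumed off-peak window; the job names and the window are hypothetical, and a real scheduler would also respect deadlines.

```python
# Minimal sketch of off-peak batching for asynchronous education workloads
# (grading batches, dataset cleaning). Window and job names are assumptions.

from dataclasses import dataclass
from datetime import datetime, time

OFF_PEAK_START, OFF_PEAK_END = time(22, 0), time(6, 0)  # assumed cheap/clean window

@dataclass
class Job:
    name: str
    deferrable: bool   # can it wait for the off-peak window?

def run_now(job: Job) -> None:
    print(f"running {job.name} immediately")

def queue_for_off_peak(job: Job) -> None:
    print(f"queuing {job.name} for the {OFF_PEAK_START}-{OFF_PEAK_END} window")

def schedule(job: Job, now: datetime) -> None:
    # The window spans midnight, so "in window" means after 22:00 or before 06:00.
    in_window = now.time() >= OFF_PEAK_START or now.time() <= OFF_PEAK_END
    if job.deferrable and not in_window:
        queue_for_off_peak(job)
    else:
        run_now(job)

schedule(Job("nightly essay-feedback batch", deferrable=True), datetime.now())
schedule(Job("live tutoring session", deferrable=False), datetime.now())
```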

Bending the curve, not the mission

Critics may raise the rebound effect: if we cut the energy required for an AI query by a factor of ten, won't usage rise tenfold and negate the savings? Sometimes, yes. But rebound is not a law of nature, especially when buyers enforce limits. Public education can set budget-based guidelines, such as per-student token caps tied to educational objectives, and tiered model routing that escalates only when the value demands it. Just as printing evolved from unmanaged to managed queues, AI requests can run under quality-of-service rules that prioritize efficiency and reliability. The aggregate forecasts that trouble us most assume current practices persist; change the practices and the estimates change. And when usage rises for legitimate reasons, such as broader access and better instruction, efficiency ensures the extra energy per unit of learning is lower than it would have been, which is what responsible scaling looks like.
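A minimal illustration of budget-based routing might look like the following; the model tiers, the per-student cap, and the escalation rule are assumptions for the sketch, not recommended values.

```python
# Sketch of a budget-aware, tiered router of the kind described above.
# Model names, the daily cap, and the escalation rule are illustrative.

from collections import defaultdict

DAILY_TOKEN_BUDGET = 20_000          # assumed per-student cap
usage = defaultdict(int)             # tokens used per student today

def route(student_id: str, estimated_tokens: int, needs_reasoning: bool) -> str:
    """Return the model tier for a request, enforcing the per-student budget."""
    if usage[student_id] + estimated_tokens > DAILY_TOKEN_BUDGET:
        return "rejected: daily budget reached, try again tomorrow"
    usage[student_id] += estimated_tokens
    # Escalate only when the task genuinely needs it; default to the small model.
    return "large-reasoning-model" if needs_reasoning else "small-distilled-model"

print(route("s-001", 300, needs_reasoning=False))    # grammar check -> small model
print(route("s-001", 2_500, needs_reasoning=True))   # multi-step feedback -> large model
```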

Another critique is that claims of efficiency are overstated. It's smart to question the numbers. However, various independent assessments point in the same direction. Vendor reports reveal significant improvements in joules per token for recent accelerators, and third-party evaluations analyze these speed claims, showing that while overall rack power might increase, the work done per unit of energy rises at a much faster rate. Additionally, peer-reviewed methods are emerging to measure model performance in terms of energy and water use across various deployments. Even if any single claim is overly optimistic, the trend is clear, and different vendors can replicate the combination of architectural efficiency, distillation, and hardware co-design. For education leaders, the best approach is not disbelief; it's conditional acceptance: create procurement policies that reward demonstrated efficiency and penalize unclear energy use.

A third concern is infrastructure; schools in many areas face rising tariffs and overloaded grids. That's precisely why workload placement is crucial. Keep privacy-sensitive or time-critical tasks on energy-efficient local devices whenever possible; send batch tasks to cloud regions with cleaner grids and better cooling systems. Require vendors to disclose region-specific emission metrics and give buyers choices. Where a national cloud or academic network is available, education ministries can negotiate sector-wide rates and efficiency commitments, including plans for carbon-intensity disclosures per thousand tokens. This isn't unnecessary bureaucracy; it's modern IT management for a resource that is limited and costly.
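The placement logic can be simple. The sketch below picks a region by disclosed carbon intensity for batchable work and keeps sensitive or time-critical work local; the region names and gCO2e figures are placeholders, not vendor data.

```python
# Toy sketch of carbon-aware workload placement. Regions and intensity figures
# are placeholders; real values would come from vendor disclosures.

REGIONS = {
    "region-hydro-north": 30,    # assumed gCO2e per kWh
    "region-mixed-grid": 280,
    "region-coal-heavy": 620,
}

def pick_region(batchable: bool) -> str:
    """Send batch work to the cleanest disclosed grid; keep the rest local."""
    if not batchable:
        return "on-prem-or-nearest-region"   # privacy- or latency-sensitive tasks
    return min(REGIONS, key=REGIONS.get)

print(pick_region(batchable=True))    # cleanest disclosed grid
print(pick_region(batchable=False))   # stays local
```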

Some may wonder if high-profile efficiency cases, such as affordable, effective chatbots, are exceptions. They are indications of what's possible. A notable case achieves competitive performance at a fraction of the compute cost of comparable conventional models by leveraging routing efficiency, targeted distillation, and hardware-aware training. Independent industry analysis credits its efficiency not to miraculous data but to solid engineering. As these techniques become more widespread, they redefine the efficient frontier for inference costs relevant to education—such as translation, formative feedback, concept checks, and code explanations—where smaller and mid-sized models already perform well if appropriately designed and fine-tuned on carefully chosen data. The policy opportunity is to connect contracts to that frontier so that savings are passed through.

Lastly, there is the challenge posed by climate change. Predictions of AI-related emissions growth are not mere scare tactics; they serve as alerts about a future without discipline in efficiency. If we take no action, power consumption by data centers will continue to rise into the 2030s, and some areas will revert to higher carbon generation to meet peak demands. If we do take action—by establishing efficiency metrics, timing workloads intelligently, and relocating computing resources wisely—education can seize the benefits of AI while reducing the energy required for each learning gain. This isn't just a financial story; it's a matter of credibility for the sector. Students and families will notice whether schools truly embody the sustainability principles they teach.

So, what should leaders do right now? First, revise requests for proposals (RFPs) to make energy per outcome a key award criterion, with clear measurement plans and third-party audit rights. Second, default to small distilled or MoE-routed models for routine tasks, escalate to larger models only under explicit policies, and manage prompts and caches so the same work is not recomputed. Third, partner with providers that sustain a PUE close to 1.1 and publish joules-per-token roadmaps, and insist on region-specific carbon-intensity disclosures for hosted workloads. Fourth, build internal capability: a small "AI systems" team that tunes routing, batch jobs, and RAG pipelines is worth far more than another generic SaaS license. Fifth, educate: help faculty and students understand why efficiency is access and how choices about prompts, model selection, and timing shape energy use. This is how education turns AI from a flashy pilot into lasting infrastructure.
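One way to operationalize the first item is to score vendors on an energy-per-outcome metric such as joules per graded essay. The sketch below compares two hypothetical vendors; the token counts, J/token figures, and PUE values are assumptions for illustration.

```python
# Sketch of the "energy per outcome" award criterion suggested above,
# e.g. joules per graded essay. Vendor figures are hypothetical placeholders.

def joules_per_outcome(tokens_per_outcome: float, joules_per_token: float,
                       pue: float) -> float:
    """Facility energy consumed per unit of learning value (e.g., one graded essay)."""
    return tokens_per_outcome * joules_per_token * pue

vendors = {
    "vendor_a": joules_per_outcome(4_000, 0.5, 1.09),   # small routed model, efficient host
    "vendor_b": joules_per_outcome(4_000, 5.0, 1.58),   # large default model, average host
}

for name, joules in sorted(vendors.items(), key=lambda kv: kv[1]):
    print(f"{name}: {joules:,.0f} J per graded essay")
```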

The final test is straightforward. If, two years from now, your campus is using significantly more AI but spending less per student on energy and emitting less per graduate, you will have bent the curve. The technology already provides the tools: a sub-joule token in the data center, a distilled model on the device, and an MoE gate that only processes what's necessary. The policy work involves placing the fulcrum correctly—through contracts, metrics, and operations—and then applying pressure. The conversation about whether AI inevitably requires ever more energy will continue, but in education, we don't need inevitability; we need results. The key figure to monitor is not solely terawatt-hours but the energy per learning gain. With focus, that number can continue to decrease even as access increases. That's the future we should strive for.


The views expressed in this article are those of the author(s) and do not necessarily reflect the official position of the Swiss Institute of Artificial Intelligence (SIAI) or its affiliates.


References

Bain & Company. (2025). DeepSeek: A Game Changer in AI Efficiency? (MoE routing and distillation details).
Brookings Institution. (2025, Aug. 12). Why AI demand for energy will continue to increase. (Context on the drivers of rising aggregate demand and unit-efficiency trends).
Google. (2024, July). 2024 Environmental Report; Power Usage Effectiveness (PUE) methodology page (fleet-wide PUE ~1.09).
International Energy Agency. (2025, Apr. 10). Energy and AI; Energy demand from AI (regional projections to 2030 for data-center electricity use).
International Monetary Fund. (2025, May 13). AI Needs More Abundant Power Supplies to Keep Driving Economic Growth (emissions implications under current policies).
NVIDIA. (2025, Jun. 11). Sustainability Report, FY2025 (long-run efficiency trend; ~0.4 J/token reference).
Zilliz/Milvus. (2025). How does DeepSeek achieve high performance with lower computational costs? (architecture and training optimizations that generalize).
Zhou, Z. et al. (2025, May 14). How Hungry is AI? Benchmarking Energy, Water, and Environmental Footprints of LLM Inference (infrastructure-aware benchmarking methods).
