Redesigning Education Beyond Procedure in the Age of AI

By David O'Neill

David O’Neill is a Professor of Finance and Data Analytics at the Gordon School of Business, SIAI. A Swiss-based researcher, his work explores the intersection of quantitative finance, AI, and educational innovation, particularly in designing executive-level curricula for AI-driven investment strategy. In addition to teaching, he manages the operational and financial oversight of SIAI’s education programs in Europe, contributing to the institute’s broader initiatives in hedge fund research and emerging market financial systems.

AI excels on known paths, so schools must shift beyond procedure
Assessments should reward framing and defense under uncertainty
This prepares students for judgment in an AI-driven world

Every era has its pivotal moment. Ours came when an AI system earned a gold medal at the world's toughest student math contest. It solved five out of six problems in the International Mathematical Olympiad within four-and-a-half hours, producing clear solutions that official coordinators graded for a total of 35 points. This accomplishment highlights not only the power of computation but also a key shift in our educational systems. When the route to a solution is clear—when methods are established, tactics are defined, and proofs can be systematically searched—machines aren't just helpful; they outperform humans. They work tirelessly within established techniques. If education continues to prioritize mastering these techniques as the highest achievement, we risk judging students on how well they imitate a machine. The correct response is not to reject the machine but to change the focus of the contest. We should teach for the unknown path by encouraging problem finding, framing models, auditing assumptions, transferring knowledge across domains, and crafting arguments under uncertainty. This shift is not just practical; it is urgent.

The Known Path Is Now a Conveyor Belt

Across various fields, evidence is coming together. When a task's route is clear and documented, AI systems speed up processes and reduce variability. In controlled experiments, developers using an AI pair programmer completed standard coding tasks about 56 percent faster. This result has been replicated in further studies. Customer service agents using a conversational assistant were 14 percent more productive, with greater gains for novices—significant for education systems that want to help lower performers. These findings reflect what happens when patterns are recognizable and "next steps" can be predicted from a rich history of similar tasks. In mathematics, neuro-symbolic systems have reached a significant milestone. AlphaGeometry solved 25 of 30 Olympiad-level geometry problems within standard time limits, nearing the performance of a human gold medalist, and newer systems have formalized proofs more extensively. This is what we should expect when problem areas can be navigated through guided searches of known tactics, supported by larger datasets of worked examples and verified arguments.

The policy issue is straightforward: if education continues to allocate most time and grades to known-path performance—even in prestigious courses—students will understandably look for tools to accelerate their success along these paths. Recent international assessments highlight the consequences of poorly structured digital time. In PISA 2022, students spending up to one hour per day on digital devices for leisure scored 49 points higher in mathematics than those using them for five to seven hours; push leisure screen time much beyond an hour and performance declines sharply. The takeaway is not to ban devices but to stop making schoolwork a device-driven search for known steps. The broader labor market context supports this point. Mid-range estimates suggest that by 2030, about a third of current work hours could be automated or accelerated, impacting routine knowledge tasks—exactly the procedural skills that many assessments still prioritize. Education systems that focus on the conveyor belt will inadvertently grade AI usage instead of human understanding.

Method note: To make the stakes concrete, consider a 15-week algebra course with four hours of homework each week. If 60 percent of graded work is procedural and AI tools cut completion time for those tasks by a conservative 40 percent, students save roughly an hour per week—about 14 to 15 hours per term. The question is what to do with those hours: spend them on more procedures, or on the skills machines still struggle to automate—defining a problem, managing uncertainty, building a model worth solving, and justifying an answer to an audience that challenges it.
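For readers who want to audit or adapt the arithmetic, here is a minimal back-of-envelope sketch in Python, using only the parameters stated in the method note:

```python
# Back-of-envelope check of the method note above; parameters are the ones
# stated in the text and can be adjusted for other courses.
WEEKS_PER_TERM = 15          # length of the course
HOMEWORK_HOURS_PER_WEEK = 4  # graded homework load
PROCEDURAL_SHARE = 0.60      # fraction of graded work that is procedural
AI_TIME_REDUCTION = 0.40     # conservative speed-up on procedural tasks

hours_saved_per_week = HOMEWORK_HOURS_PER_WEEK * PROCEDURAL_SHARE * AI_TIME_REDUCTION
hours_saved_per_term = hours_saved_per_week * WEEKS_PER_TERM

print(f"Saved per week: {hours_saved_per_week:.2f} h")  # ~0.96 h
print(f"Saved per term: {hours_saved_per_term:.1f} h")  # ~14.4 h
```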

Figure 1: AI boosts routine knowledge work by large margins, with outsized benefits for novices.

Designing for Unknown Paths

What does it mean to teach for the unknown path while maintaining fluency? The initial step is curricular: treat procedures as basic requirements and dedicate class time to deeper concepts. This means shifting from "Can you find the derivative?" to "Which function class models this situation appropriately for the decision at hand, and what error margin ensures robustness?" From "Solve for x" to "Is x even the right variable to consider?" From "Show that the claim is valid" to "If the claim is false, how would we know as early and cheaply as possible?" The aim is not to confuse students with endless questions for the sake of it, but to structure exercises around selecting, defining, and defending approaches under uncertainty—skills that remain distinctly human, even as computations become quicker.

There are practical ways to achieve this. Give students the answer and ask them to create the question—an exercise that demands attention to assumptions and limitations. Require an "assumption log" and a one-page "model card" with any multi-step solution. This document should outline what was held constant, what data were acceptable, what alternatives were explored, and the principal risks of error. Use AI as a baseline: let the model produce a solid procedural solution, then grade students on what they do with it—challenging it, generalizing it, or explaining when it might lead them astray. Anchor practice in real inquiries ("What influenced attendance last quarter?") rather than just symbolic manipulation. When students need to make approximations, they should justify the approach they selected and show how their conclusions change as the estimation criteria relax. These are not optional extras; they are habits we can teach and assess.

Research shows that well-designed digital tools can enhance learning when they focus on practice and feedback instead of replacing critical thinking. A significant evaluation linked to Khan Academy found positive learning outcomes when teachers dedicated class time to systematic practice on core skills, while quasi-experimental studies reported improvements at scale. Early tests of AI tutors show higher engagement and faster mastery compared to lecture-based controls, particularly among students who start behind—again, a significant equity issue. The key point is not that an AI tutor teaches judgment. Instead, it's that thorough, data-informed practice frees up teacher time and student energy for developing judgment if—and only if—courses are designed to climb that ladder.

What Changes on Monday Morning

The most crucial factor is assessment. Institutions should immediately transform some summative tasks into "open-tool, closed-path" formats. Students may use approved AI systems for procedural tasks, but they will be scored on the decisions made before those steps and the critiques that follow. Provide the machine's answer and assess the student's follow-up: where it strayed, what different model might change the outcome, which signals mattered most, and which assumption, if incorrect, would compromise the result. Require verbal defenses with non-leading questions—five minutes per student in small groups or randomized presentations in larger classes—to ensure that written work reflects genuine understanding. And because leisure device time correlates with significant performance drops, schedule "deep work" sessions where phones are set aside and tools are chosen intentionally, not reflexively.

For administrators, the budgeting focus should be on teacher time. If AI speeds up routine feedback and grading, those hours should be reclaimed for engaging, studio-style seminars where students present and defend their modeling choices. Pilot programs could establish a set target—such as two hours of teacher contact time per week shifted from grading procedures to facilitating discussions—and then monitor the results. Early signs in the workplace suggest that these gains are real. Surveys indicate that AI tools are quickly being adopted across functions, and studies show that saved time often gets redirected into new tasks instead of being wasted. However, redeployment is not automatic. Institutions should integrate this into their schedules: smaller discussion groups, rotating "devil's advocate" roles, and embedded writing support focused on evidence and reasoning rather than just grammar.

Policymakers have the most work to do on incentives. Accountability systems need to stop awarding most points for de-contextualized procedures and instead assess students' abilities to diagnose, design, and defend. One option is to introduce moderated "reasoning audits" in high-stakes exams: a brief, scenario-based segment that provides a complex situation and asks candidates to create a justified plan instead of completing a finished calculation. Another approach is to fund statewide assessment banks of open-context prompts with scoring guidelines that reward managing uncertainty—explicitly identifying good practices such as bounding, triangulating data sources, and articulating a workable model. Procurement can also help: require that any licensed AI system record process data (queries, revisions, model versions) to support clear academic conduct policies rather than rigid bans. Meanwhile, invest in teacher professional learning focused on a few solid routines: defining a problem, conducting structured estimations, drafting a sensitivity analysis, and defending assumptions with a brief presentation. These skills are transferable across subjects; they are essential for mastering the unknown path.
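To illustrate what such process data could look like in practice, here is a hypothetical record format; the field names and example values are illustrative assumptions, not a requirement of any regulation or vendor:

```python
# Illustrative schema for the process data a licensed AI system could log.
# Field names are hypothetical; actual requirements would come from local policy.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AIProcessRecord:
    student_id: str          # pseudonymous identifier, not a real name
    assignment_id: str
    model_version: str       # model/version string reported by the vendor
    query: str               # prompt sent by the student
    response_excerpt: str    # stored excerpt of the model output
    revision_number: int     # how many times the student revised after this query
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

record = AIProcessRecord(
    student_id="stu-4821",
    assignment_id="alg2-week07-modeling",
    model_version="vendor-model-2025-09",
    query="Check my factoring of x^2 - 5x + 6",
    response_excerpt="(x - 2)(x - 3)",
    revision_number=2,
)
print(record)
```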

Finally, we should be honest about student behavior. The use of AI for schoolwork among U.S. teens increased from 13 percent to 26 percent between 2023 and 2025, with this trend crossing demographic boundaries. Universities report similar patterns, with most students viewing AI as a study aid for explanation and practice. Ignoring these tools only drives their use underground and misses a chance to teach how to use them wisely. A simple institutional policy can align incentives: allow the use of AI for known steps if documented, and ensure that grades are mainly based on what students do that the model cannot—defining tasks, justifying approaches, challenging outputs, and communicating effectively with a questioning audience.

Figure 2: Adoption of AI as a study partner doubled in two years, reshaping how learning time is spent.

We anticipate familiar critiques. First, won't shifting focus away from procedures weaken fluency? Evidence from tutoring and structured practice suggests otherwise—students improve more when routine practice is disciplined, clear, and linked to feedback, especially if time is shifted toward transfer. Second, won't assessing open-ended tasks be too subjective? Not if rubrics clearly define the criteria for earning points—stating assumptions, bounding quantities, testing sensitivity, and anticipating counterarguments. Third, isn't relying on AI risky because models can make mistakes? That's precisely the goal of teaching the unknown path. We include the model where it belongs and train students to identify and recover from its errors. Fourth, doesn't this favor students who are already privileged? In fact, the opposite can hold. Research shows that the most significant productivity gains from AI assistance tend to benefit less-experienced users. If we create tasks that value framing and explanation, we maximize class time where these students can achieve the most significant growth.

We also need to focus on attention. The PISA digital-use gradient serves as a reminder that time on screens isn't the issue; intent matters. Schools should adopt a policy of openly declaring tool usage. Before any assessments or practice sessions, students should identify which tools they plan to use and for which steps. Afterwards, they should reflect on how the tool helped or misled them. This approach safeguards attention by turning tool selection into a deliberate choice rather than a habit. It also creates valuable metacognitive insights that teachers can guide. Together with planned, phone-free sessions and shorter, more intense tasks, this is how we can make classrooms places for deep thinking, not just searching.

The broader strategic view is not anti-technology; it advocates for judgment. Open-source theorem provers and advanced reasoning models are improving rapidly, resetting the bar for procedural performance. State-of-the-art systems are achieving leading results across competitive coding and multi-modal reasoning tasks. If we keep treating known-path performance as the pinnacle of educational success, we will find the ground shifting beneath us.

We started with a gold-medal proof counted in points. Now, let's look at a different measure: the minutes spent on judgment each week, for each student. A system that uses most of its time on tasks a machine does better will drain attention, lead to compliance games, and widen the gap between credentials on paper and actual readiness. A system that shifts those minutes to framing and defense will produce a different graduate. This graduate can select the right problem to tackle, gather the tools to address it, and handle an informed cross-examination. The way forward isn't about trying to outsmart the machine; instead, it's about creating pathways where none exist and teaching students to build them. This is our call to action. Rewrite rubrics to emphasize transfer and explanation. Free up teacher time for argument. Require assumption logs and sensitivity analyses. Make tool use clear and intentional. If we do this, the next time a model achieves a perfect score, we will celebrate and then pose the human questions that only our students can answer.


The views expressed in this article are those of the author(s) and do not necessarily reflect the official position of the Swiss Institute of Artificial Intelligence (SIAI) or its affiliates.


References

Brynjolfsson, E., Li, D., & Raymond, L. (2023). Generative AI at work (NBER Working Paper No. 31161). National Bureau of Economic Research.
Castelvecchi, D. (2025). DeepMind and OpenAI models solve maths problems at level of top students. Nature News.
DeepMind. (2025, July 21). Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad.
Education Next. (2024, December 3). AI tutors: Hype or hope for education?
Khan Academy. (2024, May 29). University of Toronto randomized controlled trial demonstrates a positive effect of Khan Academy on student learning.
McKinsey Global Institute. (2023, June 14). The economic potential of generative AI: The next productivity frontier.
McKinsey. (2025, January 28). Superagency in the workplace: Empowering people to unlock AI's full potential at work.
OECD. (2023, December). PISA 2022: Insights and interpretations.
OECD. (2024, May). Students, digital devices and success: Results from PISA 2022.
OpenAI. (2025, April 16). Introducing o3 and o4-mini.
Peng, S., Kalliamvakou, E., Cihon, P., & Demirer, M. (2023). The impact of GitHub Copilot on developer productivity (arXiv:2302.06590).
Pew Research Center. (2025, January 15). About a quarter of U.S. teens have used ChatGPT for schoolwork, double the share in 2023.
Trinh, T. H. et al. (2024). Solving Olympiad geometry without human demonstrations. Nature, 627, 768–774.
The Guardian. (2025, September 14). How to use ChatGPT at university without cheating: "Now it's more like a study partner."


Build the Neutral Spine of AI: Why Europe Must Stand Up Its Own Stack

By Catherine Maguire

Catherine Maguire is a Professor of Computer Science and AI Systems at the Gordon School of Business, part of the Swiss Institute of Artificial Intelligence (SIAI). She specializes in machine learning infrastructure and applied data engineering, with a focus on bridging research and large-scale deployment of AI tools in financial and policy contexts. Based in the United States (with summers in Berlin and Zurich), she co-leads SIAI’s technical operations, overseeing the institute’s IT architecture and supporting its research-to-production pipeline for AI-driven finance.

Europe’s schools rely on foreign AI infrastructure, creating vulnerability
A neutral European stack with local compute and governance can secure continuity
This ensures resilient, interoperable education under global tensions

The most critical number in European AI right now is fifteen. That is the share of the continent's cloud market held by European providers in 2024; the remaining eighty-five percent is controlled by three U.S. companies. Pair that dependency with another figure: NVIDIA shipped roughly ninety-eight percent of all data-center AI GPUs in 2023. This reveals an uncomfortable truth: Europe's schools, universities, and public labs depend on foreign hardware and platforms that underpin every lesson plan, grant proposal, and research workflow we care about. As U.S.–China technology tensions grow and export controls expand, this isn't just a pricing issue; it's a problem of reliability, access, and values for education systems that cannot pause when the rules change. Europe has the necessary pieces to change this situation—laws, computing infrastructure, research, and governance—but it must assemble them into a coherent, education-friendly "third stack" that can withstand geopolitical shifts and is designed to work together.

From Consumer to Producer: Why Sovereignty Now Means Interoperability

The case for a European stack isn't about isolation. It is a response to two converging trends: stricter export controls around advanced chips and new AI regulations in Europe with hard, upcoming deadlines. The United States tightened access to advanced accelerators and supercomputing items with rules in October 2022 and October 2023. Clarifications in 2024 and updates in 2025 expand the rules and introduce restrictions on AI model weights. Meanwhile, the EU's AI Act entered into force on August 1, 2024, with prohibited practices and AI-literacy obligations applying from February 2025, general-purpose model obligations from August 2025 (with a systemic-risk threshold at 10^25 training FLOPs), and high-risk system rules essentially landing by August 2026. To be clear, restrictions at the hardware level and requirements at the application level are arriving simultaneously, and education must not get caught in between. A European stack ensures that classrooms and labs have lawful, reliable access to compute and models, even as geopolitical circumstances change.

There's also a compatibility risk that rarely shows up in procurement documents. If two AI superpowers develop competing, non-compatible ecosystems—different chips, interconnects, tools, and model documentation norms—then educational content, evaluation pipelines, and safety tools won't move smoothly across borders. Analysts now view AI as a matter of national security, with incentives for separation being high, especially as "bloc economies" reappear. Education is downstream of this competition, relying on standard file formats, portable model cards, and shared inference runtimes. The smarter approach is not to pick a side but to build a neutral spine that can translate between them and keep European teaching and research ongoing under pressure.

Figure 1: Europe’s education and research sectors rely on foreign AI infrastructure for the vast majority of their cloud and GPU needs, leaving classrooms exposed to supply or policy shocks.

Crucially, Europe is not starting from scratch. The EuroHPC program has created world-class systems, led by JUPITER—Europe's first exascale supercomputer—launched in September 2025 in Jülich, and Switzerland's "Alps" system at CSCS, launched in 2024. These platforms aren't just valuable for their processing power; they allow governments and universities to test procurement, scheduling, and governance on a continental scale. They demonstrate that Europe can establish computing resources on its soil, under European law, while still using top-tier components.

A neutral European stack must, therefore, focus on four practical ideas. First, portability: standardize open exchange formats and evaluation artifacts so models trained or refined on European hardware can run on multiple platforms without months of additional work. Second, governed access: compute should be allocated through institutions and grant programs already in use by the education sector, with visible queue service-level agreements and commitments so coursework runs when needed. Third, energy awareness: data-center power demand in Europe is set to rise significantly this decade, with several forecasts suggesting a doubling or near-tripling by 2030; education workloads should prioritize efficient open-weight models, shared fine-tunes, and "green windows" that align with renewable energy surpluses. Finally, transparent data governance: a Gaia-X style labeling system can show whether a tool keeps student data in-region, supports audits by data protection officers, and manages copyrighted material appropriately. These are not ideological choices; they are protective measures against disruption.
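To make the first idea—portability—concrete, here is a minimal sketch that exports a toy model to ONNX, one example of an open exchange format, and reloads it through a generic runtime. The toy model, file name, and choice of packages (torch, onnxruntime) are illustrative assumptions, not a recommendation of any particular vendor or format:

```python
# Minimal portability sketch: export a toy PyTorch model to ONNX so the same
# artifact can be served by any runtime that speaks the open format.
# Assumes torch and onnxruntime are installed; the model is a placeholder.
import torch
import torch.nn as nn
import onnxruntime as ort

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2)).eval()
example_input = torch.randn(1, 8)

torch.onnx.export(
    model, example_input, "classroom_model.onnx",
    input_names=["features"], output_names=["scores"],
    dynamic_axes={"features": {0: "batch"}},
)

# The exported file can now be loaded on a different platform or cloud.
session = ort.InferenceSession("classroom_model.onnx")
scores = session.run(None, {"features": example_input.numpy()})[0]
print(scores.shape)  # (1, 2)
```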

What Schools, Ministries, and Regulators Should Do Next

Start with the procurement process. Treat model providers and cloud capacity like essential services, not optional apps. This means securing major contracts with at least one European provider for each layer: EU-based cloud for storage and inference, EuroHPC resources for training and research, and a mix of European model vendors—both open and proprietary—so curricula and research aren't reliant solely on foreign licenses. This is achievable: European model companies have scale and momentum. In September 2025, Mistral raised €1.7 billion in a Series C led by ASML, leading to a valuation of about €11.7 billion. Germany's Aleph Alpha closed a Series B exceeding $500 million in late 2023. A resilient education stack should provide ways to use and examine such models in line with European law and practice.

Next, set clear, immediate goals that convert strategy into reality in the classroom. Over the next twelve months, every teacher-education program and university department with a digital curriculum should have access to and training on an approved set of models and data services hosted in Europe, specifically for educational use. The goal is not slide-deck familiarity; it is instructor-designed assignments that actually run on the designated services at predictable cost. Where reliable data is lacking, we can create transparent estimates: if a national system wishes to offer 100,000 students a modest set of 300 classroom inference tasks over a semester at about 10^11 FLOPs per task for efficient models, the total would be around 3×10^18 FLOPs—comfortably supported by a mix of local inference on open weights and scheduled inference on shared clusters. The point of detailing such calculations is to highlight that most educational use doesn't require extreme-scale training runs that trigger "systemic risk" thresholds in EU law; it requires reliable, documented, mid-sized capacity.
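A transparent version of that estimate, with the AI Act's systemic-risk threshold included for comparison (all figures are the rough ones stated in the paragraph above):

```python
# Semester-scale inference budget versus the AI Act's systemic-risk presumption
# (10^25 FLOPs of cumulative *training* compute). Figures are rough planning numbers.
STUDENTS = 100_000
TASKS_PER_STUDENT = 300          # classroom inference tasks per semester
FLOPS_PER_TASK = 1e11            # rough figure for an efficient model

SYSTEMIC_RISK_THRESHOLD = 1e25

total_inference_flops = STUDENTS * TASKS_PER_STUDENT * FLOPS_PER_TASK
print(f"Semester inference budget: {total_inference_flops:.1e} FLOPs")        # 3.0e+18
print(f"Factor below systemic-risk threshold: "
      f"{SYSTEMIC_RISK_THRESHOLD / total_inference_flops:.0e}")               # ~3e+06
```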

Next, invest in the translation layers. The fastest way to lose a year is to find that your curriculum relies on a model or SDK that your provider cannot legally support. National ed-tech teams should maintain a "portability panel" comprising engineers and teachers tasked with ensuring that lesson-critical models and datasets are convertible across platforms and clouds, with model cards and evaluation tools stored under European oversight. This concern is not abstract; when one vendor dominates the accelerator market and the leading clouds run proprietary systems, a licensing change can disrupt classrooms overnight. The more Europe insists on portable inference and well-documented build processes, the more resilient its teaching and research will be.

Regulators can help close the loop. The AI Act's phased rules are not merely a compliance burden; they outline a product roadmap for an education-friendly stack. The requirements for general-purpose models—technical documentation, cybersecurity, testing for adversarial issues, serious-incident reporting, and summaries of training data—reflect what schools and universities should demand anyway. Oversight bodies can expedite alignment by offering sector-specific guidance and by funding "living compliance" sandboxes, where universities can test documentation, watermarking, and red-teaming practices using EuroHPC resources. The Commission's decision not to pause the Act—and its guidance schedule for models with systemic risk—offers helpful certainty for planning syllabi and budgets for two academic years.

The Neutral Spine We Build Now

Energy management and site selection will determine whether an education-friendly stack becomes a public benefit or remains just a concept. Global data-center electricity demand is expected to more than double by 2030, with the IEA's baseline scenario estimating about 945 TWh—around the current energy usage of Japan—with a significant portion of the growth occurring in the U.S., China, and Europe. Within Europe, studies project demand could rise from about 96 TWh in 2024 to about 168 TWh by 2030. Several analyses predict a near-tripling in specific Northern markets, leading to over 150 TWh of consumption across the continent by the end of the decade. An education-first policy response is clear: prioritize efficient, European-hosted models for daily classroom use, create "green windows" for training and extensive inference that match renewable energy surpluses, and require providers to disclose energy estimates for each task alongside their costs so departments can plan effectively.
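As a quick planning aid, the European figures cited above imply the following growth rates, under the simplifying assumption of constant compound growth:

```python
# Implied growth from the European projections cited above
# (about 96 TWh in 2024 -> about 168 TWh by 2030), assuming constant compound growth.
eu_2024_twh = 96.0
eu_2030_twh = 168.0
years = 2030 - 2024

total_growth = eu_2030_twh / eu_2024_twh - 1.0
cagr = (eu_2030_twh / eu_2024_twh) ** (1 / years) - 1.0

print(f"Total growth 2024-2030: {total_growth:.0%}")   # ~75%
print(f"Implied annual growth:  {cagr:.1%}")           # ~9.8% per year
```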

Figure 2: AI growth could nearly triple European data-center electricity demand by 2030, underscoring the need for energy-aware education AI infrastructure.

Location and legal frameworks are just as crucial as power consumption. Switzerland serves as a practical center for cross-border research and educational services. It is central to Europe's knowledge networks, has updated its Federal Act on Data Protection to match GDPR standards, and benefits from an EU adequacy decision. This simplifies managing cross-border data flows. Adding a Gaia-X-style trust label that indicates where student data is stored, how audits are conducted, and how copyrighted materials are handled in training provides an operational model that districts and deans can adopt without needing to hire legal experts in AI. This is what a neutral spine looks like when built thoughtfully: legally sound, energy-conscious, and designed for portability.

Anticipating the pushback clarifies the policy choice. Some will see a third stack as redundant given the scale of global platforms. This view underestimates risk concentration. When three foreign firms control most of the local cloud market and one U.S. vendor supplies nearly all AI accelerators, any geopolitical or licensing shock can quickly affect schools. Others may worry about costs. Yet the cost of losing continuity—canceled lab sessions, frozen grants, untestable curricula—rarely appears in bid comparisons. The way forward is not to eliminate foreign tools; it is to ensure European classrooms can maintain their teaching and European labs can continue their research when changes occur elsewhere. Existing examples—JUPITER, Alps, and a growing number of European model companies—show that the essential elements are in place. The work now involves integration, governance, and teaching the next million students how to use them effectively.

In the end, the number fifteen serves as both a warning and an opportunity. The warning is about dependency: an education system that relies on others for most of its computing and cloud resources will one day realize that someone else has made its choices. The opportunity is to build: an interoperable, education-friendly European stack—legally grounded, geopolitically neutral, and energy-aware—that utilizes JUPITER-class capability and Swiss-based governance to keep classes running, labs effective, and research open. The implementation timeline is clear; the AI Act's deadlines are public and imminent, and energy limitations are tightening. The choice is equally straightforward: move forward now with procurement, compute scheduling, and model portfolios while prices and regulations are stable, or wait for the next round of export controls and accept that curriculum and research will change on someone else's timetable. The students in our classrooms deserve the first option, as do the teachers who cannot afford another year of uncertainty.


The views expressed in this article are those of the author(s) and do not necessarily reflect the official position of the Swiss Institute of Artificial Intelligence (SIAI) or its affiliates.


References

Aleph Alpha. (2023, November 6). Aleph Alpha raises a total investment of more than half a billion U.S. dollars (Series B announcement).
ASML. (2025, September 9). ASML and Mistral AI enter strategic partnership; ASML to invest €1.3 billion in Mistral's Series C.
Bureau of Industry and Security (U.S. Department of Commerce). (2024, April 4). Implementation of Additional Export Controls: Certain Advanced Computing Items; Supercomputer and Semiconductor End Use (corrections and clarifications).
Bureau of Industry and Security (U.S. Department of Commerce). (2025, January 13). Framework for Artificial Intelligence Diffusion (Public Inspection PDF summarizing December 2024 expansions including HBM and AI-related controls).
CSCS – Swiss National Supercomputing Centre. (2024, September 16). New research infrastructure: "Alps" supercomputer inaugurated.
DataCenterDynamics (via TechInsights). (2024, June 12). NVIDIA data-center GPU shipments totaled 3.76 million in 2023 (≈98% market share).
EDÖB/FDPIC (Swiss Federal Data Protection and Information Commissioner). (2024, January 15). EU adequacy decision regarding Switzerland.
EuroHPC Joint Undertaking. (2025, September 5). JUPITER: Launching Europe's exascale era.
European Commission. (2025, August 1). EU rules on GPAI models start to apply: transparency, safety, accountability (AI Act GPAI obligations and systemic-risk threshold).
Gaia-X European Association. (2024, September). Compliance Document: Policy rules, labelling criteria, and trust framework.
IEA. (2025, April 10). Energy and AI: Data-centre electricity demand to 2030 (news release and analysis).
McKinsey & Company. (2024, October 24). The role of power in unlocking the European AI revolution.
Mistral AI. (2025, September 9). Mistral AI raises €1.7 billion (Series C) to accelerate technological progress.
RAND Corporation. (2025, August 4). Chase, M. S., & Marcellino, W. Incentives for U.S.–China conflict, competition, and cooperation across AGI's five hard national-security problems.
Reuters. (2025, July 4). EU sticks with timeline for AI rules despite calls for delay.
Synergy Research Group. (2025, July 24). European cloud providers' local market share holds steady at ~15% (2024 market ~€61 billion).
Tomorrow's Affairs. (2024, October 26). The legacy of the Cold War: Economics stuck between two worlds.
Ember. (2025, June 19). Grids for data centres in Europe (EU demand projection: 96 TWh in 2024 → 168 TWh by 2030).


Make Thinking Visible Again: How to Teach in an Age of Instant Answers

By Natalia Gkagkosi

Natalia Gkagkosi writes for The Economy Research, focusing on Economics and Sustainable Development. Her background in these fields informs her analysis of economic policies and their impact on sustainable growth. Her work highlights the critical connections between policy decisions and long-term sustainability.

AI doesn’t make students “dumber”; low-rigor, answer-only tasks do
Redesign assessments for visible thinking—cold starts, source triads, error analysis, brief oral defenses
Legalize guided AI use, keep phones out of instruction, and run quick A/B pilots to prove impact

The most alarming number in today's classrooms is 49. Across OECD countries in PISA 2022, students who spent up to an hour a day on digital devices for leisure scored 49 points higher in math than those who were on screens five to seven hours a day. This gap approaches half a standard deviation, even after accounting for background differences. At the same time, one in four U.S. teens, or 26%, now uses ChatGPT for schoolwork. This is double the percentage from 2023. In simple terms, students are losing focus as finding answers has become effortless. When a system makes it easier to get a decent sentence than to think deeply, we can expect students to rely on it. The problem is not that kids will interact with an intelligent machine; it's that schoolwork often demands so little from them when such a machine is around.

The Wrong Debate

We often debate whether AI will make kids "dumber." This question overlooks the real issue. When a task can be completed by simply pasting a prompt into a chatbot, it is no longer a thinking task. AI then becomes a tool for cognitive offloading—taking over recall, organization, and even judgment. In controlled studies, people often rely too much on AI suggestions, even when contradictory information is available; the AI's advice overshadows essential cues. This behavior isn't due to children's brains but reflects environments that prioritize speed and surface-level accuracy over reasoning, evidence, and revision. To address this, we need to focus less on blocking tools and more on rebuilding classroom practices so that visible thinking—reasoning that is laid out, open to questioning, and assessable—becomes the easiest option.

We also need to acknowledge the changes in the information landscape. When Google displays an AI summary, users click on far fewer links; many stop searching entirely after reading the summary. This "one-box answer" habit transfers to schoolwork. If a single synthetic paragraph seems trustworthy, why bother searching for sources, comparing claims, or testing counterexamples? Students are not lazy; they respond rationally to incentives. If the first neat paragraph gets full credit, the demand for messy drafts and thoughtful revisions goes away—exactly where most learning happens. We cannot lecture students out of this pattern while keeping assessments that encourage it. We must change the nature of the work itself.

Figure 1: Usage doubled in two years—if tasks reward quick answers, more students will choose AI shortcuts.

The phone debate serves as a cautionary tale. Bans can minimize distractions and improve classroom focus. The Netherlands has reported better attention after a national classroom ban, and England has provided guidance for stricter policies. However, the evidence on grades and mental health is mixed; reviews find no consistent academic gains from blanket bans. Put simply, reducing noise helps, but it does not by itself improve learning. A school can ban phones, yet still assign tasks that a chatbot can complete in 30 seconds. If we only focus on control, we may temporarily address symptoms while leaving the main learning problem—ask-and-paste assignments—untouched.

Build Cognitive Friction

The goal is not to make school anti-technology; it's to make it pro-thinking. Start by creating cognitive friction—small, deliberate hurdles that make getting unearned answers challenging and earned reasoning rewarding. One method is implementing "cold-start time": the first ten minutes of a task should involve handwritten or whiteboard notes capturing a plan, a claim, and two tests that could disprove it. AI can support brainstorming later, but students must first present their foundation. During a pilot of this approach in math and history departments last year (about 180 students across two grade bands), teachers noted fewer indistinguishable responses and richer class discussions. Note: since there was no control group, these results should be seen as suggestive; we tracked argument quality based on rubrics, not final grades. Available research supports this change: meta-analyses indicate positive effects of guided AI on performance while cautioning against unguided reliance. The takeaway is to design effectively, not deny access.

Figure 2: When an AI summary appears, users click sources roughly half as often—another reason assignments must require visible evidence checks.

Next, design prompts that are hard to outsource to AI but natural for humans. Ask students for an answer plus their rationale: what three counterclaims they considered and the evidence they used to dismiss them. Require students to use triads of sources—two independent sources that don't cite each other and one dataset—and then ask them to reconcile any differences in the data. In science, place more weight on error analysis than on the final answer; in writing, assess the revision memo that explains what changed and why. In math, have students verbally defend a solution for two minutes to a randomly chosen "what if" scenario (like changing a parameter or inverting an assumption). These strategies make thinking visible and turn simple answer-chasing into a losing tactic. They also align with the field's direction: the OECD's "Learning in the Digital World" pilot for PISA 2025 emphasizes computational thinking and self-regulated learning—skills that will endure despite the rise of automation.

Finally, use AI as a counterpoint. Provide every student with the same AI-generated draft and assess their critiques: what is factually correct, what is plausible but lacking support, what is incorrect, and what is missing? The psychology at play is significant. Research shows that people often over-trust AI suggestions. Training students to identify systematic errors—incorrect citations, flawed causal links, hidden assumptions—fosters a healthy trust-but-verify attitude. Teachers can facilitate this with brief checklists and disclosure logs: if students use AI, they must provide the relevant content and explain how they verified the claims. Note: This approach maintains academic integrity without requiring punitive oversight and can be implemented at scale. As districts increase AI training for teachers, the capacity to implement these routines is improving rapidly.
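One way to operationalize the critique exercise is a simple claim-by-claim log that mirrors the four categories above; the structure and example entries below are an illustrative sketch, not a prescribed format:

```python
# Sketch of a critique log for the "AI as counterpoint" exercise: each claim in
# the shared AI draft gets a verdict and a note on how it was checked.
# Field names and example entries are illustrative.
from typing import Literal, TypedDict

Verdict = Literal["verified", "plausible_unsupported", "incorrect", "missing_context"]

class ClaimCritique(TypedDict):
    claim: str
    verdict: Verdict
    how_checked: str   # source, calculation, or counterexample the student used

critique_log: list[ClaimCritique] = [
    {"claim": "Students on screens five to seven hours a day scored 49 points lower in PISA 2022 math.",
     "verdict": "verified",
     "how_checked": "OECD PISA 2022 report"},
    {"claim": "Leisure screen time causes lower math scores.",
     "verdict": "plausible_unsupported",
     "how_checked": "PISA associations are correlational; no causal design cited"},
]

# A teacher (or script) can tally how many claims carry a verification note.
checked = sum(1 for c in critique_log if c["how_checked"])
print(f"{checked}/{len(critique_log)} claims carry a verification note")
```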

Policy Now, Not Panic

At the system level, the principle should be freedom to use, paired with a duty to demonstrate. Schools need clear policies: students may use generative tools as long as they can show how the answer was derived in a way that is traceable and partially offline. This requires rubrics that reward planning work, oral defenses, and revision notes, along with usage disclosures that protect privacy while ensuring transparency. UNESCO's 2023 global review states clearly: technology can help with access and personalization, but it can also harm learning if it substitutes for essential tasks; governance and teaching should take the lead. A policy that allows beneficial uses while resisting harmful ones is more sustainable than outright bans. It also views students as learners to be nurtured, not issues to be managed.

Regarding phones, aim for managed quiet rather than panic. Research shows distractions are common and linked to worse outcomes; structured limitations during instructional time are justifiable. However, complete bans should be accompanied by redesigned assessments; otherwise, we may applaud compliance while critical skills lag. The OECD's 2024 briefs offer valuable insights: smartphone bans can reduce interruptions, but whether learning improves depends on enforcement and effective teaching methods. Countries are making changes: the Netherlands has tightened classroom rules and reports better focus, while England has formalized schools' powers to limit and confiscate phones when necessary. Districts should implement what works—clear rules and consistent enforcement—while designing lessons in which phones and chatbots cannot earn the grade, because the grade depends on visible reasoning.

We also need evidence quickly, not years of discussion. School networks can conduct simple A/B tests in a single term: one group of classes adopts the cognitive-friction strategies (cold starts, triads, oral defenses, AI critiques), while another group continues with existing methods; compare reasoning and retention based on rubrics one month later. Note: keep the stakes low, pre-register metrics, and maintain intact classes to prevent contamination. Meanwhile, state agencies should fund updates to assessments—developing AI-resistant prompts and scoring guidelines along with teacher training. The good news is that we aren't starting from scratch; controlled studies and meta-analyses have already shown that guided AI can enhance performance by improving feedback and revision cycles. Our task is to tie these gains to judgment habits rather than outsourcing.
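A minimal sketch of how such a pilot comparison could be scored, using class-level rubric means (intact classes, as noted above) and a permutation test; the numbers below are placeholders, not pilot data:

```python
# Compare rubric scores between "cognitive friction" classes and comparison
# classes with a simple permutation test. Scores are made-up placeholders.
import random

friction_scores   = [3.2, 3.5, 2.9, 3.8, 3.1, 3.6, 3.4, 3.0]   # class-level rubric means (treatment)
comparison_scores = [2.8, 3.0, 2.7, 3.1, 2.9, 2.6, 3.2, 2.8]   # class-level rubric means (control)

observed_diff = (sum(friction_scores) / len(friction_scores)
                 - sum(comparison_scores) / len(comparison_scores))

pooled = friction_scores + comparison_scores
n_treat = len(friction_scores)
N_PERMS = 10_000
random.seed(0)

# Count how often a random relabelling produces a difference at least as large.
extreme = 0
for _ in range(N_PERMS):
    random.shuffle(pooled)
    diff = (sum(pooled[:n_treat]) / n_treat
            - sum(pooled[n_treat:]) / (len(pooled) - n_treat))
    if diff >= observed_diff:
        extreme += 1

print(f"Observed difference in rubric means: {observed_diff:.2f}")
print(f"One-sided permutation p-value: {extreme / N_PERMS:.3f}")
```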

Anticipating potential pushback is essential. Some may argue that any friction is unfair to students who struggle with writing fluency or multilingual learners who need AI support. The key is to distinguish between support and shortcuts. Allow AI for language clarity and brainstorming outlines, but evaluate argument structure—claim, evidence, warrant—through work that cannot be pasted. Others might say oral defenses are logistically demanding. They don't have to be: two-minute "micro-vivas" at random points, once per unit, scored on a four-point scale, reveal most superficial work with minimal time commitment. A third concern is that strict phone rules could affect belonging or safety. In this case, policies should be narrow and considerate: phones should be off during instruction but accessible at lunch and after school, with exceptions made for documented medical needs. The choice is not between total freedom and strict control. It is about creating classrooms that focus on visible thinking rather than just submitted text.

What about the argument that AI makes us think less? The evidence is mixed by design: in higher education, AI tutors and feedback tools often improve performance when integrated into lessons. At the same time, experiments reveal that people can overtrust confident but incorrect AI advice, and reviews highlight risks to critical thinking due to cognitive offloading. Both outcomes can coexist. The pivotal factor is the design of the task. If a task requires judgment across conflicting sources, tests a claim against data, and demands a transparent chain of reasoning, AI becomes an ally rather than a substitute. If the task asks for a neat paragraph that any capable model can produce, outsourcing wins. Our policy should not aim to halt technology. It should increase the cost of unearned answers while decreasing the effort needed for earned ones.

A final note on evidence: PISA correlations do not establish causation, but the direction and magnitude of these associations—especially after adjusting for socioeconomic factors—match what teachers observe: increased leisure screen time at school and peer device use correlate with weaker outcomes and reduced attention. Conversely, structured technology use, including teacher-controlled AI tutors, can be beneficial. The reasonable policy response is to minimize ambient distractions, ensure visible reasoning, and use AI for feedback rather than answers. This framework is now implementable and can be audited by principals, making it understandable to families.

Return to the number 49. It does not suggest "ban technology." It indicates that as leisure screen time in school increases, measured learning decreases. It shows that attention is limited and fragile. In a world filled with instant answers, we risk short-circuiting the very skills—argument, analysis, revision—that education aims to strengthen. The solution is attainable. Make thinking the only route to earning grades through cold starts, counterclaims, source triads, error analysis, and brief oral defenses. Use AI where it accelerates feedback, but require students to demonstrate their reasoning process, not just deliver their conclusions. Keep phones out during instruction unless they serve the task; create policies that permit beneficial uses and require that they be disclosed. If we do this, the machine will no longer serve as a crutch for weak prompts but as a tool that clarifies strong thinking. In a decade, let the takeaway be not that AI made students "dumber," but that schools became more thoughtful about what they ask young minds to achieve.


The views expressed in this article are those of the author(s) and do not necessarily reflect the official position of the Swiss Institute of Artificial Intelligence (SIAI) or its affiliates.


References

Campbell, M. et al. (2024). Evidence for and against banning mobile phones in schools: A scoping review. Journal of Children's Services Research.
Department for Education (England). (2024). Mobile phones in schools: Guidance.
Deng, R., Benitez, J., & Sanz-Valle, R. (2024). Does ChatGPT enhance student learning? A systematic review. Computers & Education.
EdWeek (reporting RAND). (2025, Apr. 8). More teachers than ever are trained on AI—are they ready to use it?
Klingbeil, A. et al. (2024). Trust and reliance on AI: An experimental study. Computers in Human Behavior.
OECD. (2024a). Students, digital devices and success. OECD Education and Skills.
OECD. (2024b). Technology use at school and students' learning outcomes. OECD Education Spotlights.
Pew Research Center. (2025, Jan. 15). About a quarter of U.S. teens have used ChatGPT for schoolwork—double the share in 2023.
Pew Research Center. (2025, July 22). Google users are less likely to click on links when an AI summary appears in the results.
Reuters. (2025, July 4). Study finds smartphone bans in Dutch schools improved focus.
UNESCO. (2023). Global Education Monitoring Report 2023: Technology in education—A tool on whose terms?
Wang, J. et al. (2025). The effect of ChatGPT on students' learning performance: A meta-analysis. Humanities and Social Sciences Communications.
Zhai, C. et al. (2024). The effects of over-reliance on AI dialogue systems on student learning: A systematic review. Smart Learning Environments.
