Sector-Specific AI: Why Finance Isn’t ChatGPT—and Why That Matters
By Ethan McGowan

Ethan McGowan is a Professor of Financial Technology and Legal Analytics at the Gordon School of Business, SIAI. Originally from the United Kingdom, he works at the frontier of AI applications in financial regulation and institutional strategy, advising on governance and legal frameworks for next-generation investment vehicles. McGowan plays a key role in SIAI’s expansion into global finance hubs, including oversight of the institute’s initiatives in the Middle East and its emerging hedge fund operations.
AI works best when built for each sector’s data and goals
Finance needs domain-grounded models and risk-based metrics, not generic chatbots
Teach, buy, and regulate using sector-specific measures
The most crucial figure in today’s AI discussion isn’t a high parameter count. It is a divide. In 2024, only 13.5% of EU firms reported using AI at all. Yet nearly half of information and communications firms had adopted it, while most other sectors were below one in six. The pattern is revealing: adoption occurs where data and tasks match the tools, and stalls where they do not. This divide challenges the idea that "AI equals ChatGPT." Sector-specific AI relies on domain data, error structure, incentives, and the metrics that matter in the field. Photographs and writing benefit from pattern recognition in clean, stable datasets. Financial time series, by contrast, are challenging because of noise, regime shifts, and the high cost of mistakes. If we continue to treat AI as a single entity, we risk teaching students incorrectly, purchasing the wrong systems, and mismanaging risk. The solution isn’t more hype; it is sector-specific AI tailored to the needs of each field.
Sector-specific AI is a practice, not a product
Sector-specific AI isn’t just a chatbot dressed up; it’s a group of probabilistic systems tailored to specific contexts. In some areas, these systems already outperform previous top standards. Take weather forecasting. In peer-reviewed studies, DeepMind’s GraphCast beat the leading European physics model on 90% of 1,380 verification targets for forecasts ranging from three to ten days, delivering results in under a minute. This achievement stemmed from training on decades of structured reanalysis data, which had well-understood physical constraints and stable measurement processes. In short, the domain provided the model with valuable signals. This lesson applies broadly: when data are plentiful, consistent, and connected to governing dynamics, the benefits of learning are significant and rapid. That’s the essence of sector-specific AI.
The adoption rates illustrate the same story. Across the European Union, information and communications firms led AI use at 48.7% in 2024, followed by professional, scientific, and technical services at 30.5%. Adoption rates were lower in other sectors, highlighting differences in how well tasks match available data. UK business surveys present a similar picture: about 15% of firms reported using AI in late 2024, with adoption increasing alongside firm size and digital readiness. Global executive surveys show another layer: many organizations claim to be "using AI" somewhere, but the actual use is primarily in IT and sales/marketing, not in the complex, data-scarce areas of each sector. Sector-specific AI doesn’t spread like a single product; it expands where data, incentives, and measurement are aligned.
Why financial time series break the hype
Finance is where the idea that “AI = ChatGPT” causes the most damage. Equity and foreign exchange returns exhibit heavy tails, clustered volatility, and frequent regime changes—factors that render simple pattern-finding unreliable. The signal-to-noise ratio is low, labels change because of market perceptions (prices move because others believe they will), and feedback loops can reverse correlations. Recent studies clarify this issue: careful analyses find that language-model features generally do not beat strong, purpose-built time-series baselines. In many cases, simpler attention models or traditional structures perform just as well or better at a much lower cost. This isn’t a critique of AI; it’s a reminder that sector-specific AI for markets must begin with the data-generating process, not the latest general-purpose model.
What works in weather or image recognition doesn’t translate directly to forecasting returns. Weather models benefit from physics-consistent data with dense, continuous coverage. Corporate earnings, yields, and prices fluctuate due to policy changes, accounting adjustments, liquidity pressures, and narratives that affect risk pricing. Advanced models in finance can assist by nowcasting macroeconomic series from mixed data, classifying the tone of news, and identifying structural breaks, but the objectives differ. A trading desk cares less about root mean square error on tomorrow’s price and more about drawdowns, turnover, and tail risk during stress periods. This requires sector-specific AI that combines causal structure, reliable features, and stringent controls for backtesting and live deployment. If the benchmark is whether it surpasses a well-tuned seasonal naive or a hedged factor model after costs, many flashy systems fall short. That isn’t a failure of AI; it reflects the nature of finance.
Figure 1: AI user adoption (ChatGPT as proxy) has been exponentially faster than that of earlier digital platforms, underscoring that general-purpose tools spread faster than sector-specific ones but also face sharper saturation and adaptation limits.
Sector-specific AI needs local metrics, not generic demos
The quickest benefits from sector-specific AI come when evaluations align with the job to be done. In meteorology, the goal is timely, location-specific forecasts on variables that impact operations; benchmarks reward physical consistency and accuracy across different times. That’s why the GraphCast work—and hybrid systems that combine it with ensemble methods—are significant to grid planners and disaster response teams, not just to machine learning experts. The technique, data, and impact metric all align. In health imaging, sensitivity and specificity, depending on the condition and scanner type, matter more than polished fluency. In manufacturing, defects per million and scrap rates determine success, not flashy presentations. Sectors that establish these local metrics see real compounding benefits from AI.
Figure 2: The structure of AI research reveals distinct architectures—computer vision, NLP, robotics, and time-series analysis—each producing different error behaviors and performance ceilings. Policy must recognize these domain boundaries to design realistic AI adoption strategies.
Financial markets require an even stricter approach. A credible sector-specific AI roadmap starts with governance: controlled data pipelines, baseline econometrics, and pre-registered backtesting protocols that penalize overfitting. It then focuses evaluations on finance-specific outcomes: probability of failure under stress, realized Sharpe ratio after fees and slippage, turnover-adjusted alpha, and worst-case liquidity scenarios. This viewpoint explains the gap between firms stating "we use AI somewhere" and those saying "we trust an AI signal in real time." Business surveys show widespread experimentation—78% of firms reported AI use in at least one function in 2025—but not widespread transformation. In finance, trust will increase when developers acknowledge the noise and start with the right benchmarks. Sector-specific AI works when it optimizes for the metrics that the sector already uses.
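To make these finance-native yardsticks concrete, here is a minimal sketch in Python, assuming hypothetical inputs: a pandas Series of daily strategy returns and a DataFrame of position weights. It computes the after-cost Sharpe ratio, maximum drawdown, and average turnover; the cost model is a deliberately simple linear charge per unit of turnover.

```python
# Minimal sketch of finance-native evaluation. Inputs are hypothetical:
# `returns` is a pandas Series of daily strategy returns and `weights` a
# DataFrame of position weights (assets in columns), both indexed by date.
import numpy as np
import pandas as pd

def evaluate_strategy(returns: pd.Series, weights: pd.DataFrame,
                      cost_bps: float = 5.0) -> dict:
    # Turnover: total absolute change in positions each day.
    turnover = weights.diff().abs().sum(axis=1).fillna(0.0)
    # Deduct a linear transaction cost of `cost_bps` per unit of turnover.
    net = returns - turnover * cost_bps / 1e4
    # Annualized Sharpe ratio after costs (252 trading days per year).
    sharpe = net.mean() / net.std() * np.sqrt(252)
    # Maximum drawdown of the cumulative net equity curve.
    equity = (1 + net).cumprod()
    max_drawdown = (equity / equity.cummax() - 1).min()
    return {"sharpe_after_costs": sharpe,
            "max_drawdown": max_drawdown,
            "avg_daily_turnover": turnover.mean()}
```

The point of the sketch is the loss function, not the numbers: once the evaluation is expressed in the sector's own units, "does the model help" becomes a testable question.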
What educators, administrators, and policymakers should do next
Educators should teach sector-specific AI as a skill grounded in domain data. This begins with projects that use real sector datasets and metrics. A finance capstone should connect students with anonymized tick or macro datasets, run competitions against seasonal naïve, ARIMA-GARCH, and transformer models, and assess based on out-of-sample risk and costs, rather than classroom-friendly accuracy. A public-sector project should simulate casework prioritization or fraud detection, focusing on fairness, false-positive costs, and auditability from the start. An energy systems project should optimize fleet dispatch based on weather forecasts and price volatility. The core message is clear: the sector defines the loss function, and the model adapts accordingly. OECD analysis indicates that sectoral AI dynamics differ significantly; curricula should reflect that reality, rather than compress it into a single "AI course."
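As one illustration of such a capstone harness, the following sketch, assuming a hypothetical pandas Series of daily returns and the open-source arch package, runs a walk-forward comparison of a seasonal naive forecast against an AR(1)-GARCH(1,1) model on out-of-sample squared error; a transformer entrant would be scored by exactly the same loop.

```python
# Walk-forward capstone sketch; `returns` is a hypothetical pandas Series
# of daily log returns. Requires the open-source `arch` package.
import numpy as np
import pandas as pd
from arch import arch_model

def walk_forward(returns: pd.Series, window: int = 1000,
                 season: int = 5) -> dict:
    naive_se, garch_se = [], []
    for t in range(window, len(returns)):
        train = returns.iloc[t - window:t]
        actual = returns.iloc[t]
        # Seasonal naive: repeat the value from one "season" (e.g., a week) ago.
        naive_se.append((actual - returns.iloc[t - season]) ** 2)
        # AR(1) mean with GARCH(1,1) volatility; refit each step for clarity
        # (in practice, refit less often). Returns are scaled by 100 for the
        # optimizer, then the forecast is scaled back.
        res = arch_model(train * 100, mean="AR", lags=1,
                         vol="GARCH", p=1, q=1).fit(disp="off")
        pred = res.forecast(horizon=1).mean.iloc[-1, 0] / 100
        garch_se.append((actual - pred) ** 2)
    return {"rmse_seasonal_naive": float(np.sqrt(np.mean(naive_se))),
            "rmse_ar_garch": float(np.sqrt(np.mean(garch_se)))}
```

Grading on this out-of-sample table, rather than in-sample fit, is what teaches students that a fancy model must first earn its keep against a naive baseline.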
Administrators should also invest in sector metrics. In universities and research hospitals, purchases should require direct validation on outcomes that matter to the unit—such as sensitivity on critical findings and workload effects in radiology—rather than broad natural language processing benchmarks. In business schools and engineering programs, computing resources should be tied to reproducible methods and solid baselines, rather than parameter counts. In finance labs, live-trading environments should be separated from model development and subject to strict change controls. For many institutions, the first step is straightforward: improve data management. UK and EU business surveys show that AI adoption increases with digital maturity; companies with good data management and security gain more tangible benefits from AI than those that skip the basics. Sector-specific AI relies on clean data.
Policymakers should create regulations based on sector needs, not a one-size-fits-all approach. Generic rules regarding "AI systems" overlook the actual risk landscape. Financial markets require model-risk management, record-keeping, and post-trade auditing. Health care needs documentation of training groups, site-to-site testing, and commitments to real-world performance. Critical infrastructure needs stress tests for rare events. International organizations are starting to recognize these differences as they assess sectoral AI use in manufacturing, finance, and government; the most effective frameworks relate controls to impacts and error costs rather than vague capability labels. In short, regulate the interface between model errors and human welfare where it occurs.
The counterargument is familiar: if models improve continuously, won’t ChatGPT-like systems soon handle everything? Progress is real, and overlaps will occur. However, the immediate path to impact relies on aligning with sector needs. Even in areas with significant breakthroughs, improvements stem from tailoring architecture and data to the task. Weather AI’s success wasn’t about chatbots; it was about domain knowledge. Finance will eventually reach its own breakthroughs, but they will involve effective risk management and improved hedging, rather than a general model extracting alpha from random noise. Since incentives vary by sector, the adoption curve will also differ. The right approach is not to wait for a single, universal solution; it’s to build according to the unique needs of each domain now.
The 13.5% overall adoption compared to nearly half in information and communications was never just a number. It illustrates how sector-specific AI spreads. When data are organized, outcomes are clear, and metrics are relevant, AI advances quickly. When data are chaotic, outcomes are uneven, and metrics are flawed, it stalls or fails. Finance serves as a warning: its time series are complex, rich in feedback, and resistant to pattern-finding that lacks a foundation in risk economics. The takeaway isn’t to lower expectations. It’s to teach, invest, build, and regulate as if AI encompasses many things—because it does. If educators prepare builders who start with the data-generating process, administrators invest according to sector metrics, and policymakers regulate the impact of errors on people and capital, we will secure the desired benefits alongside the necessary safeguards. That is the route out of the "AI = ChatGPT" trap and toward effective sector-specific AI grounded in the needs of each field.
The views expressed in this article are those of the author(s) and do not necessarily reflect the official position of the Swiss Institute of Artificial Intelligence (SIAI) or its affiliates.
References
DeepMind (2023). GraphCast: AI model for faster and more accurate global weather forecasting. DeepMind blog, November 14, 2023; see also Lam, R., et al. (2023). "Learning skillful medium-range global weather forecasting." Science, 382(6673), eadi2336.
Eurostat (2025). Use of artificial intelligence in enterprises—Statistics Explained (data extracted January 2025). European Commission.
McKinsey & Company (2024). The state of AI in early 2024: Gen AI adoption. McKinsey Global Survey.
McKinsey & Company (2025). The State of AI: Global survey. March 12, 2025.
OECD (2025). "How do different sectors engage with AI?" OECD Blog, February 13, 2025.
OECD (2025). Governing with Artificial Intelligence: The State of Play and Way Forward in Core Government Functions. September 18, 2025.
Office for National Statistics, UK (2024). Business insights and impact on the UK economy. October 3, 2024.
Office for National Statistics, UK (2025). Management practices and the adoption of technology and artificial intelligence in UK firms, 2023. March 24, 2025.
Tan, M., Merrill, M. A., Gupta, V., Althoff, T., & Hartvigsen, T. (2024). "Are Language Models Actually Useful for Time Series Forecasting?" NeurIPS 2024.
Xu, D. J., & Kim, D. J. (2025). "Modeling Stylized Facts in FX Markets with FINGAN-BiLSTM." Entropy, 27(6), 635.
Yan, Z., et al. (2025). "Evaluation of precipitation forecasting based on GraphCast." Scientific Reports (Nature).
Domain-Specific AI Is the Safer Bet for Classrooms and Markets
By Ethan McGowan
General AI predicts probabilities, not context-specific safety
Domain-specific AI fits the task and lowers risk in classrooms and markets
Use ISO 42001, NIST RMF, and the EU AI Act, and test on domain benchmarks
Reported AI incidents reached 233 in 2024, a 56% increase over the previous year. This sharp rise reflects real-world failures as generative systems become part of everyday tools. The public has noticed. By September 2025, half of U.S. adults expressed more concern than excitement about AI in their daily lives, with only 10% more excited than worried. These figures point to a design truth that cannot be ignored: general-purpose models are built to predict the next token, not to meet safety standards that differ by context. They operate on probability, not purpose. As usage shifts from chat interfaces to classrooms and financial settings, probability is no longer an adequate proxy for safety. The answer is domain-specific AI: systems designed for a narrow task, with safety measures suited to that task's stakes.
The case for domain-specific AI
The main issue with the push for greater generality is that it misaligns model learning with safety requirements. Models learn from training data trends, while safety is situational. What is acceptable in an open forum may be harmful in a school feedback tool or a trading assistant. As regulators outline risk tiers, this misalignment becomes both a compliance and ethical concern. The European Union’s AI Act, which took effect on August 1, 2024, categorizes AI used for educational decisions as high-risk, imposing stricter obligations and documentation requirements. General-purpose models also fall under a specific category with unique responsibilities. In short, the law reflects reality: risk depends on usage, not hype. Domain-specific AI acknowledges this reality. It narrows inputs and outputs, aligns evaluations with workflow, and establishes error budgets that correspond to the potential harm involved.
General benchmarks support the same idea. When researchers adapted the traditional MMLU exam into MMLU-Pro, top models lost 16 to 33 percentage points of accuracy. Performance on today’s leaderboards tends to falter when faced with harder, more realistic scenarios. This is a warning against unscoped deployment. Meanwhile, government labs now publish pre-deployment safety evaluations because jailbreak vulnerabilities persist. The UK’s AI Safety Institute has outlined methods for measuring attack success and correctness; its May 2024 update bluntly addresses models’ dangerous capabilities and the need for contextual testing. If failure modes vary by context, safety must also be context-dependent. Domain-specific AI makes that possible.
Figure 1: Domain-specific tuning reduces error rates in responses involving under-represented groups, showing safer alignment than general fine-tuning.
Evidence from incidents, benchmarks, and data drift
The curve for incidents is steep. The Stanford AI Index 2025 reported 233 incidents in 2024, the highest number on record. This figure is a system-wide measure rather than a reaction to media coverage. Additionally, the proportion of “restricted” tokens in common web data rose from about 5-7% in 2019 to 20-33% by 2023, altering both model inputs and behavior. As training data becomes more varied, the likelihood of general models misfiring in edge cases increases. Safety cannot be a one-and-done exercise; it must be a continuous practice tied to specific domains, using ongoing tests that reflect real tasks.
Safety evaluations confirm this trend. New assessment frameworks like HELM-Safety and AIR-Bench 2024 reveal that measured safety depends heavily on which harms are tested and how prompts are structured. The UK AISI’s methodology also measures attack success rates and underscores that risk depends on deployment context, not solely on model capability. The conclusion is clear: a general score does not guarantee safety for a specific classroom workflow, exam proctor, or FX-desk summary bot. Domain-specific AI lets us select relevant benchmarks, limit the input space, and set refusal criteria that reflect the stakes involved.
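A minimal sketch of what such context-bound measurement can look like, assuming a hypothetical set of domain red-team prompts plus placeholder model and violates_policy callables: it estimates the attack success rate with a simple binomial confidence interval, the kind of deployment-specific number AISI-style evaluations report.

```python
# Domain-scoped safety evaluation sketch. `model` and `violates_policy`
# are hypothetical placeholders for the deployed system and the domain's
# harm checker; only the metric logic is shown.
import math

def attack_success_rate(model, violates_policy, attack_prompts):
    hits = sum(violates_policy(model(p)) for p in attack_prompts)
    n = len(attack_prompts)
    rate = hits / n
    # Normal-approximation 95% confidence interval on the success rate.
    half = 1.96 * math.sqrt(rate * (1 - rate) / n)
    return rate, (max(0.0, rate - half), min(1.0, rate + half))
```

The prompt set, not the model, is what makes this a domain benchmark: an exam-proctoring deployment and an FX-desk summarizer would run entirely different attack suites.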
Data presents another limitation. As access to high-quality text decreases, developers increasingly rely on synthetic data and smaller, curated datasets. This raises the risk of overfitting and makes distribution drift more likely. Therefore, targeted curation gains importance. A model for history essay feedback should include exemplars of rubrics, standard errors, and grade-level writing—not millions of random web pages. A model for news reading in FX requires calendars, policy documents, and specific press releases, rather than a general assortment of internet text. In both cases, domain-specific AI addresses the data issue by defining what “good coverage” means.
What domain-specific AI changes in education and finance
In education, the stakes are personal and concrete. AI tools that determine student access or evaluate performance fall into high-risk categories under the EU AI Act. This requires rigorous documentation, monitoring, and human oversight. It also calls for choosing the right design for the task. A model that accepts rubric-aligned inputs, generates structured feedback, and cites authoritative sources is easier to audit and easier to keep within an error budget than a general chat model that can veer off-topic. NIST’s AI Risk Management Framework and its 2024 Generative AI profile provide practical guidance: govern, map, measure, manage—applied to specific use cases. Schools can use these tools to define what “good” means for each need, such as formative feedback, plagiarism checks, or support accommodations, and to determine when AI should not be applied at all, as the sketch below illustrates.
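One way a school might encode those decisions, sketched as a plain Python mapping with purely illustrative thresholds: each use case gets an error budget and a human-oversight rule, including an explicit entry where AI is not applied.

```python
# Hypothetical per-use-case error budgets for a school deployment.
# All thresholds are illustrative, not normative.
ERROR_BUDGETS = {
    "formative_feedback": {
        "ai_permitted": True,
        "max_error_rate": 0.02,          # tolerable errors per feedback item
        "human_review": "sampled",       # teacher spot-checks a sample
    },
    "plagiarism_screening": {
        "ai_permitted": True,
        "max_false_positive_rate": 0.005,  # false accusations are costly
        "human_review": "every flag",      # no action without human review
    },
    "summative_grading": {
        "ai_permitted": False,           # reserved for human judgment
        "human_review": "n/a",
    },
}
```

A table like this is trivial to write and audit, which is precisely the point: the govern-map-measure-manage loop needs concrete numbers to govern against.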
In finance, the most effective approaches are focused and testable. The best outcomes currently emerge from news-to-signal pipelines rather than general conversation agents. A study by the ECB this summer found that using a large model to analyze two pages of commentary in PMI releases significantly enhanced GDP forecasts. Academic and industry research shows similar improvements when models extract structured signals from curated news instead of trying to act like investors. One EMNLP 2024 study showed that fine-tuning LLMs on newsflow yielded return signals that surpassed conventional sentiment scores in out-of-sample tests. A 2024 EUR/USD study combined LLM-derived text features with market data, reducing MAE by 10.7% and RMSE by 9.6% compared to the best existing baseline. This demonstrates domain-specific AI in action: focused inputs, clear targets, and straightforward validation.
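The shape of such a news-to-signal pipeline can be sketched as follows, assuming a hypothetical llm_score function standing in for a fine-tuned model that maps one curated text to a bounded score, plus date-keyed news and date-indexed market frames; the validation step is a plain MAE comparison against a random-walk baseline.

```python
# News-to-signal pipeline sketch. `llm_score` is a hypothetical placeholder
# for a fine-tuned model returning a score in [-1, 1] per curated text;
# the point is the pipeline shape: focused inputs, clear target, simple test.
import pandas as pd

def build_features(news: pd.DataFrame, market: pd.DataFrame,
                   llm_score) -> pd.DataFrame:
    # Score each curated item, then aggregate to one signal per day.
    news = news.assign(score=news["text"].map(llm_score))
    daily = news.groupby(news["date"])["score"].mean().rename("news_signal")
    # Join with date-indexed market features; keep only days with both.
    return market.join(daily, how="inner")

def mae_vs_baseline(df: pd.DataFrame, model_pred: pd.Series) -> dict:
    # Baseline: today's value predicts tomorrow's (random-walk forecast).
    baseline = df["target"].shift(1)
    mask = baseline.notna()
    return {
        "model_mae": (model_pred[mask] - df["target"][mask]).abs().mean(),
        "baseline_mae": (baseline[mask] - df["target"][mask]).abs().mean(),
    }
```

Nothing here "thinks like an investor": the language model does one extraction job, and everything downstream is ordinary, auditable statistics.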
The governance layer must align with this technical focus. ISO/IEC 42001:2023, the first AI management system standard, provides organizations with a way to integrate safety into daily operations: defining roles, establishing controls, implementing monitoring, and improving processes. Combining this with the EU AI Act’s risk tiers and NIST’s RMF creates a coherent protocol for schools, ministries, and finance firms. Start small. Measure what truly matters. Prove it. Domain-specific AI is not a retreat from innovation; it is how innovation thrives amid real-world challenges.
Answering the pushback—and governing the shift
Critics may argue that “narrow” models will lag behind general models that continue to scale. However, the issue isn’t about measuring intelligence; it’s about being fit for purpose. When researchers modify tests or change prompts, general models struggle. MMLU-Pro’s 16-33 point decline illustrates that today’s apparent mastery could collapse under shifting distributions. General models remain prone to jailbreak vulnerabilities, so safety teams continue to publish defenses because clever attacks still work. The UK AI Safety Institute’s methodologies and the follow-up reports from labs and security firms emphasize one need: we must evaluate safety against the specific hazards of each task. Domain-specific AI achieves this inherently.
Figure 2: User evaluations show that domain-specific dialogue systems achieve higher persuasiveness and competence with no increase in discomfort.
Cost is another concern: building many smaller, focused systems can look duplicative. In practice, effective stacks combine both. Use a general model for drafting or retrieval, then run outputs through a policy engine that checks rubric compliance, data lineage, and refusal rules; a sketch of this pattern follows. ISO 42001 and NIST RMF guide teams through this process by documenting what data was used, what tests were run, and how failures are handled. The EU AI Act rewards such design with clearer paths to compliance, particularly for high-risk educational applications and for general models used in regulated environments. The lesson from recent evaluations is evident: governance must live where the work happens. That is cheaper than incident response, reputational damage, and rework after audits.
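A minimal sketch of that draft-then-check stack, with general_model, meets_rubric, and contains_restricted_content as hypothetical placeholders: the general model generates broadly, and a narrow policy layer decides whether the output ships, is revised, or is refused.

```python
# Draft-then-check sketch. `general_model`, `meets_rubric`, and
# `contains_restricted_content` are hypothetical placeholders; the pattern
# (generate broadly, gate narrowly) is what matters.
from dataclasses import dataclass

@dataclass
class Decision:
    ship: bool
    reason: str
    text: str = ""

def gated_response(prompt: str, general_model, meets_rubric,
                   contains_restricted_content,
                   max_retries: int = 2) -> Decision:
    for _ in range(max_retries + 1):
        draft = general_model(prompt)
        if contains_restricted_content(draft):
            # Hard refusal: restricted content never ships, even revised.
            return Decision(False, "refused: restricted content")
        if meets_rubric(draft):
            return Decision(True, "passed checks", draft)
        # Otherwise ask for a revision and try again (each attempt auditable).
        prompt += "\nRevise the draft to satisfy the rubric."
    return Decision(False, "failed rubric after retries")
```

Because every decision carries a reason, the wrapper doubles as the audit log that ISO 42001 and the NIST RMF ask for.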
A final objection is that specialization could limit creativity. Evidence from the finance sector suggests otherwise. The most reliable improvements currently come from models that analyze specific news to generate precise signals, validated against clear benchmarks. The ECB example demonstrated how small amounts of focused text improved projections, while the EUR/USD study outperformed baselines using task-specific features. Industry research indicates that fine-tuned newsflows perform better than generic sentiment analysis. None of these systems “thinks like a fund manager.” They excel at one function, making them easier to evaluate when they falter. In education, the parallel is clear: assist teachers in providing rubric-based feedback, highlight patterns of misunderstanding, and reserve critical judgment for humans. This approach keeps the helpful tool and mitigates harm.
The evidence is compelling. Incidents rose to 233 in 2024, public trust is fragile, and stricter benchmarks reveal delicate performance. The solution isn't abstract “alignment” with universal values implemented in larger models. The remedy is to connect capability with context. Domain-specific AI narrows inputs and outputs, utilizes curated data, and demonstrates effectiveness through relevant tests. It establishes concrete governance through ISO 42001 and NIST RMF. It aligns with the EU AI Act’s risk-driven framework. Its efficacy is already demonstrated in lesson feedback and news-based macro signals. The call to action is straightforward. Schools and ministries should set domain-specific error budgets, adopt risk frameworks, and choose systems that prove their reliability in domain tests before they interact with students. Financial firms should define assistants to focus on information extraction and scoring, not decision-making, and hold them to measurable performance standards. We can continue to chase broad applications and hope for safety, or we can develop domain-specific AI that meets the standards our classrooms and markets require.
The views expressed in this article are those of the author(s) and do not necessarily reflect the official position of the Swiss Institute of Artificial Intelligence (SIAI) or its affiliates.
References
Ada Lovelace Institute (2024). Under the radar? London: Ada Lovelace Institute.
Carriero, A., Clark, T., & Pettenuzzo, D. (2024). Macroeconomic Forecasting with Large Language Models (slides). Boston College.
Ding, H., Zhao, X., Jiang, Z., Abdullah, S. N., & Dewi, D. A. (2024). EUR/USD Exchange Rate Forecasting Based on Information Fusion with LLMs and Deep Learning. arXiv:2408.13214.
European Commission (2024). EU AI Act enters into force. Brussels.
Guo, T., & Hauptmann, E. (2024). Fine-Tuning Large Language Models for Stock Return Prediction Using Newsflow. EMNLP Industry Track. ACL Anthology.
ISO (2023). ISO/IEC 42001:2023—Artificial intelligence management system. Geneva: International Organization for Standardization.
NIST (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). Gaithersburg, MD.
NIST (2024). AI RMF: Generative AI Profile (NIST AI 600-1). Gaithersburg, MD.
RAND Corporation (2024). Risk-Based AI Regulation: A Primer on the EU AI Act. Santa Monica, CA.
Reuters (2025, June 26). ECB economists improve GDP forecasting with ChatGPT.
Stanford CRFM (2024). HELM-Safety. Stanford University.
Stanford HAI (2025). AI Index Report 2025—Responsible AI. Stanford University.
UK AI Safety Institute (AISI) (2024). Approach to evaluations & Advanced AI evaluations—May update. London: DSIT & AISI.
Wang, Y., Ma, X., Zhang, G., et al. (2024). MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark. NeurIPS Datasets & Benchmarks (Spotlight).