
The Government's Usage of AI in Economic Forecasting
Karan Singh
Sep 23, 2025

At the end of 2024, the Federal Reserve Bank of St. Louis published a research paper which indicated that Google’s AI models could predict inflation with greater accuracy than professional economists (Faria-e-Castro & Leibovici, 2024). Yet federal officials had already warned that AI models “harbour, if not amplify, the biases found in their data” (Brainard, 2021).
The concerns raised by federal officials in the United States go beyond inflation forecasting. AI systems now inform Federal Reserve decisions on interest rates, employment, and economic growth that affect millions of people. The technology designed to improve economic policymaking may be perpetuating the very assumptions that it was intended to challenge.
Government AI systems designed for economic forecasting are reinforcing specific economic assumptions (ResearchGate, 2024), from inflation targeting frameworks to stability bias. This creates a dangerous feedback loop where AI systems confirm existing policy preferences rather than challenging them. As governments rush to deploy AI to make key economic decisions, the question we should ask ourselves isn’t whether AI systems perform better than humans, but rather if they can avoid preserving the biases that policymakers possess.
How AI reinforces our biases
The Federal Reserve Bank of St. Louis’s use of Google’s PaLM (Pathways Language Model) highlights how AI systems can reproduce flawed policymaking. The research team gave the language model decades’ worth of inflation data, Federal Reserve communications, and economic indicators. PaLM’s results were impressive: it consistently outperformed the Survey of Professional Forecasters, generating more accurate inflation predictions with swift processing times. However, this “superior” performance masked a fundamental flaw: the AI system was learning to replicate, not challenge, the Federal Reserve’s economic assumptions.
The root of the issue lies in the training data. The data was not neutral, as it embedded three decades’ worth of policy choices and economic assumptions. Examples include the Fed’s adoption of explicit inflation targeting, the stability assumptions of the Great Moderation era, and monetary policies that arose from economic shocks. Research conducted by the Columbia Business School found that economists’ underlying assumptions can shift policy estimates by 12 to 15 per cent towards more restrictive monetary policy, which means the communications used to train PaLM were already biased before they were given to the AI. When PaLM analysed thousands of Fed communications, it associated “good” economic policy with these approaches. The AI did not simply learn to predict inflation: it learned to predict inflation the way the economists whose communications it was trained on would. In essence, it absorbed their underlying assumptions about optimal policy responses and economic relationships.
These biases would have been costly during recent economic crises. During the 2021-2022 inflation surge, the Federal Reserve was slow to abandon its “transitory inflation” narrative, delaying interest rate increases until March 2022 (Powell, 2022). PaLM was trained on decades of Fed communications that emphasise gradual adjustment and patience with regard to inflation, which would have reinforced this hesitancy in the model rather than challenging it. The stability bias in the AI meant that it was suggesting gradual rate increases even when rapid monetary tightening was required (Yellen, 2022). Critically, PaLM’s training on data from before the COVID-19 pandemic meant that it missed new inflation dynamics such as supply chain disruptions and labour market restructuring. When the model was tested in 2021, its recommendations aligned well with the Fed’s own assessments, which suggests that it had learned to confirm rather than challenge institutional groupthink (St. Louis Fed, 2024).
Federal Reserve officials are aware of these documented policy risks. Governor Lael Brainard warned that AI models “harbour, if not amplify, the biases in their training data” (Brainard, 2021), while Governor Lisa Cook has raised similar concerns about bias amplification. The Treasury Department’s 2024 AI report identified “risks related to bias and third-party providers.” Despite these warnings, AI systems continue to be deployed for economic analysis because agencies prioritise AI’s superior performance metrics over bias concerns. Moreover, these agencies feel pressure to modernise, and they believe that they can manage these risks through gradual implementation (Minneapolis Fed, 2023). This is troubling: despite awareness of AI bias, there is no willingness to address it.

Why this matters for Economic Policy
When the Federal Reserve raises interest rates by even a quarter of a percentage point, millions of people notice changes in their mortgage payments, job prospects, and credit card rates. A 0.25 percentage point rate increase typically adds $13 monthly to a $300,000 mortgage payment, while credit card rates rise within billing cycles, which affects the 45 per cent of Americans who carry revolving debt (Federal Reserve, 2024). These consequential decisions increasingly rely on AI systems that may be reproducing the same assumptions that led to past policy failures. These risks are not just theoretical. Australia’s Treasury offers a valuable case study of what happens when economic policymakers rely heavily on biased forecasting models.
From 2008 to 2020, Australia’s Treasury failed to predict major economic shifts, from the impact of the Global Financial Crisis to the commodity price swings that define the Australian economy. A government review found that the Treasury had developed an “over-reliance on formal modelling” that assumed economic relationships would remain constant. While the models performed well during periods of stability, they failed during disruptions because they were built on data that promoted stability bias. Treasury economists were confident in the past performance of their models, which led them to miss warning signs and make flawed policy recommendations. The parallel with today’s AI systems is clear. Like the Treasury models, AI systems trained on historical data assume that past economic relationships still hold.
AI systems intensify these biases in three ways. First, they operate much faster than their human counterparts, which allows them to spread flawed assumptions through policy networks faster than human oversight can catch the errors. Second, these systems are “black-box” in nature, meaning that their decision-making processes are opaque and cannot be easily understood. This makes it harder for policymakers to see how an AI system’s recommendations could stem from biased assumptions. Finally, strong performance on traditional metrics such as Root Mean Squared Error (a measure of the accuracy of predictive models), processing speed comparisons, and accuracy rates against professional forecasters creates overconfidence and sidelines concerns about bias.
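To make the last point concrete, here is a minimal Python sketch of Root Mean Squared Error alongside a simple signed average error. The inflation figures and forecaster names are hypothetical, invented purely for illustration, and are not drawn from the St. Louis Fed study.

```python
import math


def _check(forecasts, actuals):
    # Both series must be non-empty and line up period by period.
    if len(forecasts) != len(actuals) or not forecasts:
        raise ValueError("need two non-empty series of equal length")


def rmse(forecasts, actuals):
    """Root Mean Squared Error: square root of the average squared miss."""
    _check(forecasts, actuals)
    squared = [(f - a) ** 2 for f, a in zip(forecasts, actuals)]
    return math.sqrt(sum(squared) / len(squared))


def mean_error(forecasts, actuals):
    """Average signed miss: negative means a tendency to under-predict."""
    _check(forecasts, actuals)
    return sum(f - a for f, a in zip(forecasts, actuals)) / len(forecasts)


# Hypothetical inflation figures (per cent), for illustration only.
actual = [2.0, 3.0, 4.0, 5.0]
forecaster_a = [2.0, 3.0, 4.0, 3.0]  # exact in calm periods, misses the final surge by 2 points
forecaster_b = [3.0, 2.0, 5.0, 4.0]  # off by 1 point each period, in alternating directions

for name, series in [("A", forecaster_a), ("B", forecaster_b)]:
    print(name, rmse(series, actual), mean_error(series, actual))
```

In this toy case both forecasters score an identical RMSE of 1.0, yet forecaster A has a mean error of -0.5 (it under-predicts once inflation surges) while forecaster B has a mean error of 0.0. A single accuracy score cannot distinguish the two, which is why a strong RMSE alone says little about whether a model carries a systematic lean.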

The performance metrics trap
By these metrics, the St. Louis Fed’s deployment of Google’s PaLM appeared successful. The AI consistently outperformed human economists, processed data swiftly, and generated more accurate inflation predictions. Yet these performance metrics ignored the critical question of whether AI systems challenge or simply reinforce existing institutional biases. Strong scores can create false confidence, leading institutions to trust AI recommendations even when they perpetuate biases.
What is absent from these evaluations are bias detection scores, explainability requirements, and measures of whether AI challenges institutional assumptions or simply confirms them. No government agency routinely tests whether its economic AI perpetuates historical biases or whether policymakers can distinguish sound AI recommendations from poor ones. Agencies should monitor when AI recommendations align suspiciously well with existing policy preferences, when systems consistently favour certain economic theories, when confidence intervals become unusually narrow during periods of uncertainty, and when AI fails to flag policy risks (NIST, 2024). Furthermore, we should look out for “automation bias,” which is the tendency to trust AI recommendations for the sole reason that they come from advanced technology. The Treasury Department’s 2024 AI report acknowledged the “risks related to bias” but did not provide a roadmap for measuring these risks. Federal Reserve officials warn about AI operating as “black boxes”, yet these systems continue to be deployed without explainability standards that would allow for meaningful oversight.
Over-reliance on AI in forecasting can engender a sense of false confidence. When AI systems excel at traditional performance metrics, agencies assume that they are deploying “good AI” without taking a deeper look at whether these systems are improving decision-making. Even if the AI is more statistically accurate than humans, we should not ignore questions of bias, transparency, and accountability. Ignoring them leads us to the wrong approach to defining what “good” AI is.
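Two of the warning signs described in this section, forecasts that track the institution’s own view too closely and confidence intervals that stay narrow when conditions are uncertain, can be sketched as simple checks. This is an illustrative Python sketch only: the function names, the tolerance, the minimum width, and all the numbers are hypothetical assumptions, not part of any agency’s or NIST’s actual procedure.

```python
def alignment_rate(model_forecasts, institution_forecasts, tolerance=0.25):
    """Share of periods in which the model lands within `tolerance`
    percentage points of the institution's own forecast."""
    pairs = list(zip(model_forecasts, institution_forecasts))
    if not pairs:
        raise ValueError("need at least one pair of forecasts")
    close = sum(1 for m, i in pairs if abs(m - i) <= tolerance)
    return close / len(pairs)


def narrow_when_uncertain(interval_widths, uncertain_periods, min_width=0.5):
    """Indices of periods flagged as uncertain in which the model's
    confidence interval is narrower than `min_width`."""
    return [
        idx
        for idx, (width, uncertain) in enumerate(zip(interval_widths, uncertain_periods))
        if uncertain and width < min_width
    ]


# Hypothetical data, for illustration only.
model = [2.0, 2.25, 2.5, 3.0]        # model inflation forecasts (per cent)
institution = [2.0, 2.0, 2.5, 4.0]   # institution's own forecasts (per cent)
widths = [1.0, 0.25, 0.75, 0.25]     # width of the model's confidence intervals
uncertain = [False, True, True, False]  # periods a reviewer marked as uncertain

print(alignment_rate(model, institution))        # share of closely aligned periods
print(narrow_when_uncertain(widths, uncertain))  # periods that deserve a closer look
```

With these made-up inputs, the model agrees with the institution in three of four periods (0.75) and period 1 is flagged for a narrow interval under uncertainty. Neither result proves bias; the point is that such checks give reviewers a prompt to ask why the model agrees, which accuracy scores alone never do.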
Defining what “good AI” looks like
Defining what “good” AI is for economic forecasting requires us to shift our focus from the effectiveness of these models to whether they enhance or undermine democratic decision-making. Good AI usage should be transparent, challenge existing assumptions, and be accountable for the decisions it informs, standards that should apply equally to all AI systems that aid in making critical decisions (Brookings, 2024). Current AI systems, despite their technical prowess, fall short on all three counts because they prioritise performance metrics.
Transparency means more than releasing the AI model summaries. We require explainable AI that allows policymakers and the public to understand how the AI system generates its recommendations and what assumptions the system makes.
The NIST AI Risk Management Framework provides a roadmap towards this goal through systematic bias testing, algorithmic impact assessments, and clear documentation of training data sources and embedded economic theories. Some of NIST’s protocols include requiring agencies to document all training data sources with explicit identification of the embedded assumptions, conducting regular bias assessments through statistical tests that compare AI recommendations against diverse economic perspectives, and explaining how AI decisions might affect different economic groups (NIST AI Framework, 2024).
Good AI usage should also challenge the assumptions presented to AI rather than just blindly accepting them. This can be done by reviewing an array of economic perspectives and flagging recommendations that align suspiciously well with existing policies.
Finally, accountability requires a clear chain of responsibility. Ensuring that the public is able to give input is critical. This could include regular public comment periods before major AI systems are deployed, advisory panels that review the AI evaluation criteria, and disclosure of how AI systems weigh different economic priorities. Just as Fed banks gather input from local communities, they should collect feedback on whether AI systems reflect diverse economic perspectives and community priorities.
If employed ethically, AI forecasting can offer an opportunity to make government economic decision-making more democratic than it currently is. Rather than conforming to existing thought processes, AI systems can be designed to expand and incorporate broader perspectives that are currently excluded.
Policymakers may even have incentives to embrace AI systems that challenge their current economic assumptions. Past policy failures, such as the 2008 financial crisis, are theorised to have stemmed from institutional groupthink that well-designed AI systems can help counteract (Bernanke, 2022). Officials would be able to demonstrate that a wide range of perspectives was considered rather than simply following institutional preferences. When Federal Reserve officials acknowledge the pitfalls of the AI systems they deploy, they face a choice: continue deploying AI that perpetuates their biases or harness technology that could finally break the cycle of institutional biases that have plagued economic forecasting for decades.