TaxBench is steered by the Tax AI Consortium, a body of The Intelligence Consortium.
Introduction
Benchmarks are essential for tracking large language model (LLM) capabilities and for improving models through reinforcement learning. Despite their importance, specialized domains such as taxation remain critically understudied, even though they demand distinctive reasoning across legal interpretation, mathematical calculation, and regulatory context. We introduce TaxBench, the definitive benchmark for evaluating LLMs on tax-related tasks. TaxBench features a comprehensive assessment framework with hyper-granular grading rubrics that enable consistent evaluation by both humans and LLMs. Our evaluation of leading models reveals significant capability gaps in tax reasoning, establishing clear targets for future development. To keep the benchmark relevant in this evolving domain, we establish the Tax AI Consortium as a governing body to oversee submissions and regular updates. TaxBench provides the standard for measuring LLM tax proficiency, driving innovation at the intersection of artificial intelligence and taxation.
Dataset
TaxBench is a comprehensive evaluation framework comprising expert-crafted questions and detailed grading rubrics spanning diverse tax domains, including personal taxation, corporate taxation, international tax law, estate planning, tax procedure, and specialized areas such as cryptocurrency taxation and nonprofit tax compliance.
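The rubric format is not specified here. As a minimal sketch, assuming each rubric is a list of narrow, independently checkable criteria with point weights, and that a human or LLM grader marks each criterion as met or unmet, scoring could work as follows. The `Criterion` and `Rubric` classes, the weights, and the example criteria are hypothetical illustrations, not TaxBench's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Criterion:
    """One atomic requirement a grader can judge as met or unmet (hypothetical schema)."""
    description: str
    points: float


@dataclass
class Rubric:
    """A hypothetical rubric: a question ID plus weighted criteria."""
    question_id: str
    criteria: List[Criterion] = field(default_factory=list)

    def max_points(self) -> float:
        # Total points available across all criteria.
        return sum(c.points for c in self.criteria)

    def score(self, verdicts: List[bool]) -> float:
        """Return the fraction of available points earned.

        verdicts[i] is the grader's yes/no judgment on criterion i.
        """
        if len(verdicts) != len(self.criteria):
            raise ValueError("need exactly one verdict per criterion")
        total = self.max_points()
        if total <= 0:
            raise ValueError("rubric must have positive total points")
        earned = sum(c.points for c, ok in zip(self.criteria, verdicts) if ok)
        return earned / total


# Illustrative criteria loosely modeled on the sample question below (hypothetical).
rubric = Rubric(
    "intl-corp-sample",
    [
        Criterion("Analyzes where the services are performed to determine the source of the income", 2),
        Criterion("Considers the relevance of the U.S.-Country X tax treaty", 1),
        Criterion("Addresses whether ABC Inc. has withholding obligations under Country X law", 1),
    ],
)
print(rubric.score([True, False, True]))  # 3 of 4 points -> 0.75
```

Keeping each criterion binary and narrow is one plausible way to help human and LLM graders reach the same verdicts; this is a design assumption for the sketch, not a documented TaxBench choice.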
Sample Questions
International Corporate Tax
Question: ABC Inc., a U.S.-based corporation, enters into an agreement with Global Tech Ltd., a software development company incorporated and resident in Country X. Under the agreement, Global Tech Ltd. will develop custom software exclusively for ABC Inc.'s operations in the United States. The development work will be performed entirely in Country X by employees of Global Tech Ltd.
The total contract value is $500,000, and payments are to be made in installments over the course of 2025. There is a tax treaty between the United States and Country X, which includes provisions to avoid double taxation and reduce withholding rates on certain types of income.
ABC Inc. needs to determine:
- Whether it is required to withhold U.S. taxes on the payments to Global Tech Ltd.
- Whether it has any obligations to withhold taxes under Country X's tax laws.
A sample of the diverse and challenging questions in TaxBench.
Results
Live Leaderboard Coming Soon
Discussion
Humanity's Last Tax Evaluation
TaxBench serves as the definitive, and last necessary, benchmark for tax reasoning in AI systems. Unlike static benchmarks, which quickly become outdated in dynamic domains, TaxBench is a living benchmark: the Tax AI Consortium's commitment to regular updates and expert validation keeps it relevant as tax regulations and policies evolve. Its coverage of personal, corporate, international, and specialized tax areas, combined with hyper-granular evaluation rubrics, addresses the full spectrum of tax reasoning challenges. Because the Consortium's governance structure lets the framework absorb future regulatory changes and emerging tax concepts, there is no need for fragmented, specialized benchmarks that would make progress tracking difficult. By establishing a single, authoritative standard with built-in mechanisms for evolution, we create a sustainable framework for measuring, comparing, and advancing AI tax reasoning for years to come, regardless of how tax systems or AI technologies develop.
Impact
TaxBench represents a significant advancement in domain-specific AI evaluation, with implications extending beyond academic research. By improving AI tax reasoning capabilities, we can democratize access to tax expertise, potentially benefiting millions of individuals and businesses who cannot afford professional consultation. Enhanced tax reasoning in AI systems could dramatically reduce compliance burdens, which currently cost taxpayers billions annually in administrative expenses and professional fees. For policymakers, stronger AI tax reasoning enables more sophisticated analysis of proposed regulations, clarifying their distributional effects and unintended consequences. These advancements could ultimately contribute to more equitable tax systems by ensuring that all taxpayers, regardless of resources, can properly navigate complex tax codes and receive the benefits and protections to which they are entitled.