arXiv imposes one-year ban for unchecked AI errors: What researchers need to know

The preprint repository arXiv now bans authors for one year if they submit papers containing obvious AI-generated mistakes like hallucinated references or fabricated results. With monthly submissions up more than 50% since ChatGPT launched and monthly rejections up fivefold, the platform treats AI slop as an existential threat.

13 min read · Yash Thakker
AI safety, Academic publishing, LLM hallucinations, Research integrity, arXiv, Scientific publishing

The preprint repository arXiv has moved from gentle moderation to hard enforcement: researchers who submit papers containing obvious AI-generated mistakes—hallucinated references, fabricated results, or other unverified outputs from large language models—now face a one-year ban from the platform. For many researchers, losing arXiv access for a year (or more, given the second-stage penalty described below) is a career setback that could derail grants, job searches, and collaboration.

The announcement, outlined by Thomas G. Dietterich (an arXiv moderator for the cs.LG machine learning category) in a post on X this week, treats the issue not as a technical curiosity but as an existential threat to the scientific preprint system. Here is what changed, why it matters, and what researchers and teams building AI-assisted workflows should understand about verification before publication.


The new enforcement policy: one year off arXiv

Under arXiv's Code of Conduct, every listed author accepts full responsibility for a paper's contents—regardless of whether sections were drafted by a human, an LLM, or a hybrid workflow. If a submission includes what Dietterich calls "inappropriate language, plagiarized content, biased content, errors, mistakes, incorrect references, or misleading content" that clearly came from generative AI and was not verified, the consequences are now codified:

  1. Immediate one-year suspension from submitting to arXiv.
  2. After the ban expires, returning authors must first secure acceptance at a reputable peer-reviewed venue before arXiv will consider new submissions.

That second step is the hidden sting. For many researchers, the preprint on arXiv is part of the pipeline to peer review—posting a draft, gathering feedback, iterating, then formally submitting to a journal or conference. Without arXiv access, you lose that feedback loop and the visibility that helps with hiring, funding, and citations. The effective penalty duration can be well beyond a year if you cannot land a peer-reviewed acceptance quickly.

Mathematician Thomas Bloom clarified on X that arXiv is "not banning the use of AI, or papers which used AI to generate proofs, code, etc." The policy targets only those who upload papers containing clear evidence that AI-generated content was not checked—the hallucinated citation, the nonsense statistic, the plagiarized paragraph lifted wholesale from training data.


Why arXiv calls AI slop an "existential threat"

The numbers tell the story. According to Nature, monthly submissions to arXiv have jumped more than 50% since the launch of ChatGPT in late 2022. Over the same period, the platform's monthly rejections have increased fivefold to more than 2,400 papers. That is not a small moderation headache—it is a flood that moderators describe as overwhelming.

Paul Ginsparg, a physicist at Cornell University and co-founder of arXiv, told Nature that AI-generated submissions "frequently can't be discriminated just by looking at [the] abstract, or even by just skimming full text." In other words, the surface plausibility is high enough that spotting fabrications requires close reading—a task that does not scale when thousands of new papers arrive each month. Ginsparg called the phenomenon an "existential threat" to the system, which was designed to accelerate trustworthy research sharing, not to act as a spam filter for LLM outputs.

The enforcement shift signals that arXiv's team has decided individual accountability is more practical than trying to build an AI-slop classifier at the moderation layer. By penalizing authors, the platform shifts the verification burden upstream to the people who control what gets submitted in the first place.


What triggers the ban: hallucinated references, fabricated data, unverified claims

Dietterich's post, together with reporting from Nature and analyses of submissions to conferences like NeurIPS and ICLR, suggests the following patterns are the most likely to trigger bans or rejections:

  • Hallucinated citations: Paper titles, author names, journal references, or DOIs that do not exist. This is the classic LLM failure mode—plausible-sounding references that no one can find in any database.
  • Fabricated experimental results: Tables, statistics, or plots that were generated by the model rather than derived from real data or simulation.
  • Plagiarized text: Sections lifted from training corpora without attribution, or supposed paraphrases that are in fact near-verbatim copies.
  • Biased or inappropriate language: Content that reflects training-data biases or stylistic artifacts (e.g., overly promotional phrasing, generic summaries) that a human author would typically catch and revise.
  • Misleading or incorrect claims: Statements of fact that are wrong but presented confidently, a failure mode we covered in detail in our hallucination explainer.

The policy does not ban AI use for legitimate tasks: generating code, checking proofs, drafting introductions, or running simulations. The line is verification. If you use an LLM to help with your bibliography, you are expected to cross-check every citation against a real database (e.g., Google Scholar, PubMed, Semantic Scholar) before submission. If you use it to draft a methods section, you are expected to read and revise to ensure accuracy.


Previous steps: the October 2025 review-article ban

The one-year ban builds on measures arXiv introduced last fall. In October 2025, the platform announced it would no longer accept computer science review articles and position papers unless they had already been peer reviewed elsewhere. The rationale was the same: an "unmanageable influx" of low-quality, AI-generated submissions that moderators described as "little more than annotated bibliographies"—papers that looked like literature surveys but were actually stitched together by an LLM with minimal human curation.

At the time, arXiv framed the move as a stricter enforcement of existing standards rather than a wholly new policy. The new ban on individual authors is the next escalation: instead of filtering categories of papers, arXiv is now holding people accountable for what they submit, regardless of category.

The message is clear: you own your submission. If your name is on the author list and the paper contains fabricated content, you bear the consequences even if you did not personally run the LLM prompt that generated the error.


Broader context: EY retractions, Deloitte flags, conference submissions

arXiv's crackdown is part of a wider reckoning over AI-generated misinformation in professional and academic publishing:

  • EY retracted a cybersecurity report after it was found to contain AI hallucinations.
  • GPTZero has flagged similar issues in Deloitte reports and submissions to leading machine learning conferences like NeurIPS and ICLR.
  • A separate study highlighted by Forbes documented an alarming increase in fabricated citations across published scientific papers—not just preprints.
  • Microsoft published research in February on "AI Recommendation Poisoning", identifying techniques companies use to embed hidden instructions in web content designed to manipulate AI systems' memory and outputs.
  • Google expanded its spam policies to cover AI search manipulation, addressing tactics for gaming AI-generated summaries and search results.

The pattern across all these incidents is the same: fluent-sounding text that looks authoritative but contains errors that only appear under close inspection. For publishers, conference organizers, and platforms like arXiv, the cost of that inspection is unsustainable at scale, so enforcement is shifting to authors and institutions.


What this means for researchers using AI tools

If you use ChatGPT, Claude, Gemini, or any other LLM as part of your research workflow, the arXiv policy does not ask you to stop. It asks you to verify before you submit. Here is what that looks like in practice:

1. Treat LLM outputs as drafts, not final text

If an assistant drafts a paragraph, a citation list, or a methods section, read it as if it came from an untrusted collaborator. Check facts, cross-reference sources, and rewrite where needed. The model is a tool, not a co-author with epistemic responsibility.

2. Verify every citation before submission

Use Google Scholar, PubMed, Semantic Scholar, or your institutional library to confirm that every cited paper exists, the author names are correct, the journal or venue is real, and the publication year matches. If you cannot find it in a trusted database, do not include it.
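
A minimal sketch of that check, assuming a hypothetical `lookup` function that returns the trusted record for a title, or `None` when nothing is found. The `Citation` type, its field names, and the in-memory `TRUSTED_INDEX` are placeholders for illustration; in practice the lookup would query whichever database you trust.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Citation:
    title: str
    first_author: str
    venue: str
    year: int


# Hypothetical lookup signature: trusted record for a title, or None if absent.
Lookup = Callable[[str], Optional[Citation]]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences match."""
    return " ".join(text.lower().split())


def check_citation(cited: Citation, lookup: Lookup) -> List[str]:
    """Return the problems found; an empty list means every field matched."""
    record = lookup(cited.title)
    if record is None:
        return ["not found in trusted database"]
    problems = []
    if normalize(record.first_author) != normalize(cited.first_author):
        problems.append("author mismatch")
    if normalize(record.venue) != normalize(cited.venue):
        problems.append("venue mismatch")
    if record.year != cited.year:
        problems.append("year mismatch")
    return problems


# Placeholder index standing in for a real database (these are not real papers).
TRUSTED_INDEX: Dict[str, Citation] = {
    normalize("Example Paper A"): Citation("Example Paper A", "Doe", "Example Journal", 2021),
}


def demo_lookup(title: str) -> Optional[Citation]:
    return TRUSTED_INDEX.get(normalize(title))


draft = [
    Citation("Example Paper A", "Doe", "Example Journal", 2021),  # matches the record
    Citation("Example Paper A", "Doe", "Example Journal", 2019),  # wrong year
    Citation("Example Paper B", "Roe", "Example Journal", 2020),  # not in the index
]
report = {i: check_citation(c, demo_lookup) for i, c in enumerate(draft)}
```

Any citation with a non-empty problem list stays out of the bibliography until a human has resolved it against the source.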

3. Run your own checks for plagiarism and bias

Tools like Turnitin, iThenticate, and even simple Google searches of key phrases can catch verbatim copying. For bias, read your draft out loud or ask a colleague to review sections you did not write entirely yourself. LLM-generated text often has tell-tale phrasing (e.g., "it is worth noting," "in today's rapidly evolving," "a comprehensive approach") that signals low-effort synthesis.
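
As a rough illustration, a phrase scan like the one below can surface candidates for rewriting. The phrase list contains only the three examples above, and the function name is an assumption made for this sketch. A hit is a prompt to reread the passage, not evidence that a model wrote it.

```python
import re

# The three example phrases from the paragraph above; extend with your own list.
TELLTALE_PHRASES = (
    "it is worth noting",
    "in today's rapidly evolving",
    "a comprehensive approach",
)


def flag_telltale_phrases(text: str) -> list:
    """Return the listed phrases that occur in the text, ignoring case and line breaks."""
    # Map curly apostrophes to straight ones, then collapse all whitespace runs.
    normalized = re.sub(r"\s+", " ", text.lower().replace("\u2019", "'"))
    return [phrase for phrase in TELLTALE_PHRASES if phrase in normalized]


sample = "It is worth noting that we propose a comprehensive\napproach to evaluation."
hits = flag_telltale_phrases(sample)
```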

4. Use retrieval and tool-backed workflows where possible

For factual claims, the durable pattern is to ground the answer in retrieval-augmented generation (RAG) or tool calls rather than letting the model "freestyle" from its parametric memory. If your research involves literature lookups or data analysis, consider using MCP servers or agent skills that query trusted databases (e.g., arXiv's own API, PubMed, institutional repositories) rather than relying on the model's internal knowledge.

This is what we call epistemic infrastructure at ExplainX: building verification into your workflow so you do not rely on willpower in every single message. See our guide on what MCP is for examples of tool-first stacks.
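
To make the tool-backed idea concrete, here is a small sketch that builds a title search against arXiv's public query API and reads titles out of the Atom feed it returns. The helper names and the sample feed are assumptions for illustration, and the network call is left out so the example runs offline.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace used by the feed


def build_title_query(title: str, max_results: int = 5) -> str:
    """Build an arXiv API URL that searches the title field for an exact phrase."""
    params = {"search_query": f'ti:"{title}"', "start": 0, "max_results": max_results}
    return f"{ARXIV_API}?{urlencode(params)}"


def titles_from_feed(atom_xml: str) -> list:
    """Extract entry titles from an Atom feed, collapsing internal whitespace."""
    root = ET.fromstring(atom_xml)
    return [
        " ".join(entry.findtext(f"{ATOM}title", default="").split())
        for entry in root.iter(f"{ATOM}entry")
    ]


# Hand-written sample feed (placeholder data, not a real API response).
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>query results</title>
  <entry><title>Example Paper
    A</title></entry>
</feed>"""

url = build_title_query("Example Paper A")
found = titles_from_feed(SAMPLE_FEED)
```

In a real workflow the URL would be fetched (for example with `urllib.request.urlopen`) and the response passed to `titles_from_feed`. Check arXiv's API documentation for current rate limits before running this in a loop.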

5. Know when to abstain from AI assistance

If your task sits in the high-risk zone for hallucinations—specific citations, obscure topics, exact identifiers like DOIs or legal references—treat that as a signal to not use an LLM for that section, or to use it only as a search assistant (e.g., "find papers on X") and then manually verify every result.

For more on when LLMs are likely to hallucinate and how to reduce the risk, see our hallucination explainer.


ExplainX perspective: verification as infrastructure, not discipline

At ExplainX, we build skills, MCP servers, and training so teams can work with AI agents without mistaking fluency for truth. The arXiv ban is a symptom of a broader problem: most people do not have workflows that enforce verification automatically. They rely on remembering to check, which fails under deadline pressure or when the model's output looks convincing.

Here is what we recommend beyond the user-level prompt tips:

Separate "language" from "ground truth"

For factual questions, the durable pattern is: retrieval (search, RAG, internal docs) or tool-backed answers before the model generates from memory. The model is a synthesizer; the corpus or API you trust is the authority. This is what MCP enables in tool-first stacks.
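
A minimal sketch of that ordering, assuming hypothetical `retrieve` and `generate` callables: retrieval runs first, and the generator is never called when no trusted source comes back. The tiny `CORPUS` and the keyword matcher stand in for a real search index and model.

```python
from typing import Callable, Dict, List


def grounded_answer(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> Dict[str, object]:
    """Retrieve first; generate only from retrieved sources, otherwise abstain."""
    sources = retrieve(question)
    if not sources:
        # No trusted source: abstain rather than answer from the model's memory.
        return {"status": "abstained", "answer": None, "sources": []}
    return {"status": "grounded", "answer": generate(question, sources), "sources": sources}


# Stand-ins for a real retriever and model (illustration only).
CORPUS = {"ban length": "Policy note: the suspension lasts one year."}


def keyword_retrieve(question: str) -> List[str]:
    return [text for key, text in CORPUS.items() if key in question.lower()]


def quote_sources(question: str, sources: List[str]) -> str:
    return " ".join(sources)


known = grounded_answer("What is the ban length?", keyword_retrieve, quote_sources)
unknown = grounded_answer("Who reviewed the paper?", keyword_retrieve, quote_sources)
```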

Encode verification in your agent setup, not only in your head

If you use Claude Code, custom GPTs in ChatGPT, or your own agents, consider building skills that automatically cross-check citations against bibliographic APIs such as Semantic Scholar or Crossref (Google Scholar offers no official public API), or that require a human approval step before inserting references into a draft. The goal is to make "verify before claim" the default path, not a manual override.
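
One way such a gate could look, as a sketch rather than a finished skill: a reference enters the bibliography only if an automated check passes and a human approves it. The `verify` and `approve` callables, and the `KNOWN_TITLES` set, are hypothetical placeholders.

```python
from typing import Callable, List


def add_reference(
    bibliography: List[str],
    reference: str,
    verify: Callable[[str], bool],
    approve: Callable[[str], bool],
) -> str:
    """Append the reference only if the automated check passes and a human approves."""
    if not verify(reference):
        return "rejected: failed automated check"
    if not approve(reference):
        return "rejected: human reviewer declined"
    bibliography.append(reference)
    return "inserted"


# Placeholder check; a real one would query a bibliographic database.
KNOWN_TITLES = {"Example Paper A"}


def in_known_titles(reference: str) -> bool:
    return reference in KNOWN_TITLES


bibliography: List[str] = []
first = add_reference(bibliography, "Example Paper A", in_known_titles, lambda ref: True)
second = add_reference(bibliography, "Example Paper B", in_known_titles, lambda ref: True)
third = add_reference(bibliography, "Example Paper A", in_known_titles, lambda ref: False)
```

In interactive use, `approve` could wrap a prompt to the author. What matters is that nothing reaches the draft by default.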

Treat "abstention" as a feature, not a bug

LLMs are trained to be helpful, which can nudge them toward a satisficing answer when the truthful move is "I don't know" or explicit uncertainty. If your workflow rewards confident-sounding answers (e.g., you paste output directly into a paper), you are selecting for hallucinations. Redesign prompts and evaluation to reward hedging and source-citing over fluency alone.
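
One simple way to reward hedging in an evaluation is negative marking: a wrong answer costs more than an abstention. The sketch below assumes three outcome labels and a penalty of 2 points, both chosen purely for illustration.

```python
from typing import List


def score(outcomes: List[str], wrong_penalty: float = 2.0) -> float:
    """Score a run: +1 per correct answer, 0 per abstention, -wrong_penalty per error."""
    points = {"correct": 1.0, "abstain": 0.0, "wrong": -wrong_penalty}
    return sum(points[outcome] for outcome in outcomes)


# Two hypothetical runs over the same four questions.
always_answers = ["correct", "correct", "wrong", "wrong"]
abstains_when_unsure = ["correct", "correct", "abstain", "abstain"]

confident_score = score(always_answers)        # 1 + 1 - 2 - 2
hedging_score = score(abstains_when_unsure)    # 1 + 1 + 0 + 0
```

With the penalty set to zero the two runs tie, which is the situation in which confident guessing is never discouraged.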

Educate teams, not just individuals

The arXiv ban penalizes authors, but the risk often sits with junior researchers or students who may not know how to spot a hallucinated citation. If you lead a lab or research group, make AI literacy part of onboarding: what LLMs are good at, where they fail, and what verification looks like in practice. Our courses and blog cover these patterns in depth.


The bigger picture: academic publishing under AI pressure

arXiv's one-year ban is not an isolated policy shift. It sits within a broader transformation of how academic and professional publishing handles AI-generated content:

  • Conferences like NeurIPS and ICLR now run hallucination audits on submissions and have rejected papers with fabricated citations.
  • Journals are updating author guidelines to require disclosure of AI use and verification of AI-generated text.
  • Fact-checking services and plagiarism detectors are adding LLM-detection features, though these are not foolproof (especially as models improve).
  • Institutional review boards and ethics committees are beginning to treat unverified AI outputs as a form of research misconduct, similar to data fabrication.

The arXiv ban is a signal that the research community is moving from "AI is a curiosity" to "AI requires accountability". If you publish research, you now operate in an environment where LLM-assisted does not mean LLM-verified, and the gap between those two states can cost you access to critical infrastructure like preprint servers.


What to do now

If you are a researcher, graduate student, or lab lead who uses AI tools in your workflow:

  1. Audit your current process: Are you verifying citations? Checking facts? Reading LLM drafts as untrusted until proven otherwise?
  2. Build verification into tooling: Use MCP servers for database queries, skills that enforce human-in-the-loop for high-stakes claims, or scripts that cross-check references against bibliographic APIs such as Semantic Scholar or Crossref.
  3. Educate your team: Make sure everyone who touches a paper knows what hallucinations are, how to spot them, and why "it looked good" is not a defense.
  4. Treat arXiv access as infrastructure: Losing a year (or more) of preprint access is a serious career setback. Do not risk it by skipping verification steps.
  5. Follow updates: arXiv, journals, and conferences are still figuring out enforcement details. Subscribe to moderation updates or check the arXiv blog for clarifications.
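
As a starting point for the audit in step 1, a short script can pull every DOI out of a bibliography so each one can be resolved by hand. This is a sketch: the regular expression follows a commonly used pattern for modern DOIs and will miss unusual ones, and the sample entries use placeholder identifiers.

```python
import re

# Common pattern for modern DOIs; unusual or legacy DOIs may not match.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def extract_dois(bibliography_text: str) -> list:
    """Return unique DOIs in order of first appearance (case-insensitive dedupe)."""
    seen = set()
    dois = []
    for match in DOI_PATTERN.findall(bibliography_text):
        # Trim trailing punctuation from the surrounding sentence. This would also
        # trim a DOI that genuinely ends in one of these characters.
        doi = match.rstrip(".,;)")
        if doi.lower() not in seen:
            seen.add(doi.lower())
            dois.append(doi)
    return dois


# Placeholder entries (not real references).
SAMPLE_BIBLIOGRAPHY = """Doe, J. (2020). Example title. https://doi.org/10.1000/xyz123.
Roe, R. (2021). Another example. doi:10.1000/ABC-456; see also 10.1000/xyz123"""

to_verify = extract_dois(SAMPLE_BIBLIOGRAPHY)
```

Each extracted DOI still has to be resolved and compared with the cited title and authors; a syntactically valid DOI can still be hallucinated.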

For teams building AI-assisted research tools or workflows, this is a call to design for verification, not just speed. The platforms that will win in this environment are the ones that make it easy to be rigorous, not the ones that make it easy to be fluent.


Final thoughts

The arXiv one-year ban is a warning shot for the research community: AI tools are powerful, but unchecked AI outputs are now a serious career risk. The policy does not ban AI; it bans negligence. If you use LLMs to draft, cite, or summarize, you are expected to verify before you submit, and if you do not, the consequences are now codified and public.

For researchers, the takeaway is simple: fluency is not accuracy. For teams building AI-assisted workflows, the takeaway is: verification must be infrastructure, not discipline. The platforms and tools that make rigorous workflows easy will be the ones researchers trust and adopt. The ones that optimize only for speed will be the ones that get their users banned.


About ExplainX: We build skills, MCP servers, and training for teams working with AI agents—so verification, safety, and epistemic hygiene are part of the architecture, not an afterthought. Explore our blog for practical guides on LLMs, hallucinations, agent workflows, and research integrity.


This post summarizes arXiv's May 2026 enforcement policy on AI-generated errors and situates it within the broader landscape of academic publishing under AI pressure. All interpretations and ExplainX-specific recommendations are editorial; consult arXiv's official documentation and your institution's research integrity office for authoritative guidance.
