
Don't Paste Unpublished Research Into ChatGPT

Arpit Tripathi · June 25, 2026 · 11 min read

Upload an unpublished manuscript to ChatGPT and you may hand your research to a vendor that trains on it. The confidentiality risk, and the fix.

You finished the manuscript at 1 a.m. and you want to paste it into ChatGPT to tighten the abstract. Don't. On a standard personal ChatGPT account, model training is on by default, so an unpublished result you have not yet defended in print can land inside a vendor's training pipeline the moment you hit enter. The opt-out does nothing until you set it yourself, and OpenAI states plainly that once text enters a training corpus it cannot be removed.

Most advice about AI in academia fixates on plagiarism and authorship disclosure. That is the wrong fire. The sharper risk for anyone holding unpublished data is confidentiality: pasting a manuscript, a grant narrative, or a peer-review file into a consumer chatbot can breach a publisher policy, violate a funder's rules, or simply expose a novel idea before you have claimed it. This is not the academic-integrity question of whether AI wrote your words. It is the older, harder question of who else now holds your data.

Publishers already forbid uploading manuscripts under review

If you are a peer reviewer, the rule is explicit. Elsevier's reviewer policy states that reviewers must not upload a submitted manuscript, or any part of it, into a generative AI tool, because doing so can infringe the authors' confidentiality and intellectual property rights and, where the text contains personally identifiable information, may breach data privacy. AI tools may be used only in a limited supportive way, such as improving the language of the review report itself, never by feeding in the manuscript.

Read the part publishers rarely get credit for spelling out. The underlying logic is the same whatever the tool: uploading a manuscript means sharing confidential, unpublished research with a third-party system you do not control. That kills the most common rationalisation in academia, that a paid or privacy-branded tool makes the paste safe. The label on the product does not change the obligation you signed.

Insight

A manuscript under review is confidential by definition. The moment it leaves your laptop for a system you do not control, that confidentiality is gone, and no 'private' label on the tool brings it back.

Elsevier is not an outlier. The International Committee of Medical Journal Editors, whose recommendations shape practice across thousands of journals, says reviewers should not supply an author's manuscript to an AI tool where confidentiality cannot be assured, unless the journal explicitly permits it, because many AI tools retain uploaded content with no controls around its future use. ICMJE also tells reviewers to request the journal's permission before using any AI in their review, and to disclose which tool they used and for what purpose.

Two patterns repeat across these policies. The prohibition is about confidentiality, not output quality, so a smarter model does not make the upload acceptable. And the burden sits with you, the reviewer, not the platform: the journal cannot see what you paste into a chat window, which is exactly why the agreement you sign puts the obligation on your judgment rather than on a system anyone can audit.

Funders go further: NIH banned generative AI in peer review outright

The US National Institutes of Health prohibits its scientific peer reviewers from using generative AI to analyse and formulate critiques of grant applications and contract proposals. The notice, NOT-OD-23-149, was released on June 23, 2023 to clarify the agency's earlier confidentiality policy, and NIH revised its confidentiality agreements so reviewers must certify they will not upload or share content or original concepts from applications, proposals, or critiques to any online generative AI tool. The reason is blunt: using such tools requires sharing application material, which violates the confidentiality of the review.

Read that against your own habits. A federal funder treats a single paste into a chatbot as a confidentiality violation serious enough to rewrite a binding agreement over. The same paste of your own unpublished manuscript is not a casual convenience. It is the same act, minus the agreement you signed, and the data leaves your control just as completely.

Consumer AI can train on your inputs unless you opt out

Here is the mechanism that makes all of this concrete. OpenAI improves its models by training on the conversations people have with ChatGPT, and for free, Plus, and Pro personal accounts that is on by default. You switch it off under Settings, Data Controls, by turning off the toggle labelled 'Improve the model for everyone', after which new conversations are not used for training. Business, Enterprise, and API usage are different: OpenAI says it does not train on those inputs by default.

So the question is not abstract. On a personal account with default settings, the text you paste can feed the next model version, and OpenAI is clear that the opt-out only applies to conversations sent after you flip it. For a polished blog post that costs you nothing. For an unpublished dataset, a method you have not yet patented, or a grant idea a competing lab would love to see, it is a different calculation entirely. The risk is twofold: a confidentiality breach against a policy you agreed to, and the older fear with a new vector, getting scooped.

There is a further wrinkle worth knowing. Opting out of training is not the same as opting out of all retention. Vendors keep conversations for fixed windows for safety and abuse review, and some reserve the right to use flagged conversations even when a user has opted out. Treat the opt-out as a partial measure: it reduces the training exposure but does not turn a consumer chatbot into a vault. For genuinely confidential work, the safer answer remains keeping it off the tool, not trusting a toggle to make it safe.

Pro Tip

Before pasting anything sensitive into a personal ChatGPT account, open Settings, Data Controls, and turn off the 'Improve the model for everyone' toggle. Opt-out is not the default, so the burden is on you, and it only protects conversations sent after you switch it.

ChatGPT is not the only place this applies. As of June 2026, under Anthropic's consumer terms, updated August 28, 2025, chats and coding sessions on Claude Free, Pro, and Max plans are used for training unless the user opts out, and choosing to allow training extends data retention to five years rather than the prior 30-day window. As with OpenAI, the commercial tiers, including Claude for Work and API access, are excluded from default training. The pattern is consistent across vendors: consumer plans default toward training, business and API plans do not.
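If you want to keep that pattern in front of you, it fits in a few lines of Python. This is a minimal sketch that encodes only the defaults described in this article; the tier names, the `trains_by_default` helper, and the `exposed_to_training` helper are illustrative assumptions, not anything a vendor publishes or exposes through an API, and the defaults themselves can change.

```python
# Defaults as described in this article, hard-coded for illustration.
# Nothing here queries a vendor; verify against current vendor documentation.
CONSUMER_TIERS = {
    ("openai", "free"), ("openai", "plus"), ("openai", "pro"),
    ("anthropic", "free"), ("anthropic", "pro"), ("anthropic", "max"),
}
COMMERCIAL_TIERS = {
    ("openai", "business"), ("openai", "enterprise"), ("openai", "api"),
    ("anthropic", "claude for work"), ("anthropic", "api"),
}


def trains_by_default(vendor: str, tier: str) -> bool:
    """Return True if the article describes this tier as training on inputs by default."""
    key = (vendor.strip().lower(), tier.strip().lower())
    if key in CONSUMER_TIERS:
        return True
    if key in COMMERCIAL_TIERS:
        return False
    # Unknown combinations are an error rather than a silent "safe" answer.
    raise ValueError(f"Unknown vendor/tier {key!r}; check the vendor's policy")


def exposed_to_training(vendor: str, tier: str, opted_out: bool) -> bool:
    """Training exposure only. Retention windows and flagged-conversation
    exceptions are deliberately not modelled here."""
    return trains_by_default(vendor, tier) and not opted_out


if __name__ == "__main__":
    print(exposed_to_training("OpenAI", "Plus", opted_out=False))
    print(exposed_to_training("Anthropic", "API", opted_out=False))
```

An unknown tier raises an error on purpose: when you cannot tell which bucket an account falls in, the safe assumption is that you have not checked yet.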

The quieter stake: getting scooped

Confidentiality policy is the formal risk. The one that keeps early-career researchers up at night is priority. A novel hypothesis, an unexpected effect size, a clever assay: these are the things you have not yet staked a public claim to. Being first is the whole currency of academic credit, and it evaporates the instant someone else publishes the same idea.

Pasting that idea into a chatbot does not mean a rival reads it tomorrow. The realistic harm is subtler. Content used for training can surface, in transformed form, in later model outputs that anyone can prompt, so your phrasing of a research direction becomes part of the statistical soup the model draws from. You are not guaranteed to be scooped. You have simply, and voluntarily, reduced your control over the one asset academic careers are built on. Weigh that against the five minutes a chatbot saves you on an abstract.

The safe workflow for researchers who still want AI help

You do not have to abandon AI to protect your work. You have to be deliberate about what you feed it and which account you feed it from. The distinction that matters is not whether AI is allowed in your field. It is which specific text is sensitive and which is already public.

  • Never upload a manuscript you are reviewing. This is non-negotiable under Elsevier-style policies and NIH rules. Summarise nothing, paste nothing.
  • Keep your own unpublished data out of default personal accounts. If the training toggle is on, the content can be retained and reused.
  • Opt out of model training, or use an account tier that does not train on inputs (business, enterprise, or the API).
  • Separate the safe from the sensitive. Use AI freely for already-published text, general background, and writing mechanics. Withhold unpublished results, identifiers, and novel methods.
  • Check your publisher's and funder's specific policy before any AI-assisted step, because they differ and they change.
| What you paste | Confidentiality risk | Safer move |
| --- | --- | --- |
| Manuscript you are reviewing | Policy violation; breaches author confidentiality | Do not paste at all; AI only for your own review wording |
| Your unpublished data or results | Can be trained on by default; scoop risk | Opt out of training or use a non-training tier |
| Grant narrative or original idea | Funder may prohibit; exposes the idea early | Keep it off third-party chatbots entirely |
| Already-published paper text | Low; it is public | Use freely for summaries and Q&A |
| Writing mechanics, no data | Low if no confidential content included | Fine; paste prose, not findings |
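The same decision can be written down as a pre-paste check. The sketch below is one possible encoding of the guidance in this section, not a compliance tool: the `Content` categories, the `Verdict` type, and the `check_before_paste` function are hypothetical names introduced here, and you still have to classify your own text honestly before calling it.

```python
from enum import Enum
from typing import NamedTuple


class Content(Enum):
    """Categories of pasted material covered in this section (illustrative names)."""
    MANUSCRIPT_UNDER_REVIEW = "manuscript you are reviewing"
    OWN_UNPUBLISHED_RESULTS = "your unpublished data or results"
    GRANT_OR_ORIGINAL_IDEA = "grant narrative or original idea"
    PUBLISHED_TEXT = "already-published paper text"
    WRITING_MECHANICS = "writing mechanics, no data"


class Verdict(NamedTuple):
    ok_to_paste: bool
    advice: str


def check_before_paste(content: Content, training_disabled: bool = False) -> Verdict:
    """Map a content category to the 'safer move' described in this section.

    training_disabled means you have opted out of training or are on a
    tier that does not train on inputs.
    """
    if content is Content.MANUSCRIPT_UNDER_REVIEW:
        # No setting changes this: the obligation is to the authors.
        return Verdict(False, "Do not paste at all; use AI only for your own review wording.")
    if content is Content.GRANT_OR_ORIGINAL_IDEA:
        return Verdict(False, "Keep it off third-party chatbots entirely.")
    if content is Content.OWN_UNPUBLISHED_RESULTS:
        if training_disabled:
            return Verdict(True, "Lower risk, not zero: retention still applies; check publisher and funder policy.")
        return Verdict(False, "Opt out of training or use a non-training tier first.")
    # Published text and pure writing mechanics carry low confidentiality risk.
    return Verdict(True, "Fine, as long as no confidential content is included.")


if __name__ == "__main__":
    print(check_before_paste(Content.OWN_UNPUBLISHED_RESULTS))
    print(check_before_paste(Content.MANUSCRIPT_UNDER_REVIEW, training_disabled=True))
```

Note that the manuscript-under-review branch ignores `training_disabled` entirely. That mirrors the policies above: the prohibition is about confidentiality, so no account setting makes the upload acceptable.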

Where MemX fits, and where it does not

The friction is real. You want the AI to remember your project, your terminology, and your prior drafts so you are not re-explaining your work every session, but you do not want that context living in a vendor's chat history or training store. MemX is a private-by-architecture memory layer for ChatGPT, Claude, and Gemini. Your research context sits in per-user isolation with encryption at rest, and it is not used to train models. That keeps the persistent context you want out of the default training and chat-history surface that creates the exposure described above.

The point is to change where the durable context lives, not to make confidential text safe to upload. Background you genuinely want the model to carry across sessions, such as your project's terminology, your writing style, and your prior published work, can live in that private layer instead of being re-pasted into a chat window where the training toggle and retention window apply. The sensitive material, the unpublished result itself, still should not leave your own controlled environment.

Be clear about the limits. MemX is private by architecture, not end-to-end encrypted and not zero-knowledge, and it is not a legal shield. It does not override your obligations. If you are reviewing a manuscript or writing under an NIH confidentiality agreement, the rule is unchanged: do not upload the protected material to any third-party AI tool, MemX included. The product reduces the everyday leak of your own working context. It does not make you compliant, and it does not replace following your publisher's and funder's rules.

Frequently asked questions

01. Does ChatGPT train on what I paste?

On free, Plus, and Pro personal accounts it can, by default. OpenAI trains on conversations unless you turn off the 'Improve the model for everyone' toggle in Settings, Data Controls. Business, Enterprise, and API usage are not trained on by default.

02. Can I use ChatGPT to summarise a manuscript I am peer reviewing?

No. Elsevier's policy says reviewers must not upload the manuscript or any part of it into a generative AI tool, as it can breach confidentiality and intellectual property rights. AI may help word your own review, not process the submission.

03. Did NIH ban AI in peer review?

Yes. NIH notice NOT-OD-23-149, released June 23, 2023, prohibits reviewers from using generative AI to analyse and formulate critiques of grant applications and proposals, and reviewers certify they will not upload application content to third-party AI tools.

04. What is the risk of pasting unpublished data into AI?

Two risks. On a default personal account the content can be retained and used to train the model, and exposing a novel result or method to an outside system before publication creates a real chance of being scooped or breaching a policy you agreed to.

05. Is any AI tool safe for unpublished research?

Safer options exist: opt out of training, or use a tier that does not train on inputs, such as Enterprise or the API. No consumer tool is a legal guarantee, so material under a confidentiality agreement should stay off third-party AI entirely.

The takeaway

Pasting unpublished research into a default personal ChatGPT account can hand your work to a vendor that trains on it, and for manuscripts under review it can break an explicit publisher or funder rule. The fix is not complicated. Do not upload material you are reviewing, opt out of training or use a non-training tier for your own data, and keep persistent project context in a layer built for privacy rather than in the chatbot's history. Treat your unpublished work as confidential, because the moment it leaves your machine, no one else will.

Written by Arpit Tripathi

Founder of MemX. Ex-Google Staff Tech Lead Manager, ex-AWS Senior SDE (Elastic Block Store). Writes about practical AI on the MemX blog.
