A Log of Writing an Academic Paper with GPT (English)

Generated: 2026-06-20 21:20:37

---

I started by going through CVPR, ICCV, and ICLR papers from 2024–2025 to pin down a research direction. Then I handed GPT a summary I had written myself, along with this request: “Find me the top 8 papers with the best results on the M3FD and FLIR datasets, summarize each one, compare them, and point out what nobody has done yet.”

What blew my mind was how fast it processed everything.

In under a minute, it had done it all.

I honestly thought: “Holy shit, this is way too convenient.”

But! Yes, here comes that “but”:

The papers it gave me weren’t necessarily real.

When I later checked one of them, the title and authors were correct, but the volume and page numbers were completely made up! Creepy, right? So with citations, there’s no shortcut. In the end, I downloaded the actual papers with Zotero, read the abstracts myself, and only then let GPT summarize them. It’s great for fast screening, but the fact-checking is on you.
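Part of that fact-checking can be scripted. Here is a minimal Python sketch, assuming you have already fetched a paper’s metadata from the Crossref REST API (`https://api.crossref.org/works/{doi}`) and are passing in the `message` object from its JSON response, which holds fields such as `title`, `author`, `issued`, `volume`, and `page`. The helper `compare_citation`, the shape of the `claimed` dict, and the sample record are all hypothetical; the function simply lists which fields of a GPT-supplied citation disagree with the registered record.

```python
from typing import Any


def _norm(text: str) -> str:
    """Lowercase and collapse whitespace so cosmetic differences don't count."""
    return " ".join(text.lower().split())


def compare_citation(claimed: dict[str, Any], message: dict[str, Any]) -> list[str]:
    """Return the fields where a claimed citation disagrees with a Crossref record.

    Hypothetical helper: `message` is assumed to be the "message" object of a
    Crossref /works/{doi} response.
    """
    mismatches = []

    # Crossref stores the title as a list of strings.
    titles = message.get("title") or [""]
    if _norm(claimed.get("title", "")) != _norm(titles[0]):
        mismatches.append("title")

    # Compare family names only; given names are often abbreviated.
    registered = [a.get("family", "") for a in message.get("author", [])]
    if [_norm(n) for n in claimed.get("authors", [])] != [_norm(n) for n in registered]:
        mismatches.append("authors")

    # The publication year is the first element of issued["date-parts"][0].
    parts = message.get("issued", {}).get("date-parts", [[None]])
    if claimed.get("year") != parts[0][0]:
        mismatches.append("year")

    # Volume and page range are plain strings; only check them if GPT supplied them.
    for field in ("volume", "page"):
        if field in claimed and str(claimed[field]) != message.get(field):
            mismatches.append(field)

    return mismatches


# Invented example mirroring the case above: title and authors right, volume and pages made up.
record = {
    "title": ["A Hypothetical Fusion Detector"],
    "author": [{"given": "A.", "family": "Zhang"}, {"given": "B.", "family": "Li"}],
    "issued": {"date-parts": [[2025, 3, 1]]},
    "volume": "12",
    "page": "101-115",
}
claimed = {"title": "A hypothetical fusion detector", "authors": ["Zhang", "Li"],
           "year": 2025, "volume": "34", "page": "1-20"}
print(compare_citation(claimed, record))  # ['volume', 'page']
```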

Another fun one: I asked GPT to draw network architecture diagrams. It could generate them, but I didn’t like the style, and they weren’t vector graphics. I had it make four versions in the style of top-conference papers, and in the end I redrew everything myself in Visio. It didn’t save much time on the actual drawing, but it saved a ton on designing the framework.

Speaking of pitfalls: I tried having Codex integrate GPT-generated modules directly into Ultralytics and run them. Half the code threw errors! It wasn’t that Codex couldn’t do it; it simply had no idea which libraries were installed on my machine or where my data lived. It took me a whole day of fixes to get it running.
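A quick environment check before running generated code might have saved part of that day. This sketch uses the standard library’s `importlib.metadata` to print installed versions (which you could paste into the prompt so the model knows your setup) and to report missing packages or data folders. The package list and dataset paths are placeholders, not the actual setup used here.

```python
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def check_environment(packages: list[str], data_dirs: list[str]) -> list[str]:
    """Print installed package versions; return a list of missing packages/dirs."""
    problems = []
    for name in packages:
        try:
            print(f"{name}=={version(name)}")
        except PackageNotFoundError:
            problems.append(f"missing package: {name}")
    for d in data_dirs:
        if not Path(d).is_dir():
            problems.append(f"missing data directory: {d}")
    return problems


if __name__ == "__main__":
    # Hypothetical requirements: replace with whatever the generated modules import.
    REQUIRED = ["ultralytics", "torch", "opencv-python"]
    DATA = ["datasets/M3FD", "datasets/FLIR"]  # hypothetical dataset locations
    for problem in check_environment(REQUIRED, DATA):
        print(problem)
```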

---

Literature Review: The Biggest Trap of All

I have to warn you about this one.

I asked GPT to find papers from 2024 to 2025, mostly from 2025. It gave me 50 in a table with authors, titles, journals, and core ideas. Looked super professional, right?

I checked 10 of them. Only 3 were completely accurate.
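Ten is a small sample, so it’s worth seeing how much “3 out of 10” actually pins down. As a rough illustration (the Wilson score interval is just one reasonable choice here, not part of the original workflow), the 95% interval for 3 correct out of 10 runs from roughly 11% to 60%. Even the optimistic end means about 40% of the entries are wrong, which is why every one needs checking.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# 3 fully accurate citations out of 10 spot-checked.
low, high = wilson_interval(3, 10)
print(f"{low:.2f} to {high:.2f}")  # 0.11 to 0.60
```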

Some had the right title but the wrong authors, others had the wrong journal or year, and some had a plausible-looking title and abstract but didn’t exist anywhere online: the AI had made them up! Imagine submitting that and having a reviewer discover that a paper you cited doesn’t exist… I don’t even want to think about it.

So I changed my approach. First, I asked GPT to “recall” 50 real papers. Then I verified each title and DOI one by one on Google Scholar. Only after verification did I send them back to GPT to make the table. One extra validation step, but way more reassuring.
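The “recall, verify, then tabulate” workflow boils down to a filter. In this minimal sketch, `lookup` is a hypothetical stand-in for however you confirm a DOI (a manual Google Scholar search or a Crossref query), returning the registered title or `None`; only candidates whose DOI resolves to a matching title make it into the table you hand back to GPT. The DOIs and titles in the example are invented.

```python
from typing import Callable, Optional


def verified_only(candidates: list[dict], lookup: Callable[[str], Optional[str]]) -> list[dict]:
    """Keep candidates whose DOI resolves to a record with the same title.

    `lookup(doi)` is a hypothetical callback returning the registered title,
    or None if the DOI does not exist.
    """
    kept = []
    for paper in candidates:
        registered = lookup(paper["doi"])
        if registered and registered.strip().lower() == paper["title"].strip().lower():
            kept.append(paper)
    return kept


def to_markdown_table(papers: list[dict]) -> str:
    """Render verified papers as a simple table to pass back to GPT."""
    rows = ["| Title | DOI |", "| --- | --- |"]
    rows += [f"| {p['title']} | {p['doi']} |" for p in papers]
    return "\n".join(rows)


# Invented registry and candidates: one real match, one fake DOI, one wrong title.
fake_registry = {"10.0000/real.1": "Real Paper One"}
candidates = [
    {"title": "Real Paper One", "doi": "10.0000/real.1"},
    {"title": "Invented Paper", "doi": "10.0000/fake.2"},
    {"title": "Wrong Title", "doi": "10.0000/real.1"},
]
print(to_markdown_table(verified_only(candidates, fake_registry.get)))
# | Title | DOI |
# | --- | --- |
# | Real Paper One | 10.0000/real.1 |
```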

Here’s a trick that worked really well: having GPT review my own writing. After finishing a section of the literature review, I’d hand it to GPT and ask it to act as a reviewer, critiquing the logic, the coverage, and possible counterarguments. Some of its suggestions were so-so, but now and then it pointed out gaps I hadn’t noticed. The keywords it suggested, though, were clearly invented field terms; they looked sketchy, so I skipped them.

Long story short: AI is good at “expanding” and “organizing,” not “creating factual information.” You have to be the gatekeeper.

---

Experiments and Methods: My Proudest Moment

Since I’m in computer science, I had to run the models myself. No way around it.

I had GPT reproduce the network structures of those 8 papers, then had Codex integrate each module into the Ultralytics project, rented a server, and ran experiments on the M3FD and FLIR datasets.

What was my proudest moment?

My own innovation module finished training and beat the baseline by 2–6% in detection accuracy, surpassing several newer methods! That improvement became the core of the Methods section. GPT helped polish the description, but the key logic and formulas were mine.
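“Beat the baseline by 2–6%” can be read two ways, and reviewers will ask which one you mean. This minimal sketch uses made-up scores (not the actual results) to show the same gain reported as absolute percentage points versus relative improvement; whichever you use, say so explicitly in the Methods section.

```python
def improvement(baseline: float, new: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent).

    Both scores are fractions in [0, 1], e.g. an mAP value.
    """
    absolute_pts = (new - baseline) * 100
    relative_pct = (new - baseline) / baseline * 100
    return absolute_pts, relative_pct


# Hypothetical scores, for illustration only.
abs_pts, rel_pct = improvement(0.50, 0.53)
print(f"+{abs_pts:.1f} points, +{rel_pct:.1f}% relative")  # +3.0 points, +6.0% relative
```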

But here’s something even crazier. Someone once ran an “AI personality” experiment in which a parameter R was only supposed to take the values 0.1, 1.0, and 2.0, yet the debug logs suddenly showed R = 2.5, 8, or even 30,000. Anomalous data often points to the real problem. I ran into something similar: one module’s accuracy dropped sharply on nighttime data. GPT’s explanation was “overfitting,” but I knew that wasn’t right. After tracing through the code myself, I realized the data augmentation strategy was distorting images taken in low light.
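Rather than accepting GPT’s guess, a per-condition breakdown makes this kind of anomaly visible. The sketch below assumes a simplified per-image record of `(condition, correct)`, a stand-in for a real detection metric such as per-subset mAP, and an arbitrary 10-point threshold; the numbers are invented.

```python
from collections import defaultdict


def flag_weak_subsets(results: list[tuple[str, bool]], max_gap: float = 0.10) -> dict[str, float]:
    """Return subsets whose accuracy trails the overall accuracy by more than max_gap.

    Each result is a hypothetical (condition, correct) pair, e.g. ("night", False).
    """
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # condition -> [correct, total]
    for condition, correct in results:
        totals[condition][0] += int(correct)
        totals[condition][1] += 1
    overall = sum(c for c, _ in totals.values()) / sum(t for _, t in totals.values())
    return {cond: c / t for cond, (c, t) in totals.items() if overall - c / t > max_gap}


# Invented results: 9/10 correct by day, 4/10 at night (overall 13/20 = 0.65).
results = [("day", True)] * 9 + [("day", False)] + [("night", True)] * 4 + [("night", False)] * 6
print(flag_weak_subsets(results))  # {'night': 0.4}
```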

If I had just trusted the AI’s analysis, I would have missed the real cause. You have to lead the experimental analysis yourself. GPT can help you write accuracy analysis paragraphs in a standard template, but spotting data anomalies and deciding how to handle comparisons — that’s on you.

---

Polishing, Plagiarism Reduction, and Reviewer Responses: Gotta Be Careful

Polishing is what I used it for most. After finishing a draft, I ran it through a prompt I had already tested: “Please rephrase it for clarity, coherence and conciseness, ensuring each paragraph flows into the next. Remove jargon. Use a professional tone.” The results were really good and sounded more academic than anything I could have written myself.

I also tried using it to reduce similarity scores. I would feed GPT a paragraph and ask it to rewrite it, then keep about half of its version and rework the other half myself; otherwise the AI voice was too obvious. Later, a classmate had their paper flagged by a reviewer for “inconsistent language style” because the whole thing had been rewritten by AI, and they had to resubmit. AI traces are still detectable, especially the pattern where every sentence starts abruptly with few connecting words.
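To keep track of how much of a paragraph you actually changed, Python’s standard `difflib` gives a quick word-level similarity ratio. It only compares two versions of your own text; it is not how plagiarism checkers or AI detectors score anything, and the sample sentences are made up.

```python
from difflib import SequenceMatcher


def word_similarity(original: str, rewritten: str) -> float:
    """Ratio in [0, 1] of matching words between two versions of a paragraph."""
    return SequenceMatcher(None, original.split(), rewritten.split()).ratio()


before = "the proposed module improves detection under low light"
after = "our module improves detection in low light conditions"
# 5 of 8 words line up ("module improves detection", "low light"): 2 * 5 / 16
print(f"{word_similarity(before, after):.3f}")  # 0.625
```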

When it came to reviewer responses, I had GPT play a “hostile reviewer” and tear my paper apart in advance. It pointed out some formatting issues, but its technical critiques were shallow, not as helpful as having a friend read the paper. It’s good for catching low-level errors, but high-level problems still need human readers.

One strategy I found especially clever: asking GPT to condense the conclusion of my paper into a headline or a WeChat post. Not for publication, just as a test: “What’s the real core selling point of this paper?” I found that if I couldn’t state it in one sentence, the storyline wasn’t clear yet.

---

The Most Overlooked Deadly Traps

Here are three of them; someone will definitely fall into these:

First: fabricated references. As I said, 30% accuracy is generous. You need to check every single one on Google Scholar, PubMed, or Crossref. Don’t cut corners! Imagine being asked during your defense: “The 2025 paper by Zhang that you cited: how does your method differ from theirs?” If you hem and haw and can’t answer, a rejected paper is the least of your worries.

Second: freezing up in the defense. You used AI to write the Introduction and Related Work, but you never actually read those papers carefully. Then the committee asks a simple question and you freeze. I forced myself to read the abstract and architecture diagrams of every paper I cited, so I at least knew what the other authors had done.

Third: getting flagged for AI-generated content. Many journals now require you to declare whether you used AI tools, and some submission systems have AI text detectors. Do not use GPT output directly. Change sentence structures, swap synonyms, and add your own understanding. The best approach: let GPT write

Cael Lee
