| Others (anomaly detection, virtual try-on, etc.) | 7 | Deepfake detection, virtual try-on |
117 papers, counted one by one: no more, no less. 22 on image translation, 15 on face generation, 13 related to 3D… Guess which direction surprised me the most? Not faces. 3D. GANs used to be stuck in 2D. Then, in 2020, a batch of 3D-aware models suddenly appeared. StereoGAN combined domain adaptation with stereo matching. SynSin generated novel views from a single photo. LG-GAN even used a GAN for adversarial attacks on point clouds. It felt like the whole class was drawing floor plans when someone stood up and said: "Let me build you a 3D model." Impressive, and it meant the competition got fierce.
Below are a few papers I personally ran, stumbled over, and shed tears for. Every word comes from real experience, not speculation.
---
StarGAN v2: A powerhouse for multi-domain image translation, but also a pampered diva.
The authors said they trained on four V100s. I only had one 1080 Ti. I gritted my teeth and lowered the batch size to 4, and the loss bounced around like an ECG, with no convergence after three days. Then I found that the generator's weight initialization was the culprit. After I switched to Kaiming initialization, training stabilized. The generation quality is indeed much better than v1, but guess what? Training time went from the claimed 3 days to a full week.
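For readers who haven't met Kaiming (He) initialization: it draws weights from a zero-mean Gaussian whose standard deviation shrinks with the layer's fan-in, so activations keep roughly the same scale from layer to layer. Below is a minimal stdlib-only sketch of the idea, not the StarGAN v2 code. The layer shape (64 input channels, 3x3 kernel, 128 output channels) and the LeakyReLU slope of 0.2 are assumptions chosen purely for illustration.

```python
import math
import random


def conv_fan_in(in_channels, kernel_h, kernel_w):
    """Number of inputs feeding one output unit of a conv layer."""
    return in_channels * kernel_h * kernel_w


def kaiming_normal(fan_in, fan_out, negative_slope=0.0, seed=None):
    """Return a fan_out x fan_in weight matrix drawn from N(0, std^2).

    std = gain / sqrt(fan_in), with gain = sqrt(2 / (1 + negative_slope^2)),
    which is the usual gain for ReLU (slope 0) and LeakyReLU activations.
    """
    rng = random.Random(seed)
    gain = math.sqrt(2.0 / (1.0 + negative_slope ** 2))
    std = gain / math.sqrt(fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


# Hypothetical layer: 64 -> 128 channels, 3x3 kernel, LeakyReLU(0.2).
fan_in = conv_fan_in(64, 3, 3)
weights = kaiming_normal(fan_in, 128, negative_slope=0.2, seed=0)
```

In a PyTorch project you would normally reach for `torch.nn.init.kaiming_normal_` instead of rolling your own. The sketch just makes the scaling rule visible.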
Pitfall record: If you're on a single card, go straight to the official "light" config (halved channels), or you'll run out of VRAM. Also, for face alignment, dlib's keypoint numbering varies by version, and I wasted half an afternoon on that. It's a single parameter line; who would suspect a version trap?
LG-GAN: Attacking 3D point clouds with a GAN? Mad respect for the idea.
Traditional attack methods are slow as snails because they require iterative optimization. LG-GAN instead adds a label guide to the generator: feed in a class label and get adversarial point clouds in real time. I tried it on KITTI point clouds: attack success rate over 85%, and about two orders of magnitude faster than C&W. But the problem is obvious: the perturbation is too visible, like a robber wearing a glow-in-the-dark suit, and defenses catch it too easily. The idea is wild, but practicality still needs polish.
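In case the two numbers above are unclear, here is a rough sketch of how they can be computed. It assumes a targeted setting, where an attack counts as a success only when the victim classifier outputs the label that was fed to the generator. The function names, predictions, and timings are all made up for illustration and are not my actual measurements.

```python
import math


def targeted_success_rate(predicted_labels, target_labels):
    """Fraction of adversarial samples classified as their intended target."""
    if len(predicted_labels) != len(target_labels):
        raise ValueError("predicted_labels and target_labels differ in length")
    if not target_labels:
        raise ValueError("need at least one sample")
    hits = sum(p == t for p, t in zip(predicted_labels, target_labels))
    return hits / len(target_labels)


def orders_of_magnitude(baseline_seconds, method_seconds):
    """log10 of the speedup of the method over the baseline."""
    return math.log10(baseline_seconds / method_seconds)


# Toy example: 4 adversarial clouds, all aimed at class 3.
rate = targeted_success_rate([3, 3, 1, 3], [3, 3, 3, 3])

# Toy timings: 50 s per sample for an iterative attack, 0.5 s for a forward pass.
gap = orders_of_magnitude(50.0, 0.5)
```

With the toy inputs, `rate` is 0.75 and `gap` is 2.0, i.e. a 100x speedup.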
MSG-GAN: Simple indeed, but lacking diversity.
The title says it all: multi-scale generation. The idea is similar to StyleGAN v2 but coarser, directly adding skip connections at each resolution. I reproduced it on MNIST and FFHQ. It converged about 30% faster than StyleGAN v2, but generation diversity lags and it is prone to mode collapse. That said, if you're a beginner jumping into GANs and want to practice, this code is as clean as a textbook and super easy to modify.
Pose Transfer GAN: Official code has a bug—believe it or not?
I wanted to do face rotation, so this paper seemed perfect. I downloaded the official code and ran the demo, and the generated faces had skewed features! My first reaction was that I had messed up, but debugging revealed that the author had used the wrong normalization range for landmarks when packaging the data. The code expected [0,1], but the stored data was in [-1,1]. I changed one line of code, and it worked. Profile to front: stunning. Front to profile: collapse. A lopsided model, strong in one direction only.
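The fix for the range mismatch boils down to a linear rescale. Here is a minimal sketch, assuming landmarks are stored as (x, y) pairs. The function name and the sample points are hypothetical, not taken from the official repo.

```python
def rescale_landmarks(landmarks, src=(-1.0, 1.0), dst=(0.0, 1.0)):
    """Linearly map (x, y) landmarks from the src range to the dst range.

    Raises ValueError if a coordinate lies outside the declared src range,
    which is the cheapest way to notice a mismatch like this one early.
    """
    src_lo, src_hi = src
    dst_lo, dst_hi = dst
    scale = (dst_hi - dst_lo) / (src_hi - src_lo)
    out = []
    for x, y in landmarks:
        if not (src_lo <= x <= src_hi and src_lo <= y <= src_hi):
            raise ValueError(f"landmark ({x}, {y}) is outside the range {src}")
        out.append((dst_lo + (x - src_lo) * scale,
                    dst_lo + (y - src_lo) * scale))
    return out


# Hypothetical stored landmarks in [-1, 1], converted to the [0, 1] the code expects.
stored = [(-1.0, -1.0), (0.0, 0.5), (1.0, 1.0)]
fixed = rescale_landmarks(stored)
```

Declaring the source range explicitly, rather than guessing it from the data, is deliberate: a batch of landmarks that happens to contain no negative values would otherwise slip through unconverted.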
Semantic Pyramid Generation: Clever idea using classification nets, but crashed.
This paper uses intermediate features from a pretrained VGG16 to build a pyramid condition, instead of traditional class labels or segmentation maps. I tried it on LSUN Bedroom, and the texture details were so rich you could see fabric folds. But semantic consistency suffered: beds sometimes had three legs, and tables floated in the air. It relies too heavily on the classification net's features, so it flops on unseen scenes.
---
Overall impressions? Quite a few counterintuitive things.
First, you'd think the hottest GAN application is image generation. Wrong. The biggest dark horse at CVPR 2020 was 3D. 13 papers related to 3D might not seem like many, but previous years had nearly zero. And looking at the later 2021-2022 boom of 3D-aware GANs and NeRF-combined GANs, the roots are right here. Hitting the right direction early is ten times better than fighting over 2D image translation.
Second, among the 10 "GAN improvement" papers, three did nothing more than swap a normalization layer or add a simple regularizer, with experiments run on only one dataset. I came away disappointed: at least two were pure filler. I won't name names, but when you check their code, the datasets they chose for comparison with StyleGAN v2 were clearly favorable to their own methods. You know the drill.
Third, "GAN + learning" started spreading: zero-shot, semi-supervised, and active learning all put on a GAN shell. But of those eight papers, only one or two have real practical value. Most hang up a sheep's head and sell dog meat: under the hood they are still the old methods. If you really want to chase this direction, my advice is to skip the ones that don't even open-source their code.
---
Speaking of which, here are some real suggestions.
If you're a new student wanting to quickly get hands-on with GANs, follow this order:
- First, run MSG-GAN or StarGAN v2 code to master image translation.
- Then read Semantic Pyramid Generation or MixNMatch to understand conditional control.
- Finally, dig into LG-GAN or StereoGAN to expand your 3D vision.
If you want to publish, for goodness' sake, stop padding with "GAN improvements." That track is more crowded than a subway at rush hour. After 2020, 3D-aware GANs, controllable generation, and few-shot learning are the directions that actually produce results. Switch tracks a day earlier, enjoy the benefits a day sooner.
Where to find