Ask any image generation model to draw a clock showing 3:45 and you'll likely get 10:10. Ask for a spa scene and the subject will have six fingers. These aren't random glitches — they're systematic blind spots that reveal how diffusion models actually "see" the world.
Nearly every AI-generated clock shows approximately 10:10. Why? Because that's the most common time displayed in stock photography and watch advertisements. Manufacturers pose watches at 10:10 because it frames the brand logo and creates a "smile" shape. The training data is overwhelmingly biased toward this single time.
This is a perfect example of dataset bias manifesting as generation bias. The model has seen thousands of clocks at 10:10 and only a handful at other times. When you ask for "a clock showing 7:25," the model's prior is so strong that it overrides the prompt.
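You can make this kind of bias concrete by tallying time strings in image captions. The sketch below is a minimal illustration on an invented five-caption corpus — a real audit would stream millions of alt-text strings from the training set — but the mechanics are the same:

```python
import re
from collections import Counter

# Hypothetical caption corpus for illustration; not real training data.
captions = [
    "luxury watch displaying 10:10 on a marble counter",
    "close-up of a chronograph at 10:10",
    "wall clock reading 10:10 in a showroom",
    "antique clock showing 7:25 on a mantel",
    "smartwatch face set to 10:10",
]

# Pull out every H:MM time string and tally it.
times = Counter(
    match.group(0)
    for caption in captions
    for match in re.finditer(r"\b\d{1,2}:\d{2}\b", caption)
)

print(times.most_common())  # 10:10 dominates even this tiny sample
```

Even in this toy corpus, 10:10 outnumbers every other time four to one — and that skew is exactly the prior the model internalizes.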
AI famously struggles with hands, and the failure modes are specific: extra or missing fingers, fused digits, and joints bent at impossible angles.
The root cause: hands are high-frequency detail in a sea of low-frequency body parts. Arms, torsos, and faces have relatively smooth gradients. Hands are compact, articulated, and vary dramatically with pose. The model's latent space doesn't have enough resolution for them.
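One way to see the frequency gap is to measure total variation — the sum of absolute differences between adjacent samples — on two intensity profiles. The profiles below are invented toy signals, not real image data, but they capture the contrast between smooth shading and rapid finger/gap alternation:

```python
# Toy illustration of low- vs high-frequency image content, using
# total variation on 1-D intensity profiles. Signals are invented.

def total_variation(signal):
    """Sum of absolute differences between adjacent samples."""
    return sum(abs(b - a) for a, b in zip(signal, signal[1:]))

# A smooth gradient, like shading across a forearm.
arm_profile = [i / 31 for i in range(32)]

# Rapid light/dark alternation, like fingers against gaps.
hand_profile = [i % 2 for i in range(32)]

print(total_variation(arm_profile))   # ~1.0: low frequency
print(total_variation(hand_profile))  # 31: high frequency
```

A signal with thirty times the variation needs far more representational capacity to reconstruct — capacity a compressed latent space doesn't allocate to a small patch of the image.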
Most models produce backwards, scrambled, or nonsensical text. DALL-E 3 and newer models have improved, but the fundamental challenge remains: text generation requires sequential, exact character placement. Diffusion models work holistically — they generate the entire image simultaneously, not character by character.
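If you need to catch these text failures in a pipeline, one simple approach is to run OCR on the generated image and score the result against the requested text with edit distance. The sketch below assumes you already have the OCR output; both strings are invented examples:

```python
# Hedged sketch: scoring how far generated text diverged from the prompt.
# Assumes OCR has already been run; the strings here are invented.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]

requested = "GRAND OPENING"
ocr_result = "GRAMD OPEMIMG"  # the kind of scrambling models produce

print(levenshtein(requested, ocr_result))  # 3 substituted characters
```

A distance above some threshold (relative to the string length) can trigger a regeneration or flag the image for review.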
Mirrors, water reflections, and glass surfaces consistently break. A person looking in a mirror might see a different face. A building reflected in water might have the wrong number of floors. Models don't understand reflection as a physical constraint — they treat the reflected region as an independent generation area.
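Because the model treats the reflection as an independent region, a naive consistency probe can catch the worst cases: flip the mirror region and compare it pixel by pixel against the subject region. This is a toy sketch on invented 2×6 "images", not production code — real images would need alignment and tolerance for lighting changes:

```python
# Naive mirror-consistency probe. A physically correct reflection makes
# the flipped regions (nearly) identical; AI output often won't.

def mirror_mismatch(image, left_cols, right_cols):
    """Fraction of pixels where the right region != the left region flipped."""
    mismatches = total = 0
    for row in image:
        left = [row[c] for c in left_cols]
        right = [row[c] for c in right_cols]
        for a, b in zip(left, reversed(right)):
            total += 1
            mismatches += (a != b)
    return mismatches / total

# Toy "images": columns 0-2 are the subject, columns 3-5 the mirror.
consistent = [[1, 2, 3, 3, 2, 1],
              [4, 5, 6, 6, 5, 4]]
broken     = [[1, 2, 3, 9, 2, 1],
              [4, 5, 6, 6, 5, 7]]

print(mirror_mismatch(consistent, [0, 1, 2], [3, 4, 5]))  # 0.0
print(mirror_mismatch(broken,     [0, 1, 2], [3, 4, 5]))  # ~0.33
```

A score of zero means a perfect mirror; anything substantially above it signals the "independent generation" failure described above.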
If you're building products with AI image generation, these blind spots directly affect output quality: any prompt that calls for readable text, accurate clock faces, close-up hands, or reflective surfaces needs careful prompt design, automated checks, or human review before shipping.
These blind spots teach us something important about AI: models learn the distribution, not the rules. A human knows clocks can show any time because we understand what a clock is. A diffusion model knows clocks usually show 10:10 because that's what the data says. Until models develop genuine spatial and physical reasoning, these blind spots will persist — they'll just move to different, subtler places.