MVAR redefines visual autoregression by applying Markovian constraints across both scale and space, cutting training memory by 3.0x and enabling KV-cache-free inference without sacrificing performance.
VCD reinterprets image compression as a stochastic forward diffusion path, enabling high-fidelity reconstruction via direct SDE reversal with minimal sampling steps.
PerLDiff leverages 3D geometric information within diffusion models to enable robust and precise object-level control for street view image generation.
Thank you to Jon Barron for the source code for the website!