TY - JOUR
T1 - FoldDiff
T2 - Folding in Point Cloud Diffusion
AU - Zhao, Yuzhou
AU - Matías Di Martino, J.
AU - Farzam, Amirhossein
AU - Sapiro, Guillermo
N1 - Publisher Copyright:
© 2025, Transactions on Machine Learning Research. All rights reserved.
PY - 2025
Y1 - 2025
N2 - Diffusion denoising has emerged as a powerful approach for modeling data distributions, treating data as particles with their position and velocity modeled by a stochastic diffusion process. While this framework assumes data resides in a fixed vector spaces (e.g., images as pixel-ordered vectors), point clouds present unique challenges due to their unordered representation. Existing point cloud diffusion methods often rely on voxelization to address this issue, but this approach is computationally expensive, with cubically scaling complexity. In this work, we investigate the misalignment between point cloud irregularity and diffusion models, analyzing it through the lens of denoising implicit priors. First, we demonstrate how the unknown permutations inherent in point cloud structures disrupt denoising implicit priors. To address this, we then propose a novel folding-based approach that reorders point clouds into a permutation-invariant grid, enabling diffusion to be performed directly on the structured representation. This construction is exploited both globally and locally. Globally, folded objects can represent point cloud objects in a fixed vector space (like images), therefore it enables us to extend the work of denoising as implicit priors to point clouds. Locally, the folded tokens are efficient and novel token representations that can improve existing transformer-based point cloud diffusion models. Our experiments show that the proposed folding operation integrates effectively with both denoising implicit priors as well as advanced diffusion architectures, such as UNet and Diffusion Transformers (DiTs). Notably, DiT with locally folded tokens achieves competitive generative performance compared to state-of-the-art models while significantly reducing training and inference costs relative to voxelization-based methods.
AB - Diffusion denoising has emerged as a powerful approach for modeling data distributions, treating data as particles with their position and velocity modeled by a stochastic diffusion process. While this framework assumes data resides in a fixed vector spaces (e.g., images as pixel-ordered vectors), point clouds present unique challenges due to their unordered representation. Existing point cloud diffusion methods often rely on voxelization to address this issue, but this approach is computationally expensive, with cubically scaling complexity. In this work, we investigate the misalignment between point cloud irregularity and diffusion models, analyzing it through the lens of denoising implicit priors. First, we demonstrate how the unknown permutations inherent in point cloud structures disrupt denoising implicit priors. To address this, we then propose a novel folding-based approach that reorders point clouds into a permutation-invariant grid, enabling diffusion to be performed directly on the structured representation. This construction is exploited both globally and locally. Globally, folded objects can represent point cloud objects in a fixed vector space (like images), therefore it enables us to extend the work of denoising as implicit priors to point clouds. Locally, the folded tokens are efficient and novel token representations that can improve existing transformer-based point cloud diffusion models. Our experiments show that the proposed folding operation integrates effectively with both denoising implicit priors as well as advanced diffusion architectures, such as UNet and Diffusion Transformers (DiTs). Notably, DiT with locally folded tokens achieves competitive generative performance compared to state-of-the-art models while significantly reducing training and inference costs relative to voxelization-based methods.
UR - https://www.scopus.com/pages/publications/105014100456
UR - https://www.scopus.com/inward/citedby.url?scp=105014100456&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:105014100456
SN - 2835-8856
VL - August-2025
JO - Transactions on Machine Learning Research
JF - Transactions on Machine Learning Research
ER -