FoldDiff: Folding in Point Cloud Diffusion

Yuzhou Zhao, J. Matías Di Martino, Amirhossein Farzam, Guillermo Sapiro

Research output: Contribution to journal › Article › peer-review

Abstract

Diffusion denoising has emerged as a powerful approach for modeling data distributions, treating data as particles whose position and velocity are modeled by a stochastic diffusion process. While this framework assumes that data resides in a fixed vector space (e.g., images as pixel-ordered vectors), point clouds present unique challenges due to their unordered representation. Existing point cloud diffusion methods often rely on voxelization to address this issue, but this approach is computationally expensive, with cubically scaling complexity. In this work, we investigate the misalignment between point cloud irregularity and diffusion models, analyzing it through the lens of denoising implicit priors. First, we demonstrate how the unknown permutations inherent in point cloud structures disrupt denoising implicit priors. To address this, we then propose a novel folding-based approach that reorders point clouds into a permutation-invariant grid, enabling diffusion to be performed directly on the structured representation. This construction is exploited both globally and locally. Globally, folded objects can represent point cloud objects in a fixed vector space (like images), which enables us to extend the work on denoising as implicit priors to point clouds. Locally, the folded tokens are efficient and novel token representations that can improve existing transformer-based point cloud diffusion models. Our experiments show that the proposed folding operation integrates effectively with both denoising implicit priors and advanced diffusion architectures, such as UNet and Diffusion Transformers (DiTs). Notably, DiT with locally folded tokens achieves competitive generative performance compared to state-of-the-art models while significantly reducing training and inference costs relative to voxelization-based methods.
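
The ordering problem and the cubic cost mentioned in the abstract can be illustrated with a small sketch. It is not the paper's folding operation: `canonical_order` is a hypothetical stand-in that uses a plain lexicographic sort, and the point values and function names are assumptions chosen for illustration only. The sketch shows that the same point set flattens to different vectors under different orderings, that any fixed canonical ordering removes this ambiguity, and that a dense voxel grid grows with the cube of its resolution.

```python
def flatten(points):
    """Concatenate (x, y, z) tuples into one flat vector, in the given order."""
    return [coord for point in points for coord in point]


def canonical_order(points):
    """Hypothetical stand-in: lexicographic sort, NOT the paper's folding."""
    return sorted(points)


def voxel_cells(resolution):
    """Cell count of a dense cubic voxel grid with `resolution` cells per axis."""
    return resolution ** 3


# The same four points listed in two different orders (illustrative values).
cloud = [(0.0, 1.0, 2.0), (3.0, 0.5, 1.5), (1.0, 1.0, 0.0), (2.0, 2.0, 2.0)]
permuted = cloud[::-1]

# Flattening depends on the order, so one shape maps to many vectors.
same_raw = flatten(cloud) == flatten(permuted)

# After a fixed canonical ordering, both listings give the same vector.
same_canonical = (
    flatten(canonical_order(cloud)) == flatten(canonical_order(permuted))
)

# Doubling the voxel resolution multiplies the number of cells by eight.
growth = voxel_cells(64) // voxel_cells(32)

print(same_raw, same_canonical, growth)
```

A sort is used here only because it is the simplest order-independent mapping; the folding approach described in the abstract is a different construction aimed at the same goal of a permutation-invariant, grid-structured representation.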

Original language: English (US)
Journal: Transactions on Machine Learning Research
Volume: August-2025
State: Published - 2025

All Science Journal Classification (ASJC) codes

  • Computer Vision and Pattern Recognition
  • Artificial Intelligence
