Hierarchical data structures through the lenses of diffusion models

Start: 2024-10-02T14:00:00-04:00
End: 2024-10-02T15:00:00-04:00
Location: CMSA, 20 Garden St, G10

CMSA EVENTS: CMSA NEW TECHNOLOGIES IN MATHEMATICS

When: October 2, 2024

2:00 pm - 3:00 pm

Address: 20 Garden Street, Cambridge, MA 02138, United States

Speaker: Antonio Sclocchi (EPFL)

The success of deep learning with high-dimensional data relies on the fact that natural data are highly structured. A key aspect of this structure is hierarchical compositionality, yet quantifying it remains a challenge.

In this talk, we explore how diffusion models can serve as a tool to probe the hierarchical structure of data. We consider a context-free generative model of hierarchical data and show the distinct behaviors of high- and low-level features during a noising-denoising process. Specifically, we find that high-level features undergo a sharp transition in reconstruction probability at a specific noise level, while low-level features recombine into new data from different classes. This behavior of latent features leads to correlated changes in real-space variables, resulting in a diverging correlation length at the transition.
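As a rough illustration of what a context-free generative model of hierarchical data can look like, the toy sketch below expands a class label into a string of tokens through randomly drawn production rules, one set of rules per level. It is not the speaker's model: the names `make_rules` and `sample` and the parameters `vocab_size`, `num_rules`, `branching`, and `depth` are assumptions made for this example, and the rules are drawn without any constraint that would make them unambiguous.

```python
import random


def make_rules(vocab_size, num_rules, branching, depth, rng):
    """For each level and each symbol, draw `num_rules` random productions.

    A production maps one symbol to a tuple of `branching` lower-level symbols.
    (Hypothetical toy setup; rules may collide across symbols.)
    """
    rules = []
    for _ in range(depth):
        level = {
            sym: [
                tuple(rng.randrange(vocab_size) for _ in range(branching))
                for _ in range(num_rules)
            ]
            for sym in range(vocab_size)
        }
        rules.append(level)
    return rules


def sample(rules, label, rng):
    """Expand the class label level by level and return every layer of the tree.

    layers[0] is the class label (highest-level feature);
    layers[-1] is the observed data (lowest-level features).
    """
    layers = [[label]]
    for level in rules:
        expanded = []
        for sym in layers[-1]:
            expanded.extend(rng.choice(level[sym]))
        layers.append(expanded)
    return layers


rng = random.Random(0)
rules = make_rules(vocab_size=4, num_rules=2, branching=2, depth=3, rng=rng)
layers = sample(rules, label=1, rng=rng)
leaves = layers[-1]  # branching**depth = 8 observed tokens
```

In this sketch a latent symbol at level k of the tree governs a contiguous block of `branching**(depth - k)` observed tokens, so changing a higher-level latent changes a larger block of tokens together.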

We validate these predictions in experiments with real data, using state-of-the-art diffusion models for both images and text. Remarkably, in both modalities the correlation length of the changed features grows at the transition of the noising-denoising process.

Overall, these results highlight the potential of hierarchical models in capturing non-trivial data structures and offer new theoretical insights for understanding generative AI.

https://harvard.zoom.us/j/95706757940?pwd=dHhMeXBtd1BhN0RuTWNQR0xEVzJkdz09
Password: cmsa