Common Diffusion Noise Schedules and Sample Steps are Flawed



Comments

> Scheduling to zero terminal signal-to-noise ratio (SNR) does not work, because it goes to infinity at the last timestep.
>
> sqrt_recip_alphas_cumprod tensor([ 1.0003, 1.0009, 1.0017, 1.0027, 1.0040, 1.0056, 1.0074, 1.0094,
> 1.0117, 1.0143, 1.0171, 1.0201, 1.0235, 1.0271, 1.0310, 1.0352,
> 1.0397, 1.0444, 1.0495, 1.0549, 1.0605, 1.0665, 1.0729, 1.0795,
> 1.0866, 1.0939, 1.1017, 1.1098, 1.1184, 1.1273, 1.1367, 1.1464,
> 1.1567, 1.1674, 1.1786, 1.1903, 1.2026, 1.2154, 1.2288, 1.2428,
> 1.2574, 1.2726, 1.2886, 1.3053, 1.3228, 1.3410, 1.3601, 1.3801,
> 1.4011, 1.4230, 1.4460, 1.4701, 1.4954, 1.5220, 1.5499, 1.5792,
> 1.6101, 1.6426, 1.6768, 1.7129, 1.7511, 1.7915, 1.8342, 1.8794,
> 1.9275, 1.9785, 2.0329, 2.0908, 2.1526, 2.2188, 2.2898, 2.3660,
> 2.4481, 2.5368, 2.6327, 2.7370, 2.8505, 2.9746, 3.1108, 3.2608,
> 3.4270, 3.6120, 3.8190, 4.0522, 4.3170, 4.6199, 4.9698, 5.3785,
> 5.8620, 6.4427, 7.1530, 8.0416, 9.1848, 10.7100, 12.8463, 16.0520,
> 21.3966, 32.0883, 64.1689, inf], dtype=torch.float64)
> sqrt_recipm1_alphas_cumprod tensor([2.5133e-02, 4.1840e-02, 5.7956e-02, 7.3890e-02, 8.9761e-02, 1.0562e-01,
> 1.2150e-01, 1.3742e-01, 1.5340e-01, 1.6944e-01, 1.8555e-01, 2.0175e-01,
> 2.1805e-01, 2.3446e-01, 2.5099e-01, 2.6764e-01, 2.8443e-01, 3.0136e-01,
> 3.1846e-01, 3.3572e-01, 3.5317e-01, 3.7081e-01, 3.8865e-01, 4.0671e-01,
> 4.2499e-01, 4.4353e-01, 4.6231e-01, 4.8138e-01, 5.0072e-01, 5.2038e-01,
> 5.4035e-01, 5.6066e-01, 5.8133e-01, 6.0238e-01, 6.2383e-01, 6.4570e-01,
> 6.6801e-01, 6.9079e-01, 7.1407e-01, 7.3787e-01, 7.6223e-01, 7.8717e-01,
> 8.1273e-01, 8.3894e-01, 8.6585e-01, 8.9350e-01, 9.2193e-01, 9.5118e-01,
> 9.8132e-01, 1.0124e+00, 1.0445e+00, 1.0776e+00, 1.1118e+00, 1.1473e+00,
> 1.1841e+00, 1.2222e+00, 1.2618e+00, 1.3031e+00, 1.3460e+00, 1.3908e+00,
> 1.4375e+00, 1.4864e+00, 1.5376e+00, 1.5913e+00, 1.6478e+00, 1.7072e+00,
> 1.7699e+00, 1.8361e+00, 1.9063e+00, 1.9807e+00, 2.0599e+00, 2.1443e+00,
> 2.2346e+00, 2.3314e+00, 2.4354e+00, 2.5477e+00, 2.6693e+00, 2.8014e+00,
> 2.9456e+00, 3.1037e+00, 3.2779e+00, 3.4708e+00, 3.6858e+00, 3.9269e+00,
> 4.1995e+00, 4.5103e+00, 4.8681e+00, 5.2847e+00, 5.7760e+00, 6.3646e+00,
> 7.0828e+00, 7.9792e+00, 9.1302e+00, 1.0663e+01, 1.2807e+01, 1.6021e+01,
> 2.1373e+01, 3.2073e+01, 6.4161e+01, inf], dtype=torch.float64)
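Zero terminal SNR itself is not the problem; it works fine. The issue is that the math for many samplers was derived under the assumption that the model predicts epsilon, so many implementations first convert the model output to epsilon and only then do the computation. Epsilon prediction can never work with zero terminal SNR: at the last timestep the cumulative alpha is exactly zero, so the conversion divides by zero and the coefficients become inf, as in your output. The fix is to derive the sampler math directly from the v prediction or the x0 prediction.
The DDPM and DDIM implementations in diffusers should work without any problem. Which sampler are you using?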
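To make the point concrete, here is a minimal scalar sketch of both routes to x0. It assumes the usual conventions x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps and v = sqrt(alpha_bar) * eps - sqrt(1 - alpha_bar) * x0. The function names are made up for illustration and are not taken from diffusers or any other library.

```python
import math


def add_noise(x0, eps, alpha_bar):
    """Forward process: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    return math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * eps


def v_target(x0, eps, alpha_bar):
    """v parameterization: v = sqrt(alpha_bar) * eps - sqrt(1 - alpha_bar) * x0."""
    return math.sqrt(alpha_bar) * eps - math.sqrt(1.0 - alpha_bar) * x0


def x0_via_epsilon(x_t, eps, alpha_bar):
    """Recover x0 from an epsilon prediction.

    Both coefficients divide by alpha_bar. With alpha_bar == 0 plain Python
    raises ZeroDivisionError; a float tensor library would give inf instead.
    """
    sqrt_recip = math.sqrt(1.0 / alpha_bar)
    sqrt_recipm1 = math.sqrt(1.0 / alpha_bar - 1.0)
    return sqrt_recip * x_t - sqrt_recipm1 * eps


def x0_via_v(x_t, v, alpha_bar):
    """Recover x0 from a v prediction; there is no division anywhere."""
    return math.sqrt(alpha_bar) * x_t - math.sqrt(1.0 - alpha_bar) * v


if __name__ == "__main__":
    x0, eps = 0.5, -1.2  # arbitrary example values
    for alpha_bar in (0.25, 0.0):  # 0.0 is the zero terminal SNR timestep
        x_t = add_noise(x0, eps, alpha_bar)
        v = v_target(x0, eps, alpha_bar)
        print("alpha_bar =", alpha_bar, "x0 via v:", x0_via_v(x_t, v, alpha_bar))
        try:
            print("  x0 via epsilon:", x0_via_epsilon(x_t, eps, alpha_bar))
        except ZeroDivisionError:
            print("  x0 via epsilon: undefined (division by zero)")
```

At alpha_bar = 0 the noisy sample is pure noise and v reduces to -x0, so the v route still recovers x0 exactly, while the epsilon route has nothing to work with.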