Common Diffusion Noise Schedules and Sample Steps are Flawed



Comments

> Scheduling to zero terminal SNR doesn't work, because it becomes infinite at the last timestep.
>
> sqrt_recip_alphas_cumprod tensor([ 1.0003, 1.0009, 1.0017, 1.0027, 1.0040, 1.0056, 1.0074, 1.0094,
> 1.0117, 1.0143, 1.0171, 1.0201, 1.0235, 1.0271, 1.0310, 1.0352,
> 1.0397, 1.0444, 1.0495, 1.0549, 1.0605, 1.0665, 1.0729, 1.0795,
> 1.0866, 1.0939, 1.1017, 1.1098, 1.1184, 1.1273, 1.1367, 1.1464,
> 1.1567, 1.1674, 1.1786, 1.1903, 1.2026, 1.2154, 1.2288, 1.2428,
> 1.2574, 1.2726, 1.2886, 1.3053, 1.3228, 1.3410, 1.3601, 1.3801,
> 1.4011, 1.4230, 1.4460, 1.4701, 1.4954, 1.5220, 1.5499, 1.5792,
> 1.6101, 1.6426, 1.6768, 1.7129, 1.7511, 1.7915, 1.8342, 1.8794,
> 1.9275, 1.9785, 2.0329, 2.0908, 2.1526, 2.2188, 2.2898, 2.3660,
> 2.4481, 2.5368, 2.6327, 2.7370, 2.8505, 2.9746, 3.1108, 3.2608,
> 3.4270, 3.6120, 3.8190, 4.0522, 4.3170, 4.6199, 4.9698, 5.3785,
> 5.8620, 6.4427, 7.1530, 8.0416, 9.1848, 10.7100, 12.8463, 16.0520,
> 21.3966, 32.0883, 64.1689, inf], dtype=torch.float64)
> sqrt_recipm1_alphas_cumprod tensor([2.5133e-02, 4.1840e-02, 5.7956e-02, 7.3890e-02, 8.9761e-02, 1.0562e-01,
> 1.2150e-01, 1.3742e-01, 1.5340e-01, 1.6944e-01, 1.8555e-01, 2.0175e-01,
> 2.1805e-01, 2.3446e-01, 2.5099e-01, 2.6764e-01, 2.8443e-01, 3.0136e-01,
> 3.1846e-01, 3.3572e-01, 3.5317e-01, 3.7081e-01, 3.8865e-01, 4.0671e-01,
> 4.2499e-01, 4.4353e-01, 4.6231e-01, 4.8138e-01, 5.0072e-01, 5.2038e-01,
> 5.4035e-01, 5.6066e-01, 5.8133e-01, 6.0238e-01, 6.2383e-01, 6.4570e-01,
> 6.6801e-01, 6.9079e-01, 7.1407e-01, 7.3787e-01, 7.6223e-01, 7.8717e-01,
> 8.1273e-01, 8.3894e-01, 8.6585e-01, 8.9350e-01, 9.2193e-01, 9.5118e-01,
> 9.8132e-01, 1.0124e+00, 1.0445e+00, 1.0776e+00, 1.1118e+00, 1.1473e+00,
> 1.1841e+00, 1.2222e+00, 1.2618e+00, 1.3031e+00, 1.3460e+00, 1.3908e+00,
> 1.4375e+00, 1.4864e+00, 1.5376e+00, 1.5913e+00, 1.6478e+00, 1.7072e+00,
> 1.7699e+00, 1.8361e+00, 1.9063e+00, 1.9807e+00, 2.0599e+00, 2.1443e+00,
> 2.2346e+00, 2.3314e+00, 2.4354e+00, 2.5477e+00, 2.6693e+00, 2.8014e+00,
> 2.9456e+00, 3.1037e+00, 3.2779e+00, 3.4708e+00, 3.6858e+00, 3.9269e+00,
> 4.1995e+00, 4.5103e+00, 4.8681e+00, 5.2847e+00, 5.7760e+00, 6.3646e+00,
> 7.0828e+00, 7.9792e+00, 9.1302e+00, 1.0663e+01, 1.2807e+01, 1.6021e+01,
> 2.1373e+01, 3.2073e+01, 6.4161e+01, inf], dtype=torch.float64)

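There is nothing wrong with zero terminal SNR itself; it works fine. The problem is that the math for many samplers was derived assuming the model predicts epsilon, so many implementations first convert the model output to epsilon and only then do the math. Epsilon prediction can never work with zero terminal SNR, because at the last timestep the conversion divides by zero, which is where the undefined division comes from. The solution is to derive the math directly from v prediction or x0 prediction.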
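To make the failure mode concrete, here is a short derivation under the standard DDPM forward process. The notation $\bar\alpha_t$ for the cumulative product of the alphas is an assumption on my part; the thread itself only shows the printed tensors.

$$
x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon
$$

Recovering $x_0$ through an epsilon prediction gives

$$
\hat x_0 = \frac{1}{\sqrt{\bar\alpha_t}}\,x_t - \sqrt{\frac{1}{\bar\alpha_t}-1}\;\hat\epsilon
$$

The two coefficients are what DDPM-style code usually stores as `sqrt_recip_alphas_cumprod` and `sqrt_recipm1_alphas_cumprod`. Both diverge as $\bar\alpha_T \to 0$, which would explain the `inf` in the last entry of each printed tensor. With a v prediction, defined as

$$
v = \sqrt{\bar\alpha_t}\,\epsilon - \sqrt{1-\bar\alpha_t}\,x_0, \qquad \hat x_0 = \sqrt{\bar\alpha_t}\,x_t - \sqrt{1-\bar\alpha_t}\,\hat v
$$

nothing is divided by $\sqrt{\bar\alpha_t}$. At $\bar\alpha_T = 0$ this reduces to $\hat x_0 = -\hat v$, which is finite.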
The DDPM and DDIM implementations in diffusers should work fine. Which sampler are you using?
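As an illustration of what "deriving the math directly from v prediction" can look like, below is a minimal sketch of one deterministic DDIM step (eta = 0) written in terms of a v-prediction. It is not the diffusers source; the function name `ddim_step_from_v` and the scalar inputs are hypothetical and chosen only to keep the example dependency-free. `abar_t` and `abar_prev` stand for the cumulative alpha products at the current and the next, less noisy, timestep.

```python
import math


def ddim_step_from_v(x_t, v_pred, abar_t, abar_prev):
    """One deterministic DDIM step (eta = 0) from a v-prediction.

    No term divides by sqrt(abar_t), so abar_t == 0 (zero terminal SNR)
    is handled without producing inf or nan.
    """
    a = math.sqrt(abar_t)
    s = math.sqrt(1.0 - abar_t)
    x0_pred = a * x_t - s * v_pred    # predicted clean sample
    eps_pred = a * v_pred + s * x_t   # predicted noise
    return math.sqrt(abar_prev) * x0_pred + math.sqrt(1.0 - abar_prev) * eps_pred


# Illustrative values (hypothetical): start from pure noise at the last timestep.
x0, eps = 0.7, -1.3
abar_t, abar_prev = 0.0, 0.25
x_t = math.sqrt(abar_t) * x0 + math.sqrt(1.0 - abar_t) * eps  # equals eps here
v = math.sqrt(abar_t) * eps - math.sqrt(1.0 - abar_t) * x0    # ideal v target
x_prev = ddim_step_from_v(x_t, v, abar_t, abar_prev)
```

With an ideal v-prediction, the step lands exactly on the forward-process sample at `abar_prev`, even though `abar_t` is zero. An epsilon-based step would need `1 / sqrt(abar_t)` at that same timestep and would fail.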