
news. Source: original. 2025/8/3 0:42:50

A Diffusion-Based Framework for Multi-Class Anomaly Detection

I think the introduction is well written: it states the problem clearly and is worth learning from.

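Abstract

Reconstruction-based methods have achieved remarkable results in anomaly detection. The strong image reconstruction ability of the recently popular diffusion models has motivated research on using them to improve the reconstruction of anomalous images.

(Problem) Nevertheless, in the more practical multi-class setting, these methods may struggle to preserve the image category and the pixel-level structural integrity.

To address these issues, the authors propose the Diffusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection. It consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network connected to the Stable Diffusion denoising network, and a feature-space pre-trained feature extractor. First, the SG network reconstructs anomalous regions while preserving the semantic information of the original image. Second, a Spatial-aware Feature Fusion (SFF) block is introduced to maximize reconstruction accuracy when large regions have to be reconstructed. Third, the input and reconstructed images are passed through the pre-trained feature extractor, and anomaly maps are generated from the features extracted at different scales. Experiments on the MVTec-AD and VisA datasets demonstrate the effectiveness of the approach, which surpasses state-of-the-art methods: on the multi-class MVTec-AD dataset it achieves 96.8/52.6 (AUROC/AP) for localization and 97.2/99.0 (AUROC/AP) for detection.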

1. Introduction

Problem:

For the Denoising Diffusion Probabilistic Model (DDPM) (Ho, Jain, and Abbeel 2020) in Fig. 1-(a), when performing the multi-class setting, this method may encounter issues with misclassifying image categories.

In short, a plain DDPM does not handle multi-class anomaly detection well.

What causes this problem, and why have existing methods not solved it?

Because after adding T timesteps noise to the input image, the original class information is lost. During inference, denoising is performed based on this Gaussian noise-like distribution, which may generate images belonging to different categories. 2) Latent Diffusion Model (LDM) (Rombach et al. 2022) has an embedder as a class condition as shown in Fig. 1-(b), which overcomes the problem of misclassification in DDPM. However, LDM cannot address the issue of semantic loss in generated images. LDM cannot simultaneously preserve the semantic information of the input image while reconstructing the anomalous regions. For example, they may fail to maintain direction consistency with the input image in terms of objects like screws and hazelnuts.

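In plain terms: once T timesteps of noise have been added, the input is close to pure Gaussian noise and no longer carries its class, so denoising from it can produce an image of a different category. LDM (Rombach et al. 2022) adds an embedder that supplies the class as a condition (Fig. 1-(b)), which fixes the category mix-up. What it does not fix is semantic loss: it cannot keep the semantics of the input image while reconstructing the anomalous regions. For objects such as screws and hazelnuts, for example, the reconstruction may not keep the same orientation as the input.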

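To summarize in two points:

After noise is added, DDPM no longer has access to the image's class information.
LDM introduces class information, but it cannot preserve semantic information while reconstructing anomalies, so the semantic loss in the generated image remains unsolved.

How the paper solves it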
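
The first point can be checked numerically. The sketch below is not from the paper: it assumes the linear noise schedule from the original DDPM paper (beta from 1e-4 to 0.02 over 1000 steps) and uses hypothetical helper names. It computes how much of the original image survives in x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.

```python
import math

def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule (assumed here; the DDPM paper's default values)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def scales_at(t, betas):
    """Return (signal, noise) coefficients of x_t after the first t noising steps.

    signal = sqrt(alpha_bar_t), noise = sqrt(1 - alpha_bar_t),
    where alpha_bar_t is the product of (1 - beta_s) for s = 1..t.
    """
    # Sum logs instead of multiplying many numbers close to 1.
    log_alpha_bar = sum(math.log(1.0 - b) for b in betas[:t])
    alpha_bar = math.exp(log_alpha_bar)
    return math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)

betas = linear_betas()
for t in (10, 250, 1000):
    signal, noise = scales_at(t, betas)
    print(f"t={t:4d}  signal={signal:.4f}  noise={noise:.4f}")
```

Under this schedule the signal coefficient at t = 1000 is well below 0.01, so x_T is practically indistinguishable from Gaussian noise and an unconditional denoiser has nothing left to tell it which class the input belonged to.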

To address the problems, we propose DiAD for multi-class anomaly detection in Fig. 2, which comprises: a pixel space autoencoder, a latent space denoising network and a feature space pre-trained model. To effectively maintain consistent semantic information with the original image while reconstructing the location of anomalous regions, we propose the Semantic-Guided (SG) network with a connection to the Stable Diffusion (SD) denoising network. To further enhance the capability of preserving fine details in the original image, we propose the Spatial-aware Feature Fusion (SFF) block to integrate features at different scales. Finally, the reconstructed and input images are extracted features through a pre-trained model for anomaly scores.

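In plain terms: DiAD (Fig. 2) has three parts, a pixel-space autoencoder, a latent-space denoising network, and a feature-space pre-trained model. The SG network, connected to the Stable Diffusion (SD) denoising network, keeps the semantics consistent with the original image while the anomalous regions are reconstructed. The SFF block integrates features at different scales so that fine details of the original image are better preserved. Finally, a pre-trained model extracts features from both the reconstructed image and the input image, and the anomaly scores are computed from those features.

(Contributions)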
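
The last step (comparing features of the input and its reconstruction at several scales) can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the cosine distance, the nearest-neighbour upsampling, the summation across scales, and all function names are choices made here, and the "features" are toy nested lists rather than outputs of a real pre-trained extractor.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm > 0 else 0.0

def upsample_nearest(grid, size):
    """Nearest-neighbour resize of an H x W grid of scalars to size x size."""
    h, w = len(grid), len(grid[0])
    return [[grid[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]

def anomaly_map(feats_in, feats_rec, size):
    """Sum per-location feature distances over all scales at a common resolution.

    feats_in / feats_rec: one entry per scale, each an H x W grid of feature vectors.
    """
    total = [[0.0] * size for _ in range(size)]
    for f_in, f_rec in zip(feats_in, feats_rec):
        dist = [[cosine_distance(a, b) for a, b in zip(row_in, row_rec)]
                for row_in, row_rec in zip(f_in, f_rec)]
        up = upsample_nearest(dist, size)
        for r in range(size):
            for c in range(size):
                total[r][c] += up[r][c]
    return total

# Toy data: a 2x2 scale and a 1x1 scale; only the top-right cell of the
# 2x2 scale differs between the input and its reconstruction.
feats_in = [[[[1, 0], [1, 0]], [[1, 0], [1, 0]]], [[[1, 0]]]]
feats_rec = [[[[1, 0], [0, 1]], [[1, 0], [1, 0]]], [[[1, 0]]]]
amap = anomaly_map(feats_in, feats_rec, size=4)
image_score = max(max(row) for row in amap)  # image-level score as the map maximum (an assumption)
```

In the toy data the top-right quadrant of `amap` lights up, because that is the only place where the reconstruction's features disagree with the input's. With a real extractor, the grids would be feature maps taken from several layers.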

We propose a novel diffusion-based framework DiAD for multi-class anomaly detection, which firstly tackles the problem of existing denoising networks of diffusion-based methods failing to correctly reconstruct anomalies.
We construct an SG network connecting to the SD denoising network to maintain consistent semantic information and reconstruct the anomalies.
We propose an SFF block to integrate features from different scales to further improve the reconstruction ability.
Abundant experiments demonstrate the sufficient superiority of DiAD over SOTA methods.

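In plain terms: DiAD is a new diffusion-based framework for multi-class anomaly detection, presented as the first to tackle the failure of existing diffusion denoising networks to reconstruct anomalies correctly.
An SG network connected to the SD denoising network keeps the semantic information consistent while the anomalies are reconstructed.
An SFF block integrates features from different scales to further improve reconstruction.
Extensive experiments show that DiAD outperforms SOTA methods.

Summary: one framework solves the core problem, and two techniques handle the sub-problems.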

2. Preliminaries

Introduces the Denoising Diffusion Probabilistic Model (DDPM).
Introduces the Latent Diffusion Model (LDM).

3. Method
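
For reference, the standard formulations of the two models are written out below. The notation follows the original DDPM and LDM papers and is an assumption here; the DiAD paper's own symbols may differ. $\beta_t$ is the noise schedule, $\epsilon_\theta$ the denoising network, $\mathcal{E}$ the autoencoder's encoder, and $c$ the condition.

```latex
\begin{align}
% DDPM forward (noising) step
q(x_t \mid x_{t-1}) &= \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\right) \\
% Closed form for any timestep t
x_t &= \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,
\qquad \bar{\alpha}_t = \prod_{s=1}^{t} (1-\beta_s),\quad \epsilon \sim \mathcal{N}(0, \mathbf{I}) \\
% DDPM simplified training objective
L_{\mathrm{DDPM}} &= \mathbb{E}_{x_0,\,\epsilon,\,t}\left[\ \lVert \epsilon - \epsilon_\theta(x_t, t) \rVert_2^2\ \right] \\
% LDM: the same objective in the latent space z_0 = E(x_0), with a condition c
L_{\mathrm{LDM}} &= \mathbb{E}_{z_0 = \mathcal{E}(x_0),\,c,\,\epsilon,\,t}\left[\ \lVert \epsilon - \epsilon_\theta(z_t, t, c) \rVert_2^2\ \right]
\end{align}
```

The only structural differences in the LDM objective are that noising and denoising happen on the latent $z$ instead of the pixels $x$, and that the denoiser receives the extra condition $c$.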
