
Image Classification Project 1: Animal Image Classification with a Convolutional Neural Network

news Source: original 2025/8/29 19:52:42

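1. Background and Motivation

Image classification is an important task in computer vision. Animal image classification has many applications, such as ecological research, wildlife conservation, and agricultural monitoring. Classifying animal images automatically helps people understand which species are present, how many there are, and how they are distributed, which supports decisions and research in those fields. The goal of this project is to classify animal images with a convolutional neural network (CNN). We train on a large set of cat, dog, and wild-animal images to build a model that separates the three classes accurately. The model can then recognize and classify new animal images quickly.
Motivation:
(1) Classifying images of pet cats and dogs helps breeders and owners identify their pets quickly and accurately. This is useful for finding lost pets and for organizing pet content on social media.
(2) Classifying a large animal image database makes it possible to build a convenient image retrieval system. Users can search and browse images by the class they are interested in, such as cat, dog, or wild animal.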

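2. Research Content

1. Data source for this training run
https://www.kaggle.com/datasets/andrewmvd/animal-faces/data
After registering and logging in, download the dataset and place it in the data folder under the project path.
2. PyTorch and scikit-learn are used to implement and evaluate the machine learning task
3. Technical difficulties:
(1) How to evaluate model performance accurately and analyze the classification results
(2) The classes do not have the same number of samples
Solutions:
(1) Evaluate the model by computing accuracy, precision, and recall, and plot the confusion matrix and learning curves to visualize performance and errors
(2) Undersample the larger classes so that every class has the same number of samples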

3. Steps

3.1 Import the required libraries

import pandas as pd
from PIL import Image
import torch.nn as nn
import torch.optim as optim
from torch.utils.data.sampler import SubsetRandomSampler
from torch.utils.data import Dataset
import torchvision.transforms as transforms
import matplotlib.font_manager as fm
import torch
import torch.nn.functional as F
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix, roc_curve, auc
import matplotlib.pyplot as plt
import seaborn as sns
from collections import Counter
from sklearn.utils import resample
import numpy as np

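3.2 Load the dataset and preprocess the data

After loading and preprocessing the data, print the number of images in each class (cat, dog, wild) and draw a bar chart so that the class sizes are easier to compare.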

import glob


class InvalidDatasetException(Exception):
    """Raised when the number of image paths and the number of labels differ."""

    def __init__(self, len_of_paths, len_of_labels):
        super().__init__(
            f"Number of paths ({len_of_paths}) is not compatible with number of labels ({len_of_labels})"
        )


transform = transforms.Compose([transforms.ToTensor()])


class AnimalDataset(Dataset):
    def __init__(self, img_paths, img_labels, size_of_images):
        self.img_paths = img_paths
        self.img_labels = img_labels
        self.size_of_images = size_of_images
        if len(self.img_paths) != len(self.img_labels):
            # Pass the lengths, not the lists themselves, so the message stays readable.
            raise InvalidDatasetException(len(self.img_paths), len(self.img_labels))

    def __len__(self):
        return len(self.img_paths)

    def __getitem__(self, index):
        # Force three channels so every tensor has shape (3, H, W).
        pil_image = Image.open(self.img_paths[index]).convert("RGB").resize(self.size_of_images)
        tensor_image = transform(pil_image)
        label = self.img_labels[index]
        return tensor_image, label


paths = []
labels = []
label_map = {0: "Cat", 1: "Dog", 2: "Wild"}

# Change these paths to wherever you stored the project data.
cat_paths = glob.glob("D:/test/pythonProject/data/afhq/train/cat/*") + glob.glob("D:/test/pythonProject/data/afhq/val/cat/*")
for cat_path in cat_paths:
    paths.append(cat_path)
    labels.append(0)
dog_paths = glob.glob("D:/test/pythonProject/data/afhq/train/dog/*") + glob.glob("D:/test/pythonProject/data/afhq/val/dog/*")
for dog_path in dog_paths:
    paths.append(dog_path)
    labels.append(1)
wild_paths = glob.glob("D:/test/pythonProject/data/afhq/train/wild/*") + glob.glob("D:/test/pythonProject/data/afhq/val/wild/*")
for wild_path in wild_paths:
    paths.append(wild_path)
    labels.append(2)

data = pd.DataFrame({'classes': labels})
num_classes = len(label_map)
print('Number of classes:', num_classes)
for class_label, class_name in label_map.items():
    count = data[data['classes'] == class_label].shape[0]
    print(f"Class {class_name}: {count} images")

# Windows font with CJK glyphs; the later plots reuse font_prop.
font_path = "C:/Windows/Fonts/msyh.ttc"
font_prop = fm.FontProperties(fname=font_path)

sns.set_style("white")
# Create the figure before plotting; creating it afterwards makes plt.show() open an extra empty figure.
plt.figure(figsize=(15, 12))
plot = sns.countplot(x=data['classes'], color='#2596be')
sns.despine()
plot.set_title('Class distribution\n', x=0.1, y=1, font=font_prop, fontsize=18)
plot.set_ylabel("Count", x=0.02, font=font_prop, fontsize=12)
plot.set_xlabel("Class", font=font_prop, fontsize=15)
# Write the count inside the top of each bar.
for p in plot.patches:
    plot.annotate(format(p.get_height(), '.0f'), (p.get_x() + p.get_width() / 2, p.get_height()),
                  ha='center', va='center', xytext=(0, -20), font=font_prop, textcoords='offset points', size=15)
plt.show()

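The printed counts and the plot show that the three classes differ somewhat in size. The gap is not large, but it may still affect training. To deal with this class imbalance we undersample: the larger classes are reduced until every class has the same number of samples.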

# Undersample the dataset

labels = np.array(labels)
paths = np.array(paths)
counter = Counter(labels)
print("Original sample counts:", counter)
cat_indices = np.where(labels == 0)[0]
dog_indices = np.where(labels == 1)[0]
wild_indices = np.where(labels == 2)[0]
# The smallest class sets the target size for every class.
min_samples = min([len(cat_indices), len(dog_indices), len(wild_indices)])
# replace=False samples without replacement, so no image is duplicated.
undersampled_cat_indices = resample(cat_indices, replace=False, n_samples=min_samples, random_state=42)
undersampled_dog_indices = resample(dog_indices, replace=False, n_samples=min_samples, random_state=42)
undersampled_wild_indices = resample(wild_indices, replace=False, n_samples=min_samples, random_state=42)
undersampled_indices = np.concatenate((undersampled_cat_indices, undersampled_dog_indices, undersampled_wild_indices))
undersampled_paths = paths[undersampled_indices]
undersampled_labels = labels[undersampled_indices]
counter_undersampled = Counter(undersampled_labels)
print("Sample counts after undersampling:", counter_undersampled)
categories = [label_map[label] for label in counter_undersampled.keys()]
sample_counts = list(counter_undersampled.values())

# Visualization

sns.set_style("white")
plt.figure(figsize=(6.4, 4.8))
plot = sns.countplot(x=undersampled_labels, color='#2596be')
sns.despine()
plot.set_title('Class distribution\n', x=0.1, y=1, font=font_prop, fontsize=18)
plot.set_ylabel("Count", x=0.02, font=font_prop, fontsize=12)
plot.set_xlabel("Class", font=font_prop, fontsize=15)
for p in plot.patches:
    plot.annotate(format(p.get_height(), '.0f'), (p.get_x() + p.get_width() / 2, p.get_height()),
                  ha='center', va='center', xytext=(0, -20), font=font_prop, textcoords='offset points', size=15)
plt.show()


After undersampling, every class has been reduced to the same number of images, so no class dominates training.
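
The idea behind undersampling can be sketched without scikit-learn. The following is a minimal illustration on made-up labels, not the project's code: the helper name undersample_indices and the toy label list are assumptions introduced here.

```python
import random
from collections import Counter


def undersample_indices(labels, seed=42):
    """Return sorted indices that keep the same number of samples per class."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    # The smallest class sets the target size for every class.
    target = min(len(indices) for indices in by_class.values())
    kept = []
    for indices in by_class.values():
        # Sampling without replacement drops the surplus samples.
        kept.extend(rng.sample(indices, target))
    return sorted(kept)


# Toy labels (hypothetical): 5 cats, 3 dogs, 4 wild animals.
toy_labels = [0] * 5 + [1] * 3 + [2] * 4
kept = undersample_indices(toy_labels)
balanced = Counter(toy_labels[i] for i in kept)
print(balanced)
```

Every class ends up with as many samples as the smallest class. The cost is that the discarded images never reach the model.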

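3.3 Handle missing values

After preprocessing, check for missing values: verify that the number of paths matches the number of labels, print both counts, and visualize any missing entries.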

if len(undersampled_paths) != len(undersampled_labels):
    raise InvalidDatasetException(len(undersampled_paths), len(undersampled_labels))
# Use f-strings to insert the integer counts into the messages.
print(f"Number of file paths in paths: {len(undersampled_paths)}")
print(f"Number of labels in labels: {len(undersampled_labels)}")
# Visualize missing values
df = pd.DataFrame({'Path': undersampled_paths, 'Label': undersampled_labels})
missing_values = df.isnull().sum()
# Draw the bar chart
plt.bar(missing_values.index, missing_values.values)
plt.xlabel("Feature", fontproperties=font_prop, fontsize=12)
plt.ylabel("Missing values", fontproperties=font_prop, fontsize=12)
plt.title("Missing values per column", fontproperties=font_prop, fontsize=18)
plt.grid(False)
plt.xticks(rotation=90)
plt.show()

The printed counts and the bar chart confirm that no values are missing. The dataset is complete and ready for further analysis and processing.
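
A null check on the path and label columns cannot detect files that are missing or empty on disk. As a complementary sketch, the check below flags such paths. The helper find_bad_paths and the file names are hypothetical, and temporary files stand in for the real dataset.

```python
import os
import tempfile


def find_bad_paths(paths):
    """Return the paths that do not exist or point at empty files."""
    bad = []
    for path in paths:
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            bad.append(path)
    return bad


with tempfile.TemporaryDirectory() as folder:
    good = os.path.join(folder, "cat_0001.jpg")      # hypothetical file names
    empty = os.path.join(folder, "dog_0001.jpg")
    missing = os.path.join(folder, "wild_0001.jpg")  # never created
    with open(good, "wb") as handle:
        handle.write(b"placeholder bytes, not a real image")
    open(empty, "wb").close()
    bad_paths = find_bad_paths([good, empty, missing])
    bad_names = [os.path.basename(p) for p in bad_paths]
    print(bad_names)
```

This only checks that a file exists and is non-empty; it does not verify that the file decodes as an image.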

3.4 Split the dataset

Split the dataset into a training set and a test set, create a data loader for each, and set the number of samples per batch.

from sklearn.model_selection import train_test_split

dataset = AnimalDataset(undersampled_paths, undersampled_labels, (250, 250))
dataset_indices = list(range(0, len(dataset)))
# Split the indices into a training set and a test set
train_indices, test_indices = train_test_split(dataset_indices, test_size=0.2, random_state=42)
print("Training samples: ", len(train_indices))
print("Test samples: ", len(test_indices))
# Create samplers for the training set and the test set
train_sampler = SubsetRandomSampler(train_indices)
test_sampler = SubsetRandomSampler(test_indices)
BATCH_SIZE = 128
train_loader = torch.utils.data.DataLoader(dataset, batch_size=BATCH_SIZE, sampler=train_sampler)
# validation_loader serves the held-out test indices.
validation_loader = torch.utils.data.DataLoader(dataset, batch_size=BATCH_SIZE, sampler=test_sampler)
# Inspect the shape of one image tensor and the type of a batch of labels.
print(dataset[1][0].shape)
images, labels = next(iter(train_loader))
print(type(labels))
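
A plain random split does not guarantee that each class keeps the same share in the training and test sets. A stratified split does; scikit-learn's train_test_split offers this through its stratify parameter. The pure-Python sketch below shows the idea on toy labels. The helper stratified_split and the label list are assumptions for illustration only.

```python
import random
from collections import Counter


def stratified_split(labels, test_size=0.2, seed=42):
    """Split indices so each class contributes the same share to the test set."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, test = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # Take the same fraction of every class for the test set.
        n_test = round(len(shuffled) * test_size)
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return sorted(train), sorted(test)


# Toy labels (hypothetical): 10 samples for each of the three classes.
toy_labels = [0] * 10 + [1] * 10 + [2] * 10
train_idx, test_idx = stratified_split(toy_labels)
test_counts = Counter(toy_labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), test_counts)
```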


3.5 Fetch one batch of training data and visualize it

def add_subplot_label(ax, label):
    ax.text(0.5, -0.15, label, transform=ax.transAxes, ha='center', va='center', fontsize=12)


images, labels = next(iter(train_loader))
fig, axis = plt.subplots(3, 5, figsize=(15, 10))
for i, ax in enumerate(axis.flat):
    with torch.no_grad():
        npimg = images[i].numpy()
        # imshow expects height x width x channels, while tensors are channels x height x width.
        npimg = np.transpose(npimg, (1, 2, 0))
        label = label_map[int(labels[i])]
        ax.imshow(npimg)
        ax.set(title=f"{label}")
        ax.grid(False)
        add_subplot_label(ax, f"({i // axis.shape[1]}, {i % axis.shape[1]})")  # add a (row, column) index
plt.tight_layout()
plt.show()


3.6 Model design

Define the convolutional neural network and choose the device it runs on, in preparation for training.

class CNN(nn.Module):
    # Defines the convolutional layers and the fully connected layers.
    def __init__(self):
        super(CNN, self).__init__()
        # First we'll define our layers
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1)
        self.batchnorm1 = nn.BatchNorm2d(64)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1)
        self.batchnorm2 = nn.BatchNorm2d(128)
        self.conv4 = nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1)
        self.batchnorm3 = nn.BatchNorm2d(256)
        self.maxpool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(256 * 2 * 2, 512)
        self.fc2 = nn.Linear(512, 3)

    # Defines how data flows through the model.
    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = self.batchnorm1(x)
        x = self.maxpool(x)
        x = F.relu(self.conv3(x))
        x = self.batchnorm2(x)
        x = self.maxpool(x)
        x = F.relu(self.conv4(x))
        x = self.batchnorm3(x)
        x = self.maxpool(x)
        # Flatten the 256 x 2 x 2 feature map for the fully connected layers.
        x = x.view(-1, 256 * 2 * 2)
        x = self.fc1(x)
        x = self.fc2(x)
        # nn.CrossEntropyLoss applies log-softmax itself, so this call is redundant but harmless.
        x = F.log_softmax(x, dim=1)
        return x


# Select the device the model runs on
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
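
The first fully connected layer expects 256 * 2 * 2 inputs, which holds only for certain image sizes. The sketch below traces one spatial dimension through the layers using the standard output-size formulas for convolution and pooling. The helper names are introduced here for illustration.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Output size of a Conv2d layer along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Output size of a MaxPool2d layer along one spatial dimension."""
    return (size - kernel) // stride + 1


def trace_feature_map(size):
    """Follow one spatial dimension through the layers of the CNN above."""
    size = conv_out(size)  # conv1
    size = conv_out(size)  # conv2
    size = pool_out(size)  # maxpool
    size = conv_out(size)  # conv3
    size = pool_out(size)  # maxpool
    size = conv_out(size)  # conv4
    size = pool_out(size)  # maxpool
    return size


final = trace_feature_map(250)
flattened = 256 * final * final
print(final, flattened)
```

For 250 x 250 inputs the sizes run 125, 63, 31, 16, 8, 4, 2, so the flattened vector has 256 * 2 * 2 = 1024 entries and matches fc1. A 128 x 128 input would shrink to 1 x 1 and no longer match, so changing the image size means changing fc1 as well.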

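3.7 Model training

Run the training loop. The cross-entropy loss and the RMSprop optimizer define how the loss is computed and how the parameters are updated. We set the number of epochs, record the loss and accuracy of every epoch, and plot both.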

model = CNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.RMSprop(model.parameters(), lr=1e-4)
EPOCH_NUMBER = 6
TRAIN_LOSS = []
TRAIN_ACCURACY = []
# Training loop
for epoch in range(1, EPOCH_NUMBER + 1):
    model.train()  # make sure batch norm runs in training mode
    epoch_loss = 0.0
    correct = 0
    total = 0
    # Iterate over the training data loader
    for data_, target_ in train_loader:
        target_ = target_.to(device).long()
        data_ = data_.to(device).float()
        # Clear the gradients left over from the previous step.
        optimizer.zero_grad()
        # Pass the inputs through the model to get its predictions.
        outputs = model(data_)
        loss = criterion(outputs, target_)
        loss.backward()
        optimizer.step()
        epoch_loss = epoch_loss + loss.item()
        _, pred = torch.max(outputs, dim=1)
        # Compare predictions with the true labels and accumulate the number of correct ones.
        correct = correct + torch.sum(pred == target_).item()
        total += target_.size(0)
    # Record and print the loss and accuracy of this epoch.
    # epoch_loss is the sum of the per-batch losses, not their mean.
    TRAIN_LOSS.append(epoch_loss)
    TRAIN_ACCURACY.append(100 * correct / total)
    print(f"Epoch {epoch}: Accuracy: {100 * correct / total}, Loss: {epoch_loss}")
# Plot the training loss and accuracy
epochs = range(1, EPOCH_NUMBER + 1)  # number epochs from 1, as in the printed log
plt.subplots(figsize=(6, 4))
plt.plot(epochs, TRAIN_LOSS, color="blue", label="Loss")
plt.legend()
plt.xlabel("Epoch", fontproperties=font_prop)
plt.ylabel("Loss", fontproperties=font_prop)
plt.title("Training loss", fontproperties=font_prop)
plt.show()
plt.subplots(figsize=(6, 4))
plt.plot(epochs, TRAIN_ACCURACY, color="green", label="Accuracy")
plt.legend()
plt.xlabel("Epoch", fontproperties=font_prop)
plt.ylabel("Accuracy (%)", fontproperties=font_prop)
plt.title("Training accuracy", fontproperties=font_prop)
plt.show()
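
The model's forward pass ends with F.log_softmax, while nn.CrossEntropyLoss already combines log-softmax with the negative log-likelihood. The pure-Python sketch below, with made-up logits, shows why the extra call does not change the loss: applying log-softmax to values that are already log-probabilities returns them unchanged. The function names are illustrative stand-ins, not PyTorch APIs.

```python
import math


def log_softmax(values):
    """Numerically stable log-softmax of a list of scores."""
    shift = max(values)
    log_sum = shift + math.log(sum(math.exp(v - shift) for v in values))
    return [v - log_sum for v in values]


def cross_entropy(scores, target):
    """Cross-entropy for one sample: negative log-softmax at the target index."""
    return -log_softmax(scores)[target]


logits = [2.0, 0.5, -1.0]   # made-up raw outputs for the three classes
once = log_softmax(logits)  # what the model returns
twice = log_softmax(once)   # what the loss computes internally
loss_from_logits = cross_entropy(logits, 0)
loss_from_log_probs = cross_entropy(once, 0)
print(loss_from_logits, loss_from_log_probs)
```

Both losses agree, so training behaves the same either way. Returning raw logits from the model would simply avoid the redundant computation.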


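The numbers and plots show that training loss falls and training accuracy rises as the epochs progress: the model's predictions move closer to the true labels and it fits the training data better. The loss is lower in every epoch than in the previous one, which indicates that the optimizer is adjusting the parameters effectively. Accuracy is higher in every epoch, which indicates that the model is learning more of the features and patterns that separate the classes. Falling loss and rising accuracy are the trend we expect during training. Both curves are measured on the training set only, so they show how well the model fits the training data, not how well it generalizes.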
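
Training curves alone cannot reveal overfitting. A common complement, which is not part of the original project, is to compute a validation loss after every epoch and stop once it no longer improves. The sketch below shows that logic in isolation. The EarlyStopping class and the loss values are hypothetical.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses: they improve for three epochs, then get worse.
val_losses = [0.90, 0.70, 0.65, 0.66, 0.68, 0.71]
stopper = EarlyStopping(patience=2)
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at, stopper.best)
```

In a real training loop the value passed to step would come from evaluating the model on held-out data after each epoch.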

3.8 Performance evaluation

Evaluate the model's performance on each class and plot ROC curves to measure how well it separates the classes.

def predict_labels(model, data_loader):
    """Return predicted labels, true labels, and per-class probabilities."""
    model.eval()
    y_pred = []
    y_true = []
    y_score = []
    with torch.no_grad():
        for images, labels in data_loader:
            images = images.to(device)
            labels = labels.to(device)
            outputs = model(images)
            # softmax yields probabilities whether the outputs are logits or log-probabilities.
            probabilities = torch.softmax(outputs, dim=1)
            _, predicted = torch.max(outputs.data, 1)
            y_pred.extend(predicted.cpu().numpy())
            y_true.extend(labels.cpu().numpy())
            y_score.extend(probabilities.cpu().numpy())
    return np.array(y_pred), np.array(y_true), np.array(y_score)


# Get the predictions
y_pred, y_true, y_score = predict_labels(model, validation_loader)
# Compute a one-vs-rest ROC curve for each class from the predicted probability of that class.
# Hard 0/1 predictions would give a single operating point per class instead of a curve.
fpr = dict()
tpr = dict()
roc_auc = dict()
num_classes = len(label_map)
for i in range(num_classes):
    fpr[i], tpr[i], _ = roc_curve((y_true == i).astype(int), y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])
# Plot the ROC curves
plt.figure(figsize=(10, 8))
colors = ['b', 'g', 'r']  # one curve color per class
for i in range(num_classes):
    plt.plot(fpr[i], tpr[i], color=colors[i], lw=2,
             label='ROC curve for class {0} (AUC = {1:.2f})'.format(label_map[i], roc_auc[i]))
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')  # chance level
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False positive rate', fontproperties=font_prop)
plt.ylabel('True positive rate', fontproperties=font_prop)
plt.title('Receiver operating characteristic curve', fontproperties=font_prop)
plt.legend(loc="lower right", prop=font_prop)
plt.show()

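The plot shows that the ROC curve of the cat class lies closer to the top-left corner than the others, while the curves for dog and wild are lower. Across thresholds, the model reaches a higher true positive rate at a lower false positive rate on cat. It is weaker on dog and wild, which it more often confuses with other classes.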
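
An ROC curve is traced by sweeping a decision threshold over continuous scores, which is why the input to the curve matters. The pure-Python sketch below, using made-up labels and scores for a single class, compares a curve built from scores with one built from hard 0/1 predictions. The helpers roc_points and auc_trapezoid are illustrative stand-ins, not the scikit-learn functions.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) points obtained by lowering the threshold step by step."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples that share a score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up one-vs-rest data: 1 marks the class of interest.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]       # predicted probabilities
hard = [1 if s >= 0.5 else 0 for s in scores]  # thresholded predictions

curve_from_scores = roc_points(y_true, scores)
curve_from_hard = roc_points(y_true, hard)
auc_scores = auc_trapezoid(curve_from_scores)
auc_hard = auc_trapezoid(curve_from_hard)
print(len(curve_from_scores), len(curve_from_hard))
```

The score-based curve has one point per distinct score. The hard-label curve collapses to three points, the two corners plus a single operating point, and its area differs from the score-based one.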

3.9 Testing

Test the model on the validation set: compute accuracy, precision, and recall together with the confusion matrix, and visualize the confusion matrix.

model.eval()  # switch the model to evaluation mode
predictions = []  # predicted labels
true_labels = []  # ground-truth labels
# Predict on the held-out split
with torch.no_grad():
    for images, labels in validation_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)  # forward pass
        _, predicted = torch.max(outputs.data, 1)  # predicted class per sample
        predictions.extend(predicted.tolist())
        true_labels.extend(labels.tolist())
# Convert predictions and true labels to NumPy arrays
predictions = np.array(predictions)
true_labels = np.array(true_labels)
accuracy = accuracy_score(true_labels, predictions)  # accuracy
precision = precision_score(true_labels, predictions, average='macro')  # macro-averaged precision
recall = recall_score(true_labels, predictions, average='macro')  # macro-averaged recall
confusion = confusion_matrix(true_labels, predictions)  # rows: true class, columns: predicted class
# Print the evaluation results
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("Confusion matrix:")
print(confusion)
# Visualize the confusion matrix
class_names = ['Cat', 'Dog', 'Wild']
plt.rcParams['font.sans-serif'] = ['SimSun']  # only needed when the labels contain CJK text
plt.figure(figsize=(8, 6))
sns.heatmap(confusion, annot=True, fmt="d", cmap="Blues", xticklabels=class_names, yticklabels=class_names)
plt.xlabel('Predicted label')
plt.ylabel('True label')
plt.title('Confusion matrix')
plt.show()
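
Accuracy, macro precision, and macro recall can all be read off the confusion matrix, which makes the relationship between the metrics easier to see. The sketch below uses a made-up 3 x 3 matrix with rows as true classes and columns as predicted classes, the layout scikit-learn uses. The helper metrics_from_confusion is introduced here for illustration.

```python
def metrics_from_confusion(matrix):
    """Compute accuracy, macro precision, and macro recall from a confusion matrix."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(n))
    precisions, recalls = [], []
    for k in range(n):
        predicted_k = sum(matrix[r][k] for r in range(n))  # column sum: predicted as k
        actual_k = sum(matrix[k])                          # row sum: truly k
        precisions.append(matrix[k][k] / predicted_k if predicted_k else 0.0)
        recalls.append(matrix[k][k] / actual_k if actual_k else 0.0)
    return {
        "accuracy": correct / total,
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
    }


# Made-up counts for Cat, Dog, Wild (rows: true class, columns: predicted class).
toy_confusion = [
    [9, 1, 0],
    [2, 6, 2],
    [0, 1, 4],
]
metrics = metrics_from_confusion(toy_confusion)
print(metrics)
```

Macro averaging gives every class equal weight regardless of its size, which suits a dataset that has been balanced by undersampling.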


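4 Discussion Questions

4.1 Basic concepts

What is a convolutional neural network (CNN)? What advantages does it have over a traditional neural network?

What roles do the convolutional, pooling, and fully connected layers play in a CNN?

Why is a CNN more effective than traditional machine learning methods for animal image classification?

4.2 Data preparation and model construction

How would you build a CNN for animal image classification? Which factors need to be considered (for example network depth, kernel size, and activation function)?

How would you prepare an animal image dataset for training a CNN? Which preprocessing steps are essential?

What is data augmentation? How can it be used in animal image classification to improve the model's ability to generalize?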
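
As a starting point for the augmentation question, the sketch below applies two common augmentations, a horizontal flip and a random crop, to a tiny image stored as nested lists. It is a toy illustration with hypothetical helper names. In a PyTorch pipeline the same idea is usually expressed with torchvision.transforms such as RandomHorizontalFlip and RandomCrop.

```python
import random


def horizontal_flip(image):
    """Mirror an H x W image (a list of rows) left to right."""
    return [row[::-1] for row in image]


def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position."""
    top = rng.randint(0, len(image) - crop_h)
    left = rng.randint(0, len(image[0]) - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def augment(image, rng, flip_probability=0.5):
    """Randomly flip the image, then crop one pixel off each dimension."""
    if rng.random() < flip_probability:
        image = horizontal_flip(image)
    return random_crop(image, len(image) - 1, len(image[0]) - 1, rng)


# A 3 x 3 toy "image" whose pixel values make the effect easy to follow.
toy_image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
rng = random.Random(0)
flipped = horizontal_flip(toy_image)
augmented = augment(toy_image, rng)
print(flipped)
print(augmented)
```

Because the transformations are random, the model sees a slightly different version of each image in every epoch. Augmentation is normally applied to the training set only.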

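Would a different model work? Try other convolutional neural network architectures:

LeNet-5: proposed by Yann LeCun et al. in 1998, mainly for handwritten digit recognition. It consists of convolutional, pooling, and fully connected layers and was one of the earliest CNNs applied successfully to digit recognition.
AlexNet: proposed by Alex Krizhevsky et al. for the 2012 ImageNet image classification competition. It used a deeper network trained on a larger dataset, with ReLU activations and Dropout regularization, and achieved a breakthrough in performance.
VGGNet: proposed by Karen Simonyan and Andrew Zisserman in 2014. It uses very small 3x3 convolution kernels and stacks many convolutional layers to increase depth, which improves feature extraction.
GoogLeNet (Inception): proposed by a Google team in 2014. It uses Inception modules, in which several parallel convolution branches extract features at different scales, and 1x1 convolutions reduce the computational cost.
ResNet: proposed by a Microsoft team in 2015. It introduced residual learning: skip connections across layers make very deep networks much easier to train by mitigating the degradation and vanishing-gradient problems. It is suited to large-scale image recognition.
MobileNet: proposed by a Google team in 2017. It uses depthwise separable convolutions to reduce the number of parameters, which suits resource-constrained settings such as mobile devices.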
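
To make the parameter saving of depthwise separable convolutions concrete, the arithmetic below compares the weight counts of a standard convolution and its depthwise separable counterpart. The layer shape (3x3 kernel, 128 input channels, 256 output channels) is chosen to match conv4 of the CNN in this article. Biases are ignored, and the function names are illustrative.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights of a standard convolution: one kernel x kernel x c_in filter per output channel."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights of a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution that mixes the channels
    return depthwise + pointwise


standard = standard_conv_params(3, 128, 256)
separable = depthwise_separable_params(3, 128, 256)
ratio = separable / standard
print(standard, separable, round(ratio, 3))
```

For this layer the separable version needs 33,920 weights instead of 294,912, about 11.5% of the original. In general the ratio is 1/c_out + 1/kernel^2.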

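4.3 Model training and evaluation

How do you choose a suitable loss function and optimizer for training an animal image classification model?

How can you monitor training and prevent overfitting?

How do you evaluate the performance of an animal image classification model? Which evaluation metrics are commonly used?

4.4 Project practice and extensions

Would a different dataset work? For example, replace the animal dataset with a plant dataset. You can look for public datasets and test them yourself.

How would you deploy the trained animal image classification model in a real application? Which factors need to be considered?

Besides animal image classification, which other image recognition tasks can CNNs be applied to? Give examples.

How can transfer learning be used to improve the training efficiency and performance of an animal image classification model?

4.5 Going deeper

How can class imbalance between animal categories be handled in animal image classification?

How can the predictions of an animal image classification model be explained? Which methods help us understand the model's decision process?

As deep learning continues to develop, what challenges and opportunities remain in animal image classification?

How would you handle images with the following problems?
(1) Blur
(2) Uneven lighting
(3) Distortion or deformation
(4) Rain or fog
(5) Objects other than the animal in the picture

4.6 Hints

These questions have no standard answers. They are meant to guide you to think more deeply about every aspect of a CNN-based animal image classification project.

When answering, draw on your own project experience and on what you have learned.

You can consult the literature and other references to find more in-depth answers.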
