
Deep Learning Object Detection: A Walkthrough of the YOLOv5 Source Code

Source: original, 2025/9/18 18:54:42

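Compared with earlier versions, YOLOv5 is more engineering- and project-oriented and closer to real-world use.

I. YOLOv5 project setup

(1) Project overview

First, search for yolov5 on GitHub and download it. Two questions guide the reading: how is the data processed during training, and how is the network architecture designed (for example, each layer)? YOLOv5 requires Python 3.8 or later and torch 1.6 or later, because torch 1.6 and above provide the mixed-precision training used to speed things up. All required packages are listed in requirements.txt; install them with pip install -r requirements.txt. By default the project uses the COCO dataset, but here we train on a dataset we prepared ourselves. Single-machine multi-GPU and multi-machine multi-GPU training are both handled, which makes the code a useful reference.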

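(2) Training on your own dataset

1) A dataset can be downloaded from https://public.roboflow.com. Register a free account first, and when downloading choose the export format TXT ------> YOLO v5 PyTorch. Then place the downloaded dataset in a directory at the same level as the YOLOv5 project (for example, a MaskDataSet directory you create yourself; keeping it outside the project makes it easier to manage, but adjust this to your own setup). The MaskDataSet directory will contain the training, test and validation sets plus the configuration file. This project checks whether a person is wearing a mask (2 classes), so the dataset downloaded is a mask dataset.

2) Open the configuration file data.yaml in the MaskDataSet directory, as shown in the figure:

3) Open a txt file under labels (these are the annotations). Each one must correspond to an image in the sibling images directory: the txt file name (without extension) must match the image file name. Keep names short and avoid Chinese characters. As shown in the figure: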
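The pairing rule above can be sketched as follows. This is a minimal illustration rather than the project's code: `label_path_for` and `parse_label_line` are hypothetical helper names, and the row layout (class, x_center, y_center, width, height, all normalized to 0-1) is the YOLO txt convention that the dataset class later checks (five columns, coordinates no larger than 1).

```python
import os


def label_path_for(image_path):
    """Map .../images/name.jpg to .../labels/name.txt (hypothetical helper)."""
    sa = os.sep + "images" + os.sep
    sb = os.sep + "labels" + os.sep
    stem, _ext = os.path.splitext(image_path.replace(sa, sb, 1))
    return stem + ".txt"


def parse_label_line(line):
    """Parse one annotation row: 'cls xc yc w h' with normalized coordinates."""
    cls, xc, yc, w, h = line.split()
    values = [float(v) for v in (xc, yc, w, h)]
    if not all(0.0 <= v <= 1.0 for v in values):
        raise ValueError("coordinates must be normalized to 0-1: %r" % line)
    return (int(cls), *values)


if __name__ == "__main__":
    img = os.path.join("MaskDataSet", "train", "images", "a1.jpg")
    print(label_path_for(img))
    print(parse_label_line("1 0.5 0.5 0.2 0.4"))
```

If an image has no matching txt file, or the names differ, the image ends up without labels, which is why the naming rule matters.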

Annotations can be created with labelme (it outputs JSON, which can of course be converted to txt); see the YOLOv3 notes.

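(3) Training parameter configuration

YOLOv5 really is engineering- and project-oriented: for example, training logs are tracked in detail, and many different data source formats are supported.

1) Open https://github.com/ultralytics/yolov5 and download a pretrained model for use in training and validation. This example downloads yolov5s.pt and places it in the source directory, as shown in the figure:

2) Training parameters for train.py. This example uses: --data ..//MaskDataSet//data.yaml --cfg models//yolov5s.yaml --weights '' --epochs 300 --batch-size 16. These set the data source location, the model configuration, the number of epochs and the batch size; --weights '' means no pretrained weights are used. yolov5s.yaml mainly configures the network structure, including the feature extraction layers and the output layers; see the examples in the YOLOv5 GitHub repository:

3) Inference parameters for detect.py. This example uses, for instance, --source ./inference/images/ --weights yolov5s.pt --conf 0.4: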

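II. YOLOv5 source code walkthrough

(1) Data source configuration and loading (data augmentation): returning data and labels to the model

1) Debugging through the data source flow

<1> train.py. The first goal is to obtain the dataloader and the dataset. Skipping the earlier code for now, set a breakpoint at line 168 of train.py. The call there takes parameters such as the training data path, image size, label classes, model, device, optimizer, batch size and so on, as shown in the figure:

<2> Stepping in from the breakpoint leads to create_dataloader in datasets.py. It mainly calls the LoadImagesAndLabels class to build the dataset, then passes the dataset to the InfiniteDataLoader class to build the dataloader. First, the main functions of datasets.py: reading different kinds of data sources, data augmentation and so on. The handling of different data sources is all thought through and implemented, which is typical of the project's engineering focus:

1> BoF (Bag of Freebies) explained: techniques that only add training cost, yet can noticeably improve accuracy without affecting inference speed. Data augmentation: adjusting brightness, contrast and hue, random scaling, cropping, flipping, rotation and so on. Network regularization: Dropout, DropBlock. Also class-imbalance handling and loss function design. The code of the LoadImagesAndLabels class (mosaic augmentation and more) is as follows: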

class LoadImagesAndLabels(Dataset):  # for training/testing
    def __init__(self, path, img_size=640, batch_size=16, augment=False, hyp=None, rect=False, image_weights=False,
                 cache_images=False, single_cls=False, stride=32, pad=0.0, rank=-1):
        try:
            f = []  # image files
            for p in path if isinstance(path, list) else [path]:  # Windows and Linux paths differ a little, hence the extra handling
                p = str(Path(p))  # os-agnostic
                parent = str(Path(p).parent) + os.sep
                if os.path.isfile(p):  # file
                    with open(p, 'r') as t:
                        t = t.read().splitlines()
                        f += [x.replace('./', parent) if x.startswith('./') else x for x in t]  # local to global path
                elif os.path.isdir(p):  # folder
                    f += glob.iglob(p + os.sep + '*.*')
                else:
                    raise Exception('%s does not exist' % p)
            self.img_files = sorted([x.replace('/', os.sep) for x in f if os.path.splitext(x)[-1].lower() in img_formats])
        except Exception as e:
            raise Exception('Error loading data from %s: %s\nSee %s' % (path, e, help_url))

        n = len(self.img_files)
        assert n > 0, 'No images found in %s. See %s' % (path, help_url)
        bi = np.floor(np.arange(n) / batch_size).astype(np.int64)  # batch index
        nb = bi[-1] + 1  # number of batches (per epoch)

        self.n = n  # number of images
        self.batch = bi  # batch index of image
        self.img_size = img_size
        self.augment = augment
        self.hyp = hyp
        self.image_weights = image_weights
        self.rect = False if image_weights else rect
        self.mosaic = self.augment and not self.rect  # load 4 images at a time into a mosaic (only during training)
        self.mosaic_border = [-img_size // 2, -img_size // 2]  # limits the range of the mosaic center
        self.stride = stride  # total downsampling factor

        # Define labels
        sa, sb = os.sep + 'images' + os.sep, os.sep + 'labels' + os.sep  # /images/, /labels/ substrings
        self.label_files = [x.replace(sa, sb, 1).replace(os.path.splitext(x)[-1], '.txt') for x in self.img_files]

        # Check cache (with a cache, later runs do not have to read every file again)
        cache_path = str(Path(self.label_files[0]).parent) + '.cache'  # cached labels
        if os.path.isfile(cache_path):
            cache = torch.load(cache_path)  # load
            if cache['hash'] != get_hash(self.label_files + self.img_files):  # dataset changed
                cache = self.cache_labels(cache_path)  # re-cache
        else:
            cache = self.cache_labels(cache_path)  # cache

        # Get labels
        labels, shapes = zip(*[cache[x] for x in self.img_files])
        self.shapes = np.array(shapes, dtype=np.float64)
        self.labels = list(labels)

        # Rectangular Training  https://github.com/ultralytics/yolov3/issues/232
        if self.rect:  # rectangular
            # Sort by aspect ratio
            s = self.shapes  # wh
            ar = s[:, 1] / s[:, 0]  # aspect ratio
            irect = ar.argsort()
            self.img_files = [self.img_files[i] for i in irect]
            self.label_files = [self.label_files[i] for i in irect]
            self.labels = [self.labels[i] for i in irect]
            self.shapes = s[irect]  # wh
            ar = ar[irect]

            # Set training image shapes
            shapes = [[1, 1]] * nb
            for i in range(nb):
                ari = ar[bi == i]
                mini, maxi = ari.min(), ari.max()
                if maxi < 1:
                    shapes[i] = [maxi, 1]
                elif mini > 1:
                    shapes[i] = [1, 1 / mini]

            self.batch_shapes = np.ceil(np.array(shapes) * img_size / stride + pad).astype(np.int64) * stride

        # Cache labels
        create_datasubset, extract_bounding_boxes, labels_loaded = False, False, False
        nm, nf, ne, ns, nd = 0, 0, 0, 0, 0  # number missing, found, empty, datasubset, duplicate
        pbar = enumerate(self.label_files)
        if rank in [-1, 0]:
            pbar = tqdm(pbar)
        for i, file in pbar:
            l = self.labels[i]  # label
            if l is not None and l.shape[0]:
                assert l.shape[1] == 5, '> 5 label columns: %s' % file  # all 5 columns present?
                assert (l >= 0).all(), 'negative labels: %s' % file  # label values must not be negative
                assert (l[:, 1:] <= 1).all(), 'non-normalized or out of bounds coordinate labels: %s' % file  # normalized?
                if np.unique(l, axis=0).shape[0] < l.shape[0]:  # duplicate rows
                    nd += 1  # print('WARNING: duplicate rows in %s' % self.label_files[i])  # duplicate rows
                if single_cls:
                    l[:, 0] = 0  # force dataset into single-class mode (class id set to 0)
                self.labels[i] = l
                nf += 1  # file found

                # Create subdataset (a smaller dataset)
                if create_datasubset and ns < 1E4:
                    if ns == 0:
                        create_folder(path='./datasubset')
                        os.makedirs('./datasubset/images')
                    exclude_classes = 43
                    if exclude_classes not in l[:, 0]:
                        ns += 1
                        # shutil.copy(src=self.img_files[i], dst='./datasubset/images/')  # copy image
                        with open('./datasubset/images.txt', 'a') as f:
                            f.write(self.img_files[i] + '\n')

                # Extract object detection boxes for a second stage classifier
                # (crop out the contents of each box; use it if your task needs it)
                if extract_bounding_boxes:
                    p = Path(self.img_files[i])
                    img = cv2.imread(str(p))
                    h, w = img.shape[:2]
                    for j, x in enumerate(l):
                        f = '%s%sclassifier%s%g_%g_%s' % (p.parent.parent, os.sep, os.sep, x[0], j, p.name)
                        if not os.path.exists(Path(f).parent):
                            os.makedirs(Path(f).parent)  # make new output folder

                        b = x[1:] * [w, h, w, h]  # box
                        b[2:] = b[2:].max()  # rectangle to square
                        b[2:] = b[2:] * 1.3 + 30  # pad
                        b = xywh2xyxy(b.reshape(-1, 4)).ravel().astype(np.int64)

                        b[[0, 2]] = np.clip(b[[0, 2]], 0, w)  # clip boxes outside of image
                        b[[1, 3]] = np.clip(b[[1, 3]], 0, h)
                        assert cv2.imwrite(f, img[b[1]:b[3], b[0]:b[2]]), 'Failure extracting classifier boxes'
            else:
                ne += 1  # print('empty labels for image %s' % self.img_files[i])  # file empty
                # os.system("rm '%s' '%s'" % (self.img_files[i], self.label_files[i]))  # remove

            if rank in [-1, 0]:
                pbar.desc = 'Scanning labels %s (%g found, %g missing, %g empty, %g duplicate, for %g images)' % (cache_path, nf, nm, ne, nd, n)
        if nf == 0:
            s = 'WARNING: No labels found in %s. See %s' % (os.path.dirname(file) + os.sep, help_url)
            print(s)
            assert not augment, '%s. Can not train without labels.' % s

        # Cache images into memory for faster training (WARNING: large datasets may exceed system RAM)
        self.imgs = [None] * n
        if cache_images:
            gb = 0  # Gigabytes of cached images
            pbar = tqdm(range(len(self.img_files)), desc='Caching images')
            self.img_hw0, self.img_hw = [None] * n, [None] * n
            for i in pbar:  # max 10k images
                self.imgs[i], self.img_hw0[i], self.img_hw[i] = load_image(self, i)  # img, hw_original, hw_resized
                gb += self.imgs[i].nbytes
                pbar.desc = 'Caching images (%.1fGB)' % (gb / 1E9)

    def cache_labels(self, path='labels.cache'):
        # Cache dataset labels, check images and read shapes
        x = {}  # dict
        pbar = tqdm(zip(self.img_files, self.label_files), desc='Scanning images', total=len(self.img_files))
        for (img, label) in pbar:
            try:
                l = []
                image = Image.open(img)
                image.verify()  # PIL verify
                # _ = io.imread(img)  # skimage verify (from skimage import io)
                shape = exif_size(image)  # image size
                assert (shape[0] > 9) & (shape[1] > 9), 'image size <10 pixels'
                if os.path.isfile(label):
                    with open(label, 'r') as f:
                        l = np.array([x.split() for x in f.read().splitlines()], dtype=np.float32)  # labels
                if len(l) == 0:
                    l = np.zeros((0, 5), dtype=np.float32)
                x[img] = [l, shape]
            except Exception as e:
                x[img] = [None, None]
                print('WARNING: %s: %s' % (img, e))

        x['hash'] = get_hash(self.label_files + self.img_files)
        torch.save(x, path)  # save for next time
        return x

    def __len__(self):
        return len(self.img_files)

    # def __iter__(self):
    #     self.count = -1
    #     print('ran dataset iter')
    #     #self.shuffled_vector = np.random.permutation(self.nF) if self.augment else np.arange(self.nF)
    #     return self

    def __getitem__(self, index):
        if self.image_weights:
            index = self.indices[index]

        hyp = self.hyp
        mosaic = self.mosaic and random.random() < hyp['mosaic']
        if mosaic:
            # Load mosaic
            img, labels = load_mosaic(self, index)
            shapes = None

            # MixUp https://arxiv.org/pdf/1710.09412.pdf
            if random.random() < hyp['mixup']:
                img2, labels2 = load_mosaic(self, random.randint(0, len(self.labels) - 1))
                r = np.random.beta(8.0, 8.0)  # mixup ratio, alpha=beta=8.0
                img = (img * r + img2 * (1 - r)).astype(np.uint8)
                labels = np.concatenate((labels, labels2), 0)

        else:
            # Load image
            img, (h0, w0), (h, w) = load_image(self, index)

            # Letterbox
            shape = self.batch_shapes[self.batch[index]] if self.rect else self.img_size  # final letterboxed shape
            img, ratio, pad = letterbox(img, shape, auto=False, scaleup=self.augment)
            shapes = (h0, w0), ((h / h0, w / w0), pad)  # for COCO mAP rescaling

            # Load labels
            labels = []
            x = self.labels[index]
            if x.size > 0:
                # Normalized xywh to pixel xyxy format
                labels = x.copy()
                labels[:, 1] = ratio[0] * w * (x[:, 1] - x[:, 3] / 2) + pad[0]  # pad width
                labels[:, 2] = ratio[1] * h * (x[:, 2] - x[:, 4] / 2) + pad[1]  # pad height
                labels[:, 3] = ratio[0] * w * (x[:, 1] + x[:, 3] / 2) + pad[0]
                labels[:, 4] = ratio[1] * h * (x[:, 2] + x[:, 4] / 2) + pad[1]

        if self.augment:
            # Augment imagespace
            if not mosaic:  # for mosaic images this was already done at the end of load_mosaic
                img, labels = random_perspective(img, labels,
                                                 degrees=hyp['degrees'],
                                                 translate=hyp['translate'],
                                                 scale=hyp['scale'],
                                                 shear=hyp['shear'],
                                                 perspective=hyp['perspective'])

            # Augment colorspace (H: hue, S: saturation, V: value/brightness)
            augment_hsv(img, hgain=hyp['hsv_h'], sgain=hyp['hsv_s'], vgain=hyp['hsv_v'])

            # Apply cutouts
            # if random.random() < 0.9:
            #     labels = cutout(img, labels)

        nL = len(labels)  # number of labels
        if nL:  # 1. convert the label format; 2. normalize the label values
            labels[:, 1:5] = xyxy2xywh(labels[:, 1:5])  # convert xyxy to xywh
            labels[:, [2, 4]] /= img.shape[0]  # normalized height 0-1
            labels[:, [1, 3]] /= img.shape[1]  # normalized width 0-1

        if self.augment:  # decide whether to flip
            # flip up-down
            if random.random() < hyp['flipud']:
                img = np.flipud(img)
                if nL:
                    labels[:, 2] = 1 - labels[:, 2]

            # flip left-right
            if random.random() < hyp['fliplr']:
                img = np.fliplr(img)
                if nL:
                    labels[:, 1] = 1 - labels[:, 1]

        labels_out = torch.zeros((nL, 6))
        if nL:
            labels_out[:, 1:] = torch.from_numpy(labels)

        # Convert
        img = img[:, :, ::-1].transpose(2, 0, 1)  # BGR to RGB, to 3x416x416 (the layout PyTorch expects)
        img = np.ascontiguousarray(img)

        return torch.from_numpy(img), labels_out, self.img_files[index], shapes

    @staticmethod
    def collate_fn(batch):
        img, label, path, shapes = zip(*batch)  # transposed
        for i, l in enumerate(label):
            l[:, 0] = i  # add target image index for build_targets()
        return torch.stack(img, 0), torch.cat(label, 0), path, shapes

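2) Image data source configuration.

An analysis of the LoadImagesAndLabels class in datasets.py:

This is the main class used to process data during training. Its __init__ method mainly reads the image data and the label data; its __getitem__ method performs the data augmentation.

<1> Reading and processing the data source

1> Image data source configuration. The data source (all images in this example) is read by path, as shown in the figure:

This completes reading the data source (the images). The next step is reading the label data.

2> Loading the label data

Labels are first looked up in a cache file whose name ends in .cache. If it exists, the labels are read from it; otherwise the coordinates are read from the txt files, located from the image names and label file paths, and then saved to the cache. The cache is a dict keyed by image path, with the label values stored as arrays; the shape of each image is cached as well. The code that reads the labels into the cache (the cache_labels method of LoadImagesAndLabels) is as follows: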

    def cache_labels(self, path='labels.cache'):
        # Cache dataset labels, check images and read shapes
        x = {}  # dict
        pbar = tqdm(zip(self.img_files, self.label_files), desc='Scanning images', total=len(self.img_files))
        for (img, label) in pbar:
            try:
                l = []
                image = Image.open(img)
                image.verify()  # PIL verify
                # _ = io.imread(img)  # skimage verify (from skimage import io)
                shape = exif_size(image)  # image size
                assert (shape[0] > 9) & (shape[1] > 9), 'image size <10 pixels'
                if os.path.isfile(label):
                    with open(label, 'r') as f:
                        l = np.array([x.split() for x in f.read().splitlines()], dtype=np.float32)  # labels
                if len(l) == 0:
                    l = np.zeros((0, 5), dtype=np.float32)
                x[img] = [l, shape]
            except Exception as e:
                x[img] = [None, None]
                print('WARNING: %s: %s' % (img, e))

        x['hash'] = get_hash(self.label_files + self.img_files)
        torch.save(x, path)  # save for next time
        return x

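In this example the cache file is saved in this directory, with a file name ending in .cache.

Two flags in __init__ serve follow-up tasks after detection: create_datasubset=True builds a smaller sub-dataset, and extract_bounding_boxes=True crops out the contents of the labeled boxes, for example for a second-stage step such as comparing objects for similarity.

At this point the classes and coordinates in the labels have all been read.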
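The cache-or-rebuild logic described above can be sketched on its own. This is a simplified stand-in, not the project's implementation: `fingerprint` and `load_or_build_cache` are hypothetical names, pickle replaces torch.save/torch.load, and the fingerprint (file names plus sizes) is an assumption standing in for get_hash, whose body is not shown in this article.

```python
import hashlib
import os
import pickle


def fingerprint(paths):
    """Digest of file names and sizes; changes when the dataset changes."""
    h = hashlib.md5()
    for p in sorted(paths):
        size = os.path.getsize(p) if os.path.isfile(p) else -1
        h.update(("%s:%d" % (p, size)).encode())
    return h.hexdigest()


def load_or_build_cache(cache_path, paths, build):
    """Return (cache, rebuilt). `build(path)` reads one file's labels."""
    key = fingerprint(paths)
    if os.path.isfile(cache_path):
        with open(cache_path, "rb") as f:
            cache = pickle.load(f)
        if cache.get("hash") == key:
            return cache, False  # cache is still valid
    cache = {p: build(p) for p in paths}  # keyed by file path, as in cache_labels
    cache["hash"] = key
    with open(cache_path, "wb") as f:
        pickle.dump(cache, f)  # save for next time
    return cache, True
```

Storing the fingerprint inside the cache is what lets a later run detect that files were added or edited and rebuild instead of trusting stale labels.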

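3> Mosaic data augmentation

The purpose of data augmentation is to reduce the risk of overfitting.

With the image data and label data read in by the previous two steps, augmentation can begin. We now enter the __getitem__ method of LoadImagesAndLabels, as shown below: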

    def __getitem__(self, index):
        if self.image_weights:
            index = self.indices[index]

        hyp = self.hyp
        mosaic = self.mosaic and random.random() < hyp['mosaic']
        if mosaic:
            # Load mosaic
            img, labels = load_mosaic(self, index)
            shapes = None

            # MixUp https://arxiv.org/pdf/1710.09412.pdf
            if random.random() < hyp['mixup']:
                img2, labels2 = load_mosaic(self, random.randint(0, len(self.labels) - 1))
                r = np.random.beta(8.0, 8.0)  # mixup ratio, alpha=beta=8.0
                img = (img * r + img2 * (1 - r)).astype(np.uint8)
                labels = np.concatenate((labels, labels2), 0)

        else:
            # Load image
            img, (h0, w0), (h, w) = load_image(self, index)

            # Letterbox
            shape = self.batch_shapes[self.batch[index]] if self.rect else self.img_size  # final letterboxed shape
            img, ratio, pad = letterbox(img, shape, auto=False, scaleup=self.augment)
            shapes = (h0, w0), ((h / h0, w / w0), pad)  # for COCO mAP rescaling

            # Load labels
            labels = []
            x = self.labels[index]
            if x.size > 0:
                # Normalized xywh to pixel xyxy format
                labels = x.copy()
                labels[:, 1] = ratio[0] * w * (x[:, 1] - x[:, 3] / 2) + pad[0]  # pad width
                labels[:, 2] = ratio[1] * h * (x[:, 2] - x[:, 4] / 2) + pad[1]  # pad height
                labels[:, 3] = ratio[0] * w * (x[:, 1] + x[:, 3] / 2) + pad[0]
                labels[:, 4] = ratio[1] * h * (x[:, 2] + x[:, 4] / 2) + pad[1]

        if self.augment:
            # Augment imagespace
            if not mosaic:  # for mosaic images this was already done at the end of load_mosaic
                img, labels = random_perspective(img, labels,
                                                 degrees=hyp['degrees'],
                                                 translate=hyp['translate'],
                                                 scale=hyp['scale'],
                                                 shear=hyp['shear'],
                                                 perspective=hyp['perspective'])

            # Augment colorspace (H: hue, S: saturation, V: value/brightness)
            augment_hsv(img, hgain=hyp['hsv_h'], sgain=hyp['hsv_s'], vgain=hyp['hsv_v'])

            # Apply cutouts
            # if random.random() < 0.9:
            #     labels = cutout(img, labels)

        nL = len(labels)  # number of labels
        if nL:  # 1. convert the label format; 2. normalize the label values
            labels[:, 1:5] = xyxy2xywh(labels[:, 1:5])  # convert xyxy to xywh
            labels[:, [2, 4]] /= img.shape[0]  # normalized height 0-1
            labels[:, [1, 3]] /= img.shape[1]  # normalized width 0-1

        if self.augment:  # decide whether to flip
            # flip up-down
            if random.random() < hyp['flipud']:
                img = np.flipud(img)
                if nL:
                    labels[:, 2] = 1 - labels[:, 2]

            # flip left-right
            if random.random() < hyp['fliplr']:
                img = np.fliplr(img)
                if nL:
                    labels[:, 1] = 1 - labels[:, 1]

        labels_out = torch.zeros((nL, 6))
        if nL:
            labels_out[:, 1:] = torch.from_numpy(labels)

        # Convert
        img = img[:, :, ::-1].transpose(2, 0, 1)  # BGR to RGB, to 3x416x416 (the layout PyTorch expects)
        img = np.ascontiguousarray(img)

        return torch.from_numpy(img), labels_out, self.img_files[index], shapes

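If mosaic is true, execution enters load_mosaic. This function stitches four images into one large image. Not only the image changes: the labels must be recomputed as well, and the large image is what gets passed to the network. The code is as follows: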

def load_mosaic(self, index):
    # loads images in a mosaic

    labels4 = []
    s = self.img_size
    yc, xc = [int(random.uniform(-x, 2 * s + x)) for x in self.mosaic_border]  # mosaic center x, y
    indices = [index] + [random.randint(0, len(self.labels) - 1) for _ in range(3)]  # 3 additional image indices
    for i, index in enumerate(indices):
        # Load image
        img, _, (h, w) = load_image(self, index)

        # place img in img4
        # 1. initialize the large image; 2. compute where this image goes in the large image;
        # 3. compute which part of the small image is copied into the large image
        if i == 0:  # top left
            img4 = np.full((s * 2, s * 2, img.shape[2]), 114, dtype=np.uint8)  # base image with 4 tiles
            x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc  # xmin, ymin, xmax, ymax (large image)
            x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h  # xmin, ymin, xmax, ymax (small image)
        elif i == 1:  # top right
            x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc
            x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
        elif i == 2:  # bottom left
            x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)
            x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)
        elif i == 3:  # bottom right
            x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)
            x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)

        # 1. copy the selected part of the small image into the large image
        # 2. the small image may not fill its tile, so compute the offsets needed to update the box labels
        img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]  # img4[ymin:ymax, xmin:xmax]
        padw = x1a - x1b
        padh = y1a - y1b

        # Labels: recompute them, because everything now lives in the large image
        x = self.labels[index]
        labels = x.copy()
        if x.size > 0:  # Normalized xywh to pixel xyxy format
            labels[:, 1] = w * (x[:, 1] - x[:, 3] / 2) + padw
            labels[:, 2] = h * (x[:, 2] - x[:, 4] / 2) + padh
            labels[:, 3] = w * (x[:, 1] + x[:, 3] / 2) + padw
            labels[:, 4] = h * (x[:, 2] + x[:, 4] / 2) + padh
        labels4.append(labels)

    # Concat/clip labels: coordinates may fall out of bounds, so clip them into the large image
    if len(labels4):
        labels4 = np.concatenate(labels4, 0)
        np.clip(labels4[:, 1:], 0, 2 * s, out=labels4[:, 1:])  # use with random_perspective
        # img4, labels4 = replicate(img4, labels4)  # replicate

    # Augment: apply random rotation, translation, scaling and shear to the stitched image
    img4, labels4 = random_perspective(img4, labels4,
                                       degrees=self.hyp['degrees'],
                                       translate=self.hyp['translate'],
                                       scale=self.hyp['scale'],
                                       shear=self.hyp['shear'],
                                       perspective=self.hyp['perspective'],
                                       border=self.mosaic_border)  # border to remove

    return img4, labels4

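An analysis of load_mosaic:

With the image size img_size=640 and x=-320 (the mosaic_border value), the random center point drawn in this debugging run is (808, 862), which lies toward the bottom-right; 2*s is the length of two images, 1280. The index passed in is then combined with 3 randomly chosen image indices to form 4 indices. A for loop over these 4 indices calls load_image(self, index) to fetch each image and resize it, so that the four can later be stitched into one large image (the labels are merged too).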
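The center sampling can be checked in isolation. The sketch below reuses the expression from load_mosaic; `sample_mosaic_center` is a hypothetical wrapper name, and the optional `rng` argument is added here only so the draw can be seeded.

```python
import random


def sample_mosaic_center(img_size, rng=random):
    """Draw the mosaic center (xc, yc) on a canvas of side 2 * img_size."""
    mosaic_border = [-img_size // 2, -img_size // 2]  # [-320, -320] for 640
    # uniform(-x, 2*s + x) with x = -320 gives the range [320, 960]
    yc, xc = [int(rng.uniform(-x, 2 * img_size + x)) for x in mosaic_border]
    return xc, yc


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(sample_mosaic_center(640, rng))
```

For img_size=640 the center therefore always lands in the middle half of the 1280 x 1280 canvas, which is consistent with the (808, 862) seen in the debugger.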

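4> The four-in-one procedure, step by step

After img, _, (h, w) = load_image(self, index) returns, we are back in load_mosaic.

Summary of the two figures above: h and w are the height and width obtained when the image is loaded; xc and yc are the computed center point of the large image, and this point becomes the bottom-right corner of the first image; max(xc - w, 0) handles the case where the small image would extend past the edge of the large image (out of bounds) by clamping to position 0 of the large image.

After the offsets are computed and the labels updated, the first small image sits in the top-left of the large image. The second image goes in the top-right, and the third and fourth are computed and stitched into the large image the same way.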
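To make the placement concrete, the coordinate formulas from load_mosaic can be pulled into a small function and evaluated for the center (808, 862). `tile_coords` is a hypothetical name, and square 640 x 640 tiles are an assumption for the example (load_image may return a shorter side).

```python
def tile_coords(i, xc, yc, w, h, s):
    """Return (dst, src): the box in the 2s x 2s canvas and the box in the tile."""
    if i == 0:  # top left
        x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc
        x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h
    elif i == 1:  # top right
        x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc
        x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
    elif i == 2:  # bottom left
        x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)
        x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)
    else:  # bottom right
        x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)
        x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)
    return (x1a, y1a, x2a, y2a), (x1b, y1b, x2b, y2b)


def label_offset(dst, src):
    """Offsets (padw, padh) added to pixel-space box coordinates of the tile."""
    return dst[0] - src[0], dst[1] - src[1]


if __name__ == "__main__":
    for i in range(4):
        dst, src = tile_coords(i, 808, 862, 640, 640, 640)
        print(i, dst, src, label_offset(dst, src))
```

In every case the destination and source boxes have the same width and height, which is what allows the slice assignment img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b] to work.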

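Next, data augmentation is applied to the image and its labels:

The augmentation function we step into is implemented with OpenCV rather than with torchvision transforms: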

def random_perspective(img, targets=(), degrees=10, translate=.1, scale=.1, shear=10, perspective=0.0, border=(0, 0)):
    # torchvision.transforms.RandomAffine(degrees=(-10, 10), translate=(.1, .1), scale=(.9, 1.1), shear=(-10, 10))
    # targets = [cls, xyxy]

    # the large mosaic image is finally brought back to the normal size
    height = img.shape[0] + border[0] * 2  # shape(h,w,c)
    width = img.shape[1] + border[1] * 2

    # rotation, translation, scaling etc. each need a coefficient matrix (see the OpenCV functions; all random here)
    # Center
    C = np.eye(3)
    C[0, 2] = -img.shape[1] / 2  # x translation (pixels)
    C[1, 2] = -img.shape[0] / 2  # y translation (pixels)

    # Perspective
    P = np.eye(3)
    P[2, 0] = random.uniform(-perspective, perspective)  # x perspective (about y)
    P[2, 1] = random.uniform(-perspective, perspective)  # y perspective (about x)

    # Rotation and Scale
    R = np.eye(3)
    a = random.uniform(-degrees, degrees)
    # a += random.choice([-180, -90, 0, 90])  # add 90deg rotations to small rotations
    s = random.uniform(1 - scale, 1 + scale)
    # s = 2 ** random.uniform(-scale, scale)
    R[:2] = cv2.getRotationMatrix2D(angle=a, center=(0, 0), scale=s)

    # Shear
    S = np.eye(3)
    S[0, 1] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # x shear (deg)
    S[1, 0] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # y shear (deg)

    # Translation
    T = np.eye(3)
    T[0, 2] = random.uniform(0.5 - translate, 0.5 + translate) * width  # x translation (pixels)
    T[1, 2] = random.uniform(0.5 - translate, 0.5 + translate) * height  # y translation (pixels)

    # apply all of these random transforms together
    # Combined rotation matrix
    M = T @ S @ R @ P @ C  # order of operations (right to left) is IMPORTANT
    if (border[0] != 0) or (border[1] != 0) or (M != np.eye(3)).any():  # image changed
        if perspective:
            img = cv2.warpPerspective(img, M, dsize=(width, height), borderValue=(114, 114, 114))
        else:  # affine
            img = cv2.warpAffine(img, M[:2], dsize=(width, height), borderValue=(114, 114, 114))

    # Visualize
    # import matplotlib.pyplot as plt
    # ax = plt.subplots(1, 2, figsize=(12, 6))[1].ravel()
    # ax[0].imshow(img[:, :, ::-1])  # base
    # ax[1].imshow(img2[:, :, ::-1])  # warped

    # Transform label coordinates: the image changed, so the box coordinates must change with it
    n = len(targets)
    if n:
        # warp points
        xy = np.ones((n * 4, 3))
        xy[:, :2] = targets[:, [1, 2, 3, 4, 1, 4, 3, 2]].reshape(n * 4, 2)  # x1y1, x2y2, x1y2, x2y1
        xy = xy @ M.T  # transform
        if perspective:
            xy = (xy[:, :2] / xy[:, 2:3]).reshape(n, 8)  # rescale
        else:  # affine
            xy = xy[:, :2].reshape(n, 8)

        # create new boxes
        x = xy[:, [0, 2, 4, 6]]
        y = xy[:, [1, 3, 5, 7]]
        xy = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T

        # # apply angle-based reduction of bounding boxes
        # radians = a * math.pi / 180
        # reduction = max(abs(math.sin(radians)), abs(math.cos(radians))) ** 0.5
        # x = (xy[:, 2] + xy[:, 0]) / 2
        # y = (xy[:, 3] + xy[:, 1]) / 2
        # w = (xy[:, 2] - xy[:, 0]) * reduction
        # h = (xy[:, 3] - xy[:, 1]) * reduction
        # xy = np.concatenate((x - w / 2, y - h / 2, x + w / 2, y + h / 2)).reshape(4, n).T

        # clip boxes
        xy[:, [0, 2]] = xy[:, [0, 2]].clip(0, width)
        xy[:, [1, 3]] = xy[:, [1, 3]].clip(0, height)

        # filter candidates
        i = box_candidates(box1=targets[:, 1:5].T * s, box2=xy.T)
        targets = targets[i]
        targets[:, 1:5] = xy[i]

    return img, targets

Finally the image data and labels are returned. The augmentation function random_perspective is done, and with it the call to load_mosaic, so execution returns to __getitem__.
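The label update inside random_perspective boils down to: transform the four corners of each box with the matrix M, then take the axis-aligned box around the transformed corners and clip it to the image. A minimal pure-Python sketch of the affine case follows; `warp_box` and `clip_box` are hypothetical names, and the candidate filtering done by box_candidates is left out.

```python
def warp_box(box, M):
    """Apply a 3x3 affine matrix (nested lists, last row [0, 0, 1]) to a box.

    box is (x1, y1, x2, y2); the result is the axis-aligned box that
    encloses the four transformed corners.
    """
    x1, y1, x2, y2 = box
    corners = [(x1, y1), (x2, y2), (x1, y2), (x2, y1)]
    xs = [M[0][0] * x + M[0][1] * y + M[0][2] for x, y in corners]
    ys = [M[1][0] * x + M[1][1] * y + M[1][2] for x, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))


def clip_box(box, width, height):
    """Clamp a box to the image bounds, as the 'clip boxes' step does."""
    x1, y1, x2, y2 = box
    clamp = lambda v, hi: max(0, min(v, hi))
    return (clamp(x1, width), clamp(y1, height), clamp(x2, width), clamp(y2, height))


if __name__ == "__main__":
    shift = [[1, 0, 10], [0, 1, -5], [0, 0, 1]]  # translate by (+10, -5)
    print(clip_box(warp_box((0, 0, 20, 20), shift), 640, 640))
```

Because the enclosing box is axis-aligned, a rotated box grows slightly; the commented-out "angle-based reduction" in the original code was one attempt to compensate for that.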

5> Building a batch with __getitem__

Back in the __getitem__ method of datasets.py, the final part of the code is:

        nL = len(labels)  # number of labels
        if nL:  # 1. convert the label format; 2. normalize the label values
            labels[:, 1:5] = xyxy2xywh(labels[:, 1:5])  # convert xyxy to xywh
            labels[:, [2, 4]] /= img.shape[0]  # normalized height 0-1
            labels[:, [1, 3]] /= img.shape[1]  # normalized width 0-1

        if self.augment:  # decide whether to flip
            # flip up-down
            if random.random() < hyp['flipud']:
                img = np.flipud(img)
                if nL:
                    labels[:, 2] = 1 - labels[:, 2]

            # flip left-right
            if random.random() < hyp['fliplr']:
                img = np.fliplr(img)
                if nL:
                    labels[:, 1] = 1 - labels[:, 1]

        labels_out = torch.zeros((nL, 6))
        if nL:
            labels_out[:, 1:] = torch.from_numpy(labels)

        # Convert
        img = img[:, :, ::-1].transpose(2, 0, 1)  # BGR to RGB, to 3x416x416 (the layout PyTorch expects)
        img = np.ascontiguousarray(img)

        return torch.from_numpy(img), labels_out, self.img_files[index], shapes

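Finally the image data (converted to a tensor), the label data and the shapes value are returned. Each call to __getitem__ yields one image with its labels; a batch is built by calling __getitem__ once per sample in the batch. With the tensors in hand, the network can run its forward and backward passes for training.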
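The label arithmetic at the end of __getitem__ can be followed on a single box. The sketch below is pure Python with hypothetical helper names; it mirrors the xyxy-to-xywh conversion, the division by image width and height, and the two flips.

```python
def xyxy_to_normalized_xywh(box, img_w, img_h):
    """Pixel (x1, y1, x2, y2) -> normalized (x_center, y_center, w, h)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2 / img_w, (y1 + y2) / 2 / img_h,
            (x2 - x1) / img_w, (y2 - y1) / img_h)


def flip_lr(xywh):
    """Left-right flip: only the normalized x center changes."""
    xc, yc, w, h = xywh
    return (1 - xc, yc, w, h)


def flip_ud(xywh):
    """Up-down flip: only the normalized y center changes."""
    xc, yc, w, h = xywh
    return (xc, 1 - yc, w, h)


if __name__ == "__main__":
    box = xyxy_to_normalized_xywh((160, 320, 480, 640), 640, 640)
    print(box, flip_ud(box))
```

Working in normalized center format is what makes the flips one-liners: the box size is unchanged and only one center coordinate is mirrored. The returned labels_out has six columns because column 0 is reserved for the image index that collate_fn fills in when the batch is assembled.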

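(2) Network model details and logic

6) Installing a network architecture visualization tool

<1> Open the web version at https://netron.app/ and load the model file. For example, the pretrained files yolov5s.pt and yolov5s.onnx in the yolov5 models directory can be opened directly; the onnx file shows the structure and data flow more clearly. The onnx file is converted from the pt file in code: following on from the second point under <2> below, pass the arguments to export.py in the source tree to run the conversion.

<2> Manual installation (source available at https://github.com/lutzroeder/netron):

7) Reading the YOLOv5 network configuration file

yolov5s.yaml mainly configures the network structure, including the feature extraction layers and the output layers. Its main contents are as follows: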

# parameters
nc: 2  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple

# anchors
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# YOLOv5 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Focus, [64, 3]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, BottleneckCSP, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 9, BottleneckCSP, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, BottleneckCSP, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 1, SPP, [1024, [5, 9, 13]]],
   [-1, 3, BottleneckCSP, [1024, False]],  # 9
  ]

# YOLOv5 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, BottleneckCSP, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, BottleneckCSP, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, BottleneckCSP, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, BottleneckCSP, [1024, False]],  # 23 (P5/32-large)

   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]

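<1> The entries under parameters

1> nc: number of classes

2> depth_multiple: the model depth factor, a ratio applied to how many times a block is repeated. Any number value greater than 1 read from the config is multiplied by this factor (and rounded) to get the actual repeat count; for example, 9 × 0.33 rounds to 3.

3> width_multiple: the ratio applied to the number of convolution kernels. The number of output feature maps is also multiplied by this value; for example, 64 × 0.50 = 32.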
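The two multipliers can be tried out with a few lines of Python. The depth expression is the one that appears in parse_model; the body of `make_divisible` is not shown in this article, so the version below (round up to the next multiple of the divisor) is an assumption, and `scaled_depth` and `scaled_width` are hypothetical wrapper names.

```python
import math


def scaled_depth(n, depth_multiple):
    """Actual repeat count: n = max(round(n * gd), 1) if n > 1 else n."""
    return max(round(n * depth_multiple), 1) if n > 1 else n


def make_divisible(x, divisor):
    """Assumed behavior: round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scaled_width(c2, width_multiple, divisor=8):
    """Actual output channels: make_divisible(c2 * gw, 8)."""
    return make_divisible(c2 * width_multiple, divisor)


if __name__ == "__main__":
    for n in (1, 3, 9):
        print("number", n, "->", scaled_depth(n, 0.33))
    for c in (64, 128, 1024):
        print("channels", c, "->", scaled_width(c, 0.50))
```

With the yolov5s values this gives repeat counts of 1, 1 and 3 for configured numbers 1, 3 and 9, and halves every configured channel count.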

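<2> The anchors (candidate box) parameters

<3> The feature extraction module: backbone

<4> The output configuration: head

The source code first reads this configuration file and then calls the corresponding functions to interpret its contents.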

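8) The Focus module flow (first layer of the feature extraction module)

Open the yolov5s.onnx file in the web version of the architecture viewer.

Note that the height, width and number of channels all change in the figure above.

The output of 32 in the figure corresponds to 64*0.5=32 from the configuration file, for example:

This ran one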

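9) Completing the configuration file parsing

Now look at the source code:

In fact the image and label augmentation described earlier happens later; what runs first is loading the network model, the forward pass and so on.

The __init__ method of the Model class in yolo.py mainly stacks the network layers.

After reading the yolov5s.yaml configuration, __init__ calls self.model, self.save = parse_model(deepcopy(self.yaml), ch=[ch]) to define every layer according to the configuration file. The code of parse_model is as follows: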

def parse_model(d, ch):  # model_dict, input_channels(3)
    logger.info('\n%3s%18s%3s%10s  %-40s%-30s' % ('', 'from', 'n', 'params', 'module', 'arguments'))
    anchors, nc, gd, gw = d['anchors'], d['nc'], d['depth_multiple'], d['width_multiple']
    na = (len(anchors[0]) // 2) if isinstance(anchors, list) else anchors  # number of anchors
    no = na * (nc + 5)  # number of outputs = anchors * (classes + 5)

    layers, save, c2 = [], [], ch[-1]  # layers, savelist, ch out
    for i, (f, n, m, args) in enumerate(d['backbone'] + d['head']):  # from, number, module, args
        m = eval(m) if isinstance(m, str) else m  # eval strings
        for j, a in enumerate(args):
            try:
                args[j] = eval(a) if isinstance(a, str) else a  # eval strings
            except:
                pass

        n = max(round(n * gd), 1) if n > 1 else n  # depth gain
        if m in [Conv, Bottleneck, SPP, DWConv, MixConv2d, Focus, CrossConv, BottleneckCSP, C3]:
            c1, c2 = ch[f], args[0]

            # Normal
            # if i > 0 and args[0] != no:  # channel expansion factor
            #     ex = 1.75  # exponential (default 2.0)
            #     e = math.log(c2 / ch[1]) / math.log(2)
            #     c2 = int(ch[1] * ex ** e)
            # if m != Focus:

            c2 = make_divisible(c2 * gw, 8) if c2 != no else c2

            # Experimental
            # if i > 0 and args[0] != no:  # channel expansion factor
            #     ex = 1 + gw  # exponential (default 2.0)
            #     ch1 = 32  # ch[1]
            #     e = math.log(c2 / ch1) / math.log(2)  # level 1-n
            #     c2 = int(ch1 * ex ** e)
            # if m != Focus:
            #     c2 = make_divisible(c2, 8) if c2 != no else c2

            args = [c1, c2, *args[1:]]
            if m in [BottleneckCSP, C3]:
                args.insert(2, n)
                n = 1
        elif m is nn.BatchNorm2d:
            args = [ch[f]]
        elif m is Concat:
            c2 = sum([ch[-1 if x == -1 else x + 1] for x in f])
        elif m is Detect:
            args.append([ch[x + 1] for x in f])
            if isinstance(args[1], int):  # number of anchors
                args[1] = [list(range(args[1] * 2))] * len(f)
        else:
            c2 = ch[f]

        m_ = nn.Sequential(*[m(*args) for _ in range(n)]) if n > 1 else m(*args)  # module
        t = str(m)[8:-2].replace('__main__.', '')  # module type
        np = sum([x.numel() for x in m_.parameters()])  # number params
        m_.i, m_.f, m_.type, m_.np = i, f, t, np  # attach index, 'from' index, type, number params
        logger.info('%3s%18s%3s%10.0f  %-40s%-30s' % (i, f, n, np, t, args))  # print
        save.extend(x % i for x in ([f] if isinstance(f, int) else f) if x != -1)  # append to savelist
        layers.append(m_)
        ch.append(c2)
    return nn.Sequential(*layers), sorted(save)

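An explanation of the code above:

Here no is the number of values output per anchor group at each grid cell, nc is the number of classes, 5 is the four box coordinates plus one confidence score, and na is the number of anchor boxes.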
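Plugging this example's configuration into the two expressions from parse_model gives concrete numbers. `outputs_per_cell` is a hypothetical helper name; the anchors and nc=2 are the values from the yolov5s.yaml listing in this article.

```python
def outputs_per_cell(anchors, nc):
    """Return (na, no) as computed at the top of parse_model."""
    na = len(anchors[0]) // 2  # each anchor is a (width, height) pair
    no = na * (nc + 5)         # per anchor: 4 box values + 1 confidence + nc classes
    return na, no


if __name__ == "__main__":
    anchors = [
        [10, 13, 16, 30, 33, 23],       # P3/8
        [30, 61, 62, 45, 59, 119],      # P4/16
        [116, 90, 156, 198, 373, 326],  # P5/32
    ]
    print(outputs_per_cell(anchors, nc=2))
```

For the two-class mask dataset this means 3 anchors per detection layer and 21 output channels per grid cell.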

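At this point the arguments for the Focus module in this example have been built. The line m_ = nn.Sequential(*[m(*args) for _ in range(n)]) if n > 1 else m(*args) then instantiates Focus from common.py (that is, it calls the __init__ method of the Focus class in common.py), as shown in the figure:

This example loads the first module, Focus.

Because the last line of that __init__ method calls Conv, we also get the next figure (it essentially consists of three parts: conv2d, batchnorm2d, hardswish):

Finally

At this point the configuration file has been read and every layer of the network has been initialized and added. The next step is how the incoming tensor data (for example, the data of one image) is computed.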

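10) Forward pass computation

Stay in the forward method of class Model in yolo.py. Because augment defaults to false, execution goes straight to the forward_once method below it. forward_once loops over every layer, and for each one calls the corresponding module class in common.py and runs that class's forward method.

The figure above shows the input being split into blocks and concatenated: the channel count becomes 3*4=12, while w and h are both halved, each becoming 2.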
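The slicing can be reproduced without any framework. The sketch below works on nested Python lists in channel-first layout and assumes the four slices are taken in the order (even rows, even cols), (odd rows, even cols), (even rows, odd cols), (odd rows, odd cols) before concatenation along the channel axis; `focus_slice` is a hypothetical name, and the convolution that follows in the real module is omitted.

```python
def focus_slice(x):
    """Space-to-depth: C x H x W nested lists -> 4C x H/2 x W/2."""
    offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]  # (row offset, column offset)
    out = []
    for r0, c0 in offsets:
        for channel in x:
            # keep every second row and every second column
            out.append([row[c0::2] for row in channel[r0::2]])
    return out


if __name__ == "__main__":
    # 3 channels of 4 x 4; value encodes channel, row and column
    x = [[[c * 100 + r * 4 + col for col in range(4)] for r in range(4)]
         for c in range(3)]
    y = focus_slice(x)
    print(len(y), len(y[0]), len(y[0][0]))  # channels, height, width
    print(y[0])
```

No pixel is discarded: the 3 x 4 x 4 input holds 48 values and so does the 12 x 2 x 2 output. Spatial resolution is traded for channels before the first convolution.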

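Following the Focus structure, execution then enters class BatchNorm2d(_BatchNorm) in batchnorm.py, and finally the activation function runs, which completes the module.

11) How the BottleneckCSP layer is computed

Open class BottleneckCSP in common.py. The __init__ method shows how each sub-layer of BottleneckCSP is defined (conv, conv2d, batchnorm2d, leakyrelu, sequential), and the forward method shows the computation, mainly the order in which the layers defined in __init__ are applied and combined.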

class BottleneckCSP(nn.Module):
    # CSP Bottleneck https://github.com/WongKinYiu/CrossStagePartialNetworks
    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion
        super(BottleneckCSP, self).__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = nn.Conv2d(c1, c_, 1, 1, bias=False)
        self.cv3 = nn.Conv2d(c_, c_, 1, 1, bias=False)
        self.cv4 = Conv(2 * c_, c2, 1, 1)
        self.bn = nn.BatchNorm2d(2 * c_)  # applied to cat(cv2, cv3)
        self.act = nn.LeakyReLU(0.1, inplace=True)
        self.m = nn.Sequential(*[Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)])

    def forward(self, x):
        y1 = self.cv3(self.m(self.cv1(x)))
        y2 = self.cv2(x)
        return self.cv4(self.act(self.bn(torch.cat((y1, y2), dim=1))))

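This completes the BottleneckCSP layer computation.

12) Details of the SPP layer computation

Open the configuration file:

Now jump to the forward method of the SPP module in common.py, as shown in the figure:

In SPP the convolution mainly serves to reduce the channel dimension. Each of the three max-pooling branches still outputs 256 feature maps; concatenating them with the un-pooled input gives 4 × 256 = 1024 feature maps, and a final convolution halves that number again. The // used in SPP's __init__ is integer (floor) division.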
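The channel counts quoted above can be re-derived with simple arithmetic. The SPP source is not reproduced in this article, so the sketch below rests on assumptions: a first convolution that halves the input channels (c1 // 2), stride-1 max pools with padding k // 2 for the kernel sizes 5, 9 and 13 from the config, and concatenation of the un-pooled tensor with the three pooled ones. Function names are hypothetical.

```python
def spp_channels(c1, c2, kernels=(5, 9, 13)):
    """Return (hidden, concatenated, output) channel counts for an SPP block."""
    c_ = c1 // 2                       # floor division, as noted in the text
    concat = c_ * (len(kernels) + 1)   # un-pooled branch + one branch per kernel
    return c_, concat, c2


def pooled_size(size, k, stride=1):
    """Output size of a max pool with padding k // 2 (standard pooling formula)."""
    pad = k // 2
    return (size + 2 * pad - k) // stride + 1


if __name__ == "__main__":
    # 1024 in the config becomes 512 after width_multiple = 0.50
    print(spp_channels(512, 512))
    print([pooled_size(20, k) for k in (5, 9, 13)])
```

Under these assumptions the pooled branches keep the spatial size unchanged for every odd kernel size, which is what makes channel-wise concatenation possible.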

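This ends the feature extraction part. Next comes the Head:

13) Walking through the Head

Before the Concat shown in the figure above, an upsampling step is needed so that w and h match; only then can the tensors be concatenated.

14) Upsampling and concatenation

After upsampling and concatenation with layer 6, w and h stay the same but the feature-map counts add up to 256+256=512, as shown by the printed layer 13:

15) Analysis of the output

From the configuration file, the Detect layer takes layers 17, 20 and 23 as input and produces 3 outputs, as shown in the figure:

This completes the source code for the feature extraction, computation and output structure defined in the configuration file.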

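(3) Parameters, training strategy, final results and model display

16) Hyperparameters

Open /data/hyp.scratch.yaml; it configures many of the initial training parameters, as shown in the figure:

17) Command-line arguments

The main block of train.py is as follows: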

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', type=str, default='yolov5s.pt', help='initial weights path')
    parser.add_argument('--cfg', type=str, default='', help='model.yaml path')  # network configuration
    parser.add_argument('--data', type=str, default='data/coco128.yaml', help='data.yaml path')  # data
    parser.add_argument('--hyp', type=str, default='data/hyp.scratch.yaml', help='hyperparameters path')
    parser.add_argument('--epochs', type=int, default=300)
    parser.add_argument('--batch-size', type=int, default=16, help='total batch size for all GPUs')
    parser.add_argument('--img-size', nargs='+', type=int, default=[640, 640], help='[train, test] image sizes')
    parser.add_argument('--rect', action='store_true', help='rectangular training')  # rectangular training
    parser.add_argument('--resume', nargs='?', const=True, default=False, help='resume most recent training')  # continue a previous run
    parser.add_argument('--nosave', action='store_true', help='only save final checkpoint')  # do not save
    parser.add_argument('--notest', action='store_true', help='only test final epoch')  # do not test
    parser.add_argument('--noautoanchor', action='store_true', help='disable autoanchor check')  # whether to adjust the anchors
    parser.add_argument('--evolve', action='store_true', help='evolve hyperparameters')  # hyperparameter evolution
    parser.add_argument('--bucket', type=str, default='', help='gsutil bucket')
    parser.add_argument('--cache-images', action='store_true', help='cache images for faster training')  # cache images
    parser.add_argument('--image-weights', action='store_true', help='use weighted image selection for training')
    parser.add_argument('--name', default='', help='renames experiment folder exp{N} to exp{N}_{name} if supplied')
    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--multi-scale', action='store_true', help='vary img-size +/- 50%%')  # multi-scale training
    parser.add_argument('--single-cls', action='store_true', help='train as single-class dataset')  # single class
    parser.add_argument('--adam', action='store_true', help='use torch.optim.Adam() optimizer')  # optimizer choice
    parser.add_argument('--sync-bn', action='store_true', help='use SyncBatchNorm, only available in DDP mode')  # BN across GPUs
    parser.add_argument('--local_rank', type=int, default=-1, help='DDP parameter, do not modify')  # GPU ID
    parser.add_argument('--logdir', type=str, default='runs/', help='logging directory')
    parser.add_argument('--workers', type=int, default=0, help='maximum number of dataloader workers')  # Windows users should leave this unchanged
    opt = parser.parse_args()

    # Set DDP variables (WORLD_SIZE: number of processes, RANK: process index)
    opt.total_batch_size = opt.batch_size
    opt.world_size = int(os.environ['WORLD_SIZE']) if 'WORLD_SIZE' in os.environ else 1
    opt.global_rank = int(os.environ['RANK']) if 'RANK' in os.environ else -1
    set_logging(opt.global_rank)
    if opt.global_rank in [-1, 0]:
        check_git_status()

    # Resume
    if opt.resume:  # resume an interrupted run
        # pass a checkpoint path, or fall back to the most recent run (last.pt under runs)
        ckpt = opt.resume if isinstance(opt.resume, str) else get_latest_run()  # specified or most recent path
        log_dir = Path(ckpt).parent.parent  # runs/exp0
        assert os.path.isfile(ckpt), 'ERROR: --resume checkpoint does not exist'
        with open(log_dir / 'opt.yaml') as f:
            opt = argparse.Namespace(**yaml.load(f, Loader=yaml.FullLoader))  # replace
        opt.cfg, opt.weights, opt.resume = '', ckpt, True
        logger.info('Resuming training from %s' % ckpt)

    else:  # load the previously configured parameters
        # opt.hyp = opt.hyp or ('hyp.finetune.yaml' if opt.weights else 'hyp.scratch.yaml')
        opt.data, opt.cfg, opt.hyp = check_file(opt.data), check_file(opt.cfg), check_file(opt.hyp)  # check files
        assert len(opt.cfg) or len(opt.weights), 'either --cfg or --weights must be specified'
        opt.img_size.extend([opt.img_size[-1]] * (2 - len(opt.img_size)))  # extend to 2 sizes (train, test)
        log_dir = increment_dir(Path(opt.logdir) / 'exp', opt.name)  # runs/exp1

    device = select_device(opt.device, batch_size=opt.batch_size)

    # DDP mode (distributed training; skip this if you do not have multiple GPUs)
    if opt.local_rank != -1:
        assert torch.cuda.device_count() > opt.local_rank
        torch.cuda.set_device(opt.local_rank)  # select the GPU
        device = torch.device('cuda', opt.local_rank)
        dist.init_process_group(backend='nccl', init_method='env://')  # distributed backend
        assert opt.batch_size % opt.world_size == 0, '--batch-size must be multiple of CUDA device count'
        opt.batch_size = opt.total_batch_size // opt.world_size

    logger.info(opt)
    with open(opt.hyp) as f:
        hyp = yaml.load(f, Loader=yaml.FullLoader)  # load hyps

    # Train
    if not opt.evolve:
        tb_writer = None
        if opt.global_rank in [-1, 0]:
            logger.info(f'Start Tensorboard with "tensorboard --logdir {opt.logdir}", view at http://localhost:6006/')
            tb_writer = SummaryWriter(log_dir=log_dir)  # runs/exp0

        train(hyp, opt, device, tb_writer)

    # parameter search and mutation
    # Evolve hyperparameters (optional), see the GitHub issue: https://github.com/ultralytics/yolov3/issues/392
    else:
        # Hyperparameter evolution metadata (mutation scale 0-1, lower_limit, upper_limit)
        meta = {'lr0': (1, 1e-5, 1e-1),  # initial learning rate (SGD=1E-2, Adam=1E-3)
                'lrf': (1, 0.01, 1.0),  # final OneCycleLR learning rate (lr0 * lrf)
                'momentum': (0.3, 0.6, 0.98),  # SGD momentum/Adam beta1
                'weight_decay': (1, 0.0, 0.001),  # optimizer weight decay
                'warmup_epochs': (1, 0.0, 5.0),  # warmup epochs (fractions ok)
                'warmup_momentum': (1, 0.0, 0.95),  # warmup initial momentum
                'warmup_bias_lr': (1, 0.0, 0.2),  # warmup initial bias lr
                'box': (1, 0.02, 0.2),  # box loss gain
                'cls': (1, 0.2, 4.0),  # cls loss gain
                'cls_pw': (1, 0.5, 2.0),  # cls BCELoss positive_weight
                'obj': (1, 0.2, 4.0),  # obj loss gain (scale with pixels)
                'obj_pw': (1, 0.5, 2.0),  # obj BCELoss positive_weight
                'iou_t': (0, 0.1, 0.7),  # IoU training threshold
                'anchor_t': (1, 2.0, 8.0),  # anchor-multiple threshold
                'anchors': (2, 2.0, 10.0),  # anchors per output grid (0 to ignore)
                'fl_gamma': (0, 0.0, 2.0),  # focal loss gamma (efficientDet default gamma=1.5)
                'hsv_h': (1, 0.0, 0.1),  # image HSV-Hue augmentation (fraction)
                'hsv_s': (1, 0.0, 0.9),  # image HSV-Saturation augmentation (fraction)
                'hsv_v': (1, 0.0, 0.9),  # image HSV-Value augmentation (fraction)
                'degrees': (1, 0.0, 45.0),  # image rotation (+/- deg)
                'translate': (1, 0.0, 0.9),  # image translation (+/- fraction)
                'scale': (1, 0.0, 0.9),  # image scale (+/- gain)
                'shear': (1, 0.0, 10.0),  # image shear (+/- deg)
                'perspective': (0, 0.0, 0.001),  # image perspective (+/- fraction), range 0-0.001
                'flipud': (1, 0.0, 1.0),  # image flip up-down (probability)
                'fliplr': (0, 0.0, 1.0),  # image flip left-right (probability)
                'mosaic': (1, 0.0, 1.0),  # image mixup (probability)
                'mixup': (1, 0.0, 1.0)}  # image mixup (probability)

        assert opt.local_rank == -1, 'DDP mode not implemented for --evolve'
        opt.notest, opt.nosave = True, True  # only test/save final epoch
        # ei = [isinstance(x, (int, float)) for x in hyp.values()]  # evolvable indices
        yaml_file = Path(opt.logdir) / 'evolve' / 'hyp_evolved.yaml'  # save best result here
        if opt.bucket:
            os.system('gsutil cp gs://%s/evolve.txt .' 
% opt.bucket)  # download evolve.txt if existsfor _ in range(300):  # generations to evolveif os.path.exists('evolve.txt'):  # if evolve.txt exists: select best hyps and mutate# Select parent(s)parent = 'single'  # parent selection method: 'single' or 'weighted'x = np.loadtxt('evolve.txt', ndmin=2)n = min(5, len(x))  # number of previous results to considerx = x[np.argsort(-fitness(x))][:n]  # top n mutationsw = fitness(x) - fitness(x).min()  # weightsif parent == 'single' or len(x) == 1:# x = x[random.randint(0, n - 1)]  # random selectionx = x[random.choices(range(n), weights=w)[0]]  # weighted selectionelif parent == 'weighted':x = (x * w.reshape(n, 1)).sum(0) / w.sum()  # weighted combination# Mutatemp, s = 0.8, 0.2  # mutation probability, sigmanpr = np.randomnpr.seed(int(time.time()))g = np.array([x[0] for x in meta.values()])  # gains 0-1ng = len(meta)v = np.ones(ng)while all(v == 1):  # mutate until a change occurs (prevent duplicates)v = (g * (npr.random(ng) < mp) * npr.randn(ng) * npr.random() * s + 1).clip(0.3, 3.0)for i, k in enumerate(hyp.keys()):  # plt.hist(v.ravel(), 300)hyp[k] = float(x[i + 7] * v[i])  # mutate# Constrain to limitsfor k, v in meta.items():hyp[k] = max(hyp[k], v[1])  # lower limithyp[k] = min(hyp[k], v[2])  # upper limithyp[k] = round(hyp[k], 5)  # significant digits# Train mutationresults = train(hyp.copy(), opt, device)# Write mutation resultsprint_mutation(hyp.copy(), results, yaml_file, opt.bucket)# Plot resultsplot_evolution(yaml_file)print(f'Hyperparameter evolution complete. Best results saved as: {yaml_file}\n'f'Command to train a new model with these hyperparameters: $ python train.py --hyp {yaml_file}')
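The mutate-and-clamp step of the evolve branch can be tried in isolation. The following is a minimal sketch, not the repository's code: `mutate` is a hypothetical helper name, the two-entry `meta` table in the usage note is made up for illustration, and the standard `random` module stands in for NumPy. It keeps the same ingredients as the listing above: a per-entry gain, a mutation probability `mp`, a sigma, a clip to 0.3-3.0, and a final clamp to the limits in `meta`.

```python
import random


def mutate(hyp, meta, mp=0.8, sigma=0.2, rng=None):
    """Return a mutated copy of hyp, clamped to the limits in meta.

    meta maps name -> (gain, lower_limit, upper_limit), as in the table above.
    `mutate` is a hypothetical helper name used only for this sketch.
    """
    rng = rng or random.Random()
    keys = list(meta)
    factors = [1.0] * len(keys)
    # Redraw until at least one factor differs from 1 (prevents duplicates).
    # Assumes at least one entry has a non-zero gain, otherwise this never ends.
    while all(f == 1.0 for f in factors):
        shared = rng.random()  # one random scale shared by all entries
        factors = []
        for k in keys:
            gain = meta[k][0]
            active = 1.0 if rng.random() < mp else 0.0  # mutate this entry or not
            f = gain * active * rng.gauss(0.0, 1.0) * shared * sigma + 1.0
            factors.append(min(max(f, 0.3), 3.0))  # same clip range as above
    out = {}
    for k, f in zip(keys, factors):
        lo, hi = meta[k][1], meta[k][2]
        out[k] = round(min(max(hyp[k] * f, lo), hi), 5)  # clamp, then round
    return out
```

For example, `mutate({'lr0': 0.01, 'iou_t': 0.2}, {'lr0': (1, 1e-5, 1e-1), 'iou_t': (0, 0.1, 0.7)})` leaves `iou_t` untouched because its gain is 0, while `lr0` is scaled by a factor between 0.3 and 3.0.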

18) Walkthrough of the training flow

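During training, the project saves the full details of every run, so a run can be reproduced and problems can be traced afterwards.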

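Normally the parameters are updated once after every batch. Here the gradients are accumulated instead, and the parameters are updated only once every 4 batches.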
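Updating once every few batches is gradient accumulation. A minimal sketch of the bookkeeping follows. It assumes a nominal batch size of 64, so that `--batch-size 16` yields 4 accumulation steps; the function names are hypothetical, and the real backward and optimizer calls are only indicated in comments.

```python
def accumulation_steps(total_batch_size, nominal_batch_size=64):
    """Number of batches whose gradients are summed before one optimizer step."""
    return max(round(nominal_batch_size / total_batch_size), 1)


def update_points(num_batches, accumulate):
    """Indices (0-based) of the batches after which the parameters are updated."""
    points = []
    for i in range(num_batches):
        # loss.backward() would run here for every batch, adding to the stored gradients
        if (i + 1) % accumulate == 0:
            # optimizer.step() and optimizer.zero_grad() would run here
            points.append(i)
    return points
```

With a batch size of 16, `accumulation_steps(16)` gives 4, and `update_points(8, 4)` shows that the parameters change only after the 4th and 8th batches. The effect is a larger effective batch size without the memory cost of loading 64 images at once.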

The optimization settings mainly concern the weights, the biases, and the learning rate.
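As a rough sketch of what such settings can look like: parameters are often split into groups so that weight decay applies to convolution weights but not to biases or BatchNorm weights, and the learning rate decays along a cosine curve from `lr0` down to `lr0 * lrf` (the meaning given to `lrf` in the hyperparameter table). The name-based grouping rule and both helper names below are assumptions made for illustration, not the repository's code.

```python
import math


def cosine_lr_factor(epoch, epochs, lrf):
    """Multiplier applied to lr0: 1.0 at epoch 0, decaying to lrf at the last epoch."""
    return ((1 + math.cos(epoch * math.pi / epochs)) / 2) * (1 - lrf) + lrf


def split_param_groups(param_names):
    """Split parameter names into (decay, no_decay, bias) groups.

    Assumed naming rule for this sketch: biases end with '.bias',
    BatchNorm parameters contain '.bn'; everything else gets weight decay.
    """
    decay, no_decay, bias = [], [], []
    for name in param_names:
        if name.endswith('.bias'):
            bias.append(name)
        elif '.bn' in name:
            no_decay.append(name)
        else:
            decay.append(name)
    return decay, no_decay, bias
```

Each group would then be handed to the optimizer with its own weight-decay value, and the factor from `cosine_lr_factor` would be multiplied into the base learning rate once per epoch.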

19) Overview of the training strategies

20) The model iteration process
