
Part 47: An Analysis of the Vision Transformer (ViT) Model

news | Source: original | 2025/7/4 20:06:45

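ViT (Vision Transformer) is a vision model built on the Transformer architecture. It brought the Transformer from natural language processing (NLP) into computer vision (CV) and is designed to process image data. A detailed analysis of the model follows: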

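1. Basic structure of the ViT model

The ViT model consists of three main parts: an image feature embedding module, a Transformer encoder module, and an MLP (multilayer perceptron) classification module.

Image feature embedding module: splits the input image into small patches and embeds each patch as a high-dimensional feature vector, using a convolution layer or an equivalent linear projection. A learnable class token is also prepended to represent global information. This step resembles word embedding in NLP, but because self-attention by itself ignores token order, a position embedding is added so that each patch keeps its spatial position.
Transformer encoder module: contains a stack of encoder layers, each made up of multi-head attention, a feed-forward network, residual connections, and layer normalization. In the code recorded in this post, layer normalization is applied before each sub-layer (pre-norm) rather than after it, as the "Add & Norm" of the original Transformer does. This module captures global dependencies between image patches, which is the core mechanism of the Transformer.
MLP classification module: sits at the end of the model, takes the encoder output for the class token, and classifies it to produce the final result. In the code recorded in this post, the head is a single linear layer, optionally preceded by a pre-logits layer (a linear layer followed by Tanh).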
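
To make the sizes concrete, the following minimal sketch computes how many tokens the embedding module produces for a square image. The function name vit_token_layout and the divisibility check are assumptions made for illustration; the arithmetic follows the grid_size, num_patches and pos_embed definitions in the listing recorded in this post.

```python
def vit_token_layout(img_size=224, patch_size=16, embed_dim=768, num_tokens=1):
    """Return (grid, num_patches, seq_len, embed_dim) for a square image.

    num_tokens counts the extra learnable tokens (1 for the class token,
    2 when a distillation token is also used).
    """
    # Assumption for this sketch: reject sizes that do not tile evenly.
    if img_size % patch_size != 0:
        raise ValueError("img_size should be a multiple of patch_size")
    grid = img_size // patch_size        # patches per side
    num_patches = grid * grid            # patches per image
    seq_len = num_patches + num_tokens   # tokens fed to the encoder
    return grid, num_patches, seq_len, embed_dim


print(vit_token_layout())         # (14, 196, 197, 768)
print(vit_token_layout(384, 16))  # (24, 576, 577, 768)
```

With the default ViT-Base/16 settings, the encoder therefore sees 197 tokens of width 768 per image.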

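2. How the ViT model works

Patch embedding: the input image is split into small patches, and an embedding layer converts each patch into a high-dimensional feature vector. A class token is prepended to the sequence.
Encoder processing: the embedded patches and the class token are fed into the Transformer encoder, where stacked layers of multi-head attention and feed-forward networks capture global dependencies between patches.
Classification output: the class token output by the encoder is passed to the MLP classification module, which produces the final classification result.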
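
The patch-splitting step can be sketched in plain Python on a tiny single-channel image. The helper name patchify and the zero-valued class token are hypothetical and only illustrate the data layout; in the listing recorded in this post, splitting and projection are fused into one Conv2d whose kernel size and stride both equal the patch size.

```python
def patchify(image, patch):
    """Split an H x W image (list of rows) into flattened patch x patch blocks.

    Patches are returned in row-major order: left to right, then top to bottom.
    """
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size should be a multiple of the patch size")
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            block = [image[top + dy][left + dx]
                     for dy in range(patch) for dx in range(patch)]
            patches.append(block)
    return patches


# A 4x4 image whose pixel value equals its flat index.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = patchify(image, 2)

# Prepend a placeholder class token (learnable in the real model).
cls_token = [0] * len(tokens[0])
sequence = [cls_token] + tokens
print(len(sequence), sequence[1])  # 5 [0, 1, 4, 5]
```

Each inner list plays the role of one token before the learned projection is applied.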

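Attention code walkthrough

The multi-head attention module is the core component of the Transformer architecture, which is widely used in deep learning, especially for natural language processing and vision tasks. The following is a line-by-line walkthrough of the class: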

class Attention(nn.Module):

Defines a class named Attention that inherits from PyTorch's nn.Module, the base class for all neural network modules.

    def __init__(self,
                 dim,   # dimension of the input tokens
                 num_heads=8,
                 qkv_bias=False,
                 qk_scale=None,
                 attn_drop_ratio=0.,
                 proj_drop_ratio=0.):

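This is the class initializer. It accepts the following parameters:

dim: dimension of the input tokens.
num_heads: number of attention heads; defaults to 8.
qkv_bias: whether to add a bias term to the linear transformation that produces the query (q), key (k) and value (v); defaults to False.
qk_scale: scaling factor applied to the query-key dot products; defaults to None, in which case head_dim ** -0.5 is used.
attn_drop_ratio: dropout rate applied to the attention weights; defaults to 0.
proj_drop_ratio: dropout rate applied after the final projection; defaults to 0.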

        super(Attention, self).__init__()

Calls the initializer of the parent class nn.Module.

        self.num_heads = num_heads

Stores the number of heads as an instance attribute.

        head_dim = dim // num_heads

Computes the dimension of each head. Integer division is used, so dim should be divisible by num_heads.

        self.scale = qk_scale or head_dim ** -0.5

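Sets the scaling factor. If qk_scale is None, head_dim ** -0.5 is used. Because the expression uses the or operator, any other falsy value such as 0 also falls back to head_dim ** -0.5.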
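
The two lines above can be checked with plain numbers. This is a small sketch, and the function name attention_scale is made up for illustration; the body mirrors the head_dim and self.scale expressions from the class.

```python
def attention_scale(dim, num_heads, qk_scale=None):
    """Mirror of the head_dim / scale computation in Attention.__init__."""
    head_dim = dim // num_heads
    # `or` falls back to the default for None and for any other falsy value.
    return qk_scale or head_dim ** -0.5


print(attention_scale(768, 12))                # head_dim = 64, scale = 1 / sqrt(64)
print(attention_scale(768, 12, qk_scale=0.5))  # explicit value is used
print(attention_scale(768, 12, qk_scale=0.0))  # 0.0 is falsy, so the default is used
```

For ViT-Base (dim 768, 12 heads) each head has 64 dimensions, so the dot products are multiplied by 1/8.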

        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)

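Defines a linear layer that maps the input from dim to dim * 3. A single layer produces the queries, keys and values together. If qkv_bias is True, a bias term is added.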
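
A fused projection of this kind gives the same result as three separate projections whose weight matrices are stacked row-wise. The sketch below shows this with tiny hand-written matrices and no bias; matvec and the weight values are illustrative assumptions, with weights stored as (out_features, in_features) rows in the way nn.Linear stores them.

```python
def matvec(weight, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


# Hypothetical per-projection weights for dim = 2.
w_q = [[1.0, 0.0], [0.0, 1.0]]
w_k = [[2.0, 0.0], [0.0, 2.0]]
w_v = [[0.0, 1.0], [1.0, 0.0]]

# Fused weight: the rows of w_q, w_k and w_v stacked, shape (3 * dim, dim).
w_qkv = w_q + w_k + w_v

x = [3.0, 5.0]
dim = len(x)
fused = matvec(w_qkv, x)  # length 3 * dim

# Slicing the fused output into thirds recovers q, k and v.
q, k, v = fused[:dim], fused[dim:2 * dim], fused[2 * dim:]
print(q, k, v)  # [3.0, 5.0] [6.0, 10.0] [5.0, 3.0]
```

The reshape to (B, N, 3, num_heads, head_dim) in forward relies on exactly this layout: the first third of the output features belongs to q, the second to k, and the last to v.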

        self.attn_drop = nn.Dropout(attn_drop_ratio)

Defines a dropout layer that is applied to the attention weights (after softmax) to reduce overfitting.

        self.proj = nn.Linear(dim, dim)

Defines a linear layer that applies the final projection to the output of the attention mechanism.

        self.proj_drop = nn.Dropout(proj_drop_ratio)

Defines another dropout layer, applied to the output of the final projection.

    def forward(self, x):

Defines the forward pass.

        B, N, C = x.shape

Reads the shape of the input x, where B is the batch size, N is the sequence length (the number of tokens), and C is the feature dimension.

        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)

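Passes x through the self.qkv linear layer, then reshapes and permutes the result to separate the queries, keys and values and to lay them out per head. The resulting shape is [3, B, num_heads, N, C // num_heads].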
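
The shape bookkeeping of this line can be traced without any tensors. The helper qkv_shapes below is a hypothetical name; it only applies the reshape and permute index arithmetic to shape tuples, using the rule that output axis i of a permute takes the size of input axis order[i].

```python
def qkv_shapes(B, N, C, num_heads):
    """Trace the shapes produced by the qkv line for an input of shape (B, N, C)."""
    linear_out = (B, N, 3 * C)                         # after self.qkv(x)
    reshaped = (B, N, 3, num_heads, C // num_heads)    # after reshape
    order = (2, 0, 3, 1, 4)                            # permute arguments
    permuted = tuple(reshaped[i] for i in order)       # after permute
    per_tensor = permuted[1:]                          # shape of each of q, k, v
    return linear_out, reshaped, permuted, per_tensor


for shape in qkv_shapes(B=2, N=197, C=768, num_heads=12):
    print(shape)
# (2, 197, 2304)
# (2, 197, 3, 12, 64)
# (3, 2, 12, 197, 64)
# (2, 12, 197, 64)
```

Moving the size-3 axis to the front is what lets the next line pick q, k and v with a plain index.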

        q, k, v = qkv[0], qkv[1], qkv[2]  # make torchscript happy (cannot use tensor as tuple)

Extracts the queries, keys and values from the reshaped qkv tensor. Each has shape [B, num_heads, N, C // num_heads].

        attn = (q @ k.transpose(-2, -1)) * self.scale

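Computes the dot products between queries and keys and applies the scaling factor. The result has shape [B, num_heads, N, N].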

        attn = attn.softmax(dim=-1)

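Applies softmax over the last dimension of the scaled dot products to obtain the attention weights, so that each row sums to 1.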

        attn = self.attn_drop(attn)

Applies dropout to the attention weights.
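
What dropout does to a row of attention weights can be sketched in plain Python. This is a simplified stand-in, not the nn.Dropout implementation: it follows the documented behaviour that, during training, each element is zeroed with probability p and the survivors are scaled by 1 / (1 - p), while in evaluation mode the input passes through unchanged. The function name and the rng parameter are assumptions for illustration.

```python
import random


def dropout(values, p, training=True, rng=None):
    """Inverted dropout on a list of floats."""
    if not 0.0 <= p < 1.0:
        raise ValueError("this sketch expects 0 <= p < 1")
    if not training or p == 0.0:
        return list(values)
    rng = rng or random.Random()
    keep = 1.0 - p
    # Keep each element with probability `keep` and rescale the survivors.
    return [v / keep if rng.random() < keep else 0.0 for v in values]


weights = [0.1, 0.2, 0.3, 0.4]                 # one softmax row, sums to 1
print(dropout(weights, 0.5, training=False))   # unchanged in eval mode
print(dropout(weights, 0.5, rng=random.Random(0)))
```

One consequence is that a row of attention weights no longer sums exactly to 1 after dropout during training; the rescaling only preserves the expected value.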

        x = (attn @ v).transpose(1, 2).reshape(B, N, C)

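Weights the values by the attention weights, then transposes and reshapes the result so that the heads are concatenated and the output matches the input shape [B, N, C].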
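
The score, softmax and weighted-sum steps can be reproduced for a single head with nested lists. This is a minimal sketch, assuming one head and no batch dimension; subtracting the row maximum inside softmax is an addition for numerical stability that does not change the result.

```python
import math


def softmax(row):
    """Softmax over a list of floats, shifted by the max for stability."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def single_head_attention(q, k, v):
    """q, k, v are lists of N vectors. Returns (weights, output)."""
    scale = len(q[0]) ** -0.5
    weights = []
    for qi in q:
        # Scaled dot product of one query with every key.
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights.append(softmax(scores))
    width = len(v[0])
    # Each output vector is a weighted average of the value vectors.
    output = [[sum(w * vj[d] for w, vj in zip(row, v)) for d in range(width)]
              for row in weights]
    return weights, output


q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 1.0], [1.0, 1.0]]   # identical keys give uniform weights
v = [[1.0, 0.0], [3.0, 4.0]]
weights, output = single_head_attention(q, k, v)
print(weights)  # every row is [0.5, 0.5]
print(output)   # every row is [2.0, 2.0], the mean of the values
```

In the class above the same computation runs for all heads and batch items at once through batched matrix multiplication.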

        x = self.proj(x)

Applies the final linear projection to the weighted values.

        x = self.proj_drop(x)

Applies dropout to the projected output.

        return x

Returns the final output.

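This class implements the multi-head attention mechanism of the Transformer architecture. Because several heads are processed in parallel, each head can attend to a different representation subspace of the input, which increases the model's representational capacity.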
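
As a quick size check, the number of learnable parameters in this module follows directly from its two linear layers. The function name below is made up for illustration; it assumes, as in the class, that self.proj always has a bias (the nn.Linear default) and that the dropout layers hold no parameters.

```python
def attention_param_count(dim, qkv_bias=False):
    """Count the parameters of the qkv and proj linear layers."""
    qkv = dim * (3 * dim) + (3 * dim if qkv_bias else 0)
    proj = dim * dim + dim   # weight plus bias
    return qkv + proj


print(attention_param_count(768))                 # 2360064
print(attention_param_count(768, qkv_bias=True))  # 2362368
```

The count does not depend on num_heads, since the heads only partition the same dim-wide projections.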

GitHub repository: https://github.com/rwightman/pytorch-image-models/
The model code is recorded below for easy reference.

""" Vision Transformer (ViT) in PyTorch

A PyTorch implement of Vision Transformers as described in:

'An Image Is Worth 16 x 16 Words: Transformers for Image Recognition at Scale'
    - https://arxiv.org/abs/2010.11929

`How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers`
    - https://arxiv.org/abs/2106.10270

The official jax code is released and available at https://github.com/google-research/vision_transformer

DeiT model defs and weights from https://github.com/facebookresearch/deit,
paper `DeiT: Data-efficient Image Transformers` - https://arxiv.org/abs/2012.12877

Acknowledgments:
* The paper authors for releasing code and weights, thanks!
* I fixed my class token impl based on Phil Wang's https://github.com/lucidrains/vit-pytorch ... check it out
for some einops/einsum fun
* Simple transformer style inspired by Andrej Karpathy's https://github.com/karpathy/minGPT
* Bert reference code checks against Huggingface Transformers and Tensorflow Bert

Hacked together by / Copyright 2020, Ross Wightman
"""
import math
import logging
from functools import partial
from collections import OrderedDict
from copy import deepcopy

import torch
import torch.nn as nn
import torch.nn.functional as F
from timm.data import IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD, IMAGENET_INCEPTION_MEAN, IMAGENET_INCEPTION_STD
from models.weight_init import trunc_normal_, lecun_normal_

_logger = logging.getLogger(__name__)


def adapt_input_conv(in_chans, conv_weight):
    conv_type = conv_weight.dtype
    conv_weight = conv_weight.float()  # Some weights are in torch.half, ensure it's float for sum on CPU
    O, I, J, K = conv_weight.shape
    if in_chans == 1:
        if I > 3:
            assert conv_weight.shape[1] % 3 == 0
            # For models with space2depth stems
            conv_weight = conv_weight.reshape(O, I // 3, 3, J, K)
            conv_weight = conv_weight.sum(dim=2, keepdim=False)
        else:
            conv_weight = conv_weight.sum(dim=1, keepdim=True)
    elif in_chans != 3:
        if I != 3:
            raise NotImplementedError('Weight format not supported by conversion.')
        else:
            # NOTE this strategy should be better than random init, but there could be other combinations of
            # the original RGB input layer weights that'd work better for specific cases.
            repeat = int(math.ceil(in_chans / 3))
            conv_weight = conv_weight.repeat(1, repeat, 1, 1)[:, :in_chans, :, :]
            conv_weight *= (3 / float(in_chans))
    conv_weight = conv_weight.to(conv_type)
    return conv_weight


def drop_path(x, drop_prob: float = 0., training: bool = False):
    """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).

    This is the same as the DropConnect impl I created for EfficientNet, etc networks, however,
    the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper...
    See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... I've opted for
    changing the layer and argument names to 'drop path' rather than mix DropConnect as a layer name and use
    'survival rate' as the argument.
    """
    if drop_prob == 0. or not training:
        return x
    keep_prob = 1 - drop_prob
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)  # work with diff dim tensors, not just 2D ConvNets
    random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)
    random_tensor.floor_()  # binarize
    output = x.div(keep_prob) * random_tensor
    return output


class DropPath(nn.Module):
    """Drop paths (Stochastic Depth) per sample  (when applied in main path of residual blocks).
    """
    def __init__(self, drop_prob=None):
        super(DropPath, self).__init__()
        self.drop_prob = drop_prob

    def forward(self, x):
        return drop_path(x, self.drop_prob, self.training)


class PatchEmbed(nn.Module):
    """
    2D Image to Patch Embedding
    """
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768, norm_layer=None):
        super().__init__()
        img_size = (img_size, img_size)
        patch_size = (patch_size, patch_size)
        self.img_size = img_size
        self.patch_size = patch_size
        self.grid_size = (img_size[0] // patch_size[0], img_size[1] // patch_size[1])
        self.num_patches = self.grid_size[0] * self.grid_size[1]

        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)
        self.norm = norm_layer(embed_dim) if norm_layer else nn.Identity()

    def forward(self, x):
        B, C, H, W = x.shape
        assert H == self.img_size[0] and W == self.img_size[1], \
            f"Input image size ({H}*{W}) doesn't match model ({self.img_size[0]}*{self.img_size[1]})."

        # flatten: [B, C, H, W] -> [B, C, HW]
        # transpose: [B, C, HW] -> [B, HW, C]
        x = self.proj(x).flatten(2).transpose(1, 2)
        x = self.norm(x)
        return x


class Attention(nn.Module):
    def __init__(self,
                 dim,   # dimension of the input tokens
                 num_heads=8,
                 qkv_bias=False,
                 qk_scale=None,
                 attn_drop_ratio=0.,
                 proj_drop_ratio=0.):
        super(Attention, self).__init__()
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.scale = qk_scale or head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)
        self.attn_drop = nn.Dropout(attn_drop_ratio)
        self.proj = nn.Linear(dim, dim)
        self.proj_drop = nn.Dropout(proj_drop_ratio)

    def forward(self, x):
        # [batch_size, num_patches + 1, total_embed_dim]
        B, N, C = x.shape

        # qkv(): -> [batch_size, num_patches + 1, 3 * total_embed_dim]
        # reshape: -> [batch_size, num_patches + 1, 3, num_heads, embed_dim_per_head]
        # permute: -> [3, batch_size, num_heads, num_patches + 1, embed_dim_per_head]
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
        # [batch_size, num_heads, num_patches + 1, embed_dim_per_head]
        q, k, v = qkv[0], qkv[1], qkv[2]  # make torchscript happy (cannot use tensor as tuple)

        # transpose: -> [batch_size, num_heads, embed_dim_per_head, num_patches + 1]
        # @: multiply -> [batch_size, num_heads, num_patches + 1, num_patches + 1]
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        attn = self.attn_drop(attn)

        # @: multiply -> [batch_size, num_heads, num_patches + 1, embed_dim_per_head]
        # transpose: -> [batch_size, num_patches + 1, num_heads, embed_dim_per_head]
        # reshape: -> [batch_size, num_patches + 1, total_embed_dim]
        x = (attn @ v).transpose(1, 2).reshape(B, N, C)
        x = self.proj(x)
        x = self.proj_drop(x)
        return x


class Mlp(nn.Module):
    """
    MLP as used in Vision Transformer, MLP-Mixer and related networks
    """
    def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.):
        super().__init__()
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        self.fc1 = nn.Linear(in_features, hidden_features)
        self.act = act_layer()
        self.fc2 = nn.Linear(hidden_features, out_features)
        self.drop = nn.Dropout(drop)

    def forward(self, x):
        x = self.fc1(x)
        x = self.act(x)
        x = self.drop(x)
        x = self.fc2(x)
        x = self.drop(x)
        return x


def _cfg(url='', **kwargs):
    return {
        'url': url,
        'num_classes': 1000, 'input_size': (3, 224, 224), 'pool_size': None,
        'crop_pct': .9, 'interpolation': 'bicubic', 'fixed_input_size': True,
        'mean': IMAGENET_INCEPTION_MEAN, 'std': IMAGENET_INCEPTION_STD,
        'first_conv': 'patch_embed.proj', 'classifier': 'head',
        **kwargs
    }


default_cfgs = {
    # patch models (weights from official Google JAX impl)
    'vit_tiny_patch16_224': _cfg(
        url='https://storage.googleapis.com/vit_models/augreg/'
            'Ti_16-i21k-300ep-lr_0.001-aug_none-wd_0.03-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.03-res_224.npz'),
    'vit_tiny_patch16_384': _cfg(
        url='https://storage.googleapis.com/vit_models/augreg/'
            'Ti_16-i21k-300ep-lr_0.001-aug_none-wd_0.03-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.03-res_384.npz',
        input_size=(3, 384, 384), crop_pct=1.0),
    'vit_small_patch32_224': _cfg(
        url='https://storage.googleapis.com/vit_models/augreg/'
            'S_32-i21k-300ep-lr_0.001-aug_light1-wd_0.03-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.03-res_224.npz'),
    'vit_small_patch32_384': _cfg(
        url='https://storage.googleapis.com/vit_models/augreg/'
            'S_32-i21k-300ep-lr_0.001-aug_light1-wd_0.03-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.03-res_384.npz',
        input_size=(3, 384, 384), crop_pct=1.0),
    'vit_small_patch16_224': _cfg(
        url='https://storage.googleapis.com/vit_models/augreg/'
            'S_16-i21k-300ep-lr_0.001-aug_light1-wd_0.03-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.03-res_224.npz'),
    'vit_small_patch16_384': _cfg(
        url='https://storage.googleapis.com/vit_models/augreg/'
            'S_16-i21k-300ep-lr_0.001-aug_light1-wd_0.03-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.03-res_384.npz',
        input_size=(3, 384, 384), crop_pct=1.0),
    'vit_base_patch32_224': _cfg(
        url='https://storage.googleapis.com/vit_models/augreg/'
            'B_32-i21k-300ep-lr_0.001-aug_medium1-wd_0.03-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.03-res_224.npz'),
    'vit_base_patch32_384': _cfg(
        url='https://storage.googleapis.com/vit_models/augreg/'
            'B_32-i21k-300ep-lr_0.001-aug_light1-wd_0.1-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.03-res_384.npz',
        input_size=(3, 384, 384), crop_pct=1.0),
    'vit_base_patch16_224': _cfg(
        url='https://storage.googleapis.com/vit_models/augreg/'
            'B_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.01-res_224.npz'),
    'vit_base_patch16_384': _cfg(
        url='https://storage.googleapis.com/vit_models/augreg/'
            'B_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.01-res_384.npz',
        input_size=(3, 384, 384), crop_pct=1.0),
    'vit_base_patch8_224': _cfg(
        url='https://storage.googleapis.com/vit_models/augreg/'
            'B_8-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.01-res_224.npz'),
    'vit_large_patch32_224': _cfg(
        url='',  # no official model weights for this combo, only for in21k
        ),
    'vit_large_patch32_384': _cfg(
        url='https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-vitjx/jx_vit_large_p32_384-9b920ba8.pth',
        input_size=(3, 384, 384), crop_pct=1.0),
    'vit_large_patch16_224': _cfg(
        url='https://storage.googleapis.com/vit_models/augreg/'
            'L_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.1-sd_0.1--imagenet2012-steps_20k-lr_0.01-res_224.npz'),
    'vit_large_patch16_384': _cfg(
        url='https://storage.googleapis.com/vit_models/augreg/'
            'L_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.1-sd_0.1--imagenet2012-steps_20k-lr_0.01-res_384.npz',
        input_size=(3, 384, 384), crop_pct=1.0),

    'vit_huge_patch14_224': _cfg(url=''),
    'vit_giant_patch14_224': _cfg(url=''),
    'vit_gigantic_patch14_224': _cfg(url=''),

    'vit_base2_patch32_256': _cfg(url='', input_size=(3, 256, 256), crop_pct=0.95),

    # patch models, imagenet21k (weights from official Google JAX impl)
    'vit_tiny_patch16_224_in21k': _cfg(
        url='https://storage.googleapis.com/vit_models/augreg/Ti_16-i21k-300ep-lr_0.001-aug_none-wd_0.03-do_0.0-sd_0.0.npz',
        num_classes=21843),
    'vit_small_patch32_224_in21k': _cfg(
        url='https://storage.googleapis.com/vit_models/augreg/S_32-i21k-300ep-lr_0.001-aug_light1-wd_0.03-do_0.0-sd_0.0.npz',
        num_classes=21843),
    'vit_small_patch16_224_in21k': _cfg(
        url='https://storage.googleapis.com/vit_models/augreg/S_16-i21k-300ep-lr_0.001-aug_light1-wd_0.03-do_0.0-sd_0.0.npz',
        num_classes=21843),
    'vit_base_patch32_224_in21k': _cfg(
        url='https://storage.googleapis.com/vit_models/augreg/B_32-i21k-300ep-lr_0.001-aug_medium1-wd_0.03-do_0.0-sd_0.0.npz',
        num_classes=21843),
    'vit_base_patch16_224_in21k': _cfg(
        url='https://storage.googleapis.com/vit_models/augreg/B_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.0-sd_0.0.npz',
        num_classes=21843),
    'vit_base_patch8_224_in21k': _cfg(
        url='https://storage.googleapis.com/vit_models/augreg/B_8-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.0-sd_0.0.npz',
        num_classes=21843),
    'vit_large_patch32_224_in21k': _cfg(
        url='https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-vitjx/jx_vit_large_patch32_224_in21k-9046d2e7.pth',
        num_classes=21843),
    'vit_large_patch16_224_in21k': _cfg(
        url='https://storage.googleapis.com/vit_models/augreg/L_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.1-sd_0.1.npz',
        num_classes=21843),
    'vit_huge_patch14_224_in21k': _cfg(
        url='https://storage.googleapis.com/vit_models/imagenet21k/ViT-H_14.npz',
        hf_hub='timm/vit_huge_patch14_224_in21k',
        num_classes=21843),

    # SAM trained models (https://arxiv.org/abs/2106.01548)
    'vit_base_patch32_224_sam': _cfg(
        url='https://storage.googleapis.com/vit_models/sam/ViT-B_32.npz'),
    'vit_base_patch16_224_sam': _cfg(
        url='https://storage.googleapis.com/vit_models/sam/ViT-B_16.npz'),

    # DINO pretrained - https://arxiv.org/abs/2104.14294 (no classifier head, for fine-tune only)
    'vit_small_patch16_224_dino': _cfg(
        url='https://dl.fbaipublicfiles.com/dino/dino_deitsmall16_pretrain/dino_deitsmall16_pretrain.pth',
        mean=IMAGENET_DEFAULT_MEAN, std=IMAGENET_DEFAULT_STD, num_classes=0),
    'vit_small_patch8_224_dino': _cfg(
        url='https://dl.fbaipublicfiles.com/dino/dino_deitsmall8_pretrain/dino_deitsmall8_pretrain.pth',
        mean=IMAGENET_DEFAULT_MEAN, std=IMAGENET_DEFAULT_STD, num_classes=0),
    'vit_base_patch16_224_dino': _cfg(
        url='https://dl.fbaipublicfiles.com/dino/dino_vitbase16_pretrain/dino_vitbase16_pretrain.pth',
        mean=IMAGENET_DEFAULT_MEAN, std=IMAGENET_DEFAULT_STD, num_classes=0),
    'vit_base_patch8_224_dino': _cfg(
        url='https://dl.fbaipublicfiles.com/dino/dino_vitbase8_pretrain/dino_vitbase8_pretrain.pth',
        mean=IMAGENET_DEFAULT_MEAN, std=IMAGENET_DEFAULT_STD, num_classes=0),

    # deit models (FB weights)
    'deit_tiny_patch16_224': _cfg(
        url='https://dl.fbaipublicfiles.com/deit/deit_tiny_patch16_224-a1311bcf.pth',
        mean=IMAGENET_DEFAULT_MEAN, std=IMAGENET_DEFAULT_STD),
    'deit_small_patch16_224': _cfg(
        url='https://dl.fbaipublicfiles.com/deit/deit_small_patch16_224-cd65a155.pth',
        mean=IMAGENET_DEFAULT_MEAN, std=IMAGENET_DEFAULT_STD),
    'deit_base_patch16_224': _cfg(
        url='https://dl.fbaipublicfiles.com/deit/deit_base_patch16_224-b5f2ef4d.pth',
        mean=IMAGENET_DEFAULT_MEAN, std=IMAGENET_DEFAULT_STD),
    'deit_base_patch16_384': _cfg(
        url='https://dl.fbaipublicfiles.com/deit/deit_base_patch16_384-8de9b5d1.pth',
        mean=IMAGENET_DEFAULT_MEAN, std=IMAGENET_DEFAULT_STD, input_size=(3, 384, 384), crop_pct=1.0),
    'deit_tiny_distilled_patch16_224': _cfg(
        url='https://dl.fbaipublicfiles.com/deit/deit_tiny_distilled_patch16_224-b40b3cf7.pth',
        mean=IMAGENET_DEFAULT_MEAN, std=IMAGENET_DEFAULT_STD, classifier=('head', 'head_dist')),
    'deit_small_distilled_patch16_224': _cfg(
        url='https://dl.fbaipublicfiles.com/deit/deit_small_distilled_patch16_224-649709d9.pth',
        mean=IMAGENET_DEFAULT_MEAN, std=IMAGENET_DEFAULT_STD, classifier=('head', 'head_dist')),
    'deit_base_distilled_patch16_224': _cfg(
        url='https://dl.fbaipublicfiles.com/deit/deit_base_distilled_patch16_224-df68dfff.pth',
        mean=IMAGENET_DEFAULT_MEAN, std=IMAGENET_DEFAULT_STD, classifier=('head', 'head_dist')),
    'deit_base_distilled_patch16_384': _cfg(
        url='https://dl.fbaipublicfiles.com/deit/deit_base_distilled_patch16_384-d0272ac0.pth',
        mean=IMAGENET_DEFAULT_MEAN, std=IMAGENET_DEFAULT_STD, input_size=(3, 384, 384), crop_pct=1.0,
        classifier=('head', 'head_dist')),

    # ViT ImageNet-21K-P pretraining by MILL
    'vit_base_patch16_224_miil_in21k': _cfg(
        url='https://miil-public-eu.oss-eu-central-1.aliyuncs.com/model-zoo/ImageNet_21K_P/models/timm/vit_base_patch16_224_in21k_miil.pth',
        mean=(0, 0, 0), std=(1, 1, 1), crop_pct=0.875, interpolation='bilinear', num_classes=11221,
    ),
    'vit_base_patch16_224_miil': _cfg(
        url='https://miil-public-eu.oss-eu-central-1.aliyuncs.com/model-zoo/ImageNet_21K_P/models/timm'
            '/vit_base_patch16_224_1k_miil_84_4.pth',
        mean=(0, 0, 0), std=(1, 1, 1), crop_pct=0.875, interpolation='bilinear',
    ),
}


class Attention(nn.Module):
    def __init__(self, dim, num_heads=8, qkv_bias=False, attn_drop=0., proj_drop=0.):
        super().__init__()
        assert dim % num_heads == 0, 'dim should be divisible by num_heads'
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.scale = head_dim ** -0.5

        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)
        self.attn_drop = nn.Dropout(attn_drop)
        self.proj = nn.Linear(dim, dim)
        self.proj_drop = nn.Dropout(proj_drop)

    def forward(self, x):
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
        q, k, v = qkv.unbind(0)   # make torchscript happy (cannot use tensor as tuple)

        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        attn = self.attn_drop(attn)

        x = (attn @ v).transpose(1, 2).reshape(B, N, C)
        x = self.proj(x)
        x = self.proj_drop(x)
        return x


class Block(nn.Module):

    def __init__(self, dim, num_heads, mlp_ratio=4., qkv_bias=False, drop=0., attn_drop=0.,
                 drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm):
        super().__init__()
        self.norm1 = norm_layer(dim)
        self.attn = Attention(dim, num_heads=num_heads, qkv_bias=qkv_bias, attn_drop=attn_drop, proj_drop=drop)
        # NOTE: drop path for stochastic depth, we shall see if this is better than dropout here
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
        self.norm2 = norm_layer(dim)
        mlp_hidden_dim = int(dim * mlp_ratio)
        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)

    def forward(self, x):
        x = x + self.drop_path(self.attn(self.norm1(x)))
        x = x + self.drop_path(self.mlp(self.norm2(x)))
        return x


class VisionTransformer(nn.Module):
    """ Vision Transformer

    A PyTorch impl of : `An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale`
        - https://arxiv.org/abs/2010.11929

    Includes distillation token & head support for `DeiT: Data-efficient Image Transformers`
        - https://arxiv.org/abs/2012.12877
    """

    def __init__(self, img_size=224, patch_size=16, in_chans=3, num_classes=1000, embed_dim=768, depth=12,
                 num_heads=12, mlp_ratio=4., qkv_bias=True, representation_size=None, distilled=False,
                 drop_rate=0., attn_drop_rate=0., drop_path_rate=0., embed_layer=PatchEmbed, norm_layer=None,
                 act_layer=None, weight_init=''):
        """
        Args:
            img_size (int, tuple): input image size
            patch_size (int, tuple): patch size
            in_chans (int): number of input channels
            num_classes (int): number of classes for classification head
            embed_dim (int): embedding dimension
            depth (int): depth of transformer
            num_heads (int): number of attention heads
            mlp_ratio (int): ratio of mlp hidden dim to embedding dim
            qkv_bias (bool): enable bias for qkv if True
            representation_size (Optional[int]): enable and set representation layer (pre-logits) to this value if set
            distilled (bool): model includes a distillation token and head as in DeiT models
            drop_rate (float): dropout rate
            attn_drop_rate (float): attention dropout rate
            drop_path_rate (float): stochastic depth rate
            embed_layer (nn.Module): patch embedding layer
            norm_layer: (nn.Module): normalization layer
            weight_init: (str): weight init scheme
        """
        super().__init__()
        self.num_classes = num_classes
        self.num_features = self.embed_dim = embed_dim  # num_features for consistency with other models
        self.num_tokens = 2 if distilled else 1
        norm_layer = norm_layer or partial(nn.LayerNorm, eps=1e-6)
        act_layer = act_layer or nn.GELU

        self.patch_embed = embed_layer(
            img_size=img_size, patch_size=patch_size, in_chans=in_chans, embed_dim=embed_dim)
        num_patches = self.patch_embed.num_patches

        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.dist_token = nn.Parameter(torch.zeros(1, 1, embed_dim)) if distilled else None
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + self.num_tokens, embed_dim))
        self.pos_drop = nn.Dropout(p=drop_rate)

        dpr = [x.item() for x in torch.linspace(0, drop_path_rate, depth)]  # stochastic depth decay rule
        self.blocks = nn.Sequential(*[
            Block(
                dim=embed_dim, num_heads=num_heads, mlp_ratio=mlp_ratio, qkv_bias=qkv_bias, drop=drop_rate,
                attn_drop=attn_drop_rate, drop_path=dpr[i], norm_layer=norm_layer, act_layer=act_layer)
            for i in range(depth)])
        self.norm = norm_layer(embed_dim)

        # Representation layer
        if representation_size and not distilled:
            self.num_features = representation_size
            self.pre_logits = nn.Sequential(OrderedDict([
                ('fc', nn.Linear(embed_dim, representation_size)),
                ('act', nn.Tanh())
            ]))
        else:
            self.pre_logits = nn.Identity()

        # Classifier head(s)
        self.head = nn.Linear(self.num_features, num_classes) if num_classes > 0 else nn.Identity()
        self.head_dist = None
        if distilled:
            self.head_dist = nn.Linear(self.embed_dim, self.num_classes) if num_classes > 0 else nn.Identity()

        self.init_weights(weight_init)

    def init_weights(self, mode=''):
        assert mode in ('jax', 'jax_nlhb', 'nlhb', '')
        head_bias = -math.log(self.num_classes) if 'nlhb' in mode else 0.
        trunc_normal_(self.pos_embed, std=.02)
        if self.dist_token is not None:
            trunc_normal_(self.dist_token, std=.02)
        trunc_normal_(self.cls_token, std=.02)
        self.apply(_init_vit_weights)

    def _init_weights(self, m):
        # this fn left here for compat with downstream users
        _init_vit_weights(m)

    @torch.jit.ignore()
    def load_pretrained(self, checkpoint_path, prefix=''):
        _load_weights(self, checkpoint_path, prefix)

    @torch.jit.ignore
    def no_weight_decay(self):
        return {'pos_embed', 'cls_token', 'dist_token'}

    def get_classifier(self):
        if self.dist_token is None:
            return self.head
        else:
            return self.head, self.head_dist

    def reset_classifier(self, num_classes, global_pool=''):
        self.num_classes = num_classes
        self.head = nn.Linear(self.embed_dim, num_classes) if num_classes > 0 else nn.Identity()
        if self.num_tokens == 2:
            self.head_dist = nn.Linear(self.embed_dim, self.num_classes) if num_classes > 0 else nn.Identity()

    def forward_features(self, x):
        x = self.patch_embed(x)
        cls_token = self.cls_token.expand(x.shape[0], -1, -1)  # stole cls_tokens impl from Phil Wang, thanks
        if self.dist_token is None:
            x = torch.cat((cls_token, x), dim=1)
        else:
            x = torch.cat((cls_token, self.dist_token.expand(x.shape[0], -1, -1), x), dim=1)
        x = self.pos_drop(x + self.pos_embed)
        x = self.blocks(x)
        x = self.norm(x)
        if self.dist_token is None:
            return self.pre_logits(x[:, 0])
        else:
            return x[:, 0], x[:, 1]

    def forward(self, x):
        x = self.forward_features(x)
        if self.head_dist is not None:
            x, x_dist = self.head(x[0]), self.head_dist(x[1])  # x must be a tuple
            if self.training and not torch.jit.is_scripting():
                # during inference, return the average of both classifier predictions
                return x, x_dist
            else:
                return (x + x_dist) / 2
        else:
            x = self.head(x)
        return x


def _init_vit_weights(module: nn.Module, name: str = '', head_bias: float = 0., jax_impl: bool = False):
    """ ViT weight initialization
    * When called without n, head_bias, jax_impl args it will behave exactly the same
      as my original init for compatibility with prev hparam / downstream use cases (ie DeiT).
    * When called w/ valid n (module name) and jax_impl=True, will (hopefully) match JAX impl
    """
    if isinstance(module, nn.Linear):
        if name.startswith('head'):
            nn.init.zeros_(module.weight)
            nn.init.constant_(module.bias, head_bias)
        elif name.startswith('pre_logits'):
            lecun_normal_(module.weight)
            nn.init.zeros_(module.bias)
        else:
            if jax_impl:
                nn.init.xavier_uniform_(module.weight)
                if module.bias is not None:
                    if 'mlp' in name:
                        nn.init.normal_(module.bias, std=1e-6)
                    else:
                        nn.init.zeros_(module.bias)
            else:
                trunc_normal_(module.weight, std=.02)
                if module.bias is not None:
                    nn.init.zeros_(module.bias)
    elif jax_impl and isinstance(module, nn.Conv2d):
        # NOTE conv was left to pytorch default in my original init
        lecun_normal_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, (nn.LayerNorm, nn.GroupNorm, nn.BatchNorm2d)):
        nn.init.zeros_(module.bias)
        nn.init.ones_(module.weight)


@torch.no_grad()
def _load_weights(model: VisionTransformer, checkpoint_path: str, prefix: str = ''):
    """ Load weights from .npz checkpoints for official Google Brain Flax implementation
    """
    import numpy as np

    def _n2p(w, t=True):
        if w.ndim == 4 and w.shape[0] == w.shape[1] == w.shape[2] == 1:
            w = w.flatten()
        if t:
            if w.ndim == 4:
                w = w.transpose([3, 2, 0, 1])
            elif w.ndim == 3:
                w = w.transpose([2, 0, 1])
            elif w.ndim == 2:
                w = w.transpose([1, 0])
        return torch.from_numpy(w)

    w = np.load(checkpoint_path)
    if not prefix and 'opt/target/embedding/kernel' in w:
        prefix = 'opt/target/'

    if hasattr(model.patch_embed, 'backbone'):
        # hybrid
        backbone = model.patch_embed.backbone
        stem_only = not hasattr(backbone, 'stem')
        stem = backbone if stem_only else backbone.stem
        stem.conv.weight.copy_(adapt_input_conv(stem.conv.weight.shape[1], _n2p(w[f'{prefix}conv_root/kernel'])))
        stem.norm.weight.copy_(_n2p(w[f'{prefix}gn_root/scale']))
        stem.norm.bias.copy_(_n2p(w[f'{prefix}gn_root/bias']))
        if not stem_only:
            for i, stage in enumerate(backbone.stages):
                for j, block in enumerate(stage.blocks):
                    bp = f'{prefix}block{i + 1}/unit{j + 1}/'
                    for r in range(3):
                        getattr(block, f'conv{r + 1}').weight.copy_(_n2p(w[f'{bp}conv{r + 1}/kernel']))
                        getattr(block, f'norm{r + 1}').weight.copy_(_n2p(w[f'{bp}gn{r + 1}/scale']))
                        getattr(block, f'norm{r + 1}').bias.copy_(_n2p(w[f'{bp}gn{r + 1}/bias']))
                    if block.downsample is not None:
                        block.downsample.conv.weight.copy_(_n2p(w[f'{bp}conv_proj/kernel']))
                        block.downsample.norm.weight.copy_(_n2p(w[f'{bp}gn_proj/scale']))
                        block.downsample.norm.bias.copy_(_n2p(w[f'{bp}gn_proj/bias']))
        embed_conv_w = _n2p(w[f'{prefix}embedding/kernel'])
    else:
        embed_conv_w = adapt_input_conv(
            model.patch_embed.proj.weight.shape[1], _n2p(w[f'{prefix}embedding/kernel']))
    model.patch_embed.proj.weight.copy_(embed_conv_w)
    model.patch_embed.proj.bias.copy_(_n2p(w[f'{prefix}embedding/bias']))
    model.cls_token.copy_(_n2p(w[f'{prefix}cls'], t=False))
    pos_embed_w = _n2p(w[f'{prefix}Transformer/posembed_input/pos_embedding'], t=False)
    if pos_embed_w.shape != model.pos_embed.shape:
        pos_embed_w = resize_pos_embed(  # resize pos embedding when different size from pretrained weights
            pos_embed_w, model.pos_embed, getattr(model, 'num_tokens', 1), model.patch_embed.grid_size)
    model.pos_embed.copy_(pos_embed_w)
    model.norm.weight.copy_(_n2p(w[f'{prefix}Transformer/encoder_norm/scale']))
    model.norm.bias.copy_(_n2p(w[f'{prefix}Transformer/encoder_norm/bias']))
    if isinstance(model.head, nn.Linear) and model.head.bias.shape[0] == w[f'{prefix}head/bias'].shape[-1]:
        model.head.weight.copy_(_n2p(w[f'{prefix}head/kernel']))
        model.head.bias.copy_(_n2p(w[f'{prefix}head/bias']))
    if isinstance(getattr(model.pre_logits, 'fc', None), nn.Linear) and f'{prefix}pre_logits/bias' in w:
        model.pre_logits.fc.weight.copy_(_n2p(w[f'{prefix}pre_logits/kernel']))
        model.pre_logits.fc.bias.copy_(_n2p(w[f'{prefix}pre_logits/bias']))
    for i, block in enumerate(model.blocks.children()):
        block_prefix = f'{prefix}Transformer/encoderblock_{i}/'
        mha_prefix = block_prefix + 'MultiHeadDotProductAttention_1/'
        block.norm1.weight.copy_(_n2p(w[f'{block_prefix}LayerNorm_0/scale']))
        block.norm1.bias.copy_(_n2p(w[f'{block_prefix}LayerNorm_0/bias']))
        block.attn.qkv.weight.copy_(torch.cat([
            _n2p(w[f'{mha_prefix}{n}/kernel'], t=False).flatten(1).T for n in ('query', 'key', 'value')]))
        block.attn.qkv.bias.copy_(torch.cat([
            _n2p(w[f'{mha_prefix}{n}/bias'], t=False).reshape(-1) for n in ('query', 'key', 'value')]))
        block.attn.proj.weight.copy_(_n2p(w[f'{mha_prefix}out/kernel']).flatten(1))
        block.attn.proj.bias.copy_(_n2p(w[f'{mha_prefix}out/bias']))
        for r in range(2):
            getattr(block.mlp, f'fc{r + 1}').weight.copy_(_n2p(w[f'{block_prefix}MlpBlock_3/Dense_{r}/kernel']))
            getattr(block.mlp, f'fc{r + 1}').bias.copy_(_n2p(w[f'{block_prefix}MlpBlock_3/Dense_{r}/bias']))
        block.norm2.weight.copy_(_n2p(w[f'{block_prefix}LayerNorm_2/scale']))
        block.norm2.bias.copy_(_n2p(w[f'{block_prefix}LayerNorm_2/bias']))


def resize_pos_embed(posemb, posemb_new, num_tokens=1, gs_new=()):
    # Rescale the grid of position embeddings when loading from state_dict. Adapted from
    # https://github.com/google-research/vision_transformer/blob/00883dd691c63a6830751563748663526e811cee/vit_jax/checkpoint.py#L224
    _logger.info('Resized position embedding: %s to %s', posemb.shape, posemb_new.shape)
    ntok_new = posemb_new.shape[1]
    if num_tokens:
        posemb_tok, posemb_grid = posemb[:, :num_tokens], posemb[0, num_tokens:]
        ntok_new -= num_tokens
    else:
        posemb_tok, posemb_grid = posemb[:, :0], posemb[0]
    gs_old = int(math.sqrt(len(posemb_grid)))
    if not len(gs_new):  # backwards compatibility
        gs_new = [int(math.sqrt(ntok_new))] * 2
    assert len(gs_new) >= 2
    _logger.info('Position embedding grid-size from %s to %s', [gs_old, gs_old], gs_new)
    posemb_grid = posemb_grid.reshape(1, gs_old, gs_old, -1).permute(0, 3, 1, 2)
    posemb_grid = F.interpolate(posemb_grid, size=gs_new, mode='bicubic', align_corners=False)
    posemb_grid = posemb_grid.permute(0, 2, 3, 1).reshape(1, gs_new[0] * gs_new[1], -1)
    posemb = torch.cat([posemb_tok, posemb_grid], dim=1)
    return posemb


def checkpoint_filter_fn(state_dict, model):
    """ convert patch embedding weight from manual patchify + linear proj to conv"""
    out_dict = {}
    if 'model' in state_dict:
        # For deit models
        state_dict = state_dict['model']
    for k, v in state_dict.items():
        if 'patch_embed.proj.weight' in k and len(v.shape) < 4:
            # For old models that I trained prior to conv based patchification
            O, I, H, W = model.patch_embed.proj.weight.shape
            v = v.reshape(O, -1, H, W)
        elif k == 'pos_embed' and v.shape != model.pos_embed.shape:
            # To resize pos embedding when using model at different size from pretrained weights
            v = resize_pos_embed(
                v, model.pos_embed, getattr(model, 'num_tokens', 1), model.patch_embed.grid_size)
        out_dict[k] = v
    return out_dict


def _create_vision_transformer(variant, img_size=224, pretrained=False, default_cfg=None, **kwargs):
    default_cfg = default_cfg or default_cfgs[variant]
    if kwargs.get('features_only', None):
        raise RuntimeError('features_only not implemented for Vision Transformer models.')

    # NOTE this extra code to support handling of repr size for in21k pretrained models
    default_num_classes = default_cfg['num_classes']
    num_classes = kwargs.get('num_classes', default_num_classes)
    repr_size = kwargs.pop('representation_size', None)
    if repr_size is not None and num_classes != default_num_classes:
        # Remove representation layer if fine-tuning. This may not always be the desired action,
        # but I feel better than doing nothing by default for fine-tuning. Perhaps a better interface?
        _logger.warning("Removing representation layer for fine-tuning.")
        repr_size = None
    print(default_cfg)
    model = VisionTransformer(
        img_size=img_size,
        patch_size=kwargs['patch_size'],
        embed_dim=kwargs['embed_dim'],
        depth=kwargs['depth'],
        num_heads=kwargs['num_heads'],
        num_classes=num_classes)
    if pretrained:
        url = default_cfg.get('url', None)
        checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location="cpu", check_hash=True)
        model.load_state_dict(checkpoint["model"])
    return model


def vit_tiny_patch16_224(pretrained=False, **kwargs):
    """ ViT-Tiny (Vit-Ti/16)
    """
    model_kwargs = dict(patch_size=16, embed_dim=192, depth=12, num_heads=3, **kwargs)
    model = _create_vision_transformer('vit_tiny_patch16_224', pretrained=pretrained, **model_kwargs)
    return model


def vit_tiny_patch16_384(pretrained=False, **kwargs):
    """ ViT-Tiny (Vit-Ti/16) @ 384x384.
    """
    model_kwargs = dict(patch_size=16, embed_dim=192, depth=12, num_heads=3, **kwargs)
    model = _create_vision_transformer('vit_tiny_patch16_384', pretrained=pretrained, **model_kwargs)
    return model


def vit_small_patch32_224(pretrained=False, **kwargs):
    """ ViT-Small (ViT-S/32)
    """
    model_kwargs = dict(patch_size=32, embed_dim=384, depth=12, num_heads=6, **kwargs)
    model = _create_vision_transformer('vit_small_patch32_224', pretrained=pretrained, **model_kwargs)
    return model


def vit_small_patch32_384(pretrained=False, **kwargs):
    """ ViT-Small (ViT-S/32) at 384x384.
    """
    model_kwargs = dict(patch_size=32, embed_dim=384, depth=12, num_heads=6, **kwargs)
    model = _create_vision_transformer('vit_small_patch32_384', pretrained=pretrained, **model_kwargs)
    return model


def vit_small_patch16_224(pretrained=False, **kwargs):
    """ ViT-Small (ViT-S/16)
    NOTE I've replaced my previous 'small' model definition and weights with the small variant from the DeiT paper
    """
    model_kwargs = dict(patch_size=16, embed_dim=384, depth=12, num_heads=6, **kwargs)
    model = _create_vision_transformer('vit_small_patch16_224', pretrained=pretrained, **model_kwargs)
    return model


def vit_small_patch16_384(pretrained=False, **kwargs):
    """ ViT-Small (ViT-S/16)
    NOTE I've replaced my previous 'small' model definition and weights with the small variant from the DeiT paper
    """
    model_kwargs = dict(patch_size=16, embed_dim=384, depth=12, num_heads=6, **kwargs)
    model = _create_vision_transformer('vit_small_patch16_384', pretrained=pretrained, **model_kwargs)
    return model


def vit_base_patch32_224(pretrained=False, **kwargs):
    """ ViT-Base (ViT-B/32) from original paper (https://arxiv.org/abs/2010.11929).
    ImageNet-1k weights fine-tuned from in21k, source https://github.com/google-research/vision_transformer.
    """
    model_kwargs = dict(patch_size=32, embed_dim=768, depth=12, num_heads=12, **kwargs)
    model = _create_vision_transformer('vit_base_patch32_224', pretrained=pretrained, **model_kwargs)
    return model


def vit_base2_patch32_256(pretrained=False, **kwargs):
    """ ViT-Base (ViT-B/32)
    # FIXME experiment
    """
    model_kwargs = dict(patch_size=32, embed_dim=896, depth=12, num_heads=14, **kwargs)
    model = _create_vision_transformer('vit_base2_patch32_256', pretrained=pretrained, **model_kwargs)
    return model


def vit_base_patch32_384(pretrained=False, **kwargs):
    """ ViT-Base model (ViT-B/32) from original paper (https://arxiv.org/abs/2010.11929).
    ImageNet-1k weights fine-tuned from in21k @ 384x384, source https://github.com/google-research/vision_transformer.
    """
    model_kwargs = dict(patch_size=32, embed_dim=768, depth=12, num_heads=12, **kwargs)
    model = _create_vision_transformer('vit_base_patch32_384', pretrained=pretrained, **model_kwargs)
    return model


def vit_base_patch16_224(pretrained=False, **kwargs):
    """ ViT-Base (ViT-B/16) from original paper (https://arxiv.org/abs/2010.11929).
    ImageNet-1k weights fine-tuned from in21k @ 224x224, source https://github.com/google-research/vision_transformer.
    """
    model_kwargs = dict(patch_size=16, embed_dim=768, depth=12, num_heads=12, **kwargs)
    model = _create_vision_transformer('vit_base_patch16_224', pretrained=pretrained, **model_kwargs)
    return model


def vit_base_patch16_384(pretrained=False, **kwargs):
    """ ViT-Base model (ViT-B/16) from original paper (https://arxiv.org/abs/2010.11929).
    ImageNet-1k weights fine-tuned from in21k @ 384x384, source https://github.com/google-research/vision_transformer.
    """
    model_kwargs = dict(patch_size=16, embed_dim=768, depth=12, num_heads=12, **kwargs)
    model = _create_vision_transformer('vit_base_patch16_384', pretrained=pretrained, **model_kwargs)
    return model


def vit_base_patch8_224(pretrained=False, **kwargs):
    """ ViT-Base (ViT-B/8) from original paper (https://arxiv.org/abs/2010.11929).
    ImageNet-1k weights fine-tuned from in21k @ 224x224, source https://github.com/google-research/vision_transformer.
    """
    model_kwargs = dict(patch_size=8, embed_dim=768, depth=12, num_heads=12, **kwargs)
    model = _create_vision_transformer('vit_base_patch8_224', pretrained=pretrained, **model_kwargs)
    return model


def vit_large_patch32_224(pretrained=False, **kwargs):
    """ ViT-Large model (ViT-L/32) from original paper (https://arxiv.org/abs/2010.11929).
No pretrained weights."""model_kwargs = dict(patch_size=32, embed_dim=1024, depth=24, num_heads=16, **kwargs)model = _create_vision_transformer('vit_large_patch32_224', pretrained=pretrained, **model_kwargs)return modeldef vit_large_patch32_384(pretrained=False, **kwargs):""" ViT-Large model (ViT-L/32) from original paper (https://arxiv.org/abs/2010.11929).ImageNet-1k weights fine-tuned from in21k @ 384x384, source https://github.com/google-research/vision_transformer."""model_kwargs = dict(patch_size=32, embed_dim=1024, depth=24, num_heads=16, **kwargs)model = _create_vision_transformer('vit_large_patch32_384', pretrained=pretrained, **model_kwargs)return modeldef vit_large_patch16_224(pretrained=False, **kwargs):""" ViT-Large model (ViT-L/32) from original paper (https://arxiv.org/abs/2010.11929).ImageNet-1k weights fine-tuned from in21k @ 224x224, source https://github.com/google-research/vision_transformer."""model_kwargs = dict(patch_size=16, embed_dim=1024, depth=24, num_heads=16, **kwargs)model = _create_vision_transformer('vit_large_patch16_224', pretrained=pretrained, **model_kwargs)return modeldef vit_large_patch16_384(pretrained=False, **kwargs):""" ViT-Large model (ViT-L/16) from original paper (https://arxiv.org/abs/2010.11929).ImageNet-1k weights fine-tuned from in21k @ 384x384, source https://github.com/google-research/vision_transformer."""model_kwargs = dict(patch_size=16, embed_dim=1024, depth=24, num_heads=16, **kwargs)model = _create_vision_transformer('vit_large_patch16_384', pretrained=pretrained, **model_kwargs)return modeldef vit_huge_patch14_224(pretrained=False, **kwargs):""" ViT-Huge model (ViT-H/14) from original paper (https://arxiv.org/abs/2010.11929)."""model_kwargs = dict(patch_size=14, embed_dim=1280, depth=32, num_heads=16, **kwargs)model = _create_vision_transformer('vit_huge_patch14_224', pretrained=pretrained, **model_kwargs)return modeldef vit_giant_patch14_224(pretrained=False, **kwargs):""" ViT-Giant model (ViT-g/14) from 
`Scaling Vision Transformers` - https://arxiv.org/abs/2106.04560"""model_kwargs = dict(patch_size=14, embed_dim=1408, mlp_ratio=48/11, depth=40, num_heads=16, **kwargs)model = _create_vision_transformer('vit_giant_patch14_224', pretrained=pretrained, **model_kwargs)return modeldef vit_gigantic_patch14_224(pretrained=False, **kwargs):""" ViT-Gigantic model (ViT-G/14) from `Scaling Vision Transformers` - https://arxiv.org/abs/2106.04560"""model_kwargs = dict(patch_size=14, embed_dim=1664, mlp_ratio=64/13, depth=48, num_heads=16, **kwargs)model = _create_vision_transformer('vit_gigantic_patch14_224', pretrained=pretrained, **model_kwargs)return modeldef vit_tiny_patch16_224_in21k(pretrained=False, **kwargs):""" ViT-Tiny (Vit-Ti/16).ImageNet-21k weights @ 224x224, source https://github.com/google-research/vision_transformer.NOTE: this model has valid 21k classifier head and no representation (pre-logits) layer"""model_kwargs = dict(patch_size=16, embed_dim=192, depth=12, num_heads=3, **kwargs)model = _create_vision_transformer('vit_tiny_patch16_224_in21k', pretrained=pretrained, **model_kwargs)return modeldef vit_small_patch32_224_in21k(pretrained=False, **kwargs):""" ViT-Small (ViT-S/16)ImageNet-21k weights @ 224x224, source https://github.com/google-research/vision_transformer.NOTE: this model has valid 21k classifier head and no representation (pre-logits) layer"""model_kwargs = dict(patch_size=32, embed_dim=384, depth=12, num_heads=6, **kwargs)model = _create_vision_transformer('vit_small_patch32_224_in21k', pretrained=pretrained, **model_kwargs)return modeldef vit_small_patch16_224_in21k(pretrained=False, **kwargs):""" ViT-Small (ViT-S/16)ImageNet-21k weights @ 224x224, source https://github.com/google-research/vision_transformer.NOTE: this model has valid 21k classifier head and no representation (pre-logits) layer"""model_kwargs = dict(patch_size=16, embed_dim=384, depth=12, num_heads=6, **kwargs)model = _create_vision_transformer('vit_small_patch16_224_in21k', 
pretrained=pretrained, **model_kwargs)return modeldef vit_base_patch32_224_in21k(pretrained=False, **kwargs):""" ViT-Base model (ViT-B/32) from original paper (https://arxiv.org/abs/2010.11929).ImageNet-21k weights @ 224x224, source https://github.com/google-research/vision_transformer.NOTE: this model has valid 21k classifier head and no representation (pre-logits) layer"""model_kwargs = dict(patch_size=32, embed_dim=768, depth=12, num_heads=12, **kwargs)model = _create_vision_transformer('vit_base_patch32_224_in21k', pretrained=pretrained, **model_kwargs)return modeldef vit_base_patch16_224_in21k(pretrained=False, **kwargs):""" ViT-Base model (ViT-B/16) from original paper (https://arxiv.org/abs/2010.11929).ImageNet-21k weights @ 224x224, source https://github.com/google-research/vision_transformer.NOTE: this model has valid 21k classifier head and no representation (pre-logits) layer"""model_kwargs = dict(patch_size=16, embed_dim=768, depth=12, num_heads=12, **kwargs)model = _create_vision_transformer('vit_base_patch16_224_in21k', pretrained=pretrained, **model_kwargs)return modeldef vit_base_patch8_224_in21k(pretrained=False, **kwargs):""" ViT-Base model (ViT-B/8) from original paper (https://arxiv.org/abs/2010.11929).ImageNet-21k weights @ 224x224, source https://github.com/google-research/vision_transformer.NOTE: this model has valid 21k classifier head and no representation (pre-logits) layer"""model_kwargs = dict(patch_size=8, embed_dim=768, depth=12, num_heads=12, **kwargs)model = _create_vision_transformer('vit_base_patch8_224_in21k', pretrained=pretrained, **model_kwargs)return modeldef vit_large_patch32_224_in21k(pretrained=False, **kwargs):""" ViT-Large model (ViT-L/32) from original paper (https://arxiv.org/abs/2010.11929).ImageNet-21k weights @ 224x224, source https://github.com/google-research/vision_transformer.NOTE: this model has a representation layer but the 21k classifier head is zero'd out in original weights"""model_kwargs = 
dict(patch_size=32, embed_dim=1024, depth=24, num_heads=16, representation_size=1024, **kwargs)model = _create_vision_transformer('vit_large_patch32_224_in21k', pretrained=pretrained, **model_kwargs)return modeldef vit_large_patch16_224_in21k(pretrained=False, **kwargs):""" ViT-Large model (ViT-L/16) from original paper (https://arxiv.org/abs/2010.11929).ImageNet-21k weights @ 224x224, source https://github.com/google-research/vision_transformer.NOTE: this model has valid 21k classifier head and no representation (pre-logits) layer"""model_kwargs = dict(patch_size=16, embed_dim=1024, depth=24, num_heads=16, **kwargs)model = _create_vision_transformer('vit_large_patch16_224_in21k', pretrained=pretrained, **model_kwargs)return modeldef vit_huge_patch14_224_in21k(pretrained=False, **kwargs):""" ViT-Huge model (ViT-H/14) from original paper (https://arxiv.org/abs/2010.11929).ImageNet-21k weights @ 224x224, source https://github.com/google-research/vision_transformer.NOTE: this model has a representation layer but the 21k classifier head is zero'd out in original weights"""model_kwargs = dict(patch_size=14, embed_dim=1280, depth=32, num_heads=16, representation_size=1280, **kwargs)model = _create_vision_transformer('vit_huge_patch14_224_in21k', pretrained=pretrained, **model_kwargs)return modeldef vit_base_patch16_224_sam(pretrained=False, **kwargs):""" ViT-Base (ViT-B/16) w/ SAM pretrained weights. Paper: https://arxiv.org/abs/2106.01548"""# NOTE original SAM weights release worked with representation_size=768model_kwargs = dict(patch_size=16, embed_dim=768, depth=12, num_heads=12, **kwargs)model = _create_vision_transformer('vit_base_patch16_224_sam', pretrained=pretrained, **model_kwargs)return modeldef vit_base_patch32_224_sam(pretrained=False, **kwargs):""" ViT-Base (ViT-B/32) w/ SAM pretrained weights. 
Paper: https://arxiv.org/abs/2106.01548"""# NOTE original SAM weights release worked with representation_size=768model_kwargs = dict(patch_size=32, embed_dim=768, depth=12, num_heads=12, **kwargs)model = _create_vision_transformer('vit_base_patch32_224_sam', pretrained=pretrained, **model_kwargs)return modeldef vit_small_patch16_224_dino(pretrained=False, **kwargs):""" ViT-Small (ViT-S/16) w/ DINO pretrained weights (no head) - https://arxiv.org/abs/2104.14294"""model_kwargs = dict(patch_size=16, embed_dim=384, depth=12, num_heads=6, **kwargs)model = _create_vision_transformer('vit_small_patch16_224_dino', pretrained=pretrained, **model_kwargs)return modeldef vit_small_patch8_224_dino(pretrained=False, **kwargs):""" ViT-Small (ViT-S/8) w/ DINO pretrained weights (no head) - https://arxiv.org/abs/2104.14294"""model_kwargs = dict(patch_size=8, embed_dim=384, depth=12, num_heads=6, **kwargs)model = _create_vision_transformer('vit_small_patch8_224_dino', pretrained=pretrained, **model_kwargs)return modeldef vit_base_patch16_224_dino(pretrained=False, **kwargs):""" ViT-Base (ViT-B/16) /w DINO pretrained weights (no head) - https://arxiv.org/abs/2104.14294"""model_kwargs = dict(patch_size=16, embed_dim=768, depth=12, num_heads=12, **kwargs)model = _create_vision_transformer('vit_base_patch16_224_dino', pretrained=pretrained, **model_kwargs)return modeldef vit_base_patch8_224_dino(pretrained=False, **kwargs):""" ViT-Base (ViT-B/8) w/ DINO pretrained weights (no head) - https://arxiv.org/abs/2104.14294"""model_kwargs = dict(patch_size=8, embed_dim=768, depth=12, num_heads=12, **kwargs)model = _create_vision_transformer('vit_base_patch8_224_dino', pretrained=pretrained, **model_kwargs)return modeldef deit_tiny_patch16_224(pretrained=False, **kwargs):""" DeiT-tiny model @ 224x224 from paper (https://arxiv.org/abs/2012.12877).ImageNet-1k weights from https://github.com/facebookresearch/deit."""model_kwargs = dict(patch_size=16, embed_dim=192, depth=12, num_heads=3, 
**kwargs)model = _create_vision_transformer('deit_tiny_patch16_224', pretrained=pretrained, **model_kwargs)return modeldef deit_small_patch16_224(pretrained=False, **kwargs):""" DeiT-small model @ 224x224 from paper (https://arxiv.org/abs/2012.12877).ImageNet-1k weights from https://github.com/facebookresearch/deit."""model_kwargs = dict(patch_size=16, embed_dim=384, depth=12, num_heads=6, **kwargs)model = _create_vision_transformer('deit_small_patch16_224', pretrained=pretrained, **model_kwargs)return modeldef deit_base_patch16_224(pretrained=False, **kwargs):""" DeiT base model @ 224x224 from paper (https://arxiv.org/abs/2012.12877).ImageNet-1k weights from https://github.com/facebookresearch/deit."""model_kwargs = dict(patch_size=16, embed_dim=768, depth=12, num_heads=12, **kwargs)model = _create_vision_transformer('deit_base_patch16_224', pretrained=pretrained, **model_kwargs)return modeldef deit_base_patch16_384(pretrained=False, **kwargs):""" DeiT base model @ 384x384 from paper (https://arxiv.org/abs/2012.12877).ImageNet-1k weights from https://github.com/facebookresearch/deit."""model_kwargs = dict(patch_size=16, embed_dim=768, depth=12, num_heads=12, **kwargs)model = _create_vision_transformer('deit_base_patch16_384', pretrained=pretrained, **model_kwargs)return modeldef deit_tiny_distilled_patch16_224(pretrained=False, **kwargs):""" DeiT-tiny distilled model @ 224x224 from paper (https://arxiv.org/abs/2012.12877).ImageNet-1k weights from https://github.com/facebookresearch/deit."""model_kwargs = dict(patch_size=16, embed_dim=192, depth=12, num_heads=3, **kwargs)model = _create_vision_transformer('deit_tiny_distilled_patch16_224', pretrained=pretrained,  distilled=True, **model_kwargs)return modeldef deit_small_distilled_patch16_224(pretrained=False, **kwargs):""" DeiT-small distilled model @ 224x224 from paper (https://arxiv.org/abs/2012.12877).ImageNet-1k weights from https://github.com/facebookresearch/deit."""model_kwargs = dict(patch_size=16, 
embed_dim=384, depth=12, num_heads=6, **kwargs)model = _create_vision_transformer('deit_small_distilled_patch16_224', pretrained=pretrained,  distilled=True, **model_kwargs)return modeldef deit_base_distilled_patch16_224(pretrained=False, **kwargs):""" DeiT-base distilled model @ 224x224 from paper (https://arxiv.org/abs/2012.12877).ImageNet-1k weights from https://github.com/facebookresearch/deit."""model_kwargs = dict(patch_size=16, embed_dim=768, depth=12, num_heads=12, **kwargs)model = _create_vision_transformer('deit_base_distilled_patch16_224', pretrained=pretrained,  distilled=True, **model_kwargs)return modeldef deit_base_distilled_patch16_384(pretrained=False, **kwargs):""" DeiT-base distilled model @ 384x384 from paper (https://arxiv.org/abs/2012.12877).ImageNet-1k weights from https://github.com/facebookresearch/deit."""model_kwargs = dict(patch_size=16, embed_dim=768, depth=12, num_heads=12, **kwargs)model = _create_vision_transformer('deit_base_distilled_patch16_384', pretrained=pretrained, distilled=True, **model_kwargs)return modeldef vit_base_patch16_224_miil_in21k(pretrained=False, **kwargs):""" ViT-Base (ViT-B/16) from original paper (https://arxiv.org/abs/2010.11929).Weights taken from: https://github.com/Alibaba-MIIL/ImageNet21K"""model_kwargs = dict(patch_size=16, embed_dim=768, depth=12, num_heads=12, qkv_bias=False, **kwargs)model = _create_vision_transformer('vit_base_patch16_224_miil_in21k', pretrained=pretrained, **model_kwargs)return modeldef vit_base_patch16_224_miil(pretrained=False, **kwargs):""" ViT-Base (ViT-B/16) from original paper (https://arxiv.org/abs/2010.11929).Weights taken from: https://github.com/Alibaba-MIIL/ImageNet21K"""model_kwargs = dict(patch_size=16, embed_dim=768, depth=12, num_heads=12, qkv_bias=False, **kwargs)model = _create_vision_transformer('vit_base_patch16_224_miil', pretrained=pretrained, **model_kwargs)return model
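
The position-embedding resize performed by resize_pos_embed is easier to follow with concrete numbers. The sketch below is not part of the original file: it uses only the standard library, the helper names grid_size, pos_embed_shape and old_grid_side are hypothetical, and it assumes square images and square patches. It only computes shapes; it does not perform the bicubic interpolation itself.

```python
import math
from typing import Tuple


def grid_size(img_size: int, patch_size: int) -> Tuple[int, int]:
    """Patches per side for a square image split into square patches."""
    if img_size % patch_size != 0:
        raise ValueError("img_size must be divisible by patch_size")
    side = img_size // patch_size
    return side, side


def pos_embed_shape(img_size: int, patch_size: int, embed_dim: int,
                    num_tokens: int = 1) -> Tuple[int, int, int]:
    """Shape of the position embedding: extra tokens plus one slot per patch."""
    gh, gw = grid_size(img_size, patch_size)
    return 1, num_tokens + gh * gw, embed_dim


def old_grid_side(seq_len: int, num_tokens: int = 1) -> int:
    """Recover the square grid side from a checkpoint's sequence length,
    mirroring int(math.sqrt(len(posemb_grid))) in resize_pos_embed."""
    grid_tokens = seq_len - num_tokens
    side = math.isqrt(grid_tokens)
    if side * side != grid_tokens:
        raise ValueError("grid tokens do not form a square")
    return side


pretrained = pos_embed_shape(224, 16, 768)  # ViT-B/16 weights trained at 224x224
target = pos_embed_shape(384, 16, 768)      # same architecture built for 384x384
print(pretrained, target)
print(old_grid_side(pretrained[1]), grid_size(384, 16))
```

For ViT-B/16 this gives 197 tokens at 224x224 and 577 tokens at 384x384, so the 14x14 grid of patch embeddings has to be interpolated to 24x24 while the class token is carried over unchanged. With num_tokens=2 (the distilled DeiT variants) the 224x224 sequence length becomes 198.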

