
Part 47: The Vision Transformer (ViT) Model Explained

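ViT (Vision Transformer) is a vision model built on the Transformer architecture. It successfully brought the Transformer from natural language processing (NLP) into computer vision (CV) and is designed specifically for image data. A detailed analysis of the ViT model follows: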

I. Basic Structure of the ViT Model

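The ViT model consists of three main parts: an image feature embedding module, a Transformer encoder module, and an MLP (multilayer perceptron) classification module.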

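  1. Image feature embedding module: splits the input image into small patches and embeds each patch as a high-dimensional feature vector through a convolutional layer or a linear projection. A learnable class token is added as a representative of global information. This step is analogous to word embedding in NLP; because self-attention on its own ignores token order, a position embedding is usually added as well so that each patch's spatial location is preserved.
  2. Transformer encoder module: contains multiple encoder layers, each composed of multi-head attention (Multi-Head Attention), a feed-forward network (Feed Forward Network), and residual connections with layer normalization (Add & Norm). In the Block class recorded later in this post, layer normalization is applied before the attention and feed-forward sub-layers (pre-norm) rather than after them. This module captures the global dependencies among image patches, which is the core mechanism of the Transformer.
  3. MLP classification module: sits at the end of the model, takes the output of the class token, and classifies it with a multilayer perceptron (MLP) to produce the final classification result.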
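
The embedding step in item 1 can be sketched as follows. This is a minimal sketch rather than the post's implementation: it assumes a 224x224 RGB input, 16x16 patches, and an embedding size of 768, and the class name `TinyPatchEmbedding` is hypothetical.

```python
import torch
import torch.nn as nn


class TinyPatchEmbedding(nn.Module):
    """Hypothetical sketch: patchify with a strided conv, prepend a class token, add position embeddings."""

    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # kernel_size == stride, so every output location covers exactly one non-overlapping patch
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches + 1, embed_dim))

    def forward(self, x):
        x = self.proj(x)                   # [B, C, H, W] -> [B, D, H/P, W/P]
        x = x.flatten(2).transpose(1, 2)   # -> [B, num_patches, D]
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        x = torch.cat((cls, x), dim=1)     # -> [B, num_patches + 1, D]
        return x + self.pos_embed          # position embedding broadcasts over the batch


images = torch.randn(2, 3, 224, 224)
tokens = TinyPatchEmbedding()(images)
print(tokens.shape)  # torch.Size([2, 197, 768])
```

With these sizes the image yields 14 x 14 = 196 patches, and the class token brings the sequence length to 197.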

II. How the ViT Model Works

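  1. Patch embedding: the input image is split into patches, which the embedding layer converts into high-dimensional feature vectors. A class token is added at the same time.
  2. Encoder processing: the embedded patches and the class token are fed into the Transformer encoder, where stacked layers of multi-head attention and feed-forward networks capture the global dependencies among patches.
  3. Classification output: the class token output by the encoder is passed to the MLP classification module, which produces the final classification result.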
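
The three steps above can be strung together in a few lines. The sketch below is illustrative only: `TinyViT` is a hypothetical name, the sizes are deliberately small, and PyTorch's built-in `nn.TransformerEncoder` is used as a stand-in for the Block class recorded later in this post.

```python
import torch
import torch.nn as nn


class TinyViT(nn.Module):
    """Hypothetical end-to-end sketch: patch embedding -> encoder -> classification head."""

    def __init__(self, img_size=32, patch_size=8, in_chans=3, embed_dim=64,
                 depth=2, num_heads=4, num_classes=10):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2
        # Step 1: patch embedding plus class token and position embedding
        self.patch_embed = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
        # Step 2: a stack of pre-norm encoder layers (stand-in for the post's Block)
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, dim_feedforward=4 * embed_dim,
            dropout=0.0, activation="gelu", batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.norm = nn.LayerNorm(embed_dim)
        # Step 3: classification head applied to the class token
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        x = self.patch_embed(x).flatten(2).transpose(1, 2)   # [B, num_patches, D]
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        x = torch.cat((cls, x), dim=1) + self.pos_embed      # [B, num_patches + 1, D]
        x = self.norm(self.encoder(x))
        return self.head(x[:, 0])                            # logits from the class token only


model = TinyViT()
with torch.no_grad():
    logits = model(torch.randn(2, 3, 32, 32))
print(logits.shape)  # torch.Size([2, 10])
```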

Attention Code Walkthrough

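The Multi-Head Attention module is a core component of the Transformer architecture used in deep learning, particularly in natural language processing and vision tasks. The class is explained line by line below: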

class Attention(nn.Module):

Defines a class named Attention that inherits from PyTorch's nn.Module. nn.Module is the base class of all neural network modules.

    def __init__(self,
                 dim,   # dimension of the input tokens
                 num_heads=8,
                 qkv_bias=False,
                 qk_scale=None,
                 attn_drop_ratio=0.,
                 proj_drop_ratio=0.):

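This is the class's initialization method. It accepts the following parameters:

  • dim: dimension of the input tokens.
  • num_heads: number of attention heads, 8 by default.
  • qkv_bias: whether to add a bias term to the linear projection that produces query (q), key (k), and value (v), False by default.
  • qk_scale: scaling factor applied to the query-key dot product, None by default, in which case head_dim ** -0.5 is used.
  • attn_drop_ratio: dropout ratio applied to the attention weights, 0 by default.
  • proj_drop_ratio: dropout ratio applied after the final projection, 0 by default.

        super(Attention, self).__init__()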

Calls the initialization method of the parent class nn.Module.

        self.num_heads = num_heads

Stores the number of heads in an instance variable.

        head_dim = dim // num_heads

Computes the dimension of each head. Because integer division is used, dim should be divisible by num_heads.

        self.scale = qk_scale or head_dim ** -0.5

Sets the scaling factor. If qk_scale is None (or any other falsy value, such as 0), head_dim ** -0.5 is used.
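
A small worked example makes the default concrete. The helper `resolve_scale` is hypothetical and only mirrors the expression `qk_scale or head_dim ** -0.5`; the sizes (dim 768, 12 heads) are assumed for illustration.

```python
def resolve_scale(dim, num_heads, qk_scale=None):
    """Hypothetical helper mirroring `qk_scale or head_dim ** -0.5`."""
    head_dim = dim // num_heads
    return qk_scale or head_dim ** -0.5


default_scale = resolve_scale(768, 12)          # head_dim = 64, so 64 ** -0.5
explicit_scale = resolve_scale(768, 12, 0.5)    # an explicit non-zero value is used as-is
zero_scale = resolve_scale(768, 12, 0.0)        # 0.0 is falsy, so the default is used instead

print(default_scale, explicit_scale, zero_scale)  # 0.125 0.5 0.125
```

The last call shows a side effect of using `or`: a scale of exactly 0 cannot be requested this way, which is harmless in practice but worth knowing.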

        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)

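Defines a linear layer that maps the input from dim to dim * 3 features, producing the query, key, and value in a single projection. A bias term is added if qkv_bias is True.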

        self.attn_drop = nn.Dropout(attn_drop_ratio)

Defines a dropout layer applied to the attention weights to reduce overfitting.

        self.proj = nn.Linear(dim, dim)

Defines a linear layer that applies the final projection to the output of the attention mechanism.

        self.proj_drop = nn.Dropout(proj_drop_ratio)

Defines another dropout layer, applied to the output of the final projection.

    def forward(self, x):

Defines the forward pass.

        B, N, C = x.shape

Reads the shape of the input x, where B is the batch size, N is the sequence length (the number of tokens), and C is the feature dimension.

        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)

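Passes the input x through the self.qkv linear layer, then reshapes and permutes the result so that the query, key, and value are separated along the first axis and split across heads. The shape goes from [B, N, 3 * C] to [B, N, 3, num_heads, C // num_heads] and finally to [3, B, num_heads, N, C // num_heads].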
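
The shape changes are easier to follow with concrete numbers. The sketch below assumes B=2, N=197, C=768, and 12 heads; these values are illustrative and not fixed by the class.

```python
import torch
import torch.nn as nn

B, N, C, num_heads = 2, 197, 768, 12
x = torch.randn(B, N, C)
qkv_proj = nn.Linear(C, C * 3)   # same role as self.qkv in the class

qkv = qkv_proj(x)                                       # [2, 197, 2304]
qkv = qkv.reshape(B, N, 3, num_heads, C // num_heads)   # [2, 197, 3, 12, 64]
qkv = qkv.permute(2, 0, 3, 1, 4)                        # [3, 2, 12, 197, 64]
q, k, v = qkv[0], qkv[1], qkv[2]

print(qkv.shape)  # torch.Size([3, 2, 12, 197, 64])
print(q.shape)    # torch.Size([2, 12, 197, 64])
```

Moving the head axis in front of the token axis lets the later matrix multiplications run for all heads at once as batched operations.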

        q, k, v = qkv[0], qkv[1], qkv[2]  # make torchscript happy (cannot use tensor as tuple)

Extracts the query, key, and value from the reshaped qkv tensor. Each has shape [B, num_heads, N, C // num_heads].

        attn = (q @ k.transpose(-2, -1)) * self.scale

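Computes the dot product between the query and the key and applies the scaling factor. The result has shape [B, num_heads, N, N].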

        attn = attn.softmax(dim=-1)

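Applies softmax over the last dimension of the scaled dot products to obtain the attention weights, so that each query's weights over all keys sum to 1.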
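
The effect of the scaling factor on softmax can be seen with plain numbers. This is a hand-rolled sketch: the three raw scores are hypothetical query-key dot products for one query, and head_dim = 64 is assumed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


head_dim = 64
scale = head_dim ** -0.5            # 0.125
raw_scores = [8.0, 0.0, -8.0]       # hypothetical q.k values against three keys

unscaled = softmax(raw_scores)
scaled = softmax([s * scale for s in raw_scores])

print([round(p, 3) for p in unscaled])  # [1.0, 0.0, 0.0]
print([round(p, 3) for p in scaled])    # [0.665, 0.245, 0.09]
```

Without scaling, almost all weight lands on one key and the softmax saturates; with scaling, the weights stay spread out, which keeps gradients usable during training.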

        attn = self.attn_drop(attn)

Applies dropout to the attention weights.

        x = (attn @ v).transpose(1, 2).reshape(B, N, C)

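Weights the values by the attention weights, then transposes and reshapes the result to merge the heads back together, so the output matches the input shape [B, N, C].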
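
To see that the transpose followed by the reshape really concatenates the heads per token, the sketch below compares it with an explicit concatenation. The sizes (B=2, 3 heads, N=5, 4 features per head) are assumed for illustration.

```python
import torch

B, H, N, d = 2, 3, 5, 4
attn = torch.softmax(torch.randn(B, H, N, N), dim=-1)   # attention weights per head
v = torch.randn(B, H, N, d)                             # values per head

out = attn @ v                                          # [B, H, N, d]
merged = out.transpose(1, 2).reshape(B, N, H * d)       # [B, N, H, d] -> [B, N, C]

# Explicit version: place each head's output side by side along the feature axis
by_hand = torch.cat([out[:, h] for h in range(H)], dim=-1)

print(merged.shape)                   # torch.Size([2, 5, 12])
print(torch.equal(merged, by_hand))   # True
```

Reshaping without the transpose would also give a [B, N, C] tensor, but it would mix features from different tokens, so the transpose is not optional.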

        x = self.proj(x)

Applies the final linear projection to the merged head outputs.

        x = self.proj_drop(x)

Applies dropout to the projected output.

        return x

Returns the final output.

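This class implements the multi-head attention mechanism of the Transformer architecture. By running several attention heads in parallel, each over its own slice of the feature dimension, it lets the model capture different representations of the input and so increases its representational capacity.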
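
As a quick check of the walkthrough, the class can be run on a random tensor. The class is restated compactly here so the snippet runs on its own; the sizes (dim 64, 8 heads, 17 tokens) are assumed for illustration.

```python
import torch
import torch.nn as nn


class Attention(nn.Module):
    """Compact restatement of the class walked through above."""

    def __init__(self, dim, num_heads=8, qkv_bias=False, qk_scale=None,
                 attn_drop_ratio=0., proj_drop_ratio=0.):
        super().__init__()
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.scale = qk_scale or head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)
        self.attn_drop = nn.Dropout(attn_drop_ratio)
        self.proj = nn.Linear(dim, dim)
        self.proj_drop = nn.Dropout(proj_drop_ratio)

    def forward(self, x):
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
        q, k, v = qkv[0], qkv[1], qkv[2]
        attn = ((q @ k.transpose(-2, -1)) * self.scale).softmax(dim=-1)
        x = (self.attn_drop(attn) @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj_drop(self.proj(x))


dim, num_heads = 64, 8
layer = Attention(dim, num_heads=num_heads, qkv_bias=True).eval()
x = torch.randn(2, 17, dim)
with torch.no_grad():
    y = layer(x)

n_params = sum(p.numel() for p in layer.parameters())
print(y.shape)    # torch.Size([2, 17, 64])
print(n_params)   # 16640
```

The output has the same shape as the input, which is what allows the layer to sit inside a residual connection. The parameter count is 3 * dim * dim + 3 * dim for the qkv projection plus dim * dim + dim for the output projection.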

GitHub repository: https://github.com/rwightman/pytorch-image-models/
The model code is recorded below for easy reference.

""" Vision Transformer (ViT) in PyTorch

A PyTorch implement of Vision Transformers as described in:

'An Image Is Worth 16 x 16 Words: Transformers for Image Recognition at Scale'
    - https://arxiv.org/abs/2010.11929

`How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers`
    - https://arxiv.org/abs/2106.10270

The official jax code is released and available at https://github.com/google-research/vision_transformer

DeiT model defs and weights from https://github.com/facebookresearch/deit,
paper `DeiT: Data-efficient Image Transformers` - https://arxiv.org/abs/2012.12877

Acknowledgments:
* The paper authors for releasing code and weights, thanks!
* I fixed my class token impl based on Phil Wang's https://github.com/lucidrains/vit-pytorch ... check it out
for some einops/einsum fun
* Simple transformer style inspired by Andrej Karpathy's https://github.com/karpathy/minGPT
* Bert reference code checks against Huggingface Transformers and Tensorflow Bert

Hacked together by / Copyright 2020, Ross Wightman
"""
import math
import logging
from functools import partial
from collections import OrderedDict
from copy import deepcopy

import torch
import torch.nn as nn
import torch.nn.functional as F
from timm.data import IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD, IMAGENET_INCEPTION_MEAN, IMAGENET_INCEPTION_STD
from models.weight_init import trunc_normal_, lecun_normal_

_logger = logging.getLogger(__name__)


def adapt_input_conv(in_chans, conv_weight):
    conv_type = conv_weight.dtype
    conv_weight = conv_weight.float()  # Some weights are in torch.half, ensure it's float for sum on CPU
    O, I, J, K = conv_weight.shape
    if in_chans == 1:
        if I > 3:
            assert conv_weight.shape[1] % 3 == 0
            # For models with space2depth stems
            conv_weight = conv_weight.reshape(O, I // 3, 3, J, K)
            conv_weight = conv_weight.sum(dim=2, keepdim=False)
        else:
            conv_weight = conv_weight.sum(dim=1, keepdim=True)
    elif in_chans != 3:
        if I != 3:
            raise NotImplementedError('Weight format not supported by conversion.')
        else:
            # NOTE this strategy should be better than random init, but there could be other combinations of
            # the original RGB input layer weights that'd work better for specific cases.
            repeat = int(math.ceil(in_chans / 3))
            conv_weight = conv_weight.repeat(1, repeat, 1, 1)[:, :in_chans, :, :]
            conv_weight *= (3 / float(in_chans))
    conv_weight = conv_weight.to(conv_type)
    return conv_weight


def drop_path(x, drop_prob: float = 0., training: bool = False):
    """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).

    This is the same as the DropConnect impl I created for EfficientNet, etc networks, however,
    the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper...
    See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... I've opted for
    changing the layer and argument names to 'drop path' rather than mix DropConnect as a layer name and use
    'survival rate' as the argument.
    """
    if drop_prob == 0. or not training:
        return x
    keep_prob = 1 - drop_prob
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)  # work with diff dim tensors, not just 2D ConvNets
    random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)
    random_tensor.floor_()  # binarize
    output = x.div(keep_prob) * random_tensor
    return output


class DropPath(nn.Module):
    """Drop paths (Stochastic Depth) per sample  (when applied in main path of residual blocks).
    """
    def __init__(self, drop_prob=None):
        super(DropPath, self).__init__()
        self.drop_prob = drop_prob

    def forward(self, x):
        return drop_path(x, self.drop_prob, self.training)


class PatchEmbed(nn.Module):
    """
    2D Image to Patch Embedding
    """
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768, norm_layer=None):
        super().__init__()
        img_size = (img_size, img_size)
        patch_size = (patch_size, patch_size)
        self.img_size = img_size
        self.patch_size = patch_size
        self.grid_size = (img_size[0] // patch_size[0], img_size[1] // patch_size[1])
        self.num_patches = self.grid_size[0] * self.grid_size[1]

        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)
        self.norm = norm_layer(embed_dim) if norm_layer else nn.Identity()

    def forward(self, x):
        B, C, H, W = x.shape
        assert H == self.img_size[0] and W == self.img_size[1], \
            f"Input image size ({H}*{W}) doesn't match model ({self.img_size[0]}*{self.img_size[1]})."

        # flatten: [B, C, H, W] -> [B, C, HW]
        # transpose: [B, C, HW] -> [B, HW, C]
        x = self.proj(x).flatten(2).transpose(1, 2)
        x = self.norm(x)
        return x


class Attention(nn.Module):
    def __init__(self,
                 dim,   # dimension of the input tokens
                 num_heads=8,
                 qkv_bias=False,
                 qk_scale=None,
                 attn_drop_ratio=0.,
                 proj_drop_ratio=0.):
        super(Attention, self).__init__()
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.scale = qk_scale or head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)
        self.attn_drop = nn.Dropout(attn_drop_ratio)
        self.proj = nn.Linear(dim, dim)
        self.proj_drop = nn.Dropout(proj_drop_ratio)

    def forward(self, x):
        # [batch_size, num_patches + 1, total_embed_dim]
        B, N, C = x.shape

        # qkv(): -> [batch_size, num_patches + 1, 3 * total_embed_dim]
        # reshape: -> [batch_size, num_patches + 1, 3, num_heads, embed_dim_per_head]
        # permute: -> [3, batch_size, num_heads, num_patches + 1, embed_dim_per_head]
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
        # [batch_size, num_heads, num_patches + 1, embed_dim_per_head]
        q, k, v = qkv[0], qkv[1], qkv[2]  # make torchscript happy (cannot use tensor as tuple)

        # transpose: -> [batch_size, num_heads, embed_dim_per_head, num_patches + 1]
        # @: multiply -> [batch_size, num_heads, num_patches + 1, num_patches + 1]
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        attn = self.attn_drop(attn)

        # @: multiply -> [batch_size, num_heads, num_patches + 1, embed_dim_per_head]
        # transpose: -> [batch_size, num_patches + 1, num_heads, embed_dim_per_head]
        # reshape: -> [batch_size, num_patches + 1, total_embed_dim]
        x = (attn @ v).transpose(1, 2).reshape(B, N, C)
        x = self.proj(x)
        x = self.proj_drop(x)
        return x


class Mlp(nn.Module):
    """
    MLP as used in Vision Transformer, MLP-Mixer and related networks
    """
    def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.):
        super().__init__()
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        self.fc1 = nn.Linear(in_features, hidden_features)
        self.act = act_layer()
        self.fc2 = nn.Linear(hidden_features, out_features)
        self.drop = nn.Dropout(drop)

    def forward(self, x):
        x = self.fc1(x)
        x = self.act(x)
        x = self.drop(x)
        x = self.fc2(x)
        x = self.drop(x)
        return x


def _cfg(url='', **kwargs):
    return {
        'url': url,
        'num_classes': 1000, 'input_size': (3, 224, 224), 'pool_size': None,
        'crop_pct': .9, 'interpolation': 'bicubic', 'fixed_input_size': True,
        'mean': IMAGENET_INCEPTION_MEAN, 'std': IMAGENET_INCEPTION_STD,
        'first_conv': 'patch_embed.proj', 'classifier': 'head',
        **kwargs
    }


default_cfgs = {
    # patch models (weights from official Google JAX 
impl)'vit_tiny_patch16_224': _cfg(url='https://storage.googleapis.com/vit_models/augreg/''Ti_16-i21k-300ep-lr_0.001-aug_none-wd_0.03-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.03-res_224.npz'),'vit_tiny_patch16_384': _cfg(url='https://storage.googleapis.com/vit_models/augreg/''Ti_16-i21k-300ep-lr_0.001-aug_none-wd_0.03-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.03-res_384.npz',input_size=(3, 384, 384), crop_pct=1.0),'vit_small_patch32_224': _cfg(url='https://storage.googleapis.com/vit_models/augreg/''S_32-i21k-300ep-lr_0.001-aug_light1-wd_0.03-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.03-res_224.npz'),'vit_small_patch32_384': _cfg(url='https://storage.googleapis.com/vit_models/augreg/''S_32-i21k-300ep-lr_0.001-aug_light1-wd_0.03-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.03-res_384.npz',input_size=(3, 384, 384), crop_pct=1.0),'vit_small_patch16_224': _cfg(url='https://storage.googleapis.com/vit_models/augreg/''S_16-i21k-300ep-lr_0.001-aug_light1-wd_0.03-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.03-res_224.npz'),'vit_small_patch16_384': _cfg(url='https://storage.googleapis.com/vit_models/augreg/''S_16-i21k-300ep-lr_0.001-aug_light1-wd_0.03-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.03-res_384.npz',input_size=(3, 384, 384), crop_pct=1.0),'vit_base_patch32_224': _cfg(url='https://storage.googleapis.com/vit_models/augreg/''B_32-i21k-300ep-lr_0.001-aug_medium1-wd_0.03-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.03-res_224.npz'),'vit_base_patch32_384': _cfg(url='https://storage.googleapis.com/vit_models/augreg/''B_32-i21k-300ep-lr_0.001-aug_light1-wd_0.1-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.03-res_384.npz',input_size=(3, 384, 384), crop_pct=1.0),'vit_base_patch16_224': _cfg(url='https://storage.googleapis.com/vit_models/augreg/''B_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.01-res_224.npz'),'vit_base_patch16_384': 
_cfg(url='https://storage.googleapis.com/vit_models/augreg/''B_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.01-res_384.npz',input_size=(3, 384, 384), crop_pct=1.0),'vit_base_patch8_224': _cfg(url='https://storage.googleapis.com/vit_models/augreg/''B_8-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.01-res_224.npz'),'vit_large_patch32_224': _cfg(url='',  # no official model weights for this combo, only for in21k),'vit_large_patch32_384': _cfg(url='https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-vitjx/jx_vit_large_p32_384-9b920ba8.pth',input_size=(3, 384, 384), crop_pct=1.0),'vit_large_patch16_224': _cfg(url='https://storage.googleapis.com/vit_models/augreg/''L_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.1-sd_0.1--imagenet2012-steps_20k-lr_0.01-res_224.npz'),'vit_large_patch16_384': _cfg(url='https://storage.googleapis.com/vit_models/augreg/''L_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.1-sd_0.1--imagenet2012-steps_20k-lr_0.01-res_384.npz',input_size=(3, 384, 384), crop_pct=1.0),'vit_huge_patch14_224': _cfg(url=''),'vit_giant_patch14_224': _cfg(url=''),'vit_gigantic_patch14_224': _cfg(url=''),'vit_base2_patch32_256': _cfg(url='', input_size=(3, 256, 256), crop_pct=0.95),# patch models, imagenet21k (weights from official Google JAX impl)'vit_tiny_patch16_224_in21k': _cfg(url='https://storage.googleapis.com/vit_models/augreg/Ti_16-i21k-300ep-lr_0.001-aug_none-wd_0.03-do_0.0-sd_0.0.npz',num_classes=21843),'vit_small_patch32_224_in21k': _cfg(url='https://storage.googleapis.com/vit_models/augreg/S_32-i21k-300ep-lr_0.001-aug_light1-wd_0.03-do_0.0-sd_0.0.npz',num_classes=21843),'vit_small_patch16_224_in21k': _cfg(url='https://storage.googleapis.com/vit_models/augreg/S_16-i21k-300ep-lr_0.001-aug_light1-wd_0.03-do_0.0-sd_0.0.npz',num_classes=21843),'vit_base_patch32_224_in21k': 
_cfg(url='https://storage.googleapis.com/vit_models/augreg/B_32-i21k-300ep-lr_0.001-aug_medium1-wd_0.03-do_0.0-sd_0.0.npz',num_classes=21843),'vit_base_patch16_224_in21k': _cfg(url='https://storage.googleapis.com/vit_models/augreg/B_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.0-sd_0.0.npz',num_classes=21843),'vit_base_patch8_224_in21k': _cfg(url='https://storage.googleapis.com/vit_models/augreg/B_8-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.0-sd_0.0.npz',num_classes=21843),'vit_large_patch32_224_in21k': _cfg(url='https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-vitjx/jx_vit_large_patch32_224_in21k-9046d2e7.pth',num_classes=21843),'vit_large_patch16_224_in21k': _cfg(url='https://storage.googleapis.com/vit_models/augreg/L_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.1-sd_0.1.npz',num_classes=21843),'vit_huge_patch14_224_in21k': _cfg(url='https://storage.googleapis.com/vit_models/imagenet21k/ViT-H_14.npz',hf_hub='timm/vit_huge_patch14_224_in21k',num_classes=21843),# SAM trained models (https://arxiv.org/abs/2106.01548)'vit_base_patch32_224_sam': _cfg(url='https://storage.googleapis.com/vit_models/sam/ViT-B_32.npz'),'vit_base_patch16_224_sam': _cfg(url='https://storage.googleapis.com/vit_models/sam/ViT-B_16.npz'),# DINO pretrained - https://arxiv.org/abs/2104.14294 (no classifier head, for fine-tune only)'vit_small_patch16_224_dino': _cfg(url='https://dl.fbaipublicfiles.com/dino/dino_deitsmall16_pretrain/dino_deitsmall16_pretrain.pth',mean=IMAGENET_DEFAULT_MEAN, std=IMAGENET_DEFAULT_STD, num_classes=0),'vit_small_patch8_224_dino': _cfg(url='https://dl.fbaipublicfiles.com/dino/dino_deitsmall8_pretrain/dino_deitsmall8_pretrain.pth',mean=IMAGENET_DEFAULT_MEAN, std=IMAGENET_DEFAULT_STD, num_classes=0),'vit_base_patch16_224_dino': _cfg(url='https://dl.fbaipublicfiles.com/dino/dino_vitbase16_pretrain/dino_vitbase16_pretrain.pth',mean=IMAGENET_DEFAULT_MEAN, std=IMAGENET_DEFAULT_STD, num_classes=0),'vit_base_patch8_224_dino': 
_cfg(url='https://dl.fbaipublicfiles.com/dino/dino_vitbase8_pretrain/dino_vitbase8_pretrain.pth',mean=IMAGENET_DEFAULT_MEAN, std=IMAGENET_DEFAULT_STD, num_classes=0),# deit models (FB weights)'deit_tiny_patch16_224': _cfg(url='https://dl.fbaipublicfiles.com/deit/deit_tiny_patch16_224-a1311bcf.pth',mean=IMAGENET_DEFAULT_MEAN, std=IMAGENET_DEFAULT_STD),'deit_small_patch16_224': _cfg(url='https://dl.fbaipublicfiles.com/deit/deit_small_patch16_224-cd65a155.pth',mean=IMAGENET_DEFAULT_MEAN, std=IMAGENET_DEFAULT_STD),'deit_base_patch16_224': _cfg(url='https://dl.fbaipublicfiles.com/deit/deit_base_patch16_224-b5f2ef4d.pth',mean=IMAGENET_DEFAULT_MEAN, std=IMAGENET_DEFAULT_STD),'deit_base_patch16_384': _cfg(url='https://dl.fbaipublicfiles.com/deit/deit_base_patch16_384-8de9b5d1.pth',mean=IMAGENET_DEFAULT_MEAN, std=IMAGENET_DEFAULT_STD, input_size=(3, 384, 384), crop_pct=1.0),'deit_tiny_distilled_patch16_224': _cfg(url='https://dl.fbaipublicfiles.com/deit/deit_tiny_distilled_patch16_224-b40b3cf7.pth',mean=IMAGENET_DEFAULT_MEAN, std=IMAGENET_DEFAULT_STD, classifier=('head', 'head_dist')),'deit_small_distilled_patch16_224': _cfg(url='https://dl.fbaipublicfiles.com/deit/deit_small_distilled_patch16_224-649709d9.pth',mean=IMAGENET_DEFAULT_MEAN, std=IMAGENET_DEFAULT_STD, classifier=('head', 'head_dist')),'deit_base_distilled_patch16_224': _cfg(url='https://dl.fbaipublicfiles.com/deit/deit_base_distilled_patch16_224-df68dfff.pth',mean=IMAGENET_DEFAULT_MEAN, std=IMAGENET_DEFAULT_STD, classifier=('head', 'head_dist')),'deit_base_distilled_patch16_384': _cfg(url='https://dl.fbaipublicfiles.com/deit/deit_base_distilled_patch16_384-d0272ac0.pth',mean=IMAGENET_DEFAULT_MEAN, std=IMAGENET_DEFAULT_STD, input_size=(3, 384, 384), crop_pct=1.0,classifier=('head', 'head_dist')),# ViT ImageNet-21K-P pretraining by MILL'vit_base_patch16_224_miil_in21k': _cfg(url='https://miil-public-eu.oss-eu-central-1.aliyuncs.com/model-zoo/ImageNet_21K_P/models/timm/vit_base_patch16_224_in21k_miil.pth',mean=(0, 
0, 0), std=(1, 1, 1), crop_pct=0.875, interpolation='bilinear', num_classes=11221,),'vit_base_patch16_224_miil': _cfg(url='https://miil-public-eu.oss-eu-central-1.aliyuncs.com/model-zoo/ImageNet_21K_P/models/timm''/vit_base_patch16_224_1k_miil_84_4.pth',mean=(0, 0, 0), std=(1, 1, 1), crop_pct=0.875, interpolation='bilinear',),
}


# NOTE: this second definition shadows the Attention class defined earlier in this file;
# Block below is written against this signature (attn_drop / proj_drop).
class Attention(nn.Module):
    def __init__(self, dim, num_heads=8, qkv_bias=False, attn_drop=0., proj_drop=0.):
        super().__init__()
        assert dim % num_heads == 0, 'dim should be divisible by num_heads'
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.scale = head_dim ** -0.5

        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)
        self.attn_drop = nn.Dropout(attn_drop)
        self.proj = nn.Linear(dim, dim)
        self.proj_drop = nn.Dropout(proj_drop)

    def forward(self, x):
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
        q, k, v = qkv.unbind(0)   # make torchscript happy (cannot use tensor as tuple)

        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        attn = self.attn_drop(attn)

        x = (attn @ v).transpose(1, 2).reshape(B, N, C)
        x = self.proj(x)
        x = self.proj_drop(x)
        return x


class Block(nn.Module):

    def __init__(self, dim, num_heads, mlp_ratio=4., qkv_bias=False, drop=0., attn_drop=0.,
                 drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm):
        super().__init__()
        self.norm1 = norm_layer(dim)
        self.attn = Attention(dim, num_heads=num_heads, qkv_bias=qkv_bias, attn_drop=attn_drop, proj_drop=drop)
        # NOTE: drop path for stochastic depth, we shall see if this is better than dropout here
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
        self.norm2 = norm_layer(dim)
        mlp_hidden_dim = int(dim * mlp_ratio)
        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)

    def forward(self, x):
        x = x + self.drop_path(self.attn(self.norm1(x)))
        x = x + self.drop_path(self.mlp(self.norm2(x)))
        return x


class VisionTransformer(nn.Module):
    """ Vision Transformer

    A PyTorch impl of : `An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale`
        - https://arxiv.org/abs/2010.11929

    Includes distillation token & head support for `DeiT: Data-efficient Image Transformers`
        - https://arxiv.org/abs/2012.12877
    """

    def __init__(self, img_size=224, patch_size=16, in_chans=3, num_classes=1000, embed_dim=768, depth=12,
                 num_heads=12, mlp_ratio=4., qkv_bias=True, representation_size=None, distilled=False,
                 drop_rate=0., attn_drop_rate=0., drop_path_rate=0., embed_layer=PatchEmbed, norm_layer=None,
                 act_layer=None, weight_init=''):
        """
        Args:
            img_size (int, tuple): input image size
            patch_size (int, tuple): patch size
            in_chans (int): number of input channels
            num_classes (int): number of classes for classification head
            embed_dim (int): embedding dimension
            depth (int): depth of transformer
            num_heads (int): number of attention heads
            mlp_ratio (int): ratio of mlp hidden dim to embedding dim
            qkv_bias (bool): enable bias for qkv if True
            representation_size (Optional[int]): enable and set representation layer (pre-logits) to this value if set
            distilled (bool): model includes a distillation token and head as in DeiT models
            drop_rate (float): dropout rate
            attn_drop_rate (float): attention dropout rate
            drop_path_rate (float): stochastic depth rate
            embed_layer (nn.Module): patch embedding layer
            norm_layer: (nn.Module): normalization layer
            weight_init: (str): weight init scheme
        """
        super().__init__()
        self.num_classes = num_classes
        self.num_features = self.embed_dim = embed_dim  # num_features for consistency with other models
        self.num_tokens = 2 if distilled else 1
        norm_layer = norm_layer or partial(nn.LayerNorm, eps=1e-6)
        act_layer = act_layer or nn.GELU

        self.patch_embed = embed_layer(
            img_size=img_size, patch_size=patch_size, in_chans=in_chans, embed_dim=embed_dim)
        num_patches = self.patch_embed.num_patches

        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.dist_token = nn.Parameter(torch.zeros(1, 1, embed_dim)) if distilled else None
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + self.num_tokens, embed_dim))
        self.pos_drop = nn.Dropout(p=drop_rate)

        dpr = [x.item() for x in torch.linspace(0, drop_path_rate, depth)]  # stochastic depth decay rule
        self.blocks = nn.Sequential(*[
            Block(
                dim=embed_dim, num_heads=num_heads, mlp_ratio=mlp_ratio, qkv_bias=qkv_bias, drop=drop_rate,
                attn_drop=attn_drop_rate, drop_path=dpr[i], norm_layer=norm_layer, act_layer=act_layer)
            for i in range(depth)])
        self.norm = norm_layer(embed_dim)

        # Representation layer
        if representation_size and not distilled:
            self.num_features = representation_size
            self.pre_logits = nn.Sequential(OrderedDict([
                ('fc', nn.Linear(embed_dim, representation_size)),
                ('act', nn.Tanh())
            ]))
        else:
            self.pre_logits = nn.Identity()

        # Classifier head(s)
        self.head = nn.Linear(self.num_features, num_classes) if num_classes > 0 else nn.Identity()
        self.head_dist = None
        if distilled:
            self.head_dist = nn.Linear(self.embed_dim, self.num_classes) if num_classes > 0 else nn.Identity()

        self.init_weights(weight_init)

    def init_weights(self, mode=''):
        assert mode in ('jax', 'jax_nlhb', 'nlhb', '')
        head_bias = -math.log(self.num_classes) if 'nlhb' in mode else 0.
        trunc_normal_(self.pos_embed, std=.02)
        if self.dist_token is not None:
            trunc_normal_(self.dist_token, std=.02)
        trunc_normal_(self.cls_token, std=.02)
        self.apply(_init_vit_weights)

    def _init_weights(self, m):
        # this fn left here for compat with downstream users
        _init_vit_weights(m)

    @torch.jit.ignore()
    def load_pretrained(self, checkpoint_path, prefix=''):
        _load_weights(self, checkpoint_path, prefix)

    @torch.jit.ignore
    def no_weight_decay(self):
        return {'pos_embed', 'cls_token', 'dist_token'}

    def get_classifier(self):
        if self.dist_token is None:
            return self.head
        else:
            return self.head, self.head_dist

    def reset_classifier(self, num_classes, global_pool=''):
        self.num_classes = num_classes
        self.head = nn.Linear(self.embed_dim, num_classes) if num_classes > 0 else nn.Identity()
        if self.num_tokens == 2:
            self.head_dist = nn.Linear(self.embed_dim, self.num_classes) if num_classes > 0 else nn.Identity()

    def forward_features(self, x):
        x = self.patch_embed(x)
        cls_token = self.cls_token.expand(x.shape[0], -1, -1)  # stole cls_tokens impl from Phil Wang, thanks
        if self.dist_token is None:
            x = torch.cat((cls_token, x), dim=1)
        else:
            x = torch.cat((cls_token, self.dist_token.expand(x.shape[0], -1, -1), x), dim=1)
        x = self.pos_drop(x + self.pos_embed)
        x = self.blocks(x)
        x = self.norm(x)
        if self.dist_token is None:
            return self.pre_logits(x[:, 0])
        else:
            return x[:, 0], x[:, 1]

    def forward(self, x):
        x = self.forward_features(x)
        if self.head_dist is not None:
            x, x_dist = self.head(x[0]), self.head_dist(x[1])  # x must be a tuple
            if self.training and not torch.jit.is_scripting():
                # during inference, return the average of both classifier predictions
                return x, x_dist
            else:
                return (x + x_dist) / 2
        else:
            x = self.head(x)
        return x


def _init_vit_weights(module: nn.Module, name: str = '', head_bias: float = 0., jax_impl: bool = False):
    """ ViT weight initialization
    * When called without n, head_bias, jax_impl args it will behave exactly the same
      as my original init for compatibility with prev hparam / downstream use cases (ie DeiT).
    * When called w/ valid n (module name) and jax_impl=True, will (hopefully) match JAX impl
    """
    if isinstance(module, nn.Linear):
        if name.startswith('head'):
            nn.init.zeros_(module.weight)
            nn.init.constant_(module.bias, head_bias)
        elif name.startswith('pre_logits'):
            lecun_normal_(module.weight)
            nn.init.zeros_(module.bias)
        else:
            if jax_impl:
                nn.init.xavier_uniform_(module.weight)
                if module.bias is not None:
                    if 'mlp' in name:
                        nn.init.normal_(module.bias, std=1e-6)
                    else:
                        nn.init.zeros_(module.bias)
            else:
                trunc_normal_(module.weight, std=.02)
                if module.bias is not None:
                    nn.init.zeros_(module.bias)
    elif jax_impl and isinstance(module, nn.Conv2d):
        # NOTE conv was left to pytorch default in my original init
        lecun_normal_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, (nn.LayerNorm, nn.GroupNorm, nn.BatchNorm2d)):
        nn.init.zeros_(module.bias)
        nn.init.ones_(module.weight)


@torch.no_grad()
def _load_weights(model: VisionTransformer, checkpoint_path: str, prefix: str = ''):""" Load weights from .npz checkpoints for official Google Brain Flax implementation"""import numpy as npdef _n2p(w, t=True):if w.ndim == 4 and w.shape[0] == w.shape[1] == w.shape[2] == 1:w = w.flatten()if t:if w.ndim == 4:w = w.transpose([3, 2, 0, 1])elif w.ndim == 3:w = w.transpose([2, 0, 1])elif w.ndim == 2:w = w.transpose([1, 0])return torch.from_numpy(w)w = np.load(checkpoint_path)if not prefix and 'opt/target/embedding/kernel' in w:prefix = 'opt/target/'if hasattr(model.patch_embed, 'backbone'):# hybridbackbone = model.patch_embed.backbonestem_only = not hasattr(backbone, 'stem')stem = backbone if stem_only else backbone.stemstem.conv.weight.copy_(adapt_input_conv(stem.conv.weight.shape[1], _n2p(w[f'{prefix}conv_root/kernel'])))stem.norm.weight.copy_(_n2p(w[f'{prefix}gn_root/scale']))stem.norm.bias.copy_(_n2p(w[f'{prefix}gn_root/bias']))if not stem_only:for i, stage in enumerate(backbone.stages):for j, block in enumerate(stage.blocks):bp = f'{prefix}block{i + 1}/unit{j + 1}/'for r in range(3):getattr(block, f'conv{r + 1}').weight.copy_(_n2p(w[f'{bp}conv{r + 1}/kernel']))getattr(block, f'norm{r + 1}').weight.copy_(_n2p(w[f'{bp}gn{r + 1}/scale']))getattr(block, f'norm{r + 1}').bias.copy_(_n2p(w[f'{bp}gn{r + 1}/bias']))if block.downsample is not None:block.downsample.conv.weight.copy_(_n2p(w[f'{bp}conv_proj/kernel']))block.downsample.norm.weight.copy_(_n2p(w[f'{bp}gn_proj/scale']))block.downsample.norm.bias.copy_(_n2p(w[f'{bp}gn_proj/bias']))embed_conv_w = _n2p(w[f'{prefix}embedding/kernel'])else:embed_conv_w = adapt_input_conv(model.patch_embed.proj.weight.shape[1], _n2p(w[f'{prefix}embedding/kernel']))model.patch_embed.proj.weight.copy_(embed_conv_w)model.patch_embed.proj.bias.copy_(_n2p(w[f'{prefix}embedding/bias']))model.cls_token.copy_(_n2p(w[f'{prefix}cls'], t=False))pos_embed_w = _n2p(w[f'{prefix}Transformer/posembed_input/pos_embedding'], t=False)if pos_embed_w.shape != 
model.pos_embed.shape:pos_embed_w = resize_pos_embed(  # resize pos embedding when different size from pretrained weightspos_embed_w, model.pos_embed, getattr(model, 'num_tokens', 1), model.patch_embed.grid_size)model.pos_embed.copy_(pos_embed_w)model.norm.weight.copy_(_n2p(w[f'{prefix}Transformer/encoder_norm/scale']))model.norm.bias.copy_(_n2p(w[f'{prefix}Transformer/encoder_norm/bias']))if isinstance(model.head, nn.Linear) and model.head.bias.shape[0] == w[f'{prefix}head/bias'].shape[-1]:model.head.weight.copy_(_n2p(w[f'{prefix}head/kernel']))model.head.bias.copy_(_n2p(w[f'{prefix}head/bias']))if isinstance(getattr(model.pre_logits, 'fc', None), nn.Linear) and f'{prefix}pre_logits/bias' in w:model.pre_logits.fc.weight.copy_(_n2p(w[f'{prefix}pre_logits/kernel']))model.pre_logits.fc.bias.copy_(_n2p(w[f'{prefix}pre_logits/bias']))for i, block in enumerate(model.blocks.children()):block_prefix = f'{prefix}Transformer/encoderblock_{i}/'mha_prefix = block_prefix + 'MultiHeadDotProductAttention_1/'block.norm1.weight.copy_(_n2p(w[f'{block_prefix}LayerNorm_0/scale']))block.norm1.bias.copy_(_n2p(w[f'{block_prefix}LayerNorm_0/bias']))block.attn.qkv.weight.copy_(torch.cat([_n2p(w[f'{mha_prefix}{n}/kernel'], t=False).flatten(1).T for n in ('query', 'key', 'value')]))block.attn.qkv.bias.copy_(torch.cat([_n2p(w[f'{mha_prefix}{n}/bias'], t=False).reshape(-1) for n in ('query', 'key', 'value')]))block.attn.proj.weight.copy_(_n2p(w[f'{mha_prefix}out/kernel']).flatten(1))block.attn.proj.bias.copy_(_n2p(w[f'{mha_prefix}out/bias']))for r in range(2):getattr(block.mlp, f'fc{r + 1}').weight.copy_(_n2p(w[f'{block_prefix}MlpBlock_3/Dense_{r}/kernel']))getattr(block.mlp, f'fc{r + 1}').bias.copy_(_n2p(w[f'{block_prefix}MlpBlock_3/Dense_{r}/bias']))block.norm2.weight.copy_(_n2p(w[f'{block_prefix}LayerNorm_2/scale']))block.norm2.bias.copy_(_n2p(w[f'{block_prefix}LayerNorm_2/bias']))def resize_pos_embed(posemb, posemb_new, num_tokens=1, gs_new=()):# Rescale the grid of position embeddings 
    # when loading from state_dict. Adapted from
    # https://github.com/google-research/vision_transformer/blob/00883dd691c63a6830751563748663526e811cee/vit_jax/checkpoint.py#L224
    _logger.info('Resized position embedding: %s to %s', posemb.shape, posemb_new.shape)
    ntok_new = posemb_new.shape[1]
    if num_tokens:
        # Split off the prefix tokens (e.g. the class token); only the grid part is interpolated
        posemb_tok, posemb_grid = posemb[:, :num_tokens], posemb[0, num_tokens:]
        ntok_new -= num_tokens
    else:
        posemb_tok, posemb_grid = posemb[:, :0], posemb[0]
    gs_old = int(math.sqrt(len(posemb_grid)))
    if not len(gs_new):  # backwards compatibility
        gs_new = [int(math.sqrt(ntok_new))] * 2
    assert len(gs_new) >= 2
    _logger.info('Position embedding grid-size from %s to %s', [gs_old, gs_old], gs_new)
    posemb_grid = posemb_grid.reshape(1, gs_old, gs_old, -1).permute(0, 3, 1, 2)
    posemb_grid = F.interpolate(posemb_grid, size=gs_new, mode='bicubic', align_corners=False)
    posemb_grid = posemb_grid.permute(0, 2, 3, 1).reshape(1, gs_new[0] * gs_new[1], -1)
    posemb = torch.cat([posemb_tok, posemb_grid], dim=1)
    return posemb


def checkpoint_filter_fn(state_dict, model):
    """ convert patch embedding weight from manual patchify + linear proj to conv"""
    out_dict = {}
    if 'model' in state_dict:
        # For deit models
        state_dict = state_dict['model']
    for k, v in state_dict.items():
        if 'patch_embed.proj.weight' in k and len(v.shape) < 4:
            # For old models that I trained prior to conv based patchification
            O, I, H, W = model.patch_embed.proj.weight.shape
            v = v.reshape(O, -1, H, W)
        elif k == 'pos_embed' and v.shape != model.pos_embed.shape:
            # To resize pos embedding when using model at different size from pretrained weights
            v = resize_pos_embed(
                v, model.pos_embed, getattr(model, 'num_tokens', 1), model.patch_embed.grid_size)
        out_dict[k] = v
    return out_dict


def _create_vision_transformer(variant, img_size=224, pretrained=False, default_cfg=None, **kwargs):
    default_cfg = default_cfg or default_cfgs[variant]
    if kwargs.get('features_only', None):
        raise RuntimeError('features_only not implemented for Vision Transformer models.')

    # NOTE this extra code to support handling of repr size for in21k pretrained models
    default_num_classes = default_cfg['num_classes']
    num_classes = kwargs.get('num_classes', default_num_classes)
    repr_size = kwargs.pop('representation_size', None)
    if repr_size is not None and num_classes != default_num_classes:
        # Remove representation layer if fine-tuning. This may not always be the desired action,
        # but I feel better than doing nothing by default for fine-tuning. Perhaps a better interface?
        _logger.warning("Removing representation layer for fine-tuning.")
        repr_size = None
    print(default_cfg)
    # Only these keyword arguments are forwarded to the model; other entries in kwargs are not used here
    model = VisionTransformer(
        img_size=img_size,
        patch_size=kwargs['patch_size'],
        embed_dim=kwargs['embed_dim'],
        depth=kwargs['depth'],
        num_heads=kwargs['num_heads'],
        num_classes=num_classes)
    if pretrained:
        url = default_cfg.get('url', None)
        checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location="cpu", check_hash=True)
        model.load_state_dict(checkpoint["model"])
    return model


def vit_tiny_patch16_224(pretrained=False, **kwargs):
    """ ViT-Tiny (Vit-Ti/16)"""
    model_kwargs = dict(patch_size=16, embed_dim=192, depth=12, num_heads=3, **kwargs)
    model = _create_vision_transformer('vit_tiny_patch16_224', pretrained=pretrained, **model_kwargs)
    return model


def vit_tiny_patch16_384(pretrained=False, **kwargs):
    """ ViT-Tiny (Vit-Ti/16) @ 384x384."""
    model_kwargs = dict(patch_size=16, embed_dim=192, depth=12, num_heads=3, **kwargs)
    model = _create_vision_transformer('vit_tiny_patch16_384', pretrained=pretrained, **model_kwargs)
    return model


def vit_small_patch32_224(pretrained=False, **kwargs):
    """ ViT-Small (ViT-S/32)"""
    model_kwargs = dict(patch_size=32, embed_dim=384, depth=12, num_heads=6, **kwargs)
    model = _create_vision_transformer('vit_small_patch32_224', pretrained=pretrained, **model_kwargs)
    return model


def vit_small_patch32_384(pretrained=False, **kwargs):
    """ ViT-Small (ViT-S/32) at 384x384."""
    model_kwargs = dict(patch_size=32, embed_dim=384, depth=12, num_heads=6, **kwargs)
    model = _create_vision_transformer('vit_small_patch32_384', pretrained=pretrained, **model_kwargs)
    return model


def vit_small_patch16_224(pretrained=False, **kwargs):
    """ ViT-Small (ViT-S/16)
    NOTE I've replaced my previous 'small' model definition and weights with the small variant from the DeiT paper
    """
    model_kwargs = dict(patch_size=16, embed_dim=384, depth=12, num_heads=6, **kwargs)
    model = _create_vision_transformer('vit_small_patch16_224', pretrained=pretrained, **model_kwargs)
    return model


def vit_small_patch16_384(pretrained=False, **kwargs):
    """ ViT-Small (ViT-S/16)
    NOTE I've replaced my previous 'small' model definition and weights with the small variant from the DeiT paper
    """
    model_kwargs = dict(patch_size=16, embed_dim=384, depth=12, num_heads=6, **kwargs)
    model = _create_vision_transformer('vit_small_patch16_384', pretrained=pretrained, **model_kwargs)
    return model


def vit_base_patch32_224(pretrained=False, **kwargs):
    """ ViT-Base (ViT-B/32) from original paper (https://arxiv.org/abs/2010.11929).
    ImageNet-1k weights fine-tuned from in21k, source https://github.com/google-research/vision_transformer.
    """
    model_kwargs = dict(patch_size=32, embed_dim=768, depth=12, num_heads=12, **kwargs)
    model = _create_vision_transformer('vit_base_patch32_224', pretrained=pretrained, **model_kwargs)
    return model


def vit_base2_patch32_256(pretrained=False, **kwargs):
    """ ViT-Base (ViT-B/32)
    # FIXME experiment
    """
    model_kwargs = dict(patch_size=32, embed_dim=896, depth=12, num_heads=14, **kwargs)
    model = _create_vision_transformer('vit_base2_patch32_256', pretrained=pretrained, **model_kwargs)
    return model


def vit_base_patch32_384(pretrained=False, **kwargs):
    """ ViT-Base model (ViT-B/32) from original paper (https://arxiv.org/abs/2010.11929).
    ImageNet-1k weights fine-tuned from in21k @ 384x384, source https://github.com/google-research/vision_transformer.
    """
    model_kwargs = dict(patch_size=32, embed_dim=768, depth=12, num_heads=12, **kwargs)
    model = _create_vision_transformer('vit_base_patch32_384', pretrained=pretrained, **model_kwargs)
    return model


def vit_base_patch16_224(pretrained=False, **kwargs):
    """ ViT-Base (ViT-B/16) from original paper (https://arxiv.org/abs/2010.11929).
    ImageNet-1k weights fine-tuned from in21k @ 224x224, source https://github.com/google-research/vision_transformer.
    """
    model_kwargs = dict(patch_size=16, embed_dim=768, depth=12, num_heads=12, **kwargs)
    model = _create_vision_transformer('vit_base_patch16_224', pretrained=pretrained, **model_kwargs)
    return model


def vit_base_patch16_384(pretrained=False, **kwargs):
    """ ViT-Base model (ViT-B/16) from original paper (https://arxiv.org/abs/2010.11929).
    ImageNet-1k weights fine-tuned from in21k @ 384x384, source https://github.com/google-research/vision_transformer.
    """
    model_kwargs = dict(patch_size=16, embed_dim=768, depth=12, num_heads=12, **kwargs)
    model = _create_vision_transformer('vit_base_patch16_384', pretrained=pretrained, **model_kwargs)
    return model


def vit_base_patch8_224(pretrained=False, **kwargs):
    """ ViT-Base (ViT-B/8) from original paper (https://arxiv.org/abs/2010.11929).
    ImageNet-1k weights fine-tuned from in21k @ 224x224, source https://github.com/google-research/vision_transformer.
    """
    model_kwargs = dict(patch_size=8, embed_dim=768, depth=12, num_heads=12, **kwargs)
    model = _create_vision_transformer('vit_base_patch8_224', pretrained=pretrained, **model_kwargs)
    return model


def vit_large_patch32_224(pretrained=False, **kwargs):
    """ ViT-Large model (ViT-L/32) from original paper (https://arxiv.org/abs/2010.11929). No pretrained weights."""
    model_kwargs = dict(patch_size=32, embed_dim=1024, depth=24, num_heads=16, **kwargs)
    model = _create_vision_transformer('vit_large_patch32_224', pretrained=pretrained, **model_kwargs)
    return model


def vit_large_patch32_384(pretrained=False, **kwargs):
    """ ViT-Large model (ViT-L/32) from original paper (https://arxiv.org/abs/2010.11929).
    ImageNet-1k weights fine-tuned from in21k @ 384x384, source https://github.com/google-research/vision_transformer.
    """
    model_kwargs = dict(patch_size=32, embed_dim=1024, depth=24, num_heads=16, **kwargs)
    model = _create_vision_transformer('vit_large_patch32_384', pretrained=pretrained, **model_kwargs)
    return model


def vit_large_patch16_224(pretrained=False, **kwargs):
    """ ViT-Large model (ViT-L/16) from original paper (https://arxiv.org/abs/2010.11929).
    ImageNet-1k weights fine-tuned from in21k @ 224x224, source https://github.com/google-research/vision_transformer.
    """
    model_kwargs = dict(patch_size=16, embed_dim=1024, depth=24, num_heads=16, **kwargs)
    model = _create_vision_transformer('vit_large_patch16_224', pretrained=pretrained, **model_kwargs)
    return model


def vit_large_patch16_384(pretrained=False, **kwargs):
    """ ViT-Large model (ViT-L/16) from original paper (https://arxiv.org/abs/2010.11929).
    ImageNet-1k weights fine-tuned from in21k @ 384x384, source https://github.com/google-research/vision_transformer.
    """
    model_kwargs = dict(patch_size=16, embed_dim=1024, depth=24, num_heads=16, **kwargs)
    model = _create_vision_transformer('vit_large_patch16_384', pretrained=pretrained, **model_kwargs)
    return model


def vit_huge_patch14_224(pretrained=False, **kwargs):
    """ ViT-Huge model (ViT-H/14) from original paper (https://arxiv.org/abs/2010.11929)."""
    model_kwargs = dict(patch_size=14, embed_dim=1280, depth=32, num_heads=16, **kwargs)
    model = _create_vision_transformer('vit_huge_patch14_224', pretrained=pretrained, **model_kwargs)
    return model


def vit_giant_patch14_224(pretrained=False, **kwargs):
    """ ViT-Giant model (ViT-g/14) from `Scaling Vision Transformers` - https://arxiv.org/abs/2106.04560"""
    model_kwargs = dict(patch_size=14, embed_dim=1408, mlp_ratio=48/11, depth=40, num_heads=16, **kwargs)
    model = _create_vision_transformer('vit_giant_patch14_224', pretrained=pretrained, **model_kwargs)
    return model


def vit_gigantic_patch14_224(pretrained=False, **kwargs):
    """ ViT-Gigantic model (ViT-G/14) from `Scaling Vision Transformers` - https://arxiv.org/abs/2106.04560"""
    model_kwargs = dict(patch_size=14, embed_dim=1664, mlp_ratio=64/13, depth=48, num_heads=16, **kwargs)
    model = _create_vision_transformer('vit_gigantic_patch14_224', pretrained=pretrained, **model_kwargs)
    return model


def vit_tiny_patch16_224_in21k(pretrained=False, **kwargs):
    """ ViT-Tiny (Vit-Ti/16).
    ImageNet-21k weights @ 224x224, source https://github.com/google-research/vision_transformer.
    NOTE: this model has valid 21k classifier head and no representation (pre-logits) layer
    """
    model_kwargs = dict(patch_size=16, embed_dim=192, depth=12, num_heads=3, **kwargs)
    model = _create_vision_transformer('vit_tiny_patch16_224_in21k', pretrained=pretrained, **model_kwargs)
    return model


def vit_small_patch32_224_in21k(pretrained=False, **kwargs):
    """ ViT-Small (ViT-S/32)
    ImageNet-21k weights @ 224x224, source https://github.com/google-research/vision_transformer.
    NOTE: this model has valid 21k classifier head and no representation (pre-logits) layer
    """
    model_kwargs = dict(patch_size=32, embed_dim=384, depth=12, num_heads=6, **kwargs)
    model = _create_vision_transformer('vit_small_patch32_224_in21k', pretrained=pretrained, **model_kwargs)
    return model


def vit_small_patch16_224_in21k(pretrained=False, **kwargs):
    """ ViT-Small (ViT-S/16)
    ImageNet-21k weights @ 224x224, source https://github.com/google-research/vision_transformer.
    NOTE: this model has valid 21k classifier head and no representation (pre-logits) layer
    """
    model_kwargs = dict(patch_size=16, embed_dim=384, depth=12, num_heads=6, **kwargs)
    model = _create_vision_transformer('vit_small_patch16_224_in21k', pretrained=pretrained, **model_kwargs)
    return model


def vit_base_patch32_224_in21k(pretrained=False, **kwargs):
    """ ViT-Base model (ViT-B/32) from original paper (https://arxiv.org/abs/2010.11929).
    ImageNet-21k weights @ 224x224, source https://github.com/google-research/vision_transformer.
    NOTE: this model has valid 21k classifier head and no representation (pre-logits) layer
    """
    model_kwargs = dict(patch_size=32, embed_dim=768, depth=12, num_heads=12, **kwargs)
    model = _create_vision_transformer('vit_base_patch32_224_in21k', pretrained=pretrained, **model_kwargs)
    return model


def vit_base_patch16_224_in21k(pretrained=False, **kwargs):
    """ ViT-Base model (ViT-B/16) from original paper (https://arxiv.org/abs/2010.11929).
    ImageNet-21k weights @ 224x224, source https://github.com/google-research/vision_transformer.
    NOTE: this model has valid 21k classifier head and no representation (pre-logits) layer
    """
    model_kwargs = dict(patch_size=16, embed_dim=768, depth=12, num_heads=12, **kwargs)
    model = _create_vision_transformer('vit_base_patch16_224_in21k', pretrained=pretrained, **model_kwargs)
    return model


def vit_base_patch8_224_in21k(pretrained=False, **kwargs):
    """ ViT-Base model (ViT-B/8) from original paper (https://arxiv.org/abs/2010.11929).
    ImageNet-21k weights @ 224x224, source https://github.com/google-research/vision_transformer.
    NOTE: this model has valid 21k classifier head and no representation (pre-logits) layer
    """
    model_kwargs = dict(patch_size=8, embed_dim=768, depth=12, num_heads=12, **kwargs)
    model = _create_vision_transformer('vit_base_patch8_224_in21k', pretrained=pretrained, **model_kwargs)
    return model


def vit_large_patch32_224_in21k(pretrained=False, **kwargs):
    """ ViT-Large model (ViT-L/32) from original paper (https://arxiv.org/abs/2010.11929).
    ImageNet-21k weights @ 224x224, source https://github.com/google-research/vision_transformer.
    NOTE: this model has a representation layer but the 21k classifier head is zero'd out in original weights
    """
    model_kwargs = dict(
        patch_size=32, embed_dim=1024, depth=24, num_heads=16, representation_size=1024, **kwargs)
    model = _create_vision_transformer('vit_large_patch32_224_in21k', pretrained=pretrained, **model_kwargs)
    return model


def vit_large_patch16_224_in21k(pretrained=False, **kwargs):
    """ ViT-Large model (ViT-L/16) from original paper (https://arxiv.org/abs/2010.11929).
    ImageNet-21k weights @ 224x224, source https://github.com/google-research/vision_transformer.
    NOTE: this model has valid 21k classifier head and no representation (pre-logits) layer
    """
    model_kwargs = dict(patch_size=16, embed_dim=1024, depth=24, num_heads=16, **kwargs)
    model = _create_vision_transformer('vit_large_patch16_224_in21k', pretrained=pretrained, **model_kwargs)
    return model


def vit_huge_patch14_224_in21k(pretrained=False, **kwargs):
    """ ViT-Huge model (ViT-H/14) from original paper (https://arxiv.org/abs/2010.11929).
    ImageNet-21k weights @ 224x224, source https://github.com/google-research/vision_transformer.
    NOTE: this model has a representation layer but the 21k classifier head is zero'd out in original weights
    """
    model_kwargs = dict(
        patch_size=14, embed_dim=1280, depth=32, num_heads=16, representation_size=1280, **kwargs)
    model = _create_vision_transformer('vit_huge_patch14_224_in21k', pretrained=pretrained, **model_kwargs)
    return model


def vit_base_patch16_224_sam(pretrained=False, **kwargs):
    """ ViT-Base (ViT-B/16) w/ SAM pretrained weights. Paper: https://arxiv.org/abs/2106.01548"""
    # NOTE original SAM weights release worked with representation_size=768
    model_kwargs = dict(patch_size=16, embed_dim=768, depth=12, num_heads=12, **kwargs)
    model = _create_vision_transformer('vit_base_patch16_224_sam', pretrained=pretrained, **model_kwargs)
    return model


def vit_base_patch32_224_sam(pretrained=False, **kwargs):
    """ ViT-Base (ViT-B/32) w/ SAM pretrained weights. Paper: https://arxiv.org/abs/2106.01548"""
    # NOTE original SAM weights release worked with representation_size=768
    model_kwargs = dict(patch_size=32, embed_dim=768, depth=12, num_heads=12, **kwargs)
    model = _create_vision_transformer('vit_base_patch32_224_sam', pretrained=pretrained, **model_kwargs)
    return model


def vit_small_patch16_224_dino(pretrained=False, **kwargs):
    """ ViT-Small (ViT-S/16) w/ DINO pretrained weights (no head) - https://arxiv.org/abs/2104.14294"""
    model_kwargs = dict(patch_size=16, embed_dim=384, depth=12, num_heads=6, **kwargs)
    model = _create_vision_transformer('vit_small_patch16_224_dino', pretrained=pretrained, **model_kwargs)
    return model


def vit_small_patch8_224_dino(pretrained=False, **kwargs):
    """ ViT-Small (ViT-S/8) w/ DINO pretrained weights (no head) - https://arxiv.org/abs/2104.14294"""
    model_kwargs = dict(patch_size=8, embed_dim=384, depth=12, num_heads=6, **kwargs)
    model = _create_vision_transformer('vit_small_patch8_224_dino', pretrained=pretrained, **model_kwargs)
    return model


def vit_base_patch16_224_dino(pretrained=False, **kwargs):
    """ ViT-Base (ViT-B/16) w/ DINO pretrained weights (no head) - https://arxiv.org/abs/2104.14294"""
    model_kwargs = dict(patch_size=16, embed_dim=768, depth=12, num_heads=12, **kwargs)
    model = _create_vision_transformer('vit_base_patch16_224_dino', pretrained=pretrained, **model_kwargs)
    return model


def vit_base_patch8_224_dino(pretrained=False, **kwargs):
    """ ViT-Base (ViT-B/8) w/ DINO pretrained weights (no head) - https://arxiv.org/abs/2104.14294"""
    model_kwargs = dict(patch_size=8, embed_dim=768, depth=12, num_heads=12, **kwargs)
    model = _create_vision_transformer('vit_base_patch8_224_dino', pretrained=pretrained, **model_kwargs)
    return model


def deit_tiny_patch16_224(pretrained=False, **kwargs):
    """ DeiT-tiny model @ 224x224 from paper (https://arxiv.org/abs/2012.12877).
    ImageNet-1k weights from https://github.com/facebookresearch/deit.
    """
    model_kwargs = dict(patch_size=16, embed_dim=192, depth=12, num_heads=3, **kwargs)
    model = _create_vision_transformer('deit_tiny_patch16_224', pretrained=pretrained, **model_kwargs)
    return model


def deit_small_patch16_224(pretrained=False, **kwargs):
    """ DeiT-small model @ 224x224 from paper (https://arxiv.org/abs/2012.12877).
    ImageNet-1k weights from https://github.com/facebookresearch/deit.
    """
    model_kwargs = dict(patch_size=16, embed_dim=384, depth=12, num_heads=6, **kwargs)
    model = _create_vision_transformer('deit_small_patch16_224', pretrained=pretrained, **model_kwargs)
    return model


def deit_base_patch16_224(pretrained=False, **kwargs):
    """ DeiT base model @ 224x224 from paper (https://arxiv.org/abs/2012.12877).
    ImageNet-1k weights from https://github.com/facebookresearch/deit.
    """
    model_kwargs = dict(patch_size=16, embed_dim=768, depth=12, num_heads=12, **kwargs)
    model = _create_vision_transformer('deit_base_patch16_224', pretrained=pretrained, **model_kwargs)
    return model


def deit_base_patch16_384(pretrained=False, **kwargs):
    """ DeiT base model @ 384x384 from paper (https://arxiv.org/abs/2012.12877).
    ImageNet-1k weights from https://github.com/facebookresearch/deit.
    """
    model_kwargs = dict(patch_size=16, embed_dim=768, depth=12, num_heads=12, **kwargs)
    model = _create_vision_transformer('deit_base_patch16_384', pretrained=pretrained, **model_kwargs)
    return model


def deit_tiny_distilled_patch16_224(pretrained=False, **kwargs):
    """ DeiT-tiny distilled model @ 224x224 from paper (https://arxiv.org/abs/2012.12877).
    ImageNet-1k weights from https://github.com/facebookresearch/deit.
    """
    model_kwargs = dict(patch_size=16, embed_dim=192, depth=12, num_heads=3, **kwargs)
    model = _create_vision_transformer(
        'deit_tiny_distilled_patch16_224', pretrained=pretrained, distilled=True, **model_kwargs)
    return model


def deit_small_distilled_patch16_224(pretrained=False, **kwargs):
    """ DeiT-small distilled model @ 224x224 from paper (https://arxiv.org/abs/2012.12877).
    ImageNet-1k weights from https://github.com/facebookresearch/deit.
    """
    model_kwargs = dict(patch_size=16, embed_dim=384, depth=12, num_heads=6, **kwargs)
    model = _create_vision_transformer(
        'deit_small_distilled_patch16_224', pretrained=pretrained, distilled=True, **model_kwargs)
    return model


def deit_base_distilled_patch16_224(pretrained=False, **kwargs):
    """ DeiT-base distilled model @ 224x224 from paper (https://arxiv.org/abs/2012.12877).
    ImageNet-1k weights from https://github.com/facebookresearch/deit.
    """
    model_kwargs = dict(patch_size=16, embed_dim=768, depth=12, num_heads=12, **kwargs)
    model = _create_vision_transformer(
        'deit_base_distilled_patch16_224', pretrained=pretrained, distilled=True, **model_kwargs)
    return model


def deit_base_distilled_patch16_384(pretrained=False, **kwargs):
    """ DeiT-base distilled model @ 384x384 from paper (https://arxiv.org/abs/2012.12877).
    ImageNet-1k weights from https://github.com/facebookresearch/deit.
    """
    model_kwargs = dict(patch_size=16, embed_dim=768, depth=12, num_heads=12, **kwargs)
    model = _create_vision_transformer(
        'deit_base_distilled_patch16_384', pretrained=pretrained, distilled=True, **model_kwargs)
    return model


def vit_base_patch16_224_miil_in21k(pretrained=False, **kwargs):
    """ ViT-Base (ViT-B/16) from original paper (https://arxiv.org/abs/2010.11929).
    Weights taken from: https://github.com/Alibaba-MIIL/ImageNet21K
    """
    model_kwargs = dict(patch_size=16, embed_dim=768, depth=12, num_heads=12, qkv_bias=False, **kwargs)
    model = _create_vision_transformer('vit_base_patch16_224_miil_in21k', pretrained=pretrained, **model_kwargs)
    return model


def vit_base_patch16_224_miil(pretrained=False, **kwargs):
    """ ViT-Base (ViT-B/16) from original paper (https://arxiv.org/abs/2010.11929).
    Weights taken from: https://github.com/Alibaba-MIIL/ImageNet21K
    """
    model_kwargs = dict(patch_size=16, embed_dim=768, depth=12, num_heads=12, qkv_bias=False, **kwargs)
    model = _create_vision_transformer('vit_base_patch16_224_miil', pretrained=pretrained, **model_kwargs)
    return model
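
The factory functions above differ mainly in `patch_size`, `embed_dim`, `depth` and `num_heads`. As a rough illustration of what those numbers imply, the sketch below computes the token sequence length and the per-head dimension for a few of the listed variants. The helper names `vit_token_count` and `head_dim` are hypothetical and are not part of the listing; the sketch also assumes a square input image and a single class token unless stated otherwise. It uses only the standard library, so it does not build any model.

```python
def vit_token_count(img_size: int, patch_size: int, num_prefix_tokens: int = 1) -> int:
    """Tokens seen by the encoder: one per image patch plus prefix tokens.

    Hypothetical helper for illustration; assumes a square image.
    """
    if img_size % patch_size != 0:
        raise ValueError("img_size must be divisible by patch_size")
    grid = img_size // patch_size          # patches along one side
    return grid * grid + num_prefix_tokens


def head_dim(embed_dim: int, num_heads: int) -> int:
    """Per-head width, mirroring `head_dim = dim // num_heads` in the Attention class."""
    if embed_dim % num_heads != 0:
        raise ValueError("embed_dim must be divisible by num_heads")
    return embed_dim // num_heads


# (img_size, patch_size, embed_dim, num_heads), copied from the factory functions above
variants = {
    "vit_tiny_patch16_224": (224, 16, 192, 3),
    "vit_base_patch16_224": (224, 16, 768, 12),
    "vit_base_patch16_384": (384, 16, 768, 12),
    "vit_large_patch32_384": (384, 32, 1024, 16),
    "vit_huge_patch14_224": (224, 14, 1280, 16),
}

for name, (img, patch, dim, heads) in variants.items():
    print(f"{name}: tokens={vit_token_count(img, patch)}, head_dim={head_dim(dim, heads)}")
```

The token count changes with resolution (197 tokens at 224x224 versus 577 at 384x384 for a 16-pixel patch), which is the situation `resize_pos_embed` handles: a checkpoint's position embedding has one row per token, so its grid part must be interpolated when the target model uses a different grid. For the distilled DeiT variants, which carry a distillation token in addition to the class token, `num_prefix_tokens=2` would be the matching assumption.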
