
[Natural Language Processing] PyTorch Overview: What Is a Tensor, and Basic Operations

Category: news | Source: original | 2025/9/4 8:44:32

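PyTorch overview

PyTorch is an open-source deep learning framework developed and maintained by Facebook's (now Meta's) AI Research team. It was open-sourced on GitHub in 2017 and is widely used in both academia and industry.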

What PyTorch can do

GPU acceleration
Automatic differentiation (autograd)
Common neural network layers
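Here is a minimal sketch that touches all three capabilities, assuming a recent PyTorch install. `nn.Linear` stands in for the "common network layers", `backward()` exercises automatic differentiation, and the device check falls back to the CPU when no GPU is present. The layer sizes and the squared-output loss are arbitrary illustrative choices.

```python
import torch
from torch import nn

# Use the GPU if one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# A ready-made fully connected layer: 3 input features -> 1 output
layer = nn.Linear(3, 1).to(device)

x = torch.randn(8, 3, device=device)   # a batch of 8 samples
loss = layer(x).pow(2).mean()          # an arbitrary scalar loss for illustration

# Automatic differentiation: gradients w.r.t. the layer's parameters
loss.backward()
print(layer.weight.grad.shape)  # torch.Size([1, 3])
print(layer.bias.grad.shape)    # torch.Size([1])
```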

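PyTorch basics

Kinds of quantities

Scalar: a single number, e.g. 1, 2, 3
Vector: a one-dimensional array, e.g. [1, 2, 3]
Matrix: a two-dimensional array, e.g. [[1, 2], [3, 4]]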

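Vectors and matrices can describe data with at most two dimensions (H×W), but many real-world things have more dimensions; these are represented with tensors.
The three kinds of quantities above can also be regarded as special cases of tensors.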
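The sketch below shows that scalars, vectors and matrices are just tensors with 0, 1 and 2 dimensions. A tensor with more dimensions works the same way. The 3×32×32 shape (read as channels × height × width, like a small RGB image) is only an illustrative assumption.

```python
import torch

scalar = torch.tensor(3.0)               # 0-D tensor
vector = torch.tensor([1, 2, 3])         # 1-D tensor
matrix = torch.tensor([[1, 2], [3, 4]])  # 2-D tensor
image = torch.zeros(3, 32, 32)           # 3-D tensor, e.g. channels x H x W

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("image", image)]:
    print(name, t.ndim, tuple(t.shape))
# scalar 0 ()
# vector 1 (3,)
# matrix 2 (2, 2)
# image 3 (3, 32, 32)
```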

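Basic concepts of tensors

The tensor is the basic unit in PyTorch and a central building block of deep learning frameworks.
You can start by thinking of a tensor as a container that holds the data to be computed on.
A PyTorch tensor is very similar to a NumPy ndarray: it represents a multidimensional array, with vectors and matrices as special cases. The difference is that a PyTorch tensor can also live on a GPU, while a NumPy ndarray runs only on the CPU. Running on the GPU can greatly speed up computation.
In one sentence: a tensor is simply a multidimensional array that can run on a GPU.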

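Samples and model: Y = WX + B
X: the samples (input data)
W, B: the parameters to be learned (weight and bias)
Y: the labels (the values the model should produce)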
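This sketch computes Y = WX + B for a small batch. It assumes the row-per-sample convention used by `nn.Linear`, where the formula is written Y = XWᵀ + B. The values are small integers so the arithmetic is exact; the sizes (4 samples, 3 features, 2 outputs) are arbitrary.

```python
import torch

X = torch.arange(12.).reshape(4, 3)        # 4 samples, 3 features each
W = torch.tensor([[1., 0., -1.],
                  [2., 1., 0.]])           # weights: 3 inputs -> 2 outputs
B = torch.tensor([0.5, -1.])               # bias, broadcast over the batch

Y = X @ W.T + B                            # row-per-sample form of Y = WX + B
print(Y.shape)                             # torch.Size([4, 2])

# Each row equals W x + B, where x is the corresponding sample as a column vector
print(torch.equal(Y[0], W @ X[0] + B))     # True
```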

Tensor data types

Data type	dtype	Legacy constructor (type)
32-bit floating point	torch.float32 or torch.float	torch.*.FloatTensor
64-bit floating point	torch.float64 or torch.double	torch.*.DoubleTensor
64-bit complex	torch.complex64 or torch.cfloat
128-bit complex	torch.complex128 or torch.cdouble
16-bit floating point	torch.float16 or torch.half	torch.*.HalfTensor
16-bit floating point (bfloat16)	torch.bfloat16	torch.*.BFloat16Tensor
8-bit integer (unsigned)	torch.uint8	torch.*.ByteTensor
8-bit integer (signed)	torch.int8	torch.*.CharTensor
16-bit integer (signed)	torch.int16 or torch.short	torch.*.ShortTensor
32-bit integer (signed)	torch.int32 or torch.int	torch.*.IntTensor
64-bit integer (signed)	torch.int64 or torch.long	torch.*.LongTensor
Boolean	torch.bool	torch.*.BoolTensor
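Here is a short sketch of inspecting and converting dtypes. It assumes the default floating-point dtype has not been changed with `torch.set_default_dtype`, so Python floats become `float32`.

```python
import torch

a = torch.tensor([1, 2, 3])        # Python ints   -> torch.int64
b = torch.tensor([1.0, 2.0])       # Python floats -> default float dtype (float32)
print(a.dtype, b.dtype)            # torch.int64 torch.float32

c = a.to(torch.float64)            # explicit conversion with .to()
d = a.double()                     # shorthand for the same conversion
e = torch.zeros(2, dtype=torch.bfloat16)  # choose the dtype at creation time
print(c.dtype, d.dtype, e.dtype)   # torch.float64 torch.float64 torch.bfloat16
```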

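Creating tensors

Function	Description
Tensor(*size)	Basic constructor; creates a tensor of the given size (values are uninitialized)
Tensor(data)	Builds a tensor from data, similar to np.array
ones(*size)	Tensor filled with 1
zeros(*size)	Tensor filled with 0
eye(*size)	Ones on the diagonal, zeros elsewhere
arange(s,e,step)	Values from s to e with spacing step (e is excluded)
linspace(s,e,steps)	steps evenly spaced values from s to e (both ends included)
rand/randn(*size)	Uniform distribution on [0, 1) / standard normal distribution
normal(mean,std)/uniform_(from,to)	Normal distribution / uniform distribution
randperm(m)	Random permutation of the integers 0 to m-1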
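A few of the functions from the table are exercised below. The arguments are arbitrary. Note that `uniform_` is an in-place tensor method, so it needs an existing tensor to fill (here one made with `torch.empty`).

```python
import torch

print(torch.eye(3))                    # 3x3 identity matrix
print(torch.arange(0, 10, 3))          # tensor([0, 3, 6, 9]): the end value 10 is excluded
print(torch.linspace(0, 1, steps=5))   # tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000]): both ends included
print(torch.randperm(5))               # a random ordering of 0, 1, 2, 3, 4

# uniform_ fills an existing tensor in place with samples from U(from, to)
u = torch.empty(3).uniform_(-1, 1)
# normal draws samples from N(mean, std^2) with the requested size
n = torch.normal(mean=0.0, std=1.0, size=(2, 2))
```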

Ways to initialize a tensor

1. Directly from data. Tensors can be created directly from data; the data type is inferred automatically.


import torch
data = [[1, 2],[3, 4]]
x_data = torch.tensor(data)
x_data

tensor([[1, 2],[3, 4]])

x_data.dtype

torch.int64

2. From a NumPy array (and vice versa)

import numpy as np
np_array = np.array(data)
x_np = torch.from_numpy(np_array)

3. From another tensor. Unless explicitly overridden, the new tensor keeps the properties (shape, dtype) of the argument tensor.

x_ones = torch.ones_like(x_data) # keeps the properties (shape, dtype) of x_data
print(f"Ones Tensor: \n {x_ones} \n")
# x_data has dtype int64, and rand_like would keep that dtype by default.
# Random values in [0, 1) (what torch.rand() produces, as float32) need a floating-point dtype, so the dtype is overridden
x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the dtype of x_data
print(f"Random Tensor: \n {x_rand} \n")

Ones Tensor:
 tensor([[1, 1],[1, 1]])

Random Tensor:
 tensor([[0.3156, 0.5076],[0.8555, 0.4440]])

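4. With random or constant values

shape is a tuple of tensor dimensions. In the functions below, it determines the dimensions of the output tensor.

shape = (2,3,) # a shape tuple: 2 rows x 3 columns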
rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)
print(f"Random Tensor: \n {rand_tensor} \n")
print(f"Ones Tensor: \n {ones_tensor} \n")
print(f"Zeros Tensor: \n {zeros_tensor}")

Random Tensor:
 tensor([[0.5733, 0.8237, 0.1398],[0.9530, 0.9231, 0.2764]])

Ones Tensor:
 tensor([[1., 1., 1.],[1., 1., 1.]])

Zeros Tensor:
 tensor([[0., 0., 0.],[0., 0., 0.]])

5. Other creation methods

Build a tensor based on an existing one, but fill it with new values

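m = torch.ones(5,3, dtype=torch.double)
n = torch.rand_like(m, dtype=torch.float)
# Get the size of the tensor
print(m.size()) # torch.Size([5,3])
# Uniform distribution on [0, 1)
torch.rand(5,3)
# Standard normal distribution
torch.randn(5,3)
# Normal distribution with the given mean and standard deviation
torch.normal(mean=0.0, std=1.0, size=(5,3))
# Linearly spaced vector: a 1-D tensor of `steps` evenly spaced points from start to end (inclusive)
torch.linspace(start=1,end=10,steps=20)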

torch.Size([5, 3])
tensor([ 1.0000,  1.4737,  1.9474,  2.4211,  2.8947,  3.3684,  3.8421,  4.3158,4.7895,  5.2632,  5.7368,  6.2105,  6.6842,  7.1579,  7.6316,  8.1053,8.5789,  9.0526,  9.5263, 10.0000])

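Tensor attributes

Every Tensor has three attributes: torch.dtype, torch.device, and torch.layout.
torch.device identifies the device on which the tensor is stored after creation (CPU or GPU).
torch.layout describes how the tensor is laid out in memory (torch.strided for ordinary dense tensors).
Together, a tensor's attributes describe its shape, its data type, and the device on which it is stored.
From an object-oriented point of view, a tensor can be seen as an object with attributes and methods.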

tensor = torch.rand(3,4)
print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")

Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu
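The example above prints shape, dtype and device but not the layout. This sketch shows the layout attribute. Converting to a sparse COO tensor with `to_sparse()` is used here only as an illustration of a non-default layout.

```python
import torch

dense = torch.rand(3, 4)
print(dense.layout)        # torch.strided: an ordinary dense tensor
print(dense.stride())      # (4, 1): memory step for each dimension in a strided layout

sparse = dense.to_sparse() # a sparse COO copy holding the same values
print(sparse.layout)       # torch.sparse_coo
```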

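Tensor operations

The official documentation describes more than 100 tensor operations, including arithmetic, linear algebra, matrix manipulation (transposing, indexing, slicing), sampling, and more.
Each of these operations can run on the GPU (usually faster than on the CPU).

By default, tensors are created on the CPU.

We can explicitly move a tensor to the GPU with the .to() method (when a GPU is available).
Note that copying large tensors across devices can be expensive in both time and memory!

# Move the tensor to the GPU if one is available
if torch.cuda.is_available():
    tensor = tensor.to('cuda')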
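A common device-agnostic pattern is sketched below. You choose the device once, then create tensors on it or move them to it, and the same script runs with or without a GPU. Operands of an operation must be on the same device, and GPU tensors need `.cpu()` before they can be converted with `.numpy()`.

```python
import torch

# Pick the device once and reuse it everywhere
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.ones(2, 3, device=device)   # created directly on the chosen device
b = torch.rand(2, 3).to(device)       # created on the CPU, then moved
c = a + b                             # both operands are on the same device
print(c.device)

c_cpu = c.cpu()                       # bring the result back, e.g. before .numpy()
```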

Indexing and slicing tensors

tensor = torch.ones(4, 4) # create a 4x4 tensor of ones
print('First row: ', tensor[0]) # first row
print('First column: ', tensor[:, 0]) # first column
print('Last column:', tensor[..., -1]) # last column
tensor[:,1] = 0 # set the second column to 0
print(tensor)

First row:  tensor([1., 1., 1., 1.])
First column:  tensor([1., 1., 1., 1.])
Last column: tensor([1., 1., 1., 1.])
tensor([[1., 0., 1., 1.],[1., 0., 1., 1.],[1., 0., 1., 1.],[1., 0., 1., 1.]])
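Slicing works with NumPy-style ranges and steps, as sketched below. A basic slice is a view of the same memory, so writing through it changes the original tensor. The `arange(16)` values are chosen so each element's position is easy to read.

```python
import torch

t = torch.arange(16).reshape(4, 4)   # rows: [0..3], [4..7], [8..11], [12..15]
print(t[1:3, 1:3])    # rows 1-2, columns 1-2 -> values [[5, 6], [9, 10]]
print(t[:, ::2])      # every other column (columns 0 and 2)
print(t[-1])          # last row -> values [12, 13, 14, 15]

# A slice is a view: writing through it modifies t itself
view = t[0]
view[0] = 100
print(t[0, 0])        # tensor(100)
```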

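Concatenating tensors

torch.cat concatenates a sequence of tensors along a given dimension. torch.stack is a closely related function.

Method	Meaning	Signature
torch.cat	Concatenates the given sequence of tensors along an existing dimension	torch.cat(tensors, dim=0, *, out=None)
torch.stack	Concatenates a sequence of tensors along a new dimension	torch.stack(tensors, dim=0, *, out=None)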

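print(tensor) # print the original tensor
t1 = torch.cat([tensor, tensor, tensor], dim=1) # concatenate along columns
# The dim argument selects the dimension along which to concatenate:
#   dim=-1: along the last dimension
#   dim=0: along the first dimension (rows are stacked below each other)
#   dim=1: along the second dimension (columns are appended side by side)
#   dim=2: along the third dimension (depth, for 3-D tensors), and so on
print(t1)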

tensor([[1., 0., 1., 1.],[1., 0., 1., 1.],[1., 0., 1., 1.],[1., 0., 1., 1.]])
tensor([[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.]])
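The difference between the two functions in the table shows up in the result shapes. `cat` grows an existing dimension, while `stack` inserts a new one, so it needs inputs of identical shape. The 2×3 inputs are arbitrary.

```python
import torch

a = torch.zeros(2, 3)
b = torch.ones(2, 3)

print(torch.cat([a, b], dim=0).shape)    # torch.Size([4, 3]): rows appended
print(torch.cat([a, b], dim=1).shape)    # torch.Size([2, 6]): columns appended
print(torch.stack([a, b], dim=0).shape)  # torch.Size([2, 2, 3]): new leading dimension
print(torch.stack([a, b], dim=2).shape)  # torch.Size([2, 3, 2]): new last dimension
```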

Arithmetic operations

Addition

# Addition
t1 = torch.tensor([[1,2],[3,4]])
print(t1)
t2 = torch.tensor([[5,6],[7,6]])
print(t2)
t3 = t1 + t2
print(t3)
t4 = torch.add(t1, t2)
print(t4)
print(t1.add(t2))
print(t1)
#t1.add_(t2) # in-place version: would modify t1 itself

tensor([[1, 2],[3, 4]])
tensor([[5, 6],[7, 6]])
tensor([[ 6,  8],[10, 10]])
tensor([[ 6,  8],[10, 10]])
tensor([[ 6,  8],[10, 10]])
tensor([[1, 2],[3, 4]])
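The commented-out `add_` line above can be compared with `add` directly. `add` returns a new tensor and leaves `t1` unchanged, while `add_` writes the sum into `t1`.

```python
import torch

t1 = torch.tensor([[1, 2], [3, 4]])
t2 = torch.tensor([[5, 6], [7, 6]])

out = t1.add(t2)   # new tensor; t1 still holds [[1, 2], [3, 4]]
print(t1)

t1.add_(t2)        # in-place: t1 now holds [[6, 8], [10, 10]]
print(t1)
```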

Subtraction

# Subtraction
print(t1 - t2)
print(torch.sub(t1, t2))
print(t1.sub(t2))
print(t1)

tensor([[-4, -4],[-4, -2]])
tensor([[-4, -4],[-4, -2]])
tensor([[-4, -4],[-4, -2]])
tensor([[1, 2],[3, 4]])

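Multiplication

There are several ways to compute the matrix product of two tensors; y1, y2, and y3 below end up with the same value.
For 2-D matrix multiplication you can use torch.mm(), torch.matmul(), or the @ operator; torch.mm() accepts only 2-D matrices.

For higher-dimensional tensors (dim > 2), matrix multiplication is applied to the last two dimensions only, and the leading (batch) dimensions must match or be broadcastable, much like indexing into a stack of matrices. Of these options, only torch.matmul() and the @ operator (which calls it) support this case.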

print(tensor) # print the original tensor
y1 = tensor @ tensor.T
print(y1) # equivalent to tensor.matmul(tensor.T)
y2 = tensor.matmul(tensor.T)
print(y2) # equivalent to tensor @ tensor.T
y3 = torch.rand_like(tensor) # random tensor with the same shape as tensor (preallocates y3)
torch.matmul(tensor, tensor.T, out=y3) # write the result into y3
print(y3)

tensor([[1., 0., 1., 1.],[1., 0., 1., 1.],[1., 0., 1., 1.],[1., 0., 1., 1.]])
tensor([[3., 3., 3., 3.],[3., 3., 3., 3.],[3., 3., 3., 3.],[3., 3., 3., 3.]])
tensor([[3., 3., 3., 3.],[3., 3., 3., 3.],[3., 3., 3., 3.],[3., 3., 3., 3.]])
tensor([[3., 3., 3., 3.],[3., 3., 3., 3.],[3., 3., 3., 3.],[3., 3., 3., 3.]])

# Matrix multiplication of higher-dimensional tensors
t5 = torch.ones(1,2,3,4)
print(t5)
t6 = torch.ones(1,2,4,3)
print(t6)
print(t5.matmul(t6)) # result shape: torch.Size([1, 2, 3, 3])
print(torch.matmul(t5, t6)) # result shape: torch.Size([1, 2, 3, 3])

tensor([[[[1., 1., 1., 1.],[1., 1., 1., 1.],[1., 1., 1., 1.]],[[1., 1., 1., 1.],[1., 1., 1., 1.],[1., 1., 1., 1.]]]])
tensor([[[[1., 1., 1.],[1., 1., 1.],[1., 1., 1.],[1., 1., 1.]],[[1., 1., 1.],[1., 1., 1.],[1., 1., 1.],[1., 1., 1.]]]])
tensor([[[[4., 4., 4.],[4., 4., 4.],[4., 4., 4.]],[[4., 4., 4.],[4., 4., 4.],[4., 4., 4.]]]])
tensor([[[[4., 4., 4.],[4., 4., 4.],[4., 4., 4.]],[[4., 4., 4.],[4., 4., 4.],[4., 4., 4.]]]])
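The batch dimensions only need to be broadcastable, not identical. In the sketch below, one 2-D matrix is applied to every matrix in a batch. The same call fails with `torch.mm`, which accepts only 2-D inputs. The shapes are arbitrary.

```python
import torch

a = torch.ones(2, 3, 4)     # a batch of two 3x4 matrices
b = torch.ones(4, 5)        # a single 4x5 matrix

c = torch.matmul(a, b)      # b is broadcast over the batch dimension
print(c.shape)              # torch.Size([2, 3, 5])

mm_failed = False
try:
    torch.mm(a, b)          # torch.mm rejects 3-D input
except RuntimeError as err:
    mm_failed = True
    print("torch.mm failed:", err)
```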

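Several ways to compute the element-wise product of tensors; z1, z2, and z3 below end up with the same value.
This is the Hadamard product (element-wise: corresponding elements are multiplied).

print(tensor) # print the original tensor
z1 = tensor * tensor # element-wise product
print(z1) # equivalent to tensor.mul(tensor)
z2 = tensor.mul(tensor) # element-wise product
print(z2) # equivalent to tensor * tensor
z3 = torch.rand_like(tensor) # random tensor with the same shape as tensor (preallocates z3)
torch.mul(tensor, tensor, out=z3) # write the result into z3
print(z3)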

tensor([[1., 0., 1., 1.],[1., 0., 1., 1.],[1., 0., 1., 1.],[1., 0., 1., 1.]])
tensor([[1., 0., 1., 1.],[1., 0., 1., 1.],[1., 0., 1., 1.],[1., 0., 1., 1.]])
tensor([[1., 0., 1., 1.],[1., 0., 1., 1.],[1., 0., 1., 1.],[1., 0., 1., 1.]])
tensor([[1., 0., 1., 1.],[1., 0., 1., 1.],[1., 0., 1., 1.],[1., 0., 1., 1.]])
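With the all-ones-and-zeros tensor above, the element-wise product happens to equal the input, which hides the difference from the matrix product. A small 2×2 example makes the contrast between `*` and `@` visible.

```python
import torch

m = torch.tensor([[1, 2], [3, 4]])

print(m * m)   # element-wise (Hadamard): values [[1, 4], [9, 16]]
print(m @ m)   # matrix product: values [[7, 10], [15, 22]]
```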

Division

# Division
print(t1 / t2)
print(torch.div(t1, t2))
print(t1.div(t2))
print(t1)

tensor([[0.2000, 0.3333],[0.4286, 0.6667]])
tensor([[0.2000, 0.3333],[0.4286, 0.6667]])
tensor([[0.2000, 0.3333],[0.4286, 0.6667]])
tensor([[1, 2],[3, 4]])
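As the output above shows, `/` performs true division and returns floats even for integer tensors. For integer-style division, `torch.div` accepts a `rounding_mode` argument. The sketch assumes a recent PyTorch release, in which `//` rounds down like `rounding_mode="floor"`. The negative value shows where the two modes differ.

```python
import torch

a = torch.tensor([7, -7])
b = torch.tensor([2, 2])

print(a / b)                                   # true division: [3.5, -3.5]
print(torch.div(a, b, rounding_mode="trunc"))  # rounds toward zero: [3, -3]
print(torch.div(a, b, rounding_mode="floor"))  # rounds down: [3, -4]
print(a // b)                                  # floor division in recent versions: [3, -4]
```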

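Exponentiation

Use torch.pow(tensor, 2) (or tensor.pow(2)) or the ** operator.
For the exponential function e^x, use torch.exp(tensor).

print(t1)
print(torch.pow(t1, 2)) # square every element
print(t1.pow(2)) # square every element
print(t1**2) # square every element
#print(t1.pow_(2)) # in-place version: squares the elements of t1 itself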

tensor([[1, 2],[3, 4]])
tensor([[ 1,  4],[ 9, 16]])
tensor([[ 1,  4],[ 9, 16]])
tensor([[ 1,  4],[ 9, 16]])
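The text mentions `torch.exp` without an example; here is one. It also shows that `**` works with a scalar base and a tensor exponent.

```python
import torch

x = torch.tensor([0.0, 1.0, 2.0])
print(torch.exp(x))   # e^x element-wise: approximately [1.0000, 2.7183, 7.3891]
print(x.exp())        # method form, same result
print(2 ** x)         # scalar base, tensor exponent: [1., 2., 4.]
```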

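Square root

tensor.sqrt()   # returns a new tensor
tensor.sqrt_()  # in-place: overwrites tensor

Logarithms

torch.log2(tensor)   # base-2 logarithm
torch.log10(tensor)  # base-10 logarithm
torch.log(tensor)    # natural logarithm (base e)
torch.log_(tensor)   # in-place natural logarithm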
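The square-root and logarithm calls listed above are applied to concrete values below. Powers of 2 and 10 are used so the results are easy to check. The last lines show that `sqrt()` on an integer tensor returns a new floating-point tensor. The in-place `sqrt_()` could not store that float result back into an integer tensor, so it is only used on a float tensor here.

```python
import torch

x = torch.tensor([1.0, 4.0, 16.0])
print(x.sqrt())                                        # [1., 2., 4.]
print(torch.log2(x))                                   # [0., 2., 4.]
print(torch.log10(torch.tensor([1.0, 10.0, 100.0])))   # [0., 1., 2.]
print(torch.log(torch.exp(torch.tensor([1.0, 2.0]))))  # the natural log undoes exp: about [1., 2.]

y = torch.tensor([1.0, 4.0, 16.0])
y.sqrt_()                             # in-place: y itself becomes [1., 2., 4.]

print(torch.tensor([4, 9]).sqrt())    # integer input, float output: tensor([2., 3.])
```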

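Single-element tensors

If you have a single-element tensor, for example one obtained by aggregating all the values of a tensor, you can convert it to a Python number with the item() method.

print(tensor) # print the original tensor
agg = tensor.sum() # sum of all elements
print(agg)
agg_item = agg.item() # convert the single-element tensor to a Python number
print(agg_item, type(agg_item)) # print the value and the type of agg_item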

tensor([[1., 0., 1., 1.],[1., 0., 1., 1.],[1., 0., 1., 1.],[1., 0., 1., 1.]])
tensor(12.)
12.0 <class 'float'>
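`item()` works only when the tensor has exactly one element. For a multi-element tensor it raises an error, and `tolist()` is the usual way to get plain Python values instead. The sketch catches both `RuntimeError` and `ValueError` because it makes no assumption about the exact exception type.

```python
import torch

t = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
mean = t.mean()          # a single-element (0-D) tensor
print(mean.item())       # 2.5 as a plain Python float

item_failed = False
try:
    t.item()             # four elements: cannot become one Python number
except (RuntimeError, ValueError) as err:
    item_failed = True
    print("item() failed:", err)

print(t.tolist())        # nested Python lists: [[1.0, 2.0], [3.0, 4.0]]
```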

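In-place operations

Operations that store their result back into the operand are called in-place operations; the idea is the same as the inplace parameter in pandas. In PyTorch, these operations are denoted by a trailing underscore _ in the function name.
For example, x.copy_(y) and x.t_() change the value of x itself.

In-place operations save some memory, but they can cause problems when computing derivatives, because the history needed for the gradient is lost immediately. Their use is therefore discouraged.

x = torch.tensor([(1, 2, 3), (4, 5, 6), (7, 8, 9)])
print(x)
x.add_(2) # add 2 to every element in place
print(x) # print the new value of x
# Note: any operation ending in `_` replaces the original tensor with the result, e.g. x.copy_(y) and x.t_() modify `x`.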

tensor([[1, 2, 3],[4, 5, 6],[7, 8, 9]])
tensor([[ 3,  4,  5],[ 6,  7,  8],[ 9, 10, 11]])
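The derivative problems mentioned above can be reproduced in two small cases. First, autograd refuses in-place changes to a leaf tensor that requires gradients. Second, modifying a value that a backward function saved (`exp` reuses its own output) makes the later `backward()` call fail. The tensors are arbitrary.

```python
import torch

# Case 1: in-place change to a leaf tensor that requires grad
leaf_failed = False
w = torch.ones(3, requires_grad=True)
try:
    w.add_(1)
except RuntimeError as err:
    leaf_failed = True
    print("leaf in-place failed:", err)

# Case 2: overwriting a value saved for the backward pass
backward_failed = False
x = torch.ones(3, requires_grad=True)
y = x.exp()          # the gradient of exp is computed from its output y
y.add_(1)            # in-place change invalidates the saved y
try:
    y.sum().backward()
except RuntimeError as err:
    backward_failed = True
    print("backward failed:", err)
```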

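Converting to and from NumPy

A tensor on the CPU and a NumPy array obtained from it with .numpy() (or converted with torch.from_numpy()) share the same underlying memory, so changing one also changes the other.

Tensor to NumPy array

t1 = torch.ones(6) # create a tensor
print(f"t1:{t1}") # the f-string formats the tensor's contents into the string
n1 = t1.numpy() # convert the tensor to a NumPy array
print(f"n1:{n1}") # print the NumPy array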

t1:tensor([1., 1., 1., 1., 1., 1.])
n1:[1. 1. 1. 1. 1. 1.]

t1.add_(1) # add 1 to every element in place
print(f"t1:{t1}") # print the tensor
print(f"n1:{n1}") # print the NumPy array (it changed too)

t1:tensor([2., 2., 2., 2., 2., 2.])
n1:[2. 2. 2. 2. 2. 2.]

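NumPy array to tensor

n2 = np.ones(5) # create a NumPy array
print(f"n2:{n2}") # print the NumPy array
t2 = torch.from_numpy(n2) # convert the NumPy array to a tensor
print(f"t2:{t2}") # print the tensor
# The NumPy array and the PyTorch tensor share their underlying memory, so changing one changes the other.
np.add(n2,1,out=n2) # add 1 to every element in place
print(f"t2:{t2}") # print the tensor
print(f"n2:{n2}") # print the NumPy array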

n2:[1. 1. 1. 1. 1.]
t2:tensor([1., 1., 1., 1., 1.], dtype=torch.float64)
t2:tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
n2:[2. 2. 2. 2. 2.]
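Memory sharing depends on how the tensor is created. `torch.from_numpy` wraps the array's memory, whereas `torch.tensor` always copies the data. The sketch contrasts the two.

```python
import numpy as np
import torch

arr = np.zeros(3)

shared = torch.from_numpy(arr)   # shares memory with arr
copied = torch.tensor(arr)       # independent copy of the data

arr[0] = 5.0
print(shared)   # first element is now 5
print(copied)   # still all zeros
```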

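Computation graphs

Before going further with PyTorch, we need to understand one concept: the computation graph. Every deep learning framework relies on a computation graph to compute the gradients that gradient descent uses to update the model's parameters.
Creating and using a computation graph usually involves two parts:

The user builds the forward-propagation graph
The framework handles backward propagation (computing the gradients used for parameter updates)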
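The two parts can be seen in a tiny example. The user writes an ordinary forward expression, autograd records it (visible through `grad_fn`), and `backward()` fills in the gradients. The scalar values are arbitrary.

```python
import torch

# Forward pass written by the user; autograd records it as a graph
x = torch.tensor(3.0)
w = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)
y = w * x + b          # y = 7
print(y.grad_fn)       # the graph node that produced y (an AddBackward0 object)

# Backward pass handled by the framework
y.backward()
print(w.grad)          # dy/dw = x = 3
print(b.grad)          # dy/db = 1
```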

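From simple models to complex ones, both PyTorch and TensorFlow use computation graphs to do the work.
However, the graphs the two frameworks use differ:
TensorFlow 1.x uses static computation graphs, while TensorFlow 2.x and PyTorch use dynamic computation graphs.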

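Static computation graphs

Build the graph first, then run it; this allows the compiler to optimize the graph.
It usually involves two phases:

Define an architecture (possibly using basic control flow such as loops and conditionals)
Run data through it to train the model or to perform inference

Pros: allows powerful offline optimization and scheduling of the graph, so execution is relatively fast.
Cons: hard to debug, and handling structured or variable-sized data in code is relatively complicated.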

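Dynamic computation graphs

The program runs as soon as it is written.
The graph is defined implicitly (built dynamically) while the forward computation executes.

Pros: flexible, minimally intrusive, and allows the graph to be built and evaluated on the fly
Cons: harder to optimize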

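Comparing the two, dynamic graphs are debugging-friendly (programmer-friendly): they let you execute code line by line with access to every tensor, which makes it easier to find problems in your computation or logic.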
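Because the graph is built while the forward pass runs, ordinary Python control flow decides which operations end up in it. In the sketch below, the hypothetical function `f` takes a different branch depending on its input, and each call gets the gradient of the branch that actually ran.

```python
import torch

def f(x):
    # Hypothetical example function: the branch taken decides the graph
    if x.sum() > 0:
        return (x ** 2).sum()
    return (x ** 3).sum()

a = torch.tensor([1.0, 2.0], requires_grad=True)
f(a).backward()
print(a.grad)    # gradient of sum(x^2) is 2x -> [2., 4.]

b = torch.tensor([-1.0, -2.0], requires_grad=True)
f(b).backward()
print(b.grad)    # gradient of sum(x^3) is 3x^2 -> [3., 12.]
```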

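Visualizing PyTorch computation graphs

This can be done with torchviz, a third-party package (rendering the image also requires the Graphviz software to be installed).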

import torch
from torchviz import make_dot  # third-party: pip install torchviz

# Define matrix A, vectors b and x, and constant c
A = torch.randn(10, 10, requires_grad=True)  # requires_grad=True: record operations so gradients w.r.t. A can be computed
b = torch.randn(10, requires_grad=True)
c = torch.randn(1, requires_grad=True)
x = torch.randn(10, requires_grad=True)
# Compute x^T A + b^T x + c
result = torch.matmul(x, A) + torch.matmul(b, x) + c
# Build the graph from the result, labeling the parameter nodes
dot = make_dot(result, params={'A': A, 'b': b, 'c': c, 'x': x})
# Render the graph to expression.png
dot.render('expression', format='png', cleanup=True, view=False)
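If torchviz or Graphviz is not available, you can get a rough text view of the same graph by walking `grad_fn.next_functions`, the node structure that autograd records. The helper `print_graph` is a hypothetical sketch written for this example, not part of PyTorch. Node names such as `AddBackward0` can vary between PyTorch versions, so no exact output is shown.

```python
import torch

def print_graph(fn, depth=0):
    # Hypothetical helper: depth-first walk over autograd nodes
    if fn is None:            # inputs that do not require grad have no node
        return
    label = type(fn).__name__
    if hasattr(fn, "variable"):   # AccumulateGrad nodes point at leaf tensors
        label += f" {tuple(fn.variable.shape)}"
    print("  " * depth + label)
    for next_fn, _ in fn.next_functions:
        print_graph(next_fn, depth + 1)

A = torch.randn(10, 10, requires_grad=True)
b = torch.randn(10, requires_grad=True)
c = torch.randn(1, requires_grad=True)
x = torch.randn(10, requires_grad=True)
result = torch.matmul(x, A) + torch.matmul(b, x) + c   # x^T A + b^T x + c

print_graph(result.grad_fn)
```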
