
【AI】Sklearn

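news Source: original 2025/7/16 3:17:42

Updated on an ongoing basis; following, bookmarking, and liking are appreciated.

Related links:
AI中的数学_线代微积分概率论最优化 (Math for AI: linear algebra, calculus, probability, optimization)
Python
numpy_pandas_matplotlib_spicy

Suggested learning path: machine learning -> deep learning -> reinforcement learning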

Preprocessing

Model selection

Classification

Example: binary classification competition + grid search

import numpy as np
import pandas as pd

train_data = pd.read_csv('train_data.csv')
train_data.head()
# train_data
train_data.drop(['ID'], inplace=True, axis=1)
train_data.head()

# Split the training data into the input features and the target column 'y'
train_X = train_data.iloc[:, train_data.columns != 'y'].copy()
print(train_X.head())
train_y = train_data.iloc[:, train_data.columns == 'y']
print(train_y.head())

test_data = pd.read_csv('test_set.csv')
test_data.head()
test_data.drop(['ID'], inplace=True, axis=1)
test_data.head()

# Feature extraction
# LabelEncoder
# pd.Categorical(...).codes gives the integer code of each original value, which amounts to label encoding.
# Reference: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Categorical.html
c = ['A','A','A','B','B','C','C','C','C']
category = pd.Categorical(c)
# Inspect the integer codes of the categories
print(category.codes)  # [0 0 0 1 1 2 2 2 2]
print(category.dtype)  # category

# pd.factorize performs the same kind of encoding
job_feature = train_X['job'].unique()  # distinct values
# print(job_feature)
len(job_feature)
example = train_X  # alias, so the assignment below also changes train_X
example['job'], uniques = pd.factorize(example['job'])
# pd.factorize: Encode the object as an enumerated type or categorical variable.
print(pd.factorize(example['job']))
# print(example['job'])
# example.head()

train_X['job'] = train_X['job'] + 1

marital_feature = train_X['marital'].unique()
print(marital_feature)
len(marital_feature)
train_X['marital'], unique = pd.factorize(train_X['marital'])
train_X['marital'] = train_X['marital'] + 1
train_X.head()

education_feature = train_X['education'].unique()
print(education_feature)
len(education_feature)
train_X['education'], unique = pd.factorize(train_X['education'])
train_X['education'] = train_X['education'] + 1
train_X.head()

contact_feature = train_X['contact'].unique()
print(contact_feature)
len(contact_feature)
train_X['contact'], unique = pd.factorize(train_X['contact'])
train_X['contact'] = train_X['contact'] + 1
train_X.head()

month_feature = train_X['month'].unique()
print(month_feature)
len(month_feature)
train_X['month'], unique = pd.factorize(train_X['month'])
train_X['month'] = train_X['month'] + 1
train_X.head()

poutcome_feature = train_X['poutcome'].unique()
print(poutcome_feature)
len(poutcome_feature)
train_X['poutcome'], unique = pd.factorize(train_X['poutcome'])
train_X['poutcome'] = train_X['poutcome'] + 1
train_X.head()

default_feature = train_X['default'].unique()
print(default_feature)
len(default_feature)
train_X['default'], unique = pd.factorize(train_X['default'])
train_X['default'] = train_X['default'] + 1
train_X.head()

housing_feature = train_X['housing'].unique()
print(housing_feature)
len(housing_feature)
train_X['housing'], unique = pd.factorize(train_X['housing'])
train_X['housing'] = train_X['housing'] + 1
train_X.head()

loan_feature = train_X['loan'].unique()
print(loan_feature)
len(loan_feature)
train_X['loan'], unique = pd.factorize(train_X['loan'])
train_X['loan'] = train_X['loan'] + 1
train_X.head()

# Convert the test set to numeric codes in the same way.
# Note: pd.factorize numbers values in order of first appearance, so factorizing the test set
# separately only reproduces the training codes if the values first appear in the same order in both files.
test_data.head()
test_data['job'], jnum = pd.factorize(test_data['job'])
test_data['job'] = test_data['job'] + 1
test_data.head()

test_data['marital'], jnum = pd.factorize(test_data['marital'])
test_data['marital'] = test_data['marital'] + 1

test_data['education'], jnum = pd.factorize(test_data['education'])
test_data['education'] = test_data['education'] + 1

test_data['default'], jnum = pd.factorize(test_data['default'])
test_data['default'] = test_data['default'] + 1

test_data['housing'], jnum = pd.factorize(test_data['housing'])
test_data['housing'] = test_data['housing'] + 1

test_data['loan'], jnum = pd.factorize(test_data['loan'])
test_data['loan'] = test_data['loan'] + 1

test_data['contact'], jnum = pd.factorize(test_data['contact'])
test_data['contact'] = test_data['contact'] + 1

test_data['month'], jnum = pd.factorize(test_data['month'])
test_data['month'] = test_data['month'] + 1

test_data['poutcome'], jnum = pd.factorize(test_data['poutcome'])
test_data['poutcome'] = test_data['poutcome'] + 1

test_data.head()

# LogisticRegression
from sklearn.linear_model import LogisticRegression
LR = LogisticRegression()
LR.fit(train_X, train_y.values.ravel())  # pass the labels as a 1-D array
# Predict on the test set
test_y = LR.predict(test_data)
test_y
df_test = pd.read_csv('test_set.csv')
df_test['pred'] = test_y.tolist()
df_result = df_test.loc[:, ['ID', 'pred']]
# Save the result
df_result.to_csv('LR.csv', index=False)

# SVM
from sklearn.svm import LinearSVC
classifierSVM = LinearSVC()
classifierSVM.fit(train_X, train_y.values.ravel())
test_ySVM = classifierSVM.predict(test_data)
df_test = pd.read_csv('test_set.csv')
df_test['pred'] = test_ySVM.tolist()
df_result = df_test.loc[:, ['ID', 'pred']]
df_result.to_csv('LSVM.csv', index=False)

# knn (code not shown; produces test_yKNN)
# decision tree (code not shown; produces test_yTree)

# Average the predictions of the four models
test_yAver = (test_y + test_ySVM + test_yKNN + test_yTree) / 4
test_yAver  # array([0.  , 0.  , 0.  , ..., 0.25, 0.  , 0.25])
df_test = pd.read_csv('test_set.csv')
df_test['pred'] = test_yAver.tolist()
df_result = df_test.loc[:, ['ID', 'pred']]
df_result.to_csv('Aver.csv', index=False)

# Improving generalization
'''
GridSearchCV (grid search)
Exhaustive search over specified parameter values for an estimator.
The parameters of the estimator are optimized by cross-validated grid search over a parameter grid.

param_grid:
Dictionary with parameters names (str) as keys and lists of parameter settings to try as values, or a list of such dictionaries, in which case the grids spanned by each dictionary in the list are explored. This enables searching over any sequence of parameter settings.
e.g. {'n_estimators': list(range(10, 401, 10))}
Each candidate is one dict such as {'n_estimators': x}, with x taken from the list in order.

scoring: Strategy to evaluate the performance of the cross-validated model on the test set.

cv: Determines the cross-validation splitting strategy.

n_estimators:
The number of boosting stages to perform (in a random forest, the number of trees in the forest).
Gradient boosting is fairly robust to over-fitting, so a large number usually results in better performance.
Values must be in the range [1, inf).

min_samples_split:
The minimum number of samples required to split an internal node.

min_samples_leaf:
The minimum number of samples required to be at a leaf node.
A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches.
This may have the effect of smoothing the model, especially in regression.
--------------
GradientBoostingClassifier
Built on decision trees (DT).

subsample: The fraction of samples to be used for fitting the individual base learners.

max_features: The number of features to consider when looking for the best split.
Choosing max_features < n_features leads to a reduction of variance and an increase in bias.
The search for a split does not stop until at least one valid partition of the node samples is found, even if it requires effectively inspecting more than max_features features.

random_state: Controls the random seed given to each Tree estimator at each boosting iteration. In addition, it controls the random permutation of the features at each split (see Notes for more details).
'''
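from sklearn import metrics
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# train_y2 is never defined in the original listing; it is assumed here to be the 1-D label array
train_y2 = train_y.values.ravel()

# Grid search over the number of boosting iterations
param_test1 = {'n_estimators': list(range(10, 401, 10))}
gsearch1 = GridSearchCV(estimator=GradientBoostingClassifier(learning_rate=0.1, max_features=None, subsample=0.8, random_state=10), param_grid=param_test1, scoring='roc_auc', cv=3)
gsearch1.fit(train_X.values, train_y2)
# Recent scikit-learn releases report per-candidate results in cv_results_
# (the older grid_scores_ attribute and the iid argument have been removed)
gsearch1.cv_results_, gsearch1.best_params_, gsearch1.best_score_
## {'n_estimators': 350}, 0.8979275309747781)
## With a suitable number of iterations found, start tuning the decision-tree parameters.
'''
cv_results_:
Mean/std test score and params for every candidate.

best_params_:
e.g. {'n_estimators': 350}, i.e. the candidate with 350 iterations.
Parameter setting that gave the best results on the hold out data.

best_score_:
Mean cross-validated score of the best_estimator.
'''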
# Grid search over max_depth and min_samples_split
param_test2 = {'max_depth': list(range(3, 14, 2)), 'min_samples_split': list(range(20, 100, 10))}
gsearch2 = GridSearchCV(estimator=GradientBoostingClassifier(learning_rate=0.1, n_estimators=350, min_samples_leaf=20, max_features=None, subsample=0.8, random_state=10), param_grid=param_test2, scoring='roc_auc', cv=3)
gsearch2.fit(train_X.values, train_y2)
gsearch2.cv_results_, gsearch2.best_params_, gsearch2.best_score_
# {'max_depth': 3, 'min_samples_split': 90}, 0.8973756708021962)
'''
The tree depth can be fixed at this point,
but min_samples_split, the minimum number of samples needed to split an internal node, cannot be fixed yet,
because it interacts with other tree parameters. Next, min_samples_split is tuned together with
min_samples_leaf, the minimum number of samples at a leaf node.
'''
param_test3 = {'min_samples_split': list(range(80, 1080, 100)), 'min_samples_leaf': list(range(60, 101, 10))}
gsearch3 = GridSearchCV(estimator=GradientBoostingClassifier(learning_rate=0.1, n_estimators=350, max_depth=3, max_features=None, subsample=0.8, random_state=10), param_grid=param_test3, scoring='roc_auc', cv=3)
gsearch3.fit(train_X.values, train_y2)
gsearch3.cv_results_, gsearch3.best_params_, gsearch3.best_score_
## {'min_samples_leaf': 60, 'min_samples_split': 280}, 0.8976660805899851)

## After tuning, plug the parameters into GBDT and check the result
gbm1 = GradientBoostingClassifier(learning_rate=0.1, n_estimators=350, max_depth=3, min_samples_leaf=60, min_samples_split=280, max_features=None, subsample=0.8, random_state=10)
gbm1.fit(train_X.values, train_y2)
y_pred = gbm1.predict(train_X.values)
y_predprob = gbm1.predict_proba(train_X.values)[:, 1]
print("Accuracy : %.4g" % metrics.accuracy_score(train_y.values, y_pred))
print("AUC score(Train):%f" % metrics.roc_auc_score(train_y, y_predprob))

## Grid search over the maximum number of features, max_features
param_test4 = {'max_features': list(range(4, 16, 2))}
gsearch4 = GridSearchCV(estimator=GradientBoostingClassifier(learning_rate=0.1, n_estimators=350, max_depth=3, min_samples_leaf=60, min_samples_split=280, subsample=0.8, random_state=10), param_grid=param_test4, scoring='roc_auc', cv=3)
gsearch4.fit(train_X.values, train_y2)
gsearch4.cv_results_, gsearch4.best_params_, gsearch4.best_score_
## {'max_features': 14}, 0.8971037288653009)

## Grid search over the subsampling fraction
param_test5 = {'subsample': [0.6, 0.7, 0.75, 0.8, 0.85, 0.9]}
gsearch5 = GridSearchCV(estimator=GradientBoostingClassifier(learning_rate=0.1, n_estimators=350, max_depth=3, min_samples_leaf=60, min_samples_split=280, max_features=14, random_state=10), param_grid=param_test5, scoring='roc_auc', cv=3)
gsearch5.fit(train_X.values, train_y2)
gsearch5.cv_results_, gsearch5.best_params_, gsearch5.best_score_
## {'subsample': 0.85}, 0.8976770026809427)

# All tuned parameters are now available. To improve generalization, halve the learning rate
# and double the maximum number of iterations.
gbm2 = GradientBoostingClassifier(learning_rate=0.05, n_estimators=350, max_depth=3, min_samples_leaf=60, min_samples_split=280, max_features=14, subsample=0.85, random_state=10)
gbm2.fit(train_X.values, train_y2)
y_pred = gbm2.predict(train_X.values)
y_predprob = gbm2.predict_proba(train_X.values)[:, 1]
print("Accuracy : %.4g" % metrics.accuracy_score(train_y.values, y_pred))
print("AUC Score(Train): %f" % metrics.roc_auc_score(train_y, y_predprob))

gbm5 = GradientBoostingClassifier(learning_rate=0.05, n_estimators=700, max_depth=3, min_samples_leaf=60, min_samples_split=280, max_features=14, subsample=0.85, random_state=10)
gbm5.fit(train_X.values, train_y2)
y_pred = gbm5.predict(train_X.values)
y_predprob = gbm5.predict_proba(train_X.values)[:, 1]
print("Accuracy : %.4g" % metrics.accuracy_score(train_y.values, y_pred))
print("AUC Score(Train): %f" % metrics.roc_auc_score(train_y, y_predprob))

# Reduce the learning rate further
gbm3 = GradientBoostingClassifier(learning_rate=0.01, n_estimators=350, max_depth=3, min_samples_leaf=60, min_samples_split=280, max_features=14, subsample=0.85, random_state=10)
gbm3.fit(train_X.values, train_y2)
y_pred = gbm3.predict(train_X.values)
y_predprob = gbm3.predict_proba(train_X.values)[:, 1]
print("Accuracy : %.4g" % metrics.accuracy_score(train_y.values, y_pred))
print("AUC Score(Train): %f" % metrics.roc_auc_score(train_y, y_predprob))

# Keep the smaller learning rate and increase the number of iterations
gbm4 = GradientBoostingClassifier(learning_rate=0.01, n_estimators=600, max_depth=3, min_samples_leaf=60, min_samples_split=280, max_features=14, subsample=0.85, random_state=10)
gbm4.fit(train_X.values, train_y2)
y_pred = gbm4.predict(train_X.values)
y_predprob = gbm4.predict_proba(train_X.values)[:, 1]
print("Accuracy : %.4g" % metrics.accuracy_score(train_y.values, y_pred))
print("AUC Score(Train): %f" % metrics.roc_auc_score(train_y, y_predprob))

# Reduce the learning rate again and increase the number of iterations
gbm6 = GradientBoostingClassifier(learning_rate=0.005, n_estimators=1200, max_depth=3, min_samples_leaf=60, min_samples_split=280, max_features=14, subsample=0.85, random_state=10)
gbm6.fit(train_X.values, train_y2)
y_pred = gbm6.predict(train_X.values)
y_predprob = gbm6.predict_proba(train_X.values)[:, 1]
print("Accuracy : %.4g" % metrics.accuracy_score(train_y.values, y_pred))
print("AUC Score(Train): %f" % metrics.roc_auc_score(train_y, y_predprob))

gbm7 = GradientBoostingClassifier(learning_rate=0.05, n_estimators=1200, max_depth=3, min_samples_leaf=60, min_samples_split=280, max_features=14, subsample=0.85, random_state=10)
gbm7.fit(train_X.values, train_y2)
y_pred = gbm7.predict(train_X.values)
y_predprob = gbm7.predict_proba(train_X.values)[:, 1]
print("Accuracy : %.4g" % metrics.accuracy_score(train_y.values, y_pred))
print("AUC Score(Train): %f" % metrics.roc_auc_score(train_y, y_predprob))

gbm8 = GradientBoostingClassifier(learning_rate=0.01, n_estimators=1200, max_depth=3, min_samples_leaf=60, min_samples_split=280, max_features=14, subsample=0.85, random_state=10)
gbm8.fit(train_X.values, train_y2)
y_pred = gbm8.predict(train_X.values)
y_predprob = gbm8.predict_proba(train_X.values)[:, 1]
print("Accuracy : %.4g" % metrics.accuracy_score(train_y.values, y_pred))
print("AUC Score(Train): %f" % metrics.roc_auc_score(train_y, y_predprob))

# Of the settings tried, gbm7 has the highest accuracy on the training set (0.954668), so save its predictions
test_y_predprob = gbm7.predict_proba(test_data.values)[:, 1]
df_test['pred'] = test_y_predprob.tolist()
df_result = df_test.loc[:, ['ID', 'pred']]
df_result.to_csv('GBDToptimiza.csv', index=False)
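The workflow above can be sketched end to end on synthetic data. This is only an illustration under stated assumptions: the column names `job` and `balance`, the tiny parameter grid, and the helper `encode` are all hypothetical stand-ins, not the competition data or the author's settings. It shows two things the listing above leaves implicit: learning the category-to-code mapping once on the training split and reusing it, and reading per-candidate scores from `cv_results_` before checking AUC on data held out from the search.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the competition data (hypothetical columns)
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "job": rng.choice(["admin", "services", "student"], size=n),
    "balance": rng.normal(size=n),
})
signal = df["balance"] + (df["job"] == "student").astype(float)
df["y"] = ((signal + rng.normal(scale=0.5, size=n)) > 0.5).astype(int)

train_df, valid_df = train_test_split(
    df, test_size=0.25, random_state=0, stratify=df["y"]
)

# Learn the category -> code mapping on the training split only
job_categories = pd.Categorical(train_df["job"]).categories

def encode(frame):
    """Encode 'job' with the training categories; unseen values become -1."""
    out = frame[["balance"]].copy()
    out["job"] = pd.Categorical(frame["job"], categories=job_categories).codes
    return out

X_train, y_train = encode(train_df), train_df["y"]
X_valid, y_valid = encode(valid_df), valid_df["y"]

search = GridSearchCV(
    estimator=GradientBoostingClassifier(random_state=10),
    param_grid={"n_estimators": [20, 50], "max_depth": [2, 3]},
    scoring="roc_auc",
    cv=3,
)
search.fit(X_train, y_train)

# One row per candidate: parameters plus mean/std of the cross-validated AUC
results = pd.DataFrame(search.cv_results_)[
    ["params", "mean_test_score", "std_test_score"]
]
print(results)
print(search.best_params_, search.best_score_)

# AUC on rows the search never saw
valid_auc = roc_auc_score(y_valid, search.predict_proba(X_valid)[:, 1])
print("Hold-out AUC: %.4f" % valid_auc)
```

Scoring on a held-out split rather than on the training rows gives a number that is comparable to `best_score_`, which is itself an average over validation folds.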

Example: MNIST digit classification

This example uses logistic regression.
Note that the accuracy of this l1-penalized linear model is significantly below what can be reached by an l2-penalized linear model or a non-linear multi-layer perceptron model on this dataset.

# Authors: The scikit-learn developers
# SPDX-License-Identifier: BSD-3-Clause

import time

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.utils import check_random_state

# Turn down for faster convergence
t0 = time.time()
train_samples = 10000

# Load data from https://www.openml.org/d/554
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
# X and y are ndarrays: X holds the 70000 flattened images, y the labels

random_state = check_random_state(0)  # returns a numpy.random.RandomState
permutation = random_state.permutation(X.shape[0])  # a random ordering of the 70000 indices
X = X[permutation]  # shuffle images and labels in the same order
y = y[permutation]
# X = X.reshape((X.shape[0], -1))  # not needed here: X is already 70000 x 784

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=train_samples, test_size=10000
)

scaler = StandardScaler()  # fit on the training set, then apply the same scaling to the test set
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Turn up tolerance for faster convergence
clf = LogisticRegression(C=50.0 / train_samples, penalty="l1", solver="saga", tol=0.1)
# C: inverse of regularization strength; the smaller C is, the stronger the regularization
# solver: algorithm used for the optimization problem; saga suits larger datasets
# tol: tolerance for the stopping criterion
clf.fit(X_train, y_train)
# print(clf.coef_.shape)  # (10, 784), i.e. 7840 coefficients
print(np.mean(clf.coef_ == 0))  # fraction of zero coefficients (True counts as 1, False as 0)
# print(np.sum(clf.coef_ == 0))
# print(np.sum(clf.coef_ != 0))

sparsity = np.mean(clf.coef_ == 0) * 100  # coef_ holds the model coefficients (weights)
# sparsity is the percentage of coefficients that are exactly zero,
# equivalent to np.sum(clf.coef_ == 0) / (clf.coef_.shape[0] * clf.coef_.shape[1]) * 100
score = clf.score(X_test, y_test)
# print('Best C % .4f' % clf.C_)
print("Sparsity with L1 penalty: %.2f%%" % sparsity)
print("Test score with L1 penalty: %.4f" % score)

coef = clf.coef_.copy()
plt.figure(figsize=(10, 5))
scale = np.abs(coef).max()  # largest absolute coefficient, used for a symmetric color scale
for i in range(10):
    l1_plot = plt.subplot(2, 5, i + 1)  # place the (i+1)-th panel
    # shown as an image, the coefficients of each class roughly trace the digit's outline
    l1_plot.imshow(
        coef[i].reshape(28, 28),
        interpolation="nearest",
        cmap=plt.cm.RdBu,
        vmin=-scale,
        vmax=scale,
    )
    l1_plot.set_xticks(())
    l1_plot.set_yticks(())
    l1_plot.set_xlabel("Class %i" % i)
plt.suptitle("Classification vector for...")

run_time = time.time() - t0
print("Example run in %.3f s" % run_time)
plt.show()
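To see the effect of the penalty without downloading MNIST, the same comparison can be sketched on scikit-learn's small bundled digits dataset (8x8 images). This is an assumption-laden miniature, not the example above: the dataset, the split, and the helper name `fit_and_report` are chosen here for illustration, and the scores are not comparable to MNIST results.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

def fit_and_report(penalty):
    """Fit with the given penalty; return (% zero coefficients, test accuracy)."""
    clf = LogisticRegression(
        C=50.0 / len(X_train), penalty=penalty, solver="saga", tol=0.1,
        random_state=0,
    )
    clf.fit(X_train, y_train)
    sparsity = np.mean(clf.coef_ == 0) * 100
    return sparsity, clf.score(X_test, y_test)

sparsity_l1, score_l1 = fit_and_report("l1")
sparsity_l2, score_l2 = fit_and_report("l2")
print("L1: sparsity %.2f%%, test score %.4f" % (sparsity_l1, score_l1))
print("L2: sparsity %.2f%%, test score %.4f" % (sparsity_l2, score_l2))
```

The L1 penalty drives coefficients to exactly zero, while the L2 penalty only shrinks them, which is why the sparsity figure is the interesting quantity for the L1 model.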


Regression

Clustering

Dimensionality reduction

Comprehensive example 1: the iris dataset

# Load the iris dataset
import seaborn as sns
iris = sns.load_dataset("iris")

# Inspect the data
type(iris)  # pandas.core.frame.DataFrame
iris.shape  # (150, 5)
iris.head()
iris.info()
iris.describe()
iris.species.value_counts()  # number of samples in each of the 3 classes
sns.pairplot(data=iris, hue="species")  # scatter plots for every pair of features, colored by species

# Data cleaning
iris_simple = iris.drop(["sepal_length", "sepal_width"], axis=1)
iris_simple.head()
# these two columns are now removed

# Label encoding
from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
iris_simple["species"] = encoder.fit_transform(iris_simple["species"])
# the species strings are now encoded as integers

# Standardize the dataset
from sklearn.preprocessing import StandardScaler
import pandas as pd
trans = StandardScaler()
_iris_simple = trans.fit_transform(iris_simple[["petal_length", "petal_width"]])
_iris_simple = pd.DataFrame(_iris_simple, columns = ["petal_length", "petal_width"])
_iris_simple.describe()
# Note: the split below is taken from the unscaled iris_simple; _iris_simple is only inspected here

# Build the training and test sets
from sklearn.model_selection import train_test_split
train_set, test_set = train_test_split(iris_simple, test_size=0.2)
test_set.head()

iris_x_train = train_set[["petal_length", "petal_width"]]
iris_x_train.head()

iris_y_train = train_set["species"].copy()
iris_y_train.head()

iris_x_test = test_set[["petal_length", "petal_width"]]
iris_x_test.head()

iris_y_test = test_set["species"].copy()
iris_y_test.head()
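The preparation above computes a standardized copy of the features but trains on the unscaled ones. One way to make scaling part of the model, sketched here as an alternative and not as the author's method, is a scikit-learn pipeline: the scaler is fitted on the training split only and the same transform is applied automatically at prediction time. The sketch uses `load_iris` (whose column names differ from the seaborn version) and a fixed `random_state`, both assumptions made for reproducibility.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris(as_frame=True)
X = iris.data[["petal length (cm)", "petal width (cm)"]]
y = iris.target

# stratify keeps the three classes balanced in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Scaling statistics come from X_train only; X_test is transformed with them
model = make_pipeline(StandardScaler(), KNeighborsClassifier())
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print("Accuracy: {:.0%}".format(accuracy))
```

Distance-based models such as k-nearest neighbors are the ones most affected by feature scale, which is why the pipeline pairs the scaler with that classifier here.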

The following sections apply different machine learning algorithms to this dataset.

k-nearest neighbors

from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier()  # create a classifier object
clf
clf.fit(iris_x_train, iris_y_train)  # train
res = clf.predict(iris_x_test)  # predict
print(res)
print(iris_y_test.values)  # print the true labels for comparison

# Decode the integer labels back to the original species strings
encoder.inverse_transform(res)

# Evaluate
accuracy = clf.score(iris_x_test, iris_y_test)
print("Accuracy: {:.0%}".format(accuracy))

# Store the results
out = iris_x_test.copy()
out["y"] = iris_y_test
out["pre"] = res  # prediction
out
out.to_csv("iris_predict.csv")

# Visualization
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

def draw(clf):
    # Build a grid over the feature range
    M, N = 500, 500
    x1_min, x2_min = iris_simple[["petal_length", "petal_width"]].min(axis=0)
    x1_max, x2_max = iris_simple[["petal_length", "petal_width"]].max(axis=0)
    t1 = np.linspace(x1_min, x1_max, M)
    t2 = np.linspace(x2_min, x2_max, N)
    x1, x2 = np.meshgrid(t1, t2)  # expand the two vectors into coordinate matrices

    # Predict the class at every grid point
    x_show = np.stack((x1.flat, x2.flat), axis=1)  # stack as two columns
    y_predict = clf.predict(x_show)

    # Color maps
    cm_light = mpl.colors.ListedColormap(["#A0FFA0", "#FFA0A0", "#A0A0FF"])
    cm_dark = mpl.colors.ListedColormap(["g", "r", "b"])

    # Draw the predicted regions
    plt.figure(figsize=(10, 6))
    plt.pcolormesh(t1, t2, y_predict.reshape(x1.shape), cmap=cm_light)  # pseudocolor plot of the predictions on the grid

    # Draw the original data points
    plt.scatter(iris_simple["petal_length"], iris_simple["petal_width"], label=None,
                c=iris_simple["species"], cmap=cm_dark, marker='o', edgecolors='k')
    plt.xlabel("petal_length")
    plt.ylabel("petal_width")

    # Draw the legend
    color = ["g", "r", "b"]
    species = ["setosa", "versicolor", "virginica"]  # LabelEncoder assigns 0, 1, 2 in alphabetical order
    for i in range(3):
        plt.scatter([], [], c=color[i], s=40, label=species[i])  # empty scatter used only to create a legend entry
    plt.legend(loc="best")  # "best" lets matplotlib choose the location
    plt.title('iris_classifier')

draw(clf)

Naive Bayes
Question: given the observation X = (x1, x2), which class yk has the highest probability?

# Same steps as before
from sklearn.naive_bayes import GaussianNB
clf = GaussianNB()  # create the classifier object
clf.fit(iris_x_train, iris_y_train)  # train
res = clf.predict(iris_x_test)  # predict
print(res)
print(iris_y_test.values)
accuracy = clf.score(iris_x_test, iris_y_test)  # evaluate
print("Accuracy: {:.0%}".format(accuracy))
draw(clf)  # visualize

Decision tree
CART: at each step, choose one feature (and a threshold on it) that splits the data into two groups that are as pure as possible, then recurse on each group.

from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier()
clf.fit(iris_x_train, iris_y_train)
res = clf.predict(iris_x_test)
print(res)
print(iris_y_test.values)
accuracy = clf.score(iris_x_test, iris_y_test)
print("Accuracy: {:.0%}".format(accuracy))
draw(clf)
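"As pure as possible" needs a number. A common choice, and the default criterion of `DecisionTreeClassifier`, is Gini impurity. The sketch below scores every candidate threshold on one feature by the weighted impurity of the two resulting groups and picks the lowest. The eight sample values and the helper names `gini` and `split_impurity` are invented for the illustration.

```python
import numpy as np

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_impurity(x, y, threshold):
    """Size-weighted Gini impurity after splitting on x <= threshold."""
    left, right = y[x <= threshold], y[x > threshold]
    n = len(y)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Toy petal lengths with three classes
x = np.array([1.4, 1.3, 1.5, 4.7, 4.5, 5.1, 6.0, 5.9])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2])

# Candidate thresholds: midpoints between consecutive sorted values
xs = np.sort(x)
candidates = (xs[:-1] + xs[1:]) / 2
best = min(candidates, key=lambda t: split_impurity(x, y, t))

print("impurity before split:", gini(y))
print("best threshold:", best)
print("impurity after split:", split_impurity(x, y, best))
```

Here the best threshold isolates class 0 completely; CART would then repeat the same search inside the remaining mixed group.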

Logistic regression
Training: map the features X = (x1, x2) to P(y = ck), and find the parameters of that mapping that maximize the product of the probabilities assigned to the true classes of all training samples (the likelihood).
Prediction: compute P(y = ck) for each class and assign the class with the highest probability.

from sklearn.linear_model import LogisticRegression
clf = LogisticRegression(solver='saga', max_iter=1000)
'''
solver: Algorithm to use in the optimization problem.
Default is 'lbfgs'.
- For small datasets, 'liblinear' is a good choice, whereas 'sag' and 'saga' are faster for large ones;
- For multiclass problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs' handle multinomial loss;
- 'liblinear' and 'newton-cholesky' can only handle binary classification by default. To apply a one-versus-rest scheme for the multiclass setting one can wrap it with the OneVsRestClassifier.
- 'newton-cholesky' is a good choice for n_samples >> n_features, especially with one-hot encoded categorical features with rare categories. Be aware that the memory usage of this solver has a quadratic dependency on n_features because it explicitly computes the Hessian matrix.
'''
clf.fit(iris_x_train, iris_y_train)
res = clf.predict(iris_x_test)
print(res)
print(iris_y_test.values)
accuracy = clf.score(iris_x_test, iris_y_test)
print("Accuracy: {:.0%}".format(accuracy))
draw(clf)
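For the multiclass case, the "mapping" from features to P(y = ck) is a linear score per class followed by a softmax. The sketch below rebuilds the probabilities from the fitted `coef_` and `intercept_` and checks them against `predict_proba`. It assumes the default `lbfgs` solver, which fits a multinomial model for multiclass targets; the helper `softmax` is written here and is not a scikit-learn function.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X = X[:, 2:4]  # petal length and petal width
clf = LogisticRegression(max_iter=1000).fit(X, y)

def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# One linear score per class: X @ W^T + b, shape (n_samples, n_classes)
scores = X @ clf.coef_.T + clf.intercept_
probs = softmax(scores)

# Prediction: the class with the highest probability
pred = clf.classes_[probs.argmax(axis=1)]
print(probs[:3])
print(pred[:3])
```

Training adjusts `coef_` and `intercept_` so that the probability given to each sample's true class is as large as possible, which is the likelihood maximization described above (plus the regularization term).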

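Support vector machine
Taking binary classification as an example and assuming the data are perfectly separable:
find a hyperplane that separates the two classes completely while maximizing its distance to the nearest points.

from sklearn.svm import SVC
clf = SVC()
clf  # display the estimator to see its parameters
clf.fit(iris_x_train, iris_y_train)
res = clf.predict(iris_x_test)
print(res)
print(iris_y_test.values)
accuracy = clf.score(iris_x_test, iris_y_test)
print("Accuracy: {:.0%}".format(accuracy))
draw(clf)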
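The maximum-margin idea can be checked numerically on a tiny separable dataset. With a linear kernel the hyperplane is w·x + b = 0, the nearest points satisfy |w·x + b| = 1, and the gap between the two classes is 2/||w||. The six points below are invented, and a very large `C` is used as an approximation of the hard-margin (perfectly separable) case.

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated groups of points (toy data)
X = np.array([[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

# Large C approximates a hard margin: no training point may violate it
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]
b = clf.intercept_[0]
margin_width = 2.0 / np.linalg.norm(w)

# y_i * (w . x_i + b) is >= 1 for every point and about 1 on the support vectors
functional_margins = y * (X @ w + b)

print("w =", w, "b =", b)
print("margin width:", margin_width)
print("support vectors:\n", clf.support_vectors_)
```

For this layout the nearest points are (1, 0) and (0, 1) on one side and (3, 3) on the other, so the expected width is the distance between the lines x1 + x2 = 1 and x1 + x2 = 6, about 3.54.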

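Ensemble method: random forest
Given a training set of size m, draw m samples at random with replacement to form one bootstrap sample; repeat to obtain n bootstrap samples.
Train one weak classifier on each of the n samples. Weak classifiers are usually decision trees or neural networks; a random forest uses decision trees.
Combine the n weak classifiers into a strong classifier.

from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier()
clf
clf.fit(iris_x_train, iris_y_train)
res = clf.predict(iris_x_test)
print(res)
print(iris_y_test.values)
accuracy = clf.score(iris_x_test, iris_y_test)
print("Accuracy: {:.0%}".format(accuracy))
draw(clf)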
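The three steps above can be written out by hand in a few lines. This is a simplified sketch of bagging with decision trees and a majority vote, not a reimplementation of `RandomForestClassifier` (which, among other differences, averages the trees' predicted probabilities). The number of trees and the seeds are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

rng = np.random.default_rng(0)
m = len(X_train)
n_estimators = 25

# Steps 1 and 2: one bootstrap sample (m draws with replacement) per tree
trees = []
for _ in range(n_estimators):
    idx = rng.integers(0, m, size=m)
    tree = DecisionTreeClassifier(
        max_features="sqrt",  # random feature subset at each split
        random_state=int(rng.integers(1_000_000)),
    )
    trees.append(tree.fit(X_train[idx], y_train[idx]))

# Step 3: combine the weak classifiers by majority vote
votes = np.stack([t.predict(X_test) for t in trees])  # (n_estimators, n_test)
majority = np.array(
    [np.bincount(col, minlength=3).argmax() for col in votes.T]
)
accuracy = np.mean(majority == y_test)
print("Accuracy: {:.0%}".format(accuracy))
```

Because each tree sees a different resample and a different feature subset at each split, their errors are only partly correlated, and the vote tends to be more stable than any single tree.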

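Ensemble method: AdaBoost
Given a training set of size m, train the first weak classifier with the initial sample weights, compute that classifier's coefficient from its error rate, and update the sample weights.
Train the second weak classifier with the new weights, and so on.
Combine all weak classifiers in a weighted sum, using their coefficients, to obtain the strong classifier.

from sklearn.ensemble import AdaBoostClassifier
clf = AdaBoostClassifier()
clf
clf.fit(iris_x_train, iris_y_train)
res = clf.predict(iris_x_test)
print(res)
print(iris_y_test.values)
accuracy = clf.score(iris_x_test, iris_y_test)
print("Accuracy: {:.0%}".format(accuracy))
draw(clf)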
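One round of the weight update can be worked through with plain NumPy. The sketch uses the classic binary AdaBoost formulas with labels in {-1, +1}; the multiclass variant used by scikit-learn differs in detail, so treat this as the textbook version only. The ten labels and the weak classifier's predictions are made up.

```python
import numpy as np

# Ten samples with labels in {-1, +1}; the weak classifier makes 3 mistakes
y_true = np.array([1, 1, 1, 1, 1, -1, -1, -1, -1, -1])
y_pred = np.array([1, 1, 1, -1, -1, -1, -1, -1, -1, 1])

# Initial weights: uniform
weights = np.full(len(y_true), 1.0 / len(y_true))

# Weighted error rate of this weak classifier
miss = y_pred != y_true
error = np.sum(weights[miss])

# Classifier coefficient: larger when the error is smaller
alpha = 0.5 * np.log((1.0 - error) / error)

# Raise the weight of misclassified samples, lower the rest, renormalize
new_weights = weights * np.exp(-alpha * y_true * y_pred)
new_weights /= new_weights.sum()

print("error =", error)
print("alpha =", alpha)
print("new weights =", new_weights)
```

After the update the three misclassified samples together carry half of the total weight, so the next weak classifier is pushed to get them right.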

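Ensemble method: gradient boosted decision trees (GBDT)
Given a training set of size m, fit a first weak learner, compute the residuals, then keep fitting new learners to the residuals.
The sum of all weak learners is the strong learner.
(In statistics, a residual is the difference between an observed value and its estimated (fitted) value.)

from sklearn.ensemble import GradientBoostingClassifier
clf = GradientBoostingClassifier()
clf
clf.fit(iris_x_train, iris_y_train)
res = clf.predict(iris_x_test)
print(res)
print(iris_y_test.values)
accuracy = clf.score(iris_x_test, iris_y_test)
print("Accuracy: {:.0%}".format(accuracy))
draw(clf)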
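"Keep fitting the residuals" is easiest to see in a regression setting with squared loss, where the residual is exactly the negative gradient. The sketch below builds such a booster by hand from shallow regression trees on a synthetic sine curve. It is a regression illustration under those assumptions, not what `GradientBoostingClassifier` does internally for classification.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression problem
rng = np.random.default_rng(0)
X = np.linspace(0, 6, 200).reshape(-1, 1)
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # first model: a constant
mse_history = [np.mean((y - prediction) ** 2)]

for _ in range(50):
    residual = y - prediction                     # what is still unexplained
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    prediction += learning_rate * tree.predict(X)  # add a small correction
    mse_history.append(np.mean((y - prediction) ** 2))

print("MSE before boosting: %.4f" % mse_history[0])
print("MSE after 50 rounds: %.4f" % mse_history[-1])
```

The final model is the initial constant plus the scaled sum of all trees, which is the "sum of weak learners" described above; a smaller `learning_rate` needs more rounds, the same trade-off explored with `n_estimators` in the grid search example.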

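Other commonly used models
【1】xgboost
GBDT expands the loss only to first order: it uses the negative gradient, i.e. a first-order Taylor expansion.
XGBoost uses a second-order Taylor expansion of the loss, which approximates it more accurately and tends to converge faster.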
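Written out, the second-order expansion looks as follows. The notation is the usual one for boosting and is introduced here as an assumption: $l$ is the per-sample loss, $\hat{y}_i^{(t-1)}$ the prediction after $t-1$ trees, $f_t$ the tree added at step $t$, and $\Omega$ a regularization term on the tree.

```latex
\mathcal{L}^{(t)} \approx \sum_{i=1}^{n} \Big[ l\big(y_i, \hat{y}_i^{(t-1)}\big)
    + g_i \, f_t(x_i) + \tfrac{1}{2} h_i \, f_t^{2}(x_i) \Big] + \Omega(f_t),
\qquad
g_i = \frac{\partial\, l\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial\, \hat{y}_i^{(t-1)}},
\quad
h_i = \frac{\partial^{2} l\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial\, \big(\hat{y}_i^{(t-1)}\big)^{2}}
```

A first-order method keeps only the $g_i$ term. For squared loss $l = \tfrac{1}{2}(y_i - \hat{y}_i)^2$ this gives $g_i = \hat{y}_i^{(t-1)} - y_i$ and $h_i = 1$, so the negative gradient is the ordinary residual.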

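【2】lightgbm
From Microsoft: a fast, distributed, high-performance gradient boosting framework based on decision tree algorithms.

【3】stacking
Stacking, also called model blending.
First train several simple models; a second-level learner is then trained on the predictions of those first-level models.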
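scikit-learn ships a ready-made implementation of this two-level scheme, `StackingClassifier`. The sketch below stacks a k-nearest-neighbors model and a random forest under a logistic regression second-level learner on the iris data; the choice of base models and the settings are arbitrary examples.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

stack = StackingClassifier(
    # First level: simple models trained on the original features
    estimators=[
        ("knn", KNeighborsClassifier()),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    # Second level: trained on the first-level predictions
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # first-level predictions for training come from cross-validation
)
stack.fit(X_train, y_train)

accuracy = stack.score(X_test, y_test)
print("Accuracy: {:.0%}".format(accuracy))
```

The `cv` argument matters: the second-level learner is fitted on out-of-fold predictions, so it does not simply learn to trust whichever base model overfits the training data most.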

【4】Neural networks

Comprehensive example 2: using 8 different algorithms

import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas_profiling as ppf  # newer releases of this package are published as ydata_profiling
import seaborn as sns

def load_data(file_path):
    '''
    Load the data.
    :param file_path: path to the data file
    :return: the data as a list of rows
    '''
    f = open(file_path)
    data = []
    for line in f.readlines():
        row = []  # holds one row
        lines = line.strip().split("\t")
        for x in lines:
            row.append(x)
        data.append(row)
    f.close()
    return data

data = load_data('datingTestSet.txt')
# data
# Columns: flight distance per year, percentage of time spent playing video games,
# liters of ice cream consumed per week, degree of liking (the label)
data = pd.DataFrame(data, columns=['每年的飞行距离', '玩视频游戏所耗时间的百分比', '每周消费冰激凌的公升数', '喜欢的程度'])
data = data.astype(float)
# data['喜欢的程度'] = data['喜欢的程度'].astype(int)
data['喜欢的程度'].value_counts()  # number of rows for each label value
ppf.ProfileReport(data)  # generate the report

# On Windows: make sns.pairplot() render the Chinese column names
from matplotlib.font_manager import FontProperties
myfont = FontProperties(fname=r'C:\Windows\Fonts\simhei.ttf', size=14)
sns.set(font=myfont.get_name())
sns.pairplot(data=data, hue='喜欢的程度')

# Preprocessing: label encoding, missing values, standardization
# This example needs no label encoding and has no missing values; only standardization is needed
from sklearn.preprocessing import StandardScaler
trans = StandardScaler()
data_simple = trans.fit_transform(data[['每年的飞行距离', '玩视频游戏所耗时间的百分比', '每周消费冰激凌的公升数']])
# Note: the next line rebuilds the frame from the unscaled `data`, so the scaled array above is discarded
# and the models below are trained on unscaled values
data_simple = pd.DataFrame(data, columns=['每年的飞行距离', '玩视频游戏所耗时间的百分比', '每周消费冰激凌的公升数'])
data_simple.head(10)

# Build the training and test sets
from sklearn.model_selection import train_test_split
train_set, test_set = train_test_split(data, test_size=0.2)
train_set.head()

data_x_train = train_set[['每年的飞行距离', '玩视频游戏所耗时间的百分比', '每周消费冰激凌的公升数']]
data_y_train = train_set['喜欢的程度'].copy()
# data_x_train.head()
data_y_train.head()

data_x_test = test_set[['每年的飞行距离', '玩视频游戏所耗时间的百分比', '每周消费冰激凌的公升数']]
data_y_test = test_set['喜欢的程度'].copy()

# Train each of the 8 algorithms on the dataset, evaluate it on the test set,
# and store the predictions in a local file
# k-nearest neighbors
# Naive Bayes
# Decision tree
# Logistic regression
# Support vector machine
# Ensemble method: random forest
# Ensemble method: AdaBoost
# Ensemble method: gradient boosted decision trees (GBDT)

# Pick one of the better-performing algorithms and compare model performance
# with and without one unimportant feature
data = data.drop(['每周消费冰激凌的公升数'], axis=1)
data_simple = trans.fit_transform(data[['每年的飞行距离', '玩视频游戏所耗时间的百分比']])
data_simple = pd.DataFrame(data, columns=['每年的飞行距离', '玩视频游戏所耗时间的百分比'])
data_simple.head(10)
# data.head()

train_set, test_set = train_test_split(data, test_size=0.2)
train_set.head()

data_x_train = train_set[['每年的飞行距离', '玩视频游戏所耗时间的百分比']]
data_y_train = train_set['喜欢的程度'].copy()
data_y_train.head()

data_x_test = test_set[['每年的飞行距离', '玩视频游戏所耗时间的百分比']]
data_y_test = test_set['喜欢的程度'].copy()

from sklearn.ensemble import GradientBoostingClassifier
clf = GradientBoostingClassifier()
clf.fit(data_x_train, data_y_train)
res = clf.predict(data_x_test)  # predictions
accuracy = clf.score(data_x_test, data_y_test)
print("Accuracy: {:.0%}".format(accuracy))

# Visualization
def draw(clf):
    # Build a grid over the feature range
    M, N = 500, 500
    x1_min, x2_min = data_simple[['每年的飞行距离', '玩视频游戏所耗时间的百分比']].min(axis=0)
    x1_max, x2_max = data_simple[['每年的飞行距离', '玩视频游戏所耗时间的百分比']].max(axis=0)
    t1 = np.linspace(x1_min, x1_max, M)
    t2 = np.linspace(x2_min, x2_max, N)
    x1, x2 = np.meshgrid(t1, t2)

    # Predict the class at every grid point
    x_show = np.stack((x1.flat, x2.flat), axis=1)
    y_predict = clf.predict(x_show)

    # Color maps
    cm_light = mpl.colors.ListedColormap(["#A0FFA0", "#FFA0A0", "#A0A0FF"])
    cm_dark = mpl.colors.ListedColormap(["g", "r", "b"])

    # Draw the predicted regions
    plt.figure(figsize=(10, 6))
    plt.pcolormesh(t1, t2, y_predict.reshape(x1.shape), cmap=cm_light)

    # Draw the original data points
    plt.scatter(data_simple["每年的飞行距离"], data_simple["玩视频游戏所耗时间的百分比"], label=None,
                c=data["喜欢的程度"], cmap=cm_dark, marker='o', edgecolors='k')
    plt.xlabel("每年的飞行距离")
    plt.ylabel("玩视频游戏所耗时间的百分比")

    # Draw the legend
    color = ["g", "r", "b"]
    species = ["1", "2", "3"]
    for i in range(3):
        # empty scatter used only to create a legend entry
        # s: the marker size in points**2 (typographic points are 1/72 in.)
        plt.scatter([], [], c=color[i], s=40, label=species[i])
    plt.legend(loc="best")
    plt.title('data_classifier')
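Comparing a model with and without one feature on a single random split can be noisy, since the two runs above use different splits. A steadier comparison, sketched here on synthetic data because the dating dataset is not bundled, is to cross-validate both feature sets with the same folds. The dataset is generated so that the third column is pure noise; that construction, the sample size, and the model settings are assumptions of this sketch.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# 3 classes, 2 informative features, 1 noise feature.
# shuffle=False keeps the informative features in the first two columns.
X, y = make_classification(
    n_samples=300, n_features=3, n_informative=2, n_redundant=0,
    n_repeated=0, n_classes=3, n_clusters_per_class=1,
    shuffle=False, random_state=0,
)

clf = GradientBoostingClassifier(n_estimators=50, random_state=0)

# Same model, same 5 stratified folds, two feature sets
score_all = cross_val_score(clf, X, y, cv=5).mean()
score_dropped = cross_val_score(clf, X[:, :2], y, cv=5).mean()

print("All 3 features:        {:.1%}".format(score_all))
print("Noise feature dropped: {:.1%}".format(score_dropped))
```

If the two averages are close, the dropped feature carried little information for this model, which is the question the comparison above is meant to answer.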
