How do you use lightgbm's LGBMClassifier to make predictions on the iris dataset?

Published on February 2, 2024. Author: 苏南大叔. Source: 平行空间笔记本 (Parallel Space Notebook).


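Following the previous post on XGBoost, here is another prediction model that is not included in sklearn: LightGBM, from Microsoft. Sounds impressive, right? In practice it is just as simple to use, and works much like the other models. What sets it apart is that you need to pass a couple of parameters explicitly, otherwise it prints a lot of warning and log messages.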

苏南大叔's blog, "平行空间笔记本", records his thoughts and notes on code. Test environment for this post: win10, python@3.12.0, pandas@2.1.3, scikit-learn@1.3.2, LightGBM@4.1.0.

LightGBM

LightGBM is currently maintained by Microsoft, so it has a solid pedigree. Reference link:

https://github.com/microsoft/LightGBM

[Figure 4-2: microsoft-lightgbm]

The introduction from the official page:

LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed and efficient with the following advantages:
Faster training speed and higher efficiency.
Lower memory usage.
Better accuracy.
Support of parallel, distributed, and GPU learning.
Capable of handling large-scale data.

Like XGBoost, LightGBM is not bundled with sklearn, so it has to be installed separately.

pip install lightgbm

This post again works with the iris dataset, this time with LightGBM, to see how it performs.

Loading the iris dataset

This is the usual routine. The code is as follows:

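import pandas as pd
from sklearn.model_selection import train_test_split

data_url = "http://download.tensorflow.org/data/iris_training.csv"
# Columns: sepal length, sepal width, petal length, petal width, species
column_names = ["萼长", "萼宽", "瓣长", "瓣宽", "种类"]
# header=0 drops the file's own first row and applies the names above
data = pd.read_csv(data_url, header=0, names=column_names)
X = data.iloc[:, :-1].values  # the four feature columns
y = data.iloc[:, -1].values   # the species label (0, 1 or 2)
# Hold out 20% of the rows for testing; random_state fixes the shuffle
X_train, X_true, y_train, y_true = train_test_split(X, y, test_size=0.2, random_state=8)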

If any of this is unclear, see the earlier articles on this blog that cover loading the iris dataset.
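The sizes of the two halves follow from simple arithmetic. Here is a minimal sketch, assuming the CSV holds 120 data rows (inferred from the 96 training rows in the LightGBM log and the 24 test rows in the report later in this post). The helper name split_sizes is hypothetical, and the round-up rule mirrors how I understand train_test_split to treat a fractional test_size.

```python
import math


def split_sizes(n_samples: int, test_size: float) -> tuple[int, int]:
    """Return (n_train, n_test) for a fractional test_size."""
    n_test = math.ceil(test_size * n_samples)  # the test share is rounded up
    n_train = n_samples - n_test               # everything else is used for training
    return n_train, n_test


print(split_sizes(120, 0.2))  # (96, 24)
```

So the model below is trained on 96 rows and checked against 24.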

Prediction with the LGBMClassifier model

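import lightgbm as lgb

# verbose=-1 silences LightGBM's log output; num_threads=2 fixes the thread count
model = lgb.LGBMClassifier(verbose=-1, num_threads=2)
model.fit(X_train, y_train)
y_pred = model.predict(X_true)  # predicted class labels for the held-out rows
print(y_pred)
# score() returns the mean accuracy on the given data
print("LGBM prediction accuracy:", model.score(X_true, y_true))

Output:

[1 2 2 2 1 1 0 0 1 1 0 2 2 0 0 2 2 2 2 0 0 2 0 1]
LGBM prediction accuracy: 0.9166666666666666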

[Figure 4-3: lightgbm results]

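Possible problems

This is where LGBMClassifier behaves a little differently. If you do not set any parameters, it prints a number of warning and log messages while processing the iris dataset. For example: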

D:\Program Files\Python312\Lib\site-packages\joblib\externals\loky\backend\context.py:136: UserWarning: Could not find the number of physical cores for the following reason:
found 0 physical cores < 1
Returning the number of logical cores instead. You can silence this warning by setting LOKY_MAX_CPU_COUNT to the number of cores you want to use.
  warnings.warn(
  File "D:\Program Files\Python312\Lib\site-packages\joblib\externals\loky\backend\context.py", line 282, in _count_physical_cores
    raise ValueError(f"found {cpu_count_physical} physical cores < 1")

[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000049 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 80
[LightGBM] [Info] Number of data points in the train set: 96, number of used features: 4
[LightGBM] [Info] Start training from score -1.037988
[LightGBM] [Info] Start training from score -1.232144
[LightGBM] [Info] Start training from score -1.037988
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf

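The fix is to set two parameters:

verbose=-1 suppresses LightGBM's [Info] and [Warning] log messages.
num_threads=2 sets the number of threads explicitly. The right value depends on how many CPU cores the test machine has, so adjust it to your own setup. With the thread count given explicitly, LightGBM no longer seems to need joblib's physical core lookup, which is where the first warning comes from.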
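The joblib warning itself suggests another route: setting the LOKY_MAX_CPU_COUNT environment variable. Below is a minimal sketch of that, assuming the value 2 to match num_threads=2 above. I have not verified it on the test machine, and as far as I understand the setting may only take effect when the value is lower than the number of logical cores, so treat it as a starting point.

```python
import os

# Assumption: 2 mirrors the num_threads=2 used in this post; pick what suits your machine.
os.environ["LOKY_MAX_CPU_COUNT"] = "2"

# Import lightgbm and train only after the variable is set, for example:
# import lightgbm as lgb
# model = lgb.LGBMClassifier(verbose=-1)
print(os.environ["LOKY_MAX_CPU_COUNT"])
```

Setting the variable at the very top of the script, before any other import, keeps the order of operations unambiguous.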

[Figure 4-4: possible error messages]
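As an aside, the three "Start training from score" lines in the log look like the natural log of each class's share of the training set. The sketch below checks that reading. The per-class counts 34, 28 and 34 are an assumption: they are not stated anywhere in this post, and were chosen because they add up to 96 and reproduce the logged values.

```python
import math

# Assumption: the 96 training rows hold 34, 28 and 34 samples of classes 0, 1 and 2.
train_counts = [34, 28, 34]
total = sum(train_counts)

# Log of each class's share of the training set.
init_scores = [round(math.log(count / total), 6) for count in train_counts]
print(init_scores)  # [-1.037988, -1.232144, -1.037988]
```

If that reading is right, these lines are just the starting point of boosting and nothing to worry about.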

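Model evaluation

Scoring a model is also a fixed routine, and it feels very much like plugging numbers into a formula. This post uses classification_report(), which I looked into recently, for the evaluation.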

from sklearn.metrics import classification_report
report = classification_report(y_true, y_pred)
print(report)

Output:

              precision    recall  f1-score   support
           0       1.00      1.00      1.00         8
           1       1.00      0.75      0.86         8
           2       0.80      1.00      0.89         8

    accuracy                           0.92        24
   macro avg       0.93      0.92      0.92        24
weighted avg       0.93      0.92      0.92        24
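The numbers in the report can be reproduced by hand. The sketch below uses a confusion matrix that I reconstructed from the support, precision and recall columns (the post never prints one, so treat it as an assumption): all 8 samples of class 0 are correct, 2 of the 8 class 1 samples are predicted as class 2, and all 8 samples of class 2 are correct.

```python
# Rows are true classes, columns are predicted classes.
# Assumption: reconstructed from the report above, not printed by the original code.
confusion = [
    [8, 0, 0],
    [0, 6, 2],
    [0, 0, 8],
]

n_classes = len(confusion)
for k in range(n_classes):
    tp = confusion[k][k]                                          # correctly predicted as k
    predicted_k = sum(confusion[i][k] for i in range(n_classes))  # column sum
    actual_k = sum(confusion[k])                                  # row sum, i.e. support
    precision = tp / predicted_k
    recall = tp / actual_k
    f1 = 2 * precision * recall / (precision + recall)
    print(k, round(precision, 2), round(recall, 2), round(f1, 2), actual_k)

correct = sum(confusion[k][k] for k in range(n_classes))
total = sum(sum(row) for row in confusion)
accuracy = correct / total
print("accuracy", round(accuracy, 2))  # accuracy 0.92
```

The 22 correct predictions out of 24 give the same 0.9166... that model.score() reported earlier.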

For an explanation of how to read this result, see the article:

https://seosn.com/say/sklearn-classification_report.html

Conclusion

Machine learning is abbreviated as ML. More posts on the topic:

https://seosn.com/tag/ml
