
Gradient Boosting Decision Trees

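misaraty updated | 2024-05-09
Preface
Gradient boosting decision trees (Gradient Boosting Decision Tree, GBDT) are implemented by libraries such as XGBoost (Julia: XGBoost.jl), LightGBM (Julia: LightGBM.jl), and CatBoost (Julia: CatBoost.jl).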
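The idea behind GBDT: the ensemble is built stage by stage, and each new tree is fitted to the negative gradient of the loss with respect to the current predictions. For squared loss that gradient is simply the residual. The sketch below is a from-scratch illustration under simplifying assumptions (one input feature, depth-1 trees, squared loss, no regularization). The helper names `fit_stump`, `gbdt_fit` and `mse` are hypothetical and are not part of XGBoost, LightGBM or CatBoost.

```python
# From-scratch sketch of gradient boosting with regression stumps
# (depth-1 decision trees) on a single feature, using squared loss.
# Illustrative only: real GBDT libraries grow deeper trees over many features.

def fit_stump(xs, residuals):
    """Return a depth-1 regression tree fitted to the residuals by least squares."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for k in range(1, len(order)):
        lo_x, hi_x = xs[order[k - 1]], xs[order[k]]
        if lo_x == hi_x:
            continue  # no threshold separates identical feature values
        left = [residuals[i] for i in order[:k]]
        right = [residuals[i] for i in order[k:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, (lo_x + hi_x) / 2, left_mean, right_mean)
    if best is None:  # all feature values equal: fall back to a constant leaf
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gbdt_fit(xs, ys, n_rounds=100, learning_rate=0.1):
    """Fit F(x) = f0 + learning_rate * sum_m tree_m(x) by stagewise boosting."""
    f0 = sum(ys) / len(ys)  # the constant that minimizes squared loss
    preds = [f0] * len(xs)
    trees = []
    for _ in range(n_rounds):
        # For L = (y - F)^2 / 2, the negative gradient w.r.t. F is y - F.
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        # Shrink each tree's contribution by the learning rate.
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + learning_rate * sum(t(x) for t in trees)


def mse(model, xs, ys):
    """Mean squared error of a callable model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [i / 10 for i in range(50)]
ys = [x * x for x in xs]
for rounds in (1, 10, 100):
    model = gbdt_fit(xs, ys, n_rounds=rounds)
    print(f"{rounds:>3} rounds: training MSE = {mse(model, xs, ys):.4f}")
```

Because each stump is a least-squares fit to the current residuals and the step is shrunk by `learning_rate`, the training loss can only stay the same or decrease from one round to the next.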

Overview

./star-history-202459.png
GitHub star history

./Primary_ML_Software.jpg
Primary ML Software

Like last year, the most commonly used algorithms were linear and logistic regression, followed closely by decision trees and random forests. Of more complex methods, gradient boosting machines and convolutional neural networks were the most popular approaches.

./kaggle1.jpg
State of Data Science and Machine Learning 2021

We also saw strong year-over-year growth in the use of large language models such as transformer networks (BERT, GPT-3, etc.).

./kaggle2.jpg
State of Data Science and Machine Learning 2021

Python-based tools continue to dominate the machine learning frameworks.

Like last year, Scikit-learn, a Swiss Army knife applicable to most projects, is the top with over 80% of data scientists using it. TensorFlow and Keras, notably used in combination for deep learning, were each selected on about half of the data scientist surveys. Gradient boosting library xgboost is fourth, with about the same usage as 2020 and 2019.

The most popular of the new tools added to the survey this year is Huggingface reaching over 10%.

./kaggle3.jpg
State of Data Science and Machine Learning 2021

Despite being used less frequently overall, we continue to see strong year-over-year growth of the PyTorch framework.

./kaggle4.jpg
State of Data Science and Machine Learning 2021

Scikit-learn is the most popular ML framework, while PyTorch has been growing steadily year-over-year.

./kaggle5.jpg
State of Data Science and Machine Learning 2022

Algorithms

XGBoost

pip install xgboost

LightGBM

pip install lightgbm

CatBoost

pip install catboost