Ensemble learning

misaraty | Updated 2022-11-30
Preface
Ensemble learning algorithms include XGBoost (Julia wrapper: XGBoost.jl), LightGBM (LightGBM.jl), CatBoost (CatBoost.jl), NGBoost, and Random Forest.

./ensemble_learning.png
Ensemble learning


./star-history-20221130.png
GitHub star history


./Primary_ML_Software.jpg
Primary ML Software


Like last year, the most commonly used algorithms were linear and logistic regression, followed closely by decision trees and random forests. Of more complex methods, gradient boosting machines and convolutional neural networks were the most popular approaches.

./kaggle1.jpg
State of Data Science and Machine Learning 2021


We also saw strong year-over-year growth in the use of large language models such as transformer networks (BERT, GPT-3, etc.).

./kaggle2.jpg
State of Data Science and Machine Learning 2021


Python-based tools continue to dominate the machine learning frameworks.

Like last year, Scikit-learn, a Swiss Army knife applicable to most projects, is at the top, with over 80% of data scientists using it. TensorFlow and Keras, notably used in combination for deep learning, were each selected on about half of the data scientist surveys. The gradient boosting library xgboost is fourth, with about the same usage as in 2020 and 2019.

The most popular of the new tools added to the survey this year is Huggingface, reaching over 10%.

./kaggle3.jpg
State of Data Science and Machine Learning 2021


Despite being used less frequently overall, we continue to see strong year-over-year growth of the PyTorch framework.

./kaggle4.jpg
State of Data Science and Machine Learning 2021

XGBoost

Online installation

pip install xgboost

Offline installation

pip install xgboost-1.5.2-py3-none-manylinux2014_x86_64.whl

LightGBM

Online installation

pip install lightgbm

Offline installation

pip install lightgbm-3.3.2-py3-none-manylinux1_x86_64.whl

CatBoost

Online installation

pip install catboost

Offline installation

pip install graphviz-0.19.1-py3-none-any.whl
pip install catboost-1.0.4-cp38-none-manylinux1_x86_64.whl
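To check which of the three libraries actually ended up in the current environment, a small shell helper can read the `Version:` line that `pip show` prints for each installed package. This is a minimal sketch; the function name `check_installed` is hypothetical, and since it only reads package metadata, running `import xgboost` (or `lightgbm`, `catboost`) in Python remains the definitive check.

```shell
# Print "<package> <version>" for each installed package, or "<package> not installed".
# pip show writes nothing to stdout (and a warning to stderr) when a package is missing.
check_installed() {
  for pkg in "$@"; do
    ver=$(pip show "$pkg" 2>/dev/null | sed -n 's/^Version: //p')
    if [ -n "$ver" ]; then
      echo "$pkg $ver"
    else
      echo "$pkg not installed"
    fi
  done
}

check_installed xgboost lightgbm catboost
```

Each line of output is either `<package> <version>` or `<package> not installed`, one line per package name passed in.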