Ensemble learning

Ensemble learning algorithms: XGBoost (XGBoost.jl), LightGBM (LightGBM.jl), CatBoost (CatBoost.jl), NGBoost, Random Forest, and others.

Like last year, the most commonly used algorithms were linear and logistic regression, followed closely by decision trees and random forests. Of more complex methods, gradient boosting machines and convolutional neural networks were the most popular approaches.

We also saw strong year-over-year growth in the use of large language models such as transformer networks (BERT, GPT-3, etc.).

Python-based tools continue to dominate among machine learning frameworks.

Like last year, Scikit-learn, a Swiss Army knife applicable to most projects, is the top tool, with over 80% of data scientists using it. TensorFlow and Keras, notably used in combination for deep learning, were each selected by about half of the data scientists surveyed. The gradient boosting library xgboost is fourth, with about the same usage as in 2020 and 2019.

The most popular of the new tools added to the survey this year is Hugging Face, reaching over 10%.

Despite being used less frequently overall, we continue to see strong year-over-year growth of the PyTorch framework.

XGBoost

Online installation

    pip install xgboost

Offline installation

    pip install xgboost-1.5.2-py3-none-manylinux2014_x86_64.whl
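An offline install only works if the wheel matches the target machine, and the wheel filename itself says what it was built for. Under the standard wheel naming scheme (distribution, version, optional build tag, Python tag, ABI tag, platform tag), a minimal sketch for reading those tags might look like the following. The helper name `parse_wheel_filename` and the `WheelTags` type are hypothetical, introduced here only for illustration, and the sketch does not handle compressed tag sets or validate that the tags match the running interpreter.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    """Fields encoded in a wheel filename (hypothetical helper type)."""
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        # No build tag present.
        name, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        # Optional build tag sits between version and Python tag.
        name, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel filename layout: {filename}")
    return WheelTags(name, version, build, py, abi, plat)


tags = parse_wheel_filename("xgboost-1.5.2-py3-none-manylinux2014_x86_64.whl")
print(tags.python_tag, tags.abi_tag, tags.platform_tag)
```

Here the `py3` Python tag and `none` ABI tag indicate the wheel is not tied to a specific Python 3 minor version, while the `manylinux2014_x86_64` platform tag restricts it to 64-bit x86 Linux.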

LightGBM

Online installation

    pip install lightgbm

Offline installation

    pip install lightgbm-3.3.2-py3-none-manylinux1_x86_64.whl

CatBoost

Online installation

    pip install catboost

Offline installation

    pip install graphviz-0.19.1-py3-none-any.whl
    pip install catboost-1.0.4-cp38-none-manylinux1_x86_64.whl
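After installing, online or offline, it can be useful to confirm which versions actually ended up in the environment. The following is a minimal sketch using only the standard library's `importlib.metadata` (Python 3.8+). The function name `installed_versions` is a hypothetical helper, and the package list simply mirrors the libraries covered in these notes.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["xgboost", "lightgbm", "catboost"]).items():
        print(f"{name}: {version or 'not installed'}")
```

This reads installed package metadata rather than importing the libraries, so it reports a version even if the package would fail to import, for example because of a missing system library.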