Draft:Streaming Gradient Boosting (SGB)
Streaming Gradient Boosting (SGB) is a family of online machine learning algorithms that extend gradient boosting to evolving data streams. Unlike batch boosting methods, which require multiple passes over static datasets, streaming variants update incrementally under strict memory and time constraints, while adapting to concept drift.
Background
Gradient boosting was introduced by Friedman (2001) as a stage-wise additive model that combines weak learners to minimize a differentiable loss function.[1] In batch learning, implementations such as XGBoost achieve state-of-the-art results by optimizing both accuracy and computational efficiency.[2]
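In this formulation the model is grown stage-wise; using illustrative notation, with $F_s$ the model after $s$ boosting steps and $\nu$ the shrinkage (learning rate), the update is

$$F_s(x) = F_{s-1}(x) + \nu\, f_s(x),$$

where each weak learner $f_s$ is fitted to the negative gradient (pseudo-residuals) of the loss evaluated at the current model $F_{s-1}$.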
In the data stream setting, ensemble methods such as OzaBag[3], Adaptive Random Forest (ARF)[4], and Streaming Random Patches (SRP)[5] became widely used. Boosting, however, historically underperformed for streaming tasks until recent advances in gradient-based formulations.
Streaming Gradient Boosted Trees (SGBT)
Streaming Gradient Boosted Trees (SGBT) were introduced for streaming classification by Gunasekara et al. (2024).[6] They employ a second-order Taylor approximation of the loss function, as in XGBoost, to guide tree construction incrementally.
At boosting step $s$, the model is trained to predict[1]

$$f_s = \arg\min_{f \in \mathcal{F}} \sum_{i=1}^{N} L\!\left(y_i,\ \hat{y}_i^{(s-1)} + f(x_i)\right),$$

where $\mathcal{F}$ is the space of regressors. The loss at the $s$-th boosting step can be approximated by a second-order Taylor expansion:

$$\mathcal{L}^{(s)} \approx \sum_{i=1}^{N} \left[ g_i\, f(x_i) + \tfrac{1}{2}\, h_i\, f(x_i)^2 \right] + \text{constant},$$

where $g_i$ and $h_i$ are the first- and second-order derivatives of the loss with respect to the $i$-th prediction,[1][2] and $N$ is the size of the dataset or stream.

Furthermore, the above loss at step $s$ can be rewritten as a weighted squared loss[1][2] with weight $h_i$ and target $-g_i/h_i$:

$$\mathcal{L}^{(s)} \approx \sum_{i=1}^{N} \tfrac{1}{2}\, h_i \left( f(x_i) + \tfrac{g_i}{h_i} \right)^2 + \text{constant}.$$

This weight and target can then be used to train an incremental regression tree at step $s$.
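The following is a minimal, self-contained Python sketch of this weight-and-target update scheme; it is not the authors' MOA/CapyMOA implementation. The incremental regression tree is replaced by a weighted running-mean stub, the logistic loss supplies the derivatives $g_i$ and $h_i$, and names such as `StreamingBooster`, `n_stages`, and `learning_rate` are illustrative assumptions.

```python
import math


class RunningMeanRegressor:
    """Stand-in for an incremental regression tree (e.g. a Hoeffding tree):
    it tracks a weighted running mean of the targets it has seen."""

    def __init__(self):
        self.weight_sum = 0.0
        self.weighted_target_sum = 0.0

    def learn_one(self, x, target, weight):
        self.weight_sum += weight
        self.weighted_target_sum += weight * target

    def predict_one(self, x):
        if self.weight_sum == 0.0:
            return 0.0
        return self.weighted_target_sum / self.weight_sum


def logistic_gradient_hessian(y, raw_score):
    """First- and second-order derivatives of the logistic loss with
    respect to the current raw (pre-sigmoid) prediction."""
    p = 1.0 / (1.0 + math.exp(-raw_score))
    return p - y, p * (1.0 - p)


class StreamingBooster:
    """Minimal streaming-boosting sketch: each incoming instance updates every
    boosting stage using the weight h and target -g/h from the second-order
    approximation of the loss."""

    def __init__(self, n_stages=5, learning_rate=0.1):
        self.stages = [RunningMeanRegressor() for _ in range(n_stages)]
        self.learning_rate = learning_rate

    def predict_raw(self, x):
        return sum(self.learning_rate * stage.predict_one(x) for stage in self.stages)

    def learn_one(self, x, y):
        raw = 0.0
        for stage in self.stages:
            g, h = logistic_gradient_hessian(y, raw)
            h = max(h, 1e-12)                       # guard against division by zero
            stage.learn_one(x, target=-g / h, weight=h)
            raw += self.learning_rate * stage.predict_one(x)
```

Because each instance is propagated through all stages in sequence, later stages are always fitted against the ensemble's current streaming predictions, which allows the boosted model to be updated in a single pass over the stream.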
Empirical studies show that SGBT surpasses random subspace and random patch ensemble methods in streaming classification under different drift conditions.[6]
Streaming Gradient Boosted Regression (SGBR)
Regression with streaming gradient boosting poses additional challenges. Vanilla SGBT with squared loss exhibits high variance, leading to poor predictive performance.[7] To address this, Gunasekara et al. (2025) proposed Streaming Gradient Boosted Regression (SGBR),[7] which integrates bagging-based streaming regressors to reduce variance.
Two main designs are described (both are sketched below):
- SGB(Bag): bagged streaming regressors are used as base learners within the boosting framework. Variants include:
  - SGB(Oza) – employs the OzaBag regressor
  - SGB(ARF) – employs the Adaptive Random Forest regressor
  - SGB(SRP) – employs the Streaming Random Patches regressor
- Bag(SGBT): ensembles of boosted models are combined via bagging (e.g. Oza(SGBT), SRP(SGBT)).
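The two compositions can be sketched as follows; this is illustrative Python rather than the MOA/CapyMOA API, and the class names `SGBBag` and `BagSGBT`, the Poisson(1) resampling, and the squared-loss residual targets are assumptions made for the example.

```python
import math
import random


def poisson_one():
    """Draw from a Poisson(1) distribution (Knuth's method); this is the
    per-instance resampling weight used by online bagging."""
    threshold, k, p = math.exp(-1.0), 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1


class SGBBag:
    """SGB(Bag): each boosting stage is itself a bagged ensemble of incremental
    regressors (stand-ins for OzaBag / ARF / SRP base learners)."""

    def __init__(self, base_learner_factory, n_stages=5, bag_size=10, learning_rate=0.1):
        self.stages = [[base_learner_factory() for _ in range(bag_size)]
                       for _ in range(n_stages)]
        self.learning_rate = learning_rate

    def _stage_predict(self, bag, x):
        return sum(member.predict_one(x) for member in bag) / len(bag)

    def predict_one(self, x):
        return sum(self.learning_rate * self._stage_predict(bag, x) for bag in self.stages)

    def learn_one(self, x, y):
        prediction = 0.0
        for bag in self.stages:
            residual = y - prediction               # squared-loss target for this stage
            for member in bag:
                k = poisson_one()                   # online-bagging resampling weight
                if k > 0:
                    member.learn_one(x, residual, weight=k)
            prediction += self.learning_rate * self._stage_predict(bag, x)


class BagSGBT:
    """Bag(SGBT): an outer online-bagging ensemble whose members are entire
    boosted regressors, each updated with its own Poisson(1) weight."""

    def __init__(self, boosted_model_factory, ensemble_size=10):
        self.members = [boosted_model_factory() for _ in range(ensemble_size)]

    def predict_one(self, x):
        return sum(member.predict_one(x) for member in self.members) / len(self.members)

    def learn_one(self, x, y):
        for member in self.members:
            for _ in range(poisson_one()):
                member.learn_one(x, y)
```

For instance, `SGBBag(RunningMeanRegressor)` would reuse the stand-in regressor from the earlier sketch as its bagged base learner, whereas `BagSGBT` expects a factory that produces complete boosted regressors.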
Advantages of SGB(Oza)
Among SGBR variants, SGB(Oza) has shown the best performance. Key advantages include:
- Variance reduction: Bagging mitigates boosting’s tendency toward variance inflation in regression[1][7].
- Predictive accuracy: On 11 benchmark datasets with concept drift, SGB(Oza) achieved higher adjusted R² scores than state-of-the-art regressors such as Self-Optimising k-Nearest Leaves (SOKNL)[7].
- Computational efficiency: Unlike Bag(SGBT) variants, SGB(Oza) achieves these gains without added time complexity[7].
- Drift robustness: Maintains stable performance across abrupt, gradual, and recurrent drift scenarios.[7]
Applications
Streaming gradient boosting methods are applicable to real-time regression tasks such as:
- financial forecasting
- sensor network monitoring
- online energy demand prediction
- traffic volume prediction
- large-scale retail demand forecasting
Software Implementations
Software implementations of SGBT and SGBR are available in the MOA (Java) and CapyMOA (Python) stream learning platforms.
References
1. Friedman, J. H. (2001). "Greedy function approximation: A gradient boosting machine". Annals of Statistics. 29 (5): 1189–1232. doi:10.1214/aos/1013203451.
2. Chen, T.; Guestrin, C. (2016). "XGBoost: A scalable tree boosting system". Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 785–794. doi:10.1145/2939672.2939785.
3. Oza, N.; Russell, S. (2001). "Online bagging and boosting". Proceedings of the 8th International Workshop on Artificial Intelligence and Statistics.
4. Gomes, H. M.; Bifet, A.; Read, J.; Barddal, J. P.; Enembreck, F.; Pfahringer, B.; Holmes, G.; Abdessalem, T. (2017). "Adaptive random forests for evolving data stream classification". Machine Learning. 106 (9–10): 1469–1495. doi:10.1007/s10994-017-5642-8.
5. Gomes, H. M.; Read, J.; Bifet, A. (2019). "Streaming Random Patches for Evolving Data Stream Classification". 2019 IEEE International Conference on Data Mining (ICDM). pp. 240–249. doi:10.1109/ICDM.2019.00034.
6. Gunasekara, N.; Pfahringer, B.; Gomes, H. M.; Bifet, A. (2024). "Gradient boosted trees for evolving data streams". Machine Learning. 113: 3325–3352. doi:10.1007/s10994-024-06517-y.
7. Gunasekara, N.; Pfahringer, B.; Gomes, H. M.; Bifet, A. (2025). "Gradient boosted bagging for evolving data stream regression". Data Mining and Knowledge Discovery. 39: 65. doi:10.1007/s10618-025-01147-x.