
General information about River

River is a Python library for online machine learning. In contrast to the batch setting, River encourages a different approach: learning continuously from a stream of data, one sample at a time. This allows arbitrarily large datasets to be learned, even when they do not fit in memory. Its use cases include time series forecasting, spam filtering, recommender systems, CTR prediction, and IoT applications.
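To make the streaming idea concrete, here is a minimal sketch in plain Python of a model that learns from one sample at a time and never stores the stream. The class `TinyOnlineKMeans`, the helper `_distance`, and the generator `point_stream` are hypothetical names invented for this illustration; the code is not River's implementation. It only borrows River's convention of representing a sample as a dict of features and exposing `learn_one` and `predict_one` methods.

```python
import math


def _distance(a, b):
    """Euclidean distance between two feature dicts (missing keys count as 0)."""
    keys = a.keys() | b.keys()
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


class TinyOnlineKMeans:
    """Illustrative sequential k-means; an assumption-laden sketch, NOT River's code."""

    def __init__(self, n_clusters):
        self.n_clusters = n_clusters
        self.centers = []  # one feature dict per cluster
        self.counts = []   # number of samples absorbed by each cluster

    def predict_one(self, x):
        # Index of the center closest to x.
        return min(range(len(self.centers)),
                   key=lambda i: _distance(x, self.centers[i]))

    def learn_one(self, x):
        # The first n_clusters samples seed the centers.
        if len(self.centers) < self.n_clusters:
            self.centers.append(dict(x))
            self.counts.append(1)
            return
        # Afterwards, move the nearest center towards x (running mean update).
        i = self.predict_one(x)
        self.counts[i] += 1
        rate = 1.0 / self.counts[i]
        center = self.centers[i]
        for k, v in x.items():
            old = center.get(k, 0.0)
            center[k] = old + rate * (v - old)


def point_stream(n=1000):
    """Hypothetical data source: alternates between a blob near 0 and one near 10."""
    for i in range(n):
        base = 0.0 if i % 2 == 0 else 10.0
        dx = ((i // 2) % 5 - 2) * 0.1
        dy = ((i // 2) % 3 - 1) * 0.1
        yield {"x": base + dx, "y": base + dy}


model = TinyOnlineKMeans(n_clusters=2)
for x in point_stream():
    model.learn_one(x)  # each sample is seen once, then discarded
```

Memory use depends only on the number of clusters and features, not on the length of the stream, which is why this style of learning scales to data that does not fit in memory.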

River's clustering module is currently the most complete among all open-source projects: it offers not only the highest number of readily available clustering algorithms, but also a large number of internal and external metrics that support online learning contexts.

The currently available clustering algorithms include

Additionally, with 20 internal validation metrics and 18 external validation metrics, River is currently the package offering the highest number of metrics for continuous or incremental validation on data streams.
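As an illustration of what incremental validation means, the sketch below maintains purity, a standard external validation metric, over a stream of (true label, predicted cluster) pairs. The class `IncrementalPurity` and its `update` and `get` methods are hypothetical names chosen for this example and are not taken from River's code base; the toy stream is likewise made up.

```python
from collections import Counter, defaultdict


class IncrementalPurity:
    """Hypothetical streaming purity metric; a sketch, not River's implementation."""

    def __init__(self):
        # cluster id -> counts of the true labels seen in that cluster
        self._table = defaultdict(Counter)
        self._n = 0

    def update(self, y_true, y_pred):
        # Constant-time update: only one contingency cell changes per sample.
        self._table[y_pred][y_true] += 1
        self._n += 1

    def get(self):
        if self._n == 0:
            return 0.0
        # Purity: fraction of samples that carry their cluster's majority label.
        majority = sum(max(c.values()) for c in self._table.values())
        return majority / self._n


metric = IncrementalPurity()
# Toy stream of (true label, predicted cluster) pairs, invented for illustration.
stream = [("spam", 0), ("spam", 0), ("ham", 1), ("ham", 0), ("ham", 1)]
history = []
for y_true, y_pred in stream:
    metric.update(y_true, y_pred)
    history.append(metric.get())

print(history)  # [1.0, 1.0, 1.0, 0.75, 0.8]
```

Because the metric keeps only a small contingency table, its value can be read at any point in the stream without revisiting past samples.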

Readers may also be interested in visiting the repositories of the two packages that were merged to become River, scikit-multiflow and creme.

To cite scikit-multiflow or creme, you can refer to the associated JMLR paper and the GitHub repository, respectively, as follows:

@article{skmultiflow,
  author  = {Jacob Montiel and Jesse Read and Albert Bifet and Talel Abdessalem},
  title   = {Scikit-Multiflow: A Multi-output Streaming Framework},
  journal = {Journal of Machine Learning Research},
  year    = {2018},
  volume  = {19},
  number  = {72},
  pages   = {1-5},
  url     = {http://jmlr.org/papers/v19/18-251.html}
}

and

@software{creme,
  title = {creme, a Python library for online machine learning},
  author = {Halford, Max and Bolmier, Geoffrey and Sourty, Raphael and Vaysse, Robin and Zouitine, Adil},
  url = {https://github.com/MaxHalford/creme},
  version = {0.6.1},
  date = {2020-06-10},
  year = {2019}
}

Source code and demos

The source code for the demo of River can be found here.

The benchmarks of River's clustering algorithms were conducted locally and can be re-run using this repository.

Presentation slides and videos

The complete slide deck for Parts 1 and 2 of the tutorial, in PDF format, can be found here.

The complete video of the on-site tutorial will be made available by the organizers, tentatively once the conference has officially ended.

In September 2022, Hoang-Anh Ngo also gave an invited talk on River and its clustering module at a seminar organized at LIAAD, INESC TEC, University of Porto. The slides of the talk can be found here, or in the GitHub repository for the demo of River.