Wendelin - Open Source Big Data Platform
Wendelin platform is part of an ongoing research project Nexedi is
leading. The goal is to develope the technological framework for a
open source Big Data platform "Made in France". Wendelin will integrate
libraries widely used in the data science community for data collection, analysis
computation and visualization. The research project also comprises development of
first prototype applications in the automative and green energy sector as it's
purpose is to provide a ready-to-use solution applicable in different industrial
One Stack To Rule Them All
The Wendelin stack is written in 100% Python. On the base layer, SlapOS
handles configuration, deployment and management of all components running on the Wendelin stack.
Distributed storage is provided by
NEO while ERP5 is used
as platform to connect the various libraries, store data, provide a connecting
user interface plus enable the creation of web-based Big Data applications up to
integration of more complex business processes ("Convergence Ready"). At the
heart of Wendelin is "Wendelin Core", component that will provide
computation capabilities allowing Wendelin based stacks to go beyond the limits
imposed by available RAM in a cluster of machines. On top of this stack different
libraries will be integrated - most importantly Scikit-Learn for machine
Juypter which was the topic for the current release.
New Features in 0.5
After focussing on easing installation in the last release, Wendelin 0.5 is all
Juypter into Wendelin. As with scikit, the idea was to provide a familiar
interface to get started quickly without having to understand everything that
happens under the hood from the get-go. You can check out the iPython Notebook
pictured below here.
Besides integrating Juypter we have also switched to using Wendelin.Core 0.5 by
default which includes the new ZBLk1 block storage which stores multiple objects
in the database (distributed) compared to ZBlk0 (more info/performance tests).
We have also switched to setting up Wendelin in a "cluster mode". Instead
of initially running on a single node, Wendelin can now be setup on a cluster of
Zope nodes (the smallest cluster just having a single node), allowing parallel
code execution through activities (this is one of the features provided and managed
by ERP5 sitting under Wendelin).
Finally, some work has gone into improving the wendelin-standalone script that
installs all components required in a VM. The VM has also been switched to setup
with auto-restart enabled, so that a reboot also updates all components to their
latest versions. All efforts here are going into providing easy to use tools
for someone to evaluate, test and develop on Wendelin.
If you want to try your hands at Wendelin, the website now has a
Developer Documentation which includes a detailed Installation Guide along with
instructions for configuration instructions.
There is also a sample Jupyter demo showing how to work
with Juypter on Wendelin.
A look ahead
Two major topics are scheduled for Wendelin 0.6.
We will provide means to track data ingestion and status of analytics being run.
This will include finding a way to describe what data is being ingested, for
example by adding a sensor/data source description and using Data Supply
to mimic real examples. This will most likely be handled using a variation
of ERP5 business processes along with trade_state.
We will wrap up Wendelin.Core version 1 and move on version 2, which means
implementation on the file system for speed and simplicity. This will take
quite a bit of time so don't expect to see Wendelin 0.6 before a couple of
months. The new release will then also include a performance comparison
showing how Wendelin.Core 0.6 is superior to 0.5.
On the sideline we will also work on SlapOS to provide containerization as
it seems like this way of doing things seems to be moving from bandwagon to
becoming a standard process which should also be available through SlapOS.