As you may know, I work a lot on a Python flow-programming framework named PyF.

We have just released version 2.0, and I thought it might be a good occasion to present it here (I used Mathieu’s article as a basis for this presentation):

PyF is a Python framework for writing highly scalable data-processing and data-mining applications, and more. PyF is free software, distributed under the terms of the MIT license.

To achieve this scalability, PyF is based on flow programming: instead of processing “a certain quantity of data”, we process a “flow” of data, so that at any point we only ever hold one object in memory, no matter how much data we process in total. That’s right: mining your huge customer database and generating reports with PyF will not bring your servers to their knees.

To achieve this, we use plain Python generators (no need for Python extensions such as Stackless):

Each unit of the processing chain takes a generator as input and yields values as soon as they are processed. We could even handle a never-ending flow of input data and keep processing it, yielding each result one after the other!
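To make the idea concrete, here is a minimal sketch of a generator-based processing chain in plain Python. The function names (`customer_records`, `only_big`, `running_total`) are hypothetical illustrations, not part of PyF’s API; each unit consumes an iterable and yields its results one at a time, so the chain even works on an endless source.

```python
from itertools import count, islice
from typing import Iterable, Iterator


def customer_records() -> Iterator[dict]:
    """Hypothetical endless source: yields fake customer records one by one."""
    for i in count():
        yield {"id": i, "amount": i % 7}


def only_big(records: Iterable[dict], threshold: int) -> Iterator[dict]:
    """Processing unit: pass on only the records whose amount exceeds `threshold`."""
    for record in records:
        if record["amount"] > threshold:
            yield record


def running_total(records: Iterable[dict]) -> Iterator[int]:
    """Processing unit: yield the cumulative amount after each record."""
    total = 0
    for record in records:
        total += record["amount"]
        yield total


# Nothing is computed until values are pulled from the end of the chain,
# so the endless source is safe: islice stops after four results.
chain = running_total(only_big(customer_records(), threshold=4))
print(list(islice(chain, 4)))  # [5, 11, 16, 22]
```

Memory use stays flat because each unit only holds the record it is currently working on, plus its own small state (such as `total`).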

Now, down to the details, PyF is composed of several layers:

  • At the lowest level, you have only a basic set of core functions that help you write flow-based applications.
  • At the mid level, you can run your processes inside your own application, using a wide range of plugins (or writing your own).
  • At the highest level, you will find a full-blown web application that lets you graphically design your processing chain (we call it a tube) by dragging and dropping processing units (we call them components) and chaining them, output to input. We already ship several generic components that can handle all sorts of processing and reporting, and it is pretty easy to write your own if necessary (we will gladly help in any case). We even have a built-in scheduler, so you can specify when your processes should launch automatically!
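A tube, as described above, is essentially a composition of components in which each one’s output becomes the next one’s input. The sketch below models that idea in plain Python; `build_tube` and the three sample components are hypothetical names chosen for illustration, not PyF’s actual component API, and component parameters are bound with `functools.partial`.

```python
from functools import partial, reduce
from typing import Callable, Iterable, Iterator

# A component maps an input stream to an output stream.
Component = Callable[[Iterable], Iterator]


def build_tube(*components: Component) -> Callable[[Iterable], Iterator]:
    """Chain components so that each one's output feeds the next one's input."""
    def tube(source: Iterable) -> Iterator:
        # Wrap the stream in each component in turn; still fully lazy.
        return reduce(lambda stream, component: component(stream), components, iter(source))
    return tube


def parse_csv_lines(lines: Iterable[str]) -> Iterator[list]:
    """Sample component: split each CSV line into fields."""
    for line in lines:
        yield line.rstrip("\n").split(",")


def keep_columns(rows: Iterable[list], indexes: list) -> Iterator[list]:
    """Sample component: keep only the selected columns."""
    for row in rows:
        yield [row[i] for i in indexes]


def format_rows(rows: Iterable[list]) -> Iterator[str]:
    """Sample component: turn each row into a report line."""
    for row in rows:
        yield " | ".join(row)


report = build_tube(parse_csv_lines, partial(keep_columns, indexes=[0, 2]), format_rows)
print(list(report(["alice,30,paris\n", "bob,25,lyon\n"])))  # ['alice | paris', 'bob | lyon']
```

Because the tube is itself just a function from stream to stream, it could in turn be used as a component inside a larger tube.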

We wrote a simple tutorial on getting PyF, and a series of tutorials to help you take your first steps.

We already have some documentation which should be more than enough to get you started, although we are working on making it more comprehensive.

Among the new features in 2.0, one of the most exciting is multiprocessing: you can move any node of your tube to a separate process with a simple checkbox.
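To give an idea of what moving a node into its own process involves, here is a rough sketch using only the standard `multiprocessing` module. It is not how PyF implements the feature; `run_in_process` is a hypothetical helper that streams items to a child process and results back through bounded queues, so backpressure keeps memory use flat. It assumes the `fork` start method, which exists on POSIX systems but not on Windows.

```python
import multiprocessing as mp
import threading
from typing import Callable, Iterable, Iterator

# A node is any function mapping an input stream to an output stream,
# typically a generator function.
Node = Callable[[Iterator], Iterator]


def _drain(queue) -> Iterator:
    """Yield items from a queue until the end-of-stream marker arrives."""
    while True:
        message = queue.get()
        if message is None:  # end-of-stream marker
            return
        yield message[0]  # real items travel wrapped in a 1-tuple


def _run_node(node: Node, inbox, outbox) -> None:
    """Child-process entry point: apply `node` to the stream read from `inbox`."""
    try:
        for result in node(_drain(inbox)):
            outbox.put((result,))
    finally:
        # Always send the end marker, even on error, so the parent never hangs.
        outbox.put(None)


def _feed(items: Iterable, inbox) -> None:
    """Parent-side thread: push upstream items into the child's inbox."""
    for item in items:
        inbox.put((item,))
    inbox.put(None)


def run_in_process(node: Node, items: Iterable, maxsize: int = 100) -> Iterator:
    """Run `node` in a child process, streaming items in and results back out."""
    ctx = mp.get_context("fork")  # assumption: POSIX-only start method
    inbox, outbox = ctx.Queue(maxsize), ctx.Queue(maxsize)
    # The child may stop reading early; don't block interpreter exit
    # trying to flush input that nobody will consume.
    inbox.cancel_join_thread()
    proc = ctx.Process(target=_run_node, args=(node, inbox, outbox))
    proc.start()  # fork before this call starts its feeder thread
    threading.Thread(target=_feed, args=(items, inbox), daemon=True).start()
    yield from _drain(outbox)  # results arrive in order, one at a time
    proc.join()
    if proc.exitcode != 0:
        raise RuntimeError(f"node process exited with code {proc.exitcode}")


def square(numbers: Iterator[int]) -> Iterator[int]:
    """Example node: square each incoming number."""
    for n in numbers:
        yield n * n


print(list(run_in_process(square, range(10))))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Order is preserved because a single child process consumes the inbox in FIFO order. This sketch does not handle a consumer that abandons the results halfway through: in that case the child process is left running, which a real implementation would need to clean up.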

If you have any questions, come hang out on our mailing list or our IRC channel:

Oh, and of course, visit our website: