Essentially, it is a grid computing framework that can run silently in the background on user workstations or on dedicated clusters of computers. It comprises worker nodes, a distributor and a workflow add-on. It is versatile in that it can manage the parallel processing of computational tasks, traditional applications and native code (Python, Java and .NET).
It has a general XML-RPC interface and a Python API. One of the cool features is the workflow add-on, Flow, where one can define very complex process flows. The steps of a process can fan out one-to-many and join many-to-one, and a process can be made up of a hierarchy of sub-processes. Processes are versioned and can be stored in SVN.
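To give a feel for the one-to-many and many-to-one steps, here is a minimal sketch in plain Python. It is not the Flow API: the function names (`split`, `count_chars`, `merge`, `run_flow`) are made up for illustration, and a local thread pool stands in for the grid workers.

```python
from concurrent.futures import ThreadPoolExecutor


def split(text):
    """One-to-many step: break the input into independent chunks."""
    return text.split()


def count_chars(word):
    """Work done on each chunk; on a real grid this would run on a worker."""
    return len(word)


def merge(counts):
    """Many-to-one step: combine the partial results into one value."""
    return sum(counts)


def run_flow(text, max_workers=4):
    """Run split -> parallel count_chars -> merge (hypothetical flow)."""
    chunks = split(text)
    # The thread pool is only a stand-in for the distributor and workers.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        counts = list(pool.map(count_chars, chunks))
    return merge(counts)


if __name__ == "__main__":
    print(run_flow("grid computing in the background"))
```

A sub-process in this picture would simply be another function like `run_flow` called from inside a step.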
The grid workers can execute any Python code, either passed in the API call or otherwise accessible to the worker. Along similar lines, the grid can access Java and .NET libraries. The Grid evolved in a private computing environment where there was a requirement to access shared storage devices and databases for the processing of large datasets. We quickly learned the lessons of managing I/O and the implications of generating huge load on databases and network traffic.
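As a rough illustration of what "code passed in the API call" can mean, the sketch below shows a worker-side function that receives Python source as a string, executes it in a fresh namespace and calls a named entry point. The names `run_task` and `TASK_SOURCE` are my own for this example and are not part of the grid's API.

```python
def run_task(source, entry_point, *args):
    """Execute `source` in a fresh namespace and call `entry_point`.

    Hypothetical worker-side helper, for illustration only.
    """
    namespace = {}
    exec(compile(source, "<task>", "exec"), namespace)
    return namespace[entry_point](*args)


# Source code as it might arrive in the body of an API call.
TASK_SOURCE = """
def square_sum(values):
    return sum(v * v for v in values)
"""

if __name__ == "__main__":
    print(run_task(TASK_SOURCE, "square_sum", [1, 2, 3]))
```

Running arbitrary code like this is only reasonable when you trust the caller, which fits a private computing environment better than a public one.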
It keeps a full audit trail of everything that happens, including access to the stderr and stdout of every execution. There is a lot more technical detail that we can share with you directly.
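For a sense of what capturing stdout and stderr per execution involves, here is a small sketch using only the Python standard library. The record layout and the names `run_with_audit` and `noisy` are assumptions for the example, not the grid's actual audit format.

```python
import contextlib
import io
import sys
import time


def run_with_audit(task_id, func, *args):
    """Run func(*args) and return a record holding its result and output."""
    out, err = io.StringIO(), io.StringIO()
    record = {"task_id": task_id, "started": time.time(), "status": "ok"}
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        try:
            record["result"] = func(*args)
        except Exception as exc:
            # Keep the failure in the trail instead of losing it.
            record["status"] = "failed"
            record["result"] = None
            print(repr(exc), file=sys.stderr)
    record["finished"] = time.time()
    record["stdout"] = out.getvalue()
    record["stderr"] = err.getvalue()
    return record


def noisy(x):
    """Example task that prints while it works."""
    print("working on", x)
    return x * 2


if __name__ == "__main__":
    print(run_with_audit("task-1", noisy, 21))
```

Note that this only captures output written through Python's `sys.stdout` and `sys.stderr`; output from child processes or native libraries would need to be captured at the process level.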
We have played with a few examples of bursting our grid into various cloud platforms such as EC2, GoGrid, VMware and Eucalyptus. It is still early days and this work is largely at the proof-of-concept stage.
I am building a VirtualBox virtual machine image that will be available for download some time in the near future. We can also distribute time-limited versions of the grid for anyone to play with or evaluate. If you are interested, please leave a comment on this blog or reach me through LinkedIn or via the contact form on the ScaleFast website.