Complex distributed database and Grid systems such as OGSA-DAI/OGSA-DM need to monitor workflow execution and to measure the performance of various parameters.
Monitoring of distributed systems usually includes four stages: (i) generation of events, in which sensors query entities and encode the measurements according to a given schema; (ii) processing of the generated events, which is application-specific and may take place during any stage of the monitoring process; typical examples include filtering events according to predefined criteria or summarizing a group of events (e.g., computing an average); (iii) distribution, i.e., the transmission of events from their source to any interested parties; (iv) finally, presentation, which typically involves further processing so that the overwhelming number of received events is reduced to a series of abstractions, enabling an end user to draw conclusions about the operation of the monitored system.
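The processing stage (ii) can be sketched as follows; this is a minimal illustration, and the event fields and function names are assumptions rather than the actual schema used in the system:

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical event record; the field names are illustrative assumptions,
# not the schema used by the monitoring system described here.
@dataclass
class Event:
    source: str      # entity the sensor queried
    metric: str      # e.g. "cpu_usage"
    value: float

# (ii) processing: filter events according to a predefined criterion ...
def filter_events(events, metric):
    return [e for e in events if e.metric == metric]

# ... or summarize a group of events, e.g. by computing the average
def summarize(events):
    return mean(e.value for e in events) if events else None

events = [
    Event("node1", "cpu_usage", 0.42),
    Event("node1", "mem_usage", 0.75),
    Event("node2", "cpu_usage", 0.58),
]
cpu = filter_events(events, "cpu_usage")
print(summarize(cpu))  # average CPU usage over the filtered events -> 0.5
```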
The figure below shows the general architecture for workflow monitoring, which consists of server and client components. The idea of workflow monitoring is to record the start and end of each experiment (workflow execution) and of each activity inside the workflow. These data are stored in the Hawkeye database. We provide several tools to analyse them, and some of these data are later used to optimize workflow execution.
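A minimal sketch of this timestamping scheme is given below; a plain in-memory list stands in for the Hawkeye database, and the record layout and function names are illustrative assumptions:

```python
import time

# Sketch of workflow monitoring: record start/end timestamps for the whole
# experiment (workflow execution) and for each activity inside it.
# A plain list stands in for the Hawkeye database used in the real system.
records = []

def record(experiment, activity, event):
    records.append({
        "experiment": experiment,
        "activity": activity,        # None for the workflow as a whole
        "event": event,              # "start" or "end"
        "timestamp": time.time(),
    })

def run_workflow(experiment, activities):
    record(experiment, None, "start")
    for name, func in activities:
        record(experiment, name, "start")
        func()                       # execute the activity
        record(experiment, name, "end")
    record(experiment, None, "end")

# Two dummy activities: 2 workflow events + 2 events per activity = 6 records
run_workflow("exp-1", [("query", lambda: None), ("transform", lambda: None)])
print(len(records))  # -> 6
```

The stored start/end pairs are enough to derive per-activity durations later, which is what the analysis and optimization tools consume.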
The second part of this system is performance measurement of the distributed nodes. We created a Round Robin database in which we store historical data for selected parameters such as CPU usage, RAM usage, disk usage, and data from other sensors connected to the system. These data are also analysed and used to optimize workflow execution.
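The essence of a round-robin store is a fixed-size circular buffer per parameter: once the buffer is full, the oldest sample is overwritten, so storage stays bounded while recent history remains available for analysis. A minimal sketch, with illustrative capacity and parameter names:

```python
from collections import deque

class RoundRobinStore:
    """Fixed-size circular buffers of historical samples, one per parameter.
    Capacity and parameter names below are illustrative assumptions."""

    def __init__(self, capacity=1440):           # e.g. one sample/minute for a day
        self.capacity = capacity
        self.series = {}                         # parameter -> deque of samples

    def add(self, parameter, value):
        buf = self.series.setdefault(parameter, deque(maxlen=self.capacity))
        buf.append(value)                        # oldest sample drops when full

    def history(self, parameter):
        return list(self.series.get(parameter, []))

store = RoundRobinStore(capacity=3)
for v in [10, 20, 30, 40]:                       # fourth sample evicts the first
    store.add("cpu_usage", v)
print(store.history("cpu_usage"))  # -> [20, 30, 40]
```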