  • Objectives

The project involves five main partner organizations and runs for a total of two years. The project's main objectives are structured into four main phases or milestones, all of which have already been completed:

  • Specification and validation of data-mining-aware grid tools and interfaces to be developed by the project (due date: February 2005),
  • Early implementation of a mock-up prototype featuring some of the more critical aspects of the project (architecture, middleware, data-mining-aware grid data access interfaces) (due date: August 2005),
  • Delivery of middleware-integrated components (due date: April 2006), and finally
  • A fully evaluated set of DataMiningGrid components, tools, interfaces and application demonstrators from different application scenarios (due date: August 2006).

To address the wide range of requirements that arise when mining data in distributed computing environments, the project developed a test bed consisting, among other things, of various demonstrator applications. Demonstrators from biology and medicine address data-mining problems that require data-mining-aware access to distributed and very large databases (e.g., molecular dynamics unfolding simulation data) and the construction of compute-intensive predictive models. Demonstrators in the automotive industry and other text-mining scenarios mine large and inherently distributed text repositories (e.g., car repair protocols, customer relationship management data). Another demonstrator directly mines logs produced by grid computing middleware. To meet these requirements, the project developed the following critical technology components:

  • A workflow manager and editor that facilitates the composition, execution and management of complex data mining workflows in grid environments;
  • A Data Mining Application Enabler, which is used to grid-enable existing data mining applications and upload them to the grid environment;
  • Grid-enabled data access and integration services that allow users to identify (locate), access, integrate and interface distributed data sources and grid-enabled data mining programs in a flexible way;
  • A resource broker and information services, which execute data mining applications in the grid environment.

Updated on June 3, 2008