Data Mining Grid - Index page


WP1: DataMiningGrid Requirements Specification and Validation

The objectives of this workpackage are to

  • develop use cases and collect user requirements from the application areas by consulting the user groups, including end user groups and technology users,
  • identify technical constraints as well as standards to be adopted, and
  • make a component design for the DataMiningGrid.

This will ensure that

  • the project addresses specific end user and industry needs and the generic technology can be extended for specific user needs;
  • development is in line with the major Grid initiatives by leveraging and complementing, rather than duplicating, work already done in this area;
  • data mining technology is tightly integrated with Grid middleware in a generic manner;
  • pre-existing technology developed by the project Partners is harmonized. For example, if Partner A brings in tool x and Partner B brings in tool y, the two tools have to interface with each other, and both are implemented in Java, then we have to make sure that they run on the same JVM; likewise, different tools accessing an Oracle database must work with the same version of Oracle.

WP2: DataMiningGrid Data Services

Data sets and data sources used for data mining vary considerably in structure, size, problem-solving context, background knowledge, and other statistical and technological aspects across different domains and sectors. In many scientific domains, for instance, experimental data is generated with very specific analytical tasks in mind, whereas in customer-oriented businesses the data to be analysed is often pre-existing, generated as a by-product of the organization's business processes. Data is different from streams of bits and bytes, and this fact needs to be reflected in the protocol and service layers. We explicitly emphasise that Grid-enabled data tools and services for data mining may not necessarily be the same as those for query-oriented systems and applications. Therefore, emerging and existing technologies for the latter class of problems (e.g. DataGrid) may or may not be suitable for this project.

WP3: DataMiningGrid Analysis Service and Workflow Editor

This workpackage has two main goals:

  • To develop DataMiningGrid Analysis services that implement data mining tasks as Grid-enabled services, rather than specific algorithms.
  • To define and realize a workflow-based framework for a seamless use of available Grid Data Mining Analysis Services (DMGAS) and to build and execute distributed data mining analyses.

The system developed in this workpackage provides a user-friendly environment for defining, organizing and executing analysis tasks (both the basic services mentioned in WP2 and WP3 and the state-of-the-art methods mentioned in WP4). The basic process consists of three stages: pre-processing (warehousing), including data access, extraction, and loading; analysis, including statistics, machine learning, and visualization; and post-processing (interpretation, validation, sharing).
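
The three-stage process described above can be illustrated with a minimal sketch. This is not the project's workflow editor or any DataMiningGrid API: the names `Workflow`, `pre_process`, `analyse` and `post_process` are hypothetical, and each stage is reduced to a plain local function, where a real Grid workflow would call distributed services.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical stage type: each stage takes a context dict and returns a new one.
Stage = Callable[[Dict], Dict]


def pre_process(ctx: Dict) -> Dict:
    """Pre-processing: access the raw data, extract usable values, load them."""
    cleaned = [float(x) for x in ctx["raw"] if x is not None]
    return {**ctx, "data": cleaned}


def analyse(ctx: Dict) -> Dict:
    """Analysis: compute simple statistics (a stand-in for a mining service)."""
    data = ctx["data"]
    return {**ctx, "stats": {"n": len(data), "mean": mean(data)}}


def post_process(ctx: Dict) -> Dict:
    """Post-processing: turn the result into a report that can be shared."""
    stats = ctx["stats"]
    return {**ctx, "report": f"n={stats['n']} mean={stats['mean']:.2f}"}


@dataclass
class Workflow:
    """Runs the stages in order, passing each stage's output to the next."""
    stages: List[Stage] = field(default_factory=list)

    def run(self, ctx: Dict) -> Dict:
        for stage in self.stages:
            ctx = stage(ctx)
        return ctx


def demo() -> Dict:
    workflow = Workflow([pre_process, analyse, post_process])
    return workflow.run({"raw": [1, 2, None, 3]})
```

Modelling each stage as a function over a shared context keeps the stages interchangeable, which is the property a workflow-based framework relies on when composing analysis services.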

WP4: DataMiningGrid Text-Mining, Ontology-Learning and Kernel Methods

While WP3 aims at developing generic services, WP4 develops specific applications based on real-world use cases to show and evaluate the usefulness of the generic framework. The methods developed in WP4 will be tested in real-world applications, whereas the realized demonstrators for these methods will be shown on public data.

The generic services and tools developed in WP3 can be used as primitives for more specialized data mining approaches. As a proof of concept for the validity of the generic design, a variety of cutting-edge methods will be designed, Grid-enabled and implemented on top of the framework. These methods will cover a broad range of knowledge Grid services, so that the potential of data mining can be assessed on a sufficiently large sample. The methods to be developed focus on

  • distributed text mining,
  • automatic ontology construction, and
  • distributed structured Kernel methods for bioinformatics.

WP5: DataMiningGrid Middleware, Integration and Testing

The DataMiningGrid middleware will mediate between three entities: the owner of the computational or storage resource, the owner of the data, and the user of the data mining tool that processes the data. The middleware will thus have three main interfaces. First, it will make extensive use of existing Grid toolkits such as Globus and Condor to negotiate with resources. Second, it will enable owners of data sources to connect them to the system while dictating the terms of usage of the data. Finally, it will drive the execution of mining tasks on top of the resources.
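
As a rough illustration of this mediating role, the sketch below models the three interfaces with in-memory stand-ins. It calls no Globus or Condor API; the classes `Resource`, `DataSource` and `Middleware`, and the idea of expressing usage terms as a set of allowed users, are assumptions made only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Resource:
    """A computational resource offered by its owner (hypothetical model)."""
    name: str
    free_slots: int


@dataclass
class DataSource:
    """A data source plus the owner's usage terms (here: who may use it)."""
    name: str
    records: List[float]
    allowed_users: Set[str]


class Middleware:
    """Mediates between resource owners, data owners and mining users."""

    def __init__(self) -> None:
        self.resources: Dict[str, Resource] = {}
        self.sources: Dict[str, DataSource] = {}

    # Interface 1: resource owners make resources available.
    def register_resource(self, resource: Resource) -> None:
        self.resources[resource.name] = resource

    # Interface 2: data owners connect sources and dictate usage terms.
    def register_source(self, source: DataSource) -> None:
        self.sources[source.name] = source

    # Interface 3: users submit mining tasks, executed on a free resource.
    def run_task(
        self, user: str, source_name: str, task: Callable[[List[float]], float]
    ) -> Tuple[str, float]:
        source = self.sources[source_name]
        if user not in source.allowed_users:
            raise PermissionError(f"{user} may not use {source_name}")
        resource = next(
            (r for r in self.resources.values() if r.free_slots > 0), None
        )
        if resource is None:
            raise RuntimeError("no free resource available")
        resource.free_slots -= 1  # reserve a slot for the duration of the task
        try:
            return resource.name, task(source.records)
        finally:
            resource.free_slots += 1  # always release the slot


def demo() -> Tuple[str, float]:
    mw = Middleware()
    mw.register_resource(Resource("node-a", free_slots=1))
    mw.register_source(DataSource("sales", [1.0, 2.0, 3.0], {"alice"}))
    return mw.run_task("alice", "sales", sum)
```

In this sketch the usage terms are checked before any resource is reserved, so a rejected request never occupies a slot.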

Notice that the methods developed in WP3 and WP4 improve on known techniques and adapt them to the requirements of the mining tools that are developed in the scope of the DataMiningGrid. In contrast, WP5 will use those methods and other available tools, given certain requirements set by the client of the mining process and a certain availability of resources. Thus, WP5 puts things together for a certain set of mining tools in the open Grid context.

WP6: DataMiningGrid Demonstrations

The tasks in WP6 are concerned with the demonstration and testing of the DataMiningGrid components on the basis of a selected set of real-world applications from different domains. Each demonstrator will follow the typical software development cycle of requirements specification, design, implementation, and testing in the context of the demonstrator application it addresses. We will pool the specifications and designs of all demonstrators in the common deliverables D61 and D62. The collection of actual demonstrators (software implementations) constitutes the common deliverable D63.

WP7: Concertation

Concertation activities will include:

  • organization of a mini-conference in Autumn 2005 on Data Mining in Grid Computing Environments, open for registration
  • liaising with other EU projects and initiatives. Specifically, the project Partners will collaborate with:

    • EGEE
    • UniGrids for considering Unicore as an alternative for Globus,
    • K-Wf Grid on Grid data mining workflow management,
    • inteliGrid on interoperability issues,
    • CoreGRID with the "Institute on Knowledge and Data Management",
    • OntoGrid on data mining Grid ontology,
    • SIMDAT on data Grid aspects of data mining.

  • collaboration with the HealthGrid cluster.

WP8: DataMiningGrid Dissemination, Awareness and Exploitation

All Partners will participate in awareness raising activities, such as:

  • Awareness creation through promotional activities and dissemination
  • Focussed dissemination to the DataMiningGrid user group
  • Exploitation

WP9: Project Management

The DataMiningGrid project Partners are experienced in the management of EU projects and in organizing national and international research projects and events. The project management will ensure that all contractual, financial, legal, management and “political” issues are taken into account and acted upon. In relation to the European Commission, special care will be taken to meet all deadlines and to produce all reports in good order. In relation to the scientists, special care will be taken to minimize their paperwork and to let them concentrate on the scientific part of the work. With regard to the Partner organizations, care will be taken to ensure that all financial issues are handled in a timely manner. The project will ensure that clear guidelines are given to the Partner Finance departments. The Consortium has defined these procedures, as well as other internal relations, Intellectual Property Rights (IPR) etc., in a Consortium Agreement.

Updated on July 5, 2004
Disclaimer: The sole responsibility for this website is with the authors; the information published does not express the opinions of the Community or of the project partners.