Future and emerging complex problem-solving environments are characterized by two important features:
- Increasing amounts of digital data, and
- Rising demands for coordinated resource sharing across geographically dispersed sites.
The effective and efficient management and use of increasing amounts of stored data, and in particular the transformation of these data into information and knowledge, is considered a key requirement. Data mining (also known as knowledge discovery in databases) is the technology that addresses this need: the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. However, the field has mainly been developed for largely homogeneous and localized computing environments. These assumptions increasingly fail to hold in modern scientific and industrial complex problem-solving environments, which rely more and more on the sharing of geographically dispersed computing resources. Next-generation grid technologies promise to provide the infrastructure needed for seamless sharing of computing resources in such environments. This shift to large-scale distributed computing has profound implications for the way data are analysed, and grid computing promises to address the changing computing requirements of future distributed data-mining environments.
Currently there is no coherent framework for developing and deploying data-mining applications on the grid. The DataMiningGrid project will address this gap by developing generic, sector-independent data-mining tools and services for the grid. Grid interfaces will be developed that allow data-mining tools to operate in a distributed grid computing environment. A major innovation of the project therefore lies in distributing knowledge acquisition. For technical reasons (e.g. bandwidth) as well as organizational reasons (privacy and security), it is impossible to bring all data to a centralized place. The aim of the project is thus to upgrade data-mining technologies so that traditional knowledge discovery approaches can operate in a distributed manner. In this way, data mining is adapted to the working and organizational habits of the new knowledge workers.
A test bed consisting of several real-world applications from a diverse set of sectors will serve as a platform for demonstrating and promoting the technology developed by the DataMiningGrid. The philosophy of the project's demonstrations is presented in the following picture:
The importance of data mining in many different sectors suggests that the project's impact will be significant, as it will be an important step towards more effective and efficient exploitation of available data and information resources. In the long run, the project will contribute to new business and R&D opportunities in the European market. It will also contribute to standardisation efforts for grid and data-mining technologies. Because of its relevance across so many sectors, the DataMiningGrid project has the potential to improve the sharing and exploitation of information in Europe and could consequently help to enhance the quality of life in Europe.
Updated on January 13, 2005