This paper introduces a powerful decision support tool, data mining, in the context of knowledge management. Among its features, the most striking are clustering and prediction. Clustering offers a comprehensive analysis of student characteristics, while prediction estimates the likelihood of various student outcomes, such as transferability, persistence, retention and success in classes. Traditional analytical studies are often retrospective and aggregate. Data mining, by contrast, is forward-looking and oriented to individual students. A real-life project illustrates how data mining can predict the likelihood that each student currently enrolled at a community college in Silicon Valley will return to school. The project applies neural network, C&RT and C5.0 models to choose the best prediction, followed by a cluster analysis using TwoStep. The list of students whom data mining predicts as less likely to return to school is then turned over to faculty and management for direct or indirect intervention. The benefit of data mining is its ability to give a deeper understanding of patterns that could not be seen with currently available reporting capabilities. Further, prediction from data mining gives the college an opportunity to act before a student drops out, or to plan resource allocation with confidence gained from knowing how many students will transfer or take a particular course.
Data mining is the process of extracting patterns from data. It is becoming an increasingly important tool for transforming data into information, and it is commonly used in a wide range of profiling practices, such as marketing, surveillance, fraud detection and scientific discovery. Data mining can uncover patterns in data, but it is often carried out only on samples of data. The mining process will be ineffective if the samples are not a good representation of the larger body of data: data mining cannot discover patterns that may be present in the larger body of data if those patterns are not present in the sample being "mined". Failure to find patterns can become a source of disputes between customers and service providers. Data mining is therefore not foolproof, but it may be useful if sufficiently representative data samples are collected. Conversely, discovering a particular pattern in a particular sample does not necessarily mean that the pattern holds elsewhere in the larger data set from which the sample was drawn. An important part of the process is the verification and validation of patterns on other samples of data. The related terms data dredging, data fishing and data snooping refer to applying data mining techniques to samples that are (or may be) too small for statistical inferences to be made about the validity of any patterns discovered (see also data-snooping bias). Data dredging may, however, be used to develop new hypotheses, which must then be validated with sufficiently large samples.
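The validation of patterns on other samples of data can be sketched as a simple holdout check. This is a minimal sketch under stated assumptions: each record is a Python dict, a "pattern" is an if-then rule (for example, "students who enrol full time tend to return"), and the field names `full_time` and `returned` as well as the helpers `confidence`, `split_sample` and `validate_pattern` are hypothetical, introduced only for illustration. The rule's confidence is measured separately on a discovery half and a holdout half of the data; a large gap between the two suggests that the pattern may be an artifact of the particular sample.

```python
import random
from typing import Callable, Dict, List, Tuple

Record = Dict[str, bool]
Predicate = Callable[[Record], bool]


def confidence(records: List[Record], antecedent: Predicate, consequent: Predicate) -> float:
    """Fraction of records matching the antecedent that also match the consequent."""
    matching = [r for r in records if antecedent(r)]
    if not matching:
        return 0.0  # the rule cannot be evaluated on this sample
    return sum(1 for r in matching if consequent(r)) / len(matching)


def split_sample(records: List[Record], fraction: float = 0.5,
                 seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """Randomly split records into a discovery part and a holdout part."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


def validate_pattern(records: List[Record], antecedent: Predicate, consequent: Predicate,
                     tolerance: float = 0.1, seed: int = 0) -> Tuple[float, float, bool]:
    """Return (discovery confidence, holdout confidence, whether they agree within tolerance)."""
    discovery, holdout = split_sample(records, seed=seed)
    c_disc = confidence(discovery, antecedent, consequent)
    c_hold = confidence(holdout, antecedent, consequent)
    return c_disc, c_hold, abs(c_disc - c_hold) <= tolerance


if __name__ == "__main__":
    # Hypothetical toy data: every full-time student in this set returned.
    students = [{"full_time": i % 2 == 0, "returned": i % 2 == 0} for i in range(100)]
    print(validate_pattern(students, lambda r: r["full_time"], lambda r: r["returned"]))
```

A single random split is the simplest choice; repeating the check over several seeds, or using a larger independent sample, gives a more reliable picture, in line with the paragraph's warning about small samples.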
With the continuous development of database technology and the extensive application of database management systems, the volume of data stored in databases is growing rapidly, and much important information lies hidden in it. If this information can be extracted, it can create substantial potential profit for companies; the technology for mining information from massive databases is known as data mining. Data mining tools can forecast future trends and behaviors to support people's decisions. For example, by analyzing a company's entire database system, data mining tools can answer questions such as "Which customers are most likely to respond to our company's e-mail marketing, and why?" and other similar questions. Some data mining tools can also solve traditional problems that once consumed much time, because they can rapidly scan the entire database and find useful information that experts have overlooked.
Data mining commonly involves four classes of tasks:
Classification - Arranges the data into predefined groups. For example, an email program might attempt to classify an email as legitimate or spam. Common algorithms include decision tree learning, nearest neighbor, naive Bayes classification and neural networks.
Clustering - Like classification, but the groups are not predefined, so the algorithm tries to group similar items together.
Regression - Attempts to find a function which models the data with the least error.
Association rule learning - Searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.
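The classification task can be illustrated with nearest neighbor, one of the algorithms listed above. This is a minimal sketch that assumes each email has already been reduced to a small numeric feature vector; the two features used here (number of links and number of exclamation marks), the toy training set and the function name `knn_classify` are hypothetical and are not taken from any real spam filter. A query is labelled by majority vote among its `k` closest training examples under Euclidean distance.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]


def knn_classify(training: List[Example], query: Sequence[float], k: int = 3) -> str:
    """Label `query` by majority vote among its k nearest training examples."""
    # Sort training examples by Euclidean distance to the query (math.dist needs Python 3.8+).
    by_distance = sorted(training, key=lambda example: math.dist(example[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical features: (number of links, number of exclamation marks).
    training = [
        ((0, 0), "legitimate"), ((1, 0), "legitimate"), ((0, 1), "legitimate"),
        ((8, 9), "spam"), ((9, 7), "spam"), ((7, 8), "spam"),
    ]
    print(knn_classify(training, (1, 1)))  # legitimate
    print(knn_classify(training, (8, 8)))  # spam
```

An odd `k` avoids tied votes when there are only two classes; in practice the features would also be scaled so that no single feature dominates the distance.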
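The clustering task can be sketched with k-means, a widely used clustering algorithm. Note that k-means is swapped in here for illustration; it is not the TwoStep method used in the project described at the start of this paper. Each point is assigned to its nearest centroid, each centroid is then moved to the mean of its points, and the two steps repeat until the centroids stop changing. The function name `kmeans`, the fixed random seed and the toy points are illustrative assumptions.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, ...]


def kmeans(points: List[Point], k: int, max_iterations: int = 100, seed: int = 0):
    """Lloyd's k-means: returns (centroids, clusters) from the last assignment step."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]  # start from k randomly chosen input points
    clusters: List[List[Point]] = []
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if the cluster is empty).
        new_centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: reassignment no longer moves any centroid
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, clusters = kmeans(pts, 2)
    print(clusters)
```

Unlike classification, no labels are supplied: the two groups emerge from the distances alone. The number of clusters `k` must be chosen in advance, which is one reason methods such as TwoStep, which can suggest a number of clusters, are used in practice.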
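The regression task can be illustrated with ordinary least squares for a single input variable. Here "least error" is assumed to mean the smallest sum of squared residuals; that is the most common choice, but other error measures are also used. The closed-form slope is the covariance of x and y divided by the variance of x, and the fitted line passes through the point of means. The function names `fit_line` and `sum_squared_error` are hypothetical.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_error(xs: List[float], ys: List[float], slope: float, intercept: float) -> float:
    """The error that least squares minimises: sum of squared residuals."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


if __name__ == "__main__":
    xs, ys = [0, 1, 2], [0, 2, 1]
    slope, intercept = fit_line(xs, ys)
    print(slope, intercept, sum_squared_error(xs, ys, slope, intercept))  # 0.5 0.5 1.5
```

Any other line through these three points, for example slope 0.5 with intercept 0.6, gives a larger sum of squared errors.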
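Market basket analysis is commonly expressed through two measures: the support of an itemset (the fraction of baskets that contain it) and the confidence of a rule A → B (the fraction of baskets containing A that also contain B). The sketch below counts only pairs of items in a single pass, a deliberate simplification of full algorithms such as Apriori. The function name `pair_rules`, the threshold values and the toy baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Set, Tuple

Rule = Tuple[str, str, float, float]  # (antecedent, consequent, support, confidence)


def pair_rules(baskets: List[Iterable[str]], min_support: float = 0.5,
               min_confidence: float = 0.6) -> List[Rule]:
    """Return rules lhs -> rhs between single items that meet both thresholds."""
    n = len(baskets)
    item_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for basket in baskets:
        items: Set[str] = set(basket)
        item_counts.update(items)
        # Count each unordered pair once per basket (sorted so (a, b) and (b, a) coincide).
        pair_counts.update(combinations(sorted(items), 2))
    rules: List[Rule] = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # A frequent pair yields two candidate rules, one in each direction.
        for lhs, rhs in ((a, b), (b, a)):
            conf = count / item_counts[lhs]
            if conf >= min_confidence:
                rules.append((lhs, rhs, support, conf))
    return sorted(rules)


if __name__ == "__main__":
    baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
    for rule in pair_rules(baskets, min_confidence=0.7):
        print(rule)  # ('butter', 'bread', 0.5, 1.0)
```

In this toy data every basket containing butter also contains bread, so butter → bread has confidence 1.0, while the reverse rule only reaches 2/3.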
Data mining is a process through which one can extract valuable knowledge from a large database. The need for data mining arose from the immense and rapid growth in the volume of stored corporate data. Ordinary query methods could no longer reveal the patterns hidden in such vast amounts of data. Using advanced methods derived from artificial intelligence, pattern recognition and statistics, data mining can construct a comprehensive descriptive model of the input data. The model can take various forms and serves to describe and predict the behaviour of the data.
On the other hand, the teaching and administrative activities carried out in different education processes usually rely on data analysis techniques to reveal key concepts and their relationships. Data mining techniques are well suited to extracting such key concepts and their relationships from data. In this paper, we investigate the application of data mining techniques within the education framework, with the aim of interesting the reader in the idea, and we present some practical cases. The rest of the paper is organized as follows. The data mining process is described briefly in the next subsection. Applications of data mining to education processes are described in the following section. The paper ends with a case study and some concluding remarks.