A PIONEERING STRATEGY FOR DISCOVERING CRUCIAL NUGGETS FROM DATA SETS

Mallipamula Megha Sai Sree, Sdasu Shravani

Abstract


In recent years, detecting patterns and outliers has emerged as an important area of research in data mining. It has many applications, including detecting fraud in business transaction data, recognizing network intrusions, identifying abnormal trends in time-series data, and flagging suspicious criminal activity. Distance-based measures have been used in algorithms to distinguish outliers, or abnormal records, from regular records. However, little effort has focused on discovering the critical nuggets of information that may be hidden in data sets. In classification, the most important goal is to obtain an accurate, representative data model that can correctly categorize new test instances. In certain cases critical nuggets may include outliers, but this is not always true. One can exploit an intuition motivated by two commonly occurring situations in classification algorithms; for example, points near the class boundary are generally critical. A novel metric, the CRscore, was introduced to measure the criticality of a subset, or nugget. Knowledge of critical nuggets also helped reduce the number of false positives and false negatives, and thus considerably improved the overall accuracy of classification tasks. The post-processing method for improving classification accuracy proposed in this work can also be evaluated with other classification techniques.
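The abstract does not define the CRscore, so the sketch below does not attempt to reproduce it. It only illustrates the distinction the abstract draws between outliers and points near the class boundary, using two per-point scores on labeled data. The first is a standard distance-based outlier score: the distance to the k-th nearest neighbor. The second, `boundary_score`, is hypothetical. It is taken here as the fraction of a point's k nearest neighbors that carry a different label. The function names, the toy data, the choice k = 2, and the Euclidean metric (via `math.dist`, Python 3.8+) are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def k_nearest(data: Sequence[Point], i: int, k: int) -> List[Tuple[float, int]]:
    """Return (distance, index) pairs for the k nearest neighbors of data[i], excluding i itself."""
    if not 1 <= k < len(data):
        raise ValueError("k must satisfy 1 <= k < len(data)")
    # Euclidean distance to every other point; ties are broken by index via tuple ordering.
    dists = [(math.dist(data[i], data[j]), j) for j in range(len(data)) if j != i]
    dists.sort()
    return dists[:k]


def knn_outlier_score(data: Sequence[Point], i: int, k: int) -> float:
    """Standard distance-based outlier score: distance to the k-th nearest neighbor."""
    return k_nearest(data, i, k)[-1][0]


def boundary_score(data: Sequence[Point], labels: Sequence[int], i: int, k: int) -> float:
    """Hypothetical boundary score: fraction of the k nearest neighbors whose label differs from labels[i]."""
    neighbors = k_nearest(data, i, k)
    return sum(1 for _, j in neighbors if labels[j] != labels[i]) / k


def subset_boundary_score(data: Sequence[Point], labels: Sequence[int],
                          subset: Sequence[int], k: int) -> float:
    """Hypothetical subset ("nugget") score: mean boundary score of its members. NOT the paper's CRscore."""
    return sum(boundary_score(data, labels, i, k) for i in subset) / len(subset)


# Toy 1-D data (illustrative only): class 0 on the left, class 1 on the right,
# and one distant class-1 point at index 7.
DATA = [(0,), (1,), (3,), (7,), (8,), (12,), (13,), (50,)]
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]

if __name__ == "__main__":
    for idx in range(len(DATA)):
        print(idx, knn_outlier_score(DATA, idx, 2), boundary_score(DATA, LABELS, idx, 2))
    print("subset [3, 4]:", subset_boundary_score(DATA, LABELS, [3, 4], 2))
```

On this toy data the distant point at index 7 has by far the largest outlier score (38.0), but its boundary score is 0.0. Indices 3 and 4 sit where the two classes meet and score 0.5. This is a small instance of the abstract's remark that critical points need not be outliers.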


Keywords


Critical nuggets; Patterns; False positives; Distance-based measure






Copyright © 2012-2023. All rights reserved. | ijitr.com

International Journal of Innovative Technology and Research is licensed under a Creative Commons Attribution 3.0 Unported License. Based on a work at IJITR. Permissions beyond the scope of this license may be available at http://creativecommons.org/licenses/by/3.0/deed.en_GB.