Improving Efficiency In High Dimensional Data Sets

Authors: B. Swathi, P. Praveen Kumar

Data retrieval from high-dimensional data with few observations is becoming increasingly common, particularly with microarray data. Over the last two decades, many efficient classification models and feature selection (FS) algorithms have been proposed to achieve higher prediction accuracy. However, the result of an FS algorithm that considers only prediction accuracy can be unstable under variations in the training set, especially with high-dimensional data. This paper proposes a new evaluation measure, the Q-statistic, that incorporates the stability of the selected feature subset in addition to the prediction accuracy. It then proposes the Booster of an FS algorithm, which boosts the value of the Q-statistic of the algorithm it is applied to. Empirical studies show that Booster improves not only the value of the Q-statistic but also the prediction accuracy of the algorithm applied, unless the data set is intrinsically difficult to predict with the given algorithm.
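The abstract does not state the formula for the Q-statistic or the internals of Booster, so the following is only a rough sketch of the underlying idea: score a feature selection procedure on both held-out accuracy and how consistently it picks the same features when the training set is resampled. Everything in it is an assumption made for illustration, not the authors' method: the synthetic data, the mean-difference filter `select_top_k`, the nearest-centroid classifier, and the use of mean pairwise Jaccard similarity as the stability measure.

```python
import random
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two feature-index collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def mean_pairwise_jaccard(subsets):
    """Average Jaccard similarity over all pairs of selected subsets."""
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

def class_means(X, y, features):
    """Per-class mean vector restricted to the given features."""
    means = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        means[label] = [sum(r[j] for r in rows) / len(rows) for j in features]
    return means

def select_top_k(X, y, k):
    """Toy filter (hypothetical): rank features by |mean(class 0) - mean(class 1)|."""
    p = len(X[0])
    m = class_means(X, y, range(p))
    scores = [abs(m[0][j] - m[1][j]) for j in range(p)]
    return sorted(range(p), key=lambda j: scores[j], reverse=True)[:k]

def predict(means, features, x):
    """Nearest-centroid prediction using only the selected features."""
    def dist(label):
        return sum((x[j] - c) ** 2 for j, c in zip(features, means[label]))
    return min(means, key=dist)

def evaluate(X, y, k=5, rounds=20, seed=0):
    """Return (mean held-out accuracy, mean pairwise subset stability)."""
    rng = random.Random(seed)
    by_class = {c: [i for i, t in enumerate(y) if t == c] for c in set(y)}
    subsets, accuracies = [], []
    for _ in range(rounds):
        # Stratified bootstrap so every resample contains both classes.
        idx = [rng.choice(members) for members in by_class.values() for _ in members]
        chosen = set(idx)
        held_out = [i for i in range(len(X)) if i not in chosen]
        if not held_out:
            continue
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        feats = select_top_k(Xb, yb, k)
        means = class_means(Xb, yb, feats)
        hits = sum(predict(means, feats, X[i]) == y[i] for i in held_out)
        subsets.append(feats)
        accuracies.append(hits / len(held_out))
    return sum(accuracies) / len(accuracies), mean_pairwise_jaccard(subsets)

def make_data(n=40, p=200, informative=3, shift=2.0, seed=1):
    """Synthetic 'few samples, many features' data with labels 0 and 1."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for j in range(informative):
            row[j] += shift * label  # only the first few features carry signal
        X.append(row)
        y.append(label)
    return X, y

X, y = make_data()
accuracy, stability = evaluate(X, y)
print(f"held-out accuracy: {accuracy:.2f}, subset stability: {stability:.2f}")
```

Reporting the two numbers side by side shows the trade-off the abstract describes: a selector can reach good accuracy while choosing noticeably different subsets from one resample to the next, which is the instability a combined measure is meant to expose.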

Authors and Affiliations

B. Swathi
M. Tech (CSE), Vignana Bharathi Institute of Technology, Hyderabad, Telangana, India
P. Praveen Kumar
Assistant Professor, Vignana Bharathi Institute of Technology, Hyderabad, Telangana, India

Keywords: Accuracy, Prediction algorithms, Redundancy, Q-statistic, FS, Booster

Publication Details

Published in : Volume 3 | Issue 2 | January-February 2018
Date of Publication : 2018-04-30
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 88-93
Manuscript Number : CSEIT1831512
Publisher : Technoscience Academy

ISSN : 2456-3307

Cite This Article :

B. Swathi, P. Praveen Kumar, "Improving Efficiency In High Dimensional Data Sets", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN : 2456-3307, Volume 3, Issue 2, pp. 88-93, January-February 2018
URL : http://ijsrcseit.com/CSEIT1831512
