Improving Efficiency In High Dimensional Data Sets

Authors

  • B. Swathi  M. Tech (CSE), Vignana Bharathi Institute of Technology, Hyderabad, Telangana, India
  • P. Praveen Kumar  Assistant Professor, Vignana Bharathi Institute of Technology, Hyderabad, Telangana, India

Keywords:

Accuracy, Prediction algorithms, Redundancy, Q-statistic, FS, Booster

Abstract

Data mining in high-dimensional data with few observations is becoming increasingly common, especially in microarray data. Over the last two decades, many efficient classification models and feature selection (FS) algorithms have been proposed, aiming at higher prediction accuracy. However, the output of an FS algorithm evaluated only by prediction accuracy can be unstable across variations in the training set, especially with high-dimensional data. This paper proposes a new evaluation measure, the Q-statistic, which incorporates the stability of the selected feature subset in addition to prediction accuracy. It then proposes Booster, a procedure that boosts the value of the Q-statistic of the FS algorithm it is applied to. Empirical studies show that Booster improves not only the Q-statistic but also the prediction accuracy of the applied algorithm, unless the data set is intrinsically difficult to predict with the given algorithm.
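The abstract describes the Q-statistic as a measure that combines the stability of the selected feature subset with prediction accuracy. As an illustrative sketch only (not the paper's exact definition), one plausible combination measures subset stability with Kuncheva's consistency index (reference [5] below) and weights the classifier's accuracy by it; the function names here are hypothetical:

```python
def kuncheva_index(a, b, n_features):
    """Kuncheva's consistency index between two selected feature subsets
    of equal size k, drawn from n_features candidate features [5]."""
    k = len(a)
    if k == 0 or k == n_features:
        return 0.0
    r = len(set(a) & set(b))          # features common to both subsets
    expected = k * k / n_features     # overlap expected by chance
    return (r - expected) / (k - expected)

def stability(subsets, n_features):
    """Average pairwise consistency over the feature subsets selected
    on different resamples of the training data."""
    pairs = [(i, j) for i in range(len(subsets))
             for j in range(i + 1, len(subsets))]
    return sum(kuncheva_index(subsets[i], subsets[j], n_features)
               for i, j in pairs) / len(pairs)

def q_statistic(accuracy, subsets, n_features):
    """Hypothetical Q-statistic: prediction accuracy weighted by
    selection stability (an illustrative combination, assumed here)."""
    return accuracy * stability(subsets, n_features)
```

Under this sketch, an FS algorithm that selects the same features on every resample but predicts poorly, or one that predicts well from a different subset each time, both score low, which is the trade-off the Q-statistic is meant to capture.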

References

  1. K. M. Ting, J. R. Wells, S. C. Tan, S. W. Teng, and G. I. Webb, "Feature-subspace aggregating: Ensembles for stable and unstable learners," Mach. Learn., vol. 82, no. 3, pp. 375-397, 2011.
  2. D. Aha and D. Kibler, "Instance-based learning algorithms," Mach. Learn., vol. 6, no. 1, pp. 37-66, 1991.
  3. S. Alelyan, "On feature selection stability: A data perspective," PhD dissertation, Arizona State Univ., Tempe, AZ, USA, 2013.
  4. A. A. Alizadeh, M. B. Eisen, R. E. Davis, C. M. Izidore, S. Lossos, A. Rosenwald, J. C. Boldrick, H. Sabet, T. Tran, X. Yu, J. I. Powell, L. Yang, G. E. Marti, T. Moore, J. H. Jr, L. Lu, D. B. Lewis, R. Tibshirani, G. Sherlock, W. C. Chan, T. C. Greiner, D. D. Weisenburger, J. O. Armitage, R. Warnke, R. Levy, W. Wilson, M. R. Grever, J. C. Byrd, D. Botstein, P. O. Brown, and L. M. Staudt, "Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling," Nature, vol. 403, no. 6769, pp. 503-511, 2000.
  5. L. I. Kuncheva, "A stability index for feature selection," in Proc. Artif. Intell. Appl., pp. 421-427, 2007.
  6. F. Alonso-Atienza, J. L. Rojo-Alvare, A. Rosado-Mu~noz, J. J. Vinagre, A. Garcia-Alberola, and G. Camps-Valls, "Feature selection using support vector machines and bootstrap methods for ventricular fibrillation detection," Expert Syst. Appl., vol. 39, no. 2, pp. 1956-1967, 2012.
  7. P. J. Bickel and E. Levina, "Some theory for Fisher’s linear discriminant function, naive Bayes, and some alternatives when there are many more variables than observations," Bernoulli, vol. 10, no. 6, pp. 989-1010, 2004.
  8. Z. I. Botev, J. F. Grotowski, and D. P. Kroese, "Kernel density estimation via diffusion," The Ann. Statist., vol. 38, no. 5, pp. 2916-2957, 2010.
  9. G. Brown, A. Pocock, M. J. Zhao, and M. Lujan, "Conditional likelihood maximization: A unifying framework for information theoretic feature selection," J. Mach. Learn. Res., vol. 13, no. 1, pp. 27-66, 2012.
  10. C. Kamath, Scientific data mining: a practical perspective, Siam, 2009.
  11. G. H. John, R. Kohavi, and K. Pfleger, "Irrelevant features and the subset selection problem," in Proc. 11th Int. Conf. Mach. Learn., vol. 94, pp. 121-129, 1994.
  12. C. Corinna and V. Vapnik, "Support-vector networks," Mach. Learn., vol. 20, no. 3, pp. 273-297, 1995.
  13. T. M. Cover and J. A. Thomas, Elements of Information Theory (Series in Telecommunications and Signal Processing), 2nd ed. Hoboken, NJ, USA: Wiley, 2002.
  14. D. Dembele, "A flexible microarray data simulation model," Microarrays, vol. 2, no. 2, pp. 115-130, 2013.
  15. D. Dernoncourt, B. Hanczar, and J. D. Zucker, "Analysis of feature selection stability on high dimension and small sample data," Comput. Statist.Data Anal., vol. 71, pp. 681-693, 2014.
  16. J. Fan and Y. Fan, "High dimensional classification using features annealed independence rules," Ann. Statist., vol. 36, no. 6, pp. 2605-2637, 2008.

Published

2018-04-30

Issue

Section

Research Articles

How to Cite

[1]
B. Swathi, P. Praveen Kumar, "Improving Efficiency In High Dimensional Data Sets," International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 3, Issue 2, pp. 88-93, January-February 2018.