Text Mining Pathology and Radiology Records To Habitually Classify Against Disease : Computing The Control of Linking Data Sources

Dr. P. Radha; B. Meena Preethi

doi:10.32628/CSEIT1833781

Authors

Dr. P. Radha Assistant Professor, PG & Research Department of Computer Science, Government Arts College, Coimbatore, India
B. Meena Preethi Assistant Professor, Department of BCA and M.Sc.SS, Sri Krishna Arts and Science College, Coimbatore, India

Keywords:

Health Informatics, Support Vector Machine, Handling Skewed Data

Abstract

Text data mining, also known as text analytics, is the process of deriving high-quality information from text. Text and data mining play a role in obtaining insights from health and hospital information systems. A text mining system is used to detect admissions marked as positive for several diseases: Lung Cancer, Breast Cancer, Colon Cancer, Secondary Malignant Neoplasm of Respiratory and Digestive Organs, Multiple Myeloma and Malignant Plasma Cell Neoplasm, Pneumonia, and Pulmonary Embolism. The study explicitly examines the effect of linking several data sources on text classification performance. Support Vector Machine classifiers are built for eight data source combinations and evaluated using Precision, Recall and F-Score. Sub-sampling techniques are used to address the imbalanced datasets of medical records. Radiology reports are used as the initial data source, and other sources, such as pathology reports and patient and hospital admission data, are added sequentially to evaluate the research question concerning the value of multiple data sources. Statistical significance is measured using the Wilcoxon signed-rank test. A subsequent set of experiments explores aspects of the system in greater depth, focusing on Lung Cancer. These experiments provide a better understanding of how best to apply text classification in the context of imbalanced data of variable completeness. Radiology questions plus patient and hospital admission data contribute valuable information for detecting most of the diseases, significantly improving performance when added to radiology reports alone or to the combination of radiology and pathology reports. The choice of the most effective combination of data sources depends on the specific disease to be classified.
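The abstract describes starting from radiology reports and adding pathology reports and patient and hospital admission data to measure the value of linking sources. A minimal sketch of that linking step might look like the following. It assumes each record carries an admission identifier and that the linked text is simply concatenated per admission. The record layout, the source names and the `link_sources` helper are hypothetical illustrations, not the authors' pipeline.

```python
from collections import defaultdict


def link_sources(records, sources):
    """Build one combined document per admission from the selected sources.

    records: iterable of (admission_id, source_name, text) tuples (hypothetical layout).
    sources: ordered tuple of source names to include.
    Returns a dict mapping admission_id -> combined text.
    """
    by_admission = defaultdict(lambda: defaultdict(list))
    for admission_id, source_name, text in records:
        if source_name in sources:
            by_admission[admission_id][source_name].append(text)
    linked = {}
    for admission_id, parts in by_admission.items():
        # Join texts in the order the sources were requested.
        ordered = [t for s in sources for t in parts.get(s, [])]
        linked[admission_id] = " ".join(ordered)
    return linked


# Toy records, invented for illustration only.
records = [
    ("A1", "radiology", "mass in left upper lobe"),
    ("A1", "pathology", "adenocarcinoma of lung"),
    ("A1", "admission", "age 67 emergency"),
    ("A2", "radiology", "no consolidation"),
]
print(link_sources(records, ("radiology", "pathology")))
```

Changing the `sources` tuple yields a different input document per admission, so each data source combination can be fed to the same classifier and compared.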
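The abstract mentions sub-sampling to handle imbalanced medical-record datasets without naming the exact technique. Random under-sampling of the majority class is one common option. The sketch below assumes binary labels (1 = disease-positive admission) and a target majority-to-minority ratio. Both are illustrative choices, and the `undersample` helper is hypothetical.

```python
import random


def undersample(examples, labels, ratio=1.0, seed=0):
    """Randomly drop majority-class examples so that at most
    ratio * n_minority of them remain. Labels are binary (0/1)."""
    rng = random.Random(seed)  # seeded for reproducibility
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep_n = min(len(majority), int(ratio * len(minority)))
    # Keep every minority example plus a random subset of the majority,
    # sorted so the original order is preserved.
    kept = sorted(minority + rng.sample(majority, keep_n))
    return [examples[i] for i in kept], [labels[i] for i in kept]


# Toy data: 10 positive admissions, 90 negative ones.
docs = [f"report {i}" for i in range(100)]
labels = [1] * 10 + [0] * 90
X_bal, y_bal = undersample(docs, labels, ratio=2.0)
print(len(y_bal), sum(y_bal))  # prints: 30 10
```

Under-sampling is normally applied only to the training portion of the data, so that evaluation still reflects the real class distribution.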
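The classifiers in the study are evaluated with Precision, Recall and F-Score. For a binary task these follow from the true positive (TP), false positive (FP) and false negative (FN) counts: P = TP / (TP + FP), R = TP / (TP + FN), and F_β = (1 + β²)·P·R / (β²·P + R), where β = 1 gives the usual F1. A small sketch, assuming labels coded as 0/1 (the function name is illustrative):

```python
def precision_recall_f(y_true, y_pred, beta=1.0):
    """Return (precision, recall, F-beta) for binary labels where 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Define each ratio as 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta ** 2
    denom = b2 * precision + recall
    f_score = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Toy labels: TP = 2, FP = 1, FN = 1, so P = R = F1 = 2/3.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(precision_recall_f(y_true, y_pred))
```

On imbalanced data these positive-class metrics are more informative than accuracy, since a classifier that always predicts "negative" can score high accuracy while having zero recall.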
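Statistical significance is assessed with the Wilcoxon signed-rank test, which compares paired measurements. One example is the scores of two data source combinations on the same evaluation folds, though that pairing is an assumption for illustration. The sketch below computes the statistic with average ranks for tied absolute differences. It derives a two-sided p-value from the normal approximation without continuity or tie correction. For small samples an exact test, such as the one offered by `scipy.stats.wilcoxon`, is more appropriate.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Zero differences are discarded; tied |differences| get average ranks.
    Returns (W, p) with W = min(W+, W-). No continuity or tie correction.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    # Rank absolute differences (1-based), averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    # Under H0, W has mean n(n+1)/4 and variance n(n+1)(2n+1)/24.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, p


# Hypothetical paired scores, not results from the study.
a = [0.81, 0.77, 0.90, 0.68, 0.85, 0.72, 0.88, 0.79]
b = [0.78, 0.75, 0.86, 0.69, 0.80, 0.70, 0.84, 0.74]
print(wilcoxon_signed_rank(a, b))
```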
An approach whereby reports are electronically received and automatically processed, abstracted and analyzed has the potential to support expert clinical coders in their decision-making and assist with improving accuracy in data recording. Improving the cancer notifications process would provide significant benefits to oncology service providers, health administrators, clinicians and patients. The ultimate aim is to develop an automated system that can be trained to detect a new condition by having an expert in that condition analyse and annotate data directly.
