Detecting Salient features and Summarizing Health Review using Latent Dirichlet Analysis

Authors: Mozibur Raheman Khan, Rajkumar Kannan

To review is “to examine something carefully, especially before making a decision or judgment”. Health reviews are written by health consumers, mainly about health service providers. Because the number of such reviews is enormous, they need to be summarized. In this paper, we propose a simple approach to identify the topics that health consumers discuss when reviewing their health providers online. Our approach does not rely on any manual tagging of the information and operates directly on the text of online reviews. We analyze a large set of reviews and identify the topics discussed when reviewing providers with different specialties. The health-rating information is based on the sentiment-classification result, and the condensed descriptions of health reviews are generated by feature-based summarization. We propose a novel approach based on Latent Dirichlet Analysis (LDA) to identify health features. Furthermore, we show how to reduce the size of the summary using the health features obtained from LDA.
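The idea of extracting health features from untagged review text with LDA can be illustrated with a minimal sketch. The abstract does not say which inference method, preprocessing, or hyperparameters the authors use, so everything below is an assumption made for illustration. This includes the collapsed Gibbs sampler (a common way to fit LDA), the sample reviews, the stopword list, and the names fit_lda and top_words.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling over tokenized documents.

    Returns (vocab, topic_word, doc_topic), where topic_word[k] counts the
    words assigned to topic k and doc_topic[d][k] counts the tokens of
    document d assigned to topic k. All defaults are illustrative.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # p(z = k | rest) is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return vocab, topic_word, doc_topic


def top_words(topic_word, n=3):
    """Return up to n most frequent words per topic as candidate features."""
    return [[w for w, c in tw.most_common(n) if c > 0] for tw in topic_word]


# Hypothetical reviews and stopwords, not taken from the paper's data set.
reviews = [
    "doctor listened carefully and explained the treatment",
    "long wait and rude front desk staff",
    "the doctor explained the diagnosis clearly",
    "staff kept us waiting and the wait was long",
]
STOPWORDS = {"and", "the", "was", "us"}
docs = [[w for w in r.split() if w not in STOPWORDS] for r in reviews]

vocab, topic_word, doc_topic = fit_lda(docs, num_topics=2)
features = top_words(topic_word)
print(features)
```

Each inner list in features holds the words most strongly associated with one topic. In a feature-based summarizer, such words could serve as the health features used to select or group review sentences. With a corpus this small the topics are not meaningful, so the sketch shows only the mechanics.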

Authors and Affiliations

Mozibur Raheman Khan
Department of Computer Science, Bishop Heber College (Autonomous), Tiruchirappalli, India
Rajkumar Kannan
Department of Computer Science, Bishop Heber College (Autonomous), Tiruchirappalli, India

Keywords: Health Consumers, Health Features, Latent Dirichlet Analysis, Natural Language Processing (NLP), Text Analysis, Text Mining


Publication Details

Published in : Volume 3 | Issue 3 | March-April 2018
Date of Publication : 2018-04-30
License:  This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 506-522
Manuscript Number : CSEIT1833154
Publisher : Technoscience Academy

ISSN : 2456-3307

Cite This Article :

Mozibur Raheman Khan, Rajkumar Kannan, "Detecting Salient features and Summarizing Health Review using Latent Dirichlet Analysis", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN: 2456-3307, Volume 3, Issue 3, pp. 506-522, March-April 2018.