Text mining as a new discipline in searching and data mining

Tanzila Saba

doi:10.4172/2327-4581.1000411

Prince Sultan University, Saudi Arabia

J Comput Eng Inf Technol

Abstract

Today, much of the available information and e-business data is captured in text files that are normally unstructured. Examples include customers' feedback on products and services posted on social media, bibliographic databases containing author and customer details, and journal articles. This information grows day by day, and most of it remains unstructured. Consequently, it cannot be used effectively until it is converted into a structured format. Researchers currently focus on knowledge discovery from huge databases and warehouses to transform unstructured text into meaningful information. The discovery of knowledge from such sources of free text is called 'text mining' or, more specifically, text data mining. Text mining is either the discovery of texts or the exploration of texts in search of valuable yet hidden information. Formally, text mining is a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources for business intelligence, exploratory data analysis, research, or investigation.

Text mining applications include: 1) document visualization, which refers to clustering documents based on keywords, for example grouping research publications on similar issues; and 2) text analysis and understanding, which refers to natural language processing techniques such as text categorization, information extraction, and summarization.

Text mining challenges include: 1) semantic analysis: semantic analysis of documents is a relatively new trend in text data mining. To extract refined knowledge, semantic analysis techniques must be applied to derive a representation rich enough to capture the relationships between the objects or concepts reflected in text documents; 2) multilingual text refining: multilingual data mining is basically language independent. Hence, clustering information extracted from multilingual sources is a fresh research area with a lot of scope, and text refining algorithms are needed that can process multilingual text documents and produce language-independent intermediate forms; and 3) personalized autonomous mining: current text mining tools are not simple and require training and expertise. Future text mining tools, as part of knowledge management systems, need simple personalization features for end users who lack technical skills, such as business executives.
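As an illustration of the two steps the abstract describes (turning unstructured text into a structured form, then clustering documents by keywords), here is a minimal Python sketch. The abstract does not prescribe a method, so the technique here is an assumption: raw keyword counts compared with cosine similarity and grouped by single-link threshold clustering. The stop-word list, the `threshold` value of 0.3, and the function names `keywords`, `cosine`, and `cluster` are all hypothetical. TF-IDF weighting with k-means is a common alternative.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real systems use fuller lists.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is", "are", "with"}


def keywords(text: str) -> Counter:
    """Turn free text into a structured bag-of-keywords vector (term -> count)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(docs: list[str], threshold: float = 0.3) -> list[list[int]]:
    """Single-link clustering: any two documents whose keyword vectors reach
    `threshold` similarity are merged into the same group (via union-find)."""
    vectors = [keywords(d) for d in docs]
    parent = list(range(len(docs)))

    def find(i: int) -> int:
        # Follow parent links to the group representative, halving paths as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    # Collect document indices by their group representative.
    groups: dict[int, list[int]] = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


if __name__ == "__main__":
    docs = [
        "Text mining of customer feedback on social media",
        "Mining customer feedback text from social media posts",
        "Semantic analysis of journal articles",
        "Semantic analysis of research articles in journals",
    ]
    print(cluster(docs))  # prints [[0, 1], [2, 3]]
```

Because the keyword vectors depend only on surface tokens, this sketch shows exactly the limitation the abstract raises under semantic analysis. "journal" and "journals" are treated as unrelated terms, and a text in another language would share no keywords with these documents.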

Biography

Email: tanzilasaba@yahoo.com