Text Mining and Analytics

Start Date: 07/05/2020

Course Type: Common Course

Course Link: https://www.coursera.org/learn/text-mining

About Course

This course will cover the major techniques for mining and analyzing text data to discover interesting patterns, extract useful knowledge, and support decision making, with an emphasis on statistical approaches that can be applied generally to arbitrary text data in any natural language with little or no human effort. Detailed analysis of text data requires an understanding of natural language text, which is known to be a difficult task for computers. However, a number of statistical approaches have been shown to work well for the "shallow" but robust analysis of text data for pattern finding and knowledge discovery. You will learn the basic concepts, principles, and major algorithms in text mining and their potential applications.

Course Syllabus

During this module, you will learn about the overall course design and get an overview of natural language processing techniques and text representation, which are the foundation for all kinds of text mining applications. You will also study word association mining, with a particular focus on one of the two basic forms of word association (i.e., paradigmatic relations).
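Paradigmatic relations hold between words that can substitute for one another, such as "cat" and "dog". Such words tend to appear in similar contexts, so one common way to estimate the relation is to compare the words' contexts. The sketch below is a simplified illustration and not necessarily the method taught in the course. It makes three assumptions: a tiny hypothetical corpus, a context made of the words within a fixed window, and cosine similarity of context-count vectors as the score.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=1):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    contexts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    contexts[word][tokens[j]] += 1
    return contexts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[w] for w, count in a.items())  # Counter returns 0 for missing keys
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus, for illustration only.
corpus = [
    "my cat eats fish on saturday",
    "his cat eats turkey on tuesday",
    "my dog eats meat on sunday",
    "his dog eats turkey on tuesday",
]

ctx = context_vectors(corpus, window=1)
print("cat ~ dog :", round(cosine(ctx["cat"], ctx["dog"]), 3))
print("cat ~ eats:", round(cosine(ctx["cat"], ctx["eats"]), 3))
```

In this toy corpus, "cat" and "dog" share identical left and right neighbours, so they score highly. "cat" and "eats" co-occur but never share context words, so they score zero. That contrast is the difference between paradigmatic and syntagmatic relations.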


Course Introduction

Welcome to Text Mining and Analytics, part of the Data Mining Specialization on Coursera. The course runs over 4 weeks, with over 350,000 lessons and over 400 different data scientists and engineers taking part. It introduces text mining and its most important analytical questions, while giving you a strong conceptual basis for mining large text databases. This course is one part of a larger specialization, in which a later course focuses on more advanced techniques for mining larger text datasets. The goal of the Specialization is to learn and practice the basic techniques for extracting meaningful information from text data by mining large text databases.

The Tools of Text Mining

This course explains how text data is processed and linked to related text files. It covers the basic tools of text mining, including how documents are sorted, the different file formats for each service, and basic information about document types. Upon completing this course, you will be able to:

1. Describe the document types
2. Link text data to text files
3. Compute the information needed to extract meaningful information from a document
4. Manipulate data to extract meaningful information from a document
5. Find objects in a document using a practical tool
6. Solve problems in a text mining application
7. Review basic information in a document such

Course Tag

Data Clustering Algorithms, Text Mining, Probabilistic Models, Sentiment Analysis

Related Wiki Topic

Article Example
Text mining The term text analytics describes a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources for business intelligence, exploratory data analysis, research, or investigation. The term is roughly synonymous with text mining; indeed, Ronen Feldman modified a 2000 description of "text mining" in 2004 to describe "text analytics." The latter term is now used more frequently in business settings while "text mining" is used in some of the earliest application areas, dating to the 1980s, notably life-sciences research and government intelligence.
Text mining Text mining, also referred to as text data mining and roughly equivalent to text analytics, is the process of deriving high-quality information from text. High-quality information is typically derived by identifying patterns and trends through means such as statistical pattern learning. Text mining usually involves structuring the input text (usually parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluating and interpreting the output. 'High quality' in text mining usually refers to some combination of relevance, novelty, and interestingness. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling (i.e., learning relations between named entities).
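The "structuring the input text" step can be illustrated by turning raw documents into weighted term vectors, which later pattern-finding steps can work on. The sketch below rests on three assumptions. Tokenization keeps alphabetic runs only. The stopword list is small and hypothetical. Weighting uses one common TF-IDF variant (raw term count times log(N / df)). Real systems differ in all three choices.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; real pipelines use much larger ones.
STOPWORDS = {"the", "a", "of", "and", "is", "to", "in"}


def tokenize(text):
    """Lowercase the text, keep alphabetic runs, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf(documents):
    """Return one {term: weight} dict per document, weighted as tf * log(N / df)."""
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(token_lists)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append({term: count * math.log(n_docs / df[term]) for term, count in tf.items()})
    return vectors


# Hypothetical example documents.
docs = [
    "The stock price of the company rose",
    "The company reported strong earnings",
    "Customers complained about the product",
]

for vec in tfidf(docs):
    # Show the two highest-weighted terms of each document.
    top = sorted(vec.items(), key=lambda kv: kv[1], reverse=True)[:2]
    print(top)
```

With this weighting, a term that appears in every document gets weight zero. Terms concentrated in a few documents are emphasized, which is why "company" scores lower than "stock" here.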
Text mining Text mining is starting to be used in marketing as well, more specifically in analytical customer relationship management. Coussement and Van den Poel (2008) apply it to improve predictive analytics models for customer churn (customer attrition). Text mining is also being applied in stock returns prediction.
Text mining Text analysis involves information retrieval, lexical analysis to study word frequency distributions, pattern recognition, tagging/annotation, information extraction, data mining techniques including link and association analysis, visualization, and predictive analytics. The overarching goal is, essentially, to turn text into data for analysis, via application of natural language processing (NLP) and analytical methods.
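Lexical analysis of word frequency distributions can start as simply as counting tokens and ranking them. The sketch below assumes a short hypothetical text and a simple regular-expression tokenizer. It prints each word's rank, its count, and the product of the two. Zipf's law predicts that this product stays roughly constant in large natural-language corpora, though a sample this small will not show the pattern reliably.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # most_common() sorts by descending count; tied words keep first-seen order.
    return [(rank, word, count) for rank, (word, count) in enumerate(counts.most_common(), start=1)]


# Hypothetical sample text.
sample = (
    "Text mining turns text into data. Text data is mined for patterns, "
    "and patterns in data support decisions."
)

for rank, word, count in rank_frequency(sample)[:5]:
    print(f"{rank:>2}  {word:<10} {count:>3}  rank*count={rank * count}")
```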
Text mining Text mining computer programs are available from many commercial and open source companies and sources. See List of text mining software.
Text mining Subtasks—components of a larger text-analytics effort—typically include:
Text mining Hearst's 1999 statement of need fairly well describes the state of text analytics technology and practice a decade later.
Fractal Analytics Fractal Analytics is based on Artificial Intelligence and Machine Learning techniques. Their products include Customer Genomics; Trial Run, a cloud-based business platform; text mining suite: dCrypt; and Centralized Analytics Environment (CAE), a collaborative workbench built on the KNIME Server.
Text mining The term text analytics also describes the application of text analytics to respond to business problems, whether independently or in conjunction with query and analysis of fielded, numerical data. It is a truism that 80 percent of business-relevant information originates in unstructured form, primarily text. These techniques and processes discover and present knowledge (facts, business rules, and relationships) that is otherwise locked in textual form, impenetrable to automated processing.
Text mining Labor-intensive manual text mining approaches first surfaced in the mid-1980s, but technological advances have enabled the field to progress during the past decade. Text mining is an interdisciplinary field that draws on information retrieval, data mining, machine learning, statistics, and computational linguistics. As most information (common estimates say over 80%) is currently stored as text, text mining is believed to have high potential commercial value.
Text mining Yet as management information systems developed starting in the 1960s, and as business intelligence (BI) emerged in the '80s and '90s as a software category and field of practice, the emphasis was on numerical data stored in relational databases. This is not surprising: text in "unstructured" documents is hard to process. The emergence of text analytics in its current form stems from a refocusing of research in the late 1990s from algorithm development to application, as described by Prof. Marti A. Hearst in the paper Untangling Text Data Mining:
Text mining One online text mining application for the biomedical literature is PubGene, which combines biomedical text mining with network visualization as an Internet service.
Text mining Until recently, websites most often used text-based searches, which only found documents containing specific user-defined words or phrases. Now, through use of the Semantic Web, text mining can find content based on meaning and context (rather than just by a specific word). Additionally, text mining software can be used to build large dossiers of information about specific people and events. For example, large datasets based on data extracted from news reports can be built to facilitate social network analysis or counter-intelligence. In effect, the text mining software may act in a capacity similar to an intelligence analyst or research librarian, albeit with a more limited scope of analysis. Text mining is also used in some email spam filters as a way of determining the characteristics of messages that are likely to be advertisements or other unwanted material. Text mining also plays an important role in determining financial market sentiment.
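A spam filter can learn "the characteristics of messages that are likely to be advertisements" from labeled examples. The sketch below trains a multinomial naive Bayes classifier with add-one (Laplace) smoothing. Naive Bayes is chosen here only as one common illustrative technique; the text above does not say which method any particular filter uses. The training messages are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic runs as tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesFilter:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages, labels):
        self.word_counts = {label: Counter() for label in set(labels)}
        self.doc_counts = Counter(labels)
        for text, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary = every word seen in any class.
        self.vocab = set().union(*self.word_counts.values())
        self.n_docs = len(labels)
        return self

    def log_score(self, text, label):
        counts = self.word_counts[label]
        total = sum(counts.values())
        # log P(label) + sum of log P(word | label), smoothed over the vocabulary.
        score = math.log(self.doc_counts[label] / self.n_docs)
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, text):
        return max(self.word_counts, key=lambda label: self.log_score(text, label))


# Hypothetical labeled training messages.
train = [
    ("win a free prize now", "spam"),
    ("limited offer buy now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
]
clf = NaiveBayesFilter().fit([t for t, _ in train], [label for _, label in train])
print(clf.predict("free offer now"))
print(clf.predict("project meeting on monday"))
```

Working in log space avoids numeric underflow on long messages. Smoothing keeps an unseen word (such as "on" above) from zeroing out a class's score.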
Text mining Many text mining software packages are marketed for security applications, especially monitoring and analysis of online plain text sources such as Internet news, blogs, etc. for national security purposes. It is also involved in the study of text encryption/decryption.
Analytics Analytics is multidisciplinary. It makes extensive use of mathematics and statistics, applying descriptive techniques and predictive models to gain valuable knowledge from data (data analysis). The insights from data are used to recommend action or to guide decision making rooted in business context. Thus, analytics is concerned less with individual analyses or analysis steps than with the entire methodology. There is a pronounced tendency to use the term "analytics" in business settings (e.g., "text analytics" vs. the more generic "text mining") to emphasize this broader perspective. There is an increasing use of the term "advanced analytics", typically used to describe the technical aspects of analytics, especially in emerging fields such as the use of machine learning techniques like neural networks for predictive modeling.
Text mining In contrast to Europe, the flexible nature of US copyright law, and in particular fair use, means that text mining in America, as well as in other fair use countries such as Israel, Taiwan, and South Korea, is viewed as legal. Because text mining is transformative, meaning that it does not supplant the original work, it is viewed as lawful under fair use. For example, as part of the Google Book settlement, the presiding judge ruled that Google's digitisation project of in-copyright books was lawful, in part because of the transformative uses that the digitisation project displayed, one such use being text and data mining.
Text mining Academic institutions have also become involved in the text mining initiative:
Text mining A range of text mining applications in the biomedical literature has been described.
Text mining Text mining methods and software are also being researched and developed by major firms, including IBM and Microsoft, to further automate the mining and analysis processes, and by other firms working in search and indexing in general as a way to improve their results.
Text mining The issue of text mining is of importance to publishers who hold large databases of information needing indexing for retrieval. This is especially true in scientific disciplines, in which highly specific information is often contained within written text. Therefore, initiatives have been taken such as Nature's proposal for an Open Text Mining Interface (OTMI) and the National Institutes of Health's common Journal Publishing Document Type Definition (DTD) that would provide semantic cues to machines to answer specific queries contained within text without removing publisher barriers to public access.