O'Reilly
Live online training

Text analysis for business analytics with Python


Extracting insight from text data

Walter Paczkowski, Ph.D.

Social media and online reviews have given businesses a new form of data: text. Unlike the well-structured, numbers-oriented data of the pre-internet era, text data is highly unstructured and chaotic. It includes verbatim survey responses, call center logs, notes from field representatives, customer emails, logs of online chats, warranty claims, dealer technician lines, and report orders. And yet it is data: a structure can be imposed on it, and it can be analyzed to extract useful information and insights for decision making in areas such as new product development, customer service, and message development. The problem is that many business analysts either do not know how to work with text data or are overwhelmed by the many toolsets available for text analysis.

Join expert Walter Paczkowski to learn how to work with text data to extract meaningful insights, such as sentiments (positive and negative) about your products and company, opinions, product suggestions and complaints, customer misunderstandings, and competitive actions and positions. Over three hours, you’ll dive into sophisticated text processing tools and methods and discover what text-processing software, such as Python packages, makes possible.

What you'll learn and how you can apply it

By the end of this live, hands-on, online course, you’ll understand:

  • The unstructured nature of text data, including the concepts of a document and a corpus
  • The singular value decomposition (SVD) of a document-term matrix (DTM)
  • Python packages used for text analysis and when to use them
  • How to prepare text data for analysis, including data cleaning, stop words, and grammar inconsistencies
  • How to summarize text data using term frequency/inverse document frequency (TF/IDF) weights
  • How to extract meaning from a DTM: keywords, phrases, and topics
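For reference, the SVD of a DTM listed above can be written compactly. The notation here is ours, not the course's: D is an n × m DTM with one row per document and one column per term, and r is its rank.

```latex
\mathbf{D} = \mathbf{U}\,\boldsymbol{\Sigma}\,\mathbf{V}^{\top},
\qquad \mathbf{U} \in \mathbb{R}^{n \times r},\;
\boldsymbol{\Sigma} = \operatorname{diag}(\sigma_1, \dots, \sigma_r),\;
\mathbf{V} \in \mathbb{R}^{m \times r},
\qquad \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0

\mathbf{D}_k = \mathbf{U}_k\,\boldsymbol{\Sigma}_k\,\mathbf{V}_k^{\top}, \qquad k < r
```

Keeping only the k largest singular values gives D_k, the best rank-k approximation of D in the least-squares (Frobenius-norm) sense (the Eckart-Young theorem). In latent semantic analysis, the columns of V_k are commonly read as term loadings on k latent topics and the rows of U_k Σ_k as document scores on those topics.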

And you’ll be able to:

  • Impose structure on text data
  • Extract keywords, phrases, and topics with text analysis tools
  • Analyze a business text dataset for key insights using Python packages
  • Apply these techniques to business problems

This training course is for you because...

  • You’re an advanced business analyst who deals with text data.
  • Your background is largely analytical, and you want to expand both your knowledge and toolset of analytical methods.

Prerequisites

  • Familiarity with Python and the Jupyter Notebook

About your instructor

  • Walter R. Paczkowski has a Ph.D. in Economics from Texas A&M University (1977). He brings a wealth of knowledge about data analysis from over 40 years of quantitative experience as an analyst in AT&T's Analytical Support Center, a Member of the Technical Staff at AT&T Bell Labs, head of Pricing Research at AT&T's Computer Systems division, and founder of Data Analytics Corp. His work as a market research consultant focuses on helping companies in a wide range of industries, including telecommunications, pharmaceuticals, jewelry, food & beverages, and automotive, turn their market data into actionable market information. Walter is currently on the faculty of the Department of Economics at Rutgers University (Adjunct) and was formerly with the Department of Mathematics & Statistics at The College of New Jersey (Adjunct). He is the author of two analytical books, Market Data Analysis Using JMP (SAS Press, 2016) and Pricing Analytics (Routledge, 2018), with a third forthcoming on quantitative methods for new product development (Routledge, 2019). You can learn more about Walter and his consulting company, Data Analytics Corp., at www.dataanalyticscorp.com.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Introduction (10 minutes)

  • Lecture: Text data in businesses; text versus numeric data; using Python and the Python package scikit-learn for text analytics; case study—product reviews for new product development
  • Group discussion: How do you currently use text analysis in your business?

Basic text preprocessing (40 minutes)

  • Lecture: Documents, corpus, and corpora; stop words; cleansing text data: stemming, lemmatization, spell-checking, and punctuation handling; tokenizing sentences and words; creating a bag of words (BOW) of product reviews with scikit-learn
  • Group discussion: Knowledge check on key concepts
  • Hands-on exercise: Eliminate stop words, tokenize texts, and create a bag of words from product reviews
  • Q&A
  • Break (5 minutes)
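The preprocessing steps in this module can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the course's code: the reviews and the short stop-word list are invented, tokenizing is a simple lowercase-and-regex split, and stemming, lemmatization, spell-checking, and sentence tokenizing are omitted. In scikit-learn, CountVectorizer(stop_words="english") performs word tokenizing, stop-word removal, and counting in one step.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list (an assumption for this sketch); real work
# would use a fuller list such as the English list shipped with scikit-learn.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "this", "to", "of",
              "but", "i", "for", "too"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Keep only the tokens that are not in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(text: str) -> Counter:
    """Map each remaining word to the number of times it occurs in the document."""
    return Counter(remove_stop_words(tokenize(text)))


# Hypothetical product reviews, invented for illustration.
reviews = [
    "The battery life is great, but the screen is too dim.",
    "Great screen! The battery drains fast.",
    "Love it. Great value for the price.",
]

# One bag of words per review; word order is discarded, only counts remain.
bows = [bag_of_words(r) for r in reviews]
print(bows[0])
```

Each bag of words keeps only which words occur and how often, which is exactly what the document-term matrix in the next module is built from.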

Text modeling (40 minutes)

  • Lecture: Creating a document-term matrix (DTM) from a BOW with scikit-learn functions; handling sparse matrices; TF/IDF weights: why weight, how to create the weights, and how to apply them
  • Group discussion: What does it mean to say a DTM is sparse? Why apply an IDF to a TF? Why weight the DTM?
  • Hands-on exercises: Create a DTM using the BOW; calculate a TF/IDF set of weights and apply the weights to the DTM
  • Q&A
  • Break (5 minutes)
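A small worked example of the DTM and TF/IDF steps, again in plain Python with invented bags of words. It assumes the simplest weighting, raw term count times log(N / df), where N is the number of documents and df is the number of documents containing the term. scikit-learn's TfidfVectorizer defaults to a smoothed IDF, ln((1 + N) / (1 + df)) + 1, and L2-normalizes each row, so its numbers will differ.

```python
import math

# Bags of words for three hypothetical reviews (word -> count), e.g. the
# output of tokenizing and stop-word removal.
bows = [
    {"battery": 1, "life": 1, "great": 1, "screen": 1, "dim": 1},
    {"great": 1, "screen": 1, "battery": 1, "drains": 1, "fast": 1},
    {"love": 1, "great": 1, "value": 1, "price": 1},
]

# Vocabulary: one DTM column per distinct term, in sorted order.
vocab = sorted({term for bow in bows for term in bow})
col = {term: j for j, term in enumerate(vocab)}

# Dense DTM: rows are documents, columns are terms, cells are raw counts (TF).
dtm = [[bow.get(term, 0) for term in vocab] for bow in bows]

# Sparse view: store only the nonzero cells as (row, column, count) triples.
sparse = [(i, col[t], c) for i, bow in enumerate(bows) for t, c in bow.items()]
density = len(sparse) / (len(dtm) * len(vocab))

# IDF: terms that appear in many documents get small weights.
n_docs = len(bows)
df = {t: sum(1 for bow in bows if t in bow) for t in vocab}
idf = {t: math.log(n_docs / df[t]) for t in vocab}

# Weighted DTM: multiply each term-frequency cell by its column's IDF.
tfidf = [[dtm[i][j] * idf[t] for j, t in enumerate(vocab)] for i in range(n_docs)]

print(vocab)
print(f"nonzero cells: {len(sparse)} of {len(dtm) * len(vocab)} ({density:.0%})")
```

Here "great" appears in all three reviews, so its IDF is log(3/3) = 0 and its column vanishes from the weighted DTM: a word that occurs everywhere does nothing to distinguish one document from another. Only 14 of the 30 cells are nonzero; real DTMs, with thousands of terms, are far sparser, which is why scikit-learn's CountVectorizer and TfidfVectorizer return SciPy sparse matrices rather than dense arrays.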

Text analysis (75 minutes)

  • Lecture: Word frequency counts; word clouds; extracting key phrases as n-grams; high-level overview of SVD for statistical analysis of text data; topic extraction using latent semantic analysis and latent Dirichlet allocation; overview and overuse of sentiment analysis and opinion mining
  • Group discussion: How does a word frequency count relate to a word cloud? What message or messages does the word cloud present? What message or messages does the LDA on the BOW present? How would you use sentiment analysis in your business?
  • Hands-on exercises: Use the weighted DTM to create a word frequency count and word cloud; complete an LDA on the BOW for product reviews
  • Q&A
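Word frequency counts and n-gram key phrases from the text analysis module can be sketched the same way. The review texts below are invented and assumed to be already cleaned of stop words; a word cloud sizes each word by counts like those in word_freq.

```python
import re
from collections import Counter

# Hypothetical, already-cleaned review texts (stop words removed).
docs = [
    "battery life great screen dim",
    "great screen battery life short",
    "love great value battery life",
]


def ngrams(tokens: list[str], n: int) -> list[str]:
    """Return every run of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokenized = [re.findall(r"\w+", d) for d in docs]

# Corpus-wide word frequency counts: the numbers a word cloud scales words by.
word_freq = Counter(t for toks in tokenized for t in toks)

# Candidate key phrases: bigrams (2-grams) counted across all documents.
bigram_freq = Counter(bg for toks in tokenized for bg in ngrams(toks, 2))

print(word_freq.most_common(3))
print(bigram_freq.most_common(1))
```

The bigram "battery life" occurs in every review, which flags it as a key phrase even though neither word alone says much. In scikit-learn, CountVectorizer(ngram_range=(1, 2)) counts words and bigrams together; topic extraction, not shown here, is available through TruncatedSVD (commonly used for latent semantic analysis) and LatentDirichletAllocation in sklearn.decomposition.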

Wrap-up and Q&A (5 minutes)