Natural Language Processing: Python and NLTK

Name: Natural Language Processing: Python and NLTK | Learn to build expert NLP and machine learning projects using NLTK and other Python libraries
Brand: Packt Publishing Limited
Price: 77.99 EUR
Availability: OnlineOnly

Learn to build expert NLP and machine learning projects using NLTK and other Python libraries

Deepti Chopra, Jacob Perkins, Iti Mathur, Nisheeth Joshi, Nitin Hardeniya (Authors)

Packt Publishing Limited

1st Edition

Published on 15 April 2025

702 pages

E-Book

ePUB with Adobe-DRM

978-1-78728-784-6 (ISBN)

€77.99 incl. 7% VAT

E-Book Single Licence

Available for download

Persons

Deepti Chopra:
Deepti Chopra is an Assistant Professor at Banasthali University. Her primary areas of research are computational linguistics, natural language processing, and artificial intelligence. She is also involved in the development of MT engines for English to Indian languages. She has several publications in various journals and conferences and also serves on the program committees of several conferences and journals.

Jacob Perkins:
Jacob Perkins is the cofounder and CTO of Weotta, a local search company. Weotta uses NLP and machine learning to create powerful and easy-to-use natural language search for what to do and where to go. He is the author of Python Text Processing with NLTK 2.0 Cookbook, Packt Publishing, and has contributed a chapter to the Bad Data Handbook, O'Reilly Media. He writes about NLTK, Python, and other technology topics at http://streamhacker.com. To demonstrate the capabilities of NLTK and natural language processing, he developed http://text-processing.com, which provides simple demos and NLP APIs for commercial use. He has contributed to various open source projects, including NLTK, and created NLTK-Trainer to simplify the process of training NLTK models. For more information, visit https://github.com/japerk/nltk-trainer.

Iti Mathur:
Iti Mathur is an Assistant Professor at Banasthali University. Her areas of interest are computational semantics and ontological engineering. She is also involved in the development of MT engines for English to Indian languages. She is one of the experts empaneled with the TDIL program, Department of Electronics and Information Technology (DeitY), Govt. of India, a premier organization that oversees language technology funding and research in India. She has several publications in various journals and conferences and also serves on the program committees and editorial boards of several conferences and journals.

Nisheeth Joshi:
Nisheeth Joshi is an associate professor and a researcher at Banasthali University. He holds a PhD in natural language processing. He is an expert with the TDIL Program, Department of IT, Government of India, the premier organization overseeing language technology funding and research in India. He has several publications to his name in various journals and conferences, and also serves on the program committees and editorial boards of several conferences and journals.

Nitin Hardeniya:
Nitin Hardeniya is a data scientist with more than 4 years of experience working with companies such as Fidelity, Groupon, and [24]7-inc. He has worked on a variety of business problems across different domains. He holds a master's degree in computational linguistics from IIIT-H. He is the author of 5 patents in the field of customer experience. He is passionate about language processing and large unstructured data. He has been using Python for almost 5 years in his day-to-day work. He believes that Python could be a single-point solution to most problems related to data science. He has put on his hacker's hat to write this book and has tried to give you an introduction to all the sophisticated tools related to NLP and machine learning in a very simplified form. In this book, he has also provided a workaround using some of the amazing capabilities of Python libraries, such as NLTK, scikit-learn, pandas, and NumPy.

Content

Cover
Copyright
Credits
Preface
Table of Contents
Module 1: NLTK Essentials
Chapter 1: Introduction to Natural Language Processing
Why learn NLP?
Let's start playing with Python!
Diving into NLTK
Your turn
Summary
Chapter 2: Text Wrangling and Cleansing
What is text wrangling?
Text cleansing
Sentence splitter
Tokenization
Stemming
Lemmatization
Stop word removal
Rare word removal
Spell correction
Your turn
Summary
Chapter 3: Part of Speech Tagging
What is Part of speech tagging
Named Entity Recognition (NER)
Your Turn
Summary
Chapter 4: Parsing Structure in Text
Shallow versus deep parsing
The two approaches in parsing
Why we need parsing
Different types of parsers
Dependency parsing
Chunking
Information extraction
Summary
Chapter 5: NLP Applications
Building your first NLP application
Other NLP applications
Summary
Chapter 6: Text Classification
Machine learning
Text classification
Sampling
The Random forest algorithm
Text clustering
Topic modeling in text
References
Summary
Chapter 7: Web Crawling
Web crawlers
Writing your first crawler
Data flow in Scrapy
The Sitemap spider
The item pipeline
External references
Summary
Chapter 8: Using NLTK with Other Python Libraries
NumPy
SciPy
pandas
matplotlib
External references
Summary
Chapter 9: Social Media Mining in Python
Data collection
Data extraction
Geovisualization
Summary
Chapter 10: Text Mining at Scale
Different ways of using Python on Hadoop
NLTK on Hadoop
Scikit-learn on Hadoop
PySpark
Summary
Module 2: Python 3 Text Processing with NLTK 3 Cookbook
Chapter 1: Tokenizing Text and WordNet Basics
Introduction
Tokenizing text into sentences
Tokenizing sentences into words
Tokenizing sentences using regular expressions
Training a sentence tokenizer
Filtering stopwords in a tokenized sentence
Looking up Synsets for a word in WordNet
Looking up lemmas and synonyms in WordNet
Calculating WordNet Synset similarity
Discovering word collocations
Chapter 2: Replacing and Correcting Words
Introduction
Stemming words
Lemmatizing words with WordNet
Replacing words matching regular expressions
Removing repeating characters
Spelling correction with Enchant
Replacing synonyms
Replacing negations with antonyms
Chapter 3: Creating Custom Corpora
Introduction
Setting up a custom corpus
Creating a wordlist corpus
Creating a part-of-speech tagged word corpus
Creating a chunked phrase corpus
Creating a categorized text corpus
Creating a categorized chunk corpus reader
Lazy corpus loading
Creating a custom corpus view
Creating a MongoDB-backed corpus reader
Corpus editing with file locking
Chapter 4: Part-of-speech Tagging
Introduction
Default tagging
Training a unigram part-of-speech tagger
Combining taggers with backoff tagging
Training and combining ngram taggers
Creating a model of likely word tags
Tagging with regular expressions
Affix tagging
Training a Brill tagger
Training the TnT tagger
Using WordNet for tagging
Tagging proper names
Classifier-based tagging
Training a tagger with NLTK-Trainer
Chapter 5: Extracting Chunks
Introduction
Chunking and chinking with regular expressions
Merging and splitting chunks with regular expressions
Expanding and removing chunks with regular expressions
Partial parsing with regular expressions
Training a tagger-based chunker
Classification-based chunking
Extracting named entities
Extracting proper noun chunks
Extracting location chunks
Training a named entity chunker
Training a chunker with NLTK-Trainer
Chapter 6: Transforming Chunks and Trees
Introduction
Filtering insignificant words from a sentence
Correcting verb forms
Swapping verb phrases
Swapping noun cardinals
Swapping infinitive phrases
Singularizing plural nouns
Chaining chunk transformations
Converting a chunk tree to text
Flattening a deep tree
Creating a shallow tree
Converting tree labels
Chapter 7: Text Classification
Introduction
Bag of words feature extraction
Training a Naive Bayes classifier
Training a decision tree classifier
Training a maximum entropy classifier
Training scikit-learn classifiers
Measuring precision and recall of a classifier
Calculating high information words
Combining classifiers with voting
Classifying with multiple binary classifiers
Training a classifier with NLTK-Trainer
Chapter 8: Distributed Processing and Handling Large Datasets
Introduction
Distributed tagging with execnet
Distributed chunking with execnet
Parallel list processing with execnet
Storing a frequency distribution in Redis
Storing a conditional frequency distribution in Redis
Storing an ordered dictionary in Redis
Distributed word scoring with Redis and execnet
Chapter 9: Parsing Specific Data Types
Introduction
Parsing dates and times with dateutil
Timezone lookup and conversion
Extracting URLs from HTML with lxml
Cleaning and stripping HTML
Converting HTML entities with BeautifulSoup
Detecting and converting character encodings
Appendix: Penn Treebank Part-of-speech Tags
Module 3: Mastering Natural Language Processing with Python
Chapter 1: Working with Strings
Tokenization
Normalization
Substituting and correcting tokens
Applying Zipf's law to text
Similarity measures
Summary
Chapter 2: Statistical Language Modeling
Understanding word frequency
Applying smoothing on the MLE model
Develop a back-off mechanism for MLE
Applying interpolation on data to get mix and match
Evaluate a language model through perplexity
Applying metropolis hastings in modeling languages
Applying Gibbs sampling in language processing
Summary
Chapter 3: Morphology - Getting Our Feet Wet
Introducing morphology
Understanding stemmer
Understanding lemmatization
Developing a stemmer for non-English language
Morphological analyzer
Morphological generator
Search engine
Summary
Chapter 4: Parts-of-Speech Tagging - Identifying Words
Introducing parts-of-speech tagging
Creating POS-tagged corpora
Selecting a machine learning algorithm
Statistical modeling involving the n-gram approach
Developing a chunker using pos-tagged corpora
Summary
Chapter 5: Parsing - Analyzing Training Data
Introducing parsing
Treebank construction
Extracting Context Free Grammar (CFG) rules from Treebank
Creating a probabilistic Context Free Grammar from CFG
CYK chart parsing algorithm
Earley chart parsing algorithm
Summary
Chapter 6: Semantic Analysis - Meaning Matters
Introducing semantic analysis
Generation of the synset id from Wordnet
Disambiguating senses using Wordnet
Summary
Chapter 7: Sentiment Analysis - I Am Happy
Introducing sentiment analysis
Summary
Chapter 8: Information Retrieval - Accessing Information
Introducing information retrieval
Vector space scoring and query operator interaction
Developing an IR system using latent semantic indexing
Text summarization
Question-answering system
Summary
Chapter 9: Discourse Analysis - Knowing Is Believing
Introducing discourse analysis
Summary
Chapter 10: Evaluation of NLP Systems - Analyzing Performance
The need for evaluation of NLP systems
Evaluation of IR system
Metrics for error identification
Metrics based on lexical matching
Metrics based on syntactic matching
Metrics using shallow semantic matching
Summary
Bibliography

Schweitzer Fachinformationen
