
A Corpus Based Approach to Generalising a Chatbot System
Applying Simple Natural Language Processing Techniques to Build Knowledge Base of ALICE Chatbot System
Bayan Abu Shawar (Author)
LAP Lambert Academic Publishing
Published on 4 October 2011
Book
Paperback/Softback
184 pages
978-3-8443-8706-3 (ISBN)
Description
Chatbots are computer programs that interact with users in natural language. This thesis shows that chatbot technology can be used in many ways beyond entertainment. A chatbot could serve as a tool to learn or practise a new language, a tool to access an information system, a tool to visualise the contents of a corpus, and a tool to answer questions in a specific domain. Instead of being restricted to a specific domain or written language, a chatbot could be trained with any text in any language. Some of the differences between real human conversations and human-chatbot dialogues are presented. A Java program has been developed to read a machine-readable text (corpus) and convert it to AIML, the format used by the ALICE chatbot. The program was built to be general: it places no restrictions on the language, domain, or structure of the corpus. Different languages were tested: English, Arabic, Afrikaans, French, and Spanish. Different corpus structures were also used: dialogue, monologue, and structured text.
More details
Language
English
Place of publication
Germany
Product notice
Paperback (trade)
Unsewn / adhesive bound
Dimensions
Height: 220 mm
Width: 150 mm
Thickness: 12 mm
Weight
292 g
ISBN-13
978-3-8443-8706-3 (9783844387063)
Copyright in bibliographic data and cover images is held by Nielsen Book Services Limited or by the publishers or by their respective licensors: all rights reserved.
Person
Bayan Abu Shawar is an assistant professor at the Arab Open University, Jordan. She obtained her PhD from the School of Computing at the University of Leeds in 2005. Her research interests are natural language processing, information retrieval, e-learning, and learning management systems.