
Web Content Mining with Java
Techniques for Exploiting the World's Biggest Information Resource
Tony Loton (Author)
Wiley (Publisher)
1st Edition
Published on 28 March 2002
Book
Paperback/Softback
XXII, 306 pages
978-0-470-84311-6 (ISBN)
Description
Unlock the potential of the world's biggest database.
This practical book shows you how to build portals, construct search engines and other knowledge-based applications to mine the information you need from the Web.
* Written by a developer for developers
* A practical, hands-on approach
* Illustrates how Java and associated technologies (XML, HTML) can be combined with database technology to display and manipulate Web-derived information more effectively.
* Demonstrates how to build a structure browser, portal, meta-search engine and how to make 'Talking Pages'
Reviews / Votes
"When I got this book, I couldn't put it down. A lot of computer books sit on the shelf or send me to sleep, but not this one. Not only is it both topical and useful, but it hits a just about ideal balance between code and food for thought. The author has a real knack for useful solutions to complex problems." (Java Ranch, 17 May 2002)
Product info
PB
Edition
1st edition
Language
English
Place of publication
Chichester
United Kingdom
Publishing group
John Wiley and Sons Ltd
Target group
College/higher education
Professional and scholarly
Dimensions
Height: 23.5 cm
Width: 18.8 cm
Thickness: 1.9 cm
Weight
602 g
ISBN-13
978-0-470-84311-6 (9780470843116)
Person
Tony Loton, LOTONtech Ltd, Middlewich, UK
Tony Loton launched LOTONtech as a vehicle for researching and developing innovative software solutions. He developed the WebDataKit: a Java 2 solution comprising an API and a Structured Query Language designed specifically for the automatic extraction of HTML and XML from web sources. Tony's early Java web mining ideas have been featured previously as a case study contribution to "Professional Java Data programming" (Wrox Press). This book takes the ideas much further, with brand new material.
Content
Preface.
About the Author.
Acknowledgements.
Surveying the Scene
Language of the Web
HTML and XML Parsing
Data Filters and Structured Queries
Building a Portal with Java
Building a Search Engine with Java
Mail Mining with Java
Introduction to Text Mining
Introduction to Data Mining
Loose Ends and Looking Ahead
Appendix A: Software Installation and Configuration
Appendix B: Javadoc Extracts
Appendix C: Earlier Versions of JAXP
Appendix D: License and Copyright Statements
Appendix E: Census 1891 Data XML
Appendix F: Share Price Cluster Data
Appendix G: Glossary of Acronyms
References
Further Reading
Index