Natural Language Annotation for Machine Learning_ A Guide to Corpus-... [Pustejovsky & Stubbs 2012-11-04].pdf

(5773 KB) Pobierz
Natural Language Annotation for
Machine Learning
James Pustejovsky and Amber Stubbs
Natural Language Annotation for Machine Learning
by James Pustejovsky and Amber Stubbs
Copyright © 2013 James Pustejovsky and Amber Stubbs. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are
also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/
institutional sales department: 800-998-9938 or
corporate@oreilly.com.
Editors:
Julie Steele and Meghan Blanchette
Production Editor:
Kristen Borg
Copyeditor:
Audrey Doyle
Proofreader:
Linley Dolby
Indexer:
WordCo Indexing Services
Cover Designer:
Randy Comer
Interior Designer:
David Futato
Illustrator:
Rebecca Demarest
October 2012:
First Edition
Revision History for the First Edition:
2012-10-10
First release
See
http://oreilly.com/catalog/errata.csp?isbn=9781449306663
for release details.
Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly
Media, Inc.
Natural Language Annotation for Machine Learning,
the image of a cockatiel, and related trade
dress are trademarks of O’Reilly Media, Inc.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as
trademarks. Where those designations appear in this book, and O’Reilly Media, Inc., was aware of a trade­
mark claim, the designations have been printed in caps or initial caps.
While every precaution has been taken in the preparation of this book, the publisher and authors assume
no responsibility for errors or omissions, or for damages resulting from the use of the information contained
herein.
ISBN: 978-1-449-30666-3
[LSI]
Table of Contents
Preface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
1. The Basics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
The Importance of Language Annotation
The Layers of Linguistic Description
What Is Natural Language Processing?
A Brief History of Corpus Linguistics
What Is a Corpus?
Early Use of Corpora
Corpora Today
Kinds of Annotation
Language Data and Machine Learning
Classification
Clustering
Structured Pattern Induction
The Annotation Development Cycle
Model the Phenomenon
Annotate with the Specification
Train and Test the Algorithms over the Corpus
Evaluate the Results
Revise the Model and Algorithms
Summary
1
3
4
5
8
10
13
14
21
22
22
22
23
24
27
29
30
31
31
33
34
35
41
41
42
iii
2. Defining Your Goal and Dataset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Defining Your Goal
The Statement of Purpose
Refining Your Goal: Informativity Versus Correctness
Background Research
Language Resources
Organizations and Conferences
Zgłoś jeśli naruszono regulamin