
Application of Graph Rewriting to Natural Language Processing
Description
Alles über E-Books | Antworten auf Fragen rund um E-Books, Kopierschutz und Dateiformate finden Sie in unserem Info- & Hilfebereich.
More details
Other editions
Additional editions

Content
- Cover
- Half-Title Page
- Title Page
- Copyright Page
- Contents
- Introduction
- 1. Programming with Graphs
- 1.1. Creating a graph
- 1.2. Feature structures
- 1.3. Information searches
- 1.3.1. Access to nodes
- 1.3.2. Extracting edges
- 1.4. Recreating an order
- 1.5. Using patterns with the GREW library
- 1.5.1. Pattern syntax
- 1.5.2. Common pitfalls
- 1.6. Graph rewriting
- 1.6.1. Commands
- 1.6.2. From rules to strategies
- 1.6.3. Using lexicons
- 1.6.4. Packages
- 1.6.5. Common pitfalls
- 2. Dependency Syntax: Surface Structure and Deep Structure
- 2.1. Dependencies versus constituents
- 2.2. Surface syntax: different types of syntactic dependency
- 2.2.1. Lexical word arguments
- 2.2.2. Modifiers
- 2.2.3. Multiword expressions
- 2.2.4. Coordination
- 2.2.5. Direction of dependencies between functional and lexical words
- 2.3. Deep syntax
- 2.3.1. Example
- 2.3.2. Subjects of infinitives, participles, coordinated verbs and adjectives
- 2.3.3. Neutralization of diatheses
- 2.3.4. Abstraction of focus and topicalization procedures
- 2.3.5. Deletion of functional words
- 2.3.6. Coordination in deep syntax
- 3. Graph Rewriting and Transformation of Syntactic Annotations in a Corpus
- 3.1. Pattern matching in syntactically annotated corpora
- 3.1.1. Corpus correction
- 3.1.2. Searching for linguistic examples in a corpus
- 3.2. From surface syntax to deep syntax
- 3.2.1. Main steps in the SSQ_to_DSQ transformation
- 3.2.2. Lessons in good practice
- 3.2.3. The UD_to_AUD transformation system
- 3.2.4. Evaluation of the SSQ_to_DSQ and UD_to_AUD systems
- 3.3. Conversion between surface syntax formats
- 3.3.1. Differences between the SSQ and UD annotation schemes
- 3.3.2. The SSQ to UD format conversion system
- 3.3.3. The UD to SSQ format conversion system
- 4. From Logic to Graphs for Semantic Representation
- 4.1. First order logic
- 4.1.1. Propositional logic
- 4.1.2. Formula syntax in FOL
- 4.1.3. Formula semantics in FOL
- 4.2. Abstract meaning representation (AMR)
- 4.2.1. General overview of AMR
- 4.2.2. Examples of phenomena modeled using AMR
- 4.3. Minimal recursion semantics, MRS
- 4.3.1. Relations between quantifier scopes
- 4.3.2. Why use an underspecified semantic representation?
- 4.3.3. The RMRS formalism
- 4.3.4. Examples of phenomenon modeling in MRS
- 4.3.5. From RMRS to DMRS
- 5. Application of Graph Rewriting to Semantic Annotation in a Corpus
- 5.1. Main stages in the transformation process
- 5.1.1. Uniformization of deep syntax
- 5.1.2. Determination of nodes in the semantic graph
- 5.1.3. Central arguments of predicates
- 5.1.4. Non-core arguments of predicates
- 5.1.5. Final cleaning
- 5.2. Limitations of the current system
- 5.3. Lessons in good practice
- 5.3.1. Decomposing packages
- 5.3.2. Ordering packages
- 5.4. The DSQ_to_DMRS conversion system
- 5.4.1. Modifiers
- 5.4.2. Determiners
- 6. Parsing Using Graph Rewriting
- 6.1. The Cocke-Kasami-Younger parsing strategy
- 6.1.1. Introductory example
- 6.1.2. The parsing algorithm
- 6.1.3. Start with non-ambiguous compositions
- 6.1.4. Revising provisional choices once all information is available
- 6.2. Reducing syntactic ambiguity
- 6.2.1. Determining the subject of a verb
- 6.2.2. Attaching complements found on the right of their governors
- 6.2.3. Attaching other complements
- 6.2.4. Realizing interrogatives and conjunctive and relative subordinates
- 6.3. Description of the POS_to_SSQ rule system
- 6.4. Evaluation of the parser
- 7. Graphs, Patterns and Rewriting
- 7.1. Graphs
- 7.2. Graph morphism
- 7.3. Patterns
- 7.3.1. Pattern decomposition in a graph
- 7.4. Graph transformations
- 7.4.1. Operations on graphs
- 7.4.2. Command language
- 7.5. Graph rewriting system
- 7.5.1. Semantics of rewriting
- 7.5.2. Rule uniformity
- 7.6. Strategies
- 8. Analysis of Graph Rewriting
- 8.1. Variations in rewriting
- 8.1.1. Label changes
- 8.1.2. Addition and deletion of edges
- 8.1.3. Node deletion
- 8.1.4. Global edge shifts
- 8.2. What can and cannot be computed
- 8.3. The problem of termination
- 8.3.1. Node and edge weights
- 8.3.2. Proof of the termination theorem
- 8.4. Confluence and verification of confluence
- Appendix: Mathematical Tools and Notation
- Bibliography
- Index
- Other titles from iSTE in Cognitive Science and Knowledge Management
- EULA
1
Programming with Graphs
In this chapter, we shall discuss elements of programming for graphs. Our work is based on PYTHON, a language widely used for natural language processing, as in the case of the NLTK library1 (Natural Language ToolKit), used in our work. However, the elements presented here can easily be translated into another language. Several different data structures may be used to manage graphs. We chose to use dictionaries; this structure is elementary (i.e. unencapsulated), reasonably efficient and extensible. For what follows, we recommend opening an interactive PYTHON session2.
Notes for advanced programmers: by choosing such a primitive structure, we do not have the option to use sophisticated error management mechanisms. There is no domain (or type) verification, no identifier verification, etc. Generally speaking, we shall restrict ourselves to the bare minimum in this area for reasons of time and space. Furthermore, we have chosen not to use encapsulation so that the structure remains as transparent as possible. Defining a class for graphs should make it easier to implement major projects. Readers are encouraged to take a more rigorous approach to that presented here after reading the book. Finally, note that the algorithms used here are not always optimal; once again, our primary aim is to improve readability.
Notes for "beginner" programmers: this book is not intended as an introduction to PYTHON, and we presume that readers have some knowledge of the language, specifically with regard to the use of lists, dictionaries and sets.
The question will be approached from a mathematical perspective in Chapter 7, but for now, we shall simply make use of an intuitive definition of graphs. A graph is a set of nodes connected by labeled edges. The nodes are also labeled (with a phonological form, a feature structure, a logical predicate, etc.). The examples of graphs used in this chapter are dependency structures, which simply connect words in a sentence using syntactic functions. The nodes in these graphs are words (W1, W2, ., W5 in the example below), the edges are links (suj, obj, det) and the labels on the nodes provide the phonology associated with each node. We shall consider the linguistic aspects of dependency structures in Chapter 2.
Note that it is important to distinguish between nodes and their labels. This enables us to differentiate between the two occurrences of "the" in the graph above, corresponding to the two nodes W1 and W4.
In what follows, the nodes in the figures will not be named for ease of reading, but they can be found in the code in the form of strings: 'W1', 'W2', etc.
1.1. Creating a graph
A graph is represented using a dictionary. Let us start with a graph with no nodes or edges.
g = dict () The nodes are dictionary keys. The value corresponding to key v is a pair (a, sucs) made up of a label a and of the list sucs of labeled edges starting from v. Let us add a node 'W1' labeled "the" to g:
g['W1'] = ('the', []) Now, add a second and a third node, with the edges that connect them:
g['W2'] = ('child', []) g['W3'] = ('plays', []) g['W3'][1].append (('suj', 'W2')) g['W2'][1].append (('det', 'W1')) and print the result:
g {'W1': ('the', []), 'W2': ('child', [('det', 'W1')]), 'W3': ('plays', [('suj', 'W2')])} The last box shows the output from the PYTHON interpreter. This graph is represented in the following form:
Let us return to the list of successors of a node. This is given in the form of a list of pairs (e, t), indicating the label e of the edge and the identifier t of the target node. In our example, the list of successors of node 'W2' is given by g['W2'][1]. It contains a single pair ('det', 'W1') indicating that the node 'W1' corresponding to "the" is the determiner of 'W2', i.e. the common noun "child".
In practice, it is easier to use construction functions:
def add_node(g, u, a): #Add a node u labeled a in graph g g[u] = (a, []) def add_edge(g, u, e, v): # Add an edge labeled e from u to v in graph g if (e, v) not in g[u][1]: g[u][1].append((e, v)) This may be used as follows:
add_node(g, 'W4', 'the') add_node(g, 'W5', 'fool') add_edge(g, 'W3', 'obj', 'W5') add_edge(g, 'W5', 'det', 'W4') to construct the dependency structure of the sentence "the child plays the fool":
Let us end with the segmentation of a sentence into words. This is represented as a flat graph, connecting words in their order; we add an edge, 'SUC', between each word and its successor. Thus, for the sentence "She takes a glass", we obtain:
The following solution integrates the NLTK segmenter:
import nltk word_list = nltk.word_tokenize("She takes a glass") word_graph = dict () for i in range(len(word_list)): add_node(word_graph, 'W%s' % i, word_list[i]) for i in range(len(word_list) - 1): add_edge(word_graph, 'W%s' % i, 'SUC', 'W%s' % (i + 1)) word_graph {'W3': ('glass', []), 'W1': ('takes', [('SUC', 'W2')]), 'W2': ('a', [('SUC', 'W3')]), 'W0': ('She', [('SUC', 'W1')])} Readers may wish to practice using the two exercises as follows.
EXERCISE 1.1.- Finish constructing the following flat graph so that there is a 'SUC*' edge between each word and one of its distant successors. For example, the chain "She takes a glass" will be transformed as follows:
EXERCISE 1.2.- Write a function to compute a graph in which all of the edges have been reversed. For example, we go from the dependency structure of "the child plays the fool" to:
1.2. Feature structures
So far, node labels have been limited to their phonological form, i.e. a string of characters. Richer forms of structure, namely feature structures, may be required. Once again, we shall use a dictionary:
fs_plays = {'phon' : 'plays', 'cat' : 'V'} The fs_plays dictionary designates a feature structure with two features, 'phon' and 'cat', with respective values 'plays' and 'V'. To find the category 'cat' of the feature structure fs_plays, we...
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePub works well for novels and non-fiction books – i.e., „flowing” text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a „hard” copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our ebook Help page.