Corpora in Language Acquisition Research

Name: Corpora in Language Acquisition Research | History, methods, perspectives
Brand: John Benjamins Publishing Company
Price: 105.99 EUR
Availability: Online only

History, methods, perspectives

Heike Behrens (Editor)

John Benjamins Publishing Company

1st Edition

Published on 9 April 2008

XXX, 234 pages

E-Book

PDF with Adobe-DRM

978-90-272-9026-7 (ISBN)

€105.99 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Corpora in Language Acquisition Research
Editorial page
Title page
LCC data
Table of contents
List of contributors
Preface
Corpora in language acquisition research: History, methods, perspectives
1. Introduction
2. Building child language corpora: Sampling methods
2.1 Longitudinal data
2.1.1 Diaries
2.1.2 Audio- and video-recorded longitudinal data
2.1.3 Cross-sectional studies
2.1.4 Combination of sampling techniques
3. Data archiving and sharing
3.1 From diaries and mimeographs to machine-readable corpora
3.2 From text-only to multimedia corpora
3.3 Establishing databases
3.4 Data maintenance
3.5 Annotation
4. Information retrieval: From manual to automatic analyses
5. Quality control
5.1 Individual responsibilities
5.2 Institutional responsibilities
6. Open issues and future perspectives in the use of corpora
6.1 Phonetic and prosodic analyses
6.2 Type and token frequency
6.3 Distributional analyses
6.4 Studies on crosslinguistic and individual variation
6.5 Bridging the age gap
6.6 Communicative processes
6.7 Replication studies
6.8 Research synthesis and meta-analyses
6.9 Method handbook for the study of child language
7. About this volume
How big is big enough?
1. Introduction
2. Sampling and errors in children's early productions
2.1 The effect of sample size on error estimates
2.1.1 Small samples fail to capture infrequent errors
2.1.2 Small samples fail to capture short-lived errors or errors in low frequency structures
Figure 1. Percentage of Lara's wh-questions with forms of DO/modal auxiliaries that were errors of commission over stage IV.
Table 1. Rates of inversion error in Lara's wh-questions calculated from samples of different sizes (% of questions).
2.1.3 Small corpora yield unreliable error rates, especially in low frequency structures
2.2 The effect of calculating overall error rates
2.2.1 High frequency items dominate overall error rates
2.2.2 Overall error rates collapse over time
2.2.3 Overall error rates collapse over subsystems
Table 2. Number of verb contexts requiring present tense inflection and percentage rate of agreement error.*
2.3 Sampling and error rates: Some solutions
2.3.1 Techniques for maximising the effectiveness of new corpora
2.3.1.1 Statistical methods for assessing how much data is required
Figure 2. Probability of capturing at least one target during a one week period, given different sampling densities and target frequencies.
2.3.1.2 Using different types of sampling regimes
2.3.2 Techniques for maximising the reliability of analyses on existing corpora
2.3.2.1 Statistical methods
2.3.2.3 Combining different types of samples
Table 3. Comparison of descriptive statistics: Manchester corpus children and Lara
2.4 Summary
3. Sampling and the investigation of productivity
3.1 The effect of sample size on measures of productivity
Table 4. Effect of sample size on estimates of lexical specificity in Lara's wh-questions
3.2 The effect of frequency statistics on measures of productivity
3.3 The effect of vocabulary size on productivity measures
3.4 Assessing productivity: A solution
Table 5. Average number of inflections per verb in the data from Juan, Lucia and their parents.
4. Conclusion
Appendix: The use of error codes with the CHAT transcription system and the CHILDES database
Core morphology in child directed speech
1. Introduction
1.1 Noun plurals in acquisition
1.1.1 Dual-route accounts
1.1.2 Challenges to the dual-route
1.2 Complexity in the formation of noun plurals
Table 1. A fragment of the interaction between gender and sonority in Austrian German
2. Language systems
2.1 Dutch plural formation
Table 2. Sonority in Dutch
2.2 German plural formation
Table 3. Interaction of gender and sonority in Austrian German
2.3 Danish plural formation
Table 4. Interaction of gender and sonority in Danish
2.4 Hebrew plural formation
Table 5. Interaction of gender and sonority in Hebrew
3. Databases
3.1 Dutch
3.2 German
3.3 Danish
3.4 Hebrew
3.5 General frequencies across the four data-sets
Table 6. General word frequencies in types and tokens across the four data-sets
4. Plurals in child directed speech and child speech
Table 7. Raw frequencies and percentages of nouns and noun plurals in CDS
Table 8. Raw frequencies and percentages of nouns and noun plurals in CS
4.1 Distribution of plural categories in CDS
4.1.1 Dutch
Table 9. Suffix distribution on the basis of word-final phonology: types in Dutch CDS
Table 10. Suffix distribution on the basis of word-final phonology: tokens in Dutch CDS
4.1.2 German
Table 11. Suffix distribution on the basis of item gender and word-final phonology: types in German CDS
Table 12. Suffix distribution on the basis of item gender and word-final phonology: tokens in German CDS
4.1.3 Danish
Table 13. Suffix distribution on the basis of item gender and word-final phonology: types in Danish CDS
Table 14. Suffix distribution on the basis of item gender and word-final phonology: tokens in Danish CDS
4.1.4 Hebrew
Table 15. Suffix distribution on the basis of item gender and word-final phonology: types in Hebrew CDS
Table 16. Suffix distribution on the basis of item gender and word-final phonology: tokens in Hebrew CDS
4.2 Distribution of plural categories in CS
4.2.1 German
Table 17. Suffix distribution on the basis of item gender and word-final phonology: types in German CS
Table 18. Suffix distribution on the basis of item gender and word-final phonology: tokens in German CS
4.2.2 Danish
Table 19. Suffix distribution on the basis of item gender and word-final phonology: types in Danish CS
Table 20. Suffix distribution on the basis of item gender and word-final phonology: tokens in Danish CS
4.2.3 Hebrew
Table 21. Suffix distribution on the basis of item gender and word-final phonology: types in Hebrew CS
Table 22. Suffix distribution on the basis of item gender and word-final phonology: tokens in Hebrew CS
5. General discussion
5.1 CDS compared with adult directed speech (ADS)
Figure I. Predictability of the plural suffix -en in Dutch ADS and CDS according to the form of the final rhyme (wordtypes)
Figure II. Predictability of the plural suffix -en in Dutch ADS and CDS according to the form of the final rhyme (wordtokens)
5.2 Typological perspectives
6. Conclusions
Learning the English auxiliary
1. Introduction
1.1 The early stages of English auxiliary development
1.2 Generativist accounts of auxiliary development
1.3 Usage-based approaches
1.4 Different approaches to accounting for children's auxiliary errors
1.5 Productivity
2. The present study
2.1 Method
2.1.1 Participants
2.1.2 Data collection
2.2 Utterances and frames
Table 1. Number of multi-verb utterances
2.3 Analyses
2.4 Results
2.4.1 Age and MLU
Table 2. Age and MLU in words at the start and end of the study
Figure 1. Cumulative 3-verb frames
Table 3. Number of frames and the percentage of utterances accounted for by frames
2.4.2 Order of emergence of frames
Table 4. Frames produced by at least 5 children and rank order of emergence
Table 5. Frames produced by fewer than 5 children and order of emergence
2.4.3 Evidence for developing schematicity and generalisation
Table 6. The children's non-tag question errors
Table 7. Age at which different structures are attested
Table 8. The first two examples of ellipsis for each child
2.5 Relationship to input
Table 9. Frames used by the mothers in the Manchester CHILDES corpus and not produced by the children in the present study
3. Discussion
3.1 Frequency and sampling
3.2 How abstract is the child's knowledge of auxiliaries?
3.3 Using different methodologies
3.4 Individual differences
4. Conclusion
Appendix A. The children's tag questions
Appendix B. Mean rank order of frequency of mothers' frames (Manchester corpus)
Using corpora to examine discourse effects in syntax
1. Introduction
2. The effect of information flow on argument realization in adult speech
3. The effect of information flow on argument realization in child speech
4. Individual accessibility features
4.1 Newness
4.2 Topicality
4.3 Absence
4.4 Query
4.5 Disambiguation / contrast / interference
4.6 Explicit contrast / emphasis
4.7 Person
4.8 Animacy
4.9 Attention
4.10 Developmental trends
4.11 Summary
5. Accessibility features working in combination
5.1 Several features in one coding category
5.2 Threshold approach
5.3 Incremental contribution
5.4 Independent contribution
5.5 Case study of interaction between two features
5.6 Summary
6. Usefulness of extended stretches of discourse
6.1 Preferred argument structure
6.2 Conversational sequences
6.3 Managing miscommunication
6.4 Summary
7. Experimental studies
7.1 Strengths of production studies
7.2 Difficulties with production studies
7.3 Summary
8. Discussion and conclusion
Integration of multiple probabilistic cues in syntax acquisition
1. Introduction
2. The chicken and egg problem of syntax acquisition
3. Solutions to the chicken and egg problem - innate categories don't help
4. Intra-linguistic cues in the utterance: from statistics to structure
4.1 Measuring potential information in the corpus
4.2 Deriving syntactic structure from the corpus
5. Intra-linguistic cues in the word: Phonology to structure
Table 1. Phonological and prosodic cues found to distinguish grammatical categories in English
5.1 Individual cues in categorisation
5.2 Combined cues for categorisation
6. Combining intra-linguistic cues
7. Converging evidence for the use of multiple cues
7.1 Learning to segment artificial language with multiple cues
7.2 Learning to categorise artificial language with multiple cues
8. How are multiple cues integrated?
Figure 1. Classifications of nouns and verbs based on distributional cues alone (horizontal dotted line), phonological cues alone (vertical dotted line), and combined cues (oblique dashed line)
9. Extra-linguistic cues and language learning
10. Future directions for multiple cue research
10.1 Quantifying new cues
10.2 Cues for different levels of language learning
10.3 Computational and developmental approaches to multiple cues
11. Conclusion
Enriching CHILDES for morphosyntactic analysis
1. Introduction
2. Analysis by transcript scanning
3. Analysis by lexical tracking
4. Measures of morphosyntactic development
5. Generative frameworks
6. Analysis based on automatic morphosyntactic coding
6.1 MOR and FST
6.2 Understanding MOR
6.3 Compounds and complex forms
6.4 Lemmatization
6.5 Errors and replacements
7. Using MOR with a new corpus
8. Affixes and control features
9. MOR for bilingual corpora
10. Training POST
11. Difficult decisions
12. Building MOR grammars
13. Chinese MOR
14. GRASP
15. Research using the new infrastructure
16. Next steps
17. Conclusion
Exploiting corpora for language acquisition research
1. Introduction
2. Corpus creation
3. Corpus size
4. Longitudinal case studies
5. Early production data (ages 1-2)
6. Nature of the input and learnability issues
7. Discourse context and the structure of language
8. Interactions between corpus and experimental studies
9. Areas ripe for further corpus research
10. Limitations of corpus research
11. Converging evidence from corpus and experimental studies
References
Index
The series Trends in Language Acquisition Research

Schweitzer Fachinformationen