Chapter 1: Computational linguistics
Computational linguistics is an interdisciplinary field concerned with the computational modeling of natural language and with the study of computational approaches suited to linguistic questions. It draws on a wide range of disciplines, including linguistics, computer science, artificial intelligence, mathematics, logic, philosophy, cognitive science, cognitive psychology, psycholinguistics, ethnography, and neuroscience.
Historically, computational linguistics emerged as a subfield of artificial intelligence, practiced by computer scientists who specialized in applying computers to the translation and analysis of natural languages. During the 1970s and 1980s, the field became more established through the founding of the Association for Computational Linguistics (ACL) and the launch of independent conference series.
The ACL defines computational linguistics as follows:
...the application of scientific methods and computer analysis to the study of language. Researchers in the field of computational linguistics are interested in developing computer models of many different types of language processes.
As of 2020, the terms "natural language processing" (NLP) and "(human) language technology" are treated as near-synonyms of "computational linguistics." Since the 2000s, these terms have emphasized practical applications over theoretical inquiry. Strictly speaking, they cover only the subfield of applied computational linguistics, yet in practice they have largely replaced the term "computational linguistics" in the NLP/ACL community.
Computational linguistics has both theoretical and applied components. Theoretical computational linguistics addresses questions that arise in theoretical linguistics and cognitive science.
Theoretical computational linguistics includes the development of formal theories of grammar (parsing) and semantics, often grounded in formal logics and symbolic (knowledge-based) approaches. Its research areas include:
Computational complexity of natural language, largely modeled on automata theory, with the application of context-sensitive grammars and linear bounded Turing machines.
Computational semantics, which comprises defining suitable logics for representing linguistic meaning, automatically constructing those representations, and reasoning with them.
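To make the complexity item above more concrete, the following is a minimal sketch of a recognizer for the language a^n b^n c^n (n >= 1), a standard textbook example of a language that is context-sensitive but not context-free. The function name and the in-place "crossing off" strategy are illustrative assumptions, not taken from the text; the only point is that the check uses no more working memory than the cells the input itself occupies, in the spirit of a linear bounded machine.

```python
def accepts_anbncn(word: str) -> bool:
    """Return True if word has the form a^n b^n c^n with n >= 1.

    Illustrative sketch: the 'tape' is exactly as long as the input,
    mimicking the space restriction of a linear bounded machine.
    """
    tape = list(word)
    if not tape:
        return False

    # Shape check: only a, b, c, and never an earlier letter after a later one.
    order = "abc"
    highest_seen = 0
    for symbol in tape:
        if symbol not in order:
            return False
        position = order.index(symbol)
        if position < highest_seen:
            return False
        highest_seen = position

    # Repeatedly cross off one 'a', one 'b', and one 'c' in place.
    while "a" in tape:
        for symbol in order:
            try:
                tape[tape.index(symbol)] = "X"
            except ValueError:
                return False  # a matching 'b' or 'c' is missing
    # Accept only if every cell was crossed off.
    return all(cell == "X" for cell in tape)
```

For instance, accepts_anbncn("aabbcc") is True, while accepts_anbncn("aabbc") is False because the second round finds no remaining "c" to cross off.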
Applied computational linguistics is dominated by machine learning, which traditionally relied on statistical methods and, since the mid-2010s, on neural networks (Socher et al., 2012).
Besides the split between theoretical and applied computational linguistics, the field can be divided into major areas according to other criteria, including:
The medium of the language being processed, whether spoken or written: speech recognition and speech synthesis deal with how computers can understand spoken language and produce it.
The task being performed, whether analyzing language (recognition) or synthesizing language (generation): parsing and generation are the subfields of computational linguistics that deal with taking language apart and putting it together, respectively.
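The contrast between parsing and generation can be sketched with a single toy grammar used in both directions. Everything below is a hypothetical illustration: the grammar, the lexicon, and the function names are assumptions made for the example, and the recognizer is a plain CKY chart recognizer for grammars in Chomsky normal form rather than any particular system described in the literature.

```python
import random

# Hypothetical toy grammar in Chomsky normal form (assumed for illustration).
BINARY_RULES = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N")],
    "VP": [("V", "NP")],
}
LEXICON = {
    "Det": ["the", "a"],
    "N": ["dog", "cat"],
    "V": ["sees", "chases"],
}


def generate(symbol="S", rng=random):
    """Generation: expand a symbol top-down into a list of words."""
    if symbol in LEXICON:
        return [rng.choice(LEXICON[symbol])]
    left, right = rng.choice(BINARY_RULES[symbol])
    return generate(left, rng) + generate(right, rng)


def recognize(words, start="S"):
    """Analysis: CKY recognition, building categories bottom-up."""
    n = len(words)
    if n == 0:
        return False
    # chart[i][j] holds the categories that span words[i:j].
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for category, forms in LEXICON.items():
            if word in forms:
                chart[i][i + 1].add(category)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for parent, expansions in BINARY_RULES.items():
                    for left, right in expansions:
                        if left in chart[i][k] and right in chart[k][j]:
                            chart[i][j].add(parent)
    return start in chart[0][n]
```

Under these assumptions, every word sequence returned by generate() is accepted by recognize(), whereas a scrambled sequence such as "dog the sees" is rejected.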
Traditionally, applications of computers to research problems in other branches of linguistics have also been counted as tasks within computational linguistics. These include, among others:
Computer-aided corpus linguistics, which has been used since the 1970s to make detailed advances in discourse analysis.
Simulation and study of language evolution in historical linguistics and glottochronology.
Computational linguistics is often classified as a subfield of artificial intelligence (AI), although it predates the development of AI. It originated in efforts in the United States in the 1950s to use computers to translate texts automatically from foreign languages, most notably Russian scientific publications, into English. Some research in computational linguistics aims to build working speech or text processing systems, while other work aims to create systems that allow interaction between humans and machines. Programs designed for human-machine communication are called conversational agents.
Just as computational linguistics can be practiced by experts from many professions and in a wide range of departments, its research can address a diverse range of topics. The following sections discuss some of the literature available across the field, organized into four main areas: developmental linguistics, structural linguistics, linguistic production, and linguistic comprehension.
Language is a cognitive skill that develops throughout an individual's life. This developmental process has been studied with several techniques, and a computational approach is one of them. Human language development imposes constraints that make it harder to apply computational methods to understanding it. For example, during language acquisition, children are exposed almost exclusively to positive evidence: they encounter examples of correct forms but receive little or no evidence of what is incorrect. This constrains any computational model that attempts to simulate language development and acquisition in an individual.
Computational attempts to model the developmental process of language acquisition in children have produced both statistical grammars and connectionist models, the latter built on artificial neural networks.
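As a rough illustration of what a statistical model learned from positive examples alone can look like, the sketch below estimates a bigram model from a handful of utterances. The utterances, the function names, and the choice of add-one smoothing are all assumptions made for the example; a bigram model is a much simpler stand-in for the statistical grammars mentioned above, and words outside the toy vocabulary are not handled properly.

```python
import math
from collections import Counter

# Hypothetical input: only well-formed utterances, no negative examples.
UTTERANCES = [
    "the dog runs",
    "the cat runs",
    "the dog sleeps",
    "a cat sleeps",
]


def train_bigrams(utterances):
    """Count contexts and bigrams, with start and end markers."""
    contexts, bigrams = Counter(), Counter()
    vocabulary = {"</s>"}
    for utterance in utterances:
        tokens = ["<s>"] + utterance.split() + ["</s>"]
        vocabulary.update(tokens[1:])
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, vocabulary


def log_prob(utterance, model):
    """Log-probability of an utterance under add-one smoothing."""
    contexts, bigrams, vocabulary = model
    tokens = ["<s>"] + utterance.split() + ["</s>"]
    total = 0.0
    for previous, current in zip(tokens, tokens[1:]):
        numerator = bigrams[(previous, current)] + 1
        denominator = contexts[previous] + len(vocabulary)
        total += math.log(numerator / denominator)
    return total
```

With model = train_bigrams(UTTERANCES), the unseen but well-ordered "a dog runs" scores higher than the scrambled "dog a runs", even though the model was never told that any word order is wrong.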
To test competing linguistic theories, researchers have also used robots to model children's language acquisition. A model designed to learn as children might was built on an affordance model, in which mappings between actions, perceptions, and effects were created and linked to spoken words. Crucially, these robots acquired functioning word-to-meaning mappings without needing grammatical structure, which vastly simplified the learning process and yielded information that furthers the current understanding of linguistic development. Importantly, this information could only have been tested empirically with a computational approach.
As neural networks and learning robotic systems continue to improve our understanding of an individual's linguistic development over a lifetime, it is also important to keep in mind that languages themselves change and develop over time.
Computational approaches to understanding this phenomenon have unearthed very interesting information.
Using the Price equation and Pólya urn dynamics, researchers have created a model that not only predicts future linguistic evolution but also gives insight into the evolutionary history of modern-day languages.
Through computational linguistics, this modeling effort achieved what would otherwise have been impossible.
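The Pólya urn idea can be sketched in a few lines. This is only the basic textbook urn scheme, not the researchers' model: the mapping of urn colors to competing linguistic variants, the function name, and the parameters are assumptions made for illustration, and the Price equation component is not covered here.

```python
import random


def polya_urn(initial_counts, steps, rng):
    """Simulate a basic Pólya urn.

    At each step one 'ball' is drawn with probability proportional to its
    current count and returned together with one extra ball of the same
    kind, so variants that are already common tend to become more common.
    """
    counts = dict(initial_counts)
    for _ in range(steps):
        variants = list(counts)
        weights = [counts[variant] for variant in variants]
        chosen = rng.choices(variants, weights=weights, k=1)[0]
        counts[chosen] += 1
    return counts


if __name__ == "__main__":
    # Two hypothetical competing variants starting with equal usage.
    for seed in range(3):
        print(polya_urn({"variant_A": 1, "variant_B": 1}, 1000, random.Random(seed)))
```

Running the simulation with different seeds typically yields quite different final proportions from the same starting point, which is the path-dependent behavior that makes urn dynamics a natural ingredient for models of language change.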
Advances in computational linguistics have clearly improved the understanding of linguistic development, both in individual humans and across evolutionary time. Because systems can be modeled and modified at will, science gains an ethical method for testing hypotheses that would otherwise be intractable.
To build better computational models of language, an understanding of the structure of language is crucial. To this end, the English language has been studied meticulously with computational approaches to better understand how it works at a structural level. One of the most important prerequisites for studying linguistic structure is the availability of large linguistic corpora, or samples. These give computational linguists the raw data needed to run their models and to gain a better understanding of the underlying structures present in the vast amount of data contained in any single language. The Penn...