
Complexity and Anthropology

J. Stephen Lansing, Sean S. Downey, in Philosophy of Complex Systems, 2011

1.1 Complexity and the Geisteswissenschaften: structuralism

For roughly the past half century, humanistic approaches to sociocultural anthropology have been dominated by the structural anthropology of Claude Lévi-Strauss and the “post-structuralism” of his many successors, among them the Tel Quel group in Paris (1960–1982), which included Roland Barthes, Georges Bataille, Maurice Blanchot, Jacques Derrida, Michel Foucault and Julia Kristeva. Structuralism posed a profound challenge to the earlier humanistic tradition in anthropology, which sought to uncover the subjective meaning of cultural symbols and practices. Structuralists did away with the question of the subject's awareness of meaning, replacing it with an account of how language produces meanings that define subjects. The prominent structuralist Roland Barthes (1915–80) argued that the implications of this epistemological reversal could hardly be exaggerated, predicting that the “infinity of language” would replace the Kantian-Husserlian “infinity of consciousness.” The ascendancy of structuralism in anthropology in the 1960s created an ongoing philosophical crisis with respect to the nature of the anthropological subject, which continues today.

Interestingly, it is probably easier to give a coherent account of the structuralist program from the perspective of complexity, than from that of humanistic anthropology. Structuralism defines various components of language, such as phonemes and morphemes, in terms of logical operations on trees or networks. This marked a radical departure from traditional interpretive approaches to language and culture. A century ago, the Swiss linguist Ferdinand de Saussure (1857–1913) defined the linguistic sign as composed of two elements, the sensible sound-image (signifier) and the intelligible concept (signified). Saussure argued that linguistic signs are unmotivated and acquire their meaning only through differential relations with other signs.1 He suggested that the same dynamics occur at the level of phonology: the boundaries of phonemes are defined by paired contrasts with the other phonemes that they most closely resemble. Thus in English the slight difference between [p] and [b] marks the boundary between two phonemes, creating a meaningful distinction between, for example, “pit” and “bit.” In this way, binary contrasts or antonymy define signifiers: the written phonetic symbol [b] points to a particular sound (or range of sounds produced by different speakers). Roland Barthes later described this as first-order signification; i.e. the denotative meaning of the signifier. Barthes developed a concept of higher-order signifiers which enabled him to extend the structuralist approach from language to cultural phenomena. For example, the denotative or first-order meaning of the English signifier “blue” depends on the other color terms with which it can be contrasted. Barthes argued that second-order meanings are also defined by binary contrasts. Thus blue is traditionally associated with male infants, and pink with female infants, in American hospitals. This association is an example of metonymy: blue is to pink as male is to female.
Barthes argued that such metonymic associations are ubiquitous, generating symbolic classificatory systems for cultural objects.

This idea was further developed by anthropologists such as Marshall Sahlins, who used it to analyze the systemic properties of cultural symbols. For example, Sahlins argued that the Fijian words for “sea” and “land” are first-order signifiers defined by their binary opposition: that which is sea is not land [Sahlins, 1976]. This contrast is extended by metonymic chaining: in Fiji, men are associated with the sea and women with the land; further, chiefs are also associated with the sea and commoners with the land. The seaward side of a Fijian house thus is associated with male and chiefly power. Similarly, the sea itself is subclassed into the lagoon (landward sea) and the outer or seawards sea. Fishing is a male occupation, but if women fish, they do so in the lagoon.

In this example, a relationship of binary opposition between two first-order signifiers, “sea” and “land”, forms the root of a tree of symbolic associations (Fig. 1) in which the initial defining contrast is repeated with other paired oppositions, such as seawards land and inland land.

Figure 1. A structuralist analysis of Fijian classification.

Figure redrawn from: Sahlins, M. D. (1976). Culture and practical reason. University of Chicago Press.

The tree model was criticized by post-structuralists, who argued that there are no privileged first-order signifiers which unambiguously root trees of symbolic associations (thus the Sea/Land opposition in Sahlins's example would not be accepted as foundational). According to this argument, signification is not fully defined by any single oppositional pair in the chain of signifiers, but rather by metonymic associations. The post-structuralist psychoanalyst Jacques Lacan argued that the mind glides like a butterfly through networks of signifiers, each of which points beyond itself to other signifiers. Hence the correct model is not a rooted tree, but rather a network of signifiers: the chain of differences spreads throughout semantic space and never comes to rest in an ultimate ‘signified’ [Sandwell, 1996, 365–6]. This argument was elaborated by Jacques Derrida, who drew attention to the ‘free play’ of signifiers: they are not fixed to their signifieds but point beyond themselves in an ‘indefinite referral of signifier to signified’ [Derrida, 1978, 25]. Hence both Derrida and Lacan portray relationships among signifiers as networks, not trees. While for Saussure the meaning of signs derived from how they differ from each other, Derrida coined the term différance to allude to the ways in which meaning is endlessly deferred. He concluded that there is no ‘transcendent signified’ [Derrida, 1978, 278–280; 1976, 20].

Derrida and other post-structuralists famously developed these ideas into a relativistic epistemology, arguing that the meaning of texts can never be fixed. This conclusion echoed that of many Anglo-American analytic philosophers, who at about the same time (1970s) had begun to acknowledge that their quest for an unambiguous observation language had failed. Meanwhile in linguistics, the structuralist program sputtered to an end as it became clear that networks defined by binary oppositions are not very informative for linguistic phenomena more complex than phonemes and morphemes.


URL: //www.sciencedirect.com/science/article/pii/B9780444520760500201

Big Data Analytics

Venkat N. Gudivada, ... Vijay V. Raghavan, in Handbook of Statistics, 2015

2.5.4 Big Data and Parsing

Deep parsing of natural language text holds the key to harnessing unstructured Big Data. For example, a dependency tree (DT) is a representation produced via deep parsing. A DT is an acyclic graph that depicts dependencies between lexical entities (words or morphemes). When two words are connected by a dependency relation, one of the words is the head and the other is the dependent. A dependency link is an arrow pointing from the head to the dependent. Usually, the dependent is the modifier, and the head plays the larger role in determining the behavior of the pair in the text. There is a growing body of work on creating new treebanks for training dependency parsers for different languages.
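A dependency tree can be represented minimally as a mapping from each token to its head, with the artificial root at index 0. The sketch below (plain Python, not tied to any particular parser's output format; the sentence, indices, and relation labels are illustrative) checks the tree property the text describes: every token must reach the root without cycles.

```python
def is_valid_tree(heads):
    """Check that a head assignment forms a tree rooted at 0:
    following head links from any token reaches the root, with no cycle."""
    for tok in heads:
        seen = set()
        node = tok
        while node != 0:
            if node in seen:
                return False          # cycle detected: not a tree
            seen.add(node)
            node = heads[node]        # follow the arc from dependent to head
    return True

# "The dog chased cats": token indices 1..4, 0 = artificial root.
# Arcs point head -> dependent; here we store dependent -> head.
heads = {1: 2, 2: 3, 3: 0, 4: 3}
labels = {1: "det", 2: "nsubj", 3: "root", 4: "obj"}

is_valid_tree(heads)        # True
is_valid_tree({1: 2, 2: 1})  # False: 1 and 2 point at each other
```

Real treebanks (e.g., in the CoNLL formats) store essentially this head index plus a relation label for each token.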


URL: //www.sciencedirect.com/science/article/pii/B9780444634924000095

First Language Acquisition

N.B. Ratner, in International Encyclopedia of Education (Third Edition), 2010

The Nature of Human Language

Human languages are distinct from animal communication systems in a wide variety of ways. Among them are infinite creativity (the ability for speakers and hearers to produce and understand an infinite variety of utterances), their symbolic nature (the arbitrary relationships among words, utterances, and the concepts to which they refer), and hierarchical organization, which allows a number of levels of rules governing appropriate structure and use (see Fromkin et al., 2007).

Within any language, there is a set of rules that governs appropriate use of sounds, words, grammar, and meaning. Moreover, competent language users must also master socially appropriate means of conveying and interpreting linguistic messages. Briefly, these subsystems of language knowledge consist of phonology, morphology, syntax, semantics, and pragmatics.

Phonological features of any given language specify its sound inventory (phonemes) as well as ways in which sounds may be legally combined to create well-formed words (phonotactics). The smallest units of language that convey meaning or grammatical distinctions are morphemes; for example, in English, a word such as cats consists of one lexical morpheme, cat, which can stand alone, and one grammatical morpheme to signal the plural. Languages have large numbers of lexical (or open class) morphemes, and a much smaller and delimited number of grammatical morphemes (closed class), which may stand alone (such as the or can), or must be attached to lexical morphemes (such as the plural, past tense, possessive, etc.).
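The cats example above can be sketched as a toy segmenter: a small closed-class suffix inventory peeled off a known lexical stem. The suffix list and stem lexicon here are illustrative stand-ins, not a real morphological analyzer (which must also handle irregular forms, allomorphy, and ambiguity).

```python
# Closed-class grammatical suffixes and a tiny open-class lexicon
# (both hypothetical, for illustration only).
GRAMMATICAL_SUFFIXES = ["ing", "ed", "s"]
LEXICON = {"cat", "play", "walk"}

def segment(word):
    """Return (stem, suffixes): a known lexical stem plus any
    grammatical suffix stripped from it; otherwise the word whole."""
    for suffix in GRAMMATICAL_SUFFIXES:
        if word.endswith(suffix) and word[: -len(suffix)] in LEXICON:
            return word[: -len(suffix)], [suffix]
    return word, []   # single free morpheme, or unknown word

segment("cats")    # ('cat', ['s'])  -> two morphemes
segment("played")  # ('play', ['ed'])
```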

Mastery of syntax requires appropriate use of morphology as well as any rules governing the ordering of elements in sentences and their smaller constituents, such as noun and verb phrases. Some languages, such as Finnish, permit fairly free word order, while others are highly constrained. In addition, languages may differ in basic word order; for example, English tends to employ subject–verb–object as its canonical ordering, while the Philippine language Tagalog is primarily verb–subject–object.

Meaning in language can be conveyed by the meanings of individual words (as in knowing what the word chair refers to), as well as the order in which words are combined to reflect themes such as subject and object (most readers will readily appreciate that “John loves Mary” does not necessarily mean that “Mary loves John”). Finally, the meanings of sentences often go beyond the strict interpretation of their words and syntax. Pragmatic intent is obtained by evaluation of the sentence within a context to ascertain its function within conversation – whether one's objective is to inform, warn, request action, etc., as might variously be the case in hearing someone say, “It's late.” All of these rule systems must be adequately mastered in order for the child to function as a capable speaker–hearer of a language.


URL: //www.sciencedirect.com/science/article/pii/B9780080448947005078

Literacy Instruction for Students with Special Needs

L.H. Mason, in International Encyclopedia of Education (Third Edition), 2010

Vocabulary

A student’s understanding of word meanings, or their vocabulary, is often considered the glue that binds word recognition and comprehension. For students with special needs, gaps in vocabulary development cause reading delay and affect reading growth (Bos and Vaughn, 2006). There are two types of vocabulary instruction: oral vocabulary and reading vocabulary. Instruction in reading vocabulary (the words a reader recognizes in print or uses in writing) is the focus of this synopsis. Although teacher-directed instructional approaches such as oral language instruction (i.e., teaching vocabulary during reading by modeling how to use context; using synonyms and definitions) work well for average achieving students, students with special needs often require additional explicit instructional approaches.

DI for word-learning strategies, morphemic and contextual analysis, is an approach often used to support student independence in developing vocabulary. In morphemic analysis, students are taught to use word parts to interpret word meanings. For example, the teacher introduces and teaches the meaning of a morpheme (e.g., Pre usually means before, what does pre mean?). The teacher then teaches new words that include the morpheme through signaling and unison oral responding (e.g., So what does pretest mean?). Carnine et al. (2004) cautioned that this approach is limited by the difficulties in translating some words into functional definitions, the fact that many words have dual meanings, and the difficulty in selecting appropriate morphemes for instruction. Contextual analysis, in contrast, fosters students’ independence in learning word meanings by using surrounding words in text. In contextual analysis instruction, the teacher points out the unknown word in the text, asks the students to find the words that tell what the word means, and prompts the student to restate the sentence substituting the known word for the unknown word. Students are taught to look in text for an embedded word definition, a synonym, a description, a contrast, or a comparison.

Mnemonics with modified CSI has been used effectively for teaching vocabulary and associated concepts to students with special needs (Hughes, 1996). The keyword picture strategy, for example, uses visualization in the following steps taught to students: (1) select a word or term; (2) state the definition; (3) select another similar sounding word; (4) create a picture using the definition, the word to be learned, and the key word; and (5) think about and study the picture. When developing mnemonic strategies for vocabularies and concepts, Mastropieri and Scruggs (1989) recommended that the amount of elaboration required is dependent on the students’ familiarity with both the word and the concept abstraction.


URL: //www.sciencedirect.com/science/article/pii/B9780080448947011258

Learning to Read

M.S. Burns, J.K. Kidd, in International Encyclopedia of Education (Third Edition), 2010

Essential Underpinnings of Reading

All beginning readers depend on language underpinnings that begin developing in the preschool years. During formal reading instruction, children continue to develop oral-language capabilities needed to approach and understand written language. As preschool children's oral language develops, their expressive oral language increases and their listening comprehension becomes more sophisticated. As this occurs, their ability to remember and use information increases and they learn how to listen differently when hearing expository text, storybooks, poems, and nonsense rhymes. They become aware of the sounds of the language and the phonemes and morphemes that make up words. They learn the purpose of written language and the forms it takes. At a basic level, they learn how print works (concepts of print). A final but major part of an early literacy foundation is children's developing motivation to read. These essential skills and knowledge continue to develop during instruction in learning to read and as students continue to build reading competence (see Figure 1). Below, we elaborate on these underlying skills and knowledge.

Figure 1. Essential underpinnings of reading.

Language and Listening Comprehension

Oral language begins in infancy and continues throughout life. It is the largest developmental domain relevant to reading. Oral-language skills form the foundation for the transition from understandings of spoken language to written language. During the preschool years, children develop their receptive language, which enables them to understand, remember, and use what they hear, as well as their expressive language, which gives them the ability to communicate their own needs and thoughts.

At the word level, young children develop speech discrimination in the languages they hear on a consistent basis. They hear words and also the separation between words, developing a sense of what a word is. They compare and contrast words, understanding that some begin with the same sounds and some end with the same sounds, developing phonological awareness. Through manipulation of this word–sound system, they begin to understand that words (speech) are made up of a sequence of sounds (phonemes) that are combined in different ways for different words. Given this opportunity and familiarity with many words, children eventually develop a mental model that enables them to break the code (i.e., understand sound–letter correspondences). Children's model for learning and understanding new words is also rooted in morphological development. During the preschool years, children learn many aspects of morphology, for example, how to form past tense and possessives. A morpheme is the smallest unit of language that carries meaning; for example, the word play has one morpheme, play, while the past tense of play, played, has two morphemes, play and ed. Children develop phonological awareness and, later, morphological awareness, both of which are metalinguistic understandings. Words are then connected and syntax develops.

Central to language development is semantic development and vocabulary. Young children learn sentential semantics, how phrases and sentences are ordered to obtain meaning. Lexical semantics and vocabulary flourish around 2–3 years of age when children acquire the naming insight, realizing that words are names for things. Developing vocabulary includes not only learning new words but the interrelationships between words. Young children play with word meaning as exemplified in the childhood joke, “Why did the girl throw the butter out the window? To see a butterfly.” As children build upon these skills and gain experience with storytelling, they develop a sense of narrative and hone their listening-comprehension skills.

Forms and Uses of Written Language

During the preschool period, children begin to learn that written language comes in different forms and is used for different purposes. They learn that words are arranged on paper in different ways (e.g., the food words on a menu vs. those on a grocery list vs. those in a cookbook). They simultaneously learn that the food words in these three different forms have different functions. They learn that a storybook (i.e., fiction) is different from a book in which they are learning new information (i.e., expository text). They learn the features of a narrative, how in a narrative a sequence of events relates to the central theme of a story. They learn that a written story has specific features, such as dialog set off by quotation marks along with an indication of who is speaking (e.g., “Let's take the bicycles,” said Mary).

Knowledge of the Written Symbol System and of Print Concepts

Before they can read, children as young as 3 years know that we can read certain letter strings, like BOOK, but not TTTT. Young children also reliably classify BOOK as a word and 8965 as a number. They learn that it is the print in books that is read, and that it works in a certain way (e.g., in English it goes from top to bottom, left to right on the page). They acquire knowledge of punctuation and of letters. They develop the understanding that print can have meaning independent of immediate context.

Motivation: Becoming Enthusiastic About Reading and Writing

Literacy as a source of enjoyment takes place in the preschool years as children are read to in interactions that are positive and warm. These experiences are enhanced when the interactions address children's interests, take into account their prior knowledge, honor cultural continuity, and provide multilingual support.


URL: //www.sciencedirect.com/science/article/pii/B9780080448947005121

Language Acquisition

Allyssa McCabe, in Encyclopedia of Social Measurement, 2005

Issues in Assessing Language Acquisition

Age Concerns

The first important consideration in measuring language acquisition is the age of the child. Virtually all measures are devised to address the language acquisition of individuals in a specified age range. Related to this concern is the mental age or stage of language acquisition of the child. For example, although the mean length of utterance (MLU) in morphemes is appropriate until that measure reaches 3.5 morphemes, a point usually reached by children when they are 3 years old, the measure may continue to be of use with older, delayed children. That is, MLU is no longer appropriate to use for children who routinely produce longer sentences (e.g., “I jump/ed out of the car” or “He did/n't cry too much today,” both consist of seven morphemes).
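MLU in morphemes is a simple average: total morphemes divided by number of utterances. A minimal sketch, using the article's own "/" notation for morpheme boundaries within words ("jump/ed", "did/n't") so that each slash-separated piece counts as one morpheme:

```python
def mlu(utterances):
    """Mean length of utterance in morphemes, given utterances whose
    words are pre-segmented with '/' at morpheme boundaries."""
    counts = []
    for utt in utterances:
        # each word contributes one morpheme per slash-separated piece
        counts.append(sum(len(word.split("/")) for word in utt.split()))
    return sum(counts) / len(counts)

sample = ["I jump/ed out of the car", "He did/n't cry too much today"]
mlu(sample)  # 7.0 -- both example utterances contain seven morphemes
```

In practice, transcription conventions (e.g., in CHILDES/CLAN) specify exactly which forms count as separate morphemes; the hard work is the segmentation, not the arithmetic.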

Control versus Generalizability of Results

Psychologists have long stressed the importance of standardizing the procedure of a study, or arranging for as many circumstances to be the same for all participants as possible. Through the exercise of such scientific control, experimenters believe that they can attribute outcomes to the independent variable of interest to them rather than some other, extraneous variable. The difficulty is that such control comes at the inevitable expense of generalizability (the extent to which findings can be applied to other situations outside the laboratory). For example, an experimenter might adopt the method Ebbinghaus used in the 1880s to study the acquisition of words. Ebbinghaus used consonant-vowel-consonant trigrams—nonsense syllables—in an effort to avoid contaminating the experimental procedure by the intrusion of meaning on the laboratory experience. He then measured precisely how many repetitions of “JUM” or “PID” were required for subjects to memorize those nonsense syllables. Researchers eventually discovered that such procedures told them very little about how people learn words in the real world; in other words, generalizability had all but completely been sacrificed for the sake of control. Moreover, unbeknownst to researchers, subjects often turned nonsense syllables into meaningful ones (e.g., “JUM” became “JUMP” or “CHUM”) to ease memorization.

On the other hand, simply observing language in the real world, which maximizes generalizability, would not tell us much about which of the many aspects of some particular situation were responsible for triggering the language observed. Once again, multiple methods of assessment are required.

Naturalistic Observation versus Elicitation

Related to the trade-off between experimental control and generalizability is that between naturalistic observation and elicitation. One of the earliest means used to study language acquisition was a diary of a child's progress, kept by parents who were also often linguists. Such an approach can yield ample, rich data that are true of real situations because they were derived from such situations. However, observation of what a child does, even day in and day out, does not necessarily tell us about a child's capability. Elicitation procedures are best suited to informing us of children's capacity, and by far the best-known such procedure is the wug test developed by Berko Gleason in 1958. Berko Gleason showed children a picture of a strange creature and said, “This is a wug.” She then showed children a picture of two of the strange creatures, saying, “Now there is another one. There are two of them. There are two ____.” Using this procedure, Berko Gleason was able to demonstrate that children were capable of producing grammatical morphemes (e.g., saying, “wugs”) in response to nonsense words they had never heard of before. Children had apparently implicitly acquired certain rules (e.g., for forming plurals) rather than simply mindlessly imitating their parents' productions.
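The rule the wug test probes can be sketched as a function that applies the regular English plural to a novel noun. This is a rough spelling-level approximation of the phonological rule (children actually condition the choice on the final sound, not the letters), offered purely for illustration:

```python
# Approximate the sibilant-final condition from spelling; after a
# sibilant the plural is "-es" (an extra syllable), otherwise "-s".
SIBILANT_ENDINGS = ("s", "z", "sh", "ch", "x")

def pluralize(noun):
    """Apply the regular English plural rule to a (possibly novel) noun."""
    if noun.endswith(SIBILANT_ENDINGS):
        return noun + "es"   # e.g., the nonsense word "gutch" -> "gutches"
    return noun + "s"        # default: "wug" -> "wugs"

pluralize("wug")    # 'wugs'
pluralize("gutch")  # 'gutches'
```

That children produce "wugs" for a word they have never heard shows they have internalized something like this rule rather than a memorized list of plural forms.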

Longitudinal versus Cross-Sectional Assessment

The assessment of language acquisition can be accomplished by testing groups of individuals of different ages at approximately the same time, called the cross-sectional approach. Alternatively, the same individuals can be tested repeatedly over a number of years, called the longitudinal approach. The cross-sectional approach is more economical of a researcher's time, because wide age spans can be assessed in a relatively short period of time. However, that method gives no information about the particular path or paths of development of individual children, nor does it provide any hints about the possible causes of such development. The longitudinal method, in contrast, provides information both about individuals and potential causes, although it is expensive in terms of time and money and liable to problems if participants drop out. Furthermore, the results of longitudinal studies may not be generalizable to other groups or other generations.

Production versus Comprehension

At first glance the distinction between children's production of language and their comprehension of language may seem identical to the distinction between performance and competence already noted. However, whereas production and performance may be used interchangeably, comprehension and competence cannot. Specifically, comprehension refers to children's understanding of language directed to them (e.g., can the child perform a sequence of orders directed to him or her?), whereas their competence refers to their internalized knowledge of linguistic rules, a more esoteric ability often tapped by asking people to judge the grammatical acceptability of a sentence. For most children, the comprehension of language precedes and exceeds production. Some estimate that children comprehend five times as many words as they produce, for example. Thus, the discrepancy between comprehension and production must be kept keenly in mind by those assessing children.

Research versus Clinical Assessment

Any type of language assessment can in theory be used either to address research issues about the language development of children in general or for clinical purposes to assess the relative progress of a particular child in acquiring language relative to his or her peers. In fact, research on children in general is essential for the clinical assessment of specific children. Whereas some measures are used interchangeably, many tend to be used primarily in one setting or another.


URL: //www.sciencedirect.com/science/article/pii/B0123693985005363

Computing an organism: on the interface between informatic and dynamic processes

Paulien Hogeweg, in On Growth, Form and Computers, 2003

9.5.1 The model

An overview of the model is shown in Figure 9.3. It combines the GG model with a Boolean gene regulation network in each cell and an evolutionary process to focus on networks which generate morphogenesis. (For a detailed description of the model see Hogeweg (2000a).) Development starts with one large cell that undergoes a number of prescheduled cleavages; during the first two cleavages ‘maternal factors’ may cause cell differentiation. The rules for subsequent cell growth and division are based on the experimental observation that stretch can indeed lead to cell growth and division and squeezing of cells leads to apoptosis (Chen et al., 1997; Ruoslahti, 1997). Note that both these processes involve an elaborate sequence of changes in gene expression; we here implement only the effect. Cell division occurs in the model when a cell has reached a size that is twice some reference size and cell division is through the middle and perpendicular to the longest axis. This seems indeed to be the default situation and modelling of the spindle formation has shown that it will self-organize accordingly (Dogterom et al., 1995). However, in vivo many mechanisms are used to alter the division plane. We will examine below their effect on developmental robustness.
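Two of the local rules described above can be sketched in a few lines: a synchronous Boolean gene-network update inside each cell, and division when a cell reaches twice a reference size. The wiring, sizes, and threshold below are illustrative placeholders, not the parameters of Hogeweg's actual model (which also couples these rules to the GG/cellular-Potts dynamics and to evolution):

```python
REFERENCE_SIZE = 10  # hypothetical reference cell size

def update_network(state, rules):
    """Synchronous Boolean update: each gene's next value is a
    Boolean function of the current state."""
    return tuple(rule(state) for rule in rules)

def maybe_divide(cell_size):
    """Divide through the middle once size reaches twice the
    reference size (division-plane orientation omitted)."""
    if cell_size >= 2 * REFERENCE_SIZE:
        half = cell_size // 2
        return [half, cell_size - half]   # two daughter cells
    return [cell_size]

# Three-gene toy network: g0 copies g1, g1 negates g2, g2 copies g0.
rules = [lambda s: s[1], lambda s: not s[2], lambda s: s[0]]
update_network((True, False, True), rules)  # (False, False, True)
maybe_divide(21)                            # [10, 11]
```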

Figure 9.3. Overview of the model: entanglement between gene regulation, development and evolution.

In earlier work (Hogeweg, 2000a, b, 2002) we have shown:

Differentiation into cell types varies between (a) stable differentiation, i.e. differentiation is maintained independent of intercellular signalling, (b) history dependent differentiation, in which the cell type is dependent on a sequence of neighbourhood conditions, and (c) differentiation which is fully specified by the current neighbourhood only. Each of these types of differentiation is associated with different types of morphogenesis.

Morphogenesis results as ‘sustained transient’ from surface energy minimization and ‘intrinsic conflict’, which is maintained by cell differentiation, cell growth and cell death. Without continued ‘interference’, the initial high-energy state would decay, through shape changes, to a final ‘blob’-like low-energy shape.

These intrinsic conflicts lead to automatic orchestration of adhesion, migration, differentiation, cell growth/division and death. This results in ‘pseudo-isomorphic outgrowth’: although the shapes do change during ‘maturation’, a ‘critter’ preserves its general appearance.

Many different morphemes result from combinations of a few mechanisms:

o

An important mechanism is meristematic growth, a layer of dividing cells that differentiate into non-dividing (or rarely dividing) cells of several types. The zone is maintained because cells redifferentiate if ‘out of line’: cell differentiation that is fully dependent on neighbourhood is ‘used’.

o

A related mechanism we dubbed elongation by ‘budding’: a small group of differentiated cells is pushed outwards because another cell type on the one hand tends to engulf them, but on the other sticks together more firmly than to the ‘bud’. Again, the situation is maintained because cells which do, nevertheless, engulf differentiate into bud-type cells. Like the previous mechanism, this one depends on neighbourhood-dependent cell differentiation. The elongation shown in Figure 9.4 is an example of this.

Figure 9.4. Individual variation in the phenotypes developed from the same genome and its partial stabilization by asymmetric early cleavage. Upper panels: developmental sequence. Stages shown just after cleavage and cell redifferentiation (3 upper panels show symmetric division, 4th panel shows asymmetric division). Left-hand morphemes: variants developed under symmetric cleavage. Right-hand morphemes: variants developed under asymmetric cleavage. Histogram shows number of occurrences of cell types as indicated.

o

Another often occurring mechanism is convergence extension, which occurs due to maximization of the contact line between stably differentiated cell types and often involves redifferentiation of subtypes of these cell types when the contact zone increases.

o

Elongation can also result from intercalation of stably diverged cell types and their subsequent growth and division.

o

Finally, engulfing is an intrinsic mechanism of differential cell adhesion. In our model it often induces neighbourhood-dependent cell differentiation.

The evolutionary dynamics show many of the features known to result from a non-linear genotype-phenotype mapping, i.e. neutral paths and punctuated equilibria at the phenotypic level, although the shape of the quasispecies distribution differs from the simpler examples studied before (e.g. Huynen et al., 1996; van Nimwegen et al., 1999). We will examine this below in more detail.

Moreover, the evolutionary dynamics give rise to interesting mosaic-like variation at the phenotypic level, i.e. repeated reinvention of similar morphotypes occurs in one evolutionary history.

All these features involve the interaction of the intra- and intercellular levels, which generate long-range correlations. In the next two sections we will examine the relationship between the ‘informatic’ processes, i.e. gene regulation, and the dynamic processes, i.e. cell behaviour due to differential adhesion, by focusing on the morphological variation that occurs among critters with identical genomes. In other words, we will study how the genetic and inheritable information can on the one hand exploit and on the other hand ‘tame’ the dynamics of the system.


URL: //www.sciencedirect.com/science/article/pii/B9780124287655500426

Context in Content Composition

Nicholas Asher, in Philosophy of Linguistics, 2012

Aspectual coercion

Aspectual coercion, in which an aspectual operator applies to a verb phrase denotation (which specifies, inter alia, an eventuality type) to produce another verb phrase denotation and eventuality type description, is another example of a meaning shift. Aspectual coercion is quite language-specific, and thus is not the result of any general pragmatic operation such as those considered by Neo-Griceans or relevance theorists [Sperber and Wilson, 1986; Recanati, 2004]. Consider, for example, (6), which involves the progressive aspect.

(6)a.

#John is knowing French.

b.

John is being silly.

c.

John is just being John.

d.

John's being an asshole.

One of the truisms about the progressive aspect is that stative constructions don't support it, as shown in (6a). Nevertheless, (6b-d), which are progressivizations of the stative constructions John is silly, John is John, and John is an asshole, are perfectly unproblematic. Interestingly, aspectual coercion with the progressive appears to be a particular feature of the English progressive aspect morpheme. Languages like French that lexicalize progressive aspect do not seem to support this meaning shift:

(7)a.

Jean est idiot.

b.

#Jean est en train d'être idiot.

c.

Jean est en train de faire l'idiot.

Aspectual coercion is thus a language-specific phenomenon and so cannot be the result of a general cognitive principle of strengthening or weakening due to Gricean or Neo-Gricean constraints on communication. Such meaning shifts must be part of the linguistic system, due to the meaning of particular words.

Another language-specific aspectual coercion concerns the application of a perfective aspectual operator to a verb phrase containing an ability modal. Consider the following French examples. (8) translates roughly as Jeanne had to take the train; (8a) and (9a) use the perfective aspect, while (8b) and (9b) have the imperfective aspect.

(8)a.

Jeanne a dû prendre le train. → Jeanne a pris le train.

(Jeanne had to take the train. → Jeanne took the train).

b.

Jeanne devait prendre le train. ↛ Jeanne a pris le train.

(Jeanne was supposed to take the train. ↛ Jeanne took the train.)

(9)a.

Jeanne a pu prendre le train. → Jeanne a pris le train.

(Jeanne was able to take the train. → Jeanne took the train.)

b.

Jeanne pouvait prendre le train. ↛ Jeanne a pris le train.

(Jeanne was able to take the train. ↛ Jeanne took the train.)

The → signifies an actuality entailment. Were we to consider ability modals as true modals that we can symbolize with □ and ⋄, the actuality entailments in (8a) and (9a) would translate, respectively, to (10a) and (10b):

(10)a.

□ϕ → ϕ.

b.

⋄ϕ → ϕ.

which implies a collapse of the modality (Bhatt, 1999). However, with the imperfective aspect, these inferences vanish, and there is no collapse. The puzzle is: how can an application of the perfective aspect collapse the modality? This is unpredicted, and indeed bizarre, from a Montagovian view of composition.
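To make the worry concrete, the collapse can be spelled out in a short derivation. This is a sketch under two standard assumptions not stated explicitly in the text: the ability modal is rendered as a possibility operator, and the background logic validates ϕ → ⋄ϕ (true on reflexive frames, as in system T):

```latex
\Diamond\phi \rightarrow \phi
  \quad \text{(the actuality entailment)} \\
\phi \rightarrow \Diamond\phi
  \quad \text{(valid on reflexive frames)} \\
\therefore\ \Diamond\phi \leftrightarrow \phi
  \quad \text{(possibility collapses into truth)}
```

On these assumptions, possibility and truth become equivalent, which is exactly the collapse the perfective aspect seems to induce.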

Actuality entailments with certain verb forms are, like coercion with the progressive aspect, a phenomenon particular to certain languages. In English, for instance, the actuality entailment does not appear to exist:

(11)

John was able to take the train.

(12)

John had to take the train.

(13)

?John has been able to take the train.

None of these have the actuality entailment, though they might have what one could call an actuality implicature. Once again, the actuality entailment cannot be the result of some general cognitive but non-linguistic principle of strengthening. It is a semantic and lexically constrained kind of inference.

Matters are still more complex when one considers how temporal adverbials interact with modality and aspect to produce actuality entailments.5

(14)

Soudain, Jean pouvait ouvrir la porte.

(Suddenly, Jean could open the door.)

In (14) the actuality entailment holds, despite the fact that the imperfective aspect is used. This is explained by the general observation that adverbs like suddenly coerce the imperfective aspect into an inchoative one with a perfective meaning. But once again we have a shift of meanings.

I believe that the apparent meaning shifts discussed above should receive as uniform a treatment as possible within a semantic/pragmatic framework of lexical meanings and semantic composition—that is, how lexical meanings compose together to form meanings for larger semantic constituents like propositions or discourses. But we can only address this issue adequately within a larger view of how context affects interpretation. To this end, I will review the outlines of how dynamic semantic frameworks, including theories like SDRT, view discourse content computation. This will give us the tools with which to understand context effects at the level of clausal content composition and apparent meaning shifts. I will then discuss a couple of classic meaning shift cases and spell out the general approach to these that I favor, comparing it to recent pragmatic as well as semantic accounts.


URL: //www.sciencedirect.com/science/article/pii/B9780444517470500081

Computational Analysis and Understanding of Natural Languages: Principles, Methods and Applications

Venkat N. Gudivada, ... Amogh R. Gudivada, in Handbook of Statistics, 2018

3 Document Preprocessing

We begin this section by defining terminology. Indexing is the process of creating both a representation for documents and associated data structures for storing and retrieving the representation. It is this representation that is used to determine the relevance of a document to a user query. For example, an inverted index is one such representation (discussed in Section 6).
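As a minimal illustration of such a representation (an illustrative sketch, not code from this chapter), an inverted index mapping each term to the documents that contain it can be built in a few lines of Python:

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

# Toy collection; document IDs and contents are made up for illustration
docs = {1: "the quick brown fox", 2: "the lazy dog", 3: "quick brown dogs"}
index = build_inverted_index(docs)
# index["quick"] → [1, 3]; index["the"] → [1, 2]
```

A query term is then resolved by a single dictionary lookup rather than a scan of every document, which is what makes this representation central to retrieval.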

3.1 Document Granularity

The first task of an IR system is to preprocess documents. In this context, we consider the following issues: document granularity, tokenization, and normalization. Recall that the primary purpose of an IR system is to retrieve documents relevant to a user query from a large document collection. What exactly is a document? Is an entire book considered as one document? Or is each chapter in a book a document? As another example, consider a chain of email messages. Is the entire chain one document, or is each email in the chain a separate document? The notion of document is important for almost all IR systems such as Elasticsearch (Gheorghe et al., 2015). Each document is assigned a unique identifier and forms the basic unit from a retrieval perspective. An exception to this is the Wumpus IR system (The University of Waterloo, 2018), which treats the entire document collection as one mega document, and each passage in the mega document is a unit of text for retrieval. For other documents such as journal articles, each document component—title, abstract, keywords, and rest of the article—is a separate unit for search. This enables selectively searching on the title, abstract, or the body of the article.

Precision and recall are the two parameters of retrieval effectiveness. Precision refers to how many of the retrieved documents are relevant to the user, whereas recall refers to what fraction of the relevant documents in the collection are retrieved. Indexing granularity refers to the size/component of the document chosen for indexing. If the indexing granularity is too coarse (for example, if an entire book is treated as one document), retrieval may produce more false positives (i.e., low precision). On the other hand, if the document granularity is too fine, it will entail low recall, because important passages will be missed when their vocabulary terms are split across mini documents.
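The two measures can be stated directly in code. This is an illustrative sketch (the document IDs below are made up, not from the chapter):

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved docs that are relevant.
    Recall: fraction of relevant docs that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# 3 of the 4 retrieved docs are relevant; 3 of the 6 relevant docs were found
p, r = precision_recall({1, 2, 3, 4}, {2, 3, 4, 8, 9, 10})
# p == 0.75, r == 0.5
```

Coarser granularity tends to push precision down (more non-relevant hits retrieved), while overly fine granularity pushes recall down (relevant passages never retrieved).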

3.2 Tokenization

A document is viewed as a sequence of bytes. ASCII encoding uses one byte per character, and this suffices for the English language. Other encoding schemes such as Unicode UTF-8 use one to four bytes per character, providing 1,112,064 distinct codes to accommodate characters in all written languages. A token corresponds to a sequence of bytes. For example, in “seven wonders of the world” there are five tokens and each is a different token type. On the other hand, in “to be, or not to be” there are seven tokens but only five distinct token types; the token types “to” and “be” each have two instances. Note that “,” is a punctuation token. Whitespace-based demarcation of tokens works in languages such as English. Even in these languages, problems arise when phrases like “New York” and “North Carolina” are split; these phrases should be segmented as single tokens. Language-specific rules are effective in recognizing such single-token phrases. In many other languages compound words abound, and segmenting them into individual tokens is nontrivial. The text “Aliikusersuillammassuaanerartassagaluarpaalli” from the West Greenlandic language translates to “However, they will say that he is a great entertainer, but ….” This is typical of polysynthetic languages, where several morphemes (i.e., the smallest units of meaning in a language) are strung together to form larger words.
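The token/type distinction in the example above can be reproduced with a simple regular-expression tokenizer. This is a sketch only; production tokenizers use language-specific rules for the phrase and compound-word cases just discussed:

```python
import re

def tokenize(text):
    # Word tokens plus single punctuation tokens, as in the example above
    return re.findall(r"\w+|[^\w\s]", text)

tokens = tokenize("to be, or not to be")
types = set(tokens)
# tokens → ['to', 'be', ',', 'or', 'not', 'to', 'be']   (7 tokens)
# types  → {'to', 'be', ',', 'or', 'not'}               (5 token types)
```

Note that the comma comes out as its own token, matching the chapter's count of seven tokens and five token types.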

Other tokenization issues include dealing with apostrophes (e.g., O’Brian, Carolina’s), contractions (I’ll), and hyphenated words (now-a-days). Acronyms also pose challenges (U.S.A/USA, Los Angeles/LA, Louisiana/LA). Token normalization refers to canonicalizing tokens so that matches occur despite superficial differences (Manning et al., 2008). Token normalization is also referred to as equivalence classing of tokens. For example, the date forms 04/01/2018, 2018/04/01, and April 1st, 2018 denote the same date and are therefore members of the same equivalence class. Domain-specific mapping rules are used to specify equivalence classes. Alternatively, relations between unnormalized tokens are maintained using a hand-constructed list of synonyms. For example, synonyms of accelerate include advance, expedite, hasten, hurry, quicken, step up, and stimulate. Suppose that the term accelerate occurs in a document d1. This term in unnormalized form is used to index d1. Also, assume that a user issues a query which contains just the term expedite. An IR system may add the terms in the synonym list of expedite to the initial query, and the query is processed as a disjunction of synonyms. That is, the expanded query is: expedite or accelerate or advance or hasten or hurry or quicken or step up or stimulate. An alternative to query expansion is to index d1 using the term accelerate and also all the terms in its synonym list: advance, expedite, hasten, hurry, quicken, step up, and stimulate.
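The query-expansion strategy just described can be sketched as follows. This is illustrative code, not from the chapter; the synonym list is the hand-constructed one from the example:

```python
# Hand-constructed synonym list, as in the example above
SYNONYMS = {
    "expedite": ["accelerate", "advance", "hasten", "hurry",
                 "quicken", "step up", "stimulate"],
}

def expand_query(terms):
    """Replace each query term by the disjunction of the term and its synonyms.
    The outer list is a conjunction; each inner list is a disjunction."""
    return [[term] + SYNONYMS.get(term, []) for term in terms]

query = expand_query(["expedite"])
# query[0] contains both "expedite" and "accelerate", so a document
# indexed under the unnormalized term "accelerate" still matches
```

The alternative mentioned above (indexing d1 under every synonym) trades a larger index for cheaper query processing; expansion keeps the index small but makes every query a disjunction.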

3.3 Stemming and Lemmatization

Morphology is the study of words, their formation, and their relationship to other words in the same language. There are two types of morphology—inflectional and derivational. Inflectional morphology produces different forms of the same word rather than different words. Inflectional categories include number, tense, person, case, gender, among others. For example, leaves is produced from leaf, and both the original and new word belong to the same word category—nouns. In contrast, derivational morphology often involves the addition of derivational affixes, and affixation entails different categories for the new words. For example, the suffix “-ive” changes the word select to selective.

Other approaches to equivalence classing include stemming and lemmatization. Stemming is a crude heuristic process which collapses derivationally related words to their stem, base, or root form. There are two kinds of stemmers—algorithmic and dictionary. Algorithmic stemmers (e.g., Porter’s) apply a set of rules to reduce a word to its stem form. In contrast, a dictionary stemmer looks up a dictionary to find the stem for a given word. Given the sentence “Other approaches to equivalence classing include stemming and lemmatization,” Porter's algorithm reduces it to the following form: Other approach to equival class includ stem and lemmat. Lemmatization involves full morphological analysis of words to reduce inflectionally related and sometimes derivationally related forms to their base form—lemma. The same sentence in the example above reduces to the following form through lemmatization: Other approach to equivalence class include stemming and lemmatization. Both stemming and lemmatization help to improve recall while hurting precision.
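A toy algorithmic stemmer illustrates the rule-based idea. This is a deliberately simplified sketch, not Porter's actual rule set, and its output differs from Porter's (the suffix rules and the length guard below are invented for illustration):

```python
# Ordered suffix-stripping rules: (suffix, replacement).
# Longer suffixes are tried first, as in real algorithmic stemmers.
SUFFIX_RULES = [("ization", "ize"), ("ational", "ate"), ("ing", ""),
                ("edly", ""), ("es", ""), ("s", "")]

def crude_stem(word):
    for suffix, replacement in SUFFIX_RULES:
        # Length guard keeps very short stems intact (e.g. "is" stays "is")
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement
    return word

# crude_stem("approaches") → "approach"; crude_stem("stemming") → "stemm"
```

The second output shows why such heuristics are called crude: the stem need not be a real word, which is acceptable for indexing as long as query and document forms collapse to the same string.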

3.4 Stop Words, Accents, Case Folding, and Language Identification

Stop words are grammatical function words, for example a, an, and, be, in, not, of, off, over, out, to, the, and under. Early IR systems discarded stop words on the grounds that they carry no content and exist only to meet grammatical requirements. However, if stop words are removed, phrases such as “to be, or not to be, that is the question” do not get indexed correctly. Furthermore, for phrase queries (Section 6), the presence of stop words contributes to better recall. Current IR systems, including web search engines, do not exclude stop words from indexing.
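The indexing problem with stop-word removal can be demonstrated directly. A sketch (the stop-word list here is abridged and invented for illustration):

```python
STOP_WORDS = {"a", "an", "and", "be", "in", "not", "of",
              "the", "to", "that", "is"}

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

phrase = "to be or not to be that is the question".split()
# Almost nothing survives, so the famous phrase can no longer
# be matched against the index:
remove_stop_words(phrase)  # → ['or', 'question']
```

This is exactly why current systems index stop words: a phrase query for the original line would otherwise be unanswerable.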

Accents and diacritics may be ignored in English text, but they can be quite significant for retrieval in other languages such as Spanish. Case folding is another normalization task. Words at the beginning of a sentence can be lowercased without retrieval implications. However, terms in the middle of a sentence should be left capitalized. Similar issues arise with acronyms. For applications such as web search engines, lowercasing everything is a pragmatic solution, since users hardly use capitalization in their queries.

Writing systems of languages also pose problems for token extraction. In some writing systems one reads from left to right, in others from right to left, and in yet others in mixed directions (both left to right and right to left). Though most documents on the web are written in English, web documents written in other languages are becoming increasingly prevalent. In such cases, the first task is to identify the language of the document: Language Identification (LID). Short character sequences serve as distinctive signature patterns for the LID task. The LID problem has been solved with a high degree of accuracy as a classification problem using supervised machine learning approaches. However, mixed-language documents, in which a small fraction of words from another language is mixed in, can create challenges for LID.
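The signature-pattern idea can be sketched with character trigram profiles. This is illustrative only: real LID systems train profiles on large corpora and use a proper supervised classifier, whereas the two "profiles" below are built from single made-up sample sentences:

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Counts of character n-grams, with padding spaces at the edges."""
    text = f" {text.lower()} "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def score(text, profile):
    """Overlap between the text's n-gram counts and a language profile."""
    grams = char_ngrams(text)
    return sum(min(count, profile.get(g, 0)) for g, count in grams.items())

# Tiny illustrative "profiles" from single sample sentences
en = char_ngrams("the quick brown fox jumps over the lazy dog")
de = char_ngrams("der schnelle braune fuchs springt ueber den faulen hund")

text = "the dog jumps"
guess = "en" if score(text, en) > score(text, de) else "de"
# guess → "en"
```

Because short character sequences like " th" and "he " are far more frequent in English than in German, even these toy profiles separate the two; a mixed-language document would split its overlap across profiles, which is why such documents remain hard.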


URL: //www.sciencedirect.com/science/article/pii/S0169716118300245

Code Biology

Luc Steels, Eörs Szathmáry, in Biosystems, 2018

1 Introduction

Human languages are the archetypal examples of a code, taken in the sense of “a small set of arbitrary rules selected from a potentially unlimited number in order to ensure a specific correspondence between two independent worlds.” (Barbieri, 2015) The two independent worlds in this case are the world of speech, gesture or written marks on the one hand, and the world of meaning on the other. The set of rules of a language consists of its lexicon (associating words or morphemes with meanings and functions) and its grammar (prescribing how larger units are built and how the meaning of these combinations is assembled to form the meaning of the whole).

Human languages differ from other codes in important ways: (i) The rules are not static but change over time, possibly in profound ways. Speech sounds obviously change, new words come into the language all the time and others become obsolete, grammatical marking may erode and be re-invented again. (ii) The rules are not universal. There is a large variety of languages and dialects within languages. Languages differ in all respects including in the kinds of meanings they are able to express directly. (iii) The rules are not made up by designers (as would be the case for a programming language) but they emerge dynamically by the individual activities of speakers and listeners. Language conformity is an emergent phenomenon without central control. (iv) The set of rules is not small and not definable. A typical language user employs on the order of half a million lexical and grammatical constructions and this set continuously expands and contracts based on usage. (v) Language is an inferential code, which means that the decoder needs a lot of contextual and background knowledge in order to be able to understand what is being said. This contrasts with most other codes where all the information to be transmitted is explicitly coded in the message.

All these properties make human language a fascinating object of study for code biologists, who tend to emphasize the dynamic evolutionary nature of the linguistic code and enquire about its origins and incessant change, in contrast with linguistic theorists, who tend to seek the common static structure underlying language.

This paper addresses two open questions: (i) How does language develop in the individual? and (ii) How do languages evolve in a population? “Evolve” is used here in the sense of historical change over time, which may involve real novelty such as the emergence of phrase structure or the emergence of a grammatical case system. Both questions are clearly related to each other, because language development in each individual of a community drives the evolution of its common language. We argue that the perspective of evolutionary biology, in particular the framework of replicator dynamics, helps to tackle both questions.


URL: //www.sciencedirect.com/science/article/pii/S0303264717302885

What do we call the idea that language affects what we think 3 points?

Language may indeed influence the way that we think, an idea known as linguistic determinism.

What is the system of rules that governs how we combine words?

Syntax—the rules that pertain to the ways in which words can be combined to form sentences in a language. Semantics—the meaning of words and combinations of words in a language. Pragmatics—the rules associated with the use of language in conversation and broader social situations.

What are the basic units of sounds in any given language?

The smallest units of sound that make up a language are called phonemes. For example, the word “that” contains three phonemes: the “th” represents one phoneme /th/, the “a” maps to the short a sound /ă/, and the “t” to its basic sound /t/.

Which of the following levels of language analysis focuses on sound?

Phonology is concerned with classifying the sounds of language and with saying how the subset used in a particular language is utilised, for instance what distinctions in meaning can be made on the basis of what sounds. (Morphology, by contrast, is the level of words and endings, to put it in simplified terms.)
