Temporal Structure and Thematic Progression: a case study on French corpora
Lydia-Mai Ho-Dac, Marion Laignelet
ERSS (UMR 5610 and Université Toulouse 2 – Le Mirail)
hodac@univtlse2.fr, marion.laignelet@univtlse2.fr
L.-M. Ho-Dac & M. Laignelet, ERSS
SEM05
● Definitions / theoretical background
  – Text as a structured object
  – Three structuring mechanisms: Discourse Framing, Headings and Thematic Progression
● Hypotheses
● A corpus-based study / methodology
● Results
Text as a linear and structured object
Text linearity
“language production, whether oral or written, is strictly linear and time dependent. This means that, at a given time, only one piece of information can be related. It follows that one of the main problems confronting speaker and writer is how to present information in linear format (i.e., to linear information) when such information is rarely stored in a linear structure in the mental model.” (Fayol 1997: 157)
Text as a structured object
↔ on the surface of the text, the writer uses markers which signal how current information has to be associated or understood (or not) in relation to information that appears before or after in the text
→ use of text structuring mechanisms
↔ two main types of relationships: continuity and discontinuity
(Dis)continuity relations
Continuity relations
➔ Cohesion markers of a textual segment
➔ Thematic Progression
Discontinuity relations
➔ Signals of the boundaries of a textual segment
➔ Headings & Discourse Framing
These three mechanisms do not cover the whole text: there are chunks where some of them are not used (the writer may then use other mechanisms that are not presented in this study).
Discourse Framing (Charolles, 1997)
Discourse segmentation into frames:
➔ A frame is a group of several propositions; the propositions are grouped together because they have to be interpreted under a specific criterion; this criterion is given by a Frame Introducer (FI)
The marker is then a Frame Introducer (FI):
➔ Form: the initial boundary of a frame is an adverbial (adverb, noun phrase, prepositional phrase, subordinate clause) that appears in sentence-initial position, usually detached (comma)
➔ Discourse functions:
● The FI gives a semantic criterion of interpretation (temporal, spatial, organisational, evidential, etc.) which can extend its scope beyond the sentence in which it appears.
● The FI marks out text segments which contribute to the text organization.
● The FI introduces an indexing relationship in the discourse → information packaging.
Headings (Ho-Dac, Jacques & Rebeyrolle, 2004)
Discourse segmentation into sections:
➔ The propositions and/or paragraphs are linked by the fact that they must be interpreted in reference to a common and general “theme” or point of view given by the Heading
The marker is then a Heading:
➔ Form: the heading stands out from the rest of the text (typography and layout)
➔ Discourse functions:
● The Heading gives a semantic criterion of interpretation which covers the whole section.
● The Heading marks out text segments which contribute to the text organization.
● The Heading introduces an indexing relationship in the discourse → information packaging.
Thematic Progression (TP)
Sequentiality in discourse:
➔ a referential chain is made up of successive expressions between which readers establish a coreferential link (Corblin, 1995)
The marker here is a succession of coreferential expressions:
➔ Form:
● noun phrases (or pronouns) which refer to a world entity (object, person, event, ...) (Charolles, 1988)
● Thematic Progression is established when there is a coreferential expression in thematic position (here, grammatical subject)
➔ 3 main types of Thematic Progression (Daneš, 1974):
● (Simple) linear TP: each rheme becomes the theme of the next utterance
● Constant (continuous) TP: the theme stays constant over sentences
● Derived TP: themes of successive sentences are derived from a single overriding theme
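The three progression types can be told apart mechanically once each utterance is reduced to a theme and a set of rheme referents. The sketch below is only an illustration under that assumption: the `Utterance` type, the referent labels and the `derived_from` map (which stands in for the "overriding theme" relation) are hypothetical and are not part of the annotation procedure used in this study.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Utterance:
    theme: str                                    # referent in thematic position (grammatical subject)
    rheme: Set[str] = field(default_factory=set)  # referents mentioned in the rest of the utterance


def tp_type(prev: Utterance, curr: Utterance,
            derived_from: Optional[Dict[str, str]] = None) -> str:
    """Label the thematic link between two successive utterances."""
    derived_from = derived_from or {}  # hypothetical map: derived theme -> overriding theme
    if curr.theme == prev.theme:
        return "constant"   # the theme stays the same
    if curr.theme in prev.rheme:
        return "linear"     # a rheme of the previous utterance becomes the theme
    hyper = derived_from.get(curr.theme)
    if hyper is not None and hyper in (prev.theme, derived_from.get(prev.theme)):
        return "derived"    # both themes hang off one overriding theme
    return "0"              # no thematic progression


# Toy referent labels, invented for the illustration
u1 = Utterance("debate", {"changes"})
u2 = Utterance("debate", {"social_values"})
u3 = Utterance("social_values")
parts = {"west": "france", "north": "france"}

print(tp_type(u1, u2))
print(tp_type(u2, u3))
print(tp_type(Utterance("west"), Utterance("north"), parts))
```

The label "0" mirrors the value used in the annotation scheme for sentences with no TP.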
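3 ways to establish coreference
Pronominalization: an anaphoric pronoun is used to establish a coreferential relationship
[...], le débat entre spécialistes des relations transatlantiques s'est trop souvent contenté d'osciller [...]. Il ne s’est pas suffisamment porté sur l’ampleur des changements […].
(English gloss: [...], the debate among specialists in transatlantic relations has too often been content to oscillate [...]. It has not focused sufficiently on the scale of the changes […].)
Redenomination (direct coreference): repetition of a referent in a nominal form, identical to the one that was initially employed
[...]. En 1995, l'Ouest ne reste pas ou ne revient pas [...]. [...]. En même temps, l'Ouest [...], compte parmi les meilleurs [...].
(English gloss: [...]. In 1995, the West does not remain or does not return [...]. [...]. At the same time, the West [...] ranks among the best [...].)
Relabel (indirect coreference): “the head noun of the anaphoric phrase is different from the head noun of the antecedent” (Manuélian, 2003)
[...], le débat entre spécialistes des relations transatlantiques s'est trop souvent contenté d'osciller [...]. [...]. Plus récemment, la discussion s'était portée sur [...].
(English gloss: [...], the debate among specialists in transatlantic relations has too often been content to oscillate [...]. [...]. More recently, the discussion had turned to [...].)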
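(Dis)continuity: an example
Depuis la fin de la guerre froide, le débat entre spécialistes des relations transatlantiques s'est trop souvent contenté d'osciller entre les bons sentiments et la simplification. Il ne s’est pas suffisamment porté sur l’ampleur des changements […]. Plus récemment, la discussion s'était portée sur un éloignement supposé des valeurs sociales entre les deux rives de l'Atlantique, auquel les événements du 11 septembre 2001 ont au moins provisoirement mis fin. Ce débat se poursuit, mais il est maintenant limité à la sphère de l'analyse sociale. En termes de politique étrangère, cette discussion sur la dérive des continents a pris la forme d'une opposition entre l'unilatéralisme de la politique américaine et le multilatéralisme de leurs partenaires européens.
(English gloss: Since the end of the Cold War, the debate among specialists in transatlantic relations has too often been content to oscillate between good intentions and simplification. It has not focused sufficiently on the scale of the changes […]. More recently, the discussion had turned to a supposed drifting apart of social values between the two sides of the Atlantic, to which the events of 11 September 2001 put an end, at least temporarily. This debate continues, but it is now confined to the sphere of social analysis. In terms of foreign policy, this discussion on continental drift has taken the form of an opposition between the unilateralism of American policy and the multilateralism of their European partners.)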
Hypotheses
● These three structuring mechanisms (Discourse Framing, Headings and Thematic Progression) interact in discourse
● These mechanisms affect the global structuring of the text even if they don't cover the whole text
● Some cues specific to one mechanism could be used to describe another mechanism
● The properties of one may affect the properties of another
● Text type has an influence on the properties of these mechanisms
So what...
We have partially marked discourse phenomena that partially cover the text.
• Organisation into sections: initial and final boundaries = Headings
• Discourse framing: initial boundary = Frame Introducer
• Thematic progressions: continuity markers = coreferential expressions in Theme position (~grammatical subject)
We have some potential markers
• Headings ← document structure
• Temporal FI ← temporal adverbials occurring in initial position
• Pronominalization ← 3rd person pronoun in thematic position
• Relabel ← demonstrative NP occurring in thematic position
• Redenomination ← NP in thematic position whose head has already occurred in preceding text (limited to the section in progress)
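The three coreference cues listed above can be read as a small decision procedure over the grammatical subjects of a section. The following sketch assumes the subjects have already been extracted and tagged; the `Subject` type and its tag set (`"pronoun"`, `"demonstrative_np"`, `"np"`) are hypothetical, and the cues remain surface approximations rather than a full coreference analysis.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Subject:
    kind: str        # "pronoun", "demonstrative_np" or "np" (hypothetical tag set)
    head: str        # head word of the subject phrase
    person: int = 3


def coreference_marker(subject: Subject, previous_heads: Set[str]) -> Optional[str]:
    """Apply the surface cues in the order they are listed on the slide."""
    if subject.kind == "pronoun" and subject.person == 3:
        return "pronominalization"
    if subject.kind == "demonstrative_np":
        return "relabel"
    if subject.kind == "np" and subject.head in previous_heads:
        return "redenomination"
    return None


def annotate_section(subjects: List[Subject]) -> List[Optional[str]]:
    """Label each subject; head memory is limited to the section in progress."""
    seen: Set[str] = set()
    labels = []
    for s in subjects:
        labels.append(coreference_marker(s, seen))
        if s.kind != "pronoun":
            seen.add(s.head)
    return labels


# Subjects loosely modelled on the redenomination example (l'Ouest ... l'Ouest)
west = annotate_section([Subject("np", "Ouest"), Subject("np", "Ouest")])
# Subjects loosely modelled on the transatlantic debate example
debate = annotate_section([
    Subject("np", "débat"),
    Subject("pronoun", "il"),
    Subject("np", "discussion"),
    Subject("demonstrative_np", "débat"),
])
print(west)
print(debate)
```

Note that "la discussion" receives no label here although it is a relabel in the example text: a definite NP with a new head noun is not captured by the demonstrative-NP cue, which is one reason the markers are called potential.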
Corpora in study
We expect these mechanisms to behave and interact differently according to the kind of text → study on two corpora:
(1) [atlas] (~80,000 words): a single expository text in the domain of social geography
content: geographical distribution of a social phenomenon, here the political evolution during the last 40 years in the West of France
(2) [geopo] (~250,000 words): 32 argumentative texts about geopolitical issues
content: some concern recent events like the Iraq war, energy and environmental policies, the European community, the new deal between West and China, etc.
Both corpora are concerned with location in time and sometimes in space with respect to a specific subject matter.
Data annotation procedure
One way of describing the interaction between these mechanisms is to annotate their properties when they co-occur. In this case, the data annotation procedure is:
Delimitation of areas of investigation:
➔ sections that contain at least 2 temporal FIs → 160 segments and 548 FIs
Selecting an annotation starting point:
➔ the FIs
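The two steps above amount to a filter followed by an iteration over FIs. A minimal sketch, assuming each section is available as a dictionary with a `heading` and a list of `temporal_fis` (both field names are hypothetical, as is the toy data):

```python
from typing import Dict, Iterator, List, Tuple


def select_segments(sections: List[Dict], min_fis: int = 2) -> List[Dict]:
    """Keep the sections that contain at least `min_fis` temporal FIs."""
    return [s for s in sections if len(s["temporal_fis"]) >= min_fis]


def annotation_units(segments: List[Dict]) -> Iterator[Tuple[str, str]]:
    """Take the FIs as the starting point: one annotation unit per FI."""
    for seg in segments:
        for fi in seg["temporal_fis"]:
            yield seg["heading"], fi


# Toy sections; the heading labels are invented for the illustration
sections = [
    {"heading": "A", "temporal_fis": ["au cours des années 1980", "à la fin des années 1980"]},
    {"heading": "B", "temporal_fis": ["en 1993"]},
    {"heading": "C", "temporal_fis": []},
]
segments = select_segments(sections)
units = list(annotation_units(segments))
print(len(segments), len(units))
```

Only section "A" survives the filter, and it contributes two annotation units.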
Annotations
Frame introducer
● Morphology
● Type of temporal reference of the FI
● FI's position in the paragraph: does it occur in the 1st sentence?
● FI's status: is it the only FI in the paragraph (and not in the 1st sentence)?
Heading-FI relations
● Presence/absence of temporal reference in the heading
● Type of temporal reference in the heading
● Relations between the FI's and the heading's temporal references
TP-FI relations
● Type of TP (linear, constant, derived, or 0 if there is no TP)
● Realization of the TP (pronominalization, redenomination, relabel, other)
● Does the theme open a new TP, i.e. is it a first mention?
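The annotation grid above can be represented as one record per FI. The sketch below is one possible encoding, not the format actually used for the study: all field names are hypothetical, the free-text values (`morphology`, `temporal_ref_type`) are placeholders, and the `isolated` property reflects one reading of "only FI in the paragraph (and not in the 1st sentence)".

```python
from dataclasses import dataclass
from typing import Optional

TP_TYPES = {"linear", "constant", "derived", "0"}
TP_REALIZATIONS = {"pronominalization", "redenomination", "relabel", "other"}


@dataclass
class FIAnnotation:
    # Frame introducer
    fi_text: str
    morphology: str
    temporal_ref_type: str
    in_first_sentence: bool
    only_fi_in_paragraph: bool
    # Heading-FI relations
    heading_has_temporal_ref: bool
    heading_temporal_ref_type: Optional[str]
    fi_heading_relation: Optional[str]
    # TP-FI relations
    tp_type: str
    tp_realization: Optional[str]
    theme_is_first_mention: bool

    def __post_init__(self) -> None:
        # Reject values outside the closed lists given in the annotation grid
        if self.tp_type not in TP_TYPES:
            raise ValueError(f"unknown TP type: {self.tp_type!r}")
        if self.tp_realization is not None and self.tp_realization not in TP_REALIZATIONS:
            raise ValueError(f"unknown TP realization: {self.tp_realization!r}")

    @property
    def isolated(self) -> bool:
        # Assumed reading of the FI's status
        return self.only_fi_in_paragraph and not self.in_first_sentence


# Illustrative record; the values are placeholders, not corpus data
record = FIAnnotation(
    fi_text="en 1993", morphology="prepositional phrase", temporal_ref_type="date",
    in_first_sentence=False, only_fi_in_paragraph=True,
    heading_has_temporal_ref=False, heading_temporal_ref_type=None, fi_heading_relation=None,
    tp_type="constant", tp_realization="pronominalization", theme_is_first_mention=False,
)
print(record.isolated)
```

Validating the closed value sets at construction time keeps later counts (for example FI/redenomination co-occurrences) from being skewed by misspelled labels.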
Complementarity between Heading and FI
Among the 160 headings of the extracted segments, only 16% (26) have a temporal reference
Laignelet (2004): when there is a temporal reference in the heading, it is not used to structure a section with temporal framing.
Among the 87 FIs in these segments, 38% have a temporal reference included in or similar to that of the heading (whatever the FI's position in the paragraph)
➔ Not enough configurations to describe this relationship
Relations between Framing and TP
(1) Co-occurrence between framing and TP
(2) Co-occurrence between FIs and certain types of TP realizations, i.e. pronominalization, redenomination and relabel.
(3) Correlation between the opening of a frame and the beginning of a chain.
Usually, framing does not occur during a TP
Only 148 FIs (27%) occur during a TP*:
[geopo] 22.5%
[atlas] 40%
In the first sentence of a paragraph: 27% → 12% (18/146)*
~ a paragraph break is often associated with a referential/thematic shift
When an FI does occur during a TP, it is generally a Constant TP*
* we don't know how many sentences are included in a TP
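The overall shares on this slide follow from the reported counts. A small check, taking the total of 548 FIs from the data annotation slide; the helper name `share` is ours:

```python
def share(count: int, total: int) -> float:
    """Percentage of `count` in `total`, rounded to one decimal."""
    return round(100 * count / total, 1)


during_tp = share(148, 548)       # FIs occurring during a TP, out of all annotated FIs
first_sentence = share(18, 146)   # ratio reported for paragraph-initial sentences
print(during_tp, first_sentence)
```

The two values round to the 27% and 12% given above. The slide does not say why the second ratio is computed over 146 rather than 148.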
Heading: 1.1 Hypothèse sur l'émergence d'un nouveau modèle électoral des années 1980 dans l'Ouest (Hypothesis on the emergence of a new electoral model of the 1980s in the West)
FI: au cours des années 1980 (during the 1980s); à la fin des années 1980... (at the end of the 1980s...)
TP: no TP; constant
marker: relabel, La relation même... (The relation itself...)
TP marking and framing (1)
(Results only on linear and constant TP)
Redenomination is the most frequent strategy after an FI (when there is a TP): 47/115 cases (40%)
[geopo] 32 FI/redenomination co-occurrences (45%)
[atlas] 15 FI/redenomination co-occurrences (40%)
Heading: 8.3. La Droite majoritaire ne recouvre pas ses positions antérieures, Jospin atteint la Mitterrand 1981 (The majority Right does not recover its earlier positions; Jospin reaches Mitterrand 1981)
FI: après avoir... (after having...); en 1995 (in 1995); en même temps, (at the same time,)
TP: constant
marker: redenomination, L'Ouest (the West)
TP marking and framing (2)
(Results only on linear and constant TP)
[geopo] 32 FI/redenomination co-occurrences (45%)
[atlas] 15 FI/redenomination co-occurrences (40%)
[atlas] 16 FI/pronominalization co-occurrences (42%!)
→ in [atlas], TP seems to dominate the framing structuring more often than in [geopo] (14 FI/pronominalization co-occurrences = 21%)
When the FI is isolated (45 cases), these percentages increase:
[atlas] (FI/pronoun) = 47%
[geopo] (FI/pronoun) = 30%
Note: 56% of FIs are isolated in [atlas], against 44% in [geopo]
Heading: 1.1 un premier tour de renverse (a first round of reversal)
FI isolated: en 1993 (in 1993)
TP: constant
marker: pronoun, il (National Front)
Frame opening and chain beginning
13% of FIs occur in a sentence where the Theme expresses a first mention
[atlas] = 15%
[geopo] = 12.5%
Percentages increase in the 1st sentence of a paragraph
[atlas] = 20.5%
[geopo] = 15.5%
Percentages slightly decrease when the FI is isolated
[atlas] = 13%
[geopo] = 11%
FI in 1st sentence of section: Depuis les débuts de ..., (Since the beginnings of ...,)
TP: no TP; constant
marker: complete definite description; reduced demonstrative NP
L.-M. Ho-Dac & M. Laignelet, SEM-05, Biarritz
To conclude
First results:
➔ Framing and TP are two structuring mechanisms that can work together in discourse organization
➔ This interaction entails some hierarchical relations
Ex: TP sometimes seems stronger than framing, and sometimes the converse holds
Further analysis:
➔ Take another starting point: TP
➔ start from relative distributions of framing/TP co-occurrence
➔ observe whether TP marking affects framing segmentation
➔ describe heading/theme relations
Methodological advances:
➔ Use of data for other analyses
➔ Operational descriptions
Bibliography
Charolles M. (1997): “L'encadrement du discours : univers, champs, domaines et espaces”, Cahier de Recherche Linguistique, 6, LanDisCo, Université Nancy 2.
Charolles M. (1988): “Les plans d'organisation textuelle : périodes, chaînes, portées et séquences”, Pratiques, 57.
Corblin F. (1995): Les formes de reprise dans le discours : Anaphores et chaînes de référence, Presses Universitaires de Rennes.
Daneš F. (1974): “Functional sentence perspective and the organisation of the text”, in F. Daneš (ed.), Papers on functional sentence perspective, The Hague: Mouton, 106-128.
Fayol M. (1997): Des idées au texte : Psychologie cognitive de la production verbale, orale et écrite, Paris: Presses Universitaires de France.
Ho-Dac L.-M., Jacques M.-P. & Rebeyrolle J. (2004): “Sur la fonction discursive des titres”, in S. Porhiel & D. Klinger (eds), L'unité texte, pp. 125-152. Pleyben: Perspectives.
Laignelet M. (2004): Les titres et les cadres de discours temporels : structuration des discours et organisation de l'information, Master's thesis, Université Toulouse Le Mirail.
Manuélian (2003): “Coreferential Definite and Demonstrative Descriptions in French: A Corpus Study for Text Generation”, in ESSLLI Student Session, Vienna, Austria, August 2003.