Mastering noise and silence in learner answers processing
Simple techniques for analysis and diagnosis

Olivier Kraif, Claude Ponton, Alexia Blanchard
LIDILEM Laboratory, Stendhal University, Grenoble, France
{olivier.kraif; claude.ponton; alexia.blanchard}@u-grenoble3.fr

■ Learner answers analysis

A real need for high-quality feedback, but...

Some facts:
♦ In most systems, analysis is done only by testing character string identity
♦ NLP techniques are underused in the field of CALL because of:
  ∗ their lack of reliability (noise, erroneous analyses)
  ∗ their high cost of implementation
♦ Lack of systematic follow-up on experiments
♦ Overambitious and hardly attainable goals

Some hopes:
♦ Error detection alone may be a valuable step towards didactic use
♦ Some straightforward and basic NLP techniques are reliable enough
♦ To cope with the lack of reliability, it is possible to put forward "Computer Aided" approaches rather than "Automatized" processes (correction, evaluation, feedback generation, activity generation, etc.)

[Figure: analysis and diagnosis workflow. Recoverable labels: Generic NLP processes; Enriched learner production; Enriched expected answer; Specific NLP processes (triangulation); Detection/description; Diagnosed production; Diagnosis; Annotation.]

Towards a low-cost strategy

An empirical approach based on the following principles:
• Identifying the applications that leave the user some leeway in interpreting results (partial analyses, unresolved ambiguities, etc.) ⇒ machine-aided correction, comprehension aids, activity generators, content-oriented tools
• Implementing first the most basic and reliable NLP techniques, such as tokenization, POS tagging, lemmatization and morphological analysis
• Mastering the shortcomings of Natural Language Processing from the end-user (i.e. didactic) point of view. For instance, in the context of an activity, knowledge about the expected answer (EA) may yield additional data for the analysis of the given answer (GA)
• When ambiguities remain, integrating multiple analyses into the learning process, in order to help users (teachers or learners) make the right decisions
• Developing a modular and declarative approach designed for the reusability of resources and processes, and allowing end-users to define the relevant knowledge and parameters themselves

[Figure: diagram labels. Learner production (GA); Expected Answer (EA); Lemmatization, POS tagging, morphological analysis; Contextual knowledge; Activity; Didactic knowledge; Feedback generation; ExoGen; Learner.]

■ The ExoGen system

General principle

Simplification of triangulation: the analysis is reduced to a comparison between EA and GA (no contextual analysis).

Resource: online dictionary of inflected forms (http://abu.cnam.fr/)

Examples:

Form         Lemma   Tags
glace        glacer  Ver:IPre+SG+P1:IPre+SG+P3:SPre+SG+P1:SPre+SG+P3:ImPre+SG+P2
glacé        glacer  Ver:PPas+Mas+SG
glacent      glacer  Ver:IPre+PL+P3:SPre+PL+P3
glacera      glacer  Ver:IFut+SG+P3
glaceraient  glacer  Ver:CPre+PL+P3
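As a minimal sketch of how such dictionary entries could be loaded, the snippet below splits each entry into a form, a lemma, a category and a list of alternative readings. The tab separator, the `Analysis` type and the function names are assumptions made for illustration; only the tag strings come from the examples above.

```python
from typing import NamedTuple


class Analysis(NamedTuple):
    lemma: str
    category: str
    readings: tuple  # one frozenset of features per possible reading


def parse_entry(line):
    """Split one 'form<TAB>lemma<TAB>Cat:F+F:F+F' entry (separator assumed)."""
    form, lemma, tag_field = line.split("\t")
    category, *bundles = tag_field.split(":")
    readings = tuple(frozenset(bundle.split("+")) for bundle in bundles)
    return form, Analysis(lemma, category, readings)


def load_lexicon(text):
    """Map each inflected form to the list of its analyses."""
    lexicon = {}
    for line in text.splitlines():
        form, analysis = parse_entry(line)
        lexicon.setdefault(form, []).append(analysis)
    return lexicon


SAMPLE = (
    "glace\tglacer\tVer:IPre+SG+P1:IPre+SG+P3:SPre+SG+P1:SPre+SG+P3:ImPre+SG+P2\n"
    "glacé\tglacer\tVer:PPas+Mas+SG\n"
    "glacent\tglacer\tVer:IPre+PL+P3:SPre+PL+P3\n"
    "glacera\tglacer\tVer:IFut+SG+P3\n"
    "glaceraient\tglacer\tVer:CPre+PL+P3\n"
)

lexicon = load_lexicon(SAMPLE)
```

Keeping every reading as a set of features makes the later comparison between EA and GA a matter of set operations.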

Analysis principle: lesser difference heuristic. The analysis is guided by the similarities between the potential tags of EA and GA.

EA: si j'avais su     Category: Ver   Tags: IImp+SG+P1 or IImp+SG+P2
GA: si j'aurais su    Category: Ver   Tags: CPre+SG+P1 or CPre+SG+P2

Common tags: Ver+SG
Disambiguated difference: IImp ≠ CPre
Not disambiguated: P1 or P2
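One possible reading of the lesser difference heuristic can be sketched as follows: pair every EA reading with every GA reading, keep the pairs that share the most features, and report what those pairs agree on. The pairing criterion and the function name `lesser_difference` are assumptions; the poster does not spell out how the closest readings are selected.

```python
from itertools import product


def lesser_difference(ea_readings, ga_readings):
    """Compare the closest EA/GA reading pairs (sketch, assumed criterion)."""
    pairs = list(product(ea_readings, ga_readings))
    best = max(len(ea & ga) for ea, ga in pairs)
    kept = [(ea, ga) for ea, ga in pairs if len(ea & ga) == best]

    shared = [ea & ga for ea, ga in kept]
    # Features common to every retained pair.
    common = frozenset.intersection(*shared)
    # Features shared inside a pair but varying from one pair to another.
    unresolved = frozenset.union(*shared) - common
    # (expected, given) feature differences; a single entry means the
    # difference itself is disambiguated.
    differences = {(ea - ga, ga - ea) for ea, ga in kept}
    return common, differences, unresolved


# Readings for "avais" (EA) and "aurais" (GA), taken from the example above.
ea = [frozenset({"IImp", "SG", "P1"}), frozenset({"IImp", "SG", "P2"})]
ga = [frozenset({"CPre", "SG", "P1"}), frozenset({"CPre", "SG", "P2"})]

common, differences, unresolved = lesser_difference(ea, ga)
```

On this example the sketch yields `SG` as the common feature, a single difference (`IImp` expected, `CPre` given) and `P1`/`P2` as the unresolved person, which agrees with the analysis shown above.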

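Examples of error, each with its automatically generated description

Example: (...) avant de retourner [arriver] en Angleterre
Description: Forme grammaticalement correcte (verbe infinitif), mais on attendait une autre forme
  (English: grammatically correct form (infinitive verb), but a different form was expected)

Example: et beaucoup d'échafaide [échafaudages]
Description: Orthographe erronée ou mot inconnu du dictionnaire
  (English: misspelling, or word not found in the dictionary)

Example: Je dois me dépécher [dépêcher]
Description: Orthographe erronée : problème d'accent
  (English: misspelling: accent problem)

Example: (...) sommes bien amusées et c'est vrai [juste] de dire que nous avons dansé assez bien
Description: Forme grammaticalement correcte (adjectif ou adverbe ou nom masculin singulier), mais on attendait une autre forme
  (English: grammatically correct form (adjective, adverb or masculine singular noun), but a different form was expected)

Example: C'était désespéré [désespérant] mais c'était la seule chance (...)
Description: S'il s'agit du verbe désespérer : Cas 1 [Masculin singulier] : On attend un participe présent et non un participe passé
  (English: if this is the verb désespérer: case 1 [masculine singular]: a present participle is expected, not a past participle)

Example: Pour moi l' [cette] image crée une ambiance délassante
Description: Forme grammaticalement correcte sur le plan de la catégorie (déterminant), mais on attendait une autre forme avec d'autres traits
  (English: form with the correct category (determiner), but a different form with other features was expected)

Example: Le Premier ministre reste toujours un britannique [Britannique]
Description: Exact, mais il faut une majuscule à l'initiale
  (English: correct, but an initial capital is required)

Legend: error found [correction]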
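The surface-level descriptions (capitalisation, accents, misspellings, unknown words) can be pictured as a cascade of string comparisons between GA and EA. The sketch below is only an illustration: the order of the tests, the characters ignored by `normalise`, the edit-distance threshold and all function names are assumptions, not the system's actual rules.

```python
import unicodedata


def normalise(text):
    """Graphical normalisation: ignore case, spaces and full stops (assumed)."""
    return "".join(ch for ch in text.casefold() if ch not in " .")


def strip_diacritics(text):
    """Remove combining marks after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def edit_distance(a, b):
    """Plain Levenshtein distance, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(previous[j] + 1,                 # deletion
                               current[j - 1] + 1,              # insertion
                               previous[j - 1] + (ca != cb)))   # substitution
        previous = current
    return previous[-1]


def describe_surface(ga, ea, lexicon, max_distance=1):
    """Return (label, suggestions) for a given answer GA and expected answer EA."""
    if ga == ea:
        return "identical", []
    if normalise(ga) == normalise(ea):
        return "case, spacing, ... differences", []
    if ga in lexicon:
        return "known form: compare lemma, category and tags", []
    if strip_diacritics(ga) == strip_diacritics(ea):
        return "diacritic differences", []
    if edit_distance(ga, ea) <= max_distance:
        return "orthographical difference", []
    near = sorted(f for f in lexicon if edit_distance(ga, f) <= max_distance)
    if near:
        return "orthographical difference: listing of the nearest forms", near
    return "unknown form", []


# Toy lexicon, for illustration only.
toy_lexicon = {"dépêcher", "contempler", "égaux", "égales", "britannique"}
```

With this toy lexicon, `describe_surface("dépécher", "dépêcher", toy_lexicon)` falls into the diacritic branch, and `describe_surface("échafaide", "échafaudages", toy_lexicon)` ends as an unknown form, in line with the two corresponding descriptions in the list above.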

Evaluation of error descriptions

            All cases  Non-ambiguous  Totally disambiguated  Partially disambiguated  Not disambiguated
Correct     312        187            104                    14                       7
Incorrect   6          1              5                      0                        0
Precision   0.981      0.995          0.954                  1                        1
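The precision row can be checked from the two rows of counts, assuming precision is defined as correct / (correct + incorrect). That definition is not stated on the poster, but it reproduces the reported values.

```python
# (correct, incorrect) counts per column of the evaluation table.
counts = {
    "All cases": (312, 6),
    "Non-ambiguous": (187, 1),
    "Totally disambiguated": (104, 5),
    "Partially disambiguated": (14, 0),
    "Not disambiguated": (7, 0),
}

# Assumed definition: share of generated descriptions judged correct.
precision = {
    label: correct / (correct + incorrect)
    for label, (correct, incorrect) in counts.items()
}

for label, value in precision.items():
    print(f"{label}: {value:.3f}")
```

The four sub-columns also add up to the "All cases" column (187 + 104 + 14 + 7 = 312 correct, 1 + 5 + 0 + 0 = 6 incorrect).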

Frida corpora (Granger, 2001)

[Figure: decision tree for error description; each test has a true and a false branch.
Tests: EA = GA after graphical normalisation; GA = unknown; GA = EA except diacritics; GA close to EA; a form close to GA exists in the lexicon; GA and EA share the same lemma; GA and EA share the same category; GA and EA share the same tags.
Outcomes: case, spacing, ... differences; diacritic differences; orthographical difference; orthographical difference: listing of the nearest forms; unknown form; tag differences; tag and category differences; lemma differences; lemma and tag differences; lemma, tag and category differences.]

Examples shown in the figure:
e.g. "échafaide" instead of "échafaudage"
e.g. "égales" instead of "égaux"
e.g. "considère" instead of "considérer"
e.g. "dépécher" instead of "dépêcher"
e.g. "comtempler" instead of "contempler"
e.g. "CEE" instead of "C.E.E."
Lemma differences: e.g. "prennent" instead of "saisissent"
Lemma and tag differences: e.g. "souffrons" instead of "subirons"
Lemma, tag and category differences: e.g. "mieux" instead of "préférables"

■ Perspectives

Forthcoming: integration of a morphological analyzer (Blanchard, 2007)
Aim: morphological analysis of unknown forms (paradigm confusion)

General principle: segmentation of inflected forms into [base form + inflection(s)], which are interpreted linguistically.
1. Integration into the generic NLP processes, in order to reduce the number of unknown forms and thus to produce an analysis
2. Modification of the analysis tree by checking the inflectional model

Example
GA: attitudent   Category: N   Tags: fem,plu   Model: inflection [-ent] (plu)
EA: attitudes    Category: N   Tags: fem,plu   Model: inflection [-s] (plu)

This analysis makes it possible to describe "attitudent" as an inflectional error on the plural.
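The segmentation idea can be illustrated with a deliberately tiny sketch. Everything here is hypothetical: the two-entry `ENDINGS` table, the paradigm labels, the loose base comparison (a prefix test, to tolerate the final "e" of "attitude") and the function names stand in for the planned morphological analyzer, whose actual design is not described on the poster.

```python
# Hypothetical inflection models: ending -> (paradigm, feature it marks).
ENDINGS = {
    "ent": ("verbal", "plu"),
    "s": ("nominal", "plu"),
}


def segment(form):
    """List every (base, ending, paradigm, feature) split licensed by ENDINGS."""
    return [
        (form[: -len(ending)], ending, paradigm, feature)
        for ending, (paradigm, feature) in ENDINGS.items()
        if form.endswith(ending) and len(form) > len(ending)
    ]


def compare_inflection(ga, ea):
    """Describe GA as an inflectional error if it marks the same feature
    as EA on a compatible base, but with a different ending."""
    for g_base, g_end, _, g_feature in segment(ga):
        for e_base, e_end, _, e_feature in segment(ea):
            # Loose base match (assumption): one base is a prefix of the other.
            same_base = e_base.startswith(g_base) or g_base.startswith(e_base)
            if same_base and g_feature == e_feature and g_end != e_end:
                return (f"inflectional error on {g_feature}: "
                        f"[-{g_end}] used instead of [-{e_end}]")
    return None


diagnosis = compare_inflection("attitudent", "attitudes")
```

For the example above, `diagnosis` reads "inflectional error on plu: [-ent] used instead of [-s]". A real analyzer would need a full set of inflectional models and a stricter notion of base form than the prefix test used here.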

♦ Completion of the lesser difference analysis: integration of a wordnet or a thesaurus (semantic distance between lemmas)
♦ Context analysis in order to disambiguate more precisely (depending on the EA/GA/context triangulation)
♦ Definition of declarative rules to design a diagnosis process based on the lesser difference analysis (detection/description level). These rules should be applicable even in case of residual ambiguity (e.g. suggestions, hypotheses, more general diagnosis, ...)
♦ Experimentation (work in progress): analysis of past participle agreement errors in the perfect tense ("passé composé"). Evaluation with end-users: teachers and learners of French as a Foreign Language