Artificial Intelligence: Natural Language Processing
Hoá NGUYEN
College of Technology, Vietnam National University, Hanoi
6 May 2006
Agenda
Introduction : From words to meaning
Difficulties
Understanding
Generation
Applications: Spoken Dialogue System
AI++ - Hoá NGUYEN @ 2006
Introduction
What is NLP?
Systems using natural language as modality to interact with users
NLP concerns:
Understanding spoken and written language.
Generating written and spoken language.
[Figure: Language, Systems, Language]
Alternative Views on NLP
Computational models of human language processing
Computational models of human communication
Programs that operate internally the way humans do
Programs that interact like humans
Computational systems that efficiently process text and speech
[Figure: levels of language processing. Layers: Discourse, Pragmatic, Semantic, Syntactic, Lexical, Phonetic, Articulatory, Acoustic. Group labels: Language Engineering, Speech Applications, Core speech technologies.]
Applications…
Translation
Tout ce que vous produisez pour crédit dans ce cours doit être votre propre travail. Vous pouvez parler avec les autres étudiants (et les professeurs) de votre approche du problème, mais ensuite vous devez résoudre le problème par vous-même. Ce n’est pas seulement la façon la plus éthique d’apprendre le contenu de cette classe, mais aussi la plus efficace.
Everything you do for credit in this subject is supposed to be your own work. You can talk to other students (and instructors) about approaches to problems, but then you should sit down and do the problem yourself. This is not only the ethical way but also the only effective way of learning the material.
All that you produce for credit in this course must be your own work. You can speak with the other students (and the professors) about your approach about the problem, but then you must solve the same problem by you. It is not only the most ethical way to learn the contents from this class, but also most effective.
Applications…
Information Extraction: map a document collection to a structured database.

Source text:
Firm XYZ is a full service advertising agency specializing in direct and interactive marketing. Located in Bigtown CA, Firm XYZ is looking for an Assistant Account Manager to help manage and coordinate interactive marketing initiatives for a marquee automotive account. Experience in online marketing, automotive and/or the advertising field is a plus.
Assistant Account Manager Responsibilities
Ensures smooth implementation of programs and initiatives
Helps manage the delivery of projects and key client deliverables
...
Compensation: $50,000-$80,000
Hiring Organization: Firm XYZ

Extracted record:
INDUSTRY: Advertising
POSITION: Assistant Account Manager
LOCATION: Bigtown, CA
COMPANY: Firm XYZ
SALARY: $50,000-$80,000
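To make the mapping from text to record concrete, here is a minimal sketch using hand-written regular expressions. The patterns, the `PATTERNS` table and the `extract_record` function are illustrative assumptions, not the method used by any particular system; real extractors usually learn such patterns or use statistical models.

```python
import re

# Shortened version of the job advert from the slide.
AD_TEXT = (
    "Firm XYZ is a full service advertising agency specializing in direct "
    "and interactive marketing. Located in Bigtown CA, Firm XYZ is looking "
    "for an Assistant Account Manager to help manage and coordinate "
    "interactive marketing initiatives. "
    "Compensation: $50,000-$80,000 Hiring Organization: Firm XYZ"
)

# Hypothetical hand-written patterns, one per database column.
PATTERNS = {
    "POSITION": r"looking for an? (.+?) to ",
    "LOCATION": r"Located in ([A-Z][a-z]+ [A-Z]{2})",
    "COMPANY": r"Hiring Organization: (.+)$",
    "SALARY": r"Compensation: (\$[\d,]+-\$[\d,]+)",
}


def extract_record(text):
    """Fill one database row; a field stays None when its pattern fails."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        record[field] = match.group(1) if match else None
    return record


print(extract_record(AD_TEXT))
```

Patterns like these are brittle: a small change in wording ("seeking" instead of "looking for") leaves the field empty, which is one reason information extraction is hard.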
Applications…
Text summarization:
Applications
Conversational Agent
S1: Hello. You’ve reached the [Communicator.] Tell me your name
U2: Hi I’d like to fly to Seattle Tuesday morning
S3: Travelling to Seattle on Tuesday, August 11th in the morning.
U4: Yes.
S5: Your full name?
U6: John Doe
…
Other NLP applications
Grammar Checking
Sentiment Classification
Report Generation
…
Agenda
Introduction : From words to meaning
Difficulties
Understanding
Generation
Applications: Spoken Dialogue System
Why is NLP hard?
A lot of difficult problems
ambiguity, anaphora, indexicality, discourse structure, metonymy, metaphor, …
Example: “At last, a computer that understands you like your mother”
Ambiguity
Acoustic level (speech recognition)
“. . . a computer that understands you like your mother”
“. . . a computer that understands you lie cured mother”
Syntactic level
Ambiguity
Word sense ambiguity – Semantic meaning level
Two definitions of “mother”
1. a woman who has given birth to a child
2. a stringy slimy substance consisting of yeast cells and bacteria; is added to cider or wine to produce vinegar
Discourse level
Alice says they’ve built a computer that understands you like your mother. But she doesn’t know any details!
Anaphora problem!
Anaphora
Using pronouns to refer back to entities already introduced in the text
After Mary proposed to John, they found a preacher and got married. For the honeymoon, they went to Hawaii.
Mary saw a ring through the window and asked John for it.
Mary threw a rock at the window and broke it.
Other problems
Indexicality: Indexical sentences refer to the utterance situation (place, time, speaker/hearer, etc.)
I am over here
Why did you do that?
Metonymy: Using one noun phrase to stand for another
I've read Shakespeare
Chrysler announced record profits
The ham sandwich on Table 4 wants another beer
Metaphor: “Non-literal” usage of words and phrases, often systematic:
I've tried killing the process but it won't die. Its parent keeps it alive.
Agenda
Introduction : From words to meaning
Difficulties
Understanding
Generation
Applications: Spoken Dialogue System
Knowledge Required
What knowledge do we (as humans) use to make sense of language?
Knowledge of how words sound
“cat” == “c” “a” “t”
Knowledge of how words can be composed into sentences (grammar)
The cat sat on the mat OK
sat mat cat on the NO
Knowledge of people, events, the world, types of text.
Recognizing adverts for what they are.
Understanding indirect requests: “I don’t quite understand this” as a request for help.
Components of NL
Sound structure: phonetics, phonology
Word structure: morphology and morphophonemics
Phrase structure – syntax: combinations of words
Semantic structure: meaning of utterance/phrase
Pragmatic and discourse structure: reasoning about the actions, beliefs, causes, intentions…
Stages of processing
To deal with complexity, we can process language in a series of stages:
speech recognition
using knowledge of how sounds make up words.
syntactic analysis
using the grammar of the language to get at sentence structure.
semantic analysis
mapping this structure to meaning.
pragmatics
using world knowledge and context to fill in aspects of meaning.
Syntactic Analysis
We will focus on syntax. How do we recognise that a sentence is grammatically correct?
The cat sat on the mat. OK
On the the sat cat mat. NO
More importantly, how do we use knowledge of language structures to assign structure to a sentence (helping in deriving its meaning)?
(The large green cat) (sat on (the small mat))
Bracketed bits are meaningful subparts.
Grammars
Grammars define the legal structures of a language.
We “parse” a sentence using a grammar to:
Determine whether it is grammatical.
Assign some useful structure/grouping to the sentence.
We want the words denoting an object to be grouped together, and words denoting actions to be grouped together.
Syntactic Categories
Grammars are based on each word belonging to a particular category:
nouns, verbs, adjectives, adverbs, articles/determiners
Example
The      black      cat   jumps  quickly
article  adjective  noun  verb   adverb
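Assigning categories to words can be sketched as a dictionary lookup. The `LEXICON` table and the `tag` function below are illustrative assumptions; a real tagger must also handle words that belong to several categories.

```python
# Toy lexicon covering only the words of the example sentence.
LEXICON = {
    "the": "article",
    "black": "adjective",
    "cat": "noun",
    "jumps": "verb",
    "quickly": "adverb",
}


def tag(sentence):
    """Pair each word with its category, or 'unknown' if it is not listed."""
    return [(word, LEXICON.get(word.lower(), "unknown"))
            for word in sentence.split()]


for word, category in tag("The black cat jumps quickly"):
    print(word, category)
```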
Larger groupings
Noun phrase: sequence of words denoting an object.
E.g.: the black cat.
Verb phrase: sequence of words denoting an action. E.g.:
jumps quickly
runs after the small dog
kicks the small boy with the funny teeth
Note that verb phrases may contain noun phrases.
Simple NL Grammar
We can write a simple NL grammar using phrase structure rules such as the following:
sentence --> nounPhrase, verbPhrase.
nounPhrase --> article, adjective, noun.
verbPhrase --> verb, nounPhrase.
This means:
A sentence can consist of a noun phrase followed by a verb phrase.
A noun phrase can consist of an article, followed by an adjective, followed by a noun.
Rules define constituent structure.
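The three rules can be tried out with a small recognizer. This is a minimal sketch, not a Prolog DCG: the `GRAMMAR` and `LEXICON` tables and the `match` and `is_sentence` functions are assumptions made for illustration, and the approach only works for grammars without left recursion.

```python
# The phrase structure rules from the slide, as symbol -> list of right-hand sides.
GRAMMAR = {
    "sentence": [["nounPhrase", "verbPhrase"]],
    "nounPhrase": [["article", "adjective", "noun"]],
    "verbPhrase": [["verb", "nounPhrase"]],
}

# Toy lexicon mapping words to syntactic categories (assumed for the example).
LEXICON = {
    "the": "article", "large": "adjective", "small": "adjective",
    "cat": "noun", "rat": "noun", "eats": "verb",
}


def match(symbol, cats, start):
    """Return every position where `symbol` can end if it starts at `start`."""
    if symbol not in GRAMMAR:  # a word category such as "noun"
        return [start + 1] if start < len(cats) and cats[start] == symbol else []
    ends = []
    for rhs in GRAMMAR[symbol]:
        positions = [start]
        for part in rhs:
            positions = [e for p in positions for e in match(part, cats, p)]
        ends.extend(positions)
    return ends


def is_sentence(text):
    """True if the whole word sequence can be derived from 'sentence'."""
    cats = [LEXICON.get(word.lower()) for word in text.split()]
    return len(cats) in match("sentence", cats, 0)


print(is_sentence("The large cat eats the small rat"))
print(is_sentence("eats the large cat"))
```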
Parsing
Using these rules we can determine whether a sentence is legal, and obtain its structure.
Example: “The large cat eats the small rat” consists of:
Noun phrase: The large cat
Verb phrase: eats the small rat
The verb phrase in turn consists of:
Verb: eats
Noun phrase: the small rat
Parse Tree
This structure can be represented as a tree:

sentence
  noun phrase
    article: The
    adjective: large
    noun: cat
  verb phrase
    verb: eats
    noun phrase
      article: the
      adjective: small
      noun: rat
Parse Tree
This tree structure gives you groupings of words (e.g., the large cat).
These are meaningful groupings - considering these together helps in working out what the sentence means.
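One way to hold such a tree in a program is as nested tuples, from which the word groupings can be read off. The representation and the helper names `words` and `groupings` are assumptions made for this sketch.

```python
# Parse tree for "The large cat eats the small rat" as (label, child, ...) tuples.
TREE = (
    "sentence",
    ("nounPhrase", ("article", "The"), ("adjective", "large"), ("noun", "cat")),
    ("verbPhrase",
     ("verb", "eats"),
     ("nounPhrase", ("article", "the"), ("adjective", "small"), ("noun", "rat"))),
)


def words(node):
    """Collect the words under a node, left to right."""
    _label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return [children[0]]
    return [word for child in children for word in words(child)]


def groupings(node, label):
    """Return the word group of every subtree whose label is `label`."""
    found = []
    name, *children = node
    if name == label:
        found.append(" ".join(words(node)))
    for child in children:
        if isinstance(child, tuple):
            found.extend(groupings(child, label))
    return found


print(groupings(TREE, "nounPhrase"))
print(groupings(TREE, "verbPhrase"))
```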
Parsing
Basic approach is based on rewriting.
To parse a sentence you must be able to “rewrite” the “start” symbol (in this case sentence) to the sequence of syntactic categories corresponding to the sentence.
You can rewrite a symbol using one of the grammar rules if it corresponds to the LHS of a rule. You then just replace it with the symbols in the RHS, e.g.:
sentence
=> nounPhrase verbPhrase
=> article adjective noun verbPhrase
etc.
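The rewriting process can be sketched as repeatedly replacing the leftmost rewritable symbol by the right-hand side of its rule. The `derive` function is an assumption for illustration, and it relies on each symbol having exactly one rule, as in the simple grammar.

```python
# Simple grammar: each left-hand side has a single right-hand side.
GRAMMAR = {
    "sentence": ["nounPhrase", "verbPhrase"],
    "nounPhrase": ["article", "adjective", "noun"],
    "verbPhrase": ["verb", "nounPhrase"],
}


def derive(start="sentence"):
    """Return every step of the leftmost derivation from `start`."""
    current = [start]
    steps = [current]
    while True:
        for i, symbol in enumerate(current):
            if symbol in GRAMMAR:
                # Replace the LHS symbol with the RHS of its rule.
                current = current[:i] + GRAMMAR[symbol] + current[i + 1:]
                steps.append(current)
                break
        else:
            return steps  # no symbol left that can be rewritten


for step in derive():
    print(" ".join(step))
```

A sentence parses if the final line of the derivation equals its sequence of syntactic categories. With several rules per symbol, a parser has to search over the alternatives instead of following a single path.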
A little more on grammars
The example grammar will ONLY parse sentences of a very restricted form. What about:
“John jumps.”
“The man jumps.”
“John jumps in the pond.”
We need to add extra rules to cover some of these cases
Extended Grammar
sentence --> nounPhrase, verbPhrase.
nounPhrase --> article, adjective, noun.
nounPhrase --> article, noun.
nounPhrase --> properName.
verbPhrase --> verb, nounPhrase.
verbPhrase --> verb.
Think how you might handle “in the pond”.
The grammar now parses sentences such as:
John jumps the pond.
And fails to parse ungrammatical ones like:
jumps pond John the
NL Grammars
A good NL grammar should:
Cover a reasonable subset of natural language.
Avoid parsing ungrammatical sentences (or at least, ones that are viewed as not acceptable in the target application).
Assign plausible structures to the sentence, where meaningful bits of the sentence are grouped together.
But its role is NOT to check that a sentence is grammatical. Rather, by excluding dodgy sentences, the grammar is more likely to find the right structure for a sentence.
More on grammars
Consider following examples:
“John likes.” NOT OK
“John jumps.” OK
“John jumps in the water.” OK
“The small fluffy cat jumps.” OK
“John like the cat.” NOT OK
“The cats likes John.” NOT OK
“The cat on the table likes John.” OK
Better grammar
Should deal with:
Intransitive/transitive verbs. The former are ones that don’t need a following noun phrase.
Prepositional phrases (e.g., in the lake). A preposition followed by a noun phrase.
Series of adjectives. A recursive rule can be used.
Subject-verb agreement. Can add arguments to grammar rules/dictionary entries:
sentence --> np(Num), vp(Num).
np(Num) --> art, noun(Num).
noun(sing) --> [cat].
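The effect of the `Num` argument can be sketched by having the noun phrase report its number and the verb phrase check it. The word lists and the intransitive-only `vp` rule below are assumptions added for illustration; the slide shows only the `sentence`, `np` and `noun` rules.

```python
# Dictionary entries carrying a number feature ("sing" or "plur").
NOUNS = {"cat": "sing", "cats": "plur"}
VERBS = {"jumps": "sing", "jump": "plur"}
ARTICLES = {"the"}


def np(words):
    """np(Num) --> art, noun(Num). Return (Num, remaining words) or None."""
    if len(words) >= 2 and words[0] in ARTICLES and words[1] in NOUNS:
        return NOUNS[words[1]], words[2:]
    return None


def vp(words, num):
    """Assumed rule vp(Num) --> verb(Num): a single verb that agrees with Num."""
    return len(words) == 1 and VERBS.get(words[0]) == num


def sentence(text):
    """sentence --> np(Num), vp(Num): both parts must share the same Num."""
    parsed = np(text.lower().split())
    if parsed is None:
        return False
    num, rest = parsed
    return vp(rest, num)


print(sentence("The cat jumps"))
print(sentence("The cats jumps"))
```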
Semantics
Syntax: Uses grammar to structure the sentence.
Semantics: Maps this to a structured representation that can be used in inference (often referred to as sentence meaning).
Possible representations:
SQL: Map “Find me all the students who are taking AI3” to the relevant SQL query.
Predicate logic: Map “John loves anyone who is tall” onto the relevant statement in predicate logic.
Other structured representations, e.g., a “case frame”:
action: loves
subject: john
object: mary
Semantics
How do we get from the parsed sentence to this kind of representation? In general this is rather tricky, but to illustrate the idea we will show how it could be done for “John loves Mary” by adding extra arguments to a Prolog grammar. We want to map that sentence to
loves(john, mary).
We will cheat by assuming that the functor of a Prolog structured object can be a variable:
Verb(Subject, Object)
Grammar with Semantics
sentence(Verb(Subject, Object)) --> nounPhrase(Subject), verbPhrase(Verb, Object).
nounPhrase(Subject) --> properName(Subject).
verbPhrase(Verb, Object) --> verb(Verb), nounPhrase(Object).
The general idea is that we can “compose” the sentence meaning by working out the “meaning” of the syntactic constituents and sticking the results together somehow.
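The same composition can be sketched outside Prolog: each constituent returns its meaning together with the remaining words, and the sentence rule sticks the pieces together. The word lists and function names below are assumptions for illustration.

```python
PROPER_NAMES = {"john", "mary"}
VERBS = {"loves", "likes"}


def noun_phrase(words):
    """nounPhrase(Subject) --> properName(Subject)."""
    if words and words[0] in PROPER_NAMES:
        return words[0], words[1:]
    return None


def verb_phrase(words):
    """verbPhrase(Verb, Object) --> verb(Verb), nounPhrase(Object)."""
    if words and words[0] in VERBS:
        parsed = noun_phrase(words[1:])
        if parsed:
            obj, rest = parsed
            return (words[0], obj), rest
    return None


def sentence(text):
    """Return the meaning as (verb, subject, object), or None if no parse."""
    words = text.lower().rstrip(".").split()
    subject = noun_phrase(words)
    if not subject:
        return None
    predicate = verb_phrase(subject[1])
    if not predicate or predicate[1]:  # no verb phrase, or words left over
        return None
    verb, obj = predicate[0]
    return (verb, subject[0], obj)


def show(meaning):
    """Print a meaning triple in the slide's predicate notation."""
    verb, subject, obj = meaning
    return f"{verb}({subject}, {obj})"


print(show(sentence("John loves Mary")))
```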
Pragmatics
But we can’t get very far without knowing something about the world, and the context in which a sentence is uttered. Pragmatics deals with this.
Example: determining the referents of pronouns etc.
“John likes that blue car. He buys it.”
We need context to determine what “that blue car”, “he” and “it” refer to.
Then we can create the meaning: likes(john, car1) and buys(john, car1).
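A very naive way to resolve the pronouns is to remember the most recently mentioned entity of each kind. The sketch below assumes the sentences have already been parsed into (verb, subject, object) triples and that “that blue car” has been mapped to `car1`; the `resolve` function and the kind tables are illustrative assumptions. The heuristic is easily fooled, for example by “Mary threw a rock at the window and broke it”.

```python
# Which kind of entity each pronoun may refer to (assumed for the example).
PRONOUN_KIND = {"he": "person", "she": "person", "it": "thing"}


def resolve(clauses, entity_kind):
    """Replace pronouns by the most recently mentioned entity of that kind."""
    last_seen = {}
    facts = []
    for verb, subject, obj in clauses:
        resolved = []
        for word in (subject, obj):
            if word in PRONOUN_KIND:
                # Leave the pronoun untouched if no antecedent is known.
                word = last_seen.get(PRONOUN_KIND[word], word)
            else:
                last_seen[entity_kind.get(word, "thing")] = word
            resolved.append(word)
        facts.append((verb, *resolved))
    return facts


clauses = [("likes", "john", "car1"), ("buys", "he", "it")]
print(resolve(clauses, {"john": "person", "car1": "thing"}))
```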
Pragmatics
Pragmatics is also about what people DO with language. Making sense of, and generating, language involves mapping language to goals.
“Do you have the time?” -> speaker wants to know the time.
“When is the last train to London?” -> speaker probably wants to go there.
We can apply some of our planning ideas to this problem.
Pragmatics and Plans
As an example of a plan-based approach to language, consider the actions of requesting, informing and asking, referred to as “speech acts”.
We can describe these as planning operators. The preconditions and effects refer to the speaker’s and hearer’s beliefs and desires.
We use a notation to describe these:
knows(Agent, Fact)
wants(Agent, State/Action)
e.g.
wants(fred, kiss(fred, mary))
knows(fred, loves(mary, joe))
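A speech act written as a planning operator might look like the sketch below. The slides do not spell out the operators, so the precondition and effect chosen for `inform` (the speaker knows the fact; afterwards the hearer knows it too) are assumptions, as are the `Operator` class and `apply_operator` function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: frozenset  # facts that must hold before the act
    effects: frozenset        # facts added by performing the act


def inform(speaker, hearer, fact):
    """Assumed operator: the speaker tells the hearer a fact the speaker knows."""
    return Operator(
        name=f"inform({speaker}, {hearer}, {fact})",
        preconditions=frozenset({("knows", speaker, fact)}),
        effects=frozenset({("knows", hearer, fact)}),
    )


def apply_operator(op, state):
    """Return the new state, or None if a precondition does not hold."""
    if not op.preconditions <= state:
        return None
    return state | op.effects


state = frozenset({("knows", "fred", "loves(mary, joe)")})
act = inform("fred", "mary", "loves(mary, joe)")
print(apply_operator(act, state))
```

A planner could chain such operators to find a sequence of utterances that achieves a goal such as knows(mary, Fact).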
Putting it all together
Given sentences like the following, spoken by John about Fred:
“What is the time? He has missed the train.”
We can now:
Parse the sentence.
Map that to a structured representation that is good for inference.
Use context and knowledge of goals/plans to obtain from that:
wants(john, know(john, time1)) (where time1 is the time at some instant)
believes(john, missed(fred, train2))
Agenda
Introduction : From words to meaning
Difficulties
Understanding
Generation
Applications: Spoken Dialogue System
Language Generation
Language processing is also about the generation of language.
Structured representation --> NL text.
The simplest generation method uses templates, mapping the representation straight to a text template (with variables/slots to fill in).
loves(X, Y) -> X “loves” Y
gives(X, Y, Z) -> X “gives the” Y “to” Z
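The two templates translate almost directly into format strings. The `TEMPLATES` table and `generate` function are illustrative names, not part of the original slides.

```python
# One text template per predicate; {0}, {1}, {2} are the slots to fill in.
TEMPLATES = {
    "loves": "{0} loves {1}",
    "gives": "{0} gives the {1} to {2}",
}


def generate(predicate, *args):
    """Map a structured fact such as loves(X, Y) straight to text."""
    return TEMPLATES[predicate].format(*args)


print(generate("loves", "John", "Mary"))
print(generate("gives", "John", "book", "Mary"))
```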
Language Generation
But there is much more to language generation in general. Templates are very rigid.
Consider “John eats the cheese. John eats the apple. John sneezes. John laughs.”
Better as “John eats the cheese and apple, then sneezes. He then laughs.”
Getting good style involves working out how to map many facts to one sentence, when to use pronouns, and when to use “connectives” like “then”.
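Two of these decisions, merging facts and choosing pronouns, can be sketched as follows. The `aggregate` function, the `pronouns` table and the fact format are assumptions for illustration; the sketch does not attempt connectives like “then”, so its output is less fluent than the improved text above.

```python
def aggregate(facts, pronouns):
    """facts: list of (subject, verb, object or None), in narrative order."""
    # Step 1: merge consecutive facts that share subject and verb.
    merged = []
    for subject, verb, obj in facts:
        same = merged and merged[-1][0] == subject and merged[-1][1] == verb
        if same and obj:
            merged[-1][2].append(obj)
        else:
            merged.append([subject, verb, [obj] if obj else []])
    # Step 2: use a pronoun when the subject repeats from the last sentence.
    sentences = []
    previous = None
    for subject, verb, objects in merged:
        name = pronouns[subject] if subject == previous else subject
        body = f"{name} {verb} {' and '.join(objects)}".strip()
        sentences.append(body + ".")
        previous = subject
    return " ".join(sentences)


facts = [
    ("John", "eats", "the cheese"),
    ("John", "eats", "the apple"),
    ("John", "sneezes", None),
    ("John", "laughs", None),
]
print(aggregate(facts, {"John": "He"}))
```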
Language Generation
Serious language generation involves deciding:
What to say.
How to order and structure it.
How to break it up into sentences.
How to refer to objects (using pronouns, and expressions like “the cat”, etc.).
How to express things in terms of grammatically correct sentences.
Often the starting point is a communicative goal.
Agenda
Introduction : From words to meaning
Difficulties
Understanding
Generation
Applications: Spoken Dialogue System
Human Conversation
Human data is used to inform the design of conversational systems:
scheduling assistant
cross-language information access
Computational questions:
How to represent structural information in dialogue?
How to compute this representation?
Speech act
Austin (1962): An utterance is a kind of action
One utterance, three acts:
Locutionary act: the utterance of a sentence with a particular meaning
Illocutionary act: the act of asking, answering, promising, etc., in uttering a sentence
Perlocutionary act: the (often intentional) production of certain effects upon the thoughts, feelings, or actions of the addressee in uttering a sentence
Speech act
Example: “You can’t do that!”
Locutionary force: Imperative
Illocutionary force: Protesting
Perlocutionary act:
Intent to annoy addressee
Intent to stop addressee from doing something
Five classes of Speech Acts (Searle, 1975)
Assertives: committing the speaker to something’s being the case (suggesting, putting forward, swearing, boasting)
Directives: attempts by the speaker to get the addressee to do something (asking, ordering, requesting)
Commissives: committing the speaker to a future course of action (promising, planning, vowing, betting, opposing)
Expressives: expressing the psychological state of the speaker about a state of affairs (thanking, apologizing, welcoming, deploring)
Declarations: bringing about a different state of the world via the utterance (I resign; You’re fired)
Dialogue act
An act with associated structural information related to its dialogue function.
Multiple classification schemes have been developed in the past.
These schemes combine ideas from Searle, Austin and others, but details may change from one domain to another.
Example: meeting organizing task
Two-party scheduling dialogues
Speakers were asked to plan a meeting at some future date
Spoken dialogue system
[Figure: scenario with overlapping speech bubbles between Mr. Dupont, the Spoken Dialogue System, Mr. Dupuis and Services. Recoverable content: Mr. Dupont asks the system to book the Lafayette room for tomorrow at 10h for an important meeting and to inform all members of the PVE project; the room has already been booked by Mr. Dupuis; Mr. Dupont asks the system to tell Mr. Dupuis that he needs the room; the system calls Mr. Dupuis, who agrees and moves his reservation to the next day; the system reports the result to Mr. Dupont.]
Dialogue management
Generic algorithm for treating a speech turn π (diagram label: Fp):
• Detect and handle incomprehension,
• Calculate the adequate dialogue strategy,
• Invoke the task manager to perform necessary inference,
• Determine the theme, the goal to refine,
• Generate the acts of the system.
Next turn (π+1)
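The five steps can be sketched as one function that processes a single turn. Everything concrete here is an assumption made for illustration: the turn and state dictionaries, the rule for choosing a strategy, the toy `task_manager` for room booking, and the names of the system acts.

```python
def task_manager(request, bookings):
    """Hypothetical task manager: check whether the requested room is free."""
    room = request.get("room")
    if room is None:
        return {"status": "incomplete", "missing": "room"}
    if room in bookings:
        return {"status": "conflict", "owner": bookings[room]}
    return {"status": "ok"}


def handle_turn(turn, state):
    """Process one user turn and return the list of system acts."""
    # 1. Detect and handle incomprehension (turn is None if nothing was understood).
    if turn is None:
        state["failures"] += 1
        return [("request_repeat",)]
    # 2. Calculate the dialogue strategy (made-up rule based on past failures).
    strategy = "directive" if state["failures"] >= 2 else "cooperative"
    # 3. Invoke the task manager to perform the necessary inference.
    result = task_manager(turn, state["bookings"])
    # 4. Determine the goal to refine.
    if result["status"] == "incomplete":
        state["goal"] = result["missing"]
        return [("ask", state["goal"], strategy)]
    # 5. Generate the acts of the system.
    if result["status"] == "conflict":
        state["goal"] = "negotiate"
        return [("inform_conflict", result["owner"]), ("offer", "negotiate")]
    state["goal"] = None
    return [("confirm", turn["room"])]


state = {"failures": 0, "bookings": {"Lafayette": "Dupuis"}, "goal": None}
print(handle_turn({"room": "Lafayette"}, state))
```

Calling `handle_turn` once per user utterance, with the same `state` dictionary, corresponds to moving from turn π to turn π+1.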
Architecture
[Figure: system architecture. Components: Automatic Speech Recognizer, Parser, Pragmatic Interpreter, Dialogue Manager, Task Manager, Generator, TTS. Data labels: Utterance (input), Orthographical String, Semantic Schema, Acts, Actions, Utterance (output).]
Multisession dialogue: Emergence session
S: Welcome to the meeting organization service of the CLIPS laboratory. I am Melina, and you?
D: Good morning, Jean Caelen speaking.
S: Good morning, Director! Can I help you?
D: Please book the Lafayette room for me on Thursday from 3 p.m. to 5 p.m.
S: I am sorry, but this room is already reserved! More precisely, the Lafayette room was booked by Mr. Yannick Fouquet from 2 p.m. to 4 p.m., and by Mrs. Solange Hollard from 4 p.m. to 5 p.m. on Thursday 7 April 2005.
D: Tell them I absolutely need this room and ask them to change it…
S: OK, I will call them and give you the result later.
Multisession dialogue: Negotiation session with P1
S: Hello, this is Melina, the meeting organization service of the CLIPS laboratory! Are you Mr. Yannick Fouquet?
P1: Yes, it’s me…
S: Mr. Jean Caelen has an important 2-hour meeting from 3 p.m. on Thursday 7 April 2005. He asks you to let him have the Lafayette room. Would this be possible for you?
P1: Let me see, um, OK, yes, it should be possible.
S: Do you want to postpone your meeting to another day? For example, Friday 8 April 2005.
P1: No no, book me the Aquarium room please. I also need a video projector.
S: OK, this room has 30 places, and you will have it as well as a video projector at 2 p.m. for 2 hours on Thursday 7 April 2005.
P1: Very good!
S: Do you want me to inform the participants?
P1: No, I will do it myself.
S: OK.
P1: Thanks and goodbye.
S: Thank you for your generosity! Goodbye, Mr. Fouquet.
Multisession dialogue: Negotiation session with P2
S: Hello, this is Melina, the meeting organization service of the CLIPS laboratory! Are you Mrs. Solange Hollard?
P2: Yes.
S: I am contacting you about your Lafayette room reservation. Mr. Jean Caelen has an important 2-hour meeting from 3 p.m. on Thursday 7 April 2005. Could you let him have this room, please?
P2: I am sorry but this is not possible, I have already fixed my agenda…
S: Could you move your meeting to another day, Friday 8 April 2005, for example?
P2: No, I don’t want to change my date…
S: I see, I will have to find another solution for him! Thank you, Mrs. Hollard, and see you later.
Multisession dialogue: Notification session
S: Hello, this is Melina, the meeting organization service of the CLIPS laboratory! Are you Mr. Jean Caelen?
D: Yes.
S: Mr. Yannick Fouquet agreed to let you have the Lafayette room at 3 p.m. on Thursday 7 April 2005. However, Mrs. Solange Hollard did not agree to modify her reservation! Could you therefore choose another day, please?
D: That’s a pity! OK, move my meeting to this Friday from 10 a.m. to midday.
S: You will have the Lafayette room at your disposal. It has 45 places.
D: Very good.
S: Would you like me to inform the participants?
D: Yes, that would be kind. Inform all members of CLIPS for me, please.
S: By e-mail or by telephone?
D: By phone, of course.
S: OK, I will announce your meeting to them immediately.
D: Thank you and bye.
S: Goodbye, Mr. Caelen.
Summary
Natural Language Processing covers understanding and generating spoken and written language, from sentences to large texts.
Focus on understanding sentences:
The first step is to parse the sentence to derive its structure.
Use grammar rules which define the constituency structure of the language.
The parse gives a tree structure which shows how words are grouped together.
Analyze pragmatics and discourse to recover as much of the text's meaning as possible.