NTCIR Overview
Noriko Kando, David Kirk Evans
National Institute of Informatics
http://research.nii.ac.jp/ntcir/
February 26, 2008

Outline
• NTCIR Goals (Information Access)
• NTCIR Introduction
• Q&A History, NTCIR-7 Q&A
• NTCIR-6 CLIR
• NTCIR-6 Opinion Analysis
• Focused Domain: Patent
• Research Corpora

Information Access Over Time
• Royal Library at the Monastery of San Lorenzo de El Escorial (Spain): http://flickr.com/photos/cuellar/370663920/
• British Library reading room: http://flickr.com/photos/sifter/370775225/
• Card catalog: http://flickr.com/photos/emdot/45249090/
• 5.25" floppy disk: http://flickr.com/photos/jrpeng/1960841100/
• Apple //e: http://flickr.com/photos/lrosa/2182634449/
• Seattle Public Library, computer terminals: http://flickr.com/photos/fugutabetai/214479393/, http://flickr.com/photos/fugutabetai/214476374/

Information Access Over Time
Internet search, desktop search, enterprise search…
We have gone from search mediated by interaction with experts to automatic keyword search. The future holds more interaction and more intelligence: information retrieval, question answering, summarization, and information extraction are merging.

The NTCIR* Workshop is:
A series of evaluation workshops designed to enhance research in information access technologies by providing an infrastructure for large-scale evaluation. The project started in late 1997 and runs about once every 1½ years:
• 1st: Nov 1998 – Sept 1999
• 2nd: June 2000 – March 2001
• 3rd: Sept 2001 – Oct 2002
• 4th: Apr 2003 – June 2004
• 5th: Oct 2004 – Dec 2005
• 6th: April 2006 – May 2007
• 7th: Oct 2007 – Dec 2008
* NII Test Collection for Information Retrieval systems

Focus of NTCIR
• Lab-type IR tests
• Asian languages / cross-language
• Variety of genres
• Parallel / comparable corpora

New Challenges
• Intersection of IR + NLP
• Make the information in documents more usable for users!
• Realistic evaluation / user tasks

Forum for Researchers
• Idea exchange
• Discussion / investigation of evaluation methods and metrics

NTCIR Workshop: Number of Participating Groups Registered
[Bar chart: number of registered groups and number of countries for each workshop. 1st: 28 groups, 6 countries; 2nd: 36 groups, 8 countries; 3rd: 65 groups, 9 countries; 4th: 74 groups, 10 countries; the 5th and 6th workshops grew further.]

Tasks (Research Areas) of NTCIR Workshops
• Japanese IR
• Cross-lingual IR
• Patent retrieval, patent map / classification
• Web retrieval: navigational, geographic, result classification
• Term extraction
• Question answering
• Information access dialogue
• Summarization metrics, cross-lingual text summarization
• Trend information
• Opinion analysis
(Timeline: 1st–6th workshops; corpora moved from scientific documents in the 1st–2nd to news from the 3rd onward.)

NTCIR Q&A History
• Question Answering Challenge (QAC), NTCIR-3, 4, 5 (monolingual Japanese)
  – Factoid questions, list answers, Information Access Dialogue
  – NTCIR-6: complex monolingual QA
• CLQA, NTCIR-5 and 6
  – NTCIR-5: EJ, JE, CE, EC, CC
  – NTCIR-6: EJ, JE, CE, EC, CC, JJ, EE

NTCIR-6 CLQA Japanese Runs

Run         NTCIR-6 Right   NTCIR-6 Right+Unsupported
Forst-E-J   0.175           0.195
Forst-J-J   0.310           0.335
HARAD-J-J   0.085           0.110
LTI-E-J     0.095           0.115
LTI-J-J-u   0.335           0.360
TITFL-E-J   0.030           0.065
TITFL-J-J   0.155           0.190
TTH-E-J     0.130           0.165
TTH-J-J     0.270           0.295

NTCIR-5 scores for comparison, where a comparable earlier run exists (Right / Right+Unsupported): 0.125 / 0.155, 0.170 / 0.265, 0.100 / 0.125, 0.080 / 0.200; the remaining runs have no NTCIR-5 counterpart.
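"Right" counts answers that are correct and supported by the returned document; "Right + Unsupported" also counts correct answers whose accompanying document does not actually support them. A minimal sketch of computing both accuracies from per-question judgments (the labels, counts, and question total below are illustrative, chosen only so the output reproduces the Forst-J-J row):

```python
from collections import Counter

def clqa_accuracy(judgments):
    """Right and Right+Unsupported accuracy over one run.

    judgments: per-question labels,
      "R" = correct answer with a supporting document,
      "U" = correct answer, but the document does not support it,
      "W" = wrong answer.
    """
    counts = Counter(judgments)
    total = len(judgments)
    right = counts["R"] / total
    right_plus_unsupported = (counts["R"] + counts["U"]) / total
    return right, right_plus_unsupported

# Illustrative run: 200 questions, 62 Right, 5 Unsupported, 133 Wrong.
labels = ["R"] * 62 + ["U"] * 5 + ["W"] * 133
print(clqa_accuracy(labels))  # -> (0.31, 0.335), cf. the Forst-J-J row
```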

NTCIR-6 CLQA Findings
• Cross-lingual vs. monolingual
  – E-J vs. J-J: cross-lingual runs reach about 50% of monolingual accuracy
  – E-C vs. C-C: the "veterans" did better; LTI and PIRCS around 60%, IASL and WMMKS 40% and 47.2%, other newcomers below 20%
• Synonyms
  – QID T0054: "What is Japan's unemployment rate for May of 1997?" (no answers reported)
  – QID T0123: "What was Japan's jobless rate in May 1997?"
• IR for QA
  – The IR module showed the largest performance drop in the module-by-module analysis (CMU-LTI)
  – Extrinsic evaluation of IR?

ACLIA
• Complex CLQA
  – Focus on the evaluation of definition, timeline, and relationship questions
  – EJ, EC, JJ, CC (Simplified + Traditional Chinese)
• IR for QA
  – Allows IR participants to submit only an IR system, evaluated within a QA framework

NTCIR-6 CLIR
• Stage 1: ad-hoc retrieval, CJKE topics against CJK documents; 2000-2001 news documents; multilingual, bilingual, and monolingual runs
• Stage 2: run systems over the NTCIR-3 to -5 test collections for higher reliability of evaluation
  – Do system scores change depending on the corpora used to evaluate them?

NTCIR-6 CLIR: comparison of the best monolingual and bilingual runs
(DESC field, rigid relevance (S+A); percentages are relative to the monolingual baseline for the same document language)

Topics > Docs   C docs           J docs           K docs
Mono. (base)    0.313 (100%)     0.325 (100%)     0.454 (100%)
C > X           -                0.312 (95.8%)    N/A
J > X           0.078 (24.7%)    -                0.287 (63.2%)
K > X           0.102 (32.6%)    0.267 (82.1%)    -
E > X           0.191 (61.0%)    0.307 (94.4%)    0.292 (64.3%)
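NTCIR CLIR runs are scored with mean average precision (MAP), and each bilingual run above is also expressed as a percentage of the monolingual baseline for its document language. A minimal sketch of both computations, assuming a run is a ranked list of document IDs per topic and relevance judgments are sets of relevant IDs (all names here are illustrative):

```python
def average_precision(ranking, relevant):
    """Average precision of one topic's ranked list against its relevant set."""
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(run, qrels):
    """run: {topic: [doc ids, ranked]}, qrels: {topic: set of relevant doc ids}."""
    return sum(average_precision(run.get(t, []), rel)
               for t, rel in qrels.items()) / len(qrels)

def percent_of_mono(bilingual_map, mono_map):
    """Bilingual effectiveness relative to the monolingual baseline."""
    return 100.0 * bilingual_map / mono_map

print(round(percent_of_mono(0.307, 0.325), 1))  # E>J vs J>J: ~94.5 (table shows 94.4%, from unrounded scores)
```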

Results of Stage 2
• MAP across three different test collections: correlation coefficients by type of run.

(a) C-C runs / E-C runs (lowest)

          NTCIR-5   NTCIR-4   NTCIR-3
NTCIR-5   1.000     0.956     0.952
NTCIR-4   0.562     1.000     0.957
NTCIR-3   0.645     0.946     1.000
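The Stage 2 question is whether systems keep the same relative ordering when evaluated on different collections. One way to quantify this, sketched below, is the correlation between the per-system MAP scores on two collections (the data here are made up for illustration, not official figures):

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# MAP of the same five systems on two test collections (made-up numbers).
map_on_collection_a = [0.31, 0.28, 0.22, 0.19, 0.15]
map_on_collection_b = [0.29, 0.27, 0.20, 0.21, 0.14]
print(round(pearson(map_on_collection_a, map_on_collection_b), 3))
```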

NTCIR-6 Opinion Analysis
• 3 annotators per sentence
• ~20 documents per topic, ~30 topics
• 1998-2001 news data

Feature          Value                       Req'd?
Opinionated      YES, NO                     Y
Opinion Holder   String (multiple allowed)   Y
Relevant         YES, NO                     N
Polarity         POS, NEG, NEU               N
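With three annotators per sentence, per-sentence gold labels have to be derived from the individual annotations. Below is a minimal sketch of the record layout and a simple vote over the required OPINIONATED label; the field names and the lenient/strict combination rule are illustrative assumptions, not the official NTCIR-6 procedure:

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class SentenceAnnotation:
    opinionated: str                              # "YES" or "NO" (required)
    holders: list = field(default_factory=list)   # opinion holder strings (required when opinionated)
    relevant: str = ""                            # "YES" / "NO" (optional)
    polarity: str = ""                            # "POS" / "NEG" / "NEU" (optional)

def gold_opinionated(annotations, strict=False):
    """Combine the annotators' OPINIONATED labels for one sentence.

    strict=False: majority vote; strict=True: keep a label only if all agree.
    """
    label, count = Counter(a.opinionated for a in annotations).most_common(1)[0]
    if strict and count < len(annotations):
        return None
    return label

anns = [SentenceAnnotation("YES", ["Prime Minister"]),
        SentenceAnnotation("YES", ["the PM"], relevant="YES", polarity="NEG"),
        SentenceAnnotation("NO")]
print(gold_opinionated(anns))               # -> "YES" (majority of 3)
print(gold_opinionated(anns, strict=True))  # -> None  (no unanimous agreement)
```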

NTCIR Patent History
• NTCIR-3: patent retrieval, 31 topics
• NTCIR-4: invalidity search and patent map generation, 34 topics + 69 automatic
• NTCIR-5: document retrieval (invalidity), 1,189 topics; passage retrieval; classification
• NTCIR-6: document retrieval, Japanese + English
  – JA 1993-2002: 3.5M documents, 45 GB, 1,685 topics
  – EN 1993-2002: 980k documents; 1,000 training and 2,221 test topics
• NTCIR-6: patent classification

Goal: Patent Map Creation
Example: optical disk patents arranged on a grid of problems (high density, erasing, rewriting) against solutions (managing the number of rewritings, shifting the writing position, laser power, pulse waveform), with the matching patent numbers (e.g., 1993-000003, 1994-000002, 1994-000008, 1996-000005) placed in the cells.
Patent map creation = multi-faceted patent clustering (a toy grouping is sketched below).
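Once each patent carries problem and solution facet labels, the map itself is a grouping over those facets. A minimal sketch where the facet labels are given as input; in a real system they would come from classification or clustering, and the example assignments below are illustrative:

```python
from collections import defaultdict

def build_patent_map(patents):
    """Arrange patents on a problems x solutions grid.

    patents: iterable of (patent_id, problem, solution) triples.
    Returns {(problem, solution): [patent ids]}.
    """
    grid = defaultdict(list)
    for patent_id, problem, solution in patents:
        grid[(problem, solution)].append(patent_id)
    return grid

patents = [("1993-000003", "rewriting", "managing the number of rewritings"),
           ("1994-000002", "rewriting", "managing the number of rewritings"),
           ("1994-000008", "erasing", "laser power"),
           ("1996-000005", "erasing", "shifting the writing position")]
for cell, ids in build_patent_map(patents).items():
    print(cell, ids)
```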

NTCIR-7: Focused Domain (Patent)
Documents:
• 10 years of Japanese patent applications (NTCIR-4/5)
• 10 years of USPTO patents (NTCIR-6)
• Parallel sentence data (1.8M J-E sentence pairs)
• Scientific paper abstracts (NTCIR-1/2)
Patent Translation (PATMT): MT is key for CLIR
• Training data: 1993-2000; test data: 2001-2002
• Is one reference translation enough?
• Intrinsic evaluation: BLEU, human assessments (see the sketch below)
• Extrinsic evaluation: CLIR, task-based
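For the intrinsic evaluation, corpus-level BLEU compares system output against one or more reference translations, which is exactly where the question about a single reference arises. A minimal sketch using NLTK as one possible implementation (the sentences are toy data, and the official task may use different tooling):

```python
# Corpus-level BLEU with NLTK; smoothing keeps short toy sentences from scoring zero.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# One inner list per hypothesis; each inner list may hold several references,
# which is how multiple reference translations would be supplied.
references = [
    [["the", "optical", "disk", "is", "rewritable"]],            # single reference
    [["laser", "power", "is", "controlled"],
     ["the", "laser", "power", "is", "regulated"]],              # two references
]
hypotheses = [
    ["the", "optical", "disk", "is", "rewritable"],
    ["laser", "power", "is", "regulated"],
]

score = corpus_bleu(references, hypotheses,
                    smoothing_function=SmoothingFunction().method1)
print(round(score, 3))
```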

Patent Mining (PATMN): cross-genre, patents and scientific papers
• Task: classify paper abstracts into IPC classes
• ML approach: classify the abstracts into IPC classes directly
• IR approach: use an invalidity-search system to find relevant patents, then assign their IPCs to the paper abstracts (see the sketch below)
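A minimal sketch of the IR-style approach, assuming a retrieval function that returns the most similar patents for a paper abstract together with their IPC codes; the top-ranked neighbours then vote for the classes assigned to the abstract (the function names, stub data, and rank weighting are illustrative assumptions):

```python
from collections import Counter

def assign_ipc_by_retrieval(abstract_text, search_patents, k=20, n_classes=3):
    """Assign IPC classes to a paper abstract via its nearest patents.

    search_patents: callable mapping a query text to a ranked list of
    (patent_id, ipc_codes) pairs, e.g. an invalidity-search system reused
    as a k-nearest-neighbour classifier.
    """
    neighbours = search_patents(abstract_text)[:k]
    votes = Counter()
    for rank, (_patent_id, ipc_codes) in enumerate(neighbours, start=1):
        for code in ipc_codes:
            votes[code] += 1.0 / rank   # weight votes by retrieval rank
    return [code for code, _ in votes.most_common(n_classes)]

# Usage sketch with a stubbed retrieval function.
def toy_search(query):
    return [("1994-000002", ["G11B7/00"]),
            ("1993-000003", ["G11B7/00", "G11B11/105"]),
            ("1996-000005", ["G11B11/105"])]

print(assign_ipc_by_retrieval("optical disk rewriting ...", toy_search, k=3))
```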

Research Corpora
• Cross-lingual IR
  – JE scientific abstracts (NTCIR-1, 2): ~340k docs
  – CJKE 1998-99 news (NTCIR-3): ~690k docs, 50 topics
  – CJKE 1998-99 news (NTCIR-4): ~1.5M docs, 60 topics
  – CJKE 2000-01 news (NTCIR-5): ~2.2M docs, 50 topics
  – CJKE (NTCIR-6): the NTCIR-5 collection minus English, 50 topics from NTCIR-3 and -4
• Patent
  – J full text 1998-99, E abstracts 1995-99 (NTCIR-3): 4.1M docs, 31 topics
  – J full text, E abstracts 1993-1997 (NTCIR-4): 3.4M docs, 34 topics
  – J full text, E abstracts 1993-2002 (NTCIR-5): 6.9M docs, 1,223 topics
  – J full text, E full text 1993-2000 (NTCIR-6): ~982k docs, 3,221 topics

Research Corpora
• Web
  – 100 GB of HTML, crawled in 2001, 47 Japanese topics
  – 1.36 TB of HTML, crawled in 2004, 1,116 Japanese topics
• QA
  – NTCIR-3 QAC (Japanese): 1,200 questions over 1998-99 news, 220k docs
  – NTCIR-4 QAC (Japanese): 647 questions over 1998-99 news, 593k docs
  – NTCIR-5 QAC (Japanese): 50 question series over 2000-01 news, 200k docs
  – CJE 2000-01 news: 1.5M docs, 500 questions (NTCIR-5 CLQA)
  – CJE 1998-99 news: 608k docs; 200 JE and 150 CE questions
• Summarization
• Sentiment Analysis

Data Availability

Thank you!

• NTCIR Homepage http://research.nii.ac.jp/ntcir/

Other IR Evaluations • TREC 1992, English http://trec.nist.gov/ • NTCIR 1999, CJKE http://research.nii.ac.jp/ntcir/ • CLEF 2000, European languages http://www.clef-campaign.org/ • FIRE 2008, Indian languages http://www.isical.ac.in/~clia/

Breakdown of Relevant Documents at NTCIR-3 Patent
(average per topic, A-rigid relevant)
[Chart: relevant documents found by human patent professionals using multiple commercial services (ISJ: Interactive Search and Judge) versus those found by system pooling (top 30 retrieved documents per topic); segment averages of 14.2, 20.0, and 11.0 documents per topic.]

• The tendency varies with the topics.
• Humans did better on topics that fit an IPC class or F-term, and/or where the relevant documents carry the critical information only in non-searchable images in the figures.
• To obtain the same pool exhaustivity without ISJ, the top 1,000 documents had to be pooled (see the sketch below).
• ISJ increased pool exhaustivity without skewing the system rankings.
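A minimal sketch of how a judgment pool is built from run rankings and how its exhaustivity could be measured against a set of known relevant documents (the function and variable names are illustrative):

```python
def build_pool(runs, depth):
    """Union of the top-`depth` documents per topic across all runs.

    runs: list of {topic_id: [doc ids in ranked order]} dictionaries.
    Returns {topic_id: set of pooled doc ids}.
    """
    pool = {}
    for run in runs:
        for topic, ranking in run.items():
            pool.setdefault(topic, set()).update(ranking[:depth])
    return pool

def exhaustivity(pool, known_relevant):
    """Per-topic fraction of known relevant documents covered by the pool."""
    return {topic: len(pool.get(topic, set()) & relevant) / len(relevant)
            for topic, relevant in known_relevant.items() if relevant}

# e.g. compare a depth-30 system pool against a depth-1000 pool, taking the
# ISJ judgments as (part of) the known relevant set.
```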

# Rel Docs per Topic, per Search Type
[Chart: number of relevant documents (A-rigid relevant, B-partial relevant) for each topic, broken down by search type.]