Querying a bioinformatic data sources registry with concept lattices

Nizar Messai, Marie-Dominique Devignes, Amedeo Napoli and Malika Smail-Tabbone




LORIA – UMR 7503 – BP 239, 54506 Vandoeuvre-lès-Nancy
ICCS 2005, Kassel – July 18-22, 2005

Querying a bioinformatic data sources registry with concept lattices – p.1/21

Outline
1. Motivation
2. BioRegistry: data source metadata repository
3. FCA for classifying and querying data sources
4. Ontology-based query refinement
5. Conclusion and future work

1. Motivation
   1.1 Bioinformatic data sources on the web
   1.2 Existing solutions
   1.3 Challenge

1.1 Bioinformatic data sources on the web
Bioinformatic data sources available on the Web:
- 719 in 2005 (171 more than in 2004)
- Diversity of contents (e.g. a particular organism or any organism)
- Different data types (e.g. nucleic or proteic sequences)
- Different data qualities (e.g. update, revision, annotation)
- New data sources keep appearing


1.2 Existing solutions
Thematic portals
- Access to a collection of selected data sources
- Correspond to given points of view
- Limited search capabilities
Structured catalogs
- Bioinformatic data source catalog: DBcat
- Small set of "free text" metadata
- No longer maintained (since 2001)

1.3 Challenge
Improve data source identification through:
- gathering metadata in a structured repository
- taking into account existing domain ontologies
- organising data sources for browsing and querying

2. BioRegistry: data source metadata repository
   2.1 BioRegistry model
   2.2 A subpart of the BioRegistry

2.1 BioRegistry model


BioRegistry: associates metadata (taken from ontologies) with the data sources.
Idea: extract properties of the data sources from these metadata.
A formal context: data sources × properties.

2.2 A subpart of the BioRegistry

Data source properties extracted from the BioRegistry

Data Source      Sequence                     Organism            Manual Revision
Swissprot (S1)   Proteic (PS)                 Any Organism (AO)   Yes
RefSeq (S2)      Nucleic (NS), Proteic (PS)   Any Organism (AO)   Yes
TIGR-HGI (S3)    Nucleic (NS)                 Human (Hu)          No
GPCRDB (S4)      Proteic (PS)                 Any Organism (AO)   Yes
HUGE (S5)        Nucleic (NS), Proteic (PS)   Human (Hu)          No
ENSEMBL (S6)     Nucleic (NS)                 Animal (An)         No
MGDB (S7)        Proteic (PS)                 Mouse (Mo)          No
VGB (S8)         Nucleic (NS)                 Vertebrate (Ve)     No

Ontologies used to assign values to the properties (from NCBI)

Corresponding formal context (rows: sources, columns: metadata properties)

      NS  PS  AO  An  Ve  Hu  Mo  MR
S1     0   1   1   0   0   0   0   1
S2     1   1   1   0   0   0   0   1
S3     1   0   0   0   0   1   0   0
S4     0   1   1   0   0   0   0   1
S5     1   1   0   0   0   1   0   0
S6     1   0   0   1   0   0   0   0
S7     0   1   0   0   0   0   1   0
S8     0   1   0   0   1   0   0   0
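As an illustration, the formal context above can be held in a small Python mapping, together with the two standard FCA derivation operators. This is only a sketch: the names `CONTEXT`, `PROPERTIES`, `extent` and `intent` are chosen here for the example and do not come from the BioRegistry itself. The rows are copied from the binary table as shown.

```python
# Formal context copied from the binary table: source -> properties marked 1.
CONTEXT = {
    "S1": {"PS", "AO", "MR"},
    "S2": {"NS", "PS", "AO", "MR"},
    "S3": {"NS", "Hu"},
    "S4": {"PS", "AO", "MR"},
    "S5": {"NS", "PS", "Hu"},
    "S6": {"NS", "An"},
    "S7": {"PS", "Mo"},
    "S8": {"PS", "Ve"},
}
PROPERTIES = {"NS", "PS", "AO", "An", "Ve", "Hu", "Mo", "MR"}


def extent(props):
    """Sources that have every property in props."""
    return {s for s, owned in CONTEXT.items() if set(props) <= owned}


def intent(sources):
    """Properties shared by every source in sources."""
    shared = set(PROPERTIES)
    for s in sources:
        shared &= CONTEXT[s]
    return shared


# A pair (A, B) is a formal concept when extent(B) == A and intent(A) == B.
humans = extent({"Hu"})          # sources dealing with Human
print(sorted(humans), sorted(intent(humans)))
```

Here the sources having Hu are S3 and S5, and the properties they share are NS and Hu, so ({S3, S5}, {NS, Hu}) is a formal concept of this context.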

3. FCA for classifying and querying data sources
   3.1 Methodology
   3.2 Data source classification
   3.3 Query
   3.4 Data source retrieval algorithm
   3.5 Problem

3.1 Methodology


3.2 Data source classification
Incremental construction of the concept lattices [Godin et al. 1995]:
- Add new data sources (registry updating)
- Insert queries (registry querying)
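The incremental idea can be sketched in a few lines of Python. This is not the algorithm of [Godin et al. 1995], which also maintains the lattice order; it is a simplified stand-in that only keeps the set of concept intents up to date as objects arrive. The names `ROWS`, `ALL_PROPERTIES` and `add_object` are assumptions made for this example, and the rows are copied from the formal context of section 2.2.

```python
# Rows of the formal context (source -> properties marked 1).
ROWS = {
    "S1": {"PS", "AO", "MR"},
    "S2": {"NS", "PS", "AO", "MR"},
    "S3": {"NS", "Hu"},
    "S4": {"PS", "AO", "MR"},
    "S5": {"NS", "PS", "Hu"},
    "S6": {"NS", "An"},
    "S7": {"PS", "Mo"},
    "S8": {"PS", "Ve"},
}
ALL_PROPERTIES = frozenset({"NS", "PS", "AO", "An", "Ve", "Hu", "Mo", "MR"})


def add_object(intents, props):
    """Return the intent family after adding one object (a source or a query).

    The family is closed under intersection, so the new intents are exactly
    the intersections of the new object's properties with existing intents.
    """
    props = frozenset(props)
    return intents | {props & other for other in intents}


# Start from the bottom concept (all properties) and add sources one by one.
intents = {ALL_PROPERTIES}
for source in sorted(ROWS):
    intents = add_object(intents, ROWS[source])

# Registry querying reuses the same operation: the query is one more object.
with_query = add_object(intents, {"NS", "Hu", "MR"})
```

Adding a data source and inserting a query are the same operation in this sketch, which mirrors the two uses listed on the slide.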

3.3 Query
A query is a set of properties.
Example: "Data sources that are manually revised and contain nucleic sequences of the Human organism"
- nucleic sequences (NS)
- human organism (Hu)
- manually revised (MR)
Transform the query into a concept:
Query concept = ({Query}, {nucleic sequences (NS), Human (Hu), Manual Revision (MR)}) = ({Query}, {NS, Hu, MR})

3.4 Data source retrieval algorithm

Insert the query concept into the concept lattice [Carpineto 2000].
Search relevant data sources: a data source is relevant to a query if it shares at least one property with the query.

Step 0: Locate the new query concept in the resulting lattice and begin the result construction:
  result = Ø
Step 1: Get the subsumers of the query concept and continue the result construction:
  result = 1) S3, S5 (Hu,NS), S2 (NS,MR)
Step 2: Move up to the subsumers of the concepts reached at step 1:
  result = 1) S3, S5 (Hu,NS), S2 (NS,MR)
           2) S1, S4 (MR), S6 (NS)
Step 3: A concept with an empty intension is reached: the algorithm ends and returns the result.
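The steps above can be sketched in Python as follows. This is a simplified reading of the slides, not the authors' implementation: instead of building the whole lattice, it computes only the intents of the concepts that subsume the query concept, then climbs level by level through the cover relation. The names `CONTEXT` and `retrieve` are assumptions of this example, and rank 0 (sources matching the whole query) is an addition that stays empty for the query used on the slides.

```python
# Formal context from section 2.2 (source -> properties marked 1).
CONTEXT = {
    "S1": {"PS", "AO", "MR"},
    "S2": {"NS", "PS", "AO", "MR"},
    "S3": {"NS", "Hu"},
    "S4": {"PS", "AO", "MR"},
    "S5": {"NS", "PS", "Hu"},
    "S6": {"NS", "An"},
    "S7": {"PS", "Mo"},
    "S8": {"PS", "Ve"},
}


def retrieve(context, query):
    """Map each relevant source to (rank, properties shared at that rank)."""
    query = frozenset(query)
    # Intents of the subsumers of the query concept: every intersection of
    # the query with source property sets (family closed under intersection).
    intents = {query}
    for props in context.values():
        part = frozenset(props) & query
        intents |= {part & other for other in intents}

    def upper_covers(intent):
        """Direct subsumers: maximal intents strictly included in intent."""
        smaller = [c for c in intents if c < intent]
        return [c for c in smaller if not any(c < d for d in smaller)]

    ranked = {}
    frontier, rank = [query], 0
    while frontier:
        reached = set()
        for intent in frontier:
            if not intent:          # empty intension: nothing shared, stop here
                continue
            for source in sorted(context):
                if intent <= context[source] and source not in ranked:
                    ranked[source] = (rank, sorted(intent))
            reached.update(upper_covers(intent))
        frontier, rank = sorted(reached, key=sorted), rank + 1
    return ranked


result = retrieve(CONTEXT, {"NS", "Hu", "MR"})
for source, (rank, shared) in sorted(result.items(), key=lambda kv: kv[1][0]):
    print(rank, source, shared)
```

With the context above, S2, S3 and S5 come out at rank 1 and S1, S4 and S6 at rank 2, while S7 and S8 share no property with the query and are not returned.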

3.5 Problem
When query properties are not in the context, the result is empty.
Examples:
1- Query concept = ({Query}, {Chicken (Ch)}): result = Ø, although data sources dealing with vertebrates can be interesting
2- Query concept = ({Query}, {Eucaryote (Eu)}): result = Ø, although data sources dealing with animals can be interesting
Idea: ontology-based query refinement

4. Ontology-based query refinement
   4.1 Generalisation refinement
   4.2 Specialisation refinement

4.1 Generalisation refinement

Generalisation refinement: add to the query the ancestors of the considered property in the ontology, keeping only those that are in the formal context.

Refined query (for the Chicken example) = ({Query}, {Ve, An, AO})
New result = 1) S6 (An)
             1) S8 (Ve)
             1) S1, S2, S4 (AO)
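A minimal sketch of generalisation refinement in Python is given below. The taxonomy in `PARENT` is hypothetical and heavily simplified (a single chain from Any Organism down to Human, Mouse and Chicken); it is only meant to be consistent with the example on the slide and is not the NCBI ontology used by the authors. The sketch also assumes that only properties missing from the formal context are generalised; `generalise` and `ancestors` are names invented for the example.

```python
# Hypothetical, simplified taxonomy: child -> parent.
PARENT = {
    "Eu": "AO",   # Eucaryote  -> Any Organism
    "An": "Eu",   # Animal     -> Eucaryote
    "Ve": "An",   # Vertebrate -> Animal
    "Hu": "Ve",   # Human      -> Vertebrate
    "Mo": "Ve",   # Mouse      -> Vertebrate
    "Ch": "Ve",   # Chicken    -> Vertebrate
}
# Properties that appear as columns of the formal context.
CONTEXT_PROPERTIES = {"NS", "PS", "AO", "An", "Ve", "Hu", "Mo", "MR"}


def ancestors(term):
    """All ancestors of term, from the closest to the root."""
    found = []
    while term in PARENT:
        term = PARENT[term]
        found.append(term)
    return found


def generalise(query):
    """Replace each property missing from the context by its known ancestors."""
    refined = set()
    for prop in query:
        if prop in CONTEXT_PROPERTIES:
            refined.add(prop)
        else:
            refined.update(a for a in ancestors(prop) if a in CONTEXT_PROPERTIES)
    return refined


print(sorted(generalise({"Ch"})))
```

For the query on Chicken, the ancestors are Vertebrate, Animal, Eucaryote and Any Organism; Eucaryote is dropped because it is not a property of the formal context, which leaves {Ve, An, AO} as on the slide.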

4.2 Specialisation refinement

Specialisation refinement: add to the query the descendants of the considered property in the ontology, keeping only those that are in the formal context.

Refined query (for the Eucaryote example) = ({Query}, {An, Ve, Hu, Mo})
New result = 1) S6 (An)
             1) S8 (Ve)
             1) S3, S5 (Hu)
             1) S7 (Mo)
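Specialisation refinement can be sketched the same way, walking down the ontology instead of up. As before, the taxonomy in `CHILDREN` is a hypothetical simplification chosen to match the slide's example, not the NCBI ontology itself, and the sketch assumes that only properties missing from the formal context are specialised. The names `descendants` and `specialise` are invented for the example.

```python
# Hypothetical, simplified taxonomy: parent -> children.
CHILDREN = {
    "AO": ["Eu"],               # Any Organism -> Eucaryote
    "Eu": ["An"],               # Eucaryote    -> Animal
    "An": ["Ve"],               # Animal       -> Vertebrate
    "Ve": ["Hu", "Mo", "Ch"],   # Vertebrate   -> Human, Mouse, Chicken
}
# Properties that appear as columns of the formal context.
CONTEXT_PROPERTIES = {"NS", "PS", "AO", "An", "Ve", "Hu", "Mo", "MR"}


def descendants(term):
    """All descendants of term in the taxonomy (depth-first)."""
    found = []
    stack = list(CHILDREN.get(term, []))
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(CHILDREN.get(current, []))
    return found


def specialise(query):
    """Replace each property missing from the context by its known descendants."""
    refined = set()
    for prop in query:
        if prop in CONTEXT_PROPERTIES:
            refined.add(prop)
        else:
            refined.update(d for d in descendants(prop) if d in CONTEXT_PROPERTIES)
    return refined


print(sorted(specialise({"Eu"})))
```

For the query on Eucaryote, Chicken is among the descendants but is not a property of the formal context, so the refined query keeps only {An, Ve, Hu, Mo}.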


5. Conclusion and future work
Conclusion
- Classification of data sources according to their metadata
- Identification of relevant data sources for a given query
- Ontology-based query refinement
Future work
- Refine the definition of relevance (taking some preferences into account)
- Define an order for data source composition (case of complex queries)

Thank you for your attention
