Evaluation of Internet resources: Bibliometric techniques applications

How to map the Internet Web? An experiment for the biblio-scientometric hosts

Hervé ROSTAING, Eric BOUTIN, Bruno MANNINA
CRRM, Université Aix-Marseille, France
Lepont, Université Toulon-Var, France

Evaluation of Internet resources
● Data collection
● Quantitative analysis and data set selection for mapping analysis
● Qualitative analysis: network mapping

09 July 1999, CYBERMETRICS’99, Colima Univ, Hervé Rostaing

Data collection
● Query to the Altavista search engine: scientometri* or bibliometri* or scientometry or bibliometry
– > 3518 URLs found
– > 1010 URLs displayed
● HTML pages locations: Auresys robot (CRRM, http://193.51.109.166)
– submit the same query to Altavista
– propagate the search from the links present in the 1010 pages
– build a local database of the HTML pages responding to the query
> collect the HTML pages and exceed the Altavista limit
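The query can be approximated offline with a regular expression, for example to re-check collected pages against it. This is only a sketch: it assumes that `*` is a right-truncation wildcard and that matching ignores case, and the name `matches_query` is hypothetical, not part of Auresys or Altavista.

```python
import re

# Assumption: "*" truncates on the right, so "scientometri*" covers
# "scientometrics", "scientometric", etc. "scientometry" and "bibliometry"
# are listed separately because they do not share that stem.
QUERY = re.compile(
    r"\b(?:scientometri\w*|bibliometri\w*|scientometry|bibliometry)\b",
    re.IGNORECASE,
)


def matches_query(text: str) -> bool:
    """Return True when the text contains at least one query term."""
    return QUERY.search(text) is not None


for sample in ("Journal of Scientometrics", "bibliometry", "econometrics"):
    print(sample, "->", matches_query(sample))
```

Accented spellings such as the French "bibliométrie" are not matched by this pattern, which may or may not reflect how the search engine behaved.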

Auresys propagating search
[Diagram: levels of the propagating search]
– depth 0: Altavista URL listing; Auresys gets the HTML pages located by Altavista
– depth 1: Auresys visits the first links
– depth 2: second level of links
Legend: page already visited; page not responding to the query; page not found or host unavailable
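The propagation by depth can be sketched as a breadth-first traversal. The example below is an illustration, not the Auresys implementation: the in-memory `WEB` dictionary, the crude `matches_query` test and the choice not to follow links from pages that do not respond to the query are all assumptions.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real robot would download the pages over HTTP instead.
WEB = {
    "a": ("bibliometrics course", ["b", "c", "missing"]),
    "b": ("scientometric indicators", ["a", "d"]),
    "c": ("cooking recipes", ["d"]),
    "d": ("bibliometry links", []),
}


def matches_query(text):
    # Crude stand-in for the search-engine query.
    return "bibliometr" in text or "scientometr" in text


def propagate(seeds, max_depth):
    """Breadth-first visit from the seed URLs down to max_depth."""
    found, visited = {}, set()
    skipped = {"already visited": 0, "not responding": 0, "not found": 0}
    queue = deque((url, 0) for url in seeds)
    while queue:
        url, depth = queue.popleft()
        if url in visited:
            skipped["already visited"] += 1
            continue
        visited.add(url)
        page = WEB.get(url)
        if page is None:
            skipped["not found"] += 1
            continue
        text, links = page
        if not matches_query(text):
            skipped["not responding"] += 1  # assumption: links not followed
            continue
        found[url] = depth  # depth at which the page was first reached
        if depth < max_depth:
            queue.extend((link, depth + 1) for link in links)
    return found, skipped


found, skipped = propagate(["a"], max_depth=2)
print(found)
print(skipped)
```

The three counters in `skipped` mirror the three page states shown in the slide legend.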

Auresys storage of HTML pages
● Storage of the HTML pages set after
– only retaining pages responding to the query
– removing duplicate pages visited
– data organization: pages classified according to
• hosts
• internet domain
• relevance index (wais index)
• page categories: form action, directory, text, normal
– extracting information for bibliographic reference creation
Bibliometric analysis
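Two of these storage steps, removing duplicates and classifying pages by host and internet domain, can be sketched with the standard library. This is an assumption-laden illustration: duplicates are detected by exact URL only, the label "indefinite" is assumed to cover numeric addresses, and the function names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def host_and_domain(url):
    """Return (host, top-level domain) for a URL, both in lower case."""
    host = (urlsplit(url).hostname or "").lower()
    domain = host.rsplit(".", 1)[-1] if "." in host else ""
    if not domain.isalpha():
        domain = "indefinite"  # assumption: numeric or missing domain
    return host, domain


def organize(urls):
    """Drop duplicate URLs, then group the remainder by host and by domain."""
    by_host, by_domain = defaultdict(list), defaultdict(list)
    for url in dict.fromkeys(urls):  # keeps first occurrence, preserves order
        host, domain = host_and_domain(url)
        by_host[host].append(url)
        by_domain[domain].append(url)
    return dict(by_host), dict(by_domain)


URLS = [
    "http://cournot.u-strasbg.fr/divers/apr98.html",
    "http://cournot.u-strasbg.fr/divers/",
    "http://www.business.auc.dk/homepage.html",
    "http://cournot.u-strasbg.fr/divers/apr98.html",  # duplicate
    "http://193.51.109.166",
]
by_host, by_domain = organize(URLS)
print({host: len(pages) for host, pages in by_host.items()})
print({domain: len(pages) for domain, pages in by_domain.items()})
```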

Auresys bibliographic reference
TIT : PROGRAMME
IND : 28
LNG : english
NMP : 2230
TAG : Name: GENERATOR, Content: Mozilla/4.03 [en] (Win95; I) [Netscape];
NIM : 0
MCT : scientometri;
DOM : fr
NMO : 1
DTR : Mon Oct 26 21:59:33 1998
DMO : Tue Aug 4 18:18:38 1998
NFS : 436
URL : http://cournot.u-strasbg.fr/divers/apr98.html
HST : cournot.u-strasbg.fr
HXT : www.business.auc.dk;
ABS : and Public Science, Research Policy, Vol.26, pp.317-330. to Laurent BACH: Evalu of large Research Programmes * Bach L. et al., 1995 "Evaluation of the economi effects of BRITE-EURAM programmes on the European industry" Scientometrics,...
AIN : http://cournot.u-strasbg.fr/divers/;
AEX : http://www.business.auc.dk/homepage.html;
IMG : none
PRF : 0
TFI : 13834
TST : 12711
MLS : none
WAY : cournot.u-strasbg.fr/divers/apr98.html;

Quantitative analysis
Selection of data set for qualitative analysis

Depth  Time spent  Visited pages  Found pages  Found hosts (A set)  Cited hosts  Cited hosts from A (B set)  Citing hosts from B
0      13 h        1010           421          299                  1189         64                          76
1      21 h        4029           501          315                  1249         67                          89
2      44 h        12612          597          321                  1367         83                          93
3      149 h       37529          783          331                  1785         97                          97

Other figures shown on the slide: 80; 388 pages
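One way to read these figures is the share of visited pages that respond to the query at each depth. The short calculation below uses the visited and found page counts listed above and assumes that each row reports totals for a crawl run down to that depth; the function name is hypothetical.

```python
# (depth, visited pages, found pages) taken from the figures above.
ROWS = [(0, 1010, 421), (1, 4029, 501), (2, 12612, 597), (3, 37529, 783)]


def yield_per_depth(rows):
    """Return {depth: found pages / visited pages}."""
    return {depth: found / visited for depth, visited, found in rows}


ratios = yield_per_depth(ROWS)
for depth, ratio in ratios.items():
    print(f"depth {depth}: {ratio:.1%} of visited pages respond to the query")
```

Under that assumption the share falls from about 42% at depth 0 to about 2% at depth 3, while the crawl time grows from 13 h to 149 h.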

Hosts holding numerous pages (pages per host)

Complete data set
CRRM.UNIV-MRS.FR 39
HUB.IB.HU-BERLIN.DE 17
WWW.THE-SCIENTIST.LIBRARY.UPENN.ED 16
WWW.CHEM.UVA.NL 16
GOPHER.RZ.UNI-DUESSELDORF.DE 15
WWW.INFORMATIK.UNI-TRIER.DE 12
WWW.CINDOC.CSIC.ES 12
TPAC.GCATT.GATECH.EDU 12
WWW.DGPS.DE 11
WWW.UNI-TRIER.DE 10
WWW.ASIS.ORG 10
SAHARA.FSW.LEIDENUNIV.NL 10
WWW.ENSSIB.FR 9
WWW.SRI.COM 8
SYY.OULU.FI 8
SHERLOCK.BERKELEY.EDU 8
AI.IIT.NRC.CA 8
others 562

Selected data set
unselected hosts 397
CRRM.UNIV-MRS.FR 39
HUB.IB.HU-BERLIN.DE 17
WWW.THE-SCIENTIST.LIBRARY.UPE 16
WWW.CHEM.UVA.NL 16
WWW.INFORMATIK.UNI-TRIER.DE 12
WWW.CINDOC.CSIC.ES 12
WWW.ASIS.ORG 10
SAHARA.FSW.LEIDENUNIV.NL 10
WWW.ENSSIB.FR 9
WWW.SRI.COM 8
SYY.OULU.FI 8
SHERLOCK.BERKELEY.EDU 8
AI.IIT.NRC.CA 8
WWW.RAND.ORG 7
WWW.LIB.NCSU.EDU 7
WWW.ASLIB.CO.UK 7
WWW-SLIS.LIB.INDIANA.EDU 7
others 185

The most often cited hosts

Complete data set
none 506
CRRM.UNIV-MRS.FR 22
SAHARA.FSW.LEIDENUNIV.NL 13
WWW.PSYCHOLOGIE.UNI-TRIER.DE 11
WWW.PSYCHOLOGIE.UNI-FREIBURG.DE 11
WWW.PSYCHOLOGIE.HU-BERLIN.DE 11
UNION.NCSA.UIUC.EDU 11
EZINFO.UCS.INDIANA.EDU 11
XXX.LANL.GOV 10
WWW.ISINET.COM 10
WWW.ELSEVIER.NL 10
WWW.W3.ORG 9
WWW.INDIANA.EDU 9
WWW.DB.DK 9
WWW.ASIS.ORG 9
WWW.ADOBE.COM 9
WWW.YAHOO.COM 8
WWW.UMU.SE 8
WWW.NLC-BNC.CA 8
WWW.IIT.NRC.CA 8
WWW.CORPSERV.NRC.CA 8
WWW.CINDOC.CSIC.ES 8
INFO.LIB.UH.EDU 8
others 2437

Selected data set
none or unselected hosts 671
CRRM.UNIV-MRS.FR 22
SAHARA.FSW.LEIDENUNIV.NL 13
EZINFO.UCS.INDIANA.EDU 11
WWW.ISINET.COM 10
WWW.ASIS.ORG 9
WWW.UMU.SE 8
WWW.NLC-BNC.CA 8
WWW.CINDOC.CSIC.ES 8
INFO.LIB.UH.EDU 8
WWW.UNI-BIELEFELD.DE 7
WWW.THE-SCIENTIST.LIBRARY.UP 7
WWW-SLIS.LIB.INDIANA.EDU 7
WWW.OCLC.ORG 6
WWW.IB.HU-BERLIN.DE 6
WWW.CHEM.UVA.NL 6
SHUM.CC.HUJI.AC.IL 6
SHERLOCK.BERKELEY.EDU 6
COOMBS.ANU.EDU.AU 6
others 157

Hosts having a great citation activity (number of hosts cited)

Complete data set
WWW.CINDOC.CSIC.ES 71
TORNADE.ERE.UMONTREAL.CA 31
CRRM.UNIV-MRS.FR 30
WWW.UNI-BIELEFELD.DE 21
OLYMPE.SCINFO.U-NANCY.FR 15
WWW.STEDETSOMIKKEER.DK 11
SUNSITE.INFORMATIK.RWTH-AACHEN.D 9
SHERLOCK.BERKELEY.EDU 8
LUCIEN.SIMS.BERKELEY.EDU 8
WWW.PASTEUR.FR 7
WWW.CHEMHERITAGE.ORG 6
WWW.ASIS.ORG 5
WWW.VALLESNET.ORG 5
WWW.CNAM.FR 5
WWW.INFORMATIK.UNI-TRIER.DE 4
WWW.BIOCHEMSOC.ORG.UK 4
WWW.YAHOO.COM.SG 4
WWW.PHILOS.RUG.NL 4
others cite less than 4 hosts

Selected data set
WWW.CINDOC.CSIC.ES 71
TORNADE.ERE.UMONTREAL.CA 30
CRRM.UNIV-MRS.FR 24
WWW.UNI-BIELEFELD.DE 21
SUNSITE.INFORMATIK.RWTH-AACHEN.DE 9
WWW.PASTEUR.FR 7
SHERLOCK.BERKELEY.EDU 6
WWW.ASIS.ORG 4
WWW.INFORMATIK.UNI-TRIER.DE 4
HUB.IB.HU-BERLIN.DE 3
WWW.IB.HU-BERLIN.DE 3
WWW.SLIS.INDIANA.EDU 3
WWW-SLIS.LIB.INDIANA.EDU 3
COOMBS.ANU.EDU.AU 2
EZINFO.UCS.INDIANA.EDU 2
WWW.ENSSIB.FR 2
WWW.LBORO.AC.UK 2
others cite only one host
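Rankings of cited hosts and of citation activity can both be derived from a list of host-to-host links. The sketch below uses hypothetical hosts and links, and it assumes that each pair of hosts is counted once and that self-links are ignored; the slides do not state how Auresys counted either case.

```python
from collections import defaultdict

# Hypothetical host-to-host links: (citing host, cited host).
LINKS = [
    ("a.example", "b.example"),
    ("a.example", "c.example"),
    ("a.example", "b.example"),  # repeated link, counted once per host pair
    ("b.example", "c.example"),
    ("c.example", "c.example"),  # self-link, ignored (assumption)
]


def citation_profile(links):
    """Return (number of hosts each host cites, number of hosts citing it)."""
    cites, cited_by = defaultdict(set), defaultdict(set)
    for citing, cited in links:
        if citing != cited:
            cites[citing].add(cited)
            cited_by[cited].add(citing)
    activity = {host: len(targets) for host, targets in cites.items()}
    received = {host: len(sources) for host, sources in cited_by.items()}
    return activity, received


activity, received = citation_profile(LINKS)
print(sorted(activity.items(), key=lambda item: -item[1]))
print(sorted(received.items(), key=lambda item: -item[1]))
```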

Distribution of languages for writing HTML pages in biblio-scientometrics
[Chart: number of HTML pages per language. Legend: ENGLISH, GERMAN, FRENCH, SPANISH, SWEDISH, FINNISH, DUTCH, DANISH, CATALAN, ITALIAN, MIDDLE_FRISIAN, SLOVENIAN, SLOVAK, PORTUGUESE, POLISH, LITHUANIAN, KOREAN, CHINESE, HUNGARIAN, CZECH, JAPANESE, SERBIAN, NEPALI, ESPERANTO]

Distribution of internet domains for biblio-scientometrics hosts
[Chart: number of hosts per internet domain. Legend: EDU, FR, ORG, NL, COM, FI, indefinite, NET, HU, GOV, AU, BR, MX, IL, AR, IT, SK, KR, AT, MIL, PK, MY, IN, HK, EE, DE, UK, CA, ES, SE, JP, BE, DK, PL, CU, CZ, SI, INT, CL, LT, CH, RU, HR, SG, UY, NZ, LU, IE, GR, BY]

Qualitative analysis: mapping the Web for the biblio-scientometrics field
● Main goals for mapping the Web
– to create a map useful for web navigation
– to understand the relationships between hosts belonging to a field of interest
– to judge the central or peripheral character of hosts in relation to others
– content analysis not possible: full-text analysis in different languages
Mapping the links muddle for host cited and citing positions

Network mapping: the components
● node = host name
● arc = relation measure between two nodes

What relation to measure?
• Hypertext linking to map the structure of the Web (central and peripheral hosts)
• Citation phenomena to detect important hosts (valuable hosts)
Constraint: the software needs a symmetrical matrix for input

Matrix building
● Cross citation matrix
– for the whole data set: 331 citing hosts × 1189 cited hosts
– only for biblio-sciento hosts: 331 citing hosts × 97 cited hosts
– software constraint: 97 citing hosts × 97 cited hosts
Square matrix but asymmetric matrix! What measurement?
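The restriction to a square matrix can be illustrated by counting only the links whose citing and cited hosts both belong to the selected set. The hosts and links below are hypothetical, and `cross_citation_matrix` is an illustrative name rather than a function of the software used in the study.

```python
def cross_citation_matrix(links, hosts):
    """Count links between the given hosts; rows cite, columns are cited."""
    index = {host: i for i, host in enumerate(hosts)}
    matrix = [[0] * len(hosts) for _ in hosts]
    for citing, cited in links:
        if citing in index and cited in index:
            matrix[index[citing]][index[cited]] += 1
    return matrix


HOSTS = ["a.example", "b.example", "c.example"]  # hypothetical selected hosts
LINKS = [
    ("a.example", "b.example"),
    ("a.example", "b.example"),
    ("b.example", "a.example"),
    ("c.example", "a.example"),
    ("c.example", "x.example"),  # cited host outside the selection, dropped
]

X = cross_citation_matrix(LINKS, HOSTS)
transpose = [list(column) for column in zip(*X)]
print(X)
print("symmetric:", X == transpose)
```

The result is square because rows and columns share the same host list, yet it is generally not symmetric, since a link from one host to another need not be returned.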

Measure calculation
[Diagram: the citing hosts × cited hosts matrix $X$ added to its transpose $X^{t}$ gives a symmetric hosts × hosts matrix]
● $X + X^{t}$: number of links between two hosts
● $x_{ij} = x_{ji} = \min\{x_{ij}, x_{ji}\}$
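Both measures can be tried on a small square matrix. The sketch below uses plain Python lists and a hypothetical 3 × 3 matrix `X`; the function names are illustrative, and the slides do not say which of the two measures was finally fed to the mapping software.

```python
def symmetrize_sum(x):
    """Return X + X^t: links between two hosts, whatever their direction."""
    n = len(x)
    return [[x[i][j] + x[j][i] for j in range(n)] for i in range(n)]


def symmetrize_min(x):
    """Return the matrix whose (i, j) entry is min(x_ij, x_ji)."""
    n = len(x)
    return [[min(x[i][j], x[j][i]) for j in range(n)] for i in range(n)]


# Hypothetical asymmetric cross-citation matrix (rows cite, columns are cited).
X = [
    [0, 2, 0],
    [1, 0, 0],
    [1, 0, 0],
]

total_links = symmetrize_sum(X)
mutual_links = symmetrize_min(X)
print(total_links)   # [[0, 3, 1], [3, 0, 0], [1, 0, 0]]
print(mutual_links)  # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```

The sum keeps every link, including one-way links, whereas the minimum keeps a relation only when both hosts cite each other, so it yields a sparser network.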

If Ø
