Evaluation of Internet resources: applications of bibliometric techniques. How to map the Web? An experiment on biblio-scientometric hosts.
Hervé ROSTAING, Eric BOUTIN, Bruno MANNINA
CRRM, Université Aix-Marseille, France
Lepont, Université Toulon-Var, France
Evaluation of Internet resources
● Data collection
● Quantitative analysis and data set selection for mapping analysis
● Qualitative analysis: network mapping
09 July 1999, CYBERMETRICS’99, Colima Univ, Hervé Rostaing
Data collection
● Query to the AltaVista search engine: scientometri* or bibliometri* or scientometry or bibliometry
 → 3518 URLs found
 → 1010 URLs displayed
● Locating the HTML pages with the Auresys robot (CRRM, http://193.51.109.166)
 – submit the same query to AltaVista
 – propagate the search from the links present in the 1010 displayed pages
 – build a local database of HTML pages responding to the query
 → collect the HTML pages and get past the AltaVista display limit
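A page "responds to the query" when it contains one of the query terms. A minimal Python sketch of that test, assuming that `*` denotes right truncation (any word continuation) and that matching is case-insensitive on whole words; the names `QUERY_TERMS`, `term_to_regex` and `responds_to_query` are hypothetical, and AltaVista's own tokenization rules are not reproduced.

```python
import re

# Query terms from the slide; a trailing "*" marks right truncation (assumed semantics).
QUERY_TERMS = ["scientometri*", "bibliometri*", "scientometry", "bibliometry"]

def term_to_regex(term: str) -> str:
    """Translate one query term into a regex fragment."""
    if term.endswith("*"):
        # Truncated term: the stem followed by any word characters.
        return r"\b" + re.escape(term[:-1]) + r"\w*"
    # Exact term: whole word only.
    return r"\b" + re.escape(term) + r"\b"

# The terms are OR-ed together and matched case-insensitively.
QUERY_RE = re.compile("|".join(term_to_regex(t) for t in QUERY_TERMS), re.IGNORECASE)

def responds_to_query(text: str) -> bool:
    """True if the page text contains at least one query term."""
    return QUERY_RE.search(text) is not None

print(responds_to_query("Scientometrics and informetrics"))  # True
print(responds_to_query("Bibliometric indicators"))          # True
print(responds_to_query("Econometrics"))                     # False
```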
Auresys propagating search
Figure: Auresys propagating search from the AltaVista URL listing. Depth 0: Auresys gets the HTML pages located by AltaVista. Depth 1: Auresys visits the first level of links. Depth 2: second level of links. Legend: page already visited; page not responding to the query; page not found or host unavailable.
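The propagation in the figure is a breadth-first traversal by link depth. A minimal sketch, assuming an in-memory stand-in for the Web (a hypothetical dict mapping each URL to its text and outgoing links, or to `None` when the page is not found or the host is unavailable) and assuming that links are followed only from pages that respond to the query; `propagate_search` is a hypothetical name, not the Auresys code.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical in-memory "Web": URL -> (page text, outgoing links),
# or None when the page is not found / the host is unavailable.
Web = Dict[str, Optional[Tuple[str, List[str]]]]

def propagate_search(seeds: List[str], web: Web,
                     matches: Callable[[str], bool],
                     max_depth: int = 3) -> Dict[str, object]:
    """Breadth-first propagation from the search-engine listing (depth 0)."""
    visited = set()
    found: Dict[str, int] = {}        # URL -> depth at which it was found
    not_matching: List[str] = []
    unavailable: List[str] = []
    queue = deque((url, 0) for url in seeds)
    while queue:
        url, depth = queue.popleft()
        if url in visited:            # page already visited
            continue
        visited.add(url)
        page = web.get(url)
        if page is None:              # page not found or host unavailable
            unavailable.append(url)
            continue
        text, links = page
        if not matches(text):         # page not responding to the query
            not_matching.append(url)
            continue                  # assumption: no propagation from it
        found[url] = depth
        if depth < max_depth:
            queue.extend((link, depth + 1) for link in links)
    return {"found": found, "not_matching": not_matching, "unavailable": unavailable}
```

For example, with `web = {"a": ("bibliometrics", ["b", "c", "a"]), "b": ("scientometric maps", ["d"]), "c": ("cooking", ["e"]), "d": None}` and seed `["a"]`, pages `a` (depth 0) and `b` (depth 1) are found, `c` does not respond to the query and `d` is unavailable.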
Auresys storage of HTML pages
● Storing the set of HTML pages after:
 – retaining only pages responding to the query
 – removing duplicate visited pages
 – organizing the data: pages classified according to
 • host
 • Internet domain
 • relevance index (WAIS index)
 • page category: form action, directory, text, normal
 – extracting information to create bibliographic references
→ Bibliometric analysis
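The duplicate-removal and per-host organization steps can be sketched as follows. This assumes that "duplicate" means either the same URL after a simple normalization (lower-cased scheme and host, fragment dropped) or byte-identical content; `normalize_url` and `store_pages` are hypothetical names, and the WAIS relevance index and page categories are not modelled.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url: str) -> str:
    """Lower-case scheme and host, drop the #fragment (assumed normalization)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", parts.query, ""))

def store_pages(pages: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Drop duplicate pages, then group the remaining URLs by host."""
    seen_urls = set()
    seen_digests = set()
    by_host: Dict[str, List[str]] = defaultdict(list)
    for url, html in pages:
        norm = normalize_url(url)
        digest = hashlib.md5(html.encode("utf-8")).hexdigest()
        if norm in seen_urls or digest in seen_digests:
            continue  # same URL or identical content already stored
        seen_urls.add(norm)
        seen_digests.add(digest)
        host = urlsplit(norm).hostname or ""
        by_host[host].append(norm)
    return dict(by_host)
```

Treating identical content as a duplicate also drops mirror copies on other hosts; whether the original procedure did so is not stated.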
Auresys bibliographic reference (example record)
TIT: PROGRAMME
IND: 28
LNG: english
NMP: 2230
TAG: Name: GENERATOR, Content: Mozilla/4.03 [en] (Win95; I) [Netscape];
NIM: 0
MCT: scientometri;
DOM: fr
NMO: 1
DTR: Mon Oct 26 21:59:33 1998
DMO: Tue Aug 4 18:18:38 1998
NFS: 436
URL: http://cournot.u-strasbg.fr/divers/apr98.html
HST: cournot.u-strasbg.fr
HXT: www.business.auc.dk;
ABS: and Public Science, Research Policy, Vol.26, pp.317-330. to Laurent BACH: Evalu of large Research Programmes * Bach L. et al., 1995 "Evaluation of the economi effects of BRITE-EURAM programmes on the European industry" Scientometrics,...
AIN: http://cournot.u-strasbg.fr/divers/;
AEX: http://www.business.auc.dk/homepage.html;
IMG: None
PRF: 0
TFI: 13834
TST: 12711
MLS: None
WAY: cournot.u-strasbg.fr/divers/apr98.html;
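A record laid out one field per line, as above, can be read back into a dictionary for bibliometric counting. A minimal sketch, assuming each line is `TAG: value` with a three-letter tag and that link and host fields hold `;`-separated lists; the set `MULTI_VALUED` and the name `parse_record` are hypothetical, since the slide does not document the field semantics.

```python
import re
from typing import Dict, List, Union

# One field per line: a three-letter tag, a colon, then the value.
FIELD_RE = re.compile(r"^([A-Z]{3})\s*:\s*(.*)$")

# Hypothetical choice of fields treated as ';'-separated lists.
MULTI_VALUED = {"MCT", "HXT", "AIN", "AEX", "WAY"}

def parse_record(text: str) -> Dict[str, Union[str, List[str]]]:
    """Parse a 'TAG: value' record into a dictionary."""
    record: Dict[str, Union[str, List[str]]] = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line.strip())
        if match is None:
            continue  # skip lines that are not 'TAG: value'
        tag, value = match.groups()
        if tag in MULTI_VALUED:
            record[tag] = [v.strip() for v in value.split(";") if v.strip()]
        else:
            record[tag] = value.strip()
    return record
```

Only the first colon after the tag is treated as the separator, so values such as URLs keep their own colons.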
Quantitative analysis: selecting the data set for qualitative analysis

Depth | Time spent | Visited pages | Found pages | Found hosts (A set) | Cited hosts | Cited hosts from A (B set) | Citing hosts from B
0 | 13 h | 1010 | 421 | 299 | 1189 | 64 | 76
1 | 21 h | 4029 | 501 | 315 | 1249 | 67 | 89
2 | 44 h | 12612 | 597 | 321 | 1367 | 83 | 93
3 | 149 h | 37529 | 783 | 331 | 1785 | 97 | 97

Other figures on the slide: 80; 388 pages
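The table's columns describe nested host sets. A minimal sketch of one reading of them, assuming: A = hosts of the found pages; cited hosts = hosts linked from found pages, self-links excluded; B = cited hosts that also belong to A; "citing hosts from B" = hosts of A citing at least one host of B. These readings and the name `host_sets` are assumptions, not the authors' documented procedure.

```python
from typing import Dict, List, Set, Tuple

def host_sets(pages: List[Tuple[str, List[str]]]) -> Dict[str, Set[str]]:
    """pages: (host of a found page, hosts it links to)."""
    a_set = {host for host, _ in pages}                       # found hosts
    cited = {c for host, targets in pages for c in targets if c != host}
    b_set = cited & a_set                                     # cited hosts from A
    citing_b = {host for host, targets in pages
                if any(c in b_set and c != host for c in targets)}
    return {"A": a_set, "cited": cited, "B": b_set, "citing_B": citing_b}

pages = [
    ("crrm.univ-mrs.fr", ["www.asis.org", "www.yahoo.com"]),
    ("www.asis.org", ["www.asis.org"]),                       # self-link only
    ("www.cindoc.csic.es", ["crrm.univ-mrs.fr"]),
]
sizes = {name: len(s) for name, s in host_sets(pages).items()}
print(sizes)  # {'A': 3, 'cited': 3, 'B': 2, 'citing_B': 2}
```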
Hosts holding numerous pages

Complete data set (pages per host):
CRRM.UNIV-MRS.FR: 39
HUB.IB.HU-BERLIN.DE: 17
WWW.THE-SCIENTIST.LIBRARY.UPENN.ED…: 16
WWW.CHEM.UVA.NL: 16
GOPHER.RZ.UNI-DUESSELDORF.DE: 15
WWW.INFORMATIK.UNI-TRIER.DE: 12
WWW.CINDOC.CSIC.ES: 12
TPAC.GCATT.GATECH.EDU: 12
WWW.DGPS.DE: 11
WWW.UNI-TRIER.DE: 10
WWW.ASIS.ORG: 10
SAHARA.FSW.LEIDENUNIV.NL: 10
WWW.ENSSIB.FR: 9
WWW.SRI.COM: 8
SYY.OULU.FI: 8
SHERLOCK.BERKELEY.EDU: 8
AI.IIT.NRC.CA: 8
others: 562

Selected data set (pages per host):
unselected hosts: 397
CRRM.UNIV-MRS.FR: 39
HUB.IB.HU-BERLIN.DE: 17
WWW.THE-SCIENTIST.LIBRARY.UPE…: 16
WWW.CHEM.UVA.NL: 16
WWW.INFORMATIK.UNI-TRIER.DE: 12
WWW.CINDOC.CSIC.ES: 12
WWW.ASIS.ORG: 10
SAHARA.FSW.LEIDENUNIV.NL: 10
WWW.ENSSIB.FR: 9
WWW.SRI.COM: 8
SYY.OULU.FI: 8
SHERLOCK.BERKELEY.EDU: 8
AI.IIT.NRC.CA: 8
WWW.RAND.ORG: 7
WWW.LIB.NCSU.EDU: 7
WWW.ASLIB.CO.UK: 7
WWW-SLIS.LIB.INDIANA.EDU: 7
others: 185
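Both charts rank hosts by page count and lump the tail into "others"; the selected-data-set chart also pools pages of hosts outside the selection. A minimal sketch of that tally, assuming one host name per stored page; `tally` and its `selected` parameter are hypothetical names.

```python
from collections import Counter
from typing import Iterable, List, Optional, Set, Tuple

def tally(page_hosts: Iterable[str], n: int,
          selected: Optional[Set[str]] = None) -> List[Tuple[str, int]]:
    """Pages per host: the n largest hosts, then an 'others' bucket.

    If `selected` is given, pages of hosts outside it are pooled first
    under 'unselected hosts'.
    """
    counts = Counter(page_hosts)
    result: List[Tuple[str, int]] = []
    if selected is not None:
        unselected = sum(c for h, c in counts.items() if h not in selected)
        counts = Counter({h: c for h, c in counts.items() if h in selected})
        result.append(("unselected hosts", unselected))
    top = counts.most_common(n)
    result.extend(top)
    result.append(("others", sum(counts.values()) - sum(c for _, c in top)))
    return result

hosts = ["crrm"] * 3 + ["asis"] * 2 + ["sri", "oulu"]
print(tally(hosts, 2))                              # [('crrm', 3), ('asis', 2), ('others', 2)]
print(tally(hosts, 1, selected={"crrm", "asis"}))   # [('unselected hosts', 2), ('crrm', 3), ('others', 2)]
```

With this layout, the bars of each chart add up to the total number of pages, which matches the two lists above (both sum to 783, the number of found pages at depth 3).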
The most often cited hosts

Complete data set:
none: 506
CRRM.UNIV-MRS.FR: 22
SAHARA.FSW.LEIDENUNIV.NL: 13
WWW.PSYCHOLOGIE.UNI-TRIER.DE: 11
WWW.PSYCHOLOGIE.UNI-FREIBURG.DE: 11
WWW.PSYCHOLOGIE.HU-BERLIN.DE: 11
UNION.NCSA.UIUC.EDU: 11
EZINFO.UCS.INDIANA.EDU: 11
XXX.LANL.GOV: 10
WWW.ISINET.COM: 10
WWW.ELSEVIER.NL: 10
WWW.W3.ORG: 9
WWW.INDIANA.EDU: 9
WWW.DB.DK: 9
WWW.ASIS.ORG: 9
WWW.ADOBE.COM: 9
WWW.YAHOO.COM: 8
WWW.UMU.SE: 8
WWW.NLC-BNC.CA: 8
WWW.IIT.NRC.CA: 8
WWW.CORPSERV.NRC.CA: 8
WWW.CINDOC.CSIC.ES: 8
INFO.LIB.UH.EDU: 8
others: 2437

Selected data set:
none or unselected hosts: 671
CRRM.UNIV-MRS.FR: 22
SAHARA.FSW.LEIDENUNIV.NL: 13
EZINFO.UCS.INDIANA.EDU: 11
WWW.ISINET.COM: 10
WWW.ASIS.ORG: 9
WWW.UMU.SE: 8
WWW.NLC-BNC.CA: 8
WWW.CINDOC.CSIC.ES: 8
INFO.LIB.UH.EDU: 8
WWW.UNI-BIELEFELD.DE: 7
WWW.THE-SCIENTIST.LIBRARY.UP…: 7
WWW-SLIS.LIB.INDIANA.EDU: 7
WWW.OCLC.ORG: 6
WWW.IB.HU-BERLIN.DE: 6
WWW.CHEM.UVA.NL: 6
SHUM.CC.HUJI.AC.IL: 6
SHERLOCK.BERKELEY.EDU: 6
COOMBS.ANU.EDU.AU: 6
others: 157
Hosts having a great citation activity

Complete data set:
WWW.CINDOC.CSIC.ES: 71
TORNADE.ERE.UMONTREAL.CA: 31
CRRM.UNIV-MRS.FR: 30
WWW.UNI-BIELEFELD.DE: 21
OLYMPE.SCINFO.U-NANCY.FR: 15
WWW.STEDETSOMIKKEER.DK: 11
SUNSITE.INFORMATIK.RWTH-AACHEN.D…: 9
SHERLOCK.BERKELEY.EDU: 8
LUCIEN.SIMS.BERKELEY.EDU: 8
WWW.PASTEUR.FR: 7
WWW.CHEMHERITAGE.ORG: 6
WWW.ASIS.ORG: 5
WWW.VALLESNET.ORG: 5
WWW.CNAM.FR: 5
WWW.INFORMATIK.UNI-TRIER.DE: 4
WWW.BIOCHEMSOC.ORG.UK: 4
WWW.YAHOO.COM.SG: 4
WWW.PHILOS.RUG.NL: 4
others: cite fewer than 4 hosts

Selected data set:
WWW.CINDOC.CSIC.ES: 71
TORNADE.ERE.UMONTREAL.CA: 30
CRRM.UNIV-MRS.FR: 24
WWW.UNI-BIELEFELD.DE: 21
SUNSITE.INFORMATIK.RWTH-AACHEN.DE: 9
WWW.PASTEUR.FR: 7
SHERLOCK.BERKELEY.EDU: 6
WWW.ASIS.ORG: 4
WWW.INFORMATIK.UNI-TRIER.DE: 4
HUB.IB.HU-BERLIN.DE: 3
WWW.IB.HU-BERLIN.DE: 3
WWW.SLIS.INDIANA.EDU: 3
WWW-SLIS.LIB.INDIANA.EDU: 3
COOMBS.ANU.EDU.AU: 2
EZINFO.UCS.INDIANA.EDU: 2
WWW.ENSSIB.FR: 2
WWW.LBORO.AC.UK: 2
others: cite only one host
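The "others cite fewer than 4 hosts" label suggests that citation activity counts the distinct hosts a host links to. A minimal sketch under that assumption, with self-links ignored; `citation_activity` is a hypothetical name and the input is a list of (citing host, cited host) link pairs.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

def citation_activity(links: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Number of distinct hosts cited by each host, most active first."""
    cited_by: Dict[str, Set[str]] = defaultdict(set)
    for citing, cited in links:
        if citing != cited:          # ignore links inside the same host
            cited_by[citing].add(cited)
    # Sort by decreasing activity, then alphabetically for ties.
    return sorted(((h, len(s)) for h, s in cited_by.items()),
                  key=lambda hc: (-hc[1], hc[0]))

links = [("cindoc", "asis"), ("cindoc", "asis"), ("cindoc", "crrm"),
         ("crrm", "asis"), ("crrm", "crrm")]
print(citation_activity(links))  # [('cindoc', 2), ('crrm', 1)]
```

Repeated links to the same host count once, which distinguishes this measure from a raw count of outgoing links.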
Figure: Distribution of languages used for writing HTML pages in biblio-scientometrics. Legend: English, German, French, Spanish, Swedish, Finnish, Dutch, Danish, Catalan, Italian, Middle Frisian, Slovenian, Slovak, Portuguese, Polish, Lithuanian, Korean, Chinese, Hungarian, Czech, Japanese, Serbian, Nepali, Esperanto.
Figure: Distribution of Internet domains for biblio-scientometrics hosts. Legend: EDU, DE, FR, UK, ORG, CA, NL, ES, COM, SE, FI, JP, indefinite, BE, NET, DK, HU, PL, GOV, CU, AU, CZ, BR, SI, MX, INT, IL, CL, AR, LT, IT, CH, SK, RU, KR, HR, AT, SG, MIL, UY, PK, NZ, MY, LU, IN, IE, HK, GR, EE, BY.
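A domain distribution like this one can be computed from the last label of each distinct host name. A minimal sketch, assuming that the "indefinite" category covers hosts given as bare IP addresses (a guess; the slide does not define it); `top_level_domain` and `domain_distribution` are hypothetical names.

```python
import ipaddress
from collections import Counter
from typing import Iterable

def top_level_domain(host: str) -> str:
    """Upper-cased last label of a host name; 'indefinite' for IP addresses."""
    try:
        ipaddress.ip_address(host)
        return "indefinite"  # assumption: numeric hosts carry no domain
    except ValueError:
        pass
    label = host.rstrip(".").rsplit(".", 1)[-1]
    return label.upper() if label else "indefinite"

def domain_distribution(hosts: Iterable[str]) -> Counter:
    """Count distinct hosts per top-level domain."""
    return Counter(top_level_domain(h) for h in set(hosts))

hosts = ["crrm.univ-mrs.fr", "www.asis.org", "193.51.109.166",
         "www.cindoc.csic.es", "www.enssib.fr", "www.asis.org"]
print(sorted(domain_distribution(hosts).items()))
# [('ES', 1), ('FR', 2), ('ORG', 1), ('indefinite', 1)]
```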
Qualitative analysis: mapping the Web for the biblio-scientometrics field
● Main goals of mapping the Web
 – to create a map useful for Web navigation
 – to understand the relationships between hosts belonging to a field of interest
 – to judge whether hosts are central or peripheral relative to the others
 – content analysis is not possible, since it would require full-text analysis in different languages
● Mapping the tangle of links according to the cited and citing positions of hosts
Network mapping: the components
● node = host name
● arc = measure of the relation between two nodes
What relation should be measured?
 • hypertext linking, to map the structure of the Web (central and peripheral hosts)
 • citation phenomena, to detect important (valuable) hosts
Constraint: the mapping software requires a symmetric matrix as input.
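Weighted degree (the sum of a host's arc weights) is one simple way to judge whether a host is central or peripheral on such a network. A minimal sketch, assuming a symmetric host × host matrix stored as plain Python lists; `degree_ranking` is a hypothetical name, and this is not necessarily the measure computed by the authors' mapping software.

```python
from typing import List, Sequence, Tuple

def degree_ranking(hosts: Sequence[str],
                   matrix: Sequence[Sequence[int]]) -> List[Tuple[str, int]]:
    """Rank hosts by weighted degree in a symmetric host x host matrix."""
    n = len(hosts)
    # The input must satisfy the software constraint: a symmetric matrix.
    for i in range(n):
        for j in range(n):
            if matrix[i][j] != matrix[j][i]:
                raise ValueError("matrix must be symmetric")
    # Sum the arc weights of each node, ignoring the diagonal.
    degrees = [sum(matrix[i][j] for j in range(n) if j != i) for i in range(n)]
    return sorted(zip(hosts, degrees), key=lambda hd: (-hd[1], hd[0]))

print(degree_ranking(["a", "b", "c"], [[0, 2, 1], [2, 0, 0], [1, 0, 0]]))
# [('a', 3), ('b', 2), ('c', 1)]
```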
Matrix building
● Cross-citation matrix
 – for the whole data set: 331 citing hosts × 1189 cited hosts
 – only for biblio-scientometric hosts: 331 citing hosts × 97 cited hosts
 – software constraint: 97 citing hosts × 97 cited hosts
The result is a square but asymmetric matrix! Which measure should be used?
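The square 97 × 97 matrix keeps only links whose citing and cited hosts both belong to the retained host list. A minimal sketch, assuming entry (i, j) counts the links from pages of host i to host j; `cross_citation_matrix` is a hypothetical name. As the slide notes, the result is square but generally asymmetric.

```python
from typing import Iterable, List, Sequence, Tuple

def cross_citation_matrix(links: Iterable[Tuple[str, str]],
                          hosts: Sequence[str]) -> List[List[int]]:
    """Square citing x cited matrix restricted to `hosts`."""
    index = {h: k for k, h in enumerate(hosts)}
    n = len(hosts)
    x = [[0] * n for _ in range(n)]
    for citing, cited in links:
        # Links involving hosts outside the retained list are dropped.
        if citing in index and cited in index:
            x[index[citing]][index[cited]] += 1
    return x

links = [("a", "b"), ("a", "b"), ("b", "a"), ("c", "a"), ("a", "z")]
print(cross_citation_matrix(links, ["a", "b", "c"]))
# [[0, 2, 0], [1, 0, 0], [1, 0, 0]]
```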
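Measure calculation
● Sum of the matrix and its transpose: adding the citing hosts × cited hosts matrix $X$ to its transpose $X^t$ (cited hosts × citing hosts) gives a symmetric hosts × hosts matrix $X + X^t$, whose entries are $x_{ij} + x_{ji}$
● Number of links between two hosts: $x_{ij} = x_{ji} = \min\{x_{ij}, x_{ji}\}$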
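The two symmetrization rules can be checked on a small asymmetric matrix. In this minimal sketch (function names hypothetical), `symmetrize_sum` computes $X + X^t$, which doubles the diagonal, and `symmetrize_min` keeps for each pair of hosts the smaller of the two directed counts, that is, only the reciprocated part of the linking.

```python
from typing import List, Sequence

def symmetrize_sum(x: Sequence[Sequence[int]]) -> List[List[int]]:
    """X + X^t: entry (i, j) becomes x_ij + x_ji."""
    n = len(x)
    return [[x[i][j] + x[j][i] for j in range(n)] for i in range(n)]

def symmetrize_min(x: Sequence[Sequence[int]]) -> List[List[int]]:
    """Entry (i, j) and (j, i) both become min(x_ij, x_ji)."""
    n = len(x)
    return [[min(x[i][j], x[j][i]) for j in range(n)] for i in range(n)]

x = [[0, 2, 0], [1, 0, 0], [1, 0, 0]]   # square but asymmetric
print(symmetrize_sum(x))  # [[0, 3, 1], [3, 0, 0], [1, 0, 0]]
print(symmetrize_min(x))  # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```

The sum rule keeps every one-way link (host c's single link to a survives as 1), while the min rule discards it.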
If Ø