Computers, Environment and Urban Systems 26 (2002) 3–17 www.elsevier.com/locate/compenvurbsys

Establishment of a geographic data dictionary: a case study of UrbIS 2©, the Brussels regional government GIS

Dimos Pantazis (a,*), Bernard Cornélis (b), Roland Billen (b,c), David Sheeren (b)

(a) Technological Education Institute (TEI) of Athens, Samarinas 3, Sepolia, Athens 104 43, Greece
(b) Department of Geomatics, University of Liege, Place du 20-Août, 7-B-4000 Liege, Belgium
(c) Aspirant au Fonds National de la Recherche Scientifique (FNRS), Department of Geomatics, University of Liege, Place du 20-Août, 7-B-4000 Liege, Belgium

Abstract
This article focuses on the establishment of a geographical database dictionary. It describes the process by which the Brussels UrbIS© data dictionary was elaborated and presents the experience gained in urban data management from this practical study. It describes the problems and difficulties encountered, as well as the proposed solutions and perspectives on future improvements to this data dictionary. This expertise could be applied to the development of geographic data dictionaries in similar cases. © 2001 Elsevier Science Ltd. All rights reserved.

Keywords: Urban data management; Database dictionary; Urban database; Geographical database; Metadata; GIS

* Corresponding author. Tel./Fax: +30-1-512-9549. E-mail addresses: [email protected] (D. Pantazis), [email protected] (B. Cornélis), [email protected] (R. Billen), [email protected] (D. Sheeren).
0198-9715/01/$ - see front matter © 2001 Elsevier Science Ltd. All rights reserved. PII: S0198-9715(01)00012-6

D. Pantazis et al. / Comput., Environ. and Urban Systems 26 (2002) 3–17

1. The context
In 1997, a collaborative venture was established between the Centre d’Informatique pour la Région Bruxelloise (CIRB) and the SURFACES laboratory (Service Universitaire de Recherches Fondamentales et Appliquées en Cartographie et en Etudes Spatiales) in the Department of Geomatics at the University of Liège. The objective of this on-going collaboration is to provide scientific support for the improvement, development and re-engineering of Brussels UrbIS© (Brussels Urban Information System).

1.1. Brussels UrbIS
Brussels UrbIS is the geographic information system (GIS) of the Regional Government of Brussels. It is a complex and advanced system, including not only a set of geographic and non-geographic databases related to the Brussels-Capital Region in Belgium, but also software designed to manipulate the databases in order to support the development of different applications. The Brussels UrbIS 2© currently comprises four databases:
1. the ADM base: principally geographic administrative information;
2. the PWN base: the public ways network;
3. the TOP base: topographic surveys; and
4. the FOT base: a collection of aerial photographs.

and the following software:
1. UrbIS SPW or UrbIS WSPW: an application for searching and managing addresses, taking into account any orthographic/spelling errors in road names;
2. UrbIS MGR: a specific application for the management of the Brussels UrbIS databases in the MicroStation environment; and
3. an extension of ArcView© for UrbIS.

1.2. Objectives of this article
To date, the collaboration between CIRB and the SURFACES laboratory has consisted of auditing the organization, creating conceptual data models (CDM), creating a new, up-to-date, complete and analytical geographic database dictionary, and establishing quality assessment methodologies. This paper arose from the observation that, although standards for metadata creation now exist (national, European, American, Canadian, and those of various international organizations, e.g. Federal Geographic Data Committee, 1998; Fig. 1), there is no official standard relating to the structure of a data dictionary. The problem is even more acute when dealing with geographic database dictionaries because of the specific nature of these data and the lack of specific CASE (Computer Aided Systems/Software Engineering) tools. In other words, alphanumerical database dictionary methodologies cannot simply be applied without modification.

Fig. 1. Metadata standards (after Federal Geographic Data Committee, 1998).

1.3. Structure of this article
In the second section, some concepts about geographic database metadata, data dictionaries and their relationships are presented. The third section outlines the steps followed in developing the Brussels UrbIS data dictionary, while the fourth section discusses questions that arose during the process, from assigning a role to the database dictionary to the practical solutions proposed. Section 5 presents examples from the work and the conclusion explores some perspectives for future work.

2. Metadata and data dictionary: definitions
The metadata concept is generally defined as data about data (CEN/TC 287, 1998; Federal Geographic Data Committee, 1998). With regard to geographic information, metadata may concern the representation, the quality, the providers, the cost, etc. of the geographic data. There are many definitions of what a data dictionary is or must be (Antenucci, Brown, Croswell, & Kevany, 1991; Connolly, Begg, & Strachan, 1996; Hansen & Hansen, 1992). A data dictionary is a subset of the entire metadata set. However, in some cases, existing metadata must be modified, or metadata specific to the dictionary must be created. The two main advantages of a data dictionary are its flexibility (in comparison with the complete metadata set), and its ease of access for the end-user.

3. Methodology
The methodology adopted for the development of the Brussels UrbIS geographic database dictionary had seven phases:
1. Bibliographic analysis of metadata standards and case studies of existing geographic data dictionaries.
2. Analysis of a primitive Brussels UrbIS dictionary and related documents (Billen, Pantazis, & Donnay, 1998a; CIRB, 1997a, b, 1999; Donnay & Pantazis, 1997; Pantazis & Donnay, 1996; Service Communal de Belgique, 1996).
3. Development of the dictionary requirements.
4. Development of the dictionary based on the list of requirements, in consultation with the Brussels UrbIS development team. This phase also included the evaluation of an alternative dictionary proposal.
5. Limited distribution of the dictionary to the end-users of Brussels UrbIS for criticism, proposals, comments and validation.
6. Final edition of the dictionary after incorporation of the end-users’ comments.
7. Translation of the dictionary into the other two languages (see next section).

4. Basic problems, needs, questions and solutions

4.1. Why a data dictionary? What are the needs it should cover?
While most elements of the real world have a commonly accepted cognitive meaning, their representation in (geographic) databases requires a precise and unambiguous semantic definition. In the case of a street, does the object correspond to the road axis, to the space between the pedestrian pavements, or to something else? The urban environment is quite complex to formalize in terms of objects and their interrelationships. The objective of the project was to create a document that concisely describes, with a minimal set of metadata, the objects in the geographic databases and that could be used as a starting point for a future metadata set.

4.2. Who will use this database dictionary?
This was a critical question because the answer would be the main influence on the contents of the dictionary. We chose to design the dictionary mainly for the present and future end-users of Brussels UrbIS and for the decision-makers involved in its development, and secondarily for its programmers and developers. Thus, we focused on the comprehensive help that a database dictionary can provide to the end-user. It was also clear that the dictionary would only cover the ADM and PWN databases, since the TOP and FOT databases were not yet ready.

4.3. What information should it hold?
As already stated, the dictionary is a subset of the metadata set, but where should the subset begin and end, and why? A quick and easy answer would be: the definitions of the ‘entities’. But which entities: the entities of the real world or the representations of these entities in the database (points, lines, polygons, etc.)? At which level (natural language/non-specialist end-user level, conceptual, logical, physical) do we have to select the dictionary metadata? How is it possible to balance the contradictory requirements of exhaustiveness and minimal information? The choice of the principal user of the dictionary determined its content: basic information and description of the databases without using complex geoinformatic terms. But which metadata must be selected for the dictionary, and how many (too much information would discourage easy and frequent updating)? After analysing the existing primitive dictionaries, the database objective, user profiles and wishes and the constraints imposed by the real world, we selected the following elements:
1. Base de données: the name of the database to which the object belongs.
2. Type: type of geographical information (point, line, polygon, or any combination of these types). This type is described by text and pictograms.
3. Name: name of the object, which should be unique.
4. Nom: translation of the name into French.
5. Naam: translation of the name into Dutch.
6. Abréviation: abbreviation of the name which allows the user to identify the object directly.
7. Définition de l’objet UrbIS réel: description of the corresponding object in the real world. This textual definition must be illustrated by a scheme, a photo or some kind of visual information that can help the user to understand what the object is.
8. Identifiant: characteristic that allows the user to identify each and every instance of the object. Usually this attribute is a numeric integer.
9. Attributs alphanumériques: all the alphanumeric (non-geometric) attributes describing the occurrences of the object.
10. Définition de la représentation graphique: description of the graphical representation of the object. This description is textual and visual, using relevant examples from the database.
11. Nombre d’instances: number of instances of the object in the database.
12. Sources: origin of the data (document, property, etc.).
13. Historique: description of all the steps leading from the original data source to the final data set of the database.

The following indexes and lists were added at the beginning of the dictionary:
1. list of the geographic objects;
2. alphabetical index;
3. list of the objects that include/are included in other objects;
4. list of the objects that are not included in other objects;
5. list of the objects by type of information (point, line, etc.);
6. list of generalized/specialized objects; and
7. a thematic index.
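As a rough illustration of how the thirteen elements and two of the indexes above could be held in a digital form, the sketch below models one index card as a Python dataclass and derives the alphabetical index and the index by type of information from a list of cards. It is only a minimal sketch under stated assumptions: the class name, the field names, the helper functions and the three sample objects are all hypothetical and are not taken from the Brussels UrbIS dictionary itself.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DictionaryEntry:
    """One index card; the fields mirror the 13 elements listed above.

    All field names are illustrative assumptions, not UrbIS identifiers.
    """
    database: str                  # 1. Base de données (e.g. "ADM", "PWN")
    geometry_types: tuple          # 2. Type: "point", "line", "polygon" or a combination
    name: str                      # 3. Name (should be unique)
    nom: str                       # 4. French name
    naam: str                      # 5. Dutch name
    abbreviation: str              # 6. Abréviation
    real_object_definition: str    # 7. Definition of the real-world object
    identifier: str                # 8. Identifiant
    attributes: list = field(default_factory=list)  # 9. Alphanumeric attributes
    graphic_definition: str = ""   # 10. Definition of the graphical representation
    instance_count: int = 0        # 11. Nombre d'instances
    sources: str = ""              # 12. Sources
    history: str = ""              # 13. Historique


def duplicate_names(entries):
    """Return the names used by more than one entry (empty if all are unique)."""
    seen, duplicates = set(), []
    for entry in entries:
        if entry.name in seen and entry.name not in duplicates:
            duplicates.append(entry.name)
        seen.add(entry.name)
    return duplicates


def alphabetical_index(entries):
    """Alphabetical index of object names."""
    return sorted(entry.name for entry in entries)


def index_by_type(entries):
    """List of objects by type of information (point, line, polygon)."""
    index = defaultdict(list)
    for entry in entries:
        for geometry_type in entry.geometry_types:
            index[geometry_type].append(entry.name)
    return {t: sorted(names) for t, names in index.items()}


# Hypothetical sample cards, invented purely for illustration.
SAMPLE_ENTRIES = [
    DictionaryEntry("PWN", ("line",), "Street axis", "Axe de rue", "Straatas",
                    "SA", "Centre line of a public way.", "integer id"),
    DictionaryEntry("ADM", ("polygon",), "Municipality", "Commune", "Gemeente",
                    "MU", "Territory of one municipality.", "integer id"),
    DictionaryEntry("ADM", ("point", "polygon"), "Building", "Bâtiment", "Gebouw",
                    "BU", "Permanent roofed construction.", "integer id"),
]

if __name__ == "__main__":
    print(alphabetical_index(SAMPLE_ENTRIES))
    print(index_by_type(SAMPLE_ENTRIES))
```

Deriving the indexes from the cards, rather than maintaining them by hand, is one possible way to keep them consistent with the entries whenever an object is added or renamed.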


4.4. Number of dictionaries: one or many?
The dictionary being created was intended for end-users. Was it advisable to develop another one, dedicated to the programmers? To avoid confusion, no other document described as a dictionary has been created. Nevertheless, complementary documents, covering other aspects of the meta-information of Brussels UrbIS, were or will be developed, and it was also necessary to translate the dictionary into several languages.

4.5. Languages
In this highly urbanized region, the linguistic problem is an important one, as three languages (French, Flemish, English) compete. The solution selected was to develop the dictionary in French (working language of our team and of >80% of the population) until its ‘final’ version was ready for translation into the other two languages. This was a delicate task, because some terms have no adequate equivalent in all three languages.

4.6. Evolution of the dictionary
Should the dictionary be an autonomous, independent product or should it be a complementary tool for the database it describes, for the global set of metadata, or for the database conceptual model? Should it be evolving, or static? Why and how should this be achieved? The Brussels UrbIS is not a static system and neither is the proposed dictionary. It is an evolving tool and, like the databases, is constantly being improved and updated. Having a structured document at hand, active end-users can participate in improving it through criticism and suggestions for modification. It is well known that users can more easily point out features they dislike in an existing system (or indicate a missing feature) than describe what they would like in a system that is yet to be developed (the dictionary in our case; Jenkins, 1985, in Brancheau, Vogel, & Wetherbe, 1985). In our specific case, the dictionary that has been built is not complementary to another document (e.g. a conceptual data model, another metadata set, or the database itself) and is thus an autonomous document. Nevertheless, it is not an independent document, for it cannot exist by itself. It is related to the database: modifications to the database imply modifications to the data dictionary. This is a feature common to every dictionary.

4.7. Dictionary format
Should the dictionary be digital and connected directly to the database (thus guaranteeing its automatic updating)? Since the objective of the project was the establishment and refinement of the dictionary, a paper document accompanied by an electronic file was produced. The possibility of developing a dictionary with a database management system is being studied, while a web-based approach was discussed and could represent a future extension.


4.8. Updating of the database dictionary
A database for which the updating process is not exhaustively studied could soon become problematic. This statement applies equally to a database dictionary, and especially to the Brussels UrbIS database dictionary because of its fast and continuous evolution: addition of objects, completion of the dictionary at the users’ request, etc. For a second phase, automated procedures for updating the dictionary have been proposed (active dictionary). At present, updating is performed manually, in parallel with the CDM of the database. Every change in the CDM (addition, modification or deletion of an object, etc.) is carried over to the dictionary after the modification has been implemented at the physical level of the database.

4.9. Costs
Four questions have been analysed:
1. Who will have to pay the cost of the data dictionary? Two options are available: the CIRB could distribute the dictionary freely to the users, and be liable for the whole cost, or it could be sold to the users, in order to cover at least part of the costs. No decision has yet been made, as the dictionary is not yet ready for public dissemination.
2. Will it be free access? In this case, it would be available to anyone without charge in digital format or on paper (the more expensive option). Another option is for it to be free on the Internet, but at reproduction cost for the printed version.
3. Will it be provided only with the database? This is the strictest policy option. If adopted, the dictionary could not be used by potential future users wanting formal information about the contents of the database.
4. Will it be possible to buy it without the database? The question is relevant if the dictionary has a price for the end-user. In this case, it will be possible for anyone to get complete formal information about the databases of Brussels UrbIS at a low individual cost.
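The update procedure of Section 4.8, in which every change to the CDM is carried over to the dictionary once it has been implemented in the database, could be sketched as follows. This is a hypothetical illustration only: the paper does not specify how the proposed automated procedures work, and the `CdmChange` record, the `apply_changes` function and the sample object names are assumptions introduced here. The dictionary is reduced to a simple mapping from object name to definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CdmChange:
    """One change recorded against the conceptual data model (hypothetical format)."""
    action: str            # "add", "modify" or "delete"
    object_name: str
    definition: str = ""   # new definition; unused for "delete"


def apply_changes(dictionary, changes):
    """Carry a list of CDM changes over to a dictionary (name -> definition).

    Returns an updated copy plus a log of what was applied, so that the
    original dictionary is left untouched if any change is inconsistent.
    """
    updated = dict(dictionary)
    log = []
    for change in changes:
        if change.action == "add":
            if change.object_name in updated:
                raise ValueError(f"object already described: {change.object_name}")
            updated[change.object_name] = change.definition
        elif change.action == "modify":
            if change.object_name not in updated:
                raise KeyError(f"object not in dictionary: {change.object_name}")
            updated[change.object_name] = change.definition
        elif change.action == "delete":
            if change.object_name not in updated:
                raise KeyError(f"object not in dictionary: {change.object_name}")
            del updated[change.object_name]
        else:
            raise ValueError(f"unknown action: {change.action}")
        log.append(f"{change.action}: {change.object_name}")
    return updated, log


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    current = {"Street axis": "Centre line of a public way."}
    pending = [
        CdmChange("modify", "Street axis", "Centre line of a public or private way."),
        CdmChange("add", "Block", "Area bounded by street axes."),
    ]
    new_dictionary, applied = apply_changes(current, pending)
    print(applied)
```

Working on a copy and returning a log is a design choice made for this sketch: it leaves a trace that could feed a heading such as the successive adaptations of an object, and it avoids a half-updated dictionary when one change is rejected.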

5. Examples from the Brussels UrbIS 2 dictionary
The different steps in the elaboration of the Brussels UrbIS data dictionary are presented in Fig. 2. At the very beginning, the information available about the objects of the database was insufficient, unclear and incomplete. Therefore, the CIRB wished to devise a database dictionary that would contain the necessary information. Thus, the first convention began and a strong collaboration between the partners was established (Billen, Cornélis, Muller, Pantazis, Thiam, & Donnay, 1998b). Following the initial advice, the CIRB team gathered all the necessary information that was available and then the SURFACES team made a first proposal for the dictionary structure (Billen, Cornélis, Muller, Pantazis, Thiam, & Donnay, 1998a). Fig. 3 is an example of an index card drafted by the SURFACES team.


Fig. 2. The Brussels UrbIS data dictionary timetable.

On this basis, the CIRB team began to elaborate the dictionary structure. During the second convention (Billen, Pantazis, & Donnay, 1998b) and later in 1999, different dictionary projects were submitted to the SURFACES team. Some rules were formulated, especially about the object definitions. The Brussels UrbIS 2 is a complex database that is still being developed; new objects are being created, others are disappearing and the topological relationships have to be updated according to the most recent modifications. Fig. 4 shows an index card from the dictionary in February 2000, just before convention 3. Most of the structure is similar to that proposed at the end of convention 1. Apart from minor changes to vocabulary and sub-structure, few differences occur. Two headings were deleted:

Fig. 3. (a) The proposed index structure for an object, part 1 (Billen, Cornélis, Muller, Pantazis, Thiam, & Donnay, 1998b). (b) The proposed index structure for an object, part 2 (Billen, Cornélis, Muller, Pantazis, Thiam, & Donnay, 1998b).

Fig. 4. (a) An index object—part 1 (CIRB, 1999). (b) An index object—part 2 (CIRB, 1999).


1. Base de données: information about the name of the database is no longer available; and
2. Identifiant: incorporated into the alphanumerical attributes (attributs alphanumériques) section.

Two new headings were created:
1. Principales relations: contains the ‘main’ relationships of an object with other objects; and
2. Adaptations successives: presents the different updates.

The information about the ‘main’ relationship is ambiguous. In fact, it is difficult to say whether a relationship is important or not. Above all, a distinction should be made between structural and topological relationships. Information about the relationships is important metadata, but it is certainly not relevant for most of the dictionary users. Some inconsistencies (definition, sources) were found in this version, but this is only to be expected since the dictionary is still under construction. Finally, convention 3 (Sheeren & Billen, 2000) led to a new set of metadata about quality. Quality control is not part of the dictionary, but the CIRB decided users should have access to it. In order to make it relevant, strict quality criteria have to be established. On the basis of quality standards, in particular the European prestandard (CEN/TC 287, 1998), quality criteria were held to be:
1. positional accuracy and precision;
2. semantic accuracy and precision;
3. completeness;
4. logical consistency;
5. lineage; and
6. temporal precision.

Fig. 5 shows the proposed structure for the quality report.
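To make the six quality criteria concrete, the sketch below shows one possible way of recording a quality report per object and of spotting which criteria are still undocumented. It is an assumption-laden illustration, not a transcription of the structure in Fig. 5: the `QualityReport` class, its methods and the sample statement are hypothetical, and only the six criterion names come from the list above.

```python
from dataclasses import dataclass, field

# The six criteria retained on the basis of the European prestandard.
QUALITY_CRITERIA = (
    "positional accuracy and precision",
    "semantic accuracy and precision",
    "completeness",
    "logical consistency",
    "lineage",
    "temporal precision",
)


@dataclass
class QualityReport:
    """Quality statements for one database object (hypothetical structure)."""
    object_name: str
    assessments: dict = field(default_factory=dict)  # criterion -> free-text statement

    def record(self, criterion, statement):
        """Store a statement for one criterion, rejecting unknown criteria."""
        if criterion not in QUALITY_CRITERIA:
            raise ValueError(f"unknown quality criterion: {criterion!r}")
        self.assessments[criterion] = statement

    def missing(self):
        """Criteria not yet documented, in the order of the list above."""
        return [c for c in QUALITY_CRITERIA if c not in self.assessments]


if __name__ == "__main__":
    # Invented example; the statement is not taken from any UrbIS quality report.
    report = QualityReport("Street axis")
    report.record("lineage", "Digitized from large-scale topographic surveys.")
    print(report.missing())
```

A report that is only partly filled in, as in Fig. 5, then simply corresponds to a non-empty result from `missing()`.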

Fig. 5. (a) Quality report structure partly filled in, part 1. (b) Quality report structure partly filled in, part 2.

6. State of dictionary development and usage
Since our first cooperation, the dictionary has continued to evolve and version 3 has now been produced by CIRB. Following the dissemination of UrbIS 1.9.0 (October 1998), the dictionary was distributed to the 25% of users who wanted to prepare for the UrbIS 2 update. The dictionary was also distributed at the UrbIS advanced users’ training. Thus, 44% of users have received at least one version of the dictionary. In the near future, CIRB will distribute the first version of UrbIS 2 and envisages giving the latest version of the dictionary to all users. The updates of the dictionary follow the conceptual modifications that continue to affect the database. In future, CIRB and other regional and federal institutes should collaborate to prepare a common data dictionary. This new dictionary would be based on the existing UrbIS 2 dictionary.

7. Lessons learnt from developing and using this dictionary
1. The content of the dictionary must basically be determined by the database users’ requirements through a continuous feedback process.
2. The formal definition of the ‘real’ urban objects is quite a complex process; the time needed for this development should not be underestimated.
3. A set of specific indexes in the dictionary is very useful to users for the rapid location of database objects and characteristics.
4. The dictionary must balance the conflicting requirements of exhaustiveness and minimal information: exhaustiveness concerns the entire content of the databases, while minimal information means that the question ‘what exactly is the information held in the DB?’ must be clearly and concisely answered.


8. From a data dictionary to a whole metadata system
Despite the obvious usefulness of the database dictionary, it is clear that ultimately the development of a complete and integrated metadata system for Brussels UrbIS is necessary. But it is wise to proceed gradually, doing first what can most easily be achieved. In our case the starting point was the development of a formal and structured database dictionary, and this was the most reasonable and necessary task. When the development process of the dictionary is completed, strategic decisions (about distribution, copyright, updating, etc.) will be made, and the development of an integrated metadata system will be the next step. We do not expect such a system to replace the data dictionary and its specific role. The principal steps for this type of development are:
1. definition of the responsible team inside CIRB (possibly with an external consultant);
2. choice of the metadata standards that will be used (the creation of a new, modified framework for the development of such a metadata system is also a valuable option);
3. decision about the technological platform that will be used;
4. decision about updating the system;
5. cost–benefit study; and
6. development of the metadata set.

It is important to remember that despite its usefulness (for analysts, programmers, developers, etc.), such a system requires considerable human, financial and technical resources.

9. Conclusions and future prospects
The development of a geographic database dictionary may well be a very complex project. This was certainly the case with the Brussels UrbIS database dictionary. The principal reasons for the difficulty and complexity of such a project are:
1. lack of dictionary standards;
2. complicated and unclear definitions of geographic objects as well as the real-world objects they represent; an object should be strictly defined to prevent confusion and error, especially in an urban context; and
3. lack of specific CASE tools that could automatically create the metadata/dictionary database.

In addition to this ongoing phase (i.e. the development of a quality metadata set), the main aims of the next stage of this project are:
1. a cost–benefit analysis;
2. a study of the digital basis of the dictionary (word processing software, database management system, Internet site, a combination of these, or other) and also of the analogue one (dossier with cards that could be replaced, catalogue, etc.);
3. a study of the controls and processes that could guarantee the simultaneous update of the databases and the dictionary; and
4. an analytical study of existing software for automatic metadata creation, queries and modifications, and of the possibility of using it within the Brussels UrbIS project.

Acknowledgements
The authors would like to thank the CIRB staff for their kind support and close interaction with us to discuss various aspects of the data models and dictionary. Our recognition goes in particular to Mr. Vanderborght, Mr. Van Acker, Mrs. Himpe, Mrs. Roland and Mr. Delande. Special thanks to Chantal Donzé for her editorial help and her constructive criticism and remarks.

References
Antenucci, J., Brown, K., Croswell, P., & Kevany, M. (1991). Geographic information systems. New York, London: Chapman & Hall.
Billen, R., Cornélis, B., Muller, F., Pantazis, D., Thiam, S., & Donnay, J. P. (1998a). Ébauche du dictionnaire de données de la base UrbIS2 ADM, version 0.1. Liège: Département de Géomatique, Université de Liège.
Billen, R., Cornélis, B., Muller, F., Pantazis, D., Thiam, S., & Donnay, J. P. (1998b). Élaboration du modèle conceptuel de données de la base UrbIS2 ADM, version 1. Liège: SURFACES, Université de Liège.
Billen, R., Pantazis, D., & Donnay, J. P. (1998a). Contrat-Cadre de la validation des modèles conceptuels de données de la base Brussels UrbIS 2. Liège: Département de Géomatique, Université de Liège.
Billen, R., Pantazis, D., & Donnay, J. P. (1998b). Validation des modèles conceptuels de données de la base Brussels UrbIS 2®©. Liège: Département de Géomatique, Université de Liège.
Brancheau, J., Vogel, D., & Wetherbe, J. (1985). An investigation of the information center from the user’s perspective. Database, vol. 17, Number 1, Fall 1985, pp. 19–23.
CEN/TC 287. (1998). Geographic information, data description, metadata (ENV 12657). European prestandard.
CIRB. (1997a). Cahier n° 6 du CIRB, Catalogue des utilisateurs de Brussels UrbIS. Brussels: Centre d’Informatique pour la Région Bruxelloise.
CIRB. (1999). Brussels UrbIS®© version 2, Base de données Adm, Dictionnaire de données version 1.0. Brussels: Centre d’Informatique pour la Région Bruxelloise.
CIRB, Comité d’avis pour la cartographie régionale. (1997b). Brussels UrbIS: Historique-Description générale, Objets administratives-linéaire de circulation. Brussels: Centre d’Informatique pour la Région Bruxelloise.
Connolly, Th., Begg, C., & Strachan, A. (1996). Database systems. Wokingham, England: Addison Wesley.
Donnay, J.-P., & Pantazis, D. (1997). Audit de la base de données Brussels UrbIS, Région de Bruxelles-Capitale. Liège: Département de Géomatique, Université de Liège.
Federal Geographic Data Committee. (1998). Content standard for digital geospatial metadata (revised June 1998). Federal Geographic Data Committee (FGDC-STD-001). Available: http://fgdc.er.usgs.gov/, consulted in February 2000.
Hansen, G., & Hansen, J. (1992). Database management and design. Englewood Cliffs, NJ: Prentice-Hall.
Pantazis, D., & Donnay, J.-P. (1996). Conception des SIG: Méthode et formalisme. Paris: Éditions Hermès.
Service Communal de Belgique. (1996). Marché de Services, Cahier spécial des charges, Dossier n° 96F006, Objet de l’Entreprise, Cartographie à grande échelle de la Région de Bruxelles-Capitale. Brussels: Service Communal de Belgique.
Sheeren, D., & Billen, R. (2000). Méthodologie de validation de la qualité des données de la base de données UrbIS 2, ADM. Liège: Département de Géomatique, Université de Liège.