Data Warehouses for the Long-term Preservation of Institutional Electronic Records and Databases

Archival methods have been put under great strain by the growing amount of institutional information stored in digital format. Although the goals and principles of the discipline remain sound, their application to the new media and information structures is the subject of much debate. The research domain addressed in this project is the archiving of the data records produced by the regular activity of an institution, which are usually kept in a DBMS.

At first, the dominant approaches to their preservation were either too closely tied to paper-based procedures or unfeasible. Despite their value from a museum perspective, projects to preserve both the data and the tools needed to read it in the original format, including the hardware and the several layers of software (operating systems, DBMS, applications) in the appropriate versions, were soon recognised as an unfeasible way to meet archival goals. The same happened to projects that simulated the old machines. The third alternative is collectively labeled migration, though it covers many variants.

Two research areas are essential for this project. The first has focused on describing documents so that they can be retrieved across the boundaries of domain modelling, document nature and storage technology, and has grown along with the Web. The second has its roots in archivists' concerns about the fragility and opacity of digital materials, and has a broader research agenda fueled by the needs of organisations and by increased awareness that new approaches are required.

From a pragmatic viewpoint, the first line has produced several standards for adding metadata to current digital objects (Dublin Core for generic objects, RDF for Web objects and EAD for digital archive objects), and has thus contributed partial solutions to the description problem. However, the general goal of building digital archives needed a more comprehensive treatment, which resulted in the OAI standard [4].

The second line has the more ambitious goal of dealing with the general problem of preserving the wealth of information that is being generated in, or converted to, digital format. Several projects have dealt either with fundamental models for integrating preservation into the management of current records or with solutions to the concrete problems that arise in a specialised domain [5].

The process of specifying a data warehouse (DW) can be used to build an information asset that is both faithful to the original data and organised so that it can be used in the future without the complexity of the original system. Although the primary intention of a DW has been to support management decisions with flexible and relevant data (a goal that is more typical of current archives), the tendency to add more and more data to it has turned the DW into the most valuable repository of large organisations. In recognition of this, recent recommendations on DW design stress the importance of a global, process-centered analysis of the organisation, resulting in the clear establishment of a DW bus architecture in which dimensions are first identified and then reused across the several stars that make up the dimensional model. The data considered relevant includes all basic facts, to enable arbitrary future queries. Knowledge of techniques for handling changing dimensions without information loss is also central to giving the DW archival value.
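As an illustration of handling a changing dimension without information loss, the following minimal Python sketch keeps every historical version of a dimension row by closing the current version and appending a new one (often called a "Type 2" slowly changing dimension). It is not the project's method: the `EmployeeDimRow` structure, the `apply_change` and `department_on` helpers, and the sample employee data are all hypothetical and chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class EmployeeDimRow:
    # Hypothetical dimension row; column names are illustrative assumptions.
    surrogate_key: int
    employee_id: str                  # natural key from the operational system
    department: str
    valid_from: date
    valid_to: Optional[date] = None   # None marks the current version


def apply_change(rows: List[EmployeeDimRow], employee_id: str,
                 new_department: str, change_date: date) -> List[EmployeeDimRow]:
    """Close the current version and append a new one; history is never overwritten."""
    current = next((r for r in rows
                    if r.employee_id == employee_id and r.valid_to is None), None)
    if current is not None:
        if current.department == new_department:
            return rows               # attribute unchanged, nothing to record
        current.valid_to = change_date
    next_key = max((r.surrogate_key for r in rows), default=0) + 1
    rows.append(EmployeeDimRow(next_key, employee_id, new_department, change_date))
    return rows


def department_on(rows: List[EmployeeDimRow], employee_id: str,
                  day: date) -> Optional[str]:
    """Return the department that was valid for the employee on the given day."""
    for r in rows:
        if (r.employee_id == employee_id and r.valid_from <= day
                and (r.valid_to is None or day < r.valid_to)):
            return r.department
    return None


# Made-up sample data: one employee who changes department once.
dim: List[EmployeeDimRow] = []
apply_change(dim, "E042", "Payroll", date(1998, 1, 1))
apply_change(dim, "E042", "Recruitment", date(2001, 6, 1))
print(department_on(dim, "E042", date(2000, 3, 15)))
print(department_on(dim, "E042", date(2002, 1, 1)))
```

Because the old row is closed rather than updated, a query about any past date can still be answered from the dimension, which is the property that matters for archival use.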

The project requires the effort of an interdisciplinary team with know-how in archives, information systems and data modelling. This team's previous experience in developing models and prototypes of multimedia databases with a strong metadata component, following the General International Standard Archival Description (ISAD(G)), a uniform multilevel approach to describing archival documents, makes it particularly suited to the new goal proposed in this project. The existence of a concrete archive in the Human Resources department of a large institution, including a complex database undergoing a deep reorganisation that implies archiving the old system, is a strong practical motivation for the research outlined.

The project explores the application of DW technology to the preservation of complex electronic records.

In more detail, the project will study:
- properties that the DW must possess, according to the standards already established (such as the Open Archives Initiative);
- rules of transformation from operational systems into the DW, adopting a process-centric yet integrated view, that guarantee those properties;
- an XML version of the DW, even though the DW model is already very simple and thus fulfills the platform independence required by long-term preservation;
- metadata needs, other than the data in the operational database, to preserve the meaning of the processes, most often just implicit in the data but present in organisational procedures and in software development documentation;
- application to the human resources system of a concrete institution, now undergoing an important reorganisation;
- assessment of the results obtained with the help of external experts.
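
To make the idea of an XML version of the DW more concrete, the sketch below serialises one star (its dimension rows and fact rows) to XML with Python's standard `xml.etree.ElementTree` module. This is only an assumption about what such a serialisation could look like: the element names (`star`, `dimension`, `row`, `facts`, `fact`), the `star_to_xml` function and the payroll sample data are hypothetical and do not come from the project. The sketch also assumes that column names are valid XML element names.

```python
import xml.etree.ElementTree as ET


def rows_to_elements(parent, tag, rows):
    """Write each row (a dict of column -> value) as a child element of parent."""
    for row in rows:
        row_el = ET.SubElement(parent, tag)
        for column, value in row.items():
            # Assumption: every column name is a valid XML element name.
            ET.SubElement(row_el, column).text = str(value)


def star_to_xml(star_name, dimensions, facts):
    """Serialise one star (dimensions plus fact table) to an XML string."""
    root = ET.Element("star", name=star_name)
    for dim_name, rows in dimensions.items():
        dim_el = ET.SubElement(root, "dimension", name=dim_name)
        rows_to_elements(dim_el, "row", rows)
    facts_el = ET.SubElement(root, "facts")
    rows_to_elements(facts_el, "fact", facts)
    return ET.tostring(root, encoding="unicode")


# Made-up sample data for a hypothetical payroll star.
dimensions = {
    "employee": [{"employee_key": 1, "name": "A. Silva"}],
    "month": [{"month_key": 199801, "year": 1998, "month": 1}],
}
facts = [{"employee_key": 1, "month_key": 199801, "gross_salary": 1500.0}]

xml_text = star_to_xml("payroll", dimensions, facts)
print(xml_text)
```

The output is plain text that any XML parser can read back, independently of the DBMS that originally held the data; a real archival format would additionally need a schema and descriptive metadata.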

Project Duration


Project Leader



Universidade do Minho and Instituto dos Arquivos Nacionais Torre do Tombo


Fundação para a Ciência e Tecnologia

Principal Investigator (CSIG/INESC TEC)

Gabriel David


