Running head: Electronic Archiving
Life: A Huge Archive
Ahmed Laarfi
Florida Institute of Technology
Nov 18, 2017
Abstract
In the age of information, smart devices contribute to an enormous increase in the quantity of information. Traditional archiving cannot keep pace with this explosion. With the invention of the computer, networks, and the Internet, and with Database Systems (DBSs) becoming distributed and reachable everywhere, there is a strong trend toward automating archives. Automation brings many advantages, such as faster access, privacy and security, data accessibility, and distribution of authority between administrative levels. This paper discusses electronic archiving and whether it is a necessity or an administrative luxury. It sheds light on three main topics: general archiving, distributed database systems, and a case study.
Keywords: archiving, distributed database systems, indexing and retrieving data, electronic archive
Life: A Huge Archive
Today, information has become an expensive commodity: whoever owns information owns something of great value. Two types of information are in use. One is confidential and private, such as military, political, and economic documents; the other can be widely shared, such as books, publications, and educational material in general. Mankind cares about information because it is vital to many areas of life: commerce, where it affects competition; science, especially secret research; and even personal records.
By the 1960s, NASA had developed a project connecting groups of U.S. universities by computer networks (Whalen, 2010). This later became the seed from which the Internet grew in the 1970s. All the information produced in the world from its beginnings up to the 1970s is equivalent in size to the information published on the Internet in the thirty years leading up to the start of the third millennium (Spencer, 2012; Silverman, 2012; Meyer, 2009; Fredrick, n.d.; Marsan, 2010). According to Wray (2009), the amount of information available in that year was around 500 billion gigabytes. This amount is hard to imagine: published as books, it would cover the distance between the Earth and Pluto 10 times (Wray, 2009). The amount keeps growing and may multiply yearly. For this reason, our time is called the age of information.
Background
In the era of empires such as Great Britain and France, especially after the great colonial expansion of the 18th century, the need for an archive for each colony emerged. Some historians, however, trace the archive back to a period preceding the age of empires, to the Assyrian civilization in Iraq; for example, the Library of Babel is considered the oldest archive on the planet (Featherstone, 2006). Imperial expansion in the eighteenth century was therefore not the only reason for establishing archives. The need for them existed centuries earlier, as shown by civilizations such as the Pharaonic one, which kept their own archives to manage their states. In those periods, the archive was a place where historical documents were compiled. Books, manuscripts, sculptures in museums, internal and external correspondence, and information on population numbers and their whereabouts were also essential components of the archive. Featherstone (2006) defined it as the collective memory of the nation.
In contemporary history, after the invention of the computer in the 1940s, the development of the Internet by NASA in the early 1960s, and its strong emergence in the 1970s, technicians turned their attention to the digitalization of archives (Cohen, 2004). Although some people believe that the electronic archive is an administrative luxury, Distributed Database Systems (DDBSs) make it a real necessity; Jowfe Oil Company, for example, automated its archive and operates its databases as a shared resource.
Documentation, Indexing, and Archiving
Documentation
The concepts of documentation are the same in regular and electronic archiving, even if different methods are used. Grobovsek (2012) states two important principles, given here in English translation: the scope and characteristics that distinguish the process of documentation as a whole, and the outcome of this process. In other words, documentation includes all activities that fall within the scope of any documentary process, according to the methodology used in documentation and archiving. Examples are the compilation of information from the data sources actually used, which supports reliability, and the means of showing that the data is correct and can be trusted. Zins (2007) explains that "Data is a symbol set that is quantified and/or qualified. Information is a set of significant signs that has the ability to create knowledge" (p. 480).
Therefore, data is one of the components of information. Documenting the information obtained is the first step of archiving.
Classification of Documents or Records: According to the definitions of Pearce-Moses (2005) and Heinrich and Maurer (2000), records can be classified by their activity into three types:

Table 1
Three types of documents

Active Documents: Also called "current records"; these are the documents currently in use.
Inactive Documents: Also called "noncurrent records"; these are no longer in use and are preserved somewhere so that they can be consulted from time to time. They also include, for example, legal or historical documents.
Semi-active Documents: Also called "semicurrent records"; these are not used on a daily or periodic basis, are rarely consulted, and are kept nearby.
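The three-way classification in Table 1 can be sketched in code. The following is a minimal illustration in Python, not a method taken from the cited sources: it assumes that a record's class is decided only by how long ago it was last used, and the two thresholds (30 days and 365 days), like the function name classify_record, are hypothetical values chosen for the example.

```python
from datetime import date, timedelta

# Hypothetical cut-offs: the cited definitions give no numeric thresholds.
ACTIVE_WITHIN = timedelta(days=30)
SEMI_ACTIVE_WITHIN = timedelta(days=365)


def classify_record(last_used: date, today: date) -> str:
    """Return 'active', 'semi-active', or 'inactive' for one record."""
    idle = today - last_used
    if idle <= ACTIVE_WITHIN:
        return "active"        # current records, in daily use
    if idle <= SEMI_ACTIVE_WITHIN:
        return "semi-active"   # rarely consulted, kept nearby
    return "inactive"          # noncurrent, preserved for occasional reference


if __name__ == "__main__":
    today = date(2014, 6, 1)
    for last_used in (date(2014, 5, 20), date(2013, 12, 1), date(2009, 1, 1)):
        print(last_used, classify_record(last_used, today))
```

A real records schedule would usually also weigh legal retention periods and document type, which this sketch leaves out.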
Indexing
It is important to be able to retrieve the information that has been stored. Harman and Candela (1990) mentioned that the earliest known way of indexing was the catalog; now, many electronic methods are in use. Indexing is therefore the most important step in classifying documents and placing them in the archive. Andrea (personal communication, 2014), a library assistant, explained that the indexing systems most commonly relied on at present are the Dewey Decimal System and the Library of Congress System. For distributed data systems, there are several methodologies, depending on the types of databases and programming languages used. Based on the definitions of Pearce-Moses (2005) and Heinrich and Maurer (2000), the following comparison between traditional and electronic archives can be derived:
Electronic Archiving Table 2 A Comparison between Traditional and Electronic Archiving Comparison
Document Sharing Dedicated places
Traditional archive From minutes to hours Some times Minutes to hours Long time and sometimes resent back. Fax machine gives unclear copies. A lot of copies Large and expensive
Takes a time
Document Recovery Missing Documents Document Saving Document Sending
Seconds Not at all Seconds Easy and faster to be sent
Just taking the authority Small and cheap Less time
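The "seconds" that Table 2 gives for electronic document recovery depend on the indexing step described earlier. As a rough illustration, the sketch below builds a small inverted index in Python, the electronic counterpart of a card catalog. It is an assumption made for this example, not the indexing method of any system named in this paper: the function names build_index and search, the document identifiers, and the sample texts are all hypothetical.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?"


def build_index(documents: dict) -> dict:
    """Map each word to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            token = word.strip(PUNCTUATION)
            if token:
                index[token].add(doc_id)
    return dict(index)


def search(index: dict, query: str) -> set:
    """Return ids of documents that contain every word of the query."""
    tokens = [w.strip(PUNCTUATION) for w in query.lower().split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


if __name__ == "__main__":
    docs = {
        "D1": "Payroll report for March",
        "D2": "Personnel report, retired staff",
        "D3": "Payroll archive",
    }
    idx = build_index(docs)
    print(sorted(search(idx, "payroll report")))
```

The index is built once when documents are filed. After that, each lookup reads a few dictionary entries instead of scanning every document, which is where the speed advantage over a manual search comes from.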
Archiving
After documentation and indexing comes the stage of placing the record or document in an accessible place. This stage should be carried out quickly enough to achieve the desired end result on time. Hamill (2009) defined archives as follows: "'archives' [have] three [meanings]. [Archives] can be: 1) the physical records which are collected; 2) the physical space which houses archival records; 3) the department or unit which manages the records." Archiving is thus the last step of the process, the point at which the documents can be found.
Distributed Database Systems
History and Improvement
The networking project is attributed to NASA in the early 1960s, when it was tested on a group of American universities. Before that time, information was stored on computers isolated from one another, which made exchanging it very difficult. In the early 1970s, the Internet came into existence, and the price of Personal Computers (PCs) dropped sharply (Peter, 2003). This led to reliance on exchanging information across networks. With the emergence of divergent networks, it became necessary to manipulate databases from different sources. The subject has continued to develop up to the present.
Data Everywhere
Two things have become important characteristics of data. The first is that data is spread across many places; the second is that most of the data one is permitted to reach can actually be accessed. Some data is sensitive and not open to everyone. Such data may be kept either on isolated computers or on internal networks to which access is restricted. In general, access to this data is administered through database management systems. For example, a user at the University of Colorado can access data held in a library in Japan or Brazil. This is directly related to the concepts of electronic archiving and distributed database management systems together. Goebel (2011) listed some advantages and problems that these systems may involve.

Table 3
Distributed Database Management Systems, retrieved from (Goebel, 2011)

Advantages                                 Problems
Improved performance                       Complexity of design and implementation
Efficiency                                 Data consistency
Extensibility (addition of new nodes)      Safety
Transparency of distribution               Failure recovery
  • Storage data
  • Query execution
Figure 1. Client with Distributed Server Architecture, adapted from (Goebel, 2011)
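The arrangement in Figure 1, in which a client works through one interface while the data sits on several servers, can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not Goebel's (2011) architecture: in-memory dictionaries stand in for database servers, the class names ArchiveNode and DistributedArchive are hypothetical, and records are placed by hashing their key, which is only one of several possible placement schemes.

```python
import hashlib


class ArchiveNode:
    """One storage site; a plain dict stands in for a real database server."""

    def __init__(self, name: str):
        self.name = name
        self.records = {}

    def put(self, key: str, value: str) -> None:
        self.records[key] = value

    def get(self, key: str):
        return self.records.get(key)


class DistributedArchive:
    """Single interface over several nodes; callers never name a node."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        if not self._nodes:
            raise ValueError("at least one node is required")

    def _node_for(self, key: str) -> ArchiveNode:
        # A stable hash so the same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        position = int.from_bytes(digest[:4], "big") % len(self._nodes)
        return self._nodes[position]

    def store(self, key: str, value: str) -> None:
        self._node_for(key).put(key, value)

    def retrieve(self, key: str):
        return self._node_for(key).get(key)


if __name__ == "__main__":
    # Node names are placeholders echoing the locations used in the text.
    sites = [ArchiveNode("colorado"), ArchiveNode("japan"), ArchiveNode("brazil")]
    archive = DistributedArchive(sites)
    archive.store("contract-17", "scanned contract")
    print(archive.retrieve("contract-17"))
```

The caller uses only store and retrieve. Which site holds a record is decided inside the class, so the user stays unaware of where the data is kept.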
Users are not concerned with what lies behind the interface. As the figure above suggests, users pay attention not to the technical complexities but to what they see and use as an interface. All that interests the user is how to get the appropriate information, regardless of the complex technical links the interface conceals. Moreover, the figure shows only where the information is stored and a user who wants to use it. Three facts should be mentioned. First, the user does not know what operations are performed or how they work; this is hidden behind the interface. Second, electronic indexing is very complex, yet it can deliver any required data in seconds. Third, data storehouses are collections of electronic archives scattered across multiple locations. Together, the three points may be called data transparency, which Silberschatz, Korth, and Sudarshan (2010) define as the "Degree to which system user may remain unaware of the details of how and where the data items are stored in a distributed system."
Many people agree that the archive is an ancient technique for saving documents, valuables, and records in a safe place where they can be consulted later and the required data restored. Since its invention, the computer has been exploited in many areas of public life, especially once computer networks were introduced. An important use arose when the computer was applied to distributed databases, and electronic archives, in the form of databases, were deployed. A question then arises: what is the reason for all the effort to build electronic archives as long as the traditional archive still exists? Although papers and archives increase daily, the electronic archive does not by itself solve the congestion problem; the two still work in parallel.
A paradox thus arises: a strong need for the electronic archive has emerged, yet the traditional archive, with its large spaces, difficult document retrieval, unsafe storage, and dust, remains in place.
In fact, the electronic archive does not necessarily replace the traditional one; rather, it turns it into something like a museum, where the original copies of pictures and statues are kept. In the age of speed, one cannot spend hours or days retrieving important data; such delays may lead to the loss of important business. The electronic archive quickly retrieves any requested data, without any need to leave one's place or to know where the data is. Nowadays, in the time of electronic archiving, there is no need to store the original copy: email addresses have become identities, and when a digital signature is present, the content of an email can be guaranteed. As everything is digitalized, all this data becomes a source for the electronic archive. Many methodologies can help in moving from the traditional to the electronic version of the archive.
Reach and Use
At every stage of building an archive, one rule is essential: the linkage between the archive and the storehouse of information in all its forms. This is known as indexing, and it is involved in every operation and every part of the work. To recover any information, whether the system is an electronic or a traditional archive, a link must be made, either virtually or through a physical connection. Whatever the information is used for, the important concern is when and how to use it. It is important to choose the right time to prepare information for use and, in parallel, the time at which the work itself should take place. In other words, choosing the appropriate time to retrieve and use information is essential for a good outcome in the form of a decision. Quick retrieval of information serves this purpose.
Case study
Payroll system and Personnel information system in Jowfe Oil Technology (one of the companies of the Libyan National Oil Corporation, LNOC)
These two computer systems at Jowfe Oil Company share a common database. New data is added, modified, and retrieved daily. Important information is drawn from the database, and reports are issued from the data entered. Calculating employees' salaries is the most important operation these systems perform. This includes deducting days of absence, counting the number of late arrivals, and deducting social insurance for retirement payments. At the same time, the systems already hold an archive that includes all the data. In fact, they represent one of many electronic archives in the company. Three decades ago, this operation was carried out manually. At that time, it took a long time, more employees, and more effort, in a noisy environment in which important documents were easily lost. Feedback from other departments arrived as sets of papers, leaving workstations filled with bundles of paper.
Studying the Systems
It was important to study and analyze the structure and organization of the data and the processes involved in the work: for example, identifying everything in use, such as types of documents, photocopies, images, video or audio recordings, and digital data entered as records. Appropriate forms for the expected data were designed. Applications and programs for data input were developed. Data was provided to users for the purposes of input, processing, and output. In addition, management systems for survey data were developed.
How to Execute the New System
Because the databases of the two systems were designed using the same methodology and database model, it is not hard to recognize that they have many similarities; these similarities can be used when tables need to be redesigned. Each system kept its own screen design.
Ultimately, the project needed just a system analyst and a programmer to design tables and to modify or create some programs. The database serves as the archive, but it is for company use only. Most of the system's output is submitted to the authorized departments and the decision makers. Other data may be accessed by external parties. Furthermore, the system has its own archive for retrieving old data or data on employees who have left the company or retired. The system keeps track of every employee's records. There is also an external backup, which is used in an emergency, i.e., when some data is accidentally lost from the server.
Testing and Documenting
Entering real data into the system is the only way to test the modifications. This tests both the new and the modified programs. If they work correctly, without errors, that is a good result; otherwise, the database and programs should be revised. On passing this step, the system analyst and programmer confirm that the work is effective and then pass it on to be documented. Finally, the system analysts document the work: all the tables, the methodology used, the designs, the program code, and the screens that appear at each stage. The documentation is important for the future because it gives anyone else the opportunity to review and understand the work.
Another type of data, which anyone can access, is found on the website. Most of it is accessible unless it is confidential. The data is installed on servers, computers that can store huge amounts of data and that offer facilities such as mirroring, which means keeping copies of the data so that it can be recovered if the system is corrupted. This project can be considered a model for most of the steps described. In addition to being linked to the traditional archives, it used Distributed Database Systems.
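The shared payroll and personnel database described in this case study can be sketched as follows. The sketch uses Python with the standard sqlite3 module and is an illustration under stated assumptions, not Jowfe Oil Company's system: the table and column names, the 5% social insurance rate, and the 30-day month are all hypothetical, and late arrivals are left out for brevity.

```python
import sqlite3

# Hypothetical figures: the case study does not state actual rates.
SOCIAL_INSURANCE_RATE = 0.05
WORKING_DAYS_PER_MONTH = 30


def open_shared_database() -> sqlite3.Connection:
    """Create one in-memory database used by both systems."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE employee (              -- personnel system
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            monthly_salary REAL NOT NULL,
            status TEXT NOT NULL DEFAULT 'current'
        );
        CREATE TABLE attendance (            -- payroll system
            employee_id INTEGER NOT NULL REFERENCES employee(id),
            month TEXT NOT NULL,
            absent_days INTEGER NOT NULL,
            PRIMARY KEY (employee_id, month)
        );
        """
    )
    return conn


def net_salary(conn: sqlite3.Connection, employee_id: int, month: str) -> float:
    """Salary minus absence and social insurance deductions for one month."""
    row = conn.execute(
        """
        SELECT e.monthly_salary, COALESCE(a.absent_days, 0)
        FROM employee AS e
        LEFT JOIN attendance AS a
          ON a.employee_id = e.id AND a.month = ?
        WHERE e.id = ?
        """,
        (month, employee_id),
    ).fetchone()
    if row is None:
        raise KeyError(f"no employee with id {employee_id}")
    salary, absent_days = row
    absence_deduction = salary / WORKING_DAYS_PER_MONTH * absent_days
    insurance = salary * SOCIAL_INSURANCE_RATE
    return round(salary - absence_deduction - insurance, 2)


if __name__ == "__main__":
    conn = open_shared_database()
    conn.execute(
        "INSERT INTO employee (id, name, monthly_salary) VALUES (1, 'Employee A', 3000.0)"
    )
    conn.execute("INSERT INTO attendance VALUES (1, '2014-03', 2)")
    print(net_salary(conn, 1, "2014-03"))
```

Keeping the employee records and the attendance data in one database lets the payroll calculation read personnel data directly. The status column gives a simple way to keep the records of retired or departed employees available for later retrieval.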
Conclusion and Recommendations
To sum up, an archive is "A place where documents and other materials of public historical interest are preserved" (Manoff, 2004, p. 10). It is essential to organize our "memory", the archive, so that things are easier to retrieve and reach, as the ancient nations did before us. The archive has improved over time. Today, electronic archives are used because they offer many advantages, such as speed, safety, security, and reliability. In their book "Digital Archive the New Challenge", Boudriz, Dekeyser, and Prof. Domortier (2005) linked the invention of writing with the invention of the archive. There is a difference between digitalization and the electronic archive: digitalization is concerned with converting documents so that they can be accepted as part of electronic archives and DDBSs. This paper explored the historical background of the archive and discussed some definitions. It covered three main components: documentation, indexing, and archiving; distributed databases; and finally a case study.
Perhaps some people are not aware of the great benefits archiving systems provide, although they touch every detail of daily life. Documents and things are organized everywhere: trains leave the station on a schedule; streets, workplaces, and the floors of a building can be found because they are ordered; a desk holds the papers and documents that organize one's duties, along with tools that help in performing the work. Navigating a computer, searching for and retrieving information, logging on to e-mail, reading and responding to messages, and archiving them are important tasks at work. Whenever words such as find, order, or organize are used, a kind of retrieving and organizing is meant, and that is archiving. The importance of the archive is thus plain. Each business has to automate its work. This makes cooperation in information exchange with others possible and facilitates the organization of work.
Furthermore, it allows information to be transmitted easily among all who have access authority. It is important to identify and restrict the powers of access to information, so that each level of authority obtains only its own information. These controls differ between the lowest and the higher levels of management. Senior management and decision makers often have no restrictions on their access to any information.
Most electronic archiving systems depend on networking systems, which makes the concept of e-government easier to realize. This simplifies the services provided to the customer and makes administrative work easier to perform. Terms such as e-management, e-government, e-marketing, e-commerce, and electronic archiving are all closely linked. Their common factors are networks and distributed database systems. All of this serves globalization, whose most important elements are telecommunication and the exchange of information across distributed data stores. It makes work easier in multinational companies, which are often technically complex, and it reduces many of the expenses related to logistics, while the work can still be managed from the headquarters of the parent company.
Compared with some European countries, the United States is underserved in these services, even though it is the main developer of most of these advantages: prices are high and competition is lacking, because each group of states is served by one or two providers at most. As a result, these services are monopolized, and the provider can impose whatever prices and services it chooses on consumers, specifically with regard to the Internet and communications. In a country the size of a continent, consisting of fifty states, these services should be given priority and greater care by decision makers in the United States of America. Most importantly, although this country is the richest in the world, its unemployment rate is high relative to its wealth.
Archiving can create more job opportunities while preserving history.
References
Cohen, D. (2004). Internet History. Computer History Museum. Retrieved from http://www.computerhistory.org/internet_history/
Featherstone, M. (2006). Archive. Theory, Culture & Society, 23, 591.
Fredrick. FutureTimeline.net. Retrieved from http://www.futuretimeline.net/subject/computers-internet.htm
Goebel, V. (2011). Distributed Database Systems. Department of Informatics, University of Oslo.
Grobovsek, J. (2012). Documentation of Cultural Heritage Objects. Oddono.
Hamill, D. (2009). Archives 101: Or, How Archives Are Organized. Northern Kentucky University. Kentucky Libraries, Summer 2009, 73(3), 22.
Harman, E. and Candela, L. (1990). Retrieving Records from a Gigabyte of Text on a Minicomputer Using Statistical Ranking. Journal of the American Society for Information Science, 41(8), 581–589.
Heinrich, E. and Maurer, H. (2000). Active Documents: Concept, Implementation, and Applications. Journal of Universal Computer Science, 6(12), 1197–1202.
Hunter, G. S. (2003). Developing and Maintaining Practical Archives (2nd ed.). Neal-Schuman Publishers, Inc., New York, London.
Manoff, M. (2004). Theories of the Archive from Across the Disciplines. Portal: Libraries and the Academy, 4(1), 9–25. The Johns Hopkins University Press, Baltimore.
Marsan, C. (2010). 10 fool-proof predictions for the Internet in 2020. NetworkWorld. Silver Peak. Retrieved from http://www.networkworld.com/article/2238913/wireless/10fool-proof-predictions-for-the-internet-in-2020.html?page=1
Meyer, D. (2009). Latest SD card format to reach 2TB of storage. ZDNet. Retrieved from http://www.zdnet.com/latest-sd-card-format-to-reach-2tb-of-storage-3039589099/
Pearce-Moses, R. (2005). A Glossary of Archival and Records Terminology. The Society of American Archivists.
Peter, I. (2003). The history of computers, networks and modems. Retrieved from http://www.nethistory.info/
Silberschatz, A., Korth, H. and Sudarshan, S. (2010). Database System Concepts (6th ed.). McGraw-Hill.
Silverman, M. (2012). A Day in the Life of the Internet [Infographic]. Mashable. Retrieved from http://mashable.com/2012/03/06/one-day-internet-data-traffic/
Spencer, N. (2012). How Much Data is Created Every Minute? Visual News. Retrieved from http://www.futuretimeline.net/subject/computers-internet.htm
Whalen, D. (2010). Communications Satellites: Making the Global Village Possible. NASA, National Aeronautics and Space Administration.
Wray, R. (2009). Internet data heads for 500bn gigabytes: World's digital content equivalent to stack of books stretching from Earth to Pluto 10 times. The Guardian, Monday 18 May 2009. Retrieved from http://www.theguardian.com/business/2009/may/18/digital-content-expansion
Zins, C. (2007). Conceptual Approaches for Defining Data, Information, and Knowledge. Journal of the American Society for Information Science and Technology, 58(4), 479–493.