Online Magazine
Vol. 28 No. 4 — July/August 2004
FEATURE
A Web-Based Database of CIA Declassified Documents on the Vietnam War
By Vinh-The Lam and Darryl Friesen

During the Vietnam War years (1960-1975), the U.S. government generated a large volume of classified documents.

The declassification of these documents started with Executive Order No. 11652, signed by President Richard Nixon in 1972 [1]. Part of that executive order is on the Web [www.fas.org/sgp/eprint/legacy_appendix.html]. Thousands of these documents, formerly classified as "Confidential," "Secret," and "Top Secret," are being declassified and made public, and they are available for educational and research purposes. Primary Source Microfilm published the documents on microfiche as the Declassified Documents Reference System (DDRS). The microfiche are abstracted and indexed in a bimonthly periodical titled Declassified Documents Catalog (DDC). The DDC is now also published on CD-ROM by Thomson Gale, while the DDRS is available by subscription on the Internet at www.ddrs.psmedia.com/.

Recently, the Vietnam Center of Texas Tech University in Lubbock, Texas, through its Virtual Vietnam Archive (VVA) [www.vietnam.ttu.edu/virtualarchive/], began providing access to a large number of full-text declassified documents. The Declassified CIA Documents on the Vietnam War database [http://library.usask.ca/Vietnam] is the result of a sabbatical leave research project approved and supported by the University of Saskatchewan, Canada. It includes only declassified documents created by the U.S. Central Intelligence Agency (CIA). It provides in-depth indexing of these documents and, where possible, a link to the full-text documents available at the VVA. It offers both simple and advanced search capabilities.

DATABASE STRUCTURE

Each record in the database contains the following fields:

Record Number: Automatically created by system.

Title: Title of document.

Date of Creation: Date document was created.

Date of Declassification: Date document was declassified.

Type of Document: Type of document, e.g., Report, Memorandum, Cable, etc.

Level of Classification: Level of classification of document before it was declassified; only four terms will be used: CONFIDENTIAL, SECRET, TOP SECRET, and NOT GIVEN.

Status of Copy: Status of copy of document; only two terms will be used: ORIGINAL and SANITIZED.

Pagination: Number of pages and illustrations, such as maps.

Abstract: Abstract of contents of document; taken mostly from the CD-ROM published by Thomson Gale.

Indexing Terms: Controlled vocabulary (words, phrases) describing topics presented in document.

DDRS Location: Document identifier showing location of document in the Declassified Documents Reference System.

Link to Full Text: If available, URL of the full-text document at the Web site of the VVA.
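
The record layout above can be modeled in a few lines of Python. This is only an illustrative sketch: the class and attribute names are hypothetical, and the only rules enforced are the two controlled vocabularies named in the field descriptions. The database itself was built with SQL Server and PHP.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Controlled vocabularies taken from the field descriptions above.
CLASSIFICATION_LEVELS = {"CONFIDENTIAL", "SECRET", "TOP SECRET", "NOT GIVEN"}
COPY_STATUSES = {"ORIGINAL", "SANITIZED"}


@dataclass
class DocumentRecord:
    """One database record; names are illustrative, not the authors' own."""
    title: str
    classification_level: str
    copy_status: str
    creation_date: Optional[str] = None
    declassification_date: Optional[str] = None
    document_type: Optional[str] = None
    pagination: Optional[str] = None
    abstract: Optional[str] = None
    indexing_terms: List[str] = field(default_factory=list)
    ddrs_location: Optional[str] = None
    full_text_url: Optional[str] = None

    def __post_init__(self):
        # Reject values outside the two controlled vocabularies.
        if self.classification_level not in CLASSIFICATION_LEVELS:
            raise ValueError(
                f"unknown classification level: {self.classification_level!r}")
        if self.copy_status not in COPY_STATUSES:
            raise ValueError(f"unknown copy status: {self.copy_status!r}")
```

Validating the two closed vocabularies at construction time is one way to keep the limiting fields consistent for later searches.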

DOCUMENT INDEXING AND DATABASE CONTENTS

The main reason we created this database is the DDC's lack of in-depth indexing. The very detailed indexing provided by the Carrollton Press for the Declassified Documents Retrospective Collection, published in 1976, was abandoned when Carrollton Press began publishing the Declassified Documents Quarterly Catalog, which preceded the DDC. Research Publications kept to this reduced indexing for the DDC, and Primary Source Microfilm did the same when it replaced Research Publications as publisher. As a result, a very limited number of indexing terms are used in the DDC:

Vietnam
    Armed forces
    Foreign relations with ---
    Politics and government
    Religion

Vietnam, North
    Commerce
    Foreign relations with ---
    Military policy

Vietnam, South
    Armed forces
    Commerce
    Commerce with ---
    Economic conditions
    Foreign relations with ---
    Politics and government
    Religion
    Social conditions

Vietnamese Conflict, 1961-1975
    Campaigns
    Missing in action
    Peace negotiations
    Prisoners of war

Topical searches that would be very useful to Vietnam War scholars and researchers, such as searches for personal names, place-names, names of operations/battles, and titles of U.S. and/or Vietnamese government projects/programs, are therefore impossible in the DDC.

We decided, therefore, to provide an in-depth content analysis of the documents. Full-text documents were analyzed thoroughly, page by page, so that names of people (politicians, military leaders), operations/battles, military units (U.S., Allied, South Vietnamese, North Vietnamese, and Viet Cong divisions, regiments, and battalions), projects/programs, and place-names (provinces, cities, towns, valleys, mountains, rivers) could be picked up and used as indexing terms.

For example, a search for the most important Communist offensive of the war, the Tet Offensive, retrieves a screen showing the number of results and, for each document, its ID number and a hyperlinked title. (See Figure 1 above.)

The search for Tet Offensive retrieves 63 documents; one for the famous U.S. 101st Airborne Division yields six; for Khe Sanh, location of the bloodiest battle between the U.S. Marines and the North Vietnamese divisions, 28; and for General Duong Van "Big" Minh, leader of the military coup that overthrew the Ngo Dinh Diem government on November 1, 1963, 133 documents.

In addition to in-depth indexing, we also tried to keep indexing terms consistent across the whole database in order to maximize retrieval. We decided to give personal names in non-inverted form: Duong Van Minh rather than Minh, Duong Van, and Robert McNamara rather than McNamara, Robert. Since one of the co-authors is of Vietnamese origin, we were able to detect and correct misspelled Vietnamese names in documents. South Vietnamese government program titles were translated into English. When both the English and Vietnamese forms of a program title were already familiar within the Vietnam War research community, both were sometimes used as equivalent indexing terms, as with Returnee Program and Chieu Hoi Program. When the database was populated with about 500 records, we conducted a thorough review and revision of all indexing terms to detect and correct typos and inconsistencies. We did a second review/revision when the database reached the 1,000-record level. The index now contains 3,461 terms, and its complete listing is 101 pages long.
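
The non-inverted name rule can be illustrated with a small helper. The consistency work described above was done by manual review, so the function below is a hypothetical sketch of the rule itself, not a tool used in the project. It assumes an inverted name has exactly the shape "Last, Rest".

```python
def to_direct_order(name: str) -> str:
    """Turn 'Last, Rest' into 'Rest Last'; leave other names unchanged."""
    if "," not in name:
        # Already in direct order: only tidy up the spacing.
        return " ".join(name.split())
    last, rest = name.split(",", 1)
    return " ".join((rest + " " + last).split())


print(to_direct_order("Minh, Duong Van"))
print(to_direct_order("McNamara, Robert"))
```

Names that contain no comma pass through with only their spacing normalized, so the helper can be applied to a whole list of terms without harming entries that are already in the preferred form.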

The database currently contains 1,080 records, 34 percent of which provide a link to the full-text documents available online at the VVA. The documents analyzed range from one page to a few hundred pages. A document may be a Memo, a Telegram, a Report (weekly, monthly, etc.), a Situation Report (or SitRep), a Biographical Sketch, a National Intelligence Estimate (or NIE), a Special National Intelligence Estimate (or SNIE), or a Research Study Report. Sometimes, when an important event such as the Tet Offensive was occurring, the CIA produced Intelligence memos on a daily or even hourly basis. (See Figure 2 on page 32.)

After the Johnson administration decided to send combat troops to South Vietnam in 1965, the CIA produced weekly and monthly reports, called "The Situation in South Vietnam," which gave details on the political, military, and economic situation of South Vietnam. In the "Political Situation" sections, the reports give detailed information on the activities of the South Vietnamese government, such as cabinet reshuffles, inauguration/development/changes of government programs/projects, and deliberations within the National Assembly. Also included is information on the activities of political parties and their leaders, on rumors of possible coups, and on local/provincial/national elections. In the "Military Situation" sections, the reports give a detailed account of operations/battles engaging U.S., Allied, South Vietnamese, North Vietnamese, and Viet Cong units, as well as their casualties and weapon losses.

The "Economic Situation" sections report important economic indicators, such as the retail price index (especially prices of rice and pork) and weekly and monthly prices of gold and currency in the Saigon free market. About 150 such reports are now included in the database. These reports are extremely useful for researchers who want to draw a chronological picture of South Vietnam during the war years, especially between 1965 and 1968. Another series of reports presents monthly evaluations of the cost-effectiveness of Operation Rolling Thunder, the sustained U.S. bombing of North Vietnam. Still another series details the level of North Vietnamese Army infiltration into South Vietnam. A close look at those reports, together with the NIEs and SNIEs on Vietnam, will help database users understand how U.S. policy on Vietnam was conceived and implemented. A large number of these declassified documents are sanitized, with sources of information and names of informants removed for their protection.

DATABASE DESIGN

We used Microsoft SQL Server 2000 as the database server. In addition to being an outstanding relational database server, it offers rich full-text search capabilities, which made it an excellent choice for this project.

The database consists of a single table. Some normalization could have been done, especially with respect to the indexing terms. However, considering the few data elements in the table, the relatively small number of documents indexed, and the strength of SQL Server's search capabilities, we favored a simple design.

All columns in the table, with the exception of DocumentID, are variable-length character data (varchar). DocumentID is an auto-incrementing integer value, managed by SQL Server, and used as the primary key for the table. The CreationDate and DeclassificationDate fields were initially standard SQL datetime data types, but had to be changed to character fields because of a bug in one of the underlying software components.

The following SQL statement was used to create the table in the database:

CREATE TABLE DeclassifiedDocuments (
    DocumentID            int IDENTITY (1, 1) NOT NULL,
    Title                 varchar(512) NOT NULL,
    CreationDate          varchar(15) NULL,
    DeclassificationDate  varchar(15) NULL,
    DocumentType          varchar(254) NULL,
    ClassificationLevel   varchar(50) NULL,
    CopyStatus            varchar(10) NULL,
    Pagination            varchar(100) NULL,
    Abstract              varchar(8000) NULL,
    Descriptors           varchar(8000) NULL,
    DDRS_Location         varchar(50) NULL,
    URL                   varchar(254) NULL
)

All access to the database, including data entry and other administrative functions, is done using a Web browser. The Web-based user interface was written in the PHP programming language. PHP has experienced a rapid growth in popularity in recent years, due in part to its excellent handling of textual data (such as the data sent and received by Web browsers and servers) and its database support.

The Web server is a Sun UltraEnterprise 2 server running the Solaris 8 operating system and a recent version of the Apache Web server software. An open source product called FreeTDS allows the UNIX Web server to communicate with the Microsoft SQL Server directly using the Tabular Data Stream (TDS) protocol. TDS is the native protocol used by Microsoft and Sybase for their database products. Although still something of a fledgling product, FreeTDS is a workable solution for establishing connectivity between UNIX machines and Microsoft or Sybase database servers.

Administrative functions (adding, modifying, and deleting records) are performed through the same Web browser interface. The administrative features, located in a secure area on the Web server, are password-protected. The administrative interface closely resembles the public view, with the addition of links in both the brief and full record displays that allow the document to be easily modified or deleted. A new record can be added by simply clicking the "New Record" button located at the top of the screen. Records requiring modification can be found by browsing or searching, and a quick edit feature, located in the upper right, is available for documents whose Document ID number is known. A simple Web-based form is used for data entry and record editing. (See Figure 3 above.)

DATABASE NAVIGATION

The default view is an alphabetical listing of all indexed documents, shown in a brief record format. The brief citation includes the document title, creation date, declassification date, type of document, level of classification, and status of copy.

The number of documents displayed in the brief format is limited to 50 per page. A drop-down menu provides easy access to all indexed documents. Limiting the display in this manner, rather than listing all 1,080 documents at once, significantly decreases the time it takes a Web browser to load the page and makes the database easier to use.
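
The arithmetic behind a 50-per-page display is simple enough to sketch. The Python below is an illustration only: the function names are hypothetical, and the labels it produces for the drop-down menu are a guess at one reasonable format, since the menu entries are not described in the text.

```python
import math

PAGE_SIZE = 50  # brief records per page, as stated above


def page_bounds(total: int, page: int, page_size: int = PAGE_SIZE):
    """Return (first, last) 1-based record positions shown on a 1-based page."""
    pages = max(1, math.ceil(total / page_size))
    if not 1 <= page <= pages:
        raise ValueError(f"page must be between 1 and {pages}")
    first = (page - 1) * page_size + 1
    last = min(page * page_size, total)
    return first, last


def menu_labels(total: int, page_size: int = PAGE_SIZE):
    """Labels for a menu that jumps between pages, e.g. '1-50', '51-100'."""
    if total <= 0:
        return []
    pages = math.ceil(total / page_size)
    bounds = (page_bounds(total, p, page_size) for p in range(1, pages + 1))
    return [f"{first}-{last}" for first, last in bounds]


print(len(menu_labels(1080)), menu_labels(1080)[-1])
```

With the 1,080 records currently in the database this gives 22 pages, the last of which holds records 1051 through 1080.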

Clicking the title brings up a full record display for the selected document. This view adds the fields not shown in the brief display: pagination, abstract, indexing terms, DDRS location, and URL. (See Figure 4 below.)

The URL, if present, links to the full text of the document in the Virtual Vietnam Archive. Indexing terms are also hyperlinked, and clicking a term returns all documents sharing that indexing term.

RETRIEVAL MECHANISM

Microsoft SQL Server 2000 allows full-text indexes to be defined on selected columns in a table. This permits complex searches to be executed against any of the columns in the index, or all the columns at once. Boolean operators, phrase searching, word stemming, weighting, proximity searching, and wildcard operators are all supported. Unlike traditional indexes defined on columns, SQL Server full-text indexes reside outside the database on the server's local file system. Thus, additional steps must be taken to populate them.

Index population can be scheduled to occur at any time. For this database, a full population and rebuild of the index occurs once a week during off hours (5 a.m. Saturday), and an incremental population runs hourly during the period when data entry normally occurs (weekdays between 7 a.m. and 7 p.m.). This schedule keeps the full-text index up to date with additions or changes to database records. The full-text index covers the document title, document type, level of classification, status of copy, creation date, declassification date, abstract, and indexing terms fields.
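
The population schedule can be restated as a small decision function. This Python sketch only models the timetable described above. It is not how SQL Server schedules its jobs, and it assumes that runs start on the hour and that the 7 p.m. run is included.

```python
from datetime import datetime


def population_due(now: datetime) -> str:
    """Return 'full', 'incremental', or 'none' for the given moment."""
    if now.minute != 0:
        return "none"  # assumption: populations start on the hour
    # datetime.weekday() numbers Monday as 0 and Saturday as 5.
    if now.weekday() == 5 and now.hour == 5:
        return "full"
    if now.weekday() < 5 and 7 <= now.hour <= 19:
        return "incremental"
    return "none"


print(population_due(datetime(2004, 7, 3, 5, 0)))  # a Saturday at 5 a.m.
print(population_due(datetime(2004, 7, 5, 9, 0)))  # a Monday at 9 a.m.
```

Under this timetable a record entered on a weekday becomes searchable within about an hour, while anything entered outside those hours waits for the next scheduled run.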

The full-text index is utilized by both simple keyword and advanced searches. In the case of the simple keyword search, located conveniently at the top of almost every page, all terms entered by the user are joined with the Boolean AND operator and a search is performed across all fields in the full-text index.
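
Joining the entered terms with AND could look roughly like the following. The production interface is written in PHP and its code is not shown here, so this Python function is a hypothetical sketch. It builds the kind of search condition string that SQL Server's CONTAINS predicate accepts, in which each term is double-quoted and the terms are joined by AND.

```python
def contains_condition(user_input: str) -> str:
    """Build a full-text search condition that ANDs every entered term."""
    terms = []
    for raw in user_input.split():
        # Strip double quotes so a term cannot break out of its quoted form.
        term = raw.replace('"', "")
        if term:
            terms.append(f'"{term}"')
    return " AND ".join(terms)


print(contains_condition("Tet Offensive"))
```

The resulting string would be supplied as the search condition of a query such as SELECT ... WHERE CONTAINS(*, condition), where the asterisk asks SQL Server to search every column in the full-text index.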

The advanced search is quite powerful, allowing more control over what is searched for, how it is searched, and which limits are applied. Users can construct complex queries that combine specific phrases with a list of terms that must all appear, and then limit the results by classification level, copy status, full-text availability, and date ranges for both the document creation and declassification dates.
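
One way to assemble such a query is to add a condition only for each option the user actually fills in. The Python sketch below is hypothetical and is not the PHP code behind the site. It uses the column names from the table definition, treats a non-NULL URL as "full text available" (an assumption), and leaves out the date limits because the text does not say how dates are formatted in the character columns.

```python
from typing import List, Optional, Tuple


def build_advanced_query(
    phrase: Optional[str] = None,
    required_terms: Optional[List[str]] = None,
    classification_level: Optional[str] = None,
    copy_status: Optional[str] = None,
    full_text_only: bool = False,
) -> Tuple[str, list]:
    """Return (sql, params); every argument is optional."""
    def quoted(text: str) -> str:
        return '"' + text.replace('"', "") + '"'

    parts = [quoted(phrase)] if phrase else []
    parts.extend(quoted(t) for t in (required_terms or []) if t.strip())

    where, params = [], []
    if parts:
        where.append("CONTAINS(*, ?)")  # full-text condition, if any
        params.append(" AND ".join(parts))
    if classification_level:
        where.append("ClassificationLevel = ?")
        params.append(classification_level)
    if copy_status:
        where.append("CopyStatus = ?")
        params.append(copy_status)
    if full_text_only:
        where.append("URL IS NOT NULL")  # assumption: no link means NULL

    sql = "SELECT DocumentID, Title FROM DeclassifiedDocuments"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql + " ORDER BY Title", params


sql, params = build_advanced_query(phrase="Tet Offensive",
                                   classification_level="SECRET")
print(sql)
print(params)
```

Because each clause is appended independently, a call that supplies only limits and no search terms still yields a valid query. The question-mark placeholders stand for whatever parameter style the database driver uses.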

Refining the search using these advanced options decreases the result set from 46 (for a simple keyword search on TET OFFENSIVE) to five. (See Figure 5 at left.)

The search terms themselves can be considered optional, and queries making use of just the limiting features are acceptable.

This online database was designed to give Vietnam War scholars and researchers an efficient tool for finding declassified CIA documents on specific topics and, where available, for retrieving the full-text documents. (See Figure 6 below.) The Web-based user interface, written in the PHP programming language, makes searching, retrieval, and navigation easy and smooth. As CIA classified documents continue to be declassified, and with a firm commitment from the University of Saskatchewan Library, this database will continue to grow.

Notes

[1] Morehead, Joe and Mary Fetzer. Introduction to United States Government Information Sources. 4th ed. Englewood, Colo.: Libraries Unlimited, 1992. p. 376.

 

 


Vinh-The Lam [vinhthe.lam@usask.ca] is librarian cataloguer, Technical Services Division, University of Saskatchewan Library, and Darryl Friesen [darryl.friesen@usask.ca] is programmer analyst, Information Technology Services Division, University of Saskatchewan Library.

Comments? E-mail letters to the editor to marydee@xmission.com.

