The study and review of the American Civil War has generally focused on the senior
military officers and major battles and not on an analytical look at the War from the
perspective of the individual soldier. Using information from each soldier's military
and civilian experiences to build a database from the "ground up" rather than the "top down",
Historical Data Systems has created the only database of its kind that can be used for
statistical and analytical examinations of the War. It is now possible to examine and
measure the impact these individual soldier experiences had upon regimental effectiveness.
The American Civil War Research Database is a relational database. This means that there are
numerous files (roster records, pension index records, GAR records, etc.) which are
"related" to each other. With HDS' Database, the "relationship" or connection between these
multiple files is the soldier's name. These files contain information gathered from the
different sources discussed below.
(See the bibliography section for a data source list.)
Not every soldier is in every file, nor is the type of information about a soldier in a
particular file the same for every other soldier in that same file. Also, the pace at
which we at HDS can enter the information about each soldier will vary significantly
based on the type of source documents available to us. Therefore, the Database is not a
complete biographical story of each soldier, but rather a source of the significant war and
post-war events a soldier experienced, which can be used for the study of the Civil War.
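The layout described above, with separate files connected by the soldier's name and not every
soldier present in every file, can be sketched roughly as follows. This is a minimal illustration
only: the table names, column names, and sample rows are invented for the example and are not the
actual HDS schema or data.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative, not HDS's real layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE roster (soldier_name TEXT, state TEXT, regiment TEXT);
    CREATE TABLE pension_index (soldier_name TEXT, filed_year INTEGER);
    CREATE TABLE gar (soldier_name TEXT, post TEXT);
""")

# Made-up placeholder rows; the second soldier appears only in the roster file.
conn.execute("INSERT INTO roster VALUES ('John Doe', 'MA', '20th Infantry')")
conn.execute("INSERT INTO roster VALUES ('Richard Roe', 'NY', '5th Cavalry')")
conn.execute("INSERT INTO pension_index VALUES ('John Doe', 1890)")


def soldier_story(conn, name):
    """Join the files on the soldier's name.

    LEFT JOIN keeps a soldier who is missing from the pension or GAR file;
    the missing fields simply come back as None.
    """
    return conn.execute(
        """
        SELECT r.soldier_name, r.regiment, p.filed_year, g.post
        FROM roster r
        LEFT JOIN pension_index p ON p.soldier_name = r.soldier_name
        LEFT JOIN gar g ON g.soldier_name = r.soldier_name
        WHERE r.soldier_name = ?
        """,
        (name,),
    ).fetchone()
```

Calling soldier_story(conn, 'Richard Roe') still returns a row, with empty pension and GAR
fields, which mirrors the point that a soldier's record is only as complete as the files he
appears in.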
Building a soldier's story is a continuous process that requires information from
varied sources, many of which have often been unavailable or inaccessible
to the public. By utilizing relational database technology, HDS can continue to expand the
Database as the source information becomes available to us. Our primary source of information
is the State Rosters as published by the Adjutants General of each state. In addition
to this primary source, we have begun to enter information for Roll of Honor soldiers,
soldiers awarded the Medal of Honor, soldiers or family members who filed for pensions
(pension record index information), 1860 census town summary information, 1890 enumeration
of Civil War veterans or widows, regimental histories, etc. Data entry for these files is
slower than for our primary sources since the data entry process is more complex.
HDS has made a major investment of time, material, and money in building the Database.
With over 100,000 hours already invested, the Database contains the records of over
4 1/4 million soldiers. We continue to add more soldiers and more information to the
Database. We have released the Database now to the Web because we believe that it has
reached a critical mass where significant research and study can be started. A greater
appreciation of the cost of the War can be gained by conducting statistical examinations
of the data entered so far and extrapolating from the results.
THE HDS CHALLENGES
In building a "soldier-centric" database, Historical Data Systems is presented with
four major challenges: gathering the data, completeness of data, quality of data,
and merging multiple data sources into a single database.
There is no central repository for American Civil War soldier information. Thus, our first
challenge is gathering the data. There are three key steps to this process: locating the
appropriate information, gaining access to the data, and entering the data into
the Database. The majority of the source documents used in building the Database are
century-old books or microfilm images of original documents. After locating and gaining
access to the data, the question of how to enter the information into the Database arises.
The age and fragility of the source documents, as well as the font type and size, make
scanning technology ineffective for data entry. Therefore, the data is manually keyed
into the Database and then verified for accuracy. This is the most time-consuming and costly
part of building the Database.
The second challenge, data completeness, refers to the type and amount of information
available for each soldier. Our primary source of information for each soldier is the
states' official records as published by the Adjutants General for each state. These are
typically referred to as the "State Rosters" and contain information on every soldier
from the state as well as brief regimental histories (which have been included as part
of the Database). The inconsistency of the information published by each state in its
"State Rosters" has led to variations in what is available for each soldier in the
Database. By utilizing multiple data sources as well as information supplied by our
subscribers, we are able to fill in many of the "holes" in a soldier's war and post-war
record. A detailed checklist of information available by state is included in the
"Database Status" section of the site. As can be seen, Massachusetts records are among
the most complete but unfortunately are atypical of how most states detailed the service
records of their citizens.
The third challenge is data quality, or database integrity. Before the Database is released
to the web site, each record goes through a series of data edits. Many of the source records
contain printing, spelling, date, or factual errors or are incomplete. For example, with inter-
and intra-regimental transfers, names may be recorded differently (the Database contains over
18,000 unique first names!) or aliases may be used, creating the appearance of multiple soldiers
(aliases were not uncommon). Also, records were sometimes lost.
Using sophisticated edit routines, we catch many but not all of these errors.
For example, we can determine a military record is wrong if it lists a soldier as being
wounded at the battle of Gettysburg in July 1862 (rather than in 1863) or if he appears
to have mustered out before mustering in. In many instances we can follow a soldier through
a transfer even if there is a name change. Likewise, we catch many of the name variations
for a soldier (and list them) that appear throughout the various data sources used to
build the Database.
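The two date checks mentioned above can be sketched as follows. This is only an illustration of
the idea, not the actual HDS edit routines: the record layout (a plain dictionary), the function
name check_record, and the one-entry battle table are all assumptions made for the example.

```python
from datetime import date

# Assumed reference table: battle name -> (first day, last day).
# Only Gettysburg (July 1-3, 1863) is filled in for the illustration.
BATTLE_DATES = {"Gettysburg": (date(1863, 7, 1), date(1863, 7, 3))}


def check_record(record):
    """Return a list of problems found in one service record (a plain dict).

    The keys 'muster_in', 'muster_out', and 'wounded' are hypothetical field names.
    """
    problems = []

    # Check 1: a soldier cannot muster out before mustering in.
    muster_in = record.get("muster_in")
    muster_out = record.get("muster_out")
    if muster_in and muster_out and muster_out < muster_in:
        problems.append("mustered out before mustering in")

    # Check 2: a wound attributed to a known battle must fall within that battle's dates.
    for battle, when in record.get("wounded", []):
        span = BATTLE_DATES.get(battle)
        if span and not (span[0] <= when <= span[1]):
            problems.append(
                f"wounded at {battle} on {when.isoformat()}, outside the battle dates"
            )

    return problems
```

A record that lists a Gettysburg wound in July 1862 would be flagged by the second check, and
the flagged record could then be reviewed against the source document.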
The fourth Database challenge is merging the various files of information.
The "key relationship" between these files is the soldier's name. However, matching on names
can be difficult. For example, G. Washington Smith (as in his roster record) may have been
awarded the Medal of Honor for bravery, yet in the Medal of Honor file there isn't a G. Washington
Smith but there is a G. W. Smithe, in the Roll of Honor file there is a G.W. Smythe,
and in the pension record index file there is a George W. Smyth. HDS developed data
checking routines to catch many of these and other data issues to reduce the occurrence
of duplication and questionable data. However, even with these edit routines, HDS cannot
guarantee that the Database is error-free. We welcome our subscribers' help in rectifying
any errors or supplying us with information they may have about family members or others.
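One way to picture the name-matching problem in the G. Washington Smith example is the sketch
below. It is not the HDS routine: it is a simple stand-in that compares surnames with a string
similarity score from Python's standard difflib module and treats an initial as compatible with
a full given name. The function names and the 0.7 threshold are assumptions chosen for the
example, and a real routine would need more evidence (regiment, dates, state) before merging.

```python
import re
from difflib import SequenceMatcher


def split_name(name):
    """Lower-case a name and split it into (given-name tokens, surname).

    Assumes the name contains at least one word and that the surname comes last.
    """
    tokens = re.findall(r"[a-z]+", name.lower())
    return tokens[:-1], tokens[-1]


def given_names_compatible(a, b):
    """Treat 'g' and 'george' as compatible: one token must be a prefix of the other."""
    return all(x.startswith(y) or y.startswith(x) for x, y in zip(a, b))


def likely_same_soldier(name_a, name_b, threshold=0.7):
    """Flag two names as a possible match for review (threshold is an arbitrary assumption)."""
    given_a, surname_a = split_name(name_a)
    given_b, surname_b = split_name(name_b)
    surname_score = SequenceMatcher(None, surname_a, surname_b).ratio()
    return surname_score >= threshold and given_names_compatible(given_a, given_b)
```

With these assumptions, "G. W. Smithe", "G.W. Smythe", and "George W. Smyth" are all flagged as
possible matches for "G. Washington Smith", while a name with a different first initial is not.
A match of this kind is a candidate for review rather than proof that the records belong to
the same man.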