The US 1880 Federal Census - a 1% Sample
========================================

This "README" contains the following sections:-

   1. History and Acknowledgements
   2. Additional documentation and program files
   3. In Summary - "How to Use" the data in this directory
   4. Copyright Notice

                              -= oOo =-

1. History and Acknowledgements
===============================

The files in this directory represent extracts from a 1% sample of the
US 1880 Federal Census, prepared for a study undertaken at the History
Department of the University of Minnesota between 1992 and 1994, and
made available by the University as one of the sets of data in their
"Public Use Microdata Series" (PUMS). The original datafile (73MB) is a
complete transcription of the sampled census entries, with additional
fields appended by the researchers to assist them in their project.

The principal investigators were Steven Ruggles and Russell R. Menard,
and the project was funded jointly by the National Institutes of Health
and the Graduate School of the University of Minnesota.

The sample is divided by State - one file for each of the 47 States and
Territories in the US at that time. The files are named us1880ss.gz,
where 'ss' is the appropriate State code. The coding standard used is
that devised by the Inter-University Consortium for Political and
Social Research (ICPSR), and may differ slightly from the more familiar
ISO set.

The files are compressed using gzip, so you will need a copy of this
compression utility to decompress the data - filename gzip124.zip in
the /pub/genealogy/utils directory on this site.

The file us8_desc.gz, in this directory, gives a fuller explanation of
the contents of each file. For interpreting and searching the data you
may also find it helpful to have a copy of the program XTRACT (NB for
IBM PC Compatibles running MS-DOS v3.3 or above, or equivalent).

XTRACT was developed originally for searching the UK 2% census sample
(the ukc_ccc.arj files, formerly available in directory
/pub/genealogy/text/data), but has been modified for the particular
requirements of decoding and searching the data within the us1880ss.gz
files. It also handles most of the popular archive formats, so it will
work equally well if you find these files re-archived elsewhere, for
example as .ZIP files.

Note that with XTRACT you can leave all the data files in compressed
form, rather than expanding each one by hand for searching and then
deleting the expanded version to save disk space. You do still need a
copy of the appropriate de-archiver (ARJ for .ARJ files, PKZIP 2.04g
for .ZIP files, &c). Please make your copy available to XTRACT, either
in your XTRACT working directory or via your DOS PATH.

You will find copies of xtract41.zip, arj.exe, pkz204g.exe, and
gzip124.zip in ftp://ftp.cac.psu.edu/pub/genealogy/utils on this site,
and the UK data is now available in
ftp://ftp.cac.psu.edu/pub/genealogy/text/census/ukc.

We have also prepared additional documentation relating to the US
sample. Here is a list of filenames to look out for in the current
directory:-

                              -= oOo =-

2. Additional documentation and program files
=============================================

us8_desc.gz    Contains further background reading material for the
               data, including details of how to read the contents of
               each file (see reference, above).

us8_nidx.gz    Contains us8_nidx.txt -- a raw index of surnames present
               in the various State files, with abbreviations for the
               States in which they occur. The same codes for States
               are used as in the data filenames us1880ss.gz, 'ss'
               representing the appropriate State code. Downloading
               this file first can save you the possible disappointment
               of downloading the data files only to find the surnames
               you are seeking aren't in the sample.

xtract4x.zip   Latest version of XTRACT (for DOS), originally designed
               for searching the UK 1851 2% sample, but now updated to
               include the US sample. This replaces the previous v2.2
               - filename xtract22.arj - formerly available with the
               ukc_ccc.arj series in directory /pub/genealogy/text/data.

us8_ni0.gz     Contains a special surname index, which XTRACT can use
               to "look up" which States to search for your specified
               surname. To conserve space in this index, States are
               represented by numbers rather than State codes, and some
               surnames have been "massaged" to assist the matching
               process.

us8_sdx.gz     Contains the equivalent soundex index, for use with
               XTRACT.

                              -= oOo =-

3. In Summary - "How to Use" the data in this directory
=======================================================

a) Obtain a copy of the gzip utility for decompressing the data:-

      ftp://ftp.cac.psu.edu/pub/genealogy/utils/gzip124.zip

b) Download a copy of the surname index, us8_nidx.gz, described above,
   listing the surnames present in the sample. Un-gzip it.

c) Check whether the surname(s) you are interested in are listed, and
   if so, make a note of the State code(s) listed for each. Then
   download the corresponding us1880ss.gz file(s). For example, if you
   are searching for McRae, and you find the following entry in the
   index:-

      MCRAE AL GA IL IN NC SC

   this tells you that individuals named McRae are present in
   us1880al.gz, us1880ga.gz, us1880il.gz, &c.

d) The data inside the us1880ss.gz files - apart from address,
   forenames and surnames - was encoded for the original study, so you
   may need some assistance in interpreting it. If so, check out the
   MS-DOS program XTRACT, filename xtract41.zip. XTRACT is designed to
   convert the data into plain text, and it can unpack the archives,
   then search for and extract households containing the surnames you
   are interested in.

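For those not using XTRACT, the index lookup in steps (b) and (c) above
can also be scripted. What follows is only a minimal sketch in Python,
and is not part of XTRACT or of the original distribution. It assumes
that every line of the un-gzipped index has the same layout as the
McRae example (a surname, then State codes, separated by spaces), and
that the text can be read as Latin-1. The function names
parse_index_line, state_filenames, gunzip and find_states are
hypothetical, chosen for illustration only.

```python
import gzip
import shutil
from pathlib import Path


def parse_index_line(line):
    """Split an entry such as 'MCRAE AL GA IL IN NC SC'.

    Assumption: the first field is the surname and every remaining
    field is a State code. Returns (surname, [lower-case codes]).
    """
    fields = line.split()
    if not fields:
        raise ValueError("empty index line")
    return fields[0], [code.lower() for code in fields[1:]]


def state_filenames(codes):
    """Build data filenames following the us1880ss.gz naming pattern."""
    return ["us1880{}.gz".format(code) for code in codes]


def gunzip(src, dest=None):
    """Expand a .gz file; by default the output drops the .gz suffix."""
    src = Path(src)
    dest = Path(dest) if dest else src.with_suffix("")
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest


def find_states(index_path, surname):
    """Return the State codes listed for a surname, or [] if absent.

    Assumption: surnames in the index are stored in upper case, as in
    the McRae example, and the file is readable as Latin-1.
    """
    wanted = surname.upper()
    with open(index_path, encoding="latin-1") as fh:
        for line in fh:
            if not line.strip():
                continue  # skip blank lines
            name, codes = parse_index_line(line)
            if name == wanted:
                return codes
    return []


if __name__ == "__main__":
    # Uses only the sample entry quoted in step (c); no files needed.
    name, codes = parse_index_line("MCRAE AL GA IL IN NC SC")
    print(name + ":", " ".join(state_filenames(codes)))
```

With the index downloaded, a call such as
find_states(gunzip("us8_nidx.gz"), "McRae") would give the State codes
to pass to state_filenames. A surname that itself contains a space
would not be handled by this sketch.
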
Note also that if you obtain copies of the special surname and soundex
indexes, us8_ni0.gz and us8_sdx.gz, XTRACT will work out which of the
us1880ss.gz files to search (provided you have copies), without your
having to consult us8_nidx.gz yourself each time. :-) It will also
search on other fields, such as occupation and place of birth, or you
can use the .CSV output option to convert complete States for import
into your own database software - whichever you prefer.

                              -= oOo =-

4. Copyright Notice
===================

The information above was prepared by Rosemary Lockie, Wishful Thinking
BBS, FidoNet 2:253/188; and Ron MacRae, FidoNet 2:440/212, for the
benefit of those who wish to use the data.

We would suggest the usual conditions of non-commercial distribution.
That is, reproduction is permitted with the proviso that any subsequent
copying is free of profit and MUST retain any notice of copyright.

We strongly recommend that anyone wishing to include a copy of the
datafiles as part of a profit-making publication or distribution should
apply directly to the University of Minnesota for permission.

Internet: rosemary@yacc.demon.co.uk
FidoNet : Rosemary Lockie 2:253/188

Internet: wonald@CIX.compulink.co.uk
FidoNet : Ron MacRae 2:440/212

-+- March 1996