These were derived from the IPUMS 1% sample of the 1850 Census of Free Populations, which is available online (Steven Ruggles and Matthew Sobek, Integrated Public Use Microdata Series: Version 1.0. Minneapolis: Social History Research Laboratory, University of Minnesota, 1995).
See also a similar 1990 survey by the U.S. Census Bureau.
Rank Name Number    95%   %-age Cumulative
          in Sample Conf.       %-age
---- ---- --------- ----- ----- ----------
Note: If you want to search for a specific name, just open the particular file and use your browser's Find function to search it.
Towards the end of the list, the number of households sampled becomes small, so the figures are less accurate, particularly with regard to ranking. Names with fewer than 26 occurrences were not included, though I still have the raw frequency data for every name in the sample elsewhere. Note that surnames are strongly clustered by household, which reduces the effective size of the sample to approximately the size of the household sample.
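The effect of household clustering on the confidence figures can be illustrated with a minimal sketch. It assumes a normal approximation to the binomial and takes the number of households as the effective sample size; the method actually used for the "95% Conf." column is not stated here, and the function name and the counts in the usage note are hypothetical.

```python
import math

def share_with_margin(count, sample_size, effective_size):
    """Return (share, 95% margin), both in percent.

    count          -- occurrences of the name in the sample
    sample_size    -- number of persons in the sample
    effective_size -- assumed effective sample size (e.g. number of
                      households, since surnames cluster by household)
    """
    p = count / sample_size
    # Standard error of a proportion, using the effective sample size
    # rather than the raw person count (an assumption of this sketch).
    se = math.sqrt(p * (1 - p) / effective_size)
    return 100 * p, 100 * 1.96 * se
```

For example, with hypothetical figures, `share_with_margin(50, 1000, 250)` gives a wider margin than `share_with_margin(50, 1000, 1000)`, which is the sense in which clustering lowers the accuracy of the surname figures.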
The original dataset also has a few name-transcription problems: a small number of first names were transposed with last names. As a result, a few of the most common first names of the time, such as Mary (rank 815) and Sarah (1010), appear in the list, and surnames that double as personal names (e.g., James) have somewhat exaggerated frequencies. Additionally, special notations in the dataset like "!" were treated as actual names, though it is unclear what they are supposed to represent. Such anomalies mostly appear in the lower-frequency regions, particularly below 50 occurrences.
Note also that spelling variations such as MCCOY and M.CCOY are treated as separate names. The "M.C" notation for Mc is fairly common and probably represents the superscript-c notation in the MSS.
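Readers who want combined counts for such variants could merge them themselves. The following is a rough sketch only: it assumes that a leading "M.C" always stands for "MC", which is a guess about the transcription, and the function names and the counts in the usage note are hypothetical.

```python
import re
from collections import Counter

def normalize_surname(name):
    """Map a leading 'M.C' to 'MC' (assumed to be the same prefix)."""
    return re.sub(r"^M\.C", "MC", name.upper())

def merge_variants(counts):
    """Sum the counts of names that normalize to the same spelling."""
    merged = Counter()
    for name, n in counts.items():
        merged[normalize_surname(name)] += n
    return merged
```

With made-up counts, `merge_variants({"MCCOY": 40, "M.CCOY": 30})` yields a single MCCOY entry of 70. Other variant spellings would need their own rules.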
The first names have considerably lower estimated sampling error than the surnames because clustering effects are weaker. First names that occurred more than 5 times in the sample are reported. The comments regarding spelling variations among surnames apply equally to these. Note also that the more common names of the opposite gender appear occasionally because of errors in the gender field of the dataset. These occurrences are useful, though, in that they provide information with which to estimate the frequency of gender errors.
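One simple way to make such an estimate is sketched below. It assumes that the chosen name was given to one gender only, so that every occurrence under the opposite gender is a coding error; that assumption will not hold for every name. The function name and the counts in the usage note are hypothetical.

```python
def gender_error_rate(count_expected, count_opposite):
    """Estimate the share of records with a wrong gender code.

    count_expected -- occurrences of the name under its usual gender
    count_opposite -- occurrences of the name under the opposite gender
    """
    total = count_expected + count_opposite
    if total == 0:
        raise ValueError("name does not occur in the sample")
    # Assumes every opposite-gender occurrence is a coding error.
    return count_opposite / total
```

For instance, a name with 990 occurrences under its usual gender and 10 under the opposite one (made-up figures) would suggest an error rate of about 1%.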