1850 US Census Name Frequencies

These frequencies were derived from the IPUMS 1% sample of the 1850 Census of Free Populations, which is available online (Steven Ruggles and Matthew Sobek, Integrated Public Use Microdata Series: Version 1.0. Minneapolis: Social History Research Laboratory, University of Minnesota, 1995).

See also a similar 1990 survey by the U.S. Census Bureau.

Columns in the data:

Rank  Name          Number       95%     %-age  Cumulative
                    in Sample    Conf.          %-age
----  ----          ---------    -----   -----  ----------
Rank
The numerical rank of the name's frequency. Names of identical frequency in the sample are ranked in arbitrary order (the order in which they were found in the sample).
Name
The name as it appears in the sample. First names are truncated at the first space.
Number in Sample
The number of instances of the name, in the exact same form, that were found in the sample.
95% Conf.
The 95% confidence interval, expressed as a number of name instances. In 95% of similar samples, the name's count could be expected to fall within the number of instances found plus or minus this figure. It is a measure of sampling error. The calculation uses the normal approximation, so it tends to lose accuracy for names of extremely low frequency.
%-age
The percentage of all individuals in the sample (of the appropriate sex, if first names) who bore the name.
Cumulative %-age
The percentage of people bearing names at this rank or at the more frequent ranks.
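
As a rough illustration of how the columns above relate to one another, the following Python sketch builds the same six columns from a flat list of names. It is not the code used to produce these files. The function name frequency_table is hypothetical, and the confidence formula (1.96 times the binomial standard deviation of the count) is an assumption: the description only says that a normal approximation was used. The sketch also ignores household clustering.

```python
import math
from collections import Counter

Z_95 = 1.96  # two-sided 95% critical value of the standard normal


def frequency_table(names):
    """Return rows of (rank, name, count, ci95, pct, cumulative_pct).

    ci95 is the half-width of an ASSUMED normal-approximation interval
    on the count: 1.96 * sqrt(n * p * (1 - p)), with p = count / n.
    """
    counts = Counter(names)
    total = sum(counts.values())
    rows = []
    cumulative = 0
    # most_common() sorts by descending count; ties keep first-seen order,
    # which mirrors "ranked in the order found in the sample".
    for rank, (name, count) in enumerate(counts.most_common(), start=1):
        p = count / total
        ci95 = Z_95 * math.sqrt(total * p * (1 - p))
        cumulative += count
        rows.append((
            rank,
            name,
            count,
            round(ci95, 1),
            round(100 * p, 2),
            round(100 * cumulative / total, 2),
        ))
    return rows


if __name__ == "__main__":
    # Made-up toy input, not taken from the census sample.
    toy = ["JOHN"] * 50 + ["WILLIAM"] * 30 + ["JAMES"] * 20
    for row in frequency_table(toy):
        print(row)
```

The cumulative column is a running sum of counts divided by the total, so it reaches 100 only if every name is listed; in the published files it stops short of 100 because low-frequency names are omitted.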

Note: If you want to search for a specific name, just open the particular file and use your browser's Find function to search it.

SURNAMES

Towards the end of the list, the number of households sampled becomes small, so the accuracy of the figures will not be as great, particularly with regard to ranking. Names with fewer than 26 occurrences were not included, though I still have the raw frequency data for every name in the sample elsewhere. Note that surnames are strongly clustered by household, which reduces the effective size of the sample to approximately the size of the household sample.
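
To make the clustering point concrete, here is a minimal Python sketch of one way to widen a normal-approximation interval when the effective sample size is the number of households rather than the number of persons. The function clustered_ci95 and its parameters are hypothetical, and treating every household as sharing a single surname (a design effect equal to the mean household size) is a simplifying assumption, not necessarily the adjustment used for these files.

```python
import math


def clustered_ci95(count, n_persons, n_households, z=1.96):
    """Half-width of a 95% interval on a surname count, widened for clustering.

    ASSUMPTION: the effective sample size equals n_households, so the
    design effect is n_persons / n_households and the simple half-width
    is multiplied by its square root.
    """
    p = count / n_persons
    simple = z * math.sqrt(n_persons * p * (1 - p))
    design_effect = n_persons / n_households
    return simple * math.sqrt(design_effect)


if __name__ == "__main__":
    # Illustrative numbers only, not figures from the 1850 sample.
    print(clustered_ci95(count=50, n_persons=100, n_households=25))
```

When n_households equals n_persons the design effect is 1 and the result reduces to the unclustered interval, which is roughly the situation described for first names.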

There are also a few name-transcription problems in the original dataset: a small number of first names were transposed with last names. As a result, a few of the most common first names of the time, such as Mary (rank 815) and Sarah (1010), appear in the list, and surnames that double as personal names (e.g. James) have somewhat exaggerated frequencies. Additionally, special notations in the dataset such as "!" were treated as actual names, though it is unclear what they are really supposed to represent. Such anomalies mostly appear in the lower-frequency regions, particularly below 50 occurrences.

Note also that spelling variations such as MCCOY and M.CCOY are treated separately. The "M.C" notation for Mc is fairly common and probably represents the superscript-c notation in the MSS.
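
Readers who would rather combine such variants can merge them after the fact. The Python sketch below folds a leading "M.C" into "MC" and sums the counts. The helper names normalize_mc and merge_variants are hypothetical, the counts in the usage example are made up, and the rule deliberately handles only this one notation; other spelling variations would need their own rules.

```python
import re
from collections import Counter


def normalize_mc(surname):
    """Rewrite a leading 'M.C' as 'MC' (e.g. 'M.CCOY' becomes 'MCCOY')."""
    return re.sub(r"^M\.C", "MC", surname)


def merge_variants(counts):
    """Sum counts of surnames that are identical after normalization."""
    merged = Counter()
    for name, n in counts.items():
        merged[normalize_mc(name)] += n
    return merged


if __name__ == "__main__":
    # Made-up counts for illustration only.
    sample_counts = {"SMITH": 100, "MCCOY": 40, "M.CCOY": 12}
    print(merge_variants(sample_counts).most_common())
```

Merged counts change the ranking, so ranks and cumulative percentages would have to be recomputed afterwards.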

FIRST NAMES

The first names have considerably lower estimated sampling error than the surnames because of reduced clustering effects. First names which occurred more than 5 times in the sample are reported. The comments regarding spelling variations among surnames apply equally to these. Note also that the more common names of the opposite gender will occasionally appear, owing to errors in the gender field of the dataset. These occurrences are useful, though, in that they provide information with which to estimate the frequency of gender errors.
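
One simple way to turn those stray occurrences into an estimate is sketched below in Python. The function gender_error_rate is hypothetical, and it rests on a strong assumption: that the chosen name was borne by one sex only, so every occurrence in the other sex's list is a coding error. The counts in the usage example are invented, not taken from the files.

```python
def gender_error_rate(count_expected_sex, count_opposite_sex):
    """Estimate the share of records with a miscoded gender field.

    ASSUMPTION: the name is used by one sex only, so each occurrence in
    the opposite sex's list reflects an error in the gender field.
    """
    total = count_expected_sex + count_opposite_sex
    if total == 0:
        raise ValueError("name does not occur in either list")
    return count_opposite_sex / total


if __name__ == "__main__":
    # Invented counts for illustration only.
    print(gender_error_rate(count_expected_sex=990, count_opposite_sex=10))
```

Because only names with more than 5 occurrences are reported, this works only for very common names, and pooling several such names would give a steadier estimate than any single one.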