Extracting information from historical Norwegian census forms

Extracting information from historical Norwegian census forms

This is part of a larger initiative to establish a historical population register for Norway.  Establishing such a database requires transcription of a large amount of historical records to be able to enter the information in the database. Where possible it is desirable to be able to automate parts of the transcription process for these documents. In this context NR has been working with image based methods to automate extraction of information from historical Norwegian census forms.

The focus has been on two types of census forms. The first is the single sheet form used in the 1891 census. These sheets contain information only about one individual and the forms have no tabular structure. Instead information is entered either as underlined options as answers to specific questions or as handwritten numbers or text on designated dotted lines. The aim of the work has been to extract underlined information and recognize handwritten digits from the individual sheets from these forms. The solution includes image registration, underline detection and recognition of handwritten numerals.

The second type of census sheets have a more common tabular structure, where all members of a household are listed sequentially on one sheet. The approaches being developed for these types of forms include image registration, table extraction, recognition of handwritten numbers and splitting and clustering of handwritten names.

Analysing fields from census forms.

 

Department

Financing

  • The Research Council of Norway
  • Mohn-fund at University of Tromsø
  • The Norwegian Institute of Public Health

Partners

  • University of Tromsø
  • Norsk Regnesentral.
  • The Norwegian Institute of Public Health
  • Statistics Norway
  • the National Archive
  • University of Oslo
  • Institute of local history
  • Association of computer genealogy