Sampling and Weighting Technical Report, Census of Population, 2016
3. Census data processing

 

3.1 Introduction

This chapter describes the processing of all completed questionnaires, of every questionnaire type, from their receipt through to the creation of an accurate and complete census database. It covers questionnaire registration, questionnaire imaging and data capture, editing, error correction, failed edit follow-up, coding, dwelling classification and non-response adjustments, linkage of income data, imputation, weighting, and final response rates.

The automated processes implemented for the 2016 Census had to be monitored to ensure that all Canadian residences were enumerated once and only once. The Master Control System (MCS) was built to control and monitor the process flow, from collection through data processing. The MCS held a master list of all dwellings in Canada, each identified by a unique identifier, and was updated continuously with each dwelling's status in the census process flow (e.g., delivered, received or processed). The system generated daily reports, made available online to managers, to help keep census operations efficient and effective.
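A minimal sketch of the kind of bookkeeping a master control list performs, written in Python for illustration only. The class name `MasterList`, the `Status` values and the methods are hypothetical and are not the MCS design; they show how a list keyed by unique dwelling identifiers can reject duplicates, record status changes and produce a daily summary.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Example statuses named in the text; a real system would track many more.
    DELIVERED = "delivered"
    RECEIVED = "received"
    PROCESSED = "processed"


@dataclass
class MasterList:
    """Toy master list: exactly one entry per dwelling, keyed by a unique identifier."""
    status: dict = field(default_factory=dict)

    def add_dwelling(self, dwelling_id: str) -> None:
        # Refusing duplicate identifiers enforces "once and only once" at this level.
        if dwelling_id in self.status:
            raise ValueError(f"duplicate dwelling identifier: {dwelling_id}")
        self.status[dwelling_id] = Status.DELIVERED

    def update(self, dwelling_id: str, new_status: Status) -> None:
        # Status changes are only accepted for dwellings already on the list.
        if dwelling_id not in self.status:
            raise KeyError(f"unknown dwelling identifier: {dwelling_id}")
        self.status[dwelling_id] = new_status

    def daily_report(self) -> dict:
        # Number of dwellings currently at each stage of the process flow.
        counts = Counter(s.value for s in self.status.values())
        return {s.value: counts.get(s.value, 0) for s in Status}
```

Rejecting a second registration of the same identifier at insertion time, rather than deduplicating later, keeps the count of dwellings on the list equal to the number of distinct dwellings.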

3.2 Receipt and registration

Responses submitted over the Internet or collected through help-line telephone interviews went directly to the Data Operations Centre (DOC), where their receipt was registered automatically.

Respondents who completed paper questionnaires mailed them back to the DOC. As part of the normal mail flow, Canada Post registered their receipt automatically at multiple locations in Canada by scanning the barcode on the front of each questionnaire through the transparent window of the return envelope. The envelopes were then delivered to the DOC throughout each business day. Canada Post also sent daily files listing all census questionnaires received at each regional processing plant, by date of receipt.

The registration of each returned questionnaire was flagged on the MCS at Statistics Canada. Each day, the MCS generated a list of all dwellings for which no questionnaire had been received and transmitted it to field operations, so that households that had already completed their questionnaire would not be contacted during non-response follow-up (NRFU).
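The daily list of dwellings still awaiting a questionnaire amounts to a set difference between the master list and the union of registrations from every receipt channel. A minimal Python sketch, assuming each channel can be reduced to a collection of dwelling identifiers; the function names and identifiers are hypothetical.

```python
from typing import Iterable


def registered_ids(*sources: Iterable[str]) -> set:
    """Union of dwelling identifiers registered by any channel (e.g., Internet, Canada Post scans)."""
    registered = set()
    for source in sources:
        registered.update(source)
    return registered


def outstanding_dwellings(master_ids: Iterable[str], registered: set) -> list:
    """Dwellings on the master list with no registered questionnaire, sorted for a stable daily list."""
    return sorted(set(master_ids) - registered)


# Hypothetical identifiers: D2 has not been received through either channel.
master = ["D1", "D2", "D3", "D4"]
internet = ["D1"]
canada_post = ["D3", "D4", "D1"]  # a dwelling may be registered by more than one channel
print(outstanding_dwellings(master, registered_ids(internet, canada_post)))  # ['D2']
```

Taking the union first means a questionnaire registered through two channels is still counted once.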

3.3 Scanning and keying from images

In 2016, all paper census forms (2A, 2C, 2A-L, 2A-R, 3A) were imaged. The following steps were part of the imaging process:

3.4 Coverage edits, completion edits and failed edit follow-up

At this stage, a number of automated edits were performed on respondent data. These edits were designed to detect cases where the number of persons counted in the household was incorrect because of an error in collection, a respondent error or a data capture error. Most of these errors occurred on paper questionnaires, including:

Errors that can occur both on paper and online include:

The system resolved about 58% of edit failures automatically, in cases where the solution was obvious. Automatic solutions included deleting false person data created by respondent or capture errors and deleting duplicate responses. The remaining edit failures were forwarded to processing clerks for resolution. An interactive system enabled the clerks to compare data across questionnaires and to examine the images of paper questionnaires to detect data capture or respondent errors. Clerks resolved edit failures by deleting invalid or duplicate persons or by adding missing persons (i.e., creating blank person records), as necessary and appropriate.
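A minimal Python sketch of the split between automatic and clerical resolution. The two automatic fixes (dropping empty "false person" records and exact duplicates) and the rule that a case closes automatically only when the cleaned list matches the reported household size are assumptions chosen for illustration, not the census edit specifications.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Person:
    name: str
    birth_year: Optional[int]  # None when not reported


def resolve_coverage(persons: List[Person], reported_size: int) -> Tuple[List[Person], bool]:
    """Apply two hypothetical automatic fixes, then report whether the case is closed.

    Returns the cleaned person list and True when it matches the reported
    household size (resolved automatically), or False when a clerk must review it.
    """
    cleaned: List[Person] = []
    seen = set()
    for person in persons:
        if not person.name and person.birth_year is None:
            continue  # empty "false person" record, e.g. from a capture error
        if person in seen:
            continue  # exact duplicate of a person already kept
        seen.add(person)
        cleaned.append(person)
    return cleaned, len(cleaned) == reported_size
```

Cases returning `False` correspond to the remainder forwarded to processing clerks, who could also add blank person records, which this sketch deliberately does not attempt.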

Following the coverage edits, another set of automated edits was run. These edits detected cases where too many questions had missing responses or where data had not been provided for all usual residents of the household, including cases where coverage edit clerks had added missing persons. Households that failed these edits were sent to failed edit follow-up: an interviewer called the respondent, using a computer-assisted telephone interviewing application, to resolve coverage issues and obtain missing responses. For households that had responded to the long-form questionnaire, only missing responses to the short-form questions were followed up. The data obtained through this follow-up were entered into the system for subsequent processing steps. If the follow-up was unsuccessful, the missing data were imputed in the edit and imputation step (see section 3.9).
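A minimal Python sketch of a completion edit of this kind. The report does not state how many missing answers counts as "too many"; the `max_missing_share` threshold of 0.5 and the record layout are hypothetical.

```python
from typing import Dict, List, Optional

PersonRecord = Dict[str, Optional[str]]  # question -> answer, None when missing


def needs_follow_up(persons: List[PersonRecord], usual_residents: int,
                    max_missing_share: float = 0.5) -> bool:
    """Hypothetical completion edit for one household.

    Flags the household when there are fewer person records than usual residents,
    or when any person is missing more than max_missing_share of the answers
    (blank records added by coverage edit clerks are missing every answer).
    """
    if len(persons) < usual_residents:
        return True
    for person in persons:
        if not person:
            return True  # record with no questions at all
        missing = sum(1 for answer in person.values() if answer is None)
        if missing / len(person) > max_missing_share:
            return True
    return False
```

Treating a clerk-added blank record as "all answers missing" lets the same rule catch both situations described in the text.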

3.5 Coding

The census questionnaires contained questions whose answers could be checked off a list, as well as questions requiring a written response. Written responses were automatically assigned numerical codes, where possible, according to Statistics Canada reference files, code sets and standard classifications. Reference files for the automated matching process were built using actual responses from past censuses, as well as administrative files. Specially trained coders and subject-matter specialists resolved cases where a code could not be assigned automatically. The following questions required coding on both the long- and short-form questionnaires:

The following questions required coding for the long-form sample only:

A total of about 69 million write-ins were coded from the 2016 Census questionnaires. Overall, about 85% were coded automatically, although the autocoding rate varied considerably from one question to the next.
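A minimal Python sketch of reference-file autocoding and of how an autocoding rate is computed. It assumes a simple exact match after case-folding and whitespace normalization; the production matching process is not described here and was likely more elaborate. The phrases and codes in the usage example are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def normalize(text: str) -> str:
    """Case-fold and collapse whitespace so trivial variants match the reference file."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def autocode(write_ins: List[str], reference: Dict[str, str]) -> Tuple[Dict[int, str], List[int], float]:
    """Return codes for matched write-ins, indices left for manual coding, and the autocoding rate."""
    lookup = {normalize(phrase): code for phrase, code in reference.items()}
    coded: Dict[int, str] = {}
    manual: List[int] = []
    for i, text in enumerate(write_ins):
        code = lookup.get(normalize(text))
        if code is None:
            manual.append(i)  # left for trained coders and subject-matter specialists
        else:
            coded[i] = code
    rate = len(coded) / len(write_ins) if write_ins else 0.0
    return coded, manual, rate
```

For example, with the hypothetical reference file `{"Teacher": "T01", "Registered nurse": "N01"}` and write-ins `["teacher", "  REGISTERED   nurse ", "astronaut", "Teacher"]`, three of four responses are matched, giving an autocoding rate of 0.75. The rate depends heavily on how varied the write-ins for a question are, which is consistent with the rate differing from one question to the next.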

3.6 Classification and non-response adjustments for unoccupied and non-response dwellings

The Dwelling Classification Survey (DCS) was used to estimate the rate at which enumerators misclassified private dwellings in mail-out and list/leave census collection units (CUs) as occupied or unoccupied. This information was used to adjust the census database. The DCS selected a random sample of 1,730 mail-out and list/leave CUs. In June and July 2016, enumerators revisited these CUs to determine the Census Day occupancy status of each private dwelling from which no response had been received. The DCS estimated that 15.0% of the 1,187,392 private dwellings classified as unoccupied were actually occupied, and that 36.9% of the 284,966 non-responding private dwellings classified as occupied or with an unknown occupancy status were actually unoccupied. Estimates based on the DCS sample were used to adjust the occupancy status of individual dwellings. At the Canada level, this increased the number of occupied private dwellings by 2.6% and decreased the number of unoccupied private dwellings by 6.2%.
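The report does not say how individual dwellings were chosen when an estimated rate was applied. The Python sketch below assumes one plausible approach, random selection of a fixed number of dwellings within a group, purely for illustration. It applies a single rate to a single hypothetical group. The actual adjustment used estimates from the DCS sample design, so these numbers do not reproduce the 2.6% and 6.2% changes reported above.

```python
import random
from typing import List, Set


def reclassify(dwelling_ids: List[str], estimated_rate: float, seed: int = 0) -> Set[str]:
    """Randomly select round(estimated_rate * n) dwellings whose occupancy status is switched.

    Selecting a fixed count (rather than flipping each dwelling independently)
    makes the adjusted total for the group match the estimate exactly.
    """
    rng = random.Random(seed)  # seeded so the adjustment is reproducible
    k = round(estimated_rate * len(dwelling_ids))
    return set(rng.sample(dwelling_ids, k))


# Hypothetical group of 1,000 dwellings classified as unoccupied, using the
# DCS estimate that 15.0% of such dwellings were actually occupied.
unoccupied = [f"U{i}" for i in range(1000)]
to_occupied = reclassify(unoccupied, 0.150)
print(len(to_occupied))  # 150
```

The same function would apply in the other direction, for example selecting 36.9% of a group of non-responding dwellings to be switched to unoccupied.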

After the DCS adjustment of occupancy status, the number of usual residents (if not known) and all responses to the census questions were imputed for occupied private dwellings with total non-response. The responses were borrowed from another responding household in the same CU. This process, called whole household imputation (WHI), imputed 99.9% of the total non-response households. Using a single donor under WHI was computationally more efficient, and less likely to produce implausible results, than using several donors as part of the main edit and imputation process. The remaining 0.1% of total non-response households, for which WHI found no donor household, were imputed as part of the main edit and imputation process.
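A minimal Python sketch of single-donor whole household imputation within a CU, with dwellings that find no donor passed on to the main edit and imputation process. Two choices here are assumptions and not taken from the report: the donor is drawn at random from eligible households in the CU, and, when the number of usual residents is already known, only donor households of that size are eligible. The record layout is hypothetical.

```python
import copy
import random
from typing import Dict, List, Optional, Tuple

Household = List[Dict[str, object]]  # hypothetical layout: one dict per person


def whole_household_imputation(
    nonrespondents: List[Tuple[str, str, Optional[int]]],
    respondents: Dict[str, Tuple[str, Household]],
    seed: int = 0,
) -> Tuple[Dict[str, Household], List[str]]:
    """Copy one donor household from the same CU into each non-responding dwelling.

    nonrespondents: (dwelling_id, cu, known household size or None)
    respondents:    dwelling_id -> (cu, household record)
    Returns the imputed households and the dwellings left for main edit and imputation.
    """
    rng = random.Random(seed)
    imputed: Dict[str, Household] = {}
    leftover: List[str] = []
    for dwelling_id, cu, known_size in nonrespondents:
        # Assumption: when the number of usual residents is known, the donor must match it.
        pool = [d for d, (d_cu, hh) in sorted(respondents.items())
                if d_cu == cu and (known_size is None or len(hh) == known_size)]
        if pool:
            donor = rng.choice(pool)
            imputed[dwelling_id] = copy.deepcopy(respondents[donor][1])  # single donor, whole record
        else:
            leftover.append(dwelling_id)
    return imputed, leftover
```

Copying the entire record from one donor keeps all relationships within the household internally consistent, which is why a single donor is less likely to produce implausible combinations than assembling answers from several donors.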

The WHI process has another component, separate from the use of the DCS estimates to adjust the census database. The non-DCS areas, that is, CUs with interviewer-administered census questionnaires (Indian reserve, canvasser and collective CUs), required a different imputation strategy. In these areas only, all dwellings classified as unoccupied were assumed to be truly unoccupied, so no imputation was done for them, and private dwellings with an unknown occupancy status were also assumed to be unoccupied. Conversely, all non-responding private dwellings that enumerators classified as occupied were assumed to be occupied, and the geographically nearest neighbour was used as the donor household for each of them. Unlike in the DCS areas, no restrictions on household size were placed on these imputations. At the Canada level (DCS and non-DCS areas combined), 2.6% of occupied private dwellings were imputed through the WHI process.
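The non-DCS rules reduce to a simple decision per non-responding dwelling plus a nearest-neighbour donor search. A minimal Python sketch follows. The status strings are hypothetical, and it assumes dwellings have planar (projected) coordinates compared by straight-line distance; the report does not specify how "geographically nearest" was measured.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]  # hypothetical projected coordinates of a dwelling


def non_dcs_treatment(enumerator_status: str) -> str:
    """Treatment of a non-responding private dwelling in a non-DCS area."""
    if enumerator_status == "occupied":
        return "impute from nearest neighbour"
    if enumerator_status in ("unoccupied", "unknown"):
        return "unoccupied"  # accepted as unoccupied; no imputation
    raise ValueError(f"unexpected status: {enumerator_status}")


def nearest_neighbour_donor(target: Point, donors: Dict[str, Point]) -> Optional[str]:
    """Responding household closest to the target (straight-line distance; ties broken by id)."""
    if not donors:
        return None
    return min(sorted(donors), key=lambda d: math.dist(target, donors[d]))
```

Because no household-size restriction applies here, the donor search needs only location, which is why it can stay this simple.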

More details on the DCS and the WHI process will be available in the Coverage Technical Report, Census of Population, 2016, Statistics Canada Catalogue no. 98-303-X, which will be released in 2019.

3.7 Obtaining income data

For the first time, in 2016, administrative data were the only source of information on income for the Census Program. This not only reduced response burden, but also increased the quality and quantity of the income data available. The information on individuals' income was compiled from administrative data for the entire population aged 15 and older, rather than from a sample, as was done in 2011 and 2006. Regular, recurring taxable and non-taxable income received during the 2015 calendar year was included. One-time receipts, such as lump-sum withdrawals from registered retirement savings plans and other savings plans, lump-sum insurance settlements, lump-sum pension benefits, capital gains or losses, inheritances, and lottery winnings, were excluded.
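The scope rules in the paragraph above (population aged 15 and older, income received in the 2015 calendar year, one-time receipts excluded) can be expressed as a filter over income items. A minimal Python sketch follows. The income-type labels and the `census_income` function are hypothetical and do not reflect the actual CRA or census variable definitions.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical labels for the one-time receipts the text says were excluded.
EXCLUDED_ONE_TIME = frozenset({
    "rrsp_lump_sum_withdrawal",
    "other_savings_plan_lump_sum_withdrawal",
    "lump_sum_insurance_settlement",
    "lump_sum_pension_benefit",
    "capital_gain_or_loss",
    "inheritance",
    "lottery_winnings",
})

IncomeItem = Tuple[int, str, float]  # (calendar year, income type, amount)


def census_income(age: int, items: Iterable[IncomeItem], reference_year: int = 2015) -> Optional[float]:
    """Total regular income for the reference year; None for persons under 15 (out of scope)."""
    if age < 15:
        return None
    return sum(amount for year, kind, amount in items
               if year == reference_year and kind not in EXCLUDED_ONE_TIME)
```

Returning `None` rather than `0` for persons under 15 distinguishes "out of scope" from "in scope with no income".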

The information on census respondents could be linked to two types of Canada Revenue Agency (CRA) files, depending on whether respondents were (1) taxfilers, for whom all income information could be extracted from income tax files, including T1 general returns, tax slips and government programs administered by the CRA, or (2) non-filers, for whom the only information available came from tax slips and government programs administered by the CRA. In 2016, the information for 94.8% of the population aged 15 and older in private households was linked to a CRA administrative file. Specifically, the information for 85.2% of the population was linked to a taxfiler's file, and the information for 9.6% of the population was linked to a non-filer's file.
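Because each linked person is either a taxfiler or a non-filer, the two shares partition the linked population and add up to the overall linkage rate. The complement is the share of the population aged 15 and older in private households with no linked CRA file:

```latex
\underbrace{85.2\%}_{\text{taxfilers}} + \underbrace{9.6\%}_{\text{non-filers}} = 94.8\%\ \text{linked},
\qquad 100\% - 94.8\% = 5.2\%\ \text{not linked to a CRA file}
```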

For more information on how income data were obtained, see the Income Reference Guide, Census of Population, 2016, Statistics Canada Catalogue no. 98-500-X2016004.

3.8 Non-response

A dwelling's response status may differ between the collection and processing phases. The main differences arise because the occupancy status can change between collection and processing, and because a household must answer a minimum number of questions to be considered a respondent in the processing phase. Unless otherwise specified, the term "non-response" refers to non-response in the data processing phase; the same applies to the term "response."

For the 2016 Census long-form questionnaire, two types of households were considered non-respondent:

This refers to total non-response, which is processed differently depending on the collection method and the type of household.

Finally, partial non-response occurs when the long-form questionnaire is only partially completed. This type of non-response is handled by imputation, an overview of which is presented in the next section.

3.9 Edit and imputation

The data collected in any survey or census contain some omissions or inconsistencies. For example, a respondent may be unwilling to answer a question, fail to remember the right answer or misunderstand the question. Other errors, such as incorrect coding, can also occur.

The final clean-up of data, done in the edit and imputation process, was fully automated for all census topics using the Canadian Census Edit and Imputation System (CANCEIS) (Statistics Canada 2014). Two imputation methods were applied. The first, called "deterministic imputation," assigned specific values under certain conditions when the problem and its resolution were clear and unambiguous. Detailed edit rules were applied to identify these conditions, and the variables involved in the rules were assigned predetermined values. The second, called "minimum-change nearest-neighbour donor imputation," applied a series of detailed edit rules to identify any missing or inconsistent responses. For each record with missing or inconsistent responses, another record that passed the edit rules and had the most characteristics in common with it was selected as a donor. Data from this donor record were then used to make the minimum number of changes to the variables needed to resolve all missing or inconsistent responses.
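A simplified Python sketch of the second method, minimum-change nearest-neighbour donor imputation. It is not CANCEIS. It picks the clean donor that disagrees with the record on the fewest reported variables, always takes missing variables from that donor, then changes as few reported variables as possible until every edit passes. CANCEIS chooses donors and changes jointly; this two-step version, the record layout and the example edit rules are assumptions for illustration only.

```python
from itertools import combinations
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[object]]  # variable -> value, None when missing
Edit = Callable[[Record], bool]       # returns True when the record FAILS the edit


def passes(record: Record, edits: List[Edit]) -> bool:
    """A record is clean when nothing is missing and no edit rule fails."""
    return all(v is not None for v in record.values()) and not any(edit(record) for edit in edits)


def distance(record: Record, donor: Record) -> int:
    """Number of reported (non-missing) variables on which the donor disagrees."""
    return sum(1 for k, v in record.items() if v is not None and v != donor[k])


def donor_impute(record: Record, donors: List[Record], edits: List[Edit]) -> Record:
    """Nearest-neighbour donor imputation with a minimum-change step (simplified)."""
    if passes(record, edits):
        return dict(record)
    clean_donors = [d for d in donors if passes(d, edits)]
    if not clean_donors:
        raise ValueError("no clean donor available")
    donor = min(clean_donors, key=lambda d: distance(record, d))
    missing = [k for k, v in record.items() if v is None]
    reported = [k for k, v in record.items() if v is not None]
    # Missing variables always come from the donor; then try changing
    # 0, 1, 2, ... reported variables until the record passes every edit.
    for size in range(len(reported) + 1):
        for subset in combinations(reported, size):
            candidate = dict(record)
            for k in (*missing, *subset):
                candidate[k] = donor[k]
            if passes(candidate, edits):
                return candidate
    return dict(donor)  # not reached: copying every variable reproduces a clean donor
```

For example, with an edit rule that fails when `age < 15` and `marital == "married"`, the record `{"age": 10, "marital": "married", "relationship": "child"}` and a nearest donor `{"age": 10, "marital": "single", "relationship": "child"}`, only `marital` is changed, because that single change resolves the conflict. Deterministic imputation, the first method, is not shown.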

3.10 Weighting

The 2016 Canadian Census Program consisted of a census of population and a sample survey for which one-quarter of Canadian private households were selected. Households not sampled for the survey received a short-form questionnaire, while sampled households received a long-form questionnaire. In addition to the short-form questions, the long-form questionnaire gathered sociocultural information, as well as information on daily activities, mobility, place of birth, education, labour market activity, etc. Weighting was used to represent the entire population based on the information gathered from the sample.

The first step in the weighting process was to assign a design weight to each household that reflected its probability of being sampled. These weights then underwent an initial adjustment for coverage and total non-response. This adjustment was applied to the weights of respondent households. Finally, a second adjustment, referred to as final calibration, was made to establish closer agreement between the estimates obtained from respondent households in the sample and the census counts for a number of characteristics from the short-form questionnaire or from administrative data sources. The weighting methodology is described in detail in Chapter 4. All private households attached to collective dwellings and all private households in a canvasser CU were selected for the long-form sample and received a design weight of 1. They were then excluded from the coverage and non-response adjustment processes, as well as from the final calibration process.
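A minimal Python sketch of the three weighting steps for a one-in-four sample: design weight as the inverse of the selection probability, a non-response adjustment that transfers non-respondents' weight to respondents in the same class, and a final adjustment to census counts. The adjustment classes are hypothetical. The last step is a one-variable post-stratification used as a simplified stand-in for the multi-characteristic calibration described in Chapter 4. Households with a design weight of 1 are left untouched, as stated above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SampledHousehold:
    hh_id: str
    adj_class: str                # hypothetical adjustment class, e.g. a small area
    responded: bool
    selection_prob: float = 0.25  # one-quarter sample; 1.0 for collective/canvasser households
    weight: float = 0.0


def _adjustable(h: SampledHousehold) -> bool:
    # Households with a design weight of 1 are excluded from the later adjustments.
    return h.selection_prob < 1.0


def assign_design_weights(households: List[SampledHousehold]) -> None:
    for h in households:
        h.weight = 1.0 / h.selection_prob  # inverse of the selection probability


def adjust_for_nonresponse(households: List[SampledHousehold]) -> None:
    """Within each class, transfer non-respondents' weight to respondents."""
    total: Dict[str, float] = defaultdict(float)
    responding: Dict[str, float] = defaultdict(float)
    for h in filter(_adjustable, households):
        total[h.adj_class] += h.weight
        if h.responded:
            responding[h.adj_class] += h.weight
    for c in total:
        if responding[c] == 0:
            raise ValueError(f"class {c!r} has no respondents; it would need to be collapsed")
    for h in filter(_adjustable, households):
        h.weight = h.weight * total[h.adj_class] / responding[h.adj_class] if h.responded else 0.0


def poststratify(households: List[SampledHousehold], census_counts: Dict[str, float]) -> None:
    """Simplified stand-in for final calibration: one census control total per class."""
    current: Dict[str, float] = defaultdict(float)
    kept = [h for h in households if _adjustable(h) and h.weight > 0]
    for h in kept:
        current[h.adj_class] += h.weight
    for h in kept:
        h.weight *= census_counts[h.adj_class] / current[h.adj_class]
```

With eight sampled households in one class and six respondents, each respondent's weight goes from 4 to 4 × 32 / 24 ≈ 5.33, and calibrating to a hypothetical census count of 30 households brings it to 5. Non-respondents end with a weight of 0, which is how the text identifies respondent households as those with a non-zero final weight.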

The households that contributed to the long-form estimates were the respondent households, that is, long-form sample households with a non-zero weight at the end of the weighting process, together with the households assigned a design weight of 1 (private households attached to collective dwellings and all private households in canvasser CUs).

3.11 Final response rates

Table 3.11.1 presents the final response rates for private households in the 2016 Census of Population, for Canada and for each province and territory, followed by non-weighted and weighted response rates for the long-form sample based on the definition of non-response given in section 3.8.

The final response rate is the ratio of the numerator to the denominator, where

The final classification of a dwelling's occupancy status is based on an analysis of the data gathered by field staff, data provided by respondents and the results of a study into the quality of occupancy status in the DCS (see section 3.6). The response rates indicated in Table 3.11.1 differ from the collection response rates, which were previously published and were mentioned in section 1.3, in that they take data processing and dwelling occupancy verification into account in identifying non-respondent households. These response rates are therefore considered final.

Weighted response rates were produced for the long-form sample. They are based on the following weights as the numerator and denominator:

Table 3.11.1
Final response rates for private households from the 2016 Census of Population and the long-form sample
Province or territory       Response rate,          Non-weighted response rate,   Weighted response rate,
                            short-form (%)          long-form only (%)            long-form only (%)
Canada                      97.4                    96.1                          95.9
Newfoundland and Labrador   97.4                    96.0                          95.1
Prince Edward Island        97.5                    96.4                          96.3
Nova Scotia                 97.6                    96.6                          96.1
New Brunswick               97.6                    96.7                          96.2
Quebec                      97.6                    96.7                          96.6
Ontario                     97.6                    96.5                          96.3
Manitoba                    97.4                    95.9                          95.8
Saskatchewan                96.7                    95.6                          95.1
Alberta                     97.0                    95.3                          94.8
British Columbia            96.5                    94.8                          94.6
Yukon                       95.8                    91.9                          92.8
Northwest Territories       93.9                    92.7                          93.0
Nunavut                     92.7                    92.6                          92.6
