7. Reverse Record Check

The primary purpose of the Reverse Record Check is to estimate the number of persons in the 2011 Census target population who were not enumerated by the census at the national, provincial and territorial levels. A sample of approximately 70,000 persons was selected from six sampling frames independent of the 2011 Census. The data for the selected persons (SPs) was matched with tax data and other administrative sources to obtain recent information about their usual residence, contact addresses and household members or associated groups of persons.

A series of complex automated matches and manual searches was performed to find each SP in the RRC version of the 2011 Census Response Database (RRC RDB). The RRC RDB is an early version of the final 2011 Census Response Database that was available before the end of census processing. There are some minor differences between the RRC RDB and later versions of the census databases. In particular, the RRC RDB, which is a database of persons, contains all census records for persons, with three exceptions. The first exception consists of census records imputed through whole household imputation (WHI). The second consists of census records with invalid or incomplete names, or invalid or incomplete birth dates; this group is also known as the 'incompletely enumerated.' The third consists of all census records that were added late, after the start of RRC processing.

When the search produced no matches, a multimode collection process was initiated to determine whether the SP was a member of the target population and to obtain additional information (including addresses) to help find the SP in the RRC RDB. At the end of the search, each SP was classified as out of scope (deceased, emigrated, temporarily outside Canada), enumerated or missed. A small number of non-response cases, consisting mostly of persons who could not be traced during collection, also had to be handled: respondent weights were adjusted for them with a non-response adjustment model.

7.1 Sampling

The sampling frame for the RRC's target population, which includes all persons who should have been enumerated in the 2011 Census, was constructed from six sources independent of the census. The first five frames were used to select a sample for estimating undercoverage in the ten provinces, while estimates for the three territories were calculated using samples from the last frame only.

At the provincial level, we began with the persons who were in the 2006 Census target population. They comprised all persons enumerated in the 2006 Census along with the persons missed by that census, the latter represented by the SPs from the 2006 RRC sample who were classified as missed. To represent persons added to the target population since the previous census, we added intercensal births and immigrants (i.e., people who were born or who immigrated between the 2006 and 2011 censuses) and non-permanent residents on Census Day. The data sources for these frames are as follows:

  • Census frame: Persons who were enumerated in the 2006 Census and appear in the 2006 RDB.
  • Missed frame: There is no comprehensive list of missed persons. However, there is a representative sample of these persons: the 2006 RRC sample of SPs classified as missed. They were included in the 2011 sample with their 2006 weights.
  • Birth frame: Vital statistics data on intercensal births. Since the final vital statistics file on births was late in becoming available, the RRC sample of births was selected from a mix of vital statistics preliminary, final and raw data files. In addition, to have all samples within the prescribed timeframe, the 2011 sample of births for Newfoundland and Labrador was selected from the Canada Revenue Agency's Canada Child Tax Benefit file.
  • Immigrant frame: Administrative data from Citizenship and Immigration Canada on immigrants who arrived in Canada during the intercensal period.
  • Non-permanent residents (NPRs) frame: Administrative data from Citizenship and Immigration Canada on persons claiming refugee status on Census Day and persons holding a work or study permit valid on Census Day.

For each territory, the only frame was the health insurance files for persons eligible for health care on Census Day.

Table 7.1.1 provides a description of each frame and the size of the sample selected from each one.

None of the first five frames for the provinces covered persons who had emigrated or were out of the country at the time of the 2006 Census, did not complete a census questionnaire and returned during the intercensal period ('returning Canadians within a province'). According to population estimates, there were 234,673 persons in this group. In addition, there were 12,169 persons returning from a territory to a province, and 13,228 persons from Indian reserves or Indian settlements who were partially enumerated in 2006 and enumerated in 2011. Because of these gaps, coverage error estimates do not include these populations, which total an estimated 260,070 persons.

One problem with the use of multiple sampling frames is the possibility that someone will be included in more than one frame. For example, a person in the immigrants frame may have been in Canada on a work permit in May 2006 and thus have been enumerable in the 2006 Census. The person would then be in both the immigrants frame and the census frame if he or she was enumerated, or in the immigrants frame and the missed frame if not enumerated. Consequently, it is important to identify all cases of frame overlap. If this is not done, estimates may be too high because some people are included twice in the frames. Though such overlap was identified wherever possible when the sampling frames were constructed, some overlap was also identified later using information provided by respondents.
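The overlap check described above can be sketched in a few lines of Python. This is only an illustration: the person key, the frame labels and the sample records are hypothetical, and in practice the key would have to come from record linkage rather than a ready-made identifier.

```python
from collections import defaultdict

def find_frame_overlap(records):
    """Return {person_key: sorted list of frames} for persons listed in 2+ frames.

    `records` is an iterable of (person_key, frame_name) pairs. The person key
    is a hypothetical stand-in for the result of record linkage.
    """
    frames_by_person = defaultdict(set)
    for person_key, frame in records:
        frames_by_person[person_key].add(frame)
    return {
        key: sorted(frames)
        for key, frames in frames_by_person.items()
        if len(frames) > 1
    }

# Hypothetical records, for illustration only.
records = [
    ("P001", "census"),
    ("P001", "immigrants"),  # e.g., on a work permit in May 2006, immigrated later
    ("P002", "births"),
    ("P003", "missed"),
    ("P003", "immigrants"),
]
overlap = find_frame_overlap(records)
print(overlap)
```

Persons flagged in this way would then be kept in only one frame, so that they are not counted twice in the estimates.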

It was decided that the total size of the 2011 sample would be similar to that of the 2006 RRC. Sample allocation was done in two stages. First, the national sample was allocated to the provinces using a combination of equal-variance allocation, to obtain the same variance for all provincial undercoverage rate estimates, and optimal allocation, to find the national undercoverage rate estimate with the smallest variance. Second, the provincial samples were allocated to the provincial strata. This was done by optimal allocation based on historical undercoverage rates, historical non-response rates, and stratum size.

The only exception was the missed frame: everyone who was classified as missed in the 2006 RRC was selected. It should be noted that incomplete enumerations and late enumerations in 2006 were considered missed (or not enumerated) and were included in this frame. This expanded frame covered a larger proportion of the population than the frame used in the 2006 RRC (since the frame used in the 2006 and previous censuses did not cover incomplete enumerations), thus reducing the census frame's coverage.

Note that the resulting allocation was only approximately optimal, since assumptions were made about the size of certain populations, including the expected number of intercensal births and immigrants at the time of the allocation. The actual sample size for the provincial sample of births, immigrants and non-permanent residents was unknown until all the samples had been selected. The final total allocated sample was 69,766 persons distributed across the frames (67,840 in the provinces, and 1,926 in the territories). Table 7.1.2 shows the final sample allocation by stratum for all provinces.
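To illustrate the idea behind optimal allocation to strata, the following Python sketch applies the textbook Neyman allocation for a proportion, in which the stratum sample size is proportional to the stratum size times the standard deviation of the missed indicator. It is a simplified sketch, not the RRC procedure: it ignores the historical non-response rates that the RRC allocation also took into account, and the stratum names, sizes and rates are hypothetical.

```python
import math

def neyman_allocation(strata, total_n):
    """Allocate total_n across strata proportionally to N_h * S_h.

    `strata` maps a stratum name to (N_h, p_h), where N_h is the stratum size
    and p_h an assumed (e.g., historical) undercoverage rate. For a 0/1
    indicator, S_h = sqrt(p_h * (1 - p_h)). Returns unrounded sample sizes.
    """
    scores = {
        name: size * math.sqrt(rate * (1.0 - rate))
        for name, (size, rate) in strata.items()
    }
    total_score = sum(scores.values())
    return {name: total_n * score / total_score for name, score in scores.items()}

# Hypothetical strata: (stratum size, assumed undercoverage rate).
strata = {"A": (100_000, 0.02), "B": (20_000, 0.10), "C": (50_000, 0.05)}
allocation = neyman_allocation(strata, 1_000)
for name, n_h in allocation.items():
    print(name, round(n_h, 1), "sampling fraction:", round(n_h / strata[name][0], 5))
```

Under this rule, strata with a higher assumed undercoverage rate (up to 50%) receive a higher sampling fraction.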

Table 7.1.3 shows the allocation by stratum for all territories.

Table 7.1.4 shows the sample allocation for Canada, provinces and territories.

The sample design varied by frame depending on the nature of the list used. In the 2006 Census frame, the sample design was a one-stage stratified design. The population was stratified by province of residence, sex, age and marital status. People enumerated on Indian reserves in the 2006 Census were placed in separate strata. In the territories frame, the sample design was also a one-stage stratified design. The population was stratified by territory of residence, sex and age. As mentioned previously, we used optimal allocation to determine the sample size in each stratum. The sample was allocated to strata in order to minimize the standard error of the estimate of missed persons.

Sampling fractions were not the same in all strata. To make the sample design more efficient, higher sampling rates were applied in subgroups for which high undercoverage or a lower tracing rate was expected. For example, as in the 2006 RRC, never-married males aged 20 to 24 in 2011 had a greater probability of being selected, since it had been observed in previous RRCs that undercoverage was consistently higher in that stratum. As a result of increased interest in studies of Aboriginal populations, the samples in the provincial strata for persons enumerated on Indian reserves in the 2006 Census were larger than called for by optimal allocation.

The missed frame is a sample-based frame since there is no list of all persons missed in the 2006 Census. The sample for this frame consisted of all cases classified as 'missed' in the 2006 RRC. This frame now covers late enumerations and incomplete enumerations, i.e., cases for which names and/or birth dates are incomplete in the 2006 Census database. Strictly speaking, the sample was not stratified, but there was an implicit stratification since the 2006 missed cases were from different frames and strata.

For the births frame, copies of intercensal birth registrations were obtained from vital statistics or, for some provinces, through the National Routing System, which provides faster access to such data. The frame was then stratified by mother's province of residence. Provincial samples were selected systematically, after sorting by the child's date of birth. In the past, data on births in the census year were usually not available in time to be sent to collection. For the 2011 RRC, however, we were able to select the births earlier and proceed with collection for the SPs that required this step. To do this for Newfoundland and Labrador, we had to rely on sampling using Canada Child Tax Benefit files.

The immigrants frame was constructed with immigration records obtained from Citizenship and Immigration Canada. This frame was stratified by province. Provincial samples were selected systematically, after sorting by year of immigration.

The non-permanent residents frame (permit holders and refugee claimants) was constructed with records from Citizenship and Immigration Canada. The frame was stratified by province. Provincial samples were selected systematically after sorting by type of permit and refugee status to ensure that each of these groups was adequately represented.
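The systematic selection used for the births, immigrants and non-permanent residents frames can be sketched as follows in Python. The function is a generic systematic sampler with a random start and a fractional interval; the record layout, the sample size and the seed are hypothetical, and the actual RRC selection programs may differ in their details.

```python
import random
from datetime import date

def systematic_sample(units, n, sort_key, rng=None):
    """Select n units systematically after sorting by sort_key.

    Uses a fractional sampling interval N / n and a random start in
    [0, N / n), so the sample is spread evenly over the sorted list.
    """
    rng = rng or random.Random()
    ordered = sorted(units, key=sort_key)
    if not 0 < n <= len(ordered):
        raise ValueError("n must be between 1 and the number of units")
    step = len(ordered) / n        # sampling interval (may be fractional)
    start = rng.random() * step    # random start within the first interval
    last = len(ordered) - 1
    return [ordered[min(int(start + i * step), last)] for i in range(n)]

# Hypothetical births frame for one province: (record id, child's date of birth).
births = [(i, date(2006 + i % 5, 1 + i % 12, 1 + i % 28)) for i in range(60)]
sample = systematic_sample(births, 6, sort_key=lambda rec: rec[1],
                           rng=random.Random(2011))
for record_id, birth_date in sample:
    print(record_id, birth_date.isoformat())
```

Because the list is sorted before selection, the sample is implicitly stratified by the sort variable (date of birth here), which is why the frames were sorted by date of birth, year of immigration or permit type.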

The sampling methodology for the territories was similar to the one used in 2006. The sampling frames for the three territories were constructed from their respective health insurance files. The people listed in the sampling frame for each territory were then matched with the 2011 Census Response Database (RDB) using systems developed for information processing (see Section 7.2.1). This frame excludes incomplete enumerations and late enumerations, which were not available at processing time. A manual verification was also performed to ensure that the matched cases were actually the same people. Matched persons were classified as enumerated and assigned a weight of 1. Persons not classified as enumerated were then stratified by age and sex (see Table 7.1.3). After sorting by geography, a one-stage systematic sample was selected in each stratum.
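The following Python sketch shows how weights could work in a design of this kind. The weight of 1 for matched persons comes from the text above; the weight N_h / n_h for sampled unmatched persons is the usual design weight for sampling within a stratum and is an assumption here, as are the stratum labels and all counts.

```python
def design_weights(stratum_sizes, sample_sizes):
    """Return the design weight N_h / n_h for each sampled stratum."""
    return {h: stratum_sizes[h] / sample_sizes[h] for h in sample_sizes}

# Hypothetical territory: matched persons are classified directly (weight 1).
matched_count = 1_000
# Hypothetical unmatched strata (age-sex groups) and sample sizes.
unmatched_sizes = {"M 20-29": 400, "F 20-29": 300}
unmatched_samples = {"M 20-29": 40, "F 20-29": 20}

weights = design_weights(unmatched_sizes, unmatched_samples)

# Each matched person represents only himself or herself; each sampled
# unmatched person represents N_h / n_h persons in his or her stratum.
represented = matched_count * 1 + sum(
    unmatched_samples[h] * weights[h] for h in unmatched_samples
)
print(weights, represented)
```

With these weights, the matched persons and the weighted sample together add up to the size of the whole frame.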

Following selection of the provincial and territorial samples, the next step was to prepare the samples, which included checking the quality of the information for the geographic and demographic variables of interest. For example, we checked the accuracy of names and the validity of birth dates. Addresses were standardized to facilitate subsequent processing. To update the geographic information, especially for the census sample and the missed sample, for which the information was from 2006, we performed a match with Canada Revenue Agency (CRA) files, including personal income tax files for 2005 to 2010 and 2011-2012 Canada Child Tax Benefit files. We also checked whether any selected persons had died, using CRA files and vital statistics data. This preparation stage was very important, because it helped to identify enumerated persons in the census frames and contact persons not classified as enumerated so that they could be interviewed.

7.2 Processing and classification

7.2.1 Processing

The purpose of processing is to provide information for the classification of selected persons (SPs) for purposes of estimation and non-response adjustment. Specifically, processing is carried out to:

  1. determine whether the SPs are enumerated in the Census Response Database
  2. determine whether the SPs are in the census target population
  3. provide further information for non-response adjustment.

The results of processing were recorded in a classification assigned to each SP for estimation and tabulation purposes (see Section 7.4 and Section 9).

Most of the work in processing involved automated and computer-assisted searching of the RRC version of the 2011 Census Response Database (RRC RDB) to determine whether the SP was enumerated or not.

Various elements of information were used for searching, including surnames, given names and birth dates. Telephone numbers and addresses associated with the SP or members of his/her household were also used. Questionnaires in which the SP could have been listed were identified from a variety of sources, including the following:

  • matches with the RRC RDB using the birth date and sex of the SP and members of his/her household, or the SP's name, postal code or telephone number
  • selection addresses from the sampling frame
  • address updates from tax records
  • information from the computer-assisted telephone interview (CATI) (see Section 7.3).

The first step after sample preparation was to search the RRC RDB for each SP, using the questionnaires identified from sampling frame data and tax data. There were two outcomes. When the SP was found, the 'enumerated' classification was assigned in most cases, and no further processing was required. When the SP was not found, the case was sent to collection. SPs identified as deceased were not sent to collection. While collection was taking place, searching of the RRC RDB continued; in a few cases, the SP was found and collection was halted. When CATI data were available, we were able to determine whether the SP was part of the census target population. If so, the CATI data were used as input for further searching.

Searching for SPs in the RRC RDB based on identified questionnaires was done both automatically and manually by the clerical staff. Automated searching was performed first. For each SP matched with the RRC RDB, there was a corresponding 2011 Census questionnaire, and we calculated a measure of similarity between the census questionnaire and the RRC data. When this measure was above a specified threshold, it was automatically concluded that the SP was listed in that questionnaire. In that case, neither that questionnaire nor the other questionnaires associated with that SP needed to be processed by the Bureau staff. Computer programs also determined when one address was a duplicate of another address being used for searching. Such duplicate addresses were not processed.
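The threshold rule can be illustrated with a short Python sketch. The similarity measure actually used in the RRC is not described here, so the measure below (an equally weighted mix of name similarity and birth date agreement), the threshold value and the record layout are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

AUTO_MATCH_THRESHOLD = 0.85  # hypothetical threshold, not the RRC value

def similarity(sp, census_person):
    """Hypothetical similarity score in [0, 1] between an SP and a census record."""
    name_a = f"{sp['given']} {sp['surname']}".lower()
    name_b = f"{census_person['given']} {census_person['surname']}".lower()
    name_score = SequenceMatcher(None, name_a, name_b).ratio()
    dob_score = 1.0 if sp["birth_date"] == census_person["birth_date"] else 0.0
    return 0.5 * name_score + 0.5 * dob_score

def auto_match(sp, questionnaire):
    """True if some person on the questionnaire scores at or above the threshold."""
    best = max((similarity(sp, person) for person in questionnaire), default=0.0)
    return best >= AUTO_MATCH_THRESHOLD

# Hypothetical SP and census questionnaires (lists of persons).
sp = {"given": "Marie", "surname": "Tremblay", "birth_date": "1985-03-12"}
same_household = [
    {"given": "Jean", "surname": "Tremblay", "birth_date": "1983-07-01"},
    {"given": "Marie", "surname": "Trembley", "birth_date": "1985-03-12"},
]
other_household = [
    {"given": "Paul", "surname": "Roy", "birth_date": "1970-11-30"},
]
print(auto_match(sp, same_household), auto_match(sp, other_household))
```

Cases that fall below the threshold are not rejected outright; as described in the next paragraph, they are passed to clerical staff for manual searching.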

The clerical staff used a number of tools for manual searching, and coding was done with DocLink's Interactive Verification Application (DIVA). Some census questionnaires were identified as not having a strong enough match for the SP to be automatically classified as enumerated. In addition, some census collection units associated with the address were identified. Staff members were also able to search the RRC RDB based on flexible parameters. Electronic telephone directories were searched as well. To ensure coding uniformity, coding staff were provided with a highly detailed procedures manual that spelled out the specific steps for coding the search results. In addition, the results of the manual search were automatically edited to minimize errors. A file containing the search results was then produced. The data in this file were used to classify the SPs.

7.2.2 Classification

Processing provides the information required to determine which SPs were:

  • 'listed' or 'not listed'
  • 'mobile' or 'not mobile'
  • included in the 'census target population' or 'out of scope' (not included)
  • 'enumerated'
  • 'missed'
  • 'classified' or 'not classified.'

Some SPs were in three or four of these categories. Other SPs did not belong to any of these groups. This is explained in more detail later in this section.

7.2.2.1 'Target population' or 'out of scope' classification

The 'census target population' includes the group of persons listed in Section 2.2. An SP is considered 'out of scope' if he/she is not in the census target population. Each SP classified 'out of scope' is assigned a reason for the classification, such as death, emigration, or representation by another sampling frame. For a person to be classified as deceased, he/she must appear in the vital statistics death files or have been reported deceased in income tax files or the collection interview. SPs classified in the census target population are either 'enumerated,' 'missed' or 'not classified' (see Section 7.2.2.2). An SP is considered 'enumerated' if he/she is in the RRC RDB. The 'missed' classification is assigned to SPs in the census target population who were not enumerated.

7.2.2.2 Classification for non-response and non-response adjustment

The definitions of 'listed,' 'mobile' and 'not classified' depend on how useful the addresses and the CATI information were in determining the classification. In many cases, collection provided information as well as one or more addresses that were not available from other sources. In other cases, all the information and all the addresses obtained during collection were also available from other sources.

An SP was 'listed' if he/she was classified without using CATI data; even if data were collected, the information and address(es) collected in the interview were not required.

An SP was considered 'mobile' if his or her usual place of residence, as defined in Section 2.4, was available only from collection data. Furthermore, SPs that were not in the census target population and were therefore classified as out of scope were, by definition, mobile.

A person was considered 'not classified' if it was possible to determine that he/she was in the target population but not whether he/she was missed. A person whose address on Census Day is unknown or too vague (for example, the address on Census Day is only the name of a large city) or a homeless person could fall into this category. These persons were mobile because it was possible to determine that they were not enumerated at the addresses known before collection.

Selected persons (SPs) for whom one or more of the characteristics in the list above could not be determined were considered non-respondents. There were three types of non-respondents:

  • An SP was 'not identified' when it could not be determined whether he/she was listed. In other words, since the information about the SP was incomplete, he/she could not be matched with the RRC RDB or interview data.
  • An SP was 'not traced' when it could not be determined whether he/she was included in the census target population.
  • A 'not classified' SP was deemed a partial non-respondent. It was known that the person was in the target population but not whether he/she was missed or enumerated.
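The definitions in Sections 7.2.2.1 and 7.2.2.2 can be summarized as a small decision rule. The Python sketch below is a simplification under stated assumptions: the function and argument names are hypothetical, `None` stands for "could not be determined", and the 'not identified' case and the reasons attached to 'out of scope' are left out.

```python
from typing import Optional

def classify(in_target_population: Optional[bool],
             enumerated: Optional[bool]) -> str:
    """Simplified classification of a selected person (SP).

    in_target_population: True, False, or None if it could not be determined.
    enumerated: True if found in the RRC RDB, False if established as not
    enumerated, None if this could not be determined.
    """
    if in_target_population is None:
        return "not traced"        # non-respondent
    if not in_target_population:
        return "out of scope"      # e.g., deceased or emigrated
    if enumerated is None:
        return "not classified"    # partial non-respondent
    return "enumerated" if enumerated else "missed"

for args in [(None, None), (False, None), (True, None), (True, True), (True, False)]:
    print(args, "->", classify(*args))
```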

7.2.2.3 Distribution of the sample by classification

Table 7.2 shows the distribution of the sample by classification and sampling frame. The classification was determined from specific combinations of the characteristics listed above. Initially, a total sample of 67,840 SPs was selected in the provinces. Of that number, 57,434 SPs were classified as 'enumerated,' 4,745 as 'missed,' and 2,619 as non-respondents, 311 of whom were classified as 'not classified.' The other 3,042 SPs were classified as 'out of scope.' A non-response adjustment was made during estimation (see Section 7.4). It is important to note that the definition of a non-respondent for classification, and therefore for estimation, was not the same as the usual definition, in which a non-respondent is someone for whom data collection was attempted but not completed. This is because classification was based on data from many sources, only one of which might be collection. To prevent confusion, Section 7.3 on collection refers to 'completed collection' rather than 'response.'
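The provincial counts quoted above can be checked against the total sample size with a few lines of Python; the dictionary keys are illustrative labels only.

```python
# Provincial sample counts by classification (figures from the text).
provincial_sample = 67_840
counts = {
    "enumerated": 57_434,
    "missed": 4_745,
    "non_respondent": 2_619,   # includes the 311 'not classified' SPs
    "out_of_scope": 3_042,
}

# The four classifications account for the whole provincial sample.
total = sum(counts.values())
print(total == provincial_sample)

# Unweighted share of the sample in each classification, in percent.
shares = {label: round(100 * n / provincial_sample, 1) for label, n in counts.items()}
print(shares)
```

These shares are unweighted sample proportions; they are not estimates of coverage, which require the weighting and non-response adjustment described in Section 7.4.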

Persons in the territory sampling frames were assigned to the matched stratum or the unmatched strata. The matched stratum corresponds to the initial processing of records from the territorial sampling frames. These cases were processed in the same way as our sample was processed: in DIVA and using processing procedures specific to the territories. Of the 113,211 persons in the territorial sampling frame, 74,377 SPs were classified as 'enumerated,' 151 as 'missed' (because they were incompletely enumerated) and 257 as 'out of scope.' A total sample of 1,926 SPs was selected from the unmatched persons. Of that number, 512 SPs were classified as 'enumerated,' 750 as 'missed' and 330 as non-respondents, of whom 130 were classified as 'not classified.'

7.2.2.4 Implications of the classification

'Traced' SPs are SPs for whom it was possible to determine whether they were included in the census target population. For purposes of estimation and tabulation, traced SPs that were also classified were respondents. Since names, including those of household members, and addresses were available in the RRC RDB, and since the tools for consulting the database were sufficiently powerful, it was possible to verify whether an SP was enumerated at an address even though the address was vague. This ensured that SPs were classified as traced only when it was known whether they were mobile and whether they were enumerated.

The usefulness of knowing whether an SP was enumerated is self-evident. SPs who were in the census target population but were not enumerated and therefore classified as missed formed the basis for the undercoverage estimate. We also wanted to classify SPs according to the above-mentioned characteristics so that we could choose the most appropriate respondents to represent non-respondents. The above definitions imply the following:

  • SPs who were not identified were also not traced
  • identified SPs who were not traced were not listed
  • enumerated SPs who were not mobile were listed
  • enumerated SPs who were mobile were not listed
  • SPs who were not classified were mobile.
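The implications listed above lend themselves to an automated consistency check. The Python sketch below encodes each implication as a (premise, conclusion) pair and reports those that a record violates; the flag names and the record layout are hypothetical and do not reflect the actual RRC edit rules.

```python
def violated_implications(sp):
    """Return the names of the implications that the SP's flags contradict.

    `sp` is a dict of booleans with the hypothetical keys: identified,
    traced, listed, mobile, enumerated, classified.
    """
    rules = [
        ("not identified implies not traced",
         not sp["identified"], not sp["traced"]),
        ("identified and not traced implies not listed",
         sp["identified"] and not sp["traced"], not sp["listed"]),
        ("enumerated and not mobile implies listed",
         sp["enumerated"] and not sp["mobile"], sp["listed"]),
        ("enumerated and mobile implies not listed",
         sp["enumerated"] and sp["mobile"], not sp["listed"]),
        ("not classified implies mobile",
         not sp["classified"], sp["mobile"]),
    ]
    return [name for name, premise, conclusion in rules if premise and not conclusion]

consistent = {"identified": True, "traced": True, "listed": True,
              "mobile": False, "enumerated": True, "classified": True}
inconsistent = dict(consistent, listed=False)  # enumerated, not mobile, yet not listed

print(violated_implications(consistent))
print(violated_implications(inconsistent))
```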

We also determined the Census Day address (usual place of residence) of each SP in the census target population, except for SPs who were not classified. This is the address where, according to census instructions, the SP should have been enumerated. If the SP was enumerated, the enumeration address was considered to be the Census Day address even if other information might have raised doubts about the proper interpretation of census instructions.

For more information on processing and classification, see Parenteau (2012).

7.3 Data collection

7.3.1 Environment

Head office (HO) staff in Ottawa worked closely with staff in five Statistics Canada regional offices (ROs) to collect data during the survey phase of the RRC. These ROs were located in Halifax, Sherbrooke, Sturgeon Falls, Winnipeg and Edmonton. The suggestions and recommendations made by the ROs as a result of conducting the 2006 RRC were incorporated into the design and operations of the 2011 survey. HO was responsible for providing a computer-assisted telephone interviewing (CATI) application that met the needs of the survey and was interviewer- and respondent-friendly.

Assignment of the sample to the ROs was based on HO's 'best guess' about where the selected person (SP) was residing during the collection period. Once a case was assigned to an RO, it was never transferred to another RO even if it was determined that the SP moved outside the RO collection area. RO coverage areas and survey counts are shown in Table 7.3.1.

A total of 16,955 cases were sent for collection. Section 7.1 describes the two sample designs used in the RRC for the provinces and for the territories. The number of cases requiring collection could not be determined until all cases were sent for a first attempt at processing, whereby the RRC Census Response Database (RRC RDB) was searched. When the SP was not found, the file was sent for collection. There were a total of 10,448 such cases, referred to as the 'regular' sample. A sample of 6,507 SPs was selected from among the found SPs. These are referred to as the 'non-response adjustment (NRA)' sample. The collection results for the NRA sample were used to estimate a parameter of the RRC non-response adjustment model described in Section 7.4. RO staff were not made aware whether a case was NRA or regular until the case was opened. Consequently, the staff could not select cases based on whether or not they belonged to the NRA sample.

The 16,955 cases sent to the field represented 24.3% of the RRC sample. Most of the sample not sent for collection consisted of SPs who were found in the RRC RDB during the first search. A classification of enumerated could therefore be assigned to these SPs and no further work was required. The remainder of the sample not sent for collection included deceased SPs and cases not sent for other reasons (such as frame overlap, insufficient information to determine the SP's identity and SPs found among the incomplete enumerations).
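The collection workload figures can be reproduced from the counts given in Sections 7.1 and 7.3.1; the variable names in this short Python check are illustrative.

```python
# Cases sent for collection (figures from the text).
regular_sample = 10_448   # SPs not found in the first search of the RRC RDB
nra_sample = 6_507        # subsample of found SPs, for non-response adjustment
total_rrc_sample = 69_766 # provinces (67,840) plus territories (1,926)

sent_for_collection = regular_sample + nra_sample
share_sent = round(100 * sent_for_collection / total_rrc_sample, 1)

print(sent_for_collection)  # 16955
print(share_sent)           # 24.3
```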

There were five versions of the RRC Survey questionnaire: non-proxy (meaning that the SP is responding for him/herself), proxy (meaning that somebody else is responding for the SP), short versions of the proxy and non-proxy (for the NRA sample), and deceased before Census Day. The content of the 2011 RRC Survey questionnaire focused on determining whether the SP was in scope for the census, and collecting addresses where the SP had lived (and thus where he/she may have been enumerated), especially the addresses where the SP lived on Census Day and during the month of May 2011. Names and demographic data were collected for all Census Day household members. By design, collection was proxy for SPs who were less than 18 years of age or presumed deceased. Proxy respondents were also used when the SP was not available during the collection period or was difficult to reach.

For deceased SPs, it was important to determine if the SP had died before, on, or after Census Day, since different paper questionnaires and CATI flows were used depending on the date of death. In some cases, it was known before collection that the SP was deceased, for example through matches with tax records and vital statistics. These cases were not sent for collection. When there was doubt, however, the case was sent for collection with a flag indicating that the SP was 'presumed deceased.'

Although the 2011 RRC Survey was a multi-mode survey, the main data collection mode was CATI. The CATI application was developed using many of the standards set for all CATI questionnaires used at Statistics Canada. The application consisted of various interrelated modules and was accessed through the regional offices' generic interface. Interviewers were assigned cases based on language and whether cases required tracing or not.

Paper questionnaires in both official languages were available for those SPs who were contacted by telephone and requested a paper questionnaire. Selected persons whom the RO did not succeed in contacting by telephone and who had a good quality mailing address (as determined by the RO) were sent a paper questionnaire package from HO containing the different questionnaire versions, a cover letter explaining the survey, and instructions for choosing the right questionnaire and completing it. Finally, field interviewers completed some interviews using the paper questionnaires. Data capture from the paper questionnaires was performed at HO using the CATI system. A great deal of coordination is required to operationalize a sequential multiple-mode collection system such as the 2011 RRC.

Collection and tracing are becoming increasingly challenging, due in large part to the increased use of cellular phones and decreased use of land lines, the unavailability of directories of cell phone numbers, and the increased use of screening devices and unlisted phone numbers. As well, more people are concerned about privacy and identity theft, and are reluctant to provide personal information.

7.3.2 Operations

As a new initiative for the 2011 RRC, introductory letters were sent out on January 30, 2012, to SPs with valid mailing addresses. These letters explained the RRC and advised the SP (or a proxy) that he/she had been selected for the survey and that Statistics Canada would be calling in the near future to complete the survey. A telephone number was provided in case recipients had any questions or wanted to call to complete the survey at a time of their choosing. Data collection began in the ROs on February 6, 2012, and active collection ended on August 15, 2012. In total, there were 190 days on which at least one RO was actively collecting data, and 13,671 questionnaires were completed during that period. Between August 16 and 31, 2012, passive collection took place, during which returned paper questionnaires and calls from SPs to the RO to do the survey were handled. During this period, 56 questionnaires were completed, and some other completed paper questionnaires were received afterward. It should also be noted that among these late questionnaires deemed complete by the ROs, a few were later judged in HO to have been conducted with an incorrect SP and thus were removed from processing.

Interviewers were given the survey objectives and background along with a detailed training manual. Mock interviews were incorporated into the training sessions using the CATI application. A call scheduler assigned cases to interviewers in normal operations, but on occasion, an interviewer could be assigned to manage specific cases. For instance, an interviewer might take an incoming call or make a call to someone who preferred to speak in a non-official language. International calls were made, especially for SPs in the non-permanent resident (NPR) group who had left Canada.

Quality management of the collection operation included training interviewer supervisors at HO, monitoring interviewer training at some of the ROs, and retraining and discussion of specific data quality issues noted at HO in completed questionnaires. Regional office managers allocated resources to the survey while balancing the needs of other surveys taking place in their region. Sustained efforts to interview persons who initially refused to participate in the survey improved response rates.

Survey data were sent electronically to HO from the five ROs each night after interviewing was completed. Data quality analysis was performed on the data each morning at HO to verify the completeness and accuracy of each case. Cases with missing or ambiguous data in key fields, or where the data collected was for someone other than the SP, were reactivated and sent back to the ROs for follow-up. Cases passing the data quality analysis were compiled into batches for processing as described in Section 7.2.1.

Table 7.3.2 shows the distribution of cases sent to ROs from HO over time. Interviewing typically began in the RO as soon as new cases arrived. The counts for the second and third wave include a small number of reactivated cases sent back for re-interviewing as a result of HO data quality analysis. The adjusted total reflects cases that were dropped by the ROs as a result of an HO request as well as reactivated cases. The dropped cases were removed if they were found in processing to be enumerated or out of scope.

Detailed management reports were created at HO on a daily and weekly basis to document the progress of the survey collection. The daily reports presented the number of cases collected and response rates by RO and outcome code. The weekly reports included progress by other variables such as sampling frame, sex and age groups and stratum, and compared the progress for some of these variables with the projected targets. Other weekly reports were more specialized, providing details on the interviewing efforts of refusal, tracing, and not contacted cases.

The average duration of the CATI interview was 14 minutes. However, the actual time spent on each case was much greater, given the number of contact attempts required and the amount of tracing that was involved. The average total time by case was 121 minutes.

7.3.3 Tracing

Tracing refers to the work done to find telephone and address information for either a selected person or a proxy for the selected person. Tracing was undertaken by both HO and the ROs, and was critical to the success of the RRC. As part of the sample preparation, cases were linked to tax and other administrative data to provide updated contact data for the SP and their household members. In some cases, initial CATI data were outdated or incomplete and therefore tracing was required.

HO provided tracing leads using several large administrative files containing names and addresses but not necessarily telephone numbers. These files included motor vehicle registration, tax files, Citizenship and Immigration files, and Vital Statistics files. These tracing leads were loaded into the CATI tracing application prior to collection, and additional leads were sent to the ROs as they were found in processing during the collection period.

RO tracing was done on both the SP and the household members, and was extended outside of Canada: calls, emails and faxes could be made internationally. Specialized tracing staff were available to handle specific types of cases, such as immigrants and elderly SPs. RO managers contacted external data suppliers (such as educational facilities and provincial health departments) with the help of the regional directors and provincial/territorial statistical focal points (see Footnote 1). The information coming from these external data suppliers was used directly by the ROs for tracing, as well as by processing in HO to attempt to locate the SP on the Census Response Database even if the RO was not able to complete the case during collection. Interviewers used a variety of tracing tools, with online electronic directories such as Canada 411, Google and Facebook being the most popular.

As data collection began, 15,039 (88.7%) of the cases sent for collection were placed in the queue for interviewing and the remaining 1,916 (11.3%) in the tracing queue. As required, cases were moved back and forth between interviewing and tracing. For SPs initially in the tracing queue, no telephone number had yet been found for the SP or any family member. As tracing leads were found, cases were moved to interviewing. When all tracing leads were exhausted for interviewing cases, they were moved to tracing.

A minority of cases started in tracing (9.5% of the NRA sample and 12.4% of the regular sample). Looking at the cases that started in interviewing, only 31.7% of the NRA sample required tracing, compared to 70.7% of the regular sample.

Of the 1,916 cases that started in tracing, successful leads that yielded interviews were found for 57% of them. Among the 8,334 cases that started in the interviewing queue and required tracing, the rate of tracing success leading to an interview was 74%. Numerous valuable leads were also found for these cases. Overall, 7,183 (53%) of the 13,477 completed cases required some tracing effort.

7.3.4 Collection statistics

Many statistics were monitored throughout the data collection period. An analysis of the statistics was done after collection was completed.

Table 7.3.4.1 shows provincial and territorial completion rates by type of case (regular or NRA). The table shows that completion rates are higher for the NRA cases. This is expected because the initial CATI data included the more recent address specified in the 2011 Census, and these people had already shown a propensity to respond by completing their census form.

Table 7.3.4.2 gives completion statistics by frame and case type. The low response rate for the SPs in the NPR frame was in large part due to the permits expiring before the start of survey operations; approximately 33% of the NPRs sent for collection had a permit end date before the start of collection. It was also often very difficult to locate these SPs or a suitable proxy. This was especially true for NPRs with a permit to study in Canada where the completion rate was just 40.9%.

Table 7.3.4.3 gives completion statistics by stratum and type of case for the sample selected from the demographic strata. As discussed in Section 7.1, demographic strata were used for the 2006 Census frame and the unmatched frames in the territories.

Another statistic of interest is the degree to which questionnaires were completed by proxy. Collection was proxy by design for everyone who was less than 18 years of age and for SPs who were presumed deceased. Otherwise, proxy was used when the SP was not available during the survey period or was difficult to reach. Overall, 4,640 (34.4%) of the completed sample were done by interviewing a suitable proxy.

Table 7.3.4.4 gives, for Canada and the provinces and territories, the number of cases sent for collection, the number of these that required tracing, and the percentage of cases sent for collection that required tracing. The tracing rate was highest among the provinces for Nova Scotia, Alberta, Ontario, British Columbia and Quebec, and for Yukon and Nunavut.

There were three modes of collection: CATI, self-enumeration using the paper questionnaire, and personal interview also using the paper questionnaire. Of the 13,477 completed questionnaires, 95.9% were done by CATI, 2.6% were done by self-enumeration, and 0.9% by personal interview. Of the cases completed by CATI, 4.2% resulted from the SP calling the RO. The collection mode varied by province and territory. This may reflect different operational methods in the ROs, differences in the characteristics of the persons requesting a questionnaire, or different demographic distributions.

7.4 Estimation

Estimation for the RRC is divided into two parts: the weighting of selected persons (SPs), followed by the calculation of census undercoverage. Weighting consists of determining the initial sampling weights of SPs and applying all subsequent adjustments that lead to their final weights. The weighting steps are described in Sections 7.4.1 to 7.4.4. The methodology for calculating census undercoverage is described in Section 7.4.6.

7.4.1 Calculation of the initial weights

The initial weight of an SP from the 2006 missed frame was the final weight assigned to that person in the 2006 Reverse Record Check (RRC) when he/she was classified as missed. For SPs from the other sampling frames, the initial weights are generally based on the inverse selection probabilities in the sample.
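The inverse-probability rule above can be illustrated with a minimal sketch. The function name and the frame and sample sizes are hypothetical and are not taken from the RRC; the sketch also assumes simple equal-probability selection within a stratum, which the actual design need not follow.

```python
def initial_weight(selection_probability: float) -> float:
    """Return the design weight, i.e. the inverse of the selection probability."""
    if not 0.0 < selection_probability <= 1.0:
        raise ValueError("selection probability must be in (0, 1]")
    return 1.0 / selection_probability


# Hypothetical stratum: 50 persons sampled from a frame of 20,000 persons.
n_sampled, n_frame = 50, 20_000
w = initial_weight(n_sampled / n_frame)
print(w)  # each sampled person represents about 400 persons on the frame
```

Under this assumption, each selected person stands in for the frame persons who were not selected, so the weights in a stratum sum to the frame size.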

7.4.2 Initial weights adjustment

For the births frame, the initial weight was adjusted upward to account for the small number of births that were not in the sampling frame when the sample was selected. Final counts of births were not obtained until after the sample was selected. Also, the frame of births from the year 2011 was incomplete in three provinces. The SPs' initial weights were adjusted for these counts.

The initial weights of SPs from the 2006 Census frame who were enumerated more than once in 2006 were adjusted downward to account for the fact that these SPs had more than one chance of being selected. This adjustment was new for 2011, since we were able to determine for the first time, using information provided by the 2006 Census Overcoverage Study, whether SPs appeared more than once in the sampling frame.
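One common way to make such a downward adjustment is to divide the weight by the number of times the person appears on the frame. The sketch below assumes that rule; the source does not state the exact formula used in the RRC, and the function name and figures are hypothetical.

```python
def multiplicity_adjusted_weight(initial_weight: float, times_on_frame: int) -> float:
    """Divide the weight by the number of frame records for the same person.

    Assumption: each frame record has the same small selection probability,
    so a person listed k times has roughly k times the chance of selection.
    """
    if times_on_frame < 1:
        raise ValueError("a selected person appears on the frame at least once")
    return initial_weight / times_on_frame


# Hypothetical SP with an initial weight of 400 who was enumerated twice in 2006.
adjusted = multiplicity_adjusted_weight(400.0, 2)
print(adjusted)  # 200.0
```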

7.4.3 Non-response adjustment

To reduce bias, the initial weights of respondents had to be adjusted to account for non-response. The weight of persons who could not be classified (referred to as non-respondents) was redistributed among persons who were classified (referred to as respondents). Where possible, this was done by ensuring that the weight of non-respondents with certain characteristics was redistributed among respondents with the same characteristics. The following characteristics (or 'metadata') were used: the sampling stratum (further divided, for the non-permanent resident stratum, by country of origin and type of permit); an indication that the person filed a tax return for the reference year preceding the census year (or, in the case of a child, that the child was on the Canadian Child Tax Benefit (CCTB) file), which suggests that the person is in the target population; and, finally, whether the SP is listed, mobile or part of the target population (classified persons).

For the purposes of redistributing the weight of non-respondents, the RRC was treated as a four-phase sample in which each phase corresponded to the selection of a nested sample: selection of SPs from the sampling frames, selection of identified SPs from the set of SPs, selection of traced SPs from the identified SPs and selection of classified SPs from the traced SPs. When a respondent with the same characteristics as a non-respondent could not be identified in a stratum, the stratum was grouped with another stratum that most closely resembled it.
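The redistribution of non-respondent weight within groups of similar persons can be sketched as follows. This is a single-phase simplification, not the four-phase procedure described above; the cell labels, weights and function name are hypothetical.

```python
from collections import defaultdict


def redistribute_nonresponse(records):
    """Scale respondent weights so each cell keeps its full weighted total.

    records: list of dicts with keys 'cell', 'weight' and 'responded'.
    Returns the respondents only, with adjusted weights.
    """
    total = defaultdict(float)       # weight of all sampled persons per cell
    responding = defaultdict(float)  # weight of respondents per cell
    for rec in records:
        total[rec["cell"]] += rec["weight"]
        if rec["responded"]:
            responding[rec["cell"]] += rec["weight"]

    # A cell without respondents cannot absorb its own non-respondent weight;
    # it has to be grouped with a similar cell before adjusting.
    empty = [cell for cell in total if responding[cell] == 0.0]
    if empty:
        raise ValueError(f"cells with no respondents must be collapsed first: {empty}")

    return [
        {**rec, "weight": rec["weight"] * total[rec["cell"]] / responding[rec["cell"]]}
        for rec in records
        if rec["responded"]
    ]


# Hypothetical sample with two adjustment cells.
sample = [
    {"cell": "A", "weight": 100.0, "responded": True},
    {"cell": "A", "weight": 100.0, "responded": True},
    {"cell": "A", "weight": 50.0, "responded": False},
    {"cell": "B", "weight": 80.0, "responded": True},
    {"cell": "B", "weight": 20.0, "responded": False},
]
adjusted = redistribute_nonresponse(sample)
```

The total weight is preserved (350 before and after), and the error raised for an empty cell corresponds to the situation in which a stratum has to be grouped with the stratum that most closely resembles it.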

7.4.4 Post-stratification adjustment for the territories

Following adjustment of the initial weights, the estimated number of enumerated persons in the territories has traditionally been lower than the comparable census count. This is probably due to undercoverage of the census target population in health insurance files. To address this undercoverage, the weights of the SPs selected in each territory were adjusted so that the estimated number of enumerated persons equalled the comparable census count for that territory.
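A minimal sketch of this kind of ratio adjustment for one territory is shown below. It assumes that a single factor is applied to the weights of all SPs selected in the territory; the source does not spell out this detail, and the function name and numbers are hypothetical.

```python
def poststratify(weights, enumerated, census_count):
    """Rescale weights so the weighted count of enumerated SPs equals census_count.

    weights: adjusted weights of the SPs selected in one territory.
    enumerated: parallel list of booleans, True if the SP was classified as enumerated.
    """
    estimated = sum(w for w, e in zip(weights, enumerated) if e)
    if estimated <= 0:
        raise ValueError("no enumerated weight available to calibrate against")
    factor = census_count / estimated
    return [w * factor for w in weights]


# Hypothetical territory: enumerated SPs carry a weight of 250 against a count of 275.
weights = [100.0, 150.0, 50.0]
enumerated = [True, True, False]
calibrated = poststratify(weights, enumerated, census_count=275.0)
```

After the adjustment, the weights of the enumerated SPs sum to the census count, and the weight of the remaining SP rises by the same factor.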

7.4.5 Weighted distribution by classification

Table 7.4.5.1 shows the weighted distribution of SPs by classification and sampling frame. For definitions, see Section 7.2. Only SPs found in the RRC RDB were classified as enumerated. Persons who were in the target population but not in the RRC RDB were classified as missed. The remaining SPs were classified as out of scope of the census population (deceased, emigrated, etc.).

7.4.6 Calculation of the census undercoverage

Let

$C$ = published census count of the number of persons in the target population

$\hat{U}$ = estimate of undercoverage

= estimate of the number of persons not included in $C$ who should have been

$\hat{M}$ = estimate of the number of persons in the RRC target population who were not enumerated

= sum of the final weights of persons classified as missed

$X$ = number of persons included in $C$ who could not be identified with certainty in the RRC as enumerated.

Census population undercoverage was estimated by the number (weight) of missed persons less the number of persons excluded from the RRC RDB. We then have

$\hat{U} = \hat{M} - X.$

$X$ has three components: imputations (from whole household imputations of DCS), incomplete enumerations and late enumerations.

An imputation arises when the SP's Census Day address refers to a dwelling for which there was an imputed enumeration. This was the case in particular for non-respondent dwellings for which the data of another household was used in whole household imputation (WHI).

Some enumerations in the census database were deemed too incomplete to be used by the RRC to identify an SP as enumerated. Incomplete enumerations in this context usually involve invalid data in the date-of-birth field or the name field (e.g., '?,' 'Mr.,' 'Unknown' or 'Person 1'). An SP that had such an enumeration was classified as missed. This is referred to as an 'RRC incomplete enumeration.'

Some cases of persons enumerated only in the National Household Survey (and not in the census) were transferred directly to the final census database and therefore did not appear in the Census RDB from which data were extracted to create the RRC database. These enumerations were not accessible for RRC purposes, and as a result, the RRC was unable to identify the enumerations in the case of these dwellings.
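The undercoverage calculation can be sketched with made-up numbers. All figures below are hypothetical and chosen only for illustration; they are not RRC results. The function and parameter names are likewise assumptions.

```python
def undercoverage(missed_weights, imputed, incomplete, late):
    """Return (M_hat, X, U_hat) where U_hat = M_hat - X.

    missed_weights: final weights of the SPs classified as missed.
    imputed, incomplete, late: the three components of X, i.e. persons who
    are in the census count but could not be found in the RRC RDB.
    """
    m_hat = sum(missed_weights)
    x = imputed + incomplete + late
    return m_hat, x, m_hat - x


# Hypothetical inputs: three missed SPs and three X components.
m_hat, x, u_hat = undercoverage(
    missed_weights=[400.0, 250.0, 350.0],
    imputed=300.0,
    incomplete=120.0,
    late=80.0,
)
print(m_hat, x, u_hat)  # 1000.0 500.0 500.0
```

Subtracting X avoids counting as undercoverage those persons who were classified as missed only because their census record was not available for matching.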

At the national level, $X$ made up about half of $\hat{M}$. This is similar to the 2006 result. The number of persons imputed in the WHI was lower in 2011 than in 2006, but since the number of persons not enumerated was also lower, the relative sizes of the two components of $\hat{M}$ remained unchanged.

Table 7.4.6.1 shows the numbers (for Canada) for the various components of the estimation of population undercoverage, namely the numbers for the three components of the X term.

Lastly, for the purpose of calculating the variance of the estimates, we have

$v(\hat{U}) = v(\hat{M} - X) = v(\hat{M})$

where $v(\hat{M})$ = estimated variance of $\hat{M}$ based on the RRC design
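The equality of the two variances can be spelled out in one step. The derivation below assumes that $X$ is treated as a fixed count taken from the census databases rather than as a sample estimate; the source does not state this explicitly, but it is what the equality implies.

```latex
\begin{align*}
v(\hat{U}) &= v(\hat{M} - X) \\
           &= v(\hat{M}) + v(X) - 2\,\mathrm{cov}(\hat{M}, X) \\
           &= v(\hat{M}),
\end{align*}
% since v(X) = 0 and cov(M-hat, X) = 0 when X is a fixed (non-random) count.
```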

The RRC sample design is approximated by a stratified design with selection probabilities proportional to size. The sizes are selected so as to reproduce the final weights. The variance was calculated with StatMx, a module of Statistics Canada’s Generalized Estimation System (GES).

For more details on the estimation methods used in the 2011 RRC, see Théberge (2008).

Footnote

Footnote 1

A focal point is a provincial or territorial representative who coordinates activities between Statistics Canada and their provincial or territorial administration.
