Archived Content

Information identified as archived is provided for reference, research or recordkeeping purposes. It is not subject to the Government of Canada Web Standards and has not been altered or updated since it was archived. Please contact us to request a format other than those available.

5 Data quality assessment and indicators

5.1 Sources of error
- 5.1.1 Sampling error
- 5.1.2 Non-sampling error
  - 5.1.2.1 Non-response bias for Aboriginal variables
5.2 Data suppression related to confidentiality and data quality
5.3 Coverage
- 5.3.1 Net undercoverage error for participating reserves
  - 5.3.1.1 Data sources
- 5.3.2 Coverage error for incompletely enumerated reserves and settlements
  - 5.3.2.1 Estimation model

The objective of data quality assessment is to evaluate the overall quality of survey data, so as to improve our understanding of how and where errors occur, and to inform users of the reliability of the data.

Although there are several potential sources of error, they can be grouped into two types: sampling error and non-sampling error. The former arises because an estimate of a characteristic is based on only part of the population, which may not represent the whole population, particularly when non-response is high. The latter covers all errors that are not related to sampling (coverage, response, and processing errors).

This section is divided into three main sub-sections: the first deals with different sources of error; the second looks at data suppression related to confidentiality and data quality; and the third deals with coverage error related to Indian reserve communities.

5.1 Sources of error

5.1.1 Sampling error

The objective of the NHS is to produce estimates from a number of questions for a wide variety of geographies, ranging from very large areas (such as provinces and census metropolitan areas) to very small areas (such as neighbourhoods and municipalities), and for various population groups such as Aboriginal peoples and immigrants. These groups also vary in size, especially when cross-classified by geographic area. Such groupings are generally referred to as 'domains of interest.'

With a sampling rate of about 3 in 10 and a response rate of 68.6%, it is estimated that about 21% of the Canadian population participated in the NHS. Nevertheless, given the voluntary nature of the NHS, the quality of domain estimates may vary appreciably, in particular because of the variation in response rates from domain to domain.^Footnote1 Errors from non-response will also affect the measurement of sampling error, as they introduce non-response variability into the estimate.
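The participation figure quoted above follows from multiplying the two rates. The short sketch below reproduces that arithmetic; it treats the sampling rate as exactly 0.30, which is an approximation of "about 3 in 10", and the variable names are illustrative only.

```python
# Approximate share of the population that took part in the NHS.
sampling_rate = 0.30   # "about 3 in 10" (rounded value, an assumption)
response_rate = 0.686  # unweighted response rate quoted in the text

participation = sampling_rate * response_rate

# Prints the share as a percentage with one decimal place.
print(f"Estimated participation: {participation:.1%}")
```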

The sampling error of an estimate is often expressed as a coefficient of variation (CV), which is the ratio of the standard error of the estimate to the estimate itself, expressed as a percentage. The CV gives an indication of the uncertainty associated with the estimate. While the CVs measured in the NHS are of the same magnitude as those from the 2006 Census long form, this may not be the case for lower levels of geography (see Note: NHS coefficients of variation for a comparison of estimates at various levels of geography for the 2011 NHS and the 2006 Census long form). A known effect of non-response is a reduction in the effective sample yield for any given area. The sampling rate for the NHS was purposefully much higher than the rate for the 2006 Census long form to compensate for the expected rate of non-response. Overall, this strategy worked as expected; however, for lower levels of geography or for specific population domains, differential response rates will still affect the precision of the estimates. As non-response increases, the number of sample cases with a response is reduced, which increases the standard error and the CV.
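As an illustration of the CV definition and of how a smaller number of respondents inflates it, the following sketch may help. It is not the NHS variance estimation method: the standard error formula assumes simple random sampling of a proportion, with no weighting or finite population correction, and the proportion and respondent counts are hypothetical.

```python
import math


def coefficient_of_variation(estimate: float, standard_error: float) -> float:
    """Return the CV as a percentage: standard error divided by the estimate."""
    if estimate == 0:
        raise ValueError("CV is undefined for an estimate of zero")
    return 100.0 * standard_error / abs(estimate)


def srs_standard_error(proportion: float, respondents: int) -> float:
    """Standard error of a proportion under simple random sampling.

    Assumption for illustration only; it ignores weighting, calibration
    and the finite population correction.
    """
    return math.sqrt(proportion * (1.0 - proportion) / respondents)


# Hypothetical domain: a characteristic held by 10% of the population.
proportion = 0.10
cv_all_respond = coefficient_of_variation(proportion, srs_standard_error(proportion, 1000))
cv_some_respond = coefficient_of_variation(proportion, srs_standard_error(proportion, 686))

print(f"CV with 1,000 respondents: {cv_all_respond:.1f}%")
print(f"CV with 686 respondents:   {cv_some_respond:.1f}%")
```

Under these assumptions the CV grows as the number of responding cases falls, which is the effect described above.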

Coefficients of variation are available for selected variables for Canada, provinces and territories. Downloadable files (CSV, 87 kb | TAB, 77 kb) are also available.

5.1.2 Non-sampling error

Besides sampling, a number of factors can cause errors in a survey's results. Respondents may be missed, incorrectly enumerated or counted more than once (coverage errors). A respondent, or sometimes the survey representative, may misunderstand a question and record an incorrect response or simply use the wrong response box (response errors). Responses may also be entered incorrectly during data capture, coding and processing (processing errors).^Footnote2 These are examples of non-sampling errors; measures were applied at every stage of collection and processing to mitigate their impact.

In addition, in every voluntary survey, non-response can introduce error into the survey's estimates. A substantial portion of the non-sampling error (measurement errors) can be associated with non-response. There is a distinction to be made between partial non-response (lack of response to one or some questions) and total non-response (lack of response to the survey because the household could not be reached or refused to participate).

Total non-response not only reduces the effective number of participants in the survey (an effect on sampling error), it is also likely to bias the survey's estimates, because non-respondents tend to have different characteristics than respondents. As a result, there is a risk that the results will not be representative of the actual population.

There is non-response bias when a survey's non-respondents differ from its respondents in how they would have responded to the information collected. In that case, the higher a survey's non-response is, the greater the risk of non-response bias. Non-response weight adjustments are meant to compensate for these errors. The quality of the estimates can be affected if such a bias persists.

Several methods can be used during data collection or processing to minimize non-response bias. NHS non-response follow-up was planned in such a way as to maximize the survey's response rate and control potential non-response bias due to the survey's voluntary nature. Ultimately, non-response adjustments done to the survey weights resulting from the non-response follow-up and calibration to known demographic and geographic benchmarks reduce bias significantly.

The NHS has an unweighted response rate of 68.6%, and a weighted response rate of 77.2%.

The weighted response rate is the measurement of the reduced impact of non-response after non-response follow-up. Statistics Canada conducted several studies and various simulations, before and after collection, to assess the risk and extent of the potential bias. A number of measures were taken to mitigate its effects.^Footnote3

5.1.2.1 Non-response bias for Aboriginal variables

Several data sources were used to evaluate the NHS estimates for Aboriginal variables such as: 2011 Census results for mother tongue (since a relationship exists between language and Aboriginal identity, Registered or Treaty Indian status and Membership in a First Nation/Indian band); the 2001 and 2006 censuses; Population Projections by Aboriginal Identity in Canada; and administrative data pertaining to Registered Indians from Aboriginal Affairs and Northern Development Canada (AANDC).

It is impossible to definitively determine how much the NHS may be affected by non-response bias. However, based on information from other data sources, evidence of non-response bias does exist for certain populations and for certain geographic areas. Generally, the risk of bias increases for lower levels of geography and for smaller populations.

On the basis of the estimates and trends from the sources mentioned above, evidence suggests that biases were in general well mitigated, with exceptions. The Inuit population living outside of Inuit Nunangat appears to be slightly overestimated at the national level. The magnitude of this overestimation appears to be higher (and more variable) for some smaller geographic areas. Additionally, while the NHS results show increases of the Métis population and the First Nations population living off reserve as compared with data from the 2006 Census, many factors, other than non-response bias, could explain the growth of these populations, including changes in reporting patterns and the propensity of people to self-identify as an Aboriginal person.

5.2 Data suppression related to confidentiality and data quality

Data disseminated by the NHS are also subjected to a variety of automated and manual processes to determine whether the data need to be suppressed in order to maintain confidentiality (non-disclosure) and data quality.

5.2.1 Data suppression related to confidentiality (non-disclosure)

All NHS data are subject to confidentiality suppression rules, to ensure non-disclosure of individual respondent identity and characteristics. The following describes the various suppression rules used to ensure confidentiality.

5.2.1.1 Area suppression for standard geographic areas

Area suppression is used to remove all characteristic data for geographic areas below a specified population size. The specified population size for all standard^Footnote4 areas or aggregations of standard areas is 40, except for blocks, block-faces or postal codes. Consequently, no characteristics or tabulated data are released for areas, such as CSDs (municipalities or Indian reserves or settlements and unorganized territories) below a total population size of 40.^Footnote5

5.2.1.2 Area suppression for income characteristics data

Area suppression is used to replace all income characteristic data with an 'x' for geographic areas with populations and/or number of households below a specific threshold.

If an NHS tabulation contains quantitative income data (e.g., total income, wages), qualitative data based on income concepts (e.g., low income before tax status) or derived data based on quantitative income variables (e.g., indexes) for individuals, families or households, then the following rule applies: income characteristic data are replaced with an 'x' for areas where the estimated population is less than 250 or where the number of private households is less than 40. The private household threshold does not apply for tabulations based on place of work geographies.
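The income suppression rule above can be sketched as a simple check. The function and parameter names are hypothetical; only the thresholds (250 persons, 40 private households) and the place-of-work exception come from the text.

```python
def suppress_income_data(
    estimated_population: float,
    private_households: float,
    place_of_work_geography: bool = False,
) -> bool:
    """Return True when income characteristics would be replaced with an 'x'."""
    # Population threshold applies to every tabulation.
    if estimated_population < 250:
        return True
    # Household threshold is skipped for place-of-work geographies.
    if not place_of_work_geography and private_households < 40:
        return True
    return False


print(suppress_income_data(300, 39))
print(suppress_income_data(300, 39, place_of_work_geography=True))
```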

5.2.1.3 Random rounding

All estimates in NHS tabulations are subjected to a process called random rounding. Random rounding is a method used to modify an estimate to a value ending in '5' or '0.' It is either greater or less than the original value. This reduces the possibility of identifying individuals within the tabulations.

Counts greater than 10 are rounded to base 5, while counts less than 10 are rounded to base 10. This means that any count less than 10 will always be changed to 0 or 10. Table 3 below shows the effect of rounding on counts with a value less than 10.

Table 3
Random rounding frequency

Count of	Will round to 0	Will round to 10
1	9 times out of 10	1 time out of 10
2	8 times out of 10	2 times out of 10
3	7 times out of 10	3 times out of 10
4	6 times out of 10	4 times out of 10
5	5 times out of 10	5 times out of 10
6	4 times out of 10	6 times out of 10
7	3 times out of 10	7 times out of 10
8	2 times out of 10	8 times out of 10
9	1 time out of 10	9 times out of 10
0	Always	Never

The random rounding algorithm uses a random seed value to initiate the rounding pattern for tables. In these routines, the method used to seed the pattern can result in the same count in the same table being rounded up in one execution and rounded down in the next.
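A minimal sketch of random rounding consistent with Table 3 is given below. It is not the production routine. For counts below 10 the probability of rounding up to 10 is the count divided by 10, as in Table 3; for larger counts, the sketch assumes the same proportional rule with base 5, which the text does not state. A count of exactly 10 is already a multiple of 5 and is left unchanged. The function name and seed are illustrative.

```python
import random


def random_round(count: int, rng: random.Random) -> int:
    """Randomly round a non-negative count to a multiple of 10 (count < 10) or 5."""
    if count < 0:
        raise ValueError("count must be non-negative")
    base = 10 if count < 10 else 5
    remainder = count % base
    if remainder == 0:
        return count  # already a multiple of the base (e.g., 0, 10, 15)
    lower = count - remainder
    # Round up with probability remainder/base. This matches Table 3 for
    # base 10; for base 5 it is an assumption made for this sketch.
    if rng.random() < remainder / base:
        return lower + base
    return lower


rng = random.Random(2011)  # the seed determines the rounding pattern
trials = [random_round(3, rng) for _ in range(10_000)]
share_rounded_up = trials.count(10) / len(trials)
print(f"A count of 3 was rounded to 10 in {share_rounded_up:.1%} of trials")
```

Because the outcome depends on the seed, running the sketch with a different seed can round the same count in the other direction, which mirrors the behaviour described above.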

5.2.1.4 Suppression of NHS estimates for confidentiality reasons

The previous section discussed random rounding for estimates in NHS tabulations and minimum population thresholds in order to protect the anonymity of respondents. Random rounding is used as a means of protecting confidentiality in counts when characteristics become rare. Analysis of NHS estimates revealed that weighted data may result in high estimates that meet the aforementioned population suppression threshold; in these cases individuals with rare characteristics could be more easily identified in a table, particularly if their characteristics are publicly known.

Consequently, for all quantitative variables, a statistic is suppressed if the number of actual records used in the calculation (unrounded and unweighted) is less than 4. For quantile statistics, a different minimum number of records applies: quartiles, quintiles and deciles require 20 records, and percentiles require 400 records.
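The record-count rule can be illustrated with a small lookup. The statistic labels and the function name are hypothetical; the minimum record counts are those given above.

```python
# Minimum number of unweighted, unrounded records required per statistic type.
MIN_RECORDS_FOR_QUANTILES = {
    "quartile": 20,
    "quintile": 20,
    "decile": 20,
    "percentile": 400,
}
DEFAULT_MIN_RECORDS = 4  # applies to other quantitative statistics


def is_statistic_suppressed(statistic_type: str, unweighted_records: int) -> bool:
    """Return True when too few actual records underlie the statistic."""
    required = MIN_RECORDS_FOR_QUANTILES.get(statistic_type, DEFAULT_MIN_RECORDS)
    return unweighted_records < required


print(is_statistic_suppressed("mean", 3))
print(is_statistic_suppressed("percentile", 400))
```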

For more information on confidentiality (non-disclosure) rules, refer to the Data Quality and Confidentiality Standards and Guidelines (Public).

5.2.2 Data quality indicators

In addition to being suppressed (limited) for confidentiality (non-disclosure) reasons, data dissemination may also be limited because of unacceptable data quality (referred to below simply as data quality).^Footnote6

5.2.2.1 Global non-response rates

The global non-response rate (GNR) is an indicator of data quality which combines complete non-response and partial non-response to the survey. A smaller GNR indicates a lower risk of non-response bias, i.e., a lower risk of lack of accuracy. Global non-response rates are determined for each of the NHS geographic areas, and areas are flagged on the database according to their non-response rate. Geographic areas with a global non-response rate higher than or equal to 50% are suppressed from standard data products but are available by custom request. Geographic areas with a global non-response rate lower than 50% are identified in tabulations, but not suppressed.^Footnote7 In addition, while characteristics data were suppressed for areas with a global non-response rate of 50% or higher, these areas were included in all tabulations for higher geographic levels.

Canada has a total of 147 census metropolitan areas (CMAs) and census agglomerations (CAs). For all of these areas, the global non-response rate is less than 50% and published NHS data are available in standard products. In addition, NHS standard products are available for all 293 census divisions (CDs) and all 308 federal electoral districts (FEDs).

With a global non-response rate threshold of 50% for the release of NHS data, estimates are published for a majority of the 5,253 census subdivisions (CSDs) or municipalities. A total of 4,567 CSDs have an estimated population of 40 or more (for confidentiality reasons, those with a population of less than 40 are not published); 686 CSDs have a population of less than 40 (including uninhabited CSDs, which have zero population). Of the 4,567 CSDs with a population of 40 or more, NHS estimates are available in standard products for 3,439 (75.3%).^Footnote8

5.2.3 Other occurrences when data are suppressed or not available

In addition to being suppressed for confidentiality and data quality reasons, data may also be suppressed or not available for reasons related to data collection.

5.2.3.1 Suppression of citizenship, landed immigrant status and period of immigration data – Indian reserve N2 suppression

Suppression of data also occurs when certain questions are not asked of all respondents. Persons living on Indian reserves and Indian settlements who were enumerated with the 2011 NHS N2 questionnaire were not asked the questions on citizenship (Question 10), landed immigrant status (Question 11) and year of immigration (Question 12). However, it was possible that a census subdivision (CSD) or lower geographic area was enumerated using both the N2 questionnaire (for the on-reserve population) and the N1 questionnaire (for the off-reserve population). In this case, the following rules were used to determine if suppression had to be applied to all citizenship and immigration data for that CSD (or lower geographic area):

- If the population estimate from N1 questionnaires was higher than the population estimate from N2 questionnaires (based on weighted results), then citizenship and immigration estimates were included in the CSD estimates.
- If the population estimate from N2 questionnaires was higher than or equal to the population estimate from N1 questionnaires (based on weighted results), then citizenship and immigration estimates were excluded from the CSD estimates.

Consequently, citizenship, landed immigrant status and period of immigration data are suppressed for Indian reserves and Indian settlements at census subdivision and lower levels of geography where the majority of the population was enumerated with the N2 questionnaire. These data are, however, included in the totals for larger geographic areas, such as census divisions and provinces.
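The N1/N2 decision rule described above amounts to a single comparison of weighted population estimates. The sketch below uses a hypothetical function name and hypothetical figures; note that a tie leads to suppression.

```python
def include_citizenship_immigration(n1_population: float, n2_population: float) -> bool:
    """Return True when citizenship and immigration estimates are kept for the area.

    Both arguments are weighted population estimates, from the N1 and N2
    questionnaires respectively. Estimates are kept only when the N1
    estimate is strictly higher than the N2 estimate.
    """
    return n1_population > n2_population


# Hypothetical CSD enumerated with both questionnaires.
print(include_citizenship_immigration(n1_population=620.0, n2_population=480.0))
print(include_citizenship_immigration(n1_population=500.0, n2_population=500.0))
```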

For a complete list of Indian reserves and Indian settlements for which citizenship, landed immigrant status and period of immigration data are suppressed, refer to Indian reserves and Indian settlements for which citizenship, landed immigrant status and period of immigration data are suppressed.

5.2.3.2 Incompletely enumerated areas

In 2011, a total of 36 Indian reserves and Indian settlements were reported^Footnote9 as 'incompletely enumerated' in the NHS. For 18 reserves or settlements, census enumeration was either not permitted or was interrupted before it could be completed, so the NHS was not administered in those areas. For four reserves or settlements, census enumeration was completed but data collection for the NHS was not permitted or was interrupted. For one reserve or settlement, it was determined that there was no resident population, contrary to what was erroneously reported in the census. For 13 reserves in Northern Ontario, enumeration was delayed because of natural events (specifically forest fires), and estimates for these communities are not included in the estimates for larger geographic areas that contain them (e.g., provincial and national estimates). For these 13 reserves, separate tables are made available (see Section 4.2.1).

5.2.4 Data availability from the NHS for census subdivisions

5.2.4.1 Data availability from the NHS for communities (census subdivisions) with Aboriginal identity population

Of the total 5,253 CSDs in Canada, 3,972 CSDs have estimates of Aboriginal identity population. Table 4 shows the number of CSDs for which NHS Aboriginal identity estimates are available, in addition to the number of CSDs for which Aboriginal identity estimates are suppressed.

Among the 3,972 CSDs that have an Aboriginal identity population, NHS Aboriginal identity estimates are available for 2,385 (60.0%). Aboriginal identity estimates are suppressed for confidentiality reasons in a total of 1,044 CSDs: 217 because the total population is less than 40, and 827 because the Aboriginal identity cell count is less than 4. In addition, Aboriginal identity estimates (as well as other characteristics) are suppressed for 543 CSDs for data quality reasons: the global non-response rate is 50% or greater, so the response rates in these CSDs are not high enough to produce a valid statistical picture.
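The order in which these suppression criteria are applied can be sketched as a cascade that follows the row definitions of Table 4: total population first, then the Aboriginal identity cell count, then the global non-response rate. The class, function and parameter names are hypothetical, and the cell count is assumed to be the unweighted, unrounded count.

```python
from enum import Enum


class Availability(Enum):
    SUPPRESSED_POPULATION_UNDER_40 = "confidentiality: total population less than 40"
    SUPPRESSED_CELL_COUNT_UNDER_4 = "confidentiality: cell count less than 4"
    SUPPRESSED_GNR_50_OR_MORE = "data quality: GNR of 50% or greater"
    AVAILABLE = "estimates available"


def classify_csd(
    total_population: int, identity_cell_count: int, gnr_percent: float
) -> Availability:
    """Classify a CSD that has an Aboriginal identity population."""
    if total_population < 40:
        return Availability.SUPPRESSED_POPULATION_UNDER_40
    if identity_cell_count < 4:
        return Availability.SUPPRESSED_CELL_COUNT_UNDER_4
    if gnr_percent >= 50.0:
        return Availability.SUPPRESSED_GNR_50_OR_MORE
    return Availability.AVAILABLE


# Hypothetical CSD: 120 residents, 15 records, GNR of 52.3%.
print(classify_csd(120, 15, 52.3).value)
```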

Table 4
CSDs with Aboriginal identity estimates, by type of data suppression and data availability, Canada, provinces and territories, 2011 NHS

Number of CSDs with Aboriginal identity estimates
Type of data suppression and data availability	Canada	N.L.	P.E.I.	N.S.	N.B.	Que.	Ont.	Man.	Sask.	Alta.	B.C.	Y.T.	N.W.T.	Nvt.
Total CSDs	5,253	376	113	99	273	1,285	574	287	959	435	743	37	41	31
Total incompletely enumerated reserves (CSDs)	36	0	0	0	0	7	22	3	2	1	1	0	0	0
Total CSDs with no population	316	10	0	4	3	102	16	10	60	20	76	6	3	6
Total CSDs (with population)	4,901	366	113	95	270	1,176	536	274	897	414	666	31	38	25
CSDs with Aboriginal identity population	3,972	206	72	93	221	877	522	268	616	346	658	30	38	25
CSDs with Aboriginal identity population and total population less than 40 (for confidentiality reasons, CSDs with a population of less than 40 are not published)	217	1	2	6	2	1	4	5	42	10	134	6	4	0
CSDs with Aboriginal identity population and total population 40 or greater and Aboriginal identity cell count less than 4 (for these CSDs NHS counts are suppressed for confidentiality reasons)	827	67	44	8	77	310	27	29	199	49	17	0	0	0
CSDs with Aboriginal identity population and total population 40 or greater and Aboriginal identity cell count greater than 3 and GNR equal to or greater than 50% (for these CSDs, NHS counts are suppressed for data quality reasons)	543	29	7	9	24	57	83	66	121	50	84	9	0	4
CSDs for which Aboriginal identity estimates are available – CSDs with Aboriginal identity population and total population 40 or greater and Aboriginal identity cell count greater than 3 and GNR is less than 50%	2,385	109	19	70	118	509	408	168	254	237	423	15	34	21
Source: Statistics Canada, National Household Survey, 2011.

The proportion of CSDs for which Aboriginal identity estimates are available (including population estimates and characteristics) varies by province and territory. With the exception of Prince Edward Island (26.4%) and Saskatchewan (41.2%), estimates are available for at least half of the CSDs with an Aboriginal identity population in every province and territory.
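The proportions quoted above can be recomputed from two rows of Table 4: CSDs with an Aboriginal identity population, and CSDs for which Aboriginal identity estimates are available. The sketch below copies those rows and lists the regions where fewer than half of the CSDs have estimates; the variable names are illustrative.

```python
regions = ["Canada", "N.L.", "P.E.I.", "N.S.", "N.B.", "Que.", "Ont.",
           "Man.", "Sask.", "Alta.", "B.C.", "Y.T.", "N.W.T.", "Nvt."]

# Rows copied from Table 4.
with_identity_population = [3972, 206, 72, 93, 221, 877, 522, 268, 616, 346, 658, 30, 38, 25]
estimates_available = [2385, 109, 19, 70, 118, 509, 408, 168, 254, 237, 423, 15, 34, 21]

# Percentage of CSDs with an Aboriginal identity population that have estimates.
shares = {
    region: 100.0 * available / total
    for region, available, total in zip(regions, estimates_available, with_identity_population)
}

below_half = sorted(region for region, share in shares.items() if share < 50.0)

for region in regions:
    print(f"{region:8s}{shares[region]:6.1f}%")
print("Below 50%:", below_half)
```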

Methodological changes

The availability of Aboriginal estimates at the CSD level reflects a change made to the data quality threshold relative to the 2006 Census release. The NHS 50% threshold is based on studies of the global non-response rate in relation to the indicators of non-response bias (see Section 5.5 of the National Household Survey User Guide, Catalogue no. 99-001-X2011001). The studies showed that with a global non-response rate of 50% or more, the bias was so large that the estimates were not of sufficiently high quality.

5.2.4.2 Data availability from the NHS for 'on reserve' communities (CSDs)

The 'on reserve' area of residence in the 2011 NHS and the 2011 Census comprises a total of 997 CSDs: 159 uninhabited CSDs, 36 incompletely enumerated Indian reserves and settlements, and 802 inhabited CSDs. Among the 802 inhabited 'on reserve' CSDs (with population), three CSDs did not have an Aboriginal identity population.^Footnote10 Of the 799 CSDs with an Aboriginal identity population, 190 were suppressed because they had a total population (Aboriginal and non-Aboriginal) of less than 40, one additional reserve CSD was suppressed because the Aboriginal identity cell count was less than four, and another 36 CSDs were suppressed for data quality reasons (the GNR was greater than or equal to 50%). Overall, Aboriginal data (population estimates and characteristics) are available for 572 'on reserve' CSDs. Table 5 shows the number of communities (CSDs) defined as 'on reserve' for which NHS Aboriginal identity data are available.

Table 5
'On reserve' CSDs, by type of data suppression and data availability, Canada, provinces and territories, 2011 NHS

Number of 'on reserve' CSDs with Aboriginal identity estimates
Type of data suppression and data availability	Canada	N.L.	P.E.I.	N.S.	N.B.	Que.	Ont.	Man.	Sask.	Alta.	B.C.	Y.T.	N.W.T.	Nvt.
Total CSDs	997	3	4	25	18	42	144	79	170	85	425	0	2	0
Total incompletely enumerated reserves (CSDs)	36	0	0	0	0	7	22	3	2	1	1	0	0	0
Total CSDs with no population	159	0	0	4	0	0	12	6	52	11	74	0	0	0
Total CSDs (with population)	802	3	4	21	18	35	110	70	116	73	350	0	2	0
CSDs with Aboriginal identity population	799	3	4	21	18	35	109	70	116	73	348	0	2	0
CSDs with Aboriginal identity population and total population less than 40 (for confidentiality reasons, CSDs with a population of less than 40 are not published)	190	0	2	6	2	1	3	5	27	9	134	0	1	0
CSDs with Aboriginal identity population and total population 40 or greater and Aboriginal identity cell count less than 4 (for these CSDs NHS counts are suppressed for confidentiality reasons)	1	0	0	0	0	0	0	0	0	0	1	0	0	0
CSDs with Aboriginal identity population and total population 40 or greater and Aboriginal identity cell count greater than 3 and GNR equal to or greater than 50% (for these CSDs, NHS counts are suppressed for data quality reasons)	36	0	0	0	0	1	16	4	7	0	8	0	0	0
CSDs for which Aboriginal identity estimates are available – CSDs with Aboriginal identity population and total population 40 or greater and Aboriginal identity cell count greater than 3 and GNR is less than 50%	572	3	2	15	16	33	90	61	82	64	205	0	1	0
Source: Statistics Canada, National Household Survey, 2011.

The availability of Aboriginal identity data for 'on reserve' communities varies among the provinces and territories. With the exception of Prince Edward Island (50.0%), the Northwest Territories (50.0%) and British Columbia (58.9%), Aboriginal identity data are available for at least 70% of the 'on reserve' CSDs in every province and territory that has such CSDs.

Table 6
Data availability for on-reserve communities, 2011 NHS, 2006 and 2001 censuses

Data availability	2011 NHS (number)	2011 NHS (%)	2006 Census (number)	2006 Census (%)	2001 Census (number)	2001 Census (%)
Total reserve communities – inhabited	837	100.0	887	100.0	881	100.0
Incompletely enumerated reserve communities – data not available	35	4.2	22	2.5	30	3.4
Reserve communities where total population size is less than 40	193	23.1	184	20.7	190	21.6
Reserve communities where total population size is equal to or greater than 40 for which partial (note 1) data are available	37	4.4	106	12.0	64	7.3
Reserve communities where total population size is equal to or greater than 40 for which full (note 2) data are available	572	68.3	575	64.8	597	67.8
Notes:
1. Partial data refers to population and dwelling counts only; no characteristics data are available.
2. Full data refers to population and dwelling counts as well as characteristics data.
Sources: Statistics Canada, censuses of population, 2001 and 2006, and National Household Survey, 2011.

Changes in the availability of information over time can be linked to changes in methodology that affect the level of global non-response rates, as well as to additional suppression for confidentiality reasons. In the 2011 NHS, information was suppressed when the unweighted and unrounded cell count contributing to the estimates was less than four. Another change affecting the amount of data available between the NHS and the previous censuses is the global non-response rate (GNR) threshold used to determine which data are released: in the previous censuses, data were suppressed for CSDs with GNR ≥ 25%, whereas in the NHS, data are suppressed for CSDs with GNR ≥ 50%. The 50% threshold for the NHS is based on studies of correlation between the global non-response rate and indicators of non-response bias. The studies showed that the relationship between potential average biases and non-response was generally acceptable up to a GNR threshold of 50%, allowing for the release of estimates of sufficiently high quality.

Overall, when all groupings of data availability are considered, data availability for 'on reserve' communities (CSDs) increased in 2011. The proportion of reserves for which the complete set of data (full data) is available went from 67.8% in 2001 to 64.8% in 2006 (using, among other criteria, a data quality threshold of GNR < 25%) and to 68.3% in the 2011 NHS (using a data quality threshold of GNR < 50%).

As noted earlier, there were a total of 36 Indian reserves and Indian settlements that were 'incompletely enumerated' in the NHS, compared to 22 in 2006 and 30 in 2001. However, it should be noted that the 36 incompletely enumerated reserves in 2011 include 13 reserves in Northern Ontario for which NHS enumeration was not possible at the time of NHS data collection in May 2011, because of forest fires. These 13 reserves were enumerated later and their data are released in a separate set of special tables (see Section 4.2.1). The proportion of incompletely enumerated reserves for which no data are available remained almost unchanged (2.8% vs. 2.5%) from 2006 to 2011, when the 13 Northern Ontario reserves are excluded from the 2011 estimate.

There was a slight increase from 2006 to 2011 in the proportion of small reserves with a total population size less than 40 for which only population counts are available, from 20.7% in 2006 to 23.1% in 2011.

5.3 Coverage

There are two types of coverage error. Population undercoverage refers to the error of excluding someone who should have been enumerated. Population overcoverage refers to the error of either enumerating someone more than once or including someone who should not have been enumerated. Undercoverage is more common than overcoverage. The net impact of undercoverage and overcoverage on the size of a population of interest is population net undercoverage. It is calculated as the number of persons excluded who should have been enumerated (undercoverage) less the number of excess enumerations of persons enumerated more than once (overcoverage). Census net population undercoverage therefore quantifies the net number of persons missed by the census.
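The definition above amounts to a simple subtraction. The following sketch uses made-up figures, and the function name is hypothetical.

```python
def net_undercoverage(undercoverage: float, overcoverage: float) -> float:
    """Persons missed (undercoverage) less excess enumerations (overcoverage)."""
    return undercoverage - overcoverage


# Illustrative values only, not taken from any published table.
typical = net_undercoverage(undercoverage=500, overcoverage=120)  # more missed than duplicated
negative = net_undercoverage(undercoverage=50, overcoverage=80)   # overcoverage exceeds undercoverage
```

A negative result means that overcoverage was larger than undercoverage for that population.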

This section presents estimates of census net population undercoverage for the 2011 Census for people residing on participating Indian reserves and settlements, including people without Aboriginal identity.

Coverage error generally occurs during the field collection stage of the census. Examples of undercoverage and overcoverage are:

Examples of undercoverage

A person temporarily out of the country during the collection of the census is missed.
A questionnaire was returned but someone who lived there was not included.
The dwelling never received a questionnaire.
Persons residing at more than one address may be missed at both addresses because of the uncertainty of what is their main address.
Persons who do not reside at a fixed address are often missed by the census.

Examples of overcoverage

Children whose parents live in separate households where each parent includes the children in their questionnaire.
Young adults, newly away from home perhaps searching for work or attending a post-secondary institution, who are listed there and at home by their parents.
Persons whose employment requires them to live away from home. They are listed at both locations.
Persons in institutions who are also listed by their families as living at home.

5.3.1 Net undercoverage error for participating reserves

Table 7 gives estimates of 2011 Census net undercoverage for persons living on participating reserves for Canada and for each province and territory.^Footnote11 The split between persons with Aboriginal identity and persons without Aboriginal identity is not available. The rate of census net undercoverage indicates what proportion of the entire population that should have been enumerated was missed, on a net basis, in 2011 Census tabulations. Negative estimates mean that the estimates of overcoverage were higher than the estimates of undercoverage.

Table 7
The 2011 Census population net undercoverage for participating reserve communities, Canada, provinces and territories

Provinces and territories	Census count (number)	Net undercoverage: estimated number	Net undercoverage: standard error	Net undercoverage: estimated rate (%)	Net undercoverage: standard error of rate (%)
Canada	351,394	10,125	6,135	2.80	1.65
Newfoundland and Labrador	3,165	379	317	10.68	7.98
Prince Edward Island	514	-6	23	-1.16	4.61
Nova Scotia	9,629	-102	311	-1.08	3.30
New Brunswick	7,855	658	389	7.73	4.21
Quebec	40,455	-1,965	2,381	-5.11	6.50
Ontario	42,328	613	2,938	1.43	6.74
Manitoba	64,000	2,478	1,962	3.73	2.84
Saskatchewan	56,977	-765	2,017	-1.36	3.64
Alberta	47,050	5,815	2,865	11.00	4.82
British Columbia	79,127	2,998	2,625	3.65	3.08
Yukon	0	0	0	0.00	0.00
Northwest Territories	294	23	31	7.11	9.14
Nunavut	0	0	0	0.00	0.00
Source: Statistics Canada, 2011 Census of Population.
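The rate column of Table 7 can be approximately reproduced from the count and the estimated number. The sketch below assumes that the denominator is the census count plus the estimated net undercoverage, that is, the population that should have been enumerated. This denominator is inferred from the published figures rather than stated in the text, and the function name is hypothetical.

```python
def net_undercoverage_rate(census_count: float, net_undercoverage: float) -> float:
    """Net undercoverage as a percentage of the population that should have been enumerated.

    Assumption: that population is the census count plus the net number missed.
    """
    population = census_count + net_undercoverage
    return 100.0 * net_undercoverage / population


# Inputs taken from the Canada and Alberta rows of Table 7.
canada_rate = net_undercoverage_rate(351_394, 10_125)
alberta_rate = net_undercoverage_rate(47_050, 5_815)
```

Rounded to two decimals, these agree with the published rates for Canada and Alberta. Some other rows differ in the last digit, presumably because the published counts and estimates are themselves rounded.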

The estimate of population overcoverage for a particular geography, such as participating reserves, includes persons who appear on questionnaires for two dwellings where at least one of the dwellings is on reserve. The other dwelling may be on the same reserve, on a different reserve, or not on a reserve. Since the Census Overcoverage Study (COS) does not determine at which dwelling an individual should have been listed, the assumption is made that the individual is equally likely to belong at the first dwelling as at the second. Therefore, in order to produce estimates of overcoverage, half of the weight for the person is assigned to each dwelling. This concept is important for small domains such as the on reserve population. About half of the overcoverage cases involving a dwelling on reserve also involved a dwelling off reserve.
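The half-weight rule can be illustrated with a short sketch. The weights, domain labels and function name below are hypothetical; the only element taken from the text is that each dwelling in a duplicate pair receives half of the person's weight.

```python
from collections import defaultdict


def allocate_overcoverage(duplicate_pairs):
    """Split each duplicated person's weight equally between the two dwellings' domains.

    duplicate_pairs: iterable of (weight, domain_of_first_dwelling, domain_of_second_dwelling).
    """
    totals = defaultdict(float)
    for weight, first_domain, second_domain in duplicate_pairs:
        totals[first_domain] += weight / 2
        totals[second_domain] += weight / 2
    return dict(totals)


# Hypothetical sample: one pair entirely on reserve, one pair split on and off reserve.
pairs = [
    (10.0, "on_reserve", "on_reserve"),
    (10.0, "on_reserve", "off_reserve"),
]
overcoverage = allocate_overcoverage(pairs)
```

In this example the on reserve domain receives the full weight of the first pair but only half the weight of the second, which is why the rule matters for small domains where many duplicates straddle the domain boundary.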

5.3.1.1 Data sources

The estimates of 2011 Census of Population coverage error are derived from 2011 Census data and the results of two studies. The Reverse Record Check (RRC) measures population undercoverage, while the Census Overcoverage Study (COS) measures population overcoverage. In the RRC, a random sample of individuals representing the census target population was taken from frames independent of the 2011 Census. The 2011 Census database was then searched to determine whether these people had been enumerated. When required, an interview (mostly by telephone) was conducted to collect further information to establish whether the individual was in scope for the census and, when in scope, to provide further data to ascertain that individual's coverage status.

Overcoverage is measured by matching the 2011 Census database to a partial list of persons who should have been enumerated (a list constructed from administrative data sources), and by matching the 2011 Census database to itself. The COS applies statistical matching, which identifies matches that are close or exact. Pairs of potential duplicates are sampled and the sampled person's name and demographic characteristics are used to identify the cases of duplication.

For more information on 2011 Census population coverage error, refer to Final estimates of 2011 Census coverage.

5.3.2 Coverage error for incompletely enumerated reserves and settlements

As noted earlier, some Indian reserves and settlements did not participate in the census as enumeration was not permitted or was interrupted before completion. In 2011, there were 31 incompletely enumerated reserves in the census (36 in the NHS). For 13 of these reserves, enumeration was prevented by forest fires in Northern Ontario at the time of the census. Census and NHS collection for these 13 reserves was conducted at a later date (fall 2011). These data are disseminated in a special series of tables (see Section 4.2.1) and are not included in any census or NHS tabulation because of the different collection period. For the remaining 18 incompletely enumerated reserves, census/NHS data are not available and therefore have not been included in any census or NHS tabulation. For four additional reserves, census information exists but no NHS data were collected, so data for these reserves have not been included in NHS tabulations. Of the five reserves incompletely enumerated in the NHS, only four are considered as participating reserves and their coverage measurements are included in Section 5.3.1. The remaining reserve, Opaskwayak Cree Nation 27A (Carrot River), was determined to be uninhabited, despite being listed as inhabited in the census.

These areas present unique problems for the coverage studies and for the Population Estimates Program. The survey population of the Reverse Record Check (RRC) does not include residents of areas where the census was unable to collect any data. However, the Population Estimates Program requires an estimate of the permanent resident population living in these areas. For the 13 reserves in Northern Ontario, a base estimate exists, so their populations do not need to be estimated by a model. However, as neither the census nor the RRC is in a position to produce an estimate of the population living in the remaining 18 areas, a model-based methodology was used for these reserves. The resulting estimates should be used with caution as they are based entirely on a model. Table 8 gives the national model results.

Table 8
Model estimated counts and rates for incompletely enumerated Indian reserves (IER) and settlements for Canada, 2001, 2006 and 2011

Measure	Canada
2001 estimate of IER population	34,992
2006 estimate of IER population	40,115
2011 estimated census count	37,574
2011 net undercoverage rate	-0.4%
2011 net undercoverage	-182
2011 population estimate for IER	37,392
Note: These estimates should be used with caution as they are based on a model whose assumptions cannot be verified.
Sources: Statistics Canada, 2001, 2006 and 2011 censuses of population.

The 2011 net undercoverage rate is different from the rate presented in Section 5.3.1 for participating Indian reserves and settlements as it is calculated by dividing the 2011 net undercoverage for incompletely enumerated Indian reserves and settlements by the corresponding adjusted 2011 population estimate.

In the 2006 Census, 22 reserves, with an estimated 40,100 persons, were classified as 'incompletely enumerated.' Among the 31 reserves and settlements considered as incompletely enumerated in the 2011 Census, 18 were 'incompletely enumerated' or 'refusal' while the other 13 were enumerated at a later date. The 2011 estimates of the incompletely enumerated population are approximately 6.8% lower than the 2006 estimates.

5.3.2.1 Estimation model

A two-step estimation model was developed to estimate the population. The first step uses a simple linear regression to predict the census count in 2011 for the 18 reserves where no data were collected. The linear regression was constructed using all Indian reserves that were completely enumerated in both the 2006 Census and the 2011 Census. The model assumes linear growth from 2006 to 2011 for all provinces, with separate estimates of the intercept and the regression parameters for each province. The model was evaluated for the basic regression assumptions of independence of errors, homogeneity of variances and normality of errors. For the 13 reserves where late enumeration was done, the late-enumeration counts were used as enumerated for this first step.

For each incompletely enumerated reserve, the input variable for the regression model was either the actual census count in 2006 or the best predicted census count from the 2006 model, or the late enumeration for the 13 reserves in Northern Ontario. The output of the model was the estimated census count in 2011.

The second step is done to produce consistency with the results of the census coverage studies. An adjustment was made to the estimated 'census' count to account for net undercoverage of all subjected census counts. Net undercoverage for the incompletely enumerated reserves was estimated by calculating the net undercoverage rate for all completely enumerated reserves in each province and then applying that rate to the estimated 'census' count of all the incompletely enumerated Indian reserves in the province. The estimated 'census' count and the 'estimated net missed persons' in each reserve were then summed to create an 'estimated' population for the incompletely enumerated Indian reserves.
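The two steps can be sketched for a single province as follows. This is an illustration under stated assumptions, not the production model: the reserve counts are invented, the function names are hypothetical, the net undercoverage rate is expressed as a proportion, and the rate is applied directly to the estimated count, which is a literal reading of the description above.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def estimate_reserve(count_2006, line, net_rate):
    """Step 1: predict the 2011 'census' count. Step 2: add the net missed persons."""
    intercept, slope = line
    census_count = intercept + slope * count_2006  # step 1: regression prediction
    net_missed = census_count * net_rate           # step 2: apply the provincial rate (assumption)
    return census_count, net_missed, census_count + net_missed


# Hypothetical completely enumerated reserves in one province (2006 and 2011 counts).
counts_2006 = [100, 250, 400, 800]
counts_2011 = [120, 285, 450, 890]
line = fit_line(counts_2006, counts_2011)

# Hypothetical incompletely enumerated reserve with a 2006 input of 500
# and a provincial net undercoverage rate of 3%.
census_count, net_missed, population = estimate_reserve(500, line, 0.03)
```

Fitting one line per province mirrors the separate intercept and regression parameters described in the text. Summing the per-reserve results would give the provincial and national totals.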

Footnotes

Footnote 1

For more information on sampling error, refer to the National Household Survey User Guide, Catalogue no. 99-001-X2011001.

Footnote 2

Processing errors can occur at various steps including coding, when 'write-in responses' are transformed into numerical codes; data capture, when responses are transferred from the questionnaire in an electronic format, by optical character recognition methods or key-entry operators; and imputation, when a 'valid,' but not necessarily correct, response is inserted into a record by the computer to replace missing or 'invalid' data ('valid' and 'invalid' referring to whether or not the response is consistent with other information on the record).

Footnote 3

For more information refer to NHS data quality assessment process and indicators.

Footnote 4

For more information on standard areas, refer to the 2011 Census Dictionary, Catalogue no. 98-301-X.

Footnote 5

For more information refer to area suppression for NHS standard and non standard geographic areas.

Footnote 6

The NHS standard products contain the following two confidentiality and data quality symbols:
... not applicable
x suppressed to meet the confidentiality requirements of the Statistics Act

Footnote 7

For more information concerning global non-response rates in the NHS, refer to the Data Quality and Confidentiality Standards and Guidelines (Public).

Footnote 8

Refer to the National Household Survey User Guide for provincial/territorial distribution of published CSDs (Coverage of published NHS data).

Footnote 9

For more information, refer to 2011 NHS incompletely enumerated Indian reserves and Indian settlements.

Footnote 10

These may be reserves with leased land occupied by non-Aboriginal persons, or they may be reserves where the Aboriginal population participated in the census but did not participate in the NHS. Note that these three reserves have a total population of less than 40 and that, for confidentiality reasons, no characteristics data are released; only total population estimates are published.

Footnote 11

The estimates do not include the reserves that did not participate in the 2006 Census but participated in the 2011 Census.

Date modified: 2015-12-31
