Adjustment Methodologies for the Census of Agriculture
Andrea C. Lamas, Denise A. Abreu, Shu Wang, Daniel Adrian, Linda J. Young
National Agricultural Statistics Service (NASS)
• NASS conducts hundreds of surveys, including the Census of Agriculture.
• NASS prepares reports covering every facet of United States agriculture. For example:
  • Production and food supplies
  • Prices paid and received by farmers
  • Farm income and finances
  • Number of farms and land in farms
• A farm is any place from which $1,000 or more of agricultural products were produced and sold, or normally would have been sold, during the year. Some special case examples are:
  • Christmas trees
  • "government payment" farms
  • "pasture only" farms (at least 100 acres)
  • nurseries and greenhouses
  • exotic livestock
Capture Recapture Methodology
• Dual system estimation (DSE) requires two independent surveys.
  • Sample 1: Census of Agriculture records overlapping JAS tracts (not all census records)
  • Sample 2: June Area Survey (JAS) tracts
• The goal is to get estimates for the Census of Agriculture.
• The primary assumption of DSE is that the Census and the JAS are independent.
• DSE also assumes that the proportion of JAS farms captured by the Census is equal to the proportion of U.S. farms captured by the Census.
• DSE adjusts for farms that are not captured by either the Census or the JAS.
• The data required for DSE are the matched Census records and JAS tracts.
• To obtain these, the 2012 Census Mail List (CML) and the JAS tracts are overlapped. The records kept are CML respondents and non-respondents that match JAS tracts. Farms on the JAS that are not on the CML are also kept.
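The capture-recapture idea behind DSE can be illustrated with the classic unweighted two-sample estimator. This is only a minimal sketch under the stated assumptions (independent samples, equal capture proportions); the function name and the counts are hypothetical, and the approach on this poster works with weighted matched records and logistic models rather than this simple ratio.

```python
def dual_system_estimate(census_count: float, jas_count: float, matched_count: float) -> float:
    """Unweighted capture-recapture estimate of the total number of farms.

    Assumes the share of JAS farms found by the Census (matched / jas)
    equals the share of all farms found by the Census (census / total),
    so total = census * jas / matched.
    """
    if matched_count <= 0:
        raise ValueError("at least one matched record is needed")
    if matched_count > min(census_count, jas_count):
        raise ValueError("matched records cannot exceed either sample")
    return census_count * jas_count / matched_count


# Hypothetical counts, for illustration only.
estimate = dual_system_estimate(census_count=800, jas_count=100, matched_count=80)
print(estimate)  # 1000.0
```

Here 80 of the 100 JAS farms were also found by the Census, so the 800 Census farms are taken to be 80% of all farms.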
[Figure: diagram labeled CML Records, Census Sample, and JAS Tracts]
Census of Agriculture
• Conducted every 5 years (years ending in 2 and 7)
• Count of all U.S. agricultural operations ($1,000 or more in sales)
• Also collects information on agricultural operations' commodities and operator demographics
• Only source of uniform, comprehensive agricultural data for every county or county equivalent in the U.S.
• Is a list-based survey. The list is referred to as the Census Mail List (CML).
• Primarily mail data collection
Sources of Error
• Under-coverage
  • Incompleteness of the list, which occurs when not all agricultural operations appear on the Census Mail List
• Nonresponse
  • Not all agricultural operations on the Census Mail List respond
• Errors in Census reporting:
  • Misclassification of Census non-farms
    • Occurs when non-farms are classified as farms. This misclassification includes a subset of non-farms.
  • Misclassification of Census farms
    • Occurs when farms are classified as non-farms. This misclassification omits a subset of farms.
Unresolved Records
• In the matched dataset, the farm status based on the Census and the farm status based on the JAS agree in most cases. These records have resolved farm status.
• Some records are identified as farms on the JAS and non-farms on the Census, or as non-farms on the JAS and farms on the Census. These records have unresolved farm status.
• To account for this, a logistic model of the probability that an operation is a farm is developed from the records with resolved farm status.
• The final model is used to estimate the probability that each agricultural operation with unresolved farm status is a farm.
• Normalized JAS weights are used in the model.
• The probability that an unresolved record is a farm is reflected in a reduction of the associated JAS weight.
$$w_{F,i} = w_i \, \hat{p}_{F,i}$$
where $w_i$ = initial JAS weight,
$\hat{p}_{F,i}$ = predicted probability that a record is a farm,
$w_{F,i}$ = adjusted weight that will be used in the regression models.
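The weight reduction for unresolved records can be sketched in a few lines. This is a minimal illustration, assuming the predicted probabilities have already been produced by the logistic model; the function name and the example numbers are hypothetical.

```python
def adjust_weights(initial_weights, farm_probabilities):
    """Return the adjusted weight (initial JAS weight times predicted
    probability of being a farm) for each unresolved record."""
    if len(initial_weights) != len(farm_probabilities):
        raise ValueError("need one probability per weight")
    adjusted = []
    for weight, prob in zip(initial_weights, farm_probabilities):
        if not 0.0 <= prob <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        adjusted.append(weight * prob)
    return adjusted


# Hypothetical JAS weights and predicted farm probabilities.
weights = [10.0, 4.0, 2.5]
probabilities = [0.5, 0.25, 1.0]
print(adjust_weights(weights, probabilities))  # [5.0, 1.0, 2.5]
```

A record that is certain to be a farm keeps its full weight, while a record with probability 0.5 counts for half of its original weight.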
[Figure: diagram labeled U.S. Agricultural Operations, U.S. Farms, Operations on CML, Census Respondents, and Census Farms]
Logistic Regression
• The matched dataset is used to model the probabilities of a farm being on the CML, of responding, and of responding as a farm.
• C = (CML | Farm)
• R = (Responded | CML, Farm)
• CCF = (CML Farm | CML, Responded, Farm)
• Some farms on the CML may be misclassified and may be non-farms.
• The matched dataset is also used to model the probability of being classified correctly.
• CCFC = (Farm | CML Farm)
• Logistic regression is used to model the probabilities.
• JAS survey weights are used in the models.
• The weights are applied to the in-scope records (CML respondents that are farms):

$$\text{weight}_i = \frac{\hat{p}_{CCFC,i}}{\hat{p}_{C,i} \, \hat{p}_{R,i} \, \hat{p}_{CCF,i}}$$

June Area Survey (JAS)
• The JAS is an area-based survey. It is conducted annually and is a theoretically complete sampling frame with no overlaps or gaps.
• It uses a sampling rotation scheme in which 20% of the sample is replaced each year. A sample rotation remains for 5 years. It is a stratified sample based on land use and percent of cultivation.
• Segments of land are sampled. Sampled segments are divided into tracts that represent unique land operating arrangements.
• The survey is conducted with in-person interviews. Crop and livestock information is collected only on the agricultural tracts.
Example
[Figure: flow diagram with steps labeled JAS Tracts, Census List, Matches, Resolve Weights for Unresolved Records, P(R, CML, Census Farm | Farm), P(Farm | Census Farm), CML Respondents, and Total Farms = Sum Weights]
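The final step, in which each in-scope Census record receives a weight and the weights are summed to estimate total farms, can be sketched as follows. This is a hedged illustration only: the class and function names are hypothetical, the probabilities are made-up values standing in for logistic regression predictions, and the real procedure involves details not shown on the poster.

```python
from dataclasses import dataclass


@dataclass
class CensusRecord:
    """Predicted probabilities for one CML respondent classified as a farm."""
    p_c: float     # estimated P(CML | Farm)
    p_r: float     # estimated P(Responded | CML, Farm)
    p_ccf: float   # estimated P(CML Farm | CML, Responded, Farm)
    p_ccfc: float  # estimated P(Farm | CML Farm)


def record_weight(record: CensusRecord) -> float:
    """Weight = P(Farm | CML Farm) divided by the probability of capture."""
    capture = record.p_c * record.p_r * record.p_ccf
    if capture <= 0.0:
        raise ValueError("capture probability must be positive")
    return record.p_ccfc / capture


def estimate_total_farms(records) -> float:
    """Total farms is the sum of the record weights."""
    return sum(record_weight(r) for r in records)


# Hypothetical records, for illustration only.
records = [
    CensusRecord(p_c=0.5, p_r=0.5, p_ccf=1.0, p_ccfc=1.0),  # weight 4.0
    CensusRecord(p_c=1.0, p_r=0.5, p_ccf=0.5, p_ccfc=0.5),  # weight 2.0
]
print(estimate_total_farms(records))  # 6.0
```

Dividing by the capture probability inflates each record to stand in for similar farms the Census missed, while the numerator discounts records that may not be farms at all.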
Conclusions
• Dual System Estimation adjusts for farms not captured by either the
Census or JAS.
• It identifies farms that have been undercounted or previously
missed by the Census (small and minority farms).