Re: weighting

[This was originally sent to Richard Levy; I intended it for the AHS list.]

At 11:37 AM 5/3/2004 -0400, Richard Levy wrote:

>From: American Housing Survey (AHS) ListServ <ahs@huduser.gov>
>
>I'm looking at households over time from the 1997, 1999, and 2001 national
>AHS surveys. What weight would one recommend applying? Please let me know.
>Thanks.
>
>Richard Levy
>Research Analyst
>202.974.2343
>rlevy@nmhc.org
>[...]

Greg Watson gave a thorough answer; I would like to add a few words.

On the occasions when I have done a longitudinal analysis, there has
always been a concept of "looking forward" or "looking backward".  One
dataset is chosen as your basis, and the other is either later (looking
forward) or earlier (looking backward).  You then use the weights from
the basis dataset.

For example, you can choose the 1999 survey as your basis and analyze
various features of those cases in 2001; this is looking forward, and you
should use the 1999 weights.  Your analysis is in terms of a set of
households or housing units in 1999, and their 2001 outcomes.  (It may seem
odd at first, using 1999 weights for 2001 measures (or some other equally
odd arrangement), but it becomes natural once you understand the situation.)
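
To make this concrete, here is a rough sketch in Python/pandas of a
forward-looking merge with 1999 as the basis.  The file names, the unit
identifier CONTROL, and the weight variable WEIGHT are my assumptions --
check the AHS codebooks for the actual names.

    import pandas as pd

    ahs99 = pd.read_csv("ahs1999.csv")   # basis year
    ahs01 = pd.read_csv("ahs2001.csv")   # outcome year

    # A left merge keeps every 1999 case; the indicator column marks
    # which of them matched forward into 2001.
    merged = ahs99.merge(ahs01, on="CONTROL", how="left",
                         suffixes=("_99", "_01"), indicator="in_2001")

    # All estimates use the basis-year (1999) weights.
    merged["W"] = merged["WEIGHT_99"]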

This puts one twist into the analysis that you don't find in single-year
analyses: cases that fail to match longitudinally.  First, you should be
concerned with those in the basis set that fail to match the other
dataset -- not the others that fail to match the basis set.  Second,
these cases constitute a new category: the lost-from-sample (forward) or
the new-to-sample (backward).  Thus, you might report that of all
owner-occupied units in 1999, x percent were still owner-occupied in 2001,
y percent were converted to renter-occupied, and z percent were lost from
the sample.  Because of this issue, forward-looking analyses are easier to
understand -- or more natural to explain.
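
Continuing that sketch, the weighted shares -- including the
lost-from-sample category -- could be computed roughly like this.  The
TENURE coding (1 = owner-occupied, 2 = renter-occupied) is again an
assumption.

    # Restrict to the basis: 1999 owner-occupied units.
    owners99 = merged[merged["TENURE_99"] == 1].copy()

    def outcome(row):
        if row["in_2001"] == "left_only":
            return "lost from sample"
        if row["TENURE_01"] == 1:
            return "still owner-occupied"
        if row["TENURE_01"] == 2:
            return "renter-occupied"
        return "other"

    owners99["outcome"] = owners99.apply(outcome, axis=1)

    # Weighted percentages (the x, y, z figures), using 1999 weights.
    shares = owners99.groupby("outcome")["W"].sum() / owners99["W"].sum() * 100
    print(shares)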

(Actually, the "lost-from-sample" can be more complex.  Some units are
truly dropped.  Others may change status, which makes a difference if, say,
you are limited to occupied interviews. Thus, you can make finer categories
out of the "lost" cases: dropped from sample, became vacant, failed to
interview.)
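
In the sketch above, that finer breakdown would just split the
"lost from sample" branch.  Purely as an illustration -- the 2001
interview-status variable STATUS_01 and its codes are hypothetical here:

    def fine_outcome(row):
        if row["in_2001"] == "left_only":
            return "dropped from sample"
        if row["STATUS_01"] == "vacant":
            return "became vacant"
        if row["STATUS_01"] == "noninterview":
            return "failed to interview"
        return "occupied interview in 2001"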

You can, alternatively, limit your analysis to those cases that are present
in both years. But you need to mention this in what you report: "Of all
owner-occupied units in 1999 which were also present in the 2001 survey, x
percent were ... in 2001".  Whether this introduces a bias, I cannot say.
(Whether you want to make estimates of actual numbers of households is
another matter. It can be awkward to say "... which were also present in
the 2001 survey" when reporting estimated actual numbers of households. The
alternative is to inflate the weights so that the total weight is the same
as that of the original basis dataset, but I'm not sure that that is a good
thing to do.)
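
For what it is worth, that re-inflation might look like this in the same
sketch (again, I am not endorsing the practice):

    # Keep only the cases present in both years, then scale their
    # weights so the total matches the original 1999 basis total.
    matched = merged[merged["in_2001"] == "both"].copy()
    factor = merged["W"].sum() / matched["W"].sum()
    matched["W_inflated"] = matched["W"] * factor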

You should keep in mind that the matching of cases from one year to another
matches housing units.  If you want to analyze attributes of the people
living in the units, you need to filter on whether the same people are
living there.  Usually, I use the SAMEHH variable for this (though there
may have been some problems with it in past years). This makes for yet
another category: moved.  Thus, in a forward-looking analysis, you might
report that so many were lost from the sample, and so many moved (had a
change of occupants). Or you would report estimates about households that
were not lost from the sample and did not move.
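
Continuing the same sketch, the household-level filtering might look like
this.  The SAMEHH coding used here (1 meaning the same household as in the
prior survey) is an assumption -- check the codebook for each year.

    # Classify each 1999 owner-occupied unit at the household level.
    owners99["hh_outcome"] = "same household"
    owners99.loc[owners99["SAMEHH_01"] != 1, "hh_outcome"] = "moved"
    owners99.loc[owners99["in_2001"] == "left_only",
                 "hh_outcome"] = "lost from sample"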

Finally, you mentioned that you want to compare three years of data. It
would be most natural to use 1997 as the basis, and do a forward-looking
analysis. But now you have two longitudinal matches to track when you
speak of cases that were lost from the sample or that moved.

(Conceivably, 1999 (the middle year) could be the basis, but that would be
very awkward.)
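
A sketch of that three-year setup, with 1997 as the basis and two match
indicators to track (file and variable names are again my assumptions):

    import pandas as pd

    # Read each year and tag its columns, keeping CONTROL as the key.
    def load(path, tag):
        df = pd.read_csv(path).add_suffix(tag)
        return df.rename(columns={"CONTROL" + tag: "CONTROL"})

    ahs97 = load("ahs1997.csv", "_97")
    ahs99 = load("ahs1999.csv", "_99")
    ahs01 = load("ahs2001.csv", "_01")

    # Two left merges off the 1997 basis; each adds a match indicator.
    merged3 = (ahs97
               .merge(ahs99, on="CONTROL", how="left", indicator="in_1999")
               .merge(ahs01, on="CONTROL", how="left", indicator="in_2001"))

    # All estimates use the 1997 weights.
    merged3["W"] = merged3["WEIGHT_97"]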

I hope this helps.
-- David K.

David Kantor
Institute for Policy Studies
Johns Hopkins University
dkantor@jhu.edu
410-516-5404