Saturday, December 1, 2012

PUF the magic survey

We have a PUF.

That is, the survey on which I work, the Rental Housing Finance Survey (www.census.gov/hhes/rhfs), has just completed the first draft of a public use file (PUF): the file with the data we collected from respondents. Multifamily housing researchers are eager to analyze the data, see the current state of the multifamily rental industry, and see how the multifamily housing industry and stock have changed in the decade or so since the last federal survey of multifamily housing was conducted in 2001 (http://www.census.gov/housing/rfs/).

The PUF doesn't contain the raw data, though. The survey team edited the data for inconsistencies among responses, and we averaged the highest and lowest figures for selected survey data and applied other measures to make it difficult to identify any companies that might be outliers. It's a balance between giving the purest data to researchers and making sure that no individual respondent can be identified through the responses. Title 13 of the U.S. Code requires that any data collected by the Census Bureau be kept confidential.
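For readers who like to see this kind of thing in code, here is a rough sketch of one way extreme values can be masked. It is not the actual RHFS procedure. The function name `mask_extremes`, the parameter `k`, and the sample figures are all made up for illustration, and I am assuming that "averaging" means replacing the few smallest and few largest values with the mean of their group.

```python
from statistics import mean

def mask_extremes(values, k=3):
    """Replace the k smallest and k largest values with their group means.

    Hypothetical illustration only; not the Census Bureau's actual method.
    """
    if k <= 0 or 2 * k > len(values):
        raise ValueError("k must be positive and at most half the number of values")
    # Positions of the values, ordered from smallest to largest.
    order = sorted(range(len(values)), key=lambda i: values[i])
    low, high = order[:k], order[-k:]
    low_mean = mean(values[i] for i in low)
    high_mean = mean(values[i] for i in high)
    masked = list(values)  # leave the caller's data untouched
    for i in low:
        masked[i] = low_mean
    for i in high:
        masked[i] = high_mean
    return masked

# Made-up unit counts for seven properties.
units = [100, 5, 50, 900, 10, 1000, 60]
print(mask_extremes(units, k=2))
```

In this sketch the overall total stays the same, so simple aggregates survive, but no single record carries its true extreme value any longer.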

In that vein, decennial census records are made publicly available only seventy-two years after they are collected. Earlier this year I helped my parents find their 1940 census records.

This has been about the most challenging project on which I have ever worked. There were times I felt like giving up, even quitting my secure federal job. In the process I was as frustrated with my own limitations, if not more so, as with various staff with whom I worked. I have never worked on a project where I made so many mistakes, and so openly.

Earlier this week, for example, the programmers asked if the subject matter area (my area) had identified all the variables we needed to remove from the file until we were able to satisfy the disclosure avoidance requirements of the Census Bureau's Disclosure Review Board (DRB). I wrote to the team, or at least the part of the team involved with survey processing, that, yes, indeed, I had identified all the variables...only to have my boss and me identify two more variables that needed to be removed. I know it was frustrating to the programmers.

This was one of various mistakes I made in a process that frayed nerves and tested working relationships. The process was messy. To me, it seemed needlessly so, because it's not like this is the first time the Census Bureau has edited data or produced a public use file.

Unfortunately, the area of the bureau in which I work does not have standard editing processes and procedures. We were more or less inventing them on the fly.

My boss has more experience with reviewing and editing data than I do. For seven years I had worked at an organization where I was a semi-advanced to advanced user of public use files, also known as public use microdata samples (PUMS). I knew the files and their contents well. That hadn't automatically translated to being a good data editor and producer. On the one hand, I am supportive of people who switch jobs, who bring alternate experiences to federal service. On the other, I am seeing how much it helps to have years, sometimes decades, of experience working within government.

I hope to write at least one blog entry a month, versus just one a year, which has been my track record. As an Aries, I tend to stop and start projects. Big hopes, false starts. But now that the dust has settled, now that we've given a public use file to our sponsor, for the time being, my workload has gone back to semi-normal levels and I hope to write more. For now I find I'm better at writing about the survey process than actually doing it, and hope to change that, too. I am tweaking my individual development plan.

Life gets messy, years of accumulated hopes and dreams and plans crashing against more evolved relationships and heightened responsibilities.