Ariell Zimran

Using Discrepancies to Correct for False Matches in Historical Data
with Yuya Sasaki

Research in economic history increasingly relies on, and benefits from, the ability to link data sources to one another; the most prominent example is the linkage of census records over time to create panel datasets. But the potential for error in this linkage poses a challenge to the use of such data. We propose an approach to correct the bias arising from false matches in linked data, based on two principles. First, the rate of false matching can be inferred from the extent to which information that was not used in linking, and that should agree across the records in a linked pair, instead disagrees. Second, knowledge of the rate of false matching enables the researcher to correct for the bias that false matches induce. Based on these principles, we derive estimators and establish their properties, enabling researchers to correct both population quantities and regression coefficients for false matches, whether the probability of a false match is constant or related to variables of interest. We provide simulations to demonstrate the properties of these estimators. Finally, we illustrate the method by using linked US census data to study internal migration, with discrepancies in the birthplaces of parents used to infer the rate of false matching.
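The two principles can be illustrated with a deliberately simplified sketch. The assumptions here are ours, chosen for illustration, and are not necessarily those of the paper: (i) a false match pairs a record with a random draw from the same population, so a "check" variable such as a parent's birthplace agrees by chance with probability equal to the sum of squared category shares; (ii) true matches never disagree on the check variable (no transcription error); and (iii) the false-match rate is constant. Under these assumptions the observed disagreement share equals the false-match rate times the chance-disagreement probability, and an observed mean is a mixture of the true-match mean and the false-match mean, which can be inverted. The function names and the toy numbers are hypothetical, and the sketch does not attempt the paper's regression corrections or covariate-dependent false-match probabilities.

```python
from collections import Counter


def chance_agreement_rate(values):
    """Probability that two independently drawn records agree on a categorical field.

    Assumption (illustrative): a false match pairs a record with a random draw
    from the same population, so chance agreement is sum_k p_k**2.
    """
    counts = Counter(values)
    n = len(values)
    return sum((c / n) ** 2 for c in counts.values())


def false_match_rate(disagree_share, chance_agree):
    """Infer the false-match share pi from the observed disagreement share.

    Assumption (illustrative): true matches never disagree on the check
    variable, so disagree_share = pi * (1 - chance_agree).
    """
    if not 0.0 <= chance_agree < 1.0:
        raise ValueError("chance agreement must lie in [0, 1)")
    return disagree_share / (1.0 - chance_agree)


def corrected_mean(observed_mean, pi, false_match_mean):
    """Invert the mixture observed = (1 - pi) * true_mean + pi * false_match_mean.

    false_match_mean is the value the statistic would take among false matches;
    here it must be supplied by the user (a hypothetical input).
    """
    if not 0.0 <= pi < 1.0:
        raise ValueError("false-match rate must lie in [0, 1)")
    return (observed_mean - pi * false_match_mean) / (1.0 - pi)


if __name__ == "__main__":
    # Toy data: four equally common parental birthplaces, so chance agreement is 0.25.
    birthplaces = ["NY", "PA", "OH", "VA"]
    a = chance_agreement_rate(birthplaces)
    # Hypothetically, 15% of linked pairs report discordant parental birthplaces.
    pi = false_match_rate(0.15, a)  # ≈ 0.2
    # Hypothetical observed interstate migration rate of 0.30 in the linked sample,
    # and a migration rate of 0.60 that false matches alone would produce.
    print(round(pi, 3), round(corrected_mean(0.30, pi, 0.60), 3))
```

Inverting the mixture only works when the false-match rate is below one and the false-match value of the statistic is known or estimable; with real census data that value, and the assumption that true matches always agree, would both need justification.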