The numbering of equations in the originally published online version of the article contained some errors. These have since been corrected both in the online version and in the printed version that appears in the same issue as this erratum.
Summary: In a randomized controlled trial, the precision of an average treatment effect estimator and the power of the corresponding t-test can be improved either by collecting data on additional individuals or by collecting additional covariates that predict the outcome variable. To design the experiment, a researcher needs to resolve this trade-off subject to her budget constraint. We show that this optimization problem is equivalent to optimally predicting outcomes from the covariates, which in turn can be solved with existing machine learning techniques applied to pre-experimental data such as other similar studies, a census, or a household survey. In two empirical applications, we show that our procedure can reduce the costs of data collection by up to 58%, or deliver improvements of the same magnitude in the precision of the treatment effect estimator.
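The trade-off between sample size and covariate collection can be illustrated with a back-of-the-envelope calculation. The sketch below is purely illustrative and is not the paper's procedure: the variance formula assumes a 50/50 treatment split, where the difference-in-means estimator adjusted for covariates explaining a share `r2` of outcome variance has variance roughly 4σ²(1−R²)/n; all costs and R² values are hypothetical.

```python
# Illustrative only: hypothetical costs and R^2, not the paper's method.
# With a 50/50 split, Var(ATE-hat) ~ 4 * sigma2 * (1 - r2) / n after
# adjusting for covariates that explain a share r2 of outcome variance.

def required_n(sigma2, r2, target_se):
    """Sample size needed to reach a target standard error."""
    return 4.0 * sigma2 * (1.0 - r2) / target_se ** 2

sigma2, target_se = 1.0, 0.1
cost_per_unit, cost_per_covariate = 10.0, 2.0   # hypothetical survey costs

n_plain = required_n(sigma2, r2=0.0, target_se=target_se)      # ~400 units
n_adjusted = required_n(sigma2, r2=0.5, target_se=target_se)   # ~200 units

cost_plain = n_plain * cost_per_unit
cost_adjusted = n_adjusted * (cost_per_unit + cost_per_covariate)
```

Under these made-up numbers, paying for a predictive covariate halves the required sample and lowers total cost despite the higher per-unit price, which is exactly the kind of comparison the budget-constrained design problem trades off.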
Summary: This paper proposes nonparametric kernel-smoothing estimation for panel data to examine the degree of heterogeneity across cross-sectional units. We first estimate the sample mean, autocovariances, and autocorrelations for each unit and then apply kernel smoothing to compute their density functions. The dependence of the kernel estimator on the bandwidth causes higher-order asymptotic bias to affect the required condition on the relative magnitudes of the cross-sectional sample size ($N$) and the time-series length ($T$). In particular, it makes the condition on $N$ and $T$ stronger and more complicated than those typically found in the long-panel literature without kernel smoothing. We also consider a split-panel jackknife method for bias correction and the construction of confidence intervals. An empirical application illustrates our procedure.
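As a hedged sketch of the first two steps (estimate a statistic unit by unit, then kernel-smooth across units), the following uses a simulated heterogeneous panel, a Gaussian kernel, and Silverman's rule-of-thumb bandwidth; the data-generating process and bandwidth choice are illustrative assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 200, 50
# Illustrative DGP: unit i has its own mean mu_i, so the panel is heterogeneous
mu = rng.normal(0.0, 1.0, size=N)
y = mu[:, None] + rng.normal(0.0, 1.0, size=(N, T))

# Step 1: estimate the mean separately for each cross-sectional unit
mu_hat = y.mean(axis=1)

# Step 2: kernel-smooth the N unit-level estimates into a density
h = 1.06 * mu_hat.std(ddof=1) * N ** (-1 / 5)  # Silverman's rule of thumb

def gaussian_kde(grid, data, bandwidth):
    u = (grid[None, :] - data[:, None]) / bandwidth
    return np.exp(-0.5 * u ** 2).mean(axis=0) / (bandwidth * np.sqrt(2 * np.pi))

grid = np.linspace(-4.0, 4.0, 161)
density = gaussian_kde(grid, mu_hat, h)
```

The same two-step recipe applies to unit-level autocovariances or autocorrelations in place of the means; the bandwidth $h$ is the quantity whose interaction with higher-order bias drives the $N$–$T$ conditions discussed above.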
Summary: This paper develops new tests against a structural break in panel data models with common factors when T, the number of observations over time, is fixed. For this class of models, the available tests against a structural break are valid only under the assumption that T is ‘large’. This may be a stringent requirement, most commonly in datasets with annual time frequency, in which case the sample may cover a relatively long period even though T is not large. The proposed approach builds on existing generalized method of moments methodology and develops Distance-type and Lagrange Multiplier-type tests for detecting a structural break, both when the break point is known and when it is unknown. The methodology permits weak exogeneity and/or endogeneity of the regressors. In a simulation study, the method performs well in terms of size and power, as well as in successfully locating the time of the structural break. The method is illustrated by testing the so-called ‘Gibrat’s Law’ on a dataset of 4,128 financial institutions, each observed over the period 2002–2014.
Information technology outsourcing and firm productivity: eliminating bias from selective missingness in the dependent variable
Summary: Missing values are a major problem in econometric applications based on survey data. A standard approach assumes that data are missing at random and uses imputation methods or even listwise deletion. This approach is justified if item nonresponse does not depend on the realization of the potentially missing variables. However, assuming missingness at random may introduce bias if nonresponse is, in fact, selective. Relevant applications range from financial or strategic firm-level data to individual-level data on income or privacy-sensitive behaviors. In this paper, we propose a novel approach to deal with selective item nonresponse in the model’s dependent variable. Our approach is based on instrumental variables that affect selection only through the partially observed outcome variable. In addition, we allow for endogenous regressors. We establish identification of the structural parameter and propose a simple two-step estimation procedure for it. Our estimator is consistent and robust to the biases that would prevail under the assumption of missingness at random. We apply the estimation procedure to firm-level survey data, using a binary instrumental variable, to estimate the effect of outsourcing on productivity.
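A minimal simulation makes the core problem concrete. The logistic response mechanism below is an assumption chosen for illustration (it is not the paper's estimator): when the probability of responding depends on the outcome itself, listwise deletion estimates the mean of a selected subpopulation, not of the population.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(2.0, 1.0, size=100_000)        # true outcome, population mean 2.0

# Selective nonresponse: higher outcomes are less likely to be reported
p_respond = 1.0 / (1.0 + np.exp(y - 2.0))
observed = rng.random(y.size) < p_respond

naive_mean = y[observed].mean()   # listwise deletion: biased downward here
true_mean = y.mean()
```

Here the observed-data mean falls well below the population mean, and no amount of additional data fixes it; this is the selectivity that the instrumental-variable approach described above is designed to correct.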
Summary: This paper proposes a fully Bayesian semi-parametric method for efficiency and productivity analysis based on Gaussian processes. The proposed technique frees the researcher from having to specify a functional form for the production frontier. In simulated data, it performs as well as flexible parametric models when correct distributional assumptions are imposed on the inefficiency component of the error term, and slightly better when incorrect assumptions are made. The technique is applied to a panel dataset of US electric utilities, where total-factor productivity growth is estimated and decomposed with both parametric and semi-parametric techniques.
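To illustrate the Gaussian-process idea in its simplest form, the toy example below fits a GP posterior mean with an RBF kernel to made-up single-input data; it omits the inefficiency component and the full Bayesian treatment, so it is only a sketch of why no functional form for the frontier needs to be specified.

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up data: log output rises in log input, but the shape is unspecified
x = np.sort(rng.uniform(0.0, 3.0, size=30))
y = 1.0 + 0.8 * np.log1p(x) + rng.normal(0.0, 0.1, size=30)

def rbf(a, b, scale=1.0, length=0.7):
    """Squared-exponential (RBF) kernel between two 1-D point sets."""
    d = a[:, None] - b[None, :]
    return scale ** 2 * np.exp(-0.5 * (d / length) ** 2)

K = rbf(x, x) + 0.1 ** 2 * np.eye(x.size)     # kernel matrix plus noise variance
grid = np.linspace(0.0, 3.0, 50)

# GP posterior mean on the grid: the curve's shape comes entirely from the data
post_mean = rbf(grid, x) @ np.linalg.solve(K, y)
```

The recovered curve tracks the unknown concave frontier without the researcher ever writing down a Cobb–Douglas, translog, or any other parametric form, which is the flexibility the abstract refers to.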
Summary: This paper studies inference on finite-population average and local average treatment effects under limited overlap, meaning that some strata have a small proportion of treated or untreated units. We model limited overlap in an asymptotic framework that sends the propensity score to zero (or one) with the sample size. We derive the asymptotic distribution of analogue estimators of the treatment effects under two common randomization schemes: conditionally independent and stratified block randomization. Under either scheme, the limit distribution is the same and conventional standard error formulas remain asymptotically valid, but the rate of convergence is slower the faster the propensity score degenerates. The practical import of these results is twofold. When overlap is limited, standard methods can perform poorly in smaller samples, because asymptotic approximations are inadequate owing to the slower rate of convergence. However, in larger samples, standard methods can work quite well even when the propensity score is small.
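The slower convergence under limited overlap is easy to see in a toy Monte Carlo (the data-generating process here is an assumption for illustration, not the paper's framework): holding the sample size fixed while shrinking the propensity score sharply inflates the sampling variability of the simple difference in means.

```python
import numpy as np

rng = np.random.default_rng(3)

def mc_sd_of_diff_in_means(n, p, reps=2000):
    """Monte Carlo SD of the treated-minus-control mean under a zero effect."""
    draws = []
    for _ in range(reps):
        d = rng.random(n) < p                 # treatment with propensity p
        if 0 < d.sum() < n:                   # need both groups nonempty
            yy = rng.normal(0.0, 1.0, size=n)
            draws.append(yy[d].mean() - yy[~d].mean())
    return float(np.std(draws))

se_good_overlap = mc_sd_of_diff_in_means(2000, 0.5)
se_limited = mc_sd_of_diff_in_means(2000, 0.02)   # far fewer treated units
```

With n = 2000, moving the propensity from 0.5 to 0.02 leaves only about 40 treated units, so the effective sample driving the estimator is much smaller than n; this is the sense in which the rate of convergence deteriorates as the propensity score degenerates.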
Summary: This paper investigates quasi-maximum likelihood estimation of short dynamic panel data models. We consider estimation under both fixed effects and random effects specifications and propose a Hausman test for use when exogenous variables are present. For a dynamic panel model, initial conditions play an important role in model structure and estimation, and they give rise to a between equation under the random effects framework. With the between equation properly defined, we show that the random effects model can be decomposed into a within equation and a between equation; hence, the random effects estimate is a pooling of the within and between estimates. Our paper thus extends the pooling result for the static panel data model (Maddala, 1971a) to the dynamic panel data setting. This decomposition is revealing and valuable for estimation and for formulating a Hausman test of the possible correlation of individual effects with the included regressors. Monte Carlo experiments investigate the finite-sample performance of the estimators and the Hausman test. An empirical application to growth convergence in OECD countries is provided.
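For reference, the Hausman statistic in its standard textbook form (this is the generic formula, not a result specific to this paper) compares the fixed effects and random effects estimates:

$$H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})' \left[\widehat{\operatorname{Var}}(\hat{\beta}_{FE}) - \widehat{\operatorname{Var}}(\hat{\beta}_{RE})\right]^{-1} (\hat{\beta}_{FE} - \hat{\beta}_{RE}) \xrightarrow{d} \chi^2_k \quad \text{under } H_0,$$

where $k$ is the number of coefficients compared. Under the null of no correlation between the individual effects and the regressors, both estimators are consistent but the random effects estimator is efficient, so a large value of $H$ rejects the random effects specification.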
Summary: This paper combines a Roy model of migration and counterfactual wages with racial differences in migration rates during the Great Migration to recover lower bounds on black–white differences in the wage impacts of northward migration. Identification is predicated on the idea that, when migration is more selective for whites, regional wage differentials for whites are more contaminated by selection bias. In that case, the black–white difference in North–South wage differentials bounds the racial difference in wage impacts from below. Furthermore, as long as the impact of migration on whites’ wages is nonnegative, a lower bound on the black–white difference in wage impacts is also a lower bound on the impact itself for blacks. Applying the identification result, I find that northward migration increased blacks’ wages by at least 36% more than whites’, and hence by at least 36% on average, between 1940 and 1970.
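In stylized notation (the symbols are introduced here only to make the bounding logic explicit; they are not the paper's), let $\Delta_r$ denote the observed North–South wage differential for race $r \in \{B, W\}$ and $\beta_r$ the causal wage impact of migration, with $\Delta_r = \beta_r + \text{bias}_r$. Greater selection for whites means $\text{bias}_W \ge \text{bias}_B \ge 0$, so

$$\beta_B - \beta_W \;=\; (\Delta_B - \Delta_W) - (\text{bias}_B - \text{bias}_W) \;\ge\; \Delta_B - \Delta_W,$$

i.e., the observed black–white gap in differentials is a lower bound on the gap in impacts. If additionally $\beta_W \ge 0$, then $\beta_B \ge \beta_B - \beta_W \ge \Delta_B - \Delta_W$, so the same quantity bounds the impact for blacks itself, which is the logic behind the 36% figure.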