Papers Under Review

  • The Paradox of Openness: Exposure vs. Efficiency of APIs, with Seth Benzell, Guillermo Lagarda Cuervas, and Marshall Van Alstyne
    • View abstract
        APIs are the building blocks of digital platforms, yet there is little quantitative evidence on their use. Do API-adopting firms do better? Do such firms change their operating procedures? Using proprietary data from a major API tools provider, we explore the impact of API use on firm value and operations. We find evidence that API use increases market capitalization and lowers R&D expenditures. We then document an important downside. API adoption increases the risk of data breaches, a risk that rises when APIs are more open or place less emphasis on security. Firms reduce API data flows in the month before a hack announcement, consistent with a conscious attempt to limit breach scope. In the same period, however, the variance of API data flows increases, consistent with heterogeneity in firms’ ability to detect and shut down unauthorized data access. Our findings highlight a fundamental paradox of openness: it increases upside value and downside risk at the same time. We document that firms respond to these trade-offs in logical ways and conclude that the benefits of opening APIs exceed the risks for firms situated to adopt a platform strategy.
    • Status: Reject and Resubmit, Management Science
  • Digitally Distracted at the Wheel: Car Accidents and Smartphone Coverage, with Bree Lang and Matt Lang
    • View abstract
       We examine the relationship between the growth of 3G cellular coverage and traffic accidents in California between 2001 and 2013. Using machine learning techniques to predict historical 3G coverage, we find that car accident rates increase after 3G coverage is introduced, controlling for traffic volume and road fixed effects. The accidents most responsive to 3G coverage are non-severe crashes that take place in highly trafficked areas. Accidents caused by drivers over 65 do not change in response to 3G coverage. In contrast to previous research examining cell phone laws, we find a positive relationship between accidents and cell phone use.
    • Status: Under Review
  • Poverty from Space: Using High Resolution Satellite Imagery for Estimating Economic Well-being and Geographic Targeting, with Ryan Engstrom and David Newhouse
    • [slides] [Press: Brookings, Bloomberg, Atlantic City Lab, Fast Company, Borgen Magazine]
    • View abstract
       Can features extracted from high spatial resolution satellite imagery accurately estimate poverty and economic well-being? We investigate this question by extracting both object and texture features from satellite images of Sri Lanka, which are used to estimate poverty rates and average log consumption for 1,291 administrative units (Grama Niladhari (GN) Divisions). Features extracted include the number and density of buildings, the prevalence of building shadows (a proxy for building height), the number of cars, density and length of roads, type of agriculture, roof material, and a suite of texture and spectral features. A linear regression model explains 60 percent of both poverty headcount rates and average log consumption. In comparison, models built using Night Time Lights explain only 15 percent. Estimates remain accurate throughout the consumption distribution. Two sample applications, extrapolating predictions into adjacent areas and estimating local area poverty using an artificially reduced census, confirm the out-of-sample predictive capabilities.
    • Status: Revisions Requested, World Bank Economic Review
  • Sweet diversity: Colonial goods and the rise of European living standards after 1492, with Hans-Joachim Voth
    • View abstract
      When did overseas trade start to matter for living standards? Traditional real-wage indices suggest that living standards in Europe stagnated before 1800. In this paper, we argue that welfare rose substantially, but surreptitiously, because of an influx of new goods as a result of overseas trade. Colonial luxuries such as tea, coffee, and sugar transformed European diets after the discovery of America and the rounding of the Cape of Good Hope. These goods became household items in many countries by the end of the 18th century. We use three different methods to calculate welfare gains based on price data and the rate of adoption of these new colonial goods. Our results suggest that by 1800, the average Englishman would have been willing to forgo 10% or more of his income in order to maintain access to sugar and tea alone. These findings are robust to a wide range of alternative assumptions, data series, and valuation methods.
    • Status: Revisions Requested, Explorations in Economic History

Published Papers and Peer-Reviewed Conference Proceedings

  • The Effect of Piracy Website Blocking on Consumer Behavior (with Brett Danaher, Michael D. Smith, and Rahul Telang) forthcoming, MIS Quarterly, 2019.
    • View abstract
       In this paper we ask whether antipiracy enforcement interventions that aim to make copyright-infringing content more difficult to access can decrease piracy and increase legitimate consumption. We do this in the context of three court-ordered events affecting consumers in the UK: we first study Internet Service Providers’ blocking of 53 major piracy sites in 2014, and we then study two smaller waves of blocking, the blocking of 19 piracy sites in 2013 and the blocking of The Pirate Bay in 2012. We show that blocking 53 sites in 2014 caused treated users to decrease piracy and to increase their usage of legal subscription sites by 7-12%. Similarly, we find that the 2013 block of 19 different piracy sites caused users to increase visits to legal sites by 8%. However, blocking a single dominant site in 2012, The Pirate Bay, caused no increase in usage of legal sites, but it did cause users to increase visits to other unblocked piracy sites and VPN sites. This suggests that to increase legal IP use when faced with a dominant piracy channel, the optimal policy response must block multiple channels of access to pirated content, a distinction that the current literature has not made clear.


  • Big Data in Economics (with Matthew Harding) IZA World of Labor, 2018.
    • [final draft]
    • View abstract
      Big Data refers to datasets of much larger size, higher frequency, and often more personalized information. Examples include data collected by smart sensors in homes or aggregation of tweets on Twitter. In small datasets, traditional econometric methods tend to outperform more complex techniques. In large datasets, however, machine learning methods shine. New analytic approaches are needed to make the most of Big Data in economics. Researchers and policy makers should thus pay close attention to recent developments in machine learning techniques if they want to fully take advantage of these new sources of Big Data.
  • Poverty Mapping Using Convolutional Neural Networks Trained on High and Medium Resolution Satellite Images, With an Application in Mexico (with Boris Babenko, David Newhouse, Anusha Ramakrishnan, and Tom Swartz) Proceedings of the Neural Information Processing Systems 2017.
    • [conference draft] [slides]
    • View abstract
       Mapping the spatial distribution of poverty in developing countries remains an important and costly challenge. These “poverty maps” are key inputs for poverty targeting, public goods provision, political accountability, and impact evaluation, which are all the more important given the geographic dispersion of the remaining bottom billion severely poor individuals. In this paper we train Convolutional Neural Networks (CNNs) to estimate poverty directly from high and medium resolution satellite images. We use both Planet and Digital Globe imagery with spatial resolutions of 3-5 m² and 50 cm² respectively, covering all 2 million km² of Mexico. Benchmark poverty estimates come from the 2014 MCS-ENIGH combined with the 2015 Intercensus and are used to estimate poverty rates for 2,456 Mexican municipalities. CNNs are trained using the 896 municipalities in the 2014 MCS-ENIGH. We experiment with several architectures (GoogleNet, VGG) and use GoogleNet as a final architecture where weights are fine-tuned from ImageNet. We find that 1) the best models, which incorporate satellite-estimated land use as a predictor, explain approximately 57% of the variation in poverty in a validation sample of 10 percent of MCS-ENIGH municipalities; 2) across all MCS-ENIGH municipalities, explanatory power reduces to 44% in a CNN prediction and landcover model; 3) predicted poverty from the CNN predictions alone explains 47% of the variation in poverty in the validation sample, and 37% over all MCS-ENIGH municipalities; 4) in urban areas we see slight improvements from using Digital Globe versus Planet imagery, which explain 61% and 54% of poverty variation respectively. We conclude that CNNs can be trained end-to-end on satellite imagery to estimate poverty, although there is much work to be done to understand how the training process influences out-of-sample validation.
  • Engstrom, R., Newhouse, D., Haldavanekar, V., Copenhaver, A., & Hersh, J. (2017, March). Evaluating the Relationship Between Spatial and Spectral Features Derived from High Spatial Resolution Satellite Data and Urban Poverty in Colombo, Sri Lanka. In Urban Remote Sensing Event (JURSE), 2017 Joint (pp. 1-4). IEEE.
  • Historical Health Conditions in Major U.S. Cities (with Carlos Villarreal, Brian Bettenhausen, and Eric Hanss) Historical Methods, Vol. 47, Iss. 2, 2014.
    • [published version]  
    • View abstract
      The Historical Urban Ecological data set is a new resource detailing health and environmental conditions within seven major U.S. cities during the study period from 1830 to 1930. Researchers collected and digitized ward-level data from annual reports of municipal departments that detail the epidemiological, economic, and demographic conditions within each city. They then drafted new geographic information system data to link the tabular records to ward geographies. These data provide a new foundation to revisit questions surrounding the urban mortality transition and the growth of U.S. cities. 

Papers in Progress

  • The MegafilmesHD Shutdown in Brazil and its Effect on Piracy and Media Consumption, with Brett Danaher and Mike D. Smith
  • Delay Forecasts in Infrastructure Projects and Managerial Decisions
  • Analyzing Conflict From Space: Identification of Physical Destruction During the Syrian Civil War, with Hannes Muller, Andre Groger, and Andrea Matranga
  • Combining Deep Learning With Surveys to Generate Better Poverty Maps, with an application to Mexico, with Boris Babenko, David Newhouse, Anusha Ramakrishnan, and Tom Swartz 
    • View abstract
       Mapping the spatial distribution of poverty in developing countries remains a costly but important endeavor. Recent advances in applying machine learning to satellite imagery, whether to generate intermediate features or to predict poverty directly, offer less costly ways to estimate poverty than on-the-ground surveys. However, current frontier deep learning methods cannot reliably estimate poverty without being trained using survey data close in proximity and characteristics to the areas where poverty rates are to be estimated, effectively rendering these frontier methods ineffective at their goal of more cheaply estimating poverty rates. In this paper, we propose a two-step “Deep-ELL” methodology which combines CNN predictions with information from surveys to generate more accurate poverty maps than using survey information alone. We train Convolutional Neural Networks (CNNs) to estimate poverty directly from high and medium resolution satellite imagery, covering all 2 million square kilometers of Mexico. Benchmark poverty estimates come from the 2014 MCS-ENIGH combined with the 2015 Intercensus, and are used to estimate poverty rates for 2,456 Mexican municipalities. CNNs are trained using the 896 municipalities in the 2014 MCS-ENIGH. We document that these CNN predictions alone capture 34% to 54% of the variation in poverty rates in a validation sample of municipalities; however, they fare much more poorly in non-MCS-ENIGH surveyed municipalities. Using our “Deep-ELL” two-stage methodology, we find that the estimated coefficient of variation declines substantially when using surveys with CNN predictions, leading to significant cost savings. These findings have implications for survey design and present a tractable method for any poverty survey to take advantage of advances in Deep Learning to generate more accurate poverty maps.

Resting Working Papers

  • Unintended Consequences of the African Growth and Opportunity Act: The Role of Trade Diversion and Structural Change (with Klaus-Peter Hellwig)
    • [draft available upon request] 
    • View abstract
        This paper investigates the effects of preferential trade programs such as the U.S. African Growth and Opportunity Act (AGOA) on the direction of African countries’ exports. While these programs intend to promote African exports, textbook models of trade suggest that such asymmetric tariff reductions could additionally divert African exports from other destinations to the tariff reducing economy. We examine the import patterns of 177 countries and estimate the diversion effect using a triple-difference estimation strategy, which exploits time variation in the product and country coverage of AGOA. We find no evidence of systematic trade diversion within Africa, whereas diversion from other industrialized destinations to the US was significant, in particular for apparel products. At the same time we show that, more than diverting trade, AGOA had positive spillovers on the product composition of trade, which suggests that the product coverage of preferential trade agreements can influence structural change in Africa. 
  • Building a better model: Variable selection to predict poverty in Pakistan and Sri Lanka (with Marium Afzal and David Newhouse)
    • [latest draft]  
    • View abstract
       Numerous studies have developed models to predict poverty, but surprisingly few have rigorously examined different approaches to developing prediction models. This paper applies out-of-sample validation techniques to household data from Pakistan and Sri Lanka, to compare the accuracy of regional poverty predictions from models derived using manual selection, stepwise regression, and Lasso-based procedures. It also examines how much incorporating publicly available satellite data into the model improves its accuracy. The five main findings are that: 1) Lasso tends to outperform both discretionary and stepwise models in Pakistan, where the set of potential predictors is large; 2) Lasso and stepwise models give comparable results in Sri Lanka, where the set of predictors is smaller; 3) the accuracy of the prediction model depends considerably on the poverty threshold; 4) including publicly available satellite data makes poverty predictions more accurate in Sri Lanka, where predictors are scarce, but slightly less accurate in Pakistan; and 5) including the satellite data increases the benefit of using Lasso in Sri Lanka. We conclude that among the three model selection methods considered, Lasso-based models are preferred for generating poverty predictions, especially when the pool of candidate variables is large. Furthermore, when the pool of candidate variables available from household surveys is smaller, incorporating publicly available satellite data can considerably improve the accuracy of regional poverty predictions.