Validating the Quality of Crowdsourced Data for Flood Modeling of Hurricane Harvey in Houston, Texas
Flooding is one of the most widespread natural hazards in the world. Hurricane Harvey, a 1000-year flood event, hit Texas in 2017 and caused significant property damage, injuries, and loss of life. As one of the most heavily impacted areas, Houston was chosen as the study area. For future flood risk prediction and mitigation, it is important to understand the flood dynamics of Hurricane Harvey and to generate flood inundation maps.
In recent years, Volunteered Geographic Information (VGI) such as social media and crowdsourced data has emerged as an alternative and supplementary data source for flood inundation mapping. However, compared to authoritative data acquired by government agencies (e.g. stream gage data, remote sensing), the quality of crowdsourced data is often uncertain because clear data standards and quality assurance/quality control (QA/QC) procedures are lacking. Therefore, the primary objective of this study was to examine the quality of crowdsourced data for flood mapping of Hurricane Harvey in the Houston area. The U-Flood project (map.u-flood.com), a free and innovative crowdsourcing platform that reported and mapped flooded streets in the Houston metro area, is the crowdsourced data source examined in this study. The research questions are: (1) Are there any significant differences in water depth among the H&H model (i.e. HEC-RAS), the authorized reference (i.e. FEMA), and the crowdsourced data (i.e. U-Flood data)? (2) Are there any significant differences in inundated area between the HEC-RAS modeled floodplain and the U-Flood observations? To answer these questions, this study used HEC-RAS to simulate flood inundation maps of the Houston study area during Hurricane Harvey, and validated the resulting maps by comparing them with Harvey High Water Mark (HWM) points using the Wilcoxon signed-rank test and with the FEMA modeled floodplain using a paired-samples t-test. Next, the crowdsourced U-Flood dataset was validated by comparing it with the HEC-RAS modeled result and the authorized reference (i.e. the FEMA modeled flood map for Hurricane Harvey and USGS stream gages) in terms of a) water depth (WD), using the Friedman test, and b) the percentage of U-Flood streets, by count and by length, inside and outside the HEC-RAS modeled floodplain. In addition, the U-Flood dataset was compared with HEC-RAS and FEMA separately using the Wilcoxon signed-rank test.
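To illustrate the three-way water depth comparison, the Friedman test can be sketched in a few lines of Python. This is a minimal illustration, not the procedure or software used in the study: the function names and the input layout (one tuple of U-Flood, HEC-RAS, and FEMA depths per location) are hypothetical, no tie correction is applied to the statistic, and the p-value relies on the large-sample chi-square approximation, which for three sources (two degrees of freedom) reduces to exp(-Q/2).

```python
import math


def average_ranks(values):
    """Rank values from 1..k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def friedman_three_sources(rows):
    """Friedman statistic Q and approximate p-value for three related samples.

    rows: iterable of (u_flood_wd, hec_ras_wd, fema_wd) tuples, one per
    location (hypothetical layout chosen for this sketch).
    """
    rows = [tuple(r) for r in rows]
    n, k = len(rows), 3
    if n == 0 or any(len(r) != k for r in rows):
        raise ValueError("need at least one row of exactly three depths")

    # Rank the three sources within each location, then sum ranks per source.
    rank_sums = [0.0] * k
    for row in rows:
        for j, rank in enumerate(average_ranks(list(row))):
            rank_sums[j] += rank

    # Classical Friedman statistic (no tie correction in this sketch).
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)

    # Chi-square survival function with k - 1 = 2 degrees of freedom.
    p_value = math.exp(-q / 2.0)
    return q, p_value
```

A small p-value indicates that at least one source tends to report systematically different depths at the same locations; the pairwise Wilcoxon signed-rank comparisons described above would then indicate which pairs differ.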
The statistical results showed a statistically significant difference in WD among all comparison sets. They also showed a statistically significant difference between the HEC-RAS modeled floodplain and the U-Flood data in terms of the count and length of U-Flood streets inside and outside the modeled floodplain. The results further showed a less consistent, decreasing trend between the U-Flood data and the modeled floodplain over time. Moreover, a map of the U-Flood data classified by WD difference level visually displays the spatial distribution of these differences.
Overall, this study provides a preliminary evaluation of VGI data quality by comparing WD among crowdsourced data, authoritative data, and HEC-RAS modeled output. Furthermore, the theoretical significance of this study lies in its being the first to empirically compare crowdsourced data with observed and modeled data in flood monitoring. The findings also fill gaps in the literature on assessing the uncertainty of, and improving, crowdsourced data quality, and on the use of crowdsourced data as a supplement in flood mapping research.