Theo phong trào MOOC trên công ty cũng tham gia 1 số course ... :D. Riết rảnh thấy cũng có cái để làm cho vui.
Reproducible Research: Peer Assessment 2 [Data Science course Johns Hopkins University] (submit by February 14, 11:59 PM PT)
Synonpsis
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
This report aim to analyze the impact of different weather events on public health and economy based on the storm database collected from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) from 1950 to November 2011. From these data, we found that Tornado is most harmful with respect to population health, while Flood has the greatest economic consequences.
Data processing
The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:
- Storm Data [47Mb]
There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.
- National Weather Service Storm Data Documentation
- National Climatic Data Center Storm Events FAQ
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
Load library
Downloading data
dataFile <- span=""> 'repdata_data_StormData.csv.bz2'
if(!file.exists(dataFile)) {
dataUrl <- span=""> 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
download.file(url = dataUrl, destfile = dataFile)
}->->
If the data already exists in the working environment, we do not need to reload it again. Otherwise, we read the csv file.
if(!exists('stormData')) {
stormData <- span=""> read.csv(dataFile, sep = ',')
}
dimInfo <- span=""> dim(stormData)
dimInfo->->
## [1] 902297 37
head(stormData, n = 3)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
There are 902297 rows with 37 columns. There were fewer events recorded in the earlier years. The names of the columns are:
names(stormData)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
Add year column based on BGN_DATE using “%m/%d/%Y” format
# If this is original data, add new column 'YEAR'
stormData$YEAR <- span=""> as.numeric(format(as.Date(stormData$BGN_DATE, format = '%m/%d/%Y %H:%M:%S'), '%Y'))
histogram(stormData$YEAR, breaks = 50, main = 'Data in U.S. from 1950 to 2011', xlab = 'Year')->
Based on the histogram, we see that the number of events tracked starts to significantly increase from 1995. Therefore, we use the subset of the data from 1995 to 2011 for studying only significant information.
We only need these columns for analysis: BGN_DATE date, EVTYPE type of weather event, FATALITIES number of fatalities, INJURIES number of injuries, PROPDMG amount of property damage, PROPDMGEXP exponential of property damage, CROPDMG amount of crop damage, CROPDMGEXP exponential of crop damage.
columns <- span=""> c('BGN_DATE', 'EVTYPE', 'FATALITIES', 'INJURIES', 'PROPDMG', 'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP', 'YEAR')
stormData <- span=""> stormData[stormData$YEAR >= 1995, columns]
dimInfo <- span=""> dim(stormData)
dimInfo->->->
## [1] 681500 9
head(stormData, n = 3)
## BGN_DATE EVTYPE FATALITIES INJURIES PROPDMG
## 187560 1/6/1995 0:00:00 FREEZING RAIN 0 0 0
## 187561 1/22/1995 0:00:00 SNOW 0 0 0
## 187563 2/6/1995 0:00:00 SNOW/ICE 0 0 0
## PROPDMGEXP CROPDMG CROPDMGEXP YEAR
## 187560 0 1995
## 187561 0 1995
## 187563 0 1995
Now, there are 681500 rows with 9 columns.
Processing data
We sanitize the event types in the data.
stormData$EVTYPE <- span=""> toupper(trimws(stormData$EVTYPE))
evtype.data <- span=""> unique(as.vector(stormData$EVTYPE))
head(evtype.data)->->
## [1] "FREEZING RAIN" "SNOW"
## [3] "SNOW/ICE" "HURRICANE OPAL/HIGH WINDS"
## [5] "HAIL" "THUNDERSTORM WINDS"
length(evtype.data)
## [1] 708
The EVTYPE contains 708 unique source events. According to the NOAA criteria in the National Weather Service Storm Data Documentation, we can group them into 48 official weather events.
evtype.dst <- span=""> c('Astronomical Low Tide', 'Avalanche', 'Blizzard',
'Coastal Flood', 'Cold/Wind Chill', 'Debris Flow', 'Dense Fog',
'Dense Smoke', 'Drought', 'Dust Devil', 'Dust Storm',
'Excessive Heat', 'Extreme Cold/Wind Chill', 'Flash Flood',
'Flood', 'Frost/Freeze', 'Funnel Cloud', 'Freezing Fog',
'Hail', 'Heat', 'Heavy Rain', 'Heavy Snow', 'High Surf',
'High Wind', 'Hurricane (Typhoon)',
'Ice Storm', 'Lake-Effect Snow', 'Lakeshore Flood',
'Lightning', 'Marine Hail', 'Marine High Wind',
'Marine Strong Wind', 'Marine Thunderstorm Wind',
'Rip Current', 'Seiche', 'Sleet', 'Storm Surge/Tide',
'Strong Wind', 'Thunderstorm Wind', 'Tornado',
'Tropical Depression', 'Tropical Storm', 'Tsunami',
'Volcanic Ash', 'Waterspout', 'Wildfire', 'Winter Storm',
'Winter Weather')
evtype.categorized <- span=""> intersect(evtype.data, toupper(evtype.dst))
head(evtype.categorized)->->
## [1] "HAIL" "DENSE FOG" "RIP CURRENT" "TORNADO" "LIGHTNING"
## [6] "FLASH FLOOD"
# For mapping event type
names(evtype.dst) <- span=""> toupper(evtype.dst)->
After the first time try to categorize, we have 46 official weather events appeared in the data. Non-matched events are categorized as “Uncategorized” (NA).
stormData$DMGSRC = mapply(function(x) {
evtype.dst[x]
}, stormData$EVTYPE)
We check for uncategorized records. Try to find out event types of 10% top uncategorized events.
uncategorized.events <- span=""> stormData[is.na(stormData$DMGSRC), c('EVTYPE', 'DMGSRC')]
x <- span=""> as.data.frame(table(uncategorized.events$EVTYPE))
sum(x$Freq)->->
## [1] 162252
uncategorized.freqs <- span=""> quantile(table(x$Freq), probs = .90, na.rm = TRUE)
adjusts <- span=""> as.data.frame(which(table(uncategorized.events$EVTYPE) > uncategorized.freqs))
names(adjusts)[1] <- span=""> 'Freq'
tail(adjusts, n = 30)->->->
## Freq
## TIDAL FLOODING 548
## TORNADO F0 550
## TSTM WIND 562
## TSTM WIND (G40) 566
## TSTM WIND (G45) 567
## TSTM WIND/HAIL 579
## TYPHOON 583
## UNSEASONABLY COLD 585
## UNSEASONABLY COOL 586
## UNSEASONABLY DRY 588
## UNSEASONABLY HOT 589
## UNSEASONABLY WARM 590
## UNSEASONABLY WARM AND DRY 592
## UNSEASONABLY WET 595
## UNUSUAL WARMTH 598
## UNUSUALLY COLD 600
## URBAN FLOOD 606
## URBAN FLOODING 608
## URBAN/SMALL STREAM 609
## URBAN/SMALL STREAM FLOOD 611
## URBAN/SML STREAM FLD 614
## WATERSPOUTS 633
## WILD/FOREST FIRE 642
## WIND 645
## WIND ADVISORY 646
## WIND CHILL 648
## WIND DAMAGE 650
## WINDS 653
## WINTER WEATHER/MIX 659
## WINTRY MIX 661
sum(adjusts$Freq)
## [1] 41668
Manually fixing them based on the official categories.
stormData[is.na(stormData$DMGSRC) &
grepl('(EXTREME|EXCESSIVE|RECORD|BITTER)(.+)(COLD|WINDCHILL)',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Extreme Cold/Wind Chill'
stormData[is.na(stormData$DMGSRC) &
grepl('(PROLONG|UNSEASONABLE|UNSEASONABLY|EXTENDED) COLD',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Cold/Wind Chill'
stormData[is.na(stormData$DMGSRC) &
grepl('^COLD|^UN(.+)COLD',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Cold/Wind Chill'
stormData[is.na(stormData$DMGSRC) &
grepl('LOW(.+)TEMP',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Cold/Wind Chill'
stormData[is.na(stormData$DMGSRC) &
grepl('(EXTREME|EXCESSIVE|RECORD) HEAT',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Excessive Heat'
stormData[is.na(stormData$DMGSRC) &
grepl('RECORD TEMP|TEMPERATURE RECORD',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Excessive Heat'
stormData[is.na(stormData$DMGSRC) &
grepl('MARINE (THUNDERSTORM|TSTM)(.+)WIND',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Marine Thunderstorm Wind'
stormData[is.na(stormData$DMGSRC) &
grepl('^[^MARINE](.+)THUNDER(.+)WIND|^THUNDER(.+)WIND',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Thunderstorm Wind'
stormData[is.na(stormData$DMGSRC) &
grepl('(THUNDER|TSTM)(.+)(WIND|HAIL)',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Thunderstorm Wind'
stormData[is.na(stormData$DMGSRC) &
grepl('WINT(.+)(MIX|WEATHER)|LIGHT(.+)SNOW|ICY ROAD',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Winter Weather'
stormData[is.na(stormData$DMGSRC) &
grepl('DUST(.+)STORM',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Dust Storm'
stormData[is.na(stormData$DMGSRC) &
grepl('RECORD LOW RAIN|DRY',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Drought'
stormData[is.na(stormData$DMGSRC) &
grepl('(WILD|FOREST)(.+)FIRE',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Wildfire'
stormData[is.na(stormData$DMGSRC) &
grepl('(COASTAL|CSTL|BEACH|TIDAL)(.+)FLOOD',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Coastal Flood'
stormData[is.na(stormData$DMGSRC) &
grepl('COASTAL',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Coastal Flood'
stormData[is.na(stormData$DMGSRC) &
grepl('LAKE(.+)FLOOD',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Lakeshore Flood'
stormData[is.na(stormData$DMGSRC) &
grepl('(URBAN|MINOR|FLASH|LOCAL|SMALL)(.+)FLOOD',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Flash Flood'
stormData[is.na(stormData$DMGSRC) &
grepl('URBAN(.+)FLD',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Flash Flood'
stormData[is.na(stormData$DMGSRC) &
grepl('^FLOODI|^ST(.+)FL',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Flash Flood'
stormData[is.na(stormData$DMGSRC) &
grepl('FLOOD(.+)FLASH',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Flash Flood'
stormData[is.na(stormData$DMGSRC) &
grepl('^FLOOD|(RIVER|BREAKUP|MAJOR|RURAL|HIGHWAY)(.+)FLOOD',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Flood'
stormData[is.na(stormData$DMGSRC) &
grepl('^WIND|(STRONG|GUST|DOWN|GRAD|STORM FORCE)(.+)WIND',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Strong Wind'
stormData[is.na(stormData$DMGSRC) &
grepl('[^MARINE ]+HIGH(.+)WIND|^HIGH(.+)WIND',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'High Wind'
stormData[is.na(stormData$DMGSRC) &
grepl('WINTER(.+)STORM',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Winter Storm'
stormData[is.na(stormData$DMGSRC) &
grepl('TROPICAL STORM',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Tropical Storm'
stormData[is.na(stormData$DMGSRC) &
grepl('HEAVY SURF|HIGH SURF',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'High Surf'
stormData[is.na(stormData$DMGSRC) &
grepl('SURGE|SURF|(HIGH|BLOW-OUT)(.+)TIDE',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Storm Surge/Tide'
stormData[is.na(stormData$DMGSRC) &
grepl('RIP CURRENT',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Rip Current'
stormData[is.na(stormData$DMGSRC) &
grepl('LANDSLIDE|SLIDE',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Debris Flow'
stormData[is.na(stormData$DMGSRC) &
grepl('SMOKE',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Dense Smoke'
stormData[is.na(stormData$DMGSRC) &
grepl('ICE(.+)(STORM|PELLET)|FREEZ(.+)RAIN',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Ice Storm'
stormData[is.na(stormData$DMGSRC) &
grepl('FREEZE|FREEZING|FROST',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Frost/Freeze'
stormData[is.na(stormData$DMGSRC) &
grepl('^FOG|^CLOUD',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Dense Fog'
stormData[is.na(stormData$DMGSRC) &
grepl('LAKE(.+)SNOW',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Lake-Effect Snow'
stormData[is.na(stormData$DMGSRC) &
grepl('SNOW|(HEAVY|HVY)(.+)SNOW',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Heavy Snow'
stormData[is.na(stormData$DMGSRC) &
grepl('FREEZE|FREEZING|FROST|GLAZE|ICE|ICY|SLEET|BLIZZ|BLOWING+(.*)SNOW',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Blizzard'
stormData[is.na(stormData$DMGSRC) &
grepl('HEAVY RAIN|RAIN|SHOWER|PRECIP',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Heavy Rain'
stormData[is.na(stormData$DMGSRC) &
grepl('HURRICANE|TYPHOON',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Hurricane (Typhoon)'
stormData[is.na(stormData$DMGSRC) &
grepl('HAIL',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Hail'
stormData[is.na(stormData$DMGSRC) &
grepl('^HEAT|WARM|HOT',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Heat'
stormData[is.na(stormData$DMGSRC) &
grepl('WATERSPOUT',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Waterspout'
stormData[is.na(stormData$DMGSRC) &
grepl('FUNNEL',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Funnel Cloud'
stormData[is.na(stormData$DMGSRC) &
grepl('TORNADO|SPOUT',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Tornado'
stormData[is.na(stormData$DMGSRC) &
grepl('THUNDERSTORM',
stormData$EVTYPE), 'DMGSRC'] <- span=""> 'Lightning'->->->->->->->->->->->->->->->->->->->->->->->->->->->->->->->->->->->->->->->->->->->->
We check uncategorized records again to ensure that “Uncategorized” segment does not include any significant event types in terms of injuries or fatalities.
uncategorized.events2 <- span=""> stormData[is.na(stormData$DMGSRC),]
x <- span=""> as.data.frame(table(uncategorized.events2$EVTYPE))
sum(x$Freq)->->
## [1] 358
uncategorized.summary <- span=""> summarise(group_by(uncategorized.events2, EVTYPE), FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES))
summary(uncategorized.summary$FATALITIES); ->
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.2422 0.0000 8.0000
summary(uncategorized.summary$INJURIES)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.2174 0.0000 7.0000
# Remove uncategorized records (DMGSRC == NA) & cast as factor
stormData <- span=""> stormData[complete.cases(stormData[, 'DMGSRC']), ]
stormData$DMGSRC <- span=""> as.factor(stormData$DMGSRC)
head(stormData)->->
## BGN_DATE EVTYPE FATALITIES INJURIES
## 187560 1/6/1995 0:00:00 FREEZING RAIN 0 0
## 187561 1/22/1995 0:00:00 SNOW 0 0
## 187563 2/6/1995 0:00:00 SNOW/ICE 0 0
## 187565 2/11/1995 0:00:00 SNOW/ICE 0 0
## 187566 10/4/1995 0:00:00 HURRICANE OPAL/HIGH WINDS 2 0
## 187575 3/20/1995 0:00:00 HAIL 0 0
## PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP YEAR DMGSRC
## 187560 0.0 0 1995 Ice Storm
## 187561 0.0 0 1995 Heavy Snow
## 187563 0.0 0 1995 Heavy Snow
## 187565 0.0 0 1995 Heavy Snow
## 187566 0.1 B 10 M 1995 High Wind
## 187575 0.0 0 1995 Hail
Convert the property damage and crop damage data into comparable numerical forms according to the meaning of units described in the code book: Hundred (H), Thousand (K), Million (M) and Billion (B)
expToPow <- span=""> function(x) {
if(is.numeric(x)) {
x <- span=""> x
} else if(grepl('h', x, ignore.case = TRUE)) {
x <- span=""> 2
} else if(grepl('k', x, ignore.case = TRUE)) {
x <- span=""> 3
} else if(grepl('m', x, ignore.case = TRUE)) {
x <- span=""> 6
} else if(grepl('b', x, ignore.case = TRUE)) {
x <- span=""> 9
} else if(x == "" || x == " ") {
x <- span=""> 0
} else {
x <- span=""> NA
}
x
}
convertUnit <- span=""> function(num, exp) {
pow <- span=""> expToPow(exp)
if(is.numeric(num)) {
num <- span=""> num * (10 ^ pow)
}
if(!is.numeric(num)) {
num <- span=""> 0
}
num
}
stormData$ABSPROPDMG <- span="">
mapply(convertUnit, stormData$PROPDMG, stormData$PROPDMGEXP)
stormData$ABSCROPDMG <- span="">
mapply(convertUnit, stormData$CROPDMG, stormData$CROPDMGEXP)
stormData$TOTALDMG <- span=""> stormData$ABSPROPDMG + stormData$ABSCROPDMG
head(stormData[stormData$TOTALDMG > 0,])->->->->->->->->->->->->->->->
## BGN_DATE EVTYPE FATALITIES INJURIES
## 187566 10/4/1995 0:00:00 HURRICANE OPAL/HIGH WINDS 2 0
## 187581 8/3/1995 0:00:00 HURRICANE ERIN 0 0
## 187582 11/11/1995 0:00:00 THUNDERSTORM WINDS 0 0
## 187583 10/3/1995 0:00:00 HURRICANE OPAL 0 0
## 187584 10/4/1995 0:00:00 HURRICANE OPAL 0 0
## 187587 3/7/1995 0:00:00 THUNDERSTORM WINDS 0 0
## PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP YEAR DMGSRC
## 187566 0.1 B 10 M 1995 High Wind
## 187581 25.0 M 1 M 1995 Hurricane (Typhoon)
## 187582 50.0 K 0 1995 Thunderstorm Wind
## 187583 48.0 M 4 M 1995 Hurricane (Typhoon)
## 187584 20.0 m 10 m 1995 Hurricane (Typhoon)
## 187587 2.0 K 0 1995 Thunderstorm Wind
## ABSPROPDMG ABSCROPDMG TOTALDMG
## 187566 1.0e+08 1e+07 1.1e+08
## 187581 2.5e+07 1e+06 2.6e+07
## 187582 5.0e+04 0e+00 5.0e+04
## 187583 4.8e+07 4e+06 5.2e+07
## 187584 2.0e+07 1e+07 3.0e+07
## 187587 2.0e+03 0e+00 2.0e+03
Results
Most harmful events to population health
In this section we want to find the top 10 of weather events that caused the most fatalities and injuries.
fatalities <- span=""> aggregate(FATALITIES~DMGSRC, data = stormData, 'sum')
fatalities <- span=""> fatalities[order(-fatalities$FATALITIES),][1:10,]
fatalities->->
## DMGSRC FATALITIES
## 12 Excessive Heat 1998
## 40 Tornado 1545
## 20 Heat 1100
## 14 Flash Flood 981
## 29 Lightning 730
## 34 Rip Current 564
## 15 Flood 427
## 39 Thunderstorm Wind 419
## 13 Extreme Cold/Wind Chill 270
## 24 High Wind 254
ggplot(fatalities, aes(x = DMGSRC, y = FATALITIES, fill = DMGSRC)) + geom_bar(stat = 'identity') + labs(x = 'Event Type', y = 'Total number of fatalities', title = 'Fatalities from severe weather by events') + theme(legend.title = element_blank(), plot.title = element_text(size = 18), legend.position = 'none', axis.text.x = element_text(angle = 60, vjust = 1, hjust = 1))
The number of fatalities is also the highest due to Excessive Heat, follwed by Tornado, Flash Flood and Heat.
injuries <- span=""> aggregate(INJURIES~DMGSRC, data = stormData, 'sum')
injuries <- span=""> injuries[order(-injuries$INJURIES),][1:10,]
injuries->->
## DMGSRC INJURIES
## 40 Tornado 21783
## 15 Flood 6771
## 12 Excessive Heat 6680
## 39 Thunderstorm Wind 5614
## 29 Lightning 4671
## 20 Heat 2442
## 14 Flash Flood 1819
## 46 Wildfire 1456
## 25 Hurricane (Typhoon) 1332
## 47 Winter Storm 1313
ggplot(injuries, aes(x = DMGSRC, y = INJURIES, fill = DMGSRC)) + geom_bar(stat = 'identity') + labs(x = 'Event Type', y = 'Total number of injuries', title = 'Injuries from severe weather by events') + theme(legend.title = element_blank(), plot.title = element_text(size = 18), legend.position = 'none', axis.text.x = element_text(angle = 60, vjust = 1, hjust = 1))
Tornado causes the most number of injuries, followed by Flood, Excessive Heat, and Thunderstorm Wind.
In conclusion, Tornado is the most harmful event with respect to population health in terms of the number of fatalities or injuries.
Events with the greatest economic consequences
In this section we want to find the top 10 of weather events that caused the most fatalities and injuries.
damages <- span=""> aggregate(TOTALDMG~DMGSRC, data = stormData, 'sum')
damages <- span=""> damages[order(-damages$TOTALDMG),][1:10,]
damages->->
## DMGSRC TOTALDMG
## 15 Flood 149774811657
## 25 Hurricane (Typhoon) 90656027810
## 37 Storm Surge/Tide 47845014000
## 40 Tornado 25226930470
## 19 Hail 17897498922
## 14 Flash Flood 17055569917
## 9 Drought 14969919600
## 39 Thunderstorm Wind 9775351746
## 42 Tropical Storm 8352221550
## 46 Wildfire 8163264130
ggplot(damages, aes(x = DMGSRC, y = TOTALDMG/1000000, fill = DMGSRC)) + geom_bar(stat = 'identity') + labs(x = 'Event Type', y = 'Total damage (in millions)', title = 'Top 10 weather events with the greatest economic consequence') + theme(legend.title = element_blank(), plot.title = element_text(size = 18), legend.position = 'none', axis.text.x = element_text(angle = 60, vjust = 1, hjust = 1))
Across the United States, Flood, Hurricane/Typhoon, and Storm Surge/Tide are the 3 events with the greatest economic consequence (starting in the year 1950 and ending in November 2011).
No comments:
Post a Comment