[Translation] Reproducible Finance with R: Pulling and Displaying ETF Data (Part 2)

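This long holiday season (Thanksgiving through New Year, taking in Christmas, Hanukkah and Kwanzaa) means there is one thing for us to do: build a Leaflet map of country ETF data!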

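In the previous post, we imported stock data and calculated the Sharpe ratio of a portfolio. Today we skip the calculations and focus on data import and visualization. Our end product will chart the price history of a country ETF, selected by clicking on a map rather than by typing a ticker symbol. Part of the motivation is that I can never remember the ticker symbol for each country's ETF, and I hope this tool helps with that.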

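Our app will be very simple in how it displays price histories, but it can serve as a foundation for more complex work, which will only be completed over the following posts. Before starting, note that this Notebook plays a different role from the previous ones. As before, we use a Notebook to test data import, wrangling and visualization before building the Shiny app. This time, however, we also save objects from the Notebook to a .RDat file so the app can reuse them later. In that sense, this Notebook is tied to the app at a more fundamental level than the earlier Notebooks were. A follow-up post titled "Finance Friday = Fun Day" will walk through building the app in full; we will not go further into it today.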

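First, we get the ETF data and tidy it into a clean data frame. Note that this data frame does not hold the price histories themselves; it simply holds ticker symbols, country names and YTD percentages. Next, we pass those ticker symbols to the getSymbols() function to download the price history of each country ETF. Be aware that this example covers 42 country ETFs, and downloading 42 xts objects eats a lot of memory and time, so I suggest running this code on a server.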

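As we will see, when a user selects a single country in the Shiny app there is no need to pass every ticker to getSymbols(). If that step still takes too long, though, I suggest caching the data in advance. Once the data is in hand, we move on to step three: use a shapefile to build a world map with country boundaries. This step is usually memory-hungry, but leaflet handles it gracefully.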

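Fourth, and most important, we add the year-to-date ETF data to the shapefile so that it can be displayed interactively when a country is clicked on the map. Here we make use of the data frame created in step one, where we deliberately used the same country names as the shapefile: that foresight lets us combine the data easily with merge(). Once the shapefile is ready, we save it to a .RDat file and later load it into the Shiny app.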
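The save-then-reload step described above can be sketched as follows. This is a minimal illustration, not the code used for the app: the toy data frame, the object name `etf_demo` and the file name `etfData.RDat` are all assumptions made up for the example.

```r
# Toy stand-in for the finished object; the real one would be the merged shapefile.
etf_demo <- data.frame(
  ticker = c("EWJ", "EWZ"),
  name   = c("Japan", "Brazil"),
  ytd    = c(3.58, 63.14),
  stringsAsFactors = FALSE
)

# Hypothetical file name in a temporary directory, chosen only for this sketch.
rdat_path <- file.path(tempdir(), "etfData.RDat")

# save() stores the object under its variable name.
save(etf_demo, file = rdat_path)

# Simulate a fresh session (for example the Shiny app) by loading
# into a clean environment instead of the global one.
app_env <- new.env()
loaded_names <- load(rdat_path, envir = app_env)

loaded_names                           # names of the restored objects
identical(app_env$etf_demo, etf_demo)  # checks that the round trip preserved the object
```

Because load() restores objects under the names they were saved with, the app has to refer to the same variable names as the Notebook.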

# Let's build a dataframe to store these ticker symbols, country names and YTD numbers.
library(dplyr)

ticker <-  c("EWJ",  "EWZ",  "INDA", "FXI",  "EWG",  "EWC",  "EWY",  "EWT",  "EWU",  "EWH",  "EWA",
             "EWW",  "EWL",  "EWP", "EWS",  "EWI",  "EIDO", "ERUS", "ECH",  "EZA",  "THD",  "TUR",
             "EWD",  "EWQ",  "EWM",  "EPU",  "EWN",  "EPOL", "EPHE", "ENZL", "EIRL", "EWK",  "EIS",
             "EWO", "EDEN", "QAT", "UAE", "EFNL", "ENOR", "ICOL", "HEWY", "KSA")

name <-   c("Japan", "Brazil" ,"India", "China", "Germany" , "Canada", "Korea", "Taiwan", 
              "United Kingdom", "Hong Kong", "Australia", "Mexico", "Switzerland", "Spain", 
              "Singapore", "Italy", "Indonesia", "Russia", "Chile", "South Africa", "Thailand",  
              "Turkey", "Sweden", "France", "Malaysia", "Peru", "Netherlands", "Poland",
              "Philippines", "New Zealand", "Ireland", "Belgium", "Israel", "Austria","Denmark",
              "Qatar", "United Arab Emirates", "Finland", "Norway", "Colombia", "South Korea", 
              "Saudi Arabia")

ytd <- c(0.0358, 0.6314, -0.0140,  0.0721, -0.0289, 0.2198,  0.0729,  0.2029, -0.0467,  0.0897,
         0.0944, -0.1045, -0.0623, -0.0916,  0.0305, -0.1857,  0.1309,  0.3828,  0.1987,  0.1219,
         0.2458, -0.1053, -0.0052, -0.0199, -0.0410,  0.6015,  0.0017, -0.0481, -0.0408,  0.1394,
         -0.1183, -0.0428, -0.0432,  0.0462, -0.1078, -0.0244, 0.0570,  0.6397, -0.0146,  0.1424,
         0.1313,  0.0751) * 100

# tibble() replaces the deprecated data_frame(); dplyr re-exports it.
etf_ticker_country <- tibble(ticker, name, ytd)

etf_ticker_country

# getSymbols is part of the 'quantmod' package.

library(quantmod)

# Using getSymbols to import the ETF price histories will take a minute or two or 
# five - 42 time series is a lot of data. 

invisible(getSymbols(etf_ticker_country$ticker, auto.assign = TRUE, warnings = FALSE))

# Let's select just the closing prices of the ETFs and merge them into one xts object.
# lapply pulls the closing-price column of each ETF and do.call(merge, ...) combines them.
# Again, this is for testing purposes. It's not going into production in our app.

etf_prices <- do.call(merge, lapply(etf_ticker_country$ticker, function(x) Cl(get(x))))

# Change the column names to the country names from our dataframe above.

colnames(etf_prices) <- etf_ticker_country$name

# Take a peek at the last 5 rows of each of the time series, 
# just to make sure it looks complete.

tail(etf_prices, n = 5)

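Good, it looks like we have imported the closing prices of the country ETFs. Now we need a shapefile containing the spatial polygons of the world's countries. It can be downloaded from naturalearthdata.com and holds the longitude and latitude coordinates of each boundary. We load the shapefile into the global environment with rgdal::readOGR(), which gives us a shapefile that also carries economic data.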

library(rgdal)
library(httr)

tmp <- tempfile()

invisible(GET("http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/50m/cultural/ne_50m_admin_0_countries.zip", write_disk(tmp)))

unzip(tmp, exdir = 'ne_50m_admin_0_countries')

world <-readOGR("./ne_50m_admin_0_countries", 'ne_50m_admin_0_countries', verbose = FALSE)

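Scroll the data frame to the right to see the economic data, such as GDP estimates and stage of economic development. This tidy shapefile holds some economic data alongside the spatial data.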

head(world@data)

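If you are not familiar with spatial data frames, that is fine, because neither am I. The leaflet package makes building an interactive map very simple. The gdp_md_est column contains a GDP estimate for each country, so let's shade the countries according to where their GDP falls in the overall range: darker blue for higher GDP, lighter blue for lower GDP.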

library(leaflet)

# Create a palette with different shades of blue for different
# GDP estimates.

gdpPal <- colorQuantile("Blues", world$gdp_md_est, n = 20)
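To make the idea of a quantile palette concrete, here is a small base-R sketch of quantile binning. It is an illustration of the concept only, not leaflet's actual implementation; the GDP values, the number of bins and the two hex colors are invented for the example.

```r
# Made-up GDP-like values, purely illustrative.
gdp <- c(5, 12, 40, 90, 150, 300, 800, 2500)

# The post uses n = 20; a small number keeps the sketch readable.
n_bins <- 4

# Quantile break points: each bin holds roughly the same number of values.
breaks <- quantile(gdp, probs = seq(0, 1, length.out = n_bins + 1), na.rm = TRUE)

# Assign every value to its bin (1 = lowest quantile, n_bins = highest).
bin <- cut(gdp, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# One shade per bin, light to dark blue; these hex colors are arbitrary picks.
shades <- colorRampPalette(c("#deebf7", "#08306b"))(n_bins)

# Each value next to its bin and the color that bin maps to.
data.frame(gdp, bin, color = shades[bin])
```

Binning by quantile rather than by equal-width intervals keeps a handful of very large economies from pushing every other country into the lightest shade.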

We want a popup with the relevant information to appear when a country is clicked:

# Make a popup object.
# Notice we're referencing the column names with the '$', same as we would with a non-spatial dataframe.
economyPopup <- paste0("<strong>Country: </strong>", 
                world$name, 
                "<br><strong>Market Stage: </strong>", 
                 world$economy)

Now we can use leaflet to build a world map that is shaded by GDP and shows a popup on click. Note that the layerId = ~name snippet creates a layer for each country name.

# Build our leaflet map object.

leaf_world_economy <- leaflet(world) %>%
  addProviderTiles("CartoDB.Positron") %>% 
  setView(lng =  20, lat =  15, zoom = 2) %>%
  addPolygons(stroke = FALSE, smoothFactor = 0.2, fillOpacity = .7,
              color = ~gdpPal(gdp_md_est), layerId = ~name, popup = economyPopup)

# Display that object below.

leaf_world_economy

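The map looks good, but it still needs the financial data. Because the name column in the ETF data frame matches the name column in the map's data frame, we can combine them with sp::merge(), which works much like a join in dplyr.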

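This time our country names happen to match. The more robust approach is to join on country codes, and if you plan to make more maps in the future, it is worth collecting your data with standardized codes from the start.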
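A join on country codes might look like the sketch below, written with plain data frames and base merge() rather than the spatial object. The column name `iso_a3` is an assumption about the shapefile (check `names(world@data)` before relying on it), and the two toy tables are invented to show a name mismatch that a code-based join survives.

```r
# Stand-in for the map's attribute table. The iso_a3 column is an assumption
# about the shapefile; the rows are invented for the example.
map_data <- data.frame(
  name   = c("Korea", "Japan", "Kenya"),
  iso_a3 = c("KOR", "JPN", "KEN"),
  stringsAsFactors = FALSE
)

# ETF table keyed by the same codes, with one country spelled differently.
etf_data <- data.frame(
  ticker = c("HEWY", "EWJ"),
  name   = c("South Korea", "Japan"),
  iso_a3 = c("KOR", "JPN"),
  ytd    = c(13.13, 3.58),
  stringsAsFactors = FALSE
)

# Keep every map row (all.x = TRUE); countries without an ETF get NA.
joined <- merge(map_data, etf_data[, c("iso_a3", "ticker", "ytd")],
                by = "iso_a3", all.x = TRUE)

joined

# Countries that found no match in the ETF table.
joined$name[is.na(joined$ticker)]
```

A join on name would have left "Korea" unmatched here, while the code-based join pairs it with its ETF despite the different spelling.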

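After the merge, the ticker and ytd columns are attached to the map data by matching on the name column. Countries with no match get NA in both the ticker and ytd columns.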

library(sp)
# Once we run this line of code, our ticker symbols and ytd numbers will be added
# to the spatial dataframe.

world_etf <- merge(world, etf_ticker_country, by = "name")

Countries with no ETF data will now be shown in gray.

# Create a palette with different shades of blue for different
# year-to-date performances. Previously, we shaded by 'world$gdp_md_est', now
# we'll shade by 'world_etf$ytd'.

ytdPal <- colorQuantile("Blues", world_etf$ytd, n = 20)

The new shading is nice, but we still need a popup based on ytd for users who want to see the exact figure.

# Create a popup that displays the year-to-date performance.

ytdPopup <- paste0("<strong>Country: </strong>", 
                world_etf$name,
                "<br><strong> Year-to-date: </strong>", 
                world_etf$ytd, "%")

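Now we build another map in the same way, one whose clicks will eventually bring up price data. The important change is that layerId is set to the ticker instead of the country name. Why? Because we are ultimately building a Shiny app that calls getSymbols() each time the user clicks, and the ticker it needs is taken from the layerId of the country that was clicked. Here is the new map: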
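To show why the layerId matters, the sketch below simulates what happens on a click. In a Shiny app, leaflet reports a shape click as a list available at `input$<map id>_shape_click`, whose `id` element carries the layerId. Everything else here is an assumption for illustration: the helper `ticker_from_click` is hypothetical, the click is faked rather than coming from a running app, and how a country with an NA layerId is reported is not something this sketch verifies.

```r
# Simulated click event with the same shape as a leaflet shape click:
# id holds the layerId, lat and lng hold the click position.
fake_click <- list(id = "EWJ", lat = 36.2, lng = 138.3)

# Hypothetical helper: pull a usable ticker out of a click event.
ticker_from_click <- function(click) {
  # No click yet, or a country without an ETF (assumed to arrive as NA).
  if (is.null(click) || is.null(click$id) || is.na(click$id)) {
    return(NULL)
  }
  click$id
}

ticker_from_click(fake_click)  # the ticker that would be passed to getSymbols()
ticker_from_click(NULL)        # nothing clicked yet
```

With layerId = ~name the same event would carry a country name, and the app would need an extra lookup to find the ticker before it could download any prices.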

leaf_world_etf <- leaflet(world_etf) %>%
  addProviderTiles("CartoDB.Positron") %>% 
  setView(lng =  20, lat =  15, zoom = 2) %>%
  addPolygons(stroke = FALSE, smoothFactor = 0.2, fillOpacity = .7,
              # The next line of code is really important for creating the map we want to use later.
              color = ~ytdPal(ytd), layerId = ~ticker, popup = ytdPopup)

leaf_world_etf