According to the PBS documentary “The Human Face of Big Data,” we now create as much data every two days as the world generated from its inception through 2003.
Naturally, organizations are struggling to manage the massive amounts of data constantly being created by new sources, along with data from traditional sources that is often dumped into large, hard-to-manage repositories fragmented across multiple networks. Security half-measures split the data further and make it inaccessible. Big data is a big mess, and the Geospatial Intelligence (GEOINT) Community must clean it up in order to meet its objectives.
Of course, there are plenty of technologies out there that tout the ability to make GEOINT data analysis efficient, simple and cost-effective. They claim to let GEOINT professionals collaborate in new and innovative ways using their data, providing never-before-seen insights. These claims all go out the window when confronted with the current state of the data we have to work with.
Much of GEOINT data is hard to access, oftentimes tucked away into numerous physical storage environments on multiple networks, with no regard for how many redundant copies have been made and continue to be created. This is further complicated by the fact that the full metadata sometimes required to solve analytic issues is not always available.
So how can the GEOINT Community start to clean up this big data mess?
1. Purchase the data
The National Geospatial-Intelligence Agency (NGA) has proposed the Commercial Initiative to Buy an Operationally Responsive GEOINT (CIBORG) vehicle for acquiring geospatial data. Perhaps CIBORG could provide transparency regarding the terms under which NGA and National System for Geospatial-Intelligence (NSG) partners can acquire valuable spatiotemporal data. This dynamic acquisition vehicle could potentially provide data transparency among the U.S. national security community, international partners, humanitarian partners, the U.S. government and private citizens to begin tackling the big data mess.
2. Crowdsource the data
The desire for increased visibility into government data and the rising popularity of citizen science indicate a natural shift toward collaboration. It’s time to start thinking about how organizations like NGA can better leverage crowdsourcing tools to collect and create data. OpenStreetMap and similar initiatives have proven the value of user communities around the world for creating data sets in areas that have previously been underserved, too dangerous to study, or not a priority. What policy changes will need to be made in order to ensure crowdsourced data sets, like those of OpenStreetMap, are regarded as legitimate? Will NGA be willing to let citizen scientists verify and edit them as needed? With a growing number of autonomous data sensors and an increasingly capable community of citizen scientists, it will be essential to legitimize and leverage crowdsourced data sets as much as possible moving forward.
3. Migrate legacy data
Suppose NGA’s CIBORG initiative comes to fruition and the agency opens its unclassified data sets to citizen scientists. We would then have the massive task of wrangling the legacy environments of said data across countless networks, file systems, databases and APIs. Many efforts have been made in the past two decades to patch these legacy environments together so that data can be accessed seamlessly, all with marginal success at best. However, the GEOINT Community has found hope in the Intelligence Community Information Technology Enterprise (IC ITE) cloud. It offers the possibility, but not yet the reality, of migrating data into environments that will allow the GEOINT Community to get the most out of the modern technologies and strategies available to it. IC ITE could be the key to cleaning up the big data mess.
4. Address complex data challenges
A successful transition to the IC ITE cloud involves a four-step process. It begins with taking a mission needs inventory by creating specific user stories that define the most common activities supporting GEOINT mission threads. Next, an inventory will need to be taken of government, commercial and public GEOINT data sources. The data will then need to be decoupled from analytics by storing GEOINT content in IC ITE cloud-based open storage systems (e.g., Hadoop, HBase, Accumulo, Elasticsearch) that can be accessed in multiple ways, such as through ArcMap, QGIS, full-text search or Google Earth. Lastly, we’ll need to simplify data discovery by communicating to analysts, software engineers, data scientists and leaders how to access data for each GEOINT mission thread.
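To make the four steps a little more concrete, here is a minimal sketch of what a mission-thread catalog might look like. It is an illustration only: the class names, the "disaster-response" thread and the two example sources are assumptions made up for this sketch, not part of IC ITE or any NGA system. The idea is simply that each data source is inventoried once, records where it is stored and how it can be reached, and is then discoverable by mission thread.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DataSource:
    """One inventoried data source (all values used below are hypothetical)."""
    name: str
    origin: str               # "government", "commercial" or "public"
    store: str                # backing storage system, e.g. "Accumulo"
    access: Tuple[str, ...]   # tools or interfaces that can read the content


@dataclass
class MissionCatalog:
    """Maps each mission thread to the data sources that support it."""
    threads: Dict[str, List[DataSource]] = field(default_factory=dict)

    def register(self, thread: str, source: DataSource) -> None:
        # Steps 1 and 2: tie an inventoried source to a mission thread.
        self.threads.setdefault(thread, []).append(source)

    def discover(self, thread: str, access: Optional[str] = None) -> List[str]:
        # Step 4: list source names for a thread, optionally filtered
        # by the access method an analyst wants to use.
        sources = self.threads.get(thread, [])
        return [s.name for s in sources if access is None or access in s.access]


catalog = MissionCatalog()
catalog.register(
    "disaster-response",
    DataSource("osm-roads", "public", "Elasticsearch", ("QGIS", "full-text search")),
)
catalog.register(
    "disaster-response",
    DataSource("commercial-imagery", "commercial", "HBase", ("ArcMap", "Google Earth")),
)

print(catalog.discover("disaster-response"))
print(catalog.discover("disaster-response", access="QGIS"))
```

Because the catalog stores only the storage system and the access methods, the analytic tools stay decoupled from the data, which is the point of the third step.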
Once this occurs, the U.S. government and GEOINT Community will then have the ability to introduce big data and machine learning tools into their data sources. These powerful technologies will let data be stored at the level of its classification with cross-domain access for users on every network.
Micro-services and Open Geospatial Consortium (OGC) web services will be deployed on the elastic cloud, providing unprecedented flexibility and scalability. In turn, the data will be easily accessible for cataloging and a wide range of indexing schemes, greatly improving discovery and access. This will also undoubtedly open the floodgates for conversations regarding new ways individuals, teams and communities can collaboratively interact with each other among the data. At this point, the big data mess could be nearing its end. But what will be in store for the future?
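For readers unfamiliar with OGC web services, the sketch below builds a Web Map Service (WMS) 1.3.0 GetMap request using only the Python standard library. The endpoint URL and layer name are hypothetical placeholders, and the bounding box is an arbitrary example. Only the query parameter names come from the WMS standard.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint and layer name, used only for illustration.
BASE_URL = "https://example.invalid/wms"
LAYER = "demo:roads"


def build_getmap_url(base_url, layer, bbox, width=512, height=512):
    """Build a WMS 1.3.0 GetMap URL.

    bbox is (min_lon, min_lat, max_lon, max_lat); CRS:84 uses
    longitude/latitude axis order.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",          # empty value requests the default style
        "CRS": "CRS:84",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return base_url + "?" + urlencode(params)


url = build_getmap_url(BASE_URL, LAYER, (-77.2, 38.8, -76.9, 39.0))

# Parse the URL back to confirm the parameters survived encoding.
query = parse_qs(urlsplit(url).query, keep_blank_values=True)
print(url)
```

Because the request is just a URL with standard parameters, any client that speaks WMS can ask the same service for the same map, which is what makes such services a good fit for shared cloud deployments.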
A Bright Future
Although a daunting task, getting a handle on the big data mess gives us much to look forward to. By moving relevant data into an elastic cloud with basic standards for data structure and access, we will radically transform the face of GEOINT data. The GEOINT Community will at last be able to fully leverage the global wealth of data available to them to provide the U.S. with a time-dominant decision advantage when it comes to international affairs. Machine learning tools will sort through the data, identifying what’s known and queuing up the unknown for analysts to learn more. Invaluable geospatial data will be curated in collaborative efforts by the modern analytic workforce. The volume of data being dispatched will be massive and continue to grow, but the fidelity of the data will be unparalleled, and its update cycle will be much more rapid than it is today. When we solve the big data mess, the GEOINT Community will have the power to turn intelligent insights into meaningful actions like never before.
Anthony Calamito serves as the Chief Evangelist & VP of Product for Boundless and is responsible for product strategy and outreach, educational initiatives, and our ongoing commitment to the open source community. Anthony is also a Steering Committee member at LocationTech and a Fellow of the American Geographical Society. His commitment to geography education and community outreach extends beyond Boundless; he is also an adjunct instructor at George Mason University.
Chris Tucker is the creator of Mapstory.org, a companion to Wikipedia, and Chairman of the Board of Trustees of the Mapstory Foundation. In addition to his work at Mapstory, Chris also manages Yale House Ventures, a portfolio of technology companies, social ventures and public entrepreneurship initiatives, and can also be found at a number of think tanks he supports including the Center for National Policy and the Institute for State Effectiveness.