To organize the different types of data into a form that can be analyzed, it is necessary to create a high-level structure within the data lake. As data enters the lake, it first enters the raw data pond. The purpose of the raw data pond is to serve as a holding cell: little or no analysis or other organized activity is performed on the data while it is there. Once it is time for analysis, the data in the raw data pond is sent to one of three different ponds based on the kind of data involved. Analog, application, and textual data each require their own data pond. Separating the three types of data is important, and once the data is inside its pond, considerable processing takes place. It’s note...
- Chapter 4 Data Ponds
- from Data Lake Architecture: Designing the Data Lake and Avoiding the Garbage Dump
- Publisher: Technics Publications
- Released: April 2016
High level steps
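The flow described in the excerpt (hold data in a raw pond, then route each item to the analog, application, or textual pond when analysis begins) can be sketched roughly as follows. This is only an illustration. The names `DataKind`, `Record`, `DataLake`, `ingest`, and `release_for_analysis` are hypothetical, and the assumption that every record already carries a tag identifying its kind is not stated in the book.

```python
from dataclasses import dataclass, field
from enum import Enum


class DataKind(Enum):
    """The three kinds of data named in the excerpt."""
    ANALOG = "analog"
    APPLICATION = "application"
    TEXTUAL = "textual"


@dataclass
class Record:
    # Assumption: each record arrives already tagged with its kind.
    kind: DataKind
    payload: str


@dataclass
class DataLake:
    # The raw data pond is a holding area; nothing is analyzed here.
    raw_pond: list = field(default_factory=list)
    # One dedicated pond per kind of data.
    ponds: dict = field(default_factory=lambda: {kind: [] for kind in DataKind})

    def ingest(self, record: Record) -> None:
        """Place incoming data in the raw data pond without processing it."""
        self.raw_pond.append(record)

    def release_for_analysis(self) -> None:
        """Move every held record into the pond that matches its kind."""
        while self.raw_pond:
            record = self.raw_pond.pop(0)
            self.ponds[record.kind].append(record)
```

Calling `ingest` for each incoming record and then `release_for_analysis` once leaves the raw pond empty and each record in the pond for its kind. The processing that the excerpt says happens inside each pond is deliberately left out of this sketch.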