Data graveyards are evidence of wasted time, money and effort.
A plethora of terms and technologies is currently in the mix. For the uninitiated and professionals alike, it has become a minefield, open to misinterpretation and misrepresentation.
Confused about EDW, DW, Data Lakes, Hadoop, HTAP, Hive, LLAP, Spark SQL… (you can complete the list!)? Are you wondering: to ETL or not, to ELT or not, or to do no ETL/ELT at all?
Choosing a solution well suited to your organisation should be driven by the motto “Pragmatism and Flexibility”.
Very seldom do our Data Lakes get filled in a well-planned, organized and consistent manner. Gathering, storing and analyzing all sorts of data (for example: raw customer data, customer behavior data, location data, website clickstreams, images, social media trails, video and audio files, plus regular structured data) requires intensive planning and must be implemented based on de facto standards. Otherwise Lakes of Data quickly become Data Swamps and eventually Data Graveyards.
How do you avoid this data-death trap?
Instead of haphazardly collecting data, it is more sensible to deliver information for decision-making in an orchestrated stream of bite-size chunks, while demonstrating progress, spreading cost and gaining business users' trust. The current craze for collecting all data in a Data Lake should be approached with caution.
Recognize risk-increasing factors such as appointing inexperienced staff and/or forsaking proper training, not engaging with business people to collect requirements, and designing technology-driven solutions, to name a few.
Curb risks by demonstrating early success, albeit in small steps. Increase the success rate by closely following a Business Intelligence Strategy and addressing each aspect of the process for building a solution (which might include a Data Lake and Extended Data Warehouse). Identify caveats up front regarding metadata/catalogues, collecting and consolidating, cleaning and structuring data, right through to data analytics, visualization and all forms of information deployment.
Be guided by a blueprint on how it should be done. Continually strive for optimal information utilization while remaining pragmatic and flexible. Avoid Data Lakes becoming Swamps and Graveyards. Learn how to apply sound principles no matter what technology is used.
Attend the ever-popular, updated course Concepts, Design and Modelling for Extended Data Warehousing, a technology-independent course offered by Alicornio Africa.