Part 2: Data storage advice for beginners

Do you need to store your data? Is a 'data lake' the right storage system for you?

Alicornio Africa, specialists in data warehousing and business intelligence, offers some insight into the leading integration systems used by companies to store masses of information.

In Part 1, Alicornio highlighted the Data Warehouse and Extended Data Warehouse management systems. But data storage is not limited to these two approaches.

Below we explore the benefits and problems of data lakes, specifically within the Extended Data Warehouse environment, and of Hadoop, the most common set of technologies used to deploy data lakes.

Data Lakes

According to Alicornio, a data lake is a storage repository that holds enormous volumes of data in its native form until it is needed.

Data is usually stored in raw format as "structured, unstructured or semi-structured" information.

In this system, data is organised as it comes out of storage rather than as it goes in. This concept is known as “Schema on Read”.

"Data is stored just in case it is valuable. Structure, model and meaning is associated with the data at the time of usage."
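To make "Schema on Read" concrete, here is a minimal Python sketch. It assumes a tiny in-memory "lake" (the hypothetical `RAW_LAKE` dictionary) in which files are kept exactly as they arrived, in mixed formats and with missing fields. A hypothetical `Order` schema is applied only when the data is read, so each consumer can impose its own structure. This is an illustration of the concept, not Alicornio's or Hadoop's implementation.

```python
import csv
import io
import json
from dataclasses import dataclass

# Hypothetical raw "lake": files stored as-is, with no schema enforced on write.
# Paths and contents are illustrative only.
RAW_LAKE = {
    "orders/2023-01-05.jsonl": '{"id": 1, "amount": "19.99", "country": "ZA"}\n'
                               '{"id": 2, "amount": "5.00"}\n',
    "orders/2023-01-06.csv": "id,amount,country\n3,42.50,KE\n",
}


@dataclass
class Order:
    """Schema chosen by the reader at query time, not by whoever wrote the data."""
    id: int
    amount: float
    country: str


def read_orders(lake):
    """Apply the Order schema while reading (schema on read)."""
    for path, blob in lake.items():
        if path.endswith(".jsonl"):
            rows = (json.loads(line) for line in blob.splitlines() if line.strip())
        elif path.endswith(".csv"):
            rows = csv.DictReader(io.StringIO(blob))
        else:
            continue  # formats this reader does not understand stay untouched in the lake
        for row in rows:
            # Types are converted and gaps filled only now, at read time.
            yield Order(
                id=int(row["id"]),
                amount=float(row["amount"]),
                country=row.get("country") or "UNKNOWN",
            )


for order in read_orders(RAW_LAKE):
    print(order)
```

The write path never rejects the second JSON record for lacking a `country`; the reader decides how to handle it. That flexibility is the appeal of schema on read. It is also why the meaning of the data must be worked out at usage time.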

Data Lake Challenges

"It could very easily become a dump of data where no-one knows what the raw data represents, how old it is and what the quality of the data is." Organisations need to monitor the use of data lakes to avoid the creation of a dumping ground.
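One way to monitor a lake is to record, for every raw file, the three things the quote says tend to get lost: what the data represents, how old it is and how good it is. The Python sketch below shows one possible approach, not one prescribed by Alicornio. The `CatalogEntry` record, the `flag_entries` check and the thresholds (one year, 95% of rows passing checks) are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class CatalogEntry:
    path: str                 # where the raw data sits in the lake
    description: str          # what the raw data represents
    ingested_at: datetime     # how old it is
    row_count: int
    rows_failing_checks: int  # crude quality signal from whatever checks are run

    @property
    def quality(self) -> float:
        """Share of rows passing checks (1.0 for an empty file)."""
        if self.row_count == 0:
            return 1.0
        return 1 - self.rows_failing_checks / self.row_count


def flag_entries(entries, now, max_age=timedelta(days=365), min_quality=0.95):
    """Flag entries that are undocumented, too old, or below a quality bar."""
    flagged = []
    for e in entries:
        if not e.description.strip():
            flagged.append((e.path, "no description"))
        elif now - e.ingested_at > max_age:
            flagged.append((e.path, "older than max_age"))
        elif e.quality < min_quality:
            flagged.append((e.path, f"quality {e.quality:.2f}"))
    return flagged


# Hypothetical catalog contents, for illustration only.
utc = timezone.utc
now = datetime(2024, 1, 1, tzinfo=utc)
catalog = [
    CatalogEntry("sales/2023-12.csv", "Monthly POS sales export",
                 datetime(2023, 12, 31, tzinfo=utc), 1000, 10),
    CatalogEntry("misc/dump_0042.bin", "",
                 datetime(2023, 6, 1, tzinfo=utc), 500, 0),
    CatalogEntry("crm/2021-export.json", "CRM contacts snapshot",
                 datetime(2021, 3, 1, tzinfo=utc), 2000, 0),
    CatalogEntry("web/clicks.jsonl", "Website click events",
                 datetime(2023, 11, 1, tzinfo=utc), 100, 20),
]

for path, reason in flag_entries(catalog, now):
    print(f"{path}: {reason}")
```

Here the sales export passes, while the undescribed dump, the stale CRM snapshot and the low-quality click log are flagged for review before they quietly turn the lake into a dumping ground.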

Another drawback for companies using data lakes is the demand for sophisticated tools and skilled data scientists.

To query information in the data lake, organisations will need people who "understand data, the business meaning of the data and all the new associated technologies."

Hadoop

"Hadoop was created to address the challenge of indexing the entire World Wide Web every day."

Most data lake implementations use the Hadoop ecosystem in one way or another, explains Alicornio.

"It provides massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs."

Hadoop Challenges

Hadoop is a complex technical environment made up of a range of open-source projects that come and go.

"It is difficult to decide which Hadoop technologies will solve which problems as there are so many solutions available to address various problems. There is no one solution available to solve all problems."

Hadoop, like other data systems, brings with it a mix of benefits and drawbacks.

Still not sure which system will suit your business?

Join the Concepts, Design and Modelling for Extended Data Warehousing course offered by Alicornio Africa to find out more.
