What is Big Data?

What actually is Big Data?

Where did Big Data come from?

How big is Big Data?

Why is everyone so concerned about Big Data?

When should we use Big Data?

If you are reading this article, you are probably one of the many people caught in the storm called Big Data. The questions above have been haunting many of us budding software engineers for the last couple of years, in every tech discussion on emerging technologies. I was on the same page too, until I decided to find out what it actually is. Now it is your turn to do the honors. By the time you reach the end of this article, you should be able to answer all of the above questions.

In general terms, Big Data is data that has grown beyond the capability of traditional databases to store and process. It refers to data sets so large that existing systems cannot handle them. It can be of any type; there are no schema (data type) limitations on Big Data. It can be structured, semi-structured, or unstructured. In simple terms, we save whatever data is being generated.

Technical definition: Big Data is schema-less data that is extreme in size and changing at a rapid speed. Industry experts characterize it by the 3 V's; for data to be called Big Data, it should satisfy the 3 V's described below.

Volume: As the name suggests, this refers to the size of the data. Many factors contribute to data volume. For years, only transactional data was stored because hardware costs were unaffordable. Now that storage has become very economical, we can also store unstructured data from social media, sensors, text messages, and so on.

E.g.

  • A study reveals that Facebook stores a massive 100 petabytes of data in a single Hadoop cluster.

Velocity: This refers to the speed at which data is being generated. The pace at which data moves has to be addressed, and processing information of such volume in real time is a definite challenge.

E.g. 

  • Roughly 90% of the data in the world today has been created in the last two years alone.
  • Every day, about 2.5 quintillion bytes of data are created.

Variety: This means the data can be of any type. Loading such data into relational databases is expensive because it has to be cleaned before processing, so there are systems that accept schema-less data and process it as it is.

E.g.

Structured data: Data that can be stored in relational databases like Oracle, SQL Server, MySQL, etc. It is stored in tables that are related to each other.

Semi-structured data: Data stored in flat files such as Excel, text files, PDFs, etc. XML and JSON are also types of semi-structured data.

Unstructured data: Data with no predefined structure at all. It can be in any form, such as JPG or PNG images, video files, SMS, emails, data generated by sensors, etc.
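To make the distinction concrete, here is a minimal sketch in Python showing the same (made-up) customer record in structured, semi-structured, and unstructured form; the field names and values are purely illustrative.

```python
import json
import sqlite3

# Structured: the record lives in a relational table with a fixed schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")
db.execute("INSERT INTO customers VALUES (?, ?, ?)", (1, "Asha", "Hyderabad"))
print(db.execute("SELECT * FROM customers").fetchall())

# Semi-structured: the same record as JSON; each record may carry different fields.
record = json.loads('{"id": 1, "name": "Asha", "city": "Hyderabad", "tags": ["prime"]}')
print(record["name"], record.get("tags"))

# Unstructured: a free-text note; there is no schema to query directly.
note = "Asha from Hyderabad called about her order; she sounded happy."
print("order" in note.lower())
```

The structured row can only hold the columns its table defines, the JSON record can vary from one record to the next, and the free-text note has to be searched or analyzed rather than queried by column.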


What does saving such huge amounts of data give us? Isn't it a waste of space?

No, definitely not. With data comes greater value to organizations. The data can be used to analyze historical information and predict future outcomes based on samples and customer patterns, which is what is called Business Intelligence or Big Data analytics. That is why many organizations are investing a substantial part of their budgets in Big Data.

How much data can be considered Big Data?

Although there is no standard threshold for what counts as Big Data, it is usually spoken of in the range of petabytes and exabytes.
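To put those units in perspective, here is a small back-of-the-envelope calculation in Python, reusing the figures quoted earlier in this article (2.5 quintillion bytes per day and the 100-petabyte Facebook cluster):

```python
# Decimal (SI) units: 1 petabyte = 10**15 bytes, 1 exabyte = 10**18 bytes.
PETABYTE = 10 ** 15
EXABYTE = 10 ** 18

daily_bytes = 2.5 * 10 ** 18          # ~2.5 quintillion bytes created per day
print(daily_bytes / EXABYTE)           # -> 2.5 exabytes per day
print(daily_bytes / PETABYTE)          # -> 2500.0 petabytes per day

facebook_cluster = 100 * PETABYTE      # the 100-petabyte cluster mentioned above
print(facebook_cluster / 10 ** 9)      # -> about 100 million gigabytes
```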

Tools to process Big Data:

Big Data is often mistaken for a technology, which it is not. It is rather a catchy phrase, or umbrella term, covering the technologies that can be used to harness such data. Below are a few technologies capable of handling Big Data.

  • Hadoop is the technology most often mentioned alongside the term Big Data, and it is very well suited to processing it (a minimal word-count sketch follows this list).
  • SAP HANA is one of SAP's flagship products for taking on Big Data.
  • IBM InfoSphere can process Big Data in real time.
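As a taste of how such tools process data, here is a toy MapReduce-style word count in Python. It runs locally and only illustrates the map, shuffle, and reduce idea; on a real Hadoop cluster these steps would be distributed across many machines (for example via Hadoop Streaming).

```python
# A toy MapReduce-style word count, the "hello world" of Hadoop-style processing.
from collections import defaultdict

lines = [
    "big data is big",
    "data grows fast",
]

# Map phase: emit a (word, 1) pair for every word in every line.
mapped = [(word.lower(), 1) for line in lines for word in line.split()]

# Shuffle phase: group the pairs by word.
grouped = defaultdict(list)
for word, count in mapped:
    grouped[word].append(count)

# Reduce phase: sum the counts for each word.
counts = {word: sum(values) for word, values in grouped.items()}
print(counts)  # {'big': 2, 'data': 2, 'is': 1, 'grows': 1, 'fast': 1}
```

Because each phase works on independent pieces of data, the same logic can be spread over thousands of machines, which is what lets these tools handle petabyte-scale data sets.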

Why is Big Data Important to us? Why should we learn to handle Big Data?

People with data analysis skills are called data scientists, and data science has been called the sexiest job of the 21st century. There is huge demand for professionals with expertise in data science and analytics, and a large gap between that demand and the available supply.

As per research conducted by the McKinsey Global Institute, the business and economics research arm of McKinsey & Co., the United States alone could face a shortage of about 190,000 data analysts and about 1.5 million managers and executives who can use Big Data to extract information and make decisions.

Applications of Big Data: There is a wide variety of industries where Big Data can be applied to improve decision making and grow the business.

  1. Health services: Improving care by analyzing the available data.
  2. Banking: Detecting fraud by recognizing fraud patterns in historical information.
  3. Sports: Improving a player's ability by identifying weak points.
  4. Financial trading: Automated algorithms that trade more accurately.
  5. Target campaigning: Targeting customers with online ads.

Practical example of target campaigning: This is one area we can check for ourselves, sitting at our desks with an internet connection.

Step 1: Open your browser and browse any online shopping website, such as Flipkart.com or Myntra.com.

Step 2: Open your Facebook account in a different tab. You will find advertising panels on the right-hand side of your home page.

If those panels show products related to what you just browsed, you are being targeted by ad agencies. This is achieved by leveraging the power of Big Data analytics; Facebook uses such methods to improve its revenue. This is called target campaigning.
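To illustrate the idea only (this is not Facebook's actual system), here is a toy Python sketch, with a made-up browsing log and ad catalogue, of how an ad might be picked from the categories a user recently viewed:

```python
from collections import Counter

# Hypothetical browsing events collected from a shopping site (illustrative only).
browsing_log = [
    {"user": "u42", "category": "running shoes"},
    {"user": "u42", "category": "running shoes"},
    {"user": "u42", "category": "headphones"},
]

# Hypothetical ad inventory keyed by product category.
ad_inventory = {
    "running shoes": "20% off sports shoes this week!",
    "headphones": "Noise-cancelling headphones on sale.",
}

def pick_ad(user, log, inventory):
    """Show the ad for the category the user viewed most often."""
    categories = Counter(event["category"] for event in log if event["user"] == user)
    for category, _ in categories.most_common():
        if category in inventory:
            return inventory[category]
    return None  # a real system would fall back to a generic ad

print(pick_ad("u42", browsing_log, ad_inventory))  # -> the sports shoes ad
```

Real ad platforms use far richer signals and models, but the core idea is the same: recent behavior is analyzed at scale to decide which ad each user sees.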

Happy Reading :-)
