By: Tim DelChiaro
Abstract: This article by Pinal Dave takes a different approach to understanding the nuances of Big Data and the process the industry is trying to talk about.
Guest article by Pinal Dave.
If you work in a data-related domain, the talk of the town these days is Big Data. Most top CIOs are trying to get on the Big Data bandwagon and want to implement it within their organizations. Yet organizations have held loads of data on their premises for ages, and they have been doing all kinds of reporting, prediction and analysis on top of it. So what makes the concept of Big Data unique? As a matter of fact, there is nothing unique about it; it is just the process that follows that makes it unique.
In this article, we will take a different approach to understanding the nuances of Big Data. What is this process the industry is trying to talk about, and how should we approach it systematically?
There are some high-level terms we see when building a robust architecture. These are conceptual ideas that we thought were worth a mention.
This is the foundational building block and is built on top of a domain. These models are agnostic of the technology used.
This step defines the rules and patterns for the domain. These form the logical structure and pave the way for a specific implementation.
This is the physical deployment of a defined architecture. It is suitable as a solution, and multiple implementations can be done using different technologies.
This looks into a specific requirement and expands on the implementation. It honors the rules of the architecture.
This is the specific implementation of the solution defined with technology development and deployment plans.
The industry always talks about Big Data through the lens of the 3 V’s (Volume, Velocity and Variety). Our fundamental understanding is that if the data overruns the current processing capacity of relational databases, then we are most likely looking at a Big Data scenario. There are a number of implementations of Big Data, and it need not always be about being big; the “V” of variety is what makes implementing a Big Data solution really messy.
When we say it is messy, we mean it is about analyzing or dissecting large amounts of loosely structured data and performing distributed aggregations over it. In the enterprises of this era, we are sometimes talking about multiple petabytes, billions of transactions per day or week, incomplete textual data (social media, for example) and much more.
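To make the idea of aggregating loosely structured data a little more concrete, here is a minimal single-machine sketch in Python. It is not taken from the article: the sample records, the partition layout and the function names (`map_partition`, `merge_counts`, `aggregate`) are all hypothetical, and a real deployment would run the per-partition step on separate nodes rather than in a loop.

```python
from collections import Counter
from functools import reduce

# Hypothetical, loosely structured records: fields may be missing or empty.
# Each inner list stands in for the data held by one node (assumption).
PARTITIONS = [
    [{"user": "a", "text": "big data is messy"}, {"user": "b"}],
    [{"text": "data data everywhere"}, {"user": "c", "text": None}],
]


def map_partition(records):
    """Count words in one partition, skipping records without usable text."""
    counts = Counter()
    for record in records:
        text = record.get("text")
        if isinstance(text, str):
            counts.update(text.lower().split())
    return counts


def merge_counts(left, right):
    """Combine two partial results; Counter addition sums counts per word."""
    return left + right


def aggregate(partitions):
    """Aggregate every partition independently, then merge the partials."""
    partials = [map_partition(p) for p in partitions]
    return reduce(merge_counts, partials, Counter())


word_counts = aggregate(PARTITIONS)
print(word_counts.most_common(3))
```

The point of the split is that each partition can be summarised without seeing the others, so only the small partial counts need to travel between machines.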
Let us outline some of the terms used in this context:
We would like to put into perspective how data flows into any enterprise system:
This is a simple process, and there are multiple tools that help us with it. Let us take a simple example of some tools available in the Microsoft ecosystem to illustrate the same.
With this growing trend, it is important to know how the Big Data solution will be implemented. Today, we can implement it on-premises, cloud-only or as a hybrid. One of the questions we ask is: “Is the data born in the cloud, and does it always stay in the cloud?”
As you start your journey with Big Data, know your environment and build the solution in a phased manner. If we don’t know where we need to go, no map will help us. So know which parameters you want to analyse. Clean the data, scrub it for invalid outliers and then analyse the majority (80%) of the content. A Big Data implementation is a journey, and we need to take one step at a time. Hope this blog brought out some of the facets.
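The clean, scrub and analyse sequence mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not name an outlier method, so the common 1.5 × interquartile-range rule is swapped in here, and the sample readings and function names (`clean`, `drop_outliers`) are hypothetical.

```python
import statistics

# Hypothetical raw readings with missing and malformed entries (assumption).
RAW = [10, 12, None, 11, "n/a", 13, 12, 11, 500]


def clean(values):
    """Keep only numeric readings; drop missing or malformed entries."""
    return [
        float(v)
        for v in values
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]


def drop_outliers(values, k=1.5):
    """Drop values outside k * IQR of the quartiles (an assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if low <= v <= high]


cleaned = clean(RAW)              # step 1: remove invalid entries
typical = drop_outliers(cleaned)  # step 2: scrub outliers
summary = {                       # step 3: analyse what remains
    "kept": len(typical),
    "dropped": len(RAW) - len(typical),
    "mean": statistics.mean(typical),
}
print(summary)
```

Keeping the three steps as separate functions mirrors the phased approach: each step can be checked on its own before the next one is added.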
About Pinal Dave - Technology Evangelist & Founder of SQL Authority
Pinal Dave works as a Technology Evangelist (Database and BI) with Microsoft India. He has written over 2000 articles on the subject on his blog at https://blog.sqlauthority.com. During his career he has worked both in India and the US, mostly working with SQL Server Technology – right from version 6.5 to its latest form. Pinal has worked on many performance tuning and optimization projects for high transactional systems. He has been a regular speaker at many international events like TechEd, SQL PASS, MSDN, TechNet and countless user groups.