Big Data


Every day, we create 2.5 quintillion bytes of data (one quintillion bytes = one billion gigabytes), so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. This data is big data.

The advent of technologies such as software and hardware that instantly analyze natural human language, together with the massive amounts and varieties of big data flowing from sensors, mobile devices, and the Web, is helping today's data pioneers find answers to questions like "How do consumers feel about my product?", "Why are patients being readmitted to our hospital?", and "What happens if we put a wind farm here instead of there?" These systems sift through the data and identify patterns and trends on the fly, then present them in a way that is easy for people to understand.
Trends can then be fed back into systems for further analysis, allowing new kinds of questions to be asked, such as "What will consumer reaction be if we introduce these kinds of products?", "How will patients in emerging markets benefit from healthcare transformation in North America?", or "Why does sustainability positively impact our business model over the next five years?"

Every day, an enormous amount of data is produced worldwide, and big data analytics has created a major opportunity for organizations. Companies capture trillions of bytes of information about their customers, suppliers, and operations. IT organizations are adopting analytics technologies to mine web-based data sources and extract value from the social networking boom. In the western world, organizations are asking what kind of business intelligence they could derive from all the information at their disposal. They are trying to leverage big data by making sense of the data they hold and by securing it. In the next three to five years, the gap will widen between companies that understand and exploit big data and companies that are aware of it but do not know what to do with it. Forward-thinking players in banking, insurance, manufacturing, retail, wholesale, healthcare, communications, transportation, construction, utilities, and education are already using big data successfully, extracting meaningful information from the data they have and using it to shape their strategic moves. Companies that can use big data successfully will be clearly ahead of those that react slowly to capitalize on it.
What is Big Data?
Big data consists of very large, distributed aggregations of loosely structured data that are often incomplete and inaccessible. More specifically, big data has the characteristics listed below.

  1. Works with petabytes or exabytes of data
  2. Involves millions or billions of people
  3. Accumulates billions or trillions of records
  4. Uses flat schemas with few complex interrelationships
  5. Most often involves time-stamped events
  6. Often works with incomplete data sets
  7. Includes connections between data sets that are probabilistically incidental

Big data is not only about cool technologies and Web 2.0 companies experimenting with massive data sets; rather, it is about defining new value streams based on leveraging information. (Floyer, 2011)
Components of Big Data Processing
Big-data projects have a number of different layers of abstraction, from abstraction of the data through to running analytics against the abstracted data. The following figure shows the basic elements of analytical big data and their interrelationships. The higher-level components help make big-data projects easier and more dynamic. Hadoop is often at the center of big-data projects, but it is not a precondition.

The components of analytical big data are given below:

  • Hadoop packaging and support organizations such as Cloudera, which include MapReduce, essentially the compute layer of big data.
  • A file system, such as the Hadoop Distributed File System (HDFS), that manages the storage and retrieval of data and metadata required for computation. Databases such as HBase can also be used.
  • A higher-level language such as Pig (part of Hadoop), which can be used instead of Java to simplify the writing of computations.
  • A data warehouse layer named Hive, built on top of Hadoop.
  • Cascading, a thin Java library that sits on top of Hadoop and allows suites of MapReduce jobs to be run and managed as a unit; it is widely used as a specialized tool.
  • CR-X, a semi-automated modeling tool that allows analytics to be developed interactively at great speed and can help set up the database that will run them.
  • Specialized scale-out analytic databases such as Greenplum or Netezza, which allow very fast loading and reloading of data for the analytic models.
  • ISV big-data analytical packages such as ClickFox and Merced, which run against the database to help address business issues.
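To make the MapReduce compute layer concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases in Python, using the word-count task commonly used to introduce Hadoop. It is illustrative only: the function names and structure are assumptions for this sketch, not Hadoop's actual API.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework
    does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data is big", "data is everywhere"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
print(counts["big"])   # 2
print(counts["data"])  # 2
```

In a real Hadoop job the shuffle is performed by the framework between distributed mappers and reducers; this sketch simply runs all three phases in one process to show the data flow.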
Hadoop is not used in transactional big-data projects because it is not real-time. For transactional systems that do not need a database with ACID guarantees, NoSQL databases can be used, though they come with constraints such as weak consistency guarantees or transactions restricted to a single data item. For big-data transactional systems that do need ACID guarantees, the choices are limited. Traditional scale-up SQL databases are generally too costly for very large-scale deployment and do not scale out well; most social media companies have had to hand-craft solutions. Recently a new breed of scale-out SQL databases has emerged with architectures that, like Hadoop, move the processing next to the data; Clustrix is one example. These allow much greater scale-out capability.
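The weak-consistency trade-off mentioned above can be illustrated with a toy sketch. This is a hypothetical model, not the behaviour of any particular NoSQL product: each replica accepts writes independently, so a read from a replica that has not yet synchronized can return stale data, and the replicas only converge after a sync.

```python
class Replica:
    """A toy key-value replica with no consistency guarantees."""
    def __init__(self):
        self.store = {}

    def write(self, key, value):
        self.store[key] = value

    def read(self, key):
        # May return stale data: this replica knows nothing about
        # writes accepted by other replicas.
        return self.store.get(key)

def anti_entropy_sync(source, target):
    """Eventually-consistent sync: copy source's state onto target."""
    target.store.update(source.store)

# A client writes to replica A; replica B has not seen the write yet.
a, b = Replica(), Replica()
a.write("balance", 100)
print(b.read("balance"))   # None -- stale read before sync
anti_entropy_sync(a, b)
print(b.read("balance"))   # 100 -- consistent only after sync
```

An ACID database would never expose the intermediate state where the two replicas disagree; giving up that guarantee is what lets NoSQL systems scale writes across many nodes.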



© 2013 All Rights Reserved.