
Big Data Visualization
Description
Key Features
This unique guide teaches you how to visualize your cluttered, huge amounts of big data with ease
It is rich with ample options and solid use cases for big data visualization, and is a must-have book for your shelf
Improve your decision-making by visualizing your big data the right way
Book Description
Gain valuable insight into big data analytics with this book. Covering the tools you need to analyse data, together with IBM certified expert James Miller's insight, this book is the key to data visualization success.
- Learn the tools & techniques to process big data for efficient data visualization
- Packed with insightful real-world use cases
- Addresses the difficulties faced by professionals in the field of big data analytics
What you will learn
Get to grips with the basics of big data visualization before moving on to data storage, adding context to data using R, and addressing data quality issues. Learn how to use D3 and dashboards to display and present results, and how to use Python to deal with anomalies.
Who this book is for
- Data analysis beginners & data analysts who want to use visualization for more powerful analysis
- Knowledge of big data platform tools such as Hadoop & programming languages such as R is required

Person
His experience includes IBM Planning Analytics, BI, Web architecture & design, systems analysis, GUI design & testing, Data modeling, design, and development of OLAP, Client/Server, Web & Mainframe applications and systems utilizing: Planning Analytics Workspace (PAW), IBM Watson Analytics, Cognos BI & TM1, Framework Manager, dynaSight/ArcPlan, ASP, DHTML, XML, MS Visual Basic, VBA, PERL, R, SPLUNK, MS SQL Server, ORACLE, etc.
He has authored numerous books, including Implementing Splunk - Second Edition; Mastering Splunk; Hands-On Machine Learning with IBM Watson; IBM Watson Projects; Statistics for Data Science; Mastering Predictive Analytics with R - Second Edition and others.
His project areas include Data Analytics, Planning Analytics, and FOPM projects, in which he has held roles ranging from architect and developer to technical and project leader.
Content
Introduction to Big Data Visualization
Access, Speed and Storage with Hadoop
Context - Understanding your data
Addressing Data Quality
Displaying Results with D3
Dashboard for Big Data - Tableau
Dealing with Outliers
Big Data Operational Intelligence with Splunk
Challenges of big data visualization
We're assuming that you have some background with the topic of data visualization and therefore the earlier deliberations were just enough to refresh your memory and sharpen your appetite for the real purpose of this book.
Big data
Let's take a pause here to define big data.
Phrases such as "a large assemblage of data and datasets that are so large or complex that traditional data processing applications are inadequate" and "data about every aspect of our lives" have all been used to define or refer to big data.
In 2001, then Gartner analyst Doug Laney introduced the 3Vs concept (see http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf). The 3Vs, according to Doug Laney, are volume, variety, and velocity. They make up the dimensionality of big data: volume (the measurable amount of data), variety (the number of types of data), and velocity (the speed of processing or dealing with that data).
With this concept in mind, all aspects of big data become increasingly challenging, and as these dimensions grow they also encumber the ability to visualize the data effectively.
Using Excel to gauge your data
Look at the following figure and remember that Excel is not a tool to determine whether your data qualifies as big data:
Data that is too big for Microsoft Excel does not necessarily qualify as big data. In fact, gigabytes of data are still manageable with various techniques and with enterprise and even open source tools, especially given the lower cost of storage today. It is important to realistically size the data that you will be using in an analytic or visualization project before selecting an approach or technology (keeping in mind expected data growth rates).
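As a rough illustration of sizing data with growth in mind, here is a minimal Python sketch. The starting size, the growth rate, and the compound monthly growth model are all hypothetical assumptions chosen for illustration, not figures from the book.

```python
def projected_size_gb(current_gb, monthly_growth_rate, months):
    """Project dataset size, assuming compound monthly growth (an assumed model)."""
    return current_gb * (1 + monthly_growth_rate) ** months


# Hypothetical project: 200 GB today, growing 5% per month.
for months in (0, 12, 24, 36):
    size = projected_size_gb(200, 0.05, months)
    print(f"after {months:2d} months: {size:,.0f} GB")
```

Even a modest monthly rate compounds quickly, which is why the expected growth rate belongs in the sizing exercise alongside the current volume.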
Pushing big data higher
As the following figure illustrates, the aforementioned Volume, Variety, and Velocity have lifted, and will continue to lift, Big Data into the future:
The 3Vs
Let's take a moment to further examine the Vs.
Volume
Volume involves determining or calculating how much of something there is or, in the case of big data, how much of something there will be. Here is a thought-provoking example:
How fast does moon dust pile up?
As Megan Gannon wrote in 2014 (http://www.space.com/23694-moon-dust-mystery-apollo-data.html), a revisited trove of data from NASA's Apollo missions more than 40 years ago is helping scientists answer a lingering lunar question: how fast does moon dust build up? The answer: it would take 1,000 years for a layer of moon dust about a millimeter (0.04 inches) thick to accumulate (big data accumulates much quicker than moon dust!).
With every click of a mouse, big data grows to be petabytes (1,024 terabytes) or even exabytes (1,024 petabytes), consisting of billions to trillions of records generated from millions of people and machines.
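The unit arithmetic above (each unit being 1,024 times the previous one) can be sketched in Python as follows. The helper names `to_bytes` and `human_readable` are hypothetical, introduced only for this illustration.

```python
# Binary storage units, each 1,024 times the previous one.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB"]


def to_bytes(value, unit):
    """Convert a value in the given unit to bytes."""
    return value * 1024 ** UNITS.index(unit)


def human_readable(num_bytes):
    """Format a byte count using the largest unit that keeps the number below 1,024."""
    size = float(num_bytes)
    for unit in UNITS:
        if size < 1024 or unit == UNITS[-1]:
            return f"{size:.1f} {unit}"
        size /= 1024


print(human_readable(to_bytes(1024, "TB")))  # 1,024 terabytes expressed in petabytes
print(human_readable(to_bytes(1024, "PB")))  # 1,024 petabytes expressed in exabytes
```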
Although it has been reported (for example, see http://blog.sqlauthority.com/2013/07/21/sql-server-what-is-the-maximum-relational-database-size-supported-by-single-instance/) that structured or relational database technology can support applications that scale up to 1 petabyte of storage, it doesn't take much thought to see that such volume won't be easy to handle capably, and the accumulation rate of big data isn't slowing any time soon.
It's a case of big, bigger (and we haven't even approached determining how big), and biggest yet!
Velocity
Velocity is the rate or pace at which something is occurring. The measured velocity can and usually does change over time. Velocities directly affect outcomes.
Previously, we lived and worked in a batch environment: we would formulate a question (perhaps, what is our most popular product?), submit it to the information technology group, wait until the nightly sales were processed (maybe 24 hours later), and finally receive an answer. This business model doesn't hold up now that many new sources of data (such as social media or mobile applications) record and capture data in real time, all of the time. The answers to the questions asked may actually change within a 24-hour period (as is the case with the "trending now" information that you may have observed online).
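To make the contrast with batch processing concrete, here is a minimal Python sketch of a sliding-window counter that can answer "what is most popular right now?" as events arrive. The `TrendingCounter` class, the 60-second window, and the sample events are hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from collections import Counter, deque


class TrendingCounter:
    """Count items seen within the last `window_seconds` (assumes ordered timestamps)."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()    # (timestamp, item) pairs, oldest first
        self.counts = Counter()  # item -> count within the current window

    def add(self, timestamp, item):
        """Record a new event, then drop events that have left the window."""
        self.events.append((timestamp, item))
        self.counts[item] += 1
        self._expire(timestamp)

    def _expire(self, now):
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_item = self.events.popleft()
            self.counts[old_item] -= 1
            if self.counts[old_item] == 0:
                del self.counts[old_item]

    def top(self, n=1):
        """Return the n most frequent items in the current window."""
        return self.counts.most_common(n)


# Hypothetical stream of (seconds, product) sales events.
trending = TrendingCounter(window_seconds=60)
for ts, product in [(0, "kettle"), (10, "kettle"), (20, "toaster"),
                    (70, "toaster"), (75, "toaster")]:
    trending.add(ts, product)
print(trending.top(1))
```

A nightly batch job would report one answer for the whole day; here the answer shifts from the kettle to the toaster as older events fall out of the window.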
Given industry hot topics such as the Internet of Things (IoT), it is safe to say that these pace expectations will only quicken.
Variety
Thinking back to our previous mention of relational databases, it is generally accepted that relational databases are highly structured, although they may contain text in VARCHAR, CLOB, or BLOB fields.
Data today (and especially when we talk about big data) comes from many kinds of data sources, and the degree to which that data is structured varies greatly from source to source. In fact, the growing trend is for data to keep losing structure and to keep adding hundreds (or more?) of new formats and structures (formats that go beyond pure text, photo, audio, video, web, GPS data, sensor data, relational databases, documents, SMS, pdf, flash, and so on) all of the time.
Categorization
The process of categorization helps us to gain an understanding of the data source.
The industry commonly categorizes big data into two groups, structured and unstructured, but the categorizing doesn't stop there.
Some simple research reveals some interesting new terms for subcategorizing these two types of data varieties:
Structured data includes subcategories such as created, provoked, transactional, compiled, and experimental, while unstructured data includes subcategories such as captured and submitted. These are just a few of the currently trending terms for categorizing the types of big data; you may be familiar with, or be able to find, more.
It's worth taking some time here to describe these data varieties, to drive home the challenges of dealing with the numerous kinds of big data:
- Created data: This is data created for a purpose, such as focus group surveys or asking website users to establish an account on the site (rather than allowing anonymous access).
- Provoked data: This is data received after some form of prompting, such as giving people the opportunity to express a personal view on a topic, for example customers filling out product review forms.
- Transactional data: This is data that is described as database transactions, for example, the record of a sales transaction.
- Compiled data: This is data described as information collected (or compiled) on a particular topic such as credit scores.
- Experimental data: This is data produced when someone experiments with data and/or sources of data to explore potential new insights, for example, combining or relating sales transactions to marketing and promotional information to determine a (potential) correlation.
- Captured data: This is data created passively due to a person's behavior (such as when you enter a search term on Google, perhaps the creepiest data of all!).
- User-generated data: This is the data generated every second by individuals, such as from Twitter, Facebook, YouTube, and so on (compared to captured data, this is data you willingly create or put out there).
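The categorization above can be captured as a small lookup table, sketched here in Python. Placing user-generated data in the unstructured group is an assumption based on the grouping described earlier, and the `classify` helper and sample sources are hypothetical.

```python
from enum import Enum


class Structure(Enum):
    STRUCTURED = "structured"
    UNSTRUCTURED = "unstructured"


# Subcategory -> top-level group, following the list above.
SUBCATEGORIES = {
    "created": Structure.STRUCTURED,
    "provoked": Structure.STRUCTURED,
    "transactional": Structure.STRUCTURED,
    "compiled": Structure.STRUCTURED,
    "experimental": Structure.STRUCTURED,
    "captured": Structure.UNSTRUCTURED,
    "user-generated": Structure.UNSTRUCTURED,  # assumption: grouped with unstructured data
}


def classify(subcategory):
    """Look up the top-level group for a subcategory name (case-insensitive)."""
    return SUBCATEGORIES[subcategory.strip().lower()]


# Hypothetical data sources tagged with a subcategory.
sources = {
    "product review forms": "provoked",
    "search terms": "captured",
    "sales records": "transactional",
}
for name, subcategory in sources.items():
    print(f"{name}: {subcategory} ({classify(subcategory).value})")
```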
To sum up, big data comes with no common or expected format and the time required to impose a structure on the data has proven to be no longer worth it.
Such are the 3Vs
Beyond the 3Vs, big data brings further challenges to the task of data visualization, for example, dealing effectively with data quality, handling outliers, and displaying results in a meaningful way, to name a few.
Again, it's worth briefly visiting each of these topics here.
Data quality
The value of almost anything is directly proportional to its level of quality: higher quality means higher value.
Data is no different. Data (any data) can only prove to be a valuable instrument if its quality is certain.
The general areas of data quality include:
- Accuracy
- Completeness
- Update status
- Relevance
- Consistency (across sources)
- Reliability
- Appropriateness
- Accessibility
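A few of these areas lend themselves to simple automated checks. The following Python sketch illustrates completeness, update status, and consistency; the sample records, field names, and thresholds are hypothetical, and real projects would need checks tailored to their own data.

```python
from datetime import date

# Hypothetical customer records used only for illustration.
records = [
    {"id": 1, "email": "a@example.com", "country": "US", "updated": date(2024, 5, 1)},
    {"id": 2, "email": None, "country": "us", "updated": date(2019, 1, 15)},
    {"id": 3, "email": "c@example.com", "country": "DE", "updated": date(2024, 3, 9)},
]


def completeness(rows, field):
    """Fraction of rows where `field` is present and non-empty."""
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return filled / len(rows)


def stale_ids(rows, as_of, max_age_days):
    """IDs of rows last updated more than `max_age_days` before `as_of`."""
    return [row["id"] for row in rows if (as_of - row["updated"]).days > max_age_days]


def inconsistent_values(rows, field, allowed):
    """Distinct values of `field` that fall outside the agreed set."""
    return sorted({row[field] for row in rows if row[field] not in allowed})


print(completeness(records, "email"))
print(stale_ids(records, date(2024, 6, 1), 365))
print(inconsistent_values(records, "country", {"US", "DE"}))
```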
The quality of data can be affected by the way it is entered, stored, and managed and...
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePub works well for novels and non-fiction books, i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our eBook Help page.
File format: PDF
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (only limited: Kindle).
The file format PDF always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). Unfortunately, on the small screens of e-readers or smartphones, PDFs are rather annoying, requiring too much scrolling.