Big Data: Principles and best practices of scalable realtime data systems

Nathan Marz, James Warren

$49.99

Publication Date: May 10th, 2015

Publisher: Manning

ISBN: 9781617290343

Pages: 328

The MIT Press Bookstore

Description

Summary

Big Data teaches you to build big data systems using an architecture that takes advantage of clustered hardware along with new tools designed specifically to capture and analyze web-scale data. It describes a scalable, easy-to-understand approach to big data systems that can be built and run by a small team. Following a realistic example, this book guides readers through the theory of big data systems, how to implement them in practice, and how to deploy and operate them once they're built.

Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the Book

Web-scale applications like social networks, real-time analytics, or e-commerce sites deal with data whose volume and velocity exceed the limits of traditional database systems. These applications require architectures built around clusters of machines to store and process data of any size or speed. Fortunately, scale and simplicity are not mutually exclusive.

Big Data teaches you to build big data systems using an architecture designed specifically to capture and analyze web-scale data. This book presents the Lambda Architecture, a scalable, easy-to-understand approach that can be built and run by a small team. You'll explore the theory of big data systems and how to implement them in practice. In addition to discovering a general framework for processing big data, you'll learn specific technologies like Hadoop, Storm, and NoSQL databases.

This book requires no previous exposure to large-scale data analysis or NoSQL tools. Familiarity with traditional databases is helpful.

What's Inside

Introduction to big data systems
Real-time processing of web-scale data
Tools like Hadoop, Cassandra, and Storm
Extensions to traditional database skills

About the Authors

Nathan Marz is the creator of Apache Storm and the originator of the Lambda Architecture for big data systems. James Warren is an analytics architect with a background in machine learning and scientific computing.

Table of Contents

A new paradigm for Big Data

PART 1 BATCH LAYER

Data model for Big Data
Data model for Big Data: Illustration
Data storage on the batch layer
Data storage on the batch layer: Illustration
Batch layer
Batch layer: Illustration
An example batch layer: Architecture and algorithms
An example batch layer: Implementation

PART 2 SERVING LAYER

Serving layer
Serving layer: Illustration

PART 3 SPEED LAYER

Realtime views
Realtime views: Illustration
Queuing and stream processing
Queuing and stream processing: Illustration
Micro-batch stream processing
Micro-batch stream processing: Illustration
Lambda Architecture in depth

About the Authors

Nathan Marz is currently working on a new startup. Previously, he was the lead engineer at BackType before it was acquired by Twitter in 2011. At Twitter, he started the streaming compute team, which provides and develops shared infrastructure to support many critical realtime applications throughout the company. Nathan is the creator of Cascalog and Storm, open-source projects relied upon by over 50 companies around the world, including Yahoo!, Twitter, Groupon, The Weather Channel, and Taobao.

James Warren is an analytics architect at Storm8 with a background in big data processing, machine learning and scientific computing.

You May Also Like

Friending the Past: The Sense of History in the Digital Age

Micro:bit for Mad Scientists: 30 Clever Coding and Electronics Projects for Kids

Computer Science for Kids: A Storytelling Approach

Nightwork, updated edition: A History of Hacks and Pranks at MIT

Cloud Native Transformation: Practical Patterns for Innovation

Boosting: Foundations and Algorithms (Adaptive Computation and Machine Learning series)

The Equality Machine: Harnessing Digital Technology for a Brighter, More Inclusive Future

The C# Type System

A Billion Little Pieces: RFID and Infrastructures of Identification

There Are No Facts: Attentive Algorithms, Extractive Data Practices, and the Quantification of Everyday Life

How to Build a Skyscraper

The Shortcut: Why Intelligent Machines Do Not Think Like Us

Glorious Beef: The LaFrieda Family and the Evolution of the American Meat Industry

Ignorance and Surprise: Science, Society, and Ecological Design (Inside Technology)

Computational Thinking (The MIT Press Essential Knowledge series)

Design Justice: Community-Led Practices to Build the Worlds We Need (Information Policy)

The Joy of Search: A Google Insider's Guide to Going Beyond the Basics

Machine Art in the Twentieth Century (Leonardo)

The Tone of Our Times: Sound, Sense, Economy, and Ecology (Leonardo)

Instrumental Community: Probe Microscopy and the Path to Nanotechnology (Inside Technology)

The Works of Archimedes (Dover Books on Mathematics)

Technology and Society, second edition: Building Our Sociotechnical Future (Inside Technology)

In Pursuit of Zeta-3: The World's Most Mysterious Unsolved Math Problem

The Book of I²C: A Guide for Adventurers

Science Fiction (The MIT Press Essential Knowledge series)

Grokking Concurrency

Invisibility: The History and Science of How Not to Be Seen

How Data Happened: A History from the Age of Reason to the Age of Algorithms
