Spark: The Definitive Guide
Big Data Processing Made Simple
Bill Chambers and Matei Zaharia
Beijing, Boston, Farnham, Sebastopol, Tokyo
Spark: The Definitive Guide
by Bill Chambers and Matei Zaharia
Copyright © 2018 Databricks. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are
also available for most titles (http://oreilly.com/safari). For more information, contact our
corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
Editor: Nicole Tache
Production Editor: Justin Billing
Copyeditor: Octal Publishing, Inc., Chris Edwards, and Amanda Kersey
Proofreader: Jasmine Kwityn
Indexer: Judith McConville
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest
February 2018: First Edition
Revision History for the First Edition
2018-02-08: First Release
2018-03-16: Second Release
See http://oreilly.com/catalog/errata.csp?isbn=9781491912218 for release details.
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Spark: The Definitive Guide, the cover
image, and related trade dress are trademarks of O’Reilly Media, Inc. Apache, Spark and Apache Spark are
trademarks of the Apache Software Foundation.
While the publisher and the authors have used good faith efforts to ensure that the information and
instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility
for errors or omissions, including without limitation responsibility for damages resulting from the use of
or reliance on this work. Use of the information and instructions contained in this work is at your own
risk. If any code samples or other technology this work contains or describes is subject to open source
licenses or the intellectual property rights of others, it is your responsibility to ensure that your use
thereof complies with such licenses and/or rights.
978-1-491-91221-8
[LSI]
Table of Contents
Preface
Part I. Gentle Overview of Big Data and Spark
1. What Is Apache Spark?
  Apache Spark’s Philosophy
  Context: The Big Data Problem
  History of Spark
  The Present and Future of Spark
  Running Spark
  Downloading Spark Locally
  Launching Spark’s Interactive Consoles
  Running Spark in the Cloud
  Data Used in This Book
2. A Gentle Introduction to Spark
  Spark’s Basic Architecture
  Spark Applications
  Spark’s Language APIs
  Spark’s APIs
  Starting Spark
  The SparkSession
  DataFrames
  Partitions
  Transformations
  Lazy Evaluation
  Actions
  Spark UI
  An End-to-End Example