Data Algorithms
Book Details
Title: Data Algorithms
Author: Mahmoud Parsian
Language: English
Subject: Swift / Computers & Technology / Programming / Apple Programming
No. of pages: 778
Format: PDF, EPUB, Mobi
What Is in This Book?
Each chapter of this book presents a problem and solves it through a set of
MapReduce algorithms. MapReduce algorithms/solutions are complete recipes (including
the MapReduce driver, mapper, combiner, and reducer programs). You can use the
code directly in your projects (although sometimes you may need to cut and paste the
sections you need). This book does not cover the theory behind the MapReduce
framework, but rather offers practical algorithms and examples using
MapReduce/Hadoop and Spark to solve tough big data problems. Topics covered include:
- Market Basket Analysis for a large set of transactions
- Data mining algorithms (K-Means, kNN, and Naive Bayes)
- DNA sequencing and RNA sequencing using huge genomic data
- Naive Bayes classification and Markov chains for data and market prediction
- Recommendation algorithms and pairwise document similarity
- Linear regression, Cox regression, and Pearson correlation
- Allelic frequency and mining DNA
- Social network analysis (recommendation systems, counting triangles, sentiment analysis)
You may cut and paste the provided solutions from this book to build your own
MapReduce applications and solutions using Hadoop and Spark. All the solutions have
been compiled and tested. This book is ideal for anyone who knows some Java (i.e.,
can read and write basic Java programs) and wants to write and deploy MapReduce
algorithms using Java/Hadoop/Spark. The general topic of MapReduce has been
discussed in detail in an excellent book by Jimmy Lin and Chris Dyer [16]; again, the
goal of this book is to provide concrete MapReduce algorithms and solutions using
Hadoop and Spark. Likewise, this book will not discuss Hadoop itself in detail; Tom
White’s excellent book [31] does that very well.
This book will not cover how to install Hadoop or Spark; I am going to assume you
already have these installed. Also, any Hadoop commands are executed relative to the
directory where Hadoop is installed (the $HADOOP_HOME environment variable). This
book is explicitly about presenting distributed algorithms using MapReduce/Hadoop
and Spark. For example, I discuss APIs, cover command-line invocations for running
jobs, and provide complete working programs (including the driver, mapper,
combiner, and reducer).
What Is the Focus of This Book?
The focus of this book is to embrace the MapReduce paradigm and provide concrete
problems that can be solved using MapReduce/Hadoop algorithms. For each problem
presented, we will detail the map(), combine(), and reduce() functions and provide a
complete solution, which has:
- A client, which calls the driver with proper input and output parameters.
- A driver, which identifies map() and reduce() functions, and identifies input and
  output.
- A mapper class, which implements the map() function.
- A combiner class (when possible), which implements the combine() function.
  We will discuss when it is possible to use a combiner.
- A reducer class, which implements the reduce() function.
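To make the roles listed above concrete, here is a minimal sketch of a word count written in plain Java. It is not taken from the book and does not use the Hadoop or Spark APIs: the class name WordCountSketch, the method names, and the in-memory "splits" are all assumptions made for illustration. It only shows how a client, a driver, map(), combine(), and reduce() relate to one another.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration; a real job would use the Hadoop or Spark APIs.
public class WordCountSketch {

    // map(): emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // combine(): pre-aggregate the pairs emitted for a single input split,
    // so fewer values have to be passed on to the reducer.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> partial = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            partial.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return partial;
    }

    // reduce(): sum all partial counts collected for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver: wires map(), combine(), and reduce() together and
    // groups the combined output by key (a stand-in for the shuffle step).
    static Map<String, Integer> run(List<List<String>> splits) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (List<String> split : splits) {
            List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
            for (String line : split) {
                emitted.addAll(map(line));
            }
            for (Map.Entry<String, Integer> e : combine(emitted).entrySet()) {
                grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    // Client: calls the driver with its input and prints the output.
    public static void main(String[] args) {
        List<List<String>> splits = List.of(
                List.of("a b a", "b c"),
                List.of("a c c"));
        System.out.println(run(splits)); // prints {a=3, b=2, c=3}
    }
}
```

A combiner is safe in this sketch because summation is associative and commutative, so adding partial sums gives the same total as adding the individual ones. Operations without that property (a plain average, for example) generally cannot reuse the reducer logic as a combiner unchanged.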
One goal of this book is to provide step-by-step instructions for using Spark and
Hadoop to implement MapReduce algorithms. Another is to show how the output
of one MapReduce job can be used as the input to another (this is called chaining or
pipelining MapReduce jobs).
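As a rough illustration of chaining, the sketch below runs two stages in plain Java: the first counts words, and the second takes that result as its input and groups the words by count. The class ChainedJobsSketch and its methods are hypothetical names, and the stages are ordinary in-memory methods rather than real Hadoop or Spark jobs. In Hadoop, the same idea is typically expressed by pointing the second job's input path at the first job's output directory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration of chaining; not the book's code.
public class ChainedJobsSketch {

    // Stage 1: count how often each word occurs in the input lines.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Stage 2: consume stage 1's output and invert it,
    // collecting all words that share the same count.
    static Map<Integer, List<String>> groupByCount(Map<String, Integer> counts) {
        Map<Integer, List<String>> byCount = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            byCount.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return byCount;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("a b a", "b c", "a c c");
        Map<String, Integer> stage1 = countWords(lines);              // output of job 1
        Map<Integer, List<String>> stage2 = groupByCount(stage1);     // job 2 reads job 1's output
        System.out.println(stage2); // prints {2=[b], 3=[a, c]}
    }
}
```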