Data Algorithm - Programming Ebook



Tuesday, May 28, 2019

Data Algorithm


Book Details
Title: Data Algorithm
Author: Mahmoud Parsian
Language: English
Subject: Swift / Computers & Technology / Programming / Apple Programming
No. of pages: 778
Format: PDF, EPUB, Mobi

What Is in This Book?

Each chapter of this book presents a problem and solves it through a set of MapReduce algorithms. MapReduce algorithms/solutions are complete recipes (including the MapReduce driver, mapper, combiner, and reducer programs). You can use the code directly in your projects (although sometimes you may need to cut and paste the sections you need). This book does not cover the theory behind the MapReduce framework, but rather offers practical algorithms and examples using MapReduce/Hadoop and Spark to solve tough big data problems. Topics covered include:
  • Market Basket Analysis for a large set of transactions
  • Data mining algorithms (K-Means, kNN, and Naive Bayes)
  • DNA sequencing and RNA sequencing using huge genomic data
  • Naive Bayes classification and Markov chains for data and market prediction
  • Recommendation algorithms and pairwise document similarity
  • Linear regression, Cox regression, and Pearson correlation
  • Allelic frequency and mining DNA
  • Social network analysis (recommendation systems, counting triangles, sentiment analysis)
You may cut and paste the provided solutions from this book to build your own MapReduce applications and solutions using Hadoop and Spark. All the solutions have been compiled and tested. This book is ideal for anyone who knows some Java (i.e., can read and write basic Java programs) and wants to write and deploy MapReduce algorithms using Java/Hadoop/Spark. The general topic of MapReduce has been discussed in detail in an excellent book by Jimmy Lin and Chris Dyer[16]; again, the goal of this book is to provide concrete MapReduce algorithms and solutions using Hadoop and Spark. Likewise, this book will not discuss Hadoop itself in detail; Tom White’s excellent book[31] does that very well.

This book will not cover how to install Hadoop or Spark; I am going to assume you already have these installed. Also, any Hadoop commands are executed relative to the directory where Hadoop is installed (the $HADOOP_HOME environment variable). This book is explicitly about presenting distributed algorithms using MapReduce/Hadoop and Spark. For example, I discuss APIs, cover command-line invocations for running jobs, and provide complete working programs (including the driver, mapper, combiner, and reducer).


What Is the Focus of This Book?

The focus of this book is to embrace the MapReduce paradigm and provide concrete problems that can be solved using MapReduce/Hadoop algorithms. For each problem presented, we will detail the map(), combine(), and reduce() functions and provide a complete solution, which has:
  • A client, which calls the driver with proper input and output parameters.
  • A driver, which identifies map() and reduce() functions, and identifies input and output.
  • A mapper class, which implements the map() function.
  • A combiner class (when possible), which implements the combine() function.
    We will discuss when it is possible to use a combiner.
  • A reducer class, which implements the reduce() function.
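
The roles listed above can be illustrated with a minimal in-memory sketch. It is not code from the book and does not use the Hadoop API; it is plain Java that imitates the map(), combine(), and reduce() steps for a word count, with a run() method standing in for the driver. The class name WordCountSketch and all method names are hypothetical, chosen only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory imitation of a MapReduce word count (no Hadoop involved).
public class WordCountSketch {

    // map(): emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // combine(): pre-aggregate the pairs produced by one mapper call,
    // so fewer pairs have to travel to the reducers.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> partial = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            partial.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return partial;
    }

    // reduce(): sum every partial count that arrived for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver stand-in: runs map and combine per line, groups values by key
    // (the "shuffle"), then calls reduce once per key.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> shuffled = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> e : combine(map(line)).entrySet()) {
                shuffled.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                        .add(e.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffled.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    // Client stand-in: supplies the input and prints the output.
    public static void main(String[] args) {
        List<String> input = List.of("to be or not to be", "to do is to be");
        System.out.println(run(input)); // {be=3, do=1, is=1, not=1, or=1, to=4}
    }
}
```

A combiner is usable here because addition is associative and commutative: summing partial sums gives the same result as summing the raw 1s. An operation such as a plain average does not have this property, so its reduce logic could not be reused as a combiner unchanged.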

One goal of this book is to provide step-by-step instructions for using Spark and Hadoop to implement MapReduce algorithms. Another is to show how the output of one MapReduce job can be used as the input to another (this is called chaining or pipelining MapReduce jobs).
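
Chaining can be sketched in the same in-memory style. The example below is an assumption-laden illustration, not the book's code and not Hadoop API usage: the first "job" counts words, and the second "job" takes that output as its input and groups words by their count. The class name ChainedJobsSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory imitation of two chained MapReduce jobs.
public class ChainedJobsSketch {

    // Job 1: conceptually, map emits (word, 1) and reduce sums per word.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Job 2: its input is Job 1's output. Conceptually, map emits
    // (count, word) and reduce collects all words sharing a count.
    static Map<Integer, List<String>> groupByCount(Map<String, Integer> counts) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            groups.computeIfAbsent(e.getValue(), k -> new ArrayList<>())
                  .add(e.getKey());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> input = List.of("to be or not to be", "to do is to be");
        Map<String, Integer> job1Output = countWords(input);
        // The chain: Job 1's output becomes Job 2's input.
        Map<Integer, List<String>> job2Output = groupByCount(job1Output);
        System.out.println(job2Output); // {1=[do, is, not, or], 3=[be], 4=[to]}
    }
}
```

In a real cluster the intermediate result would typically be written to a distributed file system by the first job and read back by the second; here a Java map plays that role.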
