Description
Many corporations are finding that the size of their data sets is outgrowing the capability of their systems to store and process them. The data is becoming too big to manage and use with traditional tools. The solution: implementing a big data system.

As Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset shows, Apache Hadoop offers a scalable, fault-tolerant system for storing and processing data in parallel. It has a very rich toolset that allows for storage (Hadoop), configuration (YARN and ZooKeeper), collection (Nutch and Solr), processing (Storm, Pig, and MapReduce), scheduling (Oozie), moving (Sqoop and Avro), monitoring (Chukwa, Ambari, and Hue), testing (Bigtop), and analysis (Hive).

The problem is that the Internet offers IT pros wading into big data many versions of the truth and some outright falsehoods born of ignorance. What is needed is a book just like this one: a wide-ranging but easily understood set of instructions to explain where to get Hadoop tools, what they can do, how to install them, how to configure them, how to integrate them, and how to use them successfully. And you need an expert who has worked in this area for a decade, someone just like author and big data expert Mike Frampton.

Big Data Made Easy approaches the problem of managing massive data sets from a systems perspective. It explains the roles on each project (such as architect and tester) and shows how the Hadoop toolset can be used at each system stage. It explains, in an easily understood manner and through numerous examples, how to use each tool. The book also explains the sliding scale of tools available depending upon data size, and when and how to use them.
Big Data Made Easy shows developers and architects, as well as testers and project managers, how to:
 - Store big data
 - Configure big data
 - Process big data
 - Schedule processes
 - Move data among SQL and NoSQL systems
 - Monitor data
 - Perform big data analytics
 - Report on big data processes and projects
 - Test big data systems

Big Data Made Easy also explains the best part, which is that this toolset is free. Anyone can download it and, with the help of this book, start to use it within a day. With the skills this book teaches under your belt, you will add value to your company or client immediately, not to mention your career.
About the Author
Table of Contents

Chapter 1: The Problem with Data
Chapter Goal:
 - Explain the big data problem
 - Explain how Hadoop tools can help
 - Explain my method of Hadoop tool use
 - Explain how these tools fit together using a data warehouse as a metaphor
 - Explain to people how using these tools can save them time and money while "futureproofing" their organizations

Chapter 2: Storing and Configuring Data with Hadoop, Yarn, and ZooKeeper
Chapter Goal:
 - Provide a Hadoop platform overview
 - Explain how Hadoop can be installed and configured
 - Explain how Hadoop can be used via examples
 - Explain configuration tools with examples
 - Briefly explain the wider command set

Chapter 3: Collecting Data with Nutch and Solr
Chapter Goal:
 - Explain how big data can be modified and imported into Hadoop
 - Explain how ETL streams can quickly become very long and complex
 - Explain the Hadoop collection tools with worked examples

Chapter 4: Processing Data with Storm, Pig, and Map Reduce
Chapter Goal:
 - Explain how big data can be processed using Hadoop tools
 - Give examples of processing tool use and when and why they might be useful
 - Show results and compare tools

Chapter 5: Scheduling Using Oozie
Chapter Goal:
 - Explain how important scheduling is to system management
 - Explain monitoring and problem alerting
 - Explain the tools used via example

Chapter 6: Moving Data with Sqoop and Avro
Chapter Goal:
 - Explain the special problems that big data brings to data movement
 - Explain the tools used to move big data
 - Give worked examples for tool installation and use

Chapter 7: Monitoring the System with Chukwa, Ambari, and Hue
Chapter Goal:
 - Explain the need to monitor a big data system, which may contain millions of files
 - Explain the systems and tools available to monitor
 - Give worked examples for tool installation and use

Chapter 8: Analyzing and Querying Data with Hive and MongoDB
Chapter Goal:
 - Explain how to query data
 - Explain the tools available to the analyst/manager/tester
 - Show how to install and use analytics tools, with examples

Chapter 9: Reporting with Hadoop and Other Software
Chapter Goal:
 - Explain how you can assist management via reports
 - Explain the tools Hadoop and other software provide
 - Show how to install reporting tools and use them, with examples

Chapter 10: Testing with Big Top
Chapter Goal:
 - Explain how to test a big data system
 - Explain what testing tools are available
 - Show how to install and use them, with examples

Chapter 11: Hadoop Present and Future
Chapter Goal:
 - Explain that data sizes will just keep growing
 - Explain that financial and regulatory pressures will push for greater data retention
 - Explain that this is already happening in the energy and banking sectors
 - Explain how Hadoop, a free tool, will help solve these problems going forward
 - Explain to readers that getting involved now could build them a new career and will certainly help their company now and in the future
Manufacturer information:
APress in Springer Science + Business Media
Heidelberger Platz 3
14197 Berlin
DE
Email: juergen.hartmann@springer.com