Impala in Action: Querying and Mining Big Data

By Ricky Saltzer, Istvan Szegedi, and Paul De Schacht

List Price: $44.99

Book Information

Publisher: Manning Publications
Publish Date: 04/07/2015
Pages: 250
ISBN-13: 9781617291982
ISBN-10: 1617291986
Language: English

Full Description

Hadoop queries in Pig or Hive can be too slow for real-time data analysis. Impala, an ultra-speedy query engine from Cloudera, supercharges Hadoop by avoiding the typical MapReduce overhead and parallelizing queries so that they can run on multiple nodes. This is a big deal for big data, because with Impala, querying Hadoop takes seconds rather than minutes. Impala's dialect is close to standard SQL, and Impala seamlessly accesses HBase and HDFS (Hadoop Distributed File System), allowing considerable freedom in choice of data formats.

Impala in Action is a hands-on guide to querying Hadoop using Impala. It starts by comparing Impala to traditional databases and database services on Hadoop. Then it explains Impala's SQL dialect and the basics of data access. Next, it tackles data visualization tasks and provides techniques for securing Impala with Apache Sentry. The book also shows how to embed Impala queries in a Java client and how to connect to JDBC and ODBC clients. Advanced readers will appreciate the deep dive into Impala's architecture and the practical insights into the issues that complicated configurations and complex queries can cause.

Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the Authors

Ricky Saltzer is a tools developer at Cloudera, where he writes scalable software to support custom operations engineers. He is an expert in Impala, Hive, and HBase.

Istvan Szegedi is a lead solution architect working for a large telecommunications company in the UK; he works regularly with a wide array of programming languages and enterprise data architectures.

Paul De Schacht is a Data Scientist for the leading provider of IT solutions to the global travel industry. He is at the heart of the travel intelligence platform where Hadoop and Impala are an essential part of daily operations. Paul has 15+ years of experience with distributed software architectures and has a passion for large data sets.
