Hands-On Presto Tutorial: Presto 101

In this blog post we'll show you how to get started with Presto, the open source SQL query engine for the data lake. By the end, you'll be able to run Presto locally on your machine.

Presto Installation

Presto can be installed manually or by using Docker images. This tutorial walks through the manual installation.

Installing Presto Manually

Download the Presto server tarball, presto-server-0.235.1.tar.gz, and unpack it. The tarball contains a single top-level directory, presto-server-0.235.1, which we will call the installation directory.

Run the commands below to download the official presto-server tarball and the presto-cli executable JAR (linked from prestodb.io and hosted on Maven Central):

[root@prestodb_c01 ~]# curl -O https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.235.1/presto-server-0.235.1.tar.gz
[root@prestodb_c01 ~]# curl -O https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.235.1/presto-cli-0.235.1-executable.jar
[root@prestodb_c01 ~]# tar -xzf presto-server-0.235.1.tar.gz
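Before unpacking, it can be worth checking that the download is intact. Maven Central publishes a .sha1 file next to each artifact, so the sketch below compares that published hash with a locally computed one. The function name verify_sha1 is hypothetical, and it assumes GNU sha1sum is installed and that the .sha1 file sits next to the downloaded file.

```shell
# bash sketch: compare a downloaded file with the .sha1 published beside it.
# verify_sha1 is a hypothetical helper; assumes GNU coreutils sha1sum.
verify_sha1() {
  local file="$1"
  local expected actual
  # The .sha1 file may hold just the hash or "hash  filename"; keep field 1.
  expected=$(awk '{print $1}' "${file}.sha1") || return 1
  actual=$(sha1sum "$file" | awk '{print $1}')
  if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (expected $expected, got $actual)" >&2
    return 1
  fi
}
```

For example, fetch presto-server-0.235.1.tar.gz.sha1 from the same Maven directory with curl -O, then run verify_sha1 presto-server-0.235.1.tar.gz.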

Data Directory

Presto needs a data directory for storing logs, etc. We recommend creating a data directory outside of the installation directory, which allows it to be easily preserved when upgrading Presto.

[root@prestodb_c01 ~]# mkdir -p /var/presto/data

Configuration Settings

Create an etc directory inside the installation directory. It will hold the configuration files described below: node properties, JVM config, config properties, log levels, and catalog properties.

[root@prestodb_c01 ~]# cd presto-server-0.235.1
[root@prestodb_c01 presto-server-0.235.1]# mkdir etc

Node Properties

The node properties file, etc/node.properties, contains configuration specific to each node. A node is a single installed instance of Presto on a machine. This file is typically created by the deployment system when Presto is first installed. The following is a minimal etc/node.properties:

[root@prestodb_c01 ~]# cat etc/node.properties
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/var/presto/data

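The above properties are described below:

node.environment: The name of the environment. All Presto nodes in a cluster must have the same environment name.
node.id: The unique identifier for this installation of Presto. It must be unique for every node and should remain consistent across reboots and upgrades of Presto.
node.data-dir: The location (filesystem path) of the data directory. Presto stores logs and other data here.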
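Because node.id has to be unique per node but stable across restarts, one option is to generate it once when the file is first written and never regenerate it afterwards. The sketch below does that; write_node_properties is a hypothetical helper, and it assumes either uuidgen or the Linux /proc UUID source is available.

```shell
# bash sketch: create etc/node.properties once, with a generated node.id.
# write_node_properties is a hypothetical helper, not part of Presto.
write_node_properties() {
  local install_dir="$1"
  local environment="${2:-production}"
  local data_dir="${3:-/var/presto/data}"
  local file="$install_dir/etc/node.properties"
  local node_id

  # node.id must stay the same across restarts and upgrades, so never
  # regenerate it for an existing node.
  if [ -e "$file" ]; then
    echo "refusing to overwrite $file" >&2
    return 1
  fi

  if command -v uuidgen >/dev/null 2>&1; then
    node_id=$(uuidgen)
  else
    node_id=$(cat /proc/sys/kernel/random/uuid)  # Linux-only fallback
  fi

  mkdir -p "$install_dir/etc" || return 1
  cat > "$file" <<EOF
node.environment=$environment
node.id=$node_id
node.data-dir=$data_dir
EOF
}
```

Running write_node_properties presto-server-0.235.1 on each machine gives every node its own identifier while keeping the environment name and data directory identical.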

JVM configuration

The JVM config file, etc/jvm.config, contains a list of command-line options used for launching the Java Virtual Machine. The format of the file is a list of options, one per line. These options are not interpreted by the shell, so options containing spaces or other special characters should not be quoted.

The following provides a good starting point for creating etc/jvm.config:

[root@prestodb_c01 ~]# cat etc/jvm.config
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError

Because an OutOfMemoryError will typically leave the JVM in an inconsistent state, we write a heap dump (for debugging) and forcibly terminate the process when this occurs.
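Since jvm.config is not interpreted by a shell, a quote character in it would most likely reach the JVM as a literal part of the option rather than being stripped. A small lint pass can catch that before startup. The sketch below is a hypothetical helper, check_jvm_config, that only flags lines containing single or double quotes.

```shell
# bash sketch: report jvm.config lines that contain quote characters.
# check_jvm_config is a hypothetical helper, not part of Presto.
check_jvm_config() {
  local config="$1"
  local status=0
  local n=0
  local line
  # Read line by line, including a final line without a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    case "$line" in
      *\"*|*\'*)
        echo "$config:$n: quoted option: $line" >&2
        status=1
        ;;
    esac
  done < "$config"
  return "$status"
}
```

For example, check_jvm_config etc/jvm.config returns a nonzero status and prints the offending line numbers if any option was written with quotes.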

Config Properties

The config properties file, etc/config.properties, contains the configuration for the Presto server. Every Presto server can function as both a coordinator and a worker, but dedicating a single machine to only perform coordination work provides the best performance on larger clusters.

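To set up a single machine for testing that functions as both a coordinator and a worker, use the following etc/config.properties. Setting coordinator and node-scheduler.include-coordinator to true lets the same node accept queries and also schedule query work on itself: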

[root@singlenode01 ~]# cat etc/config.properties
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://localhost:8080
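On a single node, discovery.uri has to point back at the node's own HTTP port, so the two settings must change together. The sketch below writes the single-node config from a single port argument to keep them in sync. write_single_node_config is a hypothetical helper; the memory values are copied from the example above and are assumptions you should size to your own heap.

```shell
# bash sketch: write a single-node etc/config.properties where the
# coordinator also acts as a worker. Hypothetical helper, not part of Presto.
write_single_node_config() {
  local install_dir="$1"
  local port="${2:-8080}"
  mkdir -p "$install_dir/etc" || return 1
  # discovery.uri reuses the same port as http-server.http.port.
  cat > "$install_dir/etc/config.properties" <<EOF
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=$port
query.max-memory=50GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://localhost:$port
EOF
}
```

For example, write_single_node_config presto-server-0.235.1 8081 moves the server to port 8081 without leaving discovery pointed at the old port.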

You may also wish to set the following properties:

Log Levels

The optional log levels file, etc/log.properties, allows setting the minimum log level for named logger hierarchies. Every logger has a name, which is typically the fully qualified name of the class that uses the logger.

[root@coordinator01 ~]# cat etc/log.properties
com.facebook.presto=INFO

There are four levels: DEBUG, INFO, WARN and ERROR.
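When debugging, it is common to raise or lower the level for one logger hierarchy without editing the file by hand. The sketch below is a hypothetical helper, set_log_level, that accepts only the four levels listed above, replaces an existing entry for the exact logger name, or appends a new one. The logger name com.facebook.presto.server in the usage note is only an illustration.

```shell
# bash sketch: set one logger's level in log.properties.
# set_log_level is a hypothetical helper, not part of Presto.
set_log_level() {
  local file="$1" logger="$2" level="$3"
  local tmp="${file}.tmp.$$"
  case "$level" in
    DEBUG|INFO|WARN|ERROR) ;;
    *) echo "invalid level: $level" >&2; return 1 ;;
  esac
  touch "$file" || return 1
  # Keep every line whose key (text before '=') is not this logger.
  awk -F= -v key="$logger" '$1 != key' "$file" > "$tmp" || return 1
  printf '%s=%s\n' "$logger" "$level" >> "$tmp"
  mv "$tmp" "$file"
}
```

For example, set_log_level etc/log.properties com.facebook.presto.server DEBUG adds a more verbose setting for that hierarchy while leaving com.facebook.presto=INFO in place.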

Catalog Properties

Presto accesses data via connectors, which are mounted in catalogs. The connector provides all of the schemas and tables inside of the catalog. 

Catalogs are registered by creating a catalog properties file in the etc/catalog directory. For example, create etc/catalog/jmx.properties with the following contents to mount the jmx connector as the jmx catalog:

[root@coordinator01 ~]# mkdir etc/catalog
[root@coordinator01 ~]# echo "connector.name=jmx" >> etc/catalog/jmx.properties
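The catalog name comes from the properties file name, so registering more catalogs follows the same pattern. The sketch below generalizes it; add_catalog is a hypothetical helper that writes connector.name plus any extra key=value arguments verbatim, without checking them against the connector's real settings.

```shell
# bash sketch: register a catalog as etc/catalog/<name>.properties.
# add_catalog is a hypothetical helper, not part of Presto.
add_catalog() {
  local install_dir="$1" name="$2" connector="$3"
  local dir="$install_dir/etc/catalog"
  local prop
  shift 3
  mkdir -p "$dir" || return 1
  {
    printf 'connector.name=%s\n' "$connector"
    # Any remaining arguments are copied as extra connector properties.
    for prop in "$@"; do
      printf '%s\n' "$prop"
    done
  } > "$dir/$name.properties"
}
```

For example, add_catalog presto-server-0.235.1 jmx jmx produces the same file as the echo command above. Catalog files are read when the server starts, so restart Presto after adding one.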

Running Presto

The installation directory contains the launcher script in bin/launcher. Presto can be started as a daemon by running the following:

[root@hsrhvm01 presto-server-0.235.1]# bin/launcher start
Started as 23378
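"Started as" only means the process was launched; the server still needs some time to initialize. The sketch below polls the coordinator's /v1/info endpoint until it reports it is no longer starting. It assumes curl is installed and that the JSON response contains a "starting" field, as recent Presto versions return; wait_for_presto and its default URL and timeout are hypothetical.

```shell
# bash sketch: wait until the coordinator reports it has finished starting.
# wait_for_presto is a hypothetical helper; assumes curl is installed and
# that /v1/info returns JSON with a "starting" field.
wait_for_presto() {
  local url="${1:-http://localhost:8080}"
  local timeout="${2:-60}"   # seconds
  local waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if curl -sf "$url/v1/info" | grep -q '"starting" *: *false'; then
      echo "Presto at $url is ready"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "Presto at $url did not become ready within ${timeout}s" >&2
  return 1
}
```

A typical sequence is bin/launcher start followed by wait_for_presto http://localhost:8080 120. The launcher also accepts status and stop, and once the server is ready the CLI JAR downloaded earlier can connect with java -jar presto-cli-0.235.1-executable.jar --server localhost:8080 --catalog jmx.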

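After launching, you can find the log files in var/log under the data directory (with the node.properties above, /var/presto/data/var/log):

launcher.log: Created by the launcher and connected to the server's stdout and stderr streams. It contains a few log messages that occur while the server logging is being initialized, and any errors or diagnostics produced by the JVM.
server.log: The main log file used by Presto. It typically contains the relevant information if the server fails during initialization.
http-request.log: The HTTP request log, which contains every HTTP request received by the server.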