We have a Presto cluster with 160 worker nodes.
- The Presto coordinator is installed on a VM (32 GB RAM, 16 CPUs); all the other nodes are workers on physical machines.
The basic question is whether the machine hosting the Presto coordinator can serve 160 worker machines.
In addition:
How do we size the Presto coordinator (memory, CPU)?
What is the best-practice sizing formula for the coordinator machine?
Can the Presto coordinator handle and manage 160 worker machines?
- Is the Presto coordinator limited to managing at most X worker machines?
Reference: http://prestodb.github.io/docs/current/overview/concepts.html
Coordinator
The Presto coordinator is the server that is responsible for parsing statements, planning queries, and managing Presto worker nodes. It is the “brain” of a Presto installation and is also the node to which a client connects to submit statements for execution. Every Presto installation must have a Presto coordinator alongside one or more Presto workers. For development or testing purposes, a single instance of Presto can be configured to perform both roles.
The coordinator keeps track of the activity on each worker and coordinates the execution of a query. The coordinator creates a logical model of a query involving a series of stages which is then translated into a series of connected tasks running on a cluster of Presto workers.
Coordinators communicate with workers and clients using a REST API.
Worker
A Presto worker is a server in a Presto installation which is responsible for executing tasks and processing data. Worker nodes fetch data from connectors and exchange intermediate data with each other. The coordinator is responsible for fetching results from the workers and returning the final results to the client.
When a Presto worker process starts up, it advertises itself to the discovery server in the coordinator, which makes it available to the Presto coordinator for task execution.
Workers communicate with other workers and Presto coordinators using a REST API.
Answer
TL;DR: in general, a coordinator can easily handle many more than 160 worker nodes, but your mileage may vary.
Longer version: a Presto coordinator can manage 1000 workers. However, you are asking about a coordinator with specific memory and CPU resources, and there the answer is: it depends.
The coordinator tracks task execution across workers, so its memory requirements depend on the complexity of your queries. Also, when you query partitioned tables (e.g. in S3 or Hive), some information about the partitions accessed by the query must be kept in memory. With multiple concurrent queries, this adds up.
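Because the answer depends on the workload, one practical approach is to watch the coordinator under real load rather than rely on a formula. The sketch below is a minimal example, not an official tool: it assumes the coordinator exposes cluster statistics as JSON at `/v1/cluster` with fields named `activeWorkers`, `runningQueries` and `queuedQueries` (check this against your Presto version, which may also require authentication or extra headers). The function names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_cluster_stats(coordinator_url: str, timeout: float = 5.0) -> dict:
    """GET <coordinator_url>/v1/cluster and return the decoded JSON body.

    The endpoint path is an assumption about the coordinator's REST API.
    """
    url = f"{coordinator_url.rstrip('/')}/v1/cluster"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize(stats: dict, expected_workers: int) -> dict:
    """Reduce the raw stats to a few numbers relevant to coordinator load.

    Field names (activeWorkers, runningQueries, queuedQueries) are assumed;
    missing fields are treated as 0.
    """
    active = stats.get("activeWorkers", 0)
    return {
        "active_workers": active,
        # Workers that should be registered but are not currently visible.
        "missing_workers": max(expected_workers - active, 0),
        "running_queries": stats.get("runningQueries", 0),
        "queued_queries": stats.get("queuedQueries", 0),
    }
```

For the cluster in the question, this could be called as `summarize(fetch_cluster_stats("http://coordinator.example.com:8080"), expected_workers=160)`, where the host name is a placeholder. Sampling it periodically alongside the coordinator's JVM heap and CPU usage shows whether workers drop out or queries queue up as concurrency grows, which is a more reliable sizing signal than the worker count alone.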