Package org.apache.drill.yarn.appMaster
Class ClusterControllerImpl
java.lang.Object
org.apache.drill.yarn.appMaster.ClusterControllerImpl
- All Implemented Interfaces:
ClusterController, RegistryHandler
Controls the Drill cluster by comparing the current cluster state with a
desired state and taking corrective action to keep the cluster in the desired
state. The cluster as a whole has a state, as does each task (node) within the
cluster.
This class is designed to be unit tested. Testing the controller on a live cluster is tedious, so this class encapsulates the controller algorithm so that it can be driven by a simulated cluster.
This object is shared between threads, so access to it is synchronized.
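To illustrate the desired-state idea, here is a minimal sketch of a controller driven by a simulated cluster. All names in it (`ReconcileSketch`, `SimulatedCluster`, `setTarget`, `reconcile`) are hypothetical and are not part of Drill's API; the real class tracks far more state per task.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical desired-state controller; not Drill's implementation. */
final class ReconcileSketch {
    /** Simulated cluster: records launches and stops instead of calling YARN. */
    static final class SimulatedCluster {
        final List<Integer> running = new ArrayList<>();
        private int nextId = 1;

        void launch() { running.add(nextId++); }

        // remove(int) removes by index, so this drops the most recent task.
        void stopLast() { running.remove(running.size() - 1); }
    }

    private final SimulatedCluster cluster;
    private int targetCount;

    ReconcileSketch(SimulatedCluster cluster) { this.cluster = cluster; }

    // Shared between threads, so every entry point is synchronized.
    synchronized void setTarget(int n) { targetCount = Math.max(0, n); }

    /** One corrective pass: move the actual task count toward the target. */
    synchronized void reconcile() {
        while (cluster.running.size() < targetCount) { cluster.launch(); }
        while (cluster.running.size() > targetCount) { cluster.stopLast(); }
    }

    synchronized int liveCount() { return cluster.running.size(); }
}
```

Because the cluster is a plain in-memory object, a unit test can set a target, call `reconcile()`, and check the resulting task count without a live YARN cluster.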
-
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
static enum - Controller lifecycle state.
-
Field Summary
Fields
Modifier and Type, Field, Description
protected int maxRetries - Maximum number of retries for each task launch.
-
Constructor Summary
Constructors
-
Method Summary
Modifier and Type, Method, Description
boolean cancelTask(int id) - Cancels the given task, reducing the target task count.
void completionAck(Task task, String propertyKey)
void containerAllocated(Task task)
void containerReleased(Task task)
void containersAllocated(List<org.apache.hadoop.yarn.api.records.Container> containers) - The RM has allocated one or more containers in response to container requests submitted to the RM.
void containersCompleted(List<org.apache.hadoop.yarn.api.records.ContainerStatus> statuses) - The Resource Manager reports that containers have completed with the given statuses.
void containerStarted(org.apache.hadoop.yarn.api.records.ContainerId containerId) - The NM reports that a container has successfully started.
void containerStopped(org.apache.hadoop.yarn.api.records.ContainerId containerId) - The Node Manager reports that a container has stopped.
void enableFailureCheck(boolean flag)
void fireLifecycleChange(TaskLifecycleListener.Event event, EventContext context)
int getFreeNodeCount() - Get the approximate number of free YARN nodes (those that can accept a task request). Starts with the number of nodes from the node inventory, then subtracts any in-flight requests (which do not, by definition, have a node allocated).
int getMaxRetries()
getPools()
float getProgress()
getProperty(String key)
getState()
int getStopTimeoutMs()
int getTargetCount() - Return the target number of tasks that the controller seeks to maintain.
getYarn()
boolean isLive()
boolean isTaskLive(int id)
void registerLifecycleListener
void registerScheduler(Scheduler scheduler) - Define a task type.
void registryDown()
void releaseHost(String hostName)
void reserveHost(String hostName)
void resizeDelta(int delta) - Request to resize the Drill cluster by a relative amount.
int resizeTo(int n) - Request to resize the Drill cluster to the given size.
void setMaxRetries(int value)
void setProperty(String key, Object value)
void shutDown() - Indicates a request to gracefully shut down the cluster.
void startAck
void started() - Called when the caller has completed start-up and the controller should become live.
void stopTaskFailed(org.apache.hadoop.yarn.api.records.ContainerId containerId, Throwable t) - The Node Manager API reports that a request sent to the NM to stop a task has failed.
boolean succeeded()
boolean supportsDiskResource() - Whether this distribution of YARN supports disk resources.
void taskEnded
void taskGroupCompleted(SchedulerStateActions taskGroup)
void taskRetried(Task task)
void taskStartFailed(org.apache.hadoop.yarn.api.records.ContainerId containerId, Throwable t) - The RM API reports that an attempt to start a container has failed locally.
void tick(long curTime) - Called by the timer ("pulse") thread to trigger time-based events.
void updateRMStatus() - Get an update from YARN on available resources.
void visit(ControllerVisitor visitor) - Allow an observer to see a consistent view of the controller's state by performing the visit in a synchronized block.
void visitTasks(TaskVisitor visitor) - Allow an observer to see a consistent view of the controller's task state by performing the visit in a synchronized block.
boolean waitForCompletion() - Called by the main thread to wait for the normal shutdown of the controller.
-
Field Details
-
maxRetries
protected int maxRetries
Maximum number of retries for each task launch.
-
-
Constructor Details
-
ClusterControllerImpl
-
-
Method Details
-
enableFailureCheck
public void enableFailureCheck(boolean flag)
- Specified by:
enableFailureCheck in interface ClusterController
-
registerScheduler
Define a task type. Registration order is important: the controller starts tasks in the order in which they are registered. Must happen before the YARN callbacks start.
- Specified by:
registerScheduler in interface ClusterController
- Parameters:
scheduler -
-
started
Called when the caller has completed start-up and the controller should become live.
- Specified by:
started in interface ClusterController
- Throws:
YarnFacadeException
AMException
-
tick
public void tick(long curTime)
Description copied from interface: ClusterController
Called by the timer ("pulse") thread to trigger time-based events.
- Specified by:
tick in interface ClusterController
- Parameters:
curTime -
-
getFreeNodeCount
public int getFreeNodeCount()
Get the approximate number of free YARN nodes (those that can accept a task request). Starts with the number of nodes from the node inventory, then subtracts any in-flight requests (which do not, by definition, have a node allocated).
This approximation does not consider whether the node has sufficient resources to run a task; only whether the node itself exists.
- Specified by:
getFreeNodeCount in interface ClusterController
- Returns:
The approximate number of free YARN nodes.
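The arithmetic described above can be sketched as follows. The class and parameter names are hypothetical, and clamping the result at zero is an assumption of this sketch; the source does not say how the controller handles more in-flight requests than free nodes.

```java
/** Hypothetical helper illustrating the free-node approximation. */
final class FreeNodeEstimate {
    /**
     * @param inventoryFreeNodes free nodes reported by the node inventory
     * @param inFlightRequests   container requests not yet matched to a node
     * @return approximate free node count, never negative (assumption)
     */
    static int approximateFreeNodes(int inventoryFreeNodes, int inFlightRequests) {
        // Each in-flight request is expected to claim one node that the
        // inventory still counts as free.
        return Math.max(0, inventoryFreeNodes - inFlightRequests);
    }
}
```

For example, an inventory of 10 free nodes with 3 requests in flight gives an estimate of 7. The estimate ignores per-node resources, as noted above.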
-
updateRMStatus
public void updateRMStatus()
Get an update from YARN on available resources.
- Specified by:
updateRMStatus in interface ClusterController
-
containersAllocated
Description copied from interface: ClusterController
The RM has allocated one or more containers in response to container requests submitted to the RM.
- Specified by:
containersAllocated in interface ClusterController
- Parameters:
containers - the set of containers provided by YARN
-
containerStarted
public void containerStarted(org.apache.hadoop.yarn.api.records.ContainerId containerId)
Description copied from interface: ClusterController
The NM reports that a container has successfully started.
- Specified by:
containerStarted in interface ClusterController
- Parameters:
containerId - the container which started
-
taskStartFailed
public void taskStartFailed(org.apache.hadoop.yarn.api.records.ContainerId containerId, Throwable t)
Description copied from interface: ClusterController
The RM API reports that an attempt to start a container has failed locally.
- Specified by:
taskStartFailed in interface ClusterController
- Parameters:
containerId - the container that failed to launch
t - the error that occurred
-
containerStopped
public void containerStopped(org.apache.hadoop.yarn.api.records.ContainerId containerId)
Description copied from interface: ClusterController
The Node Manager reports that a container has stopped.
- Specified by:
containerStopped in interface ClusterController
- Parameters:
containerId -
-
containersCompleted
Description copied from interface: ClusterController
The Resource Manager reports that containers have completed with the given statuses. Find the task for each container and mark it as completed.
- Specified by:
containersCompleted in interface ClusterController
- Parameters:
statuses -
-
getProgress
public float getProgress()
- Specified by:
getProgress in interface ClusterController
-
stopTaskFailed
Description copied from interface: ClusterController
The Node Manager API reports that a request sent to the NM to stop a task has failed.
- Specified by:
stopTaskFailed in interface ClusterController
- Parameters:
containerId - the container that failed to stop
t - the reason that the stop request failed
-
resizeDelta
public void resizeDelta(int delta)
Description copied from interface: ClusterController
Request to resize the Drill cluster by a relative amount.
- Specified by:
resizeDelta in interface ClusterController
- Parameters:
delta - the amount of change. Can be positive (to grow the cluster) or negative (to shrink it)
-
resizeTo
public int resizeTo(int n)
Description copied from interface: ClusterController
Request to resize the Drill cluster to the given size.
- Specified by:
resizeTo in interface ClusterController
- Parameters:
n - the desired cluster size
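The difference between relative and absolute resizing can be shown with a small sketch. The class `ResizeSketch`, the `maxTarget` limit, the clamping behavior, and the meaning of the `int` returned by `resizeTo` (here, the resulting target) are all assumptions made for illustration; the source does not document them.

```java
/** Hypothetical target-count holder showing relative vs. absolute resize. */
final class ResizeSketch {
    private final int maxTarget; // assumed upper bound, not from the source
    private int target;

    ResizeSketch(int initialTarget, int maxTarget) {
        this.maxTarget = maxTarget;
        this.target = clamp(initialTarget);
    }

    private int clamp(int n) { return Math.max(0, Math.min(n, maxTarget)); }

    /** Absolute resize: set the target and return the value actually applied. */
    synchronized int resizeTo(int n) {
        target = clamp(n);
        return target;
    }

    /** Relative resize: positive delta grows the cluster, negative shrinks it. */
    synchronized void resizeDelta(int delta) {
        resizeTo(target + delta);
    }

    synchronized int targetCount() { return target; }
}
```

In this sketch `resizeDelta` is expressed in terms of `resizeTo`, so both paths share one bounds check.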
-
shutDown
public void shutDown()
Description copied from interface: ClusterController
Indicates a request to gracefully shut down the cluster.
- Specified by:
shutDown in interface ClusterController
-
waitForCompletion
public boolean waitForCompletion()
Description copied from interface: ClusterController
Called by the main thread to wait for the normal shutdown of the controller. Such a shutdown occurs when the admin sends a shutdown command from the UI or REST API.
- Specified by:
waitForCompletion in interface ClusterController
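One common way to let a main thread block until another thread signals shutdown is a latch. The sketch below uses `java.util.concurrent.CountDownLatch`; it is not taken from Drill's source. The class `CompletionSketch`, the `finish` method, and the reading of the boolean result as "the run succeeded" are assumptions.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical shutdown signal; not Drill's implementation. */
final class CompletionSketch {
    private final CountDownLatch done = new CountDownLatch(1);
    private volatile boolean succeeded;

    /** Called by the thread that handles the shutdown request. */
    void finish(boolean success) {
        succeeded = success;
        done.countDown(); // releases any thread blocked in waitForCompletion()
    }

    /** Blocks the caller until finish() has been called. */
    boolean waitForCompletion() {
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
        return succeeded;
    }
}
```

A main thread would call `waitForCompletion()` after start-up, while a UI or REST handler thread calls `finish(...)` once the cluster has shut down.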
-
isLive
public boolean isLive()
-
succeeded
public boolean succeeded()
-
containerAllocated
-
getYarn
-
containerReleased
-
taskEnded
-
taskRetried
-
taskGroupCompleted
-
getMaxRetries
public int getMaxRetries()
-
getStopTimeoutMs
public int getStopTimeoutMs()
-
reserveHost
- Specified by:
reserveHost in interface RegistryHandler
-
releaseHost
- Specified by:
releaseHost in interface RegistryHandler
-
getNodeInventory
-
setProperty
- Specified by:
setProperty in interface ClusterController
-
getProperty
- Specified by:
getProperty in interface ClusterController
-
registerLifecycleListener
- Specified by:
registerLifecycleListener in interface ClusterController
-
fireLifecycleChange
-
setMaxRetries
public void setMaxRetries(int value)
- Specified by:
setMaxRetries in interface ClusterController
-
getTargetCount
public int getTargetCount()
Description copied from interface: ClusterController
Return the target number of tasks that the controller seeks to maintain. This is the sum across all pools.
- Specified by:
getTargetCount in interface ClusterController
-
getState
-
visit
Description copied from interface: ClusterController
Allow an observer to see a consistent view of the controller's state by performing the visit in a synchronized block.
- Specified by:
visit in interface ClusterController
- Parameters:
visitor -
-
getPools
-
visitTasks
Description copied from interface: ClusterController
Allow an observer to see a consistent view of the controller's task state by performing the visit in a synchronized block.
- Specified by:
visitTasks in interface ClusterController
- Parameters:
visitor -
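The synchronized-visit pattern used by visit and visitTasks can be sketched as below. `VisitSketch` and its use of `java.util.function.Consumer` are hypothetical stand-ins; Drill's own `ControllerVisitor` and `TaskVisitor` interfaces are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical holder of task names with a synchronized visit method. */
final class VisitSketch {
    private final List<String> tasks = new ArrayList<>();

    synchronized void addTask(String name) { tasks.add(name); }

    /**
     * Runs the visitor while holding the object's lock, so no other
     * synchronized method can change the task list during the visit.
     */
    synchronized void visitTasks(Consumer<String> visitor) {
        for (String task : tasks) {
            visitor.accept(task);
        }
    }
}
```

Because the visitor runs under the lock, it should do only quick work such as copying values into a report; long-running visitors would block the other callers.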
-
getHistory
-
isTaskLive
public boolean isTaskLive(int id)
- Specified by:
isTaskLive in interface ClusterController
-
cancelTask
public boolean cancelTask(int id)
Description copied from interface: ClusterController
Cancels the given task, reducing the target task count. Called from the UI to allow the user to select the specific task to end when reducing cluster size.
- Specified by:
cancelTask in interface ClusterController
- Parameters:
id -
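A rough sketch of cancelling one specific task while lowering the target count follows. `CancelSketch` and `addTask` are hypothetical, and treating the boolean result as "a live task with that id was found and cancelled" is an assumption; the source does not define the return value.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical task registry; not Drill's implementation. */
final class CancelSketch {
    private final Set<Integer> liveTaskIds = new LinkedHashSet<>();
    private int targetCount;

    /** Registers a task and raises the target to include it. */
    synchronized void addTask(int id) {
        if (liveTaskIds.add(id)) {
            targetCount++;
        }
    }

    synchronized boolean isTaskLive(int id) { return liveTaskIds.contains(id); }

    /** Ends the chosen task and shrinks the target so it is not replaced. */
    synchronized boolean cancelTask(int id) {
        if (!liveTaskIds.remove(id)) {
            return false; // unknown or already-ended task: nothing changes
        }
        targetCount--;
        return true;
    }

    synchronized int targetCount() { return targetCount; }
}
```

Lowering the target together with the cancellation matters in a desired-state design: otherwise the controller would see a shortfall and start a replacement task.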
-
completionAck
- Specified by:
completionAck in interface RegistryHandler
-
startAck
- Specified by:
startAck in interface RegistryHandler
-
supportsDiskResource
public boolean supportsDiskResource()
Description copied from interface: ClusterController
Whether this distribution of YARN supports disk resources.
- Specified by:
supportsDiskResource in interface ClusterController
- Returns:
True if this distribution of YARN supports disk resources. False otherwise.
-
registryDown
public void registryDown()
- Specified by:
registryDown in interface RegistryHandler
-