.. _deploy: Run on Clusters =============== Basic Steps ----------- Mars can be deployed on a cluster. First, you need to run .. code-block:: bash pip install pymars on every node in the cluster. This will install dependencies needed for distributed execution on your cluster. After that, you may select a node as supervisor which also integrated web service, leaving other nodes as workers. The supervisor can be started with the following command: .. code-block:: bash mars-supervisor -H -p -w Web service will be started as well. Workers can be started with the following command: .. code-block:: bash mars-worker -H -p -s : After all Mars processes are started, you can open a Python console and run .. code-block:: python import mars import mars.tensor as mt import mars.dataframe as md # create a default session that connects to the cluster mars.new_session('http://:') a = mt.random.rand(2000, 2000, chunk_size=200) b = mt.inner(a, a) b.execute() # submit tensor to cluster df = md.DataFrame(a).sum() df.execute() # submit DataFrame to cluster You can open a web browser and type ``http://:`` to open Mars UI to look up resource usage of workers and execution progress of the task submitted just now. Using Command Lines ------------------- When running Mars with command line, you can specify arguments to control the behavior of Mars processes. All Mars services have common arguments listed below. +------------------+----------------------------------------------------------------+ | Argument | Description | +==================+================================================================+ | ``-H`` | Service IP binding, ``0.0.0.0`` by default | +------------------+----------------------------------------------------------------+ | ``-p`` | Port of the service. If absent, a randomized port will be used | +------------------+----------------------------------------------------------------+ | ``-f`` | Path to service configuration file. Absent when use default | | | configuration. | +------------------+----------------------------------------------------------------+ | ``-s`` | List of supervisor endpoints, separated by commas. Useful for | | | workers and webs to spot supervisors, or when you want to run | | | more than one supervisor | +------------------+----------------------------------------------------------------+ | ``--log-level`` | Log level, can be ``debug``, ``info``, ``warning``, ``error`` | +------------------+----------------------------------------------------------------+ | ``--log-format`` | Log format, can be Python logging format | +------------------+----------------------------------------------------------------+ | ``--log-conf`` | Python logging configuration file, ``logging.conf`` by default | +------------------+----------------------------------------------------------------+ | ``--use-uvloop`` | Whether to use ``uvloop`` to accelerate, ``auto`` by default | +------------------+----------------------------------------------------------------+ Extra arguments for supervisors are listed below. +------------------+----------------------------------------------------------------+ | Argument | Description | +==================+================================================================+ | ``-w`` | Port of web service in supervisor | +------------------+----------------------------------------------------------------+ Extra arguments for workers are listed below. Details about memory tuning can be found at the next section. .. _deploy_extra_arguments: +--------------------+----------------------------------------------------------------+ | Argument | Description | +====================+================================================================+ | ``--n-cpu`` | Number of CPU cores to use. If absent, the value will be | | | the available number of cores | +--------------------+----------------------------------------------------------------+ | ``--n-io-process`` | Number of IO processes for network operations. 1 by default | +--------------------+----------------------------------------------------------------+ | ``--cuda-devices`` | Index of CUDA devices to use. If not specified, all devices | | | will be used. Specifying an empty string will ignore all | | | devices | +--------------------+----------------------------------------------------------------+ For instance, if you want to start a Mars cluster with two supervisors and two workers, you can run commands below (memory and CPU tunings are omitted): On Supervisor 1 (192.168.1.10): .. code-block:: bash mars-supervisor -H 192.168.1.10 -p 7001 -w 7005 -s 192.168.1.10:7001,192.168.1.11:7002 On Supervisor 2 (192.168.1.11): .. code-block:: bash mars-supervisor -H 192.168.1.11 -p 7002 -s 192.168.1.10:7001,192.168.1.11:7002 On Worker 1 (192.168.1.20): .. code-block:: bash mars-worker -H 192.168.1.20 -p 7003 -s 192.168.1.10:7001,192.168.1.11:7002 On Worker 2 (192.168.1.21): .. code-block:: bash mars-worker -H 192.168.1.21 -p 7004 -s 192.168.1.10:7001,192.168.1.11:7002