Mars on GPU#

Mars can run on NVIDIA GPUs; however, each module has its own extra requirements.

Installation#

For Mars tensors, CuPy is required. Assuming your CUDA version is 10.1, install CuPy via:

pip install cupy-cuda101

Refer to install cupy for more information.
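The wheel name encodes the CUDA version: major and minor digits are concatenated after `cupy-cuda`. For the CUDA 10.x era wheels, the mapping can be sketched as follows (the helper name `cupy_package_for` is our own illustration, not CuPy tooling):

```python
# Sketch: derive the "cupy-cudaXYZ" wheel name from a CUDA version
# string, e.g. "10.1" -> "cupy-cuda101". Assumes the two-part
# major.minor naming scheme used by the CUDA 10.x wheels on PyPI.
def cupy_package_for(cuda_version: str) -> str:
    major, minor = cuda_version.split(".")
    return f"cupy-cuda{major}{minor}"

print(cupy_package_for("10.1"))  # cupy-cuda101
```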

For Mars DataFrame, RAPIDS cuDF is required. Install cuDF via conda:

conda install -c rapidsai -c nvidia -c conda-forge \
 -c defaults cudf=0.13 python=3.7 cudatoolkit=10.1

Refer to install cuDF for more information.

Mars tensor on CUDA#

A tensor can be created on GPU by specifying gpu=True. The methods that accept this argument are listed in tensor creation and random data.

>>> import mars.tensor as mt
>>> a = mt.random.rand(10, 10, gpu=True)  # indicate to create tensor on CUDA
>>> a.sum().execute()                     # execution will happen on CUDA

Remember that creating a tensor does not allocate any GPU memory; the actual memory allocation and computation on GPU happen only when .execute() is triggered.

For a tensor in host memory, call .to_gpu() to tell Mars to move its data to GPU.

>>> b = mt.random.rand(10, 10)  # indicate to create on main memory
>>> b = b.to_gpu()              # indicate to move data to GPU memory
>>> b.sum().execute()

Call .to_cpu() to tell Mars to move data to host memory.

>>> c = b.to_cpu()     # b is allocated on GPU, move back to main memory
>>> c.sum().execute()  # execution will happen on CPU

Mars DataFrame on CUDA#

Mars can read CSV files into GPU directly.

>>> import mars.dataframe as md
>>> df = md.read_csv('data.csv', gpu=True)  # indicates to read csv into GPU memory
>>> df.groupby('a').sum().execute()         # execution will happen on GPU

For a DataFrame in host memory, call .to_gpu() to tell Mars to move its data to GPU.

>>> import mars.tensor as mt
>>> import mars.dataframe as md
>>> df = md.DataFrame(mt.random.rand(10, 10))  # indicate to create on main memory
>>> df = df.to_gpu()                            # indicate to move data to GPU memory

Call .to_cpu() to tell Mars to move data to host memory.

>>> df2 = df.to_cpu()     # df is allocated on GPU, move back to main memory
>>> df2.sum().execute()   # execution will happen on CPU

Multiple GPU#

For Mars tensor and DataFrame, multiple GPUs on a single machine can be utilized.

>>> import mars.tensor as mt
>>> t = mt.random.rand(10000, 10000, gpu=True)
>>> t.sum().execute()

The code above will try to leverage all the visible GPU cards to perform computation.

If you want to limit computation to certain GPU cards, set the environment variable CUDA_VISIBLE_DEVICES.

CUDA_VISIBLE_DEVICES=0,3,5 ipython

This limits the ipython process to GPUs 0, 3 and 5 only, so all Mars tensors executed in that session will run on these visible GPUs.
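The same restriction can be applied from Python, as long as the variable is set before any CUDA-using library initializes the driver. A minimal sketch:

```python
import os

# Restrict the process to GPUs 0, 3 and 5. This must run before
# importing CuPy, cuDF or Mars, because CUDA reads the variable when
# the driver is first initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,3,5"

# import mars.tensor as mt  # import Mars only after setting the variable
```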

Distributed#

For the Mars supervisor, the start command is the same as usual. Refer to Run on Clusters.

For Mars workers, one worker can be bound to one or more GPUs.

The basic command to start a worker bound to specific GPUs is:

mars-worker -H <worker_ip> -p <worker_port> -s <supervisor_ip>:<supervisor_port> --cuda-devices 0,1,2

The started worker will be bound to GPUs 0, 1 and 2.

Refer to extra arguments for starting worker for more information.
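Conceptually, the comma-separated --cuda-devices value is just a list of GPU indices. A hypothetical sketch of the parsing (parse_cuda_devices is our own illustrative name, not a Mars API):

```python
# Sketch: turn a "--cuda-devices 0,1,2"-style value into GPU indices.
def parse_cuda_devices(value: str) -> list[int]:
    return [int(part) for part in value.split(",") if part]

print(parse_cuda_devices("0,1,2"))  # [0, 1, 2]
```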

Once a Mars cluster is started, you can run the code below.

>>> import mars
>>> import mars.tensor as mt
>>> mars.new_session('http://<web_ip>:<web_port>')
>>> t = mt.random.rand(20, 20, gpu=True)
>>> t.sum().execute()  # runs on workers bound to GPUs