mars.learn.contrib.pytorch.run_pytorch_script

mars.learn.contrib.pytorch.run_pytorch_script(script: Union[bytes, str, BinaryIO, TextIO], n_workers: int, data: Optional[Dict[str, TileableType]] = None, gpu: Optional[bool] = None, command_argv: Optional[List[str]] = None, retry_when_fail: bool = False, session: Optional[SessionType] = None, run_kwargs: Optional[Dict[str, Any]] = None, port: Optional[int] = None)

Run a PyTorch script in a Mars cluster.

Parameters
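  • script (str or file-like object) – Path of the script to run, or a file-like object containing it.

  • n_workers (int) – Number of PyTorch workers.

  • data (dict) – Mapping from variable name to Mars data (tileable); each entry is made available to the script under that variable name.

  • gpu (bool) – If True, run the PyTorch script on GPU.

  • command_argv (list) – Extra command-line arguments passed to the script.

  • retry_when_fail (bool) – If True, retry when the script fails.

  • session – Mars session to use. If not provided, the default session is used.

  • run_kwargs (dict) – Extra keyword arguments passed to session.run.

  • port (int) – Port of the PyTorch worker or parameter server (ps). It is automatically increased when several PyTorch processes run on the same worker.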
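The port rule can be pictured with a small sketch. This is an illustration under stated assumptions, not Mars's implementation: `assign_ports` is a hypothetical helper, and it assumes that "the same worker" means several PyTorch processes placed on one host, each starting from the given `port` and counting upward.

```python
from collections import defaultdict
from typing import Dict, List


def assign_ports(worker_hosts: List[str], base_port: int) -> List[int]:
    """Hypothetical helper: give each PyTorch process a port.

    Every host starts at base_port; each additional process placed on
    the same host receives the next port number.
    """
    # Per-host counter, initialised lazily to base_port on first use.
    next_port: Dict[str, int] = defaultdict(lambda: base_port)
    ports: List[int] = []
    for host in worker_hosts:
        ports.append(next_port[host])
        next_port[host] += 1  # the next process on this host gets port + 1
    return ports


# Two processes on node-a and one on node-b, starting from port 9945.
print(assign_ports(["node-a", "node-a", "node-b"], 9945))
```

Counting per host keeps ports unique only where they could clash, so processes on different hosts can reuse the same base port.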

Returns

{'status': 'ok'} if the script succeeds; otherwise an error is raised.

Return type

dict