mars.dataframe.read_parquet¶
- mars.dataframe.read_parquet(path, engine: str = 'auto', columns=None, groups_as_chunks=False, use_arrow_dtype=None, incremental_index=False, storage_options=None, memory_scale=None, **kwargs)[source]¶
Load a parquet object from the file path, returning a DataFrame.
- Parameters
path (str, path object or file-like object) – Any valid string path is acceptable. The string could be a URL. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.parquet. A file URL can also be a path to a directory that contains multiple partitioned parquet files. Both pyarrow and fastparquet support paths to directories as well as file URLs. A directory path could be: file://localhost/path/to/tables. By file-like object, we refer to objects with a read() method, such as a file handle (e.g. via the builtin open function) or StringIO.
.engine ({'auto', 'pyarrow', 'fastparquet'}, default 'auto') – Parquet library to use. The default behavior is to try ‘pyarrow’, falling back to ‘fastparquet’ if ‘pyarrow’ is unavailable.
columns (list, default None) – If not None, only these columns will be read from the file.
groups_as_chunks (bool, default False) – If True, each row group corresponds to a chunk; if False, each file corresponds to a chunk. Only available for the ‘pyarrow’ engine.
incremental_index (bool, default False) – If index_col is not specified, ensure the range index is incremental; setting this to False gains slightly better performance.
use_arrow_dtype (bool, default None) – If True, use Arrow dtypes to store columns.
storage_options (dict, optional) – Options for storage connection.
memory_scale (int, optional) – Ratio of real memory occupation to raw file size.
**kwargs – Any additional kwargs are passed to the engine.
- Returns
- Return type
Mars DataFrame