mars.dataframe.read_sql_table
Read SQL database table into a DataFrame.
Given a table name and a SQLAlchemy connectable, returns a DataFrame. This function does not support DBAPI connections.
table_name (str) – Name of SQL table in database.
con (SQLAlchemy connectable or str) – A database URI can be provided as a str. SQLite DBAPI connection mode is not supported.
schema (str, default None) – Name of SQL schema in database to query (if database flavor supports this). Uses default schema if None (default).
index_col (str or list of str, optional, default: None) – Column(s) to set as index (MultiIndex).
coerce_float (bool, default True) – Attempts to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point. Can result in loss of precision.
parse_dates (list or dict, default None) –
List of column names to parse as dates.
Dict of {column_name: format string}, where the format string is strftime-compatible when parsing string times, or is one of (D, s, ns, ms, us) when parsing integer timestamps.
Dict of {column_name: arg dict}, where the arg dict corresponds to the keyword arguments of pandas.to_datetime(). Especially useful with databases without native Datetime support, such as SQLite.
columns (list, default None) – List of column names to select from SQL table.
chunksize (int, default None) – If specified, returns an iterator where chunksize is the number of rows to include in each chunk. Note that this argument is kept only for compatibility: passing a non-None value raises an error.
test_rows (int, default 5) – The number of rows to fetch for inferring dtypes.
chunk_size (int or tuple of ints, optional) – Specifies chunk size for each dimension.
engine_kwargs (dict, default None) – Extra kwargs to pass to sqlalchemy.create_engine
incremental_index (bool, default False) – Create a new RangeIndex if the table doesn’t contain index columns.
use_arrow_dtype (bool, default None) – If True, use arrow dtype to store columns.
partition_col (str, default None) – Name of the column used to split the result of the query. If specified, the range [low_limit, high_limit] is divided into num_partitions chunks of equal length; the number of rows in each chunk is not guaranteed to be equal. When the value is None, OFFSET and LIMIT clauses are used to split the result of the query.
num_partitions (int, default None) – The number of chunks to divide the result of the query into, when partition_col is specified.
low_limit (default None) – The lower bound of the range of column partition_col. If not specified, a query will be executed to query the minimum of the column.
high_limit (default None) – The upper bound of the range of column partition_col. If not specified, a query will be executed to query the maximum of the column.
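The interplay of partition_col, num_partitions, low_limit and high_limit can be illustrated with a small standalone sketch. It is not Mars's implementation: the helper names partition_bounds and partition_queries, the table t and its columns are hypothetical, and SQLite from the standard library stands in for a real database. It only shows why equal-length ranges do not yield equally sized chunks.

```python
import sqlite3


def partition_bounds(low_limit, high_limit, num_partitions):
    """Split [low_limit, high_limit] into num_partitions equal-width ranges.

    Hypothetical helper for illustration only.
    """
    width = (high_limit - low_limit) / num_partitions
    edges = [low_limit + i * width for i in range(num_partitions)] + [high_limit]
    return list(zip(edges[:-1], edges[1:]))


def partition_queries(table_name, partition_col, bounds):
    """Build one parameterized SELECT per range (hypothetical helper).

    Every range is half-open except the last, which also includes its
    upper bound so that the maximum value is not lost.
    """
    queries = []
    last = len(bounds) - 1
    for i, (lo, hi) in enumerate(bounds):
        upper_op = "<=" if i == last else "<"
        sql = (
            f'SELECT * FROM "{table_name}" '
            f'WHERE "{partition_col}" >= ? AND "{partition_col}" {upper_op} ?'
        )
        queries.append((sql, (lo, hi)))
    return queries


# In-memory table with deliberately skewed values in the partition column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, val TEXT)")
ids = [0, 1, 2, 3, 50, 100]
conn.executemany("INSERT INTO t VALUES (?, ?)", [(i, f"row{i}") for i in ids])

# When the limits are not given, they are obtained by querying MIN and MAX.
low, high = conn.execute("SELECT MIN(id), MAX(id) FROM t").fetchone()

bounds = partition_bounds(low, high, 4)
chunk_sizes = [
    len(conn.execute(sql, params).fetchall())
    for sql, params in partition_queries("t", "id", bounds)
]
print(bounds)       # four ranges of width 25
print(chunk_sizes)  # [4, 0, 1, 1]
```

Each range has the same width, yet the row counts differ because the values of the partition column are not evenly distributed.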
A SQL table is returned as a two-dimensional data structure with labeled axes.
DataFrame
See also
read_sql_query
Read SQL query into a DataFrame.
read_sql
Read SQL query or database table into a DataFrame.
Notes
Any datetime values with time zone information will be converted to UTC.
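What the conversion to UTC means for a single value can be sketched with the standard library alone. The helper to_utc is hypothetical and is not part of Mars or pandas; leaving naive values unchanged is an assumption of this sketch.

```python
from datetime import datetime, timedelta, timezone


def to_utc(value):
    """Convert a timezone-aware datetime to UTC (hypothetical helper).

    Naive datetimes carry no zone information and are returned unchanged,
    which is an assumption made for this sketch.
    """
    if value.tzinfo is None:
        return value
    return value.astimezone(timezone.utc)


# 09:30 at UTC+08:00 is the same instant as 01:30 UTC.
local = datetime(2021, 3, 1, 9, 30, tzinfo=timezone(timedelta(hours=8)))
converted = to_utc(local)
print(converted.isoformat())  # 2021-03-01T01:30:00+00:00
```

The instant in time is preserved; only the offset it is expressed in changes.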
Examples
>>> import mars.dataframe as md
>>> md.read_sql_table('table_name', 'postgres:///db_name')