mars.dataframe.read_sql

mars.dataframe.read_sql(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, columns=None, chunksize=None, test_rows=5, chunk_size=None, engine_kwargs=None, incremental_index=True, partition_col=None, num_partitions=None, low_limit=None, high_limit=None)

Read SQL query or database table into a DataFrame.

This function is a convenience wrapper around read_sql_table and read_sql_query (for backward compatibility). It will delegate to the specific function depending on the provided input: a SQL query will be routed to read_sql_query, while a database table name will be routed to read_sql_table. Note that the delegated function might have more specific notes about its functionality not listed here.
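As a minimal sketch of both routing paths (the SQLite file data.db and the table records are hypothetical names used for illustration):

>>> import mars.dataframe as md
>>> # A bare table name is routed to read_sql_table
>>> df = md.read_sql('records', 'sqlite:///data.db')
>>> # A SQL query is routed to read_sql_query
>>> df = md.read_sql('SELECT a, b FROM records', 'sqlite:///data.db')
>>> df.execute()  # Mars evaluates lazily; execute() materializes the result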

Parameters
  • sql (str or SQLAlchemy Selectable (select or text object)) – SQL query to be executed or a table name.

  • con (SQLAlchemy connectable (engine/connection), database str URI, or DBAPI2 connection (fallback mode)) – Using SQLAlchemy makes it possible to use any DB supported by that library. If a DBAPI2 object, only sqlite3 is supported. The user is responsible for engine disposal and connection closure for the SQLAlchemy connectable; see the SQLAlchemy documentation for details.

  • index_col (str or list of strings, optional, default: None) – Column(s) to set as index (MultiIndex).

  • coerce_float (bool, default True) – Attempts to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets.

  • params (list, tuple or dict, optional, default: None) – List of parameters to pass to the execute method. The syntax used to pass parameters is database driver dependent. Check your database driver documentation for which of the five syntax styles, described in PEP 249's paramstyle, is supported. E.g., for psycopg2, use %(name)s, so pass params={'name': 'value'}.

  • parse_dates (list or dict, default: None) –

    • List of column names to parse as dates.

    • Dict of {column_name: format string} where format string is strftime compatible in case of parsing string times, or is one of (D, s, ns, ms, us) in case of parsing integer timestamps.

    • Dict of {column_name: arg dict}, where the arg dict corresponds to the keyword arguments of pandas.to_datetime(). Especially useful with databases without native Datetime support, such as SQLite.

  • columns (list, default: None) – List of column names to select from SQL table (only used when reading a table).

  • chunksize (int, default None) – If specified, return an iterator where chunksize is the number of rows to include in each chunk. Note that this argument is kept only for compatibility; if a non-None value is passed, an error will be raised.

  • test_rows (int, default 5) – The number of rows to fetch for inferring dtypes.

  • chunk_size (int or tuple of ints, optional) – Specifies chunk size for each dimension.

  • engine_kwargs (dict, default None) – Extra kwargs to pass to sqlalchemy.create_engine.

  • incremental_index (bool, default True) – If index_col is not specified, ensure that the resulting range index is incremental; setting this to False gains slightly better performance.

  • partition_col (str, default None) – Name of the column used to split the result of the query. If specified, the range [low_limit, high_limit] will be divided into num_partitions sub-ranges of equal length, one per chunk; the number of rows in each chunk, however, is not guaranteed to be equal. When the value is None, OFFSET and LIMIT clauses will be used to cut the result of the query. See the sketch after this parameter list.

  • num_partitions (int, default None) – The number of chunks to divide the result of the query into, when partition_col is specified.

  • low_limit (default None) – The lower bound of the range of column partition_col. If not specified, a query will be executed to fetch the minimum of the column.

  • high_limit (default None) – The upper bound of the range of column partition_col. If not specified, a query will be executed to fetch the maximum of the column.
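As a sketch of partitioned reading (the table and column names below are hypothetical, and a numeric id column spanning the table is assumed):

>>> import mars.dataframe as md
>>> # Split the result into 4 chunks on the numeric column 'id'.
>>> # low_limit/high_limit default to the column's min/max, each
>>> # fetched by an extra query when not given explicitly.
>>> df = md.read_sql(
...     'SELECT id, ts, value FROM records',
...     'sqlite:///data.db',
...     partition_col='id',
...     num_partitions=4,
...     parse_dates={'ts': '%Y-%m-%d %H:%M:%S'},
... )
>>> df.execute()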

Return type

DataFrame

See also

read_sql_table

Read SQL database table into a DataFrame.

read_sql_query

Read SQL query into a DataFrame.