DataFrame.
stack
Stack the prescribed level(s) from columns to index.
Return a reshaped DataFrame or Series having a multi-level index with one or more new inner-most levels compared to the current DataFrame. The new inner-most levels are created by pivoting the columns of the current dataframe:
if the columns have a single level, the output is a Series; if the columns have multiple levels, the new index level(s) is (are) taken from the prescribed level(s) and the output is a DataFrame.
if the columns have a single level, the output is a Series;
if the columns have multiple levels, the new index level(s) is (are) taken from the prescribed level(s) and the output is a DataFrame.
level (int, str, list, default -1) – Level(s) to stack from the column axis onto the index axis, defined as one index or label, or a list of indices or labels.
dropna (bool, default True) – Whether to drop rows in the resulting Frame/Series with missing values. Stacking a column level onto the index axis can create combinations of index and column values that are missing from the original dataframe. See Examples section.
Stacked dataframe or series.
DataFrame or Series
See also
DataFrame.unstack
Unstack prescribed level(s) from index axis onto column axis.
DataFrame.pivot
Reshape dataframe from long format to wide format.
DataFrame.pivot_table
Create a spreadsheet-style pivot table as a DataFrame.
Notes
The function is named by analogy with a collection of books being reorganized from being side by side on a horizontal position (the columns of the dataframe) to being stacked vertically on top of each other (in the index of the dataframe).
Examples
Single level columns
>>> import mars.dataframe as md >>> df_single_level_cols = md.DataFrame([[0, 1], [2, 3]], ... index=['cat', 'dog'], ... columns=['weight', 'height'])
Stacking a dataframe with a single level column axis returns a Series:
>>> df_single_level_cols.execute() weight height cat 0 1 dog 2 3 >>> df_single_level_cols.stack().execute() cat weight 0 height 1 dog weight 2 height 3 dtype: int64
Multi level columns: simple case
>>> multicol1 = pd.MultiIndex.from_tuples([('weight', 'kg'), ... ('weight', 'pounds')]) >>> df_multi_level_cols1 = md.DataFrame([[1, 2], [2, 4]], ... index=['cat', 'dog'], ... columns=multicol1)
Stacking a dataframe with a multi-level column axis:
>>> df_multi_level_cols1.execute() weight kg pounds cat 1 2 dog 2 4 >>> df_multi_level_cols1.stack().execute() weight cat kg 1 pounds 2 dog kg 2 pounds 4
Missing values
>>> multicol2 = pd.MultiIndex.from_tuples([('weight', 'kg'), ... ('height', 'm')]) >>> df_multi_level_cols2 = md.DataFrame([[1.0, 2.0], [3.0, 4.0]], ... index=['cat', 'dog'], ... columns=multicol2)
It is common to have missing values when stacking a dataframe with multi-level columns, as the stacked dataframe typically has more values than the original dataframe. Missing values are filled with NaNs:
>>> df_multi_level_cols2.execute() weight height kg m cat 1.0 2.0 dog 3.0 4.0 >>> df_multi_level_cols2.stack().execute() height weight cat kg NaN 1.0 m 2.0 NaN dog kg NaN 3.0 m 4.0 NaN
Prescribing the level(s) to be stacked
The first parameter controls which level or levels are stacked:
>>> df_multi_level_cols2.stack(0).execute() kg m cat height NaN 2.0 weight 1.0 NaN dog height NaN 4.0 weight 3.0 NaN >>> df_multi_level_cols2.stack([0, 1]).execute() cat height m 2.0 weight kg 1.0 dog height m 4.0 weight kg 3.0 dtype: float64
Dropping missing values
>>> df_multi_level_cols3 = md.DataFrame([[None, 1.0], [2.0, 3.0]], ... index=['cat', 'dog'], ... columns=multicol2)
Note that rows where all values are missing are dropped by default but this behaviour can be controlled via the dropna keyword parameter:
>>> df_multi_level_cols3.execute() weight height kg m cat NaN 1.0 dog 2.0 3.0 >>> df_multi_level_cols3.stack(dropna=False).execute() height weight cat kg NaN NaN m 1.0 NaN dog kg NaN 2.0 m 3.0 NaN >>> df_multi_level_cols3.stack(dropna=True).execute() height weight cat m 1.0 NaN dog kg NaN 2.0 m 3.0 NaN