mafw.tools.pandas_tools
A collection of useful convenience functions for common pandas operations
Functions
| group_and_aggregate_data_frame | Utility function to perform dataframe groupby and aggregation. |
| slice_data_frame | Slice a data frame according to slicing_dict. |
- mafw.tools.pandas_tools.group_and_aggregate_data_frame(data_frame: DataFrame, grouping_columns: Iterable[str], aggregation_functions: Iterable[str | Callable[[Any], Any]]) -> DataFrame
Utility function to perform dataframe groupby and aggregation.
This function is a thin wrapper around the group-by and aggregation operations of a dataframe. The user must provide the columns to group by and the functions used to aggregate the remaining columns.
The aggregated columns in the output dataframe are renamed to originalname_aggregationfunction; for example, aggregating a column named value with the mean function yields value_mean.
Note
Only numeric columns (and columns that can be aggregated) will be included in the aggregation. String columns that are not used for grouping will be automatically excluded from aggregation.
- Parameters:
data_frame (pandas.DataFrame) – The input data frame
grouping_columns (Iterable[str]) – The columns to group by.
aggregation_functions (Iterable[str | Callable[[Any], Any]]) – The functions used to aggregate the columns that are not used for grouping.
- Returns:
The aggregated dataframe after the groupby operation.
- Return type:
pandas.DataFrame
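The behaviour described above can be approximated with plain pandas. The following is a minimal sketch, not the mafw implementation: the helper name group_and_aggregate_sketch and the sample columns are hypothetical, and keeping the grouping columns as regular columns in the result (via reset_index) is an assumption.

```python
import pandas as pd


def group_and_aggregate_sketch(data_frame, grouping_columns, aggregation_functions):
    """Hypothetical stand-in that mimics the documented behaviour with plain pandas."""
    grouping_columns = list(grouping_columns)
    # Keep only numeric columns that are not used for grouping.
    value_columns = [
        column
        for column in data_frame.select_dtypes(include="number").columns
        if column not in grouping_columns
    ]
    aggregated = data_frame.groupby(grouping_columns)[value_columns].agg(
        list(aggregation_functions)
    )
    # Flatten the (column, function) MultiIndex into originalname_aggregationfunction.
    aggregated.columns = ["_".join(pair) for pair in aggregated.columns]
    # Assumption: expose the grouping columns as regular columns again.
    return aggregated.reset_index()


df = pd.DataFrame(
    {
        "sample": ["a", "a", "b", "b"],
        "label": ["x", "y", "z", "w"],  # string column, excluded from aggregation
        "value": [1.0, 3.0, 10.0, 20.0],
    }
)
result = group_and_aggregate_sketch(df, ["sample"], ["mean", "max"])
print(result)
```

In this sketch the string column label is dropped because it is neither numeric nor a grouping column, and the value column appears twice in the result, once per aggregation function.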
- mafw.tools.pandas_tools.slice_data_frame(input_data_frame: DataFrame, slicing_dict: MutableMapping[str, Any] | None = None, **kwargs: Any) -> DataFrame
Slice a data frame according to slicing_dict.
The input data frame will be sliced using the items of the slicing_dict applying the loc operator in this way:
sliced = input_data_frame[(input_data_frame[key]==value)]
If the slicing_dict is empty, then the full input_data_frame is returned.
Instead of the slicing_dict, the user can also provide key and value pairs as keyword arguments.
slice_data_frame(data_frame, {'A':14})
is equivalent to
slice_data_frame(data_frame, A=14)
If the user provides a keyword argument that also exists in the slicing_dict, the keyword argument updates the corresponding entry of the slicing_dict.
No check is performed on the column names: if a label is missing, the loc method raises a KeyError.
- Parameters:
input_data_frame (pd.DataFrame) – The data frame to be sliced.
slicing_dict (dict, Optional) – A dictionary mapping column names to the values used for the slicing. Defaults to None.
kwargs – Keyword arguments to be used instead of the slicing dictionary.
- Returns:
The sliced dataframe
- Return type:
pd.DataFrame
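The slicing rules described for slice_data_frame can be illustrated with a short pandas sketch. This is not the mafw implementation: the helper name slice_sketch and the sample data are hypothetical, and it uses plain boolean indexing, one equality condition per key.

```python
import pandas as pd


def slice_sketch(input_data_frame, slicing_dict=None, **kwargs):
    """Hypothetical stand-in that mimics the documented slicing rules."""
    # Copy the dictionary so the caller's mapping is left untouched (an assumption).
    conditions = dict(slicing_dict or {})
    # Keyword arguments update entries that already exist in the dictionary.
    conditions.update(kwargs)
    sliced = input_data_frame
    for key, value in conditions.items():
        # A missing column label raises KeyError here.
        sliced = sliced[sliced[key] == value]
    return sliced


df = pd.DataFrame({"A": [14, 14, 7], "B": ["x", "y", "x"]})

by_dict = slice_sketch(df, {"A": 14})
by_keyword = slice_sketch(df, A=14)
overridden = slice_sketch(df, {"A": 7}, A=14)  # the keyword argument wins
unsliced = slice_sketch(df)  # no conditions: the full frame is returned
```

With no conditions the loop body never runs, so the input frame is returned as is; with several conditions the equality masks are applied one after another, which amounts to a logical AND.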