mafw.tools.pandas_tools

A collection of convenience functions for common pandas operations.

Functions

group_and_aggregate_data_frame(data_frame, ...)

Utility function to perform dataframe groupby and aggregation.

slice_data_frame(input_data_frame[, ...])

Slice a data frame according to slicing_dict.

mafw.tools.pandas_tools.group_and_aggregate_data_frame(data_frame: DataFrame, grouping_columns: Iterable[str], aggregation_functions: Iterable[str | Callable[[Any], Any]]) → DataFrame

Utility function to perform dataframe groupby and aggregation.

This function is a simple wrapper around the group by and aggregation operations of a dataframe. The user must provide the columns to group by and the functions used to aggregate the remaining columns.

The output dataframe will have the aggregated columns renamed as originalname_aggregationfunction.

Note

Only numeric columns (and columns that can be aggregated) will be included in the aggregation. String columns that are not used for grouping will be automatically excluded from aggregation.

Parameters:
  • data_frame (pandas.DataFrame) – The input data frame.

  • grouping_columns (Iterable[str]) – The columns to group by.

  • aggregation_functions (Iterable[str | Callable[[Any], Any]]) – The functions used to aggregate the columns that are not used for grouping.

Returns:

The aggregated dataframe after the groupby operation.

Return type:

pandas.DataFrame
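
The behaviour described above can be approximated in plain pandas. The following is a minimal sketch, not the mafw implementation: the helper name group_and_aggregate_sketch, the sample columns, and the choice to return the grouping columns as regular columns are assumptions made for illustration.

```python
import pandas as pd


def group_and_aggregate_sketch(data_frame, grouping_columns, aggregation_functions):
    """Hypothetical stand-in that mimics the documented behaviour."""
    grouping_columns = list(grouping_columns)
    aggregation_functions = list(aggregation_functions)

    # Keep only numeric columns that are not used for grouping
    # (string columns are left out of the aggregation).
    value_columns = [
        column
        for column in data_frame.select_dtypes(include="number").columns
        if column not in grouping_columns
    ]

    # Aggregating with a list of functions yields two-level column labels:
    # (original column name, aggregation function name).
    aggregated = data_frame.groupby(grouping_columns)[value_columns].agg(
        aggregation_functions
    )

    # Flatten the labels to originalname_aggregationfunction.
    aggregated.columns = [f"{name}_{function}" for name, function in aggregated.columns]

    # Assumption: grouping columns are returned as regular columns.
    return aggregated.reset_index()


df = pd.DataFrame(
    {
        "run": [1, 1, 2],
        "label": ["a", "b", "c"],  # string column, excluded from aggregation
        "signal": [1.0, 3.0, 5.0],
    }
)
result = group_and_aggregate_sketch(df, ["run"], ["mean", "max"])
print(result)
```

With this input the sketch returns the columns run, signal_mean and signal_max; the label column is dropped because it is neither numeric nor a grouping column.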

mafw.tools.pandas_tools.slice_data_frame(input_data_frame: DataFrame, slicing_dict: MutableMapping[str, Any] | None = None, **kwargs: Any) → DataFrame

Slice a data frame according to slicing_dict.

The input data frame is sliced using the items of the slicing_dict, applying the loc operator for each key and value pair as follows: sliced = input_data_frame[(input_data_frame[key]==value)].

If the slicing_dict is empty, then the full input_data_frame is returned.

Instead of the slicing_dict, the user can also provide key and value pairs as keyword arguments.

slice_data_frame(data_frame, {'A':14})

is equivalent to

slice_data_frame(data_frame, A=14).

If the user provides a keyword argument whose key also exists in the slicing_dict, then the keyword argument value will replace the corresponding entry of the slicing_dict.

No check is performed on the column names: should a label be missing, the loc method will raise a KeyError.

Parameters:
  • input_data_frame (pd.DataFrame) – The data frame to be sliced.

  • slicing_dict (dict, Optional) – A dictionary with columns and values for the slicing. Defaults to None.

  • kwargs – Keyword arguments to be used instead of the slicing dictionary.

Returns:

The sliced dataframe.

Return type:

pd.DataFrame
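
The slicing rules described for slice_data_frame (one equality condition per item, keyword arguments taking precedence, KeyError on a missing column) can be reproduced in plain pandas. The following is a minimal sketch under stated assumptions, not the mafw implementation: the helper name slice_sketch and the sample columns are hypothetical, and copying the dictionary before updating it is a choice made here, not something the documentation states.

```python
import pandas as pd


def slice_sketch(input_data_frame, slicing_dict=None, **kwargs):
    """Hypothetical stand-in that mimics the documented slicing rules."""
    # Assumption: work on a copy so the caller's dictionary is left untouched.
    conditions = dict(slicing_dict or {})
    # Keyword arguments override entries with the same key.
    conditions.update(kwargs)

    sliced = input_data_frame
    for key, value in conditions.items():
        # A missing column raises KeyError here; no check is done beforehand.
        sliced = sliced[sliced[key] == value]
    return sliced


df = pd.DataFrame({"A": [14, 14, 15], "B": [1, 2, 3]})

by_dict = slice_sketch(df, {"A": 14})
by_kwargs = slice_sketch(df, A=14)            # same selection as by_dict
overridden = slice_sketch(df, {"A": 15}, A=14)  # the keyword argument wins
unsliced = slice_sketch(df)                   # no conditions: full data frame
```

In this sketch by_dict, by_kwargs and overridden all select the two rows where A equals 14, while unsliced contains every row of df.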