mafw.processor_library.sns_plotter

Module implements a Seaborn plotter processor with a mixin structure to generate seaborn plots.

This module implements the abstract_plotter functionalities using seaborn and pandas.

These two packages are not part of the default MAFw installation; they are installed only if the user includes the optional seaborn feature.

Along with the SNSPlotter, it includes a set of standard data retrievers specific to pandas data frames.
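Because seaborn and pandas are optional, code that relies on this module may want to check that both are importable before using it. The following is a minimal sketch using only the standard library; the helper name missing_optional_packages is hypothetical and not part of MAFw.

```python
from importlib.util import find_spec


def missing_optional_packages(names=("seaborn", "pandas")):
    """Return the top-level package names that cannot be imported here.

    Hypothetical helper, not part of MAFw: it only asks the import system
    whether each package can be located, without importing it.
    """
    return [name for name in names if find_spec(name) is None]


missing = missing_optional_packages()
if missing:
    print("Missing optional packages:", ", ".join(missing))
else:
    print("seaborn and pandas are available")
```

Checking with find_spec avoids paying the import cost of the plotting stack when only the availability matters.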

Classes

CatPlot([x, y, hue, row, col, palette, ...])

The categorical plot mixin.

DisPlot([x, y, hue, row, col, palette, ...])

The distribution plot mixin.

FromDatasetDataRetriever([dataset_name])

A data retriever to get a dataframe from a seaborn dataset

HDFPdDataRetriever([hdf_filename, key])

Retrieve a data frame from an HDF file

LMPlot([x, y, hue, row, col, palette, ...])

The linear regression model plot mixin.

PdDataRetriever(*args, **kwargs)

The dataframe instance.

RelPlot([x, y, hue, row, col, palette, ...])

The relational plot mixin.

SNSFigurePlotter(*args, **kwargs)

Base mixin class to generate a seaborn Figure level plot

SNSPlotter(*args, **kwargs)

The Generic Plotter processor.

SQLPdDataRetriever([table_name, ...])

A specialized data retriever to get a data frame from a database table.

class mafw.processor_library.sns_plotter.CatPlot(x: str | bytes | date | datetime | timedelta | bool | complex | Timestamp | Timedelta | Iterable[float | complex | int] | None = None, y: str | bytes | date | datetime | timedelta | bool | complex | Timestamp | Timedelta | Iterable[float | complex | int] | None = None, hue: str | bytes | date | datetime | timedelta | bool | complex | Timestamp | Timedelta | Iterable[float | complex | int] | None = None, row: str | bytes | date | datetime | timedelta | bool | complex | Timestamp | Timedelta | Iterable[float | complex | int] | None = None, col: str | bytes | date | datetime | timedelta | bool | complex | Timestamp | Timedelta | Iterable[float | complex | int] | None = None, palette: str | Sequence[tuple[float, float, float] | str | tuple[float, float, float, float] | tuple[tuple[float, float, float] | str, float] | tuple[tuple[float, float, float, float], float]] | Mapping[Any, tuple[float, float, float] | str | tuple[float, float, float, float] | tuple[tuple[float, float, float] | str, float] | tuple[tuple[float, float, float, float], float]] | None = None, kind: Literal['strip', 'swarm', 'box', 'violin', 'boxen', 'point', 'bar', 'count'] = 'strip', legend: Literal['auto', 'brief', 'full'] | bool = 'auto', native_scale: bool = False, plot_kws: Mapping[str, Any] | None = None, facet_kws: dict[str, Any] | None = None, *args: Any, **kwargs: Any)[source]

Bases: SNSFigurePlotter

The categorical plot mixin.

This mixin will produce a figure level categorical plot as described here.

Constructor parameters:

Parameters:
  • x (str | Iterable, Optional) – The name of the x variable or an iterable containing the x values.

  • y (str | Iterable, Optional) – The name of the y variable or an iterable containing the y values.

  • hue (str | Iterable, Optional) – The name of the hue variable or an iterable containing the hue values.

  • row (str | Iterable, Optional) – The name of the row category or an iterable containing the row values.

  • col (str | Iterable, Optional) – The name of the column category or an iterable containing the column values.

  • palette (str, Optional) – The colour palette to be used.

  • kind (str, Optional) – The type of categorical plot (strip, swarm, box, violin, boxen, point, bar or count). Defaults to strip.

  • legend (str | bool, Optional) – How to draw the legend. If “brief”, numeric hue and size variables will be represented with a sample of evenly spaced values. If “full”, every group will get an entry in the legend. If “auto”, choose between brief or full representation based on number of levels. If False, no legend data is added and no legend is drawn. Defaults to auto.

  • native_scale (bool, Optional) – When True, numeric or datetime values on the categorical axis will maintain their original scaling rather than being converted to fixed indices. Defaults to False.

  • plot_kws (dict[str, Any], Optional) – A dictionary of keyword arguments passed to the underlying seaborn.catplot.

  • facet_kws (dict[str, Any], Optional) – A dictionary of keyword arguments passed to the underlying seaborn.FacetGrid.

plot() None[source]

Implements the plot method of a figure-level categorical graph.

class mafw.processor_library.sns_plotter.DisPlot(x: str | bytes | date | datetime | timedelta | bool | complex | Timestamp | Timedelta | Iterable[float | complex | int] | None = None, y: str | bytes | date | datetime | timedelta | bool | complex | Timestamp | Timedelta | Iterable[float | complex | int] | None = None, hue: str | bytes | date | datetime | timedelta | bool | complex | Timestamp | Timedelta | Iterable[float | complex | int] | None = None, row: str | bytes | date | datetime | timedelta | bool | complex | Timestamp | Timedelta | Iterable[float | complex | int] | None = None, col: str | bytes | date | datetime | timedelta | bool | complex | Timestamp | Timedelta | Iterable[float | complex | int] | None = None, palette: str | Sequence[tuple[float, float, float] | str | tuple[float, float, float, float] | tuple[tuple[float, float, float] | str, float] | tuple[tuple[float, float, float, float], float]] | Mapping[Any, tuple[float, float, float] | str | tuple[float, float, float, float] | tuple[tuple[float, float, float] | str, float] | tuple[tuple[float, float, float, float], float]] | Colormap | None = None, kind: Literal['hist', 'kde', 'ecdf'] = 'hist', legend: bool = True, rug: bool = False, rug_kws: dict[str, Any] | None = None, plot_kws: Mapping[str, Any] | None = None, facet_kws: dict[str, Any] | None = None, *args: Any, **kwargs: Any)[source]

Bases: SNSFigurePlotter

The distribution plot mixin.

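This mixin is the MAFw implementation of the seaborn displot and will produce one of the following figure level plots, selected with the kind parameter: a histogram (hist), a kernel density estimate (kde) or an empirical cumulative distribution function (ecdf).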

Constructor parameters:

Parameters:
  • x (str | Iterable, Optional) – The name of the x variable or an iterable containing the x values.

  • y (str | Iterable, Optional) – The name of the y variable or an iterable containing the y values.

  • hue (str | Iterable, Optional) – The name of the hue variable or an iterable containing the hue values.

  • row (str | Iterable, Optional) – The name of the row category or an iterable containing the row values.

  • col (str | Iterable, Optional) – The name of the column category or an iterable containing the column values.

  • palette (str | Colormap, Optional) – The colour palette to be used.

  • kind (str, Optional) – The type of distribution plot (hist, kde or ecdf). Defaults to hist.

  • legend (bool, Optional) – If false, suppress the legend for the semantic variables. Defaults to True.

  • rug (bool, Optional) – If true, show each observation with marginal ticks. Defaults to False.

  • rug_kws (Mapping[str, Any], Optional) – Parameters to control the appearance of the rug plot.

  • plot_kws (Mapping[str, Any], Optional) – Parameters passed to the underlying plotting object.

  • facet_kws (Mapping[str, Any], Optional) – Parameters passed to the facet grid object.

plot() None[source]

Implements the plot method for a figure-level distribution graph

class mafw.processor_library.sns_plotter.FromDatasetDataRetriever(dataset_name: str | None = None, *args: Any, **kwargs: Any)[source]

Bases: PdDataRetriever

A data retriever to get a dataframe from a seaborn dataset

The dataframe instance. It will be filled for the main class

_attributes_valid() bool[source]

Checks if the attributes of the mixin are all valid

get_data_frame() None[source]

Gets the data frame from the standard seaborn datasets

class mafw.processor_library.sns_plotter.HDFPdDataRetriever(hdf_filename: str | Path | None = None, key: str | None = None, *args: Any, **kwargs: Any)[source]

Bases: DataRetriever

Retrieve a data frame from an HDF file

This data retriever gets a dataframe from an HDF file, given the filename and the object key.

Constructor parameters:

Parameters:
  • hdf_filename (str | Path, Optional) – The filename of the HDF file

  • key (str, Optional) – The key of the HDF store with the dataframe

get_data_frame() None[source]

Retrieve the dataframe from an HDF file

Raises:

PlotterMixinNotInitialized – if some of the required attributes are not initialised or invalid.

patch_data_frame() None[source]

The mixin implementation of the shared method with the base class

class mafw.processor_library.sns_plotter.LMPlot(x: str | None = None, y: str | None = None, hue: str | None = None, row: str | None = None, col: str | None = None, palette: str | Sequence[tuple[float, float, float] | str | tuple[float, float, float, float] | tuple[tuple[float, float, float] | str, float] | tuple[tuple[float, float, float, float], float]] | Mapping[Any, tuple[float, float, float] | str | tuple[float, float, float, float] | tuple[tuple[float, float, float] | str, float] | tuple[tuple[float, float, float, float], float]] | None = None, legend: bool = True, scatter_kws: dict[str, Any] | None = None, line_kws: dict[str, Any] | None = None, facet_kws: dict[str, Any] | None = None, *args: Any, **kwargs: Any)[source]

Bases: SNSFigurePlotter

The linear regression model plot mixin.

This mixin will produce a figure level regression model as described here

Constructor parameters:

Parameters:
  • x (str, Optional) – The name of the x variable.

  • y (str, Optional) – The name of the y variable.

  • hue (str, Optional) – The name of the hue variable.

  • row (str, Optional) – The name of the row category.

  • col (str, Optional) – The name of the column category.

  • palette (str, Optional) – The colour palette to be used.

  • legend (bool, Optional) – If True and there is a hue variable, add a legend.

  • scatter_kws (dict[str, Any], Optional) – A dictionary of keyword arguments passed to the underlying scatter.

  • line_kws (dict[str, Any], Optional) – A dictionary of keyword arguments passed to the underlying plot.

  • facet_kws (dict[str, Any], Optional) – A dictionary of keyword arguments passed to the underlying seaborn.FacetGrid.

plot() None[source]

Implements the plot method for a figure-level regression model.

class mafw.processor_library.sns_plotter.PdDataRetriever(*args: Any, **kwargs: Any)[source]

Bases: DataRetriever

The dataframe instance. It will be filled for the main class

get_data_frame() None[source]

The mixin implementation of the shared method with the base class

patch_data_frame() None[source]

The mixin implementation of the shared method with the base class

class mafw.processor_library.sns_plotter.RelPlot(x: str | bytes | date | datetime | timedelta | bool | complex | Timestamp | Timedelta | Iterable[float | complex | int] | None = None, y: str | bytes | date | datetime | timedelta | bool | complex | Timestamp | Timedelta | Iterable[float | complex | int] | None = None, hue: str | bytes | date | datetime | timedelta | bool | complex | Timestamp | Timedelta | Iterable[float | complex | int] | None = None, row: str | bytes | date | datetime | timedelta | bool | complex | Timestamp | Timedelta | Iterable[float | complex | int] | None = None, col: str | bytes | date | datetime | timedelta | bool | complex | Timestamp | Timedelta | Iterable[float | complex | int] | None = None, palette: str | Sequence[tuple[float, float, float] | str | tuple[float, float, float, float] | tuple[tuple[float, float, float] | str, float] | tuple[tuple[float, float, float, float], float]] | Mapping[Any, tuple[float, float, float] | str | tuple[float, float, float, float] | tuple[tuple[float, float, float] | str, float] | tuple[tuple[float, float, float, float], float]] | Colormap | None = None, kind: Literal['scatter', 'line'] = 'scatter', legend: Literal['auto', 'brief', 'full'] | bool = 'auto', plot_kws: Mapping[str, Any] | None = None, facet_kws: dict[str, Any] | None = None, *args: Any, **kwargs: Any)[source]

Bases: SNSFigurePlotter

The relational plot mixin.

This mixin will produce either a scatter or a line figure level plot.

The full documentation of the relplot object can be read at this link.

Constructor parameters:

Parameters:
  • x (str | Iterable, Optional) – The name of the x variable or an iterable containing the x values.

  • y (str | Iterable, Optional) – The name of the y variable or an iterable containing the y values.

  • hue (str | Iterable, Optional) – The name of the hue variable or an iterable containing the hue values.

  • row (str | Iterable, Optional) – The name of the row category or an iterable containing the row values.

  • col (str | Iterable, Optional) – The name of the column category or an iterable containing the column values.

  • palette (str | Colormap, Optional) – The colour palette to be used.

  • kind (str, Optional) – The type of relational plot (scatter or line). Defaults to scatter.

  • legend (str | bool, Optional) – How to draw the legend. If “brief”, numeric hue and size variables will be represented with a sample of evenly spaced values. If “full”, every group will get an entry in the legend. If “auto”, choose between brief or full representation based on number of levels. If False, no legend data is added and no legend is drawn. Defaults to auto.

  • plot_kws (dict[str, Any], Optional) – A dictionary of keyword arguments passed to the underlying seaborn.relplot.

  • facet_kws (dict[str, Any], Optional) – A dictionary of keyword arguments passed to the underlying seaborn.FacetGrid.

plot() None[source]

Implements the plot method of a figure-level relational graph.

class mafw.processor_library.sns_plotter.SNSFigurePlotter(*args: Any, **kwargs: Any)[source]

Bases: FigurePlotter

Base mixin class to generate a seaborn Figure level plot

data_frame: DataFrame

The dataframe instance shared with the main class

facet_grid: FacetGrid

The facet grid instance shared with the main class
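The attributes above are described as shared with the main class. A minimal sketch of how that sharing can work through multiple inheritance is given below. All class names are hypothetical stand-ins written with the standard library only; plain lists take the place of the pandas DataFrame and the seaborn FacetGrid.

```python
class MainPlotter:
    """Stand-in for the main processor class (hypothetical, not SNSPlotter)."""

    def __init__(self):
        self.data_frame = None  # filled by a data retriever mixin
        self.facet_grid = None  # filled by a figure plotter mixin

    def process(self):
        # The main class drives the workflow; the mixins supply the steps.
        self.get_data_frame()
        self.plot()


class RowsRetriever:
    """Stand-in data retriever mixin: stores the data on the shared attribute."""

    def get_data_frame(self):
        self.data_frame = [{"x": 1, "y": 2}, {"x": 2, "y": 4}]


class TextFigure:
    """Stand-in figure mixin: reads data_frame and stores its result in facet_grid."""

    def plot(self):
        self.facet_grid = [f"({row['x']}, {row['y']})" for row in self.data_frame]


class MyPlotter(RowsRetriever, TextFigure, MainPlotter):
    """Mixins are listed first so that their methods come first in the MRO."""


plotter = MyPlotter()
plotter.process()
print(plotter.facet_grid)  # ['(1, 2)', '(2, 4)']
```

Because every class operates on the same instance, an attribute assigned in one mixin is immediately visible to the others, which is why the mixins only need to agree on attribute names.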

class mafw.processor_library.sns_plotter.SNSPlotter(*args: Any, **kwargs: Any)[source]

Bases: GenericPlotter

The Generic Plotter processor.

This is a subclass of a Processor with advanced functionality to fetch data in the form of a dataframe and to produce plots.

The key difference with respect to a normal processor is its process() method, which is already implemented as follows:

def process(self) -> None:
    """
    Specific implementation of the process method for the Seaborn plotter.

    It is almost the same as the GenericProcessor, with the addition that all open pyplot figures are closed
    after the process is finished.

    This part cannot be moved upward to the :class:`~.GenericPlotter` because the user might have selected
    another plotting library different from :link:`matplotlib`.
    """
    super().process()
    if not self.is_data_frame_empty():
        plt.close('all')

This means that when you subclass SNSPlotter you do not have to implement the process method as you would for a normal Processor. Instead, you implement the following methods:

  • in_loop_customization().

    The processor execution workflow (LoopType) can be any of the available ones, so the process method might be invoked only once, or multiple times inside a loop structure (for or while). If the execution is cyclic, you may want to customise each iteration, for example by changing the plot title, modifying the data selection, or changing the filename where the plots will be saved.

    You can also use this method in a single loop processor, but in that case you will not have access to the loop parameters.

  • get_data_frame().

    This method retrieves the data to be plotted in the form of a pandas DataFrame. The processor has the data_frame attribute, where the data are stored to make them accessible from all other methods.

  • patch_data_frame().

    A convenient method to apply data frame manipulation to the data just retrieved.

  • plot().

    This method is where the actual plotting occurs. Use the data_frame to plot the quantities you want.

  • customize_plot().

    This method can optionally be used to customize the appearance of the facet grid produced by the plot() method. It is particularly useful when the user is mixing this class with one of the FigurePlotter mixins, thus not having direct access to the plot method.

  • save().

    This method is where the produced plot is saved in a file. Remember to append the output file name to the list of produced outputs so that the _update_plotter_db() method will automatically store this file in the database during the finish() execution.

  • update_db().

    If the user wants to update a specific table in the database, they can use this method.

    Note that all plotters record every generated file in the standard table PlotterOutput. This is done automatically by the _update_plotter_db() method, which is called in the finish() method.

You do not need to overload the slice_data_frame() nor the group_and_aggregate_data_frame() methods, but you can simply use them by setting the slicing_dict and the grouping_columns and the aggregation_functions.
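The division of labour between process() and the overridable methods can be sketched as a template method. The class below is a hypothetical stand-in written with the standard library only; it is not the GenericPlotter implementation, and the hook order simply follows the order of the list above, which is an assumption. Nothing is written to disk.

```python
from pathlib import Path


class WorkflowSketch:
    """Hypothetical stand-in that calls the user hooks in a fixed order."""

    def __init__(self, items, output_folder="."):
        self.items = list(items)
        self.item = None
        self.output_folder = Path(output_folder)
        self.data_frame = None
        self.output_filename_list = []
        self.calls = []  # records hook names, for illustration only

    def run(self):
        # Emulates a cyclic execution workflow: process() runs once per item.
        for item in self.items:
            self.item = item
            self.process()

    def process(self):
        # ASSUMPTION: this order mirrors the documentation list, not the source.
        hooks = (self.in_loop_customization, self.get_data_frame,
                 self.patch_data_frame, self.plot, self.customize_plot,
                 self.save, self.update_db)
        for hook in hooks:
            self.calls.append(hook.__name__)
            hook()

    # Default hooks do nothing; subclasses override the ones they need.
    def in_loop_customization(self): pass
    def get_data_frame(self): pass
    def patch_data_frame(self): pass
    def plot(self): pass
    def customize_plot(self): pass
    def save(self): pass
    def update_db(self): pass


class PerItemPlot(WorkflowSketch):
    def in_loop_customization(self):
        # Change the output filename at every iteration.
        self.filename = self.output_folder / f"plot_{self.item}.png"

    def get_data_frame(self):
        # Assign to the attribute instead of returning the data.
        self.data_frame = [{"item": self.item, "value": len(self.item)}]

    def save(self):
        # Register the output so that it can be tracked later on.
        self.output_filename_list.append(self.filename)


sketch = PerItemPlot(["a", "bb"], output_folder="out")
sketch.run()
print([f.name for f in sketch.output_filename_list])  # ['plot_a.png', 'plot_bb.png']
```

The subclass never touches process(); it only fills in the steps, which is the pattern the method list describes.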

The processor comes with two processor parameters that can be used by user-defined subclasses:

  1. The output_folder, which is the path where the output file will be saved.

  2. The force_replot flag, to be used when the user wants the plot to be regenerated even if the output file already exists.

The default value of these parameters can be changed using the Processor.new_defaults dictionary as shown in this example.

Processor parameters

  • force_replot: Whether to force re-plotting even if the output file already exists (default: False)

  • output_folder: The path where the output file will be saved (default: PosixPath(‘/builds/kada/mafw’))

Constructor parameters:

Parameters:
  • slicing_dict (dict[str, Any], Optional) – A dictionary with key, value pairs to slice the input data frame before the plotting occurs.

  • grouping_columns (list[str], Optional) – A list of columns for the groupby operation on the data frame.

  • aggregation_functions (list[str | Callable[[Any], Any]], Optional) – A list of functions for the aggregation on the grouped data frame.

  • matplotlib_backend (str, Optional) – The name of the matplotlib backend to be used. Defaults to ‘Agg’

  • output_folder (Path, Optional) – The path where the output file will be saved

  • force_replot (bool, Optional) – Whether to force re-plotting even if the output file already exists.

get_data_frame() None[source]

Specific implementation of the get data frame for the Seaborn plotter.

It must be overloaded.

The method does NOT return the data frame; your implementation must assign the data frame to the class data_frame attribute.

group_and_aggregate_data_frame() None[source]

Performs groupby and aggregation of the data frame.

If the user provided some grouping columns and aggregation functions then the group_and_aggregate_data_frame() is invoked accordingly.

The user can update the values of those attributes during each cycle iteration within the implementation of the in_loop_customization().

See also

This method is simply invoking the group_and_aggregate_data_frame() function from the pandas_tools.
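To illustrate what grouping columns and aggregation functions do, here is a minimal plain-Python sketch that works on a list of dictionaries. It is not the pandas_tools implementation: the function name, the value_column argument and the naming of the output keys are assumptions made for the example, and only callables (not string names) are accepted.

```python
from collections import defaultdict
from statistics import mean


def group_and_aggregate(rows, grouping_columns, value_column, aggregation_functions):
    """Group rows by the given columns and aggregate one value column.

    Hypothetical helper for illustration; the real method acts on a pandas
    DataFrame and takes no arguments.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in grouping_columns)
        groups[key].append(row[value_column])

    result = []
    for key, values in groups.items():
        out = dict(zip(grouping_columns, key))
        for func in aggregation_functions:
            # One output entry per aggregation function, e.g. "mass_mean".
            out[f"{value_column}_{func.__name__}"] = func(values)
        result.append(out)
    return result


rows = [
    {"species": "a", "mass": 1.0},
    {"species": "a", "mass": 3.0},
    {"species": "b", "mass": 5.0},
]
aggregated = group_and_aggregate(rows, ["species"], "mass", [min, max, mean])
for line in aggregated:
    print(line)
```

Each group collapses to a single row with one entry per aggregation function, which is the shape a plot of aggregated quantities would then consume.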

is_data_frame_empty() bool[source]

Check if the data frame is empty

process() None[source]

Specific implementation of the process method for the Seaborn plotter.

It is almost the same as the GenericProcessor, with the addition that all open pyplot figures are closed after the process is finished.

This part cannot be moved upward to the GenericPlotter because the user might have selected another plotting library different from matplotlib.

slice_data_frame() None[source]

Perform data frame slicing

The user can set some slicing criteria in the slicing_dict to select some specific data subset. The values of the slicing dict can be changed during each iteration within the implementation of the in_loop_customization().

See also

This method is simply invoking the slice_data_frame() function from the pandas_tools.
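The effect of a slicing dictionary that changes at each iteration can be sketched in plain Python. The helper below is hypothetical and assumes that each key is matched by equality and that all conditions are combined with AND; the actual pandas_tools function may support richer selections.

```python
def slice_rows(rows, slicing_dict):
    """Keep only the rows whose columns equal the values in slicing_dict.

    Hypothetical helper: equality matching combined with AND is an
    assumption made for this sketch.
    """
    if not slicing_dict:
        return list(rows)
    return [
        row for row in rows
        if all(row.get(col) == val for col, val in slicing_dict.items())
    ]


rows = [
    {"run": 1, "channel": "A", "value": 0.1},
    {"run": 1, "channel": "B", "value": 0.2},
    {"run": 2, "channel": "A", "value": 0.3},
]

slicing_dict = {"channel": "A"}
counts = {}
for run in (1, 2):
    # In a plotter this update would live in in_loop_customization().
    slicing_dict["run"] = run
    counts[run] = len(slice_rows(rows, slicing_dict))

print(counts)  # {1: 1, 2: 1}
```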

start() None[source]

Overload of the start method.

The SNSPlotter is overloading the start() in order to allow the user to change the matplotlib backend.

The user can select which backend to use either directly in the class constructor or by assigning it to the class attribute matplotlib_backend.

_config: dict[str, Any]

A dictionary containing the processor configuration object.

This dictionary is populated with configuration parameters (always type 2) during the _load_parameter_configuration() method.

The original value of the configuration dictionary that is passed to the constructor is stored in _orig_config.

Changed in version v2.0.0: Now it is an empty dictionary until the _load_parameter_configuration() is called.

_processor_parameters: dict[str, PassiveParameter[ParameterType]]

A dictionary to store all the processor parameter instances.

The name of the parameter is used as a key, while for the value an instance of the PassiveParameter is used.

aggregation_functions: Iterable[str | Callable[[Any], Any]] | None

The list of aggregation functions to be applied to the grouped dataframe

data_frame: pd.DataFrame

The pandas DataFrame containing the data to be plotted.

facet_grid: sns.FacetGrid | None

The reference to the facet grid.

filter_register: mafw.db.db_filter.ProcessorFilter

The DB filter register of the Processor.

grouping_columns: Iterable[str] | None

The list of columns for grouping the data frame

item: Any

The current item of the loop.

loop_type: LoopType

The loop type.

The value of this parameter can also be changed by the execution_workflow() decorator factory.

See LoopType for more details.

matplotlib_backend: str

The backend to be used for matplotlib.

output_filename_list: list[Path]

The list of produced filenames.

remove_orphan_files: bool

The flag to remove or protect the orphan files. Defaults to True

slicing_dict: MutableMapping[str, Any] | None

The dictionary for slicing the input data frame

class mafw.processor_library.sns_plotter.SQLPdDataRetriever(table_name: str | None = None, required_cols: Iterable[str] | str | None = None, where_clause: str | None = None, *args: Any, **kwargs: Any)[source]

Bases: PdDataRetriever

A specialized data retriever to get a data frame from a database table.

The idea is to implement an interface to the pandas read_sql. The user has to provide the table name, the list of required columns and an optional where clause.

Constructor parameters:

Parameters:
  • table_name (str, Optional) – The name of the table from where to get the data

  • required_cols (Iterable[str] | str | None, Optional) – A list of columns to be selected from the table and transferred as columns in the dataframe.

  • where_clause (str, Optional) – The where clause used in the select SQL statement. If None is provided, then all rows will be selected.
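How the three constructor arguments might map onto a SELECT statement can be sketched with the standard library alone. The build_select helper is hypothetical, the treatment of a missing column list as "*" is an assumption, and sqlite3 stands in for both the MAFw database and pandas read_sql.

```python
import sqlite3


def build_select(table_name, required_cols=None, where_clause=None):
    """Compose a SELECT statement from the three retriever arguments.

    Hypothetical helper. The values are interpolated as-is, so they must
    come from trusted code and never from unchecked user input.
    """
    if not table_name:
        # Stand-in for the retriever's own validation error.
        raise ValueError("table_name is required")
    if required_cols is None:
        columns = "*"  # ASSUMPTION: no column list means every column
    elif isinstance(required_cols, str):
        columns = required_cols
    else:
        columns = ", ".join(required_cols)
    sql = f"SELECT {columns} FROM {table_name}"
    if where_clause:
        sql += f" WHERE {where_clause}"
    return sql


# In-memory table used only to show the statement running.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurement (run INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?)", [(1, 0.5), (2, 1.5), (2, 2.5)]
)

sql = build_select("measurement", ["run", "value"], "run = 2")
rows = conn.execute(sql).fetchall()
conn.close()

print(sql)  # SELECT run, value FROM measurement WHERE run = 2
print(rows)
```

When where_clause is None the WHERE part is omitted entirely, which matches the documented behaviour of selecting all rows.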

_attributes_valid() bool[source]

Check if all required parameters are provided and valid.

get_data_frame() None[source]

Retrieve the dataframe from a database table.

Raises:

PlotterMixinNotInitialized – If some of the required attributes are missing.

database: Database

The database instance. It comes from the main class

required_columns: Iterable[str]

The iterable of columns.

Those are the column names to be selected from the table_name and included in the dataframe.

table_name: str

The table from where the data should be taken.

where_clause: str

The where clause of the SQL statement