mafw.processor_library.sns_plotter

This module provides a Seaborn plotter processor with a mixin structure for generating seaborn plots.

It implements the abstract_plotter functionality using seaborn and pandas.

These two packages are not part of the default MAFw installation; they are installed only if the user includes the optional seaborn feature.

Along with the SNSPlotter, it includes a set of standard data retrievers specific to pandas data frames.
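The mixin structure relies on Python's cooperative multiple inheritance: a concrete plotter combines a figure mixin (supplying plot()), a data retriever mixin (supplying get_data_frame()) and the processor base. The stdlib-only toy below is a sketch of how the method resolution order picks each hook. All class names are hypothetical stand-ins, not the MAFw classes, and listing the mixins before the base is an assumption about typical usage.

```python
from __future__ import annotations

from typing import Any


class ToyPlotterBase:
    """Hypothetical stand-in for the processor base: defines the hooks and the workflow."""

    def __init__(self, **kwargs: Any) -> None:
        # Any keyword arguments left over by the mixins end up here and are ignored.
        self.data_frame: list[dict[str, Any]] = []
        self.calls: list[str] = []

    def get_data_frame(self) -> None:
        raise NotImplementedError('get_data_frame must come from a subclass or a mixin')

    def plot(self) -> None:
        raise NotImplementedError('plot must come from a subclass or a mixin')

    def process(self) -> None:
        self.get_data_frame()
        self.plot()


class ToyRetrieverMixin:
    """Hypothetical data retriever mixin: it only knows how to fetch the data."""

    def __init__(self, rows: list[dict[str, Any]] | None = None, **kwargs: Any) -> None:
        self._rows = rows or []
        super().__init__(**kwargs)  # cooperative call: hand the rest down the MRO

    def get_data_frame(self) -> None:
        self.data_frame = list(self._rows)
        self.calls.append('get_data_frame')


class ToyFigureMixin:
    """Hypothetical figure-level mixin: it only knows how to plot."""

    def __init__(self, x: str | None = None, y: str | None = None, **kwargs: Any) -> None:
        self.x, self.y = x, y
        super().__init__(**kwargs)

    def plot(self) -> None:
        self.calls.append(f'plot({self.x} vs {self.y}, {len(self.data_frame)} rows)')


# Mixins are listed before the base so their methods override the abstract hooks.
class ToyPlotter(ToyFigureMixin, ToyRetrieverMixin, ToyPlotterBase):
    pass


plotter = ToyPlotter(x='day', y='tip', rows=[{'day': 'Thu', 'tip': 2.0}, {'day': 'Fri', 'tip': 3.5}])
plotter.process()
print(plotter.calls)  # ['get_data_frame', 'plot(day vs tip, 2 rows)']
```

Each mixin consumes only its own keyword arguments and forwards the rest with super(), which is what lets independent mixins be stacked in any combination.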

Classes

`CatPlot`([x, y, hue, row, col, palette, ...])	The categorical plot mixin.
`DisPlot`([x, y, hue, row, col, palette, ...])	The distribution plot mixin.
`FromDatasetDataRetriever`([dataset_name])	A data retriever to get a dataframe from a seaborn dataset
`HDFPdDataRetriever`([hdf_filename, key])	Retrieve a data frame from a HDF file
`LMPlot`([x, y, hue, row, col, palette, ...])	The linear regression model plot mixin.
`PdDataRetriever`(*args, **kwargs)	A data retriever mixin for pandas data frames.
`RelPlot`([x, y, hue, row, col, palette, ...])	The relational plot mixin.
`SNSFigurePlotter`(*args, **kwargs)	Base mixin class to generate a seaborn Figure level plot
`SNSPlotter`(*args, **kwargs)	The Generic Plotter processor.
`SQLPdDataRetriever`([table_name, ...])	A specialized data retriever to get a data frame from a database table.

class mafw.processor_library.sns_plotter.CatPlot

Bases: SNSFigurePlotter

The categorical plot mixin.

This mixin will produce a figure-level categorical plot as described in the seaborn.catplot documentation.

Constructor parameters:

Parameters:

x (str | Iterable, Optional) – The name of the x variable or an iterable containing the x values.
y (str | Iterable, Optional) – The name of the y variable or an iterable containing the y values.
hue (str | Iterable, Optional) – The name of the hue variable or an iterable containing the hue values.
row (str | Iterable, Optional) – The name of the row category or an iterable containing the row values.
col (str | Iterable, Optional) – The name of the column category or an iterable containing the column values.
palette (str, Optional) – The colour palette to be used.
kind (str, Optional) – The type of categorical plot (strip, swarm, box, violin, boxen, point, bar or count). Defaults to strip.
legend (str | bool, Optional) – How to draw the legend. If “brief”, numeric hue and size variables will be represented with a sample of evenly spaced values. If “full”, every group will get an entry in the legend. If “auto”, choose between brief or full representation based on number of levels. If False, no legend data is added and no legend is drawn. Defaults to auto.
native_scale (bool, Optional) – When True, numeric or datetime values on the categorical axis will maintain their original scaling rather than being converted to fixed indices. Defaults to False.
plot_kws (dict[str, Any], Optional) – A dictionary of keyword arguments passed to the underlying seaborn.catplot.
facet_kws (dict[str, Any], Optional) – A dictionary of keyword arguments passed to the underlying seaborn.FacetGrid.

plot() → None[source]: Implements the plot method of a figure-level categorical graph.

class mafw.processor_library.sns_plotter.DisPlot

Bases: SNSFigurePlotter

The distribution plot mixin.

This mixin is the MAFw implementation of seaborn.displot and will produce one of the following figure-level plots:

histplot: a simple histogram plot
kdeplot: a kernel density estimate plot
ecdfplot: an empirical cumulative distribution function plot
rugplot: a plot of the marginal distributions as ticks.

Constructor parameters:

Parameters:

x (str | Iterable, Optional) – The name of the x variable or an iterable containing the x values.
y (str | Iterable, Optional) – The name of the y variable or an iterable containing the y values.
hue (str | Iterable, Optional) – The name of the hue variable or an iterable containing the hue values.
row (str | Iterable, Optional) – The name of the row category or an iterable containing the row values.
col (str | Iterable, Optional) – The name of the column category or an iterable containing the column values.
palette (str | Colormap, Optional) – The colour palette to be used.
kind (str, Optional) – The type of distribution plot (hist, kde or ecdf). Defaults to hist.
legend (bool, Optional) – If False, suppress the legend for the semantic variables. Defaults to True.
rug (bool, Optional) – If True, show each observation with marginal ticks. Defaults to False.
rug_kws (Mapping[str, Any], Optional) – Parameters to control the appearance of the rug plot.
plot_kws (Mapping[str, Any], Optional) – Parameters passed to the underlying plotting object.
facet_kws (Mapping[str, Any], Optional) – Parameters passed to the facet grid object.

plot() → None[source]: Implements the plot method for a figure-level distribution graph

class mafw.processor_library.sns_plotter.FromDatasetDataRetriever(dataset_name: str | None = None, *args: Any, **kwargs: Any)

Bases: PdDataRetriever

A data retriever to get a dataframe from a seaborn dataset.

data_frame: DataFrame: The dataframe instance. It will be filled in for the main class.

_attributes_valid() → bool: Checks if the attributes of the mixin are all valid.

get_data_frame() → None[source]: Gets the data frame from the standard seaborn datasets

class mafw.processor_library.sns_plotter.HDFPdDataRetriever(hdf_filename: str | Path | None = None, key: str | None = None, *args: Any, **kwargs: Any)

Bases: DataRetriever

Retrieve a data frame from an HDF file.

This data retriever gets a dataframe from an HDF file, given the filename and the key of the stored object.

Constructor parameters:

Parameters:

hdf_filename (str | Path, Optional) – The filename of the HDF file
key (str, Optional) – The key of the HDF store with the dataframe

get_data_frame() → None

Retrieve the dataframe from an HDF file.

Raises: PlotterMixinNotInitialized – if some of the required attributes are not initialised or are invalid.

patch_data_frame() → None: The mixin implementation of the method shared with the base class.
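The validation step behind get_data_frame() can be sketched as follows. This is a hypothetical toy, not the MAFw implementation: the exception is a local stand-in with the same name as the MAFw one, and the sketch only checks that both the filename and the key are set. A real retriever would then read the frame, for instance with pandas.read_hdf(hdf_filename, key=key), and assign it to data_frame.

```python
from __future__ import annotations

from pathlib import Path


class PlotterMixinNotInitialized(Exception):
    """Local stand-in for the MAFw exception of the same name."""


class ToyHDFRetriever:
    """Hypothetical sketch of the validation step; it does not read any file."""

    def __init__(self, hdf_filename: str | Path | None = None, key: str | None = None) -> None:
        self.hdf_filename = Path(hdf_filename) if hdf_filename is not None else None
        self.key = key

    def _attributes_valid(self) -> bool:
        # Both the filename and the key are needed to locate the stored data frame.
        return self.hdf_filename is not None and bool(self.key)

    def get_data_frame(self) -> None:
        if not self._attributes_valid():
            raise PlotterMixinNotInitialized('hdf_filename and key must both be set')
        # A real implementation would call pandas.read_hdf(self.hdf_filename, key=self.key)
        # here and assign the result to self.data_frame.


try:
    ToyHDFRetriever(hdf_filename='data.h5').get_data_frame()
except PlotterMixinNotInitialized as exc:
    print(f'not initialised: {exc}')
```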

class mafw.processor_library.sns_plotter.LMPlot

Bases: SNSFigurePlotter

The linear regression model plot mixin.

This mixin will produce a figure-level regression model plot as described in the seaborn.lmplot documentation.

Constructor parameters:

Parameters:

x (str, Optional) – The name of the x variable.
y (str, Optional) – The name of the y variable.
hue (str, Optional) – The name of the hue variable.
row (str, Optional) – The name of the row category.
col (str, Optional) – The name of the column category.
palette (str, Optional) – The colour palette to be used.
legend (bool, Optional) – If True and there is a hue variable, add a legend.
scatter_kws (dict[str, Any], Optional) – A dictionary of keyword arguments passed to the underlying plt.scatter call.
line_kws (dict[str, Any], Optional) – A dictionary of keyword arguments passed to the underlying plt.plot call.
facet_kws (dict[str, Any], Optional) – A dictionary of keyword arguments passed to the underlying seaborn.FacetGrid.

plot() → None[source]: Implements the plot method for a figure-level regression model.

class mafw.processor_library.sns_plotter.PdDataRetriever(*args: Any, **kwargs: Any)

Bases: DataRetriever

data_frame: DataFrame: The dataframe instance. It will be filled in for the main class.

get_data_frame() → None[source]: The mixin implementation of the shared method with the base class

patch_data_frame() → None[source]: The mixin implementation of the shared method with the base class

class mafw.processor_library.sns_plotter.RelPlot

Bases: SNSFigurePlotter

The relational plot mixin.

This mixin will produce either a scatter or a line figure-level plot.

The full documentation of the underlying seaborn.relplot function is available in the seaborn documentation.

Constructor parameters:

Parameters:

x (str | Iterable, Optional) – The name of the x variable or an iterable containing the x values.
y (str | Iterable, Optional) – The name of the y variable or an iterable containing the y values.
hue (str | Iterable, Optional) – The name of the hue variable or an iterable containing the hue values.
row (str | Iterable, Optional) – The name of the row category or an iterable containing the row values.
col (str | Iterable, Optional) – The name of the column category or an iterable containing the column values.
palette (str | Colormap, Optional) – The colour palette to be used.
kind (str, Optional) – The type of relational plot (scatter or line). Defaults to scatter.
legend (str | bool, Optional) – How to draw the legend. If “brief”, numeric hue and size variables will be represented with a sample of evenly spaced values. If “full”, every group will get an entry in the legend. If “auto”, choose between brief or full representation based on number of levels. If False, no legend data is added and no legend is drawn. Defaults to auto.
plot_kws (dict[str, Any], Optional) – A dictionary of keyword arguments passed to the underlying seaborn.relplot.
facet_kws (dict[str, Any], Optional) – A dictionary of keyword arguments passed to the underlying seaborn.FacetGrid.

plot() → None[source]: Implements the plot method of a figure-level relational graph.

class mafw.processor_library.sns_plotter.SNSFigurePlotter(*args: Any, **kwargs: Any)

Bases: FigurePlotter

Base mixin class to generate a seaborn figure-level plot.

data_frame: DataFrame: The dataframe instance shared with the main class

facet_grid: FacetGrid: The facet grid instance shared with the main class

class mafw.processor_library.sns_plotter.SNSPlotter(*args: Any, **kwargs: Any)

Bases: GenericPlotter

The Generic Plotter processor.

This is a subclass of a Processor with advanced functionality to fetch data in the form of a dataframe and to produce plots.

The key difference with respect to a normal processor is its process() method, which is already implemented as follows:

def process(self) -> None:
    """
    Specific implementation of the process method for the Seaborn plotter.

    It is almost the same as the GenericProcessor, with the addition that all open pyplot figures are closed
    after the process is finished.

    This part cannot be moved upward to the :class:`~.GenericPlotter` because the user might have selected
    another plotting library different from :link:`matplotlib`.
    """
    super().process()
    if not self.is_data_frame_empty():
        plt.close('all')

This means that when subclassing SNSPlotter you do not have to implement the process method as you would for a normal Processor; instead, you implement the following methods:

in_loop_customization().

The processor execution workflow (LoopType) can be any of the available ones, so the process method may be invoked only once or multiple times inside a loop structure (for or while). If the execution is cyclic, you may want to customise each iteration, for example by changing the plot title, modifying the data selection, or changing the filename where the plots will be saved.

You can also use this method with a single-loop processor; in that case you will not have access to the loop parameters.
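A per-iteration customisation might look like the sketch below. It is a hypothetical toy: the attribute names item, title and output_filename are illustrative assumptions, not the MAFw API, and here the loop item is set by hand rather than by a real loop.

```python
from __future__ import annotations

from pathlib import Path


class ToyLoopingPlotter:
    """Hypothetical plotter customising its title and output file at every iteration."""

    def __init__(self, output_folder: Path) -> None:
        self.output_folder = output_folder
        self.item: str | None = None  # current loop item; stays None in a single-loop run
        self.title = ''
        self.output_filename = output_folder / 'plot.png'

    def in_loop_customization(self) -> None:
        if self.item is None:
            # Single-loop execution: there are no loop parameters to use.
            self.title = 'All samples'
            return
        self.title = f'Sample {self.item}'
        self.output_filename = self.output_folder / f'plot_{self.item}.png'


plotter = ToyLoopingPlotter(Path('plots'))
for item in ['A', 'B']:
    plotter.item = item  # a real loop would set this; here it is done by hand
    plotter.in_loop_customization()
    print(plotter.title, plotter.output_filename.as_posix())
# Sample A plots/plot_A.png
# Sample B plots/plot_B.png
```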

get_data_frame().

This method retrieves the data to be plotted as a pandas DataFrame. The processor stores the data in its data_frame attribute, so that they are accessible from all other methods.

patch_data_frame().

A convenience method to apply data frame manipulations to the data just retrieved.

plot().

This method is where the actual plotting occurs. Use the data_frame to plot the quantities you want.

customize_plot().

This method can optionally be used to customize the appearance of the facet grid produced by the plot() method. It is particularly useful when the user mixes this class with one of the FigurePlotter mixins and therefore has no direct access to the plot method.

save().

This method is where the produced plot is saved to a file. Remember to append the output file name to the list of produced outputs, so that the _update_plotter_db() method automatically stores this file in the database when finish() is executed.

update_db().

If the user wants to update a specific table in the database, they can use this method.

Note that all plotters record every generated file in the standard PlotterOutput table. This is done automatically by the _update_plotter_db() method, which is called in the finish() method.
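The division of labour between the base class and a user subclass can be illustrated with the stdlib-only toy below. It is a sketch under stated assumptions: the hook order in process() follows the order of the list above plus the slicing and grouping steps, which is an assumption about the real implementation; output_filename_list and registered_outputs are hypothetical names, and finish() only stands in for _update_plotter_db().

```python
from __future__ import annotations

from typing import Any


class ToyGenericPlotter:
    """Hypothetical base class running the hooks in an assumed order."""

    def __init__(self) -> None:
        self.data_frame: list[dict[str, Any]] = []
        self.output_filename_list: list[str] = []  # files produced by save()
        self.registered_outputs: list[str] = []  # what the database update would store
        self.calls: list[str] = []

    # Hooks for subclasses; they do nothing by default.
    def in_loop_customization(self) -> None: ...
    def get_data_frame(self) -> None: ...
    def patch_data_frame(self) -> None: ...
    def slice_data_frame(self) -> None: ...
    def group_and_aggregate_data_frame(self) -> None: ...
    def plot(self) -> None: ...
    def customize_plot(self) -> None: ...
    def save(self) -> None: ...
    def update_db(self) -> None: ...

    def is_data_frame_empty(self) -> bool:
        return len(self.data_frame) == 0

    def process(self) -> None:
        self.in_loop_customization()
        self.get_data_frame()
        self.patch_data_frame()
        self.slice_data_frame()
        self.group_and_aggregate_data_frame()
        if not self.is_data_frame_empty():  # nothing to plot for an empty frame
            self.plot()
            self.customize_plot()
            self.save()
            self.update_db()

    def finish(self) -> None:
        # Stand-in for _update_plotter_db(): register every file recorded by save().
        self.registered_outputs.extend(self.output_filename_list)


class TipsPlotter(ToyGenericPlotter):
    """Hypothetical user subclass: only the hooks it needs are overridden."""

    def get_data_frame(self) -> None:
        # Assign the data to the attribute instead of returning it.
        self.data_frame = [{'day': 'Thu', 'tip': 2.0}, {'day': 'Fri', 'tip': 3.5}]

    def patch_data_frame(self) -> None:
        for row in self.data_frame:
            row['tip_cents'] = round(row['tip'] * 100)

    def plot(self) -> None:
        self.calls.append(f'plot {len(self.data_frame)} rows')

    def save(self) -> None:
        self.output_filename_list.append('tips.png')


plotter = TipsPlotter()
plotter.process()
plotter.finish()
print(plotter.calls, plotter.registered_outputs)  # ['plot 2 rows'] ['tips.png']
```

A subclass overrides only the hooks it needs; everything else falls back to the no-op defaults, and an empty data frame skips plotting entirely.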

You need to overload neither slice_data_frame() nor group_and_aggregate_data_frame(); you can simply use them by setting slicing_dict, grouping_columns and aggregation_functions.
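Assuming that slicing keeps only the rows whose columns equal every value in slicing_dict (the pandas analogue would be boolean indexing such as df[(df['day'] == 'Fri') & (df['smoker'] == 'No')]), the idea can be sketched on a list of dictionaries. The helper name slice_rows is hypothetical and the equality semantics are an assumption, not a description of MAFw's slice_data_frame().

```python
from __future__ import annotations

from typing import Any


def slice_rows(rows: list[dict[str, Any]], slicing_dict: dict[str, Any]) -> list[dict[str, Any]]:
    """Keep the rows whose columns equal every value in slicing_dict (hypothetical helper)."""
    # An empty slicing_dict keeps every row, because all() of an empty sequence is True.
    return [row for row in rows if all(row.get(col) == value for col, value in slicing_dict.items())]


rows = [
    {'day': 'Thu', 'smoker': 'No', 'tip': 2.0},
    {'day': 'Fri', 'smoker': 'No', 'tip': 3.5},
    {'day': 'Fri', 'smoker': 'Yes', 'tip': 1.0},
]
print(slice_rows(rows, {'day': 'Fri', 'smoker': 'No'}))
# [{'day': 'Fri', 'smoker': 'No', 'tip': 3.5}]
```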

The processor comes with two processor parameters that can be used by user-defined subclasses:

The output_folder, that is, the path where the output file will be saved.

The force_replot flag, to be set when the user wants the plot to be regenerated even if the output file already exists.

The default value of these parameters can be changed using the Processor.new_defaults dictionary as shown in this example.
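One plausible way for a subclass to honour output_folder and force_replot is to check for an existing output file before plotting. The helper needs_plotting below is hypothetical and only a sketch of that decision; it is not a MAFw function.

```python
import tempfile
from pathlib import Path


def needs_plotting(output_folder: Path, filename: str, force_replot: bool = False) -> bool:
    """Hypothetical check: plot if forced or if the output file does not exist yet."""
    return force_replot or not (output_folder / filename).exists()


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    print(needs_plotting(folder, 'plot.png'))  # True: nothing there yet
    (folder / 'plot.png').write_bytes(b'')     # pretend a previous run saved the plot
    print(needs_plotting(folder, 'plot.png'))  # False: the existing file is reused
    print(needs_plotting(folder, 'plot.png', force_replot=True))  # True: forced
```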

Processor parameters

force_replot: Whether to force re-plotting even if the output file already exists (default: False)
output_folder: The path where the output file will be saved (default: PosixPath('/builds/kada/mafw'))

Constructor parameters:

Parameters:

slicing_dict (dict[str, Any], Optional) – A dictionary with key, value pairs to slice the input data frame before the plotting occurs.
grouping_columns (list[str], Optional) – A list of columns for the groupby operation on the data frame.
aggregation_functions (list[str | Callable[[Any], Any]], Optional) – A list of functions for the aggregation on the grouped data frame.
matplotlib_backend (str, Optional) – The name of the matplotlib backend to be used. Defaults to 'Agg'.
output_folder (Path, Optional) – The path where the output file will be saved
force_replot (bool, Optional) – Whether to force re-plotting even if the output file already exists.

get_data_frame() → None

Specific implementation of get_data_frame() for the Seaborn plotter.

It must be overloaded.

The method does NOT return the data frame; instead, your implementation must assign it to the data_frame attribute of the class.

group_and_aggregate_data_frame() → None

Performs groupby and aggregation of the data frame.

If the user provided grouping columns and aggregation functions, the data frame is grouped by those columns and aggregated with those functions.

The user can update the values of those attributes during each cycle iteration within the implementation of the in_loop_customization().
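In pandas this operation presumably boils down to something like df.groupby(grouping_columns).agg(aggregation_functions). The stdlib toy below sketches the same idea on a list of dictionaries, restricted to a single value column for simplicity; the helper group_and_aggregate is hypothetical and not the MAFw method.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean
from typing import Any, Callable


def group_and_aggregate(
    rows: list[dict[str, Any]],
    grouping_columns: list[str],
    value_column: str,
    aggregation_functions: list[Callable[[list[Any]], Any]],
) -> dict[tuple[Any, ...], list[Any]]:
    """Group rows by grouping_columns and apply every function to value_column (hypothetical)."""
    groups: dict[tuple[Any, ...], list[Any]] = defaultdict(list)
    for row in rows:
        # The group key is the tuple of values in the grouping columns.
        key = tuple(row[col] for col in grouping_columns)
        groups[key].append(row[value_column])
    # One result per aggregation function, in the order the functions were given.
    return {key: [func(values) for func in aggregation_functions] for key, values in groups.items()}


rows = [
    {'day': 'Thu', 'tip': 2.0},
    {'day': 'Thu', 'tip': 3.0},
    {'day': 'Fri', 'tip': 4.0},
]
print(group_and_aggregate(rows, ['day'], 'tip', [mean, max]))
# {('Thu',): [2.5, 3.0], ('Fri',): [4.0, 4.0]}
```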