mafw.processor_library.sns_plotter
Module implements a Seaborn plotter processor with a mixin structure to generate seaborn plots.
This module implements the abstract_plotter functionalities using seaborn and pandas.
These two packages are not part of the default MAFw installation; they are installed only if the user includes the optional seaborn feature.
Along with the SNSPlotter, it includes a set of standard data retrievers specific to pandas data frames.
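The mixin structure mentioned above can be illustrated with a minimal, dependency-free sketch. All class names below (BaseProcessor, ListRetriever, SortedPlot, MyPlotter) are hypothetical stand-ins and not MAFw classes; the sketch only shows how one mixin can supply the data retrieval and another the plotting, while the base class drives the workflow.

```python
class BaseProcessor:
    """Hypothetical stand-in for a plotter processor (not the MAFw class)."""

    def __init__(self) -> None:
        self.data = None
        self.log: list[str] = []

    def get_data(self) -> None:
        raise NotImplementedError

    def plot(self) -> None:
        raise NotImplementedError

    def process(self) -> None:
        # The base class only fixes the order of the steps.
        self.get_data()
        self.plot()


class ListRetriever:
    """Hypothetical data-retriever mixin: fills ``self.data``."""

    def get_data(self) -> None:
        self.data = [3, 1, 2]
        self.log.append("data retrieved")


class SortedPlot:
    """Hypothetical plot mixin: consumes ``self.data``."""

    def plot(self) -> None:
        self.log.append(f"plotting {sorted(self.data)}")


# Mixins are listed before the base class so that their methods
# take precedence in the method resolution order.
class MyPlotter(SortedPlot, ListRetriever, BaseProcessor):
    pass


plotter = MyPlotter()
plotter.process()
print(plotter.log)
```

Listing the mixins before the base class is an assumption of this sketch; it is what lets their get_data and plot override the placeholders defined in the base.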
Classes
CatPlot | The categorical plot mixin.
DisPlot | The distribution plot mixin.
FromDatasetDataRetriever | A data retriever to get a dataframe from a seaborn dataset.
HDFPdDataRetriever | Retrieve a data frame from an HDF file.
LMPlot | The linear regression model plot mixin.
PdDataRetriever | The dataframe instance.
RelPlot | The relational plot mixin.
SNSFigurePlotter | Base mixin class to generate a seaborn Figure level plot.
SNSPlotter | The Generic Plotter processor.
SQLPdDataRetriever | A specialized data retriever to get a data frame from a database table.
- class mafw.processor_library.sns_plotter.CatPlot(x: str | bytes | date | datetime | timedelta | bool | complex | Timestamp | Timedelta | Iterable[float | complex | int] | None = None, y: str | bytes | date | datetime | timedelta | bool | complex | Timestamp | Timedelta | Iterable[float | complex | int] | None = None, hue: str | bytes | date | datetime | timedelta | bool | complex | Timestamp | Timedelta | Iterable[float | complex | int] | None = None, row: str | bytes | date | datetime | timedelta | bool | complex | Timestamp | Timedelta | Iterable[float | complex | int] | None = None, col: str | bytes | date | datetime | timedelta | bool | complex | Timestamp | Timedelta | Iterable[float | complex | int] | None = None, palette: str | Sequence[tuple[float, float, float] | str | tuple[float, float, float, float] | tuple[tuple[float, float, float] | str, float] | tuple[tuple[float, float, float, float], float]] | Mapping[Any, tuple[float, float, float] | str | tuple[float, float, float, float] | tuple[tuple[float, float, float] | str, float] | tuple[tuple[float, float, float, float], float]] | None = None, kind: Literal['strip', 'swarm', 'box', 'violin', 'boxen', 'point', 'bar', 'count'] = 'strip', legend: Literal['auto', 'brief', 'full'] | bool = 'auto', native_scale: bool = False, plot_kws: Mapping[str, Any] | None = None, facet_kws: dict[str, Any] | None = None, *args: Any, **kwargs: Any)[source]
Bases: SNSFigurePlotter
The categorical plot mixin.
This mixin will produce a figure level categorical plot as described here.
Constructor parameters:
- Parameters:
x (str | Iterable, Optional) – The name of the x variable or an iterable containing the x values.
y (str | Iterable, Optional) – The name of the y variable or an iterable containing the y values.
hue (str | Iterable, Optional) – The name of the hue variable or an iterable containing the hue values.
row (str | Iterable, Optional) – The name of the row category or an iterable containing the row values.
col (str | Iterable, Optional) – The name of the column category or an iterable containing the column values.
palette (str, Optional) – The colour palette to be used.
kind (str, Optional) – The type of categorical plot (strip, swarm, box, violin, boxen, point, bar or count). Defaults to strip.
legend (str | bool, Optional) – How to draw the legend. If “brief”, numeric hue and size variables will be represented with a sample of evenly spaced values. If “full”, every group will get an entry in the legend. If “auto”, choose between brief or full representation based on number of levels. If False, no legend data is added and no legend is drawn. Defaults to auto.
native_scale (bool, Optional) – When True, numeric or datetime values on the categorical axis will maintain their original scaling rather than being converted to fixed indices. Defaults to False.
plot_kws (dict[str, Any], Optional) – A dictionary-like collection of keywords passed to the underlying seaborn.catplot.
facet_kws (dict[str, Any], Optional) – A dictionary-like collection of keywords passed to the underlying seaborn.FacetGrid.
- class mafw.processor_library.sns_plotter.DisPlot(x: str | bytes | date | datetime | timedelta | bool | complex | Timestamp | Timedelta | Iterable[float | complex | int] | None = None, y: str | bytes | date | datetime | timedelta | bool | complex | Timestamp | Timedelta | Iterable[float | complex | int] | None = None, hue: str | bytes | date | datetime | timedelta | bool | complex | Timestamp | Timedelta | Iterable[float | complex | int] | None = None, row: str | bytes | date | datetime | timedelta | bool | complex | Timestamp | Timedelta | Iterable[float | complex | int] | None = None, col: str | bytes | date | datetime | timedelta | bool | complex | Timestamp | Timedelta | Iterable[float | complex | int] | None = None, palette: str | Sequence[tuple[float, float, float] | str | tuple[float, float, float, float] | tuple[tuple[float, float, float] | str, float] | tuple[tuple[float, float, float, float], float]] | Mapping[Any, tuple[float, float, float] | str | tuple[float, float, float, float] | tuple[tuple[float, float, float] | str, float] | tuple[tuple[float, float, float, float], float]] | Colormap | None = None, kind: Literal['hist', 'kde', 'ecdf'] = 'hist', legend: bool = True, rug: bool = False, rug_kws: dict[str, Any] | None = None, plot_kws: Mapping[str, Any] | None = None, facet_kws: dict[str, Any] | None = None, *args: Any, **kwargs: Any)[source]
Bases: SNSFigurePlotter
The distribution plot mixin.
This mixin is the MAFw implementation of the seaborn displot and will produce one of the following figure level plots:
histplot: a simple histogram plot
kdeplot: a kernel density estimate plot
ecdfplot: an empirical cumulative distribution function plot
rugplot: a plot of the marginal distributions as ticks.
Constructor parameters:
- Parameters:
x (str | Iterable, Optional) – The name of the x variable or an iterable containing the x values.
y (str | Iterable, Optional) – The name of the y variable or an iterable containing the y values.
hue (str | Iterable, Optional) – The name of the hue variable or an iterable containing the hue values.
row (str | Iterable, Optional) – The name of the row category or an iterable containing the row values.
col (str | Iterable, Optional) – The name of the column category or an iterable containing the column values.
palette (str | Colormap, Optional) – The colour palette to be used.
kind (str, Optional) – The type of distribution plot (hist, kde or ecdf). Defaults to hist.
legend (bool, Optional) – If false, suppress the legend for the semantic variables. Defaults to True.
rug (bool, Optional) – If true, show each observation with marginal ticks. Defaults to False.
rug_kws (Mapping[str, Any], Optional) – Parameters to control the appearance of the rug plot.
plot_kws (Mapping[str, Any], Optional) – Parameters passed to the underlying plotting object.
facet_kws (Mapping[str, Any], Optional) – Parameters passed to the facet grid object.
- class mafw.processor_library.sns_plotter.FromDatasetDataRetriever(dataset_name: str | None = None, *args: Any, **kwargs: Any)[source]
Bases: PdDataRetriever
A data retriever to get a dataframe from a seaborn dataset.
The dataframe instance. It will be filled for the main class
- class mafw.processor_library.sns_plotter.HDFPdDataRetriever(hdf_filename: str | Path | None = None, key: str | None = None, *args: Any, **kwargs: Any)[source]
Bases: DataRetriever
Retrieve a data frame from an HDF file.
This data retriever gets a dataframe from an HDF file, given the filename and the object key.
Constructor parameters:
- Parameters:
hdf_filename (str | Path, Optional) – The filename of the HDF file
key (str, Optional) – The key of the HDF store with the dataframe
- get_data_frame() → None[source]
Retrieve the dataframe from an HDF file.
- Raises:
PlotterMixinNotInitialized – if some of the required attributes are not initialised or invalid.
- class mafw.processor_library.sns_plotter.LMPlot(x: str | None = None, y: str | None = None, hue: str | None = None, row: str | None = None, col: str | None = None, palette: str | Sequence[tuple[float, float, float] | str | tuple[float, float, float, float] | tuple[tuple[float, float, float] | str, float] | tuple[tuple[float, float, float, float], float]] | Mapping[Any, tuple[float, float, float] | str | tuple[float, float, float, float] | tuple[tuple[float, float, float] | str, float] | tuple[tuple[float, float, float, float], float]] | None = None, legend: bool = True, scatter_kws: dict[str, Any] | None = None, line_kws: dict[str, Any] | None = None, facet_kws: dict[str, Any] | None = None, *args: Any, **kwargs: Any)[source]
Bases: SNSFigurePlotter
The linear regression model plot mixin.
This mixin will produce a figure level regression model as described here.
Constructor parameters:
- Parameters:
x (str, Optional) – The name of the x variable.
y (str, Optional) – The name of the y variable.
hue (str, Optional) – The name of the hue variable.
row (str, Optional) – The name of the row category.
col (str, Optional) – The name of the column category.
palette (str, Optional) – The colour palette to be used.
legend (bool, Optional) – If True and there is a hue variable, add a legend.
scatter_kws (dict[str, Any], Optional) – A dictionary-like collection of keywords passed to the underlying scatter.
line_kws (dict[str, Any], Optional) – A dictionary-like collection of keywords passed to the underlying plot.
facet_kws (dict[str, Any], Optional) – A dictionary-like collection of keywords passed to the underlying seaborn.FacetGrid.
- class mafw.processor_library.sns_plotter.PdDataRetriever(*args: Any, **kwargs: Any)[source]
Bases: DataRetriever
The dataframe instance. It will be filled for the main class.
- class mafw.processor_library.sns_plotter.RelPlot(x: str | bytes | date | datetime | timedelta | bool | complex | Timestamp | Timedelta | Iterable[float | complex | int] | None = None, y: str | bytes | date | datetime | timedelta | bool | complex | Timestamp | Timedelta | Iterable[float | complex | int] | None = None, hue: str | bytes | date | datetime | timedelta | bool | complex | Timestamp | Timedelta | Iterable[float | complex | int] | None = None, row: str | bytes | date | datetime | timedelta | bool | complex | Timestamp | Timedelta | Iterable[float | complex | int] | None = None, col: str | bytes | date | datetime | timedelta | bool | complex | Timestamp | Timedelta | Iterable[float | complex | int] | None = None, palette: str | Sequence[tuple[float, float, float] | str | tuple[float, float, float, float] | tuple[tuple[float, float, float] | str, float] | tuple[tuple[float, float, float, float], float]] | Mapping[Any, tuple[float, float, float] | str | tuple[float, float, float, float] | tuple[tuple[float, float, float] | str, float] | tuple[tuple[float, float, float, float], float]] | Colormap | None = None, kind: Literal['scatter', 'line'] = 'scatter', legend: Literal['auto', 'brief', 'full'] | bool = 'auto', plot_kws: Mapping[str, Any] | None = None, facet_kws: dict[str, Any] | None = None, *args: Any, **kwargs: Any)[source]
Bases: SNSFigurePlotter
The relational plot mixin.
This mixin will produce either a scatter or a line figure level plot.
The full documentation of the relplot object can be read at this link.
Constructor parameters:
- Parameters:
x (str | Iterable, Optional) – The name of the x variable or an iterable containing the x values.
y (str | Iterable, Optional) – The name of the y variable or an iterable containing the y values.
hue (str | Iterable, Optional) – The name of the hue variable or an iterable containing the hue values.
row (str | Iterable, Optional) – The name of the row category or an iterable containing the row values.
col (str | Iterable, Optional) – The name of the column category or an iterable containing the column values.
palette (str | Colormap, Optional) – The colour palette to be used.
kind (str, Optional) – The type of relational plot (scatter or line). Defaults to scatter.
legend (str | bool, Optional) – How to draw the legend. If “brief”, numeric hue and size variables will be represented with a sample of evenly spaced values. If “full”, every group will get an entry in the legend. If “auto”, choose between brief or full representation based on number of levels. If False, no legend data is added and no legend is drawn. Defaults to auto.
plot_kws (dict[str, Any], Optional) – A dictionary-like collection of keywords passed to the underlying seaborn.relplot.
facet_kws (dict[str, Any], Optional) – A dictionary-like collection of keywords passed to the underlying seaborn.FacetGrid.
- class mafw.processor_library.sns_plotter.SNSFigurePlotter(*args: Any, **kwargs: Any)[source]
Bases: FigurePlotter
Base mixin class to generate a seaborn Figure level plot.
- data_frame: DataFrame
The dataframe instance shared with the main class
- facet_grid: FacetGrid
The facet grid instance shared with the main class
- class mafw.processor_library.sns_plotter.SNSPlotter(*args: Any, **kwargs: Any)[source]
Bases: GenericPlotter
The Generic Plotter processor.
This is a subclass of a Processor with advanced functionality to fetch data in the form of a dataframe and to produce plots.
The key difference with respect to a normal processor is its process() method, which is already implemented as follows:

    def process(self) -> None:
        """
        Specific implementation of the process method for the Seaborn plotter.

        It is almost the same as the GenericProcessor, with the addition that all open pyplot figures are closed after the process is finished.

        This part cannot be moved upward to the :class:`~.GenericPlotter` because the user might have selected another plotting library different from :link:`matplotlib`.
        """
        super().process()
        if not self.is_data_frame_empty():
            plt.close('all')

This means that when you subclass SNSPlotter you do not have to implement the process method as you would for a normal Processor, but you will have to implement the following methods:
- in_loop_customization()
The processor execution workflow (LoopType) can be any of the available ones, so the process method might be invoked only once, or multiple times inside a loop structure (for or while). If the execution is cyclic, you may want to customise each iteration, for example by changing the plot title, modifying the data selection, or changing the filename where the plots will be saved.
You can also use this method in a single loop processor; in this case you will not have access to the loop parameters.
- get_data_frame()
This method has the task of getting the data to be plotted in the form of a pandas DataFrame. The processor has the data_frame attribute, where the data will be stored to make them accessible from all other methods.
-
A convenient method to apply data frame manipulation to the data just retrieved.
- plot()
This method is where the actual plotting occurs. Use the data_frame to plot the quantities you want.
-
This method can optionally be used to customize the appearance of the facet grid produced by the plot() method. It is particularly useful when the user is mixing this class with one of the FigurePlotter mixins, and therefore has no direct access to the plot method.
-
This method is where the produced plot is saved to a file. Remember to append the output file name to the list of produced outputs so that the _update_plotter_db() method will automatically store this file in the database during the finish() execution.
-
If the user wants to update a specific table in the database, they can use this method.
It is worth remembering that all plotters save all generated files in the standard table PlotterOutput. This is done automatically by the _update_plotter_db() method, which is called in the finish() method.
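The sequence of hooks listed above can be sketched with a plain Python stand-in. This is not the MAFw implementation: the call order is assumed to follow the order of the list, skipping the plotting steps for an empty data frame is an assumption, and the names patch_step, customize_step, save_step and db_step are placeholders for hooks whose names are not shown here.

```python
class WorkflowSketch:
    """Hypothetical stand-in that records the order in which hooks run."""

    def __init__(self, rows: list[dict]) -> None:
        self._rows = rows
        self.data_frame: list[dict] = []  # stand-in for a pandas DataFrame
        self.output_filename_list: list[str] = []
        self.calls: list[str] = []

    def in_loop_customization(self) -> None:
        self.calls.append("in_loop_customization")

    def get_data_frame(self) -> None:
        # Nothing is returned: the data are assigned to the attribute.
        self.calls.append("get_data_frame")
        self.data_frame = list(self._rows)

    def patch_step(self) -> None:  # placeholder name
        self.calls.append("patch_step")

    def slice_data_frame(self) -> None:
        self.calls.append("slice_data_frame")

    def group_and_aggregate_data_frame(self) -> None:
        self.calls.append("group_and_aggregate_data_frame")

    def is_data_frame_empty(self) -> bool:
        return len(self.data_frame) == 0

    def plot(self) -> None:
        self.calls.append("plot")

    def customize_step(self) -> None:  # placeholder name
        self.calls.append("customize_step")

    def save_step(self) -> None:  # placeholder name
        self.calls.append("save_step")
        # Register the output so that it can be tracked later on.
        self.output_filename_list.append("plot.png")

    def db_step(self) -> None:  # placeholder name
        self.calls.append("db_step")

    def process(self) -> None:
        self.in_loop_customization()
        self.get_data_frame()
        self.patch_step()
        self.slice_data_frame()
        self.group_and_aggregate_data_frame()
        if self.is_data_frame_empty():
            return  # assumption: nothing to plot, so stop here
        self.plot()
        self.customize_step()
        self.save_step()
        self.db_step()


full = WorkflowSketch([{"x": 1, "y": 2}])
full.process()
empty = WorkflowSketch([])
empty.process()
print(full.calls)
print(empty.calls)
```

A subclass of the sketch would override only the hooks it needs, leaving process untouched.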
You do not need to overload the slice_data_frame() or the group_and_aggregate_data_frame() methods; you can simply use them by setting the slicing_dict, the grouping_columns and the aggregation_functions.
Constructor parameters:
- Parameters:
slicing_dict (dict[str, Any], Optional) – A dictionary with key, value pairs to slice the input data frame before the plotting occurs.
grouping_columns (list[str], Optional) – A list of columns for the groupby operation on the data frame.
aggregation_functions (list[str | Callable[[Any], Any]], Optional) – A list of functions for the aggregation on the grouped data frame.
matplotlib_backend (str, Optional) – The name of the matplotlib backend to be used. Defaults to ‘Agg’.
- get_data_frame() → None[source]
Specific implementation of the get data frame for the Seaborn plotter.
It must be overloaded.
The method does NOT return the data frame; in your implementation you need to assign the data frame to the class data_frame attribute.
- group_and_aggregate_data_frame() → None[source]
Performs groupby and aggregation of the data frame.
If the user provided some grouping columns and aggregation functions, then the group_and_aggregate_data_frame() is invoked accordingly.
The user can update the values of those attributes during each cycle iteration within the implementation of the in_loop_customization().
See also
This method simply invokes the group_and_aggregate_data_frame() function from pandas_tools.
- process() → None[source]
Specific implementation of the process method for the Seaborn plotter.
It is almost the same as the GenericProcessor, with the addition that all open pyplot figures are closed after the process is finished.
This part cannot be moved upward to the GenericPlotter because the user might have selected another plotting library different from matplotlib.
- slice_data_frame() → None[source]
Perform data frame slicing.
The user can set some slicing criteria in the slicing_dict to select a specific data subset. The values of the slicing dict can be changed during each iteration within the implementation of the in_loop_customization().
See also
This method simply invokes the slice_data_frame() function from pandas_tools.
- start() → None[source]
Overload of the start method.
The SNSPlotter overloads start() in order to allow the user to change the matplotlib backend.
The user can select which backend to use either directly in the class constructor or by assigning it to the class attribute matplotlib_backend.
- aggregation_functions: Iterable[str | Callable[[Any], Any]] | None
The list of aggregation functions to be applied to the grouped dataframe
- data_frame: pd.DataFrame
The pandas DataFrame containing the data to be plotted.
- facet_grid: sns.FacetGrid | None
The reference to the facet grid.
- filter_register: mafw.db.db_filter.FilterRegister
The DB filter register of the Processor.
- grouping_columns: Iterable[str] | None
The list of columns for grouping the data frame
- item: Any
The current item of the loop.
- loop_type: LoopType
The loop type.
The value of this parameter can also be changed by the execution_workflow() decorator factory.
See LoopType for more details.
- matplotlib_backend: str
The backend to be used for matplotlib.
- output_filename_list: list[Path]
The list of produced filenames.
- remove_orphan_files: bool
The flag to remove or protect the orphan files. Defaults to True
- slicing_dict: MutableMapping[str, Any] | None
The dictionary for slicing the input data frame
- class mafw.processor_library.sns_plotter.SQLPdDataRetriever(table_name: str | None = None, required_cols: Iterable[str] | str | None = None, where_clause: str | None = None, *args: Any, **kwargs: Any)[source]
Bases: PdDataRetriever
A specialized data retriever to get a data frame from a database table.
The idea is to implement an interface to the pandas read_sql. The user has to provide the table name, the list of required columns and an optional where clause.
Constructor parameters:
- Parameters:
table_name (str, Optional) – The name of the table from where to get the data
required_cols (Iterable[str] | str | None, Optional) – A list of columns to be selected from the table and transferred as column in the dataframe.
where_clause (str, Optional) – The where clause used in the select SQL statement. If None is provided, then all rows will be selected.
- get_data_frame() → None[source]
Retrieve the dataframe from a database table.
- Raises:
PlotterMixinNotInitialized – If some of the required attributes are missing.
- database: Database
The database instance. It comes from the main class
- required_columns: Iterable[str]
The iterable of columns.
Those are the column names to be selected from the table_name and included in the dataframe.
- table_name: str
The table from where the data should be taken.
- where_clause: str
The where clause of the SQL statement
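As an illustration of how table_name, required_cols and where_clause could be combined into a select statement, the following sketch uses the standard library sqlite3 module instead of pandas read_sql. The helper build_select, the measurement table and its content are hypothetical, and selecting all columns when required_cols is None is an assumption of the sketch.

```python
from __future__ import annotations

import sqlite3
from collections.abc import Iterable


def build_select(
    table_name: str,
    required_cols: Iterable[str] | str | None = None,
    where_clause: str | None = None,
) -> str:
    """Assemble a SELECT statement from the three retriever parameters."""
    if required_cols is None:
        columns = "*"  # assumption: no column list means every column
    elif isinstance(required_cols, str):
        columns = required_cols
    else:
        columns = ", ".join(required_cols)
    sql = f"SELECT {columns} FROM {table_name}"
    if where_clause:
        # Without a where clause all rows are selected.
        sql += f" WHERE {where_clause}"
    return sql


# Hypothetical in-memory table used only for the demonstration.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE measurement (run INTEGER, signal REAL, note TEXT)")
connection.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?)",
    [(1, 0.5, "a"), (2, 1.5, "b"), (3, 2.5, "c")],
)

query = build_select("measurement", ["run", "signal"], "run > 1")
rows = connection.execute(query).fetchall()
connection.close()
print(query)
print(rows)
```

The names are interpolated directly into the statement, so a helper like this should only receive trusted table names, column names and where clauses.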