mafw.processor_library.abstract_plotter

This module implements the abstract base interface for a processor that generates plots.

This abstract interface is needed because MAFw does not force the user to select a specific plot and data manipulation library.

The idea is to provide a base processor class whose process() method is already implemented as a skeleton of the standard operations required to generate a graphical representation of a dataset.

The user can compose a plotter by mixing the GenericPlotter with one DataRetriever and one FigurePlotter.

For a specific implementation based on seaborn, please refer to sns_plotter.
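
The composition idea can be illustrated with plain Python stand-ins. This is only a sketch: PlotterSketch, RetrieverSketch, FigureSketch, InMemoryRetriever, TextFigure and MyPlotter are hypothetical names, nothing is imported from MAFw, and a list of dictionaries plays the role of the dataframe.

```python
from abc import ABC, abstractmethod


class PlotterSketch:
    """Hypothetical stand-in for the generic plotter (not the MAFw class)."""

    def __init__(self) -> None:
        self.data_frame: list[dict] = []
        self.figure: str = ""

    def get_data_frame(self) -> None:
        # No-op by default: a mixin or a subclass is expected to provide the data.
        pass

    def plot(self) -> None:
        # No-op by default: a mixin or a subclass is expected to draw the figure.
        pass

    def process(self) -> None:
        self.get_data_frame()
        if self.data_frame:
            self.plot()


class RetrieverSketch(ABC):
    """Hypothetical mixin interface for data retrieval."""

    @abstractmethod
    def get_data_frame(self) -> None: ...


class FigureSketch(ABC):
    """Hypothetical mixin interface for figure generation."""

    @abstractmethod
    def plot(self) -> None: ...


class InMemoryRetriever(RetrieverSketch):
    def get_data_frame(self) -> None:
        # Fill the attribute owned by the main class.
        self.data_frame = [{"x": 1, "y": 2.0}, {"x": 2, "y": 4.0}]


class TextFigure(FigureSketch):
    def plot(self) -> None:
        # A text rendering stands in for a real figure.
        self.figure = ", ".join(f"({r['x']}, {r['y']})" for r in self.data_frame)


class MyPlotter(InMemoryRetriever, TextFigure, PlotterSketch):
    """The mixins are listed first so that their methods win in the MRO."""


plotter = MyPlotter()
plotter.process()
```

In this sketch the mixins come before the base class in the list of bases, so their get_data_frame() and plot() implementations take precedence over the no-op defaults.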

Classes

`DataRetriever`(*args, **kwargs)	Base mixin class to retrieve a data frame from an external source
`FigurePlotter`()
`GenericPlotter`(*args, **kwargs)	The Generic Plotter processor.
`PlotterMeta`(name, bases, namespace, /, **kwargs)	Metaclass for the plotter mixed classes

class mafw.processor_library.abstract_plotter.DataRetriever(*args: Any, **kwargs: Any)[source]

Bases: ABC

Base mixin class to retrieve a data frame from an external source

The dataframe instance. It will be filled for the main class

abstractmethod get_data_frame() → None[source]: The mixin implementation of the shared method with the base class

abstractmethod patch_data_frame() → None[source]: The mixin implementation of the shared method with the base class

class mafw.processor_library.abstract_plotter.FigurePlotter[source]: Bases: ABC

class mafw.processor_library.abstract_plotter.GenericPlotter(*args: Any, **kwargs: Any)[source]

Bases: Processor

The Generic Plotter processor.

This is a subclass of a Processor with advanced functionality to fetch data in the form of a dataframe and to produce plots. When mentioning dataframe in the context of the generic plotter, we do not have in mind any specific dataframe implementation.

The GenericPlotter is actually a kind of abstract class: since MAFw is not forcing you to use any specific plotting and data manipulation library, you need to subclass the GenericPlotter in your code, be sure that the required dependencies are available for import and use it as a normal processor.

If you are ok with using seaborn (with matplotlib as a graphical backend and pandas for data storage and manipulation), then be sure to install mafw with the optional feature seaborn (pip install mafw[seaborn]) and have a look at the sns_plotter for an already prepared implementation of a Plotter.

The key difference with respect to a normal processor is its process() method that has been already implemented as follows:

def process(self) -> None:
    """
    Process method overload.

    In the case of a plotter subclass, the process method is already implemented and the user should not overload
    it. On the contrary, the user must overload the other implementation methods described in the general
    :class:`class description <.SNSPlotter>`.
    """
    if self.filter_register.new_only:
        if self.is_output_existing():
            return

    self.in_loop_customization()
    self.get_data_frame()
    self.patch_data_frame()
    self.slice_data_frame()
    self.group_and_aggregate_data_frame()
    if not self.is_data_frame_empty():
        self.plot()
        self.customize_plot()
        self.save()
        self.update_db()

This actually means that when you are subclassing a GenericPlotter you do not have to implement the process method as you would do for a normal Processor, but you will have to implement the following methods:

in_loop_customization().

The processor execution workflow (LoopType) can be any of the available types, so the process method may be invoked only once or multiple times inside a loop structure (for or while). If the execution is cyclic, you may want to customise each iteration, for example by changing the plot title, modifying the data selection, or changing the filename where the plots will be saved.

You can also use this method with a single loop processor, but in that case you will not have access to the loop parameters.

get_data_frame().

This method has the task of getting the data to be plotted. Since this is an almost abstract class, you need to either implement this method in your subclass or provide it via a DataRetriever mixin class.

patch_data_frame().

A convenience method to manipulate the data frame just retrieved. A typical use case is the conversion of units of measurement. Imagine you saved the data in S.I. units, but for the visualization you prefer practical units: you can overload this method to add a new column containing the converted values of the original one.

slice_data_frame().

Slicing a dataframe is similar to applying a WHERE clause in a SQL query. Implement this method to select which rows should be used in the generation of your plot.

group_and_aggregate_data_frame().

In this method, you can manipulate your data frame to perform row grouping and aggregation.

is_data_frame_empty().

A simple method to test if the dataframe contains any data to be plotted. In fact, after the slicing, grouping and aggregation operations, it is possible that the dataframe is now left without any row. In this case, it makes no sense to waste time in plotting an empty graph.

plot().

This method is where the actual plotting occurs.

customize_plot().

This method can optionally be used to customize the appearance of the facet grid produced by the plot() method. It is particularly useful when the user is mixing this class with one of the FigurePlotter mixins, thus not having direct access to the plot method.

save().

This method is where the produced plot is saved in a file. Remember to append the output file name to the list of produced outputs so that the _update_plotter_db() method will automatically store this file in the database during the finish() execution.

update_db().

If the user wants to update a specific table in the database, they can use this method.

It is worth remembering that all plotters save all generated files in the standard table PlotterOutput. This is done automatically by the _update_plotter_db() method, which is called in the finish() method.
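
The order in which the methods above are called can be sketched with a plain Python stand-in. WorkflowSketch is a hypothetical class, not the MAFw GenericPlotter: it copies only the call order of process() shown earlier, leaves out the new_only and is_output_existing() check, and uses a list of dictionaries as the dataframe.

```python
class WorkflowSketch:
    """Hypothetical stand-in: reproduces only the call order of process()."""

    def __init__(self, rows: list[dict]) -> None:
        self.rows = rows
        self.data_frame: list[dict] = []
        self.calls: list[str] = []

    def process(self) -> None:
        self.in_loop_customization()
        self.get_data_frame()
        self.patch_data_frame()
        self.slice_data_frame()
        self.group_and_aggregate_data_frame()
        if not self.is_data_frame_empty():
            self.plot()
            self.customize_plot()
            self.save()
            self.update_db()

    # Optional steps are no-ops in this sketch.
    def in_loop_customization(self) -> None: ...
    def patch_data_frame(self) -> None: ...
    def group_and_aggregate_data_frame(self) -> None: ...
    def customize_plot(self) -> None: ...
    def update_db(self) -> None: ...

    def get_data_frame(self) -> None:
        # Copy the rows so that later steps do not modify the source.
        self.data_frame = [dict(r) for r in self.rows]

    def slice_data_frame(self) -> None:
        # Equivalent of a WHERE clause: keep only positive readings.
        self.data_frame = [r for r in self.data_frame if r["value"] > 0]

    def is_data_frame_empty(self) -> bool:
        return len(self.data_frame) == 0

    def plot(self) -> None:
        self.calls.append("plot")

    def save(self) -> None:
        self.calls.append("save")


full = WorkflowSketch([{"value": 1.5}, {"value": -2.0}])
full.process()

empty = WorkflowSketch([{"value": -1.0}])
empty.process()
```

In the second case the slicing step removes every row, so plot() and save() are never reached, which is the purpose of the is_data_frame_empty() check.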

Processor parameters

force_replot: Whether to force re-plotting even if the output file already exists (default: False)
output_folder: The path where the output file will be saved (default: PosixPath('/tmp/mafw-docs-4tgr7il6/v2.2.0'))

Constructor parameters

Parameters:

name (str, Optional) – The name of the processor. If None is provided, the class name is used instead. Defaults to None.
description (str, Optional) – A short description of the processor task. Defaults to the processor name.
config (dict, Optional) – A configuration dictionary for this processor. Defaults to None.
looper (LoopType, Optional) – Enumerator to define the looping type. Defaults to LoopType.ForLoop.
user_interface (UserInterfaceBase, Optional) – A user interface instance to be used by the processor to interact with the user.
timer (Timer, Optional) – A timer object to measure process duration.
timer_params (dict, Optional) – Parameters for the timer object.
database (Database, Optional) – A database instance. Defaults to None.
database_conf (dict, Optional) – Configuration for the database. Defaults to None.
remove_orphan_files (bool, Optional) – Boolean flag to remove files on disc without a reference to the database. See Standard tables and _remove_orphan_files(). Defaults to True.
replica_id (str, Optional) – The replica identifier for the current processor.
create_standard_tables (bool, Optional) – Boolean flag to create standard tables on disk. Defaults to True. When a nested steering configuration is loaded, this value can be overridden by the global create_standard_tables entry. Flat processor configurations keep the constructor value.
max_workers (int, Optional) – Number of worker threads for parallel loops.
queue_size (int, Optional) – Maximum size of the internal queue for the queue-based parallel loop.
queue_batch_size (int, Optional) – Number of items processed per worker task in the queue-based parallel loop.
kwargs – Keyword arguments that can be used to set processor parameters.

_update_plotter_db() → None[source]

Updates the Plotter DB.

A plotter subclass primarily generates plots as output in most cases, which means that no additional information needs to be stored in the database. This is sufficient to prevent unnecessary execution of the processor when it is not required.

This method is actually protected against execution without a valid database instance.

Changed in version v2.0.0: Using the Processor.replica_name instead of the Processor.name as plotter_name in the PlotterOutput Model.

customize_plot() → None[source]

The customize plot method.

The user can overload this method to customize the output produced by the plot() method, like, for example, adding meaningful axis titles, changing format, and so on.

As usual, the loop parameters are accessible through the item, i_item and n_item attributes.

finish() → None[source]

Concludes the execution.

The user can reimplement this method if there are some conclusive tasks that must be achieved. Always include a call to super().

format_progress_message() → None[source]

Customizes the progress message with information about the current item.

The user can overload this method in order to modify the message being displayed during the process loop with information about the current item.

The user can access the current value, its position in the looping cycle and the total number of items using Processor.item, Processor.i_item and Processor.n_item.

get_data_frame() → None[source]

Get the data frame with the data to be plotted.

This method can be implemented either in the plotter subclass (for example, SNSPlotter) or via a DataRetriever mixin class.

in_loop_customization() → None[source]: Customize the parameters for the output or input data for each execution iteration.
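
A per-iteration customization can be sketched as follows. LoopSketch is a hypothetical stand-in for a for-loop processor and does not use MAFw; the attribute names item, i_item and n_item follow this page, while the zero-based i_item, the execute() method, the title format and the file naming are assumptions made for the example.

```python
from pathlib import Path


class LoopSketch:
    """Hypothetical stand-in for a for-loop processor (not the MAFw class)."""

    def __init__(self, items: list[str], output_folder: Path) -> None:
        self.items = items
        self.output_folder = output_folder
        self.item: str = ""
        self.i_item: int = -1  # assumed zero-based in this sketch
        self.n_item: int = len(items)
        self.title: str = ""
        self.output_file: Path = output_folder
        self.produced: list[tuple[str, Path]] = []

    def in_loop_customization(self) -> None:
        # Adapt the plot title and the output file name to the current item.
        self.title = f"Detector {self.item} ({self.i_item + 1}/{self.n_item})"
        self.output_file = self.output_folder / f"detector_{self.item}.png"

    def process(self) -> None:
        self.in_loop_customization()
        # A real plotter would fetch data, plot and save here.
        self.produced.append((self.title, self.output_file))

    def execute(self) -> None:
        # Hypothetical loop driver: sets the loop parameters, then calls process().
        for index, item in enumerate(self.items):
            self.i_item = index
            self.item = item
            self.process()


sketch = LoopSketch(["A", "B"], Path("plots"))
sketch.execute()
```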

is_data_frame_empty() → bool[source]: Check if the data frame is empty

is_output_existing() → bool[source]

Check for plotter output existence.

Generally, plotter subclasses do not have a real output that can be saved to a database. This class is meant to generate one or more graphical output files.

One of the biggest advantages of having the output of a processor stored in the database is the ability to conditionally execute the processor if, and only if, the output is missing or changed.

To allow plotter processors to benefit from this feature as well, a dedicated table is available among the standard tables.

If a connection to the database is provided, then this method is invoked at the beginning of the process() and a select query over the PlotterOutput model is executed filtering by processor name. All files in the filename lists are checked for existence and also the checksum is verified.

Especially during the debugging phase of a processor, it is often necessary to generate the plot several times. For this reason, the user can switch the force_replot parameter to True in the steering file, and the output file will be generated even if it already exists.

This method returns True if the output of the processor already exists and is valid, and False otherwise.

Changed in version v2.0.0: Using Processor.replica_name instead of Processor.name for storage in the PlotterOutput model.

Returns: True if the processor output exists and it is valid.
Return type: bool
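
The existence and checksum verification described above can be sketched without a database. The helpers file_digest() and output_is_valid() are hypothetical, the stored mapping stands in for the rows selected from PlotterOutput, and the use of SHA-256 is an assumption: this page does not state which checksum MAFw computes.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file (algorithm assumed for the sketch)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def output_is_valid(stored: dict[Path, str], force_replot: bool = False) -> bool:
    """Hypothetical check: every recorded file must exist with an unchanged checksum."""
    if force_replot or not stored:
        # Forced replot, or nothing recorded yet: treat the output as missing.
        return False
    return all(p.exists() and file_digest(p) == d for p, d in stored.items())


with tempfile.TemporaryDirectory() as tmp:
    plot_file = Path(tmp) / "plot.png"
    plot_file.write_bytes(b"fake image bytes")
    stored = {plot_file: file_digest(plot_file)}

    unchanged = output_is_valid(stored)
    forced = output_is_valid(stored, force_replot=True)

    plot_file.write_bytes(b"edited by hand")
    modified = output_is_valid(stored)

    plot_file.unlink()
    missing = output_is_valid(stored)
```

Only the first call reports a valid output; a forced replot, a modified file and a missing file all lead to regeneration in this sketch.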

patch_data_frame() → None[source]

Modify the data frame

This method can be used to perform operation on the data frame, like adding new columns. It can be either implemented in the plotter processor subclasses or via a mixin class.
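
As an illustration of adding a converted column, the sketch below uses a list of dictionaries in place of a dataframe, since no specific dataframe library is implied. PressurePatchSketch and the column names pressure_Pa and pressure_mbar are hypothetical.

```python
class PressurePatchSketch:
    """Hypothetical plotter fragment showing only the patching step."""

    def __init__(self) -> None:
        # Data as it might have been stored, in S.I. units (pascal).
        self.data_frame: list[dict] = [
            {"run": 1, "pressure_Pa": 101325.0},
            {"run": 2, "pressure_Pa": 50000.0},
        ]

    def patch_data_frame(self) -> None:
        # Add a new column in a practical unit (1 mbar = 100 Pa),
        # keeping the original column untouched.
        for row in self.data_frame:
            row["pressure_mbar"] = row["pressure_Pa"] / 100.0


sketch_plotter = PressurePatchSketch()
sketch_plotter.patch_data_frame()
```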

plot() → None[source]

The plot method.

This is where the user has to implement the real plot generation

process() → None[source]

Process method overload.

In the case of a plotter subclass, the process method is already implemented and the user should not overload it. On the contrary, the user must overload the other implementation methods described in the general class description.

save() → None[source]

The save method.

This is where the user has to implement the saving of the plot on disc.

update_db() → None[source]

The update database method.

This is where the user has to implement the optional update of the database.

force_replot: Flag to force the regeneration of the output file even if it is already existing.

class mafw.processor_library.abstract_plotter.PlotterMeta(name, bases, namespace, /, **kwargs)[source]

Bases: _ProtocolMeta, ProcessorMeta

Metaclass for the plotter mixed classes