Writing a New Filter

Developing Custom Filters

Writing filters is straightforward. An example is provided here: Example of a custom filter. Look inside the bibolamazi/filters/ directory at the existing filters for further examples, e.g. arxiv.py, duplicates.py or url.py. They should be rather simple to understand.

A filter can either act on individual entries (e.g. the arxiv.py filter), or on the whole database (e.g. duplicates.py).

To stay organized, it is recommended to develop your filter(s) in a custom filter package which you keep in a repository (e.g. on github.com), so that the filter package can easily be installed in the different locations from which you would like to run bibolamazi.

Don’t forget to make use of the bibolamazi cache if you fetch or compute values which could be cached for further reuse. You should access caches through the BibUserCacheAccessor class. See the documentation for the bibusercache module and, most of all, look at the examples. (TODO: add documentation about caches)

A couple of utilities are provided for filters in the bibolamazi.filters.util module. In particular, check out the arxivutil and auxfile modules.

Feel free to contribute filters; it will only make bibolamazi more useful!

The Filter Module

There are two main objects your module should define at the very least:

  • a filter class, subclass of BibFilter.

  • a module-level function called bibolamazi_filter_class(), which should return the filter class object. For example:

    def bibolamazi_filter_class():
        return ArxivNormalizeFilter

See Example of a custom filter for a complete example.

Your filter should log error, warning, information and debug messages to a logger obtained via Python’s logging mechanism, as demonstrated in the example.
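The overall layout of such a module can be sketched as follows. This is only an illustration under stated assumptions: the BibFilter class below is a local stand-in so that the snippet runs on its own (a real filter would import and subclass bibolamazi's BibFilter instead), and MyExampleFilter and its use_uppercase argument are hypothetical names.

```python
import logging

# Module-level logger obtained via Python's standard logging mechanism.
logger = logging.getLogger(__name__)


class BibFilter:
    """Local stand-in for bibolamazi's BibFilter base class (placeholder only)."""


class MyExampleFilter(BibFilter):
    """Hypothetical filter class; the actual filtering logic is omitted."""

    def __init__(self, use_uppercase=False):
        self.use_uppercase = use_uppercase
        logger.debug("MyExampleFilter created, use_uppercase=%r", use_uppercase)


def bibolamazi_filter_class():
    # Return the class object itself, not an instance of it.
    return MyExampleFilter
```

Note that bibolamazi_filter_class() hands back the class, so that the filter can be instantiated later with the arguments parsed from the user's configuration.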

Passing Arguments to the Filter

Command-line arguments passed to the filter in the user’s bibolamazi config section are parsed into Python arguments for the filter class’s constructor. The translation is rather intuitive: each filter argument may be specified as an option, either using the syntax --use-uppercase=value or --use-uppercase value (where underscores in the argument name are replaced by dashes), or using the Ghostscript-like syntax -dUseUppercase or -dUseUppercase=false for booleans and -sMode=fixed for other types.

Some remarks:

  • each filter argument corresponds to a command-line option starting with --, where underscores are replaced by dashes. The option takes a single mandatory argument (except for filter arguments declared as booleans in their argdocs, see Argdocs: Filter Argument Documentation below).
  • each filter argument also corresponds to a command-line option starting with -d or -s, using the syntax -dFilterOptionName, -dFilterOptionName=Value or -sFilterOptionName=Value. The -d variant is used to specify boolean option values, the -s variant any other type. The FilterOptionName is obtained by camel-casing the filter’s Python argument name: for example, if the filter constructor accepts an argument named use_uppercase_chars, then the corresponding camel-cased version will be UseUppercaseChars. (See note below on case sensitivity.)
  • each filter argument may be documented using Argdocs: Filter Argument Documentation. This information will appear in the filter help text.
  • if the filter constructor accepts **kwargs, then any additional option-value pairs given as -sKey=Value or -dKey or -dKey=Value are passed on to the filter constructor’s kwargs.
  • if the filter constructor accepts *args, then any additional positional arguments on the command line are passed to that *args parameter. The ordering of positional and optional arguments on the command line makes no difference. (Note that this also works if not all of the previously declared arguments are specified. There’s some Python hacking in there ;) )

Note

If even a single filter argument uses an uppercase letter, then the option parser will not convert any letter casing, and all option names will have the exact same letter casing as the filter arguments. Similarly, no camel-casing will occur with the -s... or -d... options.
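The name translation described in this section can be illustrated with a few lines of plain Python. The helpers below (long_option, camel_case, short_option) are hypothetical and are not bibolamazi's own implementation; they only cover the usual case where all argument names are lowercase.

```python
def long_option(arg_name):
    """Map a constructor argument name to its '--' option name."""
    return "--" + arg_name.replace("_", "-")


def camel_case(arg_name):
    """Camel-case a lowercase, underscore-separated argument name."""
    return "".join(part.capitalize() for part in arg_name.split("_"))


def short_option(arg_name, is_bool):
    """Build the Ghostscript-like option: -d for booleans, -s otherwise."""
    prefix = "-d" if is_bool else "-s"
    return prefix + camel_case(arg_name)


print(long_option("use_uppercase_chars"))          # --use-uppercase-chars
print(short_option("use_uppercase_chars", True))   # -dUseUppercaseChars
print(short_option("mode", False))                 # -sMode
```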

Filter General Help Documentation

The filter class should declare the members helpauthor, helpdescription and helptext with meaningful help text:

  • helpauthor should be a short one-line statement naming the filter, its contributor and the license. E.g.:

    ArXiv clean-up filter by Philippe Faist, (C) 2013, GPL 3+
    
  • helpdescription is a brief description of what the filter does. This is displayed right after the Usage section in the help text, and before the filter arguments description.

  • helptext is a long description of what the filter exactly does, how to use it, the advantages, tricks, pitfalls, etc.

In the built-in filters, as well as in the examples, the text is declared outside of the class (see HELP_AUTHOR etc.) so that we don’t have to deal with the indentation (and in the class, we only have helpauthor=HELP_AUTHOR etc.). This is purely a convenience and completely optional.
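A minimal sketch of that pattern could look like this. The author line and descriptions are placeholder text, the constant names HELP_DESC and HELP_TEXT are assumed by analogy with HELP_AUTHOR, and BibFilter is again a local stand-in for the real base class so that the snippet is self-contained.

```python
# Help strings declared at module level: their lines start in the first
# column, so no indentation has to be stripped later.
HELP_AUTHOR = "Example filter by A. N. Author, (C) YEAR, LICENSE"

HELP_DESC = "Example filter: does nothing useful, for illustration only"

HELP_TEXT = """
This long description explains what the filter does, how to use it,
and which pitfalls to watch out for.
"""


class BibFilter:
    """Local stand-in for bibolamazi's BibFilter base class (placeholder only)."""


class MyExampleFilter(BibFilter):
    # The class only refers to the module-level constants.
    helpauthor = HELP_AUTHOR
    helpdescription = HELP_DESC
    helptext = HELP_TEXT
```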

Argdocs: Filter Argument Documentation

The docstring of the filter constructor is parsed in a special way. The documentation of each constructor argument should have the form:

- argument_name(type): Description of the argument. The description may
  span over several lines.
- other_argument_name: Description of the other option. Notice that the
  type is optional and will default to a simple string.

This information will be displayed when running bibolamazi --help filtername.

If a type is specified, it should be the name of a Python type, or of a type which is available in the namespace of the filter module. The filter factory will attempt to convert the given string to the specified type when calling the filter constructor. If the given type is a custom type, and it has a docstring, then the docstring is included in the “Note on Filter Options Syntax” section of the help text.
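As an illustration, a constructor documented in this form might look as follows. The class and its arguments (max_authors, use_uppercase, prefix) are hypothetical, and the BibFilter base class is left out so that the snippet runs on its own.

```python
class MyExampleFilter:
    """Hypothetical filter; a real one would subclass BibFilter."""

    def __init__(self, max_authors=10, use_uppercase=False, prefix=""):
        """
        Create the example filter.

        - max_authors(int): Maximum number of authors to keep in each
          entry.
        - use_uppercase(bool): Whether titles should be converted to
          uppercase.
        - prefix: Text to prepend to each citation key. No type is given,
          so the value is treated as a plain string.
        """
        self.max_authors = max_authors
        self.use_uppercase = use_uppercase
        self.prefix = prefix


# The argdocs live in the constructor's docstring:
print(MyExampleFilter.__init__.__doc__)
```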

There are some convenient predefined types for filter arguments, all defined in the module bibolamazi.bibfilter.argtypes:

  • CommaStrList: a comma-separated list of strings. This type may directly be used as a list type.
  • enum_class(): a function which returns a custom class which represents an enumeration value of several options.
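To give an idea of how a custom argument type can work, here is a small sketch in the spirit of CommaStrList. It is not the library's implementation: SemicolonList is a hypothetical type, and the sketch assumes that the option value is converted by calling the type with the raw string, as described above for typed arguments.

```python
class SemicolonList(list):
    """A semicolon-separated list of strings, e.g. "alpha;beta;gamma"."""

    def __init__(self, value=""):
        if isinstance(value, str):
            # Split the raw option string and drop empty items.
            items = [item.strip() for item in value.split(";") if item.strip()]
        else:
            # Also accept an existing iterable of items.
            items = list(value)
        super().__init__(items)


keys = SemicolonList("alpha; beta;gamma")
print(keys)  # ['alpha', 'beta', 'gamma']
```

Because the class derives from list, the converted value can be used directly wherever a list is expected, and its docstring is the kind of text that would describe the expected syntax to the user.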

Look at the built-in filters and the other examples to get an idea.

More documentation should come here at some point in the future.

Customizing Default Behavior

There are several other functions the module may define, although they are not mandatory.

  • parse_args() should parse an argument string, and return a tuple (args, kwargs) of how the filter constructor should be called. If the module does not provide this function, a very powerful default automatic filter option processor (based on Python’s argparse module) is built using the filter argument names as option names.

  • format_help() should return a string with full detailed information about how to use the filter, and which options are accepted. If the module does not provide this function, the default automatic filter option processor is used to format a useful help text (which should be good enough for most of your purposes, especially if you don’t want to reinvent the wheel).

    Note: the helptext attribute of your BibFilter subclass is only used by the default automatic filter option processor; so if you implement format_help() manually, the helptext attribute will be ignored.
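If you do want to provide these functions yourself, a hand-written pair might be sketched as below. The exact signatures are assumptions (a single string parameter, here called optionstring, for parse_args() and no parameters for format_help()), and this simplified parser only understands --name=value options plus positional arguments.

```python
import shlex


def parse_args(optionstring):
    """Parse e.g. "--mode=fixed file.bib" into (args, kwargs)."""
    args = []
    kwargs = {}
    # shlex.split() honours shell-style quoting, e.g. 'my file.bib'.
    for token in shlex.split(optionstring):
        if token.startswith("--") and "=" in token:
            name, value = token[2:].split("=", 1)
            # Dashes in option names map back to underscores.
            kwargs[name.replace("-", "_")] = value
        else:
            args.append(token)
    return args, kwargs


def format_help():
    """Return the full help text for this (hypothetical) filter."""
    return "Usage: myfilter [--mode=MODE] [--max-authors=N] [files ...]\n"
```

In most cases the default option processor is the better choice, since it also handles the -d/-s syntax and type conversion described earlier.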