bibolamazi.core.bibusercache package

bibolamazi.core.bibusercache.tokencheckers module

This module provides a collection of useful token checkers that can be used to make sure the cache information is always valid and up-to-date.

Recall that the Bibolamazi cache is organized as nested dictionaries in which the cached information is stored.

One main concern of the caching mechanism is that information be invalidated when it is no longer relevant (between different runs of bibolamazi). This may happen, for example, because the original bibtex entry in the source has changed.

Each cache dictionary (BibUserCacheDic) may be assigned a token validator, that is, an instance of a checker class which invalidates the items it detects as no longer valid. The validity of items is determined on the basis of validation tokens.

When an item in a cache dictionary is added or updated, a token (which can be any python value) is generated corresponding to the cached value. This token may be, for example, the date and time at which the value was cached. The validator then checks the tokens of the cache values and detects those entries whose token indicates that the entries are no longer valid: for example, if the token corresponds to the date and time at which the entry was stored, the validator may invalidate all entries whose token indicates that they are too old.
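
As a rough illustration of the token idea described above, here is a minimal, self-contained sketch. It does not use the bibolamazi classes: the names cache, store, is_still_valid and MAX_AGE are hypothetical and exist only for this example, and the three-day limit is an arbitrary assumption.

```python
import datetime

# Hypothetical stand-in for a cache: maps key -> (value, token).
cache = {}

# Assumption for this sketch: entries older than this are considered stale.
MAX_AGE = datetime.timedelta(days=3)


def store(key, value, now):
    """Store a value together with its token (here: the time of caching)."""
    cache[key] = (value, now)


def is_still_valid(key, now):
    """Inspect the stored token: the entry is valid if it is not too old."""
    _value, token = cache[key]
    return (now - token) <= MAX_AGE


t0 = datetime.datetime(2020, 1, 1, 12, 0)
store("Einstein1935", {"arxiv_id": None}, now=t0)

fresh = is_still_valid("Einstein1935", now=t0 + datetime.timedelta(days=1))
stale = is_still_valid("Einstein1935", now=t0 + datetime.timedelta(days=10))
```

Here the token is simply the caching time, and the validity rule lives in one place, which is the role a token checker plays for a cache dictionary.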

Token checkers are free to decide what information to store in the tokens. See the tokencheckers module for examples. Token checkers must derive from the base class TokenChecker.

class bibolamazi.core.bibusercache.tokencheckers.EntryFieldsTokenChecker(bibdata, fields=[], store_type=False, store_persons=[], **kwargs)[source]

Bases: bibolamazi.core.bibusercache.tokencheckers.TokenChecker

A TokenChecker implementation that checks whether some fields of a bibliography entry have changed.

This works by calculating an MD5 hash of the contents of the given fields.

Constructs a token checker that will invalidate an entry if any of its fields given here have changed.

bibdata is a reference to the bibolamazifile’s bibliography data; this is the return value of bibolamaziData().

fields is a list of bibtex fields which should be checked for changes. Note that the ‘author’ and ‘editor’ fields are treated specially, with the store_persons argument.

If store_type is True, the entry is also invalidated if its type changes (for example, from ‘@unpublished’ to ‘@article’).

store_persons is a list of person roles which should be checked for changes (see person roles in pybtex.database.Entry; a role is either ‘author’ or ‘editor’). For example, specify ‘author’ here rather than in the fields argument, because pybtex treats the ‘author’ and ‘editor’ fields specially.
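
The hashing idea can be sketched as follows. This is not the library's implementation: fields_token is a hypothetical helper, the entry is a plain dict standing in for a pybtex entry, and the way the fields are serialized before hashing is an assumption made for the example.

```python
import hashlib


def fields_token(entry_fields, fields, entry_type=None):
    """Return an MD5 hex digest of the selected fields (hypothetical helper).

    entry_fields is a plain dict standing in for a bibliography entry.
    """
    h = hashlib.md5()
    if entry_type is not None:
        # Mirrors the idea behind store_type=True: the type is hashed too.
        h.update(entry_type.encode("utf-8"))
        h.update(b"\n")
    for name in fields:
        # Assumption: a missing field is hashed as an empty string.
        h.update(name.encode("utf-8"))
        h.update(b"=")
        h.update(entry_fields.get(name, "").encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


entry = {"title": "On tokens", "journal": "J. Caches", "year": "2014"}
old_token = fields_token(entry, ["title", "journal"])

entry["year"] = "2015"        # not a watched field: the token is unchanged
same = fields_token(entry, ["title", "journal"]) == old_token

entry["title"] = "On hashes"  # a watched field changed: the token differs
changed = fields_token(entry, ["title", "journal"]) != old_token
```

Because only the listed fields enter the hash, edits to other fields leave the cached information valid.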

new_token(key, value, **kwargs)[source]

Return a token which will serve to identify changes of the dictionary entry (key, value). This token may be any Python picklable object. It can be anything that cmp_tokens() will understand.

The default implementation returns True all the time. Subclasses should reimplement to do something useful.

class bibolamazi.core.bibusercache.tokencheckers.TokenChecker(**kwargs)[source]

Bases: object

Base class for a token checker validator.

The new_token() function always returns True and cmp_tokens() just compares tokens for equality with the == operator.

Subclasses should reimplement new_token() to return something useful. Subclasses may either rely on the default equality comparison in cmp_tokens() or reimplement that function to apply a custom validation condition (e.g. as in TokenCheckerDate).
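
A minimal sketch of this subclassing pattern is shown below. To keep the example self-contained, TokenChecker here is a simplified stand-in written from the description above, not the library's source; ValueTypeTokenChecker is a hypothetical subclass invented for illustration.

```python
class TokenChecker(object):
    """Simplified stand-in mirroring the documented default behaviour."""

    def __init__(self, **kwargs):
        pass

    def new_token(self, key, value, **kwargs):
        # Documented default: always True.
        return True

    def cmp_tokens(self, key, value, oldtoken, **kwargs):
        # Documented default: regenerate the token and compare with ==.
        return self.new_token(key, value, **kwargs) == oldtoken


class ValueTypeTokenChecker(TokenChecker):
    """Hypothetical checker: the token is the name of the value's type."""

    def new_token(self, key, value, **kwargs):
        return type(value).__name__


chk = ValueTypeTokenChecker()
token = chk.new_token("k", {"a": 1})
still_valid = chk.cmp_tokens("k", {"b": 2}, token)  # same type as before
invalid = chk.cmp_tokens("k", ["b", 2], token)      # type has changed
```

Only new_token() is reimplemented here; the inherited equality comparison in cmp_tokens() is enough for this kind of check.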

cmp_tokens(key, value, oldtoken, **kwargs)[source]

Checks to see if the dictionary entry (key, value) is still up-to-date and valid. The old token, returned by a previous call to new_token(), is provided in the argument oldtoken.

The default implementation calls new_token() for the (key, value) pair and compares the new token with the old token oldtoken for equality with the == operator. Depending on your use case, this may be enough so you may not have to reimplement this function (as, for example, in EntryFieldsTokenChecker).

However, you may wish to reimplement this function if a different comparison method is required. For example, if the token is the date at which the information was retrieved, you might want to test how old the information is and invalidate it only after a certain amount of time has passed (as done in TokenCheckerDate).

Code in this function should be protected against oldtoken having the wrong type or being None. Such cases can easily arise, for example between Bibolamazi versions or if the cache was once not set up properly. In any case, it’s safer to trap exceptions here and return False, so that an exception does not propagate up and cause the whole cache load process to fail.

Return True if the entry is still valid, or False if the entry is out of date and should be discarded.
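
The defensive pattern recommended above might look like the following sketch. CompatibleVersionChecker is a hypothetical class that does not exist in bibolamazi; its (major, minor) token format is an assumption chosen only to show a comparison that can fail on malformed tokens.

```python
class CompatibleVersionChecker(object):
    """Hypothetical checker: tokens are (major, minor) tuples."""

    def __init__(self, major, minor):
        self.current = (major, minor)

    def new_token(self, key, value, **kwargs):
        return self.current

    def cmp_tokens(self, key, value, oldtoken, **kwargs):
        try:
            old_major, old_minor = oldtoken
            return old_major == self.current[0] and old_minor <= self.current[1]
        except Exception:
            # Wrong type, None, wrong length, ...: treat the entry as
            # invalid rather than letting the error abort the cache load.
            return False


chk = CompatibleVersionChecker(2, 1)
ok = chk.cmp_tokens("k", "v", (2, 0))
too_new = chk.cmp_tokens("k", "v", (2, 5))
from_none = chk.cmp_tokens("k", "v", None)
from_garbage = chk.cmp_tokens("k", "v", "2.0")
```

A malformed token then simply causes the entry to be discarded and regenerated.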

new_token(key, value, **kwargs)[source]

Return a token which will serve to identify changes of the dictionary entry (key, value). This token may be any Python picklable object. It can be anything that cmp_tokens() will understand.

The default implementation returns True all the time. Subclasses should reimplement to do something useful.

class bibolamazi.core.bibusercache.tokencheckers.TokenCheckerCombine(*args, **kwargs)[source]

Bases: bibolamazi.core.bibusercache.tokencheckers.TokenChecker

A TokenChecker implementation that combines several different token checkers. A cache entry is deemed valid only if it is considered valid by all the installed token checkers.

For example, you may want to make sure both that the cache has the right version (with a VersionTokenChecker) and that it is up-to-date.
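
The "valid only if all checkers agree" rule can be sketched as follows. Both classes are hypothetical stand-ins, and storing one sub-token per checker in a tuple is an assumption of this example; the actual token layout used by TokenCheckerCombine is not described here.

```python
class FixedTokenChecker(object):
    """Hypothetical sub-checker: a fixed token, compared with ==."""

    def __init__(self, token):
        self.token = token

    def new_token(self, key, value, **kwargs):
        return self.token

    def cmp_tokens(self, key, value, oldtoken, **kwargs):
        return self.new_token(key, value, **kwargs) == oldtoken


class CombineSketch(object):
    """Hypothetical combiner: one sub-token per checker, all must agree."""

    def __init__(self, *checkers):
        self.checkers = checkers

    def new_token(self, key, value, **kwargs):
        return tuple(c.new_token(key, value, **kwargs) for c in self.checkers)

    def cmp_tokens(self, key, value, oldtoken, **kwargs):
        try:
            if len(oldtoken) != len(self.checkers):
                return False
            return all(
                c.cmp_tokens(key, value, old, **kwargs)
                for c, old in zip(self.checkers, oldtoken)
            )
        except Exception:
            # Malformed old token: consider the entry invalid.
            return False


combined = CombineSketch(FixedTokenChecker("2.0"), FixedTokenChecker("abc"))
token = combined.new_token("k", "v")
valid = combined.cmp_tokens("k", "v", token)

upgraded = CombineSketch(FixedTokenChecker("2.1"), FixedTokenChecker("abc"))
valid_after_upgrade = upgraded.cmp_tokens("k", "v", token)
```

A single disagreeing sub-checker is enough to invalidate the entry.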

Constructor. Pass as arguments here instances of token checkers to check for, e.g.:

chk = TokenCheckerCombine(
    VersionTokenChecker('2.0'),
    EntryFieldsTokenChecker(bibdata, ['title', 'journal'])
    )

cmp_tokens(key, value, oldtoken, **kwargs)[source]

Checks to see if the dictionary entry (key, value) is still up-to-date and valid. The old token, returned by a previous call to new_token(), is provided in the argument oldtoken.

The default implementation calls new_token() for the (key, value) pair and compares the new token with the old token oldtoken for equality with the == operator. Depending on your use case, this may be enough so you may not have to reimplement this function (as, for example, in EntryFieldsTokenChecker).

However, you may wish to reimplement this function if a different comparison method is required. For example, if the token is the date at which the information was retrieved, you might want to test how old the information is and invalidate it only after a certain amount of time has passed (as done in TokenCheckerDate).

Code in this function should be protected against oldtoken having the wrong type or being None. Such cases can easily arise, for example between Bibolamazi versions or if the cache was once not set up properly. In any case, it’s safer to trap exceptions here and return False, so that an exception does not propagate up and cause the whole cache load process to fail.

Return True if the entry is still valid, or False if the entry is out of date and should be discarded.

new_token(key, value, **kwargs)[source]

Return a token which will serve to identify changes of the dictionary entry (key, value). This token may be any Python picklable object. It can be anything that cmp_tokens() will understand.

The default implementation returns True all the time. Subclasses should reimplement to do something useful.

class bibolamazi.core.bibusercache.tokencheckers.TokenCheckerDate(time_valid=datetime.timedelta(days=5), **kwargs)[source]

Bases: bibolamazi.core.bibusercache.tokencheckers.TokenChecker

A TokenChecker implementation that remembers the date and time at which an entry was set, and invalidates the entry after an amount of time time_valid has passed.

The amount of time the information remains valid is given in the time_valid argument of the constructor or is set with a call to set_time_valid(). In either case, you should provide a Python datetime.timedelta object.
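
Time-based invalidation of this kind can be sketched as below. DateCheckerSketch is a hypothetical stand-in, not the library class; in particular, using a naive datetime.datetime.now() as the token is an assumption of this example.

```python
import datetime


class DateCheckerSketch(object):
    """Hypothetical stand-in illustrating time-based invalidation."""

    def __init__(self, time_valid=datetime.timedelta(days=5)):
        self.time_valid = time_valid

    def set_time_valid(self, time_valid):
        self.time_valid = time_valid

    def new_token(self, **kwargs):
        # The token records when the entry was stored.
        return datetime.datetime.now()

    def cmp_tokens(self, key, value, oldtoken, **kwargs):
        try:
            age = datetime.datetime.now() - oldtoken
            return age < self.time_valid
        except Exception:
            # oldtoken is None or not a datetime: invalidate the entry.
            return False


chk = DateCheckerSketch()
now = datetime.datetime.now()
recent_ok = chk.cmp_tokens("k", "v", now - datetime.timedelta(days=1))
old_ok = chk.cmp_tokens("k", "v", now - datetime.timedelta(days=30))

chk.set_time_valid(datetime.timedelta(days=60))
old_ok_after = chk.cmp_tokens("k", "v", now - datetime.timedelta(days=30))
```

Unlike the default equality test, cmp_tokens() here compares the age of the old token against time_valid.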

cmp_tokens(key, value, oldtoken, **kwargs)[source]

Checks to see if the dictionary entry (key, value) is still up-to-date and valid. The old token, returned by a previous call to new_token(), is provided in the argument oldtoken.

The default implementation calls new_token() for the (key, value) pair and compares the new token with the old token oldtoken for equality with the == operator. Depending on your use case, this may be enough so you may not have to reimplement this function (as, for example, in EntryFieldsTokenChecker).

However, you may wish to reimplement this function if a different comparison method is required. For example, if the token is the date at which the information was retrieved, you might want to test how old the information is and invalidate it only after a certain amount of time has passed (as done in TokenCheckerDate).

Code in this function should be protected against oldtoken having the wrong type or being None. Such cases can easily arise, for example between Bibolamazi versions or if the cache was once not set up properly. In any case, it’s safer to trap exceptions here and return False, so that an exception does not propagate up and cause the whole cache load process to fail.

Return True if the entry is still valid, or False if the entry is out of date and should be discarded.

new_token(**kwargs)[source]

Return a token which will serve to identify changes of the dictionary entry (key, value). This token may be any Python picklable object. It can be anything that cmp_tokens() will understand.

The default implementation returns True all the time. Subclasses should reimplement to do something useful.

set_time_valid(time_valid)[source]

class bibolamazi.core.bibusercache.tokencheckers.TokenCheckerPerEntry(checkers={}, **kwargs)[source]

Bases: bibolamazi.core.bibusercache.tokencheckers.TokenChecker

A TokenChecker implementation that associates different TokenChecker instances with individual entries, set manually.

By default, the items of the dictionary are always valid. When an entry-specific token checker is set with add_entry_check(), that token checker is used for that entry only.
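
The per-entry dispatch described above can be sketched as follows. Both classes are hypothetical stand-ins written for this example. The sketch uses checkers=None instead of a mutable default argument, which is a choice of this example and not a statement about the library.

```python
class FixedTokenChecker(object):
    """Hypothetical sub-checker: a fixed token, compared with ==."""

    def __init__(self, token):
        self.token = token

    def new_token(self, key, value, **kwargs):
        return self.token

    def cmp_tokens(self, key, value, oldtoken, **kwargs):
        return self.token == oldtoken


class PerEntrySketch(object):
    """Hypothetical stand-in: a separate checker may be set per key."""

    def __init__(self, checkers=None):
        self.checkers = dict(checkers or {})

    def add_entry_check(self, key, checker):
        self.checkers[key] = checker  # replaces any previous checker

    def remove_entry_check(self, key):
        self.checkers.pop(key, None)  # does nothing if none was set

    def has_entry_for(self, key):
        return key in self.checkers

    def checker_for(self, key):
        return self.checkers.get(key)

    def new_token(self, key, value, **kwargs):
        chk = self.checkers.get(key)
        return True if chk is None else chk.new_token(key, value, **kwargs)

    def cmp_tokens(self, key, value, oldtoken, **kwargs):
        chk = self.checkers.get(key)
        if chk is None:
            return True  # no entry-specific checker: always valid
        return chk.cmp_tokens(key, value, oldtoken, **kwargs)


per = PerEntrySketch()
per.add_entry_check("a", FixedTokenChecker(1))
a_valid = per.cmp_tokens("a", "v", per.new_token("a", "v"))
a_stale = per.cmp_tokens("a", "v", 0)
b_valid = per.cmp_tokens("b", "v", "anything")  # "b" has no checker
```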

add_entry_check(key, checker)[source]

Add an entry-specific checker.

key is the entry key for which this token checker applies. checker is the token checker instance itself. It is possible to make several keys share the same token checker instance.

Note that no explicit validation is performed. (This can’t be done because we don’t even have a pointer to the cache dict.) You should therefore call BibUserCacheDic.validate_item() manually.

If a token checker was already set for this entry, it is replaced by the new one.

checker_for(key)[source]

Returns the token checker instance that has been set for the entry key, or None if no token checker has been set for that entry.

cmp_tokens(key, value, oldtoken, **kwargs)[source]

Checks to see if the dictionary entry (key, value) is still up-to-date and valid. The old token, returned by a previous call to new_token(), is provided in the argument oldtoken.

The default implementation calls new_token() for the (key, value) pair and compares the new token with the old token oldtoken for equality with the == operator. Depending on your use case, this may be enough so you may not have to reimplement this function (as, for example, in EntryFieldsTokenChecker).

However, you may wish to reimplement this function if a different comparison method is required. For example, if the token is the date at which the information was retrieved, you might want to test how old the information is and invalidate it only after a certain amount of time has passed (as done in TokenCheckerDate).

Code in this function should be protected against oldtoken having the wrong type or being None. Such cases can easily arise, for example between Bibolamazi versions or if the cache was once not set up properly. In any case, it’s safer to trap exceptions here and return False, so that an exception does not propagate up and cause the whole cache load process to fail.

Return True if the entry is still valid, or False if the entry is out of date and should be discarded.

has_entry_for(key)[source]

Returns True if we have a token checker set for the given entry key.

new_token(key, value, **kwargs)[source]

Return a token which will serve to identify changes of the dictionary entry (key, value). This token may be any Python picklable object. It can be anything that cmp_tokens() will understand.

The default implementation returns True all the time. Subclasses should reimplement to do something useful.

remove_entry_check(key)[source]

As the name suggests, removes the token checker associated with the entry key given in key. If no token checker was previously set, then this function does nothing.

class bibolamazi.core.bibusercache.tokencheckers.VersionTokenChecker(this_version, **kwargs)[source]

Bases: bibolamazi.core.bibusercache.tokencheckers.TokenChecker

A TokenChecker which checks entries with a given version number.

This is useful if you might change the format in which you store entries in your cache: adding a version number will ensure that any old-formatted entries will be discarded.

Constructs a version validator token checker.

this_version is the current version. Any entry that was not marked with exactly the version this_version will be deemed invalid.

this_version may actually be any Python object. Comparison is done with the equality operator == (actually using the original TokenChecker implementation).
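
The version check can be sketched as below. VersionCheckerSketch is a hypothetical stand-in written from the description above: the token is simply this_version, and validity is decided by ==.

```python
class VersionCheckerSketch(object):
    """Hypothetical stand-in: the token is the current version object."""

    def __init__(self, this_version):
        self.this_version = this_version

    def new_token(self, key, value, **kwargs):
        return self.this_version

    def cmp_tokens(self, key, value, oldtoken, **kwargs):
        # Same rule as the documented base class: plain equality.
        return self.new_token(key, value, **kwargs) == oldtoken


# The version may be any object that supports ==, e.g. a tuple.
v1 = VersionCheckerSketch((1, 0))
v2 = VersionCheckerSketch((2, 0))

token = v1.new_token("k", "v")          # stored while running version (1, 0)
same_version = v1.cmp_tokens("k", "v", token)
after_format_change = v2.cmp_tokens("k", "v", token)
```

Bumping the version whenever the stored format changes then discards every entry written in the old format.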

new_token(key, value, **kwargs)[source]

Return a token which will serve to identify changes of the dictionary entry (key, value). This token may be any Python picklable object. It can be anything that cmp_tokens() will understand.

The default implementation returns True all the time. Subclasses should reimplement to do something useful.

Module contents

class bibolamazi.core.bibusercache.BibUserCache(cache_version=None)[source]

Bases: object

The basic root cache object.

This object stores the corresponding cache dictionaries for each cache. (See cacheFor().)

(Internally, the caches are stored in one root BibUserCacheDic.)

cacheExpirationTokenChecker()[source]

Returns a cache expiration token checker validator which is configured with the default cache invalidation time.

This object may be used by subclasses as a token checker for sub-caches that need regular invalidation (typically several days in the default configuration).

Consider, however, using installCacheExpirationChecker(), which simply applies a general validator to your full cache; this is generally what you might want.

cacheFor(cache_name)[source]

Returns the cache dictionary object for the given cache name. If the cache dictionary does not exist, it is created.

hasCache()[source]

Returns True if we have any cache at all. This only returns False if there are no cache dictionaries defined.

installCacheExpirationChecker(cache_name)[source]

Installs a cache expiration checker on the given cache.

This is a utility that is at the disposal of the cache accessors to easily set up an expiration validator on their caches. Also, a single instance of an expiry token checker (see TokenCheckerDate) is shared between the different sub-caches and handled by this main cache object.

The duration of the expiry is typically several days; because the token checker instance is shared this cannot be changed easily nor should it be relied upon. If you have custom needs or need more control over this, create your own token checker.

Returns: the cache dictionary. This may have changed to a new empty object if the cache didn’t validate!

WARNING: the cache dictionary may have been altered with the validation of the cache! Use the return value of this function, or call BibUserCacheAccessor.cacheDic() again!

Note: this validation will not validate individual items in the cache dictionary, but the dictionary as a whole. Depending on your use case, it might be worth introducing per-entry validation. For that, check out the various token checkers in tokencheckers and call set_validation() to install a specific validator instance.

loadCache(cachefobj)[source]

Load the cache from a file-like object cachefobj.

This tries to unpickle the data and restore the cache. If the loading fails, e.g. because of an I/O error, the exception is logged but ignored, and an empty cache is initialized.

Note that at this stage only the basic validation is performed; the cache accessors should then each initialize their own subcaches with possibly their own specialized validators.

saveCache(cachefobj)[source]

Saves the cache to the file-like object cachefobj. This dumps a pickle-d version of the cache information into the stream.
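
The save/load behaviour described for saveCache() and loadCache() can be sketched with the standard pickle module. The functions save_cache and load_cache are hypothetical stand-ins operating on a plain dict; they only illustrate the "log the error and fall back to an empty cache" strategy.

```python
import io
import logging
import pickle

logger = logging.getLogger(__name__)


def save_cache(data, cachefobj):
    """Dump a pickled version of the cache data into the stream."""
    pickle.dump(data, cachefobj, protocol=pickle.HIGHEST_PROTOCOL)


def load_cache(cachefobj):
    """Unpickle the cache; on any failure, log it and start empty."""
    try:
        return pickle.load(cachefobj)
    except Exception:
        logger.exception("Could not load cache; starting with an empty one.")
        return {}


buf = io.BytesIO()
save_cache({"arxiv_info": {"key1": "value1"}}, buf)
buf.seek(0)
restored = load_cache(buf)

# An empty stream cannot be unpickled, so the fallback applies.
broken = load_cache(io.BytesIO(b""))
```

As with any pickle-based format, such a cache file should only be loaded from a trusted location, since unpickling can execute arbitrary code.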

setDefaultInvalidationTime(time_delta)[source]

time_delta is a timedelta object giving the amount of time for which data in the cache is considered valid (by default).

class bibolamazi.core.bibusercache.BibUserCacheAccessor(cache_name, bibolamazifile, **kwargs)[source]

Bases: object

Base class for a cache accessor.

Filters should access the bibolamazi cache through a cache accessor. A cache accessor organizes how the caches are used and maintained. This is needed since several filters may want to access the same cache (e.g. fetched arXiv info from the arxiv.org API), so it is necessary to abstract out the cache object and how it is maintained out of the filter. This also avoids issues such as which filter is responsible for creating/refreshing the cache, etc.

A unique accessor instance is attached to a particular cache name (e.g. ‘arxiv_info’). It is instantiated by the BibolamaziFile. It is instructed to initialize the cache, possibly install token checkers, etc. at the beginning, before running any filters. The accessor is free to handle the cache as it prefers: build it right away, refresh it on demand only, etc.

Filters access the cache by requesting an instance of the accessor. This is done by calling cacheAccessor() (you can use bibolamaziFile() to get a pointer to the bibolamazifile object). Filters should declare in advance which caches they would like to have access to by reimplementing the requested_cache_accessors() method.

Accessors are free to implement their public API how they deem it best. There is no obligation or particular structure to follow. (Although refreshCache(), fetchMissingItems(list), or similar function names may be typical.)

Cache accessor objects are instantiated by the bibolamazi file. Their constructors should accept a keyword argument bibolamazifile and pass it on to the superclass constructor. Constructors should also accept **kwargs for possible compatibility with future additions and pass it on to the parent constructor. The cache_name argument of this constructor should be a fixed string passed by the subclass, identifying this cache (e.g. ‘arxiv_info’).
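
The constructor convention described above can be sketched as follows. To keep the example runnable on its own, BibUserCacheAccessor below is a simplified stand-in for the real base class, and ArxivInfoAccessorSketch is a hypothetical subclass; with bibolamazi installed you would derive from the real class instead.

```python
class BibUserCacheAccessor(object):
    """Simplified stand-in for the real base class (assumption)."""

    def __init__(self, cache_name, bibolamazifile, **kwargs):
        super(BibUserCacheAccessor, self).__init__()
        self._cache_name = cache_name
        self._bibolamazifile = bibolamazifile

    def cacheName(self):
        return self._cache_name

    def bibolamaziFile(self):
        return self._bibolamazifile


class ArxivInfoAccessorSketch(BibUserCacheAccessor):
    """Hypothetical accessor bound to the fixed cache name 'arxiv_info'."""

    def __init__(self, **kwargs):
        # cache_name is fixed by the subclass; bibolamazifile and any
        # future keyword arguments are passed on through **kwargs.
        super(ArxivInfoAccessorSketch, self).__init__(
            cache_name="arxiv_info", **kwargs
        )


fake_file = object()  # placeholder for a bibolamazifile object
acc = ArxivInfoAccessorSketch(bibolamazifile=fake_file)
name = acc.cacheName()
```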

bibolamaziFile()[source]

Returns the parent bibolamazifile of this cache accessor. This may be useful, e.g. to initialize a token cache validator in initialize().

Returns the object given in the constructor argument. Do not reimplement this function.

cacheDic()[source]

Returns the cache dictionary. This is meant as a ‘protected’ method for the accessor only. Objects that query the accessor should use the accessor-specific API to access data.

The cache dictionary is a BibUserCacheDic object. In particular, subcaches may want to set custom token checkers for proper cache invalidation (this should be done in the initialize() method).

This returns the data in the cache object that was set internally by the BibolamaziFile via the method setCacheObj(). Don’t call that manually, though, unless you’re implementing an alternative BibolamaziFile class!

cacheName()[source]

Return the cache name, as set in the constructor.

Subclasses do not need to reimplement this function.

cacheObject()[source]

Returns the parent BibUserCache object in which cacheDic() is a sub-cache. This is provided FOR CONVENIENCE! Don’t abuse this!

You should never need to access the object directly. Maybe just read-only to get some standard attributes such as the root cache version. If you’re writing directly to the root cache object, there is most likely a design flaw in your code!

Most of all, don’t write into other sub-caches!!

initialize(cache_obj)[source]

Initialize the cache.

Subclasses should perform any initialization tasks, such as installing token checkers. This function should not return anything.

Note that it is strongly recommended to install some form of cache invalidation, even if it is just an expiry validator. You may want to call installCacheExpirationChecker() on cache_obj.

Note that the order in which the initialize() method of the various caches is called is undefined.

Use the cacheDic() method to access the cache dictionary. Note that if you install token checkers on this cache, e.g. with cache_obj.installCacheExpirationChecker(), then the cache dictionary object may have changed! (To be sure, call cacheDic() again.)

The default implementation raises a NotImplementedError exception.
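
The caveat about the cache dictionary changing during initialize() can be illustrated with the following sketch. FakeCacheObject and AccessorSketch are hypothetical stand-ins that imitate only the documented behaviour; the fake deliberately replaces the dictionary, as if validation had failed.

```python
class FakeCacheObject(object):
    """Hypothetical stand-in for the root cache object."""

    def __init__(self):
        self.dics = {"arxiv_info": {"stale": "data"}}

    def cacheFor(self, cache_name):
        return self.dics.setdefault(cache_name, {})

    def installCacheExpirationChecker(self, cache_name):
        # Pretend the cache did not validate: a new empty object is used.
        self.dics[cache_name] = {}
        return self.dics[cache_name]


class AccessorSketch(object):
    """Hypothetical accessor showing why cacheDic() is called again."""

    def __init__(self, cache_name):
        self._cache_name = cache_name
        self._cache_obj = None
        self.replaced = None

    def setCacheObj(self, cache_obj):
        self._cache_obj = cache_obj

    def cacheDic(self):
        return self._cache_obj.cacheFor(self._cache_name)

    def initialize(self, cache_obj):
        before = self.cacheDic()
        cache_obj.installCacheExpirationChecker(self._cache_name)
        # Do not keep using `before`: fetch the dictionary again.
        after = self.cacheDic()
        self.replaced = after is not before


cache_obj = FakeCacheObject()
acc = AccessorSketch("arxiv_info")
acc.setCacheObj(cache_obj)
acc.initialize(cache_obj)
```

Holding on to a reference obtained before the checker was installed would mean writing into a dictionary that is no longer part of the cache.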

setCacheObj(cache_obj)[source]

Sets the cache dictionary and cache object that will be returned by cacheDic() and cacheObject(), respectively. Accessors and filters should not call (nor reimplement) this function. This function gets called by the BibolamaziFile.

class bibolamazi.core.bibusercache.BibUserCacheDic(*args, **kwargs)[source]

Bases: collections.abc.MutableMapping

Implements a cache where information may be stored between different runs of bibolamazi, and between different filter runs.

This is a dictionary of key=value pairs, and can be used like a regular python dictionary.

This implements cache validation, i.e. making sure that the values stored in the cache are up-to-date. Each entry of the dictionary has a corresponding token, i.e. a value (of any Python picklable type) which is used to determine whether the entry is still valid. For example, the token could be a datetime corresponding to the time when the entry was created, and the rule for validating the cache might be to check that the entry is not more than e.g. 3 days old.

child_notify_changed(obj)[source]

items() → a set-like object providing a view on D's items[source]

new_value_set(key=None)[source]

Informs the dic that the value for key has been updated, and a new validation token should be stored.

If key is None, then this call is meant for the current object, so this call will relay to the parent dictionary.

set_parent(parent)[source]

set_validation(tokenchecker, validate=True)[source]

Set a function that will calculate the token for a given entry, for cache validation. The tokenchecker should be a TokenChecker instance. See the documentation for the tokencheckers module for more information about cache validation.

If validate is True, then we immediately validate the contents of the cache.

token_for(key)[source]

Return the token that was stored associated with the given key.

Raise an exception if no cache validation is set or if the key doesn’t exist.

validate()[source]

Validate this whole dictionary, i.e. make sure that each entry is still valid.

This calls validate_item() for each item in the dictionary.

validate_item(key)[source]

Validate an entry of the dictionary manually. Usually not needed.

If the value is valid, and happens to be a BibUserCacheDic, then that dictionary is also validated.

Invalid entries are deleted.

Returns True if the item is valid, otherwise False.
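
How tokens, set_validation(), validate() and validate_item() fit together can be sketched with a much-simplified mapping. CacheDicSketch and VersionSketch are hypothetical stand-ins: nested dictionaries, parents and pickling are left out, and treating an entry without a stored token as invalid is an assumption of this example.

```python
from collections.abc import MutableMapping


class VersionSketch(object):
    """Hypothetical checker: the token is a version, compared with ==."""

    def __init__(self, version):
        self.version = version

    def new_token(self, key, value, **kwargs):
        return self.version

    def cmp_tokens(self, key, value, oldtoken, **kwargs):
        return oldtoken == self.version


class CacheDicSketch(MutableMapping):
    """Hypothetical, much-simplified stand-in for a validated cache dict."""

    def __init__(self):
        self._values = {}
        self._tokens = {}
        self._checker = None

    def set_validation(self, tokenchecker, validate=True):
        self._checker = tokenchecker
        if validate:
            self.validate()

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        self._values[key] = value
        if self._checker is not None:
            # A fresh token is stored whenever a value is set.
            self._tokens[key] = self._checker.new_token(key, value)

    def __delitem__(self, key):
        del self._values[key]
        self._tokens.pop(key, None)

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)

    def validate_item(self, key):
        if key not in self._values:
            return False
        if self._checker is None:
            return True
        ok = key in self._tokens and self._checker.cmp_tokens(
            key, self._values[key], self._tokens[key])
        if not ok:
            del self[key]  # invalid entries are deleted
        return ok

    def validate(self):
        for key in list(self._values):
            self.validate_item(key)


dic = CacheDicSketch()
dic.set_validation(VersionSketch(1))
dic["a"] = "alpha"                    # stored with token 1
dic.set_validation(VersionSketch(2))  # validates at once: "a" is dropped
dic["b"] = "beta"                     # stored with token 2
remaining = sorted(dic)
```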

exception bibolamazi.core.bibusercache.BibUserCacheError(cache_name, message)[source]

Bases: bibolamazi.core.butils.BibolamaziError

An exception which occurred when handling user caches. Usually, problems in the cache are silently ignored, because the cache can usually be safely regenerated.

However, if a serious error occurs, for example one which prevents the cache from being regenerated, then this exception should be raised.

class bibolamazi.core.bibusercache.BibUserCacheList(*args, **kwargs)[source]

Bases: collections.abc.MutableSequence

append(value)[source]

S.append(value) – append value to the end of the sequence

insert(index, value)[source]

S.insert(index, value) – insert value before index