Python API: Filter Utilities Package

bibolamazi.filters.util.arxivutil Module

class bibolamazi.filters.util.arxivutil.ArxivFetchedAPIInfoCacheAccessor(**kwargs)

Bases: bibolamazi.core.bibusercache.BibUserCacheAccessor

A BibUserCacheAccessor for fetching and accessing information retrieved from the arXiv API.

arxiv_403_received = False
fetchArxivApiInfo(idlist)

Populates the cache with information about the arXiv entries listed in idlist, which must contain the arXiv identifiers to fetch.

This function queries the arXiv.org API using the arxiv2bib library. Avoid sending many requests in rapid succession (the cache mechanism should normally prevent this anyway). If arXiv.org answers with HTTP 403 Forbidden, no further requests should be sent, or else arXiv.org might interpret them as a DoS attack. In that case this function raises BibArxivApiFetchError with a meaningful error message.

Only those entries in idlist which are not already in the cache are fetched.

idlist can be any iterable.
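The behaviour described above (fetch only what is missing from the cache, and stop on a 403 answer) can be illustrated with a minimal, self-contained sketch. It does not call bibolamazi or arxiv2bib: fetch_missing, fake_query and FetchForbiddenError are hypothetical names, and the plain dict stands in for the real cache.

```python
class FetchForbiddenError(Exception):
    """Hypothetical stand-in for BibArxivApiFetchError."""


def fetch_missing(idlist, cache, query_api):
    """Fetch only the ids that are not yet in `cache`; stop on HTTP 403.

    `query_api` is an assumed callable taking a list of ids and returning
    a (http_status, {id: info}) pair. It is not a real arxiv2bib function.
    """
    # Accept any iterable: keep first-seen order and drop duplicates.
    missing = [i for i in dict.fromkeys(idlist) if i not in cache]
    if not missing:
        return []  # everything is cached, so no request is sent
    status, results = query_api(missing)
    if status == 403:
        # Do not retry: repeated requests could look like an attack.
        raise FetchForbiddenError("arXiv answered 403 Forbidden; giving up")
    cache.update(results)
    return missing


def fake_query(ids):
    """Pretend API that succeeds for every id (illustration only)."""
    return 200, {i: {"reference": None, "bibtex": "", "error": None} for i in ids}


cache = {"1234.5678": {"reference": None, "bibtex": "", "error": None}}
fetched = fetch_missing(iter(["1234.5678", "0901.0001", "0901.0001"]), cache, fake_query)
```

Here only the identifier that was not already cached is passed on to the query function, and the duplicate is requested once.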

getArxivApiInfo(arxivid)

Returns a dictionary:

{
  'reference':  <arxiv2bib.Reference>,
  'bibtex': <bibtex string>,
  'error': <None or an error string>,
}

for the given arXiv id in the cache. If the information is not in the cache, returns None.

Don’t forget to first call fetchArxivApiInfo() to retrieve the information in the first place.

Note that the reference part may be an arxiv2bib.ReferenceErrorInfo if there was an error retrieving the reference. In that case, the key ‘error’ contains an error string.
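A caller typically has to distinguish three cases: no cache entry, an entry with an error, and a usable entry. The following sketch shows one way to do so on a dictionary of the shape documented above. The helper name bibtex_or_error is hypothetical, and the sample dictionaries are made up for illustration.

```python
def bibtex_or_error(info):
    """Return a (bibtex, error) pair from a getArxivApiInfo()-style result.

    Exactly one of the two values is None.
    """
    if info is None:
        # Not in the cache: the fetch step was probably not run for this id.
        return None, "no cached information for this arXiv id"
    if info.get("error"):
        # 'reference' may then be an error-info object; do not use 'bibtex'.
        return None, info["error"]
    return info["bibtex"], None


ok = {"reference": object(), "bibtex": "@article{x, title={T}}", "error": None}
bad = {"reference": object(), "bibtex": "", "error": "Not found"}

ok_result = bibtex_or_error(ok)
bad_result = bibtex_or_error(bad)
missing_result = bibtex_or_error(None)
```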

initialize(cache_obj, **kwargs)

Initialize the cache.

Subclasses should perform any initialization tasks, such as installing token checkers. This function should not return anything.

Note that it is strongly recommended to install some form of cache invalidation, even if it is just an expiry validator. You may want to call installCacheExpirationChecker() on cache_obj.

Note that the order in which the initialize() method of the various caches is called is undefined.

Use the cacheDic() method to access the cache dictionary. Note that if you install token checkers on this cache, e.g. with cache_obj.installCacheExpirationChecker(), then the cache dictionary object may have changed! (To be sure, call cacheDic() again.)

The default implementation raises a NotImplementedError exception.

class bibolamazi.filters.util.arxivutil.ArxivInfoCacheAccessor(**kwargs)

Bases: bibolamazi.core.bibusercache.BibUserCacheAccessor

Cache accessor for detected arXiv information about bibliography entries.

complete_cache(bibdata, arxiv_api_accessor)

Makes sure the cache is complete for all items in bibdata.

getArXivInfo(entrykey)

Get the arXiv information corresponding to entry citekey entrykey. If the entry is not in the cache, returns None. Call complete_cache() first!

initialize(cache_obj, **kwargs)

Initialize the cache.

Subclasses should perform any initialization tasks, such as installing token checkers. This function should not return anything.

Note that it is strongly recommended to install some form of cache invalidation, even if it is just an expiry validator. You may want to call installCacheExpirationChecker() on cache_obj.

Note that the order in which the initialize() method of the various caches is called is undefined.

Use the cacheDic() method to access the cache dictionary. Note that if you install token checkers on this cache, e.g. with cache_obj.installCacheExpirationChecker(), then the cache dictionary object may have changed! (To be sure, call cacheDic() again.)

The default implementation raises a NotImplementedError exception.

rebuild_cache(bibdata, arxiv_api_accessor)

Clear and rebuild the entry cache completely.

revalidate(bibolamazifile)

Re-validates the cache (with validate()), then calls complete_cache() again to fetch all missing or out-of-date entries.

exception bibolamazi.filters.util.arxivutil.BibArxivApiFetchError(msg)

Bases: bibolamazi.core.bibusercache.BibUserCacheError

bibolamazi.filters.util.arxivutil.detectEntryArXivInfo(entry)

Extract arXiv information from a pybtex.database.Entry bibliographic entry.

Returns upon success a dictionary of the form:

{ 'primaryclass': <primary class, if available>,
  'arxivid': <the (minimal) arXiv ID (in format XXXX.XXXX or archive/XXXXXXX)>,
  'archiveprefix': <value of the 'archiveprefix' field>,
  'published': <True/False: whether this entry was published in a journal other than arXiv>,
  'doi': <DOI of entry if any, otherwise None>,
  'year': <year in the preprint arXiv ID number, as a 4-digit string>,
  'isoldarxivid': <whether the arXiv ID is of old style, i.e. 'primary-class/XXXXXXX'>,
  'isnewarxivid': <whether the arXiv ID is of new style, i.e. 'XXXX.XXXX+' (with 4 or more digits after the dot)>,
}

Note that ‘published’ is set to True for PhD and Master’s theses. The arxiv filter handles this case separately and explicitly; its option -dThesesCountAsPublished=0 has no effect here.

If no arXiv information was detected, then this function returns None.
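To make the 'year', 'isoldarxivid' and 'isnewarxivid' fields more concrete, here is a rough sketch of how an identifier could be classified with regular expressions. This is not the library's implementation: classify_arxiv_id and both patterns are illustrative assumptions. They rely on the arXiv numbering scheme, in which old-style identifiers (archive/YYMMNNN) were issued from 1991 until March 2007 and new-style identifiers (YYMM.NNNN, later with five digits) from April 2007 onwards.

```python
import re

# New style: YYMM.NNNN, or YYMM.NNNNN since 2015.
_NEW_STYLE = re.compile(r"^(\d{2})\d{2}\.\d{4,}$")
# Old style: archive/YYMMNNN, optionally with a subject class (math.GT/...).
_OLD_STYLE = re.compile(r"^[a-z-]+(?:\.[A-Z]{2})?/(\d{2})\d{5}$")


def classify_arxiv_id(arxivid):
    """Return style flags and a 4-digit year string, or None if unrecognized."""
    m = _NEW_STYLE.match(arxivid)
    if m:
        return {"isnewarxivid": True, "isoldarxivid": False,
                "year": "20" + m.group(1)}
    m = _OLD_STYLE.match(arxivid)
    if m:
        yy = m.group(1)
        # Old-style ids span 1991 to 2007, so 91..99 means the 1990s.
        century = "19" if int(yy) >= 91 else "20"
        return {"isnewarxivid": False, "isoldarxivid": True,
                "year": century + yy}
    return None


new_info = classify_arxiv_id("1401.2345")
old_info = classify_arxiv_id("quant-ph/9901001")
```

Returning None for unrecognized input mirrors the convention of the documented function, which returns None when no arXiv information is detected.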

bibolamazi.filters.util.arxivutil.get_arxiv_cache_access(bibolamazifile)
bibolamazi.filters.util.arxivutil.setup_and_get_arxiv_accessor(bibolamazifile)
bibolamazi.filters.util.arxivutil.stripArXivInfoInNote(notestr)

Assumes that notestr is a string in a note={} field of a bibtex entry, and strips any arxiv identifier information found, e.g. of the form ‘arxiv:XXXX.YYYY’ (or similar).
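As an illustration of this kind of stripping, the sketch below removes 'arXiv:...' identifiers from a note string with a regular expression. The exact forms that the real function recognizes are not specified here, so the pattern and the name strip_arxiv_from_note are assumptions that cover only the two common identifier styles.

```python
import re

# Matches e.g. "arXiv:1234.5678", "arXiv:1234.5678v2", "arxiv:quant-ph/0701001",
# together with any whitespace directly in front of it.
_ARXIV_IN_NOTE = re.compile(
    r"\s*\barxiv:\s*(?:\d{4}\.\d{4,}|[a-z-]+(?:\.[A-Z]{2})?/\d{7})(?:v\d+)?",
    re.IGNORECASE,
)


def strip_arxiv_from_note(notestr):
    """Remove arXiv identifiers and tidy up leftover separators."""
    stripped = _ARXIV_IN_NOTE.sub("", notestr)
    # Drop separators left dangling at either end, e.g. "Preprint," -> "Preprint".
    return stripped.strip(" ,;")


note_a = strip_arxiv_from_note("Preprint, arXiv:1234.5678v2")
note_b = strip_arxiv_from_note("arXiv:quant-ph/0701001")
note_c = strip_arxiv_from_note("See also the erratum.")
```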

bibolamazi.filters.util.auxfile Module

Utilities for parsing .aux files from LaTeX documents.

bibolamazi.filters.util.auxfile.get_action_jobname(jobname, bibolamazifile)

If jobname is non-None and nonempty, then return jobname. Otherwise, find the basename of the bibolamazifile, and return that.

New in version 4.3: Added function get_action_jobname().
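The fallback logic can be sketched as follows. This version takes a plain file path instead of a bibolamazifile object and strips only the final extension; both choices, as well as the name pick_jobname, are assumptions made for illustration and may differ from what get_action_jobname() actually does.

```python
import os.path


def pick_jobname(jobname, bibolamazi_path):
    """Return `jobname` if it is set, else a name derived from the file path."""
    if jobname:
        # Non-None and non-empty: use it unchanged.
        return jobname
    base = os.path.basename(bibolamazi_path)
    # Assumption: drop only the last extension ("refs.bib" -> "refs").
    return os.path.splitext(base)[0]


explicit = pick_jobname("thesis", "/home/me/refs.bib")
fallback = pick_jobname("", "/home/me/refs.bib")
```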

bibolamazi.filters.util.auxfile.get_all_auxfile_citations(jobname, bibolamazifile, filtername, search_dirs=None, callback=None, return_set=True)

Get a list of bibtex keys that a specific LaTeX document cites, by inspecting its .aux file.

Look for the file <jobname>.aux in the current directory, or in the search directories search_dirs if given. Parse that file for commands of the type \citation{..}, and collect all the arguments of such commands. These commands are generated by calls to the \cite{} command in the LaTeX document.

Return a Python set (unless return_set=False) of all bibtex keys that the LaTeX document cites.

Note: latex/pdflatex must have run at least once on the document already.

Arguments:

  • jobname: the base name of the TEX file; the AUX file that is searched for is “<jobname>.aux”. The jobname is expected to be non-empty; see get_action_jobname() for help on that.
  • bibolamazifile: The bibolamazifile relative to which we are analyzing citations. This is used to determine in which directory(ies) to search for the AUX file.
  • filtername: The name of the filter that is calling this function. Used for error messages and logs.
  • search_dirs: list of directories in which to search for the AUX file. These may be absolute paths or relative paths; the latter are interpreted as being relative to the bibolamazifile’s location.
  • callback: A python callable can be specified in this argument. It will be called for each occurrence of a citation in the document, with the citation key as single argument.
  • return_set: If True (the default), then this function returns a python set with all the citation keys encountered. Set this to False if you’re going to ignore the return value of this function.
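The parsing step, together with the callback and return_set arguments, can be sketched on the contents of an .aux file that has already been read into a string. The sketch skips the file lookup in search_dirs entirely. The name citations_in_aux is hypothetical, and splitting a \citation{a,b} argument on commas is an assumption about how multi-key citations appear in the file.

```python
import re

# One \citation{...} command; the braces hold one or more comma-separated keys.
_CITATION = re.compile(r"\\citation\{([^}]*)\}")


def citations_in_aux(aux_text, callback=None, return_set=True):
    """Collect citation keys from .aux file contents.

    Calls `callback(key)` for each occurrence, and returns a set of keys,
    or None if return_set is False.
    """
    keys = set() if return_set else None
    for match in _CITATION.finditer(aux_text):
        for key in match.group(1).split(","):
            key = key.strip()
            if not key:
                continue
            if callback is not None:
                callback(key)  # called once per occurrence, duplicates included
            if keys is not None:
                keys.add(key)
    return keys


aux = "\\relax\n\\citation{Knuth84,Lamport94}\n\\citation{Knuth84}\n\\bibdata{refs}\n"
cited = citations_in_aux(aux)

seen = []
ignored = citations_in_aux(aux, callback=seen.append, return_set=False)
```

With return_set=False no set is built at all, which matches the documented use case of relying on the callback alone.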