Python API: Filter Utilities Package

arxivutil Module

class filters.util.arxivutil.ArxivFetchedAPIInfoCacheAccessor(**kwargs)[source]

Bases: core.bibusercache.BibUserCacheAccessor

A BibUserCacheAccessor for fetching and accessing information retrieved from the arXiv API.

fetchArxivApiInfo(idlist)[source]

Populates the cache with information about the arXiv entries listed in idlist, which must contain the arXiv identifiers to fetch.

This function queries the arXiv.org API using the arxiv2bib library. Avoid making many requests in rapid succession (this should normally not happen anyway, thanks to the cache mechanism). If arXiv.org answers with 403 Forbidden, no further requests should be sent, or else arXiv.org might interpret them as a DoS attack. When a 403 Forbidden HTTP answer is received, this function raises BibArxivApiFetchError with a meaningful error message.

Only those entries in idlist which are not already in the cache are fetched.

idlist can be any iterable.
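The "fetch only what is missing" behaviour can be illustrated with a minimal sketch. The names `fetch_missing`, `cache` and `fetch_one` are hypothetical stand-ins chosen for this example (a plain dict and a caller-supplied callable); they are not part of this module, and the real method works on the user cache and arxiv2bib instead.

```python
def fetch_missing(idlist, cache, fetch_one):
    """Fetch only the identifiers that are not yet in `cache`.

    `cache` (a plain dict) and `fetch_one` (a callable taking one arXiv ID)
    are illustrative stand-ins, not bibolamazi objects.
    """
    # Accept any iterable; drop duplicates while preserving order.
    missing = [arxivid for arxivid in dict.fromkeys(idlist) if arxivid not in cache]
    for arxivid in missing:
        cache[arxivid] = fetch_one(arxivid)
    # Report which identifiers actually triggered a fetch.
    return missing
```

Passing a generator or a set works just as well as a list, since the argument is only iterated once.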

getArxivApiInfo(arxivid)[source]

Returns a dictionary:

{
  'reference':  <arxiv2bib.Reference>,
  'bibtex': <bibtex string>
}

for the given arXiv id in the cache. If the information is not in the cache, returns None.

Call fetchArxivApiInfo() first to retrieve the information.

Note that the reference part may be an arxiv2bib.ReferenceErrorInfo if there was an error retrieving the reference.

initialize(cache_obj, **kwargs)[source]

class filters.util.arxivutil.ArxivInfoCacheAccessor(**kwargs)[source]

Bases: core.bibusercache.BibUserCacheAccessor

A BibUserCacheAccessor for fetching and accessing information retrieved from the arXiv API.

complete_cache(bibdata, arxiv_api_accessor)[source]

Makes sure the cache is complete for all items in bibdata.

getArXivInfo(entrykey)[source]

Returns the arXiv information for the entry with citation key entrykey. If the entry is not in the cache, returns None. Call complete_cache() first!

initialize(cache_obj, **kwargs)[source]

rebuild_cache(bibdata, arxiv_api_accessor)[source]

Clear and rebuild the entry cache completely.

revalidate(bibolamazifile)[source]

Re-validates the cache (with validate()), then calls complete_cache() again to fetch all missing or out-of-date entries.

exception filters.util.arxivutil.BibArxivApiFetchError(msg)[source]

Bases: core.bibusercache.BibUserCacheError

filters.util.arxivutil.detectEntryArXivInfo(entry)[source]

Extract arXiv information from a pybtex.database.Entry bibliographic entry.

On success, returns a dictionary of the form:

{ 'primaryclass': <primary class, if available>,
  'arxivid': <the (minimal) arXiv ID (in format XXXX.XXXX or archive/XXXXXXX)>,
  'archiveprefix': <value of the 'archiveprefix' field>,
  'published': <True/False: whether this entry was published in a journal other than arXiv>,
  'doi': <DOI of entry if any, otherwise None>,
  'year': <year in the preprint arXiv ID number; 4-digit string>
}

Note that ‘published’ is set to True for PhD and Master’s theses. The arxiv.py filter handles this case separately and explicitly; its option -dThesesCountAsPublished=0 has no effect here.

If no arXiv information was detected, then this function returns None.
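To make the 'arxivid' and 'year' fields more concrete, here is a rough sketch of how a minimal arXiv ID and its year could be derived from a string. The helper names `minimal_arxiv_id` and `year_from_arxiv_id`, the regular expressions, and the two-digit-year cutoff (91 and above mapped to the 1900s) are assumptions made for this example; the actual detection logic in detectEntryArXivInfo() inspects several entry fields and may differ.

```python
import re

# Assumed patterns: new-style IDs (YYMM.NNNN or YYMM.NNNNN) and
# old-style IDs (archive/YYMMNNN), each with an optional version suffix.
_NEW_STYLE = re.compile(r"^(?:arxiv:)?(\d{4}\.\d{4,5})(?:v\d+)?$", re.IGNORECASE)
_OLD_STYLE = re.compile(
    r"^(?:arxiv:)?([a-z-]+(?:\.[a-z]{2})?/\d{7})(?:v\d+)?$", re.IGNORECASE
)


def minimal_arxiv_id(text):
    """Return the ID without 'arXiv:' prefix or version suffix, or None."""
    text = text.strip()
    for pattern in (_NEW_STYLE, _OLD_STYLE):
        match = pattern.match(text)
        if match:
            return match.group(1)
    return None


def year_from_arxiv_id(arxivid):
    """Return the 4-digit year (as a string) encoded in a minimal arXiv ID."""
    digits = arxivid.rsplit("/", 1)[-1]  # drop an 'archive/' prefix if present
    yy = int(digits[:2])
    # Assumption: two-digit years from 91 upwards belong to the 1900s.
    return str(1900 + yy if yy >= 91 else 2000 + yy)
```

For instance, `minimal_arxiv_id("arXiv:1234.5678v2")` yields `"1234.5678"`, and a string that matches neither pattern yields `None`, mirroring the None return described above.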

filters.util.arxivutil.get_arxiv_cache_access(bibolamazifile)[source]

filters.util.arxivutil.setup_and_get_arxiv_accessor(bibolamazifile)[source]

filters.util.arxivutil.stripArXivInfoInNote(notestr)[source]

Assumes that notestr is a string in a note={} field of a bibtex entry, and strips any arxiv identifier information found, e.g. of the form ‘arxiv:XXXX.YYYY’ (or similar).
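The kind of stripping described here might look like the following sketch. The function name `strip_arxiv_from_note`, the regular expression, and the cleanup of leftover separators are assumptions for illustration only; the real function may recognise more forms of arXiv information.

```python
import re

# Assumed pattern: 'arXiv:' followed by a new-style or old-style identifier,
# with an optional version suffix.
_ARXIV_IN_NOTE = re.compile(
    r"\barxiv:\s*(?:\d{4}\.\d{4,5}|[a-z-]+(?:\.[a-z]{2})?/\d{7})(?:v\d+)?",
    re.IGNORECASE,
)


def strip_arxiv_from_note(notestr):
    """Remove 'arXiv:...' identifiers from a note string."""
    stripped = _ARXIV_IN_NOTE.sub("", notestr)
    # Collapse runs of whitespace left behind by the removal.
    stripped = re.sub(r"\s{2,}", " ", stripped)
    # Trim dangling separators at either end.
    return stripped.strip(" ,;")
```

A note that contains no identifier is returned unchanged apart from whitespace and separator trimming at its ends.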

auxfile Module

Utilities (for now, a single utility) to parse .aux files from LaTeX documents.

filters.util.auxfile.get_all_auxfile_citations(jobname, bibolamazifile, filtername, search_dirs=None, callback=None, return_set=True)[source]

Get a list of bibtex keys that a specific LaTeX document cites, by inspecting its .aux file.

Look for the file <jobname>.aux in the current directory, or in the search directories search_dirs if given. Parse that file for commands of the type \citation{..}, and collect all the arguments of such commands. These commands are generated by calls to the \cite{} command in the LaTeX document.

This effectively gives a list of entries that a particular document cites.

Note: latex/pdflatex must have run at least once on the document already.
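The lookup and parsing steps described above can be sketched as follows. The helpers `find_aux_file` and `citations_in_aux_text` are hypothetical names for this example and do not reproduce the signature of get_all_auxfile_citations(); in particular, the bibolamazifile, filtername and callback arguments are left out.

```python
import os
import re

# Matches \citation{key1,key2,...} lines as written to .aux files.
_CITATION_CMD = re.compile(r"\\citation\{([^}]*)\}")


def find_aux_file(jobname, search_dirs=None):
    """Return the path of <jobname>.aux, or None if it cannot be found.

    Looks in the current directory unless `search_dirs` is given.
    """
    for directory in (search_dirs or [os.getcwd()]):
        candidate = os.path.join(directory, jobname + ".aux")
        if os.path.isfile(candidate):
            return candidate
    return None


def citations_in_aux_text(aux_text):
    """Collect the set of citation keys found in the contents of an .aux file."""
    keys = set()
    for match in _CITATION_CMD.finditer(aux_text):
        # A single \citation{} may carry several comma-separated keys.
        for key in match.group(1).split(","):
            key = key.strip()
            if key:
                keys.add(key)
    return keys
```

Returning a set removes duplicates, which is convenient because a key cited several times appears in several \citation{..} commands.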