Python API: Filter Utilities Package
bibolamazi.filters.util.arxivutil Module
class bibolamazi.filters.util.arxivutil.ArxivFetchedAPIInfoCacheAccessor(**kwargs)
Bases: bibolamazi.core.bibusercache.BibUserCacheAccessor
A BibUserCacheAccessor for fetching and accessing information retrieved from the arXiv API.
arxiv_403_received = False
fetchArxivApiInfo(idlist)
Populates the cache with information about the arXiv entries given in idlist, which can be any iterable of arXiv identifiers. Only those entries in idlist which are not already in the cache are fetched.
This function queries the arXiv.org API using the arxiv2bib library. Avoid making requests in rapid succession (this should normally not happen anyway thanks to our cache mechanism). If we get a 403 Forbidden HTTP answer, we must not continue, or else arXiv.org might interpret our requests as a DoS attack. In that case, this function raises BibArxivApiFetchError with a meaningful error text.
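The fetch policy described above (skip identifiers that are already cached, stop at the first 403) can be sketched as follows. This is only an illustration under stated assumptions, not the library's implementation: `fetch_missing`, `ForbiddenError` and the `query` callable are hypothetical names, and the real method queries the API through arxiv2bib rather than one identifier at a time.

```python
class ForbiddenError(Exception):
    """Hypothetical stand-in for BibArxivApiFetchError."""


def fetch_missing(idlist, cache, query):
    """Fetch info for the ids in `idlist` that `cache` does not hold yet.

    `query` is a hypothetical callable returning (http_status, info).
    Returns the list of ids that were actually requested.
    """
    fetched = []
    for arxivid in idlist:          # any iterable works
        if arxivid in cache:
            continue                # already cached: no request is made
        status, info = query(arxivid)
        if status == 403:
            # Stop immediately instead of sending further requests.
            raise ForbiddenError(
                "arXiv API answered 403 Forbidden for %s" % arxivid)
        cache[arxivid] = info
        fetched.append(arxivid)
    return fetched
```

Raising on the first 403 rather than collecting errors means no further request leaves the process once the server has refused one.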
getArxivApiInfo(arxivid)
Returns a dictionary:
    {
        'reference': <arxiv2bib.Reference>,
        'bibtex': <bibtex string>,
        'error': <None or an error string>,
    }
for the given arXiv id in the cache. If the information is not in the cache, returns None.
Don’t forget to first call fetchArxivApiInfo() to retrieve the information in the first place.
Note that the reference part may be an arxiv2bib.ReferenceErrorInfo if there was an error retrieving the reference. In that case, the key 'error' contains an error string.
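A caller has three cases to tell apart: no cached information (None), a cached error, and a usable result. The helper below is a minimal sketch of that dispatch. `describe_api_info` is a hypothetical name, and the function works on plain dictionaries of the documented shape so that it does not depend on bibolamazi or arxiv2bib.

```python
def describe_api_info(info):
    """Summarize a value of the shape documented for getArxivApiInfo()."""
    if info is None:
        # Nothing cached for this id.
        return "not in cache (was fetchArxivApiInfo() called?)"
    if info['error'] is not None:
        # 'reference' may then be an error-info object, so do not use it.
        return "error: " + info['error']
    return info['bibtex']
```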
initialize(cache_obj, **kwargs)
Initialize the cache.
Subclasses should perform any initialization tasks, such as installing token checkers. This function should not return anything.
It is strongly recommended to install some form of cache invalidation, even if it is just an expiry validator. You may want to call installCacheExpirationChecker() on cache_obj.
Note that the order in which the initialize() methods of the various caches are called is undefined.
Use the cacheDic() method to access the cache dictionary. Note that if you install token checkers on this cache, e.g. with cache_obj.installCacheExpirationChecker(), then the cache dictionary object may have changed! (To be sure, call cacheDic() again.)
The default implementation raises a NotImplementedError exception.
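The caveat about the cache dictionary changing can be illustrated with a toy model. Everything here is hypothetical: `ToyCache`, `cache_dic` and `install_expiration_checker` only imitate the idea and are not bibolamazi's classes or signatures. The sketch assumes that installing a checker may rebuild the dictionary, which is one way an earlier reference could go stale.

```python
class ToyCache:
    """Hypothetical stand-in for a cache object; not bibolamazi's real class."""

    def __init__(self, entries):
        self._dic = dict(entries)

    def cache_dic(self):
        return self._dic

    def install_expiration_checker(self, max_age, now):
        # Keep only fresh entries. A *new* dictionary object is built, so any
        # reference obtained earlier keeps pointing at the old contents.
        self._dic = {key: (stamp, value)
                     for key, (stamp, value) in self._dic.items()
                     if now - stamp <= max_age}


cache = ToyCache({'old': (0, 'stale entry'), 'new': (90, 'fresh entry')})
before = cache.cache_dic()
cache.install_expiration_checker(max_age=30, now=100)
after = cache.cache_dic()      # ask again after installing the checker
```

Here `before` still contains the expired entry while `after` does not, which is why the dictionary should be fetched again after installing checkers.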
class bibolamazi.filters.util.arxivutil.ArxivInfoCacheAccessor(**kwargs)
Bases: bibolamazi.core.bibusercache.BibUserCacheAccessor
Cache accessor for detected arXiv information about bibliography entries.
complete_cache(bibdata, arxiv_api_accessor)
Makes sure the cache is complete for all items in bibdata.
getArXivInfo(entrykey)
Get the arXiv information corresponding to the entry with citation key entrykey. If the entry is not in the cache, returns None. Call complete_cache() first!
initialize(cache_obj, **kwargs)
Initialize the cache.
Subclasses should perform any initialization tasks, such as installing token checkers. This function should not return anything.
It is strongly recommended to install some form of cache invalidation, even if it is just an expiry validator. You may want to call installCacheExpirationChecker() on cache_obj.
Note that the order in which the initialize() methods of the various caches are called is undefined.
Use the cacheDic() method to access the cache dictionary. Note that if you install token checkers on this cache, e.g. with cache_obj.installCacheExpirationChecker(), then the cache dictionary object may have changed! (To be sure, call cacheDic() again.)
The default implementation raises a NotImplementedError exception.
bibolamazi.filters.util.arxivutil.detectEntryArXivInfo(entry)
Extract arXiv information from a pybtex.database.Entry bibliographic entry.
Upon success, returns a dictionary of the form:
    {
        'primaryclass': <primary class, if available>,
        'arxivid': <the (minimal) arXiv ID (in format XXXX.XXXX or archive/XXXXXXX)>,
        'archiveprefix': <value of the 'archiveprefix' field>,
        'published': <True/False: whether this entry was published in a journal other than arXiv>,
        'doi': <DOI of entry if any, otherwise None>,
        'year': <year in the preprint arXiv ID number; 4-digit, string type>,
        'isoldarxivid': <whether the arXiv ID is of old style, i.e. 'primary-class/XXXXXXX'>,
        'isnewarxivid': <whether the arXiv ID is of new style, i.e. 'XXXX.XXXX+' (with 4 or more digits after the dot)>,
    }
Note that 'published' is set to True for PhD and Master’s theses. The arxiv filter handles this case separately and explicitly; its option -dThesesCountAsPublished=0 has no effect here.
If no arXiv information was detected, this function returns None.
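The old-style versus new-style distinction and the 'year' field can be illustrated on a bare identifier string. This is a minimal sketch, not the detection logic of detectEntryArXivInfo(), which inspects a whole bibliography entry. `classify_arxiv_id` and both regular expressions are assumptions made for illustration, including the rule that two-digit years from 91 upward belong to the 1900s.

```python
import re

# New style: YYMM.NNNN with 4 or more digits after the dot, optional version.
_NEW_STYLE = re.compile(r'^(\d{2})(\d{2})\.\d{4,}(v\d+)?$')
# Old style: archive[.SC]/YYMMNNN, optional version.
_OLD_STYLE = re.compile(r'^[a-z-]+(\.[A-Z]{2})?/(\d{2})(\d{2})\d{3}(v\d+)?$')


def classify_arxiv_id(arxivid):
    """Return style flags and a 4-digit year string, or None if unrecognized."""
    m_new = _NEW_STYLE.match(arxivid)
    m_old = _OLD_STYLE.match(arxivid)
    if m_new is None and m_old is None:
        return None
    yy = int(m_new.group(1) if m_new else m_old.group(2))
    # Assumption: arXiv started in 1991, so 91..99 means 19xx, the rest 20xx.
    year = 1900 + yy if yy >= 91 else 2000 + yy
    return {
        'arxivid': arxivid,
        'year': str(year),
        'isoldarxivid': m_old is not None,
        'isnewarxivid': m_new is not None,
    }
```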
bibolamazi.filters.util.auxfile Module
Utilities (actually for now, utility) to parse .aux files from LaTeX documents.
bibolamazi.filters.util.auxfile.get_action_jobname(jobname, bibolamazifile)
If jobname is non-None and nonempty, then return jobname. Otherwise, find the basename of the bibolamazifile, and return that.
New in version 4.3: Added function get_action_jobname().
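The fallback rule can be sketched on a plain file path. `pick_jobname` is a hypothetical helper that takes a path string rather than a bibolamazifile object. Stripping a trailing `.bibolamazi.bib` or `.bib` from the basename is an assumption about what "basename" means here, not something the text above states.

```python
import os.path
import re


def pick_jobname(jobname, bibolamazi_path):
    """Return `jobname` if non-empty, else a name derived from the file path."""
    if jobname:                      # rejects both None and ''
        return jobname
    base = os.path.basename(bibolamazi_path)
    # Assumption: drop a trailing '.bibolamazi.bib' or '.bib' suffix.
    return re.sub(r'(\.bibolamazi)?\.bib$', '', base)
```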
bibolamazi.filters.util.auxfile.get_all_auxfile_citations(jobname, bibolamazifile, filtername, search_dirs=None, callback=None, return_set=True)
Get the bibtex keys that a specific LaTeX document cites, by inspecting its .aux file.
Look for the file <jobname>.aux in the current directory, or in the search directories search_dirs if given. Parse that file for commands of the type \citation{..}, and collect all the arguments of such commands. These commands are generated by calls to the \cite{} command in the LaTeX document.
Return a python set (unless return_set=False) of all bibtex keys that the LaTeX document cites.
Note: latex/pdflatex must have run at least once on the document already.
Arguments:
- jobname: the base name of the TEX file; the AUX file that is searched for is “<jobname>.aux”. The jobname is expected to be non-empty; see get_action_jobname() for help on that.
- bibolamazifile: The bibolamazifile relative to which we are analyzing citations. This is used to determine in which directory(ies) to search for the AUX file.
- filtername: The name of the filter that is calling this function. Used for error messages and logs.
- search_dirs: list of directories in which to search for the AUX file. These may be absolute paths or relative paths; the latter are interpreted as being relative to the bibolamazifile’s location.
- callback: A python callable can be specified in this argument. It will be called for each occurrence of a citation in the document, with the citation key as single argument.
- return_set: If True (the default), then this function returns a python set with all the citation keys encountered. Set this to False if you’re going to ignore the return value of this function.
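The parsing step, together with the callback and return_set arguments, can be sketched on the text of an .aux file. This is an illustration under assumptions, not the library's code: `collect_citations` is a hypothetical function that takes the file contents as a string and does no directory searching. It also assumes that a single \citation{...} may carry several comma-separated keys.

```python
import re

_CITATION = re.compile(r'\\citation\{([^}]*)\}')


def collect_citations(aux_text, callback=None, return_set=True):
    """Collect the keys of all \\citation{...} commands found in `aux_text`."""
    keys = set() if return_set else None
    for match in _CITATION.finditer(aux_text):
        for key in match.group(1).split(','):
            key = key.strip()
            if not key:
                continue
            if callback is not None:
                callback(key)        # called once per occurrence
            if keys is not None:
                keys.add(key)        # duplicates collapse in the set
    return keys


sample = "\\relax\n\\citation{Einstein1935,Bell1964}\n\\citation{Bell1964}\n"
cited = collect_citations(sample)
```

With return_set=False nothing is accumulated, which suits callers that only need the per-occurrence callback.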