API documentation¶

Global functions¶

rdflib_hdt.optimize_sparql()¶: Overrides the RDFlib SPARQL engine to optimize SPARQL query execution over HDT documents.

Note

Calling this function triggers a global modification of the RDFlib SPARQL engine. However, executing SPARQL queries using other RDFlib stores will continue to work as before, so you can safely call this function at the beginning of your code.
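The call order described in the note can be sketched as follows. This is a minimal sketch, not the library's reference usage: the helper name run_query and its parameters are hypothetical, and it assumes an HDT file exists at the path you pass in. Graph(store=...) and Graph.query() are standard RDFlib calls.

```python
def run_query(hdt_path: str, sparql: str) -> list:
    """Run a SPARQL query over an HDT file (hypothetical helper)."""
    # Imports are local so the sketch can be loaded without the packages installed.
    from rdflib import Graph
    from rdflib_hdt import HDTStore, optimize_sparql

    # Patch the RDFlib SPARQL engine once, before any query is executed.
    optimize_sparql()

    # Wrap the HDT file in a regular RDFlib graph and evaluate the query.
    graph = Graph(store=HDTStore(hdt_path))
    return list(graph.query(sparql))
```

Because the modification is global, calling optimize_sparql() once at program start is enough; it does not need to be repeated for each graph.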

HDTStore¶

class rdflib_hdt.HDTStore(path: str, mapped: bool = True, indexed: bool = True, safe_mode=True, configuration=None, identifier=None)¶

Bases: rdflib.store.Store

An implementation of a Store over an HDT document.

It is heavily inspired by the work from @FlorianLudwig (https://github.com/RDFLib/rdflib/issues/894)

Warning

By default, an HDTStore discards RDF Terms with invalid UTF-8 encoding. You can change this behavior with the safe_mode parameter of the constructor.

Args:

path: Absolute path to the HDT file to load.
mapped: True if the document must be mapped on disk, False to load it in memory.
indexed: True if the document must be indexed. Indexes must be located in the same directory as the HDT file. Missing indexes are automatically generated at startup.
safe_mode: True if Unicode errors should be ignored, False otherwise.

add(_, context=None, quoted=False)¶: Adds the given statement to a specific context or to the model. The quoted argument is interpreted by formula-aware stores to indicate that this statement is quoted/hypothetical. It should be an error to not specify a context and have the quoted argument be True. It should also be an error for the quoted argument to be True when the store is not formula-aware.

addN(quads)¶: Adds each item in the list of statements to a specific context. The quoted argument is interpreted by formula-aware stores to indicate that this statement is quoted/hypothetical. Note that the default implementation is a redirect to add.

destroy(configuration)¶: This destroys the instance of the store identified by the configuration string.

is_safe() → bool¶: Return True if the HDT store ignores Unicode errors, False otherwise.

remove(_, context)¶: Removes the set of triples matching the pattern from the store.

triples(pattern, context) → Iterable[Tuple[Union[rdflib.term.URIRef, rdflib.term.Literal], Union[rdflib.term.URIRef, rdflib.term.Literal], Union[rdflib.term.URIRef, rdflib.term.Literal]]]¶

Search for a triple pattern in an HDT store.

Args:

pattern: The triple pattern (s, p, o) to search.
context: The query execution context.

Returns: An iterator that produces RDF triples matching the input triple pattern.
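Pattern matching is usually reached through an RDFlib graph rather than by calling the store directly. The sketch below assumes graph is an rdflib.Graph created with Graph(store=HDTStore(path)); the helper name subjects_with_predicate is hypothetical. Graph.triples() is the standard RDFlib entry point, where None acts as a wildcard.

```python
from typing import Any, Iterator


def subjects_with_predicate(graph: Any, predicate: Any) -> Iterator[Any]:
    """Yield the subject of every triple that uses the given predicate."""
    # None in the subject and object positions matches any term.
    for subject, _predicate, _obj in graph.triples((None, predicate, None)):
        yield subject
```

The function is a generator, so matching triples are consumed lazily as the caller iterates.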

property hdt_document¶: The HDT document used to read and query the HDT file.

property nb_objects¶: The number of objects in the HDT store.

property nb_predicates¶: The number of predicates in the HDT store.

property nb_shared¶: The number of terms shared between subjects and objects in the HDT store.

property nb_subjects¶: The number of subjects in the HDT store.
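The four counting properties can be gathered into one summary. This is a small sketch under the assumption that store is an HDTStore instance (an object exposing the same four properties works equally well); the helper name dictionary_stats and the dictionary keys are hypothetical.

```python
from typing import Any, Dict


def dictionary_stats(store: Any) -> Dict[str, int]:
    """Collect the term counts exposed by the store's properties."""
    return {
        "subjects": store.nb_subjects,
        "predicates": store.nb_predicates,
        "objects": store.nb_objects,
        "shared": store.nb_shared,
    }
```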

HDTDocument¶

class rdflib_hdt.HDTDocument(path: str, mapped: bool = True, indexed: bool = True, safe_mode=True)¶

An HDT document, in read-only mode.

This class is a wrapper over the original hdt.HDTDocument class, which aligns it with the RDFlib data model.

Warning

By default, an HDTDocument discards RDF Terms with invalid UTF-8 encoding. You can change this behavior with the safe_mode parameter of the constructor.

Args:

path: Absolute path to the HDT file to load.
mapped: True if the document must be mapped on disk, False to load it in memory.
indexed: True if the document must be indexed. Indexes must be located in the same directory as the HDT file. Missing indexes are automatically generated at startup.
safe_mode: True if Unicode errors should be ignored, False otherwise.
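The constructor arguments above can be combined as in the following sketch. The helper name open_document and its strict_unicode flag are hypothetical; the keyword names passed to HDTDocument are the ones listed in the signature above.

```python
def open_document(path: str, strict_unicode: bool = False):
    """Open an HDT file, optionally without ignoring Unicode errors."""
    # Imported locally so the sketch can be loaded without the package installed.
    from rdflib_hdt import HDTDocument

    # safe_mode=True (the default) ignores Unicode errors; strict_unicode turns that off.
    return HDTDocument(path, mapped=True, indexed=True, safe_mode=not strict_unicode)
```

The is_safe() method documented below can be used afterwards to check which mode a document was opened in.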

nb_subjects¶: Return the number of subjects in the HDT document.

nb_predicates¶: Return the number of predicates in the HDT document.

nb_objects¶: Return the number of objects in the HDT document.

nb_shared¶: Return the number of terms shared between subjects and objects in the HDT document.

from_tripleid(triple_id: int) → Union[rdflib.term.URIRef, rdflib.term.Literal]¶

Transform an RDF triple from a TripleID representation to an RDFlib representation.

Argument:

triple_id: 3-tuple of IDs (s, p, o)

Return:

A triple in RDFlib representation, i.e., a 3-tuple of RDFlib terms.

id_to_term(term_id: int, kind: int) → Union[rdflib.term.URIRef, rdflib.term.Literal]¶

Transform an RDF term from a unique ID, as used in a TripleID, to an RDFlib representation.

It can be used in interaction with the rdflib_hdt.HDTDocument.search_ids() method.

Argument:

term_id: The Term ID to transform.
kind: The term position: 0 for subjects, 1 for predicates and 2 for objects.

Return:

An RDFlib representation of the RDF Term.

is_safe() → bool¶: Return True if the HDT document ignores Unicode errors, False otherwise.

search(query: Tuple[Union[rdflib.term.URIRef, rdflib.term.Literal, None], Union[rdflib.term.URIRef, rdflib.term.Literal, None], Union[rdflib.term.URIRef, rdflib.term.Literal, None]], limit=0, offset=0) → Tuple[rdflib_hdt.iterators.HDTIterator, int]¶

Search for RDF triples matching the query triple pattern, with an optional limit and offset. Use None for wildcards/variables.

Args:

query: The triple pattern (s, p, o) to search. Use None to indicate wildcards/variables.
limit: (optional) Maximum number of triples to search.
offset: (optional) Number of matching triples to skip before returning results.

Return:

A 2-element tuple (iterator, estimated pattern cardinality), where the iterator is a generator of matching RDF triples. An RDF triple itself is a 3-element tuple (subject, predicate, object) of RDF terms (in RDFlib format).
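The limit and offset arguments lend themselves to page-by-page retrieval. The sketch below assumes document is an HDTDocument (or any object whose search() method has the signature above); the helper name iter_pages is hypothetical. It relies only on the documented return value, a pair of (iterator, estimated cardinality).

```python
from typing import Any, Iterator, List, Tuple


def iter_pages(document: Any, pattern: Tuple[Any, Any, Any],
               page_size: int) -> Iterator[List[tuple]]:
    """Yield lists of at most page_size matching triples."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        # The second element is only an estimate, so it is not used to stop.
        iterator, _estimated = document.search(pattern, limit=page_size, offset=offset)
        page = list(iterator)
        if not page:
            break
        yield page
        if len(page) < page_size:
            # A short page means the matches are exhausted.
            break
        offset += len(page)
```

Stopping on an empty or short page, rather than on the cardinality, avoids depending on an estimate that may differ from the exact number of matches.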

search_ids(query: Optional[int], limit=0, offset=0) → Tuple[hdt.TripleIDIterator, int]¶

Same as rdflib_hdt.HDTDocument.search(), but RDF triples are represented as unique IDs (from the HDT Dictionary). Use None or 0 to indicate wildcards/variables.

Mapping between ids and RDF terms is done using the rdflib_hdt.HDTDocument.from_tripleid(), rdflib_hdt.HDTDocument.to_tripleid(), rdflib_hdt.HDTDocument.term_to_id(), and rdflib_hdt.HDTDocument.id_to_term() methods.

Args:

query: A triple pattern of IDs (s, p, o) to search. Use None or 0 to indicate wildcards/variables.
limit: (optional) Maximum number of triples to search.
offset: (optional) Number of matching triples to skip before returning results.

Return:

A 2-element tuple (iterator, estimated pattern cardinality), where the iterator is a generator of matching RDF triples. An RDF triple itself is a 3-element tuple (subject, predicate, object) of IDs (positive integers from the HDT Dictionary).
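The ID-based search and the conversion methods can be chained. The following is a sketch only, assuming document is an HDTDocument and that each item produced by the ID iterator can be passed to from_tripleid() as described in this reference; the helper name search_via_ids is hypothetical.

```python
from typing import Any, List, Tuple


def search_via_ids(document: Any, pattern: Tuple[Any, Any, Any]) -> List[tuple]:
    """Search with IDs, then convert each match back to RDFlib terms."""
    # Convert the RDFlib pattern (None = wildcard) into a TripleID pattern.
    id_pattern = document.to_tripleid(pattern)
    # Run the search over dictionary IDs instead of RDF terms.
    iterator, _estimated = document.search_ids(id_pattern)
    # Map every matching TripleID back to a triple of RDFlib terms.
    return [document.from_tripleid(triple_id) for triple_id in iterator]
```

Working with IDs can be useful when intermediate results are compared or stored, since the conversion to RDF terms is deferred until it is needed.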

search_join(patterns: Set[Union[rdflib.term.URIRef, rdflib.term.Literal, rdflib.term.Variable]]) → hdt.JoinIterator¶

Evaluate a join between a set of triple patterns using an iterator. A triple pattern itself is a 3-element tuple (subject, predicate, object) of RDFlib terms with at least one SPARQL variable.

Argument: A set of triple patterns.

Return: An rdflib_hdt.HDTJoinIterator, which produces rdflib.query.ResultRow objects, per the Python iteration protocol.

term_to_id(term: Union[rdflib.term.URIRef, rdflib.term.Literal], kind: int) → int¶

Transform an RDF term from an RDFlib representation to a unique ID, as used in a TripleID.

It can be used in interaction with the rdflib_hdt.HDTDocument.search_ids() method.

Argument:

term: The RDF term to transform.
kind: The term position: 0 for subjects, 1 for predicates and 2 for objects.

Return:

An ID representation of the RDF Term.
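The kind argument is a bare integer, so named constants make call sites easier to read. The sketch below assumes document is an HDTDocument; the constant names and the helper round_trip are hypothetical, while the values 0, 1 and 2 are the positions documented above.

```python
from typing import Any, Tuple

# Hypothetical names for the documented term positions.
SUBJECT, PREDICATE, OBJECT = 0, 1, 2


def round_trip(document: Any, term: Any, kind: int) -> Tuple[int, Any]:
    """Convert a term to its ID for one position, then back to a term."""
    term_id = document.term_to_id(term, kind)
    # The same position must be used in both directions.
    return term_id, document.id_to_term(term_id, kind)
```

A call such as round_trip(document, term, OBJECT) then states the intended position explicitly.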

to_tripleid(triple: Tuple[Union[rdflib.term.URIRef, rdflib.term.Literal, None], Union[rdflib.term.URIRef, rdflib.term.Literal, None], Union[rdflib.term.URIRef, rdflib.term.Literal, None]]) → Tuple[int, int, int]¶

Transform a triple (pattern) from an RDFlib representation to a TripleID.

It can be used to transform an RDFlib query before feeding it into the rdflib_hdt.HDTDocument.search_ids() method.

Argument:

triple: 3-tuple of RDF Terms. Use None to indicate wildcards.

Return:

A triple in TripleID representation, i.e., a 3-tuple of integers.

HDTIterator¶

class rdflib_hdt.HDTIterator(input: hdt.TripleIterator, safe_mode=True)¶

An iterator that converts HDT matching triples to the RDFlib data model.

Args:

input: Input iterator that produces RDF triples with RDF terms in string format.
safe_mode: True if Unicode errors should be ignored, False otherwise.

next() → Tuple[Union[rdflib.term.URIRef, rdflib.term.Literal], Union[rdflib.term.URIRef, rdflib.term.Literal], Union[rdflib.term.URIRef, rdflib.term.Literal]]¶: Produce a new RDF triple, per the Python iterator protocol.

HDTJoinIterator¶

class rdflib_hdt.HDTJoinIterator(input: hdt.JoinIterator, safe_mode=True)¶

An iterator that converts HDT join results to the RDFlib data model.

Args:

input: Input iterator that yields join results.
safe_mode: True if Unicode errors should be ignored, False otherwise.

next() → rdflib.query.ResultRow¶: Produce a new row of results, per the Python iterator protocol.