SPARQL datasets and named graphs
I’ve been working to understand thoroughly how the RDF datasets used in SPARQL 1.1 are defined, and how named graphs and GRAPH graph patterns are evaluated. This write-up is based solely on the SPARQL 1.1 specification (mostly section 13) and intentionally not on how existing SPARQL engines or stores actually behave.
Datasets
A SPARQL query is answered in the context of an RDF dataset. An RDF dataset consists of exactly one default graph and zero or more named graphs, each named graph being identified by an IRI. Every graph holds RDF triples. The same triple may appear in more than one graph (the default graph, a named graph, or both), but each graph's triples are considered independently (for example, blank nodes from different graphs are not comparable).
There are three places a dataset can be defined:
- Query processor – by default, the query processor will interpret the query in terms of the default dataset. The contents of the default dataset are determined by the query processor and the SPARQL query does not affect it.
- In the query – FROM and FROM NAMED in the SPARQL query define a dataset.
  - Default graph = RDF-MERGE of all of the FROM graphs (empty if none).
  - Named graphs = set of graphs specified in FROM NAMED (empty if none).
- In the protocol – if the SPARQL query is executed via the SPARQL protocol, the dataset information (default and named graphs) may be specified in the protocol instead.
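As a rough illustration of the "In the query" case, here is a minimal Python sketch. It assumes a graph is modelled as a set of (subject, predicate, object) string tuples, that blank nodes are strings starting with "_:", and that `fetch` is a hypothetical callable that returns the graph for an IRI. None of these names come from the specification.

```python
def _rename_blank(term, suffix):
    # Give blank nodes a per-graph suffix so they cannot clash after merging.
    return f"{term}_{suffix}" if term.startswith("_:") else term


def rdf_merge(graphs):
    """Merge graphs, keeping blank nodes from different graphs distinct."""
    merged = set()
    for index, graph in enumerate(graphs):
        for triple in graph:
            merged.add(tuple(_rename_blank(term, f"g{index}") for term in triple))
    return merged


def dataset_from_query(from_iris, from_named_iris, fetch):
    """Build the dataset described by FROM and FROM NAMED clauses."""
    return {
        # Default graph: merge of all FROM graphs (empty set if there are none).
        "default": rdf_merge([fetch(iri) for iri in from_iris]),
        # Named graphs: one entry per FROM NAMED IRI (empty dict if there are none).
        "named": {iri: fetch(iri) for iri in from_named_iris},
    }


graph1 = {("_:b", "ex:knows", "ex:alice")}
graph2 = {("_:b", "ex:knows", "ex:alice")}
print(len(rdf_merge([graph1, graph2])))  # 2: the two "_:b" nodes stay distinct
```

The two input graphs look identical, but because each "_:b" is local to its own graph, the merge keeps two triples rather than collapsing them into one.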
When executing a query, exactly one of these datasets is chosen in order of precedence:
Protocol > query (FROM / FROM NAMED) > query processor default
For the rest of this discussion, “the dataset” refers to the dataset chosen in the prior step.
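The precedence rule can be sketched in a few lines of Python. The function name and the convention that `None` means "not specified" are assumptions made for this illustration only.

```python
def choose_dataset(protocol_dataset, query_dataset, processor_default):
    """Pick exactly one dataset: protocol first, then query, then the default.

    protocol_dataset is None when the protocol request names no graphs;
    query_dataset is None when the query has neither FROM nor FROM NAMED.
    """
    for candidate in (protocol_dataset, query_dataset):
        if candidate is not None:
            return candidate
    return processor_default


default = {"default": "DG", "named": {"NG1": "...", "NG2": "..."}}
from_clauses = {"default": "merge of FROM graphs", "named": {}}
chosen = choose_dataset(None, from_clauses, default)
print(chosen is from_clauses)  # True: the query's dataset replaces the default
```

Note that the chosen dataset is used on its own. It is not combined with the datasets that lost out, which is why a single FROM clause is enough to make the processor's named graphs disappear.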
Active graph
When matching triple patterns in the query, the “active graph” is used to determine the scope of matching:
- By default, the active graph is set to the default graph of the dataset.
- The GRAPH graph pattern can be used to alter the active graph. If the GRAPH graph pattern is used, only the named graphs are considered.
  - Fixed GRAPH patterns specify an IRI to use as a named graph. In this case, only the specified named graph in the dataset will be used. If the IRI is not one of the named graphs in the dataset, the active graph will be the empty graph.
  - Variable GRAPH patterns specify a variable to bind to the graph of each solution. In this case, the whole graph pattern is matched against each named graph in the dataset in turn, the graph variable is bound to that graph's IRI in each solution, and the results are unioned.
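The rules above can be modelled with a small Python sketch. It assumes a dataset is a dict with a "default" graph and a "named" mapping from IRI to graph, and it follows this write-up's reading that an unknown IRI yields the empty graph. The function name `graph_scopes` is made up for the illustration.

```python
def graph_scopes(dataset, graph_term=None):
    """Return (bindings, active_graph) pairs for a group of triple patterns.

    graph_term is None outside any GRAPH pattern, an IRI string for a
    fixed GRAPH pattern, or a "?var" string for a variable GRAPH pattern.
    """
    if graph_term is None:
        # Outside GRAPH: the active graph is the dataset's default graph.
        return [({}, dataset["default"])]
    if graph_term.startswith("?"):
        # Variable GRAPH: one scope per named graph, with the variable bound
        # to that graph's IRI. The caller unions the solutions of all scopes.
        return [({graph_term: iri}, graph) for iri, graph in dataset["named"].items()]
    # Fixed GRAPH: the named graph if it exists, otherwise the empty graph.
    return [({}, dataset["named"].get(graph_term, set()))]


dataset = {
    "default": {("ex:s", "ex:p", "ex:default")},
    "named": {"NG1": {("ex:s", "ex:p", "ex:one")}, "NG2": {("ex:s", "ex:p", "ex:two")}},
}
for bindings, graph in graph_scopes(dataset, "?g"):
    print(bindings, graph)
```

With no named graphs in the dataset, the variable case returns an empty list, so a `GRAPH ?g { ... }` pattern contributes no solutions at all.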
Examples
In all examples, assume there is a default dataset defined by the query processor with default graph = DG and named graphs NG1 and NG2. No dataset is provided in the protocol.
Example 1
SELECT ?a ?b ?c ?g
WHERE {
  { ?a … }
  GRAPH <NG1> {
    ?b …
  }
  GRAPH ?g {
    ?c …
  }
}
- Dataset: There is no FROM or FROM NAMED so the processor’s default dataset is used.
  - Default graph: DG
  - Named graphs: NG1, NG2
- Active graph
  - Containing ?a: DG
  - Containing ?b: NG1
  - Containing ?c: NG1, NG2
Example 2
SELECT ?a ?b ?c ?g
FROM <graph1>
FROM <graph2>
WHERE {
  { ?a … }
  GRAPH <graph1> {
    ?b …
  }
  GRAPH ?g {
    ?c …
  }
}
- Dataset: There is a FROM, so the default dataset is discarded.
  - Default graph: RDF-MERGE(graph1, graph2)
  - Named graphs: None
- Active graph
  - Containing ?a: RDF-MERGE(graph1, graph2)
  - Containing ?b: empty graph
  - Containing ?c: empty graph
Example 3
SELECT ?a ?b ?c ?g
FROM NAMED <graph1>
FROM NAMED <graph2>
WHERE {
  { ?a … }
  GRAPH <graph1> {
    ?b …
  }
  GRAPH ?g {
    ?c …
  }
}
- Dataset: There is a FROM NAMED, so the default dataset is discarded.
  - Default graph: empty
  - Named graphs: graph1, graph2
- Active graph
  - Containing ?a: empty graph
  - Containing ?b: graph1
  - Containing ?c: graph1, graph2
Example 4
SELECT ?a ?b ?c ?g
FROM <graph1>
FROM <graph2>
FROM NAMED <graph3>
FROM NAMED <graph4>
WHERE {
  { ?a … }
  GRAPH <graph3> {
    ?b …
  }
  GRAPH ?g {
    ?c …
  }
}
- Dataset: There is a FROM and FROM NAMED, so the default dataset is discarded.
  - Default graph: RDF-MERGE(graph1, graph2)
  - Named graphs: graph3, graph4
- Active graph
  - Containing ?a: RDF-MERGE(graph1, graph2)
  - Containing ?b: graph3
  - Containing ?c: graph3, graph4
Questions
- What happens if you specify a fixed IRI in a GRAPH pattern that is not one of the named graphs in the dataset? Section 13 of the specification does not explicitly cover this, but I believe the following statement in the spec implies that only named graphs in the dataset will return data in a GRAPH pattern: “The GRAPH keyword is used to make the active graph one of all of the named graphs in the dataset for part of the query.”

Hi! My name is Alex Miller and I live in St. Louis. I write code for a living and currently work for
So your question is to confirm example 3? I think you’re right, but I also don’t know if any stores actually work that way.
Your post here demonstrates something I’ve been noticing lately where I think we have some confusing language in the spec. The query processor has a “default graph”. We also have the “default graph” described in a dataset. If there are no FROM statements, then the dataset’s default graph is set to the query processor’s default graph. So the query processor is the default “default graph”. This isn’t new, as it’s exactly the same as the SPARQL 1.0 spec.
It’s also worth noting that there’s another way to set the current graph, and that’s with a SERVICE block. In that case the graph will be coming from the remote store, and may be completely out of the scope of the current dataset. This can get really hairy if the service happens to be a loopback to the current store.