Ticket #595 (accepted enhancement)

Opened 2 years ago

Last modified 5 months ago

Manage truth maintenance in SPARQL UPDATE

Reported by: bryanthompson Owned by: bryanthompson
Priority: major Milestone:
Component: Bigdata SAIL Version: BIGDATA_RELEASE_1_2_1
Keywords: Cc: mikepersonick, bryanthompson

Description (last modified by bryanthompson) (diff)

Make it easier for people to manage truth maintenance, including disabling it, dropping the computed entailments, recomputing the database-at-once closure efficiently, and re-enabling truth maintenance.

The following extensions to SPARQL UPDATE are proposed to manage the materialized entailments.

DROP ENTAILMENTS
Drop the entailments. This is only required if you have removed some statements from the database. If you are only adding statements, then just execute "CREATE ENTAILMENTS".

CREATE ENTAILMENTS
(Re-)compute the entailments using an efficient "database-at-once" closure operation. This is much more efficient than incremental truth maintenance if you are loading a large amount of data into the database. It is not necessary to "DROP ENTAILMENTS" before calling "CREATE ENTAILMENTS" unless you have retracted some assertions. If you do not "DROP ENTAILMENTS" first, then "CREATE ENTAILMENTS" has the semantics of "updating" the current entailments: entailments that are already present are simply re-proven with no effect, and only new entailments are inserted into the KB. This is significantly more efficient than re-computing the fixed point closure of the entailments from scratch (that is, after a "DROP ENTAILMENTS").

ENABLE ENTAILMENTS
Enable incremental truth maintenance.

DISABLE ENTAILMENTS
Disable incremental truth maintenance.
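
To make the DROP/CREATE distinction concrete, here is a minimal, self-contained Java sketch. It is a toy model only and not Bigdata code: the class ToyEntailmentStore, its methods, and the single transitivity rule over subclass edges are all assumptions made for illustration. It shows that a CREATE-style closure only ever adds inferences, so an inference whose support was retracted survives until the entailments are dropped.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model (hypothetical, not the Bigdata API): subclass edges plus inferred edges. */
final class ToyEntailmentStore {

    // Each edge is a two-element list [sub, sup] meaning "sub subClassOf sup".
    private final Set<List<String>> explicit = new HashSet<>();
    private final Set<List<String>> inferred = new HashSet<>();

    void insert(String sub, String sup) { explicit.add(Arrays.asList(sub, sup)); }

    void delete(String sub, String sup) { explicit.remove(Arrays.asList(sub, sup)); }

    /** Analogue of DROP ENTAILMENTS: discard everything that was inferred. */
    void dropEntailments() { inferred.clear(); }

    /**
     * Analogue of CREATE ENTAILMENTS: apply the transitivity rule until a
     * fixed point is reached. Existing inferences are kept, so this only adds.
     */
    void createEntailments() {
        boolean changed = true;
        while (changed) {
            changed = false;
            // Snapshot of all known edges so we can add to 'inferred' while iterating.
            Set<List<String>> all = new HashSet<>(explicit);
            all.addAll(inferred);
            for (List<String> a : all) {
                for (List<String> b : all) {
                    if (a.get(1).equals(b.get(0))) {
                        List<String> e = Arrays.asList(a.get(0), b.get(1));
                        if (!explicit.contains(e) && inferred.add(e)) {
                            changed = true;
                        }
                    }
                }
            }
        }
    }

    /** True if the edge is either asserted or inferred. */
    boolean holds(String sub, String sup) {
        List<String> e = Arrays.asList(sub, sup);
        return explicit.contains(e) || inferred.contains(e);
    }

    public static void main(String[] args) {
        ToyEntailmentStore kb = new ToyEntailmentStore();
        kb.insert("A", "B");
        kb.insert("B", "C");
        kb.createEntailments();
        System.out.println(kb.holds("A", "C")); // true: A->C was inferred

        kb.delete("B", "C");                    // retract a supporting assertion
        kb.createEntailments();                 // "update" semantics, no DROP first
        System.out.println(kb.holds("A", "C")); // true: the stale inference survives

        kb.dropEntailments();
        kb.createEntailments();
        System.out.println(kb.holds("A", "C")); // false: closure rebuilt from scratch
    }
}
```

The second result is the reason "DROP ENTAILMENTS" is required after retractions: re-running the closure cannot remove an inference that is no longer supported.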

The following pattern illustrates a valid use of this feature when assertions are not retracted. This sequence of operations is ACID against a Journal. Clients will never observe an intermediate state where the full set of entailments is not available.

# mutations before this point are tracked by truth maintenance.
DISABLE ENTAILMENTS; # disable truth maintenance.
# mutations do not update entailments.
LOAD file1;
LOAD file2;
INSERT DATA { triples };
CREATE ENTAILMENTS; # create new entailments using the database-at-once closure.
ENABLE ENTAILMENTS; # reenable truth maintenance.
# mutations after this point are tracked by truth maintenance.

The following pattern illustrates a valid use of this feature when some assertions are retracted. This sequence of operations is ACID against a Journal. Clients will never observe an intermediate state where the full set of entailments is not available.

# mutations before this point are tracked by truth maintenance.
DISABLE ENTAILMENTS; # disable truth maintenance.
# mutations do not update entailments.
DELETE DATA { triples };
LOAD file1;
LOAD file2;
INSERT DATA { triples };
DROP ENTAILMENTS; # drop existing entailments and proof chains
CREATE ENTAILMENTS; # create new entailments using the database-at-once closure.
ENABLE ENTAILMENTS; # reenable truth maintenance.
# mutations after this point are tracked by truth maintenance.

I am also wondering if there is any reason to "ENABLE ENTAILMENTS" or if that should be automatic when we call CREATE ENTAILMENTS.

Or ENABLE ENTAILMENTS could do the database-at-once closure and CREATE ENTAILMENTS could specify the set of rules to be maintained.

CREATE ENTAILMENTS "RDFS Plus"

CREATE ENTAILMENTS could default to the existing set of rules, but we also have an opportunity to change the rules that are being maintained at this point.


Mike and I also discussed some options to support this through the NanoSparqlServer's REST API. The main concept was to add the following to the NanoSparqlServer API for methods that perform mutations.

?suppressTruthMaintenance

And add a suppressTruthMaintenance method to the RemoteRepository, probably encapsulating it within the AddOp and RemoveOp classes by extracting a common base class. The update() method would also be refactored to accept an UpdateOp that extends that common base class and inherits that boolean option. If you are using the REST API and the suppressTruthMaintenance URL query parameter to suppress incremental truth maintenance, then you can issue the "CREATE ENTAILMENTS" update request afterwards to update the entailments for the KB. If you have also retracted statements, then you would want to issue "DROP ENTAILMENTS; CREATE ENTAILMENTS;" to remove the old entailments before (re-)computing the entailments for the KB.
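
A rough Java sketch of that refactoring follows. It is not the RemoteRepository code: the base class name MutationOp, the toUrl() helper, the empty subclasses, and the endpoint URL are hypothetical, and the ticket does not say whether the query parameter takes a value, so the bare form shown above is assumed.

```java
/** Hypothetical common base class carrying the proposed boolean option. */
abstract class MutationOp {

    private boolean suppressTm = false;

    /** Fluent setter for the option; returns this op for chaining. */
    MutationOp suppressTruthMaintenance(boolean suppress) {
        this.suppressTm = suppress;
        return this;
    }

    boolean isSuppressTruthMaintenance() {
        return suppressTm;
    }

    /** Appends the proposed query parameter to the endpoint when the option is set. */
    String toUrl(String endpoint) {
        if (!suppressTm) {
            return endpoint;
        }
        String sep = endpoint.contains("?") ? "&" : "?";
        return endpoint + sep + "suppressTruthMaintenance";
    }
}

// Stand-ins for the real classes; their payloads are omitted in this sketch.
final class AddOp extends MutationOp {}

final class RemoveOp extends MutationOp {}

final class UpdateOp extends MutationOp {}

final class SuppressTmSketch {
    public static void main(String[] args) {
        // Placeholder endpoint, not a documented default.
        String endpoint = "http://localhost:8080/sparql";
        System.out.println(new AddOp().suppressTruthMaintenance(true).toUrl(endpoint));
        System.out.println(new RemoveOp().toUrl(endpoint));
    }
}
```

Keeping the flag on the base class means add, remove, and update requests all inherit it without each class declaring its own copy.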

I am not yet convinced that it is a good idea to expose this feature through the REST API. Doing so makes it basically certain that the database will be exposed to mutation during a period when truth maintenance is disabled, and that people will be able to read states of the database that are not coherent in terms of the available entailments. It is much easier to encapsulate a series of changes in a single SPARQL UPDATE script. When that script runs, the entire process will be ACID. So long as the script restores entailments before it finishes, it will be impossible for clients to observe the intermediate database states.

Note: This issue was forked from #591 (SPARQL UPDATE "LOAD")

See also #840 (Turn on and off incremental truth maintenance and kick off database at once closure)

Change History

comment:1 Changed 2 years ago by bryanthompson

  • Status changed from new to accepted
  • Description modified (diff)

comment:2 Changed 2 years ago by bryanthompson

  • Description modified (diff)

comment:3 Changed 2 years ago by bryanthompson

  • Description modified (diff)

comment:4 Changed 2 years ago by bryanthompson

  • Description modified (diff)

comment:5 Changed 2 years ago by bryanthompson

Note that (as of the commits above) you can now do all of this through the BigdataSailConnection: obtain a connection from the BigdataSail, disable truth maintenance on that connection, add and retract assertions through that connection, drop entailments (optional, but necessary if you have retracted any statements), and then recompute the database-at-once closure.

BigdataSail sail = ...;

BigdataSailConnection conn = sail.getUnisolatedConnection();

try {

   // Disable truth maintenance on this connection.
   conn.setTruthMaintenance(false);

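   // ADD and/or REMOVE Statements using the BigdataSailConnection

   // removedStatements: flag maintained by the caller, true iff any
   // assertion was retracted above.
   if(removedStatements) {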
   
      // You must drop the existing entailments if you have retracted any assertions
      // but this step is NOT required (or recommended) if you are just batch inserting
      // data into the sail.

      conn.dropAllEntailments();

   }

   conn.computeClosure();

   conn.commit();

} catch(Throwable t) {

   conn.abort();

   throw new RuntimeException(t);

} finally {

   conn.close();

}

comment:6 Changed 2 years ago by bryanthompson

Modified the BigdataSailConnection code so we can change whether or not truth maintenance is enforced after the connection has been obtained. Setting this property flushes any buffered writes. Flushing the writes will trigger TM iff it was previously enabled. The state of the property is then updated.

This change makes it possible to set the truth maintenance behavior of the connection at the start of the LOAD operation execution, not when we obtain the SailConnection.

The example above depends on this change set.

Committed revision r6500.
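
The flush-then-update ordering described in this comment can be sketched in isolation. This is a toy stand-in and not the BigdataSailConnection code: ToyConnection, its string buffer, and the log entries are assumptions used only to show that writes buffered before the property change are flushed under the old setting.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy connection (hypothetical) that buffers writes and records each flush. */
final class ToyConnection {

    private boolean truthMaintenance = true;
    private final List<String> buffer = new ArrayList<>();

    /** One entry per flush: "flush+TM:n" if TM ran, otherwise "flush:n". */
    final List<String> log = new ArrayList<>();

    void add(String statement) {
        buffer.add(statement);
    }

    /** Flush first (under the old setting), then update the property. */
    void setTruthMaintenance(boolean newValue) {
        flush();
        this.truthMaintenance = newValue;
    }

    /** Writes out buffered statements; TM is applied iff it is currently enabled. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        log.add((truthMaintenance ? "flush+TM:" : "flush:") + buffer.size());
        buffer.clear();
    }

    public static void main(String[] args) {
        ToyConnection conn = new ToyConnection();
        conn.add("s1");                  // buffered while TM is enabled
        conn.setTruthMaintenance(false); // flushes s1 with TM, then disables TM
        conn.add("s2");
        conn.add("s3");
        conn.flush();                    // flushes s2 and s3 without TM
        System.out.println(conn.log);    // [flush+TM:1, flush:2]
    }
}
```

If the property were updated before the flush, the statement buffered under the old setting would silently skip truth maintenance, which is what the ordering avoids.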


comment:7 Changed 7 months ago by bryanthompson

  • Summary changed from Manage truth maintenance to Manage truth maintenance in SPARQL UPDATE

comment:8 Changed 5 months ago by bryanthompson

  • Description modified (diff)