Jena TDB 102

1 Introduction

TDB is a RDF storage of Jena.

official guarantees and limitations

  • TDB support full range of Jena APIs
  • TDB can be used as a high performance RDF store on a single machine
  • TDB can be accessed and managed with cmd scripts and Java API
  • TDB dataset can be protected against corruption using transaction
  • TDB supports Serializable transaction level
  • TDB supports ACID transaction trough write-ahead-logging
  • A TDB dataset should only directly accessed from a single JVM at a time, otherwise data corruption may occur.From 1.1.0 onwards TDB includes automatic protection against multi-JVM usage which prevents this under most circumstances.

2 TDB Java API

The application obtain a model or RDF dataset from TDB, navigate or use it for other model or dataset.

construct a model or dataset

2 ways to specify data source:

  • directory

see Jena TDB API without Assembler

  • assembler file

the assembler syntax and sematic see under 5 assembler: a DSL

bulkloader

load data into an empty dataset fastly, using cmd utitilies: tdbloader

concurrency support

USE transaction DEEPLY, it's guaranteed.

cache

TDB support caching in different levels, from RDF terms to disk block.
When not using transaction, should call synchronization explicitly:

Dataset dataset = ...  ;
TDB.sync(dataset) ;

3 command line scripts

scripts located in JENA_HOME/bin

datasource specification

// assembler file
--desc=assembler.ttl
--tdb=assembler.ttl

// directory
--loc=DIRECTORY
--location=DIRECTORY

if not specified, --desc=tdb.ttl default is set.

commands

  • tdbloader

    Bulk loader and index builder.

  • tdbloader2

    Bulk loader and index builder. Faster than tdbloader but only works on Linux and Mac OS/X since it relies on some Unix system utilities.

    Restriction: This bulk loader can only be used to create a database. It may overwrite existing data. It only accepts the --loc argument and a list of files to load.

  • tdbquery

    Invoke a SPARQL query on a store.
    Use --time for timing information.

  • tdbdump

    Dump the store in N-Quads format.

  • tdbstats

    Produce a statistics for the dataset.

//TODO cmd tool usage

4 transaction

detailed limitations

  • Bulk loads: the TDB bulk loader is not transactional
  • Nested transactions are not supported.
  • Some active transaction state is held exclusively in-memory, limiting scalability.
  • Long-running read-transactions cause a build-up of pending changes.

API for transaction

read transaction

// use directory to specify datasource
Dataset dataset = TDBFactory.createDataset(directoryPathStr);
dataset.begin(ReadWrite.READ) ;
try {
    ...
    //dataset.abort();// abort transaction
} finally {
    dataset.commit();
    //dataset.end(); // same as commit() 

}    

write transaction

dataset.begin(ReadWrite.WRITE) ;
try {
  ...
  dataset.commit();
} finally {
  dataset.end();
}

see Jena TDB 101 Java API without Assembler for a running example.

tansaction with concurrency

2 methods:

  • shareable Dataset, sequential transaction behaviour
  • private Dataset, independent transaction

5 assembler: a DSL

see Jena TDB assembler

6 dataset and named graph

concept

An RDF dataset is composed of 1 unnamed default graph, and 0+ named graphs. Here graph is same as graph is SPARQL vocabulary.

storage

A RDF data set use an individual O.S. directory for storage.

The default graph is held as a single graph, while the named graphs are held in a set of indexes.

query

SPARQL query is fully supported in named graphs of TDB backed datasets.

2 special graph name

7 TDB configuration

TODO

8 TDB optimizer

TODO

references

Apache Jena TDB
Using TDB from Java through the API
Command line utilities
Transactions
Assemblers for Graphs and Datasets
Datasets and Named Graphs
Dynamic Datasets: Query a subset of the named graphs
Quad filtering: Hide information in the dataset
The TDB Optimizer
TDB Configuration
Value Canonicalization
TDB Design
Use on 64 bit or 32 bit Java systems
FAQs

上一篇:Python笔记-高级特性


下一篇:linux安装python