JDataStore: A Pure Java Embeddable Object-Relational Database

By: Steven Shaughnessy

Abstract: JDataStore has all the usual attributes of an embeddable database, but it is also a portable file system.

Okay, I'm a little biased since I'm part of the team that developed the powerful JDataStore at Borland. But what really gets me excited is the broad applicability of the technology.

JDataStore has the common characteristics of an embeddable database - a small footprint, high performance, portability, zero administration, replication/synchronization with any database that surfaces a JDBC driver (which almost all worthwhile databases do), transaction management and crash recovery.

JDataStore is also a portable file system. A single JDataStore file can contain multiple streams. A stream can be a table, a secondary index for a table, an arbitrary file or even a serialized Java object. The JDataStore file has a hierarchical directory for organizing and locating the streams it contains.

There are several powerful ways to access the data stored in a JDataStore database. One choice is to use our industry-standard, pure Java JDBC drivers. They have a built-in query engine that provides entry-level SQL-92 support. We provide a local JDBC driver for single-process, multithreaded access and a remote JDBC driver for multiprocess, multithreaded access. The primary benefit of the remote driver is to provide network access to a JDataStore database.

In addition to our JDBC drivers, you can access table and file streams with a set of standard JavaBean components. These are our DataExpress data access components that allow you to access JDataStore table and file streams directly.

SQL Access to JDataStore Using JDBC
We provide a JDBC Type 4 driver for local or remote access that offers:

  1. An industry standard SQL call-level interface to JDataStore
  2. Optimized querying capabilities against tables stored in a JDataStore file
  3. Multiuser access to a JDataStore across a network

If you already know SQL and JDBC, you know JDataStore.

Note that while the local JDBC access is for client or server single-process applications, it still allows for multithreading and multiple connections. So you can embed the JDataStore inside an application server using the local JDBC driver to handle an application server's requests from multiple users. The local access will perform better than the remote access when dealing with large result sets, since results using the latter must be transmitted using a networking protocol.
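As a rough sketch of what "if you know JDBC, you know JDataStore" means in practice, the class below uses only standard java.sql calls. The driver class name and the two URL prefixes are assumptions (they are not given in this article and should be checked against the product documentation), and JdsJdbcSketch, countRows and the table name passed in are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class JdsJdbcSketch {
    // ASSUMPTION: driver class name, not stated in the article.
    static final String DRIVER = "com.borland.datastore.jdbc.DataStoreDriver";

    /** ASSUMPTION: URL prefix for the local (in-process) driver. */
    static String localUrl(String file) {
        return "jdbc:borland:dslocal:" + file;
    }

    /** ASSUMPTION: URL prefix for the remote driver, which talks to a server over the network. */
    static String remoteUrl(String host, String file) {
        return "jdbc:borland:dsremote://" + host + "/" + file;
    }

    /** Counts the rows of a table using nothing but standard JDBC calls. */
    static int countRows(String url, String user, String password, String table)
            throws SQLException, ClassNotFoundException {
        Class.forName(DRIVER); // older drivers have to be loaded explicitly
        try (Connection con = DriverManager.getConnection(url, user, password);
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM " + table)) {
            rs.next();
            return rs.getInt(1);
        }
    }
}
```

Switching between embedded and networked access would then be a matter of passing localUrl(...) or remoteUrl(...) to the same countRows method; the SQL and the JDBC calls stay the same.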

Direct JavaBean Access to JDataStore
JDataStore JavaBeans can be used in conjunction with JBuilder DataExpress JavaBeans to provide direct data access to tables, files and objects in a JDataStore database.

First let's look at the basic JDataStore bean. The DataStore bean is used to open or create a JDataStore database. DataStore.FileName is a key property for this bean; it specifies the location of the JDataStore database. If your application needs transaction and crash recovery support, the DataStore.TxManager property must be set to a TxManager bean. The DataStore bean extends the DataStoreConnection bean. For single-user access, you only need to use the DataStore bean. If you want to support multiple users, the DataStoreConnection bean provides a separate transactional context for each user.

DataExpress beans are data access components with a rich set of functionality for building database applications. The primary DataExpress component is a DataSet that can provide direct access to a JDataStore table. The typical usage model for a DataSet has three phases to it:

  1. DataSet.Provider implementation provides data from a data source such as any JDBC driver or EJB. The data provided is cached by the DataSet.Store implementation. By default, the implementer of DataSet.Store is MemoryStore, which caches the data in memory. But the DataStore and DataStoreConnection beans also implement DataSet.Store.
  2. Application manipulation of the provided data. DataSet provides a rich set of functionality for this phase including editing, navigation, sorting, filtering, aggregation, constraint management, default management, one-to-many master-detail relationships and many-to-one look-up relationships. Another powerful facet of the DataSet bean is that it's easy to bind to data-aware visual components using simple property settings. Note that although the DataSet bean is easy to bind to a visual component, it has no references to visual beans. The DataSet is a nonvisual bean, which makes it a handy bean for middle-tier database development.
  3. DataSet.Resolver implementation allows for the resolving or saving of edits to the cached data in one or more DataSets to a data source such as JDBC or an EJB server. The DataSet automatically tracks all insert, update and delete operations made against its data cache. This allows the DataSet.Resolver implementation to use an optimistic concurrency approach for saving changes back to a data source.

Notice how I referred to DataSet.Provider, DataSet.Store and DataSet.Resolver "implementations." The bulk of DataSet functionality used in the second phase doesn't care about the origin of its data, what's caching it and what's going to save its edits back to a data source. This clean separation of roles in the architecture allows us to mix and match implementations for all three of these interfaces. For example, a DataSet may use a QueryProvider to provide data using a query against a JDBC data source and a ProcedureResolver that uses a stored procedure to save the changes back. In this scenario a DBA may provide more liberty when data is retrieved (arbitrary queries allowed) but may need the protection of a stored procedure to ensure the integrity of data saved back.
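The separation of roles can be illustrated with a minimal sketch in plain Java. None of these types are the DataExpress API: Provider, Store, Resolver, ListStore and MiniDataSet are hypothetical stand-ins, and only inserts are tracked to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

final class ThreeRolesSketch {
    /** Phase 1: supplies the initial rows (hypothetical stand-in for a provider). */
    interface Provider { List<String> provide(); }

    /** Caches the rows; an in-memory or file-backed implementation could be swapped in. */
    interface Store {
        void add(String row);
        List<String> rows();
    }

    /** Phase 3: saves the tracked edits back to wherever the data came from. */
    interface Resolver { void resolve(List<String> insertedRows); }

    /** The simplest possible cache, standing in for an in-memory store. */
    static final class ListStore implements Store {
        private final List<String> rows = new ArrayList<>();
        public void add(String row) { rows.add(row); }
        public List<String> rows() { return rows; }
    }

    /** Phase 2 lives here and never asks where rows came from or where they will go. */
    static final class MiniDataSet {
        private final Provider provider;
        private final Store store;
        private final Resolver resolver;
        private final List<String> inserted = new ArrayList<>(); // change tracking

        MiniDataSet(Provider provider, Store store, Resolver resolver) {
            this.provider = provider;
            this.store = store;
            this.resolver = resolver;
        }

        void load() {
            for (String row : provider.provide()) store.add(row);
        }

        void insert(String row) {
            store.add(row);
            inserted.add(row); // remember the edit for the resolver
        }

        int rowCount() { return store.rows().size(); }

        void saveChanges() {
            resolver.resolve(new ArrayList<>(inserted));
            inserted.clear(); // edits are now saved, nothing left to resolve
        }
    }
}
```

Because MiniDataSet depends only on the three interfaces, a query-backed provider could be paired with a stored-procedure-backed resolver without touching the code that edits and navigates the rows.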

As I mentioned before, the DataStore and DataStoreConnection beans implement the DataSet.Store interface. This makes it trivial to adapt an existing DataExpress application to use a JDataStore table to cache a DataSet's data. You just set the DataSet.Store and DataSet.StoreName properties. Two big benefits you get by doing this are:

  • Increased capacity - since JDataStore is a database, it can store up to 2 billion rows per table.
  • Persistence for disconnected computing with transactional rollback and crash recovery.

You can also use a DataSet just to access a table in a JDataStore without having any intention of retrieving or saving data from or to another data source. The DataSet API provides a rich access layer for navigating, editing, indexing/ordering and filtering tables in a JDataStore.

Portable File System
I've talked about using JDataStore as an SQL database with a standard JDBC local or remote driver and DataExpress beans. Well, it's also a portable file system. Inside a physical JDataStore file there's a directory that associates names with streams. I've already discussed table streams. Table streams can also have several different related streams associated with them, such as maintained secondary indexes and resolving indexes that track inserted, updated and deleted rows. But a JDataStore can also contain what we call FileStreams. A FileStream extends java.io.InputStream, but adds seek() and write() methods. FileStream is basically a random access file. There are methods on the DataStoreConnection bean to open and create FileStreams. There are also convenient DataStoreConnection.ReadObject and DataStoreConnection.WriteObject methods that will read or write any object that implements java.io.Serializable.
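To make the java.io.Serializable requirement concrete, here is a minimal sketch that turns an object into a stream of bytes and back using only the JDK. It does not call JDataStore at all; the Settings class and the helper names are hypothetical, and an in-memory byte array stands in for the stream that a JDataStore file would hold.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SerializedObjectSketch {
    /** A hypothetical application object; implementing Serializable is the only requirement. */
    static final class Settings implements Serializable {
        private static final long serialVersionUID = 1L;
        final String theme;
        final int fontSize;

        Settings(String theme, int fontSize) {
            this.theme = theme;
            this.fontSize = fontSize;
        }
    }

    /** Serializes an object into the bytes that a stream would store. */
    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    /** Rebuilds an object from previously serialized bytes. */
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }
}
```

The object that comes back is a copy with the same field values, which is the behavior an application relies on when it persists its objects next to its tables.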

What's really neat about this is that a single physical file can be used to persist the data, files and objects that an application needs. And as we'll see next, the persistence engine is transactional, with crash recovery support!

Transactional Support and Crash Recovery
Transaction and crash recovery support can be enabled on a JDataStore database by setting the DataStore.TxManager property to a TxManager bean. The TxManager bean has several properties for specifying log file directories, max log file sizes, log file block sizes, and so on. These properties all have defaults. You might want to set the log file directory property to a different disk drive for better fault tolerance and performance.

As an embeddable database, JDataStore has a zero administration approach by default. A DataStore.ResponseListener event can be wired, allowing an application to decide what should be done in critical situations. Here are a couple of examples of such critical situations:

  1. JDataStore is being opened after a system crash. By default, JDataStore will start an automatic recovery process to redo work from transactions that were in progress and undo work of uncommitted transactions.
  2. When old log files are no longer needed for any active transaction or for crash recovery, the system will delete the old log file.

The JDataStore JDBC drivers surface transactional support with the default auto commit semantics dictated by the JDBC standard. When using DataStore directly with DataSet or FileStream beans, the system will automatically start a transaction on the first read or write operation, but doesn't automatically commit transactions. For these beans your application must call DataStoreConnection.commit() or DataStoreConnection.rollback() to complete a transaction.
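On the JDBC side, an application that wants several statements in one transaction can switch off the default auto-commit behavior. The sketch below uses only the standard java.sql.Connection methods; TxSketch, Work and inTransaction are hypothetical names, and nothing in it is specific to JDataStore.

```java
import java.sql.Connection;
import java.sql.SQLException;

final class TxSketch {
    /** A unit of work that should succeed or fail as a whole. */
    interface Work {
        void run(Connection con) throws SQLException;
    }

    /** Runs the work with auto-commit off: commit on success, rollback on failure. */
    static void inTransaction(Connection con, Work work) throws SQLException {
        boolean previous = con.getAutoCommit();
        con.setAutoCommit(false);
        try {
            work.run(con);
            con.commit();
        } catch (SQLException | RuntimeException e) {
            con.rollback(); // undo everything the work did
            throw e;
        } finally {
            con.setAutoCommit(previous); // leave the connection as we found it
        }
    }
}
```

The same commit-or-rollback shape applies when working with the beans directly, except that the calls go to DataStoreConnection.commit() and DataStoreConnection.rollback() and there is no auto-commit mode to switch off.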

Read-Only Transactions
A powerful multiuser feature of JDataStore is the support for read-only transactions. These transactions can't write data, but they're also never blocked from reading by other transactions. Since a read-only transaction sees data only from committed transactions, it doesn't need to acquire any locks or be blocked by other transactions with locks on the data the read-only transaction is reading.

Read-only transactions are ideal for querying or reporting data in a JDataStore while it's being updated by other read/write transactions. For applications with such needs, read-only transactions are also more likely to make use of higher-end servers with extra CPUs, since they never block waiting for locks from other transactions.

Tools: Visual Component Designer, JDBC Explorer and DataStore Explorer


Figure 1: DataStore Explorer

JBuilder ships with several tools that can be used with a JDataStore database. The DataStore Explorer provides a visual view into a JDataStore file system. It provides an array of utility functions for viewing and administering a JDataStore database. The DataStore Explorer provides a hierarchical tree view of the directory on the left side of its frame. On the right is a view of the selected stream. There are several preregistered readers for different stream types, such as table and image streams (e.g., .gif and .jpeg files). The Explorer also allows other readers to be registered. Figure 1 shows the DataStore Explorer.

The JDBC Explorer is a nice utility in JBuilder that provides generic explorer functionality to any JDBC driver, including JDataStore's.

JBuilder's visual component designer provides support for visually designing JDataStore and DataExpress beans.

Replication/Synchronization
It's critical for an embeddable database like JDataStore to be cooperative and complementary with other popular databases. By leveraging the built-in provider/resolver capability of the DataExpress DataSet bean, JDataStore is friendly to any DataSet provider/resolver implementation. Since DataExpress has excellent support for JDBC-based providers and resolvers, we provide excellent support for just about any database on the planet. JDataStore supports most JDBC data types, which makes it easy to receive data from a broad variety of JDBC data sources. Data types supported include Java Object, String (up to 2 billion characters), Time, Timestamp, Date, InputStream (blobs), BigDecimal, double, float, long, int, short and boolean.

A typical usage scenario might go like this: an application uses DataSet beans with query providers to retrieve a snapshot of data needed from a primary data source using a JDBC driver. Since the application sets the DataSet.Store property of all its DataSet beans to a DataStore, the retrieved data is automatically persisted in the JDataStore database. Now the application can go offline with the JDataStore database. The data in JDataStore can be read or written using the JDataStore JDBC drivers or the DataSet data access beans. Tables inside a JDataStore automatically track all insert/update/delete operations. So when the application needs to reconnect to the primary data source, the DataSet resolvers know what edits need to be saved back.

The DataSet resolver capability is a synchronization technology. With a single method call to save changes, the resolution mechanism will automatically save back edits to the data source. For the JDBC query resolver, SQL insert/update/delete statements are issued. For the JDBC stored procedure resolver, the application-specified stored procedures are called for insert/update/delete requests.

Note that the resolver capability can automatically identify one-to-many relationships and automatically order insert/update/delete operations to maintain the referential integrity of a database. For example, you can't add line items for an order until the order exists for the line items. In this case the resolver will know to add the order before adding the line items. Conversely, an order can't be deleted until its line items have all been deleted.
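The ordering rule for orders and line items can be sketched in a few lines of plain Java. This is only an illustration of the idea, not the resolver's actual algorithm: Edit, Op and order are hypothetical, and running the deletes before the inserts is an arbitrary choice made for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class ResolveOrderSketch {
    enum Op { INSERT, DELETE }

    /** One tracked edit; master is true for the "one" side (orders), false for the "many" side (line items). */
    static final class Edit {
        final Op op;
        final boolean master;
        final String row;

        Edit(Op op, boolean master, String row) {
            this.op = op;
            this.master = master;
            this.row = row;
        }
    }

    /** Reorders edits so that no detail row ever exists without its master row. */
    static List<Edit> order(List<Edit> edits) {
        List<Edit> out = new ArrayList<>();
        // Deletes: detail rows first, then the master rows they pointed to.
        for (Edit e : edits) if (e.op == Op.DELETE && !e.master) out.add(e);
        for (Edit e : edits) if (e.op == Op.DELETE && e.master) out.add(e);
        // Inserts: master rows first, then the detail rows that reference them.
        for (Edit e : edits) if (e.op == Op.INSERT && e.master) out.add(e);
        for (Edit e : edits) if (e.op == Op.INSERT && !e.master) out.add(e);
        return out;
    }
}
```

Whatever order the user made the edits in, the statements sent to the data source never insert a line item before its order or delete an order that still has line items.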

Of course, there are always situations that can't be managed automatically when reconciling edits back to a data source. For these cases there are resolving events that can be wired to specify what the resolver should do.

Completely Internationalized
Borland has a long history of employing people from all over the world. I'm one of the few barbarians on our team who speaks only one language. The JDataStore team includes people who speak (in alphabetical order) Chinese, Danish, French, Indian, Japanese, Portuguese (Brazil), Swedish and Yugoslavian.

We support a Java String data type that is Unicode. Notice how the DataStore Explorer in Figure 1 shows Japanese and English names in the same grid control view. But we didn't stop there. Our international contingent helped to develop and test a scheme that uses the JDK 1.2 collation keys to provide indexing and ordering. So if the JDK can order data for a given locale, JDataStore can too. In fact, the second benchmark test in Table 1 shows timings for creating secondary indexes on Japanese data.

Another nice feature, supported by our secondary indexes, is the ability to create several different indexes on the same columns using different locale settings.
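The JDK facility behind this is java.text.Collator and its CollationKey objects. The sketch below shows how the same values can be ordered for different locales; it illustrates the JDK mechanism only, not JDataStore's index code, and LocaleIndexSketch and sortedFor are hypothetical names.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class LocaleIndexSketch {
    /** Returns the values in the order dictated by the given locale's collation rules. */
    static List<String> sortedFor(Locale locale, List<String> values) {
        Collator collator = Collator.getInstance(locale);
        // Build each key once. Constructing keys costs time up front, but
        // comparing two keys afterwards is a cheap comparison.
        List<CollationKey> keys = new ArrayList<>();
        for (String value : values) keys.add(collator.getCollationKey(value));
        Collections.sort(keys); // CollationKey implements Comparable
        List<String> sorted = new ArrayList<>();
        for (CollationKey key : keys) sorted.add(key.getSourceString());
        return sorted;
    }
}
```

Calling sortedFor twice on the same list with two different Locale values is the in-memory analogue of keeping two secondary indexes on the same column with different locale settings.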

Performance
JDataStore is some of the fastest Java code we've seen. I performed the simple tests in Table 1 on a 300 MHz Pentium computer by taking timings on the second execution of each test. The Java VM maximum heap setting was set to -mx32m. I constructed a parent and child table for these tests. Following are the SQL statements used to construct these tables:

CREATE TABLE PARENTS(
PARENT_ID INT,
FIRSTNAME CHAR(30),
LASTNAME CHAR(30),
SALARY DOUBLE,
DOB TIMESTAMP
)

CREATE TABLE CHILDREN (
CHILD_ID INT,
FIRSTNAME CHAR(30),
LASTNAME CHAR(30),
PARENT_ID INT,
DOB TIMESTAMP
)

The basic storage of data shows a nice linear increase as more rows are added. The first column timings use the DataSet JavaBean to add rows with a nontransactional JDataStore database. The second column shows the same operation applied to a JDataStore database that has transactional support enabled. The third column executes the SQL insert statements against the JDataStore JDBC driver. Transactional support is optional with JDataStore. The benefits of transaction commit/abort semantics and crash recovery do cost a bit. Nevertheless, the transactional timings are still good. An application may want to disable transactional support temporarily to quickly load data from another data source.

The second test in Table 1 shows timings for creating indexes on a JDataStore.

Timings for sorting Japanese data are slower because of the initial construction of Collation keys. Luckily, read access to the index doesn't incur this overhead after it's created.

The last test is a simple join on the PARENTS and CHILDREN tables. This joins 10,000 PARENTS rows to 100,000 CHILDREN rows. Here's the query that was used:

select PARENTS.FIRSTNAME, PARENTS.LASTNAME,
CHILDREN.FIRSTNAME, CHILDREN.LASTNAME
from PARENTS, CHILDREN
where PARENTS.PARENT_ID = CHILDREN.PARENT_ID

This test completed in 2.26 seconds!

By the time this article is published, you should be able to pick up the JDataStore demo application I used to produce the performance results at www.borland.com/devsupport/jbuilder/downloads/.

Deployment Options
Considering the functionality delivered, the code for JDataStore is very compact. What's nice is that several deployment options can be exercised to reduce the footprint. For example, SQL, transactional support and replication/synchronization are all options that can be excluded. If you deploy all functionality, the total class file size is a little under 1.5 megabytes.

Note: If you use the JDataStore remote JDBC driver to access a JDataStore server, the footprint is only 125 K.

Using JDataStore
It's easy to get creative with JDataStore. Some of the key differentiators from other databases include pure Java implementation, zero administration, synchronization/replication functionality, SQL and direct navigational access, and a single file storage that also serves as a transactional file system for arbitrary files and Java Objects.

Here are a few ideas for using JDataStore:

  • Web servers: Internally, we have a build-management application that runs on a Web server using Java Servlets to access and update information from client-side browsers. It's easy to embed a JDataStore server into a server-based application.
  • XML: It's all the buzz these days, and a great medium for exchanging data between applications, particularly between server and thin client-side browsers. A popular application of XML is to dynamically generate XML on a server to service read requests from thin browser clients. It's also useful for the client to be able to post changes back to a server using XML. Ultimately, much of the data transmitted using XML probably came from a database. JDataStore is a database so XML can be dynamically generated for read requests. But as discussed earlier, JDataStore also has a built-in capability for reconciling edits to its data, which is useful when developing applications that allow browser clients to edit data managed by a server.

    XML is also used for storing arbitrary resources, properties and documents. JDataStore can be used as a repository for such files. They can be stored as FileStreams inside the JDataStore file system or as InputStreams inside a column of a JDataStore table. So an application can easily store the majority or all of the files it needs, along with the tables it accesses, inside a single JDataStore database file.
  • Disconnected or mobile computing models: As mentioned earlier, JDataStore, used in conjunction with the DataExpress JavaBean components, is very good at pulling data from any JDBC data source, tracking edits to the data and reconciling edits back to the original data source. This makes it ideal for applications that perform a lot of processing offline.
  • Embedded Java applications: Small footprint, low maintenance and the ability to synchronize with other data sources make JDataStore ideal for these kinds of applications.

JDataStore is well suited to a variety of general database problems; those above are just a few of the interesting applications. JDataStore is focused on being a small, efficient, low-administration database that complements popular native database systems like Oracle, DB2, MS SQL Server and Sybase.

About the Author
Steven Shaughnessy is a senior staff engineer at Borland and a member of the JBuilder team that is developing data access components and the JDataStore embeddable database. He can be reached at sshaughnessy@borland.com

