SELECT * FROM persons AS XML

Originally posted 2009 FEB 18.

[Update 2015 July 09]

At last 10 years after XML native support, Microsoft is announcing native JSON support in its next version of SQL Server 2016. Which is nice. It is shame PostgreSQL already has pretty decent native JSON since 2010. Ah sorry … I just can’t resist “snarky” comments.

[Update 2015 Mar 18]

Found today an interesting link. Apparently, ORACLE does JSON.

[Started on 2008 Nov 4]

This text originates from me, Dusan B. Jovanovic. Any similarities found elsewhere, are purely accidental. It, and any attachments, do contain confidential information and are subject to copyright and other intellectual property rights. It is only for the use of you, the reader. You may not copy, forward, disclose, save or otherwise use it in any way if you are not the intended reader or responsible for delivery. If you receive this text by mistake, please advise me, and erase it immediately. I (Dusan B. Jovanovic) may monitor the content of texts and emails within my network to ensure compliance with the policies and procedures. Any email or text is susceptible to alteration and its integrity cannot be assured. I (Dusan B. Jovanovic) shall not be liable if the text or message is altered, modified, falsified, or edited.

DBJSON™

Release 1.5

©2008-2009-2016 by Dusan B. Jovanovic

First of all: What is it?

DBJSON™: Provides syntax and protocol, for two-way communication, with the database in the form of a stream of JSON objects.

Why DBJSON

DBJSON is JSON “formatted” Database Communication Protocol + Interface. For talking to the RDBMS, DBJSON replaces SQL. DBJSON is JSON interface to the data based online (and offline) services. DBJSON is not SQL native DB extension to return (or index) JSON strings and data types.

You already forgot XML, now forget SQL. Think JSON.

SQL is the scripting language of talking to databases. To the caller (user), SQL enabled DB server does not return SQL. Instead, it returns a pointer into the binary result structure (aka cursor). In essence a memory block of text. SQL based communication is a text-based protocol. Ok then, what seems to be the problem?

Here we have one key issue: Transformations. Transformation chain, is always required to transform to/from SQL and Result pointer, and applications using the RDBMS.

Worse. In the case of web app’s a whole sequence of transformations is required. Here are the two common implementations and their Run-Time transformations “From the data to the HTML”:

`DB`	`->`	`SQL`	`->T1->`	`ADORecordSet`	`->T2->`	`JavaScript`	`=`	`HTML`
`DB`	`->`	`SQL`	`->T3->`	`ServiceXML`	`->T4->`	`AJAX`	`=`	`HTML`

Flow from left to right is the data flow. SQL text queries are issued from either ADO (that was “the way” a long time ago) or Service, both on the server-side. Somewhere in the “middle tier”. We have two chains: request to the DB and reply from the DB. Thus each transformation chain requires (at least 2 transformations).

T1 — transform from SQL to ADO (or simillar) recordset (or .NET dataset)
T2 — transform from recodset/dataset to the javascript
T3 — transform from SQL to XML (sometimes can use native SQL ‘AS XML’ mechanism)
T4 — transform from XML to javascript

That is a lot of code to go wrong. Now imagine, for example, an SQL query result delivered as a set of DBJSON™ objects. Where DBJSON (JavaScript) is an inbuilt (aka native) RDBMS extension, to return the result as a set of JSON objects. In a simple string.

DB -> DBJSON -> ServiceDBJSON -> AJAX == HTML

This is just an output from the DB. In this imaginary world, DB + DBJSON + AJAX, chain would be extremely simple compared to the current Result Pointer to HTML juggling through a sequence of transformations. While there are no transformations present on the diagram above. AJAX is consuming DBJSON as any other JSON formatted strings: instantly. And since DB returns DBJSON formatted text, Service is just passing it to the waiting caller. In the shape of the AJAX components on the client-side. Of course, all the security issues are solved at the point of Service.

RDBMS result pointer is this special binary entity. Which is essentially an in-memory block of text lines. Thus DB result “cursor” (standard binary DB output structure), DB can not return the full set of related DB entities all together with relations described. This is why (as a known example) .NET DataSet was invented. Opposite to this, DBJSON™ full DB SET (relations description included) would be (is?) natural and easy to implement.

It should be possible to add DBJSON™ facade/mediator component, to any legacy RDBMS. Thus to bypass DB result “cursor” transformations completely.

It has to be said, this is exactly why countless OO wrap-ups have been invented, like ADO, OLEDB, JDBC, etc … DBJSON is not very far from the same aim. But it is fundamentally different. It replaces SQL, it is a completely different paradigm of communicating with a database. And last but not least it is JavaScript.

Legacy Data Applications

All of the current data-managing legacy infrastructures can get the same DBJSON facade/mediator component. Just imagine WMI with the same DBJSON^tm facade in front of it. If you ever had programmatically used WMI, I am sure you will find the DBJSON^tmone obvious and superior solution, to the current quirky WMI concept of “classes”, which imposes yet another kind-of-a-oo language to learn. (for further reading please refer to MOF Syntax for WMI Data and Event Blocks )

WMI actually uses standard RDBM (desktop version) with a different non-standard SQL facade (aka QOL). This architecture required a lot of implementation effort and development to render a questionable result that we know and hate today as WMI API. So why QOL? Because SQL is not a natural fit (and was an overkill) for a desktop database storing a hierarchy of objects. Hierarchy of objects? But this is DBJSON by default: natural representation of the hierarchy of objects!

Also. Differences among numerous data binding and usage mechanisms on .NET/WIN platform are questionable and not very clear. OQL.NET vs WMI OQL vs SQL 92 = Confusion. Which one to use? When? Why? How?

Here you might excitedly point out: LINQ!? Where I will immediately ask: Why LINQ? Here is why: Exactly because of the need to remove the paradigm shift which in turn required programming effort of constant transformations to/from SQL and C#. LINQ is a kind-of-a (non-existent) C#SQL facade to be used by C#. OLE DB for OLAP (MDX) is exactly the same. Yet another query language. Yet another new complicated thing to learn.

My proposal

To develop DBJSON^tm protocol and service facade, to each end every data based technology: WMI, AD, OLAP, RDBMS, WBEM, etc … Enable people to use DBJSON^tm with AJAX to develop very fast HTML based app’s. Imagine this JavaScript snippet:

// connection string is actually URI to the REST end-point

var db = DBJSON.open(” http://myserver.com/dbjson_enabled_db” ) ;

// from part is separated from querying

var from = “products” ;

// querying language is superset of XPATH

var where = “[@price > 1000]“ ;

// every query is asynchronous

db.select( on_select, from, where) ; // oql select statement as a string

// NOTE: rows are transformed into JSON objects, and columns are their properties.

// NOTE: there is no ‘what’ part in the query. All the columns are carried over, but not all the rows

// this is because actual query might be executed localy on the DBJSON set returned.

// Here we receive the DBJSON result

var dbjson_result = null ;

function on_select ( dbjson_arrival )

{

if ( dbjson_arrival.error != null )

{

// DBJSON error sub-object contains full error description, etc

} else

dbjson_result = dbjson_arrival ;

}

// now we can use the resulting DBJSON set of objects

// as already defined by initial WHERE XPATH statement

apply_to_datagrid( dbjson_result.select() ) ;

// or we can *locally* defined another select statement

apply_to_datagrid( dbjson_result.select(“[@name = 'ALFKI]“) ) ;

// or we can do update

dbjson_result.update(“[@name = 'ALFKI]“, “DBJ”, on_update ) ;

// or deletion

dbjson_result.delete(“[@name = 'ALFKI]“, on_delete ) ;

// delete the whole

dbjson_result.delete( on_delete ) ;

// or insertion

dbjson_result.append( { name: ‘DBJ’ }, on_append ) ;

// above will try to append a row where only column ‘name’ will have value

The above is based on an imaginary JavaScript library object called DBJSON. Which communicates to the DBJSON Mediator, through the DBJSON Protocol.

SQL and beyond

[This is already implemented in a different way in PostgreSQL and a few other RDBMS’s]

DBJSON Mediator and DBJSON communication protocol are actually not necessary to unleash DBJSON. How? The core “trick” is to make DBJSON part of the database itself and thus make it ubiquitous. There is one very simple “thing” that can be done to SQL, by database makers like Microsoft, Oracle, etc. I would like to see SQL extended with a “DBJSON” formating directive, so

Instead of saying :

1	SELECT * FROM persons AS XML -- standard in TSQL

One could say:

1	SELECT * FROM persons AS JSON -- return DBJSON formated reply

And receive back DBJSON result. Having this used inside the javascript-based app’s would be a huge productivity boost. The whole SQL+XML+XSL paradigm transformation issue gone! This solution will still require the SQL to talk to the DB, but not the usual transformations to process the result. Here is the resulting web app, transformation chain.

`DB+DBJSON`	`SQL (to DB)` `(from DB) DBJSON`	`DBJSONService`	`DBJSON (to Svc)` `(from Svc) DBJSON`	`AJAX`	`Browser`	`HTML`
`Server`		`Server`		`Browser`

DBJSON Service here is the web service transforming DBJSON messages to the SQL for the DBJSON enabled database, which returns DBJSON messages, through the “magic” of the “AS DBJSON” formatting directive. As a result, the browser-side has no knowledge of SQL whatsoever. And Service has to do the transformation only for the traffic to the DB, not from it.

JSONHttpRequest

Less ambitious (maybe) but equally useful project would be a universal browser DOM object that acts as a DBJSON mediator between JavaScript, in the page or server-side component, and on-line Service of choice.

/**/

var dbjsonX = JSONHttpRequest() ;

dbjsonX.connect(”… connection string URI …. “, on_connect) ;

dbjsonX.select(”* from category where name = ‘ALFKI’”, on_result) ;

you know the drill ... async interface etc ...

now use dbjson_rezult as any other JSON object …

JSONHttpRequest could be built to mirror the model of the XMLHttpRequest. In essence and in accordance with the current specification, the whole XHR object description can be taken over and only one property has to be changed, to arrive at instant JSONHttpRequest spec :

Instead of : read-only attribute Document responseXML ;

It should be: read-only attribute DOMString responseJSON ;

Why stop here? Ditch DOM, make JSON browser

We are here and now meeting with very big issues like why is DOM Document based on XML, not JSON? Or, why we do not have MIME type for JSON¹. Or JSON to replace HTML ? Maybe, but currently, I am not sure JSON could replace HTML? This would require a browser that would render pages from JSON code like this (wild) example :

/*helloworld.json */

{ html : {

header : { title : "JSON", script : { source: "dbjquery.js" } } ,

body : { p : { id: "paragraph", text : "Hello World!"} ,

onload : function ()

{ alert("Indeed...") } }

}

This little snippet in essence peeks into the future (one of the possible futures) where HTML, DOM and XML are made redundant. And replaced with JSON*DM (Json Document Model).

Architectures using DBJSON

All the architectural solutions of databases with DBJSON interface fall into three categories:

A: Native DBJSON database

B: DBJSON mediator to RDBMS

C: DBJSON SQL Extensions

DBJSON Native

Native database of DBJSON^tm objects. This is a kind-of-a hierarchical database. This is also a kind-of-a OO database. It contains DBJSON^tm objects. It does not require transformations of relational DB entities to/from DBJSON entities. The internal user language of the DBJSON native DB is JavaScript. Stored procedure and functions in that DB are written in JavaScript, and send receive only DBJSON messages.

DBJSON Mediator

This is a separate component that provides DBJSON interface to the user/caller on one side and communicates with the RDBMS on the other side.

DBJSON^tm Mediator can be local (running on the same machine as the caller) or global (running on the same machine as the RDBMS it uses)

DBJSON^tm Mediator can be an in-process or out-of-process component.

The out-of-process mediator is very interesting because it can act as a cache of DBJSON objects. It also can be synchronised with RDBMS in the back in a separate parallel process.

DBJSON^tm can serve many users and use many RDBMS’s in the same time, thus acting as an TP Monitor.

One DBJSON Mediator product would be required to comprehensively mediate to one RDBMS product. DBJSONSQLSVR , DBJSONORACLE, DBJSON*MySQL, etc.

The messaging protocol

DBJSON is a JavaScript interface to the database. DBJSON^tm interface is a kind-of-a message-based interface. Users are interacting with RDBMS through a message-based interface. Until now SQL was the language that was used to make messages to call RDBMS. Return was in a form of the binary blob, which was readable with a help of an iterator called “cursor”. This is a curious asymmetry.

DBJSONtm interface communication protocol is symmetrical. Stream of DBJSON objects is flowing both ways: to the DB and from the DB. DBJSON is actually a communication protocol based on JSON. (please visit http://json.org)

DBJSONtm message is always one single JSON object. This is it:

/* one dbjson message */

{

dbjson : {/* message content */ }

}

As obvious, dbjson object is one single JSON object with a single property called ‘dbjson’. The type of a ‘dbjson’ property is an object. The whole dbjson message is always inside this object.

A slight detour: Why XML?

XML was invented as a “markup language”. Noting less and nothing more. XML might be OK, as an inter-system “lingua franca” (universal language), in today’s SOA world, of high fashion. But… Big problem is that XML needs an especially tuned transport solution for faster inter-system message transport. And no one yet has the same up with a satisfactory solution for that. Compression won’t do here, simply because software and time, is needed to compress/decompress every call/reply to/from XML. In all current SOA solutions, min 50% of the code and software layers is devoted to XML compression/decompression and coding/decoding. And do not forget to multiply that, with a number of different languages and runtime environments connected to the XML message bus.

And also: Why XSLT?

XML+XSLT is rendered as an over-engineered solution. Complex to use. and debugging is very difficult. XML+XSLT=HTML looked at until recently as a “winning formula” … but in real life, this combination is difficult to use. This kind of development renders HTML designers job, almost impossible. Contrary to that, JSON+AJAX=HTML is not so mature (old?) but is a much more natural fit, for HTML UI adorned apps. Outcome: this development is much less costly than XML+XSLT based one.

Therefore the big issue is: XML requires transformation to/from platform/applications native language. Big solution: JSON is programming language (JavaScript), thus the transformation step is not necessary. Provided application is developed in JavaScript. Where this is exactly how each and every web application front end is developed.

XML can be transformed with JavaScript to HTML, without XSLT, but nowhere near so naturally and quickly as JSON. That is because JSON is JavaScript. With JSON there is no paradigm transformation. XML is a different paradigm.

There is also a concept of XML serialization. Where run-time environment “contains” mechanisms for transforming XML to. from platforms native representation of objects. Some say that XML “serialization” delivers a “natural solution” to the code/decode obstacle. An example is .NET with ‘attributes’ developed for that. But developers know it turns out to be anything but natural.

Formalisation of XML to JSON mapping

I am not in the business of reinventing “hot water” here. Good people have spent mountains of man-hours to define and deliver XML goodies like XSD, XSL, etc. I, of course, have nothing against any of these. I just advocate DBJSON and JSON as better notation than XML. And for a different purpose than document annotation. Also, much easier to use when JavaScript is on both sides of the communication lines. For formal discussion and rules on this subject I can refer you to these two url’s :
http://msdn.microsoft.com/en-us/library/bb924435.aspx http://www.webmasterworld.com/xml/3603303.htm
So, over there we have the conceptual formalization, to allow us to map “everything” from the XML to JSON.

Back to the main theme

Travelling further ‘down’ from the top-level DBJSON object is the top-level syntax of DBJSONtm messages (described in JSON notation). These are called top-level types, and their names are reserved DBJSON names.

{ dbjson : { guid : "..." , // GUID of the message sender : "..." , // GUID of the sender selector : {}, descriptor : {}, declarator : {}, updater : {}, terminator : {}, transactor : {}, connector : {}, rezult : {} } }
‘guid’ and ‘sender’ properties are solving the issue of distributed systems typeless messaging. I can refer you to THIS URL , which might be used just as an “inspiration”.

The rest conceptually maps to the SQL protocol, used to communicate with the DB. Any of these eight top-level DBJSON types can appear once in one DBJSON message, that is sent to the data service or received from it. Top-level types can appear only on the top level. Further down I will define structure and rules for each of these fundamental, aka “top-level”, types.

dbjson.rezult

DBJSON message that has arrived from the data source. It is again a set of JSON objects representing the data source result set. Example of a dbjson.rezult would be :

{

dbjson : {

guid: "..." , sender : "...",

rezult : { set : [ { 'id': 1 }, { 'id':2 } ] }

/* 'dbjson.rezult.set' is a reserved name,

'id' is not, it is a column name */

}

The above simply maps SQL result to the dbjson.rezult. It does not contain any elaborate ‘things’ like table and column descriptors etc. Thus it is not ‘self describing’. Unless we add to the dbjson message top level types which make it self describing :

DB = {

dbjson : {

guid: "..." , sender : "...",

selector : { statement : "id from category where name = 'ALFKI'" },

descriptor : { table : { name: 'category.dbo', columns : {} }},

rezult : { set : [ { id: 1 }, { id:2 } ] }

}

DB.dbjson.rezult.set[0].id

/* is 1 */

To make the above message even more self-describing we can add a connector object which will reveal the name of the data source, etc.

DBJSON Entities

There is a number of DBJSON entities that are directly mapped from the SQL world, as DB entities. The following table is a mapping of SQL DDL entities and concepts.

SQL DDL entity	DBJSON entity	Comment
`server`	`{ server : {name : "server name"}}`	`'name' is the only mandatory property.`
`database`	`{ database : {name : "db name"}}`	`'name' is the only mandatory property.`
`table`	`{ table : {name : "table name",view : true, /* this is view */`

1	columnset : [/* array of column descriptors */]

‘name’ is the only mandatory property.

column {

column : {name: "column name",type :  "db type name",

FK : true, /* or false */

PK : true, /* or false */

ID : true, /* or false */

nullable : true /* or false */

}

‘name’ and ‘type’ are the only mandatory properties.

link{ link : {from : {table : {},

column : {} ,

one : true

/* or */

many : true

to : {

/* same as 'from' */

}

}}

DBJSON data types

There is a number of DBJSON data type presentations that are directly mapped from the SQL world of data types. Here obviously we might meet specific types as defined by some RDBMS-es but not the others.

DB data type	DBJSON data type	Comment
varchar	“string”	JavaScript string
int	123	JavaScript integer
float	123.456	JavaScript float
binary	” … “	64 bit encoded string

Conclusion

All in all, it is clear that this kind of protocol and notation is much more forgiving than SQL. The syntax is well known, powerful and simple at the same time. Because this is just JavaScript. At the same time, opposite of SQL, DBJSON is a message “packaging” protocol, allowing for unlimited nesting, presence of functions, etc. Great care was taken to make all of this flexibility available but still not wildly foreign to standard relational databases, and their users.

At the end of the day, DBJSON protocol is a question of a contract between the message sender and message consumer.

The whole web applications community will have to make an effort to decide and standardize on one single and consistent DBJSON protocol specification. At the same time carefully avoiding falling into the “jack of all trades” trap, DBJSON must not become “a little bit for everyone, but not enough for anyone”.

Further Reading

Mapping Between JSON and XML: http ://msdn.microsoft.com/en-us/library/bb924435.aspx

DBJSON(tm) idea is by no means finished. I am working on it right now.

1: That is not true. The MIME media type for JSON text is application/json . The default encoding is UTF-8. (Source: RFC 4627). RFC 4627 — The application/json Media Type for JavaScript Object Notation (JSON)

AI ROI — Bridges to ROI