Baconbutty - Programming

Basic SweetXML Parser

Contents

What is SweetXML?

SweetButty 1.0

Example of Use

How close is it to the SweetXML specification?

Let's Play

What next?

Third Party Dependencies

Licensing

The Downloads

The SweetXML proposal has caught my interest, and in a mad moment of fun I have cobbled together a really rough, but working, JavaScript parser.

Let's call it some terrible name such as SweetButty, and give it a version, 1.0, as it does work, and a major version number sounds a bit more friendly than 0.1.

So here you have it: SweetButty 1.0 - with added XRegExp.

Tested in Internet Explorer 6, Firefox 1.4, and Opera 9. Not yet tested in Safari.

What is SweetXML?

It is an interesting proposal that lies somewhere between the verbosity of XML and the spartan nature of JSON.

Let's compare them using something really trivial, avoiding attributes for the moment.

XML

<quotes>
	<quote>
		<text></text>
		<attribution></attribution>
	</quote>
	<quote>
		<text></text>
		<attribution></attribution>
	</quote>
</quotes>

JSON

var quotes = [
	{
		"text": "",
		"attribution": ""
	},
	{
		"text": "",
		"attribution": ""
	}
];

SWEET XML

quotes
	quote
		text: ""
		attribution: ""

	quote
		text: ""
		attribution: ""

Strengths and Weaknesses

On a quick review, its main strengths seem to be:-

The loss of all that markup noise < > / { ] ,
Only ' and " need to be escaped, as &apos; and &quot; respectively.
It retains the same semantic information as full XML.
It can support attributes in a clearer manner than JSON perhaps can.

Its main weaknesses seem to be:-

It depends on indenting to signify the nesting of nodes. This indenting must be precise and consistent for the document to parse accurately, and that carries some risk: one missing space and you could completely lose the tree structure.
It would not handle encoding of arrays as efficiently as JSON.
It is not supported by browsers, compared to JSON (which is a native part of the JavaScript language) and XML (for which all main browsers provide parsers).
Because markup is not so obvious, it could become a little more difficult to follow for long and complicated documents.

SweetButty 1.0

SweetButty 1.0 processes the SweetXML document a line at a time and converts it into a node tree.
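To illustrate the line-at-a-time idea, here is a minimal sketch of how indented lines can be turned into a tree with a stack. It is not SweetButty's actual code: the function name buildTree, the node shape ({ name, children }) and the assumption of one TAB per nesting level are all mine, and values, attributes and comments are ignored.

```javascript
// Hypothetical sketch, not SweetButty's implementation.
// Assumes one TAB per nesting level and ignores values and attributes.
function buildTree(markup) {
	var root = { name: "root", children: [] };
	// stack[d] is the open node that a line at depth d attaches to;
	// stack[0] is the inserted root.
	var stack = [root];
	var lines = markup.split(/\r?\n/);

	for (var i = 0; i < lines.length; i++) {
		var line = lines[i];
		if (/^\s*$/.test(line)) { continue; } // skip blank lines

		var depth = /^\t*/.exec(line)[0].length; // count leading TABs
		var content = line.replace(/^\t+/, "");
		// Keep only the part before any colon as the node name.
		var name = content.split(":")[0].replace(/\s+$/, "");
		var node = { name: name, children: [] };

		// Close any nodes deeper than this line, then attach to the parent.
		stack.length = Math.min(stack.length, depth + 1);
		stack[stack.length - 1].children.push(node);
		stack.push(node);
	}
	return root;
}
```

Fed the quotes example from earlier, this gives a root with one quotes child, which in turn has two quote children. A line indented further than its predecessor allows is simply attached to the deepest open node, which is one of several possible choices.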

SweetButty has a single constructor: var instance = new classSweetButty(). The class part of the name is optional: it is just my idiosyncrasy.

At present it has just two methods in the interface (and there is no guarantee that the interface will remain fixed at the moment):-

setIndentUnit(indentUnit : String) : void

This tells the parser what indent unit, e.g. a TAB or 4 SPACES, your document is using. It is obviously important to specify this before you parse the document.
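As a rough illustration of why the indent unit matters, the sketch below counts how many whole indent units begin a line. The helper name indentDepth is hypothetical and is not part of the SweetButty interface.

```javascript
// Hypothetical helper: how many whole indent units start this line?
function indentDepth(line, indentUnit) {
	var depth = 0;
	var pos = 0;
	// Guard against an empty unit, which would otherwise loop forever.
	while (indentUnit.length > 0 &&
			line.slice(pos, pos + indentUnit.length) === indentUnit) {
		depth++;
		pos += indentUnit.length;
	}
	return depth;
}

var line = "\t\ttext: \"\"";
indentDepth(line, "\t");   // 2
indentDepth(line, "    "); // 0: wrong unit, so the nesting is lost
```

With the wrong unit every line appears to sit at depth 0, which is why the unit has to be set before parsing.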

parse(sweetXMLMarkup : String) : classSweetButtyNode

This parses your markup and returns a classSweetButtyNode, or null in the event of parsing failure. A classSweetButtyNode is just an Object created using a private constructor (function classSweetButtyNode() {}).

There is no error checking or reporting yet.

One thing is important to note: the parser always inserts a new node, called root, at the top of the parsed node tree.

A classSweetButtyNode is an Object with the following properties:-

nodeType : Number

This gives you:

1 = Element

3 = Text

4 = CDATA-Section

8 = Comment

nodeName : String

This gives you <tagName>, #text, #comment or #cdata-section.

nodeValue : String

This gives you the text of a Text, CDATASection and Comment node, but an empty string for Element nodes.

[0] ... [N] : Node

These are the childNodes, but accessed through a direct accessor, rather than a childNodes collection.

<attribute-name> : String

The value of any attribute on the Node.

<node-name> : Node or String

There are 4 possibilities here:-

If the node has a single child with that name, which itself has a single text node, then you will get the value of that text node.
If the node has a single child with that name, which itself has multiple sub-child nodes, then you get the child node object.
If the node has multiple children with that name, each of which has a single text node, then you get an array of strings.
If the node has multiple children with that name, each of which has sub-child nodes, then you get an array of child nodes.

See examples below if this sounds a little confusing.
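The four cases can be sketched as a small lookup function. This is only my reading of the rules above, not the code SweetButty uses: the names resolveNamed, simplify, text and element are hypothetical, and the nodes here keep their children in a plain children array rather than on numeric accessors.

```javascript
// Hypothetical node shape: { nodeType, nodeName, nodeValue, children: [] }.
function text(value) {
	return { nodeType: 3, nodeName: "#text", nodeValue: value, children: [] };
}
function element(name, children) {
	return { nodeType: 1, nodeName: name, nodeValue: "", children: children };
}

// A child holding exactly one Text node collapses to that node's string.
function simplify(child) {
	var kids = child.children;
	if (kids.length === 1 && kids[0].nodeType === 3) {
		return kids[0].nodeValue;
	}
	return child;
}

// One match gives a single value; several matches give an array.
function resolveNamed(parent, name) {
	var matches = [];
	for (var i = 0; i < parent.children.length; i++) {
		if (parent.children[i].nodeName === name) {
			matches.push(simplify(parent.children[i]));
		}
	}
	if (matches.length === 0) { return undefined; }
	return matches.length === 1 ? matches[0] : matches;
}

var quote = element("quote", [
	element("text", [text("first")]),
	element("attribution", [text("someone")])
]);
var quotes = element("quotes", [quote, quote]);
var tags = element("tags", [
	element("tag", [text("a")]),
	element("tag", [text("b")])
]);

resolveNamed(quote, "text");   // "first": single child, single text node
resolveNamed(quotes, "quote"); // array of two quote nodes
resolveNamed(tags, "tag");     // ["a", "b"]: array of strings
```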

Example of Use

Parsing the above examples using parse() will enable you to do the following:-

var sweetButty = new classSweetButty();
sweetButty.setIndentUnit("\t");
var root = sweetButty.parse(markup);

/* All of these are equivalent */
alert(root[0][0].text);
alert(root.quotes[0].text);
alert(root.quotes.quote[0].text);

This is a bit similar to JSON and a bit similar to E4X, the ECMAScript for XML specification.
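The numeric accessors also lend themselves to a generic tree walk. The sketch below assumes that child indexes run contiguously from 0 and that a missing index reads as undefined. The walk function is hypothetical, and it runs here against a hand-built mock object rather than a real parse result.

```javascript
// Hypothetical helper: visit a node and all its descendants
// through the [0] ... [N] accessors.
function walk(node, visit, depth) {
	depth = depth || 0;
	visit(node, depth);
	// Assumes children are stored at contiguous indexes starting from 0.
	for (var i = 0; node[i] !== undefined; i++) {
		walk(node[i], visit, depth + 1);
	}
}

// Hand-built mock shaped like the parsed quotes example.
var mock = {
	nodeType: 1, nodeName: "root",
	0: {
		nodeType: 1, nodeName: "quotes",
		0: { nodeType: 1, nodeName: "quote" },
		1: { nodeType: 1, nodeName: "quote" }
	}
};

var names = [];
walk(mock, function (node, depth) {
	names.push(depth + ":" + node.nodeName);
});
// names is now ["0:root", "1:quotes", "2:quote", "2:quote"]
```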

How close is it to the SweetXML specification?

Items not implemented

It is a pretty complete implementation, except for:-

Does not recognise a DOCTYPE.
Does not do much with namespaces yet.

Proposals not in the specification

The specification does not provide support for #cdata-section. My proposal is to prefix a text node with ! to signify that it is to be treated as a cdata-section. Thus:-

quotes
	quote
		text : !"This is a cdata-section"
		attribution : "This is a text node"
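A minimal sketch of how the ! prefix might be recognised when reading a quoted value. The function name parseValue is hypothetical, escaped quotes inside the value are not handled, and the real parser may well do this differently.

```javascript
// Hypothetical sketch: classify a quoted value as Text or CDATA-Section.
// Escaped quotes inside the value are not handled here.
function parseValue(raw) {
	var match = /^\s*(!?)"(.*)"\s*$/.exec(raw);
	if (!match) { return null; } // not a quoted value
	var isCData = match[1] === "!";
	return {
		nodeType: isCData ? 4 : 3,
		nodeName: isCData ? "#cdata-section" : "#text",
		nodeValue: match[2]
	};
}

parseValue('!"This is a cdata-section"').nodeType; // 4
parseValue('"This is a text node"').nodeType;      // 3
```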

Let's Play

OK, the dry bit over with, here is a test page to play with. The necessary scripts have already been loaded into this page.

Remember : you must specify the indent unit you are using. The default is 4 spaces.

What next?

I don't really know yet.

It needs a bit more work perhaps to qualify as a project : or at least its own project page on my site.

Tentative road-map could be:-

Who cares? XML and JSON are perfectly fine; just put your feet up and leave it at that.
Performance improvements - it is a bit slow.
Support for DOCTYPE and namespaces.
XML and JSON converters
Error reporting
Better class structure for nodes
Mutation events (insertion etc).

Third Party Dependencies

In SweetButty 1.0 I make use of the excellent XRegExp 0.2 utility for RegExp parsing with named captures. This is © Steven Levithan. SweetButty is also dependent on whatever third party software is used within the XRegExp 0.2 code.

Note : XRegExp must come first in the script.

Licensing

At present I have not decided what the ultimate licensing terms for this will be. I am looking at the GNU GENERAL PUBLIC LICENSE Version 3, 29 June 2007, but I need to understand it first.

In the meantime, if you download SweetButty 1.0, the licence terms are as follows, which you are deemed to agree to by downloading, running, or modifying SweetButty 1.0:-

SweetButty is free software.
You may copy, run and modify SweetButty 1.0 where this constitutes use for domestic purposes, or use for the purposes of a not-for-profit organisation, and not use as part of a business.
You may copy, run and modify SweetButty 1.0 as part of a business, but only for reasonable evaluation purposes.
If you make any copy or modification, you must reproduce on it my copyright notice and these licence terms; and, if you are distributing a modification, you must identify by comments in the code, what the modifications are.
You may not distribute copies of SweetButty 1.0, whether alone, or as part of any other program, except on these licence terms.
You may not sell copies of SweetButty 1.0, whether alone, or as part of any other program.
By downloading SweetButty 1.0 you agree that it is copied, run and modified by you at your sole risk, and that it is provided as is, without any promise, warranty, condition, or guarantee of any kind relating to its condition, compliance with description, fitness for purpose, or freedom from bugs or errors.
I assume no duty of care to you in relation to SweetButty 1.0, and disclaim and exclude any liability to you (whether in statute, contract, tort, negligence or otherwise howsoever) for any loss, damage, or liability you may suffer through copying, running, modifying or distributing SweetButty 1.0 or doing anything else with SweetButty 1.0 (whether permitted by this licence or not).
This licence may be terminated by me, and the terms of this licence may be substituted or replaced by me, by notice given by means of an update to this blog entry, or any other page on this web site which deals with SweetButty 1.0.

The Downloads

Test Page

SweetButty 1.0 : JS Text

XRegExp : JS Text
