Validating Parsers
Xerces The Apache XML Project maintains XML parsers in Java, C++, and Perl. [free product from Apache.org; all Java, C++, and Perl platforms]
IBM's XML Parser for Java Also known as XML4J. Version 2 adds these exciting new features: Configurable, Modular Architecture; High Performance; Revalidation; and XCatalog Support. Support for XML 1.0, DOM 1.0 and SAX 1.0 is also included. XML4J 3.0.1 is based on the Apache Xerces XML Parser Version 1.0.3. New features include experimental versions of DOM Level 2, SAX2 (beta 2), and parts of W3C Schema. [free product from IBM; all Java platforms]
Oracle XML Parser Oracle released its XML Parser for Java, a standalone XML component that enables parsing of XML documents through either SAX or DOM interfaces using validating or non-validating modes. See also the Oracle XML site. [free product from Oracle; all Java platforms]
XMLBooster XMLBooster generates XML parsers for C, COBOL, Delphi, and Java. According to the company, its generated parsers "achieve performance comparable with message-specific hand-written parsers by skipping the intermediate step where the message is turned into a generic DOM tree using a generic parser which must take the entire generality of XML into account and support every feature, no matter how obscure. The parsers generated by XMLBooster only recognize the XML features required to parse the message at hand, and produces directly a parser that initializes application-level data structures without going through any time-consuming intermediate representation. Tool features: (1) Generates parsers, which are between 5 and 45 times faster than generic parsers (2) Produce parsers in C, COBOL, Delphi and Java (3) Produces working data structures in the host language, rather than a dynamic and poorly typed generic tree (4) The XML message to parse can come from a file, a message, a socket, a data structure, etc. (5) Produce naturally validating parsers, far beyond the validation possibilities of DTDs." [commercial product for C, COBOL, Delphi, Java]
SXP, the Silfide XML Parser The Silfide XML Parser (SXP) is a parser and a complete XML API in Java. It is part of XSilfide, a client/server based environment. XSilfide includes SIL, the Silfide Interface Language, among other things. "The SIL DTD is organized using modules, gathering (1) the encoding of the user workspace (2) the encoding of the user informations (3) the extended query language and (4) the encoding of the queries result set." [free product from Silfide; all Java platforms]
MSXML Microsoft's XML parser in Java is included in IE4. The version presently available predates the final XML 1.0 spec by one month. "The parser checks for well-formed documents and optionally permits checking of the documents' validity. Once parsed, the XML document is exposed as a tree through a simple set of Java methods, which [Microsoft is] working with the World Wide Web Consortium (W3C) to standardize. These methods support reading and/or writing XML structures..." See sample parsing of an XML file using JScript. (Microsoft also includes an XML parser in C++ in IE4 which is "a high-performance, non-validating parser, [that] supports most of the W3C XML specification".) [free product from Microsoft; all Java platforms; all IE4 platforms]
Larval Larval is Tim Bray's validating XML processor built on the same code base as Lark (below). "Larval is a full validating XML processor; it reports violations of validity constraints, but does not apply draconian error handling to them." [freeware by Tim Bray (Textuality); all Java platforms; see Lark below]
XML::Parser This Perl-based XML parser is from Larry Wall, the creator of Perl. Some of the parsing code is based on James Clark's expat (below). At this time, there is no documentation or description; the link is for downloading only. [freeware from Larry Wall; Perl]
xmlproc "xmlproc is an XML parser written in Python. It is a fairly complete validating parser, but does not do everything required of a validating parser, or even a well-formedness parser. The average user should not run into any omissions, though. Later releases will be more complete." freeware by Lars Marius Garshol; Python]
Non-Validating Parsers
Lark Lark is a non-validating Java XML processor by Tim Bray, one of the authors of the W3C XML spec. It implements all of the XML 1.0 Recommendation and reports violations of well-formedness. [freeware by Tim Bray (Textuality); all Java platforms; see also Larval above]
XP James Clark's XML Parser in Java, complete with javadoc documentation. "XP is an XML 1.0 parser written in Java. It is fully conforming: it detects all non well-formed documents. It is currently not a validating XML processor. However it can parse all external entities: external DTD subsets, external parameter entities and external general entities." XP is a high performance parser intended for use with Java applications, rather than applets. It includes a SAX driver implementation. (In addition to expat [below] and XP, James Clark also has developed SP, a free, object-oriented toolkit for SGML parsing and entity management; SP can parse XML and can convert SGML to XML.) [freeware from James Clark; all Java application platforms]
HEX HEX is the HTML Enabled XML Parser. It is a "simple, 100% Java, non-validating XML parser with some hooks for more-or-less correct parsing of most HTML pages. It doesn't understand either SGML or XML DTD's but the parser API allows the application to control its operation in ways that facilitate HTML parsing." HEX includes an implementation of SAX. HEX also implements the Java binding for the DOM core level one as per the March 1998 Working Draft. [freeware by Anders Kristen, HP Labs; all Java platforms]
HXA (Hubick's XML Analyzer) Hubick's XML Analyzer "is a pure Java tool built upon a low level XML parser (HXP) which breaks an XML file down into it's constituent productions for analysis. HXA allows one to examine the production hierarchy for any character in an XML document or document fragment. For easy reference HXA also provides links from each production in the analysis to its corresponding section in the XML specification." [freeware for all Java platforms; may require Microsoft Internet Explorer]
LT XML "LT XML is an integrated set of XML tools and a developers' tool-kit, including a C-based API...The LT XML tool-kit includes stand-alone tools for a wide range of processing of well-formed XML documents, including searching and extracting, down-translation (e.g. report generation, formatting), tokenising and sorting. Sequences of tool applications can be pipelined together to achieve complex results.... It also includes a powerful, yet simple, querying language, which allows the user to quickly and easily select those parts of an XML document which are of interest." The parser produces either a textual view or a tree view of an XML document. [freeware from the Language Technologies Group; C language; Unix and Win32 platforms]
xmllib Python 1.5.1 contains this version of xmllib.py by Sjoerd Mullender. [freeware; all Python platforms]
Xparse "Xparse is a fully compliant well-formed XML parser written in less than 5k of JavaScript." The author, Jeremie (no last name visible), plans to add DOM support when DOM becomes a W3C Recommendation. There is also a web page for trying the parser without downloading. See also Sparse, the XSL companion to Xparse. [freeware; all JavaScript platforms]
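The non-validating processors listed above check well-formedness only: a well-formedness violation is a fatal error, while element names and content models are never compared against a DTD. A minimal Java sketch of that behavior follows. It again assumes the JDK's bundled JAXP parser rather than any product in this list, and `WellFormedDemo` and `isWellFormed` are hypothetical names chosen for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of any parser listed in this directory.
final class WellFormedDemo {
    // Returns true if the text parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String xml) {
        try {
            // A factory is non-validating unless setValidating(true) is called.
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler()); // its fatalError() rethrows the exception
            return true;
        } catch (SAXException e) {
            // Raised for well-formedness violations such as a missing end tag.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            // Not expected for in-memory input with default settings.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b/></a>"));  // properly nested
        System.out.println(isWellFormed("<a><b></a>"));   // <b> is never closed
    }
}
```

A document that passes this check may still be invalid against its DTD; detecting that requires one of the validating parsers in the first list.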