XMLgawk Home Page
Stable XMLgawk Release: xgawk-3.1.6 release
Stable XMLpuller Release: xml_puller_20060709
XMLgawk is an experimental extension of the GNU
Awk interpreter.
It includes a small XML parsing library which is built upon the
Expat XML parser.
The parsing library is a very thin layer on top of Expat (implementing a pull-interface)
and can also be used without GNU Awk to read XML data files.
Both XMLgawk and its XML puller library require only an ANSI C compatible compiler (GCC works, as do most vendors' ANSI C compilers) and a 'make' program.
XMLgawk provides the following functionality:
- AWK's way of reading data line by line is supplemented by reading XML files node by node.
- As a consequence, only one data item is visible at a time, and DOM-style parsing is left
to the user to implement (if needed).
- Conversion of character encodings is done while parsing.
- Parsing speed is comparable to other stream parsers.
- Parsing is much faster than with XSL processors.
- XMLgawk supports pull-style parsing as well as push-style parsing.
- Very large files (several gigabytes) can be processed without problems,
even when many instances of XMLgawk do this at the same time on the same CPU.
- If you want to use XMLgawk, download the current release
from the project pages at SourceForge
and go through the usual "configure ; make install" steps.
Stefan Tramm has a (rather dated) collection of other
useful stuff.
- Most users are interested in binaries for Microsoft Windows.
Manuel Collado has set up a web site with binaries for
Cygwin and
DJGPP.
Victor Paesa has set up a
web page
where you can download a compressed executable
xgawk.exe.gz
for the Cygwin environment. He also provides some detailed
instructions on
how to compile the distribution with Cygwin.
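The push-style and pull-style parsing mentioned in the XMLgawk feature list can be illustrated with a short sketch. It is written in Python rather than AWK and uses Python's standard Expat binding (xml.parsers.expat) and the standard xml.dom.pulldom module, not XMLgawk or the XML puller library themselves. The sample document and the function names titles_push and titles_pull are made up for illustration.

```python
import io
import xml.parsers.expat
from xml.dom import pulldom

# Hypothetical sample document, used only for this illustration.
XML = b"<book><title>AWK</title><title>XML</title></book>"


def titles_push(data):
    """Push style: the parser calls our handlers as it meets each node."""
    titles = []
    inside = False

    def start(name, attrs):
        nonlocal inside
        if name == "title":
            inside = True
            titles.append("")

    def end(name):
        nonlocal inside
        if name == "title":
            inside = False

    def chars(text):
        # Character data may arrive in several pieces, so append to the last title.
        if inside:
            titles[-1] += text

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(data, True)  # True marks this as the final chunk
    return titles


def titles_pull(data):
    """Pull style: our own loop asks for the next node, one at a time."""
    titles = []
    inside = False
    for event, node in pulldom.parse(io.BytesIO(data)):
        if event == pulldom.START_ELEMENT and node.tagName == "title":
            inside = True
            titles.append("")
        elif event == pulldom.END_ELEMENT and node.tagName == "title":
            inside = False
        elif event == pulldom.CHARACTERS and inside:
            titles[-1] += node.data
    return titles
```

In both variants only one node is visible at a time. The difference is who owns the loop: in the push variant the parser drives the handlers, while in the pull variant the program decides when to fetch the next node, which is closer to how AWK reads one record after another.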
The XML puller library provides the following functionality:
- Like Expat, it reads all well-formed XML files.
- Conversion of the data to any character encoding which is supported by your operating system.
- Memory allocation and deallocation are hidden inside the library.
- An arbitrary number of instances of the XML puller can be open simultaneously.
- As a streaming parser, it is faster than most other parsers but not as fast as a pure Expat application.
- The size of the largest file that can be read is limited only by your operating system.
Files larger than a gigabyte have been processed on a PC with an AMD Duron 1200 CPU.
- Memory requirements are almost independent of the size of the XML data file.
- You can choose which kind of XML data you want to read (processing instructions,
comments, declarations). Omitting any type of data improves speed significantly.
- xml_puller.h and xml_puller.c together have around 1200 lines (incl. comments).
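The idea of a thin pull layer on top of Expat, with a choice of which kinds of XML data are reported, can be sketched roughly as follows. This is not the xml_puller API; it is a minimal Python illustration built on the standard xml.parsers.expat module, and the names pull_events, want and chunk_size are hypothetical. The sketch assumes that leaving a handler unset is how a kind of data gets skipped, and it reads the input in fixed-size chunks so that memory use does not grow with file size.

```python
import io
import xml.parsers.expat


def pull_events(stream, want=("start", "end", "text"), chunk_size=64 * 1024):
    """Yield (kind, name_or_text, payload) tuples from a binary stream.

    Hypothetical helper: only the kinds listed in `want` get an Expat
    handler, so unwanted data is never queued.
    """
    queue = []
    parser = xml.parsers.expat.ParserCreate()

    if "start" in want:
        parser.StartElementHandler = (
            lambda name, attrs: queue.append(("start", name, attrs)))
    if "end" in want:
        parser.EndElementHandler = (
            lambda name: queue.append(("end", name, None)))
    if "text" in want:
        parser.CharacterDataHandler = (
            lambda text: queue.append(("text", text, None)))
    if "comment" in want:
        parser.CommentHandler = (
            lambda text: queue.append(("comment", text, None)))
    if "pi" in want:
        parser.ProcessingInstructionHandler = (
            lambda target, data: queue.append(("pi", target, data)))

    while True:
        chunk = stream.read(chunk_size)
        # An empty chunk means end of input; tell Expat this is the final call.
        parser.Parse(chunk, not chunk)
        # Hand the queued events to the caller, then forget them.
        yield from queue
        queue.clear()
        if not chunk:
            break


# Usage: ask only for element starts, comments and processing instructions.
doc = b'<?xml version="1.0"?><!-- note --><a x="1"><?go now?><b>hi</b></a>'
events = list(pull_events(io.BytesIO(doc),
                          want=("start", "comment", "pi"),
                          chunk_size=8))
```

Several such generators can be active at once, since each one owns its own parser object and queue. Whether skipping a kind of data yields a speed-up comparable to the one stated for the XML puller library is not something this sketch demonstrates.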
Thanks
Thanks to sf.net for web hosting.