
XMLgawk Home Page

Quick Info

Stable XMLgawk Release: xgawk-3.1.6 release

Stable XMLpuller Release: xml_puller_20060709

XMLgawk is an experimental extension of the GNU Awk interpreter. It includes a small XML parsing library built upon the Expat XML parser. The parsing library is a very thin layer on top of Expat (implementing a pull interface) and can also be used without GNU Awk to read XML data files. Both XMLgawk and its XML puller library require only an ANSI C compatible compiler (GCC works, as do most vendors' ANSI C compilers) and a 'make' program.

XMLgawk provides the following functionality:

  • AWK's way of reading data line by line is supplemented by reading XML files node by node.
  • As a consequence, only one data item is visible at a time; DOM-style parsing is left to the user to implement if needed.
  • Conversion of character encodings is done while parsing.
  • Parsing speed is comparable to other stream parsers.
  • Parsing is very fast compared to XSL processors.
  • XMLgawk supports pull-style parsing as well as push-style parsing.
  • Processing very large files (several gigabytes) is no problem, even when many instances of XMLgawk do this at the same time on the same CPU.
  • If you want to use XMLgawk, download the current release from the project pages at SourceForge and go through the usual "configure ; make install" steps. Stefan Tramm has a (rather dated) collection of other useful stuff.
  • Most users are interested in binaries for Microsoft Windows. Manuel Collado has set up a web site with binaries for Cygwin and DJGPP.
    Victor Paesa has set up a web page where you can download a compressed executable xgawk.exe.gz for the Cygwin environment. He also provides some detailed instructions on how to compile the distribution with Cygwin.
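
The difference between push-style and pull-style parsing mentioned in the list above can be illustrated with a minimal sketch. It does not use XMLgawk or the XML puller library. It assumes Python's standard-library Expat bindings as a stand-in, and the function names push_style and pull_style, as well as the sample document, are made up for this illustration.

```python
import xml.parsers.expat
from xml.etree.ElementTree import XMLPullParser

# Hypothetical sample document, used only for this illustration.
DOC = "<books><book id='1'>AWK</book><book id='2'>XML</book></books>"


def push_style(text):
    """Push: the parser drives and calls our handlers for each node."""
    seen = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: seen.append(("start", name))
    parser.EndElementHandler = lambda name: seen.append(("end", name))
    parser.Parse(text, True)  # True marks the final chunk of input
    return seen


def pull_style(text):
    """Pull: the caller drives and asks for the next node when ready."""
    seen = []
    parser = XMLPullParser(events=("start", "end"))
    parser.feed(text)
    parser.close()  # end of input; queued events remain readable
    for event, elem in parser.read_events():
        seen.append((event, elem.tag))
    return seen


if __name__ == "__main__":
    for event in pull_style(DOC):
        print(event)
```

Both functions visit the same nodes in the same order. In the pull variant the loop belongs to the caller, which is closer to the way an AWK program consumes its input record by record.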

The XML puller library provides the following functionality:

  • Like Expat, it reads all well-formed XML files.
  • Conversion of the data to any character encoding which is supported by your operating system.
  • Memory allocation and deallocation are hidden inside the library.
  • An arbitrary number of instances of the XML puller can be open simultaneously.
  • As a streaming parser, it is faster than most other parsers but not as fast as a pure Expat application.
  • The size of the largest file that can be read is limited only by your operating system. Files larger than a gigabyte have been processed on a PC with an AMD Duron 1200 CPU.
  • Memory requirements are almost independent of the size of the XML data file.
  • You can choose which kind of XML data you want to read (processing instructions, comments, declarations). Omitting any type of data improves speed significantly.
  • xml_puller.h and xml_puller.c together have around 1200 lines (incl. comments).
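
Two of the properties listed above, a memory footprint that stays nearly independent of file size and the option to skip kinds of XML data you do not need, can be sketched as follows. This is not the xml_puller API. It is an analogy that assumes Python's standard-library Expat bindings, and count_nodes, want_comments and chunk_size are hypothetical names chosen for this example.

```python
import io
import xml.parsers.expat


def count_nodes(stream, want_comments=False, chunk_size=64 * 1024):
    """Count start tags (and optionally comments) in a binary stream.

    The input is read in fixed-size chunks, so only one chunk is held
    in memory at a time regardless of the total size of the input.
    """
    counts = {"elements": 0, "comments": 0}
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        counts["elements"] += 1

    parser.StartElementHandler = on_start

    # Register a comment handler only on request; kinds of data
    # without a handler are not reported to the caller at all.
    if want_comments:
        def on_comment(data):
            counts["comments"] += 1

        parser.CommentHandler = on_comment

    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.Parse(chunk, False)  # more input may follow
    parser.Parse(b"", True)  # signal end of input
    return counts


if __name__ == "__main__":
    sample = io.BytesIO(b"<a><!-- note --><b/><b/></a>")
    print(count_nodes(sample, want_comments=True))
```

Each call creates its own parser object, so several streams can be processed side by side, which mirrors the point above about multiple simultaneous instances.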


Thanks to sf.net for web hosting.


XMLgawk news

Integration into Arnold's git repository

Arnold Robbins (the maintainer of GNU Awk) and Andrew Schorr have integrated the XMLgawk extension into the official GNU Awk release. As part of this effort, they agreed on some changes to gawk's extension API. The resulting source code is now available in the official GNU Awk git repository. Notice that the name of the XMLgawk project has been changed to gawkextlib (for the documentation of the gawk extension library) and its home page (for download) has moved to SourceForge. The old home page will only be available until October 2012. Please change your bookmarks.

Release Candidate 2

We don't have a new official release yet, but some news has piled up:
  • Here is the second release candidate for xgawk 3.1.6a. It's the same as xgawk 3.1.6, except for integration of all the patches that Arnold made available through the Savannah gawk-stable CVS source tree.
  • Victor Paesa has updated his page about the MS Windows binaries.
  • Where are the people who use XMLgawk? Look at our access counter.
  • Tim Menzies has created a web page for XMLgawk at Awk.Info.

Open Source Conference 2008

Hirofumi Saito reports on the Open Source Conference 2008 Tokyo/Spring. He and Morimoto Tetsuya presented some pretty advanced application software and also informed attendees about xgawk. The slides and the handout are available as PDF files. You can find the presentations of their application software on the OSC2008 web page (at the top of the right margin).

Stable xgawk Release

Now that Arnold has finished his GAWK 3.1.6 release, we caught up and the new xgawk distribution is built on GAWK 3.1.6. Notice that this is not a beta release anymore but a production stable release. See the release notes for more details.

Lightweight Language Conference 2006 in Japan

Hirofumi Saito took part in the Lightweight Language Conference 2006. He struggled hard to defend the colours of AWK against the competition (Ruby, Perl, Python, PHP, Haskell and others). Matt Rosin has summarized the event in English.

Lightning Talk @ OSCON06

Stefan Tramm is giving a 5-minute Lightning Talk on xgawk today at OSCON 06.

Pipestreaming microformats

At developerWorks (IBM's resource for developers), there is a review article about the use of Unix pipes in XML processing (XML Matters: Pipestreaming microformats). The authors favour Norman Walsh's SXPipe language, but XMLgawk is also mentioned in passing.

Beta Release

We have released xgawk 3.1.5. This release is based on the regular GNU Awk 3.1.5. We have three extensions now in the release: XML, PostgreSQL and MPFR. Notice that each extension provides access to an external library; the external library itself will be detected during configuration.

Japanese Presentation

Hirofumi Saito gave a presentation in Japan at the Lightweight Language Day & Night event. He reports that Japanese AWK users were mainly interested in handling XML and PostgreSQL. His presentation and Arnold's message to the attendees are available online in Japanese.

Alpha release at SourceForge

The initial alpha release (xgawk-3.1.4) is available at SourceForge. Andrew Schorr has (again) been the driving force behind our recent changes. Most importantly, the XML extension is now a real extension in the sense of a dynamically loadable module. Hence, the prefix xgawk for extension. It is still possible to have the XML extension linked statically into the executable. Stefan Tramm has supplied a library which allows Expat to read character encodings which would otherwise be unknown to Expat (e.g. Japanese euc-jp / sjis, Korean euc-kr, and similar Windows-specific Japanese encodings). I have not yet updated the manual to reflect these changes. We are quite confident that this alpha release is stable and portable because we have put much effort into porting and testing it on SuSE Linux 9.2, Solaris 8, MacOSX and Cygwin.


We now have a SourceForge project for XMLgawk (named xmlgawk, since capital characters are not allowed in project names). Andrew Schorr's changes have roughly doubled the speed of the parser. He has also made significant changes to the design. Use cvs to find out more.

Gentoo Linux

The Gentoo Linux distribution has integrated XMLgawk into their GNU Awk package.

Fall 2004 Download

2004-09-20 by Juergen Kahrs
This is the first stable public release. It is a patch against version 3.1.4 of GNU Awk as announced in comp.lang.awk today.

Summer 2004 Download

2004-08-13 by Juergen Kahrs
This is the initial public release. It is a patch against version 3.1.4 of GNU Awk as announced in comp.lang.awk today.

Christmas 2003 Download

2003-12-31 by Juergen Kahrs
This is the final internal release. It is a patch against version 3.1.3 of GNU Awk. Stefan Tramm, Mirko Dziadzka and Manuel Collado were probably the only users who tried it.


XMLpuller releases

Summer 2006 Download

2006-07-09 by Juergen Kahrs
The XMLpuller library has now been the fundamental layer of the XMLgawk extension for two years. Many improvements have been built in, mostly by Andrew Schorr. As a consequence, the API has changed a bit. For a list of changes, read the file NEWS in the released tar file.

Summer 2004 Download

2004-08-13 by Juergen Kahrs
This is the initial public release as announced in comp.lang.awk today.


Copyright 2004-2005 by Juergen Kahrs. The software on this page is free software; you can redistribute it and/or modify it under the terms of the GNU Library General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.