csvmode − direct processing of Comma-Separated-Values (CSV) data files with gawk.
@include
"csv"
BEGIN { CSVMODE = 1 }
... rules with $0, $1, ...
$NF, CSVRECORD, ...
csvfield(name,
default)
csvprint(record, option...)
csvprint0()
The
gawk-csv extension can directly process CSV data
files. Uses some specific variables:
CSVMODE
Setting CSVMODE=1 lets CSV formatted input data records to be automatically converted to regular awk records with fixed field separators, and delivered as $0. And $1 .. $NF are also set accordingly. Setting CSVMODE=0 disables the conversion, and input files are processed the usual way. See NOTE 1.The conversion can be customized by some control variables:
CSVFS |
The resulting field separator, that temporarily overrides the FS and OFS predefined variables. If not set, a null char ’\0’ is used. See NOTE 1. |
CSVCOMMA
The input CSV field delimiter. Default comma ’,’.
CSVQUOTE
The input CSV quoting character. Default double quote ’"’.
CSVRECORD
The original CSV input record.
If the CSV file
has a header record, the fields can also be accessed by
name:
csvfield(name [,
missing])
Returns the named field of the current record. If there is no column named name, then return missing, or a null value if not given.
csvprint([record, [fs [, comma [, quote]]]])
A convenience function to format and print the given record with a single call. If called without arguments it prints either $0 formatted as CSV or CSVRECORD, depending on CSVMODE. Arguments are like csvformat().
csvprint0()
A convenience function to print the original input record as such. Prints either $0 or CSVRECORD, depending on CSVMODE.
CSVMODE, CSVFS, CSVCOMMA and CSVQUOTE are checked only at BEGINFILE time. Changing them in the middle of a file processing takes no effect.
CSVRECORD is updated for each CSV input record.
The CSV input mode accepts fields with embedded newlines, tabs and other control characters, except null characters (’\0’).
Extract CSV
records with some specific value in the second field:
BEGIN {CSVMODE = 1}
$2=="some value" {print CSVRECORD}
Process CSV
files with fields separated by semicolons instead of commas:
BEGIN {CSVMODE = 1; CSVFS = ";"}
... processing rules ...
Print a
specific named field of every record:
BEGIN {CSVMODE = 1;}
{ print csvfield("City") }
Print records
that contain commas as data, in both normal and CSV modes:
grepcommas.awk:
BEGINFILE {
CSVMODE = (FILENAME ~ /\.csv$/)
}
/,/ { csvprint0() }
Sample
invocation:
gawk -f grepcommas.awk a.txt, b.csv, c.txt
(1) If
the user code has a BEGINFILE action that sets CSV-mode
variables depending on the current file, this action must
appear before the @include "csv"
clause:
BEGINFILE {
CSVMODE = (FILENAME ~ /\.csv$/) # switch mode depending on
the file type
}
@include "csv"
Null characters are not allowed in fields. A null character terminates the record processing.
Manuel Collado, m-collado@users.sourceforge.net.
XML Processing With gawk, csvparse(3am).
Copyright (C) 2018, Free Software Foundation, Inc.
Permission is granted to make and distribute verbatim copies of this manual page provided the copyright notice and this permission notice are preserved on all copies.
Permission is granted to copy and distribute modified versions of this manual page under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.
Permission is granted to copy and distribute translations of this manual page into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Foundation.