Options of CSV parsers

The AxoSyslog application can separate parts of log messages (that is, the contents of the ${MESSAGE} macro) at delimiter characters or strings into named fields (columns) using the csv (comma-separated-values) parser. The parsed fields act as user-defined macros that can be referenced in message templates, file and table names, and so on.

Parsers are similar to filters: they must be defined in the AxoSyslog configuration file and used in the log statement. You can also define the parser inline in the log path.

To create a csv-parser(), you have to define the columns of the message, the separator characters or strings (also called delimiters, for example, a semicolon or a tab), and optionally the characters that are used to escape the delimiter characters (quote-pairs()).

Declaration:

parser <parser_name> {
    csv-parser(
        columns(column1, column2, ...)
        delimiters(chars("<delimiter_characters>"), strings("<delimiter_strings>"))
    );
};

Column names work like macros.
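
As a minimal sketch of how this fits together, the following configuration defines a named parser and references its columns in a destination template. The names s_net, p_app, and d_out, as well as the APP.* column names, are placeholders chosen for illustration, and the input is assumed to be a semicolon-separated line such as alice;login;ok.

# Hypothetical names: s_net, p_app, d_out, APP.USER, APP.ACTION, APP.RESULT
source s_net { tcp(port(2000) flags(no-parse)); };

parser p_app {
    csv-parser(
        columns("APP.USER", "APP.ACTION", "APP.RESULT")
        delimiters(chars(";"))
    );
};

# The column names are referenced like any other macro in the template
destination d_out { stdout(template("${APP.USER} ${APP.ACTION} ${APP.RESULT}\n")); };

log { source(s_net); parser(p_app); destination(d_out); };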

Names starting with a dot (for example, .example) are reserved for use by AxoSyslog. If you use such a macro name as the name of a parsed value, AxoSyslog attempts to replace the original value of the macro (note that only soft macros can be overwritten; see Hard versus soft macros for details). To avoid such problems, use a prefix when naming the parsed values, for example, prefix(my-parsed-data.).

columns()

Synopsis: columns("PARSER.COLUMN1", "PARSER.COLUMN2", ...)

Description: Specifies the names of the columns into which the message is separated. These names are automatically available as macros. The values of these macros do not include the delimiters.

Starting with AxoSyslog version 4.5, you can omit the columns() option, and extract the values into matches ($1, $2, $3, and so on), which are available as the anonymous list $*. For example:

@version: current

log {
    source { tcp(port(2000) flags(no-parse)); };

    parser { csv-parser(delimiters(',') dialect(escape-backslash)); };
    destination { stdout(template("$ISODATE $*\n")); };
};

delimiters()

Synopsis:

delimiters(chars("<delimiter_characters>")) or delimiters("<delimiter_characters>")

delimiters(strings("<delimiter_string1>", "<delimiter_string2>", ...))

delimiters(chars("<delimiter_characters>"), strings("<delimiter_string1>"))

Description: The delimiter is the character or string that separates the columns in the message. If you specify multiple characters using the delimiters(chars("<delimiter_characters>")) option, every character is treated as a delimiter. To separate the columns at the tab character, specify \t. For example, to separate the text at every hyphen (-) and colon (:) character, use delimiters(chars("-:")). Note that the delimiters are not included in the column values.
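
For instance, a parser along the following lines splits at every hyphen and every colon. The parser name and the column names are hypothetical. For an input such as host1-app2:ok, the three columns should receive host1, app2, and ok.

# Hypothetical parser and column names
parser p_chars {
    csv-parser(
        columns("SEG.FIRST", "SEG.SECOND", "SEG.THIRD")
        # Every character listed in chars() is a delimiter on its own
        delimiters(chars("-:"))
    );
};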

String delimiters

If you have to use a string as a delimiter, list your string delimiters in the delimiters(strings("<delimiter_string1>", "<delimiter_string2>", ...)) format.

By default, AxoSyslog uses space as a delimiter. If you want to use only the strings as delimiters, you have to disable the space delimiter, for example: delimiters(chars(""), strings("<delimiter_string>")).

Otherwise, AxoSyslog uses the string delimiters in addition to the default character delimiter, so delimiters(strings("==")) is equivalent to delimiters(chars(" "), strings("==")), and not to delimiters(chars(""), strings("==")).
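
The following sketch shows a parser that splits only at the == string. The parser and column names are hypothetical. With an input such as John Smith==logged in==ok, the spaces are expected to stay inside the column values, because the default space delimiter is disabled.

# Hypothetical parser and column names
parser p_strings_only {
    csv-parser(
        columns("KV.USER", "KV.ACTION", "KV.RESULT")
        # chars("") disables the default space delimiter
        delimiters(chars(""), strings("=="))
    );
};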

Multiple delimiters

If you use more than one delimiter, note the following points:

  • AxoSyslog will split the message at the nearest possible delimiter. The order of the delimiters in the configuration file does not matter.
  • You can use both string delimiters and character delimiters in a parser.
  • The string delimiters may include characters that are also used as character delimiters.
  • If a string delimiter and a character delimiter both match at the same position of the input, AxoSyslog uses the string delimiter.
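
To illustrate the points above, the following sketch combines a character delimiter with a string delimiter that starts with the same character. The parser and column names are hypothetical. For an input such as a::b:c, the string delimiter should take precedence where :: occurs, so the columns are expected to be a, b, and c, without an empty column between the two colons.

# Hypothetical parser and column names
parser p_mixed {
    csv-parser(
        columns("F.FIRST", "F.SECOND", "F.THIRD")
        # ":" is a character delimiter, "::" is a string delimiter
        delimiters(chars(":"), strings("::"))
    );
};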

dialect()

Synopsis: escape-none, escape-backslash, escape-double-char, or escape-backslash-with-sequences

Description: Specifies how to handle escaping in the parsed message. Default value: escape-none

parser p_demo_parser {
    csv-parser(
        prefix(".csv.")
        delimiters(" ")
        dialect(escape-backslash)
        flags(strip-whitespace, greedy)
        columns("column1", "column2", "column3")
    );
};

The following values are available.

  • escape-backslash: The parsed message uses the backslash (\) character to escape quote characters.
  • escape-backslash-with-sequences: The parsed message uses the backslash (\) as an escape character but also supports C-style escape sequences, like \n or \r. Available in AxoSyslog version 4.0 and later.
  • escape-double-char: The parsed message repeats the quote character when the quote character is used literally. For example, if the quote character is the double quote ("), a literal double quote inside a quoted value appears as two double quotes ("").
  • escape-none: The parsed message does not use any escaping for using the quote character literally.
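
As a sketch of the escape-double-char dialect, the following parser treats the double quote as the quote character. The parser and column names are hypothetical. An input such as "Smith, John","said ""hi""",ok would be expected to yield Smith, John in the first column, said "hi" in the second, and ok in the third.

# Hypothetical parser and column names
parser p_quoted {
    csv-parser(
        columns("Q.NAME", "Q.COMMENT", "Q.STATUS")
        delimiters(chars(","))
        # Commas between a pair of double quotes are not treated as delimiters
        quote-pairs('""')
        # A doubled quote character inside a quoted value stands for one literal quote
        dialect(escape-double-char)
    );
};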

flags()

Synopsis: drop-invalid, escape-none, escape-backslash, escape-double-char, greedy, strip-whitespace

Description: Specifies various options for parsing the message. The following flags are available:

  • drop-invalid: When the drop-invalid option is set, the parser does not process messages that do not match the parser. For example, a message does not match the parser if it has fewer columns than specified in the parser, or it has more columns but the greedy flag is not enabled. Using the drop-invalid option practically turns the parser into a special filter that matches messages that have the predefined number of columns (using the specified delimiters).

  • greedy: The greedy option assigns the remainder of the message to the last column, regardless of the delimiter characters set. You can use this option to process messages where the number of columns varies.

  • strip-whitespace: The strip-whitespace flag removes leading and trailing whitespace from all columns.
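
The following sketch combines the drop-invalid and strip-whitespace flags in an inline parser. The R.* column names are hypothetical. A message such as a, b, c should reach the destination as a|b|c, while a message with only two columns, such as a, b, is not expected to pass the parser.

log {
    source { tcp(port(2000) flags(no-parse)); };

    parser {
        csv-parser(
            # Hypothetical column names
            columns("R.A", "R.B", "R.C")
            delimiters(chars(","))
            # Reject messages that do not have exactly three columns,
            # and trim the spaces around each value
            flags(drop-invalid, strip-whitespace)
        );
    };

    destination { stdout(template("${R.A}|${R.B}|${R.C}\n")); };
};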

Example: Adding the end of the message to the last column

If the greedy option is enabled, AxoSyslog adds the not-yet-parsed part of the message to the last column, ignoring any delimiter characters that may appear in this part of the message.

For example, you receive the following comma-separated message: example1, example2, example3, and you segment it with the following parser:

csv-parser(columns("COLUMN1", "COLUMN2", "COLUMN3") delimiters(","));

The COLUMN1, COLUMN2, and COLUMN3 variables will contain the strings example1, example2, and example3, respectively. If the message looks like example1, example2, example3, some more information, then any text appearing after the third comma (that is, some more information) is not parsed, and is possibly lost if you use only the variables to reconstruct the message (for example, to send it to different columns of an SQL table).

Using the greedy flag assigns the remainder of the message to the last column, so that COLUMN1 contains example1, COLUMN2 contains example2, and COLUMN3 contains example3, some more information.

csv-parser(columns("COLUMN1", "COLUMN2", "COLUMN3") delimiters(",") flags(greedy));

null()

Synopsis: string

Description: If the value of a column is the value of the null() parameter, AxoSyslog changes the value of the column to an empty string. For example, if the columns of the message contain the "N/A" string to represent empty values, you can use the null("N/A") option to change these values to empty strings.
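
A minimal sketch of the null() option follows. The parser and column names are hypothetical. For an input such as alice,N/A,/bin/sh, the N.GROUP column should end up as an empty string.

# Hypothetical parser and column names
parser p_null {
    csv-parser(
        columns("N.USER", "N.GROUP", "N.SHELL")
        delimiters(chars(","))
        # Columns whose value is exactly N/A become empty strings
        null("N/A")
    );
};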

prefix()

Synopsis: prefix()

Description: Insert a prefix before the name part of the parsed name-value pairs to help further processing. For example:

  • To insert the my-parsed-data. prefix, use the prefix(my-parsed-data.) option.

  • To refer to a particular value that has a prefix, use the prefix in the name of the macro, for example, ${my-parsed-data.name}.

  • If you forward the parsed messages using the IETF-syslog protocol, you can insert all the parsed data into the SDATA part of the message using the prefix(.SDATA.my-parsed-data.) option.

Names starting with a dot (for example, .example) are reserved for use by AxoSyslog. If you use such a macro name as the name of a parsed value, AxoSyslog attempts to replace the original value of the macro (note that only soft macros can be overwritten; see Hard versus soft macros for details). To avoid such problems, use a prefix when naming the parsed values, for example, prefix(my-parsed-data.).

This parser does not have a default prefix. To configure a custom prefix, use the following format:

parser {
    csv-parser(prefix("myprefix."));
};
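
As a sketch of how the prefix combines with column names, the following parser stores its values under the myprefix. prefix, and the destination refers to them with the prefixed macro names. The parser, destination, and column names are hypothetical.

# Hypothetical names: p_prefixed, d_prefixed, user, action
parser p_prefixed {
    csv-parser(
        prefix("myprefix.")
        columns("user", "action")
        delimiters(chars(","))
    );
};

# The prefix becomes part of the macro name
destination d_prefixed { stdout(template("${myprefix.user} ${myprefix.action}\n")); };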

on-type-error()

Synopsis: string

Description: Specifies what to do when casting a parsed value to a specific data type fails. Note that the flags(drop-invalid) option and the on-error() global option also affect the behavior.

Accepts the same values as the on-error() global option.

quote-pairs()

Synopsis: quote-pairs(<quote_pairs>)

Description: List quote-pairs between single quotes. Delimiter characters or strings enclosed between quote characters are ignored. Note that the beginning and ending quote characters do not have to be identical, for example, [} can also be a quote-pair. For an example of using quote-pairs() to parse Apache log files, see Example: Parsing Apache log files.
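
The following sketch uses two quote-pairs, double quotes and square brackets, on a space-delimited message. The parser and column names are hypothetical. For an input such as 192.168.1.1 [31/Dec/2007:00:17:10 +0100] "GET / HTTP/1.1", the spaces inside the brackets and inside the quotes should not split the columns.

# Hypothetical parser and column names
parser p_brackets {
    csv-parser(
        columns("WEB.CLIENT", "WEB.TIMESTAMP", "WEB.REQUEST")
        delimiters(chars(" "))
        # Two pairs: "..." and [...]
        quote-pairs('""[]')
    );
};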

template()

Synopsis: template("${<macroname>}")

Description: The macro that contains the part of the message that the parser will process. It can also be a macro created by a previous parser of the log path. By default, the parser processes the entire message (${MESSAGE}).

For examples, see Example: Segmenting hostnames separated with a dash and Example: Segmenting a part of a message.
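
As a minimal sketch of the template() option, the following parser processes the ${HOST} macro instead of the message body and splits it at the hyphen. The parser and column names are hypothetical. For a hostname such as example-1, the columns should receive example and 1.

# Hypothetical parser and column names
parser p_hostname {
    csv-parser(
        columns("HOSTNAME.NAME", "HOSTNAME.ID")
        delimiters(chars("-"))
        # Parse the value of ${HOST} rather than ${MESSAGE}
        template("${HOST}")
    );
};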