Comma-separated values

The parse_csv FilterX function uses a comma-separated values (CSV) parser to split a string along delimiter characters or strings. The result is either a list or a dictionary of key-value pairs. The input is typically part of a log message, for example the contents of the ${MESSAGE} macro.

Usage: parse_csv(<input-string>, columns=json_array, delimiter=string, string_delimiters=json_array, dialect=string, strip_whitespace=boolean, greedy=boolean)

Only the input parameter is mandatory.
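The optional greedy parameter is useful when the last field can itself contain the delimiter. The following sketch assumes a hypothetical message such as alice 42 free-form text with spaces. The block name p_greedy_example and the PARSED name-value pair are illustrative only. With greedy=true, the remainder of the input is expected to land in the last column (TEXT) instead of being split further. Check the options reference for the exact behavior.

```config
block filterx p_greedy_example() {
    # Hypothetical input: "alice 42 free-form text with spaces"
    cols = json_array(["USER", "CODE", "TEXT"]);
    # greedy=true: everything after the second delimiter is expected to go into TEXT,
    # including the spaces that would otherwise start new columns
    ${PARSED} = parse_csv(${MESSAGE}, columns=cols, delimiter=" ", greedy=true);
};
```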

If the columns option is set, parse_csv returns a dictionary with the column names (as keys) and the parsed values. If the columns option isn’t set, parse_csv returns a list.
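For comparison with the dictionary-based example that follows, here is a minimal sketch of the list form, where the columns option is omitted. The block name p_hostname_list and the local variable parts are hypothetical. The parsed values are read back by position rather than by name.

```config
block filterx p_hostname_list() {
    # No columns option, so parse_csv returns a list
    parts = parse_csv(${HOST}, delimiter="-");
    # For example-1, parts is expected to be ["example","1"]
    ${HOSTNAME_NAME} = parts[0];
    ${HOSTNAME_ID} = parts[1];
};
```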

The following example separates hostnames like example-1 and example-2 into two parts.

block filterx p_hostname_segmentation() {
    cols = json_array(["NAME","ID"]);
    HOSTNAME = parse_csv(${HOST}, delimiter="-", columns=cols);
    # HOSTNAME is a json object containing parts of the hostname
    # For example, for example-1 it contains:
    # {"NAME":"example","ID":"1"}

    # Set the important elements as name-value pairs so they can be referenced in the destination template
    ${HOSTNAME_NAME} = HOSTNAME.NAME;
    ${HOSTNAME_ID} = HOSTNAME.ID;
};
destination d_file {
    file("/var/log/${HOSTNAME_NAME:-examplehost}/${HOSTNAME_ID}/messages.log");
};
log {
    source(s_local);
    filterx(p_hostname_segmentation());
    destination(d_file);
};

Parse Apache log files

The following parser processes Apache web server logs and splits each message into separate fields. Apache log messages can use a format like:

"%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %T %v"

Here is a sample message:

192.168.1.1 - - [31/Dec/2007:00:17:10 +0100] "GET /cgi-bin/example.cgi HTTP/1.1" 200 2708 "-" "curl/7.15.5 (i486-pc-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8c zlib/1.2.3 libidn/0.6.5" 2 example.mycompany

To parse such logs, set the delimiter to a single space (delimiter=" "). Because strip_whitespace=true is also set, leading and trailing whitespace is stripped from the parsed values.

block filterx p_apache() {
    ${APACHE} = json();
    cols = [
    "CLIENT_IP", "IDENT_NAME", "USER_NAME",
    "TIMESTAMP", "REQUEST_URL", "REQUEST_STATUS",
    "CONTENT_LENGTH", "REFERER", "USER_AGENT",
    "PROCESS_TIME", "SERVER_NAME"
    ];
    ${APACHE} = parse_csv(${MESSAGE}, columns=cols, delimiter=" ", strip_whitespace=true, dialect="escape-double-char");

    # Set the important elements as name-value pairs so they can be referenced in the destination template
    ${APACHE_USER_NAME} = ${APACHE}.USER_NAME;
};

You can use the results, for example, to separate log messages into different files based on the USER_NAME field. If the field is empty, the nouser string is used as the default.

log {
    source(s_local);
    filterx(p_apache());
    destination(d_file);
};
destination d_file {
    file("/var/log/messages-${APACHE_USER_NAME:-nouser}");
};
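Other parsed fields can be used in the same way. The following sketch routes messages by HTTP status code instead of user name. It assumes the p_apache() block above has already run. The block p_apache_status, the APACHE_STATUS name-value pair, the d_status destination, and the file path are hypothetical names chosen for illustration.

```config
block filterx p_apache_status() {
    # ${APACHE} is a name-value pair, so it is visible here even though p_apache() set it
    ${APACHE_STATUS} = ${APACHE}.REQUEST_STATUS;
};
destination d_status {
    # Falls back to "unknown" if the status field is empty or missing
    file("/var/log/apache-status-${APACHE_STATUS:-unknown}.log");
};
log {
    source(s_local);
    filterx(p_apache());
    filterx(p_apache_status());
    destination(d_status);
};
```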

Segment a part of a message

You can use multiple parsers in a layered manner to split parts of an already parsed message into further segments. The following example splits the timestamp of a parsed Apache log message into separate fields. Note that the scoping of FilterX variables is important:

If you add the new parser to the FilterX block used in the previous example, every variable is available.
If you use a separate FilterX block, only global variables and name-value pairs (variables with names starting with the $ character) are accessible from the block.
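The first case can be sketched as a single block that keeps the intermediate results in local variables. These variables need no $ prefix because both parsing steps run in the same block. The block name p_apache_combined and the variables apache and ts are hypothetical. The column lists mirror the ones used elsewhere on this page.

```config
block filterx p_apache_combined() {
    cols = json_array(["CLIENT_IP", "IDENT_NAME", "USER_NAME", "TIMESTAMP", "REQUEST_URL", "REQUEST_STATUS", "CONTENT_LENGTH", "REFERER", "USER_AGENT", "PROCESS_TIME", "SERVER_NAME"]);
    # apache is a local variable: it is only visible inside this block
    apache = parse_csv(${MESSAGE}, columns=cols, delimiter=" ", strip_whitespace=true, dialect="escape-double-char");
    # Split the timestamp further; every character in "/: " acts as a delimiter
    ts = parse_csv(apache.TIMESTAMP, columns=json_array(["DAY", "MONTH", "YEAR", "HOUR", "MIN", "SEC", "ZONE"]), delimiter="/: ", dialect="escape-none");

    # Only name-value pairs survive outside the block, so copy the needed values
    ${APACHE_USER_NAME} = apache.USER_NAME;
    ${APACHE_TIMESTAMP_DAY} = ts.DAY;
};
```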

block filterx p_apache_timestamp() {
    cols = ["DAY", "MONTH", "YEAR", "HOUR", "MIN", "SEC", "ZONE"];
    # Every character in "/: " is treated as a delimiter
    ${APACHE}.TIMESTAMP = parse_csv(${APACHE}.TIMESTAMP, columns=cols, delimiter="/: ", dialect="escape-none");

    # Set the important elements as name-value pairs so they can be referenced in the destination template
    ${APACHE_TIMESTAMP_DAY} = ${APACHE}.TIMESTAMP.DAY;
};
destination d_file {
    file("/var/log/messages-${APACHE_USER_NAME:-nouser}/${APACHE_TIMESTAMP_DAY}");
};
log {
    source(s_local);
    filterx(p_apache());
    filterx(p_apache_timestamp());
    destination(d_file);
};

Options of CSV parsers

Last modified March 21, 2025: Merge pull request #117 from axoflow/sync-to-r2 (6fb1861)