This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

python: writing server-style Python sources

The Python source allows you to write your own source in Python. You can import external Python modules to receive or fetch the messages. Since many services have a Python library, the Python source makes integrating AxoSyslog very easy and quick.

You can write two different type of sources in Python:

  • Server-style sources that receives messages. Write server-style sources if you want to use an event-loop based, nonblocking server framework in Python, or if you want to implement a custom loop.

  • Fetcher-style sources that actively fetch messages. In general, write fetcher-style sources (for example, when using simple blocking APIs), unless you explicitly need a server-style source.

This section describes server-style sources. For details on fetcher-style sources, see python-fetcher: writing fetcher-style Python sources.

The following points apply to using Python blocks in AxoSyslog in general:

  • Python parsers and template functions are available in AxoSyslog version 3.10 and later.

    Python destinations and sources are available in AxoSyslog version 3.18 and later.

  • Supported Python versions: 2.7 and 3.4+ (if you are using pre-built binaries, check the dependencies of the package to find out which Python version it was compiled with).

  • The Python block must be a top-level block in the AxoSyslog configuration file.

  • If you store the Python code in a separate Python file and only include it in the AxoSyslog configuration file, make sure that the PYTHONPATH environment variable includes the path to the Python file, and export the PYTHON_PATH environment variable. For example, if you start AxoSyslog manually from a terminal and you store your Python files in the /opt/syslog-ng/etc directory, use the following command: export PYTHONPATH=/opt/syslog-ng/etc.

    In production, when AxoSyslog starts on boot, you must configure your startup script to include the Python path. The exact method depends on your operating system. For recent Red Hat Enterprise Linux, Fedora, and CentOS distributions that use systemd, the systemctl command sources the /etc/sysconfig/syslog-ng file before starting AxoSyslog. (On openSUSE and SLES, /etc/sysconfig/syslog file.) Append the following line to the end of this file: PYTHONPATH="<path-to-your-python-file>", for example, PYTHONPATH="/opt/syslog-ng/etc".

  • The Python object is initiated every time when AxoSyslog is started or reloaded.

  • The Python block can contain multiple Python functions.

  • Using Python code in AxoSyslog can significantly decrease the performance of AxoSyslog, especially if the Python code is slow. In general, the features of AxoSyslog are implemented in C, and are faster than implementations of the same or similar features in Python.

  • Validate and lint the Python code before using it. The AxoSyslog application does not do any of this.

  • Python error messages are available in the internal() source of AxoSyslog.

  • You can access the name-value pairs of AxoSyslog directly through a message object or a dictionary.

  • To help debugging and troubleshooting your Python code, you can send log messages to the internal() source of AxoSyslog. For details, see Logging from your Python code.

Declaration:

Python sources consist of two parts. The first is a AxoSyslog source object that you define in your AxoSyslog configuration and use in the log path. This object references a Python class, which is the second part of the Python source. The Python class receives or fetches the log messages, and can do virtually anything that you can code in Python. You can either embed the Python class into your AxoSyslog configuration file, or store it in an external Python file.

   source <name_of_the_python_source>{
        python(
            class("<name_of_the_python_class_executed_by_the_source>")
            options(
                "option1" "value1",
                "option2" "value2"
            )
        );
    };
    
    python {
    from syslogng import LogSource
    from syslogng import LogMessage
    
    class <name_of_the_python_class_executed_by_the_source>(LogSource):
        def init(self, options): # optional
            print("init")
            print(options)
            self.exit = False
            return True
    
        def deinit(self): # optional
            print("deinit")
    
        def run(self): # mandatory
            print("run")
            while not self.exit:
                # Must create a message
                msg = LogMessage("this is a log message")
                self.post_message(msg)
    
        def request_exit(self): # mandatory
            print("exit")
            self.exit = True
    };

Methods of the python() source

Server-style Python sources must be inherited from the syslogng.LogSource class, and must implement at least the run and request_exit methods. Multiple inheritance is allowed, but only for pure Python super classes.

You can implement your own event loop, or integrate the event loop of an external framework or library, for example, KafkaConsumer, Flask, Twisted engine, and so on.

To post messages, call LogSource::post_message() method in the run method.

For the list of available optional parameters, see python() and python-fetcher() source options.

init(self, options) method (optional)

The AxoSyslog application initializes Python objects every time when it is started or reloaded. The init method is executed as part of the initialization. You can perform any initialization steps that are necessary for your source to work.

When this method returns with False, AxoSyslog does not start. It can be used to check options and return False when they prevent the successful start of the source.

options: This optional argument contains the contents of the options() parameter of the AxoSyslog configuration object as a Python dictionary.

run(self) method (mandatory)

Use the run method to implement an event loop, or start a server framework or library. Create LogMessage instances in this method, and pass them to the log paths by calling LogSource::post_message().

Currently, run stops permanently if an unhandled exception happens.

For details on parsing and posting messages, see Python LogMessage API.

request_exit(self) method (mandatory)

The AxoSyslog application calls this method when AxoSyslog is shut down or restarted. The request_exit method must shut down the event loop or framework, so the run method can return gracefully. If you use blocking operations within the run() method, use request_exit() to interrupt those operations and set an exit flag, otherwise AxoSyslog is not able to stop. Note that AxoSyslog calls the request_exit method from a thread different from the source thread.

close_batch(self)

Closes the current source-side batch. Source-side batching helps AxoSyslog to effectively process a larger chunk of messages, instead of processing messages each message. For example, when feeding a destination queue and instead of taking a lock on the queue for every message (causing contention), we only take it once per batch.

The native drivers built into AxoSyslog typically close batches once every mainloop iteration, allowing a single iteration to process multiple messages. For instance, when receiving multiple messages in a single TCP datagram, all of those messages can be processed as a part of the same batch.

In Python-based log sources, a batch will automatically be closed after every message posted via post_message(), except if self.auto_close_batches is set to False during initialization. In case self.auto_close_batches is set to False, the driver has to call close_batch() explicitly, preferably at a natural boundary between incoming batches of messages. A good example is when we retrieve several messages via the same HTTP REST call, then the right time to close the batch would be after the last message in the response is posted.

The deinit(self) method (optional)

This method is executed when AxoSyslog is stopped or reloaded. This method does not return a value.

set_transport_name(self, name)

Set the transport name used to retrieve messages This function can be called to customize the ${TRANSPORT} name-value pair.

1 - Python LogMessage API

The LogMessage API allows you to create LogMessage objects in Python sources, parse syslog messages, and set the various fields of the log message.

LogMessage() method: Create log message objects

You can use the LogMessage() method to create a structured log message instance. For example:

   from syslogng import LogMessage
    
    msg = LogMessage() # Initialize an empty message with default values (recvd timestamp, rcptid, hostid, ...)
    msg = LogMessage("string or bytes-like object") # Initialize a message and set its ${MESSAGE} field to the specified argument

You can also explicitly set the different values of the log message. For example:

   msg["MESSAGE"] = "message"
    msg["HOST"] = "hostname"

You can set certain special field (timestamp, priority) by using specific methods.

Note the following points when creating a log message:

  • When setting the hostname, AxoSyslog takes the following hostname-related options of the configuration into account: chain-hostnames(), keep-hostname(), use-dns(), and use-fqdn().

  • Python sources ignore the log-msg-size() option.

  • The AxoSyslog application accepts only one message from every LogSource::post_message() or fetch() call, batching is currently not supported. If your Python code accepts batches of messages, you must pass them to AxoSyslog one-by-one. Similarly, if you need to split messages in the source, you must do so in your Python code, and pass the messages separately.

  • Do not reuse or store LogMessage objects after posting (calling post_message()) or returning the message from fetch().

parse() method: Parse syslog messages

The parse() method allows you to parse incoming messages as syslog messages. By default, the parse() method attempts to parse the message as an IETF-syslog (RFC5424) log message. If that fails, it parses the log message as a BSD-syslog (RFC3164) log message. Note that AxoSyslog takes the parsing-related options of the configuration into account: flags(), keep-hostname(), recv-time-zone().

If keep-hostname() is set to no, AxoSyslog ignores the hostname set in the message, and uses the IP address of the AxoSyslog host as the hostname (to use the hostname instead of the IP address, set the use-dns() or use-fqdn() options in the Python source).

   msg_ietf = LogMessage.parse('<165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"] An application event log entry', self.parse_options)
    msg_bsd = LogMessage.parse('<34>Oct 11 22:14:15 mymachine su: \'su root\' failed for lonvick on /dev/pts/8', self.parse_options)

set_pri() method

You can set the priority of the message with the set_pri() method.

   msg.set_pri(165)

set_timestamp() method

You can use the set_timestamp() method to set the date and time of the log message.

   timestamp = datetime.fromisoformat("2018-09-11T14:49:02.100+02:00")
    msg.set_timestamp(timestamp) # datetime object, includes timezone information

In Python 2, timezone information cannot be attached to the datetime instance without using an external library. The AxoSyslog represents naive datetime objects in UTC.

In Python 3, naive and timezone-aware datetime objects are both supported.

2 - python() and python-fetcher() source options

The python() and python-fetcher() drivers have the following options.

class()

Type: string
Default: N/A

Description: The name of the Python class that implements the source, for example:

   python(
        class("MyPythonSource")
    );

If you want to store the Python code in an external Python file, the class() option must include the name of the Python file containing the class, without the path and the .py extension, for example:

   python(
        class("MyPythonfilename.MyPythonSource")
    );

For details, see Python code in external files

fetch-no-data-delay()

Type: integer [seconds]
Default: -1 (disabled)

Description: If the fetch method of a python-fetcher() source returns with the LogFetcher.FETCH_NO_DATA constant, the source waits fetch-no-data-delay() seconds before calling the fetch method again. If you want to call the fetch method sooner, set the fetch-no-data-delay() option to the number of seconds to wait before calling the fetch method.

flags()

Type: assume-utf8, dont-store-legacy-msghdr, empty-lines, expect-hostname, kernel, no-hostname, no-multi-line, no-parse, sanitize-utf8, store-legacy-msghdr, store-raw-message, syslog-protocol, validate-utf8
Default: empty set

Description: Specifies the log parsing options of the source.

  • assume-utf8: The assume-utf8 flag assumes that the incoming messages are UTF-8 encoded, but does not verify the encoding. If you explicitly want to validate the UTF-8 encoding of the incoming message, use the validate-utf8 flag.

  • dont-store-legacy-msghdr: By default, AxoSyslog stores the original incoming header of the log message. This is useful if the original format of a non-syslog-compliant message must be retained (AxoSyslog automatically corrects minor header errors, for example, adds a whitespace before msg in the following message: Jan 22 10:06:11 host program:msg). If you do not want to store the original header of the message, enable the dont-store-legacy-msghdr flag.

  • empty-lines: Use the empty-lines flag to keep the empty lines of the messages. By default, AxoSyslog removes empty lines automatically.

  • exit-on-eof: If this flag is set on a source, AxoSyslog stops when an EOF (end of file) is received. Available in version 4.9 and later.

  • expect-hostname: If the expect-hostname flag is enabled, AxoSyslog will assume that the log message contains a hostname and parse the message accordingly. This is the default behavior for TCP sources. Note that pipe sources use the no-hostname flag by default.

  • guess-timezone: Attempt to guess the timezone of the message if this information is not available in the message. Works when the incoming message stream is close to real time, and the timezone information is missing from the timestamp.

  • kernel: The kernel flag makes the source default to the LOG_KERN | LOG_NOTICE priority if not specified otherwise.

  • no-header: The no-header flag triggers AxoSyslog to parse only the PRI field of incoming messages, and put the rest of the message contents into $MSG.

    Its functionality is similar to that of the no-parse flag, except the no-header flag does not skip the PRI field.

    Example: using the no-header flag with the syslog-parser() parser

    The following example illustrates using the no-header flag with the syslog-parser() parser:

        parser p_syslog {
          syslog-parser(
            flags(no-header)
          );
        };
    
  • no-hostname: Enable the no-hostname flag if the log message does not include the hostname of the sender host. That way AxoSyslog assumes that the first part of the message header is ${PROGRAM} instead of ${HOST}. For example:

        source s_dell {
            network(
                port(2000)
                flags(no-hostname)
            );
        };
    
  • no-multi-line: The no-multi-line flag disables line-breaking in the messages: the entire message is converted to a single line. Note that this happens only if the underlying transport method actually supports multi-line messages. Currently the file() and pipe() drivers support multi-line messages.

  • no-parse: By default, AxoSyslog parses incoming messages as syslog messages. The no-parse flag completely disables syslog message parsing and processes the complete line as the message part of a syslog message. The AxoSyslog application will generate a new syslog header (timestamp, host, and so on) automatically and put the entire incoming message into the MESSAGE part of the syslog message (available using the ${MESSAGE} macro). This flag is useful for parsing messages not complying to the syslog format.

    If you are using the flags(no-parse) option, then syslog message parsing is completely disabled, and the entire incoming message is treated as the ${MESSAGE} part of a syslog message. In this case, AxoSyslog generates a new syslog header (timestamp, host, and so on) automatically. Note that even though flags(no-parse) disables message parsing, some flags can still be used, for example, the no-multi-line flag.

  • sanitize-utf8: When using the sanitize-utf8 flag, AxoSyslog converts non-UTF-8 input to an escaped form, which is valid UTF-8.

    Prior to version 4.6, this flag worked only when parsing RFC3164 messages. Starting with version 4.6, it works also for RFC5424 and raw messages.

  • store-legacy-msghdr: By default, AxoSyslog stores the original incoming header of the log message, so this flag is active. To disable it, use the dont-store-legacy-msghdr flag.

  • store-raw-message: Save the original message as received from the client in the ${RAWMSG} macro. You can forward this raw message in its original form to another AxoSyslog node using the syslog-ng() destination, or to a SIEM system, ensuring that the SIEM can process it. Available only in 3.16 and later.

  • syslog-protocol: The syslog-protocol flag specifies that incoming messages are expected to be formatted according to the new IETF syslog protocol standard (RFC5424), but without the frame header. Note that this flag is not needed for the syslog driver, which handles only messages that have a frame header.

  • validate-utf8: The validate-utf8 flag enables encoding-verification for messages.

    Prior to version 4.6, this flag worked only when parsing RFC3164 messages. Starting with version 4.6, it works also for RFC5424 and raw messages.

    For RFC5424-formatted messages, if the BOM character is missing, but the message is otherwise UTF-8 compliant, AxoSyslog automatically adds the BOM character to the message.

    The byte order mark (BOM) is a Unicode character used to signal the byte-order of the message text.

For the python() and python-fetcher() sources you can also set the check-hostname flag, which is equivalent with the check-hostname() global option, but only applies to this source.

The flags and the hostname-related options (for example, use-dns) set in the configuration file influence the behavior of the LogMessage.parse() method of the Python source. They have no effect if you set the message or the hostname directly, without using LogMessage.parse().

keep-hostname()

Type: yes or no
Default: no

Description: Enable or disable hostname rewriting.

  • If enabled (keep-hostname(yes)), AxoSyslog assumes that the incoming log message was sent by the host specified in the HOST field of the message.

  • If disabled (keep-hostname(no)), AxoSyslog rewrites the HOST field of the message, either to the IP address (if the use-dns() parameter is set to no), or to the hostname (if the use-dns() parameter is set to yes and the IP address can be resolved to a hostname) of the host sending the message to AxoSyslog. For details on using name resolution in AxoSyslog, see Using name resolution in syslog-ng.

This option can be specified globally, and per-source as well. The local setting of the source overrides the global option if available.

log-iw-size()

Type: number
Default: 100

Description: The size of the initial window, this value is used during flow-control. Its value cannot be lower than 100, unless the dynamic-window-size() option is enabled. For details on flow-control, see Managing incoming and outgoing messages with flow-control.

loaders()

Type: list of python modules
Default: empty list

Description: The AxoSyslog application imports Python modules specified in this option, before importing the code of the Python class. This option has effect only when the Python class is provided in an external Python file. This option has no effect when the Python class is provided within the AxoSyslog configuration file (in a python{} block). You can use the loaders() option to modify the import mechanism that imports Python class. For example, that way you can use hy in your Python class.

   python(class(usermodule.HyParser) loaders(hy))

options()

Type: string
Default: N/A

Description: This option allows you to pass custom values from the configuration file to the Python code. Enclose both the option names and their values in double-quotes. The Python code will receive these values during initialization as the options dictionary. For example, you can use this to set the IP address of the server from the configuration file, so it is not hard-coded in the Python object.

   python(
        class("MyPythonClass")
        options(
            "host" "127.0.0.1"
            "port" "1883"
            "otheroption" "value")
    );

For example, you can refer to the value of the host field in the Python code as options["host"]. Note that the Python code receives the values as strings, so you might have to cast them to the type required, for example: int(options["port"])

persist-name()

Type: string
Default: N/A

Description: If you receive the following error message during AxoSyslog startup, set the persist-name() option of the duplicate drivers:

Error checking the uniqueness of the persist names, please override it with persist-name option. Shutting down.

This error happens if you use identical drivers in multiple sources, for example, if you configure two file sources to read from the same file. In this case, set the persist-name() of the drivers to a custom string, for example, persist-name("example-persist-name1").

tags()

Type: string
Default:

Description: Label the messages received from the source with custom tags. Tags must be unique, and enclosed between double quotes. When adding multiple tags, separate them with comma, for example, tags("dmz", "router"). This option is available only in version 3.1 and later.

time-reopen()

Accepted values: number [seconds]
Default: 1

Description: The time to wait in seconds before a dead connection is reestablished.

time-zone()

Type: name of the timezone, or the timezone offset
Default:

Description: The default timezone for messages read from the source. Applies only if no timezone is specified within the message itself.

The timezone can be specified by using the name, for example, time-zone("Europe/Budapest")), or as the timezone offset in +/-HH:MM format, for example, +01:00). On Linux and UNIX platforms, the valid timezone names are listed under the /usr/share/zoneinfo directory.