Humio log parser with containerd and Kubernetes

While using Humio (or another log aggregator such as Logstash) with Fluent Bit as the log shipper in a Kubernetes environment, I ran into a problem with the log parser: each log message was prefixed with a timestamp, the stream name (stdout or stderr) and a tag, as in the line below:

2021-11-09T16:27:39.510951151-05:00 stdout F {"@timestamp":"2021-11-09T21:27:39.505Z","@version":"1","message":"HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@d89aadb (This connection has been closed.). Possibly consider using a shorter maxLifetime value.","logger_name":"com.zaxxer.hikari.pool.PoolBase","thread_name":"http-nio-8080-exec-5","level":"WARN","level_value":30000}

This causes parser errors when json is used as the parser in Humio, because the line as a whole is no longer valid JSON.
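To see why a JSON parser rejects such a line, here is a minimal Python sketch. It is only an illustration, not what Humio or Fluent Bit run internally; the sample line is a shortened version of the one above, and try_parse_json is a hypothetical helper name.

```python
import json

# Shortened version of the log line shown above (JSON payload trimmed).
raw_line = (
    '2021-11-09T16:27:39.510951151-05:00 stdout F '
    '{"@timestamp":"2021-11-09T21:27:39.505Z","level":"WARN"}'
)


def try_parse_json(line):
    """Return the decoded object, or None if the line is not valid JSON."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return None


# The full line fails to parse because of the timestamp/stream/tag prefix.
whole = try_parse_json(raw_line)

# Dropping the first three space-separated fields leaves only the JSON payload.
payload = raw_line.split(" ", 3)[3]
parsed = try_parse_json(payload)

print(whole)            # None
print(parsed["level"])  # WARN
```

The prefix has to be stripped before the JSON payload is handed to a JSON parser.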

The cause is the container runtime: with containerd (or CRI-O), container logs are written in the CRI log format instead of Docker's JSON format, so Fluent Bit needs a different parser to read them.

To solve this with Humio, use the following configuration:

humio-fluentbit:
  parserConfig: |-
   [PARSER]
       Name apache
       Format regex
       Regex  ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
       Time_Key time
       Time_Format %d/%b/%Y:%H:%M:%S %z
   [PARSER]
       Name apache2
       Format regex
       Regex  ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
       Time_Key time
       Time_Format %d/%b/%Y:%H:%M:%S %z
   [PARSER]
       Name apache_error
       Format regex
       Regex  ^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\](?: \[pid (?<pid>[^\]]*)\])?( \[client (?<client>[^\]]*)\])? (?<message>.*)$
   [PARSER]
       Name nginx
       Format regex
       Regex ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")
       Time_Key time
       Time_Format %d/%b/%Y:%H:%M:%S %z
   [PARSER]
       Name json
       Format json
       Time_Key time
       Time_Format %d/%b/%Y:%H:%M:%S %z
   [PARSER]
       Name docker
       Format json
       Time_Key time
       Time_Format %Y-%m-%dT%H:%M:%S.%L
       Time_Keep   On
   [PARSER]
       Name syslog
       Format regex
       Regex ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
       Time_Key time
       Time_Format %b %d %H:%M:%S
   [PARSER]
       Name cri
       Format regex
       Regex ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<log>.*)$
       Time_Key time
       Time_Format %Y-%m-%dT%H:%M:%S.%L%z
 inputConfig: |-
   [INPUT]
     Name             tail
     Path             /var/log/containers/*.log
     Parser           cri
     Tag              kube.*
     Refresh_Interval 5
     Mem_Buf_Limit    5MB
     Skip_Long_Lines  On

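This configuration adds a new parser (cri) and overrides the input section so that the tail input uses it (Parser cri in inputConfig). The cri parser splits each line into time, stream, logtag and log fields, so the original application message ends up in log.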
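As a rough illustration of what the cri regex captures, the sketch below applies the same pattern in Python. Two assumptions to keep in mind: Python writes named groups as (?P<name>...) while Fluent Bit uses the Ruby-style (?<name>...), and parse_cri_line is a hypothetical helper, not part of Fluent Bit. The sample line is a shortened version of the one shown earlier.

```python
import json
import re

# Same pattern as the cri parser, rewritten with Python's named-group syntax.
CRI_PATTERN = re.compile(
    r"^(?P<time>[^ ]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) (?P<log>.*)$"
)


def parse_cri_line(line):
    """Split a CRI log line into time, stream, logtag and log fields."""
    match = CRI_PATTERN.match(line)
    if match is None:
        return None
    return match.groupdict()


line = (
    '2021-11-09T16:27:39.510951151-05:00 stdout F '
    '{"@timestamp":"2021-11-09T21:27:39.505Z","level":"WARN",'
    '"message":"HikariPool-1 - Failed to validate connection"}'
)

record = parse_cri_line(line)
print(record["stream"])  # stdout
print(record["logtag"])  # F

# The log field now holds only the application's JSON message.
application_log = json.loads(record["log"])
print(application_log["level"])  # WARN
```

Once the prefix is split off, the log field is plain JSON again and can be parsed as such.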

You can adapt this solution to your own Fluent Bit configuration.

For more information, see: https://github.com/microsoft/fluentbit-containerd-cri-o-json-log