Now that we have looked at how to get data into our logstash instance, it’s time to start exploring how we can interact with all of the information being thrown at us using conditionals. But before we get too far into what conditionals are, we should first look at the overall structure of how logstash reads inputs, filters, and outputs from the configuration.
We spoke briefly about inputs in part 2 of this series, where we showed the various ways of setting up configuration files for the different types of input plugins that logstash supports. In those examples, for simplicity’s sake, each configuration file had its own input, filter, and output section. This works well if you only have one input coming in. In the real world, however, we will most likely have multiple inputs configured, and logstash does not treat each file as its own isolated unit. It first combs all of the configuration files for inputs and brings the data in from each one it finds. It then looks through all of those files, not just the file the input was defined in, for filter sections and runs each event through ALL of the filters. The same then happens for outputs. So if you have multiple files all containing filters, every event will be processed by all of them, which is not an efficient way to do things. Even worse, if you have multiple outputs, every event will be sent through all of them, so if you have an output for each input you could end up storing your data multiple times in the same index.
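To make the merging behaviour concrete, here is a minimal sketch of two self-contained files sitting side by side in the same configuration directory. The file names, ports, and Elasticsearch host are assumptions chosen for illustration and are not taken from the earlier posts.

```
# apache.conf (hypothetical file name)
input {
  beats { port => 5044 }                          # assumed port
}
output {
  elasticsearch { hosts => ["localhost:9200"] }   # assumed host
}

# syslog.conf (hypothetical file name)
input {
  syslog { port => 5514 }                         # assumed port
}
output {
  elasticsearch { hosts => ["localhost:9200"] }   # same destination a second time
}
```

Because both files end up in the same pipeline, an event arriving on either input is handed to both elasticsearch outputs, so under these assumptions each event would be written twice.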
So what do we do to get around this?
When it comes to structuring a logstash configuration I always follow the same layout. I could be wrong on this (remember, this is a newbie’s guide), but from what I’ve seen this seems to work best. First up, all the configuration goes inside the /etc/logstash/conf.d/ directory. The way I set it up is as follows…
Each input type/port will get its own config file – meaning I’ll have a file containing only the input {} section for syslog, another file which handles snmp, another for a filebeat, etc… Within each of these files, I apply tags where possible in order to
One filter file containing the filter {} section, which transforms data based on the tags and/or host fields set on events by the inputs. Since I’m tagging my data in my input files, I’m able to use these tags to separate out the different filters I want to apply within my filter configuration. For data that can’t be tagged, I simply use conditionals to apply filters based on the host field.
One output file containing the output {} section, which sends data to Elasticsearch. All data processed through logstash goes through this one file to be stored within ES.
So with that said, my conf.d directory may contain the following files: 3 inputs, 1 filter, and 1 output.
- 10-beats-input.conf
- 20-SNMP-input.conf
- 30-syslog-input.conf
- 50-filter.conf
- 99-output.conf
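As a rough sketch of what a few of these files might hold, the fragment below shows two of the inputs, the single filter file, and the single output file. The ports, tag names, the source_type field, and the Elasticsearch address are all assumptions made up for this example. The numeric prefixes are handy because logstash reads the files in a directory in alphabetical order.

```
# 10-beats-input.conf
input {
  beats {
    port => 5044                  # assumed port
    tags => ["beats"]             # hypothetical tag, checked later in the filter file
  }
}

# 30-syslog-input.conf
input {
  syslog {
    port => 5514                  # assumed port
    tags => ["syslog"]            # hypothetical tag
  }
}

# 50-filter.conf
filter {
  if "syslog" in [tags] {
    # only events tagged by the syslog input reach this block
    mutate { add_field => { "source_type" => "syslog" } }   # hypothetical field
  }
}

# 99-output.conf
output {
  elasticsearch {
    hosts => ["localhost:9200"]   # assumed Elasticsearch address
  }
}
```

Each input file only declares where data comes from and how it is tagged, which leaves every filtering decision in one place and every event with exactly one path to storage.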
So we have several different input files, all tied to one filter file. This is a great approach, but how do we distinguish between the inputs within our filter file? I mean, we probably don’t want to run the same filters on different inputs… Provided you have some way of telling where the data came from within the event itself, be it tags added during the input, types assigned during the input, or simply a field that already exists within your data (e.g. host), we can use something called conditionals within the filter (and output) configurations to do just that…
Logstash Conditionals
Logstash conditionals are basically just ‘if statements’ that we can use to define certain criteria for performing actions on our data. If you have any programming experience at all, these will look familiar and be very simple to implement. For those that don’t, let’s take a closer look at what we can accomplish with them.
If we think back to part three of this series, particularly the forwarding of Apache logs, we added a tag to our data via the filebeat input, which allowed us to tag all events coming from the Apache server with a value of “apache-access”. Now say we wanted to take some action on data from our Apache servers, for example transforming those public IPs to geo-coordinates for mapping purposes. If we simply added that code to our filter file without a conditional attached, it would be applied to every single input we have: we’d have logstash trying to map coordinates for snmp traps, other syslog data, windows metrics, etc… In the end, we would have a lot of failures and inefficiencies in our system. We combat this with logstash conditionals, where we simply create an if statement to check the tag and then apply our filters, as shown below.
    if "apache-access" in [tags] {
      geoip { source => "clientIP" }
    }
For now, try to ignore the whole “geoip” portion and focus just on the if conditional itself. We will go into more detail on geoip and many other logstash filters in a later post; our point here is simply to explain the syntax and structure of using conditionals.
Another common method we might use within our conditional statements is looking at the “host” field within a syslog message – this essentially tells us where this log is coming from. For instance, we could do something like the following…
    if [host] == "10.0.0.50" {
      mutate {
        add_field => { "servername" => "My 10.0.0.50 server" }
      }
    }
Again, pay no mind to the mutate syntax and focus first on the if statement; it’s important to learn this before exploring the power of the actual filters. With a little programming experience, this should look very familiar. That said, it’s not a full programming language: logstash conditionals are limited to the following statements and operators…
- Support for “if”, “else if”, and “else”
- Support for == (equals), != (not equals), < (less than), > (greater than), <= (less than or equal), and >= (greater than or equal).
- Inclusion operators: ‘in’ and ‘not in’
- Regexp operators: “=~” (matches) and “!~” (does not match)
- Boolean operators: ‘and’, ‘nand’, ‘or’, and ‘xor’, plus ‘!’ for negation
- Statements can stand alone or be nested.
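The pieces in the list above can be combined in a single filter block. The following is only a sketch: the tag names, the 10.0.0.x address range, the “snmp” type value, and the network field are assumptions invented for illustration.

```
filter {
  # if / else if / else with inclusion, regexp, comparison and boolean operators
  if "apache-access" in [tags] {
    mutate { add_tag => ["web"] }
  } else if [host] =~ /^10\.0\.0\./ and [type] != "snmp" {
    mutate { add_field => { "network" => "internal" } }
  } else {
    mutate { add_tag => ["unclassified"] }
  }

  # a nested conditional using 'not in'
  if [type] == "syslog" {
    if "unclassified" not in [tags] {
      mutate { add_tag => ["classified-syslog"] }
    }
  }
}
```

Only the first matching branch of an if / else if / else chain runs for a given event, while the second, separate if block is evaluated for every event regardless of which branch matched above it.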
We’ve talked a lot about using conditionals within our filter configurations here, but our outputs are fair game as well. If you wanted to, say, have different outputs for different sources, you could definitely do so by using conditionals in your output configuration. In our next couple of posts in this series, we will look at where the real magic within logstash takes place: the filter plugins! Thanks for reading and, as always, catch up on the series below if you haven’t already!