Alright! It’s been a bit of work to get here, but we are finally at a stage where we can really begin to see some of the power of Logstash, and that’s by having a look at some of the filter plugins that are available. If we simply shipped our syslog data into Elasticsearch through Logstash as-is, it would probably seem kind of boring and plain. Honestly, it’d be no different from any other old syslog application. Logstash, on the other hand, has some pretty powerful plugins that we can use within our filter {} stanzas. Think of things like bursting each item from a CSV file out into its own searchable, sortable, filterable field within Kibana – or taking a field that contains an IP address and automatically mapping geo points from it so we can visualize it on a map. There’s a lot that can be done when it comes to filtering, and the next few articles in this series will be focused solely on that.
So what exactly do I mean by filter stanza? Well, if you think back to the last part in this series, when we talked about Logstash structure and conditionals, you may remember the filter configuration file – this is where we apply our filter plugins. As I mentioned above, there are a lot of filter plugins – you can see them all here if you wish – and we can even create our own. There is no way we are going to be able to cover them all, so I’ll pick a handful of my favorites and get started with those…
First up, grok
grok is probably one of the most useful and powerful Logstash filter plugins out there – it allows us to take a bunch of unstructured text and, with a few rules and stanzas, convert it into something that’s a bit more understandable and searchable. Technically, grok works by comparing a bunch of text against regex-based patterns, then taking those matches and placing them in their own fields. Personally, my use of grok has really focused on taking the data from a non-standard syslog message field – the meat and potatoes of the logged event – and breaking it up further into its own fields and columns that I can analyze against.
As for the syntax of grok, it’s quite simple, yet it can get a little challenging when dealing with large amounts of text – but let’s give it a go. The basic syntax of a grok match is below…
%{PATTERN:FieldName}
Seems simple, right? A pattern and a name – easy! We can have multiple pattern matches on the same field as well – so if we wanted to, say, match on pattern1 then pattern2 and place them in two distinct fields, we would do the following (in pseudo, of course).
%{PATTERN1:FieldName1} %{PATTERN2:FieldName2}
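To make that a little more concrete (this sample text and the field names are purely illustrative, not from our data), grabbing an IP address and the word that follows it out of a line like 55.3.244.1 GET would look like this…

%{IP:client_ip} %{WORD:method}

…which would leave us with a client_ip field containing 55.3.244.1 and a method field containing GET.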
So let’s try and make some real-world sense out of this – say we had an event come through with the message field populated as follows…
2018-01-01T23:56:42.000+00:00 INFO [mwpreston.net]: mwpreston: Login Failed
Logstash by default will simply import this whole message into the message field – but there is a ton of information within this particular message that we may want to filter on. For instance, we might want to search on the site (mwpreston.net), the user (mwpreston) or the event (Login Failed). To do so, we need to use grok to break the message out into its own fields. This particular message could be broken out with the following…
filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:log-level} \[%{DATA:site}\]: %{DATA:user}: %{GREEDYDATA:message}" }
  }
}
So we can see there are five different grok patterns matching on our line of text here. The first simply takes the beginning of the text and matches on an ISO8601 timestamp. We then have a space and match on the log level – not too bad thus far 🙂 From there you can see that we are explicitly looking for square brackets, taking the data within them and storing it in the site field. Then we look for a colon, match the user, then another colon, and use the GREEDYDATA pattern to simply take the rest of the message. GREEDYDATA is a special piece of grok syntax that captures the remaining data no matter what it is – and there are many other pattern shortcuts, if you will, as well – they are basically just regex placeholders for common matches. You can find things for usernames, IPs, numbers, integers, etc. They are all viewable here.
After doing the above we would be left with a result as follows…
timestamp | 2018-01-01T23:56:42.000+00:00 |
log-level | INFO |
site | mwpreston.net |
message | Login Failed |
user | mwpreston |
Now our data will be stored in our Elasticsearch instance with multiple fields we can manipulate and filter on – a lot handier than simply having it all in one big, long message.
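One small gotcha worth calling out with the filter above: since the last capture is stored in message, and the event already has a message field, grok will by default keep both the original value and the captured one. If we only want the captured portion, grok’s overwrite option takes care of that – a minimal sketch building on the same filter:

filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:log-level} \[%{DATA:site}\]: %{DATA:user}: %{GREEDYDATA:message}" }
    # replace the original message field with the grok capture instead of keeping both values
    overwrite => [ "message" ]
  }
}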
As mentioned above, grok has some built-in regex placeholders for us – which is awesome. This extends beyond just grabbing basic numbers and IPs – it also works for some common syslog formats and Apache logs as well. For instance, to automatically break out Apache logs we could do the following…
filter {
  grok {
    match => { "message" => "%{COMMONAPACHELOG}" }
  }
}
Doing this would break out all of the fields within our Apache message, such as sites, responses, client IPs, etc. Very handy to have!
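To give a rough idea of what that looks like in practice, here’s a hypothetical access-log line in the Apache common log format and some of the fields %{COMMONAPACHELOG} would pull out of it (the exact field names come from the pattern definitions shipped with your Logstash version, so treat this as a sketch)…

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326

clientip | 127.0.0.1 |
auth | frank |
timestamp | 10/Oct/2000:13:55:36 -0700 |
verb | GET |
request | /apache_pb.gif |
response | 200 |
bytes | 2326 |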
So spend some time on grok if you really want to start to see the power of Logstash filters – it takes a while to get your head around it and to really start writing grok stanzas that work. I still struggle from time to time, but Google is your friend! It’s time well spent, as you will run into a use case for grok on almost every input you send 🙂
In an effort to keep these posts on the shorter and sweeter side, I think we will stop here. In our next article we will explore even more filter plugins, providing functionality such as translation and geo mapping.