Why would I need a data model?
I was checking out the Splunk Facebook group today, ran into this really valid question, and decided to cross-post.
Schema - Splunk offers a feature called a “Data model,” which pulls key field values into a schematized, database-like structure. Basically, this structure follows a standard model and offers users a few major benefits:
1) Since we’re dealing with data that has a schema, a more traditional, SQL-like search language can be used
2) Data is predictable and can be managed in a traditional way
3) Data models offer powerful GUI elements to navigate and report on the data, similar to an Excel pivot, which shortens time to value in Splunk considerably.
Data Acceleration - These models can also be “accelerated”: in the background, Splunk keeps a summarized copy of the data, based on the model, ready at hand. This can make BILLIONS of events searchable in a second or two, where an unstructured query might take minutes.
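To make the idea concrete, here is a minimal Python sketch of why a pre-built summary answers faster than a raw scan. It is not how Splunk implements acceleration internally. The events, the 60-second bucket size, and all function names are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical raw events: (epoch_seconds, action) pairs standing in for indexed logs.
RAW_EVENTS = [
    (0, "blocked"), (30, "allowed"), (65, "blocked"),
    (70, "blocked"), (130, "allowed"), (170, "blocked"),
]

BUCKET_SECONDS = 60  # assumed summary granularity, chosen only for illustration


def build_summary(events, bucket_seconds=BUCKET_SECONDS):
    """Pre-aggregate counts per (time bucket, action), as a background job might."""
    summary = defaultdict(Counter)
    for ts, action in events:
        summary[ts // bucket_seconds][action] += 1
    return summary


def count_from_raw(events, action, start, end):
    """Slow path: look at every raw event in the time window."""
    return sum(1 for ts, a in events if start <= ts < end and a == action)


def count_from_summary(summary, action, start, end, bucket_seconds=BUCKET_SECONDS):
    """Fast path: add up pre-computed bucket counts (window must align to buckets)."""
    return sum(
        summary[b][action]
        for b in range(start // bucket_seconds, end // bucket_seconds)
        if b in summary
    )


summary = build_summary(RAW_EVENTS)
print(count_from_raw(RAW_EVENTS, "blocked", 0, 180))
print(count_from_summary(summary, "blocked", 0, 180))
```

Both calls return the same count, but the second one only touches three small bucket entries instead of every event. With billions of events, that difference is the whole point of keeping the summary around.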
Common Information Model - Schematized data offers us interoperability. Say we’re dealing with firewall data, or network data in general. How do you know what the language is for a deny? Or a user login? A VPN termination? Splunk provides guidance to us through the Common Information Model (CIM). This simplifies analysis of data for engineers and operators.
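As a rough illustration of what a common model buys you, the Python sketch below maps two made-up firewall formats onto one shared `action` field. The vendor names, their field names, and their raw values are all hypothetical. The `action` field with `allowed` and `blocked` values is modeled on the CIM network traffic conventions as I understand them, and the `unknown` fallback is just a choice made for this example.

```python
# Hypothetical vendor-specific field names and values; real products differ.
VENDOR_MAPPINGS = {
    "vendor_a_firewall": {
        "field": "disposition",
        "values": {"DENY": "blocked", "PERMIT": "allowed"},
    },
    "vendor_b_firewall": {
        "field": "act",
        "values": {"drop": "blocked", "accept": "allowed"},
    },
}


def normalize(sourcetype, event):
    """Return a copy of the event with a common `action` field added."""
    mapping = VENDOR_MAPPINGS[sourcetype]
    raw_value = event.get(mapping["field"])
    normalized = dict(event)
    # Fall back to "unknown" when the vendor value is not in the mapping.
    normalized["action"] = mapping["values"].get(raw_value, "unknown")
    return normalized


events = [
    ("vendor_a_firewall", {"src": "10.0.0.5", "disposition": "DENY"}),
    ("vendor_b_firewall", {"src": "10.0.0.9", "act": "drop"}),
    ("vendor_b_firewall", {"src": "10.0.0.7", "act": "accept"}),
]

normalized_events = [normalize(st, ev) for st, ev in events]

# One question, one field name, regardless of which vendor produced the log.
blocked_sources = [ev["src"] for ev in normalized_events if ev["action"] == "blocked"]
print(blocked_sources)
```

Once every source speaks the same field names, an operator can ask "show me the denies" without knowing whether a given box says DENY or drop.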
Summary - Personally, I have teams with very clear logs who use the Data Model pivot UI to explore their data. I have dashboards that just need to be “faster,” and of course Splunk’s SIEM product works completely off of models rather than logs.
There is no hard and fast rule for when to use or not use a model. You need to play with the tool, get comfortable, and understand your customers’ use cases.
To summarize, you don’t need to model every log you place in Splunk. But consider it. If your data doesn’t have a schema to it, does it have value? Can it be dropped? If it does have value, how do you plan on presenting the data? How can you make 90-120 days of statistical information ready in just seconds in your dashboard or view?
Cool talk on tuning your models from Gabriel Vasseur, PhD | Senior Cyber Security Analyst https://conf.splunk.com/files/2017/recordings/running-enterprise-security-at-capacity-tuning-es-with-data-model-acceleration.mp4