
Why would I need a data model?


I was checking out the Splunk Facebook group today, ran into this really valid question, and decided to cross-post.




Schema - Splunk offers a feature called a “data model,” which pulls key field values into a schematized database. Basically, this database follows a standard model and offers users some major benefits:

1) Since we’re dealing with data that has a schema, a more traditional, SQL-like search language can be used.

2) Data is predictable and can be managed in a traditional way.

3) Data models offer powerful GUI elements to navigate and report on the data, similar to an Excel pivot, which makes time to value in Splunk much shorter.


https://docs.splunk.com/Documentation/Splunk/8.0.0/Knowledge/Aboutdatamodels
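
To make the schema idea concrete, here is a minimal Python sketch. It is not how Splunk implements data models; it just shows the concept of pulling a fixed set of fields out of raw events and then counting by one of them, pivot-style. The log format, the field names, and the helpers `to_row` and `pivot_count` are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical raw firewall-style events; the format is invented for this sketch.
RAW_EVENTS = [
    "2020-01-05T10:00:01 action=deny src=10.0.0.5 dest=8.8.8.8",
    "2020-01-05T10:00:02 action=allow src=10.0.0.6 dest=1.1.1.1",
    "2020-01-05T10:00:03 action=deny src=10.0.0.5 dest=9.9.9.9",
]

# Matches key=value pairs such as "action=deny".
FIELD_PATTERN = re.compile(r"(\w+)=(\S+)")

# The "schema": the fixed set of fields every row will carry.
SCHEMA = ("action", "src", "dest")


def to_row(raw):
    """Pull key=value pairs out of one raw event into the fixed schema."""
    fields = dict(FIELD_PATTERN.findall(raw))
    return {name: fields.get(name) for name in SCHEMA}


def pivot_count(rows, field):
    """Count rows per value of one field, like a one-column pivot."""
    return Counter(row[field] for row in rows)


rows = [to_row(event) for event in RAW_EVENTS]
print(pivot_count(rows, "action"))
```

Once every event has the same predictable fields, the counting step no longer cares what the raw log looked like. That is the part that makes a pivot-style UI possible.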

Data Acceleration - We also have features where these models can be “accelerated”: in the background, Splunk keeps a copy of the data, based on the model, ready at hand. This can make BILLIONS of events searchable in a second or two, where an unstructured query might take minutes.

https://docs.splunk.com/Documentation/Splunk/8.0.0/Knowledge/Acceleratedatamodels
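
The reason acceleration is fast can be sketched in a few lines of Python. This is a conceptual illustration only, assuming a simple pre-aggregated summary keyed by hour and action; it is not Splunk's actual summary format, and the events and function names are hypothetical.

```python
from collections import Counter

# Hypothetical events as (hour_bucket, action) pairs.
EVENTS = [(0, "deny"), (0, "allow"), (1, "deny"), (1, "deny"), (2, "allow")]


def build_summary(events):
    """Background step: pre-aggregate counts per (hour, action)."""
    summary = Counter()
    for hour, action in events:
        summary[(hour, action)] += 1
    return summary


def count_from_raw(events, action):
    """Unaccelerated path: scan every event at query time."""
    return sum(1 for _, a in events if a == action)


def count_from_summary(summary, action):
    """Accelerated path: add up the pre-computed buckets."""
    return sum(n for (_, a), n in summary.items() if a == action)


SUMMARY = build_summary(EVENTS)
print(count_from_raw(EVENTS, "deny"), count_from_summary(SUMMARY, "deny"))
```

Both paths return the same answer. The difference is that the summary grows with the number of distinct buckets rather than the number of events, so the query-time work stays small even when the raw data is huge.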


Common Information Model - Schematized data offers us interoperability. Say we’re dealing with firewall data, or network data in general. How do you know what the language is for a deny? Or a user login? A VPN termination? Splunk provides guidance to us through the CIM. This simplifies analysis of data for engineers and operators.


https://docs.splunk.com/Documentation/CIM/4.14.0/User/Overview
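
Here is a rough Python sketch of the normalization idea behind a common model: several vendors say "deny" in different ways, and a shared vocabulary collapses them into one value. The vendor terms, the normalized values, and the `normalize_action` helper are assumptions for illustration; check the CIM documentation for the real field names and allowed values.

```python
# Hypothetical vendor-specific terms mapped to one shared vocabulary.
# These values are illustrative, not copied from the CIM reference.
ACTION_MAP = {
    "deny": "blocked",
    "drop": "blocked",
    "reject": "blocked",
    "permit": "allowed",
    "accept": "allowed",
}


def normalize_action(vendor_action):
    """Translate a vendor's wording into the shared term, or 'unknown'."""
    return ACTION_MAP.get(vendor_action.strip().lower(), "unknown")


for term in ("DENY", "drop", "Permit", "teleport"):
    print(term, "->", normalize_action(term))
```

With a mapping like this in place, one search for the shared term covers every firewall vendor, instead of one search per vendor dialect.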


Summary - Personally, I have teams who have very clear logs, and they use the data model Pivot UI to explore their data. I have dashboards that just need to be “faster,” and of course Splunk’s SIEM product works completely off of models rather than logs.


There is no hard and fast rule for when to use or not use a model. You need to play with the tool, get comfortable, and understand your customers’ use cases.


To summarize, you don’t need to model every log you place in Splunk. But consider it. If your data doesn’t have a schema to it, does it have value? Can it be dropped? If it does have value, how do you plan on presenting the data? How can you make 90-120 days of statistical information ready in just seconds in your dashboard or view?


#splunk #splunktrust



-Daniel


ps -

Cool talk on tuning your models from Gabriel Vasseur, PhD | Senior Cyber Security Analyst https://conf.splunk.com/files/2017/recordings/running-enterprise-security-at-capacity-tuning-es-with-data-model-acceleration.mp4
