Splunk - Verifying accuracy of counts
- Todd Waller
- 23 hours ago
Hello everyone, and welcome back to Old Logs New Tricks. Happy New Year! This year we will be working to provide new tips and tricks, as well as other interesting things we think will be helpful.
Today I will be talking about accurately verifying results.
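I find that the hardest part of verification is confirming that the source data and its counts match what Splunk reports. Splunk makes counting so easy that it is tempting to trust the numbers without checking them. Time ranges are critical as well: the search and the source data have to cover the same window, or the counts will not line up.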
For example, imagine you have a file with events that contain these fields:
INFO 2026-01-05 13:10:15,234 Type=AAAA ID=012345 Extended_ID=ab234567.23452365
INFO 2026-01-05 13:10:15,234 Type=BB ID=345567 Extended_ID=ab456785678.3465689
INFO 2026-01-05 13:10:15,234 Type=CCC ID=678905 Extended_ID=ab06789.458657
Using a simple search in Splunk, counting by Type, ID and Extended_ID would yield:
Type ID Extended_ID Count
AAAA 012345 ab234567.23452365 1
BB 345567 ab456785678.3465689 1
CCC 678905 ab06789.458657 1
To verify this against the actual file, you would need to download the file, open it in an editor, and compare the counts to what Splunk reports. On a small scale this is easy. On a large scale, with thousands of records, it's not so easy. It is, however, necessary to verify accuracy.
What I do is download the Splunk results and open them in a spreadsheet. Then I download the data source file, open it in Excel, and split the data into columns based on delimiters:

This will split the data into columns. Then split those columns again on the "=" delimiter so that each field value ends up in its own column:

Remove unneeded columns and data:
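For larger files, the same split-and-trim steps can also be scripted. The following is only a minimal sketch in Python, not part of the spreadsheet workflow described here. It assumes every event carries space-separated key=value pairs like the sample lines above, and the names `parse_fields` and `count_by_fields` are made up for illustration:

```python
import re
from collections import Counter

# Sample events copied from this post; in practice these lines would be
# read from the downloaded source file.
SAMPLE_LINES = [
    "INFO 2026-01-05 13:10:15,234 Type=AAAA ID=012345 Extended_ID=ab234567.23452365",
    "INFO 2026-01-05 13:10:15,234 Type=BB ID=345567 Extended_ID=ab456785678.3465689",
    "INFO 2026-01-05 13:10:15,234 Type=CCC ID=678905 Extended_ID=ab06789.458657",
]

# Matches key=value pairs such as Type=AAAA.
# Assumption: values never contain spaces.
PAIR_RE = re.compile(r"(\w+)=(\S+)")


def parse_fields(line):
    """Return the key=value pairs found on one log line as a dict of strings."""
    return dict(PAIR_RE.findall(line))


def count_by_fields(lines, keys=("Type", "ID", "Extended_ID")):
    """Count events per combination of the given fields.

    Lines that are missing any of the fields are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = parse_fields(line)
        if all(key in fields for key in keys):
            counts[tuple(fields[key] for key in keys)] += 1
    return counts


counts = count_by_fields(SAMPLE_LINES)
for (type_, id_, extended_id), n in sorted(counts.items()):
    print(type_, id_, extended_id, n)
```

Values are kept as strings on purpose, so an ID such as 012345 keeps its leading zero. A spreadsheet may drop that zero if the column is treated as a number, which would then look like a mismatch.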

At this point you can paste the columns into a single sheet and then use the option at Home -> Conditional Formatting -> Highlight Cells Rules -> Duplicate Values. Values that appear more than once in the selected range are highlighted, so any ID left unhighlighted is one that does not match:


As you can see in the last example, either we expected a record that isn't there, or there's an additional record that's being counted. Either way, this record changes the count by one. You would need to determine whether it should be there, and whether the count is accurate or you need to change your search criteria.
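The same comparison can be sketched in code once both ID lists are available. This is a hedged illustration rather than a definitive method: `diff_counts` is a hypothetical helper, and the ID 999999 is invented here to stand in for an unexpected extra record. Using `Counter` instead of a plain set means a repeated ID also shows up as a difference:

```python
from collections import Counter


def diff_counts(source_ids, splunk_ids):
    """Compare two lists of IDs, taking repeated IDs into account.

    Returns two dicts: IDs (with how many occurrences) that the source file
    has but the Splunk results lack, and the reverse.
    """
    source = Counter(source_ids)
    splunk = Counter(splunk_ids)
    # Counter subtraction keeps only the positive differences.
    missing_from_splunk = dict(source - splunk)
    extra_in_splunk = dict(splunk - source)
    return missing_from_splunk, extra_in_splunk


# IDs from the sample events in this post.
source_ids = ["012345", "345567", "678905"]
# Same IDs plus "999999", a made-up ID used only for this illustration.
splunk_ids = ["012345", "345567", "678905", "999999"]

missing_from_splunk, extra_in_splunk = diff_counts(source_ids, splunk_ids)
print("In source file but not in Splunk:", missing_from_splunk)
print("In Splunk but not in source file:", extra_in_splunk)
```

An empty result on both sides means the two lists agree; anything else is a record to investigate before trusting the count.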
Additionally, there are diff tools such as WinMerge, along with similar online tools, that will do this for you, but you will need to adhere to your security rules to protect the data.
This is just a simple example. There are certainly other ways to verify the data, as well as any analytics you are doing with it. Imagine a dataset 100 times larger and how complicated that could be.
What are some of your go-to verification techniques?
Thanks for reading, have a great day!
Cheers!
-Todd