  • Todd Waller

Splunk - Diagnostic Data Anonymization

Happy Friday Everyone!! Today's post is going to touch on the anonymization of Splunk diagnostic files. I am going to use Linux commands and hosts for this post.


Depending on the nature of a company, this is sometimes necessary in order to get help from support using diagnostic files without divulging identifying information. Things like IP addresses, host names, user names, etc. can be anonymized by Splunk's internal anonymizer.


I am creating this post because there is very little supporting documentation with examples of how to do this, apart from a video that is very vague and a single documentation guide that is also vague. Also, Splunk's "easy" anonymization method has proven unsuccessful many times.

NOTE: $SPLUNK_HOME is the directory where Splunk is installed. On Linux, the default for $SPLUNK_HOME is typically /opt/splunk. For this post $SPLUNK_HOME is /opt/splunk, so I will prefix commands with that.


So here we go: The first thing you need to do is create a diagnostic file. This can be done by navigating to the /opt/splunk/bin path and running:


./splunk diag


That will create a gzipped tarball file in the /opt/splunk path like this: diag-<hostname>-<date-time>.tar.gz (cleaned screenshot to remove any identifying info)



Best practice is to move this file to the /tmp path, as we are going to have to extract it. To do this, run: mv /opt/splunk/diag-<hostname>-<date-time>.tar.gz /tmp


Navigate to the /tmp path then un-pack the gzipped tarball by running: tar xfz diag-<hostname>-<date-time>.tar.gz
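The move and unpack steps above can be wrapped into one small function. This is only a sketch: unpack_diag is a hypothetical helper name of my own, not a Splunk command, and it assumes a tar that understands -z and -C (GNU and BSD tar both do).

```shell
# Sketch: move a diag tarball into a scratch directory and unpack it there.
# unpack_diag is a hypothetical helper name, not part of Splunk.
unpack_diag() {
    local tarball="$1"          # path to the diag .tar.gz created by "splunk diag"
    local workdir="${2:-/tmp}"  # scratch directory; /tmp matches this post

    [ -f "$tarball" ] || { echo "no such file: $tarball" >&2; return 1; }
    mkdir -p "$workdir" || return 1

    mv "$tarball" "$workdir"/ || return 1
    # -C unpacks into workdir without having to cd there first
    tar -xzf "$workdir/$(basename "$tarball")" -C "$workdir"
}

# Example call (placeholders as used in this post):
# unpack_diag "/opt/splunk/diag-<hostname>-<date-time>.tar.gz" /tmp
```

The function stops early if the tarball is missing, so a mistyped file name does not leave you extracting something stale from /tmp.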


Now that you have the uncompressed files in the directory, you can start anonymizing them. Check with support WHICH files they need, OR you can anonymize all files in all the directories (this does make things a bit difficult to diagnose, as it anonymizes so much).


So say support wants only the splunkd logs. Move into the expanded directory and run this to anonymize all of the splunkd logs: find /tmp/diag-<hostname>-<date-time>/ -type f -name '*splunkd.*' | xargs -I{} /opt/splunk/bin/splunk anonymize file -source '{}'



For the sake of this post we will anonymize everything:


find /tmp/diag-<hostname>-<date-time>/ -type f -name '*.*' | xargs -I{} /opt/splunk/bin/splunk anonymize file -source '{}'
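If any paths in the diag contain spaces or quotes, a null-delimited variant of the same pipeline is a little safer. The sketch below is my own wrapper, not something Splunk ships: anonymize_matching and the SPLUNK_BIN variable are hypothetical names, and the ANON-/INFO- exclusions assume the anonymizer writes its output next to the source files, as the later steps in this post suggest.

```shell
# Sketch: run "splunk anonymize file -source" on every matching file.
# anonymize_matching and SPLUNK_BIN are hypothetical names for this example.
anonymize_matching() {
    local dir="$1"                 # expanded diag directory
    local pattern="${2:-*.*}"      # file name pattern; *.* mirrors this post
    local splunk_bin="${SPLUNK_BIN:-/opt/splunk/bin/splunk}"

    # -print0 / -0 pass file names separated by NUL bytes, so spaces and
    # quotes in paths survive. ANON-/INFO- files are skipped so that output
    # from an earlier run is not fed back in (assumed to sit beside the source).
    find "$dir" -type f -name "$pattern" \
        ! -name 'ANON-*' ! -name 'INFO-*' -print0 |
        xargs -0 -I{} "$splunk_bin" anonymize file -source {}
}

# Example calls (placeholders as used in this post):
# anonymize_matching "/tmp/diag-<hostname>-<date-time>" '*splunkd.*'
# anonymize_matching "/tmp/diag-<hostname>-<date-time>"
```

Quoting the pattern when you call the function matters: an unquoted * can be expanded by the shell against the current directory before find ever sees it.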



The output looks something like this as it runs. This is a very small portion of the output (cleaned screenshot to remove any identifying info):



Once that finishes the directory will look something like this (cleaned screenshot to remove any identifying info):




Then you will want to move the files prefixed with INFO to a directory called info.files, as these are the lists of things that were changed and the suggested list of things to change. The target directory has to exist before mv can use it, so create it first and then move the files: mkdir -p /tmp/info.files && find /tmp/diag-<hostname>-<date-time>/ -type f -name "INFO-*.*" -exec mv '{}' /tmp/info.files/ \;


Once that is completed you will want to remove all files not prefixed with ANON. You can do this by running: find /tmp/diag-<hostname>-<date-time>/ -type f ! -name "ANON-*.*" -delete
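The two cleanup steps (set the INFO files aside, then delete everything that is not an ANON file) can be sketched as one function. keep_only_anon is a hypothetical name of my own. Note that INFO files with the same name from different subdirectories will overwrite one another in the target directory, just as they would with the plain find command.

```shell
# Sketch: set INFO-* files aside, then delete everything that is not ANON-*.
# keep_only_anon is a hypothetical helper name for this example.
keep_only_anon() {
    local diag_dir="$1"                     # expanded diag directory
    local info_dir="${2:-/tmp/info.files}"  # where the INFO-* files go

    # mv needs the target directory to exist
    mkdir -p "$info_dir" || return 1

    # Move the change lists and suggestions out of the diag tree
    find "$diag_dir" -type f -name 'INFO-*' -exec mv {} "$info_dir"/ \;

    # Delete every remaining regular file that is not an anonymized copy.
    # This is destructive: the original, non-anonymized files are removed.
    find "$diag_dir" -type f ! -name 'ANON-*' -delete
}

# Example call (placeholders as used in this post):
# keep_only_anon "/tmp/diag-<hostname>-<date-time>" /tmp/info.files
```

Keep info_dir outside the diag directory, otherwise the second find would delete the INFO files you just moved.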


Here's an example of what the directory would look like after (cleaned screenshot to remove any identifying info):


Finally re-tar the directory you have been working in by running:

tar cfz diag-<hostname>-<date-time>.tar.gz /tmp/diag-<hostname>-<date-time>
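One detail worth knowing: when tar is given an absolute path such as /tmp/diag-..., GNU tar strips the leading / and stores the members under tmp/diag-.... If you would rather have the diag directory at the top level of the archive, -C can be used. This is a sketch only; repack_diag is a hypothetical helper name of my own.

```shell
# Sketch: re-tar the anonymized directory so the archive's top-level entry
# is the diag directory itself rather than a tmp/... path.
# repack_diag is a hypothetical helper name for this example.
repack_diag() {
    local diag_dir="${1%/}"                 # expanded diag directory, trailing / removed
    local out="${2:-$diag_dir.tar.gz}"      # output tarball; default sits next to the directory

    # -C switches to the parent directory, so only the directory name
    # (not the full path) is recorded inside the archive.
    tar -czf "$out" -C "$(dirname "$diag_dir")" "$(basename "$diag_dir")"
}

# Example call (placeholders as used in this post):
# repack_diag "/tmp/diag-<hostname>-<date-time>"
```

With the default output name this overwrites the original tarball that was moved to /tmp earlier, so the only archive left there is the anonymized one.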


After this is done, you can remove the expanded directory you created in /tmp.


Then upload that to Splunk.


This is a long process and took lots of trial and error. Again, I only demonstrate for Linux systems here. I have not tested on Windows, so I cannot say what works there, although I'm sure the process would be similar.
