Filtering Signal From Noise (Part 2)

Published: 2013-08-30
Last Updated: 2013-08-30 18:22:24 UTC
by Kevin Liston (Version: 1)

Two weeks ago I rambled a bit about trying to dig a signal out of the noise of SSH scans reported to DShield (https://isc.sans.edu/diary/Filtering+Signal+From+Noise/16385). I tried to build a simple model to predict the next 14 days' worth of SSH scans and promised that we'd check back in to see how wrong I was.

Looks like I was pretty wrong.

I had built and trained the model to do a tolerable job of describing past performance, and I wondered whether, if we let it run, it would do any better at predicting future behavior than simply taking the recent average and projecting that out linearly. I fed the numbers into the black box and clicked "publish" on the article before I really took a close look at what it was spitting out. There was a spike in the 48 hours between tuning the model and publishing, and its impact on the trend was a bit... severe.
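To see why a late spike can hurt so much, here is a minimal sketch comparing a flat average projection against exponential smoothing with a trend term. Everything in it is an assumption for illustration: the article does not say which exponential smoothing variant was used, so this uses Holt's linear-trend method with arbitrary smoothing factors, and the daily counts are made up rather than taken from DShield.

```python
def holt_forecast(series, horizon, alpha=0.5, beta=0.5):
    """Holt's linear-trend exponential smoothing; returns `horizon` forecasts.

    alpha and beta are arbitrary illustrative smoothing factors.
    """
    level, trend = series[0], series[1] - series[0]
    for value in series[1:]:
        previous_level = level
        # Blend the new observation with the previous one-step forecast.
        level = alpha * value + (1 - alpha) * (level + trend)
        # Blend the latest change in level with the previous trend.
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # Extrapolate the final level along the final trend.
    return [level + step * trend for step in range(1, horizon + 1)]


def average_projection(series, window, horizon):
    """Project the mean of the last `window` points forward as a flat line."""
    mean = sum(series[-window:]) / window
    return [mean] * horizon


# Hypothetical daily source counts: a steady 500/day, then a two-day spike.
daily_counts = [500] * 28 + [900, 1400]

holt_total = sum(holt_forecast(daily_counts, horizon=14))
avg7_total = sum(average_projection(daily_counts, window=7, horizon=14))

print(f"Holt 14-day total:          {holt_total:.0f}")
print(f"7-day average 14-day total: {avg7_total:.0f}")
```

With these made-up numbers the trend term latches onto the two-day spike and extrapolates it for the whole fortnight, while the flat average is only nudged upward. That is the same general failure mode described above, not a reproduction of the actual model.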

The Results

None of the approaches did an amazing job of predicting the actual total of 6423, although it's amazing how badly the Exponential model did. I have had really good results using that method with other data, and I encourage you to give it a try on other problems.

Method                       SSH scan source total for 14 days   Error (%)
Exponential Smoothing        19963                               13540 (210%)
7-day average projection     7197                                774 (12%)
30-day average projection    7054                                631 (10%)
MCMC estimate                5390                                1033 (16%)
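The error column is just the absolute difference between each prediction and the observed total of 6423, also expressed as a percentage of that total. A short sketch of the arithmetic, using the figures from the table (the function and variable names are my own, chosen for illustration):

```python
actual_total = 6423  # observed 14-day total reported in the article

# Predicted 14-day totals, copied from the results table.
predictions = {
    "Exponential Smoothing": 19963,
    "7-day average projection": 7197,
    "30-day average projection": 7054,
    "MCMC estimate": 5390,
}


def forecast_error(predicted, actual):
    """Return (absolute error, error as a percentage of the actual value)."""
    absolute = abs(predicted - actual)
    return absolute, 100 * absolute / actual


for method, predicted in predictions.items():
    absolute, percent = forecast_error(predicted, actual_total)
    print(f"{method:<28}{predicted:>8}{absolute:>8}{percent:>8.1f}%")
```

Note that the MCMC estimate is the only one that came in under the actual total; the other three overshot.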

Comments

Not any worse than my efforts to predict the future value of the NUGT Gold Miners ETF! (See me running around without any shirt, because I lost it!) :-(
