Michael Wolf | Who Looks At Microsoft Docs?

Who Looks At Microsoft Docs?
Last edited - 12/06/17

Perhaps because I know the kind of people who can survive a plain HTML website, and perhaps because I'm pretentious, I put my most up-to-date resume on this website in plain text. However, I know not everyone who is trolling through resumes is used to such a setup. So I also put a Doc file version right there for anyone who wants to read my history in the gross, table-formatted XML gobbledygook I've used since high school. Or at least that's what it used to be.

In August I decided to cross a project off my list and put a Canarytoken on my website. For those unfamiliar, Canarytokens are small triggers from Thinkst Canary that can be set up as fake Microsoft documents, PDFs, webhooks, SVN hooks, or images. Every time a Canarytoken is accessed, you get a neat little email with the source IP address of the agent that triggered the alert and a little metadata. Most importantly, it's free!

Initially, I wanted to set up a Canarytoken to trigger whenever someone accessed a particular page on my website while still delivering the requested document. In the end, I decided that sacrificing the Microsoft Doc link was worth this little experiment. If someone is curious and is really so tied to the comfort of Microsoft, my LinkedIn is just as readily available (if even more outdated). Furthermore, hovering over the link reveals that it is a Canarytoken. I'm not really trying to hide anything.
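For the curious, the "alert on access but still hand over the file" idea can be sketched with nothing but Python's standard library. This is a rough illustration, not how Canarytokens work internally and not what runs on this site. The path `/resume.doc`, the file name, and the `notify` placeholder (which prints instead of emailing) are all made up for the example.

```python
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer

WATCHED_PATH = "/resume.doc"  # hypothetical URL to watch
DOCUMENT = "resume.doc"       # hypothetical file on disk


def describe_hit(client_ip, user_agent, when=None):
    """Build the one-line alert text for a single request."""
    when = when or datetime.now(timezone.utc)
    return f"{when.isoformat()} {client_ip} {user_agent or 'unknown agent'}"


def notify(message):
    # Placeholder: prints the alert instead of emailing it.
    print(message)


class TripwireHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != WATCHED_PATH:
            self.send_error(404)
            return
        # Raise the alert first, then serve the document as usual.
        notify(describe_hit(self.client_address[0],
                            self.headers.get("User-Agent")))
        try:
            with open(DOCUMENT, "rb") as handle:
                body = handle.read()
        except FileNotFoundError:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/msword")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the tripwire on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), TripwireHandler).serve_forever()
```

Calling `serve()` starts it on port 8000. Anything that fetches `/resume.doc` gets the file, and the alert line is printed before the response goes out.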

Over the course of four months, I have gotten 26 emails, I've learned who runs the spiders on the Internet, and I've even found some interesting bites. I expected crawlers, but I didn't expect Yandex to be so interested. I was also curious about Slack img-proxy. The IPs come out of AWS, where the Slackbot is hosted. But why is it crawling my website? Has someone made a spider Slackbot? According to Slack's website, its targeted objective is link expansion. Is someone blindly copying the Canarytoken link into a Slack channel? To be determined.

Of all of the messages, only two came from real-looking user-agents. The first one hit my website on October 1st, 2017 from Boston, carrying an IP from Road Runner. I have no idea who this is, but hello! The second non-bot is definitely a coworker. The IP address is from my office, the time is from after I left that day, and the user-agent is not Firefox... hm......

Agent	Count	IPs
Baidu	7	180.76.15.11 180.76.15.156 180.76.15.150 180.76.15.158 180.76.15.31 180.76.15.149 180.76.15.139
Yandex	5	93.158.161.14
Slack Img Proxy	5	35.164.230.141 34.209.34.65 52.40.159.214 54.201.121.19 52.36.127.29
Google	3	66.249.79.6 66.249.64.70 66.249.73.134
Humans	2
Bing	1	157.55.39.19
Majesty	1	144.76.115.190
SafeDNS	1	188.226.178.29

This project, if I can call it that, has given me some interesting, entry-level analytics on this website (though I do plan to do more later). I didn't enter into this project imagining I would be flooded with emails (otherwise I would never have set it up to send to my email), but I was pleasantly surprised by the passive traffic I picked up. Sure, the NGINX access logs would serve the same purpose, but this little service made it very easy to get fun reminders that my site is on the Internet for reals. Canarytokens are an amusing little toy and they certainly have their use cases in practical security applications. I'm not that hardcore. I just want to be a jerk to people who use Microsoft Office.
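For anyone who would rather go the access-log route, here is a minimal sketch of how a tally like the table above might be pulled out of NGINX logs. It assumes the default "combined" log format. The substrings used to recognize each crawler are my assumptions about how those bots identify themselves in the User-Agent header, and the path `/resume.doc` is hypothetical.

```python
import re
from collections import Counter

# Matches NGINX's default "combined" log format (assumed here).
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Assumed User-Agent substrings, mapped to the labels used in the table.
KNOWN_AGENTS = {
    "baiduspider": "Baidu",
    "yandex": "Yandex",
    "slack-imgproxy": "Slack Img Proxy",
    "googlebot": "Google",
    "bingbot": "Bing",
}


def classify(agent):
    """Map a raw User-Agent string to a crawler label, or 'Other'."""
    lowered = agent.lower()
    for needle, label in KNOWN_AGENTS.items():
        if needle in lowered:
            return label
    return "Other"


def tally(lines, path):
    """Count requests for `path` per agent label, skipping unparsable lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("path") == path:
            counts[classify(match.group("agent"))] += 1
    return counts
```

Feeding it an open log file, for example `tally(open("/var/log/nginx/access.log"), "/resume.doc").most_common()`, would give the counts in descending order. Anything that looks like a real browser lands in "Other", so the humans still have to be picked out by hand.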