
Go Crawl Yourself



First off, happy new year, blah, blah, blah, [insert navel-gazing year-in-review/big-plans-for-2008 post here]. There, that's done.

At the moment I'm working on some post-relaunch tweaks for Gothic BC and was looking at the logs. Some of the traffic is from people already scouring the site for last night's pictures, who apparently think I get up at 6:00 a.m. after getting in from the club at 5:00 a.m. and magically download and catalogue 400 photos off my camera, weed through them for the 150 or so worth posting, process them, and have them posted on the site by 6:15 a.m. (Reality: I got up about an hour ago and the camera is still in the bag.) In between those, there are a gazillion hits from someone crawling the site for video content with a VEOH client.

This is uncool. First off, there is no video content on Gothic BC, nor will there be for some time to come, if ever. Secondly, as I understand it, VEOH is effectively a P2P service, which means I could potentially have dozens of these clients crawling the site. Given my limited bandwidth, that effectively amounts to a DDoS attack. No thanks.

Apache rewrite module to the rescue:

RewriteCond %{HTTP_USER_AGENT} veoh [NC]

RewriteRule .* http://www.veoh.com/ [R=302,L]

The first line checks whether the program making the request has the string "veoh" (in any mix of upper and lower case) in its identifying "user agent" string. If it does, the second line is applied.

The second line says "whatever you are looking for, it's really at veoh.com" so the request is redirected to VEOH's own website, chewing up their bandwidth, not mine.

Of course, the "nice" thing to do would be to issue a "403 Forbidden" response and just let the request die, but what VEOH is doing isn't nice. They are using a distributed network to download video off other people's sites, thereby decontextualising it from authors, who in the case of video-bloggers may be relying on the page context to serve up their advertising or other content that represents their income stream. VEOH then presents the content on their own page, with their own advertising, and makes their own money that the authors of the video content never see one red cent of. They deserve to be hoist by their own petard.
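For comparison, the "nice" variant would look roughly like the sketch below. It assumes mod_rewrite is loaded and that the rules live in the site's .htaccess or virtual host configuration; the `RewriteEngine On` line is an assumption about the surrounding setup and is only needed if it isn't already set elsewhere.

```apache
# Assumption: mod_rewrite is available and not yet switched on in this context.
RewriteEngine On

# Match any user agent containing "veoh", ignoring case.
RewriteCond %{HTTP_USER_AGENT} veoh [NC]

# "-" means "no substitution"; the F flag answers with 403 Forbidden
# and stops rule processing, so no L flag is needed.
RewriteRule .* - [F]
```

With the F flag Apache ignores any substitution URL, so a rule meant to send the client elsewhere needs the R flag instead.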

Original post: http://mbarrick.livejournal.com/837893.html