Locklin on science

A quick middle finger at “the cloud”

Posted in tools by Scott Locklin on April 22, 2011

Ah, the cloud: wonder of modern wonders! Everyone wants to be a drip on the cloud; it’s hip, it’s happening, it’s what all the successful companies use!

I, unfortunately, have to use the cloud for some of my consulting work, so I know better. First, let me get “what is cool about the cloud” out of the way.

  1. It’s easy to set up a cloud instance. Give them your credit card, and you’re all set up with basic Ubuntu and a domain name.
  2. I can use emacs and R on the cloud. Tramp and ESS are a great example of why emacs is awesome. The fact that I can use them on the cloud is, well, it’s OK.

That’s it; that is everything good about the cloud. Now for the bad.

  1. It’s not reliable. That’s what has brought on this little rant. The EC2 is down. Again. This isn’t really costing me money: I have more work than I know what to do with, but it’s costing my client time and money.
  2. There are no cloud distributions. I mean, how hard is it to install something like a complete toolchain on “duh, cloud?” Is there anyone on the cloud who doesn’t use a database? Why do I have to maintain a distribution on my cloud instances? Why can’t I select a button which gives me a cloud instance with a database and some programming languages? Why does my cloud instance come with GNOME instead of useful stuff like a compiler and MySQL? This is just lazy retardedness.
  3. It’s bloody slow. Default instances are generally 32-bit Ubuntu, which I thought went out with listening to Green Day and goth chicks who aren’t beastly heifers. Beyond that, most of ‘em don’t allow you to access your own disk drive, so anything involving a file write is abysmal (yes, I could write to /var; why not just make that /home by default?). I benchmarked dumping some PNG files on the cloud; my cloud instance (which claims to be a multicore i3 class machine) is twice as slow as my Intel Atom netbook. My $600 headless Linux box with a crappy hard drive and a slower clock rate is almost 60 times faster on this task. I haven’t been benchmarking important crap like MySQL writes, but I’m guessing they suck too.
  4. It’s fucking expensive. Sure, if you have a website selling ladies underpants on der interwebs, you can get by on a cheap instance. What if you require computational horsepower? Isn’t that why the cloud is supposed to be awesome? Infinite computing power? If I had to do fast PNG writes on the cloud, I’d need 60 instances to get to where my crappy home box is at. And even then, I’m at the mercy of other people writing across the local network my instance lives on. If I needed to go real fast, and use like, 64 bits and more than 2 gigs of memory (you know, for, like, memory based machine learning, which is kind of my main trick): it costs one of my shitty servers every month and a half of use. If I need more memory in my home gizmo, well, DDR3 is cheap. If I need more on an EC2 instance, I’m screwed. Oh sure, I can do crap like use Mahout and Hadoop on the cloud. So what? Those are only cost effective if you own a freaking cloud of your own. If someone out there is paying full price on the cloud and using these tools, I’d like to hear about why you are doing this.
  5. It’s not different from a regular computer. There is nothing special about the cloud that you can’t do with a server sitting under someone’s desk. It’s the same thing, except much lower performance, and you’re renting it. If your users don’t observe security protocols, it isn’t any safer than the box under your desk. In fact, I’m willing to bet it’s a lot less secure, because any old leet h6x0r can open up a cloud instance using stolen credit cards and spy on your traffic. There is no technological advantage to using the cloud over the under-the-desk unix box, at all. Yes, yes, Google and Amazon use clouds: they own their own freaking clouds. That means, they own a bunch of servers sitting under their proverbial desks. And they get you to pay for the processing power they’re not using off peak hours. Suckas. Go ahead, believe the hype about distributed teams or whatever: that would work with a computer under your desk just as well as it does with “duh cloud.”
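
For anyone who wants to repeat the file-write comparison from item 3 on their own boxes, here is a minimal sketch in Python. It is not the benchmark behind the numbers above; the function names (`make_png`, `time_png_writes`), the image size, and the file count are all made up for illustration. It encodes one small grayscale PNG with the standard library, then times how long it takes to write that payload to disk repeatedly, forcing each write out with `fsync` so the disk, not the page cache, is what gets measured.

```python
import os
import struct
import tempfile
import time
import zlib


def png_chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type+data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def make_png(width: int, height: int) -> bytes:
    """Encode a simple 8-bit grayscale gradient as a valid PNG (hypothetical test image)."""
    rows = bytearray()
    for y in range(height):
        rows.append(0)  # filter type 0 (none) for this scanline
        rows.extend((x + y) % 256 for x in range(width))
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter 0, interlace 0
    header = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (
        b"\x89PNG\r\n\x1a\n"
        + png_chunk(b"IHDR", header)
        + png_chunk(b"IDAT", zlib.compress(bytes(rows)))
        + png_chunk(b"IEND", b"")
    )


def time_png_writes(directory: str, count: int = 50, size: int = 256) -> float:
    """Write `count` PNG files into `directory`, fsync each, return elapsed seconds."""
    payload = make_png(size, size)  # encoded once, so only the writes are timed
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"bench_{i:04d}.png")
        with open(path, "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())
    return time.perf_counter() - start


if __name__ == "__main__":
    # Assumption: a temp directory stands in for wherever your real output goes.
    with tempfile.TemporaryDirectory() as tmp:
        elapsed = time_png_writes(tmp)
        print(f"50 PNG writes took {elapsed:.3f} s")
```

Run the same script on the cloud instance and on the box under your desk, pointing it at the directory you would actually write to, and compare the two timings. Because the image is encoded once up front, CPU differences mostly drop out and the result reflects storage speed.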

If you’re selling ladies underpants on the interwebs, by all means, use the cloud; it’s probably a decent trade off. Have fun with your downtime, and pat yourself on the back that you didn’t have to hire an annoying sysadmin. Instead, you’re forcing your developers to do the job of pointy-headed sysadmins. If you’re into heavy metal like me, you are deranged to even contemplate using this technology. I remember some muppet on LinkedIn pimping some moronic white papers on putting trading algos on the cloud, presumably hoping to gravy in on the HFT hype train. I can’t think of anything more stupid than this, but let me try anyway. Hey, we can put factory robots on the cloud as well! We has the technology! Then … you can check into your factory robot cluster using … an iPad and your custom LAMP monitoring software! Wouldn’t that be nerdtastic! Why not put pacemakers and automotive computers on the cloud too while we’re at it? After all, Google is doing the cloud thing, it must be awesome and futurific!

The cloud is for people who believe everything farted out by Silly Con valley marketing departments is rainbows and bunny rabbits. Excepting for a few very narrow use cases, it’s mostly retarded, and if you use it blindly, you are the drippy victim of marketing departments.

John Mount pegged this years ago; respect, yo:

http://www.win-vector.com/blog/2009/08/on-the-hysteria-over-the-cloud/

7 Responses

  1. Andreas Yankopolus said, on April 22, 2011 at 1:06 am

    Seems like the issue is that Amazon’s implementation of cloud computing sucks, not necessarily the idea itself. Maybe the technology isn’t quite ready for prime time?

    • Scott Locklin said, on April 22, 2011 at 2:50 am

      I’ve heard of some cheaper, less known clouds which run faster servers, but I don’t see how it could be any better. The technology is designed to run big memory jobs on Hadoop on a privately owned farm; for that, I’m sure it’s pretty good. Hell, I’ve seen implementations of code like this which were way neat. But … if you have to pay for your instances: fuck that dumbness!
      As I said though, it’s OK to sling some early code on a project if you don’t have any infrastructure, but the longer you stay on it, the more drippy it becomes.

  2. PJ said, on April 22, 2011 at 3:46 am

    Re: maintain a distro for the cloud – I think the chef and puppet guys are aiming for something like this: describe what you want and push a button and they fire it up in the cloud.

    Re: expensive – sort of. But some accountants prefer monthly fees over capex. I’m not an accountant, so I don’t pretend to understand why that might be so.

    Re: reliable – sort of. One key here is that if you’re doing your devops correctly, you don’t care if any one machine goes down because you can just fire up another instance configured that way. You *do* care if there’s some sort of systemic outage, but even with that, once it’s back, you’re up and running again, not caring exactly what hardware failed.

    Other than that, you’re pretty much on the money, though I think you left out one of the key (IMO) ‘cool about the cloud’ : flexibility. I can use 80 cpus for 6 hours once per quarter (to crunch logs into invoices, perhaps) and pay less than buying 80 machines that sit around most of the time.
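
    The flexibility argument in the comment above comes down to simple arithmetic, which can be sketched roughly as follows. The workload (80 CPUs, 6 hours, once per quarter) is from the comment; every price in the code is a placeholder chosen for illustration, not a real quote, and the function names are hypothetical. The sketch also ignores power, colo fees, and admin time on the ownership side, and any data-transfer or storage charges on the rental side.

```python
def rental_cost(machines: int, hours_per_run: float, runs_per_year: int,
                hourly_rate: float) -> float:
    """Yearly cost of renting `machines` instances only while the job runs."""
    return machines * hours_per_run * runs_per_year * hourly_rate


def ownership_cost(machines: int, price_per_machine: float,
                   lifetime_years: float) -> float:
    """Yearly cost of buying the machines, spread evenly over their lifetime."""
    return machines * price_per_machine / lifetime_years


if __name__ == "__main__":
    # Workload from the comment: 80 CPUs for 6 hours, once per quarter.
    # All prices below are hypothetical placeholders.
    HOURLY_RATE = 0.50          # assumed dollars per instance-hour
    PRICE_PER_MACHINE = 600.0   # assumed; the post mentions a $600 box
    LIFETIME_YEARS = 3.0        # assumed useful life of the hardware

    rent = rental_cost(80, 6, 4, HOURLY_RATE)
    own = ownership_cost(80, PRICE_PER_MACHINE, LIFETIME_YEARS)
    print(f"rent: ${rent:,.2f}/year, own: ${own:,.2f}/year")
```

    Plug in your own rates and duty cycle. The more hours per year the machines are actually busy, the faster the rental figure catches up with the ownership figure, which is the point made in item 4 of the post.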

    • Scott Locklin said, on April 22, 2011 at 4:04 am

      Accountants only prefer this because they’re lazy shits who prefer eating long lunches to doing useful work (I’m mad at one right now for not paying his bills on time, so, I’m biased). Amortization don’t take that long in a spreadsheet, but it is a lot harder than just subtracting a number every month, which is something a monkey could do.

      The reliability thing is pretty major. I’ve been screwed by the EC2 several times now. Presently, a lot of companies are realizing this. I think foursquare and some other decently sized joints have been off line all day because some dork in Virginia spilled his coffee on a router.

      You’re absolutely right about the flexibility thing. However, the number of people this applies to is probably very small. I may some day run my junk on the cloud if I can’t wait, or I don’t feel like driving to Fry’s to buy another server, but the probability of that happening is very low. I’m guessing there is hidden overhead here, mostly because there always is, and I haven’t tried it yet. I also develop in a language which has blowers and sidepipes: I can go real fast on virtually any bit of it which might be bogging me down by fooling around in the profiler a bit. Spending a few hours doing that will pay dividends over years, rather than just giving me a temporary speed up.

      Thanks for a thoughtful comment!

  3. Petro said, on April 22, 2011 at 12:58 pm

    Every vendor has a cloud solution, and their definition of “cloud computing” looks a LOT like their solution. IBM, VMware and Cray will all queue up to sell you shit for cloud computing and it will look like GM, Trek and Bugatti all lined up to sell you “personal transportation”.

    “The Cloud” is different things to different marketing weasels.

    I am, for various reasons (including I can get paid decently for it) a fan of virtualization for some workloads.

    I am also a fan (though I haven’t gotten paid for it) of HPC clusters like you’d use for Hadoop.

    Both of these are “clouds”. “Clouds” are when the person who wants cycles doesn’t have control over the hardware (that’s a little vague because I don’t want to write a fucking whitepaper. Or maybe I should).

    Anyway, cloud instances–at least external–shouldn’t have entire tool chains on them, if by tool chain you mean development stuff. Now, if by tool-chain you mean application frameworks and databases then maybe. But a cloud instance is something you deploy TO, not from.

    Of course, if all you need is a MySQL instance and some web frameworks, virtual domains are the old-school solution. They are, of course, still susceptible to a junior network engineer with a can of Red Bull.

    The big problem is that reliable hardware is fucking expensive. You can get 8 cores/8 gig of ram and two terabytes of disk in 1 rack unit, run xen on it and sell 7 VMs on each node for some reasonable amount, but you can’t get past the cost, and that’s NOT a reliable environment.

    Adding some shitty tape robot to that is going to force you to double what you charge.

    You add a SAN to that–even some shit ass iSCSI thing–and you’re going to triple your costs.

    You’re better off (if you can afford it) finding an off-lease 1u rackmount and sticking it in a rack down at the HE colo. Run either VMware Server, ESX-i or Xen on it so you can do hardwareish stuff (like completely fuck up your install and have to re-build from scratch) without leaving your house, and take a *slight* performance hit.

    Basically build your own small cloud :)

    I know some dudes who are looking at deploying to EC2, but IIRC they were keeping their crown jewels (the databases) in their own colo.

    • John Flanagan said, on April 22, 2011 at 6:40 pm

      Sup Petro! Old school chigoths represent!

      Hmm, virtualizing a colo-racked 1U is a pretty neat idea for the small shop. For the likes of our crowd it’s a way better solution than cloud crap, but even the fairly minimal sophistication required to do it is beyond the capability of the cloud target market.

      Us pro HFT folks install our own goddamn servers in the colos, thank you very much. Generally racks and racks of them. If you’ve got the capital to play the game in the first place, then the hardware and colo lease costs are not even remotely a consideration.

      • Petro said, on April 24, 2011 at 12:44 pm

        Dude, it IS cloud crap. It’s what you would call a “private cloud”. A REALLY SMALL private cloud.

        From sometime in 2007 or 2008 until about 2 months ago mail.bounty.org was a VM running on a quad-core 1U with two 500 GiB hard drives in it. The host OS was CentOS, and it was running between 2 and 5 other VMs. My instance was running the Zimbra mail server which uses a lot of java crap, so it used about all the memory available to it.

        BTW, I upgraded and bounced it about the time I got to Iraq and didn’t touch it AT ALL for about 9-10 months. Not a reboot, not a patch, nothing. The guys who owned it didn’t touch it either (I was their SA for about a year, and a wide area 802.11 network feeding into a sat hop + cross country made for nearly 2 seconds of latency during sand storms).

        I was also wondering if Amazon tiers their EC2 stuff–if you’re a pleb paying marginal rent your VM (these are xen instances) gets shoved off on the ghetto servers, if you’re a company paying full load you get decent response and if you’re a big player offloading some processing you get first class hardware, software and tuning.

        Now, I’m not *accusing* anyone of anything here, just wondering.

        You can get 3-4 year old off-lease hardware stupid cheap, it just depends on what you want it for.

