Clouds to Trenches

From PrgmrWiki

Cloud computing

See our previous talk: http://book.xen.prgmr.com/mediawiki/index.php/Xen:_a_view_from_the_trenches

Quantity has a quality all its own

There is nothing new under the sun; everything that could be called 'cloud computing' is really just an evolutionary advance on older technology: bigger, stronger, and faster.


When someone speaks of 'cloud computing' they can mean several quite different things:

'application level cloud'

Google App Engine is an example of this type of cloud, as are Engine Yard and Heroku. This is quite similar in concept to old-style shared hosting: you upload an application in a particular format (usually a scripting language; in the case of old-style shared hosting, PHP), and the hosting company bills you based on usage. (Billing used to be based on total bandwidth usage, but a 'per hit' charge is becoming more common.)

You really do outsource your sysadmin here. Just as with the old-style shared hosting platforms, don't do anything unusual and you should be fine. The new 'per hit' billing accounts better for CPU and I/O time, so these new systems tend to perform better than the (often massively oversubscribed) older shared-hosting setups.


fast provisioning

Then there is the 'cloud' as envisioned by Amazon EC2, sometimes referred to as 'utility computing' or 'grid computing'. This is quite a lot like an old-school pxeboot-with-SystemImager setup, in that you can quickly spin up a new server. Amazon does save you (mostly) from mucking with hardware, and they do have a pretty nice programmatic interface to spin up and shut down nodes, but I think the biggest breakthrough is the 'rent servers by the hour' concept, which is really useful for some things (but not so useful for others).


Both are super expensive compared to owning your own hardware.
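Whether renting by the hour is "useful" or "super expensive" mostly comes down to utilization. A rough sketch of the break-even arithmetic follows. Every price in it is a made-up placeholder (not a quote from Amazon or anyone else), and the function names are ours, purely for illustration.

```python
# Rough break-even sketch: renting by the hour vs. owning a server.
# Every number below is a hypothetical placeholder, not a real price.

HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)


def monthly_rental_cost(hourly_rate, hours_used):
    """Cost of renting a server only for the hours actually used."""
    return hourly_rate * hours_used


def break_even_hours(hourly_rate, owned_monthly_cost):
    """Hours per month above which owning is cheaper than renting."""
    return owned_monthly_cost / hourly_rate


if __name__ == "__main__":
    hourly_rate = 0.10          # hypothetical rental price, $/hour
    owned_monthly_cost = 30.00  # hypothetical amortized hardware + colo, $/month

    threshold = break_even_hours(hourly_rate, owned_monthly_cost)
    print(f"owning wins above {threshold:.0f} hours/month")
    for hours in (50, HOURS_PER_MONTH):
        rent = monthly_rental_cost(hourly_rate, hours)
        print(f"{hours:4d} h rented: ${rent:.2f} vs ${owned_monthly_cost:.2f} owned")
```

A job that runs a few dozen hours a month sits well below the threshold, which is where hourly rental shines; a server that is busy around the clock sits well above it.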

real implementation underpinnings

fast provisioning

pay as you go billing

Traditionally, buying compute resources required committing for long periods of time. This was in part due to 'slow provisioning'.

quote amazon paper here

Some terminology and notes

   we're talking about xen because it's what we know -- other
           virtualization products generally have similar features.
   paravirtualization
   domain
   distinction between sysadmins and programmers
   so why do you care?

Current situation

   very little automation
   disjoint billing and provisioning systems
   no provision for migration
       each customer has a fixed allocation on a particular server
   manual resource controls
   (still a useful vps service, but not "cloud")
   dedicated/vps/cloud
       dedicated servers can be just as cloud as amazon
       (apply the same model to hardware.)

Addressing each point

   automation
       better scripting would solve this
           but we also want a self-service api
           need to have machines communicate available resources
       ip address allocation
   link billing and provisioning
       enables utility pricing
       also link billing to resource usage (disk, net)
   hardware dependence
       simple: each machine has a console whose 'location' can be
               updated through dns
       migration is harder
           need local storage for speed, reliability, cheapness
               fortunately xen offers migration hooks that can freeze
                       the domain, migrate storage, and then move the domain.
               so, we do that
               cite oracle paper
               talk about opensolaris
           don't forget to update dns, notify billing machine.
   resource controls
       currently manual
           "Luke is a very good sysadmin, but that doesn't scale."
       cpu
           <bits from previous talk>
       memory
           fixed allocation -- not our problem
           balloon driver
       need for monitoring daemons that can automatically adjust
       overview of linux net qos
           <insert bits from old talk>
           "You can tell we've been learning the business as we go
                   along"
               free month debacle
       overview of disk qos
           ionice
           map between dom0 processes and domU
           only addresses priorities -- arrange like cpu use.
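The "map between dom0 processes and domU" step for disk qos could look something like the sketch below. It assumes the dom0 block backend threads are named `blkback.<domid>.<device>` (true of the Xen kernels we have used, but check yours), and that the I/O scheduler in dom0 honors ionice priorities. The helper names and the priority policy are hypothetical. The sketch only builds the `ionice` command lines; it does not run them.

```python
import re

# Assumed dom0 thread naming: "blkback.<domid>.<device>", e.g. "blkback.5.xvda".
BLKBACK_RE = re.compile(r"^\[?blkback\.(\d+)\.([\w-]+)\]?$")


def parse_blkback_threads(ps_output):
    """Map each blkback thread to its domU.

    Expects lines of "<pid> <command name>", as printed by `ps -eo pid,comm`.
    Returns a list of (pid, domid, device) tuples.
    """
    threads = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) != 2 or not fields[0].isdigit():
            continue  # header line or unrelated format
        match = BLKBACK_RE.match(fields[1])
        if match:
            threads.append((int(fields[0]), int(match.group(1)), match.group(2)))
    return threads


def ionice_commands(threads, priority_by_domid, default_priority=4):
    """Build one ionice command per thread (best-effort class, levels 0-7)."""
    commands = []
    for pid, domid, _device in threads:
        level = priority_by_domid.get(domid, default_priority)
        commands.append(["ionice", "-c", "2", "-n", str(level), "-p", str(pid)])
    return commands


if __name__ == "__main__":
    sample = """\
  PID COMMAND
    1 init
 4242 blkback.5.xvda
 4243 blkback.7.xvda
"""
    threads = parse_blkback_threads(sample)
    # Hypothetical policy: domain 7 is a heavy disk user, so it gets the lowest priority.
    for cmd in ionice_commands(threads, {7: 7}):
        print(" ".join(cmd))
```

As the outline says, this only addresses priorities: a low-priority domain still gets the whole disk when nobody else wants it. A monitoring daemon could feed `priority_by_domid` from observed usage, much as with cpu weights.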

pieces of prgmr.com api

   "the more centralized you make the system, the harder you need to
           fortify it."
   two separate machines
       reduces odds of compromise
       successful attack against either won't result in immediate
               disruption
           (still a serious problem, of course.)
       machine a handles kernel/image selection
       machine b handles rebooter
   auth via x.509 certs
   avoid handling credit cards, etc. at all costs.
       dreamhost fiasco
   avoid passwords -- too insecure.
       (most users pick bad passwords or reuse only a few.)
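As an illustration of "auth via x.509 certs" instead of passwords, here is a minimal sketch using Python's standard `ssl` module. It is not the prgmr.com implementation: the function names, the optional file-path parameters, and the example certificate name are all assumptions made for the example.

```python
import ssl


def make_server_context(ca_file=None, cert_file=None, key_file=None):
    """TLS server context that refuses clients without a valid certificate."""
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    context.verify_mode = ssl.CERT_REQUIRED  # no cert, no connection
    if ca_file:
        context.load_verify_locations(cafile=ca_file)  # CA that signs customer certs
    if cert_file:
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


def common_name(peer_cert):
    """Extract the subject commonName from the dict returned by SSLSocket.getpeercert()."""
    for rdn in peer_cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None


if __name__ == "__main__":
    context = make_server_context()  # file paths omitted in this sketch
    print(context.verify_mode == ssl.CERT_REQUIRED)

    # Shape of a verified peer certificate as Python reports it; the name is made up.
    example_cert = {"subject": ((("commonName", "customer-example"),),)}
    print(common_name(example_cert))
```

With this arrangement the api machines never store a password at all; the certificate's subject identifies the customer, and revoking access means revoking a cert.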

conclusion -- our plans

   drive cost of computing down to minimum
   xenoservers -- machines join the network, paid for use
       "See, in the winter, you can turn on that old space heater
               you've got lying around, and get something back for it."
           (hey, i have a sun e4500 in my kitchen.)
       highly speculative -- vps business is okay too.



wins of virtualization

http://wiki.oracle.com/page/Oracle+VM+Live+Storage+Migration

"we are talking about cloud computing. some vapor is obligatory."