<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>
<channel>
	<title>My AWS Musings</title>
	<atom:link href="http://aws-musings.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://aws-musings.com</link>
	<description>Cloud computing, EC2, RDS, SQS, S3, Java...</description>
	<lastBuildDate>Mon, 30 May 2011 18:13:33 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>HBase on EC2 using EBS volumes : Lessons Learned</title>
		<link>http://aws-musings.com/hbase-on-ec2-using-ebs-volumes-lessons-learned/</link>
		<comments>http://aws-musings.com/hbase-on-ec2-using-ebs-volumes-lessons-learned/#comments</comments>
		<pubDate>Mon, 30 May 2011 18:12:33 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[EBS]]></category>
		<category><![CDATA[EC2]]></category>
		<guid isPermaLink="false">http://aws-musings.com/?p=426</guid>
		<description><![CDATA[We started using HBase on EC2 sometime in 2009. Our data was important, so we wanted a way to restore it. We attached EBS volumes to our HBase nodes and configured the HBase and Hadoop installations to store all their data on those volumes. Then came the concept [...]]]></description>
			<content:encoded><![CDATA[<p>We started using HBase on EC2 sometime in 2009. Our data was important, so we wanted a way to restore it. We attached EBS volumes to our HBase nodes and configured the HBase and Hadoop installations to store all their data on those volumes.</p>
<p>Then came EBS-backed instances. In those days we were still experimenting, and HBase was releasing new versions very frequently. We were already a few versions ahead of our original Hadoop and HBase AMI, and we were also tuning our HBase/Hadoop cluster. Documenting every change after the fact, or creating a new image every time we changed something, was very cumbersome. We reasoned that if we converted our nodes to EBS-backed instances, we wouldn&#8217;t have to do either. We would simply take a snapshot of the root device and restore it in case the volume failed.</p>
<p>This worked well for a few months. Then one day it suddenly stopped working.</p>
<p>There are many ways to restore EBS-backed instances from their snapshots. Here are all the ways I knew:<br />
1) Register the snapshot as an AMI and start an instance from the image.<br />
2) Create a volume from the snapshot, start a similar EBS-backed instance, stop it, and swap in the new volume as its root device.<br />
3) Create an AMI from a running instance. This reboots the instance immediately, so it wasn&#8217;t an option for us: there was no way we could afford to reboot our master!</p>
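<p>For option 1, the registration call needs the snapshot plus a kernel and, optionally, a ramdisk. The following is a minimal sketch that assumes today&#8217;s boto3 library rather than the tools we used at the time; it only builds the keyword arguments for boto3&#8217;s <code>register_image()</code>. The AMI name, the snapshot, kernel and ramdisk IDs, and the <code>/dev/sda1</code> root device are hypothetical placeholders.</p>

```python
def register_image_kwargs(name, snapshot_id, kernel_id, ramdisk_id=None,
                          root_device="/dev/sda1"):
    """Build keyword arguments for boto3's ec2.register_image() (option 1).

    All IDs and the root device name are placeholders you must replace.
    """
    kwargs = {
        "Name": name,
        "RootDeviceName": root_device,
        # The new AMI's root device is restored from the EBS snapshot.
        "BlockDeviceMappings": [
            {"DeviceName": root_device, "Ebs": {"SnapshotId": snapshot_id}},
        ],
        # A kernel that does not match the snapshot's OS may not boot.
        "KernelId": kernel_id,
    }
    # Only pass a ramdisk when you actually know which one the instance used.
    if ramdisk_id is not None:
        kwargs["RamdiskId"] = ramdisk_id
    return kwargs
```

<p>With credentials configured, the result would be passed on as <code>boto3.client("ec2").register_image(**kwargs)</code>. The ramdisk is optional on purpose: you may simply not know which one the instance used.</p>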
<p>You need to know the kernel and ramdisk IDs to use option 1 or 2. You may think that&#8217;s a no-brainer: just use the metadata query tool to find the kernel and ramdisk of the running instance. But not every instance has that metadata available! Our instances had no ramdisk metadata. When we contacted Amazon support, they told us that the instance was very old and there was simply no way to know which ramdisk it was using. That means you have to choose a ramdisk yourself. If the kernel or ramdisk you use to create the AMI from the snapshot is not compatible, your instance will not boot correctly. This is especially true of Ubuntu images.</p>
<p>That&#8217;s what happened to us. It stopped working because, somehow, the kernel files were not available. Even though the ramdisk information was missing, it was the kernel that caused the problem. Here is what Amazon support had to say about it:</p>
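<p>Checking which of these fields an instance actually exposes is easy to script. Here is a minimal sketch using only the Python standard library. It assumes the metadata service answers a missing field with HTTP 404 and that the instance accepts plain (IMDSv1-style) requests without a session token; <code>fetch_metadata</code> and <code>boot_ids</code> are hypothetical helper names.</p>

```python
import urllib.error
import urllib.request

METADATA_URL = "http://169.254.169.254/latest/meta-data/"


def fetch_metadata(field, timeout=2.0):
    """Return one metadata field as text, or None if the instance lacks it."""
    url = METADATA_URL + field
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8").strip()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # assumed: a missing field is reported as 404
            return None
        raise


def boot_ids(fetch=fetch_metadata):
    """Return (kernel_id, ramdisk_id); either may be None when not exposed.

    `fetch` is injectable so the logic can be exercised off EC2.
    """
    return fetch("kernel-id"), fetch("ramdisk-id")
```

<p>A <code>None</code> ramdisk puts you in the situation we were in: you have to pick a ramdisk yourself.</p>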
<p>&#8220;Your practice of taking snapshots and starting instances from those machines can work, as it has in the past, but will always be susceptible to kernel/ramdisk mismatches.&#8221;</p>
<p>&#8220;Our standard practice of creating an image (AMI) from a running instance (option 3 as described above) and launching instances from that AMI would avoid the problem you&#8217;re seeing with the mismatched/incompatible kernels.&#8221;</p>
<p>When we told Amazon that this was not an option for us because it reboots the instance immediately, here is what they suggested:</p>
<p>&#8220;Have you considered writing data to an EBS volume that is separate from your root EBS volume?  I&#8217;m just wondering if that&#8217;s a viable option as it wouldn&#8217;t require stopping or rebooting the instance.&#8221;</p>
<p>There lies the answer! We need to be able to recreate the cluster in case we accidentally delete all of our data or lose our master. A reliable backup can only be taken if the HDFS data does not reside on the root devices, because a reliable backup of a root device cannot be taken without rebooting the instance. Furthermore, such a backup is stored as an AMI, which means creating a new AMI every day and deleting the old one. To solve all of our problems, we need both the HBase installation and its data stored on attached EBS volumes that are not root devices.</p>
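<p>With that layout, the nightly backup only needs the non-root volumes. Here is a minimal sketch of picking them out, assuming the instance description comes from boto3&#8217;s <code>describe_instances()</code> (its <code>RootDeviceName</code> and <code>BlockDeviceMappings</code> keys); <code>data_volume_ids</code> is a hypothetical helper, not part of our scripts.</p>

```python
def data_volume_ids(instance):
    """Return the EBS volume IDs attached to an instance, excluding its root.

    `instance` is one entry of the "Instances" list that boto3's
    ec2.describe_instances() returns.
    """
    root = instance.get("RootDeviceName")
    return [
        mapping["Ebs"]["VolumeId"]
        for mapping in instance.get("BlockDeviceMappings", [])
        # Skip the root device and any non-EBS mappings.
        if "Ebs" in mapping and mapping["DeviceName"] != root
    ]
```

<p>Each returned ID could then be snapshotted with boto3&#8217;s <code>create_snapshot(VolumeId=...)</code>, which does not require stopping or rebooting the instance.</p>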
<p>It was news to us. </p>
<p>We had no choice. We decided to invest the time to convert our architecture to use attached EBS volumes, rather than wake up in the middle of the night and realize that we could not restore our backup!</p>
]]></content:encoded>
			<wfw:commentRss>http://aws-musings.com/hbase-on-ec2-using-ebs-volumes-lessons-learned/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Manage EBS snapshots with a python script</title>
		<link>http://aws-musings.com/manage-ebs-snapshots-with-a-python-script/</link>
		<comments>http://aws-musings.com/manage-ebs-snapshots-with-a-python-script/#comments</comments>
		<pubDate>Sat, 29 Jan 2011 19:43:43 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[EBS]]></category>
		<guid isPermaLink="false">http://aws-musings.com/?p=406</guid>
		<description><![CDATA[I was looking for a simple script that creates a new ebs snapshot and deletes all the previous snapshots except a few newest snapshots. I found a script written in php called manage snapshots at http://www.thecloudsaga.com/aws-ec2-manage-snapshots/. But the script only deletes snapshots. It does not create a new snapshot. That is why I decided to [...]]]></description>
			<content:encoded><![CDATA[<p>I was looking for a simple script that creates a new <a href="http://aws.amazon.com/ebs/">EBS snapshot</a> and deletes all previous snapshots except the few newest ones. I found a PHP script called manage snapshots at <a href="http://www.thecloudsaga.com/aws-ec2-manage-snapshots/">http://www.thecloudsaga.com/aws-ec2-manage-snapshots/</a>, but it only deletes snapshots; it does not create new ones. That is why I decided to write my own script.<br />
<span id="more-406"></span><br />
In this post I am going to describe how to use my script. You can download my script from <a href='http://aws-musings.com/wp-content/uploads/2011/01/manage_snapshots1.txt'>manage_snapshots.txt</a></p>
<p>It’s a <a href="http://www.python.org/">Python script</a>, so change the file extension to .py. Ubuntu comes with Python installed. You will also need the <a href="http://boto.cloudhackers.com/">AWS Python library boto</a>. To install boto on Ubuntu, simply execute<br />
<code><br />
sudo apt-get install python-boto</code></p>
<p>You will need to edit the following lines with your AWS credentials: replace MY_ACCESS_KEY_HERE with your access key and MY_SECRET_KEY_HERE with your secret key.<br />
<code><br />
# Substitute your access key and secret key here<br />
aws_access_key = 'MY_ACCESS_KEY_HERE'<br />
aws_secret_key = 'MY_SECRET_KEY_HERE'</code></p>
<p><strong>Here is how you can use the script:</strong><br />
The script takes three arguments:<br />
1) volume-id (required) &#8211; the Amazon EBS volume ID.<br />
2) number of snapshots to preserve (required) &#8211; an integer. If you specify 2, the script keeps the 2 newest snapshots (including the one it just created). If you specify 0, it deletes all the snapshots (including the one it just created).<br />
3) description (optional) &#8211; the description to use for the new snapshot.</p>
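<p>The argument handling described above could look like the following sketch. It is an illustration, not the code in manage_snapshots.txt; it assumes Python 3&#8217;s standard <code>argparse</code> module, and the helper names and the empty default description are hypothetical choices.</p>

```python
import argparse


def non_negative_int(text):
    """argparse type: accept 0, 1, 2, ... and reject anything else."""
    value = int(text)  # a ValueError here becomes a normal argparse error
    if value >= 0:
        return value
    raise argparse.ArgumentTypeError("%r is not a non-negative integer" % text)


def parse_args(argv=None):
    """Parse: volume-id, number of snapshots to keep, optional description."""
    parser = argparse.ArgumentParser(
        description="Create an EBS snapshot and prune older snapshots.")
    parser.add_argument("volume_id", help="EBS volume ID, e.g. vol-dkls343e")
    parser.add_argument("keep", type=non_negative_int,
                        help="number of newest snapshots to keep (0 keeps none)")
    parser.add_argument("description", nargs="?", default="",
                        help="optional description for the new snapshot")
    return parser.parse_args(argv)
```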
<p>Example:<br />
<code>python manage_snapshots.py vol-dkls343e 2 'log server daily snapshot'</code></p>
<p>This run creates a snapshot of vol-dkls343e and then deletes every snapshot except that one and the one before it, so only the two latest snapshots are kept. You can simply <a href="https://help.ubuntu.com/community/CronHowto">add a cron job</a> for each volume.</p>
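<p>The retention rule itself is small. Here is a sketch of it, assuming snapshot records shaped like the entries boto3&#8217;s <code>describe_snapshots()</code> returns (a <code>SnapshotId</code> and a sortable <code>StartTime</code>). <code>snapshots_to_delete</code> is a hypothetical name, and this is not necessarily how the downloadable script does it.</p>

```python
def snapshots_to_delete(snapshots, keep):
    """Return the snapshots to delete so that only the `keep` newest remain.

    `snapshots` should already include the snapshot just created, so keep=2
    leaves it and the one before it, and keep=0 deletes everything.
    """
    newest_first = sorted(snapshots, key=lambda s: s["StartTime"], reverse=True)
    return newest_first[keep:]
```

<p>Each returned snapshot would then be removed with boto3&#8217;s <code>delete_snapshot(SnapshotId=...)</code>, and the list to prune could come from <code>describe_snapshots(Filters=[{"Name": "volume-id", "Values": [volume_id]}])</code>.</p>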
]]></content:encoded>
			<wfw:commentRss>http://aws-musings.com/manage-ebs-snapshots-with-a-python-script/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Architecting for Cloud</title>
		<link>http://aws-musings.com/architecting-for-cloud/</link>
		<comments>http://aws-musings.com/architecting-for-cloud/#comments</comments>
		<pubDate>Thu, 13 Jan 2011 22:33:42 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[EC2]]></category>
		<guid isPermaLink="false">http://aws-musings.com/?p=376</guid>
		<description><![CDATA[A sunny Sunday morning. I am preparing to go out with my wife. And suddenly a pingdom alert comes to my phone. The website is down! I run to my ubuntu desktop and hurriedly open ylastic, amazon aws console, splunk etc. Console shows nothing unusual. The instance on which the website is running is shown [...]]]></description>
			<content:encoded><![CDATA[<p>A sunny Sunday morning. I am preparing to go out with my wife when suddenly a <a href="http://www.pingdom.com">pingdom</a> alert arrives on my phone. The website is down! I run to my Ubuntu desktop and hurriedly open <a href="http://ylastic.com">ylastic</a>, the Amazon AWS console, <a href="http://splunk.com">splunk</a>, etc. The console shows nothing unusual: the instance running the website is shown as &#8216;up&#8217;. I try to ssh into the instance using its public DNS name. I can&#8217;t get to it! The security groups are in place. So what happened? Why is the instance not accessible?<br />
<span id="more-376"></span><br />
Does it sound familiar? Should we blame Amazon for such occasional failures? Not really. I don&#8217;t blame Amazon for them at all. In my opinion, the moment I signed up for a cloud service, I signed up for this kind of behavior. Cloud technologies are not perfect yet, and this kind of behavior is &#8216;normal&#8217;. Buying premium support will not buy you much (I haven&#8217;t tried it, but I am pretty sure).</p>
<p><em>In my opinion, people who build their systems in the AWS cloud should be prepared for such failures and should design their systems to accommodate them.</em> Here are my suggestions for minimizing the damage they cause:</p>
<h4>1) Architect your system to operate without most of the components</h4>
<p>Some pieces might bring down the entire system, but you may realize that many of them don&#8217;t. For example, we cache all of our configuration data in memory; its permanent home is our RDS database. The cache gives us better performance, and if the database goes down, our app keeps functioning. Some inserts are done in real time, but we have a JMX-based switch that turns them off. That way the app keeps functioning as well as it can without the database. Also note that many systems have pieces that can go down without causing any damage. Identify them and be aware of them, so you can avoid panicking when they go down.<br />
<br/></p>
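<p>The cache-plus-switch idea can be sketched in a few lines. This is a minimal Python illustration, not our production code (where the switch is exposed over JMX); <code>ConfigCache</code>, the <code>load_from_db</code> and <code>write_to_db</code> callables, and the <code>inserts_enabled</code> flag are hypothetical stand-ins.</p>

```python
import threading


class ConfigCache:
    """In-memory copy of configuration data whose permanent home is a database.

    `load_from_db` is a hypothetical callable that returns a dict and raises
    an exception when the database is unreachable.
    """

    def __init__(self, load_from_db):
        self._load = load_from_db
        self._data = {}
        self._lock = threading.Lock()
        # Runtime switch for real-time inserts (a JMX attribute in our app).
        self.inserts_enabled = True

    def refresh(self):
        """Reload from the database; keep the old copy if the database is down."""
        try:
            fresh = self._load()
        except Exception:
            return False  # stale configuration is better than none
        with self._lock:
            self._data = dict(fresh)
        return True

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def record(self, write_to_db, row):
        """Write a row in real time unless the switch has turned inserts off."""
        if not self.inserts_enabled:
            return False
        write_to_db(row)
        return True
```

<p>Serving a stale copy is the deliberate design choice: while the database is down the app keeps answering from memory, and flipping <code>inserts_enabled</code> stops the real-time writes instead of letting them fail.</p>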
<h4>2) Use elastic load balancer for critical services.</h4>
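<p>An elastic load balancer combined with auto scaling helps keep the service behind the load balancer up at all times: the load balancer routes traffic only to instances that pass its health checks, and auto scaling replaces instances that fail.<br />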
<br/></p>
<h4>3) Use EBS volumes wherever data on the disk is important.</h4>
<p>Use EBS-backed images wherever possible. It&#8217;s very easy to back up EBS volumes (with snapshots) and very easy to boot a new instance from a backup. Furthermore, in many cases we have found that simply rebooting an unreachable instance is enough to bring it back. If a reboot doesn&#8217;t help, an EBS-backed instance can also be stopped and started again without losing its disk data, whereas an S3-backed (instance-store) instance cannot be stopped at all, and terminating it loses its disk data.<br />
<br/></p>
<h4>4) Duplicate your data (unless it&#8217;s in S3) on multiple machines. </h4>
<p>Use distributed file systems. We use HBase, which stores its data in Hadoop&#8217;s HDFS, to make sure that our data is distributed and replicated.<br />
<br/></p>
<h4>5) Create Snapshots often</h4>
<p>Do not hesitate to take snapshots every hour if you need to. <a href="http://aws.amazon.com/ebs/">Snapshots are incremental backups</a>: each new snapshot stores only the blocks that have changed since the previous one, which saves space (and cost).<br />
<br/></p>
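<p>To see why frequent snapshots stay cheap, here is a toy model of an incremental backup: store a full copy once, then only the blocks that differ from the previous version. It illustrates the idea only and says nothing about how EBS implements snapshots internally; the block lists and function names are hypothetical.</p>

```python
def changed_blocks(previous, current):
    """Return {block_index: new_data} for blocks that differ between versions.

    Both versions have the same number of blocks, like a fixed-size volume.
    """
    return {
        index: new
        for index, (old, new) in enumerate(zip(previous, current))
        if old != new
    }


def restore(full_copy, *increments):
    """Rebuild the latest version from a full copy plus increments, in order."""
    blocks = list(full_copy)
    for changes in increments:
        for index, data in changes.items():
            blocks[index] = data
    return blocks
```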
<h4>6) Use Amazon&#8217;s services wherever possible instead of using software installed on EC2 instances.</h4>
<p>For example, use SQS and SNS for asynchronous and publish-subscribe communication wherever possible, instead of installing and maintaining queue systems yourself.<br />
<br/></p>
<h4>7) Follow best practices</h4>
<p>Follow the EC2 best practices described in this document: <a href="http://jineshvaria.s3.amazonaws.com/public/cloudbestpractices-jvaria.pdf">Architecting for Cloud: Best Practices</a>. It&#8217;s a very useful guide.</p>
]]></content:encoded>
			<wfw:commentRss>http://aws-musings.com/architecting-for-cloud/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Web Service version of the EC2 Instance Metadata Query Tool</title>
		<link>http://aws-musings.com/web-service-version-of-the-ec2-instance-metadata-query-tool/</link>
		<comments>http://aws-musings.com/web-service-version-of-the-ec2-instance-metadata-query-tool/#comments</comments>
		<pubDate>Wed, 29 Dec 2010 18:53:26 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[EC2]]></category>
		<guid isPermaLink="false">http://aws-musings.com/?p=366</guid>
		<description><![CDATA[I think most EC2 users know about EC2&#8217;s instance metadata query tool. It&#8217;s an executable you can install on your EC2 instance. Once you save the file and make it executable, you can get metadata of the instance such as hostname, block-device-mapping, instance-id etc. But there is a web service version of this [...]]]></description>
			<content:encoded><![CDATA[<p>I think most EC2 users know about <a href="http://aws.amazon.com/code/1825?_encoding=UTF8&#038;jiveRedirect=1">EC2&#8217;s instance metadata query tool</a>. It&#8217;s an executable you can install on your EC2 instance. Once you save the file and make it executable, you can get the instance&#8217;s metadata, such as hostname, block-device-mapping, instance-id, etc.</p>
<p>But there is also a web service version of this metadata tool. It is not advertised nearly as much: whenever I google &#8216;aws metadata query tool&#8217;, I never find it. I like the web service version better because I don&#8217;t have to go through the installation steps every time I want to get the metadata of an instance.</p>
<p><span id="more-366"></span><br />
Here is how you can use the web service version:</p>
<p><code>curl http://169.254.169.254/latest/meta-data/</code></p>
<p>This will print all the metadata fields available for querying. It should print something like:<br />
<code><br />
ami-id<br />
ami-launch-index<br />
ami-manifest-path<br />
block-device-mapping/<br />
hostname<br />
instance-action<br />
instance-id<br />
instance-type<br />
kernel-id<br />
local-hostname<br />
local-ipv4<br />
placement/<br />
public-hostname<br />
public-ipv4<br />
public-keys/<br />
reservation-id<br />
security-groups<br />
</code></p>
<p>Now, to get the instance-id, you simply execute the following command:</p>
<p><code>curl http://169.254.169.254/latest/meta-data/instance-id</code></p>
<p>It prints the instance ID of the machine you run it on.</p>
<p>Please note that some of the metadata fields listed above end with a /. The web service URL for those fields must end with a forward slash too. For example,</p>
<p><code>curl http://169.254.169.254/latest/meta-data/public-keys</code></p>
<p>will not print anything. You must execute the command as:</p>
<p><code>curl http://169.254.169.254/latest/meta-data/public-keys/</code></p>
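<p>The trailing-slash rule is easy to encode if you script these lookups. Here is a minimal sketch using only the Python standard library: <code>split_listing</code> separates plain fields from directory entries in the listing shown above, and <code>metadata_url</code> adds the slash that directories need. The helper names are hypothetical, and the fetch only works from inside an instance that accepts plain (IMDSv1-style) requests without a session token.</p>

```python
import urllib.request

BASE_URL = "http://169.254.169.254/latest/meta-data/"


def split_listing(listing):
    """Split a meta-data listing into plain fields and directory names."""
    fields, directories = [], []
    for line in listing.splitlines():
        entry = line.strip()
        if not entry:
            continue
        if entry.endswith("/"):
            directories.append(entry.rstrip("/"))
        else:
            fields.append(entry)
    return fields, directories


def metadata_url(path, is_directory=False):
    """Build the URL for a field; directories must end with a forward slash."""
    return BASE_URL + path.strip("/") + ("/" if is_directory else "")


def get_metadata(path, is_directory=False, timeout=2.0):
    """Fetch one metadata path as text (only works on an EC2 instance)."""
    with urllib.request.urlopen(metadata_url(path, is_directory),
                                timeout=timeout) as response:
        return response.read().decode("utf-8")
```

<p>For example, <code>get_metadata("public-keys", is_directory=True)</code> requests the same URL as the second curl command above.</p>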
]]></content:encoded>
			<wfw:commentRss>http://aws-musings.com/web-service-version-of-the-ec2-instance-metadata-query-tool/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Choosing the right metrics for autoscaling your ec2 cluster</title>
		<link>http://aws-musings.com/choosing-the-right-metrics-for-autoscaling-your-ec2-cluster/</link>
		<comments>http://aws-musings.com/choosing-the-right-metrics-for-autoscaling-your-ec2-cluster/#comments</comments>
		<pubDate>Sun, 07 Nov 2010 01:12:29 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Autoscaling]]></category>
		<category><![CDATA[EC2]]></category>
		<guid isPermaLink="false">http://aws-musings.com/?p=343</guid>
		<description><![CDATA[At GumGum we are using autoscaling successfully. Choosing the right metrics for autoscaling is an ongoing process as your cluster and applications change. When we researched which metrics to use for autoscaling we found very little literature in the blogosphere. That is why I decided to document our experiences with it. To begin with let&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>At <a href="http://gumgum.com">GumGum</a> we are using <a href="http://aws.amazon.com/autoscaling/">autoscaling</a> successfully. Choosing the right metrics for autoscaling is an ongoing process as your cluster and applications change.  When we researched which metrics to use for autoscaling we found very little literature in the blogosphere. That is why I decided to document our experiences with it. </p>
<p><span id="more-343"></span></p>
<p>To begin with let&#8217;s look at an <a href="http://af-design.com/aws/auto_scaling/#as-create-or-update-trigger">autoscaling trigger creation command</a>:</p>
<p><code>as-create-or-update-trigger my-cpu --auto-scaling-group myautoscalinggroup --dimensions "AutoScalingGroupName=myautoscalinggroup" --measure CPUUtilization --period 60 --statistic Average --lower-threshold 20 --upper-threshold 50 --breach-duration 300 --upper-breach-increment 1 --lower-breach-increment 1</code></p>
<p>The dimensions parameter indicates that this trigger is created on the myautoscalinggroup autoscaling group. The measure indicates that the trigger is based on CPU utilization, and the period indicates that the average is computed over 60-second periods. If the upper threshold is breached (average CPU utilization stays above 50%) for 300 seconds, one additional instance is started. If the lower threshold is breached (average CPU utilization stays below 20%) for 300 seconds, one instance is removed.</p>
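<p>To reason about these parameters, it can help to replay a trigger against recorded CPU averages. The following is a simplified model, not the actual Auto Scaling implementation: it assumes one average per 60-second period, treats a threshold as breached after <code>breach_duration // period</code> consecutive periods beyond it, and restarts the count after each scaling action. The function name is hypothetical; its defaults are taken from the command above.</p>

```python
def trigger_decisions(cpu_samples, lower=20.0, upper=50.0, period=60,
                      breach_duration=300):
    """Return +1 (add an instance), -1 (remove one) or 0 for each sample."""
    needed = breach_duration // period  # 300 // 60 = 5 consecutive periods
    above = below = 0
    decisions = []
    for cpu in cpu_samples:
        above = above + 1 if cpu > upper else 0
        below = below + 1 if cpu < lower else 0
        if above >= needed:
            decisions.append(+1)  # --upper-breach-increment 1
            above = 0
        elif below >= needed:
            decisions.append(-1)  # --lower-breach-increment 1
            below = 0
        else:
            decisions.append(0)
    return decisions
```

<p>A spike that lasts fewer than five periods produces no action, which is what the 300-second breach duration is meant to achieve.</p>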
<p>The metric on which an autoscaling trigger is built (the <a href="http://aws.amazon.com/cloudwatch/">CloudWatch</a> measure) is the most important decision you make when you set up an autoscaling cluster.</p>
<p>There are many metrics you can use to scale your cluster up or down: CPU utilization, latency, network bytes out, disk read bytes, etc. You can set up autoscaling based on any data CloudWatch offers.</p>
<p>Latency and CPU utilization were the obvious choices for our web cluster. We started out with latency as the autoscaling metric, but we quickly realized that it was not the right metric for us. Our application makes calls to some third-party web services, and problems with those services sometimes drove our latency up. In that case, adding more instances to the cluster didn&#8217;t help much. Furthermore, the third-party calls made our latency spike unevenly. We wanted our cluster to grow smoothly as traffic increased and shrink smoothly as traffic decreased.</p>
<p>We decided to switch to CPU utilization as the autoscaling metric and started thinking about our upper and lower thresholds. We chose 50% as the upper threshold. It can take autoscaling a few minutes to kick in, so it was important to leave enough headroom (50%) for the CPU to climb during those minutes. By setting a breach duration of 300 seconds, we also ensured that autoscaling did not kick in immediately; this accommodated occasional traffic spikes that subsided on their own. Based on our average CPU utilization, we set the lower threshold at 20%. This worked well while our cluster was relatively small: in a small cluster (3 to 5 instances), our traffic pattern caused dramatic swings in average CPU utilization, which rose to 50% quickly and came back down to 20% when traffic subsided.</p>
<p>Recently our traffic increased dramatically, and as a result our cluster grew tenfold! We were now running 40 instances. <a href="http://blog.kenweiner.com/">Our CTO</a> observed that autoscaling was adding instances when they were needed, but the cluster was not scaling down quickly enough. In the bigger cluster, average CPU was fairly stable and did not swing as dramatically as in a smaller cluster. So our CTO raised the lower threshold to 40% CPU utilization. This worked, and the cluster started scaling down.</p>
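<p>A little arithmetic suggests why the higher lower threshold is safe only in a large cluster. Assuming total load stays constant and spreads evenly over the remaining instances (real traffic only approximates this), removing one of n instances raises the average CPU from c to c * n / (n - 1). The helper name below is hypothetical.</p>

```python
def cpu_after_scale_in(avg_cpu, instances):
    """Projected average CPU after removing one instance, assuming even load."""
    if instances < 2:
        raise ValueError("need at least two instances to remove one")
    # The same total load (avg_cpu * instances) is shared by one fewer node.
    return avg_cpu * instances / (instances - 1)
```

<p>At 40% average CPU, removing one instance from a 40-instance cluster raises the average only to about 41%, well below the 50% upper threshold. In a 4-instance cluster the same step would push it to about 53%, above the upper threshold, so the cluster would likely scale right back up.</p>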
<p>The graph below shows the trigger scaling instances up and down over a day. Time is on the x axis and the number of instances is on the y axis.<br />
<a href="http://aws-musings.com/wp-content/uploads/2010/11/autoscaling-effect.png"><img src="http://aws-musings.com/wp-content/uploads/2010/11/autoscaling-effect.png" alt="" title="autoscaling-effect" width="402" height="134" class="alignnone size-full wp-image-353" /></a><br />
We hope our experiences are helpful to others implementing autoscaling. I am sure we will have to revisit these numbers when our cluster grows to hundreds of instances! I will write a new blog post describing our experiences then.</p>
]]></content:encoded>
			<wfw:commentRss>http://aws-musings.com/choosing-the-right-metrics-for-autoscaling-your-ec2-cluster/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>7 Tips for running HBase in EC2</title>
		<link>http://aws-musings.com/7-tips-for-running-hbase-in-ec2/</link>
		<comments>http://aws-musings.com/7-tips-for-running-hbase-in-ec2/#comments</comments>
		<pubDate>Mon, 02 Aug 2010 00:14:31 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[EC2]]></category>
		<guid isPermaLink="false">http://aws-musings.com/?p=323</guid>
		<description><![CDATA[We are running an HBase (currently 0.20.4) cluster on EC2. I thought it would be useful to share some tips about running HBase in EC2. 1) Use private DNS addresses in config files such as hdfs-site.xml and hbase-site.xml. On EC2 Ubuntu instances, Java&#8217;s getHost() resolves to the private DNS addresses. 2) Use c1.xlarge [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://gumgum.com">We</a> are running an <a href="http://hbase.apache.org/">HBase</a> (currently 0.20.4) cluster on EC2. I thought it would be useful to share some tips about running HBase in EC2.</p>
<p><strong>1) Use private DNS addresses</strong> in config files such as hdfs-site.xml and hbase-site.xml. On EC2 Ubuntu instances, Java&#8217;s getHost() resolves to the private DNS addresses.</p>
<p><strong>2) Use <a href="http://aws.amazon.com/ec2/instance-types/">c1.xlarge</a> or bigger nodes to start with. </strong>I have seen Andrew Purtell (HBase committer) recommend this on the <a href="http://hbase.apache.org/mailing_lists.html">HBase mailing list</a>. We tried m1.large machines, and they worked well while our traffic was small. We hit HBase in real time, and as traffic increased, our CPUs started maxing out. We currently use c1.xlarge machines.</p>
<p><span id="more-323"></span></p>
<p><strong>3) Define a security group for all of your Hadoop/HBase nodes.</strong> Hadoop, HBase and ZooKeeper nodes need to talk to each other on many ports. It&#8217;s better to assign one security group to all the nodes and give that group permission to talk to itself.</p>
<p><strong>4) Use the dfs.hosts property in hdfs-site.xml.</strong> This property lets you specify the path of a file listing the DNS addresses of allowed nodes. Only nodes listed in that file are allowed to join the Hadoop cluster. This becomes important if you are spawning a QA cluster in EC2: you don&#8217;t want a QA node to join your production cluster accidentally. When you add a new node, execute the following command to refresh the list of allowed nodes without restarting the entire cluster:<br />
<code><br />
hadoop dfsadmin -refreshNodes<br />
</code><br />
I am assuming here that you are not running MapReduce on the same cluster. If you do, do not use a similar setting in mapred-site.xml: there is no refreshNodes option for mradmin yet. It&#8217;s coming in 0.21.</p>
<p><strong>5) Use EBS-backed instances for Hadoop/HBase nodes.</strong> There are many benefits of using EBS-backed instances instead of S3-backed ones. For example, say you try different JVM arguments on a node to test performance. If that very node goes down, you don&#8217;t have to worry about losing those settings: all you need to do is take a snapshot of its volume and spawn a new instance from it. I generally find EBS-backed instances easier to deal with than S3-backed ones. The added cost is small compared with the convenience of not losing data.</p>
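<p>Keeping that allow-list in sync is easy to script. Here is a minimal sketch, assuming you pass the same file path you configured in dfs.hosts and that the <code>hadoop</code> command is on the PATH of the machine running it; <code>write_hosts_file</code> and <code>refresh_nodes</code> are hypothetical helpers.</p>

```python
import subprocess
from pathlib import Path


def write_hosts_file(path, hostnames):
    """Write the allow-list that dfs.hosts points to, one host per line."""
    unique = sorted({h.strip() for h in hostnames if h.strip()})
    Path(path).write_text("\n".join(unique) + "\n")
    return unique


def refresh_nodes(dry_run=False):
    """Ask the NameNode to re-read the allow-list without a cluster restart."""
    cmd = ["hadoop", "dfsadmin", "-refreshNodes"]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

<p>Run <code>refresh_nodes()</code> after adding a node&#8217;s private DNS name to the list, so the NameNode picks it up without a restart.</p>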
<p><strong>6) Prepare an AMI with Hadoop, HBase installation</strong> and with all the configurations. This will allow you to add a node quickly when you want to do so.</p>
<p><strong>7) Use Elastic Map Reduce to run map reduce jobs with your HBase.</strong> We have found this much more cost-effective. Let&#8217;s say you have a 7-node HBase cluster, but your map reduce jobs need at least 10 nodes to finish on time. Elastic Map Reduce lets you spawn a 10-node job against your HBase. The load on your HBase cluster goes down, and you don&#8217;t have to pay for 3 extra nodes 24 x 7. Furthermore, you can use cheaper instances for your EMR jobs than the ones you use for HBase (c1.xlarge) if you want to. We use c1.medium.</p>
<p>The HBase developers have written scripts that make it easier to spawn an entire HBase cluster in EC2. Don&#8217;t hesitate to try them out. Let me know if you have any other tips you would like to share. I am sure many people are running their HBase clusters on EC2, and I am eager to learn from their experiences.</p>
]]></content:encoded>
			<wfw:commentRss>http://aws-musings.com/7-tips-for-running-hbase-in-ec2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why Amazon&#8217;s new AWS java sdk sucks</title>
		<link>http://aws-musings.com/why-amazons-new-aws-java-sdk-sucks/</link>
		<comments>http://aws-musings.com/why-amazons-new-aws-java-sdk-sucks/#comments</comments>
		<pubDate>Sat, 12 Jun 2010 00:48:43 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[SDK]]></category>
		<guid isPermaLink="false">http://aws-musings.com/?p=292</guid>
		<description><![CDATA[I am an ardent fan and user of Amazon&#8217;s AWS and that is the reason I don&#8217;t like their new API. Amazon has so far done a great job of making their services very intuitive, simple and easy to use. But somehow they forgot their principles while designing the sdk. In this post I am [...]]]></description>
			<content:encoded><![CDATA[<p>I am an ardent fan and user of Amazon&#8217;s AWS, and that is exactly why I don&#8217;t like their <a href="http://aws.amazon.com/sdkforjava/">new API</a>. Amazon has so far done a great job of making their services intuitive, simple and easy to use, but somehow they forgot their principles while designing the SDK. In this post I state my case.</p>
<p><span id="more-292"></span></p>
<p>Let&#8217;s discuss an example. The example is in Groovy, but I am sure everybody can follow the code:</p>
<p><code><br />
// awsCredentials and logger are assumed to be defined elsewhere<br />
def registerInstance(String instanceId, String loadBalancerName) {<br />
&nbsp;&nbsp;def request = new RegisterInstancesWithLoadBalancerRequest()<br />
&nbsp;&nbsp;def server = new com.amazonaws.services.elasticloadbalancing.model.Instance()<br />
&nbsp;&nbsp;server.setInstanceId(instanceId)<br />
&nbsp;&nbsp;request = request.withInstances([server]).withLoadBalancerName(loadBalancerName)<br />
&nbsp;&nbsp;def client = new AmazonElasticLoadBalancingClient(awsCredentials)<br />
&nbsp;&nbsp;client.registerInstancesWithLoadBalancer(request)<br />
&nbsp;&nbsp;logger.info "${instanceId} registered with ${loadBalancerName} load balancer"<br />
}<br />
</code></p>
<p>The code above simply registers a given instance with a load balancer. Now let&#8217;s achieve the same thing using a different API available on <a href="http://code.google.com/">Google Code</a>, called <a href="http://code.google.com/p/typica/">Typica</a> and written by <a href="http://code.google.com/p/typica/people/list">dkavanagh and two other people</a>.</p>
<p><code><br />
def registerInstance(String instanceId, String loadBalancerName) {<br />
&nbsp;&nbsp;def loadBalancing = new LoadBalancing(accessKey, secretKey)<br />
&nbsp;&nbsp;loadBalancing.registerInstancesWithLoadBalancer(loadBalancerName, [instanceId])<br />
}</code></p>
<p>Now, which one is easier to read? Which one is faster to code? Which is more intuitive? Obviously the second one. I simply don&#8217;t understand the reason for the long class names throughout the API, or for the request/result pattern: every single method in Amazon&#8217;s SDK takes a request object and returns a result object, so you are forced to create these objects with extra-long names! Thank god I did not write this example in Java; it would have been even longer, since in Java you have to repeat the class name twice on a line to declare and create an object. The whole API is full of such examples.</p>
<p>I am not the only one screaming about the API. Steve Jin has expressed similar concerns in his <a href="http://www.doublecloud.org/2010/04/amazon-aws-sdk-for-java-it%E2%80%99s-not-quite-there-yet/">DoubleCloud Blog</a>. According to Steve, the API lacks consistency and a clear object model, and its structure is flawed.</p>
<p>I hope enough people scream over the Internet that Amazon hears it.</p>
]]></content:encoded>
			<wfw:commentRss>http://aws-musings.com/why-amazons-new-aws-java-sdk-sucks/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Amazon AWS Java SDK released</title>
		<link>http://aws-musings.com/amazon-aws-java-sdk-released/</link>
		<comments>http://aws-musings.com/amazon-aws-java-sdk-released/#comments</comments>
		<pubDate>Fri, 16 Apr 2010 01:25:06 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Autoscaling]]></category>
		<category><![CDATA[EBS]]></category>
		<category><![CDATA[EC2]]></category>
		<category><![CDATA[RDS]]></category>
		<category><![CDATA[S3]]></category>
		<category><![CDATA[SNS]]></category>
		<category><![CDATA[SQS]]></category>
		<guid isPermaLink="false">http://aws-musings.com/?p=252</guid>
		<description><![CDATA[Amazon recently announced the AWS SDK for Java. A Java SDK is very much needed, especially if you are writing your automation scripts in Groovy. We have tried multiple Java APIs in our scripts, including JetS3t and Typica. These APIs were really helpful, but they only supported some of the AWS services [...]]]></description>
			<content:encoded><![CDATA[<p>Amazon recently <a href="http://aws.amazon.com/about-aws/whats-new/2010/03/22/announcing-the-aws-sdk-for-java/">announced</a> the AWS SDK for Java.</p>
<p>A Java SDK is very much needed, especially if you are writing your automation scripts in Groovy. We have tried multiple Java APIs in our scripts, including <a href="http://jets3t.s3.amazonaws.com/index.html">JetS3t</a> and <a href="http://code.google.com/p/typica/">Typica</a>. These APIs were really helpful, but they only supported some of the AWS services and were not up to date (for obvious reasons). Having one Java API that supports all AWS technologies was definitely the need of the hour. I am sure Amazon will keep it updated as new services are released; they have the necessary resources to do so.</p>
<p>Amazon has also uploaded the SDK to the <a href="http://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk/1.0.002">Maven repository</a>.<br />
You can use the following dependency in your pom.xml:<br />
<code><br />
&lt;dependency&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&lt;groupId&gt;com.amazonaws&lt;/groupId&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&lt;artifactId&gt;aws-java-sdk&lt;/artifactId&gt;<br />
&nbsp;&nbsp;&nbsp;&nbsp;&lt;version&gt;1.0.002&lt;/version&gt;<br />
&lt;/dependency&gt;<br />
</code><br />
The Javadoc for the SDK is hosted at <a href="http://docs.amazonwebservices.com/AWSJavaSDK/latest/javadoc/index.html">http://docs.amazonwebservices.com/AWSJavaSDK/latest/javadoc/index.html</a></p>
<p>Amazon has also opened up the SDK source code to everyone and mirrored the SDK code repository on GitHub. You can look at the SDK code at <a href="http://github.com/amazonwebservices/aws-sdk-for-java">http://github.com/amazonwebservices/aws-sdk-for-java</a></p>
]]></content:encoded>
			<wfw:commentRss>http://aws-musings.com/amazon-aws-java-sdk-released/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Amazon Simple Notification Service &#8211; an easy messaging system in the cloud</title>
		<link>http://aws-musings.com/amazon-simple-notification-service-easy-messaing-system-in-cloud/</link>
		<comments>http://aws-musings.com/amazon-simple-notification-service-easy-messaing-system-in-cloud/#comments</comments>
		<pubDate>Fri, 09 Apr 2010 22:34:10 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[SNS]]></category>
		<guid isPermaLink="false">http://aws-musings.com/?p=238</guid>
		<description><![CDATA[Amazon recently announced a new service called Simple Notification Service. It provides a cheap publish/subscribe messaging system in the cloud. You can learn how to use it by visiting http://docs.amazonwebservices.com/sns/latest/gsg/ I played with it and found that the service is really easy to use, very robust and very extensible. It basically makes a publish/subscribe messaging [...]]]></description>
			<content:encoded><![CDATA[<p>Amazon recently announced a new service called Simple Notification Service (SNS).<br />
It provides a cheap publish/subscribe messaging system in the cloud. You can learn how to use it by visiting <a href="http://docs.amazonwebservices.com/sns/latest/gsg/">http://docs.amazonwebservices.com/sns/latest/gsg/</a></p>
<p>I played with it and found that the service is really easy to use, very robust and very extensible. It basically makes a publish/subscribe messaging service similar to <a href="http://java.sun.com/products/jms/tutorial/1_3_1-fcs/doc/basics.html">JMS</a> available in the cloud. Having a cheap, robust publish/subscribe messaging system can serve many purposes in the cloud, especially in an auto-scaled environment where servers come and go depending on traffic. Here are some of the uses I can think of in our environment:</p>
<p><span id="more-238"></span><br />
<strong>Discover members of a cluster</strong><br />
We use JMX to switch various services on and off in our Java webapp. <a href="http://www.jmanage.org/">JManage</a> allows us to configure a cluster and call a method remotely on the entire cluster; that is how we flush caches, switch services on and off, etc. JManage is not sufficient by itself in an autoscaled environment, as it does not detect new nodes automatically: when a new node comes up, somebody needs to add it to the JManage cluster. With SNS, you can create a topic and publish a message to it containing the external DNS name of the newly launched server. That message can then be consumed by a piece of code that adds the new node to the JManage cluster. Similarly, you can remove a node by publishing a message when auto scaling terminates the instance.</p>
<p>Discovering members of a cluster is very useful in a cloud environment, especially an auto-scaled one. Because HTTP is supported as a delivery protocol, SNS can call the URL of a servlet (or the equivalent in other platforms). This opens up a lot of possibilities: you can pretty much execute anything you want.</p>
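<p>As an illustration of the consumer side, here is a minimal Java sketch that keeps a cluster&#8217;s member list up to date from such messages. The message format (<code>LAUNCHED &lt;dns-name&gt;</code> / <code>TERMINATED &lt;dns-name&gt;</code>) and the <code>ClusterMembership</code> class are hypothetical. In practice the message would arrive through one of the SNS delivery protocols, and the add/remove steps would also update the JManage cluster.</p>

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical consumer of a cluster-membership topic. Each message is assumed
// to be "LAUNCHED <public-dns>" or "TERMINATED <public-dns>"; this format is an
// illustration only, not something SNS or JManage defines.
class ClusterMembership {
    private final Set<String> members = new TreeSet<>();

    // Applies one notification; returns true if the member set changed.
    boolean onMessage(String message) {
        String[] parts = message.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Unexpected message: " + message);
        }
        switch (parts[0]) {
            case "LAUNCHED":
                // Real code would also add the node to the JManage cluster here.
                return members.add(parts[1]);
            case "TERMINATED":
                // Real code would also remove the node from the JManage cluster here.
                return members.remove(parts[1]);
            default:
                throw new IllegalArgumentException("Unknown event: " + parts[0]);
        }
    }

    Set<String> getMembers() { return Collections.unmodifiableSet(members); }
}
```

<p>Keeping membership in a set makes a repeated message harmless: a second <code>LAUNCHED</code> for the same host is ignored, and <code>onMessage</code> returns <code>false</code>.</p>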
<p><strong>Send email alerts without having to configure an SMTP server</strong><br />
Simply put, you can use SNS to send simple emails. All the email addresses need to be subscribed to the topic, but that&#8217;s a trivial one-time activity. No need to configure and maintain an SMTP server!</p>
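<p>The publish/subscribe model behind this is simple to picture: a publisher sends one message to a topic, and the topic hands a copy to every subscriber. Below is a minimal in-memory Java sketch of that fan-out. The <code>Topic</code> class is hypothetical and does none of the real SNS delivery over email or HTTP; it only shows the shape of the model.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Minimal in-memory topic illustrating publish/subscribe fan-out.
// Not the SNS API: no network delivery, no protocols, no retries.
class Topic {
    private final String name;
    private final List<Consumer<String>> subscribers = new ArrayList<>();

    Topic(String name) { this.name = name; }

    String getName() { return name; }

    // Any number of independent consumers can subscribe to the same topic.
    void subscribe(Consumer<String> subscriber) { subscribers.add(subscriber); }

    // The publisher does not know who is listening; every subscriber receives
    // the message. Returns the number of deliveries made.
    int publish(String message) {
        for (Consumer<String> subscriber : subscribers) {
            subscriber.accept(message);
        }
        return subscribers.size();
    }
}
```

<p>Each email address subscribed to an SNS topic plays the role of one <code>Consumer</code> here: one <code>publish</code> call reaches all of them.</p>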
<p><strong>Some nice-to-have features</strong><br />
There is one particular thing I find annoying about SNS. Let&#8217;s say you subscribe to a topic using the email protocol. You get a long string identifier called the subscription ARN. When you want to unsubscribe this email, you cannot supply the email address and the topic; you must supply the subscription ARN. That&#8217;s not a good idea, because it forces applications to store the subscription ARN. To me, the combination of an email address and a topic is unique enough for SNS to determine the subscription. It would be much easier for applications if the unsubscribe API accepted the topic and the email address you want to unsubscribe. I can understand that Amazon probably wants the API to be protocol-agnostic, but this would be a nice-to-have feature.</p>
<p>Furthermore, you cannot set the subject of the notification email. That means that if I am using multiple notifications in an application, all of them will have the same standard subject line: <em>AWS Notification Message</em>.</p>
<p>It would also be nice to have Twitter as a protocol. We use Twitter heavily for notifications, since it lets us receive them as text messages. <a href="http://pingdom.com/">Pingdom</a> got us into it.</p>
]]></content:encoded>
			<wfw:commentRss>http://aws-musings.com/amazon-simple-notification-service-easy-messaing-system-in-cloud/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Web serving in the cloud &#8211; our experiences with nginx and instance sizes</title>
		<link>http://aws-musings.com/web-serving-in-the-cloud-our-experiences-with-nginx-and-instance-sizes/</link>
		<comments>http://aws-musings.com/web-serving-in-the-cloud-our-experiences-with-nginx-and-instance-sizes/#comments</comments>
		<pubDate>Tue, 30 Mar 2010 01:05:23 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[EC2]]></category>
		<guid isPermaLink="false">http://aws-musings.com/?p=211</guid>
		<description><![CDATA[We have been doing various experiments in our EC2 web serving cluster to serve maximum traffic at minimum cost. I thought our experience would be useful to many other people using EC2. We have a web application with nginx + Tomcat. We get approximately 200K requests per minute at peak and about 65K [...]]]></description>
			<content:encoded><![CDATA[<p>We have been doing various experiments in our EC2 web serving cluster to serve maximum traffic at minimum cost. I thought our experience would be useful to many other people using EC2.</p>
<p>We have a web application with nginx + Tomcat. We get approximately 200K requests per minute at peak and about 65K requests per minute at night. Since we host a web service and not a web page, most of our requests are servlet requests (rather than the faster, nginx-served file requests).</p>
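<p>For scale, here are those traffic numbers as per-second rates. This is simple arithmetic on the figures above, and the per-instance line assumes load is split evenly across two instances.</p>

```latex
\begin{aligned}
\text{peak:}\quad & \frac{200\,000\ \text{req/min}}{60\ \text{s/min}} \approx 3\,333\ \text{req/s} \\
\text{night:}\quad & \frac{65\,000\ \text{req/min}}{60\ \text{s/min}} \approx 1\,083\ \text{req/s} \\
\text{peak, per instance (2 instances):}\quad & \frac{3\,333\ \text{req/s}}{2} \approx 1\,667\ \text{req/s}
\end{aligned}
```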
<p><span id="more-211"></span><br />
We began our experiment with two m1.small machines. We set the autoscaling group minimum to two and set the Tomcat file handle limit to 65535.<br />
We based our autoscaling metric on latency: we asked our autoscaling group to increase capacity if latency went beyond 0.75 seconds.</p>
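<p>To see why a single latency threshold can flap, here is a simplified, hypothetical Java model of the rule. It adds one instance when latency goes above 0.75 seconds and, as an assumption of this sketch, removes one (never going below the minimum of two) whenever latency is back under the threshold. Real Auto Scaling has additional settings, such as cooldown periods, that this model ignores.</p>

```java
// Simplified, hypothetical model of a latency-based scaling rule.
// Assumption: capacity grows by one above the threshold and shrinks by one
// (not below minSize) otherwise. Cooldowns and breach durations are omitted.
class LatencyScalingPolicy {
    private final double thresholdSeconds;
    private final int minSize;

    LatencyScalingPolicy(double thresholdSeconds, int minSize) {
        this.thresholdSeconds = thresholdSeconds;
        this.minSize = minSize;
    }

    // Decides the capacity after observing one latency sample.
    int nextCapacity(int currentCapacity, double latencySeconds) {
        if (latencySeconds > thresholdSeconds) {
            return currentCapacity + 1;
        }
        return Math.max(minSize, currentCapacity - 1);
    }

    // Replays a series of latency samples and records the capacity after each one.
    int[] simulate(int startCapacity, double[] latencySamples) {
        int[] capacities = new int[latencySamples.length];
        int capacity = startCapacity;
        for (int i = 0; i < latencySamples.length; i++) {
            capacity = nextCapacity(capacity, latencySamples[i]);
            capacities[i] = capacity;
        }
        return capacities;
    }
}
```

<p>With <code>new LatencyScalingPolicy(0.75, 2)</code>, a spiky series of 0.5, 1.2, 0.6, 1.1 and 0.5 seconds yields capacities of 2, 3, 2, 3 and 2. Each burst buys a server that is released as soon as the burst passes.</p>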
<p>This didn&#8217;t work that well. Here is what happened:</p>
<p>1) We started getting CPU bursts at peak hours. CPU used to jump to 100%, thereby periodically pushing latency beyond 0.75 seconds. Autoscaling would then kick in and start a new server. Latency would fall back below 0.75 seconds within a few minutes, and autoscaling would cut the capacity back to the minimum.</p>
<p>2) We also started getting nginx errors. Here is what the error said:<br />
<code><br />
(24: Too many open files) while accepting new connection on 0.0.0.0:80<br />
</code></p>
<p>After some research, we realized that the default nginx settings were not meant for the kind of scale we were dealing with. We changed the following settings in /etc/nginx/nginx.conf:<br />
<code><br />
worker_processes 4;<br />
worker_rlimit_nofile 10240;<br />
events {<br />
  worker_connections 8192;<br />
}<br />
</code><br />
The nginx errors stopped, and the CPU bursts evened out during off-peak hours. Here is how the graph looked right after we made the nginx change.</p>
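<p>A rough way to read these settings (a back-of-the-envelope estimate, not a measured limit): each of the 4 worker processes may hold up to 8192 connections. Assuming nginx proxies the servlet requests to Tomcat, each proxied request holds two connections, one to the client and one upstream. <code>worker_rlimit_nofile</code> raises each worker&#8217;s open-file limit to 10240, which leaves headroom above the 8192 connections, since each connection needs at least one file descriptor.</p>

```latex
\begin{aligned}
\text{total connections} &\le \texttt{worker\_processes} \times \texttt{worker\_connections} = 4 \times 8192 = 32\,768 \\
\text{concurrent proxied clients} &\approx \frac{32\,768}{2} = 16\,384 \\
\text{descriptors per worker} &\ge 8192 \ \text{connections}, \quad \text{limit} = \texttt{worker\_rlimit\_nofile} = 10\,240
\end{aligned}
```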
<p><a href="http://aws-musings.com/wp-content/uploads/2010/03/Effects-of-nginx-settings-on-cpu.png"><img src="http://aws-musings.com/wp-content/uploads/2010/03/Effects-of-nginx-settings-on-cpu.png" alt="" title="Effects-of-nginx-settings-on-cpu" width="300" height="237" class="alignnone size-full wp-image-214" /></a></p>
<p>You can clearly see that the bursts stopped at the point when we made the change. But the CPU bursts started coming back at peak time. This time, nginx was fine and there were no errors in the log.</p>
<p>This was a clear signal that m1.small was not performing well at that load (for our application). We decided to switch to c1.mediums. We knew that <a href="http://aws.amazon.com/ec2/instance-types/">c1.mediums have 5 EC2 compute units whereas m1.smalls have 1 EC2 compute unit</a>, but we wanted to see how far m1.smalls could take us. The switch totally worked! The CPU bursts stopped, and autoscaling stopped kicking in. We can see the CPU going smoothly from 10% to 50% between off-peak and peak hours. This is what we wanted!</p>
<p>Obviously two c1.mediums cost more than two m1.smalls. But we believe that we will be able to cope with much larger growth using c1.mediums, as the CPU is always hovering between 10 and 40%. In the long term, it will definitely save us money: we will need fewer machines, and we won&#8217;t waste money on instances that start for a few minutes and shut down when the bursts subside.</p>
]]></content:encoded>
			<wfw:commentRss>http://aws-musings.com/web-serving-in-the-cloud-our-experiences-with-nginx-and-instance-sizes/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
	</channel>
</rss>
