How to create and run multiple EC2 instances with the same configuration and software installed?

Question

Fairly new to cloud computing, so bear with me if the question is obvious or silly. With tons of information available on the internet, I was able to successfully create an EC2 Linux instance and install R and RStudio on it. I ran my scripts on it, which went really well but took too long (16 hrs) and was very expensive as well since

Accepted Answer

Problem: For a school project, I was running several machine learning algorithms on a fairly large dataset, which happened to require 30-35 GB of memory, and my PC couldn't handle it. I was using R/RStudio, so I turned to AWS to get around the memory limitation.

What I did initially: I created an EC2 instance and installed R/RStudio. Everything worked and I was able to run my programs in RStudio through the browser. I first ran my scripts on a very small dataset on this instance to see how things would go. Much to my surprise, the whole script took very long to run even with this small dataset. Soon enough, I realized that the algorithms in my programs could be run independently on the same set of features with a small tweak to the scripts.

So I decided to experiment with AWS a bit. I rewrote the programs so that everything stayed the same except the learning algorithm in each script. In other words, I wanted to run copies of these programs with different algorithms simultaneously and get the results in less time.

My goal was now to run multiple copies of the original instance, with RStudio reachable in my browser for each of them, e.g. 5 EC2 instances with 5 RStudio sessions running concurrently in different browser tabs.

I then created an image (AMI) of this instance and launched multiple instances from the AMI, but I missed a few settings while creating those new instances, which caused the problem I asked about in the question above.

I initially suspected that it had something to do with port 8787 and that I might not be able to run RStudio for multiple EC2 instances in the browser.
However, that was not the problem at all. There are a few very important things to take care of when you create new instances from an AMI.

Mistake: While creating new instances from this AMI, I was not selecting two important things correctly, i.e. the VPC and the Security Group.

Correct method is:

VPC: On the "Configure Instance Details" page:
a. Click the "Network" dropdown and select the VPC which was created for the original instance. (The original instance is the one used to create the AMI.)
b. Click the "Auto-assign Public IP" dropdown and select Enable.

Security Group: On the "Configure Security Group" page:
a. For the "Assign a security group" option, tick "Select an existing security group".
b. If there is more than one security group in the list, select the one which was created for the original instance (or create a new security group and make sure it has the same inbound and outbound port access).

Once I set this up, as Marc B mentioned in the comments, each instance gets its own IP address, and a local subnet address is assigned as well.

The address of an instance looks like: ec2-33-444-22-111.us-west-1.compute.amazonaws.com
The subnet address looks like: 127.0.0.35

After learning this, I recreated 5 instances from my AMI, so I now had 5 instances with RStudio on each of them. All of them were running fine, since I was able to SSH into each of them.

I thought I should now be able to work with these instances in different browser tabs and run my scripts in them. But I wasn't able to log in to all the RStudio instances in my browser tabs. Only one of them was working; the others just did not load in the browser. However, I was able to SSH into all of them from PuTTY.
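For reference, the launch settings described earlier in this answer (same VPC, public IP enabled, same security group) have rough equivalents in the AWS CLI. This is a minimal sketch, not what I actually ran: every ID, the key name and the instance type are placeholder assumptions, and the script only prints the command unless DRY_RUN=0 is set. On the CLI the VPC is chosen indirectly, by passing a subnet that belongs to it.

```shell
#!/bin/sh
# Sketch: launch several copies of an AMI with the same network settings.
# All values below are hypothetical placeholders, not taken from the post.
AMI_ID="ami-0123456789abcdef0"        # AMI created from the original instance
SUBNET_ID="subnet-0123456789abcdef0"  # a subnet inside the original instance's VPC
SG_ID="sg-0123456789abcdef0"          # security group of the original instance
KEY_NAME="my-key"                     # key pair used for SSH
INSTANCE_TYPE="r5.2xlarge"            # assumption: pick any type with enough RAM
COUNT=5                               # number of copies to launch

# Print the command when DRY_RUN is 1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

launch_copies() {
  run aws ec2 run-instances \
    --image-id "$AMI_ID" \
    --count "$COUNT" \
    --instance-type "$INSTANCE_TYPE" \
    --key-name "$KEY_NAME" \
    --subnet-id "$SUBNET_ID" \
    --security-group-ids "$SG_ID" \
    --associate-public-ip-address
}

launch_copies
```

Running it as-is just shows the command, so the values can be checked before anything is billed; setting DRY_RUN=0 makes the same call for real.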
I could have run my scripts from Linux (over SSH) as well, but I wanted to run them using RStudio.

After spending a good number of hours on this, I figured out the problem: the RStudio server needs to be started manually on each EC2 instance except the very first one.

For one of the EC2 instances (other than the one that was already working in the browser), I started the RStudio server manually as follows:

a. SSH in using PuTTY.
b. Become root: sudo su
c. Go to the path where RStudio was installed on my Linux instance: cd /usr/lib/rstudio-server/bin
d. Start the server with this command: rstudio-server start

Now go back to the browser, open another tab and use your EC2 instance address and port number (http://ec2-33-444-22-111.us-west-1.compute.amazonaws.com:8787). You should now get the RStudio login page for this instance as well.

With a similar process, I had to manually start the RStudio server on all the other instances in order to access them through the browser. Then I wondered whether there was a way to start the RStudio server every time Linux boots, and came up with a solution. To do this, I made a change in one of the Linux configuration files as follows:

a. Become root: sudo su
b. Go to this path: cd /etc/rc.d
c. Open the file rc.local in vi and add the following command: /usr/lib/rstudio-server/bin/rstudio-server start
d. Save the changes you made.
e. Close the SSH connection.

Then I went back to the AWS console, stopped this instance and created an AMI (image) of it. The above change now takes effect for every instance I create from this AMI, i.e. the RStudio server is started as soon as the instance boots and is accessible through the browser. I can now use multiple RStudio instances in different tabs of my browser. Make sure you are using the correct instance address in the browser. The port number stays the same for all of them, i.e. 8787.
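The rc.local edit can also be scripted instead of done by hand in vi. The following is a minimal sketch under a few assumptions: the helper name ensure_line is made up for illustration, and the demonstration writes to a scratch file rather than the real /etc/rc.d/rc.local. The helper appends the start command only if that exact line is not already present, so running it twice does not duplicate the entry.

```shell
#!/bin/sh
# Command from the answer that starts the RStudio server.
START_CMD="/usr/lib/rstudio-server/bin/rstudio-server start"

# ensure_line FILE LINE (hypothetical helper):
# append LINE to FILE unless an identical line already exists.
ensure_line() {
  el_file="$1"
  el_line="$2"
  # -x: match whole line, -F: fixed string, -q: quiet
  grep -qxF -- "$el_line" "$el_file" 2>/dev/null ||
    printf '%s\n' "$el_line" >> "$el_file"
}

# Demonstration on a scratch file standing in for rc.local.
# On a real instance you would instead run, as root:
#   ensure_line /etc/rc.d/rc.local "$START_CMD"
demo_file="$(mktemp)"
printf '#!/bin/sh\n' > "$demo_file"
ensure_line "$demo_file" "$START_CMD"
ensure_line "$demo_file" "$START_CMD"   # second call changes nothing
cat "$demo_file"
rm -f "$demo_file"
```

On some systemd-based distributions rc.local is only run at boot if the file is executable, so it may be worth checking its permissions before creating the AMI.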

Answer