
How to create and run multiple EC2 instances with the same configuration and software installed?

I am fairly new to cloud computing, so bear with me if the question is obvious or silly. With the tons of information available on the internet, I was able to create an EC2 Linux instance and install R and RStudio on it. I ran my scripts on it, which went really well but took too long (16 hours) and was very expensive, since I require instances with high memory and many vCPUs.

In my programs, I am essentially running the same scripts for different datasets.

My question is: is there any way to run multiple similar EC2 instances (with exactly the same software and my scripts installed)? That way, I could run my scripts on every dataset on a separate instance simultaneously, in less time.

Here is what I have tried so far. I created an AMI of my existing instance and launched an instance from it. But I couldn’t SSH into it because of its odd username and IP address, something like “root@10.0.0.1”. I can see that both instances are running (the original and the one launched from the AMI). I can SSH into the original but not into the other one. I am able to log in to RStudio on the original instance on port 8787.

Another question is how to connect to this AMI-based instance over SSH (PuTTY) in parallel with the original instance. What problems will it cause if I use both of them in the browser (RStudio in this case) simultaneously?

Please help me with this! Thanks!


Answer

Problem: For a school project, I was running several machine learning algorithms on a fairly large dataset, which happened to require 30-35 GB of memory, and my PC couldn’t handle it. I was using R/RStudio, so I turned to AWS to get around the memory limitation.

What I did initially: I created an EC2 instance and installed R/RStudio. Everything worked out perfectly, and I was able to run my programs in RStudio through the browser. I first ran my scripts on a very small dataset on this AWS instance to see how things would go. To my surprise, the whole script took very long to run even with this small dataset. Soon enough, I realized that all the algorithms in my programs could be run independently on the same set of features with a small tweak to the scripts.

So I decided to play with AWS a little bit. I rewrote the programs so that everything stayed the same except the learning algorithm in each script. In other words, I wanted to run copies of these programs with different algorithms at the same time and get the results sooner.

Now my goal was to run multiple copies of the original instance and to have RStudio open in my browser for each of them, e.g. 5 EC2 instances with 5 RStudio sessions running concurrently in different browser tabs.

I then created an image (AMI) of this instance and launched multiple instances from the AMI, but I missed a few points while creating those new instances, which caused the problem I asked about in the question above.

I initially suspected that it had something to do with port 8787 and that I might not be able to run RStudio for several EC2 instances in the same browser. However, that was not the problem at all.

There are a few very important things to take care of when you create new instances from an AMI.

Mistake: While creating new instances from this AMI, I was not selecting two important things correctly: the VPC and the Security Group.

The correct method is:

VPC (on the “Configure Instance Details” page):

a. Click the “Network” dropdown and select the VPC which was created for the original instance. (The original instance is the one that was used to create the AMI.)

b. Click the “Auto-assign Public IP” dropdown and select Enable

Security Group (on the “Configure Security Group” page):

a. For the “Assign a security group” option, tick “Select an existing security group”.

b. If there is more than one security group in the list, select the one which was created for the original instance (or create a new security group and make sure that it allows the same inbound and outbound ports).
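The same choices can also be made from the command line instead of the console. Below is a minimal sketch, assuming the AWS CLI is installed and configured. Every value in it (AMI ID, subnet ID, security group ID, key pair name, instance type) is a hypothetical placeholder and not something from my setup, so replace them with the values of your original instance. The function only prints the command so that it can be reviewed before it is run.

```shell
#!/bin/sh
# All values below are hypothetical placeholders for illustration.
AMI_ID="ami-0123456789abcdef0"        # AMI created from the original instance
SUBNET_ID="subnet-0123456789abcdef0"  # a subnet inside the original instance's VPC
SG_ID="sg-0123456789abcdef0"          # security group of the original instance
KEY_NAME="my-key-pair"                # key pair used for SSH
INSTANCE_TYPE="r5.2xlarge"            # assumed high-memory instance type

# Print (do not execute) an "aws ec2 run-instances" command that launches
# the requested number of copies with a public IP, in the same subnet and
# security group as the original instance.
launch_cmd() {
  count="$1"
  printf 'aws ec2 run-instances --image-id %s --count %s --instance-type %s --key-name %s --subnet-id %s --security-group-ids %s --associate-public-ip-address\n' \
    "$AMI_ID" "$count" "$INSTANCE_TYPE" "$KEY_NAME" "$SUBNET_ID" "$SG_ID"
}

launch_cmd 5
```

Choosing a subnet that belongs to the original VPC plays the role of the “Network” dropdown, and `--associate-public-ip-address` corresponds to setting “Auto-assign Public IP” to Enable.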

Once I set this up, as Marc B mentioned in the comments, each instance gets its own public address, and a private address on the local subnet is assigned as well.

The public address (DNS name) of an instance looks like: ec2-33-444-22-111.us-west-1.compute.amazonaws.com

The private subnet address looks like: 10.0.0.35

After learning this, I created 5 new instances from my AMI, so I now had 5 instances with RStudio on each of them. All of them were running fine, and I was able to SSH into each of them.

I thought I should now be able to work with these instances in different browser tabs and run my scripts in them. But I wasn’t able to log in to all the RStudio instances in my browser tabs. Only one of them worked, and the others did not load in the browser at all. However, I was able to SSH into all of them from PuTTY. I could have run my scripts from the Linux shell (over SSH) as well, but I wanted to run them using RStudio.

After spending a good number of hours on this, I figured out the problem: the RStudio server had to be started manually from the Linux shell on every EC2 instance except the very first one.

On one of the EC2 instances (other than the one that was already working in the browser), I did the following to start the RStudio server manually:
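For anyone who prefers the SSH route over RStudio, here is a rough sketch of how one dataset per instance could be started from the command line. The key file, login user, script name, dataset names, and hostnames are all hypothetical placeholders, and the sketch assumes the R script reads the dataset path from its command-line arguments. The function only prints the ssh command for each host and dataset pair.

```shell
#!/bin/sh
# Hypothetical names: key file, login user and script are placeholders.
KEY_FILE="my-key.pem"
LOGIN_USER="ec2-user"
SCRIPT="analysis.R"

# Print the ssh command that would run SCRIPT on one host for one dataset.
# nohup, the redirections and the trailing "&" let the job keep running
# after the SSH session is closed.
run_cmd() {
  host="$1"
  dataset="$2"
  printf "ssh -i %s %s@%s 'nohup Rscript %s %s > %s.log 2>&1 < /dev/null &'\n" \
    "$KEY_FILE" "$LOGIN_USER" "$host" "$SCRIPT" "$dataset" "$dataset"
}

# One dataset per instance (hostnames are placeholders).
run_cmd ec2-host-1.example.com data1.csv
run_cmd ec2-host-2.example.com data2.csv
```

Each printed line can be pasted into a terminal. The output of every run ends up in a log file named after the dataset on the instance that ran it.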

  1. SSH using PuTTY

  2. Become root: sudo su

  3. Go to the path where RStudio Server was installed on my Linux instance: cd /usr/lib/rstudio-server/bin

  4. Start the RStudio server with this command: rstudio-server start

Now go back to the browser, open another tab, and enter your EC2 instance's address and port number (http://ec2-33-444-22-111.us-west-1.compute.amazonaws.com:8787). You should now get the RStudio login page for this instance as well.
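With several instances it is easy to mix up addresses, so a small helper that prints the URL for each one can be handy. This is only a sketch: the hostnames are placeholders, and `check_rstudio` is an optional helper I am adding for illustration that assumes curl is installed and the instance is reachable.

```shell
#!/bin/sh
# RStudio Server port used throughout this answer.
RSTUDIO_PORT=8787

# Build the browser URL for one instance's public DNS name.
rstudio_url() {
  printf 'http://%s:%s\n' "$1" "$RSTUDIO_PORT"
}

# Optional: print the HTTP status code returned by the login page
# (needs curl and network access, so it is defined but not called here).
check_rstudio() {
  curl -s -o /dev/null -w '%{http_code}\n' --max-time 5 "$(rstudio_url "$1")"
}

# Placeholder hostnames; replace with the public DNS names of your instances.
for host in ec2-host-1.example.com ec2-host-2.example.com; do
  rstudio_url "$host"
done
```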

With the same process, I had to start the RStudio server manually on every other instance in order to access it through the browser. Then I wondered whether there was a way to start the RStudio server automatically every time Linux boots, and I came up with a solution. To do this, I changed one of the Linux configuration files as follows:

  1. Become root: sudo su

  2. Go to this path: cd /etc/rc.d

  3. Open the file rc.local in vi and add the following command:

    /usr/lib/rstudio-server/bin/rstudio-server start

  4. Save the changes you made.

  5. Close the SSH connection.
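The edit in step 3 can also be scripted, which helps if it has to be repeated. Here is a minimal sketch: `add_startup_line` is a hypothetical helper name, and the demonstration writes to a scratch file instead of the real rc.local so that it is safe to try anywhere.

```shell
#!/bin/sh
# The line from step 3 that starts RStudio Server at boot.
START_LINE="/usr/lib/rstudio-server/bin/rstudio-server start"

# Append START_LINE to the given file only if it is not already there,
# so running this more than once does not duplicate the entry.
add_startup_line() {
  file="$1"
  if ! grep -qxF "$START_LINE" "$file" 2>/dev/null; then
    printf '%s\n' "$START_LINE" >> "$file"
  fi
}

# Demonstration on a scratch file. On the instance you would run this as
# root with /etc/rc.d/rc.local as the argument instead.
demo_file="$(mktemp)"
add_startup_line "$demo_file"
add_startup_line "$demo_file"   # second call changes nothing
cat "$demo_file"
rm -f "$demo_file"
```

Two caveats that may or may not apply to your distribution: on some systems rc.local has to be executable before it runs at boot, and if the file ends with `exit 0` the new line has to go above it.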

Then I went back to the AWS console, stopped this instance, and created a new AMI (image) of it. The change above now applies to every instance that I create from this AMI: the RStudio server is started as soon as the instance boots and is accessible through the browser.

Now I can use multiple RStudio instances in different tabs of my browser. Make sure you are using the correct instance address in each tab. The port number stays the same for all of them: 8787.
