The CloudLab Manual
CloudLab is a "meta-cloud"—
The current CloudLab deployment consists of more than 25,000 cores distributed across three sites at the University of Wisconsin, Clemson University, and the University of Utah. CloudLab interoperates with existing testbeds, including GENI and Emulab, to take advantage of hardware at dozens of sites around the world.
The control software for CloudLab is open source, and is built on the foundation established for Emulab, GENI, and Apt. Pointers to the details of this control system can be found on CloudLab’s technology page.
1 CloudLab Status Notes
CloudLab is open! The hardware currently deployed comprises our Phase I deployment and most of Phase II, and we are still constantly adding new features.
2 Getting Started
This chapter will walk you through a simple experiment on CloudLab and introduce you to some of its basic concepts.
Start by pointing your browser at https://www.cloudlab.us/.
- Log in: You’ll need an account to use CloudLab. If you already have an account on Emulab.net, you may use that username and password. Or, if you have an account at the GENI portal, you may use the “GENI User” button to log in using that account. If not, you can apply to start a new project at https://www.cloudlab.us/signup.php by taking the "Start New Project" option. See the chapter about CloudLab users for more details about user accounts.
- Start Experiment: From the top menu, click “Experiments” and then “Start Experiment” to begin.
- Experiment Wizard: Experiments must be configured before they can be instantiated. A short wizard guides you through the process. The first step is to pick a profile for your experiment. A profile describes a set of resources (both hardware and software) that will be used to start your experiment. On the hardware side, the profile will control whether you get virtual machines or physical ones, how many there are, and what the network between them looks like. On the software side, the profile specifies the operating system and installed software. Profiles come from two sources. Some of them are provided by CloudLab itself, and provide standard installations of popular operating systems, software stacks, etc. Others are created by other researchers and may contain research software, artifacts and data used to gather published results, etc. Profiles represent a powerful way to enable repeatable research. Clicking the “Change Profile” button will let you select the profile that your experiment will be built from.
- Select a profile: On the left side is the profile selector which lists the profiles you can choose. The list contains both globally accessible profiles and profiles accessible to the projects you are part of. The large display in this dialog box shows the network topology of the profile, and a short description sits below the topology view. The OpenStack profile will give you a small OpenStack installation with one master node and one compute node. It provides a simple example of how complex software stacks can be packaged up within CloudLab. If you’d prefer to start from bare metal, look for one of the profiles that installs a stock operating system on physical machines.
- Choose Parameters: Some profiles are simple and provide the same topology every time they are instantiated. But others, like the OpenStack profile, are parameterized and allow users to make choices about how they are instantiated. The OpenStack profile allows you to pick the number of compute nodes, the hardware to use, and many more options. The creator of the profile chooses which options to allow and provides information on what those options mean. Just mouse over a blue ’?’ to see a description of an option. For now, stick with the default options and click “Next” to continue.
- Pick a cluster: CloudLab can instantiate profiles on several different backend clusters. The cluster selector is located right above the “Create” button; the cluster most suited to the profile you’ve chosen will be selected by default.
- Click Create! When you click the “Create” button, CloudLab will start preparing your experiment by selecting nodes, installing software, etc. as described in the profile. What’s going on behind the scenes is that on one (or more) of the machines in one of the CloudLab clusters, a disk is being imaged, VMs and/or physical machines booted, accounts created for you, etc. This process usually takes a couple of minutes.
- Use your experiment: When your experiment is ready to use, the progress bar will be complete, and you’ll be given a lot of new options at the bottom of the screen. The “Topology View” shows the network topology of your experiment (which may be as simple as a single node). Clicking on a node in this view brings up a terminal in your browser that gives you a shell on the node. The “List View” lists all nodes in the topology, and in addition to the in-browser shell, gives you the command to log in to the node via ssh (if you provided a public key). The “Manifest” tab shows you the technical details of the resources allocated to your experiment. Any open terminals you have to the nodes show up as tabs on this page. Clicking on the “Profile Instructions” link (if present) will show instructions provided by the profile’s creator regarding its use. Your experiment is yours alone, and you have full “root” access (via the sudo command). No one else has access to the nodes in your experiment, and you may do anything at all inside of it, up to and including making radical changes to the operating system itself. We’ll clean it all up when you’re done! If you used our default OpenStack profile, the instructions will contain a link to the OpenStack web interface. The instructions will also give you a username and password to use. Since you gave CloudLab an ssh public key as part of account creation, you can log in using the ssh client on your laptop or desktop. The controller node is a good place to start, since you can poke around with the OpenStack admin commands. Go to the "list view" on the experiment page to get a full command line for the ssh command. Your experiment will terminate automatically after a few hours. When the experiment terminates, you will lose anything on disk on the nodes, so be sure to copy off anything important early and often. You can use the “Extend” button to submit a request to hold it longer, or the “Terminate” button to end it early.
2.1 Next Steps
- Try a profile that runs bare metal and set up a cloud stack yourself.
- Making your own profiles is easy: see the chapter on profile creation for instructions.
- If you need help, or have questions or comments about CloudLab’s features, contact us!
3 CloudLab Users
Registering for an account is quick and easy. Registering doesn’t cost anything; it’s simply for accountability. We just ask that, if you’re going to use CloudLab for anything other than light use, you tell us a bit more about who you are and what you want to use CloudLab for.
Users in CloudLab are grouped into projects: a project is a (loosely defined) group of people working together on some common goal, whether that be a research project, a class, etc. CloudLab places a lot of trust in project leaders, including the ability to authorize others to use CloudLab. We therefore require that project leaders be faculty, senior research staff, or others in relatively senior positions.
3.1 GENI Users
If you already have a GENI account, you may use it instead of creating a new CloudLab account. On the login page, select the “GENI User” button. You will be taken to a page like the one below to select where you normally log into your GENI account.
From here, you will be taken to the login page of your GENI federate; for example, the login page for the GENI portal is shown below.
After you log in, you will be asked to authorize the CloudLab portal to use this account on your behalf. If your certificate at your GENI aggregate has a passphrase on it, you may be asked to enter that passphrase; if not (as is the case with the GENI portal), you will simply see an “authorize” button as below:
That’s it! When you log in a second time, some of these steps may be skipped, as your browser has them cached.
3.2 Register for an Account
To get an account on CloudLab, you either join an existing project or create a new one. In general, if you are a student, you should join a project led by a faculty member with whom you’re working.
If you already have an account on Emulab.net, you don’t need to sign up for a new account on CloudLab; you can simply log in with your Emulab username and password.
3.2.1 Join an existing project
To join an existing project, simply use the “Sign Up” button found on every CloudLab page. The form will ask you a few basic questions about yourself and the institution you’re affiliated with.
An SSH public key is required; if you’re unfamiliar with creating and using ssh keypairs, we recommend taking a look at the first few steps in GitHub’s guide to generating SSH keys. (Obviously, the steps about how to upload the keypair into GitHub don’t apply to CloudLab.)
CloudLab will send you email to confirm your address; watch for it (it might end up in your spam folder), as your request won’t be processed until you’ve confirmed your address.
3.2.2 Create a new project
You should only start a new project if you are a faculty member, senior research staff, or in some other senior position. Students should ask their advisor or course instructor to create a new project.
To start a new project, use the “Sign Up” button found on every CloudLab page. In addition to basic information about yourself, the form will ask you a few questions about how you intend to use CloudLab. The application will be reviewed by our staff, so please provide enough information for us to understand the research or educational value of your project. The review process may take a few days, and you will receive mail informing you of the outcome.
Every person working in your project needs to have their own account. You get to approve these additional users yourself (you will receive email when anyone applies to join). It is common, for example, for a faculty member to create a project which is primarily used by his or her students, who are the ones who run experiments. We still require that the project leader be the faculty member, because we need someone in a position of authority to contact if there are questions about the activities of the project.
Note that projects in CloudLab are publicly listed: a page that allows users to see a list of all projects and search through them does not exist yet, but it will in the future.
4 CloudLab and Repeatable Research
One of CloudLab’s key goals is to enable repeatable research: experiments that other researchers can repeat in the same environment, and whose results can be meaningfully compared.
CloudLab is designed as a scientific instrument. It gives full visibility into every aspect of the facility, and it’s designed to minimize the impact that simultaneous slices have on each other. This means that researchers using CloudLab can fully understand why their systems behave the way they do, and can have confidence that the results that they gather are not artifacts of competition for shared hardware resources. CloudLab profiles can also be published, giving other researchers the exact same environment—hardware and software—on which to repeat experiments and compare results.
CloudLab gives exclusive access to compute resources to one experiment at a time. (That experiment may involve re-exporting those resources to other users, for example, by running cloud services.) Storage resources attached to those compute resources (e.g., local disk) are also used by a single experiment at a time, and it is possible to run experiments that have exclusive access to switches.
5 Creating Profiles
In CloudLab, a profile captures an entire cloud environment: the software artifacts that make up the cloud, together with a description of the hardware (physical or virtual) that it runs on.
When you create a new profile, you are creating a new RSpec and, usually, one or more disk images that are referenced by that RSpec. When someone uses your profile, they will get their own experiment that boots up the resources (virtual or physical) described by the RSpec. It is common to create the resource specification for a profile using a GUI or by writing a Python script rather than dealing with RSpecs directly.
5.1 Creating a profile from an existing one
The easiest way to create a new profile is by cloning or copying an existing one and customizing it to your needs. The basic steps are:
When you clone an experiment, you are taking an existing experiment, including a snapshot of the disk, and creating a new profile based on it. The new profile will be identical to the profile that experiment was based on in all other respects. Cloning only works on experiments with a single node.
If you copy a profile, you are creating a new profile that is identical in every way to an existing profile. You may or may not have a running experiment using the source profile; if you do, that experiment does not affect the copy. After copying a profile, you can modify it for your own use, and if you instantiate the copy, you can take snapshots of disk images and use them in future versions of your copy. Any profile that you have access to may be copied.
5.1.1 Preparation and precautions
To create profiles, you need to be a registered user.
Cloning a profile can take a while, so we recommend that you extend your experiment while creating it, and contact us if you are worried your experiment might expire before you’re done creating your profile. We also strongly recommend testing your profile fully before terminating the experiment you’re creating it from.
When cloning, your home directory is not included in the disk image snapshot! You will need to install your code and data elsewhere in the image. We recommend /local/. Keep in mind that others who use your profile are going to have their own accounts, so make sure that nothing in your image makes assumptions about the username, home directory, etc. of the user running it.
When cloning, be aware that only the contents of the disk (not running processes, etc.) are stored as part of the profile, and, as part of the creation process, your node(s) will be rebooted in order to take consistent snapshots of the disk.
For the time being, cloning only works for single-node profiles; we will add support for multi-node profiles in the future.
When copying a profile, remember that the disk images of a currently running experiment are not saved. If you want to customize the disk images using copy, you must copy the profile first, then instantiate your copy, then take snapshots of the modified disk image in your experiment.
5.1.2 Cloning a Profile
- Create an experiment: Create an experiment using the profile that is most similar to the one you want to build. Usually, this will be one of our facility-provided profiles with a generic installation of Linux.
- Set up the node the way you want it: Log into the node and install your software, datasets, packages, etc. Note the caveat above that it needs to be installed somewhere outside of your home directory, and should not be tied to your user account.
- Clone the experiment to create a new profile: While you are logged in, the experiment page for your active experiment will have a “clone” button. Clicking this button will create a new profile based on your running experiment. Specifically, the button creates a copy of the RSpec used to create the experiment, passes it to the form used to create new profiles, and helps you create a disk image from your running experiment.
- Create Profile: You will be taken to the profile creation form and should fill it out as described below.
5.1.3 Copying a Profile
- Choose a profile: Find the profile you wish to copy using the “Start Experiment” selector. Then you can click “Show Profile” to copy the profile directly, or you can instantiate the profile if you wish to create an experiment first. Both profiles themselves and active experiments can be copied.
- Copy the profile or experiment: While logged in, both your experiment page and the show profile page will have a copy button. Clicking this button will create a profile based on that profile or experiment. This button only copies the RSpec or geni-lib script; no state in the active experiment is preserved.
- Create Profile: You will be taken to the profile creation form and should fill it out as described below.
5.1.4 Creating the Profile
After copying or cloning a profile (see above) or selecting the menu option to create a new profile from scratch, you will need to fill out the profile creation form in order to complete the creation process.
- Fill out information for the new profile: After clicking on the “clone” button, you will see a form that allows you to view and edit the basic information associated with your profile. Each profile must be associated with a project. If you’re a member of more than one project, you’ll need to select which one you want the profile to belong to. Make sure to edit the profile’s Description and Instructions. The “Description” is the text that users will see when your profile is listed in CloudLab, when the user is selecting which profile to use. It is also displayed when following a direct link to your profile. It should give the reader a brief description of what they will get if they create an experiment with this profile. If the profile is associated with a paper or other publication, this is a good place to mention that. Markdown markup, including hyperlinks, is allowed in the profile description. The “Instructions” text is displayed on the experiment page after the user has created an experiment using the profile. This is a good place to tell them where the code and data can be found, what scripts they might want to run, etc. Again, Markdown is allowed. The “Steps” section allows you to create a “tour” of your profile, which is displayed after a user creates an experiment with it. This feature is mostly useful if your profile contains more than one node and you wish to explain the purpose of each node to the user. You have the option of making your profile usable to anyone, only registered CloudLab users, or members of your project. Regardless of the setting you choose here, CloudLab will also give you a direct link that you can use to share your profile with others of your choosing.
- Click “Create”: When you click the “Create” button, your node will be rebooted, so that we can take a consistent snapshot of the disk contents. This process can take several minutes or longer, depending on the size of your disk image. You can watch the progress on this page. When the progress bar reaches the “Ready” stage, your new profile is ready! It will now show up in your “My Profiles” list.
- Test your profile: Before terminating your experiment (or letting it expire), we strongly recommend testing out the new profile. If you elected to make it publicly visible, it will be listed in the profile selection dialog on the front page of https://www.cloudlab.us/. If not, you can instantiate it from the listing on your “My Profiles” page. If the profile will be used by guest users, we recommend testing it as one yourself: log out, and instantiate it using a different username (you will also have to use an alternate email address).
- Share your profile: Now that your profile is working, you can share it with others by sending them direct links, putting links on your webpage or in papers, etc. See “Sharing Profiles” for more details.
5.1.5 Updating a profile
You can update the metadata associated with a profile at any time by going to the “My Profiles” page and clicking on the name of the profile to go to the profile page. On this page, you can edit any of the text fields (Description, Instructions, etc.), change the permissions, etc.
As with cloning a profile, this snapshot feature currently only works with single-node profiles.
If you need to update the contents of the disk image in the profile, simply create a new experiment from the profile. Once your experiment is ready, you will see a “Snapshot” button on the experiment page. (You will only see this button on experiments created from profiles that you own.) Log into your node, get the disk changed the way you want, and click the button.
This button kicks off the same image creation process that occurs during cloning a profile. Once it’s finished, any new experiments created from the profile will use the new image.
As with creating a new profile, we recommend testing the profile before letting your experiment expire. If something goes wrong, we do keep one previous image file for each profile; currently, the only way to get access to this backup is to contact us.
5.2 Creating a profile with a GUI
CloudLab embeds the Jacks GUI for simple creation of small profiles. Jacks can be accessed by clicking the “topology” button on the profile creation or editing page. Jacks is designed to be simple, and to ensure that the topologies drawn can be instantiated on the hardware available. Thus, after making certain choices (such as picking an operating system image) you may find that other choices (such as the node type) become limited.
Jacks has a “palette” on the left side, giving the set of node types (such as physical or virtual machines) that are available. Dragging a node from this palette onto the larger canvas area on the right adds it to the topology. To create a link between nodes, move the mouse near the first node, and a small black line will appear. Click and drag to the second node to complete the link. To create a LAN (multi-endpoint link), create a link between two nodes, then drag links from other nodes to the small grey box that appears in the middle of the original link.
To edit the properties of a node or link, select it by clicking on its icon on the canvas. The panel on the left side will be replaced by a property editor that will allow you to set the disk image for the node, set commands to be run when the node boots, etc. To unselect the current node or link, and return to the palette on the left, simply click a blank area of the canvas.
5.3 Repository-Based Profiles
You can turn any public git repository (including those hosted on GitHub) into a CloudLab profile. Simply place a geni-lib script named profile.py into the top-level directory of your repository. When you create a new profile, you can provide the URL for your repository. The URL needs to be a http:// or https:// URL, and the CloudLab portal needs to be able to clone the repository without authentication.
Note that CloudLab is not a git hosting service; while we do keep a cache of your repository, we don’t guarantee that the profile will continue to work if the original repository becomes unavailable. We also have limits on the size of the repositories that we will clone.
When you instantiate a repository-based profile, the repository will be cloned into the directory /local/repository on all nodes in the experiment. This means that you can keep source code, startup scripts, etc. in your repository and reference them from profile.py. CloudLab sets the pull URL for all of these clones to be your “upstream” repository, and attempts to set a suitable push URL as well (it assumes that the hosting service uses ssh for pushes, and uses the git@<hostname>:user/repo convention). As a result, git pull and git push should be connected to your repository.
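To make this concrete, here is a minimal sketch of what a repository-based profile.py might look like. The setup.sh script it runs is hypothetical and stands in for whatever startup logic your repository actually contains; the only assumption the sketch relies on is that the repository is cloned to /local/repository on each node, as described above.

"""A sketch of a repository-based profile.

Instructions:
Wait for the node to boot; the startup script runs automatically.
"""

import geni.portal as portal
import geni.rspec.pg as rspec

request = portal.context.makeRequestRSpec()

node = request.RawPC("node")
# The repository (including this profile.py) is cloned to /local/repository
# on every node, so scripts and data can be referenced from that path.
# setup.sh is a hypothetical script at the top level of your repository.
node.addService(rspec.Execute(shell="bash",
                              command="/local/repository/setup.sh"))

portal.context.printRequestRSpec()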
There is an example repository on GitHub at https://github.com/emulab/my-profile; if you don’t already have a git repository created, a good way to get started is to fork this one and create a new profile pointing at your fork.
Pushing to your repository is still governed by the authentication and permissions of your git hosting service, so others using your profile will not be able to push to your repository.
5.3.1 Updating Repository-Based Profiles
By default, the CloudLab profile does not automatically update whenever you push to your upstream repository; this means that people instantiating your profile see the repository as it existed at the time CloudLab last pulled from it.
You can manually cause CloudLab to pull from your repository using the “Update” button on the profile management page.
You can also set up CloudLab to automatically pull from your repository whenever it is updated. To do so, you will need to set up a “web hook” on the service that hosts your git repository. CloudLab currently supports webhooks for GitHub.com, BitBucket.org, and sites hosted using GitLab (including both GitLab.com and self-hosted GitLab installations). See the “push URL” in the Repository Info panel on the left side of the profile page for the webhook URL, and use the “information” icon next to it to get specific instructions for setting up the webhook on each service. Once you have set the webhook up, every time you push to your repository, your hosting service will let CloudLab know that it should automatically initiate a pull. (This will not be instantaneous, but should complete quickly in most cases.)
5.3.2 Branches and Tags in Repository-Based Profiles
By default, repository-based profiles will be instantiated from the master branch. At the bottom of the profile page, you will also find a list of all branches and tags in the repository, and can instantiate the version contained in any of them. Branches can be used for development work that is not yet ready to become the master (default) version of the profile, and tags can be used to mark specific versions of the profile that were used for specific papers or course assignments, for example.
5.4 Creating a profile from scratch
CloudLab profiles are described by GENI RSpecs. You can create a profile directly from an RSpec by using the “Create Profile” option from the “Actions” menu. Note that you cannot edit the text fields until you upload an RSpec, as these fields edit (in-browser) fields in the RSpec.
5.5 Sharing Profiles
If you chose to make your profile publicly visible, it will show up in the main “Select Profile” list on https://www.cloudlab.us/. CloudLab also gives you direct links to your profiles so that you can share them with others, post them on your website, publish them in papers, etc. The link can be found on the profile’s detail page, which is linked from your “My Profiles” page. If you chose to make your profile accessible to anyone, the link will take the form https://www.cloudlab.us/p/<project-id>/<profile-id>. If you didn’t make the profile public, the URL will have the form https://www.cloudlab.us/p/<UUID>, where the UUID is a 128-bit number so that the URL is not guessable. You can still share this URL with anyone you want to have access to the profile; anyone who has the link will be able to use it.
5.6 Versioned Profiles
Profiles are versioned to capture the evolution of a profile over time. When you update a profile, the result is a new version that does not (entirely) replace the profile being updated.
When sharing a profile, you are given two links to share. One link will take the user to the most recent version of the profile that exists at the time they click the link; this is the most appropriate option in most cases. There is also a link that takes the user to a specific version of the profile. This link is most useful for publication in papers or other cases in which reproducibility with the exact same environment is a concern.
6 Basic Concepts
This chapter covers the basic concepts that you’ll need to understand in order to use CloudLab.
6.1 Profiles
A profile encapsulates everything needed to run an experiment. It consists of two main parts: a description of the resources (hardware, storage, network, etc.) needed to run the experiment, and the software artifacts that run on those resources.
The resource specification is in the RSpec format. The RSpec describes an entire topology: this includes the nodes (hosts) that the software will run on, the storage that they are attached to, and the network that connects them. The nodes may be virtual machines or physical servers. The RSpec can specify the properties of these nodes, such as how much RAM they should have, how many cores, etc., or can directly reference a specific class of hardware available in one of CloudLab’s clusters. The network topology can include point-to-point links, LANs, etc., and may be built from either Ethernet or InfiniBand.
The primary way that software is associated with a profile is through disk images. A disk image (often just called an “image”) is a block-level snapshot of the contents of a real or virtual disk, which can be loaded onto the nodes in an experiment.
Profiles come from two sources: some are provided by CloudLab itself; these tend to be standard installations of popular operating systems and software stacks. Profiles may also be provided by CloudLab’s users, as a way for communities to share research artifacts.
6.1.1 On-demand Profiles
Profiles in CloudLab may be on-demand profiles, which means that they are designed to be instantiated for a relatively short period of time (hours or days). Each person instantiating the profile gets their own experiment, so everyone using the profile is doing so independently on their own set of resources.
6.1.2 Persistent Profiles
CloudLab also supports persistent profiles, which are longer-lived (weeks or months) and are set up to be shared by multiple users. A persistent profile can be thought of as a “testbed within a testbed”; examples include:
- An instance of a cloud software stack, providing VMs to a large community
- A cluster set up with a specific software stack for a class
- A persistent instance of a database or other resource used by a large research community
- Machines set up for a contest, giving all participants access to the same hardware
- An HPC cluster temporarily brought up for the running of a particular set of jobs
A persistent profile may offer its own user interface, and its users may not necessarily be aware that they are using CloudLab. For example, a cloud-style profile might directly offer its own API for provisioning virtual machines. Or, an HPC-style persistent profile might run a standard cluster scheduler, which users interact with rather than the CloudLab website.
6.2 Experiments
See the chapter on repeatability for more information on repeatable experimentation in CloudLab.
An experiment is an instantiation of a profile. An experiment uses resources, virtual or physical, on one or more of the clusters that CloudLab has access to. In most cases, the resources used by an experiment are devoted to the individual use of the user who instantiates the experiment. This means that no one else has an account, access to the filesystems, etc. In the case of experiments using solely physical machines, this also means strong performance isolation from all other CloudLab users. (The exceptions to this rule are persistent profiles, which may offer resources to many users.)
Running experiments on CloudLab consumes real resources, which are limited. We ask that you not hold on to experiments when you are not actively using them. If you are holding on to experiments because getting your working environment set up takes time, consider creating a profile.
The contents of local disk on nodes in an experiment are considered ephemeral; they will be lost when the experiment terminates, so be sure to copy off any important data before your experiment expires.
All experiments have an expiration time. By default, the expiration time is short (a few hours), but users can use the “Extend” button on the experiment page to request an extension. A request for an extension must be accompanied by a short description that explains the reason for requesting an extension, which will be reviewed by CloudLab staff. You will receive email a few hours before your experiment expires reminding you to copy your data off or request an extension.
6.2.1 Extending Experiments
If you need more time to run an experiment, you may use the “Extend” button on the experiment’s page. You will be presented with a dialog that allows you to select how much longer you need the experiment. Longer time periods require more extensive approval processes. Short extensions are auto-approved, while longer ones require the intervention of CloudLab staff or, in the case of indefinite extensions, the steering committee.
6.3 Projects
Users are grouped into projects. A project is, roughly speaking, a group of people working together on a common research or educational goal. This may be people in a particular research lab, a distributed set of collaborators, instructors and students in a class, etc.
A project is headed by a project leader. We require that project leaders be faculty, senior research staff, or others in an authoritative position. This is because we trust the project leader to approve other members into the project, ultimately making them responsible for the conduct of the users they approve. If CloudLab staff have questions about a project’s activities, its use of resources, etc., these questions will be directed to the project leader. Some project leaders run a lot of experiments themselves, while some choose to approve accounts for others in the project, who run most of the experiments. Either style works just fine in CloudLab.
Permissions for some operations and objects depend on the project that they belong to. Currently, the only such permission is the ability to make a profile visible only to the owning project. We expect to introduce more project-specific permission features in the future.
6.4 Physical Machines
Users of CloudLab may get exclusive, root-level control over physical machines. When allocated this way, no layers of virtualization or indirection get in the way of performance, and users can be sure that no other users have access to the machines at the same time. This is an ideal situation for repeatable research.
Physical machines are re-imaged between users, so you can be sure that your physical machines don’t have any state left around from the previous user. You can find descriptions of the hardware in CloudLab’s clusters in the hardware chapter.
6.5 Virtual Machines and Containers
While CloudLab does have the ability to provision virtual machines (using the Xen hypervisor) and containers (using Docker), we expect that the dominant use of CloudLab is that users will provision physical machines. Users (or the cloud software stacks that they run) may build their own virtual machines on these physical nodes using whatever hypervisor they wish. However, if your experiment could still benefit from use of virtual machines or containers (e.g. to form a scalable pool of clients issuing requests to your cloud software stack), you can find more detail in the advanced topics section.
7 Resource Reservations
CloudLab supports reservations that allow you to request resources ahead of time. This can be useful for tutorials, classes, and to run larger experiments than are typically possible on a first-come, first-served basis.
Reservations in CloudLab are per-cluster and per-type. They are tied to a project: any experiment belonging to that project may use the reserved nodes. Reservations are not tied to specific nodes; this gives CloudLab maximum flexibility to do late binding of nodes to reservations, which makes them minimally intrusive on other users of the testbed.
7.1 What Reservations Guarantee
Having a reservation guarantees that, at minimum, the specified quantity of nodes of the specified type will be available for use by the project during the specified time window.
Having a reservation does not automatically start an experiment at that time: it ensures that the specified number of nodes are available for use by any experiments that you or your fellow project members start.
More than one experiment may use nodes from the reservation; for example, a tutorial in which 40 students each will run an experiment having a single node may be handled as a single 40-node reservation. You may also start and terminate multiple experiments in series over the course of the reservation: reserved nodes will not be returned to general use until your reservation ends.
A reservation guarantees the minimum number of nodes that will be available; you may use more so long as they are not used by other experiments or reservations.
Experiments run during a reservation do not automatically terminate at the end of the reservation; they simply become subject to the normal resource usage policies, and, for example, may become non-extendable due to other reservations that start after yours.
Important caveats include:
- Nodes can take several minutes to be freed and reloaded between experiments; this means that they may take a few minutes to be available at the beginning of your reservation, and if you terminate an experiment during your reservation, they may take a few minutes to be usable for your next experiment.
- The reservation system cannot account for factors outside of its control, such as hardware failures; this may result in occasional failures to get the full number of nodes in exceptional circumstances.
- The reservation system ensures that enough nodes of the specified type are available, but does not consider other factors such as network topology, and so cannot guarantee that all possible experiments can be started, even if they fit within the number of nodes.
7.2 How Reservations May Affect You
Reservations held by others may affect your experiments in two ways: they may prevent you from creating new experiments or may prevent you from extending existing experiments. This “admission control system” is how we ensure that nodes are available for those that have them reserved.
If there is an ongoing or upcoming reservation by another project, you may encounter an “admission control” failure when trying to create a new experiment. This means that, although there are enough nodes that are not currently allocated to a particular experiment, some or all of those nodes are required in order to fulfill a reservation. Note that the admission control system assumes that your experiment will last for the full default experiment duration when making this calculation. For example, if the default experiment duration is 24 hours, and a large reservation will start in 10 hours, your experiment may fail to be created due to the admission control system. If the large reservation starts in 30 hours, you will be able to create the experiment, but you may not be able to extend it.
Reservations can also prevent you from extending existing experiments, if that extension would cause too few nodes to be available to satisfy a reservation. A message will appear on the experiment’s status page warning you when this situation will occur in the near future, and the extension dialog will limit the length of extension that you can request. If this happens, be sure to save all of your work, as the administrators cannot grant extensions that would interfere with reservations.
7.3 Making a Reservation
To request a reservation, use the “Reserve Nodes” item from the “Actions” menu.
After filling out the number and type of nodes and the time period, use the “Check” button to see whether the reservation is possible. If your request is satisfiable, you will get a dialog box that lets you submit the request.
If your request is not satisfiable, you will be given a chance to modify the request and “check” again. In this case, the time at which there will not be enough nodes is shown, as is the number of nodes by which the request exceeds available resources. To make your reservation fit, try asking for a different type of node, a smaller number, or a time further in the future.
Not all reservation requests are automatically accepted. Your request will be shown as “pending” while it is being reviewed by the CloudLab administrators. Requesting smaller numbers of nodes, or shorter periods of time, will maximize the chances that your request is accepted. Be sure to include meaningful text in the “Reason” field, as administrators will use this to determine whether to grant your reservation.
You may have more than one reservation at a time; if you need resources of more than one type, or on different clusters, you can get this by requesting multiple reservations.
7.4 Using a Reservation
To use a reservation, simply create experiments as normal. Experiments run during the duration of the reservation (even those begun before its start time) are automatically counted towards the reservation. Experiments run during reservations have expiration times as do normal experiments, so be sure to extend them if necessary.
Since reservations are per-project, if you belong to more than one, make sure to create the experiment under the correct project.
Experiments are not automatically terminated at the conclusion of a reservation (though it may not be possible to extend them due to other reservations). Remember to terminate your experiments when you are done with them, as you would do normally.
8 Describing a profile with python and geni-lib
geni-lib is a tool that allows users to generate RSpec files from Python code. CloudLab offers the ability to use geni-lib scripts as the definition of a profile, rather than the more primitive RSpec format. When you supply a geni-lib script on the Create Profile page, your script is uploaded to the server so that it can be executed in the geni-lib environment. This allows the script to be verified for correctness, and also produces the equivalent RSpec representation that you can view if you so desire.
When you provide a geni-lib script, you will see a slightly different set of buttons on the Create Profile page; next to the “Source” button there is an “XML” button that will pop up the RSpec XML for you to look at. The XML is read-only; if you want to change the profile, you will need to change the Python source code that is displayed when you click on the “Source” button. Each time you change the Python source code, the script is uploaded to the server and processed. Be sure to save your changes if you are updating an existing profile.
The following examples demonstrate basic geni-lib usage. More information about geni-lib and additional examples, can be found in the geni-lib repository. The current version of geni-lib used by CloudLab can be found in the 0.9-EMULAB branch. Its full documentation is online as part of this manual.
8.1 A single XEN VM node
"""An example of constructing a profile with a single Xen VM. Instructions: Wait for the profile instance to start, and then log in to the VM via the ssh port specified below. (Note that in this case, you will need to access the VM through a high port on the physical host, since we have not requested a public IP address for the VM itself.) """ # Import the Portal object. import geni.portal as portal import geni.rspec.pg as rspec # Create a Request object to start building the RSpec. request = portal.context.makeRequestRSpec() # Add a XenVM (named "node") to the request node = request.XenVM("node") # Write the request in RSpec format portal.context.printRequestRSpec() Open this profile on CloudLab"""An example of constructing a profile with a single Xen VM. Instructions: Wait for the profile instance to start, and then log in to the VM via the ssh port specified below. (Note that in this case, you will need to access the VM through a high port on the physical host, since we have not requested a public IP address for the VM itself.) """ # Import the Portal object. import geni.portal as portal import geni.rspec.pg as rspec # Create a Request object to start building the RSpec. request = portal.context.makeRequestRSpec() # Add a XenVM (named "node") to the request node = request.XenVM("node") # Write the request in RSpec format portal.context.printRequestRSpec()
This example demonstrates the two most important objects: the portal context (accessed through the portal.context object in the geni.portal module), and the request RSpec created by calling makeRequestRSpec() on it. These fundamental objects are central to essentially all CloudLab geni-lib profiles.
Another way to create a Request RSpec object is to call its constructor, geni.rspec.pg.Request, directly. We ask the Context to create it for us so that it is "bound" to the context and does not need to be explicitly passed to other functions on the context.
Once the request object has been created, resources may be added to it by calling methods on it like RawPC() or rspec.pg.LAN. In this example, just a single node (created with the XenVM() constructor, asking for a single VM identified by the name "node") is requested.
Most functions called on Request objects are not directly members of that class. Rather, they are loaded as "extensions" by modules such as geni.rspec.emulab.
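As a minimal sketch of this extension-loading pattern (not tied to any particular profile), importing the module is all that is needed; the module itself typically is never referenced again by name in the script:

import geni.rspec.pg as rspec
# A side-effect import: loading geni.rspec.emulab registers additional
# methods and attributes on the Request and Node classes.
import geni.rspec.emulab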
The final action the geni-lib script performs is to generate the XML representation of the request RSpec, with the printRequestRSpec() call on the last line. This has the effect of communicating the description of all the resources requested by the profile back to CloudLab.
You will also notice that the profile begins with a string literal (to be precise, it is a Python docstring). The initial text will also be used as the profile description; the text following the Instructions: line will be used as the corresponding instructions. This documentation is so important that adding the description to the profile is mandatory. (Using a docstring like this is not the only way to produce the description and instructions, although it is the most convenient.)
This simple example has now demonstrated all the important elements of a geni-lib profile. The portal context and request RSpec objects, the final printRequestRSpec() call, and the docstring description and instructions are “boilerplate” constructions, and you will probably include similar or identical versions of them in every geni-lib profile you create unless you are doing something quite unusual.
8.2 A single physical host
"""An example of constructing a profile with a single raw PC. Instructions: Wait for the profile instance to start, and then log in to the host via the ssh port specified below. """ import geni.portal as portal import geni.rspec.pg as rspec # Create a Request object to start building the RSpec. request = portal.context.makeRequestRSpec() # Create a raw PC node = request.RawPC("node") # Print the RSpec to the enclosing page. portal.context.printRequestRSpec() Open this profile on CloudLab"""An example of constructing a profile with a single raw PC. Instructions: Wait for the profile instance to start, and then log in to the host via the ssh port specified below. """ import geni.portal as portal import geni.rspec.pg as rspec # Create a Request object to start building the RSpec. request = portal.context.makeRequestRSpec() # Create a raw PC node = request.RawPC("node") # Print the RSpec to the enclosing page. portal.context.printRequestRSpec()
As mentioned above, most of these simple examples consist of boilerplate geni-lib fragments, and indeed the portal context and request RSpec operations are unchanged from the previous script. The big difference (other than the updated documentation) is that in this case the RawPC() method is invoked on the Request object instead of XenVM(). As you might expect, the new profile will request a physical host instead of a virtual one. (A side effect of using a real machine is that it automatically comes with a unique public IP address, whereas the VM used in the earlier example did not. Profiles can request public IP addresses for VMs too, though it does not happen by default.)
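As a rough sketch of how a profile might ask for a public address on a VM (this assumes the routable_control_ip node attribute made available by the geni-lib extensions; it is not part of the example above):

import geni.portal as portal
import geni.rspec.pg as rspec

request = portal.context.makeRequestRSpec()

node = request.XenVM("node")
# Request a publicly routable control-network address for this VM.
# (routable_control_ip is assumed to be provided by the geni-lib extensions.)
node.routable_control_ip = True

portal.context.printRequestRSpec()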
8.3 Two XenVM nodes with a link between them
"""An example of constructing a profile with two VMs connected by a LAN. Instructions: Wait for the profile instance to start, and then log in to either VM via the ssh ports specified below. """ import geni.portal as portal import geni.rspec.pg as rspec request = portal.context.makeRequestRSpec() # Create two XenVM nodes. node1 = request.XenVM("node1") node2 = request.XenVM("node2") # Create a link between them link1 = request.Link(members = [node1,node2]) portal.context.printRequestRSpec() Open this profile on CloudLab"""An example of constructing a profile with two VMs connected by a LAN. Instructions: Wait for the profile instance to start, and then log in to either VM via the ssh ports specified below. """ import geni.portal as portal import geni.rspec.pg as rspec request = portal.context.makeRequestRSpec() # Create two XenVM nodes. node1 = request.XenVM("node1") node2 = request.XenVM("node2") # Create a link between them link1 = request.Link(members = [node1,node2]) portal.context.printRequestRSpec()
This example demonstrates two important geni-lib concepts: first, adding more than a single node to the request (which is a relatively straightforward matter of calling more than one node object constructor, being careful to use a different name each time). It also shows how to add links between nodes. It is possible to construct links and LANs in a more complicated manner (such as explicitly creating Interface objects to control interfaces), but the simplest case is to supply the member nodes at the time the link is created.
8.4 Two ARM64 servers in a LAN
"""An example of constructing a profile with two ARM64 nodes connected by a LAN. Instructions: Wait for the profile instance to start, and then log in to either host via the ssh ports specified below. """ import geni.portal as portal import geni.rspec.pg as rspec request = portal.context.makeRequestRSpec() # Create two raw "PC" nodes node1 = request.RawPC("node1") node2 = request.RawPC("node2") # Set each of the two to specifically request "m400" nodes, which in CloudLab, are ARM node1.hardware_type = "m400" node2.hardware_type = "m400" # Create a link between them link1 = request.Link(members = [node1, node2]) portal.context.printRequestRSpec() Open this profile on CloudLab"""An example of constructing a profile with two ARM64 nodes connected by a LAN. Instructions: Wait for the profile instance to start, and then log in to either host via the ssh ports specified below. """ import geni.portal as portal import geni.rspec.pg as rspec request = portal.context.makeRequestRSpec() # Create two raw "PC" nodes node1 = request.RawPC("node1") node2 = request.RawPC("node2") # Set each of the two to specifically request "m400" nodes, which in CloudLab, are ARM node1.hardware_type = "m400" node2.hardware_type = "m400" # Create a link between them link1 = request.Link(members = [node1, node2]) portal.context.printRequestRSpec()
We now come to demonstrating how to request particular properties of nodes: in this case, the hardware_type property is set on each node to ask CloudLab for a specific type of physical machine (here, the ARM64-based m400 nodes described in the hardware chapter).
8.5 A VM with a custom size
"""An example of constructing a profile with a single Xen VM. Instructions: Wait for the profile instance to start, and then log in to the VM via the ssh port specified below. (Note that in this case, you will need to access the VM through a high port on the physical host, since we have not requested a public IP address for the VM itself.) """ import geni.portal as portal import geni.rspec.pg as rspec # Create a Request object to start building the RSpec. request = portal.context.makeRequestRSpec() # Create a XenVM node = request.XenVM("node") # Ask for two cores node.cores = 2 # Ask for 2GB of ram node.ram = 2048 # Add an extra 8GB of space on the primary disk. # NOTE: Use fdisk, the extra space is in the 4th DOS partition, # you will need to create a filesystem and mount it. node.disk = 8 # Alternate method; request an ephemeral blockstore mounted at /mydata. # NOTE: Comment out the above line (node.disk) if you do it this way. #bs = node.Blockstore("bs", "/mydata") #bs.size = "8GB" #bs.placement = "nonsysvol" # Print the RSpec to the enclosing page. portal.context.printRequestRSpec() Open this profile on CloudLab"""An example of constructing a profile with a single Xen VM. Instructions: Wait for the profile instance to start, and then log in to the VM via the ssh port specified below. (Note that in this case, you will need to access the VM through a high port on the physical host, since we have not requested a public IP address for the VM itself.) """ import geni.portal as portal import geni.rspec.pg as rspec # Create a Request object to start building the RSpec. request = portal.context.makeRequestRSpec() # Create a XenVM node = request.XenVM("node") # Ask for two cores node.cores = 2 # Ask for 2GB of ram node.ram = 2048 # Add an extra 8GB of space on the primary disk. # NOTE: Use fdisk, the extra space is in the 4th DOS partition, # you will need to create a filesystem and mount it. node.disk = 8 # Alternate method; request an ephemeral blockstore mounted at /mydata. # NOTE: Comment out the above line (node.disk) if you do it this way. #bs = node.Blockstore("bs", "/mydata") #bs.size = "8GB" #bs.placement = "nonsysvol" # Print the RSpec to the enclosing page. portal.context.printRequestRSpec()
The earlier examples requesting VMs used the default number of cores, quantity of RAM, and disk size. It’s also possible to customize these values, as this example does by setting the cores, ram, and disk properties of the XenVM class (which is a subclass of rspec.pg.Node).
8.6 Set a specific IP address on each node
"""An example of constructing a profile with node IP addresses specified manually. Instructions: Wait for the profile instance to start, and then log in to either VM via the ssh ports specified below. (Note that even though the EXPERIMENTAL data plane interfaces will use the addresses given in the profile, you will still connect over the control plane interfaces using addresses given by the testbed. The data plane addresses are for intra-experiment communication only.) """ import geni.portal as portal import geni.rspec.pg as rspec request = portal.context.makeRequestRSpec() node1 = request.XenVM("node1") iface1 = node1.addInterface("if1") # Specify the component id and the IPv4 address iface1.component_id = "eth1" iface1.addAddress(rspec.IPv4Address("192.168.1.1", "255.255.255.0")) node2 = request.XenVM("node2") iface2 = node2.addInterface("if2") # Specify the component id and the IPv4 address iface2.component_id = "eth2" iface2.addAddress(rspec.IPv4Address("192.168.1.2", "255.255.255.0")) link = request.LAN("lan") link.addInterface(iface1) link.addInterface(iface2) portal.context.printRequestRSpec() Open this profile on CloudLab"""An example of constructing a profile with node IP addresses specified manually. Instructions: Wait for the profile instance to start, and then log in to either VM via the ssh ports specified below. (Note that even though the EXPERIMENTAL data plane interfaces will use the addresses given in the profile, you will still connect over the control plane interfaces using addresses given by the testbed. The data plane addresses are for intra-experiment communication only.) """ import geni.portal as portal import geni.rspec.pg as rspec request = portal.context.makeRequestRSpec() node1 = request.XenVM("node1") iface1 = node1.addInterface("if1") # Specify the component id and the IPv4 address iface1.component_id = "eth1" iface1.addAddress(rspec.IPv4Address("192.168.1.1", "255.255.255.0")) node2 = request.XenVM("node2") iface2 = node2.addInterface("if2") # Specify the component id and the IPv4 address iface2.component_id = "eth2" iface2.addAddress(rspec.IPv4Address("192.168.1.2", "255.255.255.0")) link = request.LAN("lan") link.addInterface(iface1) link.addInterface(iface2) portal.context.printRequestRSpec()
This code sample assigns specific IP addresses to interfaces on the nodes it requests.
Some of the available qualifiers on requested nodes are specified by manipulating attributes within the node (or interface) object directly. The hardware_type in the previous example is one such case, as is the component_id here. (Note that the component_id in this example is applied to an interface; it is also possible to specify component_ids on nodes to request a particular physical host.)
Other modifications to requests require dedicated methods. For instance, see the addAddress() calls made on each of the two interfaces above. In each case, an IPv4Address object is obtained from the appropriate constructor (the parameters are the address and the netmask, respectively), and then added to the corresponding interface.
8.7 Specify an operating system and set install and execute scripts
"""An example of constructing a profile with install and execute services. Instructions: Wait for the profile instance to start, then click on the node in the topology and choose the `shell` menu item. The install and execute services are handled automatically during profile instantiation, with no manual intervention required. """ # Import the Portal object. import geni.portal as portal # Import the ProtoGENI library. import geni.rspec.pg as rspec # Create a Request object to start building the RSpec. request = portal.context.makeRequestRSpec() # Add a raw PC to the request. node = request.RawPC("node") # Install and execute scripts on the node. THIS TAR FILE DOES NOT ACTUALLY EXIST! node.addService(rspec.Install(url="http://example.org/sample.tar.gz", path="/local")) node.addService(rspec.Execute(shell="bash", command="/local/example.sh")) portal.context.printRequestRSpec()Open this profile on CloudLab"""An example of constructing a profile with install and execute services. Instructions: Wait for the profile instance to start, then click on the node in the topology and choose the `shell` menu item. The install and execute services are handled automatically during profile instantiation, with no manual intervention required. """ # Import the Portal object. import geni.portal as portal # Import the ProtoGENI library. import geni.rspec.pg as rspec # Create a Request object to start building the RSpec. request = portal.context.makeRequestRSpec() # Add a raw PC to the request. node = request.RawPC("node") # Install and execute scripts on the node. THIS TAR FILE DOES NOT ACTUALLY EXIST! node.addService(rspec.Install(url="http://example.org/sample.tar.gz", path="/local")) node.addService(rspec.Execute(shell="bash", command="/local/example.sh")) portal.context.printRequestRSpec()
This example demonstrates how to request services for a node, where CloudLab will automate some task as part of the profile instance setup procedure. In this case, two services are described (an install and an execute). This is a very common pair of services to request together: the Install object describes a service which retrieves a tarball from the location given in the url parameter and installs it into the local filesystem as specified by path. (The installation occurs during node setup, upon the first boot after the disk image has been loaded.) The second service, described by the Execute object, invokes a shell process to run the given command. In this example (as is common), the command refers directly to a file saved by the immediately preceding Install service. This works because CloudLab guarantees that all Install services complete before any Execute services are started. The command executes every time the node boots, so you can use it to start daemons, etc. that are necessary for your experiment.
8.8 Profiles with user-specified parameters
"""An example of using parameters to construct a profile with a variable number of nodes. Instructions: Wait for the profile instance to start, and then log in to one or more of the VMs via the ssh port(s) specified below. """ # Import the Portal object. import geni.portal as portal # Import the ProtoGENI library. import geni.rspec.pg as rspec # Describe the parameter(s) this profile script can accept. portal.context.defineParameter( "n", "Number of VMs", portal.ParameterType.INTEGER, 1 ) # Retrieve the values the user specifies during instantiation. params = portal.context.bindParameters() # Create a Request object to start building the RSpec. request = portal.context.makeRequestRSpec() # Check parameter validity. if params.n < 1 or params.n > 8: portal.context.reportError( portal.ParameterError( "You must choose at least 1 and no more than 8 VMs." ) ) for i in range( params.n ): # Create a XenVM and add it to the RSpec. node = request.XenVM( "node" + str( i ) ) # Print the RSpec to the enclosing page. portal.context.printRequestRSpec() Open this profile on CloudLab"""An example of using parameters to construct a profile with a variable number of nodes. Instructions: Wait for the profile instance to start, and then log in to one or more of the VMs via the ssh port(s) specified below. """ # Import the Portal object. import geni.portal as portal # Import the ProtoGENI library. import geni.rspec.pg as rspec # Describe the parameter(s) this profile script can accept. portal.context.defineParameter( "n", "Number of VMs", portal.ParameterType.INTEGER, 1 ) # Retrieve the values the user specifies during instantiation. params = portal.context.bindParameters() # Create a Request object to start building the RSpec. request = portal.context.makeRequestRSpec() # Check parameter validity. if params.n < 1 or params.n > 8: portal.context.reportError( portal.ParameterError( "You must choose at least 1 and no more than 8 VMs." ) ) for i in range( params.n ): # Create a XenVM and add it to the RSpec. node = request.XenVM( "node" + str( i ) ) # Print the RSpec to the enclosing page. portal.context.printRequestRSpec()
Until now, all of the geni-lib scripts have described profiles which could also have been generated with the Jacks GUI, or even by writing a raw XML RSpec directly. However, geni-lib profiles offer an important feature unavailable through the other methods: the ability to describe not a static request, but a request “template” which is dynamically constructed based on a user’s choices at the time the profile is instantiated. The mechanism for constructing such profiles relies on profile parameters: the geni-lib script describes the set of parameters it will accept, retrieves the corresponding values at instantiation time, and is free to construct arbitrarily different resource requests based on that input.
The profile above accepts exactly one parameter: the number of VMs to create. Each parameter is declared with a defineParameter() call, which gives the parameter’s symbolic name, a human-readable description, its type, and a default value. The available parameter types (members of portal.ParameterType) are:
| INTEGER | Simple integer |
| STRING | Arbitrary (uninterpreted) string |
| BOOLEAN | True or False |
| IMAGE | URN to a disk image |
| AGGREGATE | URN of a GENI Aggregate Manager |
| NODETYPE | String specifying a type of node |
| BANDWIDTH | Floating-point number specifying bandwidth in kbps |
| LATENCY | Floating-point number specifying delay in ms |
| SIZE | Integer used for memory or disk size (e.g., MB, GB, etc.) |
The last field is the default value of the parameter, and is required: not only must the field itself contain a valid value, but the set of all parameters must be valid when each of them assumes the default value. (This is partly so that the portal can construct a default topology for the profile without any manual intervention, and partly so that unprivileged users, who may lack permission to supply their own values, might still be able to instantiate the profile.)
After all parameters have been defined, the profile script may retrieve the runtime values with the bindParameters() method. This will return a Python class instance with one attribute for each parameter (with the name supplied during the appropriate defineParameter() call). In the example, the instance was assigned to params, and therefore the only parameter (which was called "n") is accessible as params.n.
Of course, it may be possible for the user to specify nonsensical values for a parameter, or perhaps give a set of parameters whose combination is invalid. A profile should detect error cases like these, and respond by constructing a portal.ParameterError object, which can be passed to the portal context’s reportError() method to abort generation of the RSpec.
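For example, a profile with two parameters might check that their combination stays within a sensible bound before building the request. The following is only a sketch; the parameter names and the limit of 64 total cores are made up for illustration.

# Import the Portal object and the ProtoGENI library.
import geni.portal as portal
import geni.rspec.pg as rspec

# Two hypothetical parameters, used here only for illustration.
portal.context.defineParameter("nodes", "Number of VMs",
                               portal.ParameterType.INTEGER, 2)
portal.context.defineParameter("coresPerNode", "Cores per VM",
                               portal.ParameterType.INTEGER, 1)
params = portal.context.bindParameters()

# Reject combinations of values that would request too many cores in total.
if params.nodes * params.coresPerNode > 64:
    portal.context.reportError(portal.ParameterError(
        "The product of nodes and coresPerNode may not exceed 64."))

request = portal.context.makeRequestRSpec()
for i in range(params.nodes):
    node = request.XenVM("node" + str(i))
    node.cores = params.coresPerNode
portal.context.printRequestRSpec()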
8.9 Add temporary local disk space to a node
"""This profile demonstrates how to add some extra *local* disk space on your node. In general nodes have much more disk space then what you see with `df` when you log in. That extra space is in unallocated partitions or additional disk drives. An *ephemeral blockstore* is how you ask for some of that space to be allocated and mounted as a **temporary** filesystem (temporary means it will be lost when you terminate your experiment). Instructions: Log into your node, your **temporary** file system in mounted at `/mydata`. """ # Import the Portal object. import geni.portal as portal # Import the ProtoGENI library. import geni.rspec.pg as rspec # Import the emulab extensions library. import geni.rspec.emulab # Create a Request object to start building the RSpec. request = portal.context.makeRequestRSpec() # Allocate a node and ask for a 30GB file system mounted at /mydata node = request.RawPC("node") bs = node.Blockstore("bs", "/mydata") bs.size = "30GB" # Print the RSpec to the enclosing page. portal.context.printRequestRSpec() Open this profile on CloudLab"""This profile demonstrates how to add some extra *local* disk space on your node. In general nodes have much more disk space then what you see with `df` when you log in. That extra space is in unallocated partitions or additional disk drives. An *ephemeral blockstore* is how you ask for some of that space to be allocated and mounted as a **temporary** filesystem (temporary means it will be lost when you terminate your experiment). Instructions: Log into your node, your **temporary** file system in mounted at `/mydata`. """ # Import the Portal object. import geni.portal as portal # Import the ProtoGENI library. import geni.rspec.pg as rspec # Import the emulab extensions library. import geni.rspec.emulab # Create a Request object to start building the RSpec. request = portal.context.makeRequestRSpec() # Allocate a node and ask for a 30GB file system mounted at /mydata node = request.RawPC("node") bs = node.Blockstore("bs", "/mydata") bs.size = "30GB" # Print the RSpec to the enclosing page. portal.context.printRequestRSpec()
This example demonstrates how to request extra temporary disk space on a node. The extra disk space is allocated from unused disk partitions, and is mounted at a directory of your choosing. In the example code above, we are asking for a 30GB file system mounted at "/mydata". Anything you store in this file system is temporary, and will be lost when your experiment is terminated.
The total size of the file systems you can ask for on a node is limited by the amount of unused disk space available. The system does its best to find nodes with enough space to fulfill the request, but in general you are limited to temporary file systems in the tens to a few hundreds of GB.
8.10 Creating a reusable dataset
"""This profile demonstrates how to use a remote dataset on your node, either a long term dataset or a short term dataset, created via the Portal. Instructions: Log into your node, your dataset file system in mounted at `/mydata`. """ # Import the Portal object. import geni.portal as portal # Import the ProtoGENI library. import geni.rspec.pg as rspec # Import the emulab extensions library. import geni.rspec.emulab # Create a Request object to start building the RSpec. request = portal.context.makeRequestRSpec() # Add a node to the request. node = request.RawPC("node") # We need a link to talk to the remote file system, so make an interface. iface = node.addInterface() # The remote file system is represented by special node. fsnode = request.RemoteBlockstore("fsnode", "/mydata") # This URN is displayed in the web interfaace for your dataset. fsnode.dataset = "urn:publicid:IDN+emulab.net:portalprofiles+ltdataset+DemoDataset" # # The "rwclone" attribute allows you to map a writable copy of the # indicated SAN-based dataset. In this way, multiple nodes can map # the same dataset simultaneously. In many situations, this is more # useful than a "readonly" mapping. For example, a dataset # containing a Linux source tree could be mapped into multiple # nodes, each of which could do its own independent, # non-conflicting configure and build in their respective copies. # Currently, rwclones are "ephemeral" in that any changes made are # lost when the experiment mapping the clone is terminated. # #fsnode.rwclone = True # # The "readonly" attribute, like the rwclone attribute, allows you to # map a dataset onto multiple nodes simultaneously. But with readonly, # those mappings will only allow read access (duh!) and any filesystem # (/mydata in this example) will thus be mounted read-only. Currently, # readonly mappings are implemented as clones that are exported # allowing just read access, so there are minimal efficiency reasons to # use a readonly mapping rather than a clone. The main reason to use a # readonly mapping is to avoid a situation in which you forget that # changes to a clone dataset are ephemeral, and then lose some # important changes when you terminate the experiment. # #fsnode.readonly = True # Now we add the link between the node and the special node fslink = request.Link("fslink") fslink.addInterface(iface) fslink.addInterface(fsnode.interface) # Special attributes for this link that we must use. fslink.best_effort = True fslink.vlan_tagging = True # Print the RSpec to the enclosing page. portal.context.printRequestRSpec()Open this profile on CloudLab"""This profile demonstrates how to use a remote dataset on your node, either a long term dataset or a short term dataset, created via the Portal. Instructions: Log into your node, your dataset file system in mounted at `/mydata`. """ # Import the Portal object. import geni.portal as portal # Import the ProtoGENI library. import geni.rspec.pg as rspec # Import the emulab extensions library. import geni.rspec.emulab # Create a Request object to start building the RSpec. request = portal.context.makeRequestRSpec() # Add a node to the request. node = request.RawPC("node") # We need a link to talk to the remote file system, so make an interface. iface = node.addInterface() # The remote file system is represented by special node. fsnode = request.RemoteBlockstore("fsnode", "/mydata") # This URN is displayed in the web interfaace for your dataset. 
fsnode.dataset = "urn:publicid:IDN+emulab.net:portalprofiles+ltdataset+DemoDataset" # # The "rwclone" attribute allows you to map a writable copy of the # indicated SAN-based dataset. In this way, multiple nodes can map # the same dataset simultaneously. In many situations, this is more # useful than a "readonly" mapping. For example, a dataset # containing a Linux source tree could be mapped into multiple # nodes, each of which could do its own independent, # non-conflicting configure and build in their respective copies. # Currently, rwclones are "ephemeral" in that any changes made are # lost when the experiment mapping the clone is terminated. # #fsnode.rwclone = True # # The "readonly" attribute, like the rwclone attribute, allows you to # map a dataset onto multiple nodes simultaneously. But with readonly, # those mappings will only allow read access (duh!) and any filesystem # (/mydata in this example) will thus be mounted read-only. Currently, # readonly mappings are implemented as clones that are exported # allowing just read access, so there are minimal efficiency reasons to # use a readonly mapping rather than a clone. The main reason to use a # readonly mapping is to avoid a situation in which you forget that # changes to a clone dataset are ephemeral, and then lose some # important changes when you terminate the experiment. # #fsnode.readonly = True # Now we add the link between the node and the special node fslink = request.Link("fslink") fslink.addInterface(iface) fslink.addInterface(fsnode.interface) # Special attributes for this link that we must use. fslink.best_effort = True fslink.vlan_tagging = True # Print the RSpec to the enclosing page. portal.context.printRequestRSpec()
In this example, we demonstrate how to create and use a dataset. A dataset is simply a snapshot of a temporary file system (see the previous example) that has been saved to permanent storage, and reloaded on a node (or nodes) in a different experiment. This type of dataset must be explicitly saved (more on this below) in order to make changes permanent (and available later). In the example code above, the temporary file system will be loaded with the dataset specified by the URN.
But before you can use a dataset, you first have to create one using the following steps:
- Create an experimentCreate an experiment using the local diskspace example above.
- Add your dataPopulate the file system mounted at /mydata with the data you wish to use in other experiments.
- Fill out the Create Dataset formClick on the "Create Dataset" option in the Actions menu. This will bring up the form to create a new dataset. Choose a name for your dataset and optionally the project the dataset should be associated with. Be sure to select Image Backed for the type. Then choose the experiment, which node in the experiment, and which blockstore on the node.
- Click “Create”When you click the “Create” button, the file system on your node will be unmounted so that we can take a consistent snapshot of the contents. This process can take several minutes or longer, depending on the size of the file system. You can watch the progress on this page. When the progress bar reaches the “Ready” stage, your new dataset is ready! It will now show up in your “List Datasets” list, and can be used in new experiments.
- Use your datasetTo use your new dataset, you will need to reference it in your geni lib script (see the example code above). The name of your dataset is a URN, and can be found on the information page for the dataset. From the Actions menu, click on "List Datasets", find the name of your dataset in the list, and click on it.
- Update your datasetIf you need to make changes to your dataset, simply start an experiment that uses your dataset. Make the changes you need to the file system mounted at /mydata, and then use the "Modify" button as shown in the previous step.
8.11 Debugging geni-lib profile scripts
It is not necessary to instantiate the profile via the portal web interface
to test it. Properly written profile scripts should work perfectly well
independent of the normal portal. For instance, invoking the parameterized example above directly:
python geni-lib-parameters.py
will produce an RSpec containing one node (the default value for n). It is also possible to override the defaults on the command line by giving the parameter name as an option, followed by the desired value:
python geni-lib-parameters.py --n 4
The option --help will list the available parameters and their descriptions.
9 Virtual Machines and Containers
A CloudLab virtual node is a virtual machine or container running on top of a regular operating system. CloudLab virtual nodes are based on the Xen hypervisor or on Docker containers. Both types of virtualization allow groups of processes to be isolated from each other while running on the same physical machine. CloudLab virtual nodes provide isolation of the filesystem, process, network, and account namespaces. Thus, each virtual node has its own private filesystem, process hierarchy, network interfaces and IP addresses, and set of users and groups. This level of virtualization allows unmodified applications to run as though they were on a real machine. Virtual network interfaces support an arbitrary number of virtual network links. These links may be individually shaped according to user-specified link parameters, and may be multiplexed over physical links or used to connect to virtual nodes within a single physical node.
There are a few specific differences between virtual and physical nodes. First, CloudLab physical nodes have a routable, public IPv4 address allowing direct remote access (unless the CloudLab installation has been configured to use unroutable control network IP addresses, which is very rare). Virtual nodes, however, are assigned control network IP addresses on a private network (typically the 172.16/12 subnet) and are remotely accessible over ssh via DNAT (destination network-address translation): a high-numbered port on the physical host’s public control network address is forwarded to each virtual node. Depending on local configuration, it may be possible to request routable IP addresses for specific virtual nodes to enable direct remote access. Note that virtual nodes are always able to access the public Internet via SNAT (source network-address translation; nearly identical to masquerading).
Second, virtual nodes and their virtual network interfaces are connected by virtual links built atop physical links and physical interfaces. The virtualization of a physical device/link decreases the fidelity of the network emulation. Moreover, several virtual links may share the same physical links via multiplexing. Individual links are isolated at layer 2, but they are not isolated in terms of performance. If you request a specific bandwidth for a given set of links, our resource mapper will ensure that if multiple virtual links are mapped to a single physical link, the sum of the bandwidths of the virtual links will not exceed the capacity of the physical link (unless you also specify that this constraint can be ignored by setting the best_effort link parameter to True). For example, no more than ten 1Gbps virtual links can be mapped to a 10Gbps physical link.
Finally, when you allocate virtual nodes, you can specify the amount of CPU and RAM (and, for Xen VMs, virtual disk space) each node will be allocated. CloudLab’s resource assigner will not oversubscribe these quantities.
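Returning to the link-shaping behavior described above, the following is a minimal sketch that puts two Xen VMs on a LAN, asks for 100 Mbps of bandwidth (specified in kbps), and then relaxes the mapping constraint so the virtual links may oversubscribe the physical link. The numbers are arbitrary examples.

import geni.portal as portal
import geni.rspec.pg as rspec
# Import the emulab extensions library (for link attributes such as best_effort).
import geni.rspec.emulab

request = portal.context.makeRequestRSpec()

lan = request.LAN("lan")
for i in range(2):
    node = request.XenVM("node" + str(i))
    iface = node.addInterface("if1")
    lan.addInterface(iface)

# Shape the LAN to 100 Mbps (bandwidth is given in kbps).
lan.bandwidth = 100000
# Allow the mapper to oversubscribe physical links for this LAN.
lan.best_effort = True

portal.context.printRequestRSpec()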
9.1 Xen VMs
These examples show the basics of allocating Xen VMs: a single Xen VM node, two Xen VMs in a LAN, and a Xen VM with a custom disk size. In the sections below, we discuss advanced Xen VM allocation features.
9.1.1 Controlling CPU and Memory
You can control the number of cores and the amount of memory allocated to each VM by setting the cores and ram instance variables of a XenVM object, as shown in the following example:
"""An example of constructing a profile with a single Xen VM. Instructions: Wait for the profile instance to start, and then log in to the VM via the ssh port specified below. (Note that in this case, you will need to access the VM through a high port on the physical host, since we have not requested a public IP address for the VM itself.) """ import geni.portal as portal import geni.rspec.pg as rspec # Create a Request object to start building the RSpec. request = portal.context.makeRequestRSpec() # Create a XenVM node = request.XenVM("node") # Request a specific number of VCPUs. node.cores = 4 # Request a specific amount of memory (in GB). node.ram = 4096 # Print the RSpec to the enclosing page. portal.context.printRequestRSpec() """An example of constructing a profile with a single Xen VM. Instructions: Wait for the profile instance to start, and then log in to the VM via the ssh port specified below. (Note that in this case, you will need to access the VM through a high port on the physical host, since we have not requested a public IP address for the VM itself.) """ import geni.portal as portal import geni.rspec.pg as rspec # Create a Request object to start building the RSpec. request = portal.context.makeRequestRSpec() # Create a XenVM node = request.XenVM("node") # Request a specific number of VCPUs. node.cores = 4 # Request a specific amount of memory (in GB). node.ram = 4096 # Print the RSpec to the enclosing page. portal.context.printRequestRSpec()
9.1.2 Controlling Disk Space
Each Xen VM is given enough disk space to hold the requested image. Most CloudLab images are built with a 16 GB root partition, typically with about 25% of the disk space used by the operating system. If the remaining space is not enough for your needs, you can request additional disk space by setting a XEN_EXTRAFS node attribute, as shown in the following example.
"""An example of constructing a profile with a single Xen VM with extra fs space. Instructions: Wait for the profile instance to start, and then log in to the VM via the ssh port specified below. (Note that in this case, you will need to access the VM through a high port on the physical host, since we have not requested a public IP address for the VM itself.) """ import geni.portal as portal import geni.rspec.pg as rspec # Import Emulab-specific extensions so we can set node attributes. import geni.rspec.emulab as emulab # Create a Request object to start building the RSpec. request = portal.context.makeRequestRSpec() # Create a XenVM node = request.XenVM("node") # Set the XEN_EXTRAFS to request 8GB of extra space in the 4th partition. node.Attribute('XEN_EXTRAFS','8') # Print the RSpec to the enclosing page. portal.context.printRequestRSpec() """An example of constructing a profile with a single Xen VM with extra fs space. Instructions: Wait for the profile instance to start, and then log in to the VM via the ssh port specified below. (Note that in this case, you will need to access the VM through a high port on the physical host, since we have not requested a public IP address for the VM itself.) """ import geni.portal as portal import geni.rspec.pg as rspec # Import Emulab-specific extensions so we can set node attributes. import geni.rspec.emulab as emulab # Create a Request object to start building the RSpec. request = portal.context.makeRequestRSpec() # Create a XenVM node = request.XenVM("node") # Set the XEN_EXTRAFS to request 8GB of extra space in the 4th partition. node.Attribute('XEN_EXTRAFS','8') # Print the RSpec to the enclosing page. portal.context.printRequestRSpec()
This attribute’s unit is GB. As with CloudLab physical nodes, the extra disk space will appear in the fourth partition of your VM’s disk. You can turn this extra space into a usable file system by logging into your VM and running:
mynode> sudo mkdir /dirname
mynode> sudo /usr/local/etc/emulab/mkextrafs.pl /dirname
where /dirname is the directory at which you want your newly formatted file system to be mounted.
"""An example of constructing a profile with a single Xen VM. Instructions: Wait for the profile instance to start, and then log in to the VM via the ssh port specified below. (Note that in this case, you will need to access the VM through a high port on the physical host, since we have not requested a public IP address for the VM itself.) """ import geni.portal as portal import geni.rspec.pg as rspec # Create a Request object to start building the RSpec. request = portal.context.makeRequestRSpec() # Create a XenVM node = request.XenVM("node") # Request a specific number of VCPUs. node.cores = 4 # Request a specific amount of memory (in GB). node.ram = 4096 # Print the RSpec to the enclosing page. portal.context.printRequestRSpec() """An example of constructing a profile with a single Xen VM. Instructions: Wait for the profile instance to start, and then log in to the VM via the ssh port specified below. (Note that in this case, you will need to access the VM through a high port on the physical host, since we have not requested a public IP address for the VM itself.) """ import geni.portal as portal import geni.rspec.pg as rspec # Create a Request object to start building the RSpec. request = portal.context.makeRequestRSpec() # Create a XenVM node = request.XenVM("node") # Request a specific number of VCPUs. node.cores = 4 # Request a specific amount of memory (in GB). node.ram = 4096 # Print the RSpec to the enclosing page. portal.context.printRequestRSpec()
9.1.3 Setting HVM Mode
By default, all Xen VMs are paravirtualized. If you need hardware virtualization instead, you must set a XEN_FORCE_HVM node attribute, as shown in this example:
"""An example of constructing a profile with a single Xen VM in HVM mode. Instructions: Wait for the profile instance to start, and then log in to the VM via the ssh port specified below. (Note that in this case, you will need to access the VM through a high port on the physical host, since we have not requested a public IP address for the VM itself.) """ import geni.portal as portal import geni.rspec.pg as rspec # Import Emulab-specific extensions so we can set node attributes. import geni.rspec.emulab as emulab # Create a Request object to start building the RSpec. request = portal.context.makeRequestRSpec() # Create a XenVM node = request.XenVM("node") # Set the XEN_FORCE_HVM custom node attribute to 1 to enable HVM mode: node.Attribute('XEN_FORCE_HVM','1') # Print the RSpec to the enclosing page. portal.context.printRequestRSpec() """An example of constructing a profile with a single Xen VM in HVM mode. Instructions: Wait for the profile instance to start, and then log in to the VM via the ssh port specified below. (Note that in this case, you will need to access the VM through a high port on the physical host, since we have not requested a public IP address for the VM itself.) """ import geni.portal as portal import geni.rspec.pg as rspec # Import Emulab-specific extensions so we can set node attributes. import geni.rspec.emulab as emulab # Create a Request object to start building the RSpec. request = portal.context.makeRequestRSpec() # Create a XenVM node = request.XenVM("node") # Set the XEN_FORCE_HVM custom node attribute to 1 to enable HVM mode: node.Attribute('XEN_FORCE_HVM','1') # Print the RSpec to the enclosing page. portal.context.printRequestRSpec()
You can set this attribute only for dedicated-mode VMs. Shared VMs are available only in paravirtualized mode.
9.2 Docker Containers
CloudLab supports experiments that use Docker containers as virtual nodes. In this section, we first describe how to build simple profiles that create Docker containers, and then demonstrate more advanced features. The CloudLab-Docker container integration has been designed to enable easy image onboarding, and to allow users to continue to work naturally with the standard Docker API or CLI. However, because CloudLab is itself an orchestration engine, it does not support any of the Docker orchestration tools or platforms, such as Docker Swarm.
You can request a CloudLab Docker container in a geni-lib script like this:
import geni.portal as portal
import geni.rspec.pg as rspec

request = portal.context.makeRequestRSpec()
node = request.DockerContainer("node")
You can use the returned node object (a DockerContainer instance) similarly to other kinds of node objects, like RawPC or XenVM. However, Docker nodes have several custom member variables you can set to control their behavior and Docker-specific features. We demonstrate the usage of these member variables in the following subsections and summarize them at the end of this section.
9.2.1 Basic Examples
"""An example of constructing a profile with a single Docker container. Instructions: Wait for the profile instance to start, and then log in to the container via the ssh port specified below. By default, your container will run a standard Ubuntu image with the Emulab software preinstalled. """ import geni.portal as portal import geni.rspec.pg as rspec # Create a Request object to start building the RSpec. request = portal.context.makeRequestRSpec() # Create a Docker container. node = request.DockerContainer("node") # Request a container hosted on a shared container host; you will not # have access to the underlying physical host, and your container will # not be privileged. Note that if there are no shared hosts available, # your experiment will be assigned a physical machine to host your container. node.exclusive = True # Print the RSpec to the enclosing page. portal.context.printRequestRSpec() """An example of constructing a profile with a single Docker container. Instructions: Wait for the profile instance to start, and then log in to the container via the ssh port specified below. By default, your container will run a standard Ubuntu image with the Emulab software preinstalled. """ import geni.portal as portal import geni.rspec.pg as rspec # Create a Request object to start building the RSpec. request = portal.context.makeRequestRSpec() # Create a Docker container. node = request.DockerContainer("node") # Request a container hosted on a shared container host; you will not # have access to the underlying physical host, and your container will # not be privileged. Note that if there are no shared hosts available, # your experiment will be assigned a physical machine to host your container. node.exclusive = True # Print the RSpec to the enclosing page. portal.context.printRequestRSpec()
It is easy to extend this profile slightly to allocate ten containers in a LAN. Note that in this case, the exclusive member variable is not specified, so it defaults to False (shared mode); to switch the containers to dedicated mode, you would set it to True on each one:
"""An example of constructing a profile with ten Docker containers in a LAN. Instructions: Wait for the profile instance to start, and then log in to the container via the ssh port specified below. By default, your container will run a standard Ubuntu image with the Emulab software preinstalled. """ import geni.portal as portal import geni.rspec.pg as rspec # Create a Request object to start building the RSpec. request = portal.context.makeRequestRSpec() # Create a LAN to put containers into. lan = request.LAN("lan") # Create ten Docker containers. for i in range(0,10): node = request.DockerContainer("node-%d" % (i)) # Create an interface. iface = node.addInterface("if1") # Add the interface to the LAN. lan.addInterface(iface) # Print the RSpec to the enclosing page. portal.context.printRequestRSpec() """An example of constructing a profile with ten Docker containers in a LAN. Instructions: Wait for the profile instance to start, and then log in to the container via the ssh port specified below. By default, your container will run a standard Ubuntu image with the Emulab software preinstalled. """ import geni.portal as portal import geni.rspec.pg as rspec # Create a Request object to start building the RSpec. request = portal.context.makeRequestRSpec() # Create a LAN to put containers into. lan = request.LAN("lan") # Create ten Docker containers. for i in range(0,10): node = request.DockerContainer("node-%d" % (i)) # Create an interface. iface = node.addInterface("if1") # Add the interface to the LAN. lan.addInterface(iface) # Print the RSpec to the enclosing page. portal.context.printRequestRSpec()
Here is a more complex profile that creates 20 containers, binds 10 of them to a physical host machine of a particular type, and binds the other 10 to a second machine of the same type:
"""An example of constructing a profile with 20 Docker containers in a LAN, divided across two container hosts. Instructions: Wait for the profile instance to start, and then log in to the container via the ssh port specified below. By default, your container will run a standard Ubuntu image with the Emulab software preinstalled. """ import geni.portal as portal import geni.rspec.pg as rspec # Import the Emulab specific extensions. import geni.rspec.emulab as emulab # Create a Request object to start building the RSpec. request = portal.context.makeRequestRSpec() # Create a LAN to put containers into. lan = request.LAN("lan") # Create two container hosts, each with ten Docker containers. for j in range(0,2): # Create a container host. host = request.RawPC("host-%d" % (j)) # Select a specific hardware type for the container host. host.hardware_type = "d430" for i in range(0,10): # Create a container. node = request.DockerContainer("node-%d-%d" % (j,i)) # Create an interface. iface = node.addInterface("if1") # Add the interface to the LAN. lan.addInterface(iface) # Set this container to be instantiated on the host created in # the outer loop. node.InstantiateOn(host.client_id) # Print the RSpec to the enclosing page. portal.context.printRequestRSpec() """An example of constructing a profile with 20 Docker containers in a LAN, divided across two container hosts. Instructions: Wait for the profile instance to start, and then log in to the container via the ssh port specified below. By default, your container will run a standard Ubuntu image with the Emulab software preinstalled. """ import geni.portal as portal import geni.rspec.pg as rspec # Import the Emulab specific extensions. import geni.rspec.emulab as emulab # Create a Request object to start building the RSpec. request = portal.context.makeRequestRSpec() # Create a LAN to put containers into. lan = request.LAN("lan") # Create two container hosts, each with ten Docker containers. for j in range(0,2): # Create a container host. host = request.RawPC("host-%d" % (j)) # Select a specific hardware type for the container host. host.hardware_type = "d430" for i in range(0,10): # Create a container. node = request.DockerContainer("node-%d-%d" % (j,i)) # Create an interface. iface = node.addInterface("if1") # Add the interface to the LAN. lan.addInterface(iface) # Set this container to be instantiated on the host created in # the outer loop. node.InstantiateOn(host.client_id) # Print the RSpec to the enclosing page. portal.context.printRequestRSpec()
9.2.2 Disk Images
Docker containers use a different disk image format than CloudLab physical machines or Xen virtual machines, which means that you cannot use the same images on both a container and a raw PC. However, CloudLab supports native Docker images in several modes and workflows. CloudLab hosts a private Docker registry, and the standard CloudLab image-deployment and -capture mechanisms support capturing container disk images into it. CloudLab also supports the use of externally hosted, unmodified Docker images and Dockerfiles for image onboarding and dynamic image creation. Finally, since some CloudLab features require in-container support (e.g., user accounts, SSH pubkeys, syslog, scripted program execution), we also provide an optional automated process, called augmentation, through which an external image can be customized with the CloudLab software and dependencies.
CloudLab supports both augmented and unmodified Docker images, but some features require augmentation (e.g., that the CloudLab client-side software is installed and running in the container). Unmodified images support these CloudLab features: network links, link shaping, remote access, remote storage (e.g. remote block stores), and image capture. Unmodified images do not support user accounts, SSH pubkeys, or scripted program execution.
CloudLab’s disk image naming and versioning scheme is slightly different from Docker’s content-addressable model. A CloudLab disk image is identified by a project and name tuple (typically encoded as a URN), or by a UUID, as well as a version number that starts at 0. Each time you capture a new version of an image, the image’s version number is incremented by one. CloudLab does not support the use of arbitrary alphanumeric tags to identify image versions, as Docker does.
Thus, when you capture a CloudLab disk image of a CloudLab Docker container and give it a name, the local CloudLab registry will contain an image (repository) of that name in the project (and group; by default the group name is the same as the project name) associated with the image, with each CloudLab image version appearing as a numeric tag. You can then pull such an image directly from the registry, for example:
docker pull ops.emulab.net:5080/myproject/myproject/docker-my-research:2
You will be prompted for username and password; use your CloudLab credentials.
The following code fragment creates a Docker container that uses a standard CloudLab Docker disk image, docker-ubuntu16-std. This image is based on the ubuntu:16.04 Docker image, with the Emulab client software installed (meaning it is augmented) along with dependencies and other utilities.
"""An example of a Docker container running a standard, augmented system image.""" import geni.portal as portal import geni.rspec.pg as rspec request = portal.context.makeRequestRSpec() node = request.DockerContainer("node") node.disk_image = "urn:publicid:IDN+emulab.net+image+emulab-ops//docker-ubuntu16-std" portal.context.printRequestRSpec() """An example of a Docker container running a standard, augmented system image.""" import geni.portal as portal import geni.rspec.pg as rspec request = portal.context.makeRequestRSpec() node = request.DockerContainer("node") node.disk_image = "urn:publicid:IDN+emulab.net+image+emulab-ops//docker-ubuntu16-std" portal.context.printRequestRSpec()
9.2.3 External Images
CloudLab supports the use of publicly accessible Docker images in other registries. It does not currently support username/password access to images. By default, if you simply specify a repository and tag, as in the example below, CloudLab assumes the image is in the standard Docker registry; but you can instead specify a complete URL pointing to a different registry.
"""An example of a Docker container running an external, unmodified image.""" import geni.portal as portal import geni.rspec.pg as rspec request = portal.context.makeRequestRSpec() node = request.DockerContainer("node") node.docker_extimage = "ubuntu:16.04" portal.context.printRequestRSpec() """An example of a Docker container running an external, unmodified image.""" import geni.portal as portal import geni.rspec.pg as rspec request = portal.context.makeRequestRSpec() node = request.DockerContainer("node") node.docker_extimage = "ubuntu:16.04" portal.context.printRequestRSpec()
By default, CloudLab assumes that an external, non-augmented image does not run its own sshd to support remote login. Instead, it facilitates remote access to a container by running an alternate sshd on the container host and executing a shell (by default /bin/sh) in the container associated with a specific port (the port in the ssh URL shown on your experiment’s page). See the section on remote access below for more detail.
9.2.4 Dockerfiles
You can also create images dynamically (at experiment runtime) by specifying a Dockerfile for each container. Note that if multiple containers hosted on the same physical machine reference the same Dockerfile, the image will only be built once. Here is a simple example that builds its image from a Dockerfile (in this case, one that builds httpd from source):
"""An example of a Docker container running an external, unmodified image.""" import geni.portal as portal import geni.rspec.pg as rspec request = portal.context.makeRequestRSpec() node = request.DockerContainer("node") node.docker_dockerfile = "https://github.com/docker-library/httpd/raw/38842a5d4cdd44ff4888e8540c0da99009790d01/2.4/Dockerfile" portal.context.printRequestRSpec() """An example of a Docker container running an external, unmodified image.""" import geni.portal as portal import geni.rspec.pg as rspec request = portal.context.makeRequestRSpec() node = request.DockerContainer("node") node.docker_dockerfile = "https://github.com/docker-library/httpd/raw/38842a5d4cdd44ff4888e8540c0da99009790d01/2.4/Dockerfile" portal.context.printRequestRSpec()
You should not assume you have access to the image build environment (you only do if your container(s) are running in dedicated mode), so your Dockerfile must build successfully without any manual intervention.
9.2.5 Augmented Disk Images
The primary use case that Docker supports involves a large collection of containers, potentially spanning many machines, in which each container runs a single application (one or more processes launched by a single parent) that provides a specific service. In this model, container images may be tailored to support only their single service, unencumbered by extra software. Many Docker images do not include an init daemon to launch and monitor processes, or even a basic set of user-space tools and libraries.
CloudLab-based experimentation requires more than the ability to run a single service per node. Within an experiment, it is beneficial for each node to run basic OS services (e.g., syslogd and sshd) to support activities such as debugging, interactive exploration, logging, and more. It is beneficial for each node to run a suite of CloudLab-specific services to configure the node, conduct in-node monitoring, and automate experiment deployment and control (e.g., launch programs or dynamically emulate link failures). This environment requires a full-featured init daemon and a common, basic set of user-space software and libraries.
To enable experiments that use existing Docker images and seamlessly support the CloudLab environment, CloudLab supports automatic augmentation of those images. When requested, CloudLab pulls an existing image, builds both runit (a simple, minimal init daemon) and the CloudLab client toolchain against it in temporary containers, and creates a new image with the just-built runit and CloudLab binaries, scripts, and dependencies. Augmented images are necessary to fully support some CloudLab features (specifically, the creation of user accounts, installation of SSH pubkeys, and execution of startup scripts). Other CloudLab features (such as experiment network links and block storage) are configured outside the container at startup and do not require augmentation.
During augmentation, CloudLab modifies a Docker image to run an init system rather than a single application. runit is built and packaged in temporary containers that include a build toolchain, installed into the final augmented image, and set as that image’s ENTRYPOINT. CloudLab creates runit services to emulate any ENTRYPOINT and/or CMD instructions present in the original image. The CloudLab client-side software is also built and installed.
9.2.6 Remote Access
CloudLab provides remote access via ssh to each node in an experiment. If your container is running an sshd inside, then ssh access is straightforward: CloudLab allocates a high-numbered port on the container host machine, and redirects (DNATs) that port to the private, unrouted control network IP address assigned to the container. (CloudLab containers typically receive private addresses that are only routed on the CloudLab control network to increase experiment scalability.) We refer to this style of ssh remote login as direct.
However, most unmodified, unaugmented Docker images do not run sshd. To trivially allow remote access to containers running such images, CloudLab runs a proxy sshd in the host context on the high-numbered ports assigned to each container, and turns successfully authenticated incoming connections into invocations of docker exec <container> /bin/sh. We refer to this style of ssh remote login as exec.
You can force the ssh style you prefer on a per-container
basis by setting the docker_ssh_style member variable to either "direct" or "exec"; for exec-style logins you can also choose the shell to run with the docker_exec_shell member variable, as shown in the following example:
"""An example of a Docker container running an external, unmodified image, and customizing its remote access.""" import geni.portal as portal import geni.rspec.pg as rspec request = portal.context.makeRequestRSpec() node = request.DockerContainer("node") node.docker_extimage = "ubuntu:16.04" node.docker_ssh_style = "exec" node.docker_exec_shell = "/bin/bash" portal.context.printRequestRSpec() """An example of a Docker container running an external, unmodified image, and customizing its remote access.""" import geni.portal as portal import geni.rspec.pg as rspec request = portal.context.makeRequestRSpec() node = request.DockerContainer("node") node.docker_extimage = "ubuntu:16.04" node.docker_ssh_style = "exec" node.docker_exec_shell = "/bin/bash" portal.context.printRequestRSpec()
9.2.7 Console
Although Docker containers do not have a serial console, as do most CloudLab physical machines, they can provide a pty (pseudoterminal) for interaction with the main process running in the container. CloudLab attaches to these streams via the Docker daemon and proxies them into the CloudLab console mechanism. Thus, if you click the "Console" link for a container on your experiment status page, you will see output from the container and be able to send it input. (Each time you close and reopen the console for a container, your console will see the entire console log kept by the Docker daemon, so there may be a flood of initial output if your container has been running for a long time and producing output.)
9.2.8 ENTRYPOINT and CMD
If you run an augmented Docker image in a container, recall that the augmentation process overrides the ENTRYPOINT and/or CMD instruction(s) the image contained, if any. Changing the ENTRYPOINT means that the augmented image will not run the command that was specified for the original image. Moreover, runit must be run as root, but the image creator may have specified a different USER for processes that execute in the container. To fix these problems, CloudLab emulates the original (or runtime-specific) ENTRYPOINT as a runit service and handles several related Dockerfile settings as well: CMD, WORKDIR, ENV, and USER. The emulation preserves the semantics of these settings (see Docker’s reference on interaction between ENTRYPOINT and CMD), with the exception that the user-specified ENTRYPOINT or CMD is not executed as PID 1. Only the ENTRYPOINT and CMD processes run as USER; processes started from outside the container via docker exec run as root.
You can customize the ENTRYPOINT and CMD for each container you create by setting the docker_entrypoint and the docker_cmd member variables of a geni-lib DockerContainer object instance.
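For instance, the following sketch overrides both values for a container built from an external image; the entrypoint and command shown are arbitrary examples rather than part of any real image.

import geni.portal as portal
import geni.rspec.pg as rspec

request = portal.context.makeRequestRSpec()
node = request.DockerContainer("node")
node.docker_extimage = "ubuntu:16.04"
# Hypothetical overrides; these are emulated by the dockerentrypoint runit
# service described below rather than run as PID 1.
node.docker_entrypoint = "/bin/sh -c"
node.docker_cmd = "echo hello; sleep 3600"
portal.context.printRequestRSpec()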
The runit service that performs this emulation is named dockerentrypoint. If this service exits, it is not automatically restarted, unlike most runit services. You can start it again by running sv start dockerentrypoint within a container. You can also browse the runit documentation for more examples of interacting with runit.
This emulation process is complicated, so if you suspect problems, there are log files written in each container that may help. The stdout and stderr output from the ENTRYPOINT and CMD emulation is logged to /var/log/entrypoint.log, and additional debugging information is logged in /var/log/entrypoint-debug.log.
9.2.9 Shared Containers
In CloudLab, Docker containers can be created in dedicated or shared mode. In dedicated mode, containers run on physical nodes that are reserved to a particular experiment, and you have root-level access to the underlying physical machine. In shared mode, containers run on physical machines that host containers from potentially many experiments, and users do not have access to the underlying physical machine.
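A minimal sketch of choosing between the two modes, assuming that the exclusive member variable shown in the basic examples above selects dedicated hosting when True and shared hosting when False:

import geni.portal as portal
import geni.rspec.pg as rspec

request = portal.context.makeRequestRSpec()

# This container asks for a dedicated container host.
dedicated = request.DockerContainer("dedicated")
dedicated.exclusive = True

# This container is willing to share a host with other experiments.
shared = request.DockerContainer("shared")
shared.exclusive = False

portal.context.printRequestRSpec()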
9.2.10 Privileged Containers
Docker allows containers to be privileged or unprivileged: a privileged container has administrative access to the underlying host. CloudLab allows you to spawn privileged containers, but only on dedicated container hosts. Moreover, you should only do this when absolutely necessary, because a hacked privileged container can effectively take over the physical host, and access and control other containers. To make a container privileged, set the docker_privileged member variable of a geni-lib DockerContainer object instance to True.
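A minimal sketch, assuming (as in the examples above) that setting exclusive to True requests the dedicated container host that privileged containers require:

import geni.portal as portal
import geni.rspec.pg as rspec

request = portal.context.makeRequestRSpec()
node = request.DockerContainer("node")
# Privileged containers are only allowed on dedicated container hosts.
node.exclusive = True
node.docker_privileged = True
portal.context.printRequestRSpec()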
9.2.11 Remote Blockstores
You can mount CloudLab remote blockstores in Docker containers, using the same approach shown for physical nodes in the dataset example above:
"""An example of a Docker container that mounts a remote blockstore.""" import geni.portal as portal import geni.rspec.pg as rspec import geni.rspec.igext as ig request = portal.context.makeRequestRSpec() node = request.DockerContainer("node") # Create an interface to connect to the link from the container to the # blockstore host. myintf = # Create the blockstore host. bsnode = ig.RemoteBlockstore("bsnode","/mnt/blockstore") # Map your remote blockstore to the blockstore host bsnode.dataset = \ "urn:publicid:IDN+emulab.net:emulab-ops+ltdataset+johnsond-bs-foo" bsnode.readonly = False # Connect the blockstore host to the container. bslink = pg.Link("bslink") bslink.addInterface(node.addInterface("ifbs0")) bslink.addInterface(bsnode.interface) portal.context.printRequestRSpec() """An example of a Docker container that mounts a remote blockstore.""" import geni.portal as portal import geni.rspec.pg as rspec import geni.rspec.igext as ig request = portal.context.makeRequestRSpec() node = request.DockerContainer("node") # Create an interface to connect to the link from the container to the # blockstore host. myintf = # Create the blockstore host. bsnode = ig.RemoteBlockstore("bsnode","/mnt/blockstore") # Map your remote blockstore to the blockstore host bsnode.dataset = \ "urn:publicid:IDN+emulab.net:emulab-ops+ltdataset+johnsond-bs-foo" bsnode.readonly = False # Connect the blockstore host to the container. bslink = pg.Link("bslink") bslink.addInterface(node.addInterface("ifbs0")) bslink.addInterface(bsnode.interface) portal.context.printRequestRSpec()
CloudLab does not support mapping raw devices (or in the case of CloudLab remote blockstores, virtual ISCSI-backed devices) into containers.
9.2.12 Temporary Block Storage
You can mount CloudLab temporary blockstores in Docker containers. Here is an example that places an 8 GB filesystem in a container at /mnt/tmp:
"""An example of a Docker container that mounts a remote blockstore.""" import geni.portal as portal import geni.rspec.pg as rspec import geni.rspec.igext as ig request = portal.context.makeRequestRSpec() node = request.DockerContainer("node") bs = node.Blockstore("temp-bs","/mnt/tmp") bs.size = "8GB" bs.placement "any" portal.context.printRequestRSpec() """An example of a Docker container that mounts a remote blockstore.""" import geni.portal as portal import geni.rspec.pg as rspec import geni.rspec.igext as ig request = portal.context.makeRequestRSpec() node = request.DockerContainer("node") bs = node.Blockstore("temp-bs","/mnt/tmp") bs.size = "8GB" bs.placement "any" portal.context.printRequestRSpec()
9.2.13 DockerContainer Member Variables
The following table summarizes the DockerContainer class member variables. You can find more detail either in the sections above, or by browsing the source code documentation.
| The amount (integer) of virtual CPUs this container should receive; approximated using Docker's CpuShares and CpuPeriod options. | |
| The amount (integer) of memory this container should receive. | |
| The disk image this node should run. See the section on Docker disk images below. | |
| The physical node type on which to instantiate the container. Types are cluster-specific; see the hardware chapter. | |
| An external Docker image (repo:tag) to load on the container; see the section on external Docker images. | |
| A URL that points to a Dockerfile from which an image for this node will be created; see the section on Dockerfiles. | |
| The requested testbed augmentation level: must be one of full, buildenv, core, basic, none. See the section on Docker image augmentation. | |
| If the image has already been augmented, whether to update it (True) or not (False). | |
| Specify what happens when you ssh to your node; must be either direct or exec. See the section on remote access. | |
| The shell to run if the value of docker_ssh_style is exec; ignored otherwise. | |
| Change/set the Docker ENTRYPOINT option for this container; see the section on Docker ENTRYPOINT and CMD handling. | |
| Change/set the Docker CMD option for this container; see the section on Docker ENTRYPOINT and CMD handling. | |
| Add or override environment variables in a container. The value of this attribute should be either a newline-separated list of variable assignments, or one or more variable assignments on a single line. If the former, we do not support escaped newlines, unlike the Docker ENV instruction. | |
| Set this member variable True to make the container privileged. See the section on Docker privileged containers. |
10 Advanced Topics
10.1 Disk Images
Most disk images in CloudLab are stored and distributed in the Frisbee disk image format. They are stored at block level, meaning that, in theory, any filesystem can be used. In practice, Frisbee’s filesystem-aware compression is used, which causes the image snapshotting and installation processes to parse the filesystem and skip free blocks; this provides large performance benefits and space savings. Frisbee has support for filesystems that are variants of the BSD UFS/FFS, Linux ext, and Windows NTFS formats. The disk images created by Frisbee are bit-for-bit identical with the original image, with the caveat that free blocks are skipped and may contain leftover data from previous users.
Disk images in CloudLab are typically created by starting with one of CloudLab’s supplied images, customizing the contents, and taking a snapshot of the resulting disk. The snapshotting process reboots the node, as it boots into an MFS to ensure a quiescent disk. If you wish to bring in an image from outside of CloudLab or create a new one from scratch, please contact us for help; if this is a common request, we may add features to make it easier.
CloudLab has a default disk image for each node type; after a node is freed by one experimenter, it is reloaded with the default image before being released back into the free pool. As a result, profiles that use the default disk images typically instantiate faster than those that use custom images, as no disk loading occurs.
Frisbee loads disk images using a custom multicast protocol, so loading large numbers of nodes typically does not slow down the instantiation process much.
Images may be referred to in requests in three ways: by URN, by an unqualified name, and by URL. URNs refer to a specific image that may be hosted on any of the CloudLab-affiliated clusters. An unqualified name refers to the version of the image hosted on the cluster on which the experiment is instantiated. If you have large images that CloudLab cannot store due to space constraints, you may host them yourself on a webserver and put the URL for the image into the profile. CloudLab will fetch your image on demand, and cache it for some period of time for efficient distribution.
Images in CloudLab are versioned, and CloudLab records the provenance of images. That is, when you create an image by snapshotting a disk that was previously loaded with another image, CloudLab records which image was used as a base for the new image, and stores only the blocks that differ between the two. Image URLs and URNs can contain version numbers, in which case they refer to that specific version of the image, or they may omit the version number, in which case they refer to the latest version of the image at the time an experiment is instantiated.
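As a sketch of referring to an image by URN from a profile, the following loads a hypothetical custom image onto a physical node; the URN shown is illustrative only, and the exact project and image name should be copied from the image's information page.

import geni.portal as portal
import geni.rspec.pg as rspec

request = portal.context.makeRequestRSpec()
node = request.RawPC("node")
# A hypothetical image URN; omitting a version number selects the latest
# version of the image at instantiation time.
node.disk_image = "urn:publicid:IDN+emulab.net+image+myproject//my-custom-image"
portal.context.printRequestRSpec()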
10.2 RSpecs
The resources (nodes, links, etc.) that define a profile are expressed in the
RSpec
format from the GENI project. In general, RSpec should be thought of as a
sort of “assembly language”: a format that is generated and consumed by tools rather than written by hand. Most users will not need to create or edit RSpecs directly, since both the Jacks GUI and geni-lib scripts generate them automatically.
10.3 Public IP Access
CloudLab treats publicly-routable IP addresses as an allocatable resource.
By default, all physical hosts are given a public IP address. This IP address is determined by the host, rather than the experiment. There are two DNS names that point to this public address: a static one that is the node’s permanent hostname (such as apt042.apt.emulab.net), and a dynamic one that is assigned based on the experiment; this one may look like <vname>.<eid>.<proj>.apt.emulab.net, where <vname> is the name assigned in the request RSpec, <eid> is the name assigned to the experiment, and <proj> is the project that the experiment belongs to. This name is predictable regardless of the physical nodes assigned.
By default, virtual machines are not given public IP addresses; basic remote access is provided through an ssh server running on a non-standard port, using the physical host’s IP address. This port can be discovered through the manifest of an instantiated experiment, or on the “list view” of the experiment page. If a public IP address is required for a virtual machine (for example, to host a webserver on it), a public address can be requested on a per-VM basis. If using the Jacks GUI, each VM has a checkbox to request a public address. If using python scripts and geni-lib, setting the routable_control_ip property of a node has the same effect. Different clusters will have different numbers of public addresses available for allocation in this manner.
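A minimal geni-lib sketch of requesting a public address for a single VM:

import geni.portal as portal
import geni.rspec.pg as rspec

request = portal.context.makeRequestRSpec()
node = request.XenVM("node")
# Ask for a routable (public) control network address for this VM.
node.routable_control_ip = True
portal.context.printRequestRSpec()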
10.3.1 Dynamic Public IP Addresses
In some cases, users create their own virtual machines and would like to give them public IP addresses. We also allow profiles to request a pool of dynamic addresses; VMs brought up by the user can then run DHCP to be assigned one of these addresses.
Profiles using python scripts and geni-lib can request dynamic IP address pools by constructing an AddressPool object (defined in the geni.rspec.igext module), as in the following example:
# Request a pool of 3 dynamic IP addresses
pool = AddressPool( "poolname", 3 )
rspec.addResource( pool )
The addresses assigned to the pool are found in the experiment manifest.
10.4 Markdown
CloudLab supports Markdown in the major text fields in RSpecs. Markdown is a simple formatting syntax with a straightforward translation to basic HTML elements such as headers, lists, and pre-formatted text. You may find this useful in the description and instructions attached to your profile.
While editing a profile, you can preview the Markdown rendering of the Instructions or Description field by double-clicking within the text box.
You will probably find the Markdown manual to be useful.
10.5 Introspection
CloudLab implements the GENI APIs, and in particular the geni-get command. geni-get is a generic means for nodes to query their own configuration and metadata, and is pre-installed on all facility-provided disk images. (If you are supplying your own disk image built from scratch, you can add the geni-get client from its repository.)
While geni-get supports many options, there are five commands most useful in the CloudLab context.
10.5.1 Client ID
Invoking geni-get client_id will print a single line to standard output showing the identifier specified in the profile corresponding to the node on which it is run. This is a particularly useful feature in execute services, where a script might want to vary its behaviour between different nodes.
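For example, a single setup script shared by all nodes might branch on the client identifier. A sketch follows; the names "head" and "worker" are hypothetical and should match the client_ids actually defined in your profile.
#!/usr/bin/env python
# Sketch: vary setup behaviour per node based on the profile client_id.
# The names "head" and "worker*" are hypothetical examples.
import subprocess

client_id = subprocess.check_output(["geni-get", "client_id"]).decode().strip()

if client_id == "head":
    print("Running head-node setup")
elif client_id.startswith("worker"):
    print("Running worker setup on %s" % client_id)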
10.5.2 Control MAC
The command geni-get control_mac will print the MAC address of the control interface (as a string of 12 hexadecimal digits with no punctuation). In some circumstances this can be a useful means to determine which interface is attached to the control network, as OSes are not necessarily consistent in assigning identifiers to network interfaces.
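One possible use, sketched under the assumption of a Linux disk image that exposes interface MACs under /sys/class/net, is to match the reported MAC against each interface:
#!/usr/bin/env python
# Sketch: identify the control interface by comparing MAC addresses.
# Assumes a Linux image where /sys/class/net/<iface>/address exists.
import os
import subprocess

control_mac = subprocess.check_output(["geni-get", "control_mac"]).decode().strip().lower()

for iface in os.listdir("/sys/class/net"):
    try:
        with open("/sys/class/net/%s/address" % iface) as f:
            # /sys reports colon-separated MACs; geni-get omits punctuation.
            mac = f.read().strip().lower().replace(":", "")
    except IOError:
        continue
    if mac == control_mac:
        print("Control interface: %s" % iface)
        break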
10.5.3 Manifest
To retrieve the manifest RSpec for the instance, you can use the command geni-get manifest. It will print the manifest to standard output, including any annotations added during instantiation. For instance, this is an appropriate technique to use to query the allocation of a dynamic public IP address pool.
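As a hedged sketch, the manifest can be pulled and examined with Python's standard XML parser; the namespace below is the GENI v3 RSpec namespace, while aggregate-specific annotations (such as address pool assignments) appear under additional extension namespaces.
#!/usr/bin/env python
# Sketch: retrieve the annotated manifest and list its node elements.
import subprocess
import xml.etree.ElementTree as ET

manifest = subprocess.check_output(["geni-get", "manifest"])
root = ET.fromstring(manifest)

ns = {"rspec": "http://www.geni.net/resources/rspec/3"}
for node in root.findall("rspec:node", ns):
    print(node.get("client_id"))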
10.5.4 Private key
As a convenience, CloudLab will automatically generate an RSA private key unique to each profile instance. geni-get key will retrieve the private half of the keypair, which makes it a useful command for profiles bootstrapping an authenticated channel. For instance:
#!/bin/sh
# Create the user SSH directory, just in case.
mkdir $HOME/.ssh && chmod 700 $HOME/.ssh
# Retrieve the server-generated RSA private key.
geni-get key > $HOME/.ssh/id_rsa
chmod 600 $HOME/.ssh/id_rsa
# Derive the corresponding public key portion.
ssh-keygen -y -f $HOME/.ssh/id_rsa > $HOME/.ssh/id_rsa.pub
# If you want to permit login authenticated by the auto-generated key,
# then append the public half to the authorized_keys file:
grep -q -f $HOME/.ssh/id_rsa.pub $HOME/.ssh/authorized_keys || cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
Please note that the private key will be accessible to any user who can invoke geni-get from within the profile instance. Therefore, it is NOT suitable as an authentication mechanism for establishing privilege within a multi-user instance!
10.5.5 Profile parameters
When executing within the context of a profile instantiated with user-specified parameters, geni-get allows the retrieval of any of those parameters. The proper syntax is geni-get "param name", where name is the parameter name as specified in the geni-lib script defineParameter call. For example, geni-get "param n" would retrieve the number of nodes in an instance of the profile shown in the geni-lib parameter section.
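A small sketch of reading such a parameter from a node-side script; the parameter name n is the hypothetical node-count parameter mentioned above, and the quoting mirrors the syntax shown.
#!/usr/bin/env python
# Sketch: read a profile parameter at run time via geni-get.
# "param n" mirrors the quoted syntax documented above; "n" is a
# hypothetical parameter name defined by the profile's defineParameter call.
import subprocess

value = subprocess.check_output(["geni-get", "param n"]).decode().strip()
print("Profile parameter n = %s" % value)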
10.6 User-controlled switches and layer-1 topologies
Some experiments require exclusive access to Ethernet switches and/or the ability for users to reconfigure those switches. One example of a good use case for this feature is to enable or tune QoS features that cannot be enabled on CloudLab’s shared infrastructure switches.
User-allocated switches are treated similarly to the way CloudLab treats servers: the switches appear as nodes in your topology, and you ’wire’ them to PCs and each other using point-to-point layer-1 links. When one of these switches is allocated to an experiment, that experiment is the exclusive user, just as it is for a raw PC, and the user has ssh access with full administrative control. This means that users are free to enable and disable features, tweak parameters, reconfigure at will, etc. Users are given the switches in a ’clean’ state (we do little configuration on them), and can reload and reboot them just as they would a server.
The list of available switches is found in our hardware chapter, and the following example shows how to request a simple topology using geni-lib.
"""This profile allocates two bare metal nodes and connects them together via a Dell switch with layer1 links. Instructions: Click on any node in the topology and choose the `shell` menu item. When your shell window appears, use `ping` to test the link. You will be able to ping the other node through the switch fabric. We have installed a minimal configuration on your switches that enables the ports that are in use, and turns on spanning-tree (RSTP) in case you inadvertently created a loop with your topology. All unused ports are disabled. The ports are in Vlan 1, which effectively gives a single broadcast domain. If you want anything fancier, you will need to open up a shell window to your switches and configure them yourself. If your topology has more then a single switch, and you have links between your switches, we will enable those ports too, but we do not put them into switchport mode or bond them into a single channel, you will need to do that yourself. If you make any changes to the switch configuration, be sure to write those changes to memory. We will wipe the switches clean and restore a default configuration when your experiment ends.""" # Import the Portal object. import geni.portal as portal # Import the ProtoGENI library. import geni.rspec.pg as pg # Import the Emulab specific extensions. import geni.rspec.emulab as emulab # Create a portal context. pc = portal.Context() # Create a Request object to start building the RSpec. request = pc.makeRequestRSpec() # Do not run snmpit #request.skipVlans() # Add a raw PC to the request and give it an interface. node1 = request.RawPC("node1") iface1 = node1.addInterface() # Specify the IPv4 address iface1.addAddress(pg.IPv4Address("192.168.1.1", "255.255.255.0")) # Add Switch to the request and give it a couple of interfaces mysw = request.Switch("mysw"); mysw.hardware_type = "dell-s4048" swiface1 = mysw.addInterface() swiface2 = mysw.addInterface() # Add another raw PC to the request and give it an interface. node2 = request.RawPC("node2") iface2 = node2.addInterface() # Specify the IPv4 address iface2.addAddress(pg.IPv4Address("192.168.1.2", "255.255.255.0")) # Add L1 link from node1 to mysw link1 = request.L1Link("link1") link1.addInterface(iface1) link1.addInterface(swiface1) # Add L1 link from node2 to mysw link2 = request.L1Link("link2") link2.addInterface(iface2) link2.addInterface(swiface2) # Print the RSpec to the enclosing page. pc.printRequestRSpec(request) Open this profile on CloudLab"""This profile allocates two bare metal nodes and connects them together via a Dell switch with layer1 links. Instructions: Click on any node in the topology and choose the `shell` menu item. When your shell window appears, use `ping` to test the link. You will be able to ping the other node through the switch fabric. We have installed a minimal configuration on your switches that enables the ports that are in use, and turns on spanning-tree (RSTP) in case you inadvertently created a loop with your topology. All unused ports are disabled. The ports are in Vlan 1, which effectively gives a single broadcast domain. If you want anything fancier, you will need to open up a shell window to your switches and configure them yourself. If your topology has more then a single switch, and you have links between your switches, we will enable those ports too, but we do not put them into switchport mode or bond them into a single channel, you will need to do that yourself. 
If you make any changes to the switch configuration, be sure to write those changes to memory. We will wipe the switches clean and restore a default configuration when your experiment ends.""" # Import the Portal object. import geni.portal as portal # Import the ProtoGENI library. import geni.rspec.pg as pg # Import the Emulab specific extensions. import geni.rspec.emulab as emulab # Create a portal context. pc = portal.Context() # Create a Request object to start building the RSpec. request = pc.makeRequestRSpec() # Do not run snmpit #request.skipVlans() # Add a raw PC to the request and give it an interface. node1 = request.RawPC("node1") iface1 = node1.addInterface() # Specify the IPv4 address iface1.addAddress(pg.IPv4Address("192.168.1.1", "255.255.255.0")) # Add Switch to the request and give it a couple of interfaces mysw = request.Switch("mysw"); mysw.hardware_type = "dell-s4048" swiface1 = mysw.addInterface() swiface2 = mysw.addInterface() # Add another raw PC to the request and give it an interface. node2 = request.RawPC("node2") iface2 = node2.addInterface() # Specify the IPv4 address iface2.addAddress(pg.IPv4Address("192.168.1.2", "255.255.255.0")) # Add L1 link from node1 to mysw link1 = request.L1Link("link1") link1.addInterface(iface1) link1.addInterface(swiface1) # Add L1 link from node2 to mysw link2 = request.L1Link("link2") link2.addInterface(iface2) link2.addInterface(swiface2) # Print the RSpec to the enclosing page. pc.printRequestRSpec(request)
This feature is implemented using a set of layer-1 switches between some servers and Ethernet switches. These switches act as “patch panels,” allowing us to “wire” servers to switches with no intervening Ethernet packet processing and minimal impact on latency. This feature can also be used to “wire” servers directly to one another, and to create links between switches, as seen in the following two examples.
"""This profile allocates two bare metal nodes and connects them directly together via a layer1 link. Instructions: Click on any node in the topology and choose the `shell` menu item. When your shell window appears, use `ping` to test the link.""" # Import the Portal object. import geni.portal as portal # Import the ProtoGENI library. import geni.rspec.pg as pg # Import the Emulab specific extensions. import geni.rspec.emulab as emulab # Create a portal context. pc = portal.Context() # Create a Request object to start building the RSpec. request = pc.makeRequestRSpec() # Do not run snmpit #request.skipVlans() # Add a raw PC to the request and give it an interface. node1 = request.RawPC("node1") iface1 = node1.addInterface() # Specify the IPv4 address iface1.addAddress(pg.IPv4Address("192.168.1.1", "255.255.255.0")) # Add another raw PC to the request and give it an interface. node2 = request.RawPC("node2") iface2 = node2.addInterface() # Specify the IPv4 address iface2.addAddress(pg.IPv4Address("192.168.1.2", "255.255.255.0")) # Add L1 link from node1 to node2 link1 = request.L1Link("link1") link1.addInterface(iface1) link1.addInterface(iface2) # Print the RSpec to the enclosing page. pc.printRequestRSpec(request) Open this profile on CloudLab"""This profile allocates two bare metal nodes and connects them directly together via a layer1 link. Instructions: Click on any node in the topology and choose the `shell` menu item. When your shell window appears, use `ping` to test the link.""" # Import the Portal object. import geni.portal as portal # Import the ProtoGENI library. import geni.rspec.pg as pg # Import the Emulab specific extensions. import geni.rspec.emulab as emulab # Create a portal context. pc = portal.Context() # Create a Request object to start building the RSpec. request = pc.makeRequestRSpec() # Do not run snmpit #request.skipVlans() # Add a raw PC to the request and give it an interface. node1 = request.RawPC("node1") iface1 = node1.addInterface() # Specify the IPv4 address iface1.addAddress(pg.IPv4Address("192.168.1.1", "255.255.255.0")) # Add another raw PC to the request and give it an interface. node2 = request.RawPC("node2") iface2 = node2.addInterface() # Specify the IPv4 address iface2.addAddress(pg.IPv4Address("192.168.1.2", "255.255.255.0")) # Add L1 link from node1 to node2 link1 = request.L1Link("link1") link1.addInterface(iface1) link1.addInterface(iface2) # Print the RSpec to the enclosing page. pc.printRequestRSpec(request)
"""This profile allocates two bare metal nodes and connects them together via two Dell switches with layer1 links. Instructions: Click on any node in the topology and choose the `shell` menu item. When your shell window appears, use `ping` to test the link. You will be able to ping the other node through the switch fabric. We have installed a minimal configuration on your switches that enables the ports that are in use, and turns on spanning-tree (RSTP) in case you inadvertently created a loop with your topology. All unused ports are disabled. The ports are in Vlan 1, which effectively gives a single broadcast domain. If you want anything fancier, you will need to open up a shell window to your switches and configure them yourself. If your topology has more then a single switch, and you have links between your switches, we will enable those ports too, but we do not put them into switchport mode or bond them into a single channel, you will need to do that yourself. If you make any changes to the switch configuration, be sure to write those changes to memory. We will wipe the switches clean and restore a default configuration when your experiment ends.""" # Import the Portal object. import geni.portal as portal # Import the ProtoGENI library. import geni.rspec.pg as pg # Import the Emulab specific extensions. import geni.rspec.emulab as emulab # Create a portal context. pc = portal.Context() # Create a Request object to start building the RSpec. request = pc.makeRequestRSpec() # Do not run snmpit #request.skipVlans() # Add a raw PC to the request and give it an interface. node1 = request.RawPC("node1") iface1 = node1.addInterface() # Specify the IPv4 address iface1.addAddress(pg.IPv4Address("192.168.1.1", "255.255.255.0")) # Add first switch to the request and give it a couple of interfaces mysw1 = request.Switch("mysw1"); mysw1.hardware_type = "dell-s4048" sw1iface1 = mysw1.addInterface() sw1iface2 = mysw1.addInterface() # Add second switch to the request and give it a couple of interfaces mysw2 = request.Switch("mysw2"); mysw2.hardware_type = "dell-s4048" sw2iface1 = mysw2.addInterface() sw2iface2 = mysw2.addInterface() # Add another raw PC to the request and give it an interface. node2 = request.RawPC("node2") iface2 = node2.addInterface() # Specify the IPv4 address iface2.addAddress(pg.IPv4Address("192.168.1.2", "255.255.255.0")) # Add L1 link from node1 to mysw1 link1 = request.L1Link("link1") link1.addInterface(iface1) link1.addInterface(sw1iface1) # Add L1 link from mysw1 to mysw2 trunk = request.L1Link("trunk") trunk.addInterface(sw1iface2) trunk.addInterface(sw2iface2) # Add L1 link from node2 to mysw2 link2 = request.L1Link("link2") link2.addInterface(iface2) link2.addInterface(sw2iface1) # Print the RSpec to the enclosing page. pc.printRequestRSpec(request) Open this profile on CloudLab"""This profile allocates two bare metal nodes and connects them together via two Dell switches with layer1 links. Instructions: Click on any node in the topology and choose the `shell` menu item. When your shell window appears, use `ping` to test the link. You will be able to ping the other node through the switch fabric. We have installed a minimal configuration on your switches that enables the ports that are in use, and turns on spanning-tree (RSTP) in case you inadvertently created a loop with your topology. All unused ports are disabled. The ports are in Vlan 1, which effectively gives a single broadcast domain. 
If you want anything fancier, you will need to open up a shell window to your switches and configure them yourself. If your topology has more then a single switch, and you have links between your switches, we will enable those ports too, but we do not put them into switchport mode or bond them into a single channel, you will need to do that yourself. If you make any changes to the switch configuration, be sure to write those changes to memory. We will wipe the switches clean and restore a default configuration when your experiment ends.""" # Import the Portal object. import geni.portal as portal # Import the ProtoGENI library. import geni.rspec.pg as pg # Import the Emulab specific extensions. import geni.rspec.emulab as emulab # Create a portal context. pc = portal.Context() # Create a Request object to start building the RSpec. request = pc.makeRequestRSpec() # Do not run snmpit #request.skipVlans() # Add a raw PC to the request and give it an interface. node1 = request.RawPC("node1") iface1 = node1.addInterface() # Specify the IPv4 address iface1.addAddress(pg.IPv4Address("192.168.1.1", "255.255.255.0")) # Add first switch to the request and give it a couple of interfaces mysw1 = request.Switch("mysw1"); mysw1.hardware_type = "dell-s4048" sw1iface1 = mysw1.addInterface() sw1iface2 = mysw1.addInterface() # Add second switch to the request and give it a couple of interfaces mysw2 = request.Switch("mysw2"); mysw2.hardware_type = "dell-s4048" sw2iface1 = mysw2.addInterface() sw2iface2 = mysw2.addInterface() # Add another raw PC to the request and give it an interface. node2 = request.RawPC("node2") iface2 = node2.addInterface() # Specify the IPv4 address iface2.addAddress(pg.IPv4Address("192.168.1.2", "255.255.255.0")) # Add L1 link from node1 to mysw1 link1 = request.L1Link("link1") link1.addInterface(iface1) link1.addInterface(sw1iface1) # Add L1 link from mysw1 to mysw2 trunk = request.L1Link("trunk") trunk.addInterface(sw1iface2) trunk.addInterface(sw2iface2) # Add L1 link from node2 to mysw2 link2 = request.L1Link("link2") link2.addInterface(iface2) link2.addInterface(sw2iface1) # Print the RSpec to the enclosing page. pc.printRequestRSpec(request)
11 Hardware
CloudLab can allocate experiments on any one of several clusters: three that belong to CloudLab itself, plus several more that belong to federated projects.
Additional hardware expansions are planned, and descriptions of them can be found at https://www.cloudlab.us/hardware.php.
11.1 CloudLab Utah
The CloudLab cluster at the University of Utah is being built in partnership with HP. It consists of 200 Intel Xeon E5 servers, 270 Intel Xeon-D servers, and 315 64-bit ARM servers, for a total of 6,680 cores. The cluster is housed in the University of Utah’s Downtown Data Center in Salt Lake City.
m400 | 315 nodes (64-bit ARM)
CPU | Eight 64-bit ARMv8 (Atlas/A57) cores at 2.4 GHz (APM X-GENE)
RAM | 64GB ECC Memory (8x 8 GB DDR3-1600 SO-DIMMs)
Disk | 120 GB of flash (SATA3 / M.2, Micron M500)
NIC | Dual-port Mellanox ConnectX-3 10 GB NIC (PCIe v3.0, 8 lanes)

m510 | 270 nodes (Intel Xeon-D)
CPU | Eight-core Intel Xeon D-1548 at 2.0 GHz
RAM | 64GB ECC Memory (4x 16 GB DDR4-2133 SO-DIMMs)
Disk | 256 GB NVMe flash storage
NIC | Dual-port Mellanox ConnectX-3 10 GB NIC (PCIe v3.0, 8 lanes)
There are 45 nodes in a chassis, and this cluster consists of thirteen chassis. Each chassis has two 45XGc switches; each node is connected to both switches, and each chassis switch has four 40Gbps uplinks, for a total of 320Gbps of uplink capacity from each chassis. One switch is used for control traffic, connecting to the Internet, etc. The other is used to build experiment topologies, and should be used for most experimental purposes.
All chassis are interconnected through a large HP FlexFabric 12910 switch which has full bisection bandwidth internally.
We have plans to enable some users to allocate entire chassis; when allocated in this mode, it will be possible to have complete administrator control over the switches in addition to the nodes.
In phase two we added 50 Apollo R2200 chassis each with four HPE ProLiant XL170r server modules. Each server has 10 cores for a total of 2000 cores.
xl170 | 200 nodes (Intel Broadwell, 10 core, 1 disk)
CPU | Ten-core Intel E5-2640v4 at 2.4 GHz
RAM | 64GB ECC Memory (4x 16 GB DDR4-2400 DIMMs)
Disk | Intel DC S3520 480 GB 6G SATA SSD
NIC | Two Dual-port Mellanox ConnectX-4 25 GB NICs (PCIe v3.0, 8 lanes)
Each server is connected via a 10Gbps control link (Dell switches) and a 25Gbps experimental link to Mellanox 2410 switches in groups of 40 servers. Each of the five groups’ experiment switches is connected to a Mellanox 2700 spine switch at 5x100Gbps. That switch in turn interconnects with the rest of the Utah CloudLab cluster via 6x40Gbps uplinks to the HP FlexFabric 12910 switch.
A unique feature of the phase two nodes is the addition of eight ONIE bootable "user allocatable" switches that can run a variety of Open Network OSes: six Dell S4048-ONs and two Mellanox MSN2410-BB2Fs. These switches and all 200 nodes are connected to two NetScout 3903 layer-1 switches, allowing flexible combinations of nodes and switches in an experiment.
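If an experiment needs one particular node type from the tables above, a geni-lib profile can pin it via the node's hardware_type attribute; a minimal sketch, using the xl170 type as an example:
# Sketch: request a specific CloudLab Utah node type by hardware_type.
import geni.portal as portal

pc = portal.Context()
request = pc.makeRequestRSpec()

node = request.RawPC("node1")
node.hardware_type = "xl170"   # any type name listed in this chapter

pc.printRequestRSpec(request)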
11.2 CloudLab Wisconsin
The CloudLab cluster at the University of Wisconsin is built in partnership with Cisco, Seagate, and HP. The cluster, which is in Madison, Wisconsin, has 270 servers with a total of 5,000 cores connected in a CLOS topology with full bisection bandwidth. It has 1,070 TB of storage, including SSDs on every node.
More technical details can be found at https://www.cloudlab.us/hardware.php#wisconsin
c220g1 | 90 nodes (Haswell, 16 core, 3 disks)
CPU | Two Intel E5-2630 v3 8-core CPUs at 2.40 GHz (Haswell w/ EM64T)
RAM | 128GB ECC Memory (8x 16 GB DDR4 1866 MHz dual rank RDIMMs)
Disk | Two 1.2 TB 10K RPM 6G SAS SFF HDDs
Disk | One Intel DC S3500 480 GB 6G SATA SSD
NIC | Dual-port Intel X520-DA2 10Gb NIC (PCIe v3.0, 8 lanes)
NIC | Onboard Intel i350 1Gb

c240g1 | 10 nodes (Haswell, 16 core, 14 disks)
CPU | Two Intel E5-2630 v3 8-core CPUs at 2.40 GHz (Haswell w/ EM64T)
RAM | 128GB ECC Memory (8x 16 GB DDR4 1866 MHz dual rank RDIMMs)
Disk | Two Intel DC S3500 480 GB 6G SATA SSDs
Disk | Twelve 3 TB 3.5" HDDs donated by Seagate
NIC | Dual-port Intel X520-DA2 10Gb NIC (PCIe v3.0, 8 lanes)
NIC | Onboard Intel i350 1Gb

c220g2 | 163 nodes (Haswell, 20 core, 3 disks)
CPU | Two Intel E5-2660 v3 10-core CPUs at 2.60 GHz (Haswell EP)
RAM | 160GB ECC Memory (10x 16 GB DDR4 2133 MHz dual rank RDIMMs)
Disk | One Intel DC S3500 480 GB 6G SATA SSD
Disk | Two 1.2 TB 10K RPM 6G SAS SFF HDDs
NIC | Dual-port Intel X520 10Gb NIC (PCIe v3.0, 8 lanes)
NIC | Onboard Intel i350 1Gb

c240g2 | 4 nodes (Haswell, 20 core, 8 disks)
CPU | Two Intel E5-2660 v3 10-core CPUs at 2.60 GHz (Haswell EP)
RAM | 160GB ECC Memory (10x 16 GB DDR4 2133 MHz dual rank RDIMMs)
Disk | Two Intel DC S3500 480 GB 6G SATA SSDs
Disk | Two 1TB HDDs
Disk | Four 3TB HDDs
NIC | Dual-port Intel X520 10Gb NIC (PCIe v3.0, 8 lanes)
NIC | Onboard Intel i350 1Gb
All nodes are connected to two networks:
The experiment network at Wisconsin is transitioning to using HP switches in order to provide OpenFlow 1.3 support.
A 1 Gbps Ethernet “control network”: this network is used for remote access, experiment management, etc., and is connected to the public Internet. When you log in to nodes in your experiment using ssh, this is the network you are using. You should not use this network as part of the experiments you run in CloudLab.
A 10 Gbps Ethernet “experiment network”: each node has two interfaces on this network. Twelve leaf switches are Cisco Nexus C3172PQs, which have 48 10Gbps ports for the nodes and six 40Gbps uplink ports. They are connected to six spine switches (Cisco Nexus C3132Qs); each leaf has one 40Gbps link to each spine switch. Another C3132Q switch acts as a core; each spine switch has one 40Gbps link to it, and it has upstream links to Internet2.
Phase II added 260 new nodes, 36 with one or more GPUs:
c220g5 | 224 nodes (Intel Skylake, 20 core, 2 disks)
CPU | Two Intel Xeon Silver 4114 10-core CPUs at 2.20 GHz
RAM | 192GB ECC DDR4-2666 Memory
Disk | One 1 TB 7200 RPM 6G SAS HD
Disk | One Intel DC S3500 480 GB 6G SATA SSD
NIC | Dual-port Intel X520-DA2 10Gb NIC (PCIe v3.0, 8 lanes)
NIC | Onboard Intel i350 1Gb

c240g5 | 32 nodes (Intel Skylake, 20 core, 2 disks, GPU)
CPU | Two Intel Xeon Silver 4114 10-core CPUs at 2.20 GHz
RAM | 192GB ECC DDR4-2666 Memory
Disk | One 1 TB 7200 RPM 6G SAS HD
Disk | One Intel DC S3500 480 GB 6G SATA SSD
GPU | One NVIDIA 12GB PCI P100 GPU
NIC | Dual-port Intel X520-DA2 10Gb NIC (PCIe v3.0, 8 lanes)
NIC | Onboard Intel i350 1Gb

c4130 | 4 nodes (Intel Broadwell, 16 core, 2 disks, 4 GPUs)
CPU | Two Intel Xeon E5-2667 8-core CPUs at 3.20 GHz
RAM | 128GB ECC Memory
Disk | Two 960 GB 6G SATA SSDs
GPU | Four NVIDIA 16GB Tesla V100 SMX2 GPUs
11.3 CloudLab Clemson
The CloudLab cluster at Clemson University has been built in partnership with Dell. The cluster so far has 260 servers with a total of 6,736 cores, 1,272TB of disk space, and 73TB of RAM. All nodes have 10 Gbps Ethernet and most have QDR Infiniband. It is located in Clemson, South Carolina.
More technical details can be found at https://www.cloudlab.us/hardware.php#clemson
c8220 | 96 nodes (Ivy Bridge, 20 core)
CPU | Two Intel E5-2660 v2 10-core CPUs at 2.20 GHz (Ivy Bridge)
RAM | 256GB ECC Memory (16x 16 GB DDR4 1600MT/s dual rank RDIMMs)
Disk | Two 1 TB 7.2K RPM 3G SATA HDDs
NIC | Dual-port Intel 10Gbe NIC (PCIe v3.0, 8 lanes)
NIC | Qlogic QLE 7340 40 Gb/s Infiniband HCA (PCIe v3.0, 8 lanes)

c8220x | 4 nodes (Ivy Bridge, 20 core, 20 disks)
CPU | Two Intel E5-2660 v2 10-core CPUs at 2.20 GHz (Ivy Bridge)
RAM | 256GB ECC Memory (16x 16 GB DDR4 1600MT/s dual rank RDIMMs)
Disk | Eight 1 TB 7.2K RPM 3G SATA HDDs
Disk | Twelve 4 TB 7.2K RPM 3G SATA HDDs
NIC | Dual-port Intel 10Gbe NIC (PCIe v3.0, 8 lanes)
NIC | Qlogic QLE 7340 40 Gb/s Infiniband HCA (PCIe v3.0, 8 lanes)

c6320 | 84 nodes (Haswell, 28 core)
CPU | Two Intel E5-2683 v3 14-core CPUs at 2.00 GHz (Haswell)
RAM | 256GB ECC Memory
Disk | Two 1 TB 7.2K RPM 3G SATA HDDs
NIC | Dual-port Intel 10Gbe NIC (X520)
NIC | Qlogic QLE 7340 40 Gb/s Infiniband HCA (PCIe v3.0, 8 lanes)

c4130 | 2 nodes (Haswell, 28 core, two GPUs)
CPU | Two Intel E5-2680 v3 12-core processors at 2.50 GHz (Haswell)
RAM | 256GB ECC Memory
Disk | Two 1 TB 7.2K RPM 3G SATA HDDs
GPU | Two Tesla K40m GPUs
NIC | Dual-port Intel 1Gbe NIC (i350)
NIC | Dual-port Intel 10Gbe NIC (X710)
NIC | Qlogic QLE 7340 40 Gb/s Infiniband HCA (PCIe v3.0, 8 lanes)
There are also two storage-intensive nodes (270TB each!) that should only be used if you need a huge amount of volatile storage. These nodes have only 10 Gbps Ethernet.
dss7500 | 2 nodes (Haswell, 12 core, 270TB disk)
CPU | Two Intel E5-2620 v3 6-core CPUs at 2.40 GHz (Haswell)
RAM | 128GB ECC Memory
Disk | Two 120 GB 6Gbps SATA SSDs
Disk | Forty-five 6 TB 7.2K RPM 6Gbps SATA HDDs
NIC | Dual-port Intel 10Gbe NIC (X520)
There are three networks at the Clemson site:
A 1 Gbps Ethernet “control network”: this network is used for remote access, experiment management, etc., and is connected to the public Internet. When you log in to nodes in your experiment using ssh, this is the network you are using. You should not use this network as part of the experiments you run in CloudLab.
A 10 Gbps Ethernet “experiment network”: each node has one interface on this network. This network is implemented using three Force10 S6000 and three Force10 Z9100 switches. Each S6000 switch is connected to a companion Z9100 switch via a 480Gbps link aggregate.
A 40 Gbps QDR Infiniband “experiment network”–each node has one connection to this network, which is implemented using a large Mellanox chassis switch with full bisection bandwidth.
Phase two added 18 Dell C6420 chassis each with four dual-socket Skylake-based servers. Each of the 72 servers has 32 cores for a total of 2304 cores.
c6420 | 72 nodes (Intel Skylake, 32 core, 2 disks)
CPU | Two sixteen-core Intel Xeon Gold 6142 CPUs at 2.6 GHz
RAM | 384GB ECC DDR4-2666 Memory
Disk | Two Seagate 1TB 7200 RPM 6G SATA HDs
NIC | Dual-port Intel X710 10Gbe NIC
Each server is connected via a 1Gbps control link (Dell D3048 switches) and a 10Gbps experimental link (Dell S5048 switches).
These Phase II machines do not include Infiniband.
11.4 Apt Cluster
The main Apt cluster is housed in the University of Utah’s Downtown Data Center in Salt Lake City, Utah. It contains two classes of nodes:
r320 | 128 nodes (Sandy Bridge, 8 cores)
CPU | 1x Xeon E5-2450 processor (8 cores, 2.1 GHz)
RAM | 16GB Memory (4 x 2GB RDIMMs, 1.6 GHz)
Disks | 4 x 500GB 7.2K SATA Drives (RAID5)
NIC | 1GbE Dual port embedded NIC (Broadcom)
NIC | 1 x Mellanox MX354A Dual port FDR CX3 adapter w/1 x QSA adapter

c6220 | 64 nodes (Ivy Bridge, 16 cores)
CPU | 2 x Xeon E5-2650v2 processors (8 cores each, 2.6 GHz)
RAM | 64GB Memory (8 x 8GB DDR-3 RDIMMs, 1.86 GHz)
Disks | 2 x 1TB SATA 3.5” 7.2K rpm hard drives
NIC | 4 x 1GbE embedded Ethernet Ports (Broadcom)
NIC | 1 x Intel X520 PCIe Dual port 10Gb Ethernet NIC
NIC | 1 x Mellanox FDR CX3 Single port mezz card
All nodes are connected to three networks with one interface each:
A 1 Gbps Ethernet “control network”: this network is used for remote access, experiment management, etc., and is connected to the public Internet. When you log in to nodes in your experiment using ssh, this is the network you are using. You should not use this network as part of the experiments you run in Apt.
A “flexible fabric” that can run up to 56 Gbps and runs either FDR Infiniband or Ethernet. This fabric uses NICs and switches with Mellanox’s VPI technology. This means that we can, on demand, configure each port to be either FDR Infiniband or 40 Gbps (or even non-standard 56 Gbps) Ethernet. This fabric consists of seven edge switches (Mellanox SX6036G) with 28 connected nodes each. There are two core switches (also SX6036G), and each edge switch connects to both cores with a 3.5:1 blocking factor. This fabric is ideal if you need very low latency, Infiniband, or a few high-bandwidth Ethernet links.
A 10 Gbps Ethernet “commodity fabric”. On the r320 nodes, a port on the Mellanox NIC (permanently set to Ethernet mode) is used to connect to this fabric; on the c6220 nodes, a dedicated Intel 10 Gbps NIC is used. This fabric is built from two Dell Z9000 switches, each of which has 96 nodes connected to it. It is ideal for creating large LANs: each of the two switches has full bisection bandwidth for its 96 ports, and there is a 3.5:1 blocking factor between the two switches.
11.5 IG-DDC Cluster
This cluster is not owned by CloudLab, but is federated and available to CloudLab users.
This small cluster is an InstaGENI Rack housed in the University of Utah’s Downtown Data Center. It has nodes of only a single type:
dl360 | 33 nodes (Sandy Bridge, 16 cores)
CPU | 2x Xeon E5-2450 processors (8 cores each, 2.1 GHz)
RAM | 48GB Memory (6 x 8GB RDIMMs, 1.6 GHz)
Disk | 1 x 1TB 7.2K SATA Drive
NIC | 1GbE 4-port embedded NIC
It has two network fabrics:
A 1 Gbps “control network”. This is used for remote access, and should not be used for experiments.
A 1 Gbps “experiment network”. Each node has three interfaces on this network, which is built from a single HP Procurve 5406 switch. OpenFlow is available on this network.
12 Planned Features
This chapter describes features that are planned for CloudLab or under development: please contact us if you have any feedback or suggestions!
12.1 Improved Physical Resource Descriptions
As part of the process of reserving resources on CloudLab, a type of RSpec called a manifest is created. The manifest gives a detailed description of the hardware allocated, including the specifics of network topology. Currently, CloudLab does not directly export this information to the user. In the interest of improving transparency and repeatable research, CloudLab will develop interfaces to expose, explore, and export this information.
13 CloudLab OpenStack Tutorial
This tutorial will walk you through the process of creating a small cloud on CloudLab using OpenStack. Your copy of OpenStack will run on bare-metal machines that are dedicated for your use for the duration of your experiment. You will have complete administrative access to these machines, meaning that you have full ability to customize and/or configure your installation of OpenStack.
13.1 Objectives
In the process of taking this tutorial, you will learn to:
Log in to CloudLab
Create your own cloud by using a pre-defined profile
Access resources in a cloud that you create
Use administrative access to customize your cloud
Clean up your cloud when finished
Learn where to get more information
13.2 Prerequisites
This tutorial assumes that:
- You have an existing account on either:
CloudLab (Instructions for getting an account can be found here.)
The GENI portal. (Instructions for getting an account can be found here.)
13.3 Logging In
The first step is to log in to CloudLab. CloudLab is available to all researchers and educators who work in cloud computing. If you have an account at one of its federated facilities, like Emulab or GENI, then you already have an account at CloudLab.
13.3.1 Using a CloudLab Account
If you have signed up for an account at the CloudLab website, simply open https://www.cloudlab.us/ in your browser, click the “Log In” button, enter your username and password, and skip to the next part of the tutorial.
13.3.2 Using a GENI Account
If you have an account at the GENI portal, you will use the single-sign-on features of the CloudLab and GENI portal.
- Open the web interfaceTo log in to CloudLab using a GENI account, start by visiting https://www.cloudlab.us/ in your browser and using the “Log In” button in the upper right.
- Click the "GENI User" buttonOn the login page that appears, you will use the “GENI User” button. This will start the GENI authorization tool, which lets you connect your GENI account with CloudLab.
- Select the GENI portalYou will be asked which facility your account is with. Select the GENI icon, which will take you to the GENI portal. There are several other facilities that are federated with CloudLab, which can be found in the drop-down list.
- Log into the GENI portalYou will be taken to the GENI portal, and will need to select the institution that is your identity provider. Usually, this is the university you are affiliated with. If your university is not in the list, you may need to log in using the “GENI Project Office” identity provider. If you have ever logged in to the GENI portal before, your identity provider should be pre-selected; if you are not familiar with this login page, there is a good chance that you don’t have a GENI account and need to apply for one.
- Log into your institutionThis will bring you to your institution’s standard login page, which you likely use to access many on-campus services. (Yours will look different from this one.)
- Authorize CloudLab to use your GENI account
What’s happening in this step is that your browser uses your GENI user certificate (which it obtained from the GENI portal) to cryptographically sign a “speaks-for” credential that authorizes the CloudLab portal to make GENI API calls on your behalf.
Click the “Authorize” button: this will create a signed statement authorizing the CloudLab portal to speak on your behalf. This authorization is time-limited (see “Show Advanced” for the details), and all actions the CloudLab portal takes on your behalf are clearly traceable. If you’d like, you can click “Remember This Decision” to skip this step in the future (you will still be asked to log in to the GENI portal).
- Set up an SSH keypair in the GENI portalThough not strictly required, many parts of this tutorial will work better if you have an ssh public key associated with your GENI Portal account. To upload one, visit portal.geni.net and use the “SSH Keys” item on the menu that drops down under your name. If you are not familiar with ssh keys, you may skip this step. You may find GitHub’s ssh key page helpful in learning how to generate and use them.
When you create a new experiment, CloudLab grabs the set of ssh keys you have set up at the time; any ssh keys that you add later are not added to experiments that are already running.
13.4 Building Your Own OpenStack Cloud
Once you have logged in to CloudLab, you will “instantiate” a “profile” to create an experiment. (An experiment in CloudLab is similar to a “slice” in GENI.) Profiles are CloudLab’s way of packaging up configurations and experiments so that they can be shared with others. Each experiment is separate: the experiment that you create for this tutorial will be an instance of a profile provided by the facility, but running on resources that are dedicated to you, which you have complete control over. This profile uses local disk space on the nodes, so anything you store there will be lost when the experiment terminates.
The OpenStack cloud we will build in this tutorial is very small, but CloudLab has additional hardware that can be used for larger-scale experiments.
For this tutorial, we will use a basic profile that brings up a small OpenStack cloud. The CloudLab staff have built this profile by capturing disk images of a partially-completed OpenStack installation and scripting the remainder of the install (customizing it for the specific machines that will get allocated, the user that created it, etc.) See this manual’s section on profiles for more information about how they work.
- Start ExperimentAfter logging in, you are taken to your main status dashboard. Select “Start Experiment” from the “Experiments” menu.
- Select a profileThe “Start an Experiment” page is where you will select a profile to instantiate. We will use the OpenStack profile; if it is not already selected, click the “Change Profile” button and select “OpenStack” from the list on the left. Once you have the correct profile selected, click “Next”.
- Set parametersProfiles in CloudLab can have parameters that affect how they are configured; for example, this profile has parameters that allow you to set the size of your cloud, spread it across multiple clusters, and turn on and off many OpenStack options.For this tutorial, we will leave all parameters at their defaults and just click “next”.
- Select a clusterCloudLab has multiple clusters available to it. Some profiles can run on any cluster, some can only run on specific ones due to specific hardware constraints, etc.Note: If you are at an in-person tutorial, the instructor will tell you which cluster to select. Otherwise, you may select any cluster.
The dropdown menu for the clusters shows you both the health (outer ring) and available resources (inner dot) of each cluster. The “Check Cluster Status” link opens a page (in a new tab) showing the current utilization of all CloudLab clusters.
- Click Finish!When you click the “finish” button, CloudLab will start provisioning the resources that you requested on the cluster that you selected.
You may optionally give your experiment a name; this can be useful if you have many experiments running at once.
- CloudLab instantiates your profileCloudLab will take a few minutes to bring up your copy of OpenStack, as many things happen at this stage, including selecting suitable hardware, loading disk images on local storage, booting bare-metal machines, re-configuring the network topology, etc. While this is happening, you will see a status page tracking progress.
Provisioning is done using the GENI APIs; it is possible for advanced users to bypass the CloudLab portal and call these provisioning APIs from their own code. A good way to do this is to use the geni-lib library for Python.
As soon as a set of resources has been assigned to you, you will see details about them at the bottom of the page (though you will not be able to log in until they have gone through the process of imaging and booting). While you are waiting for your resources to become available, you may want to have a look at the CloudLab user manual, or use the “Sliver” button to watch the logs of the resources (“slivers”) being provisioned and booting.
- Your cloud is ready!When the web interface reports the state as “Booted”, your cloud is provisioned, and you can proceed to the next section. Important: A “Booted” status indicates that resources are provisioned and booted; this particular profile runs scripts to complete the OpenStack setup, and it will be a few more minutes before OpenStack is fully ready to log in and create virtual machine instances. You will be able to tell that this has finished when the status changes from “Booted” to “Ready”. For now, don’t attempt to log in to OpenStack; we will explore the CloudLab experiment first.
13.5 Exploring Your Experiment
Now that your experiment is ready, take a few minutes to look at various parts of the CloudLab status page to help you understand what resources you’ve got and what you can do with them.
13.5.1 Experiment Status
The panel at the top of the page shows the status of your experiment, including how much time remains before it expires.
Note that the default lifetime for experiments on CloudLab is less than a day; after this time, the resources will be reclaimed and their disk contents will be lost. If you need to use them for longer, you can use the “Extend” button and provide a description of why they are needed. Longer extensions require higher levels of approval from CloudLab staff. You might also consider creating a profile of your own if you might need to run a customized environment multiple times or want to share it with others.
You can click the title of the panel to expand or collapse it.
13.5.2 Profile Instructions
Profiles may contain written instructions for their use. Clicking on the title
of the “Profile Instructions” panel will expand (or collapse) it; in this
case, the instructions provide a link to the administrative interface of
OpenStack, and give you passwords to use to log in. (Don’t log into OpenStack
yet; we will do that in a moment.)
13.5.3 Topology View
At the bottom of the page, you can see the topology of your experiment. This profile has three nodes connected by a LAN, which is represented by a gray box in the middle of the topology. The names given for each node are the names assigned as part of the profile; this way, every time you instantiate a profile, you can refer to the nodes using the same names, regardless of which physical hardware was assigned to them. The green boxes around each node indicate that they are up; click the “Refresh Status” button to initiate a fresh check.
If an experiment has “startup services” (programs that run at the beginning of the experiment to set it up), their status is indicated by a small icon in the upper right corner of the node. You can mouse over this icon to see a description of the current status. In this profile, the startup services on the compute node(s) typically complete quickly, but the control node may take much longer.
It is important to note that every node in CloudLab has at least two network interfaces: one “control network” that carries public IP connectivity, and one “experiment network” that is isolated from the Internet and all other experiments. It is the experiment net that is shown in this topology view. You will use the control network to ssh into your nodes, interact with their web interfaces, etc. This separation gives you more freedom and control in the private experiment network, and sets up a clean environment for repeatable research.
13.5.4 List View
The list view tab shows similar information to the topology view, but in a different format. It shows the identities of the nodes you have been assigned, and the full ssh command lines to connect to them. In some browsers (those that support the ssh:// URL scheme), you can click on the SSH commands to automatically open a new session. On others, you may need to cut and paste this command into a terminal window. Note that only public-key authentication is supported, and you must have set up an ssh keypair on your account before starting the experiment in order for authentication to work.
13.5.5 Manifest View
The third default tab shows a manifest detailing the hardware that has been assigned to you. This is the “request” RSpec that is used to define the profile, annotated with details of the hardware that was chosen to instantiate your request. This information is available on the nodes themselves using the geni-get command, enabling you to do rich scripting that is fully aware of both the requested topology and assigned resources.
Most of the information displayed on the CloudLab status page comes directly from this manifest; it is parsed and laid out in-browser.
13.5.6 Graphs View
The final default tab shows a page of CPU load and network traffic graphs for the nodes in your experiment. On a freshly-created experiment, it may take several minutes for the first data to appear. After clicking on the “Graphs” tab the first time, a small reload icon will appear on the tab, which you can click to refresh the data and regenerate the graphs. For instance, here is the load average graph for an OpenStack experiment running this profile for over 6 hours. Scroll past this screenshot to see the control and experiment network traffic graphs. In your experiment, you’ll want to wait 20-30 minutes before expecting to see anything interesting.
Here are the control network and experiment network packet graphs at the same time. The spikes at the beginning are produced by OpenStack setup and configuration, as well as the simple OpenStack tasks you’ll perform later in this profile, like adding a VM.
13.5.7 Actions
In both the topology and list views, you have access to several actions that you may take on individual nodes. In the topology view, click on the node to access this menu; in the list view, it is accessed through the icon in the “Actions” column. Available actions include rebooting (power cycling) a node, and re-loading it with a fresh copy of its disk image (destroying all data on the node). While nodes are in the process of rebooting or re-imaging, they will turn yellow in the topology view. When they have completed, they will become green again. The shell and console actions are described in more detail below.
13.5.8 Web-based Shell
CloudLab provides a browser-based shell for logging into your nodes, which is accessed through the action menu described above. While this shell is functional, it is most suited to light, quick tasks; if you are going to do serious work on your nodes, we recommend using a standard terminal and ssh program.
This shell can be used even if you did not establish an ssh keypair with your account.
Two things of note:
Your browser may require you to click in the shell window before it gets focus.
Depending on your operating system and browser, cutting and pasting into the window may not work. If keyboard-based pasting does not work, try right-clicking to paste.
13.5.9 Serial Console
CloudLab provides serial console access for all nodes, which can be used in the event that normal IP or ssh access gets intentionally or unintentionally broken. Like the browser-based shell, it is launched through the action menu, and the same caveats listed above apply as well. In addition:
If you look at the console for a node that is already booted, the console may be blank due to a lack of activity; press enter several times to get a fresh login prompt.
If you need to log in, do so as root; your normal user account does not allow password login. There is a link above the console window that reveals the randomly-generated root password for your node. Note that this password changes frequently for security reasons.
13.6 Bringing up Instances in OpenStack
Now that you have your own copy of OpenStack running, you can use it just like you would any other OpenStack cloud, with the important property that you have full root access to every machine in the cloud and can modify them however you’d like. In this part of the tutorial, we’ll go through the process of bringing up a new VM instance in your cloud.
We’ll be doing all of the work in this section using the Horizon web GUI for OpenStack, but you could also ssh into the machines directly and use the command line interfaces or other APIs as well.
- Check to see if OpenStack is ready to log inAs mentioned earlier, this profile runs several scripts to complete the installation of OpenStack. These scripts do things such as finalize package installation, customize the installation for the specific set of hardware assigned to your experiment, import cloud images, and bring up the hypervisors on the compute node(s). If exploring the CloudLab experiment took you more than ten minutes, these scripts are probably done. You can be sure by checking that all nodes have completed their startup scripts (indicated by a checkmark on the Topology view); when this happens, the experiment state will also change from “Booting” to “Ready”. If you continue without verifying that the setup scripts are complete, be aware that you may see temporary errors from the OpenStack web interface. These errors, and the method for dealing with them, are generally noted in the text below.
- Visit the OpenStack Horizon web interfaceOn the status page for your experiment, in the “Instructions” panel (click to expand it if it’s collapsed), you’ll find a link to the web interface running on the ctl node. Open this link (we recommend opening it in a new tab, since you will still need information from the CloudLab web interface).
- Log in to the OpenStack web interfaceLog in using the username admin and the password shown in the instructions for the profile. Use the domain name default if prompted for a domain.
This profile generates a new (random) password for every experiment.
Important: if the web interface rejects your password, wait a few minutes and try again. If it gives you another type of error, you may need to wait a minute and reload the page to get login working properly.
- Launch a new VM instanceClick the “Launch Instance” button on the “Instances” page of the web interface.
- Set Basic Settings For the InstanceThere are a few settings you will need to make in order to launch your instance. These instructions are for the launch wizard in the OpenStack Mitaka release, but the wizards in other releases are similar in function and require similar information. Use the tabs in the column on the left of the launch wizard to add the required information (i.e., Details, Source, Flavor, Networks, and Key Pair):
Pick any “Instance Name” you wish
For the “Image Name”, select “trusty-server”
For the “Instance Boot Source”, select “Boot from image”
Important: If you do not see any available images, the image import script may not have finished yet; wait a few minutes, reload the page, and try again.
Set the “Flavor” to m1.small:
the disk for the default m1.tiny instance is too small for the image we will be using, and since we have only one compute node, we want to avoid using up too many of its resources.
- Add a Network to the InstanceIn order to be able to access your instance, you will need to give it a network. On the “Networking” tab, add the tun0-net to the list of selected networks by clicking the “+” button or dragging it up to the blue region. This will set up an internal tunneled connection within your cloud; later, we will assign a public IP address to it.
The tun0-net consists of EGRE tunnels on the CloudLab experiment network.
Important: If you are missing the Networking tab, you may have logged into the OpenStack web interface before all services were started. Unfortunately, reloading does not solve this; you will need to log out and back in.
- Set an SSH KeypairOn the “Key Pair” tab (or “Access & Security” in previous versions), you will add an ssh keypair to log into your node. If you configured an ssh key in your GENI account, you should find it as one of the options in this list. You can filter the list by typing your CloudLab username, or a portion of it, into the filter box to reduce the size of the list. (By default, we load in public keys for all users in the project in which you created your experiment, for convenience; thus the list can be long.) If you don’t see your keypair, you can add a new one with the red button to the right of the list. Alternately, you can skip this step and use a password for login later.
- Launch, and Wait For Your Instance to BootClick the “Launch” button on the “Key Pair” wizard page, and wait for the status on the instances page to show up as “Active”.
- Add a Public IP AddressAt this point, your instance is up, but it has no connection to the public Internet. From the menu on the right, select “Associate Floating IP”.
Profiles may request to have public IP addresses allocated to them; this profile requests four (two of which are used by OpenStack itself.)
On the following screen, you will need to:
Press the red button to set up a new IP address
Click the “Allocate IP” button: this allocates a new address for you from the public pool available to your cloud.
Click the “Associate” button: this associates the newly-allocated address with this instance.
The public address is tunneled by the ctl (controller) node from the control network to the experiment network. (In older OpenStack profile versions, or depending on the profile parameters specified to the current profile version, the public address may instead be tunneled by the nm (network manager) node, in a split controller/network manager OpenStack deployment.)
You will now see your instance’s public address on the “Instances” page, and should be able to ping this address from your laptop or desktop.
- Log in to Your InstanceYou can ssh to this IP address. Use the username ubuntu; if you provided a public key earlier, use your private ssh key to connect. If you did not set up an ssh key, use the same password you used to log in to the OpenStack web interface (shown in the profile instructions). Run a few commands to check out the VM you have launched.
13.7 Administering OpenStack
Now that you’ve done some basic tasks in OpenStack, we’ll do a few things
that you would not be able to do as a user of someone else’s OpenStack
installation. These just scratch the surface; since you have full administrative access, you are free to change anything about the installation that you wish.
13.7.1 Log Into The Control Nodes
If you want to watch what’s going on with your copy of OpenStack, you can use ssh to log into any of the hosts as described above in the List View or Web-based Shell sections. Don’t be shy about viewing or modifying things; no one else is using your cloud, and if you break this one, you can easily get another.
Some things to try:
Run ps -ef on the ctl node to see the list of OpenStack services running
Run ifconfig on the ctl node (or the nm node, if your experiment has one) to see the various bridges and tunnels that have been brought up to support the networking in your cloud
Run sudo virsh list --all on cp1 to see the VMs that are running
13.7.2 Reboot the Compute Node
Since you have this cloud to yourself, you can do things like reboot or re-image nodes whenever you wish. We’ll reboot the cp1 (compute) node and watch it through the serial console.
- Open the serial console for cp1On your experiment’s CloudLab status page, use the action menu as described in Actions to launch the console. Remember that you may have to click to focus the console window and hit enter a few times to see activity on the console.
- Reboot the nodeOn your experiment’s CloudLab status page, use the action menu as described in Actions to reboot cp1. Note that in the topology display, the box around cp1 will turn yellow while it is rebooting, then green again when it has booted.
- Watch the node boot on the consoleSwitch back to the console tab you opened earlier, and you should see the node starting its reboot process.
- Check the node status in OpenStackYou can also watch the node’s status from the OpenStack Horizon web interface. In Horizon, select “Hypervisors” under the “System” menu, and switch to the “Compute Host” tab.Note: This display can take a few minutes to notice that the node has gone down, and to notice when it comes back up. Your instances will not come back up automatically; you can bring them up with a “Hard Reboot” from the “Admin -> System -> Instances” page.
13.8 Terminating the Experiment
Resources that you hold in CloudLab are real, physical machines and are therefore limited and in high demand. When you are done, you should release them for use by other experimenters. Do this via the “Terminate” button on the CloudLab experiment status page.
Note: When you terminate an experiment, all data on the nodes is lost, so make sure to copy off any data you may need before terminating.
If you were doing a real experiment, you might need to hold onto the nodes for longer than the default expiration time. You would request more time by using the “Extend” button on the status page. You would need to provide a written justification for holding onto your resources for a longer period of time.
13.9 Taking Next Steps
Now that you’ve got a feel for what CloudLab can do, there are several things you might try next:
Create a new project to continue working on CloudLab past this tutorial
Try out some profiles other than OpenStack (use the “Change Profile” button on the “Start Experiment” page)
Read more about the basic concepts in CloudLab
Try out different hardware
Learn how to make your own profiles
14 CloudLab Chef Tutorial
This tutorial will walk you through the process of creating and using an instance of the Chef configuration management system on CloudLab.
Chef is both the name of a company and the name of a popular modern configuration management system written in Ruby and Erlang. A large variety of tutorials, articles, technical docs and training opportunities is available at the Chef official website. Also, refer to the Customer Stories page to see how Chef is used in production environments, including very large installations (e.g., at Facebook, Bloomberg, and Yahoo!).
14.1 Objectives
In the process of taking this tutorial, you will learn to:
Create your own instance of Chef using a pre-defined profile
Explore profile parameters that allow you to customize components of your Chef deployments
Access monitoring and management capabilities provided by Chef
- Use Chef to perform two exercises:
Install and configure NFS, the Network File System, on experiment nodes
Install and configure an instance of the Apache web server, as well as run a benchmark against it
Terminate your experiment
Learn where to get more information
14.2 Motivation
This tutorial will demonstrate how experiments can be managed on CloudLab, as well as show how experiment resources can be administered using Chef. By following the instructions provided below, you will learn how to take advantage of the powerful features of Chef for configuring multi-node software environments. The exercises included in this tutorial are built around simple but realistic configurations. In the process of recreating these configurations on nodes running default images, you will explore individual components of Chef and follow the configuration management workflow applicable to more complex configurations and experiments.
14.3 Prerequisites
This tutorial assumes that:
- You have an existing account on either:
CloudLab (Instructions for getting an account can be found here.)
The GENI portal. (Instructions for getting an account can be found here.)
14.4 Logging In
The first step is to log in to CloudLab. CloudLab is available to all researchers and educators who work in cloud computing. If you have an account at one of its federated facilities, like Emulab or GENI, then you already have an account at CloudLab.
14.4.1 Using a CloudLab Account
If you have signed up for an account at the CloudLab website, simply open https://www.cloudlab.us/ in your browser, click the “Log In” button, enter your username and password, and skip to the next part of the tutorial.
14.4.2 Using a GENI Account
If you have an account at the GENI portal, you will use the single-sign-on features of the CloudLab and GENI portal.
- Open the web interfaceTo log in to CloudLab using a GENI account, start by visiting https://www.cloudlab.us/ in your browser and using the “Log In” button in the upper right.
- Click the "GENI User" buttonOn the login page that appears, you will use the “GENI User” button. This will start the GENI authorization tool, which lets you connect your GENI account with CloudLab.
- Select the GENI portalYou will be asked which facility your account is with. Select the GENI icon, which will take you to the GENI portal. There are several other facilities that are federated with CloudLab, which can be found in the drop-down list.
- Log into the GENI portalYou will be taken to the GENI portal, and will need to select the institution that is your identity provider. Usually, this is the university you are affiliated with. If your university is not in the list, you may need to log in using the “GENI Project Office” identity provider. If you have ever logged in to the GENI portal before, your identity provider should be pre-selected; if you are not familiar with this login page, there is a good chance that you don’t have a GENI account and need to apply for one.
- Log into your institutionThis will bring you to your institution’s standard login page, which you likely use to access many on-campus services. (Yours will look different from this one.)
- Authorize CloudLab to use your GENI account
What’s happening in this step is that your browser uses your GENI user certificate (which it obtained from the GENI portal) to cryptographically sign a “speaks-for” credential that authorizes the CloudLab portal to make GENI API calls on your behalf.
Click the “Authorize” button: this will create a signed statement authorizing the CloudLab portal to speak on your behalf. This authorization is time-limited (see “Show Advanced” for the details), and all actions the CloudLab portal takes on your behalf are clearly traceable.If you’d like, you can click “Remember This Decision” to skip this step in the future (you will still be asked to log in to the GENI portal). - Set up an SSH keypair in the GENI portalThough not strictly required, many parts of this tutorial will work better if you have an ssh public key associated with your GENI Portal account. To upload one, visit portal.geni.net and use the “SSH Keys” item on the menu that drops down under your name.If you are not familiar with ssh keys, you may skip this step. You may find GitHub’s ssh key page helpful in learning how to generate and use them.
When you create a new experiment, CloudLab grabs the set of ssh keys you have set up at the time; any ssh keys that you add later are not added to experiments that are already running.
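If you have never created an ssh keypair, one common way to generate one is shown below (a sketch; any standard tool and key type will work):
ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa    # creates id_rsa (private key) and id_rsa.pub (public key)
cat ~/.ssh/id_rsa.pub                         # paste this public key into the portal's "SSH Keys" page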
14.5 Launching Chef Experiments
Once you have logged in to CloudLab, you will “instantiate” a “profile” to create an experiment. (An experiment in CloudLab is similar to a “slice” in GENI.) Profiles are CloudLab’s way of packaging up configurations and experiments so that they can be shared with others. Each experiment is separate: the experiment that you create for this tutorial will be an instance of a profile provided by the facility, but running on resources that are dedicated to you, which you have complete control over. This profile uses local disk space on the nodes, so anything you store there will be lost when the experiment terminates.
The Chef cluster we are building in this tutorial is very small, but CloudLab has large clusters that can be used for larger-scale experiments.
For this tutorial, we will use a profile that launches a Chef cluster: a head node that runs the Chef server, workstation, and client, plus one or more client-only nodes.
See this manual’s section on profiles for more information about how they work.
- Start ExperimentAfter logging in, you are taken to your main status dashboard. Select “Start Experiment” from the “Experiments” menu.
- Select a profileBy default, the “Start an Experiment” page suggests launching the OpenStack profile, which is described in detail in the OpenStack tutorial.Go to the list of available profiles by clicking “Change Profile”:Find the profile by typing ChefCluster in the search bar. Then, select the profile with the specified name in the list displayed below the search bar. A 2-node preview should now be shown along with high-level profile information. Click “Select Profile” at the bottom of the page:After you select the correct profile, click “Next”:
- Set parametersProfiles in CloudLab can have parameters that affect how they are configured; for example, this profile has parameters that allow you to set the number of client nodes, specify the repository with the infrastructure code we plan to use, obtain copies of the application-specific infrastructure code developed by the global community of Chef developers, etc.For this tutorial, we will leave all parameters at their defaults and just click “Next”.
- Choose experiment nameYou may optionally give your experiment a meaningful name, e.g., “chefdemo”. This is useful if you have many experiments running at once.
- Select a clusterCloudLab has multiple clusters available to it. Some profiles can run on any cluster; some can only run on specific ones due to hardware constraints. ChefCluster can only run on the x86-based clusters; this excludes the CloudLab Utah cluster, which is built on ARMv8 nodes. Refer to the Hardware section for more information.Note: If you are at an in-person tutorial, the instructor will tell you which cluster to select. Otherwise, you may select any compatible and available cluster.
The dropdown menu for the clusters shows you both the health (outer ring) and available resources (inner dot) of each cluster. The “Check Cluster Status” link opens a page (in a new tab) showing the current utilization of all CloudLab clusters.
- Click Finish!When you click the “Finish” button, CloudLab will start provisioning the resources that you requested on the cluster that you selected.
- CloudLab instantiates your profileCloudLab will take a few minutes to bring up your experiment, as many things happen at this stage, including selecting suitable hardware, loading disk images on local storage, booting bare-metal machines, re-configuring the network topology, etc. While this is happening, you will see the topology with yellow node icons:
Provisioning is done using the GENI APIs; it is possible for advanced users to bypass the CloudLab portal and call these provisioning APIs from their own code. A good way to do this is to use the geni-lib library for Python.
As soon as a set of resources have been assigned to you, you will see details about them if you switch to the "List View" tab (though you will not be able to log in until they have gone through the process of imaging and booting.) While you are waiting for your resources to become available, you may want to have a look at the CloudLab user manual, or use the “Sliver” button to watch the logs of the resources (“slivers”) being provisioned and booting. - Your resources are ready!Shortly, the web interface will report the state as “Booted”.Important: A “Booted” status indicates that resources are provisioned and booted; this particular profile runs scripts to complete the Chef setup, and it will be a few more minutes before Chef becomes fully functional. You will be able to tell that the configuration process has finished when the status changes from “Booted” to “Ready”. Until then, you will likely see that the startup scripts (i.e. programs that run at the beginning of the experiment to set it up) run longer on head than on node-0 —
a lot more work is required to install the Chef server than the client. In the Topology View tab, mouse over the circle in head’s icon to confirm the current state of the node.
14.6 Exploring Your Experiment
While the startup scripts are still running, you will have a few minutes to look at various parts of the CloudLab experiment page and learn what resources you now have access to and what you can do with them.
14.6.1 Experiment Status
The panel at the top of the page shows the status of your experiment, including its current state and when it expires.
Note that the default lifetime for experiments on CloudLab is less than a day; after this time, the resources will be reclaimed and their disk contents will be lost. If you need to use them for longer, you can use the “Extend” button and provide a description of why they are needed. Longer extensions require higher levels of approval from CloudLab staff. You might also consider creating a profile of your own if you might need to run a customized environment multiple times or want to share it with others.
You can click on the title of the panel to expand or collapse it.
14.6.2 Profile Instructions
Profiles may contain written instructions for their use. Clicking on the title
of the “Profile Instructions” panel will expand or collapse it. In this
case, the instructions provide a link to the Chef server web console.
(Don’t click on the link yet; the Chef server is not fully configured until the startup scripts finish, and we will use this link later in the tutorial.)
Also, notice the list of private and public hostnames of the experiment nodes included in the instructions. You will use the public hostnames (shown in bold) in the exercise with the Apache web server and the ApacheBench benchmark.
14.6.3 Topology View
We have already used the topology viewer to see the node status; let’s take a closer look at the topology of this experiment. This profile has two nodes connected by a 10 Gigabit LAN, which is represented by a gray box in the middle of the topology. The names given for each node are the names assigned as part of the profile; this way, every time you instantiate a profile, you can refer to the nodes using the same names, regardless of which physical hardware was assigned to them. The green boxes around each node indicate that they are up; click the “Refresh Status” button to initiate a fresh check.
You can also run several networking tests by clicking the “Run Linktest” button at the bottom of the page. The available tests include:
Level 1: Connectivity and Latency
Level 2: Plus Static Routing
Level 3: Plus Loss
Level 4: Plus Bandwidth
Higher levels take longer to complete and require patience.
If an experiment has startup services, their statuses are indicated by small icons in the upper right corners of the node icons. You can mouse over this icon to see a description of the current status. In this profile, the startup services on the client node(s), such as node-0, typically complete quickly, but the scripts on head take much longer. The screenshot above shows the state of the experiment when all startup scripts complete.
It is important to note that every node in CloudLab has at least two network interfaces: one “control network” that carries public IP connectivity, and one “experiment network” that is isolated from the Internet and all other experiments. It is the experiment net that is shown in this topology view. You will use the control network to ssh into your nodes, interact with their web interfaces, etc. This separation gives you more freedom and control in the private experiment network, and sets up a clean environment for repeatable research.
14.6.4 List View
The list view tab shows similar information to the topology view, but in a different format. It shows the identities of the nodes you have been assigned, and the full ssh command lines to connect to them. In some browsers (those that support the ssh:// URL scheme), you can click on the SSH commands to automatically open a new session. On others, you may need to cut and paste this command into a terminal window. Note that only public-key authentication is supported, and you must have set up an ssh keypair on your account before starting the experiment in order for authentication to work.
14.6.5 Manifest View
The final default tab shows a manifest detailing the hardware that has been assigned to you. This is the “request” RSpec that is used to define the profile, annotated with details of the hardware that was chosen to instantiate your request. This information is available on the nodes themselves using the geni-get command, enabling you to do rich scripting that is fully aware of both the requested topology and assigned resources.
Most of the information displayed on the CloudLab status page comes directly from this manifest; it is parsed and laid out in-browser.
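For example, from a shell on any node in the experiment, you can fetch and inspect the manifest yourself (a small sketch; geni-get supports several query keywords, of which manifest is the one used here):
geni-get manifest > /tmp/manifest.xml    # fetch the annotated manifest for this experiment
grep -c '<node ' /tmp/manifest.xml       # for instance, count the node elements in the topology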
14.6.6 Actions
In both the topology and list views, you have access to several actions that you may take on individual nodes. In the topology view, click on the node to access this menu; in the list view, it is accessed through the icon in the “Actions” column. Available actions include rebooting (power cycling) a node, and re-loading it with a fresh copy of its disk image (destroying all data on the node). While nodes are in the process of rebooting or re-imaging, they will turn yellow in the topology view. When they have completed, they will become green again. The shell action is described in more detail below and will be used as the main method for issuing commands on the experiment nodes throughout this tutorial.
14.7 Brief Introduction to Chef
While the startup scripts are running, let’s take a quick look at the architecture of a typical Chef installation. The diagram provided at the official Overview of Chef page demonstrates the relationships between individual Chef components. While there are a number of optional, advanced components that we do not use in this tutorial, it is worth noting the following:
Chef clients can run on a variety of resources: virtual and physical servers, cloud instances, storage and networking devices, containers, etc. All clients connect and listen to the central, infrastructure-wide Chef server.
The Chef server has a web interface. The server manages cookbooks, which are fundamental units of configuration management that define configurations and contain everything that is required to instantiate those configurations. Additionally, the server manages run lists, which can be viewed as mappings between collections of cookbooks and the clients on which they need to execute.
A workstation represents a machine from which a Chef administrator connects to and controls the server. It typically has a copy of chef-repo, the repository which contains cookbooks and other code artifacts. The administrator modifies cookbooks locally and submits them to the server to make them available to all clients.
Recipes, shown next to cookbooks on the workstation, are the "building blocks" from which cookbooks are assembled. Recipes can be viewed as individual scripts that accomplish specific fine-grained tasks: create a user, install a package, clone a repository, configure a network interface, and so on. Cookbooks, which include one or more recipes, are responsible for "bigger" tasks: install and configure a database, install and run a web server, and so on.
Another type of Chef code artifact (not shown in the diagram) is the role. A role can include one or more cookbooks and specific recipes (each cookbook and recipe can be used in several roles if necessary). Roles typically define complete node configurations; for instance, in a simple case, one role corresponds to a master node on a cluster with a set of specific services, while another role defines how the rest of the cluster nodes, so-called “worker” nodes, should be configured. After being created on the workstation, these roles are submitted to the server by the administrator and then assigned to specific nodes via the run lists.
In the experiment we have launched, head runs all three: the server, the workstation, and the client (there is no conflict between these components), while node-0 runs only the client. If this profile is instantiated with an arbitrary number N of clients, there will be N “client-only” nodes named node-0, node-1, ..., node-(N-1).
Another profile parameter lets you choose the repository from which chef-repo is cloned; by default, it points to emulab/chef-repo. The startup scripts in this profile also obtain and install copies of public cookbooks hosted at the repository called Supermarket. Specifically, the exercises in this tutorial rely on the nfs and apache2 cookbooks, both of which come from Supermarket.
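For reference, the Supermarket installation that the startup scripts perform amounts to roughly the following knife commands (a sketch only; the profile runs these for you automatically, so you do not need to repeat them):
knife cookbook site install nfs        # fetch the community nfs cookbook from Supermarket
knife cookbook site install apache2    # fetch the community apache2 cookbook
knife cookbook list                    # confirm that the cookbooks are now available on the server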
14.8 Logging in to the Chef Web Console
As soon as the startup scripts complete, you will likely receive an email confirming that Chef is installed and operational.
Dear User, |
|
Chef server and workstation should now be |
installed on head.chefdemo.utahstud.emulab.net. |
|
To explore the web management console, copy |
this hostname and paste it into your browser. |
Installation log can be found at /var/log/init-chef.log |
on the server node. |
|
To authenticate, use the unique credentials |
saved in /root/.chefauth on the server node. |
|
Below are several Chef commands which detail the launched experiment: |
|
# chef -v |
Chef Development Kit Version: 0.7.0 |
chef-client version: 12.4.1 |
berks version: 3.2.4 |
kitchen version: 1.4.2 |
|
# knife cookbook list |
apache2 3.1.0 |
apt 2.9.2 |
emulab-apachebench 1.0.0 |
emulab-nfs 0.1.0 |
... |
nfs 2.2.6 |
|
# knife node list |
head |
node-0 |
|
# knife role list |
apache2 |
apachebench |
... |
nfs |
|
# knife status -r |
1 minute ago, head, [], ubuntu 14.04. |
0 minutes ago, node-0, [], ubuntu 14.04. |
|
Happy automation with Chef! |
In some cases, you will not be able to see this email, for example, if your email provider delays or filters automated messages.
When you receive this email or see in the topology viewer that the startup scripts completed, you can proceed to the following step.
14.8.1 Web-based Shell
CloudLab provides a browser-based shell for logging into your nodes, which is accessed through the action menu described above. While this shell is functional, it is most suited to light, quick tasks; if you are going to do serious work on your nodes, we recommend using a standard terminal and ssh program.
This shell can be used even if you did not establish an ssh keypair with your account.
Three things of note:
Your browser may require you to click in the shell window before it gets focus.
Depending on your operating system and browser, cutting and pasting into the window may not work. If keyboard-based pasting does not work, try right-clicking to paste.
The recommended browsers are Chrome and Firefox. The web-based shell does not work reliably in Safari, and other browsers are not guaranteed to work smoothly.
Create a shell tab for head by choosing “Shell” in the Actions menu for head in either the topology or the list view. The new tab will be labeled “head”. It will allow typing shell commands and executing them on the node. All shell commands used throughout this tutorial should be issued using this shell tab.
While you are here, switch to the root user by typing:
sudo su - |
Your prompt, the string which is printed on every line before the cursor, should change to root@head. All commands in the rest of the tutorial should be executed under root.
14.8.2 Chef Web Console
Type the following to print Chef credentials:
cat /root/.chefauth |
You should see the unique login and password that have been generated by the startup scripts specifically for your instance of Chef:
Expand the “Profile Instructions” panel and click on the “Chef web console” link.
Warning: When your browser first points to this link, you will see a warning about using a self-signed SSL certificate. Using self-signed certificates is not a good option in production scenarios, but it is totally acceptable in this short-term experimentation environment. We will ignore this warning and proceed to using the Chef console. The sequence of steps for ignoring it depends on your browser. In Chrome, you will see a message saying "Your connection is not private". Click “Advanced” at the bottom of the page and choose “Proceed to <hostname> (unsafe)”. In Firefox, the process is slightly different: click “Advanced”, choose “Add Exception”, and click “Confirm Security Exception”.
When the login page is finally displayed, use the credentials you have printed in the shell:
If you see a console like the one above, with both head and node-0 listed on the “Nodes” tab, you have a working Chef cluster! You can now proceed to managing your nodes using Chef recipes, cookbooks, and roles.
In the rest of the tutorial, we demonstrate how you can use several pre-defined cookbooks and roles. Development of cookbooks and roles is the subject of other tutorials (many of which can be found online). Our goal in the following sections is to walk you through the process of using existing cookbooks and roles, which is the most common starting point for new Chef users.
14.9 Configuring NFS
Now that you have a working instance of Chef where both head and node-0 can be controlled using Chef, let’s start modifying the software stacks on these nodes by installing NFS and exporting a directory from head to node-0.
- Modify the run list on headClick on the row for head in the node list (so the entire row is highlighted in orange), and then click "Edit" in the Run List panel in the bottom right corner of the page:
- Apply nfs role:In the popup window, find the role called nfs in the Available Roles list on the left, drag and drop it into the "Current Run List" field on the right. When this is done, click "Save Run List".
- Repeat the last two steps for node-0:
At this point, the nfs role is assigned to both nodes, but nothing has executed yet. - Check the status from the shellBefore applying these updates, let’s check the role assignment from the shell by typing:
knife status -r
The run lists for your nodes should be printed inside square brackets:The output of this command also conveniently shows when Chef applied changes to your nodes last (first column) and what operating systems run on the nodes (last column). - Trigger the updatesRun the assigned role on head by typing:
chef-client
As a part of the startup procedure, passwordless ssh connections are enabled between the nodes. Therefore, you can use ssh to run commands remotely, on the nodes other than head, from the same shell. Execute the same command on node-0 by typing:ssh node-0 chef-client
When nodes execute the “chef-client” command, they contact the server and request the cookbooks, recipes, and roles that have been assigned to them in their run lists. The server responds with those artifacts, and the nodes execute them in the specified order. - Verify that NFS is workingAfter updating both nodes, you should have a working NFS configuration in which the /exp-share directory is exported from head and mounted on node-0.
The name of the NFS directory is one of the attributes that is set in the nfs role. To explore other attributes and see how this role is implemented, take a look at the /chef-repo/roles/nfs.rb file on head.
The simplest way to test that this configuration is functioning properly is to create a file on one node and check that this file becomes available on the other node. Follow these commands to test it (the lines that start with the # sign are comments):# List files in the NFS directory /exp-share on head; should be empty
ls /exp-share/
# List the same directory remotely on node-0; also empty
ssh node-0 ls /exp-share/
# Create an empty file in this directory locally
touch /exp-share/NFS-is-configured-by-Chef
# Find the same file on node-0
ssh node-0 ls /exp-share/
If you can see the created file on node-0, your NFS is working as expected. Now you can easily move files between your nodes using the /exp-share directory.
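If you want a lower-level confirmation that the export and the mount are in place, two optional checks are sketched below (exact output depends on the distribution and NFS version):
# On head: list the directories exported by the NFS server
showmount -e localhost
# On node-0: confirm that /exp-share is mounted from head
ssh node-0 "mount | grep exp-share"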
Summary: You have just installed and configured NFS by assigning a Chef role and running it on your nodes. You can create much more complex software environments by repeating these simple steps and installing more software components on your nodes. Additionally, installing NFS on a set of nodes is typically not the subject of a research experiment but rather an infrastructure prerequisite for many distributed systems. Automating installation and configuration procedures with a system like Chef saves a lot of time when you need to periodically recreate such systems or create multiple instances of them.
14.9.1 Exploring The Structure
It is worth looking at how the role you have just used is implemented. Stored at /chef-repo/roles/nfs.rb on head, it should include the following code:
# |
# This role depends on the emulab-nfs cookbook; |
# emulab-nfs depends on the nfs cookbook |
# available at: https://supermarket.chef.io/cookbooks/nfs |
# Make sure it is installed; If it is not, try: knife cookbook site install nfs |
# |
name "nfs" |
description "Role applied to all NFS nodes - server and client" |
override_attributes( |
"nfs" => { |
"server" => "head", |
"dir" => "/exp-share", |
"export" => { |
"network" => "10.0.0.0/8", |
"writeable" => true |
} |
} |
) |
run_list [ "emulab-nfs" ] |
You can see a number of domain-specific Ruby attributes in this file. name and description are self-describing attributes. The override_attributes attribute allows you to control high-level configuration parameters, including (but not limited to) the name of the node running the NFS server, the directory being exported and mounted, the network in which this directory is shared, and the “write” permission in this directory (granted or not). The run_list attribute includes a single cookbook called emulab-nfs.
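You can also inspect the same role as the Chef server sees it, directly from the head shell (a quick optional check; the output formatting depends on your knife version):
knife role show nfs            # human-readable summary of the role
knife role show nfs -F json    # the same data as JSON, handy for comparing against nfs.rb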
You may be wondering how the same cookbook can perform different actions on different nodes. Indeed, the emulab-nfs cookbook has just installed the NFS server on head and the NFS client on node-0. Take a look at the implementation of default.rb, the default recipe in the emulab-nfs cookbook, which gets called when the cookbook is executed. You should find the following code at /chef-repo/cookbooks/emulab-nfs/recipes/default.rb:
# |
# Cookbook Name:: emulab-nfs |
# Recipe:: default |
# |
|
if node["hostname"] == node["nfs"]["server"] |
include_recipe "emulab-nfs::export" |
else |
include_recipe "emulab-nfs::mount" |
end |
The if statement above compares the hostname of the node on which the cookbook is running with one of the attributes specified in the role. Based on this comparison, the recipe takes the appropriate actions by calling the code from two other recipes in this cookbook. Optionally, you can explore /chef-repo/cookbooks/emulab-nfs/recipes/export.rb and /chef-repo/cookbooks/emulab-nfs/recipes/mount.rb to see how these recipes are implemented.
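To see the exact values this comparison uses on your nodes, you can query the server from the head shell (a small sketch; knife node show accepts dotted attribute paths):
knife node show head -a hostname       # the node's hostname attribute
knife node show head -a nfs.server     # the value set by the nfs role ("head")
knife node show node-0 -a hostname     # differs from nfs.server, so node-0 mounts instead of exporting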
Obviously, this is not the only possible structure for this configuration. You could alternatively create two roles, such as nfs-server and nfs-client, that call the corresponding recipes without the need for the described comparison. Since a single model cannot fit all scenarios perfectly, Chef gives the developer enough flexibility to organize the code into structures that match specific environment and application needs.
14.10 Apache Web Server and ApacheBench Benchmarking Tool
In this exercise, we are going to switch gears and experiment with different software: the Apache web server and the ApacheBench (ab) benchmarking tool.
We recommend that you continue using the shell tab for head (again, make sure that your commands are executed as root). Run the commands described below in that shell.
- Add a role to head’s run listIssue the first command shown below (the remaining lines are the expected output):
knife node run_list add head "role[apache2]"
head:
run_list:
role[nfs]
role[apache2]
- Add two roles to node-0’s run listRun the two knife commands listed below (also on head) in order to assign two roles to node-0:
knife node run_list add node-0 "role[apache2]"
run_list:
head:
run_list:
role[nfs]
role[apache2]
knife node run_list add node-0 "role[apachebench]"
run_list:
head:
run_list:
role[nfs]
role[apache2]
role[apachebench]
Notice that we did not exclude the nfs role from the run lists. Configuration updates in Chef are idempotent, which means that an update can be applied to a node multiple times, and each time it will yield an identical configuration (regardless of the node’s previous state). Thus, this time, when you run “chef-client” again, Chef will do the right thing: the state of each individual component will be inspected, and all NFS-related tasks will be silently skipped because NFS is already installed. - Trigger updates on all nodesIn the previous section, we updated one node at a time. Try a “batch” update: run a single command that will trigger configuration procedures on all client nodes:
knife ssh "name:*" chef-client
apt155.apt.emulab.net resolving cookbooks for run list:
["emulab-nfs", "apache2", "apache2::mod_autoindex", "emulab-apachebench"]
apt154.apt.emulab.net resolving cookbooks for run list:
["emulab-nfs", "apache2", "apache2::mod_autoindex"]
apt155.apt.emulab.net Synchronizing Cookbooks:
....
apt155.apt.emulab.net Chef Client finished, 5/123 resources updated in 04 seconds
You should see interleaved output from the “chef-client” command running on both nodes at the same time. The last command uses a node search based on the “name:*” criterion. As a result, every node will execute “chef-client”. Where necessary, you can use more specific search strings, such as “name:head” or “name:node-*”. - Check that the webservers are runningOne of the attributes in the apache2 role configures the web server to run on port 8080. Let’s check that the web servers are running by checking the status of that port:
knife ssh "name:*" "netstat -ntpl | grep 8080"
pc765.emulab.net tcp6 0 0 :::8080 :::* LISTEN 9415/apache2
pc768.emulab.net tcp6 0 0 :::8080 :::* LISTEN 30248/apache2
Note that this command sends an arbitrary command (unrelated to Chef) to a group of nodes. With this functionality, the knife command-line utility can be viewed as an orchestration tool for managing groups of nodes, capable of replacing pdsh, the Parallel Distributed Shell utility often used on computing clusters. Output similar to the above indicates that both apache2 web servers are running. The command that you have just issued uses netstat, a command-line tool that displays network connections, routing tables, interface statistics, etc. Using knife, you have run netstat in combination with a Linux pipe and a grep command to filter the output and display only the information for port 8080. - Check benchmarking results on node-0Many different actions have taken place on node-0, including:
apache2 has been installed and configured
ab, a benchmarking tool from the apache2-utils package, has been installed and executed
benchmarking results have been saved, and plots have been created using the gnuplot utility
the plots have been made available using the installed web server
To see how the last three tasks are performed using Chef, you can take a look at the default.rb recipe inside the emulab-apachebench cookbook. If you are at an in-person tutorial, proceed to the following steps; we will leave the discussion of this specific recipe, and of the abstractions used in Chef recipes in general, to the end of the tutorial.Obtain node-0’s public hostname (refer to the list included in the profile instructions), and construct the following URL:http://<node-0's public hostname>:8080
With the right hostname, copy and paste this URL into your browser to access the web server running on node-0 and serving the benchmarking graphs. You should see a directory listing like this:You can explore the benchmarking graphs by clicking on the listed links. So far, ab has run against the local host (i.e., node-0), so the results may not be very interesting. Do not close the browser window, since one of the following steps will ask you to go back to it to see more results. - Modify a roleJust like the nfs role described in the previous section, apachebench has a number of attributes that define its behavior. We will use the Chef web console to update the value of one of its attributes.In the console, click on the Policy tab at the top of the page, choose “Roles” in the panel on the left, and select “apachebench” in the displayed list of roles. Select the Attributes tab in the middle of the page. At the bottom of the page, you should now see a panel called “Override Attributes”. Click the “Edit” button inside that panel:In the popup window, change the target_host attribute from null to head’s public hostname (refer to the hostnames listed in the profile instructions). Don’t forget to use double quotes around the hostname. Also, let’s modify the list of ports against which we will run the benchmark. Change 80 to 8080, since nothing interesting is running on port 80, while the apache2 server you have just installed is listening on port 8080. Leave port 443 in the list;
this is the port on which the Chef web console is running. Here is an example of the recommended changes: When these changes are complete, click "Save Attributes".Alternative: It turns out that you don’t have to use the Chef web console for modifying this role. Take a look at the two steps described below to see how we can modify this role from the shell or skip to running the benchmark.This is what you need to make the described changes without using the web console:Edit the file /chef-repo/roles/apachebench.rb on head using a text editor of your choice (e.g., vi)
After making the suggested changes, “push” the updated role to the server by typing: knife role from file /chef-repo/roles/apachebench.rb
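Either way, you can confirm that the server has picked up the new attribute values before running the benchmark (an optional quick check):
knife role show apachebench    # the override attributes should now show your target_host and updated ports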
- Run the benchmark against headThe updated version of the role is available on the server now. You can run it on node-0 using the following command:
knife ssh "role:apachebench" chef-client
This command, unlike the knife commands you issued in the previous steps, uses a search based on the assigned roles. In other words, it will find the nodes to which the apachebench role has been assigned by now (in our case, only node-0) and execute “chef-client” on them. - Check benchmarking results againGo back to your browser window and reload the page to see the directory listing with new benchmarking results.The first two (most recent) graphs represent the results of benchmarking the web services running on head, performed from node-0. Among the interesting facts revealed by these graphs, you will see that the response time is much higher on port 443 than on port 8080.
Summary: You have just performed an experiment with apache2 and ab in a very automated manner. The Chef role you have used performed many tasks, including setting up the infrastructure, running a benchmark, saving and processing results, as well as making them easily accessible. The following section will shed some light on how these tasks were accomplished and also how they can be customized.
14.10.1 Understanding the Internals
Below we describe some of the key points about the pieces of infrastructure code you have just used.
The apache2 role is stored at roles/apache2.rb as part of the emulab/chef-repo. Note that the run list specified in this role includes the apache2 cookbook (its default.rb recipe is used) as well as the mod_autoindex recipe from the same cookbook. This cookbook is one of the cookbooks obtained from Chef Supermarket. All administrative work for installing and configuring the web server is performed by the recipes in this cookbook, while the role demonstrates how that cookbook can be “wrapped” into a specification that satisfies specific application needs.
The apachebench role is stored at roles/apachebench.rb. It specifies values for several attributes and adds a custom cookbook, emulab-apachebench, to the run list. We use the “emulab” prefix in the names of the cookbooks developed by the CloudLab staff to emphasize that they are developed for Emulab and derived testbeds such as CloudLab. This also distinguishes them from the Supermarket cookbooks, which are installed in the same directory on head.
Inside the default.rb recipe in emulab-apachebench you can find many keywords, including package, log, file, execute, and template. These are called Chef resources. Resources, which define fine-grained configuration tasks and actions, are available for many common administrative needs. You can refer to the list of supported Chef resources and see examples of how they can be used.
Another item worth mentioning is the “ignore_failure true” attribute used in some of the resources. It allows the recipe to continue execution even when something does not go as expected (a shell command fails, a necessary file does not exist, etc.). Also, when relying on specific files in your recipes, you can augment resources with additional guards such as “only_if{::File.exists?(<file name>)}” and “not_if{::File.exists?(<file name>)}” to make your infrastructure code more reliable and repeatable (this relates to the notion of idempotent code mentioned earlier).
14.11 Final Remarks about Chef on CloudLab
In this tutorial, you had a chance to explore the Chef configuration management system and to use some of its powerful features for configuration management in a multi-node experiment on CloudLab. Even though the instructions walked you through the process of configuring only two nodes, you can take the demonstrated code artifacts, such as roles, cookbooks, recipes, and resources, and apply them to the infrastructure in larger experiments. With knife commands like the ones shown above, you can “orchestrate” diverse and complex applications and environments.
When establishing such orchestration in your experiments, you can take advantage of the relevant pieces of infrastructure code, e.g., those available through Chef Supermarket. In cases where you have to develop your own custom code, you may adopt the structure and the abstractions supported by Chef and aim to develop infrastructure code that is modular, flexible, and easy to use.
The ChefCluster profile is available to all users on CloudLab. If you are interested in learning more about Chef and developing custom infrastructure code for your experiments, this profile spares you the need to set up the necessary components every time and lets you run a fully functional Chef installation tailored to your specific needs.
14.12 Terminating Your Experiment
Resources that you hold in CloudLab are real, physical machines and are therefore limited and in high demand. When you are done, you should release them for use by other experimenters. Do this via the “Terminate” button on the CloudLab experiment status page.
Note: When you terminate an experiment, all data on the nodes is lost, so make sure to copy off any data you may need before terminating.
If you were doing a real experiment, you might need to hold onto the nodes for longer than the default expiration time. You would request more time by using the “Extend” button on the status page. You would need to provide a written justification for holding onto your resources for a longer period of time.
14.13 Future Steps
Now that you’ve got a feel for how Chef can help manage experiment resources, there are several things you might try next:
Learn more about Chef at the Chef official website
Explore existing cookbooks and roles in the emulab/chef-repo repository
Follow the steps in the Guide to Writing Chef Cookbooks blog post by Joshua Timberman, a Chef developer and one of the active members of the Chef community
Configure a LAMP (Linux, Apache, MySql, and PHP) stack using Chef by following the Creating Your First Chef Cookbook tutorial
15 Getting Help
The help forum is the main place to ask questions about CloudLab, its use, its default profiles, etc. Users are strongly encouraged to participate in answering other users’ questions. The help forum is handled through Google Groups, which, by default, sends all messages to the group to your email address. You may change your settings to receive only digests, or to turn email off completely.
The forum is searchable, and we recommend doing a search before asking a question.
The URL for joining or searching the forum is: https://groups.google.com/forum/#!forum/cloudlab-users
If you have any questions that you do not think are suitable for a public forum, you may direct them to support@cloudlab.us