OSG Technology Area Rumblings > File Isolation using bind mounts and chroots

The last post ended with a new technique for process-level isolation that unlocks our ability to safely use anonymous accounts and group accounts.

However, that's not "safe enough" for us: the jobs can still interact with each other via the file system.  This post examines the directories that jobs can write into, and what can be done to remove this access.

On a typical batch system node, a user can write into the following directories:

  • System temporary directories: The Linux Filesystem Hierarchy Standard (FHS) provides at least two sticky, world-writable directories, /tmp and /var/tmp.  These directories are traditionally unmanaged (user processes can write an uncontrolled amount of data here) and a security issue (symlink attacks and information leaks), even when user separation is in place.
  • Job Sandbox: This is a directory created by the batch system as a scratch location for the job.  The contents of the directory will be cleaned out by the batch system after the job ends.  For Condor, any user proxy, executable, or job stage-in files will be copied here prior to the job starting.
  • Shared Filesystems: For a non-grid site, this is typically at least $HOME, and some other site-specific directory.  $HOME is owned by the user running the job.  On the OSG, we also have $OSG_APP for application installation (typically read-only for worker nodes) and, optionally, $OSG_DATA for data staging (writable for worker nodes).  If they exist and are writable, $OSG_APP/DATA are owned by root and marked as sticky.
  • GRAM directories: For non-Condor OSG sites, a few user-writable directories are needed to transfer the executable, proxy, and job stage-in files from the gatekeeper to the worker node.  These default to $HOME, but can be relocated to any shared filesystem directory.  For Condor-based OSG sites, this is a part of the job sandbox.
If user separation is in place and considered sufficient, filesystem isolation is taken care of for shared filesystems, GRAM directories, and the job sandbox.  The systemwide temporary directories can be protected by mixing filesystem namespaces and bind mounts.

A process can be launched in its own filesystem namespace; such a process will have a copy of the system mount table.  Any change made to the process's mount table will not be seen by the outside system, and will be shared with any child processes.

For example, if the user's home directory is not mounted on the host, the batch system could create a process in a new filesystem namespace and mount the home directory in that namespace.  The home directory will be available to the batch job, but to no other process on the system.

When the last process in the filesystem namespace exits, all mounts that are unique to that namespace will be unmounted.  In our example, when the batch job exits, the kernel will unmount the home directory.

A bind mount makes a file or directory visible at another place in the filesystem - I think of it as mirroring the directory elsewhere.  We can take the job sandbox directory, create a sub-directory, and bind-mount the sub-directory over /tmp.  The process is mostly equivalent to the following shell commands (where $_CONDOR_SCRATCH_DIR is the location of the Condor job sandbox) in a filesystem namespace:

mkdir $_CONDOR_SCRATCH_DIR/tmp
mount --bind $_CONDOR_SCRATCH_DIR/tmp /tmp

Afterward, any files a process creates in /tmp will actually be stored in $_CONDOR_SCRATCH_DIR/tmp - and cleaned up accordingly by Condor on job exit.  Any system process not in the job will not be able to see or otherwise interfere with the contents of the job's /tmp unless it can write into $_CONDOR_SCRATCH_DIR.
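
The two commands above only stay private to the job if they run inside their own filesystem namespace. As a rough sketch (this is not Condor's implementation), the util-linux unshare tool can do both steps from a shell. The function name run_with_private_tmp and the DRY_RUN switch are made up for illustration. A real run needs root, and on older util-linux versions the bind mount may propagate back to the host unless the mount tree is marked private first.

```shell
#!/bin/sh
# Hypothetical helper: run a command with a private /tmp backed by a scratch directory.
# Usage: run_with_private_tmp SCRATCH_DIR COMMAND [ARGS...]
# Set DRY_RUN=1 to print the unshare command instead of running it.
run_with_private_tmp() {
    scratch=$1
    shift
    mkdir -p "$scratch/tmp" || return 1
    # unshare --mount starts sh in a new mount (filesystem) namespace;
    # the bind mount made inside it is not visible to the rest of the system.
    ${DRY_RUN:+echo} unshare --mount sh -c \
        'mount --bind "$1" /tmp && shift && exec "$@"' sh "$scratch/tmp" "$@"
}
```

For example, run_with_private_tmp "$_CONDOR_SCRATCH_DIR" touch /tmp/hello (as root) should leave the file in $_CONDOR_SCRATCH_DIR/tmp rather than in the host's /tmp.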

Condor refers to this feature as MOUNT_UNDER_SCRATCH, and it will be a part of the 7.7.5 release.  It takes an admin-specified list of directories on the worker node.  With it, the job will have a private copy of these directories, which will be backed by $_CONDOR_SCRATCH_DIR.  The contents - and size - of these will be managed by Condor, just like anything else in the scratch directory.

If user separation is unavailable or not considered sufficient (if there are, for example, group accounts), an additional layer of isolation is needed to protect the job sandbox.  A topic for a future day!

Derek's Blog > Ceph on Fedora 16

I've written before about how to run ceph on Fedora 15, but now I'm working on Fedora 16.

Last time I complained about how much ceph tries to do for you.  For better or worse, now it attempts to do more for you!

For my setup, I had 3 nodes in the HCC private cloud.  First, we need to install ceph.
$ yum install ceph

Then, create a configuration file for ceph.  The RPM comes with a good example that my configuration is based on.  The example file is at /usr/share/doc/ceph/sample.ceph.conf


My configuration: Derek's Configuration

The configuration has authentication turned off.  I found this useful because ceph-authtool (yes, they renamed it since Fedora 15) is difficult to use.  And because all of the nodes are on a private vlan only reachable by my openvpn key :)

Then, you need to create and distribute ssh keys to all of your nodes so that mkcephfs can ssh to them and configure them.
$ ssh-keygen 

Then copy them to the nodes:
$ ssh-copy-id i-000000c2
$ ssh-copy-id i-000000c3

Be sure to make the data directories on all the nodes.  In this case:
$ mkdir -p /data/osd.0
$ ssh i-000000c2 'mkdir -p /data/osd.1'
$ ssh i-000000c3 'mkdir -p /data/osd.2'

Then run the mkcephfs command:
$ mkcephfs -a -c /etc/ceph/ceph.conf

And start up the daemons:
$ service ceph start

You should have the daemons running then.  If they fail for some reason, they tend to output what the problem was.  Also, the logs for the services are in /var/log/ceph

To mount the filesystem, find the IP address of one of the monitors.  In my case, I had a monitor on IP address 10.148.2.147.  The command to mount is:
$ mkdir -p /mnt/ceph
$ mount -t ceph 10.148.2.147:/ /mnt/ceph

Since you don't have any authentication, it should work without problems.

I've had some problems with the different mds, and even had an OSD die on me.  It resolved itself, and I even added another OSD to take its place, recreating the CRUSH table.  Since creating this, I have even worked with the graphical interface:

And here's a presentation I did about the CEPH Paper.  Note, I may not be entirely accurate in the presentation, so do be kind.




An Open Science Grid Work Log > SAGA Hadoop

Introduction

Hadoop has become a fixture where big data is concerned, but it has been difficult to use in HPC and HTC cluster environments. This is becoming unfortunate as an increasing number of new algorithms assume Hadoop’s an option. I tried SAGA-Hadoop first, but it should be noted that myHadoop from U. of Indiana sought to remedy this a few years ago. If someone has experience with both or would like to comment on differences, that would be appreciated.

SAGA Hadoop

SAGA-Hadoop installs, configures and executes Hadoop on clusters running batch schedulers for which SAGA has adapters. At least it’s headed in that direction.

SAGA is the Simple API for Grid Applications and an Open Grid Forum standard (GFD.90) for interfacing with diverse cluster batch scheduling systems. It is a large and complex standard so we’ll leave it at that for now. For our purposes, suffice to say that it works with PBS.

The Bliss project provides Python bindings for SAGA. Like many Python APIs, it takes a minimalist approach, not covering the entire standard and demonstrating a strong preference for simplicity and brevity.

As the helpful introductory blog post above (SAGA-Hadoop) describes, it runs Hadoop.

But Can this be Interesting on the OSG?

To make this plausible for the OSG:

      • We have to be able to automate the process entirely
      • It would be really good if there were a practical way to use command line tools as the map and reduce steps in a MapReduce computation.

So this is the story of finding those things out.

Automating Installation

The RENCI Blueridge cluster runs the PBS job manager. I scripted the installation; nothing is pretty about it yet, except that it demonstrates automating Hadoop setup on an HPC cluster.

There’s Python involved so my first step was to set up a virtualenv to manage our project’s dependencies:

wget --timestamping \
   https://raw.github.com/pypa/virtualenv/master/virtualenv.py
python virtualenv.py venv
source venv/bin/activate
pip install bliss uuid

uuid is a dependency that, for whatever reason, needed to be explicitly named to pip.

The uuid module uses ifconfig. Putting ifconfig in the path changed nothing, so we force the issue by editing the module file. Again, it’s in our virtualenv so the copy’s entirely ours:

sed --in-place=.orig                       \
     s,\'ifconfig\',\'/sbin/ifconfig\',     \
     venv/lib/python2.4/site-packages/uuid.py

Make sure JAVA_HOME is set.

Then, after unzipping Hadoop, we

      • Get the source code for SAGA-Hadoop
      • Edit it to specify the login node for our cluster
      • Edit the bootstrap script to alter the Hadoop configs after installation. It
        • Deletes an unnecessary os.makedirs() call
        • Corrects the makedirs for log_dir to be for self.job_log_dir
        • Uncomments the configuration of the Hadoop data node
        • Changes the network interface to the locally correct eth0
        • Sets JAVA_HOME
        • Sets HADOOP_HEAP_SIZE
svn co https://svn.cct.lsu.edu/repos/saga-projects/applications/SAGAHadoop/saga-hadoop
sed --in-place=.orig \
    -e "s,india,br0," saga-hadoop/launcher.py
HADOOP_ENV=hadoop-1.0.0/conf/hadoop-env.sh
sed --in-place=.orig \
    -e "s,os.makedirs(job_dir),," \
    -e "s,os.makedirs(log_dir),os.makedirs(self.job_log_dir)," \
    -e "s,\<!--,," \
    -e "s,\-\-\>,," \
    -e "s,eth1,eth0," \
    -e "s,tar \-xzf hadoop.tar.gz,tar -xzf hadoop.tar.gz; echo export JAVA_HOME=$JAVA_HOME >> $HADOOP_ENV; echo export HADOOP_HEAP_SIZE=2000 >> $HADOOP_ENV; cat $HADOOP_ENV," \
    saga-hadoop/bootstrap_hadoop.py

This is all saved as a script called build.

Running it is relatively uninteresting. It installs the virtualenv, Python dependencies and edits the config files as you’d expect.

Running Hadoop

We save these commands to a script called start:

source venv/bin/activate
./saga-hadoop/bootstrap_hadoop.py

Running it logs plenty of Hadoop information to the console and starts the server.

Command Line MapReduce with Hadoop Streaming

The command line is the lingua franca of the OSG. There’s nothing finer for figuring out a problem quickly than the command line. So how are we going to run MapReduce jobs from the command line – especially in this dynamically created environment?

Hadoop Streaming lets us run programs of our choice as the map and reduce steps. But first we need an additional Java archive to make it work. The following commands go in a file called stream. This first batch gets the JAR file and copies it to the right location:

jar_url=http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-streaming/1.0.0/hadoop-streaming-1.0.0.jar
lib=work/hadoop-1.0.0/share/hadoop/contrib/streaming
mkdir -p "$lib"
curl "$jar_url" > "$lib/hadoop-streaming-1.0.0.jar"

Once that’s in place, we get rid of the old output directory (not likely to be relevant for an OSG job), then run the streaming job:

inputs=$1
output=$2
mapper=$3
reducer=$4

rm -rf "$output"

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/share/hadoop/contrib/streaming/hadoop-streaming-1.0.0.jar \
     -input $inputs \
     -output $output \
     -mapper $mapper \
     -reducer $reducer \
     -jobconf mapred.reduce.tasks=2

This executes the MapReduce job with whatever’s passed in. Then, we create a trivial program called script to be our map operator:

for x in $(seq 0 100); do
    echo pattern $x
done

Next, we run stream like this:

./stream in out script wc

This produces the wc counts (lines, words, bytes) for our 101 output lines times three:

 [scox@br0:~/dev/saga-hadoop]$ cat out/part-00000
     303     606    3609

because the stream is executed once per file in our input directory (in):

 in
 |-- a
 |-- b
 `-- c
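
To sanity-check a mapper and reducer before involving Hadoop at all, the same data flow can be imitated locally. This is only a sketch of the idea, not of what Hadoop Streaming does internally: the function name simulate_stream is made up, each input file is treated as one map task, and a plain sort stands in for the shuffle between map and reduce.

```shell
#!/bin/sh
# Hypothetical local stand-in for the streaming job: no Hadoop involved.
# Usage: simulate_stream INPUT_DIR MAPPER_CMD REDUCER_CMD
simulate_stream() {
    in_dir=$1
    mapper=$2
    reducer=$3
    for f in "$in_dir"/*; do
        # one mapper run per input file, with the file on stdin
        sh -c "$mapper" < "$f"
    done | sort | sh -c "$reducer"   # sort stands in for the shuffle phase
}
```

With the three-file input directory above, simulate_stream in 'sh script' 'wc -l' gives a line count of 3 × 101 = 303, the same line count Hadoop reported. Other details of the output, such as byte counts, need not match a real streaming run.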

Summary

We’ve just done a completely automated install, configuration and execution of a small Hadoop cluster overlay on top of a PBS HPC cluster. We also saw automation for installing Hadoop streaming and running a test MapReduce job using command line programs as map and reduce operators.

The overall stack looks like this:

It is probably clear that much more would be necessary to properly configure and deploy a useful cluster, particularly interfacing it properly to a PBS job context. That part will have to wait for another day. And again, if anyone reading this can comment on myHadoop, I’d appreciate the insight.




Derek's Blog > CEPH on Fedora 15

Yesterday, I read a blog post about using CEPH as a backend store for virtual machine images.  I've heard a lot about ceph in the last year, especially after it was integrated into the mainline kernel in 2.6.34.  So I thought I'd give it a try.

Before I get into the install, I want to summarize my thoughts on Ceph.  I think it has a lot of potential, but parts of it are trying too hard to do everything for you.  I always think there is a careful balance between a program doing too much for you, and making you do too much.  For example, the mkcephfs script that creates a ceph filesystem will ssh to all the worker nodes (defined in ceph.conf) and configure the filesystem.  If I was in operations, this would scare me.

Also, the keychain configuration is overly complicated.  I think Ceph is designed to be secure over the WAN (secure, not encrypted), so maybe it's needed.  But it seems overly complicated when you compare it to other distributed file systems (Hadoop, Lustre).

On the other hand, I really like the fully POSIX-compliant client, especially since it's in the mainline kernel.  It is too bad that it was added in 2.6.34 rather than 2.6.32 (RHEL 6 kernel).  I guess we'll have to wait 2 years for RHEL 7 to have it in something we can use in production.

Also, the distributed metadata and multiple metadata servers are interesting aspects to the system.  Though, in the version I tested, the MDS crashed a few times (the system picked it up and compensated).

On Fedora 15, ceph packages are in the repos.
yum install ceph

The configuration I settled on was:
[global]
auth supported = cephx
keyring = /etc/ceph/keyring.admin

[mds]
keyring = /etc/ceph/keyring.$name
[mds.i-00000072]
host = i-00000072
[mds.i-00000073]
host = i-00000073
[mds.i-00000074]
host = i-00000074

[osd]
osd data = /srv/ceph/osd$id
osd journal = /srv/ceph/osd$id/journal
osd journal size = 512
osd class dir = /usr/lib64/rados-classes
keyring = /etc/ceph/keyring.$name
[osd0]
host = i-00000072
[osd1]
host = i-00000073
[osd2]
host = i-00000074

[mon]
mon data = /srv/ceph/mon$id
[mon0]
host = i-00000072
mon addr = 10.148.2.147:6789
[mon1]
host = i-00000073
mon addr = 10.148.2.148:6789
[mon2]
host = i-00000074
mon addr = 10.148.2.149:6789

As you can read from the configuration file, all files are stored in /srv/ceph/...  You will need to make this directory on all your worker nodes.

Next I needed to create a keyring for authentication with the client/admin/dataservers.  The keyring tool is distributed with Ceph, and is called cauthtool.  Even now, it's not clear to me how to use this tool, or how Ceph uses the keyring.  First you need to make a caps (capabilities?) file:

osd = "allow *"
mds = "allow *"
mon = "allow *"

Here are the cauthtool commands to get it to work.

cauthtool --create-keyring /etc/ceph/keyring.bin
cauthtool -c -n i-00000072 --gen-key /etc/ceph/keyring.bin
cauthtool -n i-00000072 --caps caps /etc/ceph/keyring.bin
cauthtool -c -n i-00000073 --gen-key /etc/ceph/keyring.bin
cauthtool -n i-00000073 --caps caps /etc/ceph/keyring.bin
cauthtool -c -n i-00000074 --gen-key /etc/ceph/keyring.bin
cauthtool -n i-00000074 --caps caps /etc/ceph/keyring.bin
cauthtool --gen-key --name=admin /etc/ceph/keyring.admin


From the blog post linked above, I used their script to create the directories and copy the ceph.conf to the other hosts.

n=0
for host in i-00000072 i-00000073 i-00000074 ; \
do \
ssh root@$host mkdir -p /etc/ceph /srv/ceph/mon$n; \
n=$(expr $n + 1); \
scp /etc/ceph/ceph.conf root@$host:/etc/ceph/ceph.conf
done
mkcephfs -a -c /etc/ceph/ceph.conf -k /etc/ceph/keyring.bin


Then copy the keyrings
for host in i-00000072 i-00000073 i-00000074 ; \
do \
scp /etc/ceph/keyring.admin root@$host:/etc/ceph/keyring.admin; \
done


Then start up the daemons on all the nodes:

service ceph start

And to mount the system:
mount -t ceph 10.148.2.147:/ /mnt/ceph -o name=admin,secret=AQBlV5dO2TICABAA0/FP7m+ru6TJLZaPxFuQyg==

Where the secret is the output from the command:
 cauthtool --print-key /etc/ceph/keyring.bin 


Inside OSG Ops. > New Home for OSG Web Site

On February 28th the OSG web pages located at www.opensciencegrid.org will move to the OSG Twiki. This move corresponds with the upcoming conclusion of a contract with the Chicago-based web hosting service Tilted Planet.

During the scheduled production service update on the 28th, browsers will be redirected from the current location to the new location on the main OSG Twiki page. A mockup of the new page can be seen at twiki-itb.grid.iu.edu.

This is the first step, and likely an interim web page home, in a project that will affect the OSG Public Web Pages, the OSG Twiki, the DocDB, and possibly other OSG services with web UIs. Evaluation of content management systems, wikis, and documentation file database solutions has already begun and will continue over the next several months. Please contact the GOC (goc@opensciencegrid.org) if you have a suggestion for packages you think should be evaluated.


An Open Science Grid Work Log > NAMD with PBS and Infiniband on NERSC Dirac

Overview

NAMD simulates molecular motion, especially of large molecules, so it’s often used to simulate molecular docking problems. One particularly interesting class of docking problem is the interaction of protein molecules with other molecules such as the cell membrane. The enormous number of atoms involved in these simulations limits what we’re able to learn about how proteins interact with and shape their environments, because more atoms require more computing power. So we’re investigating using GPU accelerated nodes in a shared memory cluster to speed up simulation time.

This describes running NAMD in a multi-node configuration at NERSC Dirac to determine if we want to build out a Pegasus workflow executing in this mode through the OSG compute element. The process is, as usual with MPI codes using cluster interconnects, highly cluster specific. The next step is to determine if it’s worth it and what our alternatives are.

Approach

If you’re having a hard time running NAMD in a PBS environment over an Infiniband interconnect, you are not alone. The NAMD release notes come right to the point:

“Writing batch job scripts to run charmrun in a queueing system can be challenging.”

These links, in addition to the release notes cited above provide useful insights:

And without further delay, here’s the approach that worked on Dirac. Mileage on your cluster may vary.
#!/bin/bash

set -x
set -e

# build a node list file based on the PBS
# environment in a form suitable for NAMD/charmrun

nodefile=$TMPDIR/$PBS_JOBID.nodelist
echo group main > $nodefile
nodes=$( cat $PBS_NODEFILE )
for node in $nodes; do
   echo host $node >> $nodefile
done

# find the cluster's mpiexec
MPIEXEC=$(which mpiexec)

# Tell charmrun to use all the available nodes, the nodelist built
# above and the cluster's MPI.
CHARMARGS="+p32 ++nodelist $nodefile"
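
The +p32 above is hard-coded to match a nodes=4:ppn=8 request. A possible variation, sketched here under the assumption that the PBS node file lists each host once per allocated processor slot, is to derive the process count from the node file itself. The function name charm_args is hypothetical; only the nodelist format and the +p / ++nodelist arguments come from the script above.

```shell
#!/bin/sh
# Hypothetical helper: build charmrun arguments from a PBS-style node file.
# Usage: charm_args PBS_NODEFILE NODELIST_FILE
charm_args() {
    pbs_nodefile=$1
    nodelist=$2
    # Same nodelist format as the script above: a group line, then one host line per entry.
    {
        echo "group main"
        while read -r node; do
            if [ -n "$node" ]; then
                echo "host $node"
            fi
        done < "$pbs_nodefile"
    } > "$nodelist"
    # Assumption: one line in the PBS node file per allocated slot, so the
    # number of non-empty lines is the number of processes to start.
    nprocs=$(grep -c . "$pbs_nodefile")
    echo "+p${nprocs} ++nodelist ${nodelist}"
}
```

In the job script this could replace the two hand-written steps with CHARMARGS=$(charm_args "$PBS_NODEFILE" "$TMPDIR/$PBS_JOBID.nodelist").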

As an additional wrinkle, we want to run the GPU accelerated version. That’s why we use the +idlepoll argument to NAMD.

After setting NAMD_HOME, the command to execute NAMD is:

${NAMD_HOME}/charmrun \
    ${CHARMARGS} ++mpiexec ++remote-shell \
    ${MPIEXEC} ${NAMD_HOME}/namd2 +idlepoll <input_file>

The beginning of NAMD’s output looks like this:

Info: 1 NAMD 2.8 Linux-x86_64-ibverbs-CUDA 16 dirac48 stevecox
Info: Running on 16 processors, 16 nodes, 2 physical nodes.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 0.025701 s
Pe 5 sharing CUDA device 0 first 0 next 6
Pe 5 physical rank 5 binding to CUDA device 0 on dirac48: 'Tesla C1060' Mem: 4095MB Rev: 1.3
Pe 10 sharing CUDA device 0 first 8 next 11
Pe 10 physical rank 2 binding to CUDA device 0 on dirac47: 'Tesla C1060' Mem: 4095MB Rev: 1.3
Pe 8 sharing CUDA device 0 first 8 next 9
Pe 8 physical rank 0 binding to CUDA device 0 on dirac47: 'Tesla C1060' Mem: 4095MB Rev: 1.3
Pe 2 sharing CUDA device 0 first 0 next 3
Did not find +devices i,j,k,... argument, using all
Pe 2 physical rank 2 binding to CUDA device 0 on dirac48: 'Tesla C1060' Mem: 4095MB Rev: 1.3

Of particular importance, note that there is a pre-built executable specific to ibverbs-CUDA – that is, it works with infiniband connected clusters with CUDA accelerated nodes.

These are the parameters of the dirac_reg queue:

[stevecox@cvrsvc01 namd]$ qstat -Qf dirac_reg
Queue: dirac_reg
 queue_type = Execution
 Priority = 10
 max_user_queuable = 500
 total_jobs = 39
 state_count = Transit:0 Queued:4 Held:27 Waiting:0 Running:8 Exiting:0
 acl_user_enable = False
 resources_max.nodect = 12
 resources_max.walltime = 06:00:00
 resources_min.nodect = 1
 resources_default.walltime = 00:05:00
 mtime = 1323823829
 resources_assigned.nodect = 34
 max_user_run = 2
 enabled = True
 started = True

So to test jobs, I ran qsub like this:

qsub -I -q dirac_reg -l walltime=06:00:00 -l nodes=4:ppn=8

The -I parameter tells qsub to start an interactive job. The walltime parameter overrides the very low default walltime. Finally, nodes tells PBS how many cluster nodes to use and ppn specifies the processes per node to start.

After debugging, I ran the script like this:

qsub -q dirac_reg -l walltime=06:00:00 -l nodes=4:ppn=8 ./callnamd

Results

I did three runs with 2, 4, and 8 nodes. The interesting performance number for a NAMD run is days/ns or days of computation time required per nanosecond of simulation.

[stevecox@cvrsvc01 ~]$ grep -i days dev/dukechem/osg/namd/run.* | sed -e "s,.txt,," -e "s,.*run.,,"
2way:Info: Initial time: 16 CPUs 0.0617085 s/step 0.357109 days/ns 91.0601 MB memory
2way:Info: Initial time: 16 CPUs 0.0613538 s/step 0.355057 days/ns 94.0546 MB memory
2way:Info: Initial time: 16 CPUs 0.0619225 s/step 0.358348 days/ns 94.7324 MB memory
2way:Info: Benchmark time: 16 CPUs 0.0620334 s/step 0.35899 days/ns 94.8284 MB memory
2way:Info: Benchmark time: 16 CPUs 0.0621472 s/step 0.359648 days/ns 95.09 MB memory
2way:Info: Benchmark time: 16 CPUs 0.0620733 s/step 0.359221 days/ns 95.162 MB memory
4way:Info: Initial time: 32 CPUs 0.0472537 s/step 0.273459 days/ns 83.8981 MB memory
4way:Info: Initial time: 32 CPUs 0.0470766 s/step 0.272434 days/ns 84.8605 MB memory
8way:Info: Initial time: 64 CPUs 0.0406125 s/step 0.235026 days/ns 81.0847 MB memory
8way:Info: Initial time: 64 CPUs 0.0406405 s/step 0.235188 days/ns 82.1035 MB memory
8way:Info: Initial time: 64 CPUs 0.0407004 s/step 0.235534 days/ns 82.2474 MB memory
8way:Info: Benchmark time: 64 CPUs 0.0407453 s/step 0.235794 days/ns 82.3482 MB memory
8way:Info: Benchmark time: 64 CPUs 0.040858 s/step 0.236447 days/ns 82.3975 MB memory
8way:Info: Benchmark time: 64 CPUs 0.0406536 s/step 0.235264 days/ns 82.4038 MB memory
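
The days/ns column follows from the s/step column by simple arithmetic: seconds per step, times steps per nanosecond, divided by 86400 seconds per day. The listed figures are consistent with a 2 fs timestep (500,000 steps per ns); that timestep is inferred from the numbers, not taken from the run configuration. A small sketch, with the helper names days_per_ns and speedup made up for illustration:

```shell
#!/bin/sh
# days_per_ns SECONDS_PER_STEP TIMESTEP_FS
# 1 ns = 1,000,000 fs, so steps per ns = 1e6 / timestep; a day has 86400 seconds.
days_per_ns() {
    awk -v s="$1" -v ts="$2" 'BEGIN { printf "%.6f\n", s * (1e6 / ts) / 86400 }'
}

# speedup BASELINE_DAYS_PER_NS NEW_DAYS_PER_NS
speedup() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}
```

days_per_ns 0.0617085 2 prints 0.357109, matching the first 2way line. Comparing the first benchmark lines, speedup 0.35899 0.235794 prints 1.52, so going from 2 to 8 nodes bought roughly a 1.5x improvement in these runs.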

Here are some details of NERSC Dirac’s configuration:

Dirac is a 50 GPU node cluster connected with QDR IB.  Each GPU node also contains 2 Intel 5530 2.4 GHz, 8MB cache, 5.86GT/sec QPI Quad core Nehalem processors (8 cores per node) and 24GB DDR3-1066 Reg ECC memory.

  • 44 nodes:  1 NVIDIA Tesla C2050 (code named Fermi) GPU with 3GB of memory and 448 parallel CUDA processor cores.
  • 4 nodes:  1 C1060 NVIDIA Tesla GPU with 4GB of memory and 240 parallel CUDA processor cores.
  • 1 node:  4 NVIDIA Tesla C2050 (Fermi) GPUs, each with 3GB of memory and 448 parallel CUDA processor cores.
  • 1 node:  4 C1060 Nvidia Tesla GPUs, each with 4GB of memory and 240 parallel CUDA processor cores.

Here are results from earlier runs on a cluster with far fewer GPUs but a configuration in which accelerated nodes contain four Nvidia Teslas (like one of the Dirac nodes):

  • 4CPU: 0.998798 days/ns
  • 8CPU: 0.565848 days/ns
  • And with the production sample at 8CPU:  0.288802

Conclusions

While these findings are preliminary, indications are that having four GPUs on a single node makes a substantial performance difference.


OSG Technology Area Rumblings > Job Isolation in Condor

I'd like to share a few exciting new features under construction for Condor 7.7.6 (or 7.9.0, as it may be).

I've been working hard to improve the job isolation techniques available in Condor.  My dictionary defines the verb "to isolate" as "to be or remain alone or apart from others"; when applied to the Condor context, we'd like to isolate each job from the others.  We'll define process isolation as the inability of a process running in a batch job to interfere with a process not a part of the job.  Interfering with processes on Linux, loosely defined, means the sending of POSIX signals, taking control via the ptrace mechanism, or writing into the other process's memory.

Process isolation is only one aspect of job isolation.  Job isolation also includes the inability to interfere with other jobs' files (file isolation) and not being able to consume others' system resources such as CPU, memory, or disk (resource isolation).

In Condor, process isolation has historically been accomplished via one of two mechanisms:

  • Submitting user.  Jobs from Alice and Bob will be submitted as the unix users alice and bob, respectively.  In this model, the jobs running on the worker node will be run as users alice and bob, respectively.  The processes in the job running under user bob are protected from the processes in the job running as user alice via traditional POSIX security mechanisms.
    • This model makes the assumption that jobs submitted by the same user do not need isolation from each other.  In other words, there shouldn't be any shared user accounts!
    • This model also assumes the submit host and the worker node share a common user namespace.  This can be more difficult to accomplish than it sounds: if the submit host has thousands of unique users, we must make sure each functions on the worker node.  If the submit host is on a remote site with a different user namespace from the worker node, this may not be easily achievable!
  • Per-slot users.  Each "slot" (roughly corresponding to a CPU) in condor is assigned a unique unix user.  The job currently running in that slot is run under the associated username.
    • This solves the "gotchas" noted above with the submitting user isolation model.
    • This is difficult to accomplish in practice if the job wants to utilize a filesystem shared between the submit and worker nodes.  The filesystem security is based on two users having distinct Unix user names; in this model, there's no way to mark your files as only readable by your own jobs.
Notice both techniques rely on user isolation to accomplish process isolation.  Condor has an oft-overlooked third mode:

  • Mapping remote users to nobody.  In this mode, local users (where the site admin can define the meaning of "local") get mapped to the submit host usernames, but non-local users all get mapped to user nobody - the traditional unprivileged user on Linux.
    • Local users can access all their files, but remote users only get access to the batch resources - no shared file systems.
Unfortunately, this is not a very secure mode as, according to the manual, the nobody account "... may also be used by other Condor jobs running on the same machine, if it is a multi-processor machine"; not very handy advice in an age where your cell phone likely is a multi-processor machine!

This third mode is particularly attractive to us - we can avoid filesystem issues for our local users, but no longer have to create the thousands of accounts in our LDAP database for remote users.  However, since jobs from remote users run under the same unix user account, the traditional security mechanism of user separation does not apply - we need a new technique!

Enter PID namespaces, a new separation technique introduced in kernel 2.6.24.  By passing an additional flag when creating a new process, the kernel will assign an additional process ID (PID) to the child process.  The child will believe itself to be PID 1 (that is, when the child calls getpid(), it returns 1), while the processes in the parent's namespace will see a different PID.  The child will be able to spawn additional processes - all will be stuck in the same inner namespace - that similarly have an inner PID different from the outer one.

Processes within the namespace can only see and interfere (send signals, ptrace, etc) with other processes inside the namespace.  By launching the new job in its own PID namespace, Condor can achieve process isolation without user isolation: the job processes are isolated from all other processes on the system.

Perhaps the best way to visualize the impact of PID namespaces in the job is to examine the output of ps:

[bbockelm@localhost condor]$ condor_run ps faux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
bbockelm     1  0.0  0.0 114132  1236 ?        SNs  11:42   0:00 /bin/bash /home/bbockelm/.condor_run.3672
bbockelm     2  0.0  0.0 115660  1080 ?        RN   11:42   0:00 ps faux

Only two processes can be seen from within the job - the shell executing the job script and "ps" itself.
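
The same effect can be reproduced outside Condor. Condor creates the namespace itself when it launches the job; as a stand-in, the sketch below uses the util-linux unshare command (version 2.23 or later for --mount-proc) and needs root. The function name run_in_pid_namespace and the DRY_RUN switch are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: run a command in a new PID namespace.
# Usage: run_in_pid_namespace COMMAND [ARGS...]
# Set DRY_RUN=1 to print the unshare command instead of running it.
run_in_pid_namespace() {
    # --pid:        create a new PID namespace
    # --fork:       run the command as a child, so it becomes PID 1 in that namespace
    # --mount-proc: mount a fresh /proc (in a new mount namespace) so that
    #               tools like ps only see processes inside the namespace
    ${DRY_RUN:+echo} unshare --pid --fork --mount-proc "$@"
}
```

Running run_in_pid_namespace ps faux as root should list only the ps process itself, much like the condor_run output above.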

Releasing a PID namespaces-enabled Condor is an ongoing effort: https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=1959; I've recently re-designed the patch to be far less intrusive on the Condor internals by switching from the glibc clone() call to the clone syscall.  I am hopeful it will make it in the 7.7.6 / 7.9.0 timescale.

From a process isolation point-of-view, with this patch, it now is safe to run jobs as user "nobody" or re-introduce the idea of shared "group accounts".  For example, we could map all CMS users to a single "cmsuser" account without having to worry about these becoming a vector for virus infection.

However, the story of job isolation does not end with PID namespaces.  Stay tuned to find out how we are tackling file and resource isolation!

An Open Science Grid Work Log > Virtual Machines for OSG

Overview

The ability to run a virtual machine with a self-contained computing environment has major advantages. Users can

    • Choose the operating system that’s best for the application
    • Execute programs that require elevated privileges
    • Install any software they need
    • Dynamically configure machine attributes like the number of cores to suit the host environment

The Engage VO is beginning to see researchers new to OSG whose default mode of operation is to spin up a VM on EC2. They quickly get used to having complete control of the computing environment.

Context

This capability has been explored for the Open Science Grid before. Clemson built Kestrel, which supports KVM-based virtualization with an XMPP communication architecture. Then STAR used Kestrel with great success. Clemson now also provides OneCloud based on OpenNebula.

There’s also been work by Brian Bockelman, Derek Weitzel and others to configure virtual machines running Condor to join the submit host’s pool. Infrastructure background for that work and lots of great information is available at the team’s blog.

Recently, I’ve had new Engage users who are heavy users of virtualization. As mentioned before, they tend to assume control over the environment. This background can make the need to specially prepare executables for the OSG by static compilation and other packaging seem onerous. Many Engage users, it should be added, have input and output file sizes in the low number of gigabytes and are not familiar with High Throughput Computing or a command line approach to virtualization.

They asked if it was possible to run virtual machines on the OSG so I set out to look for an approach that would allow researchers to

      • Create virtual machines on their desktops using simple graphical tools
      • Deploy virtual machines onto the OSG
      • Transfer input files to and from the virtual machine
      • Avoid complex interactions with HTC plumbing like configuring X.509 certs, Condor, etc.

Virtualization on RENCI-Blueberry

Since virtualization is not a currently supported technology on the OSG, step one is to create a small area in which we change that. RENCI’s Blueberry is a new cluster made of older machines that we’ve recently brought online. It’s a ROCKS, CentOS 5.7, Torque/PBS cluster with a small number of virtualization-capable nodes.

Here’s an overview of changes we made to the cluster:

First, we installed these packages on the virtualization-capable nodes:

        • Libvirt: A library of low level capabilities supporting virtualization.
        • QEMU: Virtual machine emulation layer
        • KVM: A virtualization kernel module
        • XMLStarlet: A command line XSLT engine

We configured QEMU to allow the engage user to execute virsh.

Then we created a Torque/PBS queue called virt grouping the upgraded machines.
A new GlideinWMS group was added on the Engage VO frontend. Jobs deploying virtual machines are decorated with the following modifications:
        • Job Requirements: && (CAN_RUN_VIRTUAL_MACHINE == TRUE)
        • Job Attributes: +RequiresVirtualization=TRUE

A new resource was added to the GlideinWMS factory with RSL pointing at the virt queue on RENCI-Blueberry.

Creating and Running a VM

Virtual machines were created using virt-manager, the Virtual Machine Manager. It’s a graphical application providing a wizard like interface for creating and managing VMs.

We used the command line virsh tool to export an XML description of the running virtual machine. Then the XML description and the disk image (the large file containing all of the VM’s data) were moved to the Engage submit host.

The OSG job was designed to

      • Download the virtual machine’s XML description
      • Determine the number of CPUs on the machine
      • Modify the XML description to specify
        • The appropriate number of CPUs
        • The correct location for the VM image file
      • Download the virtual machine
      • Execute the virtual machine

This works. Jobs configured to run in the GlideinWMS virt group on the Engage submit node map to glideins on RENCI-Blueberry. There, the jobs download the XML config and the image, make the needed edits and spawn the virtual machine.
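
As a rough illustration of the "modify the XML description" step, here is a minimal Python sketch. It is not the script used on RENCI-Blueberry (that job relied on XMLStarlet). It assumes the usual libvirt domain layout, with a top-level vcpu element and a file-backed disk whose source element carries a file attribute. The function name retarget_domain and the sample paths are hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def retarget_domain(xml_text, image_path, cpus=None):
    """Return domain XML with the vCPU count and disk image path replaced."""
    root = ET.fromstring(xml_text)
    if cpus is None:
        # Default to the number of CPUs visible on the worker node.
        cpus = os.cpu_count() or 1
    vcpu = root.find("vcpu")
    if vcpu is None:
        vcpu = ET.SubElement(root, "vcpu")
    vcpu.text = str(cpus)
    # Point the first file-backed hard disk at the downloaded image.
    for disk in root.findall("./devices/disk"):
        if disk.get("device") != "disk":
            continue
        source = disk.find("source")
        if source is not None and source.get("file") is not None:
            source.set("file", image_path)
            break
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily trimmed domain description for illustration only.
SAMPLE = """<domain type='kvm'>
  <name>demo</name>
  <vcpu>1</vcpu>
  <devices>
    <disk type='file' device='disk'>
      <source file='/var/lib/libvirt/images/demo.img'/>
      <target dev='hda' bus='ide'/>
    </disk>
  </devices>
</domain>"""
```

Calling retarget_domain(SAMPLE, "/tmp/job/demo.img") returns XML that could then be handed to virsh, with the CPU count taken from the host unless one is passed explicitly.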

Getting Work to the Virtual Machine

Now, if you’ve tried to do this kind of thing before, you realize this is where things get tricky.

When the virtual machine launches, it has no idea what to do. This is part of the reason that some previous approaches put Condor on the machine. That way, it can join an existing Condor pool and has all the good things that Condor brings us in terms of file transfer, matching and so on. But getting credentials into the virtual machine securely to allow it to join the Engage pool is tricky. If you know how to do that, please leave a comment.

Alternatives … and OS Versions

Now, in principle, there are two other ways to do this that would work fine. If we can get files onto and off of the machine, it would be OK to transfer them into and out of the worker node the old fashioned way – globus-url-copy. So here are two other mechanisms for file exchange between a host and a guest:

Shared Host/Guest Filesystem: More recent versions of Libvirt/QEMU/KVM support sharing filesystems between the host and guest. In this model, the guest’s XML description can specify a directory on the host that should be mounted within the guest. But, as I mentioned, the RENCI-Blueberry cluster runs CentOS 5.7. As such, only a significantly older version of the virtualization stack is supported. We discussed upgrading to a newer version but that would prevent this solution from being generally reproducible on OSG.

Guestfish: Next, there’s libguestfs and the associated interactive shell guestfish. Guestfish lets you mount a disk image in user space. That is, there’s no need to use root privileges. It also has convenient wrapper scripts for copying a file into and out of an image. But, again, it requires a version of CentOS significantly newer than 5.7.

From this angle, it looks like VMs on OSG could be an every day occurrence if it were not for very low OS version numbers.

File Sharing REST API

Before giving the approach up for dead, I decided to try something off the beaten path.

Beanstalk: I installed beanstalk on the submit node. Beanstalk is a very simple work queue that speaks a lightweight text protocol over TCP. You can put messages on the queue and get them off. You can name queues, which it refers to, weirdly, as tubes. Beanstalk does not have a notion of authentication so that’s not great.

Beanstalkc: This is the Python client for Beanstalk.

Box.net: One of many file sharing sites with a REST API.

Workflow

A Box.net authentication script does token negotiation with Box.net, mostly from the command line using wget and curl.

Then, Box.net URLs are published into the Engage event queue.

When virtual machines run, they install and run the boot script.

It installs the Beanstalk client and reads a single item from the event queue which it downloads from Box.net and processes. Queues are appropriately named so that different users and jobs never collide.
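
A minimal sketch of that queue handling in Python follows. It is not the actual boot script. It assumes a beanstalkc-style connection object (use, put, watch, ignore and reserve are the beanstalkc calls relied on), and the tube naming scheme and the function names are hypothetical.

```python
def tube_name(user, job_id):
    """Build a per-user, per-job tube name so that work items never collide."""
    return "engage.%s.%s" % (user, job_id)


def publish_url(conn, tube, url):
    """Submit side: put one download URL on the named tube."""
    conn.use(tube)
    return conn.put(url)


def take_one_url(conn, tube, timeout=30):
    """VM side: reserve a single item, delete it, and return its body."""
    conn.watch(tube)
    conn.ignore("default")  # only listen on the job's own tube
    job = conn.reserve(timeout=timeout)
    if job is None:
        return None  # nothing arrived within the timeout
    url = job.body
    job.delete()
    return url
```

With the real client, conn would be something like beanstalkc.Connection(host=SUBMIT_HOST, port=11300), where SUBMIT_HOST is a placeholder. Deleting the item before the work is finished keeps the sketch short; a more careful script would delete it only after the results have been uploaded.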

Finally, it converts the download URL to an upload URL and publishes the results of the run via the file sharing API.

So at the end of my run of 3 VMs on RENCI-Blueberry, there were three files waiting for me at Box.net.

Conclusions

I invite comments on how others have secured communication to a VM on OSG. I’d love to hear.

In particular, as mentioned above, I’d love to hear how others have gotten X.509 credentials onto a VM in this environment.

Anyone else running VMs on the OSG?




Condor Project News > Red Hat announces the release of Red Hat Enterprise MRG 2.1, (February 10, 2012)

offering increased performance, reliability, interoperability, as presented in this news release.

Derek's Blog > Fedora 16 on OpenStack

After following Brian's guide on installing Fedora 15 on OpenStack, I thought I would try my hand at Fedora 16.  There were a few differences.

Filesystem Differences
Brian's guide installed Fedora using LVM.  I installed Fedora without LVM (there's a little checkbox on the partition page of Anaconda).  Without LVM, I can skip the steps on listing the physical volumes and logical volumes to find the start and end of the partition.

Also, Fedora 16 uses a GPT partition table.  The fdisk command cannot read it, so I had to install gdisk (in EPEL).  The command and its output are very similar to fdisk's:

$ /usr/sbin/gdisk -l /tmp/fedora16
GPT fdisk (gdisk) version 0.8.1

Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /tmp/fedora16: 20971520 sectors, 10.0 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): A351197B-8233-4811-9B28-69A1DE121AD2
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 20971486
Partitions will be aligned on 2048-sector boundaries
Total free space is 4029 sectors (2.0 MiB)

Number  Start (sector)    End (sector)  Size        Code  Name
   1            2048            4095    1024.0 KiB  EF02
   2            4096         1028095    500.0 MiB   EF00  ext4
   3         1028096        16777215    7.5 GiB     0700
   4        16777216        20969471    2.0 GiB     8200


Then, to extract the image (partition 3; its sector range is inclusive, hence the +1 in the count):
dd if=/tmp/fedora16 of=/tmp/server-extract.img skip=1028096 count=$((16777215-1028096+1)) bs=512
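
The skip and count values come straight from the gdisk table. As a small worked example, this Python sketch derives them for any partition. The helper names are hypothetical, and it assumes the start and end sectors reported by gdisk are both part of the partition, so the count is end - start + 1.

```python
def dd_args(start_sector, end_sector, sector_size=512):
    """Return dd skip/count/bs values for an inclusive sector range."""
    if end_sector < start_sector:
        raise ValueError("end sector precedes start sector")
    return {
        "skip": start_sector,
        "count": end_sector - start_sector + 1,  # both endpoints are included
        "bs": sector_size,
    }


def dd_command(image, output, start_sector, end_sector, sector_size=512):
    """Format the dd command line that copies one partition out of a disk image."""
    args = dd_args(start_sector, end_sector, sector_size)
    return "dd if=%s of=%s skip=%d count=%d bs=%d" % (
        image, output, args["skip"], args["count"], args["bs"])
```

For partition 3 above, dd_command("/tmp/fedora16", "/tmp/server-extract.img", 1028096, 16777215) gives a count of 15749120 sectors, which at 512 bytes each is the 7.5 GiB that gdisk reports.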

SSH Key Differences
Brian's guide instructed you to create a /etc/rc.local.  Fedora 16 sees the introduction of systemd, which no longer executes rc.local.  Instead, it looks for the file /etc/rc.d/rc.local (possibly a symlink to /etc/rc.local?).  This file needs to be executable and must start with a shebang line.

Also, Fedora 16's SELinux doesn't label the root file system correctly (BUG), and simply making the .ssh directory doesn't allow sshd to read it.  To solve the SELinux problem, I disabled SELinux (bad, bad me).


Common Commands
After installing Fedora 16 into an image, and extracting the kernel and ramdisk, there were a few commands that were executed over and over as I debugged the image:

Make the changes to the image:
sudo /usr/libexec/qemu-kvm -m 2048 -drive file=/tmp/fedora16 -net nic -net user -vnc 127.0.0.1:0 -cpu qemu64 -M rhel5.6.0 -smp 2 -daemonize

Extract the partition:
dd if=/tmp/fedora16 of=/tmp/server-extract.img skip=1028096 count=$((16777215-1028096+1)) bs=512

Start the VM to change the label on the image:
sudo /usr/libexec/qemu-kvm -m 2048 -drive file=/tmp/fedora16 -net nic -net user -vnc 127.0.0.1:0 -cpu qemu64 -M rhel5.6.0 -smp 2 -daemonize -drive file=/tmp/server-extract.img 

Rename the image to something appropriate:
mv /tmp/server-extract.img /tmp/fedora16-extracted.img

Bundle the image for OpenStack:
euca-bundle-image --kernel aki-0000002e --ramdisk ari-0000002f -i /tmp/fedora16-extracted.img -r x86_64

Upload the image to OpenStack:
euca-upload-bundle -b derek-bucket -m /tmp/fedora16-extracted.img.manifest.xm

Register the image (this command completes fast, but OpenStack takes forever to decrypt and untar the image):
euca-register derek-bucket/fedora16-extracted.img.manifest.xml


Now to build OSG packages for Fedora...  maybe not.



Spinning > Pool utilization

Here is a utilization script for a Condor pool.

$ ./utilization.sh
       Unavailable Available    Total     Used:  Avail   Total
Slots         5968      5451    11419     4179  76.66%  36.59%
Cpus          6314      5903    12217     4631  78.45%  37.90%
Memory    14277325  11776800 26054125  9908190  84.13%  38.02%
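
The source of utilization.sh is not shown here, but the two percentage columns appear to be Used divided by Available and Used divided by Total. A minimal Python sketch of that arithmetic, under that assumption and with hypothetical function names:

```python
def percent(used, capacity):
    """Return used/capacity as a percentage, or 0.0 for an empty pool."""
    return 100.0 * used / capacity if capacity else 0.0


def utilization(unavailable, available, used):
    """Summarize one row (Slots, Cpus or Memory) of the utilization table."""
    total = unavailable + available
    return {
        "total": total,
        "avail_pct": percent(used, available),  # share of available capacity in use
        "total_pct": percent(used, total),      # share of the whole pool in use
    }
```

For the Slots row, utilization(5968, 5451, 4179) gives a total of 11419 and percentages within a hundredth of a point of the 76.66% and 36.59% printed above.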

And, if you know your workload will not run on slots with less than 1GB of memory, you can filter out slots that are too small:

$ ./utilization.sh 'Memory < 1024'
       Unavailable Available    Total     Used:  Avail   Total
Slots         6292      5127    11419     4177  81.47%  36.57%
Cpus          6638      5579    12217     4629  82.97%  37.88%
Memory    14592711  11461414 26054125  9904193  86.41%  38.01%

Remember, if an attribute is not on all slots you need to use the meta-comparison operators: =?= and =!=, e.g. 'MyCustomAttr =!= True'.



OSG Technology Area Rumblings > openstack - update

Last time I was able to deploy an image. The next step would be to list it and then run it. But I have hit problems.


To list images I run the command:

euca-describe-images


which hangs for a long time and eventually exits with the message "connection reset by peer".


I have disabled iptables to eliminate firewall issues. No help.

All manuals assume that euca-describe-images will simply run and give no instructions on what to do if it does not.


Following Josh's advice I did:

strace -o edi_output -f -ff euca-describe-images

and then I looked into the output files. It seems that there might be two problems:

  1. Some euca2ools files are missing - in particular the .eucarc configuration file.
  2. There are messages about missing python files, for example "open("/usr/lib64/python2.6/site-packages/gtk-2.0/org.so", O_RDONLY) = -1 ENOENT (No such file or directory)" (there are many more like that).

So it seems that the euca2ools installation described in previous posts may not be complete and is missing some key files. Or python (which we already know had to be patched) is not OK. Or both.
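
To get an overview of what was not found, the per-process strace files can be scanned for failed open calls. This is a minimal Python sketch, assuming the output lines look like the one quoted above; the function names are hypothetical, and newer strace versions may report openat instead of open, which this pattern does not cover.

```python
import glob
import re

# Matches lines such as: open("/some/path", O_RDONLY) = -1 ENOENT (...)
ENOENT_OPEN = re.compile(r'open\("([^"]+)",[^)]*\)\s*=\s*-1 ENOENT')


def missing_paths(lines):
    """Return the distinct paths whose open() failed with ENOENT, in first-seen order."""
    seen = []
    for line in lines:
        match = ENOENT_OPEN.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def scan_strace_files(pattern="edi_output.*"):
    """Collect missing paths from every strace output file matching the pattern."""
    found = []
    for name in sorted(glob.glob(pattern)):
        with open(name, errors="replace") as handle:
            for path in missing_paths(handle):
                if path not in found:
                    found.append(path)
    return found
```

Running scan_strace_files() in the directory holding the edi_output.* files lists each missing path once. Keep in mind that Python probes several locations while importing a module, so many ENOENT lines can be harmless.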

That's all I know for now.

Derek's Blog > Testing a Globus-Free OSG-Software (From EPEL(-testing))

As you may or may not know, there is a massive globus update pending in EPEL that will update globus to the version the OSG distributes.  What this means is much less work for the osg-software team since we will not have to build and support our own builds of globus.

Testing the globus from EPEL while installing other packages from the osg repos is not a trivial matter.  It takes two changes to the osg repo configuration:

  1. Disable the priority of the OSG repo
  2. Exclude globus and related packages that are already in EPEL from the osg repo.
Below is my final file /etc/yum.repos.d/osg.repo

Notice the many excludes in the file; the list may not be complete.

Installation is just:
yum install osg-client-condor --enablerepo=epel-testing

UPDATE!!!!
Testing Results
very good!

I ran 3 tests, all completely successful.
1. globus-job-run against an RPM CE.
$ globus-job-run pf-grid.unl.edu/jobmanager-fork /bin/sh -c "id"
uid=1761(hcc) gid=4001(grid) groups=4001(grid)
2. Condor-G submission
Condor-G Submission worked without problems.  The submission file is below:
3. And globus-url-copy worked:
$ globus-url-copy gsiftp://pf-grid.unl.edu/etc/hosts ./hosts


Spinning > EC2, VNC and Fedora

If you have ever wondered about running a desktop session in EC2, here is one way to set it up and some pointers.

First, start an instance. My preferred way is via Condor. I used ami-60bd4609 on an m1.small, providing a basic Fedora 15 server. Make sure the instance’s security group has port 22 (ssh) open.

Second, install a desktop environment, e.g. yum groupinstall 'GNOME Desktop Environment'. This is 467 packages and will take about 18 minutes.

Third, install and set up a VNC server: yum install vnc-server ; vncpasswd ; vncserver :1. This produces a running desktop that can be contacted by a vncviewer.

Finally, connect via an SSH secured VNC session.

VNC_VIA_CMD='/usr/bin/ssh -i KEYPAIR.pem -l ec2-user -f -L "$L":"$H":"$R" "$G" sleep 20' vncviewer localhost:1 -via INSTANCE_ADDRESS

What’s going on here? vncviewer allows for a proxy host when connecting to the vncserver. That is the -via argument. The VNC_VIA_CMD is an environment variable that specifies the command used to connect to the proxy. Here it is modified to provide the keypair needed to access the instance, and the user ec2-user, which is the default user on Fedora AMIs. The INSTANCE_ADDRESS is the Hostname from condor_ec2_q.
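
To see what ssh command the viewer ends up running, the template can be expanded by hand. As I understand the vncviewer manual page, the tunnel command runs with L, H, R and G set to the local port, the remote host, the remote port and the gateway. The Python sketch below only illustrates that substitution; the function name and the local port 5599 are made up, and 5901 is simply 5900 plus the display number :1.

```python
from string import Template

# The tunnel command from above, with the key pair and user added.
VIA_CMD = ('/usr/bin/ssh -i KEYPAIR.pem -l ec2-user -f '
           '-L "$L":"$H":"$R" "$G" sleep 20')


def expand_via_cmd(template, local_port, remote_host, remote_port, gateway):
    """Fill in the L, H, R and G placeholders the way the shell would."""
    return Template(template).substitute(
        L=local_port, H=remote_host, R=remote_port, G=gateway)


if __name__ == "__main__":
    print(expand_via_cmd(VIA_CMD, 5599, "localhost", 5901, "INSTANCE_ADDRESS"))
```

The result is an ordinary ssh port forward from a local port to port 5901 on the instance, which the viewer then connects to.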

Alternatively, ssh-add KEYPAIR.pem followed by vncviewer localhost:1 -via ec2-user@INSTANCE_ADDRESS. However, be careful if you have many keys stored in your ssh-agent. They will all be tried and the remote sshd may reject your connection before the proper keypair is found.

Tips:

  • It takes about 20 minutes from start to vncviewer. Once the instance is set up, consider creating your own AMI.
  • Set a password for ec2-user, otherwise the screensaver will lock you out. Use sudo passwd ec2-user.
  • Remember AWS charges for data transmitted out of the instance, as well as the uptime of the instance, see EC2 Pricing. You will want to figure out how much bandwidth your workflow takes on average to figure out total cost. For me, a half hour of browsing Planet Fedora, editing with emacs, and compiling some code, transmitted about 60MB of data. That measurement is the difference in eth0's “TX bytes” as reported by ifconfig. This is not a perfect estimate because some of the data may have been transferred within EC2, which is not charged.
  • For transmit rates, consider running bwm-ng to see what actions use the most bandwidth.
  • Generally, make the screen update as little as possible. Constantly changing graphics on web pages can run 60-120KB/s. Compare that to a text console and emacs producing a TX rate closer to 5-25KB/s.
  • Cover consoles with compilations, or compile in a low verbosity mode.
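
The bandwidth estimate in the tips above is simple arithmetic on two readings of the interface counter. Here is a minimal Python sketch, assuming a Linux instance that exposes the counter under /sys/class/net. The price per GB is a parameter because the real figure has to come from EC2 Pricing, and the 0.10 used in the note below is only a placeholder.

```python
def tx_bytes(interface="eth0"):
    """Read the cumulative transmitted-bytes counter for a network interface."""
    path = "/sys/class/net/%s/statistics/tx_bytes" % interface
    with open(path) as handle:
        return int(handle.read().strip())


def transfer_cost(bytes_out, price_per_gb):
    """Cost of sending bytes_out, counting one GB as 10**9 bytes (an assumption)."""
    return bytes_out / 1e9 * price_per_gb


def hourly_transfer_cost(bytes_out, seconds, price_per_gb):
    """Scale a measured transfer over `seconds` up to a cost per hour."""
    return transfer_cost(bytes_out, price_per_gb) * 3600.0 / seconds
```

Take tx_bytes() before and after a work session and pass the difference in. With the 60MB half hour above and a placeholder price of 0.10 per GB, hourly_transfer_cost(60e6, 1800, 0.10) comes to about 0.012 per hour.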


Derek's Blog > Initial EL6 Packages for OSG

Last night I completed initial packages for EL6 support.  Just like for EL5, the first OSG component I created is the osg-wn-client.

The osg-wn-client has a complicated dependency tree.  Easily some of the most difficult packages were from glite.

Just some quick tidbits that made the transition easier:

UUID Differences
uuid.h and the associated library are used by many applications.  In el5, uuid is provided by the e2fsprogs package.  In el6, it has its own package, libuuid.  It was common for me to copy this tidbit into a few packages:
gsoap Differences
glite-fts-client and glite-data-delegation-api-c both use gsoap.  In the past, it was common to copy stdsoap2.c from the gsoap distribution and compile that into your program.  Now that gsoap is a regular library, though, programs should link against the system's version.  In order to do this, I had to add patches to the Makefiles for both packages to link against the system's gsoap.


What's next?  
The next step is the osg-client.  Since there are no more glite packages for the osg-client, this step should be easier.

Spinning > Manage inventory with Wallaby

Wallaby will manage your configuration, as well as an inventory of your machines. It can differentiate between machines that are expected to be present and those that opportunistically appear.

Build the roster with wallaby add-node -

$ wallaby add-node node0.local node1.local node2.local
Adding the following node: node0.local
Console Connection Established...
Adding the following node: node1.local
Adding the following node: node2.local
$ for i in $(seq 3 10); do wallaby add-node node$i.local; done
Adding the following node: node3.local
Console Connection Established...
Adding the following node: node4.local
Console Connection Established...
...

List expected nodes (provisioned) -

$ wallaby inventory
Console Connection Established...
P        Node name                 Last checkin
-        ---------                 ------------
+      node0.local Wed Jan 11 07:32:33 -0500 20
+      node1.local Thu Jan 05 12:15:00 -0500 20
+     node10.local Wed Jan 11 07:31:56 -0500 20
+      node2.local Wed Jan 11 07:31:56 -0500 20
+      node3.local Wed Jan 11 07:15:21 -0500 20
+      node4.local Wed Jan 11 07:31:42 -0500 20
+      node5.local Wed Jan 11 07:16:47 -0500 20
+      node6.local                        never
+      node7.local Wed Jan 11 07:32:33 -0500 20
+      node8.local Wed Jan 11 07:32:33 -0500 20
+      node9.local Wed Jan 11 07:30:47 -0500 20
-      robin.local Thu Dec 15 14:11:35 -0500 20
-      woods.local Tue Jan 10 20:33:47 -0500 20

List opportunistic, bonus nodes (unprovisioned) -

$ wallaby inventory -o unprovisioned
Console Connection Established...
P        Node name                 Last checkin
-        ---------                 ------------
-      robin.local Thu Dec 15 14:11:35 -0500 20
-      woods.local Tue Jan 10 20:33:47 -0500 20

Provisioned nodes that have never checked in, maybe setup failed -

$ wallaby inventory -c 'last_checkin == 0 && provisioned'
Console Connection Established...
P        Node name                 Last checkin
-        ---------                 ------------
+      node6.local                        never

Provisioned nodes that have not checked in for the past 4 hours, maybe machine is down -

$ wallaby inventory -c 'last_checkin > 0 && last_checkin < 4.hours_ago && provisioned'
Console Connection Established...
P        Node name                 Last checkin
-        ---------                 ------------
+      node1.local Thu Jan 05 12:15:00 -0500 20

Unprovisioned nodes that have not checked in for 48 hours, candidates for wallaby remove-node -

$ wallaby inventory -c 'last_checkin < 48.hours_ago && !provisioned'
Console Connection Established...
P        Node name                 Last checkin
-        ---------                 ------------
-      robin.local Thu Dec 15 14:11:35 -0500 20 
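
The constraints above boil down to a few comparisons per node. For readers who want to apply the same logic to an exported list, here is a minimal Python sketch. It does not use Wallaby's API; the Node record and the function names are hypothetical, and last_checkin is assumed to be epoch seconds with 0 meaning never.

```python
import time
from collections import namedtuple

# Hypothetical record; last_checkin is epoch seconds, 0 means "never".
Node = namedtuple("Node", "name provisioned last_checkin")


def never_checked_in(nodes):
    """Provisioned nodes that have never checked in (setup may have failed)."""
    return [n.name for n in nodes if n.provisioned and n.last_checkin == 0]


def stale(nodes, hours, provisioned, now=None):
    """Nodes of the given kind whose last checkin is older than `hours` ago."""
    now = time.time() if now is None else now
    cutoff = now - hours * 3600
    return [n.name for n in nodes
            if n.provisioned == provisioned and 0 < n.last_checkin < cutoff]
```

stale(nodes, 4, True) mirrors the 4 hour query for provisioned nodes, and stale(nodes, 48, False) picks out the remove-node candidates. Unlike the 48 hour constraint above, this sketch always skips nodes that have never checked in.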

Enjoy.



OSG Technology Area Rumblings > How to register an image in openstack

After having installed and configured the worker and controller nodes of the openstack testbed we would like to upload images into it.

First I downloaded some images to /root/images on the controller node. One is from Xin and another one is a minimal image for testing I got from the net. I have no idea what they are worth.


Then I tried to follow the instructions

http://docs.openstack.org/cactus/openstack-compute/admin/content/part-ii-getting-virtual-machines.html

which go like this:

image="ubuntu1010-UEC-localuser-image.tar.gz"
wget http://c0179148.cdn1.cloudfiles.rackspacecloud.com/ubuntu1010-UEC-localuser-image.tar.gz
uec-publish-tarball $image [bucket-name] [hardware-arch]


and I could not find where the
uec-publish-tarball

command comes from. Finally I realized that it comes from Ubuntu, and that the manual had become Ubuntu-specific without saying so explicitly.


So I tried a different approach.

cd /root/images

glance add name="My Image" < sl61-kvm.tar.bz2 # the image I got from Xin

The command responded that the image got Id=1, which is a good sign.

Then I did:

glance show 1

and got:

URI: http://0.0.0.0/images/1
Id: 1
Public: No
Name: My Image
Size: 199737477
Location: file:///var/lib/glance/images/1
Disk format: raw
Container format: ovf

Which suggests that the file is in the system. But when I tried:

glance index

it said:

no public images found

So I tried to register it again:

glance add name="My Image" is_public=true < sl61-kvm.tar.bz2
Added new image with ID: 2

I tried to list:

glance index
Found 1 public images...
ID               Name                           Disk Format          Container Format     Size
---------------- ------------------------------ -------------------- -------------------- --------------
2                My Image                       raw                  ovf                       199737477

So it seems we have uploaded an image to the system.


Now I have to figure out how to run it.

