<?xml version="1.0"?>
<rss version="2.0">

<channel>
	<title>Planet OSG</title>
	<link>http://www.opensciencegrid.org/</link>
	<language>en</language>
	<description>Planet OSG - http://www.opensciencegrid.org/</description>

<item>
	<title>Spinning: Condor Week 2012</title>
	<guid>http://spinningmatt.wordpress.com/?p=661</guid>
	<link>http://spinningmatt.wordpress.com/2012/05/11/condor-week-2012/</link>
	<description>&lt;p&gt;&lt;a href=&quot;http://research.cs.wisc.edu/condor/CondorWeek2012/&quot;&gt;Condor Week 2012&lt;/a&gt; was last week. As in past years there was great representation from the research community. We learned how research of &lt;a href=&quot;http://research.cs.wisc.edu/condor/CondorWeek2012/presentations/fienen-pest.pdf&quot;&gt;all&lt;/a&gt; &lt;a href=&quot;http://research.cs.wisc.edu/condor/CondorWeek2012/presentations/durham-brooks-genome-function.pdf&quot;&gt;sizes&lt;/a&gt; and &lt;a href=&quot;http://research.cs.wisc.edu/condor/CondorWeek2012/presentations/re-deepdive.pdf&quot;&gt;shapes&lt;/a&gt; is benefiting from &lt;a href=&quot;http://www.opensciencegrid.org/&quot;&gt;high throughput computing resources&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;A number of enterprises also turned up to talk about &lt;a href=&quot;http://research.cs.wisc.edu/condor/CondorWeek2012/presentations/nordlund-hartford.pdf&quot;&gt;their uses&lt;/a&gt; &lt;a href=&quot;http://research.cs.wisc.edu/condor/CondorWeek2012/presentations/triplett-pacific-life.pdf&quot;&gt;of Condor&lt;/a&gt;, even if they &lt;a href=&quot;http://twitter.com/#!/frecklesweetp&quot;&gt;didn&amp;#8217;t bring slides&lt;/a&gt;. &lt;a href=&quot;http://research.cs.wisc.edu/condor/CondorWeek2012/presentations/carstensen-dreamworks.pdf&quot;&gt;DreamWorks Animation&lt;/a&gt; was present to tell us how our favorite movies are being created with Condor.&lt;/p&gt;
&lt;p&gt;I got to talk about &lt;a href=&quot;http://research.cs.wisc.edu/condor/CondorWeek2012/presentations/farrellee-red-hat.pdf&quot;&gt;Condor&amp;#8217;s developer community&lt;/a&gt;, and still want to hear from anyone who has tried or would like to contribute to the Condor project. That probably means you!&lt;/p&gt;
</description>
	<pubDate>Fri, 11 May 2012 20:17:29 +0000</pubDate>
</item>
<item>
	<title>Condor Project News: Condor 7.8.0 released! (May 10, 2012)</title>
	<guid>http://www.cs.wisc.edu/condor/manual/v7.8/9_3Stable_Release.html</guid>
	<link>http://www.cs.wisc.edu/condor/manual/v7.8/9_3Stable_Release.html</link>
	<description>The Condor team is pleased to announce the release of Condor 7.8.0.  This is the first release in a new stable series, and it contains all
the features and bug fixes in the 7.7 development series.  See the Version History for a complete list of changes.</description>
	<pubDate>Thu, 10 May 2012 05:00:00 +0000</pubDate>
</item>
<item>
	<title>Inside OSG Ops.: CondorWeek 2012</title>
	<guid>tag:blogger.com,1999:blog-7506259688180433777.post-1566215560492577011</guid>
	<link>http://insideosgops.blogspot.com/2012/05/during-week-of-may-1-4-i-attended.html</link>
	<description>During the week of May 1-4, I attended &lt;a href=&quot;http://research.cs.wisc.edu/condor/CondorWeek2012/&quot;&gt;CondorWeek 2012&lt;/a&gt; in Madison, WI. &amp;nbsp;The event was hosted in the &lt;a href=&quot;http://discovery.wisc.edu/&quot;&gt;Wisconsin Institutes for Discovery&lt;/a&gt;&amp;nbsp;at the &lt;a href=&quot;http://www.wisc.edu/&quot;&gt;University of Wisconsin-Madison&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The first day consisted of tutorials explaining the basics of &lt;a href=&quot;http://research.cs.wisc.edu/condor/&quot;&gt;Condor&lt;/a&gt; usage, workflows, administration, and security. &amp;nbsp;Having spent more time doing general support with OSG and working with the support workflow, I haven't spent as much time as I'd like with Condor, but the tutorials did show me some things that I would like to try. &amp;nbsp;I think that next year, these tutorials will be even more useful for me after trying some new things with Condor in the coming months.&lt;br /&gt;&lt;br /&gt;The final three days consisted of researchers, admins, and companies showing off and discussing the multitude of ways that they use Condor to enhance and extend their workflows. &amp;nbsp;Of particular interest to me was the way that DreamWorks used Condor to send jobs to their rendering software to create their animated films. &amp;nbsp;They showed the trailer for Madagascar 3, which is their first movie to be created using Condor from beginning to end in the rendering process.&lt;br /&gt;&lt;br /&gt;It was interesting to hear other outfits explain how they have had to work around some things Condor currently can't do, and to see how curious the Condor team was about whether these things could be implemented in coming versions of Condor. &amp;nbsp;The team was very interested in expanding and enhancing Condor, which is already a very capable product.&lt;br /&gt;&lt;br /&gt;Overall it was a good experience that made me aware of all the things that can be done with Condor. 
&amp;nbsp;I look forward to trying some of these things on my own, and hopefully returning next year with a much better understanding of Condor as a whole so that I can gain even more information from the talks and tutorials.</description>
	<pubDate>Mon, 07 May 2012 09:32:49 +0000</pubDate>
	<author>noreply@blogger.com (K)</author>
</item>
<item>
	<title>Inside OSG Ops.: RabbitMQ &amp; CometD</title>
	<guid>tag:blogger.com,1999:blog-7506259688180433777.post-7214311189520955281</guid>
	<link>http://insideosgops.blogspot.com/2012/03/rabbitmq-cometd.html</link>
	<description>We've been experimenting with the following new services over the last few months here at GOC.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;event.grid.iu.edu (RabbitMQ/AMQP Server)&lt;/li&gt;&lt;li&gt;comet.grid.iu.edu (CometD Server)&amp;nbsp;&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;RabbitMQ (event.grid.iu.edu) allows GOC and OSG users to publish and/or subscribe to various messages generated by our services, and by OSG in general. Currently, we receive the following messages.&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;RSV status changes&lt;/li&gt;&lt;li&gt;OIM updates&lt;/li&gt;&lt;li&gt;GOC Ticket updates&lt;/li&gt;&lt;li&gt;GIP information changes (prototype)&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;Anyone can subscribe to these messages in XML format, and be notified in real time using an AMQP messaging client. APIs are available in many languages, including Java, PHP, and Python.&amp;nbsp;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;CometD (comet.grid.iu.edu) allows us to push messages to our various web applications. For example, GOC Ticket uses it to display users who are currently viewing a ticket. If someone updates a ticket while someone else is viewing it, CometD sends a page refresh request to all viewers. CometD can also be used to implement features such as chat, shared editing, and other&amp;nbsp;functionality.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;CometD itself is a Java&amp;nbsp;application framework where we can implement various services that clients (web browsers) can make requests to. CometD acts as glue between RabbitMQ and the web browsers. For example, &quot;GOC event service&quot; in comet.grid.iu.edu subscribes to RSV, OIM, and GOC ticket events, and pools all recent events. 
A web browser can then make a request to download these events during the initial loading of a page, and it will subscribe to the &quot;new event&quot; queue on comet in order to receive new events in real time until the user closes the page.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;By using Event &amp;amp; Comet services, we can implement interesting features such as &lt;a href=&quot;http://myosg.grid.iu.edu/miscevent/index?datasource=event&amp;amp;count_sg_1=on&amp;amp;count_active=on&amp;amp;count_enabled=on&quot;&gt;Realtime GOC event&lt;/a&gt;&amp;nbsp;(prototype)&amp;nbsp;in MyOSG. My current goal is to continue experimenting with RabbitMQ/CometD and see what I can (and cannot) accomplish using these tools.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;If anyone has an idea about what we can do with these tools, please feel free to send me a message.&lt;/div&gt;</description>
	<pubDate>Mon, 07 May 2012 07:53:54 +0000</pubDate>
	<author>noreply@blogger.com (Soichi Hayashi)</author>
</item>
<item>
	<title>Condor Project News: Condor 7.6.7 released (May 2, 2012)</title>
	<guid>http://research.cs.wisc.edu/condor/manual/v7.6/8_3Stable_Release.html</guid>
	<link>http://research.cs.wisc.edu/condor/manual/v7.6/8_3Stable_Release.html</link>
	<description>The Condor team is pleased to announce the newest release in our stable series, 7.6.7.  This release fixes several important bugs, and we believe this will be the last release in the 7.6 series.  Please see the release notes for a complete list of changes.  Condor binaries and source code are available from our Downloads page.</description>
	<pubDate>Wed, 02 May 2012 05:00:00 +0000</pubDate>
</item>
<item>
	<title>Condor Project News: 50,000-Core Condor cluster provisioned by Cycle Computing (May 2, 2012)</title>
	<guid>http://www.hostingtecnews.com/cycle-computing-provisioned-50000-core-cluster-schrodinger-molecular-research</guid>
	<link>http://www.hostingtecnews.com/cycle-computing-provisioned-50000-core-cluster-schrodinger-molecular-research</link>
	<description>Cycle Computing provisioned a 50,000-core Condor cluster for the Schroedinger Drug Discovery Applications Group. See the article in Hostingtecnews.com, or the article in BioIT World. Condor helped enable this research.</description>
	<pubDate>Wed, 02 May 2012 05:00:00 +0000</pubDate>
</item>
<item>
	<title>Condor Project News: Condor 7.7.6 released (April 24, 2012)</title>
	<guid>http://research.cs.wisc.edu/condor/manual/v7.7/9_3Development_Release.html</guid>
	<link>http://research.cs.wisc.edu/condor/manual/v7.7/9_3Development_Release.html</link>
	<description>The Condor team is pleased to announce the newest release in our development series, 7.7.6.  This release represents the release candidate for Condor version 7.8, and it is the last release in the 7.7 development series. Please see the release notes for a complete list.  Condor binaries and source code are available from our Downloads page.</description>
	<pubDate>Tue, 24 Apr 2012 05:00:00 +0000</pubDate>
</item>
<item>
	<title>Derek's Blog: Developments at Nebraska</title>
	<guid>tag:blogger.com,1999:blog-3007054864987759910.post-5387802507005678003</guid>
	<link>http://derekweitzel.blogspot.com/2012/04/developments-at-nebraska.html</link>
	<description>I thought I would do a quick post about recent developments at Nebraska.&lt;br /&gt;&lt;br /&gt;&lt;span&gt;&lt;b&gt;Tusker &amp;amp; OSG&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;a href=&quot;http://1.bp.blogspot.com/-UEWa3N6oNRk/T5Gdkx9bXrI/AAAAAAAAA7g/_76Q34lUoE8/s1600/row1.jpg&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;320&quot; src=&quot;http://1.bp.blogspot.com/-UEWa3N6oNRk/T5Gdkx9bXrI/AAAAAAAAA7g/_76Q34lUoE8/s320/row1.jpg&quot; width=&quot;243&quot; /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class=&quot;tr-caption&quot;&gt;New Tusker Cluster&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;We recently received a new cluster, Tusker. &amp;nbsp;It is our newest in a line of clusters that prioritize memory per core, and cores per node over clock speed. &amp;nbsp;Therefore, the cluster is 104 nodes, 102 of which are 64 core, 256GB nodes. &lt;br /&gt;&lt;br /&gt;The goal of this cluster is to enable higher throughput of local user jobs while enabling backfilling of grid jobs. &amp;nbsp;The current breakdown of local and grid jobs can be found on the &lt;a href=&quot;http://hcc.unl.edu/gratia&quot;&gt;hcc monitoring page&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;A common complaint&amp;nbsp;among&amp;nbsp;our local users is interference between processes on the nodes. &amp;nbsp;To address this, we patched torque to add cgroups support for cpu isolation. &amp;nbsp;Memory isolation should come into production in the next few weeks. &amp;nbsp;This will affect grid jobs by locking down their usage to only a single core.&lt;br /&gt;&lt;br /&gt;Nebraska's goal is to support all OSG VO's, and give them equal priority (albeit lower than local users). 
&amp;nbsp;All OSG VO's are welcome to run on Tusker.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;&lt;b&gt;Nebraska's Contribution to OSG Opportunistic Usage&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;a href=&quot;http://gratiaweb.grid.iu.edu/gratia/pie_graphs/osg_facility_hours?vo=nanohub%7Cglow%7Csdss%7Cdteam%7Cligo%7Cosg%7Cengage%7Cosgedu%7Ctigre%7Cnysgrid%7Ccigi%7Ccompbiogrid%7Cdes%7Cfmri%7Cgpn%7Cgrase%7Cgugrid%7Ci2u2%7Cmariachi%7Cnwicg%7Csbgrid%7Cstar%7Chcc%7Cnebiogrid&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;400&quot; src=&quot;http://gratiaweb.grid.iu.edu/gratia/pie_graphs/osg_facility_hours?vo=nanohub%7Cglow%7Csdss%7Cdteam%7Cligo%7Cosg%7Cengage%7Cosgedu%7Ctigre%7Cnysgrid%7Ccigi%7Ccompbiogrid%7Cdes%7Cfmri%7Cgpn%7Cgrase%7Cgugrid%7Ci2u2%7Cmariachi%7Cnwicg%7Csbgrid%7Cstar%7Chcc%7Cnebiogrid&quot; width=&quot;640&quot; /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class=&quot;tr-caption&quot;&gt;Opportunistic&amp;nbsp;usage by Site (&lt;a href=&quot;http://gratiaweb.grid.iu.edu/gratia/vo?set=Non-HEP%20VOs&amp;amp;vo=nanohub|glow|sdss|dteam|ligo|osg|engage|osgedu|tigre|nysgrid|cigi|compbiogrid|des|fmri|gpn|grase|gugrid|i2u2|mariachi|nwicg|sbgrid|star|hcc|nebiogrid&quot;&gt;source&lt;/a&gt;)&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;Nebraska resources have become the largest contributor to opportunistic resources. &amp;nbsp;Easily over 1/4 of&amp;nbsp;opportunistic&amp;nbsp;usage is happening at Nebraska. &amp;nbsp;We are #1 (Tusker), #2 (prairiefire), and after adding Firefly's different CE's, #7. 
&amp;nbsp;We are very proud of this contribution, and hope it continues.&lt;div&gt;&lt;br /&gt;&lt;br /&gt;Stay tuned, the next year should be exciting...&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;</description>
	<pubDate>Fri, 20 Apr 2012 13:18:58 +0000</pubDate>
	<author>noreply@blogger.com (Derek Weitzel)</author>
</item>
<item>
	<title>Derek's Blog: BOSCO + Campus Factory</title>
	<guid>tag:blogger.com,1999:blog-3007054864987759910.post-9171294433921424376</guid>
	<link>http://derekweitzel.blogspot.com/2012/04/bosco-campus-factory.html</link>
	<description>&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;a href=&quot;http://3.bp.blogspot.com/-28vQ1ijxGaw/T4yCAR7X8NI/AAAAAAAAA7Y/F7QbMqygG6M/s1600/photo.JPG&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;300&quot; src=&quot;http://3.bp.blogspot.com/-28vQ1ijxGaw/T4yCAR7X8NI/AAAAAAAAA7Y/F7QbMqygG6M/s400/photo.JPG&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class=&quot;tr-caption&quot;&gt;Checklist while implementing CF + Bosco integration&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;For the last several months, the&lt;a href=&quot;https://twiki.grid.iu.edu/bin/view/CampusGrids/WebHome&quot;&gt; campus infrastructure&lt;/a&gt; team has worked on software that will help users create a larger, more inclusive campus grid. &amp;nbsp;The goal largely has been to make the software easier to install and expand.&lt;br /&gt;&lt;br /&gt;Integrating the &lt;a href=&quot;https://twiki.grid.iu.edu/bin/view/Documentation/CampusFactoryInstall&quot;&gt;Campus Factory&lt;/a&gt; (which is already used on many campuses) with &lt;a href=&quot;https://twiki.grid.iu.edu/bin/view/CampusGrids/BoSCO&quot;&gt;Bosco&lt;/a&gt;&amp;nbsp;has been a key goal for this effort. &amp;nbsp;Last week, I finally integrated the two&amp;nbsp;(&lt;a href=&quot;https://twiki.grid.iu.edu/bin/view/Documentation/CampusFactoryInstallRC&quot;&gt;Beta Install Doc&lt;/a&gt;). &amp;nbsp;This will have many benefits for the user over both the current Campus Factory and current Bosco.&lt;br /&gt; &lt;br /&gt;&lt;table id=&quot;box-table-b&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;th&gt;Feature &lt;/th&gt;&lt;th&gt;Campus Factory &lt;/th&gt;&lt;th&gt;Bosco v0 &lt;/th&gt;&lt;th&gt;Campus Factory + Bosco &lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Installation &lt;/td&gt;&lt;td&gt;Large installation/configuration instructions.  
Install Condor and campus factory on every cluster. &lt;/td&gt;&lt;td colspan=&quot;2&quot;&gt;Install on a central submit node.  Configuration is handled automatically. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Adding Resources &lt;/td&gt;&lt;td&gt;Install Condor and the campus factory on every cluster.  Configure to link to other submit and campus factory clusters. &lt;/td&gt;&lt;td colspan=&quot;2&quot;&gt;Run the command &lt;br /&gt;&lt;pre&gt;bosco_cluster -add&lt;/pre&gt;Installation and configuration handled auto-magically. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;File Transfer &lt;/td&gt;&lt;td&gt;Using Condor file transfer, can transfer input and output. &lt;/td&gt;&lt;td&gt;Manually with scp by the user.  No stderr or stdout from job. &lt;/td&gt;&lt;td&gt;Using Condor file transfer, can transfer input and output. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Job State &lt;/td&gt;&lt;td&gt;Accurate user job state. &lt;/td&gt;&lt;td&gt;Delayed user job state.  No exit codes from the user jobs. &lt;/td&gt;&lt;td&gt;Accurate user job state. &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;As you can see from the above table, the combined Campus Factory + Bosco takes the best from both technologies.</description>
	<pubDate>Mon, 16 Apr 2012 16:36:34 +0000</pubDate>
	<author>noreply@blogger.com (Derek Weitzel)</author>
</item>
<item>
	<title>Spinning: Service as a Job: HDFS NameNode</title>
	<guid>http://spinningmatt.wordpress.com/?p=651</guid>
	<link>http://spinningmatt.wordpress.com/2012/04/16/service-as-a-job-hdfs-namenode/</link>
	<description>&lt;p&gt;Scheduling an &lt;a href=&quot;http://spinningmatt.wordpress.com/2012/04/04/service-as-a-job-hdfs-datanode/&quot;&gt;HDFS DataNode&lt;/a&gt; is a powerful function. However, an operational HDFS instance also requires a NameNode. Here is an example of how a NameNode can be scheduled, followed by scheduled DataNodes, to create an HDFS instance.&lt;/p&gt;
&lt;p&gt;From here, HDFS instances can be dynamically created on shared resources. Workflows can be built to manage, grow and shrink HDFS instances. Multiple HDFS instances can be deployed on a single set of resources.&lt;/p&gt;
&lt;p&gt;The control script is based on hdfs_datanode.sh. It discovers the NameNode&amp;#8217;s endpoints and chirps them.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;hdfs_namenode.sh&lt;/b&gt;&lt;br /&gt;
&lt;pre class=&quot;brush: bash;&quot;&gt;
#!/bin/sh -x

# condor_chirp in /usr/libexec/condor
export PATH=$PATH:/usr/libexec/condor

HADOOP_TARBALL=$1

# Note: bin/hadoop uses JAVA_HOME to find the runtime and tools.jar,
#       except tools.jar does not seem necessary therefore /usr works
#       (there's no /usr/lib/tools.jar, but there is /usr/bin/java)
export JAVA_HOME=/usr

# When we get SIGTERM, which Condor will send when
# we are kicked, kill off the namenode
term() {
   ./bin/hadoop-daemon.sh stop namenode
}

# Unpack
tar xzfv $HADOOP_TARBALL

# Move into tarball, inefficiently
cd $(tar tzf $HADOOP_TARBALL | head -n1)

# Configure,
#  . fs.default.name,dfs.http.address must be set to port 0 (ephemeral)
#  . dfs.name.dir must be in _CONDOR_SCRATCH_DIR for cleanup
# FIX: Figure out why a hostname is required, instead of 0.0.0.0:0 for
#      fs.default.name
cat &amp;gt; conf/hdfs-site.xml &amp;lt;&amp;lt;EOF
&amp;lt;?xml version=&amp;quot;1.0&amp;quot;?&amp;gt;
&amp;lt;?xml-stylesheet type=&amp;quot;text/xsl&amp;quot; href=&amp;quot;configuration.xsl&amp;quot;?&amp;gt;
&amp;lt;configuration&amp;gt;
  &amp;lt;property&amp;gt;
    &amp;lt;name&amp;gt;fs.default.name&amp;lt;/name&amp;gt;
    &amp;lt;value&amp;gt;hdfs://$HOSTNAME:0&amp;lt;/value&amp;gt;
  &amp;lt;/property&amp;gt;
  &amp;lt;property&amp;gt;
    &amp;lt;name&amp;gt;dfs.name.dir&amp;lt;/name&amp;gt;
    &amp;lt;value&amp;gt;$_CONDOR_SCRATCH_DIR/name&amp;lt;/value&amp;gt;
  &amp;lt;/property&amp;gt;
  &amp;lt;property&amp;gt;
    &amp;lt;name&amp;gt;dfs.http.address&amp;lt;/name&amp;gt;
    &amp;lt;value&amp;gt;0.0.0.0:0&amp;lt;/value&amp;gt;
  &amp;lt;/property&amp;gt;
&amp;lt;/configuration&amp;gt;
EOF

# Try to shutdown cleanly
trap term TERM

export HADOOP_CONF_DIR=$PWD/conf
export HADOOP_LOG_DIR=$_CONDOR_SCRATCH_DIR/logs
export HADOOP_PID_DIR=$PWD

./bin/hadoop namenode -format
./bin/hadoop-daemon.sh start namenode

# Wait for pid file
PID_FILE=$(echo hadoop-*-namenode.pid)
while [ ! -s $PID_FILE ]; do sleep 1; done
PID=$(cat $PID_FILE)

# Wait for the log
LOG_FILE=$(echo $HADOOP_LOG_DIR/hadoop-*-namenode-*.log)
while [ ! -s $LOG_FILE ]; do sleep 1; done

# It would be nice if there were a way to get these without grepping logs
while ! grep -q &amp;quot;IPC Server listener on&amp;quot; $LOG_FILE; do sleep 1; done
IPC_PORT=$(grep &amp;quot;IPC Server listener on&amp;quot; $LOG_FILE | sed 's/.* on \(.*\):.*/\1/')
while ! grep -q &amp;quot;Jetty bound to port&amp;quot; $LOG_FILE; do sleep 1; done
HTTP_PORT=$(grep &amp;quot;Jetty bound to port&amp;quot; $LOG_FILE | sed 's/.* to port \(.*\)$/\1/')

# Record the port number where everyone can see it
condor_chirp set_job_attr NameNodeIPCAddress \&amp;quot;$HOSTNAME:$IPC_PORT\&amp;quot;
condor_chirp set_job_attr NameNodeHTTPAddress \&amp;quot;$HOSTNAME:$HTTP_PORT\&amp;quot;

# While namenode is running, collect and report back stats
while kill -0 $PID; do
   # Collect stats and chirp them back into the job ad
   # Nothing to do.
   sleep 30
done
&lt;/pre&gt;&lt;/p&gt;
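&lt;p&gt;The port discovery at the end of the script relies on two grep/sed pairs. As an illustration only, the same extraction can be sketched in Python with regular expressions. The function name find_ports and the sample log lines are assumptions made for this sketch, not part of the original script; real NameNode log lines carry extra fields such as timestamps.&lt;/p&gt;

```python
import re

# Patterns mirror the two grep/sed pairs used in hdfs_namenode.sh.
IPC_RE = re.compile(r"IPC Server listener on (\d+):")
HTTP_RE = re.compile(r"Jetty bound to port (\d+)")


def find_ports(log_text):
    """Return (ipc_port, http_port); an entry is None until its line appears."""
    ipc = IPC_RE.search(log_text)
    http = HTTP_RE.search(log_text)
    return (
        int(ipc.group(1)) if ipc else None,
        int(http.group(1)) if http else None,
    )


# Hypothetical log excerpt, shortened for illustration.
SAMPLE_LOG = """\
INFO ipc.Server: IPC Server listener on 38174: starting
INFO mortbay.log: Jetty bound to port 60182
"""

if __name__ == "__main__":
    print(find_ports(SAMPLE_LOG))
```

&lt;p&gt;A caller would poll the log file and call find_ports until neither value is None, which matches the wait loops in the shell script.&lt;/p&gt;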
&lt;p&gt;The job description file is standard at this point.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;hdfs_namenode.job&lt;/b&gt;&lt;br /&gt;
&lt;pre class=&quot;brush: plain;&quot;&gt;
cmd = hdfs_namenode.sh
args = hadoop-1.0.1-bin.tar.gz

transfer_input_files = hadoop-1.0.1-bin.tar.gz

#output = namenode.$(cluster).out
#error = namenode.$(cluster).err

log = namenode.$(cluster).log

kill_sig = SIGTERM

# Want chirp functionality
+WantIOProxy = TRUE

should_transfer_files = yes
when_to_transfer_output = on_exit

requirements = HasJava =?= TRUE

queue
&lt;/pre&gt;&lt;/p&gt;
&lt;p&gt;In operation -&lt;/p&gt;
&lt;p&gt;Submit the namenode and find its endpoints,&lt;br /&gt;
&lt;pre class=&quot;brush: plain; gutter: false;&quot;&gt;
$ condor_submit hdfs_namenode.job
Submitting job(s).
1 job(s) submitted to cluster 208.

$ condor_q -long 208| grep NameNode       
NameNodeHTTPAddress = &amp;quot;eeyore.local:60182&amp;quot;
NameNodeIPCAddress = &amp;quot;eeyore.local:38174&amp;quot;
&lt;/pre&gt;&lt;/p&gt;
&lt;p&gt;Open a browser window to http://eeyore.local:60182 to find the cluster summary,&lt;br /&gt;
&lt;pre class=&quot;brush: plain; gutter: false;&quot;&gt;
1 files and directories, 0 blocks = 1 total. Heap Size is 44.81 MB / 888.94 MB (5%)
  Configured Capacity                   :        0 KB
  DFS Used                              :        0 KB
  Non DFS Used                          :        0 KB
  DFS Remaining                         :        0 KB
  DFS Used%                             :       100 %
  DFS Remaining%                        :         0 %
 Live Nodes                             :           0
 Dead Nodes                             :           0
 Decommissioning Nodes                  :           0
  Number of Under-Replicated Blocks     :           0
&lt;/pre&gt;&lt;/p&gt;
&lt;p&gt;Add a datanode,&lt;br /&gt;
&lt;pre class=&quot;brush: plain; gutter: false;&quot;&gt;
$ condor_submit -a NameNodeAddress=hdfs://eeyore.local:38174 hdfs_datanode.job 
Submitting job(s).
1 job(s) submitted to cluster 209.
&lt;/pre&gt;&lt;/p&gt;
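&lt;p&gt;The two manual steps above (read NameNodeIPCAddress from the job ad, then pass it to condor_submit -a) could be scripted. The following is a rough Python sketch under stated assumptions: the helper names parse_string_attrs and datanode_submit_args are hypothetical, and the input is assumed to be the text printed by condor_q -long, as in the sample output shown earlier.&lt;/p&gt;

```python
import re

# Matches lines such as: NameNodeIPCAddress = "eeyore.local:38174"
ATTR_RE = re.compile(r'^(\w+)\s*=\s*"(.*)"\s*$')


def parse_string_attrs(classad_text):
    """Collect string-valued attributes from `condor_q -long` style lines."""
    attrs = {}
    for line in classad_text.splitlines():
        match = ATTR_RE.match(line.strip())
        if match:
            attrs[match.group(1)] = match.group(2)
    return attrs


def datanode_submit_args(classad_text, job_file="hdfs_datanode.job"):
    """Build the condor_submit argument list for one more datanode."""
    ipc = parse_string_attrs(classad_text)["NameNodeIPCAddress"]
    return ["condor_submit", "-a", "NameNodeAddress=hdfs://" + ipc, job_file]


# Sample taken from the condor_q output shown in this post.
SAMPLE = '''NameNodeHTTPAddress = "eeyore.local:60182"
NameNodeIPCAddress = "eeyore.local:38174"
'''

if __name__ == "__main__":
    print(" ".join(datanode_submit_args(SAMPLE)))
```

&lt;p&gt;The resulting list can be handed to subprocess.run to submit each additional datanode.&lt;/p&gt;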
&lt;p&gt;Refresh the cluster summary,&lt;br /&gt;
&lt;pre class=&quot;brush: plain; gutter: false;&quot;&gt;
1 files and directories, 0 blocks = 1 total. Heap Size is 44.81 MB / 888.94 MB (5%)
  Configured Capacity                   :     9.84 GB
  DFS Used                              :       28 KB
  Non DFS Used                          :     9.54 GB
  DFS Remaining                         :   309.79 MB
  DFS Used%                             :         0 %
  DFS Remaining%                        :      3.07 %
 Live Nodes                             :           1
 Dead Nodes                             :           0
 Decommissioning Nodes                  :           0
  Number of Under-Replicated Blocks     :           0
&lt;/pre&gt;&lt;/p&gt;
&lt;p&gt;And another,&lt;br /&gt;
&lt;pre class=&quot;brush: plain; gutter: false;&quot;&gt;
$ condor_submit -a NameNodeAddress=hdfs://eeyore.local:38174 hdfs_datanode.job
Submitting job(s).
1 job(s) submitted to cluster 210.
&lt;/pre&gt;&lt;/p&gt;
&lt;p&gt;Refresh,&lt;br /&gt;
&lt;pre class=&quot;brush: plain; gutter: false;&quot;&gt;
1 files and directories, 0 blocks = 1 total. Heap Size is 44.81 MB / 888.94 MB (5%)
  Configured Capacity                   :    19.69 GB
  DFS Used                              :       56 KB
  Non DFS Used                          :    19.26 GB
  DFS Remaining                         :   435.51 MB
  DFS Used%                             :         0 %
  DFS Remaining%                        :      2.16 %
 Live Nodes                             :           2
 Dead Nodes                             :           0
 Decommissioning Nodes                  :           0
  Number of Under-Replicated Blocks     :           0
&lt;/pre&gt;&lt;/p&gt;
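&lt;p&gt;As a sanity check on the summaries, the DFS Remaining% figure can be reproduced from the other two numbers. This is a small Python sketch, assuming the summary page uses binary units (1 GB = 1024 MB); the function name remaining_percent is made up for this example.&lt;/p&gt;

```python
MB_PER_GB = 1024  # assumption: the cluster summary reports binary units


def remaining_percent(remaining_mb, capacity_gb):
    """DFS Remaining as a percentage of Configured Capacity, to 2 decimals."""
    return round(100.0 * remaining_mb / (capacity_gb * MB_PER_GB), 2)


if __name__ == "__main__":
    # One datanode: 309.79 MB remaining of 9.84 GB configured.
    print(remaining_percent(309.79, 9.84))
    # Two datanodes: 435.51 MB remaining of 19.69 GB configured.
    print(remaining_percent(435.51, 19.69))
```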
&lt;p&gt;These are all the building blocks necessary to run HDFS on scheduled resources.&lt;/p&gt;
</description>
	<pubDate>Mon, 16 Apr 2012 10:44:13 +0000</pubDate>
</item>
<item>
	<title>Spinning: Service as a Job: HDFS DataNode</title>
	<guid>http://spinningmatt.wordpress.com/?p=641</guid>
	<link>http://spinningmatt.wordpress.com/2012/04/04/service-as-a-job-hdfs-datanode/</link>
	<description>&lt;p&gt;Building on other examples of &lt;a href=&quot;http://spinningmatt.wordpress.com/tag/service-as-a-job/&quot;&gt;services run as jobs&lt;/a&gt;, such as &lt;a href=&quot;http://spinningmatt.wordpress.com/2011/02/27/service-as-a-job-the-tomcat-app-server/&quot;&gt;Tomcat&lt;/a&gt;, &lt;a href=&quot;http://spinningmatt.wordpress.com/2010/05/18/service-as-a-job-the-qpid-c-broker/&quot;&gt;Qpidd&lt;/a&gt; and &lt;a href=&quot;http://spinningmatt.wordpress.com/2011/12/05/service-as-a-job-memcached/&quot;&gt;memcached&lt;/a&gt;, here is an example for the &lt;a href=&quot;http://hadoop.apache.org/hdfs/&quot;&gt;Hadoop Distributed File System&lt;/a&gt;&amp;#8216;s &lt;a href=&quot;http://hadoop.apache.org/common/docs/current/hdfs_design.html#NameNode+and+DataNodes&quot;&gt;DataNode&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Below is the control script for the datanode. It mirrors the memcached script, but does not publish statistics. However, datanode statistics/metrics could be pulled and published.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;hdfs_datanode.sh&lt;/b&gt;&lt;br /&gt;
&lt;pre class=&quot;brush: bash;&quot;&gt;
#!/bin/bash -x

# condor_chirp in /usr/libexec/condor
export PATH=$PATH:/usr/libexec/condor

HADOOP_TARBALL=$1
NAMENODE_ENDPOINT=$2

# Note: bin/hadoop uses JAVA_HOME to find the runtime and tools.jar,
#       except tools.jar does not seem necessary therefore /usr works
#       (there's no /usr/lib/tools.jar, but there is /usr/bin/java)
export JAVA_HOME=/usr

# When we get SIGTERM, which Condor will send when
# we are kicked, kill off the datanode and gather logs
function term {
   ./bin/hadoop-daemon.sh stop datanode
# Useful if we can transfer data back
#   tar czf logs.tgz logs
#   cp logs.tgz $_CONDOR_SCRATCH_DIR
}

# Unpack
tar xzfv $HADOOP_TARBALL

# Move into tarball, inefficiently
cd $(tar tzf $HADOOP_TARBALL | head -n1)

# Configure,
#  . dfs.data.dir must be in _CONDOR_SCRATCH_DIR for cleanup
#  . address,http.address,ipc.address must be set to port 0 (ephemeral)
cat &amp;gt; conf/hdfs-site.xml &amp;lt;&amp;lt;EOF
&amp;lt;?xml version=&amp;quot;1.0&amp;quot;?&amp;gt;
&amp;lt;?xml-stylesheet type=&amp;quot;text/xsl&amp;quot; href=&amp;quot;configuration.xsl&amp;quot;?&amp;gt;
&amp;lt;configuration&amp;gt;
  &amp;lt;property&amp;gt;
    &amp;lt;name&amp;gt;fs.default.name&amp;lt;/name&amp;gt;
    &amp;lt;value&amp;gt;$NAMENODE_ENDPOINT&amp;lt;/value&amp;gt;
  &amp;lt;/property&amp;gt;
  &amp;lt;property&amp;gt;
    &amp;lt;name&amp;gt;dfs.data.dir&amp;lt;/name&amp;gt;
    &amp;lt;value&amp;gt;$_CONDOR_SCRATCH_DIR/data&amp;lt;/value&amp;gt;
  &amp;lt;/property&amp;gt;
  &amp;lt;property&amp;gt;
    &amp;lt;name&amp;gt;dfs.datanode.address&amp;lt;/name&amp;gt;
    &amp;lt;value&amp;gt;0.0.0.0:0&amp;lt;/value&amp;gt;
  &amp;lt;/property&amp;gt;
  &amp;lt;property&amp;gt;
    &amp;lt;name&amp;gt;dfs.datanode.http.address&amp;lt;/name&amp;gt;
    &amp;lt;value&amp;gt;0.0.0.0:0&amp;lt;/value&amp;gt;
  &amp;lt;/property&amp;gt;
  &amp;lt;property&amp;gt;
    &amp;lt;name&amp;gt;dfs.datanode.ipc.address&amp;lt;/name&amp;gt;
    &amp;lt;value&amp;gt;0.0.0.0:0&amp;lt;/value&amp;gt;
  &amp;lt;/property&amp;gt;
&amp;lt;/configuration&amp;gt;
EOF

# Try to shutdown cleanly
trap term SIGTERM

export HADOOP_CONF_DIR=$PWD/conf
export HADOOP_PID_DIR=$PWD
export HADOOP_LOG_DIR=$_CONDOR_SCRATCH_DIR/logs
./bin/hadoop-daemon.sh start datanode

# Wait for pid file
PID_FILE=$(echo hadoop-*-datanode.pid)
while [ ! -s $PID_FILE ]; do sleep 1; done
PID=$(cat $PID_FILE)

# Report back some static data about the datanode
# e.g. condor_chirp set_job_attr SomeAttr SomeData
# None at the moment.

# While the datanode is running, collect and report back stats
while kill -0 $PID; do
   # Collect stats and chirp them back into the job ad
   # None at the moment.
   sleep 30
done
&lt;/pre&gt;&lt;/p&gt;
&lt;p&gt;The job description below uses a templating technique. The description uses a variable &lt;code&gt;NameNodeAddress&lt;/code&gt;, which is not defined in the description. Instead, the value is provided as an argument to condor_submit. In fact, a complete job can be defined without a description file, e.g. &lt;code&gt;echo queue | condor_submit -a executable=/bin/sleep -a args=1d&lt;/code&gt;, but more on that some other time.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;hdfs_datanode.job&lt;/b&gt;&lt;br /&gt;
&lt;pre class=&quot;brush: plain;&quot;&gt;
# Submit w/ condor_submit -a NameNodeAddress=&amp;lt;address&amp;gt;
# e.g. &amp;lt;address&amp;gt; = hdfs://$HOSTNAME:2007

cmd = hdfs_datanode.sh
args = hadoop-1.0.1-bin.tar.gz $(NameNodeAddress)

transfer_input_files = hadoop-1.0.1-bin.tar.gz

# RFE: Ability to get output files even when job is removed
#transfer_output_files = logs.tgz
#transfer_output_remaps = &amp;quot;logs.tgz logs.$(cluster).tgz&amp;quot;
output = datanode.$(cluster).out
error = datanode.$(cluster).err

log = datanode.$(cluster).log

kill_sig = SIGTERM

# Want chirp functionality
+WantIOProxy = TRUE

should_transfer_files = yes
when_to_transfer_output = on_exit

requirements = HasJava =?= TRUE

queue
&lt;/pre&gt;&lt;/p&gt;
&lt;p&gt;hadoop-1.0.1-bin.tar.gz is available from &lt;a href=&quot;http://archive.apache.org/dist/hadoop/core/hadoop-1.0.1/&quot;&gt;http://archive.apache.org/dist/hadoop/core/hadoop-1.0.1/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Finally, here is a running example,&lt;/p&gt;
&lt;p&gt;A namenode is already running -&lt;br /&gt;
&lt;pre class=&quot;brush: plain; gutter: false;&quot;&gt;
$ ./bin/hadoop dfsadmin -conf conf/hdfs-site.xml -report
Configured Capacity: 0 (0 KB)
Present Capacity: 0 (0 KB)
DFS Remaining: 0 (0 KB)
DFS Used: 0 (0 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 0 (0 total, 0 dead)
&lt;/pre&gt;&lt;/p&gt;
&lt;p&gt;Submit a datanode, knowing the namenode&amp;#8217;s IPC port is 9000 -&lt;br /&gt;
&lt;pre class=&quot;brush: plain; gutter: false;&quot;&gt;
$ condor_submit -a NameNodeAddress=hdfs://$HOSTNAME:9000 hdfs_datanode.job
Submitting job(s).
1 job(s) submitted to cluster 169.

$ condor_q
-- Submitter: eeyore.local : &amp;lt;127.0.0.1:59889&amp;gt; : eeyore.local
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD 
 169.0   matt            3/26 15:17   0+00:00:16 R  0   0.0 hdfs_datanode.sh h
1 jobs; 0 idle, 1 running, 0 held
&lt;/pre&gt;&lt;/p&gt;
&lt;p&gt;Storage is now available, though not very much as my disk is almost full -&lt;br /&gt;
&lt;pre class=&quot;brush: plain; gutter: false;&quot;&gt;
$ ./bin/hadoop dfsadmin -conf conf/hdfs-site.xml -report
Configured Capacity: 63810015232 (59.43 GB)
Present Capacity: 4907495424 (4.57 GB)
DFS Remaining: 4907466752 (4.57 GB)
DFS Used: 28672 (28 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)
&lt;/pre&gt;&lt;/p&gt;
&lt;p&gt;Submit 9 more datanodes -&lt;br /&gt;
&lt;pre class=&quot;brush: plain; gutter: false;&quot;&gt;
$ condor_submit -a NameNodeAddress=hdfs://$HOSTNAME:9000 hdfs_datanode.job
Submitting job(s).
1 job(s) submitted to cluster 170.
...
1 job(s) submitted to cluster 178.

$ ./bin/hadoop dfsadmin -conf conf/hdfs-site.xml -report
Configured Capacity: 638100152320 (594.28 GB)
Present Capacity: 40399958016 (37.63 GB)
DFS Remaining: 40399671296 (37.63 GB)
DFS Used: 286720 (280 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 10 (10 total, 0 dead)
&lt;/pre&gt;&lt;/p&gt;
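&lt;p&gt;The repeated submissions above could also be scripted. Here is a minimal sketch in Python, assuming &lt;code&gt;condor_submit&lt;/code&gt; is on the PATH and the job file is the &lt;code&gt;hdfs_datanode.job&lt;/code&gt; shown earlier. The helper names &lt;code&gt;datanode_submit_command&lt;/code&gt; and &lt;code&gt;submit_datanodes&lt;/code&gt; are hypothetical and not part of Condor -&lt;/p&gt;

```python
import subprocess


def datanode_submit_command(namenode_address, job_file="hdfs_datanode.job"):
    """Build the condor_submit argument list used in the example above."""
    # -a appends NameNodeAddress=... to the description, which fills in
    # $(NameNodeAddress) in the job file.
    return ["condor_submit", "-a", "NameNodeAddress=" + namenode_address, job_file]


def submit_datanodes(namenode_address, count, run=subprocess.check_call):
    """Submit `count` datanode jobs, one cluster per condor_submit call."""
    command = datanode_submit_command(namenode_address)
    for _ in range(count):
        # `run` is injectable (an assumption of this sketch) so the loop
        # can be exercised without a Condor pool.
        run(command)
    return count
```

&lt;p&gt;With your namenode host in place of HOST, &lt;code&gt;submit_datanodes(&quot;hdfs://HOST:9000&quot;, 9)&lt;/code&gt; would issue the nine submissions; passing a different &lt;code&gt;run&lt;/code&gt; callable gives a dry run.&lt;/p&gt;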
&lt;p&gt;At this point you can run a workload against the storage, visit the namenode at http://localhost:50070, or simply use &lt;code&gt;./bin/hadoop fs&lt;/code&gt; to interact with the filesystem.&lt;/p&gt;
&lt;p&gt;Remember, all the datanodes were dispatched by a scheduler, run alongside the existing workload on your pool, and are completely manageable by standard policies.&lt;/p&gt;
</description>
	<pubDate>Wed, 04 Apr 2012 09:15:10 +0000</pubDate>
</item>
<item>
	<title>Derek's Blog: OSG AHM 2012</title>
	<guid>tag:blogger.com,1999:blog-3007054864987759910.post-6577510568944472528</guid>
	<link>http://derekweitzel.blogspot.com/2012/03/osg-ahm-2012.html</link>
	<description>&lt;div class=&quot;separator&quot;&gt;&lt;a href=&quot;http://2.bp.blogspot.com/-QV_4QT3zIYg/T3I7W1_ZXpI/AAAAAAAAA7Q/KZE54209E7U/s1600/welcomesign.png&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;142&quot; src=&quot;http://2.bp.blogspot.com/-QV_4QT3zIYg/T3I7W1_ZXpI/AAAAAAAAA7Q/KZE54209E7U/s400/welcomesign.png&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;This year's &lt;a href=&quot;https://indico.fnal.gov/conferenceDisplay.py?confId=5109&quot;&gt;all hands&lt;/a&gt; meeting was a great success! &amp;nbsp;There were a few sessions that I really enjoyed.&lt;br /&gt;&lt;br /&gt;&lt;a href=&quot;http://flic.kr/s/aHsjyJGT8G&quot;&gt;AHM Pictures&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;&lt;b&gt; Campus Caucus&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;There were many user engagement people there. &amp;nbsp;I believe that we reached a consensus that there isn't much of an engagement community. &lt;br /&gt;&lt;br /&gt;For example, Wisconsin has a great method for distributing and running MATLAB and R applications on the OSG, but there has been no knowledge transfer to other engagement folks. &amp;nbsp;I know a few UNL users that have wanted to run MATLAB on our resources. &amp;nbsp;If we could move an HCC MATLAB workflow to the Grid, I believe that would be a great success.&lt;br /&gt;&lt;br /&gt;I completely agree that there is no Engagement 'community'. &amp;nbsp;But I think that's true of most of the OSG. &amp;nbsp;Though, there have been many improvements recently. &lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;I think the centralized Jira has helped tremendously. &amp;nbsp;It's very easy to see what other people have been working on and even the general direction of progress. &amp;nbsp;Though this only works for OSG 'employees' and OSG projects.&lt;/li&gt;&lt;li&gt;The OSG blogs have been successful for the technology group to explain what they are working on. 
&amp;nbsp;Though, I wish they had shorter, more frequent posts.&lt;/li&gt;&lt;/ul&gt;I hope that the blogs can be a way to spread the OSG Engagement activity. &amp;nbsp;It's also a great way to point to code and work that is being done. &amp;nbsp;Also, blog posts shouldn't be limited to things that the author is doing, but could point to what other people are doing. &amp;nbsp;For example, I knew nothing about the Rosetta people at Wisconsin until my poster was set up next to theirs and I was able to have a conversation. &amp;nbsp;It would have been great to see some information about what they were doing outside of the once-a-year meeting.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;&lt;b&gt; Science on the OSG&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;div&gt;&lt;a href=&quot;https://indico.fnal.gov/getFile.py/access?contribId=12&amp;amp;sessionId=5&amp;amp;resId=0&amp;amp;materialId=slides&amp;amp;confId=5109&quot;&gt;Talk&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I thought this talk by Frank was great. &amp;nbsp;I felt he had the same impression we all had, that protein processing was becoming a very large user of the OSG. &amp;nbsp;We've seen this at HCC with CPASS, CS-Rosetta, and Autodock. 
&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;a href=&quot;http://3.bp.blogspot.com/-bEXZ-VfTDDg/T3Ini9rxNcI/AAAAAAAAA7A/cw675px8UeY/s1600/Screen+Shot+2012-03-27+at+3.47.43+PM.png&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;246&quot; src=&quot;http://3.bp.blogspot.com/-bEXZ-VfTDDg/T3Ini9rxNcI/AAAAAAAAA7A/cw675px8UeY/s400/Screen+Shot+2012-03-27+at+3.47.43+PM.png&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class=&quot;tr-caption&quot;&gt;Walltime usage for non-HEP&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div&gt;Frank also pointed out a graph of usage. &amp;nbsp;At the end of the graph, there seems to be a plateau. &amp;nbsp;Possibly we are hitting opportunistic resource limits?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;Here's an updated usage graph&amp;nbsp;(&lt;a href=&quot;http://gratiaweb.grid.iu.edu/gratia/xml/osg_wall_hours?endtime=2012-03-27+23%3A59%3A59&amp;amp;span=604800&amp;amp;facility=.*&amp;amp;probe=.*&amp;amp;resource-type=^Batch%24&amp;amp;vo=.*&amp;amp;role=.*&amp;amp;user=.*&amp;amp;starttime=2007-06-01+00%3A00%3A00&amp;amp;exclude-facility=NONE|Generic|Obsolete&amp;amp;exclude-user=NONE&amp;amp;includeSuccess=true&amp;amp;exclude-vo=unknown|other|vo|atlas|cdf|dzero|cms|Unknown&amp;amp;includeFailed=true&amp;amp;exclude-role=NONE&quot;&gt;source&lt;/a&gt;):&lt;br /&gt;&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;a href=&quot;http://2.bp.blogspot.com/-6p85DUqrB_E/T3IzsDtuaOI/AAAAAAAAA7I/R7ScyhPveBI/s1600/osg_wall_hours.png&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;250&quot; src=&quot;http://2.bp.blogspot.com/-6p85DUqrB_E/T3IzsDtuaOI/AAAAAAAAA7I/R7ScyhPveBI/s400/osg_wall_hours.png&quot; 
width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class=&quot;tr-caption&quot;&gt;Walltime usage for non-HEP updated&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&amp;nbsp;The thing to note is the explosive growth of GLOW VO. &amp;nbsp;Their usage has increased dramatically recently.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;span&gt;&lt;b&gt; OSG in 2017&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;div&gt;&lt;a href=&quot;https://indico.fnal.gov/getFile.py/access?contribId=19&amp;amp;sessionId=5&amp;amp;resId=0&amp;amp;materialId=slides&amp;amp;confId=5109&quot;&gt;Talk&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I really liked to see what people thought the OSG would look like in 2017.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Chander's prediction that people will come to us to use the OSG. &amp;nbsp;I believe this will take a critical mass of users. &amp;nbsp;I think we have a good product to sell, we just need publicity. &amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;Chander's comment on data is also important. &amp;nbsp;But I believe the problem with data isn't&amp;nbsp;necessarily&amp;nbsp;storage, but it's access to the data. &amp;nbsp;Take for example Dropbox. &amp;nbsp;For free, they offer very little storage. &amp;nbsp;The main advantage is that it's&amp;nbsp;accessible&amp;nbsp;from anywhere, laptop, desktop, iphone, web... &amp;nbsp;I think a uniform data access method can get us a lot further than distributed storage.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;Alain's prediction that we will be using more community software. &amp;nbsp;This will take a large effort to be part of the distribution's community. &amp;nbsp;I foresee us contributing packages, patches, and effort to Fedora EPEL and possibly Ubuntu. 
&amp;nbsp;I think we are making great strides with the packaging, and would like to continue injecting us into the Fedora community.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;span&gt;&lt;b&gt; Nebraska Campus Infrastructure&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;div&gt;&lt;a href=&quot;https://indico.fnal.gov/getFile.py/access?contribId=48&amp;amp;sessionId=5&amp;amp;resId=0&amp;amp;materialId=slides&amp;amp;confId=5109&quot;&gt;Talk&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Of course, my talk is worth looking at.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This post ended up larger than I was hoping for. &amp;nbsp; Oh well.&lt;/div&gt;</description>
	<pubDate>Tue, 27 Mar 2012 17:36:34 +0000</pubDate>
	<author>noreply@blogger.com (Derek Weitzel)</author>
</item>
<item>
	<title>Condor Project News: Two Condor papers selected as most influential in HPDC (March 25, 2012)</title>
	<guid>https://www.cs.wisc.edu/node/9123</guid>
	<link>https://www.cs.wisc.edu/node/9123</link>
	<description>Two papers by the Condor Team were selected to represent the most influential papers in the history of The International ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC). Details...</description>
	<pubDate>Sun, 25 Mar 2012 05:00:00 +0000</pubDate>
</item>
<item>
	<title>Spinning: Daylight savings, costs CPU</title>
	<guid>http://spinningmatt.wordpress.com/?p=624</guid>
	<link>http://spinningmatt.wordpress.com/2012/03/20/daylight-savings-costs-cpu/</link>
	<description>&lt;p&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Daylight_saving_time&quot;&gt;&lt;img src=&quot;http://upload.wikimedia.org/wikipedia/commons/2/29/DaylightSaving-World-Subdivisions.png&quot; alt=&quot;World daylight savings map&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://erikerlandson.github.com/&quot;&gt;Erik&lt;/a&gt; ran into a surprising &lt;a href=&quot;http://erikerlandson.github.com/blog/2012/03/19/interaction-between-mktime-and-tm-isdst-a-compute-cycle-landmine/&quot;&gt;performance problem&lt;/a&gt; over the weekend.&lt;/p&gt;
&lt;p&gt;If you live in an orange area of the map, &lt;code&gt;mktime(3)&lt;/code&gt; may be killing your performance.&lt;/p&gt;
&lt;p&gt;tm_isdst is important. Don&amp;#8217;t guess!&lt;/p&gt;
&lt;p&gt;Don&amp;#8217;t construct &lt;code&gt;struct tm&lt;/code&gt; manually. Or, even better, let &lt;code&gt;mktime&lt;/code&gt; decide DST for you. Set &lt;code&gt;tm_isdst = -1&lt;/code&gt;.&lt;/p&gt;
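&lt;p&gt;The same advice carries over to Python, whose &lt;code&gt;time.mktime&lt;/code&gt; in CPython is a thin wrapper over the C library call. A minimal sketch, where &lt;code&gt;local_to_epoch&lt;/code&gt; is a hypothetical helper name and the only point is the &lt;code&gt;-1&lt;/code&gt; in the last tuple slot -&lt;/p&gt;

```python
import time


def local_to_epoch(year, month, day, hour=0, minute=0, second=0):
    """Convert local wall-clock fields to seconds since the epoch."""
    # Tuple layout follows time.struct_time. mktime ignores the weekday and
    # day-of-year inputs, so placeholders are fine there. The last field is
    # tm_isdst: -1 means "unknown", so the C library works out DST itself.
    fields = (year, month, day, hour, minute, second, 0, 1, -1)
    return time.mktime(fields)
```

&lt;p&gt;Passing 0 or 1 in that last slot instead would be the guess the post warns against.&lt;/p&gt;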
&lt;p&gt;Are you impacted? Run your programs under callgrind, vary between Standard and Daylight, and look at mktime&amp;#8217;s profile. Using Erik&amp;#8217;s jig,&lt;/p&gt;
&lt;p&gt;&lt;pre class=&quot;brush: plain; gutter: false;&quot;&gt;
$ TZ=EDT valgrind --tool=callgrind ./test_mktime
...
$ TZ=EST valgrind --tool=callgrind ./test_mktime
...
$ callgrind_annotate --inclusive=yes --tree=calling ...
...

$ grep &amp;quot;mktime &amp;quot; EST.txt EDT.txt 
EST.txt:Profiled target:  ./test_mktime (PID 29595, part 1)
EST.txt:    1,765  &amp;gt;   ???:mktime (1x) [/lib64/libc-2.5.so]
EST.txt:  546,244  *  ???:mktime [/lib64/libc-2.5.so]
EST.txt:  542,350  &amp;gt;   ???:mktime (1x) [/lib64/libc-2.5.so]

EDT.txt:Profiled target:  ./test_mktime (PID 18878, part 1)
EDT.txt:    4,819  &amp;gt;   ???:mktime (1x) [/lib64/libc-2.5.so]
EDT.txt:   23,119  *  ???:mktime [/lib64/libc-2.5.so]
&lt;/pre&gt;&lt;/p&gt;
&lt;p&gt;It is currently EDT.&lt;/p&gt;
</description>
	<pubDate>Tue, 20 Mar 2012 12:56:24 +0000</pubDate>
</item>
<item>
	<title>Derek's Blog: Burning the LiveUSBs</title>
	<guid>tag:blogger.com,1999:blog-3007054864987759910.post-3654986359753786280</guid>
	<link>http://derekweitzel.blogspot.com/2012/03/burning-liveusbs.html</link>
	<description>In my &lt;a href=&quot;http://derekweitzel.blogspot.com/2012/02/building-osg-client-liveusb.html&quot;&gt;last post&lt;/a&gt;, I talked about the OSG LiveUSBs. &amp;nbsp;Now that the conference is next week, I have started burning the USBs with the image. &lt;br /&gt;&lt;br /&gt;&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;a href=&quot;http://2.bp.blogspot.com/-7P6BPLqYAw0/T2OhuakWsEI/AAAAAAAAA6s/Lqm4f3lSEu8/s1600/keys_piles.JPG&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;480&quot; src=&quot;http://2.bp.blogspot.com/-7P6BPLqYAw0/T2OhuakWsEI/AAAAAAAAA6s/Lqm4f3lSEu8/s640/keys_piles.JPG&quot; width=&quot;640&quot; /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class=&quot;tr-caption&quot;&gt;USB Key Piles&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class=&quot;separator&quot;&gt;&lt;br /&gt;&lt;/div&gt;I burned 4 USBs at a time, using the script below. &amp;nbsp;Parted didn't work, and I never really found out why. &amp;nbsp;The symptoms were that the USBs would not boot, but they were readable by Macs. &amp;nbsp;So I scripted fdisk.&lt;br /&gt;&lt;script src=&quot;https://gist.github.com/2052428.js&quot;&gt;&lt;/script&gt;</description>
	<pubDate>Fri, 16 Mar 2012 16:37:02 +0000</pubDate>
	<author>noreply@blogger.com (Derek Weitzel)</author>
</item>
<item>
	<title>Condor Project News: Congratulations to Victor Ruotti (March 15, 2012)</title>
	<guid>http://bit.ly/wfNCTt</guid>
	<link>http://bit.ly/wfNCTt</link>
	<description>computational biologist at the Morgridge Institute for Research, winner of the CycleCloud BigScience Challenge 2011. Read more about both the challenge and the finalists in the Cycle Computing press release.</description>
	<pubDate>Thu, 15 Mar 2012 05:00:00 +0000</pubDate>
</item>
<item>
	<title>OSG Technology Area Rumblings: Resource Isolation in Condor using cgroups</title>
	<guid>tag:blogger.com,1999:blog-8803173202887660937.post-2804145841486035015</guid>
	<link>http://osgtech.blogspot.com/2012/03/resource-isolation-in-condor-using.html</link>
	<description>This is the last in my series on job isolation techniques. &amp;nbsp;It has spanned several postings over the last month, so it may help to recap:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href=&quot;http://osgtech.blogspot.com/2012/02/job-isolation-in-condor.html&quot;&gt;Part I&lt;/a&gt; covered process isolation: preventing processes in one job from interacting with other jobs. &amp;nbsp;This has been achievable through POSIX mechanisms for a while, but the new PID namespace mechanisms provide improved isolation for jobs running as the same user.&lt;/li&gt;&lt;li&gt;&lt;a href=&quot;http://osgtech.blogspot.com/2012/02/file-isolation-using-bind-mounts-and.html&quot;&gt;Part II&lt;/a&gt;&amp;nbsp;and &lt;a href=&quot;http://osgtech.blogspot.com/2012/02/improving-file-isolation-with-chroot.html&quot;&gt;Part III&lt;/a&gt; discussed file isolation using bind mounts and chroots. &amp;nbsp;Condor uses bind mounts to remove access to &quot;problematic&quot; directories such as /tmp. &amp;nbsp;While more complex to set up, chroots allow jobs to run in a completely separate environment from the host and further isolate the job sandbox.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;This post will cover &lt;i&gt;resource isolation&lt;/i&gt;:&amp;nbsp;preventing jobs from consuming system resources promised to another job.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Condor has always had some crude form of resource isolation. &amp;nbsp;For example, the worker node could be configured to detect when the processes in a job have more CPU time than walltime (a rough indication that more than one core is being used) or when the sum of each process's virtual memory size exceeds the memory requested for the job. 
&amp;nbsp;When Condor detects too many resources are being consumed, it can take an action such as suspending or killing the job.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This traditional approach is relatively unsatisfactory for a few reasons:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;Condor periodically polls to view resource consumption. &amp;nbsp;Any activity between polls is unmonitored.&lt;/li&gt;&lt;li&gt;The metrics Condor traditionally monitors are limited to memory and CPU, where the memory metrics are poor quality for complex jobs. &amp;nbsp;The sum of many processes' virtual memory sizes, on a modern Linux box, has little correlation with RAM used and is not particularly meaningful.&lt;/li&gt;&lt;li&gt;We can do little with the system besides detect when resource limits have been violated and kill the job.&lt;/li&gt;&lt;ul&gt;&lt;li&gt;We cannot, for example, simply instruct the kernel to reduce the job's memory or CPU usage.&lt;/li&gt;&lt;li&gt;Accordingly, users must ask for &lt;b&gt;peak&lt;/b&gt;&amp;nbsp;resource usage, which may be well above &lt;b&gt;average&lt;/b&gt;&amp;nbsp;resource usage, &lt;i&gt;decreasing overall throughput&lt;/i&gt;. &amp;nbsp;If the job needs 2GB on average but 4GB for a single second, the user will ask for 4GB; the other 2GB will be un-utilized.&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div&gt;In Linux, the oldest form of resource isolation is processor affinity or CPU pinning: a job can be locked to a specific CPU, and all its processes will inherit the affinity. &amp;nbsp;Because two jobs are locked to separate CPUs, they will never consume each other's CPU resources. &amp;nbsp;CPU pinning is unsatisfactory for reasons similar to memory: jobs can't utilize otherwise-idle CPUs, decreasing potential system throughput. &amp;nbsp;The granularity is also poor: you can't evenly fairshare 25 jobs on a machine with 24 cores as each job must be locked to at least one core. 
&amp;nbsp;However, it's a step forward - you don't need to kill jobs for using too much CPU - and has been present in Condor since 7.3.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Newer Linux kernels support &lt;a href=&quot;http://osgtech.blogspot.com/2011/07/part-iii-bulletproof-process-tracking.html&quot;&gt;cgroups&lt;/a&gt;, which are structures for managing groups of processes, and provide &lt;i&gt;controllers&lt;/i&gt;&amp;nbsp;for managing resources in each cgroup. &amp;nbsp;In Condor 7.7.0, cgroup support was added for measuring resource usage. &amp;nbsp;When enabled, Condor will place each job into a dedicated cgroup for the block-I/O, memory, CPU, and &quot;freezer&quot; controllers. &amp;nbsp;&lt;a href=&quot;https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2734&quot;&gt;We have implemented&lt;/a&gt;&amp;nbsp;two new limiting mechanisms based on the memory and CPU controllers.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The CPU controller provides a mechanism for fairsharing between different cgroups. &amp;nbsp;CPU shares are assigned to jobs based on the &quot;slot weight&quot; (by default, equal to the number of cores the job requested). &amp;nbsp;Thus, a job asking for 2 cores will get an average of 2 cores on a fully loaded system. &amp;nbsp;If there's an idle CPU, it could utilize more than 2 cores; however, it will never get less than what it requested for a significant amount of time. &amp;nbsp;CPU fairsharing provides a much finer granularity than pinning, easily allowing the jobs-to-cores ratio to be non-integer.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The memory controller provides two kinds of limits: soft and hard. &amp;nbsp;When &lt;i&gt;soft&lt;/i&gt; limits are in place, the job can use an arbitrary amount of RAM until the host runs out of memory (and starts to swap); when this happens, only jobs over their limit are swapped out. 
&amp;nbsp;With &lt;i&gt;hard&lt;/i&gt;&amp;nbsp;limits, the job immediately starts swapping once it hits its RAM limit, regardless of the amount of free memory. &amp;nbsp;Both soft and hard limits default to the amount of memory requested for the job.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Both methods also have disadvantages. &amp;nbsp;Soft limits can cause &quot;well-behaved&quot; processes to wait while the OS frees up RAM from &quot;badly behaving&quot; processes. &amp;nbsp;Hard limits can cause large amounts of swapping (for example, if there's a memory leak), decreasing the entire node's disk performance and thus adversely affecting other jobs. &amp;nbsp;In fact, it may be a better use of resources to preempt a heavily-swapping process and reschedule it on another node than to let it continue running. &amp;nbsp;There is further room for improvement here in the future.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Regardless, cgroups and controllers provide a solid improvement in resource isolation for Condor, and finish up our series on job isolation. &amp;nbsp;Thanks for reading!&lt;/div&gt;</description>
	<pubDate>Sat, 10 Mar 2012 11:28:31 +0000</pubDate>
	<author>noreply@blogger.com (Brian Bockelman)</author>
</item>
<item>
	<title>Spinning: Best practices: batch applications input</title>
	<guid>http://spinningmatt.wordpress.com/?p=617</guid>
	<link>http://spinningmatt.wordpress.com/2012/03/08/best-practices-batch-applications-input/</link>
	<description>&lt;p&gt;For application developers and their users.&lt;/p&gt;
&lt;p&gt;An application that is going to be run in batch mode &amp;#8212; non-interactive, maybe scheduled to remote resources &amp;#8212; is going to need input[0]. That input might be a few numbers or numerous data files. As an application developer, the case of data files can be difficult to get right. There are many ways to handle data files and &lt;b&gt;one big pitfall: hard-coded paths&lt;/b&gt;.&lt;/p&gt;
&lt;p&gt;Below are a few options for application developers and the resulting work for a Condor user of the application.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;The best approach&lt;/b&gt; is to read input from stdin or have paths to data files passed via arguments. Doing so shows the developer has batch processing in mind, and provides the application&amp;#8217;s user with clear options.&lt;/p&gt;
&lt;p&gt;A submit file for an application that reads from stdin -&lt;/p&gt;
&lt;pre&gt;
executable = batch_app
input = input.dat
queue
&lt;/pre&gt;
&lt;p&gt;A submit file for an application that takes data files as arguments -&lt;/p&gt;
&lt;pre&gt;
executable = batch_app
arguments = --input=input.dat
transfer_input_files = input.dat
queue
&lt;/pre&gt;
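&lt;p&gt;As a rough sketch of the application side (not taken from any real &lt;code&gt;batch_app&lt;/code&gt;), the Python below accepts data either on stdin or through an &lt;code&gt;--input&lt;/code&gt; argument, so either submit file above would work. The function name &lt;code&gt;read_input&lt;/code&gt; and the in-memory demo stream are assumptions made for illustration.&lt;/p&gt;

```python
import argparse
import io
import sys


def read_input(argv, stdin=sys.stdin):
    """Return the input text from --input if given, otherwise from stdin."""
    parser = argparse.ArgumentParser(prog="batch_app")
    # --input is optional; when it is omitted the data is expected on stdin.
    parser.add_argument("--input", default=None, help="path to the data file")
    args = parser.parse_args(argv)
    if args.input is None:
        return stdin.read()
    with open(args.input) as handle:
        return handle.read()


# Demo: an in-memory stream stands in for the file named by "input = input.dat".
demo = read_input([], io.StringIO("1 2 3\n"))
print(demo.split())
```

&lt;p&gt;Because no path is baked into the code, the same program runs unchanged whether Condor wires up stdin or transfers a named file.&lt;/p&gt;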
&lt;p&gt;&lt;b&gt;A middle ground approach&lt;/b&gt; may be necessary if the set of input files is large or their relationships are complex. In such a case, a meta-data file can be used, or the input files can be laid out in a well-defined pattern in the filesystem. Note: &amp;#8220;well-defined pattern in the filesystem&amp;#8221; is often a myth.&lt;/p&gt;
&lt;p&gt;Of these approaches, the meta-data file is preferred. It makes the input files and their relationships explicit. However, it can be more difficult for the application&amp;#8217;s user from a Condor perspective. When the files are instead laid out in the filesystem, the tendency is for the application to have neither a well-defined layout nor a definition of that layout maintained independently of the application.&lt;/p&gt;
&lt;p&gt;A submit file for an application that takes a meta-data file -&lt;/p&gt;
&lt;pre&gt;
executable = batch_app
arguments = --input=input.meta
transfer_input_files = input.meta[,all the files listed in input.meta]
queue
&lt;/pre&gt;
&lt;p&gt;The difficulty comes in listing all the files from the input.meta. This is often mitigated by providing URIs, or paths, in input.meta that may point into a shared filesystem. The files in a shared filesystem need not be transferred by Condor and need not be listed in &lt;code&gt;transfer_input_files&lt;/code&gt;.&lt;/p&gt;
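&lt;p&gt;One way to ease the listing, sketched in Python under assumptions: suppose input.meta holds one path or URI per line (a hypothetical format), and that anything under &lt;code&gt;/shared/&lt;/code&gt; (a placeholder prefix) or written as a URI is reachable without Condor&amp;#8217;s file transfer. The helper &lt;code&gt;transfer_list&lt;/code&gt; is illustrative only; it prints a line that could be pasted into a submit file.&lt;/p&gt;

```python
import io


def transfer_list(meta_name, meta_lines, shared_prefixes=("/shared/",)):
    """Build a transfer_input_files value from the entries of a meta-data file."""
    files = [meta_name]  # the meta-data file itself always travels with the job
    for line in meta_lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue  # skip blank lines and comments
        if "://" in entry or entry.startswith(tuple(shared_prefixes)):
            continue  # assumed reachable without Condor's file transfer
        if entry not in files:
            files.append(entry)  # keep each local file once, in order
    return ",".join(files)


# Demo with an in-memory meta-data file (hypothetical contents).
meta = io.StringIO("a.dat\n# calibration\n/shared/big.dat\nb.dat\n")
print("transfer_input_files =", transfer_list("input.meta", meta))
```

&lt;p&gt;Generating the list from the meta-data file keeps the submit file and the application&amp;#8217;s view of its inputs from drifting apart.&lt;/p&gt;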
&lt;p&gt;A submit file for an application that takes a hopefully-well-defined filesystem layout -&lt;/p&gt;
&lt;pre&gt;
executable = batch_app
arguments = --input=data_dir
transfer_input_files = data_dir
queue
&lt;/pre&gt;
&lt;p&gt;This is simpler because Condor will transfer everything under data_dir into the job&amp;#8217;s scratch space and keep it under a directory called data_dir. Often, the data_dir will even exist on a shared filesystem and will not need to be transferred (remove &lt;code&gt;transfer_input_files = data_dir&lt;/code&gt; and provide full path with &lt;code&gt;--input&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;Note: &lt;code&gt;transfer_input_files = data_dir/&lt;/code&gt; will not replicate the directory tree in the job&amp;#8217;s scratch space. It will be collapsed.&lt;/p&gt;
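&lt;p&gt;On the application side, a directory passed with &lt;code&gt;--input&lt;/code&gt; can be walked relative to wherever it lands, with no absolute paths in the code. The Python sketch below is a minimal illustration; &lt;code&gt;list_inputs&lt;/code&gt; and the file names in the demo are made up.&lt;/p&gt;

```python
import os
import tempfile


def list_inputs(data_dir):
    """Return every file under data_dir as a sorted list of relative paths."""
    found = []
    for root, _dirs, names in os.walk(data_dir):
        for name in names:
            full = os.path.join(root, name)
            found.append(os.path.relpath(full, data_dir))
    return sorted(found)


# Demo: build a throwaway directory tree and list it.
with tempfile.TemporaryDirectory() as data_dir:
    os.makedirs(os.path.join(data_dir, "run1"))
    for rel in ("params.txt", os.path.join("run1", "events.dat")):
        open(os.path.join(data_dir, rel), "w").close()
    print(list_inputs(data_dir))
```

&lt;p&gt;Working from relative paths means the same code handles a directory in the job&amp;#8217;s scratch space and one on a shared filesystem.&lt;/p&gt;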
&lt;p&gt;These two approaches can be combined to get the best of both.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;The worst approach&lt;/b&gt; is really a non-approach and involves hard-coding paths into the application. Arguably the application does not have a batch mode. It will fail when not run in its expected environment, which may simply mean by a user different from the developer, or on a new shared filesystem, or an old shared filesystem with new mounts. These applications should be avoided or modified to provide a batch mode.&lt;/p&gt;
&lt;p&gt;Developers beware, you can turn near success with a meta-data file into a failure by hard-coding its path.&lt;/p&gt;
&lt;p&gt;Takeaway -&lt;/p&gt;
&lt;p&gt;For developers, an application that has a batch processing mode will parametrize all its inputs[1].&lt;/p&gt;
&lt;p&gt;For users, beware of applications that operate on data that you have not provided.&lt;/p&gt;
&lt;p&gt;[0] Even if it is just a random seed.&lt;br /&gt;
[1] Database or URI connections to get data also matter.&lt;/p&gt;
</description>
	<pubDate>Thu, 08 Mar 2012 05:16:16 +0000</pubDate>
</item>
<item>
	<title>Condor Project News: Condor in EC2 now supported by Red Hat (March 2, 2012)</title>
	<guid>http://www.redhat.com/about/news/press-archive/2012/2/Red-Hat-Expands-Subscription-Portability-to-the-Cloud-with-Amazon-Web-Services</guid>
	<link>http://www.redhat.com/about/news/press-archive/2012/2/Red-Hat-Expands-Subscription-Portability-to-the-Cloud-with-Amazon-Web-Services</link>
	<description>Red Hat Enterprise MRG support expands into the cloud as announced in this press release, by leveraging the EC2 job universe in Condor, and by maintaining supported images of Red Hat Enterprise Linux with Condor pre-installed on Amazon storage services.  Red Hat MRG can schedule local grids, remote grids, virtual machines for internal clouds, and now, rented cloud infrastructure. There is also documentation of this feature.</description>
	<pubDate>Fri, 02 Mar 2012 05:00:00 +0000</pubDate>
</item>
<item>
	<title>Condor Project News: Condor 7.7.5 released (February 28, 2012)</title>
	<guid>http://research.cs.wisc.edu/condor/manual/v7.7/8_3Development_Release.html</guid>
	<link>http://research.cs.wisc.edu/condor/manual/v7.7/8_3Development_Release.html</link>
	<description>The Condor team is pleased to announce the newest release in our development series, 7.7.5.  This release represents the feature freeze for Condor version 7.8, and we expect it to be the penultimate release in the 7.7 development series. New features in this release include better statistics for monitoring a Condor pool, better support for absent ads in the collector, fast claiming of partitionable slots, and support for some newer Linux kernel features to better support process isolation. Please see the release notes for a complete list.  Condor binaries and source code are available from our Downloads page.</description>
	<pubDate>Tue, 28 Feb 2012 05:00:00 +0000</pubDate>
</item>
<item>
	<title>OSG Technology Area Rumblings: Improving File Isolation with chroot</title>
	<guid>tag:blogger.com,1999:blog-8803173202887660937.post-4833728643726960817</guid>
	<link>http://osgtech.blogspot.com/2012/02/improving-file-isolation-with-chroot.html</link>
	<description>&lt;a href=&quot;http://osgtech.blogspot.com/2012/02/file-isolation-using-bind-mounts-and.html&quot;&gt;In the last post&lt;/a&gt;, we examined a new Condor feature called &lt;i&gt;MOUNT_UNDER_SCRATCH&lt;/i&gt; that will isolate jobs from each other on the file system by making world-writable directories (such as &lt;i&gt;/tmp&lt;/i&gt; and &lt;i&gt;/var/tmp&lt;/i&gt;) be unique and isolated per-batch-job.&lt;br /&gt;&lt;br /&gt;That work started with the assumption that jobs from the same Unix user don't need to be isolated from each other. &amp;nbsp;This isn't necessarily true on the grid: a single, shared account per-VO is still popular on the OSG. &amp;nbsp;For such VOs, an attacker can gain additional credentials by reading the sandbox of each job running under the same Unix username.&lt;br /&gt;&lt;br /&gt;To combat proxy-stealing, we use an old Linux trick called a &quot;&lt;b&gt;chroot&lt;/b&gt;&quot;. &amp;nbsp;A sysadmin can create a complete copy of the OS inside a directory, and an appropriately-privileged process can change the root of its filesystem (&quot;&lt;i&gt;/&lt;/i&gt;&quot;) to that directory. 
&amp;nbsp;In fact, the phrase &quot;changing root&quot; is where we get the &quot;&lt;b&gt;chroot&lt;/b&gt;&quot; terminology.&lt;br /&gt;&lt;br /&gt;For example, suppose the root of the system looks like this:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;[root@localhost ~]# ls /&lt;br /&gt;bin &amp;nbsp; &amp;nbsp; cvmfs &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; hadoop-data2 &amp;nbsp;home &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;media &amp;nbsp;opt &amp;nbsp; selinux &amp;nbsp;usr&lt;br /&gt;boot &amp;nbsp; &amp;nbsp;dev &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; hadoop-data3 &amp;nbsp;lib &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; misc &amp;nbsp; proc &amp;nbsp;srv &amp;nbsp; &amp;nbsp; &amp;nbsp;var&lt;br /&gt;cgroup &amp;nbsp;etc &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; hadoop-data4 &amp;nbsp;lib64 &amp;nbsp; &amp;nbsp; &amp;nbsp; mnt &amp;nbsp; &amp;nbsp;root &amp;nbsp;sys&lt;br /&gt;chroot &amp;nbsp;hadoop-data1 &amp;nbsp;hadoop.log &amp;nbsp; &amp;nbsp;lost+found &amp;nbsp;net &amp;nbsp; &amp;nbsp;sbin &amp;nbsp;tmp&lt;br /&gt;&lt;/pre&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The sysadmin can create a copy of the RHEL5 operating system inside a sub-directory; at our site, this is &lt;i&gt;/chroot/sl5-v3/root&lt;/i&gt;:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;pre&gt;[root@localhost ~]# ls /chroot/sl5-v3/root/&lt;br /&gt;bin &amp;nbsp; cvmfs &amp;nbsp;etc &amp;nbsp; lib &amp;nbsp; &amp;nbsp;media &amp;nbsp;opt &amp;nbsp; root &amp;nbsp;selinux &amp;nbsp;sys &amp;nbsp;usr&lt;br /&gt;boot &amp;nbsp;dev &amp;nbsp; &amp;nbsp;home &amp;nbsp;lib64 &amp;nbsp;mnt &amp;nbsp; &amp;nbsp;proc &amp;nbsp;sbin &amp;nbsp;srv &amp;nbsp; &amp;nbsp; &amp;nbsp;tmp &amp;nbsp;var&lt;br /&gt;&lt;/pre&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Note how the contents of the chroot directory are stripped down relative to the OS - we can remove dangerous binaries, sensitive configurations, or anything else unnecessary to running a job. 
&amp;nbsp;For example, many common Linux privilege escalation exploits come from the presence of a &lt;a href=&quot;http://en.wikipedia.org/wiki/Setuid&quot;&gt;setuid binary&lt;/a&gt;. &amp;nbsp;Such binaries (&lt;b&gt;at&lt;/b&gt;, &lt;b&gt;cron&lt;/b&gt;, &lt;b&gt;ping&lt;/b&gt;) are necessary for managing the host, but not necessary for a running job. &amp;nbsp;By eliminating the setuid binaries from the chroot, a sysadmin can eliminate a common attack vector for processes running inside.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Once the directory is built, we can call chroot and isolate ourselves from the host:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;pre&gt;[root@red-d15n6 ~]# &lt;b&gt;chroot&lt;/b&gt; /chroot/sl5-v3/root/&lt;br /&gt;bash-3.2# ls /&lt;br /&gt;bin &amp;nbsp; cvmfs &amp;nbsp;etc &amp;nbsp; lib &amp;nbsp; &amp;nbsp;media &amp;nbsp;opt &amp;nbsp; root &amp;nbsp;selinux &amp;nbsp;sys &amp;nbsp;usr&lt;br /&gt;boot &amp;nbsp;dev &amp;nbsp; &amp;nbsp;home &amp;nbsp;lib64 &amp;nbsp;mnt &amp;nbsp; &amp;nbsp;proc &amp;nbsp;sbin &amp;nbsp;srv &amp;nbsp; &amp;nbsp; &amp;nbsp;tmp &amp;nbsp;var&lt;br /&gt;&lt;/pre&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Condor, as of 7.7.5, now knows how to invoke the &lt;b&gt;chroot&lt;/b&gt;&amp;nbsp;syscall for user jobs. &amp;nbsp;However, as the job sandbox is written &lt;i&gt;outside&lt;/i&gt;&amp;nbsp;the chroot, we must somehow transport it inside before starting the job. &amp;nbsp;Bind mounts - &lt;a href=&quot;http://osgtech.blogspot.com/2012/02/file-isolation-using-bind-mounts-and.html&quot;&gt;discussed last time&lt;/a&gt; - come to our rescue. 
&amp;nbsp;The entire process goes something like this:&lt;/div&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt;Condor, as root, forks off a new child process.&lt;/li&gt;&lt;li&gt;The child uses the &lt;b&gt;unshare&lt;/b&gt; system call to place itself in a new filesystem namespace.&lt;/li&gt;&lt;li&gt;The child calls &lt;b&gt;mount&lt;/b&gt; to bind-mount the job sandbox inside the chroot. &amp;nbsp;Any other bind mounts - such as &lt;i&gt;/tmp&lt;/i&gt; or &lt;i&gt;/var/tmp&lt;/i&gt; - are done at this time.&lt;/li&gt;&lt;li&gt;The child will invoke the &lt;b&gt;chroot&lt;/b&gt; system call specifying the directory the sysadmin has configured.&lt;/li&gt;&lt;li&gt;The child drops privileges to the target batch system user, then calls &lt;b&gt;exec&lt;/b&gt;&amp;nbsp;to start&amp;nbsp;the user process.&lt;/li&gt;&lt;/ol&gt;&lt;div&gt;&lt;div&gt;&lt;a href=&quot;https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2822&quot;&gt;With this patch applied&lt;/a&gt;, Condor will copy &lt;i&gt;only&lt;/i&gt; the job's sandbox forward into the filesystem namespace, meaning the job has access to no other sandbox (as all other sandboxes live outside the private namespace). 
&amp;nbsp;This successfully isolates jobs from each other's sandboxes, even if they run under the same Unix user!&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;br /&gt;The Condor feature is referred to as&amp;nbsp;&lt;a href=&quot;https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2698&quot;&gt;&lt;i&gt;NAMED_CHROOT&lt;/i&gt;&lt;/a&gt;, as sysadmins can create multiple chroot-capable directories, give them a user-friendly name (such as &lt;i&gt;RHEL5&lt;/i&gt;, as opposed to&amp;nbsp;&lt;i&gt;/chroot/sl5-v3/root&lt;/i&gt;), and allow user jobs to ask for the directory by the friendly name in their submit file.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In addition to the security benefits, we have found the &lt;i&gt;NAMED_CHROOT&lt;/i&gt; feature allows us to run a RHEL5 job on a RHEL6 host without using virtualization; something for the future.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Going back to our original list of directories needing isolation - system temporary directories, job sandbox, shared filesystems, and GRAM directories - we have now isolated everything except the shared filesystems. The option here is simple, if unpleasant: mount the shared file system as read-only. &amp;nbsp;This is the modus operandi for &lt;b&gt;$OSG_APP&lt;/b&gt;&amp;nbsp;at many sites, and an acceptable (but not recommended) way to run &lt;b&gt;$OSG_DATA&lt;/b&gt;&amp;nbsp;(as &lt;b&gt;$OSG_DATA&lt;/b&gt; is optional anyway). &amp;nbsp;It restricts the functionality for the user, but brings us a step closer to our goal of job isolation.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;After file isolation, we have one thing left: resource isolation. 
&amp;nbsp;Again, a topic for the future.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;</description>
	<pubDate>Mon, 27 Feb 2012 11:33:56 +0000</pubDate>
	<author>noreply@blogger.com (Brian Bockelman)</author>
</item>
<item>
	<title>Derek's Blog: Building an OSG-Client LiveUSB</title>
	<guid>tag:blogger.com,1999:blog-3007054864987759910.post-5415905860495723856</guid>
	<link>http://derekweitzel.blogspot.com/2012/02/building-osg-client-liveusb.html</link>
	<description>&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;a href=&quot;http://4.bp.blogspot.com/-Vuj4CEmZjwY/T0fthDkem-I/AAAAAAAAA6Q/LBtLBvjqupQ/s1600/keys.jpg&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;300&quot; src=&quot;http://4.bp.blogspot.com/-Vuj4CEmZjwY/T0fthDkem-I/AAAAAAAAA6Q/LBtLBvjqupQ/s400/keys.jpg&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class=&quot;tr-caption&quot;&gt;Nebraska/OSG USB keys to be distributed at OSG-AHM 2012&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;Since we have started using RPMs for OSG software, I've been interested in whether it was possible to make a LiveUSB of the client. &amp;nbsp;Due to the great documentation provided on the &lt;a href=&quot;http://www.livecd.ethz.ch/index.html&quot;&gt;Scientific Linux LiveCD &lt;/a&gt;page, along with the &lt;a href=&quot;https://projects.centos.org/trac/livecd/&quot;&gt;CentOS LiveCD&lt;/a&gt; page, I've created an OSG Client LiveUSB that will be put on the keys.&lt;br /&gt;&lt;br /&gt;&lt;table align=&quot;center&quot; cellpadding=&quot;0&quot; cellspacing=&quot;0&quot; class=&quot;tr-caption-container&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;a href=&quot;http://2.bp.blogspot.com/-9jTTYavNVJk/T0fwb493TxI/AAAAAAAAA6g/fXWez5KgNnY/s1600/Screen+Shot+2012-02-24+at+2.17.48+PM.png&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;508&quot; src=&quot;http://2.bp.blogspot.com/-9jTTYavNVJk/T0fwb493TxI/AAAAAAAAA6g/fXWez5KgNnY/s640/Screen+Shot+2012-02-24+at+2.17.48+PM.png&quot; width=&quot;640&quot; /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class=&quot;tr-caption&quot;&gt;Desktop of OSG Client Live CD&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;From the picture, notice links on the desktop to OSG User Docs, OSG LiveCD Docs, and How to get a certificate.&lt;br /&gt;&lt;br /&gt;&lt;span&gt;&lt;b&gt;Live image 
creation&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;The live image creation was done using the livecd-tools package. &amp;nbsp;I used a Fedora 16 instance on the HCC private cloud to make SL6 images. &amp;nbsp;The kickstart file used can be found on &lt;a href=&quot;https://github.com/djw8605/osg-livecd/tree/master/sl&quot;&gt;github&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span&gt;&lt;b&gt;What's Installed&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;The goal of the LiveUSB is to give an easily deliverable demo of the OSG-Client, therefore only the OSG-Client and Condor are installed. &amp;nbsp;The LiveUSB has some persistent data storage area, but not much.&lt;br /&gt;&lt;br /&gt;Tools are installed in order to install the live image, including the OSG-Client components, to the local hard drive. &amp;nbsp;Researchers can then easily have a node up and running.&lt;br /&gt;&lt;br /&gt;Also, people who know how to run virtual machines on their computers can easily create a virtual machine with the OSG-Client from this USB. &amp;nbsp;Just boot from the USB, and click on the Install to Hard Drive icon on the desktop.&lt;br /&gt;&lt;br /&gt;These keys will be distributed to attendees of the &lt;a href=&quot;http://hcc.unl.edu/presentations/event.php?ideventof=5&quot;&gt;OSG All Hands meeting&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;I am open to suggestions on what should be on the LiveUSB. &amp;nbsp;The image is not final yet.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;UPDATE:&lt;/b&gt;&lt;br /&gt;Link to Current ISO:&amp;nbsp;&lt;a href=&quot;http://glidein.unl.edu/OSG-SL6.2-x86_64-LiveUSB.iso&quot;&gt;OSG-SL6.2-x86_64-LiveUSB.iso&lt;/a&gt;</description>
	<pubDate>Fri, 24 Feb 2012 15:18:41 +0000</pubDate>
	<author>noreply@blogger.com (Derek Weitzel)</author>
</item>
<item>
	<title>OSG Technology Area Rumblings: File Isolation using bind mounts and chroots</title>
	<guid>tag:blogger.com,1999:blog-8803173202887660937.post-5734694593802146494</guid>
	<link>http://osgtech.blogspot.com/2012/02/file-isolation-using-bind-mounts-and.html</link>
	<description>The last post ended with a new technique for process-level isolation that unlocks our ability to safely use anonymous accounts and group accounts.&lt;br /&gt;&lt;br /&gt;However, that's not &quot;safe enough&quot; for us: the jobs can still interact with each other via the file system. &amp;nbsp;This post examines the directories that jobs can write into, and what can be done to remove this access.&lt;br /&gt;&lt;br /&gt;On a typical batch system node, a user can write into the following directories:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;b&gt;System temporary directories&lt;/b&gt;: The Linux Filesystem Hierarchy Standard (FHS) provides at least two &lt;a href=&quot;http://en.wikipedia.org/wiki/Sticky_bit&quot;&gt;sticky&lt;/a&gt;, world-writable directories, /tmp and /var/tmp. &amp;nbsp;These directories are traditionally unmanaged (user processes can write an uncontrolled amount of data here) and a security issue (symlink attacks and information leaks), even when user separation is in place.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Job Sandbox&lt;/b&gt;: This is a directory created by the batch system as a scratch location for the job. &amp;nbsp;The contents of the directory will be cleaned out by the batch system after the job ends. &amp;nbsp;For Condor, any user proxy, executable, or job stage-in files will be copied here prior to the job starting.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Shared Filesystems&lt;/b&gt;: For a non-grid site, this is typically at least $HOME, and some other site-specific directory. &amp;nbsp;$HOME is owned by the user running the job. &amp;nbsp;On the OSG, we also have $OSG_APP for application installation (typically read-only for worker nodes) and, optionally, $OSG_DATA for data staging (writable for worker nodes). 
&amp;nbsp;If they exist and are writable, $OSG_APP/DATA are owned by root and marked as sticky.&lt;/li&gt;&lt;li&gt;&lt;b&gt;GRAM directories&lt;/b&gt;: For non-Condor OSG sites, a few user-writable directories are needed to transfer the executable, proxy, and job stage-in files from the gatekeeper to the worker node. &amp;nbsp;These default to $HOME, but can be relocated to any shared filesystem directory. &amp;nbsp;For Condor-based OSG sites, this is a part of the job sandbox.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;If user separation is in place and considered sufficient, filesystem isolation is taken care of for shared filesystems, GRAM directories, and the job sandbox. &amp;nbsp;The systemwide temporary directories can be protected by mixing &lt;i&gt;&lt;a href=&quot;http://www.kernel.org/doc/man-pages/online/pages/man2/clone.2.html&quot;&gt;filesystem namespaces&lt;/a&gt;&lt;/i&gt; and &lt;i&gt;&lt;a href=&quot;http://www.kernel.org/doc/man-pages/online/pages/man2/mount.2.html&quot;&gt;bind mounts&lt;/a&gt;&lt;/i&gt;.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;A &lt;a href=&quot;http://www.kernel.org/doc/man-pages/online/pages/man2/clone.2.html&quot;&gt;process can be launched&lt;/a&gt; in its own filesystem namespace; such a process will have a copy of the system mount table. &amp;nbsp;Any change made to the process's mount table will not be seen by the outside system, and will be shared with any child processes.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;For example, if the user's home directory is not mounted on the host, the batch system could create a process in a new filesystem namespace and mount the home directory in that namespace. &amp;nbsp;The home directory will be available to the batch job, but to no other process on the host.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;When the last process in the filesystem namespace exits, all mounts that are unique to that namespace will be unmounted. 
&amp;nbsp;In our example, when the batch job exits, the kernel will unmount the home directory.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;A &lt;a href=&quot;http://www.kernel.org/doc/man-pages/online/pages/man2/mount.2.html&quot;&gt;bind mount&lt;/a&gt; makes a file or directory visible at another place in the filesystem - I think of it as mirroring the directory elsewhere. &amp;nbsp;We can take the job sandbox directory, create a sub-directory, and bind-mount the sub-directory over &lt;b&gt;/tmp&lt;/b&gt;. &amp;nbsp;The process is mostly equivalent to the following shell commands (where &lt;b&gt;$_CONDOR_SCRATCH_DIR&lt;/b&gt; is the location of the Condor job sandbox) in a filesystem namespace:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;pre&gt;mkdir $_CONDOR_SCRATCH_DIR/tmp&lt;br /&gt;mount --bind $_CONDOR_SCRATCH_DIR/tmp /tmp&lt;br /&gt;&lt;/pre&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Afterward, any files a process creates in &lt;b&gt;/tmp&lt;/b&gt; will actually be stored in &lt;b&gt;$_CONDOR_SCRATCH_DIR/tmp&lt;/b&gt; - and cleaned up accordingly by Condor on job exit. &amp;nbsp;Any system process not in the job will not be able to see or otherwise interfere with the contents of the job's &lt;b&gt;/tmp&lt;/b&gt; unless it can write into &lt;b&gt;$_CONDOR_SCRATCH_DIR&lt;/b&gt;.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Condor refers to this feature as &lt;i&gt;MOUNT_UNDER_SCRATCH&lt;/i&gt;, and it will be a part of the 7.7.5 release. &amp;nbsp;This will be an admin-specified list of directories on the worker node. &amp;nbsp;With it, the job will have a private copy of these directories, which will be backed by &lt;b&gt;$_CONDOR_SCRATCH_DIR&lt;/b&gt;. 
&amp;nbsp;The contents - and size - of these will be managed by Condor, just like anything else in the scratch directory.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;If user separation is unavailable or not considered sufficient (if there are, for example, group accounts), an additional layer of isolation is needed to protect the job sandbox. &amp;nbsp;A topic for a future day!&lt;/div&gt;</description>
	<pubDate>Mon, 20 Feb 2012 08:03:49 +0000</pubDate>
	<author>noreply@blogger.com (Brian Bockelman)</author>
</item>
<item>
	<title>Derek's Blog: Ceph on Fedora 16</title>
	<guid>tag:blogger.com,1999:blog-3007054864987759910.post-8237054800561194348</guid>
	<link>http://derekweitzel.blogspot.com/2012/02/ceph-on-fedora-16.html</link>
	<description>I've written before how to run ceph on Fedora 15, but now I'm working on Fedora 16.&lt;br /&gt;&lt;br /&gt;&lt;a href=&quot;http://derekweitzel.blogspot.com/2011/10/ceph-on-fedora-15.html&quot;&gt;Last time&lt;/a&gt;&amp;nbsp;I complained about how much ceph tries to do for you. &amp;nbsp;For better or worse, now it attempts to do more for you!&lt;br /&gt;&lt;br /&gt;For my setup, I had 3 nodes in the HCC private cloud. &amp;nbsp;First, we need to install ceph.&lt;br /&gt;&lt;pre&gt;$ yum install ceph&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Then, create a configuration file for ceph. &amp;nbsp;The RPM comes with a good example that my configuration is based on. &amp;nbsp;The example script is in&amp;nbsp;&lt;span&gt;/usr/share/doc/ceph/sample.ceph.conf&lt;/span&gt;&lt;br /&gt;&lt;span&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;My configuration:&amp;nbsp;&lt;a href=&quot;https://gist.github.com/1850697&quot;&gt;Derek's Configuration&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The configuration has the authentication turned off. &amp;nbsp;I found this useful because the ceph-authtool (yes, they renamed it since Fedora 15) is difficult to use. &amp;nbsp;And because all of the nodes are on a private vlan only reachable by my openvpn key :)&lt;br /&gt;&lt;br /&gt;Then, you need to create and distribute ssh keys to all of your nodes so that the mkcephfs can ssh to them and configure.&lt;br /&gt;&lt;pre&gt;$ ssh-keygen &lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Then copy them to the nodes:&lt;br /&gt;&lt;pre&gt;$ ssh-copy-id i-000000c2&lt;br /&gt;$ ssh-copy-id i-000000c3&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Be sure to make the data directories on all the nodes. 
&amp;nbsp;In this case:&lt;br /&gt;&lt;pre&gt;$ mkdir -p /data/osd.0&lt;br /&gt;$ ssh i-000000c2 'mkdir -p /data/osd.1'&lt;br /&gt;$ ssh i-000000c3 'mkdir -p /data/osd.2'&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Then run the mkcephfs command:&lt;br /&gt;&lt;pre&gt;$ mkcephfs -a -c /etc/ceph/ceph.conf&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;And start up the daemons:&lt;br /&gt;&lt;pre&gt;$ service ceph start&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;You should have the daemons running then. &amp;nbsp;If they fail for some reason, they tend to output what the problem was. &amp;nbsp;Also, the logs for the services are in /var/log/ceph&lt;br /&gt;&lt;br /&gt;To mount the filesystem, find an ip address of one of the monitors. &amp;nbsp;In my case, I had a monitor on ip address&amp;nbsp;10.148.2.147. &amp;nbsp;The command to mount is:&lt;br /&gt;&lt;pre&gt;$ mkdir -p /mnt/ceph&lt;br /&gt;$ mount -t ceph 10.148.2.147:/ /mnt/ceph&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Since you don't have any authentication, it should work without problems.&lt;br /&gt;&lt;br /&gt;I've had some problems with the different mds, even had an OSD die on me. &amp;nbsp;It resolved itself, and I even added another OSD to take its place, recreating the CRUSH table. &amp;nbsp;Since creating this, I have even worked with the graphical interface:&lt;br /&gt;&lt;div class=&quot;separator&quot;&gt;&lt;a href=&quot;http://1.bp.blogspot.com/-BnzkTQ2PVps/Tz3cbA7dTMI/AAAAAAAAA6E/xbEag3Nmn0c/s1600/Screen+Shot+2012-02-16+at+10.49.26+PM.png&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;507&quot; src=&quot;http://1.bp.blogspot.com/-BnzkTQ2PVps/Tz3cbA7dTMI/AAAAAAAAA6E/xbEag3Nmn0c/s640/Screen+Shot+2012-02-16+at+10.49.26+PM.png&quot; width=&quot;640&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;And here's a presentation I did about the &lt;a href=&quot;http://dl.acm.org/citation.cfm?id=1298485&quot;&gt;CEPH Paper&lt;/a&gt;. 
&amp;nbsp;Note, &amp;nbsp;I may not be entirely accurate in the presentation, so be kind.&lt;br /&gt;&lt;br /&gt;&lt;script src=&quot;http://speakerdeck.com/embed/4f3f0dbbb1f69c0022002626.js&quot;&gt;&lt;/script&gt;&lt;br /&gt;&lt;br /&gt;</description>
	<pubDate>Fri, 17 Feb 2012 20:53:16 +0000</pubDate>
	<author>noreply@blogger.com (Derek Weitzel)</author>
</item>
<item>
	<title>An Open Science Grid Work Log: SAGA Hadoop</title>
	<guid>http://osglog.wordpress.com/?p=736</guid>
	<link>http://osglog.wordpress.com/2012/02/17/saga-hadoop/</link>
	<description>&lt;p&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a title=&quot;Hadoop&quot; href=&quot;http://hadoop.apache.org/&quot; target=&quot;_blank&quot;&gt;Hadoop&lt;/a&gt; has become a fixture where big data is concerned, but it has been difficult to use in HPC and HTC cluster environments. That is increasingly unfortunate, as a growing number of new algorithms assume Hadoop is an option. I tried SAGA-Hadoop first, but it should be noted that &lt;a title=&quot;myHadoop&quot; href=&quot;http://www.sdsc.edu/us/consulting/myHadoop-SDSC.pdf&quot; target=&quot;_blank&quot;&gt;myHadoop&lt;/a&gt; from SDSC sought to remedy this a few years ago. If someone has experience with both or would like to comment on the differences, that would be appreciated.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;SAGA Hadoop&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a title=&quot;SAGA-Hadoop&quot; href=&quot;http://randomlydistributed.blogspot.com/2011/01/running-hadoop-10-on-distributed.html&quot; target=&quot;_blank&quot;&gt;SAGA-Hadoop&lt;/a&gt; installs, configures and executes Hadoop on clusters running batch schedulers for which SAGA has adapters. At least it&amp;#8217;s headed in that direction.&lt;/p&gt;
&lt;p&gt;SAGA is the &lt;a title=&quot;SAGA&quot; href=&quot;http://en.wikipedia.org/wiki/Simple_API_for_Grid_Applications&quot; target=&quot;_blank&quot;&gt;Simple API for Grid Applications&lt;/a&gt;, an Open Grid Forum standard (GFD.90) for interfacing with diverse cluster batch scheduling systems. It is a large and complex standard, so we&amp;#8217;ll leave it at that for now. For our purposes, suffice it to say that it works with PBS.&lt;/p&gt;
&lt;p&gt;The &lt;a title=&quot;Bliss&quot; href=&quot;http://oweidner.github.com/bliss/&quot; target=&quot;_blank&quot;&gt;Bliss &lt;/a&gt;project provides Python bindings for SAGA. Like many Python APIs, it takes a minimalist approach, not covering the entire standard and demonstrating a strong preference for simplicity and brevity.&lt;/p&gt;
&lt;p&gt;As the helpful introductory blog post above (SAGA-Hadoop) describes, it runs Hadoop.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;But Can this be Interesting on the OSG?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;To make this plausible for the OSG:&lt;/p&gt;
&lt;ul&gt;
&lt;ul&gt;
&lt;ul&gt;
&lt;li&gt;We have to be able to automate the process entirely&lt;/li&gt;
&lt;li&gt;It would be really good if there were a practical way to use command line tools as the map and reduce steps in a MapReduce computation.&lt;/li&gt;
&lt;/ul&gt;
&lt;/ul&gt;
&lt;/ul&gt;
&lt;p&gt;So this is the story of finding those things out.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Automating Installation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The RENCI Blueridge cluster runs the PBS job manager. I scripted the installation; nothing about it is pretty yet, except that it demonstrates automating the setup of Hadoop on an HPC cluster.&lt;/p&gt;
&lt;p&gt;There&amp;#8217;s Python involved so my first step was to set up a virtualenv to manage our project&amp;#8217;s dependencies:&lt;/p&gt;
&lt;pre&gt;wget --timestamping \
   https://raw.github.com/pypa/virtualenv/master/virtualenv.py
python virtualenv.py venv
source venv/bin/activate
pip install bliss uuid&lt;/pre&gt;
&lt;p&gt;uuid is a dependency that, for whatever reason, needed to be explicitly named to pip.&lt;/p&gt;
&lt;p&gt;The uuid module calls ifconfig. Putting ifconfig in the path changed nothing, so we force the issue by editing the module file. Again, it&amp;#8217;s in our virtualenv, so the copy&amp;#8217;s entirely ours:&lt;/p&gt;
&lt;pre&gt;sed --in-place=.orig                       \
     s,\'ifconfig\',\'/sbin/ifconfig\',     \
     venv/lib/python2.4/site-packages/uuid.py&lt;/pre&gt;
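&lt;p&gt;To see what that substitution does before touching the real module, here is a minimal sketch that applies the same sed expression to a single sample line. The sample line is a hypothetical stand-in and is not copied from uuid.py; only the sed expression comes from the command above.&lt;/p&gt;

```shell
# Hypothetical sample line standing in for the ifconfig call inside uuid.py.
sample="mac = _find_mac('ifconfig', args)"

# Same expression as the in-place edit above, applied to the sample via a pipe
# so that no file is modified.
patched=$(printf '%s\n' "$sample" | sed s,\'ifconfig\',\'/sbin/ifconfig\',)

echo "$patched"
```

&lt;p&gt;If the printed line shows /sbin/ifconfig where ifconfig used to be, the expression is quoted correctly for the shell in use.&lt;/p&gt;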
&lt;p&gt;Make sure JAVA_HOME is set.&lt;/p&gt;
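&lt;p&gt;One way to make that check explicit is a small guard function near the top of the build script. This is only a sketch: check_java_home is a hypothetical helper name, and looking for bin/java under JAVA_HOME is an assumption about a typical JDK layout.&lt;/p&gt;

```shell
# Hypothetical helper: returns non-zero when JAVA_HOME looks unusable.
check_java_home() {
    # Fail when the variable is unset or empty.
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set"
        return 1
    fi
    # Assumed layout: the java launcher lives in bin/ under JAVA_HOME.
    if [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "no executable java under $JAVA_HOME/bin"
        return 1
    fi
    return 0
}
```

&lt;p&gt;Calling check_java_home || exit 1 before any Hadoop step would stop the build early instead of failing later inside the bootstrap.&lt;/p&gt;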
&lt;p&gt;Then, after unzipping Hadoop, we&lt;/p&gt;
&lt;ul&gt;
&lt;ul&gt;
&lt;ul&gt;
&lt;li&gt;Get the source code for SAGA-Hadoop&lt;/li&gt;
&lt;li&gt;Edit it to specify the login node for our cluster&lt;/li&gt;
&lt;li&gt;Edit the bootstrap script to alter the Hadoop configs after installation. It
&lt;ul&gt;
&lt;li&gt;Deletes an unnecessary os.makedirs() call&lt;/li&gt;
&lt;li&gt;Corrects the makedirs for log_dir to be for self.job_log_dir&lt;/li&gt;
&lt;li&gt;Uncomments the configuration of the Hadoop data node&lt;/li&gt;
&lt;li&gt;Alters the network interface to use the locally correct eth0&lt;/li&gt;
&lt;li&gt;Sets JAVA_HOME&lt;/li&gt;
&lt;li&gt;Sets HADOOP_HEAP_SIZE&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/ul&gt;
&lt;/ul&gt;
&lt;pre&gt;svn co https://svn.cct.lsu.edu/repos/saga-projects/applications/SAGAHadoop/saga-hadoop
sed --in-place=.orig \
    -e &quot;s,india,br0,&quot; saga-hadoop/launcher.py
HADOOP_ENV=hadoop-1.0.0/conf/hadoop-env.sh
sed --in-place=.orig \
    -e &quot;s,os.makedirs(job_dir),,&quot; \
    -e &quot;s,os.makedirs(log_dir),os.makedirs(self.job_log_dir),&quot; \
    -e &quot;s,\&amp;lt;!--,,&quot; \
    -e &quot;s,\-\-\&amp;gt;,,&quot; \
    -e &quot;s,eth1,eth0,&quot; \
    -e &quot;s,tar \-xzf hadoop.tar.gz,tar -xzf hadoop.tar.gz; echo export JAVA_HOME=$JAVA_HOME &amp;gt;&amp;gt; $HADOOP_ENV; echo export HADOOP_HEAP_SIZE=2000 &amp;gt;&amp;gt; $HADOOP_ENV; cat $HADOOP_ENV,&quot; \
    saga-hadoop/bootstrap_hadoop.py&lt;/pre&gt;
&lt;p&gt;This is all saved as a script called build.&lt;/p&gt;
&lt;p&gt;Running it is relatively uninteresting. It installs the virtualenv and the Python dependencies, then edits the config files as you&amp;#8217;d expect.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Running Hadoop&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;We save these commands to a script called start:&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;source venv/bin/activate
./saga-hadoop/bootstrap_hadoop.py&lt;/pre&gt;
&lt;p&gt;Running it logs plenty of Hadoop information to the console and starts the server.&lt;/p&gt;
&lt;p&gt;&lt;span&gt;&lt;strong&gt;Command Line MapReduce with Hadoop Streaming&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;The command line is the lingua franca of the OSG. There&amp;#8217;s nothing finer for figuring out a problem quickly than the command line. So how are we going to run MapReduce jobs from the command line &amp;#8211; especially in this dynamically created environment?&lt;/p&gt;
&lt;p&gt;&lt;a title=&quot;Hadoop Streaming&quot; href=&quot;http://hadoop.apache.org/common/docs/r0.18.1/streaming.html&quot; target=&quot;_blank&quot;&gt;Hadoop Streaming&lt;/a&gt; lets us run programs of our choice as the map and reduce steps. But first we need an additional Java archive to make it work. The following commands go in a file called stream. This first batch downloads the JAR file into the right location:&lt;/p&gt;
&lt;pre&gt;jar_url=http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-streaming/1.0.0/hadoop-streaming-1.0.0.jar
lib=work/hadoop-1.0.0/share/hadoop/contrib/streaming
mkdir -p $lib
curl $jar_url  &amp;gt; $lib/hadoop-streaming-1.0.0.jar&lt;/pre&gt;
&lt;p&gt;Once that&amp;#8217;s in place, we get rid of the old output directory (not likely to be relevant for an OSG job), then run the streaming job:&lt;/p&gt;
&lt;pre&gt;inputs=$1
output=$2
mapper=$3
reducer=$4

rm -rf &quot;$output&quot;

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/share/hadoop/contrib/streaming/hadoop-streaming-1.0.0.jar \
     -input $inputs \
     -output $output \
     -mapper $mapper \
     -reducer $reducer \
     -jobconf mapred.reduce.tasks=2&lt;/pre&gt;
&lt;p&gt;The MapReduce job runs with whatever inputs, output, mapper and reducer are passed in. Next, we create a trivial program called script to be our map operator:&lt;/p&gt;
&lt;pre&gt;for x in $(seq 0 100); do
    echo pattern $x
done&lt;/pre&gt;
&lt;p&gt;Next, we run stream like this:&lt;/p&gt;
&lt;pre&gt;./stream in out script wc&lt;/pre&gt;
&lt;p&gt;This produces the wc totals (lines, words and bytes) for our 101 mapper output lines times three:&lt;/p&gt;
&lt;pre&gt; [scox@br0:~/dev/saga-hadoop]$ cat out/part-00000
     303     606    3609&lt;/pre&gt;
&lt;p&gt;The factor of three arises because the mapper is executed once per file in our input directory (in):&lt;/p&gt;
&lt;pre&gt; in
 |-- a
 |-- b
 `-- c&lt;/pre&gt;
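&lt;p&gt;The same arithmetic can be checked locally without Hadoop. The sketch below is a stand-in under stated assumptions, not a description of how Hadoop Streaming works internally: it runs a copy of the mapper once per file in a scratch directory (workdir is a hypothetical temporary directory) and pipes the combined output to wc in place of the reduce step.&lt;/p&gt;

```shell
# Stand-in for the "script" mapper above: ignores its input, prints 101 lines.
mapper() {
    for x in $(seq 0 100); do
        echo "pattern $x"
    done
}

# Hypothetical scratch input directory with three files, like in/a, in/b, in/c.
workdir=$(mktemp -d)
touch "$workdir/a" "$workdir/b" "$workdir/c"

# Run the mapper once per input file and count the combined output with wc.
# tr strips the padding some wc implementations add around the number.
lines=$(for f in "$workdir"/*; do mapper; done | wc -l | tr -d ' ')
words=$(for f in "$workdir"/*; do mapper; done | wc -w | tr -d ' ')

echo "$lines $words"
rm -rf "$workdir"
```

&lt;p&gt;This prints 303 606, matching the first two columns of the Hadoop output. The byte count is deliberately left out of the comparison, since the streaming framework may add key/value separators to each record before it reaches the reducer.&lt;/p&gt;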
&lt;p&gt;&lt;strong&gt;Summary&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We&amp;#8217;ve just done a completely automated install, configuration and execution of a small Hadoop cluster overlay on top of a PBS HPC cluster. We also saw automation for installing Hadoop Streaming and running a test MapReduce job using command line programs as map and reduce operators.&lt;/p&gt;
&lt;p&gt;The overall stack looks like this:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://osglog.files.wordpress.com/2012/02/cluster-mapreduce.png&quot;&gt;&lt;img class=&quot;aligncenter size-full wp-image-737&quot; title=&quot;cluster-mapreduce&quot; src=&quot;http://osglog.files.wordpress.com/2012/02/cluster-mapreduce.png?w=640&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;It is probably clear that much more would be necessary to properly configure and deploy a useful cluster, particularly interfacing it properly to a PBS job context. That part will have to wait for another day. And again, if anyone reading this can comment on myHadoop, I&amp;#8217;d appreciate the insight.&lt;/p&gt;
</description>
	<pubDate>Fri, 17 Feb 2012 20:43:50 +0000</pubDate>
</item>
<item>
	<title>Derek's Blog: CEPH on Fedora 15</title>
	<guid>tag:blogger.com,1999:blog-3007054864987759910.post-5695847306491579</guid>
	<link>http://derekweitzel.blogspot.com/2011/10/ceph-on-fedora-15.html</link>
	<description>Yesterday, I read a &lt;a href=&quot;http://berrange.com/posts/2011/10/12/setting-up-a-ceph-cluster-and-exporting-a-rbd-volume-to-a-kvm-guest/&quot;&gt;blog post&lt;/a&gt;&amp;nbsp;about using &lt;a href=&quot;http://ceph.newdream.net/&quot;&gt;Ceph&lt;/a&gt; as a backend store for virtual machine images. &amp;nbsp;I've heard a lot about Ceph in the last year, especially after it was integrated into the mainline kernel in 2.6.34. &amp;nbsp;So I thought I'd give it a try.&lt;br /&gt;&lt;br /&gt;Before I get into the install, I want to summarize my thoughts on Ceph. &amp;nbsp;I think it has a lot of potential, but parts of it try too hard to do everything for you. &amp;nbsp;There is a careful balance between a program doing too much for you and making you do too much. &amp;nbsp;For example, the mkcephfs script that creates a Ceph filesystem will ssh to all the worker nodes (defined in ceph.conf) and configure the filesystem. &amp;nbsp;If I were in operations, this would scare me.&lt;br /&gt;&lt;br /&gt;Also, the keyring configuration is overly complicated. &amp;nbsp;I think Ceph is designed to be secure over the WAN (secure, not encrypted), so maybe it's needed. &amp;nbsp;But it seems overly complicated when you compare it to other distributed file systems (Hadoop, Lustre).&lt;br /&gt;&lt;br /&gt;On the other hand, I really like the fully POSIX-compliant client, especially since it's in the mainline kernel. &amp;nbsp;It is too bad that it was added in 2.6.34 rather than 2.6.32 (RHEL 6 kernel). &amp;nbsp;I guess we'll have to wait 2 years for RHEL 7 to have it in something we can use in production.&lt;br /&gt;&lt;br /&gt;Also, the distributed metadata and multiple metadata servers are interesting aspects of the system. 
&amp;nbsp;Though, in the version I tested, the MDS crashed a few times (the system picked it up and compensated).&lt;br /&gt;&lt;br /&gt;On Fedora 15, ceph packages are in the repos.&lt;br /&gt;&lt;pre&gt;yum install ceph&lt;/pre&gt;&lt;br /&gt;The configuration I settled on was:&lt;br /&gt;&lt;pre&gt;[global]&lt;br /&gt;    auth supported = cephx&lt;br /&gt;    keyring = /etc/ceph/keyring.admin&lt;br /&gt;&lt;br /&gt;[mds]&lt;br /&gt;    keyring = /etc/ceph/keyring.$name&lt;br /&gt;[mds.i-00000072]&lt;br /&gt;    host = i-00000072&lt;br /&gt;[mds.i-00000073]&lt;br /&gt;    host = i-00000073&lt;br /&gt;[mds.i-00000074]&lt;br /&gt;    host = i-00000074&lt;br /&gt;&lt;br /&gt;[osd]&lt;br /&gt;    osd data = /srv/ceph/osd$id&lt;br /&gt;    osd journal = /srv/ceph/osd$id/journal&lt;br /&gt;    osd journal size = 512&lt;br /&gt;    osd class dir = /usr/lib64/rados-classes&lt;br /&gt;    keyring = /etc/ceph/keyring.$name&lt;br /&gt;[osd0]&lt;br /&gt;    host = i-00000072&lt;br /&gt;[osd1]&lt;br /&gt;    host = i-00000073&lt;br /&gt;[osd2]&lt;br /&gt;    host = i-00000074&lt;br /&gt;&lt;br /&gt;[mon]&lt;br /&gt;    mon data = /srv/ceph/mon$id&lt;br /&gt;[mon0]&lt;br /&gt;    host = i-00000072&lt;br /&gt;    mon addr = 10.148.2.147:6789&lt;br /&gt;[mon1]&lt;br /&gt;    host = i-00000073&lt;br /&gt;    mon addr = 10.148.2.148:6789&lt;br /&gt;[mon2]&lt;br /&gt;    host = i-00000074&lt;br /&gt;    mon addr = 10.148.2.149:6789&lt;/pre&gt;&lt;br /&gt;As you can read from the configuration file, all files are stored in /srv/ceph/... &amp;nbsp;You will need to make this directory on all your worker nodes.&lt;br /&gt;&lt;br /&gt;Next I needed to create a keyring for&amp;nbsp;authentication&amp;nbsp;with the client/admin/dataservers. &amp;nbsp;The keyring tool is distributed with Ceph, and is called&amp;nbsp;&lt;a href=&quot;http://manpages.ubuntu.com/manpages/maverick/man8/cauthtool.8.html&quot;&gt;cauthtool&lt;/a&gt;. 
&amp;nbsp;Even now, it's not clear to me how to use this tool, or how Ceph uses the keyring. &amp;nbsp;First you need to make a caps (capabilities?) file:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;osd = &quot;allow *&quot;&lt;br /&gt;mds = &quot;allow *&quot;&lt;br /&gt;mon = &quot;allow *&quot;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Here are the cauthtool commands to get it to work.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;cauthtool --create-keyring /etc/ceph/keyring.bin&lt;br /&gt;cauthtool -c -n i-00000072 --gen-key /etc/ceph/keyring.bin &lt;br /&gt;cauthtool -n i-00000074 --caps caps /etc/ceph/keyring.bin&lt;br /&gt;cauthtool -c -n i-00000073 --gen-key /etc/ceph/keyring.bin&lt;br /&gt;cauthtool -n i-00000073 --caps caps /etc/ceph/keyring.bin&lt;br /&gt;cauthtool -c -n i-00000074 --gen-key /etc/ceph/keyring.bin &lt;br /&gt;cauthtool -n i-00000072 --caps caps /etc/ceph/keyring.bin&lt;br /&gt;cauthtool --gen-key --name=admin /etc/ceph/keyring.admin&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;From the blog post linked above, I used their script to create the directories and copy the ceph.conf to the other hosts.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;n=0&lt;br /&gt;for host in i-00000072 i-00000073 i-00000074 ; \&lt;br /&gt;   do \&lt;br /&gt;       ssh root@$host mkdir -p /etc/ceph /srv/ceph/mon$n; \&lt;br /&gt;       n=$(expr $n + 1); \&lt;br /&gt;       scp /etc/ceph/ceph.conf root@$host:/etc/ceph/ceph.conf&lt;br /&gt;   done&lt;br /&gt;mkcephfs -a -c /etc/ceph/ceph.conf -k /etc/ceph/keyring.bin&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Then copy the keyrings&lt;br /&gt;&lt;pre&gt;for host in i-00000072 i-00000073 i-00000074 ; \&lt;br /&gt;   do \&lt;br /&gt;       scp /etc/ceph/keyring.admin root@$host:/etc/ceph/keyring.admin; \&lt;br /&gt;   done&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Then startup the daemons on all the nodes:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;service ceph start&lt;/pre&gt;&lt;br /&gt;And to mount the system:&lt;br /&gt;&lt;pre&gt;mount -t ceph 
10.148.2.147:/ /mnt/ceph -o name=admin,secret=AQBlV5dO2TICABAA0/FP7m+ru6TJLZaPxFuQyg==&lt;/pre&gt;&lt;br /&gt;Where the secret is the output from the command:&lt;br /&gt;&lt;pre&gt; cauthtool --print-key /etc/ceph/keyring.bin &lt;/pre&gt;</description>
	<pubDate>Fri, 17 Feb 2012 20:30:31 +0000</pubDate>
	<author>noreply@blogger.com (Derek Weitzel)</author>
</item>
<item>
	<title>Inside OSG Ops.: New Home for OSG Web Site</title>
	<guid>tag:blogger.com,1999:blog-7506259688180433777.post-7915760130468234603</guid>
	<link>http://insideosgops.blogspot.com/2012/02/new-home-for-osg-web-site.html</link>
	<description>&lt;!--[if gte mso 9]&gt;&lt;xml&gt;  &lt;w:latentstyles deflockedstate=&quot;false&quot; defunhidewhenused=&quot;true&quot; defsemihidden=&quot;true&quot; defqformat=&quot;false&quot; defpriority=&quot;99&quot; latentstylecount=&quot;276&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; 
priority=&quot;67&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Medium Grid 1 Accent 3&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;68&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Medium Grid 2 Accent 3&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;69&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Medium Grid 3 Accent 3&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;70&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Dark List Accent 3&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;71&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Colorful Shading Accent 3&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;72&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Colorful List Accent 3&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;73&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Colorful Grid Accent 3&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;60&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Light Shading Accent 4&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;61&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Light List Accent 4&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;62&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Light Grid Accent 4&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;63&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Medium Shading 1 Accent 4&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;64&quot; semihidden=&quot;false&quot; 
unhidewhenused=&quot;false&quot; name=&quot;Medium Shading 2 Accent 4&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;65&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Medium List 1 Accent 4&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;66&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Medium List 2 Accent 4&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;67&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Medium Grid 1 Accent 4&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;68&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Medium Grid 2 Accent 4&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;69&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Medium Grid 3 Accent 4&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;70&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Dark List Accent 4&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;71&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Colorful Shading Accent 4&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;72&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Colorful List Accent 4&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;73&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Colorful Grid Accent 4&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;60&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Light Shading Accent 5&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;61&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Light List Accent 
5&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;62&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Light Grid Accent 5&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;63&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Medium Shading 1 Accent 5&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;64&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Medium Shading 2 Accent 5&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;65&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Medium List 1 Accent 5&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;66&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Medium List 2 Accent 5&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;67&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Medium Grid 1 Accent 5&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;68&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Medium Grid 2 Accent 5&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;69&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Medium Grid 3 Accent 5&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;70&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Dark List Accent 5&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;71&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Colorful Shading Accent 5&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;72&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Colorful List Accent 5&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; 
priority=&quot;73&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Colorful Grid Accent 5&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;60&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Light Shading Accent 6&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;61&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Light List Accent 6&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;62&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Light Grid Accent 6&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;63&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Medium Shading 1 Accent 6&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;64&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Medium Shading 2 Accent 6&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;65&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Medium List 1 Accent 6&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;66&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Medium List 2 Accent 6&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;67&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Medium Grid 1 Accent 6&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;68&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Medium Grid 2 Accent 6&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;69&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Medium Grid 3 Accent 6&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;70&quot; semihidden=&quot;false&quot; 
unhidewhenused=&quot;false&quot; name=&quot;Dark List Accent 6&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;71&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Colorful Shading Accent 6&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;72&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Colorful List Accent 6&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;73&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; name=&quot;Colorful Grid Accent 6&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;19&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; qformat=&quot;true&quot; name=&quot;Subtle Emphasis&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;21&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; qformat=&quot;true&quot; name=&quot;Intense Emphasis&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;31&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; qformat=&quot;true&quot; name=&quot;Subtle Reference&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;32&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; qformat=&quot;true&quot; name=&quot;Intense Reference&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;33&quot; semihidden=&quot;false&quot; unhidewhenused=&quot;false&quot; qformat=&quot;true&quot; name=&quot;Book Title&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;37&quot; name=&quot;Bibliography&quot;&gt;   &lt;w:lsdexception locked=&quot;false&quot; priority=&quot;39&quot; qformat=&quot;true&quot; name=&quot;TOC Heading&quot;&gt;  &lt;/w:LatentStyles&gt; &lt;/xml&gt;&lt;![endif]--&gt;  &lt;!--[if gte mso 10]&gt; &lt;style&gt;  /* Style Definitions */ table.MsoNormalTable  {mso-style-name:&quot;Table Normal&quot;;  
mso-tstyle-rowband-size:0;  mso-tstyle-colband-size:0;  mso-style-noshow:yes;  mso-style-priority:99;  mso-style-parent:&quot;&quot;;  mso-padding-alt:0in 5.4pt 0in 5.4pt;  mso-para-margin:0in;  mso-para-margin-bottom:.0001pt;  mso-pagination:widow-orphan;  font-size:12.0pt;  font-family:Cambria;  mso-ascii-font-family:Cambria;  mso-ascii-theme-font:minor-latin;  mso-hansi-font-family:Cambria;  mso-hansi-theme-font:minor-latin;} &lt;/style&gt; &lt;![endif]--&gt;    &lt;!--StartFragment--&gt;  &lt;p class=&quot;MsoNormal&quot;&gt;On February 28&lt;sup&gt;th&lt;/sup&gt; the OSG web pages located at &lt;a href=&quot;http://www.opensciencegrid.org&quot;&gt;www.opensciencegrid.org&lt;/a&gt; will move to the OSG Twiki. This move coincides with the upcoming conclusion of a contract with the Chicago-based web hosting service Tilted Planet.&lt;/p&gt;  &lt;p class=&quot;MsoNormal&quot;&gt;During the scheduled production service update on the 28&lt;sup&gt;th&lt;/sup&gt;, browsers will be redirected from the current location to the new location on the main OSG Twiki page. A mockup of the new page can be seen at twiki-itb.grid.iu.edu.&lt;/p&gt;  &lt;p class=&quot;MsoNormal&quot;&gt;This is the first step, and likely an interim web page home, in a project that will affect the OSG Public Web Pages, the OSG Twiki, the DocDB, and possibly other OSG services with web UIs. Evaluation of content management systems, wikis, and documentation file database solutions has already begun and will continue over the next several months. Please contact the GOC (&lt;a href=&quot;mailto:goc@opensciencegrid.org&quot;&gt;goc@opensciencegrid.org&lt;/a&gt;) if you have a suggestion for packages you think should be evaluated. 
&lt;/p&gt;  &lt;!--EndFragment--&gt;</description>
	<pubDate>Wed, 15 Feb 2012 07:47:21 +0000</pubDate>
	<author>noreply@blogger.com (Rob Q)</author>
</item>
<item>
	<title>An Open Science Grid Work Log: NAMD with PBS and Infiniband on NERSC Dirac</title>
	<guid>http://osglog.wordpress.com/?p=715</guid>
	<link>http://osglog.wordpress.com/2012/02/13/namd-with-pbs-and-infiniband/</link>
	<description>&lt;p&gt;&lt;strong&gt;Overview&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;NAMD simulates molecular motion, especially of large molecules, so it&amp;#8217;s often used to simulate molecular docking problems. One particularly interesting class of docking problem is the interaction of protein molecules with other molecules such as the cell membrane. The enormous number of atoms involved in these simulations limits what we&amp;#8217;re able to learn about how proteins interact with and shape their environments, because more atoms require more computing power. So we&amp;#8217;re investigating using GPU-accelerated nodes in a shared memory cluster to speed up simulation time.&lt;/p&gt;
&lt;p&gt;This post describes running NAMD in a multi-node configuration at NERSC Dirac to determine if we want to build out a Pegasus workflow executing in this mode through the OSG compute element. The process is, as usual with MPI codes using cluster interconnects, highly cluster-specific. The next step is to determine if it&amp;#8217;s worth it and what our alternatives are.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Approach&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If you&amp;#8217;re having a hard time running NAMD in a PBS environment over an Infiniband interconnect, you are not alone. The NAMD &lt;a title=&quot;NAMD release notes&quot; href=&quot;http://www.ks.uiuc.edu/Research/namd/2.8/notes.html&quot; target=&quot;_blank&quot;&gt;release notes&lt;/a&gt; come right to the point:&lt;/p&gt;
&lt;p&gt;&amp;#8220;Writing batch job scripts to run charmrun in a queueing system can be challenging.&amp;#8221;&lt;/p&gt;
&lt;p&gt;These links, in addition to the release notes cited above, provide useful insights:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://www.ks.uiuc.edu/Research/namd/2.8/ug/node76.html&quot;&gt;charmrun mpi&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a title=&quot;NAMD on  PBS&quot; href=&quot;http://www.ks.uiuc.edu/Research/namd/wiki/index.cgi?NamdOnPBS&quot; target=&quot;_blank&quot;&gt;NAMD on PBS&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div&gt;&lt;span&gt;&lt;span&gt;And without further delay, here&amp;#8217;s the approach that worked on Dirac. Mileage on your cluster may vary.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;&lt;/div&gt;
&lt;pre&gt;#!/bin/bash

set -x
set -e

# build a node list file based on the PBS
# environment in a form suitable for NAMD/charmrun

nodefile=$TMPDIR/$PBS_JOBID.nodelist
echo group main &amp;gt; $nodefile
nodes=$( cat $PBS_NODEFILE )
for node in $nodes; do
   echo host $node &amp;gt;&amp;gt; $nodefile
done

# find the cluster's mpiexec
MPIEXEC=$(which mpiexec)

# Tell charmrun to start 32 processes (4 nodes x 8 ppn) using the nodelist built above; the cluster's mpiexec is passed on the command line below.
CHARMARGS=&quot;+p32 ++nodelist $nodefile&quot;&lt;/pre&gt;
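&lt;p&gt;The script above hard-codes +p32, which matches a request for 4 nodes with 8 processes each. As a rough sketch only (not part of the original script), the same nodelist could be built in Python while counting the entries in the PBS node file, so that the +p value follows whatever allocation the job received. The function name build_charm_nodelist and the output file name are hypothetical, and the sketch assumes the PBS node file lists one host name per allocated process slot.&lt;/p&gt;

```python
import os


def build_charm_nodelist(pbs_nodefile, out_path):
    """Write a charmrun nodelist file and return the number of host entries.

    Assumption: pbs_nodefile holds one host name per allocated process slot,
    so a host appears once for every process it should run.
    """
    with open(pbs_nodefile) as src:
        hosts = [line.strip() for line in src if line.strip()]
    with open(out_path, "w") as out:
        # Same layout as the bash loop: a group header, then one host per line.
        out.write("group main\n")
        for host in hosts:
            out.write("host " + host + "\n")
    return len(hosts)


if __name__ == "__main__" and "PBS_NODEFILE" in os.environ:
    # Hypothetical output location; the bash script uses $TMPDIR/$PBS_JOBID.nodelist.
    nodelist = os.path.join(os.environ.get("TMPDIR", "/tmp"), "charm.nodelist")
    count = build_charm_nodelist(os.environ["PBS_NODEFILE"], nodelist)
    # Print arguments in the same shape as CHARMARGS.
    print("+p%d ++nodelist %s" % (count, nodelist))
```

&lt;p&gt;Deriving the process count from the node file means the same wrapper can be reused when the nodes and ppn values passed to qsub change.&lt;/p&gt;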
&lt;p&gt;As an additional wrinkle, we want to run the GPU accelerated version. That&amp;#8217;s why we use the +idlepoll argument to NAMD.&lt;/p&gt;
&lt;p&gt;After setting NAMD_HOME, the command to execute NAMD is:&lt;/p&gt;
&lt;pre&gt;${NAMD_HOME}/charmrun \
   ${CHARMARGS} ++mpiexec ++remote-shell \
   ${MPIEXEC} ${NAMD_HOME}/namd2 +idlepoll &amp;lt;input_file&amp;gt;&lt;/pre&gt;
&lt;p&gt;The beginning of NAMD&amp;#8217;s output looks like this:&lt;/p&gt;
&lt;p&gt;Info: 1 NAMD 2.8 &lt;strong&gt;Linux-x86_64-ibverbs-CUDA&lt;/strong&gt; 16 dirac48 stevecox&lt;br /&gt;
Info: Running on 16 processors, 16 nodes, 2 physical nodes.&lt;br /&gt;
Info: CPU topology information available.&lt;br /&gt;
Info: Charm++/Converse parallel runtime startup completed at 0.025701 s&lt;br /&gt;
Pe 5 sharing CUDA device 0 first 0 next 6&lt;br /&gt;
Pe 5 physical rank 5 binding to CUDA device 0 on dirac48: 'Tesla C1060' Mem: 4095MB Rev: 1.3&lt;br /&gt;
Pe 10 sharing CUDA device 0 first 8 next 11&lt;br /&gt;
Pe 10 physical rank 2 binding to CUDA device 0 on dirac47: 'Tesla C1060' Mem: 4095MB Rev: 1.3&lt;br /&gt;
Pe 8 sharing CUDA device 0 first 8 next 9&lt;br /&gt;
Pe 8 physical rank 0 binding to CUDA device 0 on dirac47: 'Tesla C1060' Mem: 4095MB Rev: 1.3&lt;br /&gt;
Pe 2 sharing CUDA device 0 first 0 next 3&lt;br /&gt;
Did not find +devices i,j,k,... argument, using all&lt;br /&gt;
Pe 2 physical rank 2 binding to CUDA device 0 on dirac48: 'Tesla C1060' Mem: 4095MB Rev: 1.3&lt;/p&gt;
&lt;p&gt;Of particular importance, note that there is a pre-built executable specific to ibverbs-CUDA &amp;#8211; that is, it works with Infiniband-connected clusters with CUDA-accelerated nodes.&lt;/p&gt;
&lt;p&gt;These are the parameters of the dirac_reg queue:&lt;/p&gt;
&lt;pre&gt;[stevecox@cvrsvc01 namd]$ qstat -Qf dirac_reg
Queue: dirac_reg
 queue_type = Execution
 Priority = 10
 max_user_queuable = 500
 total_jobs = 39
 state_count = Transit:0 Queued:4 Held:27 Waiting:0 Running:8 Exiting:0
 acl_user_enable = False
 &lt;strong&gt;resources_max.nodect = 12&lt;/strong&gt;
 &lt;strong&gt;resources_max.walltime = 06:00:00&lt;/strong&gt;
 resources_min.nodect = 1
 &lt;strong&gt;resources_default.walltime = 00:05:00&lt;/strong&gt;
 mtime = 1323823829
 resources_assigned.nodect = 34
 max_user_run = 2
 enabled = True
 started = True&lt;/pre&gt;
&lt;p&gt;So to test jobs, I ran qsub like this:&lt;/p&gt;
&lt;pre&gt;qsub -I -q dirac_reg -l walltime=06:00:00 -l nodes=4:ppn=8&lt;/pre&gt;
&lt;p&gt;The -I parameter tells qsub to start an interactive job. The walltime parameter overrides the very low default walltime. Finally, nodes tells PBS how many cluster nodes to use and ppn specifies the processes per node to start.&lt;/p&gt;
&lt;p&gt;After debugging, I ran the script like this:&lt;/p&gt;
&lt;pre&gt;qsub -q dirac_reg -l walltime=06:00:00 -l nodes=4:ppn=8 ./callnamd&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Results&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I did three runs with 2, 4, and 8 nodes. The interesting performance number for a NAMD run is days/ns, that is, days of computation time required per nanosecond of simulation.&lt;/p&gt;
&lt;p&gt;[stevecox@cvrsvc01 ~]$ grep -i days dev/dukechem/osg/namd/run.* | sed -e &quot;s,.txt,,&quot; -e &quot;s,.*run.,,&quot;&lt;br /&gt;
&lt;strong&gt;2way:Info: Initial time: 16 CPUs 0.0617085 s/step 0.357109 days/ns 91.0601 MB memory&lt;/strong&gt;&lt;br /&gt;
2way:Info: Initial time: 16 CPUs 0.0613538 s/step 0.355057 days/ns 94.0546 MB memory&lt;br /&gt;
2way:Info: Initial time: 16 CPUs 0.0619225 s/step 0.358348 days/ns 94.7324 MB memory&lt;br /&gt;
2way:Info: Benchmark time: 16 CPUs 0.0620334 s/step 0.35899 days/ns 94.8284 MB memory&lt;br /&gt;
2way:Info: Benchmark time: 16 CPUs 0.0621472 s/step 0.359648 days/ns 95.09 MB memory&lt;br /&gt;
2way:Info: Benchmark time: 16 CPUs 0.0620733 s/step 0.359221 days/ns 95.162 MB memory&lt;br /&gt;
&lt;strong&gt;4way:Info: Initial time: 32 CPUs 0.0472537 s/step 0.273459 days/ns 83.8981 MB memory&lt;/strong&gt;&lt;br /&gt;
4way:Info: Initial time: 32 CPUs 0.0470766 s/step 0.272434 days/ns 84.8605 MB memory&lt;br /&gt;
&lt;strong&gt;8way:Info: Initial time: 64 CPUs 0.0406125 s/step 0.235026 days/ns 81.0847 MB memory&lt;/strong&gt;&lt;br /&gt;
8way:Info: Initial time: 64 CPUs 0.0406405 s/step 0.235188 days/ns 82.1035 MB memory&lt;br /&gt;
8way:Info: Initial time: 64 CPUs 0.0407004 s/step 0.235534 days/ns 82.2474 MB memory&lt;br /&gt;
8way:Info: Benchmark time: 64 CPUs 0.0407453 s/step 0.235794 days/ns 82.3482 MB memory&lt;br /&gt;
8way:Info: Benchmark time: 64 CPUs 0.040858 s/step 0.236447 days/ns 82.3975 MB memory&lt;br /&gt;
8way:Info: Benchmark time: 64 CPUs 0.0406536 s/step 0.235264 days/ns 82.4038 MB memory&lt;/p&gt;
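&lt;p&gt;To make the relationship between the s/step and days/ns columns explicit, here is a small Python sketch. The 2 fs timestep is an assumption: it is not stated in the post, but it is the value at which the reported s/step figures reproduce the reported days/ns figures. The function names are hypothetical, and the efficiency measure (speedup divided by the increase in node count, relative to the 2-node run) is one common convention rather than anything NAMD reports.&lt;/p&gt;

```python
SECONDS_PER_DAY = 86400.0
FS_PER_NS = 1.0e6


def days_per_ns(seconds_per_step, timestep_fs=2.0):
    """Convert wall-clock seconds per step into days per simulated nanosecond.

    timestep_fs is an assumed value inferred from the numbers in the output.
    """
    steps_per_ns = FS_PER_NS / timestep_fs
    return seconds_per_step * steps_per_ns / SECONDS_PER_DAY


def relative_efficiency(base_nodes, base_days_per_ns, nodes, run_days_per_ns):
    """Speedup over the base run divided by the growth in node count."""
    speedup = base_days_per_ns / run_days_per_ns
    return speedup / (float(nodes) / base_nodes)


if __name__ == "__main__":
    # (nodes, s/step) from the first "Initial time" line of each run above.
    runs = [(2, 0.0617085), (4, 0.0472537), (8, 0.0406125)]
    base_nodes, base_step = runs[0]
    base = days_per_ns(base_step)
    for nodes, step in runs:
        d = days_per_ns(step)
        eff = relative_efficiency(base_nodes, base, nodes, d)
        print("%d nodes: %.6f days/ns, relative efficiency %.2f" % (nodes, d, eff))
```

&lt;p&gt;With these inputs the efficiency relative to the 2-node run works out to roughly 0.65 at 4 nodes and 0.38 at 8 nodes, so each doubling of nodes buys noticeably less than a doubling of throughput for this sample.&lt;/p&gt;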
&lt;p&gt;Here are some details of NERSC &lt;a title=&quot;NERSC Dirac's configuration&quot; href=&quot;http://www.nersc.gov/users/computational-systems/dirac/node-and-gpu-configuration/&quot; target=&quot;_blank&quot;&gt;Dirac&amp;#8217;s configuration&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;Dirac is a 50 GPU node cluster connected with QDR IB.  Each GPU node also contains 2 Intel 5530 2.4 GHz, 8MB cache, 5.86GT/sec QPI Quad core Nehalem processors (8 cores per node) and 24GB DDR3-1066 Reg ECC memory.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt; 44 nodes:  1 NVIDIA Tesla C2050 (code named Fermi) GPU with 3GB of memory and 448 parallel CUDA processor cores.&lt;/li&gt;
&lt;li&gt;4 nodes:  1 C1060 NVIDIA Tesla GPU with 4GB of memory and 240 parallel CUDA processor cores.&lt;/li&gt;
&lt;li&gt;1 node:  4 NVIDIA Tesla C2050 (Fermi) GPU&amp;#8217;s, each with 3GB of memory and 448 parallel CUDA processor cores.&lt;/li&gt;
&lt;li&gt;1 node:  4 C1060 Nvidia Tesla GPU&amp;#8217;s, each with 4GB of memory and 240 parallel CUDA processor cores.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here are results from earlier runs on a cluster with far fewer GPUs but a configuration in which accelerated nodes contain four Nvidia Teslas (like one of the Dirac nodes):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;4CPU: 0.998798 days/ns&lt;/li&gt;
&lt;li&gt;8CPU: 0.565848 days/ns&lt;/li&gt;
&lt;li&gt;And with the production sample at 8CPU: 0.288802 days/ns&lt;/li&gt;
&lt;/ul&gt;
&lt;div&gt;&lt;strong&gt;&lt;span&gt;&lt;span&gt;Conclusions &lt;/span&gt;&lt;/span&gt;&lt;/strong&gt;&lt;/div&gt;
&lt;div&gt;&lt;/div&gt;
&lt;div&gt;&lt;span&gt;&lt;span&gt;While these findings are preliminary, indications are that having four GPUs on a single node makes a substantial performance difference.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;
</description>
	<pubDate>Tue, 14 Feb 2012 15:20:50 +0000</pubDate>
</item>
<item>
	<title>OSG Technology Area Rumblings: Job Isolation in Condor</title>
	<guid>tag:blogger.com,1999:blog-8803173202887660937.post-864451098144380179</guid>
	<link>http://osgtech.blogspot.com/2012/02/job-isolation-in-condor.html</link>
	<description>I'd like to share a few exciting new features under construction for Condor 7.7.6 (or 7.9.0, as it may be).&lt;br /&gt;&lt;br /&gt;I've been working hard to improve the&lt;i&gt;&amp;nbsp;job isolation&lt;/i&gt;&amp;nbsp;techniques available in Condor. &amp;nbsp;My dictionary defines the verb &quot;to isolate&quot; as &quot;to be or remain alone or apart from others&quot;; when applied to the Condor context, we'd like to isolate each job from the others. &amp;nbsp;We'll define &lt;i&gt;process isolation&lt;/i&gt;&amp;nbsp;as&amp;nbsp;the inability of a process running in a batch job to interfere with a process not a part of the job. &amp;nbsp;Interfering with processes on Linux, loosely defined, means the sending of POSIX signals, taking control via the ptrace mechanism, or writing into the other process's memory.&lt;br /&gt;&lt;br /&gt;Process isolation is only one aspect of job isolation. &amp;nbsp;Job isolation also includes the inability to interfere with other jobs' files (&lt;i&gt;file isolation&lt;/i&gt;) and not being able to consume others' system resources such as CPU, memory, or disk (&lt;i&gt;resource isolation&lt;/i&gt;).&lt;br /&gt;&lt;br /&gt;In Condor, process isolation has historically been accomplished via one of two mechanisms:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;b&gt;Submitting user&lt;/b&gt;. &amp;nbsp;Jobs from Alice and Bob will be submitted as the unix users alice and bob, respectively. &amp;nbsp;In this model, the jobs running on the worker node will be run as users alice and bob, respectively. &amp;nbsp;The processes in the job running under user bob are protected from the processes in the job running as user alice via traditional POSIX security mechanisms.&lt;/li&gt;&lt;ul&gt;&lt;li&gt;This model makes the assumption that jobs submitted by the same user do not need isolation from each other. 
&amp;nbsp;In other words, there shouldn't be any shared user accounts!&lt;/li&gt;&lt;li&gt;This model also assumes the submit host and the worker node share a common user namespace. &amp;nbsp;This can be more difficult to accomplish than it sounds: if the submit host has thousands of unique users, we must make sure each functions on the worker node. &amp;nbsp;If the submit host is on a remote site with a different user namespace from the worker node, this may not be easily achievable!&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;&lt;b&gt;Per-slot users&lt;/b&gt;. &amp;nbsp;Each &quot;slot&quot; (roughly corresponding to a CPU) in Condor is assigned a unique unix user. &amp;nbsp;The job currently running in that slot is run under the associated username.&lt;/li&gt;&lt;ul&gt;&lt;li&gt;This solves the &quot;gotchas&quot; noted above with the submitting user isolation model.&lt;/li&gt;&lt;li&gt;This is difficult to accomplish in practice if the job wants to utilize a filesystem shared between the submit and worker nodes. &amp;nbsp;The filesystem security is based on two users having distinct Unix user names; in this model, there's no way to mark your files as only readable by your own jobs.&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;Notice both techniques rely on user isolation to accomplish process isolation. &amp;nbsp;Condor has an oft-overlooked &lt;a href=&quot;http://research.cs.wisc.edu/condor/manual/v7.7/3_6Security.html#SECTION004613200000000000000&quot;&gt;third mode&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;b&gt;Mapping remote users to nobody&lt;/b&gt;. 
&amp;nbsp;In this mode, local users (where the site admin can define the meaning of &quot;local&quot;) get mapped to the submit host usernames, but non-local users all get mapped to user &lt;i&gt;nobody&lt;/i&gt; - the traditional unprivileged user on Linux.&lt;/li&gt;&lt;ul&gt;&lt;li&gt;Local users can access all their files, but remote users only get access to the batch resources - no shared file systems.&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;div&gt;Unfortunately, this is not a very secure mode as, according to the manual, the &lt;i&gt;nobody&lt;/i&gt; account&amp;nbsp;&quot;...&lt;span&gt;&amp;nbsp;may also be used by other Condor jobs running on the same machine, if it is a multi-processor machine&quot;; not very handy advice in an age where your cell phone likely is a multi-processor machine!&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;This third mode is particularly attractive to us - we can avoid filesystem issues for our local users, but no longer have to create the thousands of accounts in our LDAP database for remote users. &amp;nbsp;However, since jobs from remote users run under the same unix user account, the traditional security mechanism of user separation does not apply - we need a new technique!&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Enter &lt;a href=&quot;http://lwn.net/Articles/259217/&quot;&gt;PID namespaces&lt;/a&gt;, a new separation technique introduced in kernel 2.6.24. &amp;nbsp;By passing an &lt;a href=&quot;http://www.kernel.org/doc/man-pages/online/pages/man2/clone.2.html&quot;&gt;additional flag&lt;/a&gt; when creating a new process, the kernel will assign an additional process ID (PID) to the child process. &amp;nbsp;The child will believe itself to be PID 1 (that is, when the child calls getpid(), it returns 1), while the processes in the parent's namespace will see a different PID. 
&amp;nbsp;The child will be able to spawn additional processes - all will be stuck in the same inner namespace - that similarly have an inner PID different from the outer one.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Processes within the namespace can only see and interfere (send signals, ptrace, etc) with other processes inside the namespace. &amp;nbsp;By launching the new job in its own PID namespace, Condor can achieve process isolation without user isolation: the job processes are isolated from all other processes on the system.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Perhaps the best way to visualize the impact of PID namespaces in the job is to examine the output of &lt;b&gt;ps&lt;/b&gt;:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;pre&gt;[bbockelm@localhost condor]$ condor_run ps faux&lt;br /&gt;USER &amp;nbsp; &amp;nbsp; &amp;nbsp; PID %CPU %MEM &amp;nbsp; &amp;nbsp;VSZ &amp;nbsp; RSS TTY &amp;nbsp; &amp;nbsp; &amp;nbsp;STAT START &amp;nbsp; TIME COMMAND&lt;br /&gt;bbockelm &amp;nbsp; &amp;nbsp; 1 &amp;nbsp;0.0 &amp;nbsp;0.0 114132 &amp;nbsp;1236 ? &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;SNs &amp;nbsp;11:42 &amp;nbsp; 0:00 /bin/bash /home/bbockelm/.condor_run.3672&lt;br /&gt;bbockelm &amp;nbsp; &amp;nbsp; 2 &amp;nbsp;0.0 &amp;nbsp;0.0 115660 &amp;nbsp;1080 ? &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;RN &amp;nbsp; 11:42 &amp;nbsp; 0:00 ps faux&lt;br /&gt;&lt;/pre&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;Only two processes can be seen from within the job - the shell executing the job script and &quot;ps&quot; itself.&lt;br /&gt;&lt;br /&gt;Releasing a PID namespaces-enabled Condor is an ongoing effort:&amp;nbsp;&lt;a href=&quot;https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=1959&quot;&gt;https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=1959&lt;/a&gt;; I've recently re-designed the patch to be far less intrusive on the Condor internals by switching from the glibc clone() call to the clone syscall. 
&amp;nbsp;I am hopeful it will land in the 7.7.6 / 7.9.0 timeframe.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;From a process-isolation point of view, with this patch it is now safe to run jobs as user &quot;nobody&quot; or to re-introduce the idea of shared &quot;group accounts&quot;. &amp;nbsp;For example, we could map all CMS users to a single &quot;cmsuser&quot; account without having to worry about these becoming a vector for virus infection.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;However, the story of &lt;i&gt;job isolation&lt;/i&gt;&amp;nbsp;does not end with PID namespaces. &amp;nbsp;Stay tuned to find out how we are tackling file and resource isolation!&lt;/div&gt;</description>
	<pubDate>Tue, 14 Feb 2012 09:46:01 +0000</pubDate>
	<author>noreply@blogger.com (Brian Bockelman)</author>
</item>
<item>
	<title>An Open Science Grid Work Log: Virtual Machines for OSG</title>
	<guid>http://osglog.wordpress.com/?p=711</guid>
	<link>http://osglog.wordpress.com/2012/02/09/virtual-machines-for-osg/</link>
	<description>&lt;p&gt;&lt;strong&gt;Overview&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The ability to run a virtual machine with a self-contained computing environment has major advantages. Users can&lt;/p&gt;
&lt;ul&gt;
&lt;ul&gt;
&lt;li&gt;Choose the operating system that&amp;#8217;s best for the application&lt;/li&gt;
&lt;li&gt;Execute programs that require elevated privileges&lt;/li&gt;
&lt;li&gt;Install any software they need&lt;/li&gt;
&lt;li&gt;Dynamically configure machine attributes like the number of cores to suit the host environment&lt;/li&gt;
&lt;/ul&gt;
&lt;/ul&gt;
&lt;p&gt;The Engage VO is beginning to see researchers new to OSG whose default mode of operation is to spin up a VM on EC2. They quickly get used to having complete control of the computing environment.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Context&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This capability has been explored for the Open Science Grid before. Clemson built &lt;a title=&quot;Clemson Kestrel&quot; href=&quot;http://etd.lib.clemson.edu/documents/1306872836/Stout_clemson_0050M_11177.pdf&quot; target=&quot;_blank&quot;&gt;Kestrel&lt;/a&gt; which supports KVM based virtualization with an XMPP communication architecture. Then &lt;a title=&quot;STAR on Kestrel&quot; href=&quot;http://iopscience.iop.org/1742-6596/331/6/062016/pdf/1742-6596_331_6_062016.pdf&quot; target=&quot;_blank&quot;&gt;STAR&lt;/a&gt; used Kestrel with great success. Clemson now also provides &lt;a title=&quot;Clemson OneCloud&quot; href=&quot;https://sites.google.com/site/cuonecloud/&quot; target=&quot;_blank&quot;&gt;OneCloud&lt;/a&gt; based on OpenNebula.&lt;/p&gt;
&lt;p&gt;There&amp;#8217;s also been &lt;a title=&quot;Condor on VMs&quot; href=&quot;http://osg-docdb.opensciencegrid.org/cgi-bin/ShowDocument?docid=1082&quot; target=&quot;_blank&quot;&gt;work&lt;/a&gt; by Brian Bockelman, Derek Weitzel and others to configure virtual machines running Condor to join the submit host&amp;#8217;s pool. Infrastructure background for that work and lots of great information is available at the team&amp;#8217;s &lt;a title=&quot;OSG Blog - Bockelman et al&quot; href=&quot;http://osgtech.blogspot.com/2011/08/creating-vm-for-openstack.html&quot; target=&quot;_blank&quot;&gt;blog&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Recently, I&amp;#8217;ve had new Engage users who are heavy users of virtualization. As mentioned before, they tend to assume control over the environment. This background can make the need to specially prepare executables for the OSG by static compilation and other packaging seem onerous. Many Engage users, it should be added, have input and output file sizes in the low number of gigabytes and are not familiar with High Throughput Computing or a command line approach to virtualization.&lt;/p&gt;
&lt;p&gt;They asked if it was possible to run virtual machines on the OSG so I set out to look for an approach that would allow researchers to&lt;/p&gt;
&lt;ul&gt;
&lt;ul&gt;
&lt;ul&gt;
&lt;li&gt;Create virtual machines on their desktops using simple graphical tools&lt;/li&gt;
&lt;li&gt;Deploy virtual machines onto the OSG&lt;/li&gt;
&lt;li&gt;Transfer input files to and from the virtual machine&lt;/li&gt;
&lt;li&gt;Avoid complex interactions with HTC plumbing like configuring X.509 certs, Condor, etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;/ul&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;span&gt;Virtualization on RENCI-Blueberry&lt;/span&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;Since virtualization is not currently a supported technology on the OSG, step one is to create a small area in which we change that. RENCI&amp;#8217;s Blueberry is a new cluster made of older machines that we&amp;#8217;ve recently brought online. It&amp;#8217;s a ROCKS, CentOS 5.7, Torque/PBS cluster with a small number of virtualization-capable nodes.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;Here&amp;#8217;s an overview of changes we made to the cluster&lt;/span&gt;&lt;span&gt;:&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;First, we installed these packages on the virtualization capable nodes:&lt;br /&gt;
&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;ul&gt;
&lt;ul&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span&gt;&lt;strong&gt;&lt;a title=&quot;Libvirt&quot; href=&quot;http://libvirt.org/&quot; target=&quot;_blank&quot;&gt;Libvirt&lt;/a&gt;&lt;/strong&gt;&lt;/span&gt;: A library of low level capabilities supporting virtualization.&lt;/li&gt;
&lt;li&gt;&lt;span&gt;&lt;strong&gt;&lt;a title=&quot;QEMU&quot; href=&quot;http://wiki.qemu.org/Main_Page&quot; target=&quot;_blank&quot;&gt;QEMU&lt;/a&gt;&lt;/strong&gt;&lt;/span&gt;: Virtual machine emulation layer&lt;/li&gt;
&lt;li&gt;&lt;span&gt;&lt;strong&gt;&lt;a title=&quot;KVM&quot; href=&quot;http://www.linux-kvm.org/page/Main_Page&quot; target=&quot;_blank&quot;&gt;KVM&lt;/a&gt;&lt;/strong&gt;&lt;/span&gt;: A virtualization kernel module&lt;/li&gt;
&lt;li&gt;&lt;span&gt;&lt;strong&gt;&lt;a title=&quot;XMLStarlet&quot; href=&quot;http://xmlstar.sourceforge.net/&quot; target=&quot;_blank&quot;&gt;XMLStarlet&lt;/a&gt;&lt;/strong&gt;&lt;/span&gt;: A command line XSLT engine&lt;/li&gt;
&lt;/ul&gt;
&lt;/ul&gt;
&lt;/ul&gt;
&lt;/ul&gt;
&lt;p&gt;We configured QEMU to &lt;a title=&quot;QEMU user configuration&quot; href=&quot;http://libvirt.org/drvqemu.html&quot; target=&quot;_blank&quot;&gt;allow&lt;/a&gt; the engage user to execute virsh.&lt;/p&gt;
&lt;div&gt;&lt;span&gt;Then we created a Torque/PBS queue called virt grouping the upgraded machines. &lt;/span&gt;&lt;/div&gt;
&lt;div&gt;&lt;/div&gt;
&lt;div&gt;&lt;span&gt;A new GlideinWMS group was added on the Engage VO frontend. Jobs deploying virtual machines are decorated with the following modifications:&lt;/span&gt;&lt;/div&gt;
&lt;div&gt;&lt;/div&gt;
&lt;div&gt;
&lt;ul&gt;
&lt;ul&gt;
&lt;ul&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Job Requirements&lt;/strong&gt;: &amp;amp;&amp;amp; (CAN_RUN_VIRTUAL_MACHINE == TRUE)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Job Attributes&lt;/strong&gt;: +RequiresVirtualization=TRUE&lt;/li&gt;
&lt;/ul&gt;
&lt;/ul&gt;
&lt;/ul&gt;
&lt;/ul&gt;
&lt;p&gt;A new resource was added to the GlideinWMS factory with RSL pointing at the virt queue on RENCI-Blueberry.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Creating and Running a VM&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;Virtual machines were created using &lt;a title=&quot;Virt Manager&quot; href=&quot;http://virt-manager.org/&quot; target=&quot;_blank&quot;&gt;virt-manager&lt;/a&gt;, the Virtual Machine Manager. It&amp;#8217;s a graphical application providing a wizard-like interface for creating and managing VMs.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;We used the command line virsh tool to export an XML description of the running virtual machine. Then the XML description and the disk image (the large file containing all of the VM&amp;#8217;s data) were moved to the Engage submit host.&lt;/p&gt;
&lt;p&gt;The OSG job was designed to&lt;/p&gt;
&lt;ul&gt;
&lt;ul&gt;
&lt;ul&gt;
&lt;li&gt;Download the virtual machine&amp;#8217;s XML description&lt;/li&gt;
&lt;li&gt;Determine the number of CPUs on the machine&lt;/li&gt;
&lt;li&gt;Modify the XML description to specify
&lt;ul&gt;
&lt;li&gt;The appropriate number of CPUs&lt;/li&gt;
&lt;li&gt;The correct location for the VM image file&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Download the virtual machine&lt;/li&gt;
&lt;li&gt;Execute the virtual machine&lt;/li&gt;
&lt;/ul&gt;
&lt;/ul&gt;
&lt;/ul&gt;
&lt;p&gt;This works. Jobs configured to run in the GlideinWMS virt group on the Engage submit node map to glideins on RENCI-Blueberry. There, the jobs download the XML config and the image, make the needed edits and spawn the virtual machine.&lt;/p&gt;
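&lt;p&gt;The XML-editing step in the job can be sketched roughly as follows. This is a minimal illustration, not the script that ran on RENCI-Blueberry: it assumes Python and its standard xml.etree.ElementTree module in place of the XMLStarlet tool installed above, and the domain name, image paths and the helper names build_sample_domain and retarget_domain are all hypothetical. The sample description is built in code only to keep the example self-contained; a real job would parse the file exported by virsh.&lt;/p&gt;
&lt;pre&gt;
```python
import os
import xml.etree.ElementTree as ET


def build_sample_domain():
    """Build a tiny stand-in for a libvirt domain description (hypothetical values)."""
    domain = ET.Element("domain", {"type": "kvm"})
    ET.SubElement(domain, "name").text = "engage-vm"
    ET.SubElement(domain, "vcpu").text = "1"
    devices = ET.SubElement(domain, "devices")
    disk = ET.SubElement(devices, "disk", {"type": "file", "device": "disk"})
    ET.SubElement(disk, "source", {"file": "/var/lib/libvirt/images/engage-vm.img"})
    return domain


def retarget_domain(domain, cpu_count, image_path):
    """Set the vCPU count and point the first disk device at image_path."""
    vcpu = domain.find("vcpu")
    if vcpu is None:
        vcpu = ET.SubElement(domain, "vcpu")
    vcpu.text = str(cpu_count)
    source = domain.find("./devices/disk[@device='disk']/source")
    if source is None:
        raise ValueError("no disk device with a source element was found")
    source.set("file", image_path)
    return domain


if __name__ == "__main__":
    # Use the number of CPUs on the worker node and a local copy of the image.
    domain = retarget_domain(build_sample_domain(), os.cpu_count() or 1, "/tmp/engage-vm.img")
    print(ET.tostring(domain, encoding="unicode"))
```
&lt;/pre&gt;
&lt;p&gt;The edited description would then be written to disk and handed to virsh to start the guest.&lt;/p&gt;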
&lt;p&gt;&lt;strong&gt;Getting Work to the Virtual Machine&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Now, if you&amp;#8217;ve tried to do this kind of thing before, you realize this is where things get tricky.&lt;/p&gt;
&lt;p&gt;When the virtual machine launches, it has no idea what to do. This is part of the reason that some previous approaches put Condor on the machine. That way, it can join an existing Condor pool and has all the good things that Condor brings us in terms of file transfer, matching and so on. But getting credentials into the virtual machine securely to allow it to join the Engage pool is tricky. If you know how to do that, please leave a comment.&lt;/p&gt;
&lt;p&gt;&lt;span&gt;&lt;strong&gt;Alternatives &amp;#8230; and OS Versions&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Now, in principle, there are two other ways to do this that would work fine. If we can get files onto and off of the machine, it would be OK to transfer them into and out of the worker node the old fashioned way &amp;#8211; globus-url-copy. So here are two other mechanisms for file exchange between a host and a guest:&lt;/p&gt;
&lt;p&gt;&lt;span&gt;Shared Host/Guest Filesystem&lt;/span&gt;: More recent versions of Libvirt/QEMU/KVM support sharing filesystems between the host and guest. In this model, the guest&amp;#8217;s XML description can specify a directory on the host that should be mounted within the guest. But, as I mentioned, the RENCI-Blueberry cluster runs CentOS 5.7. As such, only a significantly older version of the virtualization stack is supported. We discussed upgrading to a newer version but that would prevent this solution from being generally reproducible on OSG.&lt;/p&gt;
&lt;p&gt;&lt;span&gt;Guestfish&lt;/span&gt;: Next, there&amp;#8217;s libguestfs and the associated interactive shell guestfish. Guestfish lets you mount a disk image in user space. That is, there&amp;#8217;s no need to use root privileges. It also has convenient wrapper scripts for copying a file into and out of an image. But, again, it requires a version of CentOS significantly newer than 5.7.&lt;/p&gt;
&lt;p&gt;From this angle, it looks like VMs on OSG could be an everyday occurrence if it were not for very low OS version numbers.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;File Sharing REST API&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Before giving the approach up for dead, I decided to try something off the beaten path.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Beanstalk&lt;/strong&gt;: I installed beanstalk on the submit node. Beanstalk is a very simple work queue that speaks a plain-text protocol over TCP. You can put messages on the queue and get them off. You can name queues, which it refers to, weirdly, as tubes. Beanstalk has no notion of authentication, so that&amp;#8217;s not great.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Beanstalkc&lt;/strong&gt;: This is the Python client for Beanstalk.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Box.net&lt;/strong&gt;: One of many file sharing sites with a REST API.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Workflow&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A Box.net authentication script negotiates a token with Box.net, mostly from the command line using wget and curl.&lt;/p&gt;
&lt;p&gt;Then, Box.net URLs are published into the Engage event queue.&lt;/p&gt;
&lt;p&gt;When virtual machines run, they install and run the boot script.&lt;/p&gt;
&lt;p&gt;It installs the Beanstalk client and reads a single item from the event queue which it downloads from Box.net and processes. Queues are appropriately named so that different users and jobs never collide.&lt;/p&gt;
&lt;p&gt;Finally, it converts the download URL to an upload URL and publishes the results of the run via the file sharing API.&lt;/p&gt;
&lt;p&gt;So at the end of my run of three VMs on RENCI-Blueberry, there were three files waiting for me at Box.net.&lt;/p&gt;
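&lt;p&gt;The queue-naming idea in the workflow above can be sketched as follows. This is only an illustration under stated assumptions: the &quot;engage&quot; prefix, the tube_name helper and the InMemoryQueues class are hypothetical, the in-memory class merely stands in for the real queue so the example runs without a server, and the set of characters kept in a name and the 200-character cap are assumptions modelled on what beanstalkd documents for tube names.&lt;/p&gt;
&lt;pre&gt;
```python
import re
from collections import defaultdict, deque

# Anything outside this set is replaced with an underscore. The set is an
# assumption; dots are excluded so they only ever separate the name parts.
_UNSAFE = re.compile(r"[^A-Za-z0-9_+-]")


def tube_name(user, job_id):
    """Return a per-user, per-job queue name such as 'engage.alice.42'."""
    parts = ["engage", _UNSAFE.sub("_", user), _UNSAFE.sub("_", str(job_id))]
    return ".".join(parts)[:200]


class InMemoryQueues:
    """Hypothetical stand-in for the event queue: named FIFO queues in a dict."""

    def __init__(self):
        self._tubes = defaultdict(deque)

    def put(self, tube, message):
        """Append a message to the named tube."""
        self._tubes[tube].append(message)

    def reserve(self, tube):
        """Pop the oldest message from the tube, or return None if it is empty."""
        queue = self._tubes[tube]
        return queue.popleft() if queue else None


if __name__ == "__main__":
    queues = InMemoryQueues()
    queues.put(tube_name("alice", 1), "https://example.invalid/input-1")
    queues.put(tube_name("bob", 1), "https://example.invalid/input-2")
    # Each VM reads only from the tube named after its own user and job.
    print(queues.reserve(tube_name("alice", 1)))
```
&lt;/pre&gt;
&lt;p&gt;Replacing characters can still map two different user names onto the same tube, so a real deployment would want stricter rules for user names than this sketch applies.&lt;/p&gt;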
&lt;p&gt;&lt;strong&gt;Conclusions&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I invite comments on how others have secured communication to a VM on OSG. I&amp;#8217;d love to hear.&lt;/p&gt;
&lt;p&gt;In particular, as mentioned above, I&amp;#8217;d love to hear how others have gotten X.509 credentials onto a VM in this environment.&lt;/p&gt;
&lt;p&gt;Anyone else running VMs on the OSG?&lt;/p&gt;
&lt;p&gt;
&lt;/p&gt;&lt;/div&gt;
</description>
	<pubDate>Fri, 10 Feb 2012 17:36:22 +0000</pubDate>
</item>
<item>
	<title>Condor Project News: Red Hat announces the release of Red Hat Enterprise MRG 2.1, (February 10, 2012)</title>
	<guid>http://www.redhat.com/about/news/press-archive/2012/1/Red-Hat-Updates-Messaging-Realtime-and-Grid-Platform</guid>
	<link>http://www.redhat.com/about/news/press-archive/2012/1/Red-Hat-Updates-Messaging-Realtime-and-Grid-Platform</link>
	<description>offering increased performance, reliability, interoperability, as presented in this news release.</description>
	<pubDate>Fri, 10 Feb 2012 05:00:00 +0000</pubDate>
</item>
<item>
	<title>Derek's Blog: Fedora 16 on OpenStack</title>
	<guid>tag:blogger.com,1999:blog-3007054864987759910.post-1661685797417079292</guid>
	<link>http://derekweitzel.blogspot.com/2012/02/fedora-16-on-openstack.html</link>
	<description>After following Brian's guide on installing &lt;a href=&quot;http://osgtech.blogspot.com/2011/08/creating-vm-for-openstack.html&quot;&gt;Fedora 15 on OpenStack&lt;/a&gt;, I thought I would try my hand at Fedora 16. &amp;nbsp;There were a few differences.&lt;br /&gt;&lt;br /&gt;&lt;span&gt;&lt;b&gt;Filesystem Differences&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;Brian's guide installed Fedora using LVM. &amp;nbsp;I installed Fedora without LVM (there's a little checkbox on the partition page of Anaconda). &amp;nbsp;Without LVM, I can skip the steps on listing the physical volumes and logical volumes to find the start and end of the partition.&lt;br /&gt;&lt;br /&gt;Also, Fedora 16 uses a GPT partition table. &amp;nbsp;The fdisk command cannot read it, so I had to install gdisk (from EPEL). &amp;nbsp;The command and its output are very similar to fdisk's:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;$ /usr/sbin/gdisk -l /tmp/fedora16&lt;br /&gt;GPT fdisk (gdisk) version 0.8.1&lt;br /&gt;&lt;br /&gt;Partition table scan:&lt;br /&gt;  MBR: protective&lt;br /&gt;  BSD: not present&lt;br /&gt;  APM: not present&lt;br /&gt;  GPT: present&lt;br /&gt;&lt;br /&gt;Found valid GPT with protective MBR; using GPT.&lt;br /&gt;Disk /tmp/fedora16: 20971520 sectors, 10.0 GiB&lt;br /&gt;Logical sector size: 512 bytes&lt;br /&gt;Disk identifier (GUID): A351197B-8233-4811-9B28-69A1DE121AD2&lt;br /&gt;Partition table holds up to 128 entries&lt;br /&gt;First usable sector is 34, last usable sector is 20971486&lt;br /&gt;Partitions will be aligned on 2048-sector boundaries&lt;br /&gt;Total free space is 4029 sectors (2.0 MiB)&lt;br /&gt;&lt;br /&gt;Number  Start (sector)    End (sector)  Size       Code  Name&lt;br /&gt;   1            2048            4095   1024.0 KiB  EF02  &lt;br /&gt;   2            4096         1028095   500.0 MiB   EF00  ext4&lt;br /&gt;   3         1028096        16777215   7.5 GiB     0700  &lt;br /&gt;   4        16777216        20969471   2.0 GiB     8200  &lt;br 
/&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Then, to extract the image (the end sector reported by gdisk is inclusive, hence the +1 in the count):&lt;br /&gt;&lt;pre&gt;dd if=/tmp/fedora16 of=/tmp/server-extract.img skip=1028096 count=$((16777215-1028096+1)) bs=512&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;span&gt;&lt;b&gt;SSH Key Differences&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;Brian's guide instructed you to create an /etc/rc.local. &amp;nbsp;Fedora 16 sees the introduction of systemd, which no longer executes rc.local. &amp;nbsp;Instead, it looks for the file /etc/rc.d/rc.local (possibly a symlink to /etc/rc.local?). &amp;nbsp;This file needs to be executable, and be sure to include the shebang.&lt;br /&gt;&lt;br /&gt;Also, Fedora 16's SELinux doesn't label the root file system correctly (&lt;a href=&quot;https://bugzilla.redhat.com/show_bug.cgi?id=499343&quot;&gt;BUG&lt;/a&gt;), and simply making the .ssh directory does not allow sshd to read it. &amp;nbsp;To solve the SELinux problem, I disabled SELinux (bad, bad me).&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;&lt;b&gt;Common Commands&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;After installing Fedora 16 into an image, and extracting the kernel and ramdisk, there were a few commands that were executed over and over as I debugged the image:&lt;br /&gt;&lt;br /&gt;Make the changes to the image: &lt;br /&gt;&lt;pre&gt;sudo /usr/libexec/qemu-kvm -m 2048 -drive file=/tmp/fedora16 -net nic -net user -vnc 127.0.0.1:0 -cpu qemu64 -M rhel5.6.0 -smp 2 -daemonize&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Extract the partition: &lt;br /&gt;&lt;pre&gt;dd if=/tmp/fedora16 of=/tmp/server-extract.img skip=1028096 count=$((16777215-1028096+1)) bs=512&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Start the VM to change the label on the image:&lt;br /&gt;&lt;pre&gt;sudo /usr/libexec/qemu-kvm -m 2048 -drive file=/tmp/fedora16 -net nic -net user -vnc 127.0.0.1:0 -cpu qemu64 -M rhel5.6.0 -smp 2 -daemonize -drive file=/tmp/server-extract.img &lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Rename the image to something appropriate:&lt;br /&gt;&lt;pre&gt;mv /tmp/server-extract.img /tmp/fedora16-extracted.img&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Bundle the image for OpenStack:&lt;br /&gt;&lt;pre&gt;euca-bundle-image --kernel aki-0000002e --ramdisk ari-0000002f -i /tmp/fedora16-extracted.img -r x86_64&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Upload the image to OpenStack:&lt;br /&gt;&lt;pre&gt;euca-upload-bundle -b derek-bucket -m /tmp/fedora16-extracted.img.manifest.xm&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Register the image (this command completes fast, but OpenStack takes forever to decrypt and untar the image):&lt;br /&gt;&lt;pre&gt;euca-register derek-bucket/fedora16-extracted.img.manifest.xml&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Now to build OSG packages for Fedora... &amp;nbsp;maybe not.</description>
	<pubDate>Tue, 07 Feb 2012 13:10:18 +0000</pubDate>
	<author>noreply@blogger.com (Derek Weitzel)</author>
</item>
<item>
	<title>Condor Project News: Upgrades recommended (February 06, 2012)</title>
	<guid>http://research.cs.wisc.edu/condor/security</guid>
	<link>http://research.cs.wisc.edu/condor/security</link>
	<description>Please upgrade to the latest versions of the stable and development series.
There have been many important bug fixes in both, including a fix for a   
possible Denial Of Service attack from trusted users.
(Details here)</description>
	<pubDate>Mon, 06 Feb 2012 05:00:00 +0000</pubDate>
</item>
<item>
	<title>Spinning: Pool utilization</title>
	<guid>http://spinningmatt.wordpress.com/?p=603</guid>
	<link>http://spinningmatt.wordpress.com/2012/01/31/pool-utilization/</link>
	<description>&lt;p&gt;Here is a &lt;a href=&quot;https://github.com/mattf/condor_pool_tools/blob/master/utilization.sh&quot;&gt;utilization script&lt;/a&gt; for a Condor pool.&lt;/p&gt;
&lt;p&gt;&lt;pre class=&quot;brush: plain; gutter: false;&quot;&gt;
$ ./utilization.sh
       Unavailable Available    Total     Used:  Avail   Total
Slots         5968      5451    11419     4179  76.66%  36.59%
Cpus          6314      5903    12217     4631  78.45%  37.90%
Memory    14277325  11776800 26054125  9908190  84.13%  38.02%
&lt;/pre&gt;&lt;/p&gt;
&lt;p&gt;And, if you know your workload will not run on slots with less than 1 GB of memory, you can filter out slots that are too small:&lt;/p&gt;
&lt;p&gt;&lt;pre class=&quot;brush: plain; gutter: false;&quot;&gt;
$ ./utilization.sh 'Memory &amp;lt; 1024'
       Unavailable Available    Total     Used:  Avail   Total
Slots         6292      5127    11419     4177  81.47%  36.57%
Cpus          6638      5579    12217     4629  82.97%  37.88%
Memory    14592711  11461414 26054125  9904193  86.41%  38.01%
&lt;/pre&gt;&lt;/p&gt;
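&lt;p&gt;The two percentage columns in the tables above appear to be Used divided by Available and Used divided by Total, where Total is Unavailable plus Available. The following is a minimal sketch of that arithmetic in Python, not the linked script itself; the utilization function name is hypothetical, and the rows are copied from the first table. The script output looks truncated rather than rounded, so the last digit printed here can differ by one.&lt;/p&gt;
&lt;p&gt;&lt;pre class=&quot;brush: plain; gutter: false;&quot;&gt;
```python
def utilization(available, total, used):
    """Return (percent of available, percent of total) for one resource row."""
    return 100.0 * used / available, 100.0 * used / total


if __name__ == "__main__":
    # (Available, Total, Used) taken from the first table above.
    rows = {
        "Slots": (5451, 11419, 4179),
        "Cpus": (5903, 12217, 4631),
        "Memory": (11776800, 26054125, 9908190),
    }
    for name, (available, total, used) in rows.items():
        of_avail, of_total = utilization(available, total, used)
        print(f"{name:8s} {of_avail:6.2f}% {of_total:6.2f}%")
```
&lt;/pre&gt;&lt;/p&gt;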
&lt;p&gt;Remember, if an attribute is not on all slots you need to use the meta-comparison operators: &lt;code&gt;=?=&lt;/code&gt; and &lt;code&gt;=!=&lt;/code&gt;, e.g. &lt;code&gt;'MyCustomAttr =!= True'&lt;/code&gt;.&lt;/p&gt;
</description>
	<pubDate>Tue, 31 Jan 2012 02:06:18 +0000</pubDate>
</item>
<item>
	<title>OSG Technology Area Rumblings: openstack - update</title>
	<guid>tag:blogger.com,1999:blog-8803173202887660937.post-1284219792325627410</guid>
	<link>http://osgtech.blogspot.com/2012/01/openstack-update.html</link>
	<description>Last time I was able to deploy an image. The next step would be to list it and then run it, but I have hit problems.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;To list images I run the command:&lt;br /&gt;&lt;br /&gt; euca-describe-images&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;which hangs and, after a long time, exits with the message &quot;connection reset by peer&quot;.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;I have disabled iptables to eliminate firewall issues. No help.&lt;br /&gt;&lt;br /&gt;All the manuals assume that euca-describe-images will simply run and give no instructions on what to do if it does not.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Following Josh's advice I ran:&lt;br /&gt;&lt;br /&gt; strace -o edi_output -f -ff euca-describe-images&lt;br /&gt;&lt;br /&gt;and then I looked into the output files. It seems that there might be two problems:&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Some euca2ools files are missing - in particular the .eucarc configuration file.&lt;/li&gt;&lt;li&gt;There are messages about missing Python files, for example &quot;open(&quot;/usr/lib64/python2.6/site-packages/gtk-2.0/org.so&quot;, O_RDONLY) = -1 ENOENT (No such file or directory)&quot; (there are many more like that).&lt;/li&gt;&lt;/ol&gt;So it seems that the euca2ools installation described in previous posts may not be complete and is missing some key files. Or Python (which we already know had to be patched) is not OK. Or both.&lt;br /&gt;&lt;br /&gt;That's all I know for now.</description>
	<pubDate>Fri, 27 Jan 2012 10:36:26 +0000</pubDate>
	<author>noreply@blogger.com (TomW)</author>
</item>
<item>
	<title>Derek's Blog: Testing an Globus Free OSG-Software (From EPEL(-testing))</title>
	<guid>tag:blogger.com,1999:blog-3007054864987759910.post-5346198107817714937</guid>
	<link>http://derekweitzel.blogspot.com/2012/01/testing-globus-free-osg-software-from.html</link>
	<description>As you may or may not know, there is a massive &lt;a href=&quot;https://admin.fedoraproject.org/updates/FEDORA-EPEL-2012-0083/gsi-openssh-4.3p2-4.el5,globus-authz-callout-error-2.1-1.el5,globus-authz-2.1-1.el5,globus-callout-2.1-1.el5,globus-common-14.5-1.el5,globus-core-8.5-1.el5,globus-ftp-client-7.2-1.el5,globus-ftp-control-4.2-1.el5,globus-gass-cache-8.1-1.el5,globus-gass-cache-program-5.0-1.el5,globus-gass-copy-8.2-1.el5,globus-gass-server-ez-4.1-1.el5,globus-gass-transfer-7.1-1.el5,globus-gatekeeper-9.6-1.el5,globus-gfork-3.1-1.el5,globus-gram-client-12.3-1.el5,globus-gram-client-tools-10.0-1.el5,globus-gram-job-manager-callout-error-2.1-1.el5,globus-gram-job-manager-13.14-1.el5,globus-gram-job-manager-scripts-4.2-1.el5,globus-gram-protocol-11.2-1.el5,globus-gridftp-server-control-2.3-1.el5,globus-gridftp-server-6.5-1.el5,globus-gridmap-callout-error-1.2-1.el5,globus-gsi-callback-4.1-1.el5,globus-gsi-cert-utils-8.1-1.el5,globus-gsi-credential-5.1-1.el5,globus-gsi-openssl-error-2.1-1.el5,globus-gsi-proxy-core-6.1-1.el5,globus-gsi-proxy-ssl-4.1-1.el5,globus-gsi-sysconfig-5.1-1.el5,globus-gssapi-error-4.1-1.el5,globus-gssapi-gsi-10.2-1.el5,globus-gss-assist-8.1-1.el5,globus-io-9.2-1.el5,globus-openssl-module-3.1-1.el5,globus-proxy-utils-5.0-1.el5,globus-rls-client-5.2-6.el5,globus-rls-server-4.9-9.el5,globus-rsl-9.1-1.el5,globus-scheduler-event-generator-4.4-1.el5,globus-usage-3.1-1.el5,globus-xio-3.2-1.el5,globus-xio-gsi-driver-2.1-1.el5,globus-xio-pipe-driver-2.1-1.el5,globus-xio-popen-driver-2.2-1.el5,grid-packaging-tools-3.5-1.el5&quot;&gt;globus update pending&lt;/a&gt;&amp;nbsp;in EPEL that will update globus to the version the OSG distributes. 
&amp;nbsp;What this means is much less work for the osg-software team, since we will not have to build and support our own builds of globus.&lt;br /&gt;&lt;br /&gt;Testing the globus from EPEL while installing some packages from the osg repos is not a trivial matter. It takes two steps:&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Disable the priority of the OSG repo.&lt;/li&gt;&lt;li&gt;Exclude globus and related packages that are already in EPEL from the osg repo.&lt;/li&gt;&lt;/ol&gt;Below is my final file /etc/yum.repos.d/osg.repo &lt;script src=&quot;https://gist.github.com/1673263.js&quot;&gt;  &lt;/script&gt;&lt;br /&gt;&lt;br /&gt;Notice the many excludes in the file; the list may not be complete.&lt;br /&gt;&lt;br /&gt;Installation is just:&lt;br /&gt;&lt;pre&gt;yum install osg-client-condor --enablerepo=epel-testing&lt;/pre&gt;&lt;br /&gt;Update:&lt;br /&gt;&lt;b&gt;&lt;span&gt;Testing Results&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;Very good!&lt;br /&gt;&lt;br /&gt;I ran three tests, all completely successful. &lt;br /&gt;&lt;b&gt;1. globus-job-run against an RPM CE.&lt;/b&gt;&lt;br /&gt;&lt;pre&gt;$ globus-job-run pf-grid.unl.edu/jobmanager-fork /bin/sh -c &quot;id&quot;&lt;br /&gt;uid=1761(hcc) gid=4001(grid) groups=4001(grid)&lt;/pre&gt;&lt;b&gt;2. Condor-G submission&lt;/b&gt;&lt;br /&gt;Condor-G submission worked without problems. &amp;nbsp;The submission file is below:&lt;br /&gt;&lt;script src=&quot;https://gist.github.com/1674271.js&quot;&gt;  &lt;/script&gt; &lt;b&gt;3. And globus-url-copy worked:&lt;/b&gt;&lt;br /&gt;&lt;pre&gt;$ globus-url-copy gsiftp://pf-grid.unl.edu/etc/hosts ./hosts&lt;/pre&gt;&lt;br /&gt;</description>
	<pubDate>Tue, 24 Jan 2012 20:45:43 +0000</pubDate>
	<author>noreply@blogger.com (Derek Weitzel)</author>
</item>
<item>
	<title>Spinning: EC2, VNC and Fedora</title>
	<guid>http://spinningmatt.wordpress.com/?p=587</guid>
	<link>http://spinningmatt.wordpress.com/2012/01/24/ec2-vnc-and-fedora/</link>
	<description>&lt;p&gt;If you have ever wondered about running a desktop session in EC2, here is one way to set it up and some pointers.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;First&lt;/b&gt;, start an instance; my preferred way is via &lt;a href=&quot;http://spinningmatt.wordpress.com/2011/10/31/getting-started-condor-and-ec2-starting-and-managing-instances/&quot;&gt;Condor&lt;/a&gt;. I used ami-60bd4609 on an m1.small, providing a basic Fedora 15 server. Make sure the instance&amp;#8217;s security group has port 22 (ssh) open.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Second&lt;/b&gt;, install a desktop environment, e.g. &lt;code&gt;yum groupinstall 'GNOME Desktop Environment'&lt;/code&gt;. This is 467 packages and will take about 18 minutes.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Third&lt;/b&gt;, install and set up a VNC server: &lt;code&gt;yum install vnc-server ; vncpasswd ; vncserver :1&lt;/code&gt;. This produces a running desktop that a vncviewer can connect to.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Finally&lt;/b&gt;, connect via an SSH secured VNC session.&lt;br /&gt;
&lt;pre class=&quot;brush: plain; gutter: false;&quot;&gt;
VNC_VIA_CMD='/usr/bin/ssh -i KEYPAIR.pem -l ec2-user -f -L &amp;quot;$L&amp;quot;:&amp;quot;$H&amp;quot;:&amp;quot;$R&amp;quot; &amp;quot;$G&amp;quot; sleep 20' vncviewer localhost:1 -via INSTANCE_ADDRESS
&lt;/pre&gt;&lt;br /&gt;
What&amp;#8217;s going on here? &lt;code&gt;vncviewer&lt;/code&gt; allows for a proxy host when connecting to the vncserver. That is the &lt;code&gt;-via&lt;/code&gt; argument. The &lt;code&gt;VNC_VIA_CMD&lt;/code&gt; is an environment variable that specifies the command used to connect to the proxy. Here it is modified to provide the keypair needed to access the instance, and the user ec2-user, which is the default user on Fedora AMIs. The &lt;code&gt;INSTANCE_ADDRESS&lt;/code&gt; is the &lt;code&gt;Hostname&lt;/code&gt; from &lt;a href=&quot;http://spinningmatt.wordpress.com/2011/11/02/getting-started-condor-and-ec2-condor_ec2_q-tool/&quot;&gt;condor_ec2_q&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Alternatively, &lt;code&gt;ssh-add KEYPAIR.pem&lt;/code&gt; followed by &lt;code&gt;vncviewer localhost:1 -via ec2-user@INSTANCE_ADDRESS&lt;/code&gt;. However, be careful if you have many keys stored in your ssh-agent. They will all be tried and the remote sshd may reject your connection before the proper keypair is found.&lt;/p&gt;
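&lt;p&gt;If you want to script the first variant, a minimal Python sketch of assembling the same environment variable and viewer arguments might look like the following. The function name &lt;code&gt;build_vnc_via&lt;/code&gt; and its parameters are made up for illustration, and the sketch only builds the strings; launching &lt;code&gt;vncviewer&lt;/code&gt; is left to you.&lt;/p&gt;
&lt;pre class=&quot;brush: python; gutter: false;&quot;&gt;
```python
import shlex


def build_vnc_via(keypair, user, instance, display=1):
    """Return (env, argv) for an SSH-tunnelled vncviewer session.

    Hypothetical helper: it mirrors the one-liner above and runs nothing.
    """
    # The "$L", "$H", "$R" and "$G" placeholders are copied verbatim from
    # the command above; they are left for vncviewer to fill in.
    via_cmd = (
        "/usr/bin/ssh -i {key} -l {user} -f "
        '-L "$L":"$H":"$R" "$G" sleep 20'
    ).format(key=shlex.quote(keypair), user=shlex.quote(user))
    env = {"VNC_VIA_CMD": via_cmd}
    argv = ["vncviewer", "localhost:{}".format(display), "-via", instance]
    return env, argv


if __name__ == "__main__":
    env, argv = build_vnc_via("KEYPAIR.pem", "ec2-user", "INSTANCE_ADDRESS")
    # Print the equivalent shell command line for inspection.
    print("VNC_VIA_CMD=" + shlex.quote(env["VNC_VIA_CMD"]), " ".join(argv))
```
&lt;/pre&gt;
&lt;p&gt;Merging the returned &lt;code&gt;env&lt;/code&gt; into &lt;code&gt;os.environ&lt;/code&gt; and handing &lt;code&gt;argv&lt;/code&gt; to &lt;code&gt;subprocess.call&lt;/code&gt; should then behave like the one-liner, assuming your vncviewer honors &lt;code&gt;VNC_VIA_CMD&lt;/code&gt; the same way.&lt;/p&gt;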
&lt;p&gt;&lt;b&gt;Tips&lt;/b&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
It takes about 20 minutes from start to vncviewer. Once the instance is set up, consider &lt;a href=&quot;http://docs.amazonwebservices.com/AWSEC2/latest/UserGuide/creating-an-ami.html&quot;&gt;creating your own AMI&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;
Set a password for ec2-user, otherwise the screensaver will lock you out. Use &lt;code&gt;sudo passwd ec2-user&lt;/code&gt;.
&lt;/li&gt;
&lt;li&gt;
Remember AWS charges for data transmitted out of the instance, as well as the uptime of the instance, see &lt;a href=&quot;http://aws.amazon.com/ec2/pricing/&quot;&gt;EC2 Pricing&lt;/a&gt;. To estimate total cost, work out how much bandwidth your workflow uses on average. For me, a half hour of browsing &lt;a href=&quot;http://planet.fedoraproject.org/&quot;&gt;Planet Fedora&lt;/a&gt;, editing with emacs, and compiling some code, transmitted about 60MB of data. That measurement is the difference in eth0&amp;#8217;s &amp;#8220;TX bytes&amp;#8221; as reported by ifconfig. This is not a perfect estimate because there may have been data transferred within EC2, which is not charged.
&lt;/li&gt;
&lt;li&gt;
For transmit rates, consider running &lt;a href=&quot;http://www.gropp.org/?id=projects&amp;amp;sub=bwm-ng&quot;&gt;bwm-ng&lt;/a&gt; to see what actions use the most bandwidth.
&lt;/li&gt;
&lt;li&gt;
Generally, make the screen update as little as possible. Constantly changing graphics on web pages can run 60-120KB/s. Compare that to a text console and emacs producing a TX rate closer to 5-25KB/s.
&lt;/li&gt;
&lt;li&gt;
Cover consoles that are running compilations so their scrolling output is not redrawn, or compile in a low verbosity mode.
&lt;/li&gt;
&lt;/ul&gt;
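&lt;p&gt;To turn the bandwidth tip into a number, here is a rough Python sketch of the arithmetic. The function name is made up, and both prices in the example are placeholders rather than real AWS rates; take current ones from the EC2 Pricing page. The 120MB per hour figure is simply my 60MB per half hour measurement doubled.&lt;/p&gt;
&lt;pre class=&quot;brush: python; gutter: false;&quot;&gt;
```python
def estimate_cost(hours, tx_mb_per_hour, usd_per_gb_out, usd_per_instance_hour):
    """Rough session cost: outbound data charge plus instance uptime charge."""
    data_gb = hours * tx_mb_per_hour / 1024.0
    return data_gb * usd_per_gb_out + hours * usd_per_instance_hour


if __name__ == "__main__":
    # 60MB in half an hour is about 120MB per hour.
    # Both prices below are placeholders, not actual AWS rates.
    total = estimate_cost(hours=8, tx_mb_per_hour=120,
                          usd_per_gb_out=0.10, usd_per_instance_hour=0.10)
    print("estimated cost: ${:.2f}".format(total))
```
&lt;/pre&gt;
&lt;p&gt;Like the measurement itself, this overestimates if part of the traffic stayed inside EC2.&lt;/p&gt;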
</description>
	<pubDate>Tue, 24 Jan 2012 15:27:00 +0000</pubDate>
</item>
<item>
	<title>Derek's Blog: Initial EL6 Packages for OSG</title>
	<guid>tag:blogger.com,1999:blog-3007054864987759910.post-5908364533788003344</guid>
	<link>http://derekweitzel.blogspot.com/2012/01/initial-el6-packages-for-osg.html</link>
	<description>Last night I completed initial packages for EL6 support. &amp;nbsp;Just like for EL5, the first OSG component I created is the osg-wn-client.&lt;br /&gt;&lt;br /&gt;The osg-wn-client has a complicated &lt;a href=&quot;http://dl.dropbox.com/u/114765/osg-wn-client-deps.svg&quot;&gt;dependency tree&lt;/a&gt;. &amp;nbsp;Easily some of the most difficult packages were from glite.&lt;br /&gt;&lt;br /&gt;Just some quick tidbits that made the transition easier:&lt;br /&gt;&lt;br /&gt;&lt;span&gt;&lt;b&gt;UUID Differences&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;uuid.h and the associated library are used by many applications. &amp;nbsp;In el5, uuid is provided by the e2fsprogs package. &amp;nbsp;In el6, it has its own package, libuuid. &amp;nbsp;It was common for me to copy this tidbit into a few packages:&lt;script src=&quot;https://gist.github.com/1649259.js&quot;&gt;  &lt;/script&gt; &lt;br /&gt;&lt;div&gt;&lt;span&gt;&lt;b&gt;gsoap Differences&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;glite-fts-client and glite-data-delegation-api-c both use gsoap. &amp;nbsp;In the past, it was common to copy stdsoap2.c from the gsoap distribution and compile that into your program. &amp;nbsp;Now that gsoap is a regular library though, programs should link against the system's version. &amp;nbsp;In order to do this, I had to add patches to the Makefiles for both packages to link against the system's gsoap.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;&lt;b&gt;What's next? &amp;nbsp;&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;The next step is the &lt;a href=&quot;http://dl.dropbox.com/u/114765/osg-client-deps.svg&quot;&gt;osg-client&lt;/a&gt;. 
&amp;nbsp;Since there are no more glite packages for the osg-client, this step should be easier.&lt;/div&gt;</description>
	<pubDate>Fri, 20 Jan 2012 14:58:38 +0000</pubDate>
	<author>noreply@blogger.com (Derek Weitzel)</author>
</item>
<item>
	<title>Spinning: Manage inventory with Wallaby</title>
	<guid>http://spinningmatt.wordpress.com/?p=569</guid>
	<link>http://spinningmatt.wordpress.com/2012/01/16/manage-inventory-with-wallaby/</link>
	<description>&lt;p&gt;&lt;a href=&quot;http://getwallaby.com&quot;&gt;Wallaby&lt;/a&gt; will manage your configuration, as well as an inventory of your machines. It can differentiate between machines that are expected to be present and those that opportunistically appear.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Build the roster with &lt;code&gt;wallaby add-node&lt;/code&gt; -&lt;/b&gt;&lt;br /&gt;
&lt;pre class=&quot;brush: plain; gutter: false;&quot;&gt;
$ wallaby add-node node0.local node1.local node2.local
Adding the following node: node0.local
Console Connection Established...
Adding the following node: node1.local
Adding the following node: node2.local
$ for i in $(seq 3 10); do wallaby add-node node$i.local; done
Adding the following node: node3.local
Console Connection Established...
Adding the following node: node4.local
Console Connection Established...
...
&lt;/pre&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;List expected nodes (provisioned) -&lt;/b&gt;&lt;br /&gt;
&lt;pre class=&quot;brush: plain; gutter: false;&quot;&gt;
$ wallaby inventory
Console Connection Established...
P        Node name                 Last checkin
-        ---------                 ------------
+      node0.local Wed Jan 11 07:32:33 -0500 20
+      node1.local Thu Jan 05 12:15:00 -0500 20
+     node10.local Wed Jan 11 07:31:56 -0500 20
+      node2.local Wed Jan 11 07:31:56 -0500 20
+      node3.local Wed Jan 11 07:15:21 -0500 20
+      node4.local Wed Jan 11 07:31:42 -0500 20
+      node5.local Wed Jan 11 07:16:47 -0500 20
+      node6.local                        never
+      node7.local Wed Jan 11 07:32:33 -0500 20
+      node8.local Wed Jan 11 07:32:33 -0500 20
+      node9.local Wed Jan 11 07:30:47 -0500 20
-      robin.local Thu Dec 15 14:11:35 -0500 20
-      woods.local Tue Jan 10 20:33:47 -0500 20
&lt;/pre&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;List opportunistic, bonus nodes (unprovisioned) -&lt;/b&gt;&lt;br /&gt;
&lt;pre class=&quot;brush: plain; gutter: false;&quot;&gt;
$ wallaby inventory -o unprovisioned
Console Connection Established...
P        Node name                 Last checkin
-        ---------                 ------------
-      robin.local Thu Dec 15 14:11:35 -0500 20
-      woods.local Tue Jan 10 20:33:47 -0500 20
&lt;/pre&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Provisioned nodes that have never checked in, maybe setup failed -&lt;/b&gt;&lt;br /&gt;
&lt;pre class=&quot;brush: plain; gutter: false;&quot;&gt;
$ wallaby inventory -c 'last_checkin == 0 &amp;amp;&amp;amp; provisioned'
Console Connection Established...
P        Node name                 Last checkin
-        ---------                 ------------
+      node6.local                        never
&lt;/pre&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Provisioned nodes that have not checked in for the past 4 hours, maybe the machine is down -&lt;/b&gt;&lt;br /&gt;
&lt;pre class=&quot;brush: plain; gutter: false;&quot;&gt;
$ wallaby inventory -c 'last_checkin &amp;gt; 0 &amp;amp;&amp;amp; last_checkin &amp;lt; 4.hours_ago &amp;amp;&amp;amp; provisioned'
Console Connection Established...
P        Node name                 Last checkin
-        ---------                 ------------
+      node1.local Thu Jan 05 12:15:00 -0500 20
&lt;/pre&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Unprovisioned nodes that have not checked in for 48 hours, candidates for &lt;code&gt;wallaby remove-node&lt;/code&gt; -&lt;/b&gt;&lt;br /&gt;
&lt;pre class=&quot;brush: plain; gutter: false;&quot;&gt;
$ wallaby inventory -c 'last_checkin &amp;lt; 48.hours_ago &amp;amp;&amp;amp; !provisioned'
Console Connection Established...
P        Node name                 Last checkin
-        ---------                 ------------
-      robin.local Thu Dec 15 14:11:35 -0500 20 
&lt;/pre&gt;&lt;/p&gt;
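&lt;p&gt;If you would rather post-process the listing than write a constraint, here is a small Python sketch that parses the text output shown above. It assumes the column layout stays exactly as in these examples, and the function names are hypothetical, not part of Wallaby.&lt;/p&gt;
&lt;pre class=&quot;brush: python; gutter: false;&quot;&gt;
```python
def parse_inventory(text):
    """Parse the text printed by wallaby inventory into a list of dicts."""
    nodes = []
    for line in text.splitlines():
        parts = line.split(None, 2)
        # Node rows start with "+" (provisioned) or "-" (unprovisioned).
        if len(parts) != 3 or parts[0] not in ("+", "-"):
            continue
        # Skip the dashed rule printed under the column headers.
        if not parts[1].strip("-"):
            continue
        nodes.append({
            "provisioned": parts[0] == "+",
            "name": parts[1],
            "last_checkin": parts[2].strip(),
        })
    return nodes


def never_checked_in(nodes):
    """Names of provisioned nodes whose last checkin is shown as never."""
    return [n["name"] for n in nodes
            if n["provisioned"] and n["last_checkin"] == "never"]


if __name__ == "__main__":
    sample = "\n".join([
        "Console Connection Established...",
        "P        Node name                 Last checkin",
        "-        ---------                 ------------",
        "+      node5.local Wed Jan 11 07:16:47 -0500 20",
        "+      node6.local                        never",
        "-      robin.local Thu Dec 15 14:11:35 -0500 20",
    ])
    print(never_checked_in(parse_inventory(sample)))
```
&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;-c&lt;/code&gt; constraints are still the better tool when they can express what you need; parsing is only a fallback for reports.&lt;/p&gt;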
&lt;p&gt;Enjoy.&lt;/p&gt;
</description>
	<pubDate>Sun, 15 Jan 2012 17:49:53 +0000</pubDate>
</item>
<item>
	<title>OSG Technology Area Rumblings: How to register an image in openstack</title>
	<guid>tag:blogger.com,1999:blog-8803173202887660937.post-6608486540631647769</guid>
	<link>http://osgtech.blogspot.com/2012/01/how-to-register-image-in-openstack.html</link>
	<description>After having installed and configured the worker and controller nodes of the openstack testbed we would like to upload images into it.&lt;br /&gt;&lt;br /&gt;First I downloaded some images to /root/images on the controller node. One is from Xin and another one is a minimal image for testing I got from the net. I have no idea what they are worth.&lt;br /&gt; &lt;br /&gt; &lt;br /&gt;   Then I tried to follow the instructions&lt;br /&gt; &lt;br /&gt;&lt;a class=&quot;moz-txt-link-freetext&quot; href=&quot;http://docs.openstack.org/cactus/openstack-compute/admin/content/part-ii-getting-virtual-machines.html&quot;&gt;http://docs.openstack.org/cactus/openstack-compute/admin/content/part-ii-getting-virtual-machines.html&lt;/a&gt;&lt;br /&gt; &lt;br /&gt;   which go like this:&lt;br /&gt;  &lt;br /&gt;    &lt;pre class=&quot;literallayout&quot;&gt;&lt;a id=&quot;d1542e1756&quot;&gt;image=&quot;ubuntu1010-UEC-localuser-image.tar.gz&quot;&lt;br /&gt;wget http://c0179148.cdn1.cloudfiles.rackspacecloud.com/ubuntu1010-UEC-localuser-image.tar.gz&lt;br /&gt;uec-publish-tarball $image [bucket-name] [hardware-arch]&lt;/a&gt;&lt;/pre&gt;   &lt;br /&gt; &lt;br /&gt;   and I could not find where the&lt;br /&gt;   &lt;pre class=&quot;literallayout&quot;&gt;&lt;a id=&quot;d1542e1756&quot;&gt;uec-publish-tarball&lt;/a&gt;&lt;/pre&gt;   &lt;br /&gt;   command comes from. 
Finally I realized that it comes from Ubuntu and that the manual became Ubuntu specific without saying so explicitly.&lt;br /&gt; &lt;br /&gt; &lt;br /&gt;   So I tried a different approach.&lt;br /&gt;&lt;br /&gt;   &lt;span&gt;cd /root/images&lt;/span&gt;&lt;br /&gt;   &lt;br /&gt;&lt;span&gt;     glance add name=&quot;My Image&quot; &amp;lt; sl61-kvm.tar.bz2 # the image I got     from Xin&lt;/span&gt;&lt;br /&gt; &lt;br /&gt;   The command responded that the image got Id=1, which is a good sign.&lt;br /&gt; &lt;br /&gt;   Then I did:&lt;br /&gt; &lt;br /&gt;   &lt;span&gt;glance show 1&lt;/span&gt;&lt;br /&gt; &lt;br /&gt;   and got:&lt;br /&gt; &lt;br /&gt;   &lt;span&gt;URI: &lt;/span&gt;&lt;a class=&quot;moz-txt-link-freetext&quot; href=&quot;http://0.0.0.0/images/1&quot;&gt;http://0.0.0.0/images/1&lt;/a&gt;&lt;br /&gt;&lt;span&gt;     Id: 1&lt;/span&gt;&lt;br /&gt;&lt;span&gt;     Public: No&lt;/span&gt;&lt;br /&gt;&lt;span&gt;     Name: My Image&lt;/span&gt;&lt;br /&gt;&lt;span&gt;     Size: 199737477&lt;/span&gt;&lt;br /&gt;&lt;span&gt;     Location: &lt;/span&gt;&lt;a class=&quot;moz-txt-link-freetext&quot;&gt;file:///var/lib/glance/images/1&lt;/a&gt;&lt;br /&gt;&lt;span&gt;     Disk format: raw&lt;/span&gt;&lt;br /&gt;&lt;span&gt;     Container format: ovf&lt;/span&gt;&lt;br /&gt; &lt;br /&gt;   Which suggests that the file is in the system. 
But when I tried:&lt;br /&gt; &lt;br /&gt;   &lt;span&gt;glance index&lt;/span&gt;&lt;br /&gt; &lt;br /&gt;   it said:&lt;br /&gt; &lt;br /&gt;   &lt;span&gt;no public images found&lt;/span&gt;&lt;br /&gt; &lt;br /&gt;   So I tried to register it again:&lt;br /&gt; &lt;br /&gt;   &lt;span&gt; glance add name=&quot;My Image&quot; is_public=true  &amp;lt; sl61-kvm.tar.bz2&lt;/span&gt;&lt;br /&gt;&lt;span&gt;     Added new image with ID: 2&lt;/span&gt;&lt;br /&gt; &lt;br /&gt;   I tried to list:&lt;br /&gt; &lt;br /&gt;   &lt;span&gt;glance index&lt;/span&gt;&lt;br /&gt;&lt;span&gt;     Found 1 public images...&lt;/span&gt;&lt;br /&gt;&lt;span&gt;     ID               Name                           Disk Format              Container Format     Size          &lt;/span&gt;&lt;br /&gt;&lt;span&gt;     ---------------- ------------------------------ --------------------     -------------------- --------------&lt;/span&gt;&lt;br /&gt;&lt;span&gt;     2                My Image                       raw                      ovf                       199737477&lt;/span&gt;&lt;br /&gt; &lt;br /&gt;   So it seems we have uploaded an image to the system.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Now I have to figure out how to run it.</description>
	<pubDate>Tue, 10 Jan 2012 08:53:04 +0000</pubDate>
	<author>noreply@blogger.com (TomW)</author>
</item>
<item>
	<title>OSG Technology Area Rumblings: How to configure worker node - part 2</title>
	<guid>tag:blogger.com,1999:blog-8803173202887660937.post-3018228315676012597</guid>
	<link>http://osgtech.blogspot.com/2012/01/how-to-configure-worker-node-part-2.html</link>
	<description>&lt;pre class=&quot;literallayout&quot;&gt;&lt;a id=&quot;d1542e564&quot;&gt;&lt;span&gt;&lt;span&gt;&lt;span&gt;Compute node configuration - continued&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;We execute the following commands:&lt;br /&gt;&lt;br /&gt;This command is supposed to synchronize the database:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;/usr/bin/nova-manage db sync &lt;/span&gt;&lt;br /&gt;&lt;span&gt;&lt;span&gt;&lt;span&gt;Now we have to create users and projects. We call both the user and the project &quot;nova&quot;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span&gt;/usr/bin/nova-manage user admin nova&lt;/span&gt;&lt;br /&gt;&lt;span&gt;/usr/bin/nova-manage project create nova nova &lt;/span&gt;&lt;br /&gt;&lt;span&gt;/usr/bin/nova-manage network create 192.168.0.0/24 1 256&lt;br /&gt;&lt;br /&gt;We check that users and projects were created correctly:&lt;br /&gt;&lt;br /&gt;&lt;span&gt;/usr/bin/nova-manage project list&lt;/span&gt;&lt;br /&gt;&lt;span&gt;nova&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;/usr/bin/nova-manage user list&lt;br /&gt;nova&lt;br /&gt;&lt;br /&gt;&lt;span&gt;&lt;span&gt;Create Certifications&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span&gt;On the controller node execute&lt;/span&gt;&lt;span&gt;&lt;span&gt;&lt;span&gt;&lt;span&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/a&gt;&lt;a id=&quot;d1542e564&quot;&gt;&lt;span&gt;&lt;span&gt;&lt;span&gt;&lt;span&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/a&gt;&lt;a id=&quot;d1542e575&quot;&gt;mkdir -p /root/creds&lt;br /&gt;&lt;br /&gt;/usr/bin/python /usr/bin/nova-manage project zipfile nova nova /root/creds/novacreds.zip&lt;/a&gt;&lt;a id=&quot;d1542e564&quot;&gt;&lt;br /&gt;&lt;br /&gt;If you encounter a python error, then apply the python patch described a few posts earlier.&lt;br /&gt;&lt;br /&gt;Create /root/creds on the compute node and copy the&lt;br 
/&gt;&lt;/a&gt;&lt;a id=&quot;d1542e575&quot;&gt;novacreds.zip&lt;/a&gt;&lt;a id=&quot;d1542e564&quot;&gt; file there. Then unpack it&lt;br /&gt;&lt;br /&gt;&lt;/a&gt;&lt;a id=&quot;d1542e575&quot;&gt;unzip /root/creds/novacreds.zip -d /root/creds/&lt;br /&gt;&lt;br /&gt;A few files will appear, among them&lt;br /&gt;&lt;/a&gt;&lt;a id=&quot;d1542e575&quot;&gt;/root/creds/novarc . This file needs to be appended to .bashrc, but there is a catch:&lt;br /&gt;first line of the file has an error and has to be replaced:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Original line:&lt;br /&gt;&lt;br /&gt;NOVA_KEY_DIR=$(pushd $(dirname $BASH_SOURCE)&amp;gt;/dev/null; pwd; popd&amp;gt;/dev/null)&lt;br /&gt;&lt;br /&gt;has to be replaced with&lt;br /&gt;&lt;br /&gt;NOVA_KEY_DIR=~/creds&lt;br /&gt;&lt;br /&gt;The content of novarc file now is&lt;br /&gt;&lt;br /&gt;NOVA_KEY_DIR=~/creds&lt;br /&gt;&lt;br /&gt;export EC2_ACCESS_KEY=&quot;XXXXXXXXXXXXXXXXXXXXXXXX:nova&quot;&lt;br /&gt;export EC2_SECRET_KEY=&quot;XXXXXXXXXXXXXXXXXXXXXXXX&quot;&lt;br /&gt;export EC2_URL=&quot;http://130.199.148.53:8773/services/Cloud&quot;&lt;br /&gt;export S3_URL=&quot;http://130.199.148.53:3333&quot;&lt;br /&gt;export EC2_USER_ID=42 # nova does not use user id, but bundling requires it&lt;br /&gt;export EC2_PRIVATE_KEY=${NOVA_KEY_DIR}/pk.pem&lt;br /&gt;export EC2_CERT=${NOVA_KEY_DIR}/cert.pem&lt;br /&gt;export NOVA_CERT=${NOVA_KEY_DIR}/cacert.pem&lt;br /&gt;export EUCALYPTUS_CERT=${NOVA_CERT} # euca-bundle-image seems to require this set&lt;br /&gt;alias ec2-bundle-image=&quot;ec2-bundle-image --cert ${EC2_CERT} --privatekey ${EC2_PRIVATE_KEY} --user 42 --ec2cert ${NOVA_CERT}&quot;&lt;br /&gt;alias ec2-upload-bundle=&quot;ec2-upload-bundle -a ${EC2_ACCESS_KEY} -s ${EC2_SECRET_KEY} --url ${S3_URL} --ec2cert ${NOVA_CERT}&quot;&lt;br /&gt;export NOVA_API_KEY=&quot;XXXXXXXXXXXXXXXXXXXXXXXXXXX&quot;&lt;br /&gt;export NOVA_USERNAME=&quot;nova&quot;&lt;br /&gt;export 
NOVA_URL=&quot;http://130.199.148.53:8774/v1.0/&quot;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Where &quot;XXXX..&quot; strings denote keys which I do not post here, for security.&lt;br /&gt;&lt;br /&gt;The content of the novarc file should now be added to .bashrc:&lt;br /&gt;&lt;br /&gt;&lt;/a&gt;&lt;a id=&quot;d1542e575&quot;&gt;cat /root/creds/novarc &amp;gt;&amp;gt; ~/.bashrc&lt;br /&gt;source ~/.bashrc&lt;br /&gt;&lt;br /&gt;This should be done on both the compute and controller nodes.&lt;br /&gt;&lt;br /&gt;&lt;span&gt;&lt;span&gt;Enable access to worker node&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/a&gt;First unset a proxy and then do:&lt;br /&gt;&lt;br /&gt;&lt;a id=&quot;d1542e583&quot;&gt;euca-authorize -P icmp -t -1:-1 default&lt;br /&gt;euca-authorize -P tcp -p 22 default&lt;/a&gt;&lt;a id=&quot;d1542e564&quot;&gt;&lt;br /&gt;&lt;span&gt;&lt;br /&gt;&lt;/span&gt;&lt;/a&gt;&lt;/pre&gt;</description>
	<pubDate>Fri, 06 Jan 2012 09:37:49 +0000</pubDate>
	<author>noreply@blogger.com (TomW)</author>
</item>
<item>
	<title>An Open Science Grid Work Log: Protected: Grayson  Science Workflow on the Hybrid Grid</title>
	<guid>http://osglog.wordpress.com/?p=642</guid>
	<link>http://osglog.wordpress.com/2011/08/10/grayson-science-workflow-on-the-hybrid-grid/</link>
	<description>&lt;p&gt;This post is password protected. You must visit the website and enter the password to continue reading.&lt;/p&gt;
</description>
	<pubDate>Thu, 05 Jan 2012 16:49:51 +0000</pubDate>
</item>
<item>
	<title>OSG Technology Area Rumblings: How to configure worker node</title>
	<guid>tag:blogger.com,1999:blog-8803173202887660937.post-6934297547717816467</guid>
	<link>http://osgtech.blogspot.com/2012/01/how-to-configure-worker-node.html</link>
	<description>In the following I will describe how to configure the worker node. I assume that the worker node has already been installed following the instructions posted on this blog.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;First of all, before we start, we still need to add nova-network (it has not been installed so far).&lt;br /&gt;&lt;br /&gt;Do:&lt;br /&gt;&lt;br /&gt;&lt;span&gt;yum install openstack-nova-network&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Once this is done, we can go on and edit the &lt;span&gt;/etc/nova/nova.conf&lt;/span&gt; file.&lt;br /&gt;&lt;br /&gt;First, add to the file the option&lt;br /&gt;&lt;br /&gt;&lt;pre class=&quot;literallayout&quot;&gt;&lt;a id=&quot;d1542e508&quot;&gt;--daemonize=1 &lt;/a&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;The relevant switches are:&lt;br /&gt;&lt;br /&gt;&lt;span&gt;--sql_connection&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--s3_host&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--rabbit_host&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--ec2_api&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--ec2_url&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--fixed_range&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--network_size&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In the end the configuration file should look like:&lt;br /&gt;&lt;br /&gt;&lt;span&gt;--auth_driver=nova.auth.dbdriver.DbDriver&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--buckets_path=/var/lib/nova/buckets&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--ca_path=/var/lib/nova/CA&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--cc_host=&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--credentials_template=/usr/share/nova/novarc.template&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--daemonize=1&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--dhcpbridge_flagfile=/etc/nova/nova.conf&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--dhcpbridge=/usr/bin/nova-dhcpbridge&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--ec2_api=130.199.148.53&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--ec2_url=http://130.199.148.53:8773/services/Cloud&lt;/span&gt;&lt;br 
/&gt;&lt;span&gt;--fixed_range=192.168.0.0/16&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--glance_host=&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--glance_port=9292&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--image_service=nova.image.glance.GlanceImageService&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--images_path=/var/lib/nova/images&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--injected_network_template=/usr/share/nova/interfaces.rhel.template&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--instances_path=/var/lib/nova/instances&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--keys_path=/var/lib/nova/keys&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--libvirt_type=kvm&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--libvirt_xml_template=/usr/share/nova/libvirt.xml.template&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--lock_path=/var/lib/nova/tmp&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--logdir=/var/log/nova&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--logging_context_format_string=%(asctime)s %(name)s: %(levelname)s [%(request_id)s %(user)s %(project)s] %(message)s&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--logging_debug_format_suffix=&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--logging_default_format_string=%(asctime)s %(name)s: %(message)s&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--network_manager=nova.network.manager.VlanManager&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--networks_path=/var/lib/nova/networks&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--network_size=8&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--node_availability_zone=nova&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--rabbit_host=130.199.148.53&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--routing_source_ip=130.199.148.53&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--s3_host=130.199.148.53&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--scheduler_driver=nova.scheduler.zone.ZoneScheduler&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--sql_connection=mysql://{USER}:{PWD}@130.199.148.53/{DATABASE}&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--state_path=/var/lib/nova&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--use_cow_images=true&lt;/span&gt;&lt;br 
/&gt;&lt;span&gt;--use_ipv6=false&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--use_s3=true&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--use_syslog=false&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--verbose=false&lt;/span&gt;&lt;br /&gt;&lt;span&gt;--vpn_client_template=/usr/share/nova/client.ovpn.template&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;where {USER}, {PWD} and {DATABASE} denote the nova database user, password and database name.&lt;br /&gt;&lt;br /&gt;Now go to the controller node and open the following ports for incoming connections: 3333,3306,5672,8773,8000.&lt;br /&gt;&lt;br /&gt;Go back to the worker node and prepare a &lt;span&gt;/root/bin/openstack-init.sh&lt;/span&gt; script with the following content:&lt;br /&gt;&lt;br /&gt;&lt;span&gt;#!/bin/bash&lt;/span&gt;&lt;br /&gt;&lt;span&gt;for n in ajax-console-proxy compute vncproxy network; do&lt;/span&gt;&lt;br /&gt;&lt;span&gt;    service openstack-nova-$n &quot;$@&quot;;&lt;/span&gt;&lt;br /&gt;&lt;span&gt;done&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Then run&lt;br /&gt;&lt;br /&gt;/&lt;span&gt;root/bin/openstack-init.sh stop&lt;/span&gt;&lt;br /&gt;&lt;span&gt;Stopping OpenStack Nova Web-based serial console proxy:    [  OK  ]&lt;/span&gt;&lt;br /&gt;&lt;span&gt;Stopping OpenStack Nova Compute Worker:                    [  OK  ]&lt;/span&gt;&lt;br /&gt;&lt;span&gt;Stopping OpenStack Nova VNC Proxy:                         [  OK  ]&lt;/span&gt;&lt;br /&gt;&lt;span&gt;Stopping OpenStack Nova Network Controller:                [  OK  ]&lt;/span&gt;&lt;br /&gt;&lt;span&gt;[root@gridreserve30 compute]# /root/bin/openstack-init.sh start&lt;/span&gt;&lt;br /&gt;&lt;span&gt;Starting OpenStack Nova Web-based serial console proxy:    [  OK  ]&lt;/span&gt;&lt;br /&gt;&lt;span&gt;Starting OpenStack Nova Compute Worker:                    [  OK  ]&lt;/span&gt;&lt;br /&gt;&lt;span&gt;Starting OpenStack Nova VNC Proxy:                         [  OK  ]&lt;/span&gt;&lt;br /&gt;&lt;span&gt;Starting OpenStack Nova Network Controller:                [  OK  
]&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;to be continued...&lt;div class=&quot;blogger-post-footer&quot;&gt;&lt;img width=&quot;1&quot; height=&quot;1&quot; src=&quot;https://blogger.googleusercontent.com/tracker/8803173202887660937-6934297547717816467?l=osgtech.blogspot.com&quot; alt=&quot;&quot; /&gt;&lt;/div&gt;</description>
	<pubDate>Thu, 05 Jan 2012 13:09:38 +0000</pubDate>
	<author>noreply@blogger.com (TomW)</author>
</item>
<item>
	<title>OSG Technology Area Rumblings: What's the hold-up?</title>
	<guid>tag:blogger.com,1999:blog-8803173202887660937.post-2894650495289033883</guid>
	<link>http://osgtech.blogspot.com/2011/12/whats-hold-up.html</link>
	<description>Do you have the following diagram memorized?&lt;br /&gt;&lt;div class=&quot;separator&quot;&gt;&lt;a href=&quot;http://1.bp.blogspot.com/-HXJdhJQRwaU/Tv0HaKXnHsI/AAAAAAAAAeo/36XqVjRgO2I/s1600/condor_startd_policy_states.png&quot;&gt;&lt;img border=&quot;0&quot; height=&quot;385&quot; src=&quot;http://1.bp.blogspot.com/-HXJdhJQRwaU/Tv0HaKXnHsI/AAAAAAAAAeo/36XqVjRgO2I/s400/condor_startd_policy_states.png&quot; width=&quot;400&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;If your site runs Condor, you probably should. &amp;nbsp;It shows the states of the &lt;b&gt;condor_startd&lt;/b&gt;, the activities within each state, and the transitions between them. &amp;nbsp;If you want to have jobs reliably pre-empted (or is that killed? &amp;nbsp;Or vacated?) from the worker node for something like memory usage, a clear understanding is required.&lt;br /&gt;&lt;br /&gt;However, the 30 state transitions might be a bit much for some site admins who just want to kill jobs that go over a memory limit. &amp;nbsp;In such a case, admins can use the &lt;i&gt;SYSTEM_PERIODIC_REMOVE&lt;/i&gt; or the &lt;i&gt;SYSTEM_PERIODIC_HOLD&lt;/i&gt; configuration parameters on the &lt;b&gt;condor_schedd&lt;/b&gt; to remove or hold jobs, respectively.&lt;br /&gt;&lt;br /&gt;These expressions are periodically evaluated against the schedd's copy of the job ClassAd (by default, once every 60s); if one evaluates to true for a given job, the schedd removes or holds that job. &amp;nbsp;This will almost immediately preempt execution on the worker node.&lt;br /&gt;&lt;br /&gt;[Note: While effective and simple, these are &lt;i&gt;not&lt;/i&gt;&amp;nbsp;the best way to accomplish this sort of policy! 
&amp;nbsp;As the worker node may talk to multiple &lt;b&gt;schedd&lt;/b&gt;'s (via flocking, or just through a complex pool with many schedd's), it's best to express the node's preferences locally.]&lt;br /&gt;&lt;br /&gt;At HCC, the periodic hold and remove policy looks like this:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;# hold jobs using absurd amounts of disk (100+ GB) or memory (1.6+ GB)&lt;br /&gt;SYSTEM_PERIODIC_HOLD = \&lt;br /&gt;&amp;nbsp; &amp;nbsp;(JobStatus == 1 || JobStatus == 2) &amp;amp;&amp;amp; ((DiskUsage &amp;gt; 100000000 || ResidentSetSize &amp;gt; 1600000))&lt;br /&gt;&lt;br /&gt;# forceful removal of running jobs after 2 days, held jobs after 6 hours,&lt;br /&gt;# and anything trying to run more than 10 times&lt;br /&gt;SYSTEM_PERIODIC_REMOVE = \&lt;br /&gt;   (JobStatus == 5 &amp;amp;&amp;amp; CurrentTime - EnteredCurrentStatus &amp;gt; 3600*6) || \&lt;br /&gt;   (JobStatus == 2 &amp;amp;&amp;amp; CurrentTime - EnteredCurrentStatus &amp;gt; 3600*24*2) || \&lt;br /&gt;   (JobStatus == 5 &amp;amp;&amp;amp; JobRunCount &amp;gt;= 10) || \&lt;br /&gt;   (JobStatus == 5 &amp;amp;&amp;amp; HoldReasonCode =?= 14 &amp;amp;&amp;amp; HoldReasonSubCode =?= 2)&lt;br /&gt;&lt;/pre&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;We place anything on hold that goes over some pre-defined resource limit (disk usage or memory usage). &amp;nbsp;Jobs are removed if they have been on hold for a long time, have run for too long, have restarted too many times, or are missing their input files.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Note that this is a flat policy for the cluster - heterogeneous nodes with large amounts of RAM per core would not be well-utilized. &amp;nbsp;We could tweak this by having users add the &lt;i&gt;RequestMemory&lt;/i&gt; attribute to their job's ad (defaulting to 1.6GB), place into the &lt;i&gt;Requirements&lt;/i&gt; that the slot have sufficient memory, and have the node only accept jobs that request memory below a certain threshold. 
&amp;nbsp;The expression above could then be tweaked to hold jobs where &lt;i&gt;(ResidentSetSize &amp;gt; RequestMemory)&lt;/i&gt;. &amp;nbsp;Perhaps more on that in the future if we go this route.&lt;br /&gt;&lt;br /&gt;While the &lt;i&gt;SYSTEM_PERIODIC_*&lt;/i&gt; expressions are useful, Dan Bradley recently introduced me to the &lt;a href=&quot;https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2185&quot;&gt;&lt;i&gt;SYSTEM_PERIODIC_*_REASON&lt;/i&gt; parameter.&lt;/a&gt; &amp;nbsp;This allows you to build a custom hold message for the user whose jobs you're about to interrupt. &amp;nbsp;The expression is evaluated within the context of the job's ad, and the resulting string is placed in the job's &lt;i&gt;HoldReason&lt;/i&gt; attribute. &amp;nbsp;As an example, previously, the hold message was something bland and generic:&lt;br /&gt;&lt;br /&gt;The SYSTEM_PERIODIC_HOLD &amp;nbsp;expression evaluated to true.&lt;br /&gt;&lt;br /&gt;Why did it evaluate to true? &amp;nbsp;Was it memory or disk usage? &amp;nbsp;When it was held, how bad was the disk/memory usage? &amp;nbsp;These things can get lost in the system. &amp;nbsp;&lt;a href=&quot;https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2725&quot;&gt;Oops&lt;/a&gt;. 
&amp;nbsp;We added the following to our schedd's configuration:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;# Report why the job went on hold.&lt;br /&gt;SYSTEM_PERIODIC_HOLD_REASON = \&lt;br /&gt;&amp;nbsp; &amp;nbsp;strcat(&quot;Job in status &quot;, JobStatus, \&lt;br /&gt;&amp;nbsp; &amp;nbsp;&quot; put on hold by SYSTEM_PERIODIC_HOLD due to &quot;, \&lt;br /&gt;&amp;nbsp; &amp;nbsp;ifThenElse(isUndefined(DiskUsage) || DiskUsage &amp;lt;= 100000000, \&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; strcat(&quot;memory usage &quot;, ResidentSetSize), \&lt;br /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; strcat(&quot;disk usage &quot;, DiskUsage)), &quot;.&quot;)&lt;br /&gt;&lt;/pre&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Now, we have beautiful error messages in the user's logs explaining the issue:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;pre&gt;Job in status 2 put on hold by SYSTEM_PERIODIC_HOLD due to memory usage 1620340.&lt;br /&gt;&lt;/pre&gt;&lt;div&gt;&lt;br /&gt;One less thing to get confused about!&lt;/div&gt;&lt;/div&gt;</description>
	<pubDate>Thu, 29 Dec 2011 17:12:19 +0000</pubDate>
	<author>noreply@blogger.com (Brian Bockelman)</author>
</item>
<item>
	<title>Spinning: Amazon S3  Object Expiration, what about Instance Expiration</title>
	<guid>http://spinningmatt.wordpress.com/?p=559</guid>
	<link>http://spinningmatt.wordpress.com/2011/12/28/amazon-s3-object-expiration-what-about-instance-expiration/</link>
	<description>&lt;p&gt;&lt;a href=&quot;http://aws.amazon.com/&quot;&gt;AWS&lt;/a&gt; is providing APIs that take distributed computing concerns into account. One could call them cloud concerns these days. Unfortunately, not all cloud providers are doing the same.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://aws.typepad.com/aws/2010/09/new-amazon-ec2-features-resource-tagging-idempotency-filtering.html&quot;&gt;Idempotent instance creation&lt;/a&gt; showed up in Sept 2010, providing the ability to &lt;a href=&quot;http://spinningmatt.wordpress.com/2011/12/28/amazon-s3-object-expiration-what-about-instance-expiration/&quot;&gt;simplify interactions&lt;/a&gt; with EC2. Idempotent resource allocation is critical for distributed systems.&lt;/p&gt;
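&lt;p&gt;A toy sketch of why idempotent allocation matters, in Python. Nothing here is the EC2 API: FakeCloud, run_instance and the token handling are made up for illustration. The client picks a token once and reuses it on every retry, so a lost response cannot turn into a second instance.&lt;/p&gt;
&lt;pre&gt;
```python
import uuid


class FakeCloud:
    """Toy in-memory stand-in for a cloud API (hypothetical, not the EC2 API)."""

    def __init__(self):
        self._by_token = {}  # client token mapped to instance id
        self._next_id = 0

    def run_instance(self, client_token):
        # A repeated token returns the instance created the first time,
        # so a retry after a lost response cannot allocate a duplicate.
        if client_token in self._by_token:
            return self._by_token[client_token]
        self._next_id += 1
        instance_id = "i-%04d" % self._next_id
        self._by_token[client_token] = instance_id
        return instance_id

    def instance_count(self):
        return len(self._by_token)


cloud = FakeCloud()
token = str(uuid.uuid4())          # chosen once by the client, reused on retry
first = cloud.run_instance(token)
retry = cloud.run_instance(token)  # e.g. the first response never arrived
print(first == retry, cloud.instance_count())  # prints: True 1
```
&lt;/pre&gt;
&lt;p&gt;Without the token the client has to list and reconcile instances after every timeout, which is exactly the interaction this feature removes.&lt;/p&gt;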
&lt;p&gt;&lt;a href=&quot;http://aws.typepad.com/aws/2011/12/amazon-s3-object-expiration.html&quot;&gt;S3 object expiration&lt;/a&gt; appeared in Dec 2011, allowing for service-side managed deallocation of S3 resources.&lt;/p&gt;
&lt;p&gt;Next up? It would be great to have an EC2 instance expiration feature. One that could be (0) assigned per instance and (1) adjusted while the instance exists. Bonus if it can also be (2) adjusted from within the instance without credentials. Think leases.&lt;/p&gt;
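&lt;p&gt;To make the lease idea concrete, here is a minimal Python sketch. It is purely hypothetical: no such EC2 feature is described above, and Lease, renew and reap are names invented for the example. Each instance carries its own expiry (0), the expiry can be pushed out while the instance runs (1), and the provider reaps whatever has lapsed.&lt;/p&gt;
&lt;pre&gt;
```python
class Lease:
    """Hypothetical per-instance lease; times are plain seconds."""

    def __init__(self, now, duration):
        self.expires_at = now + duration

    def renew(self, now, duration):
        # Adjust the expiry while the instance still exists.
        self.expires_at = now + duration

    def expired(self, now):
        return now >= self.expires_at


def reap(leases, now):
    """Return the ids whose lease has run out; a provider would terminate these."""
    return sorted(i for i, lease in leases.items() if lease.expired(now))


leases = {
    "i-0001": Lease(now=0, duration=60),
    "i-0002": Lease(now=0, duration=60),
}
leases["i-0002"].renew(now=50, duration=60)  # instance 2 asks for more time
print(reap(leases, now=70))  # prints: ['i-0001']
```
&lt;/pre&gt;
&lt;p&gt;The point of (2) is that the renew call would come from inside the instance, so a wedged or forgotten instance simply stops renewing and goes away on its own.&lt;/p&gt;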
&lt;br /&gt;  &lt;a rel=&quot;nofollow&quot; href=&quot;http://feeds.wordpress.com/1.0/gocomments/spinningmatt.wordpress.com/559/&quot;&gt;&lt;img alt=&quot;&quot; border=&quot;0&quot; src=&quot;http://feeds.wordpress.com/1.0/comments/spinningmatt.wordpress.com/559/&quot; /&gt;&lt;/a&gt; &lt;a rel=&quot;nofollow&quot; href=&quot;http://feeds.wordpress.com/1.0/godelicious/spinningmatt.wordpress.com/559/&quot;&gt;&lt;img alt=&quot;&quot; border=&quot;0&quot; src=&quot;http://feeds.wordpress.com/1.0/delicious/spinningmatt.wordpress.com/559/&quot; /&gt;&lt;/a&gt; &lt;a rel=&quot;nofollow&quot; href=&quot;http://feeds.wordpress.com/1.0/gofacebook/spinningmatt.wordpress.com/559/&quot;&gt;&lt;img alt=&quot;&quot; border=&quot;0&quot; src=&quot;http://feeds.wordpress.com/1.0/facebook/spinningmatt.wordpress.com/559/&quot; /&gt;&lt;/a&gt; &lt;a rel=&quot;nofollow&quot; href=&quot;http://feeds.wordpress.com/1.0/gotwitter/spinningmatt.wordpress.com/559/&quot;&gt;&lt;img alt=&quot;&quot; border=&quot;0&quot; src=&quot;http://feeds.wordpress.com/1.0/twitter/spinningmatt.wordpress.com/559/&quot; /&gt;&lt;/a&gt; &lt;a rel=&quot;nofollow&quot; href=&quot;http://feeds.wordpress.com/1.0/gostumble/spinningmatt.wordpress.com/559/&quot;&gt;&lt;img alt=&quot;&quot; border=&quot;0&quot; src=&quot;http://feeds.wordpress.com/1.0/stumble/spinningmatt.wordpress.com/559/&quot; /&gt;&lt;/a&gt; &lt;a rel=&quot;nofollow&quot; href=&quot;http://feeds.wordpress.com/1.0/godigg/spinningmatt.wordpress.com/559/&quot;&gt;&lt;img alt=&quot;&quot; border=&quot;0&quot; src=&quot;http://feeds.wordpress.com/1.0/digg/spinningmatt.wordpress.com/559/&quot; /&gt;&lt;/a&gt; &lt;a rel=&quot;nofollow&quot; href=&quot;http://feeds.wordpress.com/1.0/goreddit/spinningmatt.wordpress.com/559/&quot;&gt;&lt;img alt=&quot;&quot; border=&quot;0&quot; src=&quot;http://feeds.wordpress.com/1.0/reddit/spinningmatt.wordpress.com/559/&quot; /&gt;&lt;/a&gt; &lt;img alt=&quot;&quot; border=&quot;0&quot; 
src=&quot;http://stats.wordpress.com/b.gif?host=spinningmatt.wordpress.com&amp;#038;blog=6870579&amp;#038;post=559&amp;#038;subd=spinningmatt&amp;#038;ref=&amp;#038;feed=1&quot; width=&quot;1&quot; height=&quot;1&quot; /&gt;</description>
	<pubDate>Wed, 28 Dec 2011 12:46:06 +0000</pubDate>
</item>
<item>
	<title>OSG Technology Area Rumblings: A simple iRODS Micro-Service</title>
	<guid>tag:blogger.com,1999:blog-8803173202887660937.post-4646684131624655729</guid>
	<link>http://osgtech.blogspot.com/2011/12/simple-irods-micro-service.html</link>
	<description>&lt;div&gt;&lt;b&gt;&lt;span&gt;Introduction&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The goal I had for this task was to identify and understand the steps and configuration involved in writing a micro-service and seeing it in action. For details regarding iRODS, please refer to the documentation at &lt;a href=&quot;https://www.irods.org/&quot;&gt;https://www.iRODS.org/&lt;/a&gt;. The micro-service that I wrote is very simple (it writes a hello world message to the system log); however, it serves its purpose by providing an overview of the steps involved in writing a useful micro-service.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Before I document the configuration and code involved in creating and registering the new micro-service, let’s look at figure 1. &lt;br /&gt;&lt;br /&gt;&lt;img height=&quot;176px;&quot; src=&quot;https://lh3.googleusercontent.com/7ug-Iwe3O_50aQrfrC46oo6ujLIIeDOIULiu_yeVMsDwtycKuXswtB5fFCeFWPZtTkgCGAtkUSDtRNLdzJJH-MsrvzCBjANMvl6Fre4xHJioC38ajSw&quot; width=&quot;576px;&quot; /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Figure 1 shows a high-level view of the invocation of a micro-service by the iRODS rule engine. One way of looking at the micro-service and the iRODS rule engine is to think of them as an event-based triggering system that can perform ‘operations’ on the data objects and/or external resources. The micro-services are registered in iRODS rule definitions and the rule engine invokes them based on the condition specified for that rule. 
For a list of places in the iRODS workflow where a micro-service may be triggered please visit: &lt;a href=&quot;https://www.irods.org/index.php/Default_iRODS_Rules&quot;&gt;https://www.irods.org/index.php/Default_iRODS_Rules&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Also you may refer to &lt;a href=&quot;https://www.irods.org/index.php/Rule_Engine&quot;&gt;https://www.iRODS.org/index.php/Rule_Engine&lt;/a&gt; for a detailed diagram of a micro-service invocation.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;img height=&quot;242px;&quot; src=&quot;https://lh5.googleusercontent.com/uf3vI967QtzS1obJj-R3PYfL6KRUYs5O4P_1iISdEYXXzwnPxFdT9o--8j__edpPSJxxYeOeNj7DxreQBM1HXB8O27ZmD26zq_-7iPRJpBdGlVL7zMo&quot; width=&quot;576px;&quot; /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;Figure 2 above shows the communication between the iRODS rule engine and a micro-service. A simplistic view of the communication layers is that the rule engine calls a defined C procedure, which exposes its functionality through an interface (commonly prefixed with msi). 
The arguments to the procedure are passed through a structure named &lt;i&gt;msParam_t&lt;/i&gt; that is defined below:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;typedef struct MsParam {&lt;br /&gt;&amp;nbsp; char *label;&lt;br /&gt;&amp;nbsp; char *type;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /* this is the name of the packing instruction in&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; * rodsPackTable.h */&lt;br /&gt;&amp;nbsp; void *inOutStruct;&lt;br /&gt;&amp;nbsp; bytesBuf_t *inpOutBuf;&lt;br /&gt;} msParam_t;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;span&gt;Writing the micro-service&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Figure 3 shows the steps involved in creating a new micro-service:&lt;br /&gt;&lt;br /&gt;&lt;div&gt;&lt;img height=&quot;129px;&quot; src=&quot;https://lh4.googleusercontent.com/W6ZpoyTbAvhsPXnwI2_bJg7hgdTF3eOkL1tWN-aF7Cl10NidSpM8n2oKKOifxZhX5bruPK-IZHOSQOe525sMJEgkjP5yQacPF1tThetlEiRy9K4pjOM&quot; width=&quot;576px;&quot; /&gt;&lt;/div&gt;&lt;b&gt;Write the C procedure&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The C code below (let's call it test.c) has a function writemessage that writes a message to the system log. The interface to that function is named msiWritemessage, which is what the rule engine calls. The msi function takes a list of arguments of type msParam_t and a last argument of type ruleExecInfo_t for the result of the operation. 
&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;#include &amp;lt;unistd.h&amp;gt;&lt;br /&gt;#include &amp;lt;syslog.h&amp;gt;&lt;br /&gt;#include &amp;lt;string.h&amp;gt;&lt;br /&gt;#include &quot;apiHeaderAll.h&quot;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;void writemessage(char arg1[], char arg2[]);&lt;br /&gt;int msiWritemessage(msParam_t *mParg1, msParam_t *mParg2,&amp;nbsp; ruleExecInfo_t *rei);&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;/* write both strings to the system log */&lt;br /&gt;void writemessage(char arg1[], char arg2[]) {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; openlog(&quot;slog&quot;, LOG_PID|LOG_CONS, LOG_USER);&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; syslog(LOG_INFO, &quot;%s %s from micro-service&quot;, arg1, arg2);&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; closelog();&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;int msiWritemessage(msParam_t *mParg1, msParam_t *mParg2,&amp;nbsp; ruleExecInfo_t *rei)&lt;br /&gt;{&lt;br /&gt;&amp;nbsp;/* default to empty strings so writemessage never sees an uninitialized pointer */&lt;br /&gt;&amp;nbsp;char *in1 = &quot;&quot;;&lt;br /&gt;&amp;nbsp;char *in2 = &quot;&quot;;&lt;br /&gt;&amp;nbsp;RE_TEST_MACRO (&quot;&amp;nbsp;&amp;nbsp;&amp;nbsp; Calling Procedure&quot;);&lt;br /&gt;&amp;nbsp;// the above line is needed for loop back testing using irule -i option&lt;br /&gt;&amp;nbsp;if ( strcmp( mParg1-&amp;gt;type, STR_MS_T ) == 0 )&lt;br /&gt;&amp;nbsp;{&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; in1 = (char*) mParg1-&amp;gt;inOutStruct;&lt;br /&gt;&amp;nbsp;}&lt;br /&gt;&amp;nbsp;// the rules used in this post pass two strings, so the second argument is read as a string too&lt;br /&gt;&amp;nbsp;if ( strcmp( mParg2-&amp;gt;type, STR_MS_T ) == 0 )&lt;br /&gt;&amp;nbsp;{&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; in2 = (char*) mParg2-&amp;gt;inOutStruct;&lt;br /&gt;&amp;nbsp;}&lt;br /&gt;&amp;nbsp;writemessage(in1, in2);&lt;br /&gt;&amp;nbsp;return rei-&amp;gt;status;&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Next I will make a folder structure in the &lt;i&gt;modules&lt;/i&gt; folder of the iRODS home to hold this micro-service, copy a few files from the example &lt;i&gt;properties&lt;/i&gt; module, and modify them to fit the test.c micro-service:&lt;br /&gt;&lt;br 
/&gt;&lt;pre&gt;cd ~irods&lt;br /&gt;mkdir modules/HCC&lt;br /&gt;cd modules/HCC&lt;br /&gt;&lt;br /&gt;mkdir microservices&lt;br /&gt;mkdir rules&lt;br /&gt;mkdir lib&lt;br /&gt;mkdir clients&lt;br /&gt;mkdir servers&lt;br /&gt;&lt;br /&gt;mkdir microservices/src&lt;br /&gt;mkdir microservices/include&lt;br /&gt;mkdir microservices/obj&lt;br /&gt;cp ../properties/Makefile .&lt;br /&gt;cp ../properties/info.txt .&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Listed below are my working copies of the Makefile and info.txt:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;#Makefile&lt;br /&gt;ifndef buildDir&lt;br /&gt;buildDir = $(CURDIR)/../..&lt;br /&gt;endif&lt;br /&gt;&lt;br /&gt;include $(buildDir)/config/config.mk&lt;br /&gt;include $(buildDir)/config/platform.mk&lt;br /&gt;include $(buildDir)/config/directories.mk&lt;br /&gt;include $(buildDir)/config/common.mk&lt;br /&gt;&lt;br /&gt;#&lt;br /&gt;# Directories&lt;br /&gt;#&lt;br /&gt;MSObjDir =&amp;nbsp;&amp;nbsp;&amp;nbsp; $(modulesDir)/HCC/microservices/obj&lt;br /&gt;MSSrcDir =&amp;nbsp;&amp;nbsp;&amp;nbsp; $(modulesDir)/HCC/microservices/src&lt;br /&gt;MSIncDir =&amp;nbsp;&amp;nbsp;&amp;nbsp; $(modulesDir)/HCC/microservices/include&lt;br /&gt;&lt;br /&gt;# Source files&lt;br /&gt;&lt;br /&gt;OBJECTS =&amp;nbsp;&amp;nbsp;&amp;nbsp; $(MSObjDir)/test.o&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;# Compile and link flags&lt;br /&gt;#&lt;br /&gt;INCLUDES +=&amp;nbsp;&amp;nbsp;&amp;nbsp; $(INCLUDE_FLAGS) $(LIB_INCLUDES) $(SVR_INCLUDES)&lt;br /&gt;CFLAGS_OPTIONS := $(CFLAGS) $(MY_CFLAG)&lt;br /&gt;CFLAGS =&amp;nbsp;&amp;nbsp;&amp;nbsp; $(CFLAGS_OPTIONS) $(INCLUDES) $(MODULE_CFLAGS)&lt;br /&gt;&lt;br /&gt;.PHONY: all server client microservices clean&lt;br /&gt;.PHONY: server_ldflags client_ldflags server_cflags client_cflags&lt;br /&gt;.PHONY: print_cflags&lt;br /&gt;&lt;br /&gt;# Build everything&lt;br /&gt;all:&amp;nbsp;&amp;nbsp;&amp;nbsp; microservices&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; @true&lt;br /&gt;&lt;br 
/&gt;# List module's objects and needed libs for inclusion in clients&lt;br /&gt;client_ldflags:&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; @true&lt;br /&gt;&lt;br /&gt;# List module's includes for inclusion in the clients&lt;br /&gt;client_cflags:&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; @true&lt;br /&gt;&lt;br /&gt;# List module's objects and needed libs for inclusion in the server&lt;br /&gt;server_ldflags:&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; @echo $(OBJECTS) $(LIBS)&lt;br /&gt;&lt;br /&gt;# List module's includes for inclusion in the server&lt;br /&gt;server_cflags:&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; @echo $(INCLUDE_FLAGS)&lt;br /&gt;&lt;br /&gt;# Build microservices&lt;br /&gt;microservices:&amp;nbsp;&amp;nbsp;&amp;nbsp; print_cflags $(OBJECTS)&lt;br /&gt;&lt;br /&gt;# Build client additions&lt;br /&gt;client:&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; @true&lt;br /&gt;&lt;br /&gt;# Build server additions&lt;br /&gt;server:&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; @true&lt;br /&gt;&lt;br /&gt;# Build rules&lt;br /&gt;rules:&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; @true&lt;br /&gt;&lt;br /&gt;# Clean&lt;br /&gt;clean:&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; @echo &quot;Clean image module...&quot;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; rm -rf $(MSObjDir)/*.o&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;# Show compile flags&lt;br /&gt;print_cflags:&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; @echo &quot;Compile flags:&quot;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; @echo &quot;&amp;nbsp;&amp;nbsp;&amp;nbsp; $(CFLAGS_OPTIONS)&quot;&lt;br /&gt;&lt;br /&gt;# Compile targets&lt;br /&gt;#&lt;br /&gt;$(OBJECTS): $(MSObjDir)/%.o: $(MSSrcDir)/%.c $(DEPEND)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; @echo &quot;Compile image module `basename $@`...&quot;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; @$(CC) -c $(CFLAGS) -o $@ $&amp;lt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;info.txt&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;br 
/&gt;&lt;pre&gt;Name:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; HCC&lt;br /&gt;Brief:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; HCC Test microservice&lt;br /&gt;Description:&amp;nbsp;&amp;nbsp;&amp;nbsp; HCC Test microservice.&lt;br /&gt;Dependencies:&lt;br /&gt;Enabled:&amp;nbsp;&amp;nbsp;&amp;nbsp; yes&lt;br /&gt;Creator:&amp;nbsp;&amp;nbsp;&amp;nbsp; Ashu Guru&lt;br /&gt;Created:&amp;nbsp;&amp;nbsp;&amp;nbsp; December 2011&lt;br /&gt;License:&amp;nbsp;&amp;nbsp;&amp;nbsp; BSD&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;In the next step I will define the micro-service header and micro-service table files so that iRODS can be configured with the new micro-service. This is done in the folder microservices/include. In this example there is no header for this code, so I have left the header file blank; the micro-service table file holds the entry for the table definition. Note that the first field is the label of the micro-service, the second is the count of input arguments of the msi interface (do not count the ruleExecInfo_t argument), and the third is the name of the msi interface function.&lt;br /&gt;&lt;br /&gt;File microservices/include/microservices.table&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;{ &quot;msiWritemessage&quot;,2,(funcPtr) msiWritemessage },&amp;nbsp;&amp;nbsp; &lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Following is the directory tree structure for the HCC module that I have so far:&lt;br /&gt;&lt;pre&gt;bash-4.1$ pwd&amp;nbsp;&lt;/pre&gt;&lt;pre&gt;/opt/iRODS/modules&lt;br /&gt;bash-4.1$ tree HCC&lt;br /&gt;HCC&lt;br /&gt;├── clients&lt;br /&gt;├── info.txt&lt;br /&gt;├── lib&lt;br /&gt;├── Makefile&lt;br /&gt;├── microservices&lt;br /&gt;│&amp;nbsp;&amp;nbsp; ├── include&lt;br /&gt;│&amp;nbsp;&amp;nbsp; │&amp;nbsp;&amp;nbsp; ├── microservices.header&lt;br /&gt;│&amp;nbsp;&amp;nbsp; 
│&amp;nbsp;&amp;nbsp; ├── microservices.table&lt;br /&gt;│&amp;nbsp;&amp;nbsp; ├── obj&lt;br /&gt;│&amp;nbsp;&amp;nbsp; └── src&lt;br /&gt;│&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ├── test.c&lt;br /&gt;├── rules&lt;br /&gt;└── servers&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Next I will make an entry for enabling the new module (this micro-service). This is done in the file &lt;i&gt;~irods/config/config.mk&lt;/i&gt; so that the iRODS Makefile can include the new micro-service in the build. To do this, simply add the module folder name (in my case HCC) to the variable &lt;i&gt;MODULES&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;span&gt;Compile and test&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;cd ~irods/modules/&amp;lt;YOURMODULENAME&amp;gt;&lt;br /&gt;make&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The above commands should result in the creation of an object file in the microservices/obj folder. I am going to test the micro-service manually first. To accomplish this I will create a client-side rule file in the folder &lt;i&gt;~irods/clients/icommands/test/rules&lt;/i&gt;. I have named the file aguru.ir and following are the contents of the file:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;aguruTest||msiWritemessage(*A,*B)|nop&lt;br /&gt;*A=helloworld%*B=testing&lt;br /&gt;&lt;/pre&gt;&amp;nbsp;&lt;/div&gt;&lt;div&gt;The first line in the file is the rule definition and the second line holds the input parameters. 
To test the micro-service I will invoke it, which then writes a message to the system log (see figure below).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;div&gt;&lt;img height=&quot;363px;&quot; src=&quot;https://lh6.googleusercontent.com/PvAfaBLKo7o0OayethKj9p71V-a_sQA0rZHS4GBWk8VW_3gGK8dPi5g3Jp1f_0E5vZnCCU4XFv2-y1XA5MXaXaG6sNF2sOUbIVryR1jb6M41H0vGxWs&quot; width=&quot;516px;&quot; /&gt;&lt;/div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;&lt;span&gt;Recompile iRODS&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Before this step I must make the entries for the headers and the msi table in the iRODS main micro-service action table (i.e. file ~irods/server/re/include/reAction.h). This should be done using the following commands:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;rm server/re/include/reAction.h&lt;br /&gt;make reaction&amp;nbsp;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;However, I had to manually add the code segment below to the file &lt;i&gt;server/re/include/reAction.h&lt;/i&gt; to accomplish that:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;int msiWritemessage(msParam_t *mParg1, msParam_t *mParg2,&amp;nbsp; ruleExecInfo_t *rei);&lt;br /&gt;&lt;/pre&gt;Finally, recompile iRODS:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;cd ~irods&lt;br /&gt;make test_flags&lt;br /&gt;make modules&lt;br /&gt;./irodsctl stop&lt;br /&gt;make clean&lt;br /&gt;make&lt;br /&gt;./irodsctl start&lt;br /&gt;./irodsctl status&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;b&gt;&lt;span&gt;Register Micro-service and Test&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;In this step we define a rule that will trigger the micro-service when a new data object is uploaded to iRODS. 
Open the file &lt;i&gt;~irods/server/config/reConfigs/core.re &lt;/i&gt;and add the following line to the Test Rules section.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;acPostProcForPut {msiWritemessage(&quot;HelloWorld&quot;,&quot;String 2&quot;); }&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;That is it. If I now put (iput) any file into iRODS, a message is added to the /var/log/messages file on the iRODS server. Please note that the above rule does not filter for a particular occurrence; it is a catchall rule that applies to all put events.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;References:&lt;/b&gt;&lt;br /&gt;&lt;a href=&quot;https://www.irods.org/&quot;&gt;https://www.irods.org/&lt;/a&gt;&lt;br /&gt;&lt;a href=&quot;http://www.wrg.york.ac.uk/iread/compiling-and-running-irods-with-micros-services&quot;&gt;http://www.wrg.york.ac.uk/iread/compiling-and-running-irods-with-micros-services&lt;/a&gt;&lt;br /&gt;&lt;a href=&quot;http://technical.bestgrid.org/index.php/IRODS_deployment_plan&quot;&gt;http://technical.bestgrid.org/index.php/IRODS_deployment_plan&lt;/a&gt;&lt;/div&gt;&lt;div class=&quot;blogger-post-footer&quot;&gt;&lt;img width=&quot;1&quot; height=&quot;1&quot; 
src=&quot;https://blogger.googleusercontent.com/tracker/8803173202887660937-4646684131624655729?l=osgtech.blogspot.com&quot; alt=&quot;&quot; /&gt;&lt;/div&gt;</description>
	<pubDate>Fri, 23 Dec 2011 06:14:49 +0000</pubDate>
	<author>noreply@blogger.com (Ashu Guru)</author>
</item>
<item>
	<title>An Open Science Grid Work Log: CAMx on OSG with UNC Environmental Science and Engineering</title>
	<guid>http://osglog.wordpress.com/?p=655</guid>
	<link>http://osglog.wordpress.com/2011/12/14/camx-on-osg-with-unc-environmental-science-and-engineering/</link>
	<description>&lt;p&gt;Dr. &lt;a title=&quot;William Vizuete&quot; href=&quot;http://www.unc.edu/~vizuete/research.htm&quot; target=&quot;_blank&quot;&gt;William Vizuete&amp;#8217;s&lt;/a&gt; team at the &lt;a href=&quot;http://www.sph.unc.edu/envr/&quot; target=&quot;_blank&quot;&gt;UNC department of Environmental Science and Engineering&lt;/a&gt; evaluates and uses a number of air quality models which are, in turn, used by regulatory agencies to understand air quality issues. From Dr. Vizuete&amp;#8217;s research page:&lt;/p&gt;
&lt;p&gt;&amp;#8220;Using high performance computers and three dimensional models to simulate the atmosphere, I am working to improve our understanding of the formation of atmospheric air pollution. These computer models improve our understanding of the extremely complex chemical and physical processes that occur in the atmosphere. A better understanding of the atmosphere gives us the knowledge to improve the tools and methods that policy makers use to make effective control strategies to clean the air above our dirtiest cities.&amp;#8221;&lt;/p&gt;
&lt;p&gt;Two models, CAMx by ENVIRON and CMAQ are heavily used.&lt;/p&gt;
&lt;p&gt;We chose to start with CAMx because word on the street is that it&amp;#8217;s easier to build than CMAQ. Well, it&amp;#8217;s all relative.&lt;/p&gt;
&lt;p&gt;Here&amp;#8217;s the new automated &lt;a title=&quot;RENCI CI CAMx build&quot; href=&quot;https://ci-dev.renci.org/hudson/view/RCI/job/rci-CAMx/&quot; target=&quot;_blank&quot;&gt;build&lt;/a&gt; process for CAMx. It uses the Intel Fortran compiler version 11.1 and builds an x86_64 binary that&amp;#8217;s statically linked.&lt;/p&gt;
&lt;p&gt;Also built in that process is a program called &lt;a href=&quot;http://www.camx.com/down/support.php&quot; target=&quot;_blank&quot;&gt;avgdif&lt;/a&gt; from the ENVIRON website. It is a standalone Fortran program that computes the average difference between two runs. This allows us to validate execution of the CAMx &lt;a href=&quot;http://www.camx.com/down/testcase.php&quot; target=&quot;_blank&quot;&gt;test cases&lt;/a&gt;. It is slightly modified in order to have enough memory for this model (maxx and maxy are set to 120 near the top of the file). Finally, there&amp;#8217;s a custom makefile for it to work with the Intel compiler.&lt;/p&gt;
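&lt;p&gt;As a rough illustration of the kind of check avgdif gives us, here is a small Python sketch. It is not the ENVIRON program: the function name, the plain nested-list grids and the choice of mean absolute difference are all assumptions made for the example, and the real tool reads CAMx output files.&lt;/p&gt;
&lt;pre&gt;
```python
def average_difference(run_a, run_b):
    """Mean absolute cell-by-cell difference between two equally shaped grids."""
    if len(run_a) != len(run_b):
        raise ValueError("grids differ in row count")
    total = 0.0
    cells = 0
    for row_a, row_b in zip(run_a, run_b):
        if len(row_a) != len(row_b):
            raise ValueError("grids differ in column count")
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            cells += 1
    # An empty grid has no cells to compare, so report zero difference.
    return total / cells if cells else 0.0


reference = [[1.0, 2.0], [3.0, 4.0]]  # stand-in for the published test case output
candidate = [[1.0, 2.5], [3.0, 3.5]]  # stand-in for the output of our own run
print(average_difference(reference, candidate))  # prints: 0.25
```
&lt;/pre&gt;
&lt;p&gt;A value near zero says the run on a remote node reproduced the reference output, which is all the validation step needs to report back.&lt;/p&gt;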
&lt;p&gt;CAMx has for several versions been OpenMP enabled. This is well suited for the OSG&amp;#8217;s HTPC node concurrency model since the job can be configured to take advantage of all the CPUs on a node but does not require the additional complexity of a statically linked MPI launcher.&lt;/p&gt;
&lt;p&gt;The new job is being tested now.  A GlideinWMS submit script has been set up to launch the job to various HTPC compute venues. The job itself will:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Fetch binaries from the RENCI continuous integration environment&lt;/li&gt;
&lt;li&gt;Fetch test cases from ENVIRON&amp;#8217;s website&lt;/li&gt;
&lt;li&gt;Execute the test&lt;/li&gt;
&lt;li&gt;Execute avgdif on the outputs&lt;/li&gt;
&lt;li&gt;Return the results of avgdif&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When this is done, the environmental engineering community will be able to give it a try after they&amp;#8217;ve &lt;a title=&quot;Join OSG Engage Virtual Organization&quot; href=&quot;https://twiki.grid.iu.edu/bin/view/Engagement/EngageNewUserGuide&quot; target=&quot;_blank&quot;&gt;joined the Engage VO&lt;/a&gt;.&lt;/p&gt;
</description>
	<pubDate>Fri, 16 Dec 2011 16:22:36 +0000</pubDate>
</item>
<item>
	<title>An Open Science Grid Work Log: Pegasus WMS and NERSC Dirac</title>
	<guid>http://osglog.wordpress.com/?p=650</guid>
	<link>http://osglog.wordpress.com/2011/12/07/pegasus-wms-and-nersc-dirac/</link>
	<description>&lt;p&gt;The Engage VO has recently begun working with Duke Chemistry on a protein folding simulation that&amp;#8217;s taking us to interesting places.&lt;/p&gt;
&lt;p&gt;In particular, it&amp;#8217;s taking us deep inside the National Energy Research Scientific Computing Center (NERSC), specifically to their 50-node general-purpose GPU (GPGPU) cluster, &lt;a title=&quot;Dirac - NERSC GPU Cluster&quot; href=&quot;https://www.nersc.gov/users/computational-systems/dirac/&quot;&gt;Dirac&lt;/a&gt;. You know you&amp;#8217;re having a good time when the cluster you&amp;#8217;re using is at a cutting-edge DOE research lab and they consider it experimental.&lt;/p&gt;
&lt;p&gt;As noted in &lt;a title=&quot;Amber on GPU&quot; href=&quot;http://osglog.wordpress.com/2011/02/02/amber11-pmemd-for-nvidia-gpgpu/&quot; target=&quot;_blank&quot;&gt;previous &lt;/a&gt;posts, performance for many molecular dynamics applications is greatly improved on GPUs. We&amp;#8217;re using &lt;a title=&quot;NAMD&quot; href=&quot;http://www.ks.uiuc.edu/Research/namd/&quot; target=&quot;_blank&quot;&gt;NAMD&lt;/a&gt; which provides pre-built binaries for GPUs. The binaries ship with dynamic libraries for CUDA, its only external dependency. As such, it&amp;#8217;s practical for use on the Open Science Grid since it can be relocated to new clusters trivially.&lt;/p&gt;
&lt;p&gt;NAMD, like Amber and many scientific codes, especially ones that simulate natural phenomena, supports restart. That is, output from one run can be used as input to another to continue the calculation. This is important since ultimately we&amp;#8217;re modeling a biological system that doesn&amp;#8217;t have a particular endpoint &amp;#8211; it just keeps going. It also has implications for the shape of our workflow. In this case we have one computation activity &amp;#8211; execute NAMD &amp;#8211; which we&amp;#8217;d like to repeat, feeding the output from one run into the next.&lt;/p&gt;
&lt;p&gt;We used Pegasus WMS as the workflow execution engine. There are lots of reasons for this but here are some of the basics:&lt;/p&gt;
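&lt;p&gt;The shape of such a restart chain can be sketched in a few lines of Python. This is only an illustration of the data flow, not part of the actual workflow: the function name is made up, and the out-N.tar.gz naming is an assumption modelled on the output archive name that appears in the job specification.&lt;/p&gt;
&lt;pre class=&quot;brush: python;&quot;&gt;
```python
def chain_plan(slices):
    """Return (slice, restart_input, output) for each run in a restart chain."""
    plan = []
    previous_output = None  # the first run has nothing to restart from
    for index in range(slices):
        output = "out-{}.tar.gz".format(index)  # assumed naming scheme
        plan.append((index, previous_output, output))
        previous_output = output  # this run's output feeds the next run
    return plan


for step in chain_plan(3):
    print(step)
```
&lt;/pre&gt;
&lt;p&gt;Each tuple names the archive a run restarts from and the archive it writes, which is exactly the dependency a workflow engine needs to order the runs.&lt;/p&gt;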
&lt;ul&gt;
&lt;li&gt;It&amp;#8217;s state of the art for creating a useful layer of workflow abstraction over the OSG&amp;#8217;s runtime infrastructure.&lt;/li&gt;
&lt;li&gt;The support list is incredibly helpful.&lt;/li&gt;
&lt;li&gt;It allows me to use my personal OSG/DOE X.509 user credential which is required by NERSC policy.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each compute job stages in&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The GPU accelerated version of NAMD&lt;/li&gt;
&lt;li&gt;The input data files&lt;/li&gt;
&lt;li&gt;A statically linked version of MPICH2&lt;/li&gt;
&lt;li&gt;The output file from the previous run if one exists&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Output is written from this job, archived and transferred back to the submit host.&lt;/p&gt;
&lt;p&gt;One of the interesting aspects of this is how to generate workflows with these characteristics. Each compute job is somewhat logically self contained. We&amp;#8217;d like to be able to repeat them in chains and otherwise organize them hierarchically &amp;#8211; for example &amp;#8211; to do parallel simulations, each of which consists of a chain of compute jobs. More on this in a later post.&lt;/p&gt;
&lt;p&gt;Another challenge, from a workflow maintenance and configuration perspective, is how to get very hardware specific in our job specifications while keeping the workflow abstract. Here&amp;#8217;s the job specification for running NAMD on Dirac&amp;#8217;s Tesla GPUs.&lt;/p&gt;
&lt;h6&gt;&amp;lt;!-- part 3: Definition of all jobs/dags/daxes (at least one) --&amp;gt;&lt;br /&gt;
&amp;lt;job id=&quot;1n3&quot; namespace=&quot;namd-flow.0&quot; name=&quot;namd&quot;&amp;gt;&lt;br /&gt;
&amp;lt;argument&amp;gt;--model=ternarycomplex119819 --config=ternarycomplex_popcwimineq-05 --slice=0 --namdType=CUDA --runLength=100&amp;lt;/argument&amp;gt;&lt;br /&gt;
&amp;lt;profile namespace=&quot;globus&quot; key=&quot;queue&quot;&amp;gt;dirac_reg&amp;lt;/profile&amp;gt;&lt;br /&gt;
&amp;lt;profile namespace=&quot;globus&quot; key=&quot;xcount&quot;&amp;gt;8:tesla&amp;lt;/profile&amp;gt;&lt;br /&gt;
&amp;lt;profile namespace=&quot;globus&quot; key=&quot;maxWallTime&quot;&amp;gt;240&amp;lt;/profile&amp;gt;&lt;br /&gt;
&amp;lt;profile namespace=&quot;globus&quot; key=&quot;jobType&quot;&amp;gt;mpi&amp;lt;/profile&amp;gt;&lt;br /&gt;
&amp;lt;profile namespace=&quot;globus&quot; key=&quot;host_xcount&quot;&amp;gt;1&amp;lt;/profile&amp;gt;&lt;br /&gt;
&amp;lt;profile namespace=&quot;condor&quot; key=&quot;x509userproxy&quot;&amp;gt;/home/scox/dev/grayson/var/proxy/x509_proxy_scox&amp;lt;/profile&amp;gt;&lt;br /&gt;
&amp;lt;uses name=&quot;mpich2-static-1.1.1p1.tar.gz&quot; link=&quot;input&quot;/&amp;gt;&lt;br /&gt;
&amp;lt;uses name=&quot;cpuinfo&quot; link=&quot;input&quot;/&amp;gt;&lt;br /&gt;
&amp;lt;uses name=&quot;beratan-0.tar.gz&quot; link=&quot;input&quot;/&amp;gt;&lt;br /&gt;
&amp;lt;uses name=&quot;namd.tar.gz&quot; link=&quot;input&quot;/&amp;gt;&lt;br /&gt;
&amp;lt;uses name=&quot;out-0.tar.gz&quot; link=&quot;output&quot;/&amp;gt;&lt;br /&gt;
&amp;lt;/job&amp;gt;&lt;/h6&gt;
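&lt;p&gt;One way to keep hardware details out of the abstract workflow is to hold them in a per-site table and stamp them onto each job when the workflow is generated. The Python sketch below is an assumption about how that could look, not the approach actually used here: the function, the table and the job id scheme are hypothetical, while the profile values and file names are copied from the job above. It builds the elements with the standard library rather than any Pegasus API, and omits the model and config arguments for brevity.&lt;/p&gt;
&lt;pre class=&quot;brush: python;&quot;&gt;
```python
import xml.etree.ElementTree as ET

# Site-specific settings live in one table (values copied from the Dirac job).
SITE_PROFILES = {
    "dirac": {
        "queue": "dirac_reg",
        "xcount": "8:tesla",
        "maxWallTime": "240",
        "jobType": "mpi",
        "host_xcount": "1",
    },
}


def namd_job(slice_index, site, run_length=100):
    """Build one job element for a slice of the chain at the given site."""
    job = ET.Element("job", {
        "id": "namd-{}".format(slice_index),  # hypothetical id scheme
        "namespace": "namd-flow.0",
        "name": "namd",
    })
    argument = ET.SubElement(job, "argument")
    argument.text = "--slice={} --namdType=CUDA --runLength={}".format(
        slice_index, run_length)
    for key, value in SITE_PROFILES[site].items():
        profile = ET.SubElement(
            job, "profile", {"namespace": "globus", "key": key})
        profile.text = value
    inputs = ["mpich2-static-1.1.1p1.tar.gz", "beratan-0.tar.gz", "namd.tar.gz"]
    if slice_index != 0:
        # Restart from the archive written by the previous slice.
        inputs.append("out-{}.tar.gz".format(slice_index - 1))
    for name in inputs:
        ET.SubElement(job, "uses", {"name": name, "link": "input"})
    ET.SubElement(job, "uses", {
        "name": "out-{}.tar.gz".format(slice_index), "link": "output"})
    return job


for index in range(2):
    print(ET.tostring(namd_job(index, "dirac"), encoding="unicode"))
```
&lt;/pre&gt;
&lt;p&gt;Targeting a different cluster would then mean adding a row to the table rather than editing every job by hand.&lt;/p&gt;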
&lt;p&gt;Again, more on how we&amp;#8217;re managing this in a later post.&lt;/p&gt;
</description>
	<pubDate>Fri, 16 Dec 2011 16:19:51 +0000</pubDate>
</item>
<item>
	<title>OSG Technology Area Rumblings: How to create openstack controller</title>
	<guid>tag:blogger.com,1999:blog-8803173202887660937.post-1846261836896771050</guid>
	<link>http://osgtech.blogspot.com/2011/12/how-to-create-openstack-controller.html</link>
	<description>As before, the &quot;official&quot; instructions on which our procedure is based are here:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href=&quot;http://docs.openstack.org/cactus/openstack-compute/admin/content/installing-openstack-compute-on-rhel6.html&quot;&gt;&lt;span&gt;http://docs.openstack.org/cactus/openstack-compute/admin/content/installing-openstack-compute-on-rhel6.html&lt;/span&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;First set up the repository:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;wget http://yum.griddynamics.net/yum/cactus/openstack/openstack-repo-2011.2-1.el6.noarch.rpm&lt;/span&gt;&lt;br /&gt;&lt;span&gt;rpm -ivh openstack-repo-2011.2-1.el6.noarch.rpm&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Then install openstack and its dependencies:&lt;br /&gt;&lt;br /&gt;&lt;span&gt;yum install libvirt&lt;/span&gt;&lt;br /&gt;&lt;span&gt;chkconfig libvirtd on&lt;/span&gt;&lt;br /&gt;&lt;span&gt;/etc/init.d/libvirtd start&lt;/span&gt;&lt;br /&gt;&lt;span&gt;yum install euca2ools openstack-nova-{api,compute,network,objectstore,scheduler,volume} openstack-nova-cc-config openstack-glance&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Start services:&lt;br /&gt;&lt;br /&gt;&lt;span&gt;service mysqld start&lt;/span&gt;&lt;br /&gt;&lt;span&gt;chkconfig mysqld on&lt;/span&gt;&lt;br /&gt;&lt;span&gt;service rabbitmq-server start&lt;/span&gt;&lt;br /&gt;&lt;span&gt;chkconfig rabbitmq-server on&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Set up database authorisations. 
First set up the root password:&lt;br /&gt;&lt;br /&gt;&lt;span&gt;mysqladmin -uroot password &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Now, to automate the procedure create an executable shell script&lt;br /&gt;&lt;br /&gt;&lt;span&gt;openstack-db-setup.sh&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;with the following content (fill in the relevant user name and password fields as well as the IPs):&lt;br /&gt;&lt;br /&gt;&lt;span&gt;#!/bin/bash&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;DB_NAME=nova&lt;/span&gt;&lt;br /&gt;&lt;span&gt;DB_USER=&lt;/span&gt;&lt;br /&gt;&lt;span&gt;DB_PASS=&lt;/span&gt;&lt;br /&gt;&lt;span&gt;# MySQL root password set above (named ROOT_PASS because PWD is the shell's working-directory variable)&lt;/span&gt;&lt;br /&gt;&lt;span&gt;ROOT_PASS=&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;#CC_HOST=&quot;A.B.C.D&quot; # IPv4 address&lt;/span&gt;&lt;br /&gt;&lt;span&gt;CC_HOST=&quot;130.199.148.53&quot; # IPv4 address, fill your own&lt;/span&gt;&lt;br /&gt;&lt;span&gt;#HOSTS='node1 node2 node3' # compute nodes list&lt;/span&gt;&lt;br /&gt;&lt;span&gt;HOSTS='130.199.148.54' # compute nodes list, fill your own&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;mysqladmin -uroot -p$ROOT_PASS -f drop nova&lt;/span&gt;&lt;br /&gt;&lt;span&gt;mysqladmin -uroot -p$ROOT_PASS create nova&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;# connect as root with the root password; DB_PASS is only the password being granted&lt;/span&gt;&lt;br /&gt;&lt;span&gt;for h in $HOSTS localhost; do&lt;/span&gt;&lt;br /&gt;&lt;span&gt;        echo &quot;GRANT ALL PRIVILEGES ON $DB_NAME.* TO '$DB_USER'@'$h' IDENTIFIED BY '$DB_PASS';&quot; | mysql -uroot -p$ROOT_PASS mysql&lt;/span&gt;&lt;br /&gt;&lt;span&gt;done&lt;/span&gt;&lt;br /&gt;&lt;span&gt;echo &quot;GRANT ALL PRIVILEGES ON $DB_NAME.* TO $DB_USER IDENTIFIED BY '$DB_PASS';&quot; | mysql -uroot -p$ROOT_PASS mysql&lt;/span&gt;&lt;br /&gt;&lt;span&gt;echo &quot;GRANT ALL PRIVILEGES ON $DB_NAME.* TO root IDENTIFIED BY '$DB_PASS';&quot; | mysql -uroot -p$ROOT_PASS mysql&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;And now execute this script:&lt;br /&gt;&lt;br /&gt;&lt;span&gt;./openstack-db-setup.sh&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Create the db schema:&lt;br /&gt;&lt;br /&gt;&lt;span&gt;nova-manage db 
sync&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Now comes a point that is not in the &quot;official&quot; instructions. The installation will not work unless you patch your Python:&lt;br /&gt;&lt;br /&gt;&lt;span&gt;patch -p0 &amp;lt; rhel6-nova-network-patch.diff&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Create logical volumes:&lt;br /&gt;&lt;br /&gt;&lt;span&gt;lvcreate -L 1G --name test nova-volumes&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;For your convenience create an openstack startup shell script&lt;span&gt; openstack-init.sh&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Here is its content:&lt;br /&gt;&lt;br /&gt;&lt;span&gt;#!/bin/bash&lt;/span&gt;&lt;br /&gt;&lt;span&gt;for n in api compute network objectstore scheduler volume; do  &lt;/span&gt;&lt;br /&gt;&lt;span&gt;    service openstack-nova-$n &quot;$@&quot;; &lt;/span&gt;&lt;br /&gt;&lt;span&gt;done&lt;/span&gt;&lt;br /&gt;&lt;span&gt;service openstack-glance-api &quot;$@&quot;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;And finally we are ready to start openstack:&lt;br /&gt;&lt;br /&gt;&lt;span&gt;./openstack-init.sh start&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;With fingers crossed you should get:&lt;br /&gt;&lt;br /&gt;&lt;span&gt;Starting OpenStack Nova API Server:                        [  OK  ]&lt;/span&gt;&lt;br /&gt;&lt;span&gt;Starting OpenStack Nova Compute Worker:                    [  OK  ]&lt;/span&gt;&lt;br /&gt;&lt;span&gt;Starting OpenStack Nova Network Controller:                [  OK  ]&lt;/span&gt;&lt;br /&gt;&lt;span&gt;Starting OpenStack Nova Object Storage:                    [  OK  ]&lt;/span&gt;&lt;br /&gt;&lt;span&gt;Starting OpenStack Nova Scheduler:                         [  OK  ]&lt;/span&gt;&lt;br /&gt;&lt;span&gt;Starting OpenStack Nova Volume Worker:                     [  OK  ]&lt;/span&gt;&lt;br /&gt;&lt;span&gt;Starting OpenStack Glance API Server:                      [  OK  ]&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Now we need to configure and customize the installation which is another 
story for another day...</description>
	<pubDate>Thu, 15 Dec 2011 12:06:20 +0000</pubDate>
	<author>noreply@blogger.com (TomW)</author>
</item>
<item>
	<title>OSG Technology Area Rumblings: How to create openstack worker node</title>
	<guid>tag:blogger.com,1999:blog-8803173202887660937.post-4953757036561754700</guid>
	<link>http://osgtech.blogspot.com/2011/12/how-to-create-openstack-worker-node.html</link>
	<description>The &quot;official&quot; instructions on how to install openstack components are located here:&lt;br /&gt;&lt;br /&gt;http://docs.openstack.org/cactus/openstack-compute/admin/content/installing-openstack-compute-on-rhel6.html&lt;br /&gt;&lt;br /&gt;Unfortunately they are not very clear and miss some key points. Below is a summary of our installation procedure.&lt;br /&gt;&lt;br /&gt;First of all, let us install a worker node.&lt;br /&gt;&lt;br /&gt;&lt;span&gt;wget http://yum.griddynamics.net/yum/cactus/openstack/openstack-repo-2011.2-1.el6.noarch.rpm&lt;/span&gt;&lt;br /&gt;&lt;span&gt;rpm -ivh openstack-repo-2011.2-1.el6.noarch.rpm&lt;/span&gt;&lt;br /&gt;&lt;span&gt;yum install libvirt&lt;/span&gt;&lt;br /&gt;&lt;span&gt;chkconfig libvirtd on&lt;/span&gt;&lt;br /&gt;&lt;span&gt;/etc/init.d/libvirtd start&lt;/span&gt;&lt;br /&gt;&lt;span&gt;yum install openstack-nova-compute openstack-nova-compute-config&lt;/span&gt;&lt;br /&gt;&lt;span&gt;service openstack-nova-compute start&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;If everything goes fine you should see&lt;br /&gt;&lt;br /&gt;&lt;span&gt;Starting OpenStack Nova Compute Worker:                    [  OK  ]&lt;/span&gt;</description>
	<pubDate>Thu, 15 Dec 2011 11:46:58 +0000</pubDate>
	<author>noreply@blogger.com (TomW)</author>
</item>
<item>
	<title>Spinning: New toy: newpgid</title>
	<guid>http://spinningmatt.wordpress.com/?p=525</guid>
	<link>http://spinningmatt.wordpress.com/2011/12/13/new-toy-newpgid/</link>
	<description>&lt;p&gt;Useful with &lt;a href=&quot;https://spinningmatt.wordpress.com/2010/06/05/toys-cpusoak-and-memsoak/&quot;&gt;cpusoak and memsoak&lt;/a&gt;,&lt;/p&gt;
&lt;p&gt;&lt;b&gt;newpgid.c&lt;/b&gt;&lt;br /&gt;
&lt;pre class=&quot;brush: cpp;&quot;&gt;
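#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;unistd.h&amp;gt;

int
main(int argc, char *argv[])
{
  if (argc &amp;lt; 2) {
    fprintf(stderr, &quot;usage: %s command [args...]\n&quot;, argv[0]);
    return 2;
  }

  /* Become the leader of a new process group. */
  setpgid(0, 0);

  execvp(argv[1], &amp;amp;(argv[1]));

  /* Only reached if execvp failed. */
  perror(argv[1]);
  return 1;
}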
&lt;/pre&gt;&lt;/p&gt;
&lt;p&gt;When you want to start a new process in its own process group for easy killing.&lt;/p&gt;
&lt;p&gt;If you have coreutils 7.0+, you can take advantage of &lt;i&gt;timeout&lt;/i&gt;, which happens to setpgid.&lt;/p&gt;
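&lt;p&gt;For comparison, here is a rough Python sketch of the same trick, including the &amp;#8220;easy killing&amp;#8221; part. It is an illustration rather than a replacement for newpgid: the sleeping child stands in for something like cpusoak, and it relies on POSIX-only calls.&lt;/p&gt;
&lt;pre class=&quot;brush: python;&quot;&gt;
```python
import os
import signal
import subprocess
import sys

# A child that would otherwise run for a while (stand-in for cpusoak).
child_code = "import time; time.sleep(60)"

proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    # Same call newpgid makes: runs in the child just before exec.
    preexec_fn=lambda: os.setpgid(0, 0),
)

# The child now leads its own process group, so its pgid equals its pid.
leads_own_group = os.getpgid(proc.pid) == proc.pid
print(leads_own_group)

# Signal the whole group at once, then reap the child.
os.killpg(proc.pid, signal.SIGTERM)
proc.wait()
print(proc.returncode)
```
&lt;/pre&gt;
&lt;p&gt;Anything the child forks stays in the same group unless it changes groups itself, so one killpg reaches the whole family.&lt;/p&gt;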
</description>
	<pubDate>Tue, 13 Dec 2011 11:14:41 +0000</pubDate>
</item>
<item>
	<title>OSG Technology Area Rumblings: Network Accounting for Condor</title>
	<guid>tag:blogger.com,1999:blog-8803173202887660937.post-8008964395474516425</guid>
	<link>http://osgtech.blogspot.com/2011/12/network-accounting-for-condor.html</link>
	<description>It's been a long time since the &lt;a href=&quot;http://osgtech.blogspot.com/2011/09/per-batch-job-network-statistics.html&quot;&gt;August post&lt;/a&gt; describing how to set up manual network accounting for a process. &amp;nbsp;We now have a solution integrated into Condor and available &lt;a href=&quot;https://github.com/bbockelm/condor-network-accounting&quot;&gt;on github&lt;/a&gt;. &amp;nbsp;It takes a bit of explanation to understand how it works, so I've put together a series of diagrams to illustrate it.&lt;br /&gt;&lt;br /&gt;First, we start off with the lowly &lt;i&gt;condor_starter&lt;/i&gt; on any worker node with a network connection (to simplify things, I didn't draw the other condor processes involved):&lt;br /&gt;&lt;div class=&quot;separator&quot;&gt;&lt;span id=&quot;goog_513466619&quot;&gt;&lt;/span&gt;&lt;span id=&quot;goog_513466620&quot;&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class=&quot;separator&quot;&gt;&lt;a href=&quot;http://4.bp.blogspot.com/-BtJTZ5Wmg5E/TtqliF1fPGI/AAAAAAAAAdQ/0-6Z7ZqE5YY/s1600/Network+Namespaces+Illustration+1.png&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;http://4.bp.blogspot.com/-BtJTZ5Wmg5E/TtqliF1fPGI/AAAAAAAAAdQ/0-6Z7ZqE5YY/s1600/Network+Namespaces+Illustration+1.png&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class=&quot;separator&quot;&gt;By default, all processes on the node are in the same network namespace (labelled the &quot;System Network Namespace&quot; in this diagram). &amp;nbsp;We denote the network interface with a box, and assume it has address 192.168.0.1.&lt;/div&gt;&lt;div class=&quot;separator&quot;&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class=&quot;separator&quot;&gt;Next, the starter will create a pair of virtual ethernet devices. 
&amp;nbsp;We will refer to them as pipe devices, because any byte written into one will come out of the other - just as a venerable Unix pipe works:&lt;/div&gt;&lt;div class=&quot;separator&quot;&gt;&lt;a href=&quot;http://2.bp.blogspot.com/-GcVmvkaUH2I/TtqmGAbkBoI/AAAAAAAAAdY/mIVqQVfNUHM/s1600/Network+Namespaces+Illustration+2.png&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;http://2.bp.blogspot.com/-GcVmvkaUH2I/TtqmGAbkBoI/AAAAAAAAAdY/mIVqQVfNUHM/s1600/Network+Namespaces+Illustration+2.png&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class=&quot;separator&quot;&gt;By default, the network pipes are in a down state and have no IP address associated with them. &amp;nbsp;Not very useful! &amp;nbsp;At this point, we have some decisions to make: how should the network pipe device be presented to the network? &amp;nbsp;Should it be networked at layer 3, using NAT to route packets? &amp;nbsp;Or should we bridge it at layer 2, allowing the device to have a public IP address?&lt;/div&gt;&lt;div class=&quot;separator&quot;&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class=&quot;separator&quot;&gt;Really, it's up to the site, but we assume most sites will want to take the NAT approach: the public IP address might seem useful, but would require a public IP for each job. &amp;nbsp;To allow customization, all the routing is done by a helper script, but we provide a default implementation for NAT. 
&amp;nbsp;The script:&lt;/div&gt;&lt;div class=&quot;separator&quot;&gt;&lt;/div&gt;&lt;ul&gt;&lt;li&gt;Takes two arguments, a unique &quot;job identifier&quot; and the name of the network pipe device.&lt;/li&gt;&lt;li&gt;Is responsible for setting up any routing required for the device.&lt;/li&gt;&lt;li&gt;Must create an iptables chain with the same name as the &quot;job identifier&quot;.&lt;/li&gt;&lt;ul&gt;&lt;li&gt;Each rule in the chain will record the number of bytes matched; at the end of the job, these will be reported in the job ClassAd using an attribute named after the comment on the rule.&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;On stdout, returns the IP address the internal network pipe should use.&lt;/li&gt;&lt;/ul&gt;Additionally, Condor provides a cleanup script that does the inverse of the setup script. &amp;nbsp;The result looks something like this:&lt;br /&gt;&lt;div class=&quot;separator&quot;&gt;&lt;a href=&quot;http://4.bp.blogspot.com/-5vX_SFAQkSY/TuFS4jDX3HI/AAAAAAAAAds/4aisRs97iEA/s1600/Network+Namespaces+Illustration+3.png&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;http://4.bp.blogspot.com/-5vX_SFAQkSY/TuFS4jDX3HI/AAAAAAAAAds/4aisRs97iEA/s1600/Network+Namespaces+Illustration+3.png&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class=&quot;separator&quot;&gt;Next, the starter forks a separate process in a new network namespace using the clone() call with the CLONE_NEWNET flag. 
&amp;nbsp;Notice that, by default, no network devices are accessible in the new namespace:&lt;/div&gt;&lt;div class=&quot;separator&quot;&gt;&lt;a href=&quot;http://4.bp.blogspot.com/-pTGUfwub6GE/TuFWqWn20GI/AAAAAAAAAd8/aLZy8z9X4N0/s1600/Network+Namespaces+Illustration+4.png&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;http://4.bp.blogspot.com/-pTGUfwub6GE/TuFWqWn20GI/AAAAAAAAAd8/aLZy8z9X4N0/s1600/Network+Namespaces+Illustration+4.png&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class=&quot;separator&quot;&gt;Next, the external starter will pass one side of the pipe to the other namespace; the internal starter will do some minimal configuration of the device (default route, IP address, set the device to the &quot;up&quot; status):&lt;/div&gt;&lt;div class=&quot;separator&quot;&gt;&lt;a href=&quot;http://4.bp.blogspot.com/-rcms2kWSNk4/TuFXwH0_LsI/AAAAAAAAAeE/P8o8wsZf2Bk/s1600/Network+Namespaces+Illustration+5.png&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;http://4.bp.blogspot.com/-rcms2kWSNk4/TuFXwH0_LsI/AAAAAAAAAeE/P8o8wsZf2Bk/s1600/Network+Namespaces+Illustration+5.png&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class=&quot;separator&quot;&gt;Finally, the starter execs the job. &amp;nbsp;Whenever the job does any network operations, the bytes are routed via the internal network pipe, come out the external network pipe, and then are NAT'd to the physical network device before exiting the machine.&lt;/div&gt;&lt;div class=&quot;separator&quot;&gt;&lt;a href=&quot;http://2.bp.blogspot.com/-2JlhOJkWPWk/TuFbBJ-f4sI/AAAAAAAAAeM/dDy0tHDxEUc/s1600/Network+Namespaces+Illustration+6.png&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;http://2.bp.blogspot.com/-2JlhOJkWPWk/TuFbBJ-f4sI/AAAAAAAAAeM/dDy0tHDxEUc/s1600/Network+Namespaces+Illustration+6.png&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class=&quot;separator&quot;&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class=&quot;separator&quot;&gt;As mentioned, the whole point of the exercise is to do network accounting. 
&amp;nbsp;Since all packets go through one device, Condor can read out all the activity via iptables. &amp;nbsp;The &quot;helper script&quot; above will create a unique chain per job. &amp;nbsp;This allows some level of flexibility; for example, the chain below allows us to distinguish between on-campus and off-campus packets:&lt;/div&gt;&lt;div class=&quot;separator&quot;&gt;&lt;br /&gt;&lt;/div&gt;&lt;pre&gt;Chain JOB_12345 (2 references)&lt;br /&gt;&amp;nbsp;pkts bytes target &amp;nbsp; &amp;nbsp; prot opt in &amp;nbsp; &amp;nbsp; out &amp;nbsp; &amp;nbsp; source &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; destination &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;br /&gt;&amp;nbsp; &amp;nbsp; 0 &amp;nbsp; &amp;nbsp; 0 ACCEPT &amp;nbsp; &amp;nbsp; all &amp;nbsp;-- &amp;nbsp;veth0 &amp;nbsp;em1 &amp;nbsp; &amp;nbsp; anywhere &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 129.93.0.0/16 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;/* OutgoingInternal */&lt;br /&gt;&amp;nbsp; &amp;nbsp; 0 &amp;nbsp; &amp;nbsp; 0 ACCEPT &amp;nbsp; &amp;nbsp; all &amp;nbsp;-- &amp;nbsp;veth0 &amp;nbsp;em1 &amp;nbsp; &amp;nbsp; anywhere &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;!129.93.0.0/16 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;/* OutgoingExternal */&lt;br /&gt;&amp;nbsp; &amp;nbsp; 0 &amp;nbsp; &amp;nbsp; 0 ACCEPT &amp;nbsp; &amp;nbsp; all &amp;nbsp;-- &amp;nbsp;em1 &amp;nbsp; &amp;nbsp;veth0 &amp;nbsp; 129.93.0.0/16 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;anywhere &amp;nbsp; &amp;nbsp; &amp;nbsp;   &amp;nbsp; &amp;nbsp; state RELATED,ESTABLISHED /* IncomingInternal */&lt;br /&gt;&amp;nbsp; &amp;nbsp; 0 &amp;nbsp; &amp;nbsp; 0 ACCEPT &amp;nbsp; &amp;nbsp; all &amp;nbsp;-- &amp;nbsp;em1 &amp;nbsp; &amp;nbsp;veth0 &amp;nbsp;!129.93.0.0/16 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;anywhere &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; state RELATED,ESTABLISHED /* 
IncomingExternal */&lt;br /&gt;&amp;nbsp; &amp;nbsp; 0 &amp;nbsp; &amp;nbsp; 0 REJECT &amp;nbsp; &amp;nbsp; all &amp;nbsp;-- &amp;nbsp;any &amp;nbsp; &amp;nbsp;any &amp;nbsp; &amp;nbsp; anywhere &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; anywhere &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; reject-with icmp-port-unreachable&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Thus, the resulting ClassAd history from this job will have an attribute for &lt;i&gt;NetworkOutgoingInternal&lt;/i&gt;, &lt;i&gt;NetworkOutgoingExternal&lt;/i&gt;, &lt;i&gt;NetworkIncomingInternal&lt;/i&gt;, and &lt;i&gt;NetworkIncomingExternal&lt;/i&gt;. &amp;nbsp;We have an updated Condor Gratia probe that looks for &lt;i&gt;Network*&lt;/i&gt; attributes and reports them appropriately to the accounting database.&lt;br /&gt;&lt;div class=&quot;separator&quot;&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class=&quot;separator&quot;&gt;Thus, we have byte-level network accounting, allowing us to answer the age-old question of &quot;how much would a CMS T2 cost on Amazon EC2?&quot;. &amp;nbsp;Or perhaps we could answer &quot;how much is a currently running job going to cost me?&quot; Matt has pointed out the network setup callout could be used to implement security zones, isolating (or QoS'ing) jobs of certain users at the network level. &amp;nbsp;There are quite a few possibilities! &amp;nbsp;&lt;/div&gt;&lt;div class=&quot;separator&quot;&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class=&quot;separator&quot;&gt;We'll definitely be returning to this work mid-2012 when the local T2 is based on SL6, and this patch can be put into production. &amp;nbsp;There will be some further engagement with the Condor team to see if they're interested in taking the patch. &amp;nbsp;The Gratia probe work to manage network information will be interesting upstream too. &amp;nbsp;Finally, I encourage interested readers to take a look at the github branch. 
&amp;nbsp;The patch itself is a tour-de-force of several dark corners of Linux systems programming (it involves using clone, synchronizing processes with pipes, sending messages to the kernel via netlink to configure the routing, and reading out iptables configurations from C). &amp;nbsp;It was very rewarding to implement!&lt;/div&gt;</description>
	<pubDate>Thu, 08 Dec 2011 17:03:39 +0000</pubDate>
	<author>noreply@blogger.com (Brian Bockelman)</author>
</item>
<item>
	<title>Spinning: Service as a Job: Memcached</title>
	<guid>http://spinningmatt.wordpress.com/?p=548</guid>
	<link>http://spinningmatt.wordpress.com/2011/12/05/service-as-a-job-memcached/</link>
	<description>&lt;p&gt;Earlier posts on running services such as &lt;a href=&quot;http://spinningmatt.wordpress.com/2011/02/27/service-as-a-job-the-tomcat-app-server/&quot;&gt;Tomcat&lt;/a&gt; or &lt;a href=&quot;http://spinningmatt.wordpress.com/2010/05/18/service-as-a-job-the-qpid-c-broker/&quot;&gt;Qpidd&lt;/a&gt; show how to schedule and manage a service&amp;#8217;s life-cycle via Condor. It is also possible to gather and centralize statistics about a service as it runs. Here is an example of how, using &lt;a href=&quot;http://memcached.org/&quot;&gt;memcached&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As with tomcat and qpidd, there is a control script and a job description.&lt;/p&gt;
&lt;p&gt;New in the control script for memcached is a loop that monitors the server and chirps statistics back.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;memcached.sh&lt;/b&gt;&lt;br /&gt;
&lt;pre class=&quot;brush: bash;&quot;&gt;
#!/bin/bash

# condor_chirp in /usr/libexec/condor
export PATH=$PATH:/usr/libexec/condor

PORT_FILE=$TMP/.ports

# When we get SIGTERM, which Condor will send when
# we are kicked, kill off memcached.
function term {
   rm -f $PORT_FILE
   kill %1
}

# Spawn memcached, and make sure we can shut it down cleanly.
trap term SIGTERM
# memcached will write port information to env(MEMCACHED_PORT_FILENAME)
env MEMCACHED_PORT_FILENAME=$PORT_FILE memcached -p -1 &amp;quot;$@&amp;quot; &amp;amp;

# We might have to wait for the port
while [ ! -s $PORT_FILE ]; do sleep 1; done

# The port file's format is:
#  TCP INET: 56697
#  TCP INET6: 47318
#  UDP INET: 34453
#  UDP INET6: 54891
sed -i -e 's/ /_/' -e 's/\(.*\): \(.*\)/\1=\2/' $PORT_FILE
source $PORT_FILE
rm -f $PORT_FILE

# Record the port number where everyone can see it
condor_chirp set_job_attr MemcachedEndpoint \&amp;quot;$HOSTNAME:$TCP_INET\&amp;quot;
condor_chirp set_job_attr TCP_INET $TCP_INET
condor_chirp set_job_attr TCP_INET6 $TCP_INET6
condor_chirp set_job_attr UDP_INET $UDP_INET
condor_chirp set_job_attr UDP_INET6 $UDP_INET6

# While memcached is running, collect and report back stats
while kill -0 %1; do
   # Collect stats and chirp them back into the job ad
   echo stats | nc localhost $TCP_INET | \
    grep -v -e END -e version | tr '\r' '&amp;#092;&amp;#048;' | \
     awk '{print &amp;quot;stat_&amp;quot;$2,$3}' | \
      while read -r stat; do
         condor_chirp set_job_attr $stat
      done
   sleep 30
done
&lt;/pre&gt;&lt;/p&gt;
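&lt;p&gt;To make the port-file step concrete, here is a minimal sketch of the same sed rewrite applied to canned input, so it runs without memcached. The function names ports_to_assignments and sample_ports are illustrative assumptions, not part of the original script, and the sketch reads standard input rather than editing the file in place.&lt;/p&gt;
&lt;pre class=&quot;brush: bash;&quot;&gt;

```shell
#!/bin/sh
# Hypothetical helper: turn "TCP INET: 56697" style lines into shell assignments.
ports_to_assignments() {
   # The first expression replaces the first space with an underscore;
   # the second turns "NAME: VALUE" into "NAME=VALUE".
   sed -e 's/ /_/' -e 's/\(.*\): \(.*\)/\1=\2/'
}

# Canned port file contents in the format described in the script's comments.
sample_ports() {
   printf '%s\n' 'TCP INET: 56697' 'TCP INET6: 47318' 'UDP INET: 34453' 'UDP INET6: 54891'
}

# Load the assignments into the current shell, much as "source $PORT_FILE" does.
eval "$(sample_ports | ports_to_assignments)"
echo "endpoint port: $TCP_INET"
```

&lt;/pre&gt;
&lt;p&gt;After the eval, TCP_INET, TCP_INET6, UDP_INET and UDP_INET6 are ordinary shell variables, which is what lets the script chirp them back as job attributes.&lt;/p&gt;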
&lt;p&gt;A refresher about chirp: jobs are stored in condor_schedd processes and are described using the ClassAd language, a set of extensible name/value pairs. chirp is a tool a job can use while it runs to modify its own ClassAd stored in the schedd.&lt;/p&gt;
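&lt;p&gt;As a rough illustration of those name/value pairs, the sketch below runs the monitoring loop's text pipeline over canned memcached stats output and prints the condor_chirp commands it would issue, without calling condor_chirp or contacting a server. The helper names and sample values are assumptions made for the example, and it deletes carriage returns with tr -d instead of using the tr call in memcached.sh.&lt;/p&gt;
&lt;pre class=&quot;brush: bash;&quot;&gt;

```shell
#!/bin/sh
# Canned sample of memcached "stats" output; the values are illustrative only.
sample_stats() {
   printf 'STAT pid 1234\r\nSTAT total_items 480\r\nSTAT bytes 47740\r\nSTAT version 1.4.5\r\nEND\r\n'
}

# Turn "STAT name value" lines into "stat_name value" pairs,
# dropping the END marker and the non-numeric version line.
stats_to_attrs() {
   grep -v -e END -e version | tr -d '\r' | awk '{print "stat_"$2, $3}'
}

# Print the chirp command for each pair instead of running it.
sample_stats | stats_to_attrs | while read -r name value; do
   echo "condor_chirp set_job_attr $name $value"
done
```

&lt;/pre&gt;
&lt;p&gt;Each printed line corresponds to one attribute that would end up in the job's ClassAd, such as stat_total_items.&lt;/p&gt;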
&lt;p&gt;The job description, passed to condor_submit, is vanilla except for how arguments are passed to memcached.sh. The $$(Memory) substitution (see &lt;a href=&quot;http://research.cs.wisc.edu/condor/manual/v7.6/condor_submit.html&quot;&gt;man condor_submit&lt;/a&gt;) allows memcached to use as much memory as is available on the slot where it gets scheduled, since slots may have different amounts of Memory available.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;memcached.job&lt;/b&gt;&lt;br /&gt;
&lt;pre class=&quot;brush: plain;&quot;&gt;
cmd = memcached.sh
args = -m $$(Memory)

log = memcached.log

kill_sig = SIGTERM

# Want chirp functionality
+WantIOProxy = TRUE

should_transfer_files = if_needed
when_to_transfer_output = on_exit

queue
&lt;/pre&gt;&lt;/p&gt;
&lt;p&gt;An example follows. Note that the set of memcached servers to use is generated from condor_q.&lt;/p&gt;
&lt;p&gt;&lt;pre class=&quot;brush: plain;&quot;&gt;
$ condor_submit -a &amp;quot;queue 4&amp;quot; memcached.job
Submitting job(s)....
4 job(s) submitted to cluster 80.

$ condor_q -format &amp;quot;%s\t&amp;quot; MemcachedEndpoint -format &amp;quot;total_items: %d\t&amp;quot; stat_total_items -format &amp;quot;memory: %d/&amp;quot; stat_bytes -format &amp;quot;%d\n&amp;quot; stat_limit_maxbytes
eeyore.local:50608	total_items: 0	memory: 0/985661440
eeyore.local:47766	total_items: 0	memory: 0/985661440
eeyore.local:39130	total_items: 0	memory: 0/985661440
eeyore.local:57410	total_items: 0	memory: 0/985661440

$ SERVERS=$(condor_q -format &amp;quot;%s,&amp;quot; MemcachedEndpoint); for word in $(cat words); do echo $word &amp;gt; $word; memcp --servers=$SERVERS $word; \rm $word; done &amp;amp;
[1] 959

$ condor_q -format &amp;quot;%s\t&amp;quot; MemcachedEndpoint -format &amp;quot;total_items: %d\t&amp;quot; stat_total_items -format &amp;quot;memory: %d/&amp;quot; stat_bytes -format &amp;quot;%d\n&amp;quot; stat_limit_maxbytes
eeyore.local:50608	total_items: 480	memory: 47740/985661440
eeyore.local:47766	total_items: 446	memory: 44284/985661440
eeyore.local:39130	total_items: 504	memory: 50140/985661440
eeyore.local:57410	total_items: 490	memory: 48632/985661440

$ condor_q -format &amp;quot;%s\t&amp;quot; MemcachedEndpoint -format &amp;quot;total_items: %d\t&amp;quot; stat_total_items -format &amp;quot;memory: %d/&amp;quot; stat_bytes -format &amp;quot;%d\n&amp;quot; stat_limit_maxbytes
eeyore.local:50608	total_items: 1926	memory: 191264/985661440
eeyore.local:47766	total_items: 1980	memory: 196624/985661440
eeyore.local:39130	total_items: 2059	memory: 204847/985661440
eeyore.local:57410	total_items: 2053	memory: 203885/985661440

$ condor_q -format &amp;quot;%s\t&amp;quot; MemcachedEndpoint -format &amp;quot;total_items: %d\t&amp;quot; stat_total_items -format &amp;quot;memory: %d/&amp;quot; stat_bytes -format &amp;quot;%d\n&amp;quot; stat_limit_maxbytes
eeyore.local:50608	total_items: 3408	memory: 338522/985661440
eeyore.local:47766	total_items: 3542	memory: 351784/985661440
eeyore.local:39130	total_items: 3666	memory: 364552/985661440
eeyore.local:57410	total_items: 3600	memory: 357546/985661440

[1]  + done       for word in $(cat words); do; echo $word &amp;gt; $word; memcp --servers=$SERVERS ; 
&lt;/pre&gt;&lt;/p&gt;
&lt;p&gt;Enjoy.&lt;/p&gt;
</description>
	<pubDate>Mon, 05 Dec 2011 12:45:59 +0000</pubDate>
</item>
<item>
	<title>OSG Technology Area Rumblings: Details on glexec improvements</title>
	<guid>tag:blogger.com,1999:blog-8803173202887660937.post-2079248913615855270</guid>
	<link>http://osgtech.blogspot.com/2011/12/details-on-glexec-improvements.html</link>
	<description>&lt;a href=&quot;http://osgtech.blogspot.com/2011/11/improving-glexec-enabled-life.html&quot;&gt;My last blog post&lt;/a&gt; gave a quick overview of why &lt;i&gt;glexec&lt;/i&gt; exists, what issues folks run into, and what we did to improve it. &amp;nbsp;Let's go into some details.&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot;&gt;How Condor Update Works&lt;/span&gt;&lt;br /&gt;The &lt;b&gt;lcmaps-plugins-condor-update&lt;/b&gt; package contains the modules necessary to advertise the payload certificate of the last glexec invocation in the pilot's ClassAd. &amp;nbsp;The concept is simple - the implementation is a bit tricky.&lt;br /&gt;&lt;br /&gt;For a long time, Condor has had a command-line tool called &lt;i&gt;condor_advertise&lt;/i&gt;; it allows an admin to hand-advertise updates to ads in the collector. &amp;nbsp;Unfortunately, that's not quite what we need here: we want to update the &lt;b&gt;job&lt;/b&gt;&amp;nbsp;ad in the &lt;b&gt;schedd&lt;/b&gt;, while condor_advertise typically updates the &lt;b&gt;machine&lt;/b&gt;&amp;nbsp;ad in the &lt;b&gt;collector&lt;/b&gt;. &amp;nbsp;Close, but no cigar.&lt;br /&gt;&lt;br /&gt;There's a lesser-known utility called &lt;i&gt;condor_chirp&lt;/i&gt; that we can use. &amp;nbsp;Typically, &lt;i&gt;condor_chirp&lt;/i&gt; is used to do I/O between the schedd and the starter (for example, you can pull/push files on demand in the middle of the job), but it can also update the job's ad in the schedd. &amp;nbsp;The syntax is simple:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;condor_chirp set_job_attr ATTR_NAME ATTR_VAL&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;(&lt;a href=&quot;http://spinningmatt.wordpress.com/2011/02/27/service-as-a-job-the-tomcat-app-server/&quot;&gt;look at the clever things Matt does with condor_chirp&lt;/a&gt;). &amp;nbsp;As condor_chirp allows additional access to the schedd, the user must explicitly request it in the job ad. 
&amp;nbsp;If you want to try it out, you must add the following line into your submit file:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;+WantIOProxy=TRUE&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;To work, chirp must know how to contact the starter and have access to the &quot;magic cookie&quot;; these are located inside the &lt;b&gt;$_CONDOR_SCRATCH_DIR&lt;/b&gt;, as set by Condor in the initial batch process. &amp;nbsp;As the glexec plugin runs as root (glexec must be setuid root to launch a process as a different UID), we must guard against being fooled by the invoking user.&lt;br /&gt;Accordingly, the plugin uses &lt;b&gt;/proc&lt;/b&gt; to read the parentage of the process tree until it finds a process owned by root. &amp;nbsp;If this is not init, it is assumed the process is the condor_starter, and the job's &lt;b&gt;$_CONDOR_SCRATCH_DIR&lt;/b&gt; can be deduced from the &lt;b&gt;$CWD &lt;/b&gt;and the PID of the starter. &amp;nbsp;Since we only rely on information from root-owned processes, we can be fairly sure this is the correct scratch directory. &amp;nbsp;As a further safeguard, before invoking &lt;i&gt;condor_chirp&lt;/i&gt;, the plugin drops privilege to that of the invoking user. &amp;nbsp;Along with the other security guarantees provided by &lt;i&gt;glexec&lt;/i&gt;, we have confidence that we are reading the correct chirp configuration and are not allowing the invoker to increase its privileges.&lt;br /&gt;&lt;br /&gt;Once we know how to invoke &lt;i&gt;condor_chirp&lt;/i&gt;, the rest of the process is all downhill. 
&amp;nbsp;&lt;i&gt;glexec&lt;/i&gt; internally knows the payload's DN, the payload Unix user, and does the equivalent of the following:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;condor_chirp set_job_attr glexec_user &quot;hcc&quot;&lt;br /&gt;condor_chirp set_job_attr glexec_x509userproxysubject &quot;/DC=org/DC=cilogon/C=US/O=University of Nebraska-Lincoln/CN=Brian Bockelman A621&quot;&lt;br /&gt;condor_chirp set_job_attr glexec_time 1322761868&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;condor_chirp writes the data into the starter, which then updates the shadow, then the schedd (&lt;a href=&quot;https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=JobClassAdFlow&quot;&gt;some of the gory details are covered in the Condor wiki&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;The diagram below illustrates the data flow:&lt;br /&gt;&lt;div class=&quot;separator&quot;&gt;&lt;a href=&quot;http://4.bp.blogspot.com/-EV7fEqSye6s/TtgiO2sLfII/AAAAAAAAAcw/L4Ilyk8k1JM/s1600/glexec-condor-update.png&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;http://4.bp.blogspot.com/-EV7fEqSye6s/TtgiO2sLfII/AAAAAAAAAcw/L4Ilyk8k1JM/s1600/glexec-condor-update.png&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot;&gt;Putting this into Play&lt;/span&gt;&lt;br /&gt;If you really want to get messy, you can check out the source code from Subversion at:&lt;br /&gt;&lt;pre&gt;svn://t2.unl.edu/brian/lcmaps-plugins-condor-update&lt;/pre&gt;(&lt;a href=&quot;http://t2.unl.edu:8094/browser/lcmaps-plugins-condor-update&quot;&gt;web view&lt;/a&gt;)&lt;br /&gt;&lt;br /&gt;The current version of the plugin is 0.0.2. 
&amp;nbsp;It's &lt;a href=&quot;https://koji-hub.batlab.org/koji/buildinfo?buildID=616&quot;&gt;available in Koji&lt;/a&gt;, or via yum in the osg-development repository:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;yum install --enablerepo=osg-development lcmaps-plugins-condor-update&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;(you must already have the &lt;i&gt;osg-release&lt;/i&gt; RPM installed and &lt;i&gt;glexec&lt;/i&gt; otherwise configured).&lt;br /&gt;&lt;br /&gt;After installing it, you need to update the &lt;b&gt;/etc/lcmaps.db&lt;/b&gt;&amp;nbsp;configuration file on the worker node to invoke the condor-update module. &amp;nbsp;In the top half, I add:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;condor_updates = &quot;lcmaps_condor_update.mod&quot;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Then, I add &lt;i&gt;condor-update&lt;/i&gt; to the &lt;i&gt;glexec&lt;/i&gt; policy:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;glexec:&lt;br /&gt;&lt;br /&gt;verifyproxy -&amp;gt; gumsclient&lt;br /&gt;gumsclient -&amp;gt; condor_updates&lt;br /&gt;condor_updates -&amp;gt; tracking&lt;br /&gt;&lt;/pre&gt;&lt;div&gt;&lt;br /&gt;Note we use the &quot;tracking&quot; module locally; most sites will use the &quot;glexec-tracking&quot; module. &amp;nbsp;Pick the appropriate one.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;Finally, you need to turn on the I/O proxy in the Condor submit file. &amp;nbsp;We do this by editing &lt;b&gt;condor.pm&lt;/b&gt;&amp;nbsp; (for RPMs, located in &lt;b&gt;/usr/lib/perl5/vendor_perl/5.8.8/Globus/GRAM/JobManager/condor.pm&lt;/b&gt;). 
&amp;nbsp;We add the following line into the &lt;i&gt;submit&lt;/i&gt; routine, right before &lt;b&gt;queue&lt;/b&gt; is added to the script file:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;print SCRIPT_FILE &quot;+WantIOProxy=TRUE\n&quot;;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;All new incoming jobs will get this attribute; any &lt;i&gt;glexec&lt;/i&gt; invocations they do will be reflected at the CE!&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot;&gt;GUMS and Worker Node Certificates&lt;/span&gt;&lt;br /&gt;&lt;div&gt;To map a certificate to a Unix user, &lt;i&gt;glexec&lt;/i&gt; calls out to the GUMS server using XACML with a grid-interoperable profile. &amp;nbsp;In the XACML callout, GUMS is given the payload's DN and VOMS attributes. &amp;nbsp;The same library (LCMAPS/SCAS-client) and protocol can also make callouts directly to SCAS, more commonly used in Europe.&lt;br /&gt;&lt;br /&gt;GUMS is a powerful and flexible authorization tool; one feature is that it allows different mappings based on the originating hostname. &amp;nbsp;For example, if desired, my certificate could map to user &lt;b&gt;hcc&lt;/b&gt;&amp;nbsp;at &lt;i&gt;red.unl.edu&lt;/i&gt; but map to &lt;b&gt;cmsprod&lt;/b&gt; at &lt;i&gt;ff-grid.unl.edu&lt;/i&gt;. 
&amp;nbsp;To prevent &quot;just anyone&quot; from probing the GUMS server, GUMS requires the client to present an X509 certificate (in this case, the hostcert); it takes the hostname from the client's certificate.&lt;br /&gt;&lt;br /&gt;This has the unfortunate side-effect of requiring a host certificate on every node that invokes GUMS; OK for the CE (100 in the OSG), but not for glexec on the worker nodes (thousands on the OSG).&lt;br /&gt;&lt;br /&gt;When &lt;i&gt;glexec&lt;/i&gt; is invoked in EGI, SCAS is invoked using the pilot certificate for HTTPS and information about the payload certificate in the XACML callout; this requires no worker node host certificate.&lt;br /&gt;&lt;br /&gt;To replicate how &lt;i&gt;glexec&lt;/i&gt; works in EGI, we had to develop a small patch to GUMS. &amp;nbsp;When the pilot certificate is used for authentication, the pilot's DN is recorded to the logs (so we know who is invoking GUMS), but the host name is self-reported in the XACML callout. &amp;nbsp;As the authentication is still performed, we believe this relaxing of the security model is acceptable.&lt;br /&gt;&lt;br /&gt;A &lt;a href=&quot;https://koji-hub.batlab.org/koji/buildinfo?buildID=944&quot;&gt;patched, working version of GUMS&lt;/a&gt; can be found in Koji and is available in the osg-development repository. 
&amp;nbsp;It will still be a few months before the RPM-based GUMS install is fully documented and released, however.&lt;br /&gt;&lt;br /&gt;Once installed, two changes need to be made at the server:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Do all hostname mappings based on &quot;DN&quot; in the web interface, not the &quot;CN&quot;.&lt;/li&gt;&lt;li&gt;Any group of users (for example, /cms/Role=pilot) that wants to invoke GUMS must have &quot;read all&quot; access, not just &quot;read self&quot;.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;Further, &lt;b&gt;/etc/lcmaps.db&lt;/b&gt; needs to be changed to &lt;u&gt;remove&lt;/u&gt;&amp;nbsp;the following lines from the gumsclient module:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;pre&gt;&quot;-cert &amp;nbsp; /etc/grid-security/hostcert.pem&quot;&lt;br /&gt;&quot;-key &amp;nbsp; &amp;nbsp;/etc/grid-security/hostkey.pem&quot;&lt;br /&gt;&quot;--cert-owner root&quot;&lt;br /&gt;&lt;/pre&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;This will all be automated going forward, but it should already help remove some of the pain in deploying &lt;i&gt;glexec&lt;/i&gt;!&lt;/div&gt;</description>
	<pubDate>Thu, 01 Dec 2011 17:46:58 +0000</pubDate>
	<author>noreply@blogger.com (Brian Bockelman)</author>
</item>
<item>
	<title>Spinning: Custom resource attributes: Facter</title>
	<guid>http://spinningmatt.wordpress.com/?p=531</guid>
	<link>http://spinningmatt.wordpress.com/2011/11/29/custom-resource-attributes-facter/</link>
	<description>&lt;p&gt;Condor provides a large set of attributes, facts, about resources for scheduling and querying, but it does not provide everything possible. Instead, there is a mechanism to extend the set. Previously, we &lt;a href=&quot;https://spinningmatt.wordpress.com/2009/11/17/custom-classad-attributes-in-condor-freememorymb-via-startd_cron/&quot;&gt;added FreeMemoryMB&lt;/a&gt;. The set can also be extended with information from &lt;a href=&quot;http://projects.puppetlabs.com/projects/facter&quot;&gt;Facter&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Facter provides an extensible set of facts about a system. To include facter facts we need a means to translate them into attributes and add them to the Startd configuration.&lt;/p&gt;
&lt;pre&gt;
$ facter
...
architecture =&amp;gt; x86_64
domain =&amp;gt; local
facterversion =&amp;gt; 1.5.9
hardwareisa =&amp;gt; x86_64
hardwaremodel =&amp;gt; x86_64
physicalprocessorcount =&amp;gt; 1
processor0 =&amp;gt; Intel(R) Core(TM) i7 CPU       M 620  @ 2.67GHz
selinux =&amp;gt; true
selinux_config_mode =&amp;gt; enforcing
swapfree =&amp;gt; 3.98 GB
swapsize =&amp;gt; 4.00 GB
...
&lt;/pre&gt;
&lt;p&gt;The facts are of the form &lt;i&gt;name =&amp;gt; value&lt;/i&gt;, not very far off from ClassAd attributes. A simple script to convert all the facts into attributes with string values is:&lt;/p&gt;
&lt;p&gt;&lt;b&gt;/usr/libexec/condor/facter.sh&lt;/b&gt;&lt;br /&gt;
&lt;pre class=&quot;brush: bash;&quot;&gt;
#!/bin/sh
type facter &amp;gt; /dev/null 2&amp;gt;&amp;amp;1 || exit 1
facter | sed 's/\([^ ]*\) =&amp;gt; \(.*\)/facter_\1 = &amp;quot;\2&amp;quot;/'
&lt;/pre&gt;&lt;/p&gt;
&lt;pre&gt;
$ facter.sh
...
facter_architecture = &quot;x86_64&quot;
facter_domain = &quot;local&quot;
facter_facterversion = &quot;1.5.9&quot;
facter_hardwareisa = &quot;x86_64&quot;
facter_hardwaremodel = &quot;x86_64&quot;
facter_physicalprocessorcount = &quot;1&quot;
facter_processor0 = &quot;Intel(R) Core(TM) i7 CPU       M 620  @ 2.67GHz&quot;
facter_selinux = &quot;true&quot;
facter_selinux_config_mode = &quot;enforcing&quot;
facter_swapfree = &quot;3.98 GB&quot;
facter_swapsize = &quot;4.00 GB&quot;
...
&lt;/pre&gt;
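&lt;p&gt;Because every value is emitted as a quoted string, a requirements expression that compares a fact such as physicalprocessorcount numerically is unlikely to behave as intended. One possible variation, sketched here with canned input so it runs without facter installed, leaves whole-number values unquoted and quotes everything else. The function names facts_to_attrs and sample_facts are hypothetical, and this is not the script used in the rest of this post.&lt;/p&gt;
&lt;pre class=&quot;brush: bash;&quot;&gt;

```shell
#!/bin/sh
# Hypothetical variant of facter.sh: integers stay bare, everything else is quoted.
facts_to_attrs() {
   # The first expression only matches values made entirely of digits.
   # Once it has rewritten a line, the second expression no longer matches it.
   sed -e 's/^\([^ ]*\) => \([0-9][0-9]*\)$/facter_\1 = \2/' \
       -e 's/^\([^ ]*\) => \(.*\)$/facter_\1 = "\2"/'
}

# Canned facts in facter's "name => value" format, taken from the sample output above.
sample_facts() {
   printf '%s\n' 'physicalprocessorcount => 1' 'selinux => true' 'swapsize => 4.00 GB'
}

sample_facts | facts_to_attrs
```

&lt;/pre&gt;
&lt;p&gt;With this variant physicalprocessorcount would be advertised as an integer, while selinux and swapsize remain strings, so the string comparisons used later in this post would be unaffected.&lt;/p&gt;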
&lt;p&gt;And the configuration, simply dropped into /etc/condor/config.d,&lt;/p&gt;
&lt;p&gt;&lt;b&gt;/etc/condor/config.d/49facter.config&lt;/b&gt;&lt;br /&gt;
&lt;pre class=&quot;brush: plain;&quot;&gt;
FACTER = /usr/libexec/condor/facter.sh
STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST) FACTER
STARTD_CRON_FACTER_EXECUTABLE = $(FACTER)
STARTD_CRON_FACTER_PERIOD = 300
&lt;/pre&gt;&lt;/p&gt;
&lt;p&gt;A &lt;code&gt;condor_reconfig&lt;/code&gt; and the facter facts will be available,&lt;/p&gt;
&lt;pre&gt;
$ condor_status -long | grep ^facter
...
facter_architecture = &quot;x86_64&quot;
facter_facterversion = &quot;1.5.9&quot;
facter_domain = &quot;local&quot;
facter_swapfree = &quot;3.98 GB&quot;
facter_selinux = &quot;true&quot;
facter_hardwaremodel = &quot;x86_64&quot;
facter_selinux_config_mode = &quot;enforcing&quot;
facter_processor0 = &quot;Intel(R) Core(TM) i7 CPU       M 620  @ 2.67GHz&quot;
facter_selinux_mode = &quot;targeted&quot;
facter_hardwareisa = &quot;x86_64&quot;
facter_swapsize = &quot;4.00 GB&quot;
facter_physicalprocessorcount = &quot;1&quot;
...
&lt;/pre&gt;
&lt;p&gt;For scheduling, just use the facter information in job requirements, e.g. &lt;code&gt;requirements = facter_selinux == &quot;true&quot;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Or, query your pool to see what resources are not running selinux,&lt;/p&gt;
&lt;pre&gt;
$ condor_status -const 'facter_selinux == &quot;false&quot;'
Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime
eeyore.local       LINUX      X86_64 Unclaimed Idle     0.030  3760  0+00:12:31
                     Machines Owner Claimed Unclaimed Matched Preempting
        X86_64/LINUX        1     0       0         1       0          0
               Total        1     0       0         1       0          0
&lt;/pre&gt;
&lt;p&gt;&lt;a href=&quot;https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2667&quot;&gt;Oops.&lt;/a&gt;&lt;/p&gt;
</description>
	<pubDate>Tue, 29 Nov 2011 11:54:36 +0000</pubDate>
</item>
<item>
	<title>Inside OSG Ops.: GOC holiday schedule</title>
	<guid>tag:blogger.com,1999:blog-7506259688180433777.post-5019084569521371803</guid>
	<link>http://insideosgops.blogspot.com/2011/11/goc-holiday-schedule.html</link>
	<description>From 24/Nov through 27/Nov the GOC will be operating on a Holiday&lt;br /&gt;schedule. Staff will be available to respond to emergencies but&lt;br /&gt;routine operations will resume at start of business Monday 28/Nov.&lt;br /&gt;&lt;br /&gt;The GOC wishes its users and OSG staff a happy and satisfying&lt;br /&gt;Thanksgiving Holiday.</description>
	<pubDate>Mon, 21 Nov 2011 06:06:21 +0000</pubDate>
	<author>noreply@blogger.com (scott)</author>
</item>
<item>
	<title>OSG Technology Area Rumblings: Improving the glexec-enabled life</title>
	<guid>tag:blogger.com,1999:blog-8803173202887660937.post-5769556833312037616</guid>
	<link>http://osgtech.blogspot.com/2011/11/improving-glexec-enabled-life.html</link>
	<description>&lt;div class=&quot;p1&quot;&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class=&quot;p1&quot;&gt;Pilot-based workflow management systems have dramatically transformed how we view the grid today.&amp;nbsp; Instead of queueing a job (the &quot;payload&quot;) in a workflow onto a site on a grid, these systems send an &quot;empty&quot; job that starts up, then downloads and starts the payload from a central endpoint.&amp;nbsp; In CS terms, it switches from a model of &quot;work delegation&quot; to &quot;resource allocation&quot;.&amp;nbsp; By allocating the resource (i.e., starting the pilot job) prior to delegating work, users no longer have to know the vagaries/failure modes of direct grid submission and don't have to pay the price of sending their payloads to a busy site!&lt;/div&gt;&lt;div class=&quot;p2&quot;&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class=&quot;p1&quot;&gt;In short, pilot jobs make the grid much better.&lt;/div&gt;&lt;div class=&quot;p2&quot;&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class=&quot;p1&quot;&gt;However, like most concepts, pilot jobs are a trade-off: they make life easier for users, but harder for security folks and sysadmins.&amp;nbsp; Pilots are sent using one certificate, but payloads are run under a different identity. &amp;nbsp;If the payload job wants to act on behalf of the user, it needs to bring the user's grid credentials to the worker node. &amp;nbsp;[Side note: this is actually an interesting assumption. &amp;nbsp;The &lt;a href=&quot;http://iopscience.iop.org/1742-6596/119/6/062036&quot;&gt;PanDA pilot system&lt;/a&gt;, heavily utilized by ATLAS, does not bring credentials to the worker node. &amp;nbsp;This simplifies this problem, but opens up a different set of concerns.] 
&amp;nbsp;If both pilot and payload are run as the same Unix user, the payload user can easily access the credentials (including the pilot credentials), executables, and output data of other running payloads.&lt;/div&gt;&lt;div class=&quot;p2&quot;&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class=&quot;p1&quot;&gt;The program &lt;a href=&quot;https://www.nikhef.nl/pub/projects/grid/gridwiki/index.php/GLExec&quot;&gt;glexec&lt;/a&gt; is a &quot;simple&quot; idea to solve this problem: given a set of grid credentials, launch a process under the corresponding Unix account at the site. &amp;nbsp;For example, with credentials from the HCC VO:&lt;/div&gt;&lt;div class=&quot;p1&quot;&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class=&quot;p1&quot;&gt;&lt;pre&gt;[bbockelm@brian-test ~]$ whoami&lt;br /&gt;bbockelm&lt;br /&gt;[bbockelm@brian-test ~]$ GLEXEC_CLIENT_CERT=/tmp/x509up_u1221 /usr/sbin/glexec &lt;br /&gt;/usr/bin/whoami&lt;br /&gt;hcc&lt;br /&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class=&quot;p2&quot;&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class=&quot;p2&quot;&gt;(You'll notice the invocation is not as simple as typing &quot;glexec whoami&quot;; it's not exactly designed for end-user invocation). &amp;nbsp;To achieve the user switching, glexec has to be &lt;a href=&quot;http://en.wikipedia.org/wiki/Setuid&quot;&gt;setuid&lt;/a&gt; root. &amp;nbsp;Setuid binaries must be examined under a security microscope, which has unfortunately led to a slow adoption of glexec.&lt;/div&gt;&lt;div class=&quot;p2&quot;&gt;&lt;br /&gt;The idea is that pilot jobs would wrap the payload with a call to &quot;glexec&quot;, separating the payload from the pilot and other payloads. &amp;nbsp;From there, it goes horribly wrong.&amp;nbsp; Not wrong really - but rather things get sticky.&lt;br /&gt;&lt;br /&gt;Since the pilot and payload are both low-privileged users, the pilot doesn't have permission to clean up or kill the payload. &amp;nbsp;It must again use glexec to send signals and delete sandboxes. 
&amp;nbsp;The several invocations are easy to screw up (and place load on the authorization system!). &amp;nbsp;There are tricky error conditions - if authorization breaks in the middle of the job, how does the pilot clean up the payload?&lt;br /&gt;&lt;br /&gt;As the payload is a full-fledged Linux process, it can create other processes, daemonize, escape from the batch system, etc. &amp;nbsp;As &lt;a href=&quot;http://osgtech.blogspot.com/2011/06/how-your-batch-system-watches-your.html&quot;&gt;previously&lt;/a&gt; &lt;a href=&quot;http://osgtech.blogspot.com/2011/06/part-ii-keeping-mindful-eye-on-your.html&quot;&gt;discussed&lt;/a&gt;, the batch system - with root access - typically does a poor job tracking processes. &amp;nbsp;The pilot will be hopeless unless we provide some assistance.&lt;br /&gt;&lt;br /&gt;Glexec imposes an integration difficulty at some sites. &amp;nbsp;There are popular cron scripts that kill processes belonging to users on a node that aren't currently running batch system jobs. &amp;nbsp;So, if the pilot maps to &quot;cms&quot; and the payload maps to &quot;cmsuser&quot;, the batch system only knows about &quot;cms&quot;, and the cronjob will kill all processes belonging to &quot;cmsuser&quot;. &amp;nbsp;We lost quite a few jobs at some sites before we figured this out!&lt;br /&gt;&lt;br /&gt;Site admins manage the cluster via the batch system. &amp;nbsp;Since the payload is invisible to the batch system, we're unable to kill jobs from a user with batch system tools (condor_rm, qdel). &amp;nbsp;In fact, if we get an email from a user asking for help understanding their jobs, we can't even easily find where the job is running! &amp;nbsp;Site admins have to ssh into each worker node and examine the running jobs; a process that is simply medieval.&lt;br /&gt;&lt;br /&gt;Finally, on the OSG, invoking the authorization system requires host certificate credentials. 
&amp;nbsp;This is not a problem when host certs are needed for a handful of CEs at the site, but explodes when glexec is run on each worker node. &amp;nbsp;This is a piece of unique state on the worker nodes for sites to manage, adding to the glexec headache.&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot;&gt;We're the Government. &amp;nbsp;We're here to help.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The OSG Technology group has decided to tackle the three biggest site-admin usability issues in glexec:&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;b&gt;Batch system integration&lt;/b&gt;: The Condor batch system provides the ability for running jobs to update the submit node with arbitrary status. &amp;nbsp;We have developed a plugin that updates the job's ClassAd with the payload's DN whenever glexec is invoked.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Process tracking&lt;/b&gt;: There is an existing glexec plugin to do process tracking. &amp;nbsp;However, this requires an admin to set up secondary GID ranges (an administration headache) and suffers the previously-documented process tracking issues. &amp;nbsp;We will port the ProcPolice daemon over to the glexec plugin framework.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Worker node certificates&lt;/b&gt;: We propose to fix this via improvements to GUMS, allowing the mappings to be performed based on the presence of a &quot;Role=pilot&quot; VOMS extension in the pilot certificate.&lt;/li&gt;&lt;/ol&gt;&lt;div&gt;The plugins in (1) and (2) have been prototyped, and are available in the osg-development repository as &quot;lcmaps-plugins-condor-update&quot; and &quot;lcmaps-plugins-process-tracking&quot;, respectively. &amp;nbsp;The third item is currently cooking.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The &quot;lcmaps-plugins-condor-update&quot; is especially useful, as it's a brand-new capability as opposed to an improvement. 
&amp;nbsp;It &amp;nbsp;advertises three attributes in the job's ClassAd:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;&lt;b&gt;glexec_x509userproxysubject&lt;/b&gt;: The DN of the payload user.&lt;/li&gt;&lt;li&gt;&lt;b&gt;glexec_user&lt;/b&gt;: The Unix username for the payload.&lt;/li&gt;&lt;li&gt;&lt;b&gt;glexec_time&lt;/b&gt;: The Unix time when glexec was invoked.&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;We can then use it to filter and locate jobs. &amp;nbsp;For example, if a user named Ian complains his jobs are running slowly, we could locate a few with the following command:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;pre&gt;[bbockelm@t3-sl5 ~]$ condor_q -g -const 'regexp(&quot;Ian&quot;, glexec_x509userproxysubject)' -format '%s ' ClusterId -format '%s\n' RemoteHost | head&lt;br /&gt;868341 slot6@red-d11n10.red.hcc.unl.edu&lt;br /&gt;868343 slot7@node238.red.hcc.unl.edu&lt;br /&gt;868358 slot6@red-d11n9.red.hcc.unl.edu&lt;br /&gt;868366 slot2@node239.red.hcc.unl.edu&lt;br /&gt;868373 slot3@node119.red.hcc.unl.edu&lt;br /&gt;868741 slot8@red-d9n6.red.hcc.unl.edu&lt;br /&gt;868770 slot3@red-d9n8.red.hcc.unl.edu&lt;br /&gt;868819 slot5@node109.red.hcc.unl.edu&lt;br /&gt;868820 slot4@node246.red.hcc.unl.edu&lt;br /&gt;868849 slot2@red-d11n6.red.hcc.unl.edu&lt;br /&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;Slick!&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;</description>
	<pubDate>Fri, 11 Nov 2011 08:32:28 +0000</pubDate>
	<author>noreply@blogger.com (Brian Bockelman)</author>
</item>
<item>
	<title>Spinning: Getting started: Condor and EC2 - EC2 execute node</title>
	<guid>http://spinningmatt.wordpress.com/?p=505</guid>
	<link>http://spinningmatt.wordpress.com/2011/11/10/getting-started-condor-and-ec2-ec2-execute-node/</link>
	<description>&lt;p&gt;We have been over &lt;a href=&quot;http://spinningmatt.wordpress.com/2011/10/31/getting-started-condor-and-ec2-starting-and-managing-instances/&quot;&gt;starting and managing instances&lt;/a&gt; from Condor, using &lt;a href=&quot;http://spinningmatt.wordpress.com/2011/11/02/getting-started-condor-and-ec2-condor_ec2_q-tool/&quot;&gt;condor_ec2_q&lt;/a&gt; to help, and &lt;a href=&quot;http://spinningmatt.wordpress.com/2011/11/07/getting-started-condor-and-ec2-importing-instances-with-condor_ec2_link/&quot;&gt;importing existing instances&lt;/a&gt;. Here we will cover extending an existing pool using execute nodes run from EC2 instances. We will start with an existing pool, create an EC2 instance, configure the instance to run condor, authorize the instance to join the existing pool, and run a job.&lt;/p&gt;
&lt;p&gt;Let us pretend that the node running your existing pool&amp;#8217;s condor_collector and condor_schedd is called &lt;i&gt;condor.condorproject.org&lt;/i&gt;.&lt;/p&gt;
&lt;p&gt;These instructions require bi-directional connectivity between condor.condorproject.org and your EC2 instance. condor.condorproject.org must be connected to the internet with a publicly routable address, and ports must be open in its firewall for the Collector and Schedd, because the EC2 execute nodes have to be able to connect to condor.condorproject.org to talk to the condor_collector and condor_schedd. It cannot be behind a NAT or a firewall that blocks those ports. Okay, let&amp;#8217;s start.&lt;/p&gt;
&lt;p&gt;I am going to use ami-60bd4609, a publicly available Fedora 15 AMI. You can either start the instance via the AWS console, or submit it by &lt;a href=&quot;http://spinningmatt.wordpress.com/2011/10/31/getting-started-condor-and-ec2-starting-and-managing-instances/&quot;&gt;following previous instructions&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Once the instance is up and running, log in and &lt;code&gt;sudo yum install condor&lt;/code&gt;. Note, until &lt;a href=&quot;https://bugzilla.redhat.com/show_bug.cgi?id=656562&quot;&gt;BZ656562&lt;/a&gt; is resolved, you will have to &lt;code&gt;sudo mkdir /var/run/condor; sudo chown condor.condor /var/run/condor&lt;/code&gt; before starting condor. Start condor with &lt;code&gt;sudo service condor start&lt;/code&gt; to get a &lt;a href=&quot;http://spinningmatt.wordpress.com/2010/07/26/getting-started-installing-a-single-node-condor-pool/&quot;&gt;personal condor&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Configuring condor on the instance is very similar to &lt;a href=&quot;http://spinningmatt.wordpress.com/2011/06/12/getting-started-creating-a-multiple-node-condor-pool/&quot;&gt;creating a multiple node pool&lt;/a&gt;. You will need to set the &lt;code&gt;CONDOR_HOST&lt;/code&gt;, &lt;code&gt;ALLOW_WRITE&lt;/code&gt;, and &lt;code&gt;DAEMON_LIST&lt;/code&gt;,&lt;/p&gt;
&lt;pre&gt;
# cat &amp;gt; /etc/condor/config.d/40execute_node.config
CONDOR_HOST = condor.condorproject.org
DAEMON_LIST = MASTER, STARTD
ALLOW_WRITE = $(ALLOW_WRITE), $(CONDOR_HOST)
^D
&lt;/pre&gt;
&lt;p&gt;If you do not give condor.condorproject.org WRITE permissions, the Schedd will fail to start jobs. StartLog will report,&lt;/p&gt;
&lt;pre&gt;
PERMISSION DENIED to unauthenticated@unmapped from host 128.105.291.82 for command 442 (REQUEST_CLAIM), access level DAEMON: reason: DAEMON authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 128.105.291.82,condor.condorproject.org, hostname size = 1, original ip address = 128.105.291.82
&lt;/pre&gt;
&lt;p&gt;Now remember, we need bi-directional connectivity. So condor.condorproject.org must be able to connect to the EC2 instance&amp;#8217;s Startd. The condor_startd will listen on an ephemeral port by default. You could restrict it to a port range or use &lt;a href=&quot;http://spinningmatt.wordpress.com/2011/06/21/getting-started-multiple-node-condor-pool-with-firewalls/&quot;&gt;condor_shared_port&lt;/a&gt;. For simplicity, just force a non-ephemeral port of 3131,&lt;/p&gt;
&lt;pre&gt;
# echo &quot;STARTD_ARGS = -p 3131&quot; &amp;gt;&amp;gt; /etc/condor/config.d/40execute_node.config
&lt;/pre&gt;
&lt;p&gt;You can now open TCP port 3131 in the instance&amp;#8217;s iptables firewall. If you are using the Fedora 15 AMI, the firewall is off by default and needs no adjustment. Additionally, the security group on the instance needs to have TCP port 3131 authorized. Use the AWS Console or &lt;code&gt;ec2-authorize GROUP -p 3131&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;If you miss either of these steps, the Schedd will fail to start jobs on the instance, likely with a message similar to,&lt;/p&gt;
&lt;pre&gt;
Failed to send REQUEST_CLAIM to startd ec2-174-129-47-20.compute-1.amazonaws.com &amp;lt;174.129.47.20:3131&amp;gt;#1220911452#1#... for matt: SECMAN:2003:TCP connection to startd ec2-174-129-47-20.compute-1.amazonaws.com &amp;lt;174.129.47.20:3131&amp;gt;#1220911452#1#... for matt failed.
&lt;/pre&gt;
&lt;p&gt;A quick &lt;code&gt;service condor restart&lt;/code&gt; on the instance, and a &lt;code&gt;condor_status&lt;/code&gt; on condor.condorproject.org would hopefully show the instance joined the pool. Except the instance has not been authorized yet. In fact, the CollectorLog will probably report,&lt;/p&gt;
&lt;pre&gt;
PERMISSION DENIED to unauthenticated@unmapped from host 174.129.47.20 for command 0 (UPDATE_STARTD_AD), access level ADVERTISE_STARTD: reason: ADVERTISE_STARTD authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 174.129.47.20,ec2-174-129-47-20.compute-1.amazonaws.com
PERMISSION DENIED to unauthenticated@unmapped from host 174.129.47.20 for command 2 (UPDATE_MASTER_AD), access level ADVERTISE_MASTER: reason: ADVERTISE_MASTER authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 174.129.47.20,ec2-174-129-47-20.compute-1.amazonaws.com
&lt;/pre&gt;
&lt;p&gt;The instance needs to be authorized to advertise itself into the Collector. A good way to do that is to add, &lt;/p&gt;
&lt;pre&gt;
ALLOW_ADVERTISE_MASTER = $(ALLOW_WRITE), ec2-174-129-47-20.compute-1.amazonaws.com
ALLOW_ADVERTISE_STARTD = $(ALLOW_WRITE), ec2-174-129-47-20.compute-1.amazonaws.com
&lt;/pre&gt;
&lt;p&gt;to condor.condorproject.org&amp;#8217;s configuration and reconfig with &lt;code&gt;condor_reconfig&lt;/code&gt;. A note here, ALLOW_WRITE is added in because I am assuming you are following previous instructions. If you have ALLOW_ADVERTISE_MASTER/STARTD already configured, you should append to them instead. Also, appending for each new instance will get tedious. You could be very trusting and allow *.amazonaws.com, but it is better to use SSL or PASSWORD authentication. I will describe that some other time.&lt;/p&gt;
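&lt;p&gt;Since appending a hostname for each new instance gets tedious, the two lines can be generated from a list of hostnames. The following is only a sketch: &lt;code&gt;allow_advertise_lines&lt;/code&gt; is a hypothetical helper, not something shipped with Condor, and it assumes you want to keep &lt;code&gt;$(ALLOW_WRITE)&lt;/code&gt; at the front of both lists as above,&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper, not part of Condor: print ALLOW_ADVERTISE_MASTER and
# ALLOW_ADVERTISE_STARTD lines covering every hostname given as an argument.
allow_advertise_lines() {
    # Keep the literal macro $(ALLOW_WRITE) first, as in the lines above.
    hosts='$(ALLOW_WRITE)'
    for host in "$@"; do
        hosts="$hosts, $host"
    done
    printf 'ALLOW_ADVERTISE_MASTER = %s\n' "$hosts"
    printf 'ALLOW_ADVERTISE_STARTD = %s\n' "$hosts"
}

# Example: review the output, then place it in the pool's configuration.
#   allow_advertise_lines ec2-174-129-47-20.compute-1.amazonaws.com
```

&lt;p&gt;The output still has to be placed in condor.condorproject.org&amp;#8217;s configuration by hand, followed by a &lt;code&gt;condor_reconfig&lt;/code&gt;.&lt;/p&gt;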
&lt;p&gt;After the reconfig, the instance will eventually show up in a condor_status listing.&lt;/p&gt;
&lt;pre&gt;
$ condor_status
Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime
localhost.localdom LINUX      INTEL  Unclaimed Benchmar 0.430  1666  0+00:00:04
&lt;/pre&gt;
&lt;p&gt;The name is not very helpful, but also not a problem.&lt;/p&gt;
&lt;p&gt;It is time to submit a job.&lt;/p&gt;
&lt;pre&gt;
$ condor_submit
Submitting job(s)
cmd = /bin/sleep
args = 1d
should_transfer_files = IF_NEEDED
when_to_transfer_output = ON_EXIT
queue
.^D
1 job(s) submitted to cluster 31.

$ condor_q
-- Submitter: condor.condorproject.org : &amp;lt;128.105.291.82:36900&amp;gt; : condor.condorproject.org
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
  31.0   matt           11/11 11:11   0+00:00:00 I  0   0.0  sleep 1d
1 jobs; 1 idle, 0 running, 0 held
&lt;/pre&gt;
&lt;p&gt;The job will stay idle forever, which is no good. The problem can be found in the SchedLog,&lt;/p&gt;
&lt;pre&gt;
Enqueued contactStartd startd=&amp;lt;10.72.55.105:3131&amp;gt;
In checkContactQueue(), args = 0x9705798, host=&amp;lt;10.72.55.105:3131&amp;gt;
Requesting claim localhost.localdomain &amp;lt;10.72.55.105:3131&amp;gt;#1220743035#1#... for matt 31.0
attempt to connect to &amp;lt;10.72.55.105:3131&amp;gt; failed: Connection timed out (connect errno = 110).  Will keep trying for 45 total seconds (24 to go).
&lt;/pre&gt;
&lt;p&gt;The root cause is that the instance has two internet addresses. A private one, which is not routable from condor.condorproject.org, that it is advertising,&lt;/p&gt;
&lt;pre&gt;
$ condor_status -format &quot;%s, &quot; Name -format &quot;%s\n&quot; MyAddress
localhost.localdomain, &amp;lt;10.72.55.105:3131&amp;gt;
&lt;/pre&gt;
&lt;p&gt;And a public one, which can be found from within the instance,&lt;/p&gt;
&lt;pre&gt;
$ curl -f http://169.254.169.254/latest/meta-data/public-ipv4
174.129.47.20
&lt;/pre&gt;
&lt;p&gt;Condor has a way to handle this. The &lt;code&gt;TCP_FORWARDING_HOST&lt;/code&gt; configuration parameter can be set to the public address for the instance.&lt;/p&gt;
&lt;pre&gt;
# echo &quot;TCP_FORWARDING_HOST = $(curl -f http://169.254.169.254/latest/meta-data/public-ipv4)&quot; &amp;gt;&amp;gt; /etc/condor/config.d/40execute_node.config
&lt;/pre&gt;
&lt;p&gt;A &lt;code&gt;condor_reconfig&lt;/code&gt; will apply the change, but a restart will clear out the old entry first. &lt;a href=&quot;https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2628&quot;&gt;Oops&lt;/a&gt;. Note, you cannot set TCP_FORWARDING_HOST to the public-hostname of the instance, because the public hostname will be resolved within the instance and will resolve to the instance&amp;#8217;s internal, private address.&lt;/p&gt;
&lt;p&gt;When setting TCP_FORWARDING_HOST, also set PRIVATE_NETWORK_INTERFACE to let the host talk to itself over its private address.&lt;/p&gt;
&lt;pre&gt;
# echo &quot;PRIVATE_NETWORK_INTERFACE = $(curl -f http://169.254.169.254/latest/meta-data/local-ipv4)&quot; &amp;gt;&amp;gt; /etc/condor/config.d/40execute_node.config
&lt;/pre&gt;
&lt;p&gt;Doing so will prevent the condor_startd from using its public address to send &lt;a href=&quot;http://spinningmatt.wordpress.com/2009/10/21/condor_master-for-managing-processes/&quot;&gt;DC_CHILDALIVE&lt;/a&gt; messages to the condor_master, which might fail because of a firewall or security group setting,&lt;/p&gt;
&lt;pre&gt;
attempt to connect to &amp;lt;174.129.47.20:34550&amp;gt; failed: Connection timed out (connect errno = 110).  Will keep trying for 390 total seconds (200 to go).
attempt to connect to &amp;lt;174.129.47.20:34550&amp;gt; failed: Connection timed out (connect errno = 110).
ChildAliveMsg: failed to send DC_CHILDALIVE to parent daemon at &amp;lt;174.129.47.20:34550&amp;gt; (try 1 of 3): CEDAR:6001:Failed to connect to &amp;lt;174.129.47.20:34550&amp;gt;
&lt;/pre&gt;
&lt;p&gt;Or simply because the master does not trust the public address,&lt;/p&gt;
&lt;pre&gt;
PERMISSION DENIED to unauthenticated@unmapped from host 174.129.47.20 for command 60008 (DC_CHILDALIVE), access level DAEMON: reason: DAEMON authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 174.129.47.20,ec2-174-129-47-20.compute-1.amazonaws.com, hostname size = 1, original ip address = 174.129.47.20
&lt;/pre&gt;
&lt;p&gt;Now run that &lt;code&gt;service condor restart&lt;/code&gt; and the public, routable address will be advertised,&lt;/p&gt;
&lt;pre&gt;
$ condor_status -format &quot;%s, &quot; Name -format &quot;%s\n&quot; MyAddress
localhost.localdomain, &amp;lt;174.129.47.20:3131?noUDP&amp;gt;
&lt;/pre&gt;
&lt;p&gt;The job will be started on the instance automatically,&lt;/p&gt;
&lt;pre&gt;
$ condor_q -run
-- Submitter: condor.condorproject.org : &amp;lt;128.105.291.82:36900&amp;gt; : condor.condorproject.org
 ID      OWNER            SUBMITTED     RUN_TIME HOST(S)
  31.0   matt           11/11 11:11   0+00:00:11 localhost.localdomain
&lt;/pre&gt;
&lt;p&gt;If you want to clean up the &lt;i&gt;localhost.localdomain&lt;/i&gt;, set the instance&amp;#8217;s hostname and restart condor,&lt;/p&gt;
&lt;pre&gt;
$ sudo hostname $(curl -f http://169.254.169.254/latest/meta-data/public-hostname)
$ sudo service condor restart
(wait for the startd to advertise)
$ condor_status -format &quot;%s, &quot; Name -format &quot;%s\n&quot; MyAddress
ec2-174-129-47-20.compute-1.amazonaws.com, &amp;lt;174.129.47.20:3131?noUDP&amp;gt;
&lt;/pre&gt;
&lt;p&gt;In summary,&lt;/p&gt;
&lt;p&gt;Configuration changes on condor.condorproject.org,&lt;/p&gt;
&lt;pre&gt;
ALLOW_ADVERTISE_MASTER = $(ALLOW_WRITE), ec2-174-129-47-20.compute-1.amazonaws.com
ALLOW_ADVERTISE_STARTD = $(ALLOW_WRITE), ec2-174-129-47-20.compute-1.amazonaws.com
&lt;/pre&gt;
&lt;p&gt;Setup on the instance,&lt;/p&gt;
&lt;pre&gt;
# cat &amp;gt; /etc/condor/config.d/40execute_node.config
CONDOR_HOST = condor.condorproject.org
DAEMON_LIST = MASTER, STARTD
ALLOW_WRITE = $(ALLOW_WRITE), $(CONDOR_HOST)
STARTD_ARGS = -p 3131
^D
# echo &quot;TCP_FORWARDING_HOST = $(curl -f http://169.254.169.254/latest/meta-data/public-ipv4)&quot; &amp;gt;&amp;gt; /etc/condor/config.d/40execute_node.config
# echo &quot;PRIVATE_NETWORK_INTERFACE = $(curl -f http://169.254.169.254/latest/meta-data/local-ipv4)&quot; &amp;gt;&amp;gt; /etc/condor/config.d/40execute_node.config
# hostname $(curl -f http://169.254.169.254/latest/meta-data/public-hostname)
&lt;/pre&gt;
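&lt;p&gt;One caveat with the setup above: if a metadata lookup fails, &lt;code&gt;curl -f&lt;/code&gt; prints nothing and an empty value is appended to the configuration. As a minimal sketch, the hypothetical helpers &lt;code&gt;is_ipv4&lt;/code&gt; and &lt;code&gt;forwarding_config_lines&lt;/code&gt; below (neither is part of Condor) check both addresses before printing the two settings,&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with Condor.

# Succeed only if $1 looks like a dotted-quad IPv4 address.
is_ipv4() {
    # Reject empty input, leading or trailing dots, and any other characters.
    case "$1" in
        ''|.*|*.|*[!0-9.]*) return 1 ;;
    esac
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its dot-separated fields
    IFS=$old_ifs
    [ "$#" -eq 4 ] || return 1
    for octet in "$@"; do
        # Each field must be one to three digits and no larger than 255.
        case "$octet" in
            ''|????*) return 1 ;;
        esac
        [ "$octet" -le 255 ] || return 1
    done
    return 0
}

# Print the settings for public address $1 and private address $2,
# or fail without printing anything if either address is invalid.
forwarding_config_lines() {
    is_ipv4 "$1" || return 1
    is_ipv4 "$2" || return 1
    printf 'TCP_FORWARDING_HOST = %s\n' "$1"
    printf 'PRIVATE_NETWORK_INTERFACE = %s\n' "$2"
}

# Possible use on the instance, as root:
#   md=http://169.254.169.254/latest/meta-data
#   forwarding_config_lines "$(curl -f $md/public-ipv4)" "$(curl -f $md/local-ipv4)" \
#       >> /etc/condor/config.d/40execute_node.config
```

&lt;p&gt;Because nothing is printed on failure, a failed lookup leaves the configuration file untouched instead of adding a blank setting.&lt;/p&gt;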
</description>
	<pubDate>Thu, 10 Nov 2011 05:36:39 +0000</pubDate>
</item>
<item>
	<title>Spinning: Getting started: Condor and EC2 - Importing instances with condor_ec2_link</title>
	<guid>http://spinningmatt.wordpress.com/?p=496</guid>
	<link>http://spinningmatt.wordpress.com/2011/11/07/getting-started-condor-and-ec2-importing-instances-with-condor_ec2_link/</link>
	<description>&lt;p&gt;&lt;a href=&quot;http://spinningmatt.wordpress.com/2011/10/31/getting-started-condor-and-ec2-starting-and-managing-instances/&quot;&gt;Starting and managing instances&lt;/a&gt; describes the powerful feature of Condor to start and manage EC2 instances, but what if you are already using something other than Condor to start your instance, such as the &lt;a href=&quot;http://aws.amazon.com/console/&quot;&gt;AWS Management Console&lt;/a&gt;?&lt;/p&gt;
&lt;p&gt;Importing instances turns out to be straightforward, if you know how instances are started. In a nutshell, the condor_gridmanager executes a state machine and records its current state in an attribute named GridJobId. To import an instance, submit a job that is already in the state where an instance id has been assigned. You can take &lt;a href=&quot;http://spinningmatt.wordpress.com/2011/10/31/getting-started-condor-and-ec2-starting-and-managing-instances/&quot;&gt;a submit file&lt;/a&gt; and add &lt;b&gt;+GridJobId = &amp;quot;ec2 https://ec2.amazonaws.com/ BOGUS &lt;i&gt;INSTANCE-ID&lt;/i&gt;&amp;quot;&lt;/b&gt;. The INSTANCE-ID needs to be the actual identifier of the instance you want to import. For instance,&lt;/p&gt;
&lt;pre&gt;
...
ec2_access_key_id = ...
ec2_secret_access_key = ...
...
+GridJobId = &quot;ec2 https://ec2.amazonaws.com/ BOGUS i-319c3652&quot;
queue
&lt;/pre&gt;
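&lt;p&gt;A mistyped instance id is easy to miss in that line. As a small sketch, the hypothetical helper &lt;code&gt;grid_job_id_line&lt;/code&gt; below (not a Condor tool) prints the attribute for a given id, and refuses ids that are not &lt;code&gt;i-&lt;/code&gt; followed by lowercase hex digits. That id format is an assumption based on the example id used here,&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper, not part of Condor: print the +GridJobId submit-file
# line for the EC2 instance id given as $1.
grid_job_id_line() {
    # Assumption: instance ids look like "i-" followed by lowercase hex.
    case "$1" in
        i-*) ;;
        *) return 1 ;;
    esac
    case "${1#i-}" in
        ''|*[!0-9a-f]*) return 1 ;;
    esac
    printf '+GridJobId = "ec2 https://ec2.amazonaws.com/ BOGUS %s"\n' "$1"
}

# Example: grid_job_id_line i-319c3652
```

&lt;p&gt;The printed line goes into the submit file before &lt;code&gt;queue&lt;/code&gt;, exactly as in the example above.&lt;/p&gt;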
&lt;p&gt;It is important to get the &lt;i&gt;ec2_access_key_id&lt;/i&gt; and &lt;i&gt;ec2_secret_access_key&lt;/i&gt; correct. Without them Condor will not be able to communicate with EC2 and EC2_GAHP_LOG will report,&lt;/p&gt;
&lt;pre&gt;
$ tail -n2 $(condor_config_val EC2_GAHP_LOG)
11/11/11 11:11:11 Failure response text was '
&lt;code&gt;AuthFailure&lt;/code&gt;AWS was not able to validate the provided access credentialsab50f005-6d77-4653-9cec-298b2d475f6e'.
&lt;/pre&gt;
&lt;p&gt;This error will not be reported back into the job by putting it on hold. Instead, the gridmanager will think EC2 is down for the job. &lt;a href=&quot;https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2274&quot;&gt;Oops&lt;/a&gt;.&lt;/p&gt;
&lt;pre&gt;
$ grep down $(condor_config_val GRIDMANAGER_LOG)
11/11/11 11:11:11 [10697] resource https://ec2.amazonaws.com is now down
11/11/11 11:14:22 [10697] resource https://ec2.amazonaws.com is still down
&lt;/pre&gt;
&lt;p&gt;To simplify the import, here is a script that will use ec2-describe-instances to get useful metadata about the instance and populate a submit file for you,&lt;/p&gt;
&lt;p&gt;&lt;b&gt;condor_ec2_link&lt;/b&gt;&lt;br /&gt;
&lt;pre class=&quot;brush: bash;&quot;&gt;
#!/bin/sh

# Provide three arguments:
#  . instance id to link
#  . path to file with access key id
#  . path to file with secret access key

# TODO:
#  . Get EC2UserData (ec2-describe-instance-attribute --user-data)

ec2-describe-instances --show-empty-fields $1 | \
   awk '/^INSTANCE/ {id=$2; ami=$3; keypair=$7; type=$10; zone=$12; ip=$17; group=$29}
        /^TAG/ {name=$5}
        END {print &amp;quot;universe = grid\n&amp;quot;,
                   &amp;quot;grid_resource = ec2 https://ec2.amazonaws.com\n&amp;quot;,
                   &amp;quot;executable =&amp;quot;, ami&amp;quot;-&amp;quot;name, &amp;quot;\n&amp;quot;,
                   &amp;quot;log = $(executable).$(cluster).log\n&amp;quot;,
                   &amp;quot;ec2_ami_id =&amp;quot;, ami, &amp;quot;\n&amp;quot;,
                   &amp;quot;ec2_instance_type =&amp;quot;, type, &amp;quot;\n&amp;quot;,
                   &amp;quot;ec2_keypair_file = name-&amp;quot;keypair, &amp;quot;\n&amp;quot;,
                   &amp;quot;ec2_security_groups =&amp;quot;, group, &amp;quot;\n&amp;quot;,
                   &amp;quot;ec2_availability_zone =&amp;quot;, zone, &amp;quot;\n&amp;quot;,
                   &amp;quot;ec2_elastic_ip =&amp;quot;, ip, &amp;quot;\n&amp;quot;,
                   &amp;quot;+EC2InstanceName = \&amp;quot;&amp;quot;id&amp;quot;\&amp;quot;\n&amp;quot;,
                   &amp;quot;+GridJobId = \&amp;quot;$(grid_resource) BOGUS&amp;quot;, id, &amp;quot;\&amp;quot;\n&amp;quot;,
                   &amp;quot;queue\n&amp;quot;}' | \
      condor_submit -a &amp;quot;ec2_access_key_id = $2&amp;quot; \
                    -a &amp;quot;ec2_secret_access_key = $3&amp;quot;
&lt;/pre&gt;&lt;/p&gt;
&lt;p&gt;In action,&lt;/p&gt;
&lt;pre&gt;
$ ./condor_ec2_link i-319c3652 /home/matt/Documents/AWS/Cert/AccessKeyID /home/matt/Documents/AWS/Cert/SecretAccessKey
Submitting job(s).
1 job(s) submitted to cluster 1739.

$ ./condor_ec2_q 1739
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
1739.0   matt           11/11 11:11   0+00:00:00 I  0   0.0 ami-e1f53a88-TheNa
  Instance name: i-319c3652
  Groups: sg-4f706226
  Keypair file: /home/matt/Documents/AWS/name-TheKeyPair
  AMI id: ami-e1f53a88
  Instance type: t1.micro
1 jobs; 1 idle, 0 running, 0 held

(20 seconds later)

$ ./condor_ec2_q 1739
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
1739.0   matt           11/11 11:11   0+00:00:01 R  0   0.0 ami-e1f53a88-TheNa
  Instance name: i-319c3652
  Hostname: ec2-50-17-104-50.compute-1.amazonaws.com
  Groups: sg-4f706226
  Keypair file: /home/matt/Documents/AWS/name-TheKeyPair
  AMI id: ami-e1f53a88
  Instance type: t1.micro
1 jobs; 0 idle, 1 running, 0 held
&lt;/pre&gt;
&lt;p&gt;There are a few things that can be improved here, the most notable of which is the RUN_TIME. The Gridmanager gets status data from EC2 periodically. This is how the EC2RemoteVirtualMachineName (Hostname) gets populated on the job. The instance&amp;#8217;s launch time is also available, but it is not used for RUN_TIME. &lt;a href=&quot;https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2618&quot;&gt;Oops&lt;/a&gt;.&lt;/p&gt;
</description>
	<pubDate>Sat, 05 Nov 2011 16:02:58 +0000</pubDate>
</item>
<item>
	<title>Inside OSG Ops.: Moving Services to Bloomington</title>
	<guid>tag:blogger.com,1999:blog-7506259688180433777.post-6124249112199104232</guid>
	<link>http://insideosgops.blogspot.com/2011/11/moving-services-to-bloomington.html</link>
	<description>As you know, the GOC updates services on the second and fourth Tuesday of each month.&lt;br /&gt;The update scheduled for November 8th marks a milestone for the infrastructure team.&lt;br /&gt;After this date all GOC services (with one exception) will be hosted exclusively in&lt;br /&gt;the Bloomington, Indiana data center.&lt;br /&gt;&lt;br /&gt;Previously, most services had two instances, one physically hosted in Indianapolis&lt;br /&gt;the other in Bloomington. These instances are in DNS round robin allowing users&lt;br /&gt;of these services transparent use of either instance. The GOC will continue to&lt;br /&gt;operate (at least) two instances and keep them in round robin, but both instances&lt;br /&gt;will be in Bloomington.&lt;br /&gt;&lt;br /&gt;So why the change? Originally, the Bloomington machine room was extremely unreliable.&lt;br /&gt;Problems included a leaky roof, insufficient cooling and power and a lack&lt;br /&gt;of space. In short, the systems hosted there had outgrown the facility. The machine&lt;br /&gt;room in Indianapolis was larger, newer and considered more reliable. The old Bloomington&lt;br /&gt;machine room went down during a thunderstorm when it was discovered that both electrical&lt;br /&gt;feeds were, at one point, hung from the same utility pole. (Care to guess where the&lt;br /&gt;lightning struck?) Two weeks were required to restore power during which many of the&lt;br /&gt;university enterprise services were unavailable. This situation was clearly unacceptable&lt;br /&gt;so the university decided to invest $37.2M in a new, state-of-the-art data center.&lt;br /&gt;&lt;br /&gt;The 92,000 sq. ft. Bloomington data center is designed to withstand category 5 tornadoes.&lt;br /&gt;The facility is secured with card-key access and 7 x 24 x 365 video surveillance.&lt;br /&gt;Only staff with systems or network administration privileges have access to the machine room,&lt;br /&gt;which requires biometric identity verification. 
Fire suppression is provided by a double interlock&lt;br /&gt;system accompanied by a Very Early Smoke Detection Apparatus (VESDA). Three circuits feed&lt;br /&gt;the Data Center, traveling redundant physical paths from two different substations.&lt;br /&gt;Any two circuits can fully power the building. A flywheel motor/generator set conditions&lt;br /&gt;the power and provides protection against transient events and uninterruptible power&lt;br /&gt;supplies protect against failures of moderate (~1 hour) duration. Dual diesel generators&lt;br /&gt;can provide power for 24 hours in the event of a longer term power failure. In-house&lt;br /&gt;chillers provide cooling. Externally supplied chilled water plus city water can be used&lt;br /&gt;in the event of a failure of this system.&lt;br /&gt;&lt;br /&gt;Several advantages are realized by hosting all instances in one location. Service failures&lt;br /&gt;associated with the network between Indianapolis and Bloomington are avoided. By using the&lt;br /&gt;same LAN, DNS round robin can be replaced with Linux Virtual Server (LVS) giving control&lt;br /&gt;of round robin to the GOC rather than the DNS administrators at Indiana University. Also&lt;br /&gt;avoided are failures associated with the loss of one of two data centers. It is trivial to&lt;br /&gt;move virtual machines from host to host since the IP address of the VM does not change, a&lt;br /&gt;property allowing detailed load balancing on all VM hosts.&lt;br /&gt;&lt;br /&gt;The GOC looks forward to continuing to provide services with the availability OSG users&lt;br /&gt;have come to expect.</description>
	<pubDate>Thu, 03 Nov 2011 08:08:33 +0000</pubDate>
	<author>noreply@blogger.com (scott)</author>
</item>

</channel>
</rss>

