News and Announcements from OSG Operations > GOC Service Update - Tuesday, December 13th at 14:00 UTC

The GOC will upgrade the following services beginning Tuesday, December 13 at 14:00 UTC. The GOC reserves 8 hours in the unlikely event unexpected problems are encountered.

Collector
  * Update HTCondor-ceview to latest version

Software.grid
  * Create alias to repo.grid in preparation to retire software.grid

OASIS
  * Update cvmfs packages on oasis and oasis-replica to the latest version
  * Update frontier-squid on oasis-replica to the latest version
  * Add the BNL stashcache server to the default list of data servers for osgstorage.org repositories

VOMS
  * Rebuild VOMS for the mis and osgedu VOs on CentOS7

Twiki
  * Reinstall on CentOS6

News and Announcements from OSG Operations > UPDATE: Operations Emergency Maintenance - November 29, 2016

As was previously noted, there was an emergency maintenance period today starting at 8am EST to repair a damaged filesystem. The affected filesystem, /net/nas01, has been repaired as of 11:45 EST and all systems are operating as expected. We regret any inconvenience that this may have caused. Please feel free to contact us with any questions.

News and Announcements from OSG Operations > Operations Emergency Maintenance - November 29, 2016

There will be an emergency maintenance period tomorrow, November 29, starting at 8am EST to repair a damaged filesystem. The affected filesystem is /net/nas01, commonly used as scratch space. This will affect some internal service monitoring functionality, but should not affect other GOC services. We regret any inconvenience and will inform you when maintenance is completed.


Pegasus news feed > Pegasus 4.7.2 released

We are happy to announce the release of Pegasus 4.7.2. Pegasus 4.7.2 is a minor release of Pegasus and includes improvements and bug fixes to the 4.7.1 release.

Improvements in 4.7.2 are

  • [PM-1141] – The commit to allow symlinks in pegasus-transfer broke PFN fall through
  • [PM-1142] – Do not set LD_LIBRARY_PATH in job env
  • [PM-1143] – R DAX API
  • [PM-1144] – pegasus lite prints the wrong hostname for non-glidein jobs

 


News and Announcements from OSG Operations > Thanksgiving Holiday

On November 24th and 25th, the OSG Operations Center will be operating on a holiday schedule. Staff will be available to respond to emergencies but routine operations will resume at the start of business Monday, November 28.

OSG Operations wishes its users and OSG staff a happy Thanksgiving Holiday.

Pegasus news feed > Pegasus’ contributions to LIGO highlighted in OSG’s HPCwire awards

The Pegasus team congratulates the Open Science Grid (OSG) for the two HPCwire ‘Top Supercomputing Achievement’ awards for 2016, which recognize the use of high performance computing to verify Einstein’s theory of gravitational waves. OSG won in both the online publication’s annual Readers’ Choice and Editors’ Choice categories.

As in earlier publications this year, Pegasus workflows have been used by LIGO to process, on OSG, over five terabytes of LIGO data many thousands of times, leading to many petabytes of exported data.

 



News and Announcements from OSG Operations > OSG receives HPCwire's 'Top Supercomputing Achievement' awards

Multi-partner awards cite OSG's role in gravitational wave detection

The Open Science Grid (OSG) is a recipient of two HPCwire 'Top Supercomputing Achievement' awards for 2016, recognizing the use of high performance computing to verify Einstein's theory of gravitational waves.

Funded by the U.S. Department of Energy (DOE) and National Science Foundation (NSF), OSG is a multi-disciplinary research partnership specializing in high throughput computational services.

The HPCwire awards were presented at the 2016 International Conference for High Performance Computing, Networking, Storage and Analysis (SC16), in Salt Lake City, Utah. OSG won in both the online publication's annual Readers' Choice and Editors' Choice categories.

The awards also recognize the San Diego Supercomputer Center (SDSC) at the University of California San Diego, the NSF's Extreme Science and Engineering Discovery Environment (XSEDE), the Holland Computing Center at University of Nebraska-Lincoln (UNL), the National Center for Supercomputing Applications (NCSA), and the Texas Advanced Computing Center (TACC) at The University of Texas at Austin for their participation in verifying the existence of gravitational waves.

OSG provided access to numerous high performance computing systems for LIGO, seamlessly integrating XSEDE supercomputers like SDSC's Comet and TACC's Stampede with HPC clusters in academia and DOE national labs. LIGO used this integrated infrastructure over the course of several months to verify the statistical significance of the observed gravitational wave, thus leading to its unambiguous detection.

In the process, roughly five terabytes of LIGO data was processed many thousands of times, leading to many petabytes of exported data. The storage infrastructure at Holland Computing Center hosted the LIGO data for processing via OSG. The data was pulled as needed by the Pegasus workflow.

Frank Würthwein, who is the current executive director of the OSG, worked in tandem with XSEDE researchers to make these resources available to LIGO scientists. Würthwein is also SDSC's head of High Throughput Computing.

"There were a lot of institutions and researchers involved in this landmark discovery, as well as the actual verification process," said Würthwein, also a physicist with UC San Diego. "While high performance computing resources from all over have been used by LIGO for years, this award focused on the use of distributed high throughput computing across a wide range of HPC resources for the actual verification of this amazing discovery."

"From thought leaders to end users, the HPCwire readership reaches and engages every corner of the high performance computing community," said Tom Tabor, CEO of Tabor Communications, publisher of HPCwire. "Receiving their recognition signifies community support across the entire HPC space as well as the breadth of industries it serves. We are proud to recognize these efforts and make the voices of our readers heard, and our congratulations go out to all the winners."


A multi-partner collaboration

In February 2016, the NSF made a pivotal announcement: For the first time, scientists detected gravitational waves in the universe as hypothesized by Albert Einstein about 100 years ago. On September 14, 2015, scientists at the NSF-funded Laser Interferometer Gravitational-Wave Observatory (LIGO) detected gravitational waves using both LIGO detectors. The waves reached Earth from the southern hemisphere, passed through the Earth, and emerged at the Earth's surface, first at the LIGO detector near Livingston, Louisiana, and then, seven milliseconds later and 1,890 miles away, at the second LIGO detector in Hanford, Washington.

More details on the LIGO discovery can be found at:
https://www.opensciencegrid.org/osg-helps-ligo-scientists-confirm-einsteins-last-unproven-theory/
http://www.sdsc.edu/News%20Items/PR20160225_ligo.html
https://www.xsede.org/xsede-resources-help-confirm-ligo-discovery

The annual HPCwire Readers' and Editors' Choice Awards are determined through a nomination and voting process with the global HPCwire community, as well as selections from the HPCwire editors. The awards are an annual feature of the publication and constitute prestigious recognition from the HPC community. These awards are revealed each year to kick off the annual Supercomputing Conference, which showcases high-performance computing, networking, storage, and data analysis. More information on these awards can be found at the HPCwire website or on Twitter through the #HPCwireAwards hashtag.


About OSG

The Open Science Grid Consortium is a community-driven organization that spans academia and Department of Energy national laboratories to advance the state of the art of distributed high throughput computing. In addition to community contributions, the OSG project receives funding from DOE and NSF to operate a fabric of services, including a production infrastructure, an integrated software stack, and a variety of intellectual support services for educators, scientists, and IT professionals.


About HPCwire

HPCwire is an online news and information resource covering the fastest computers in the world and the people who run them. Started in 1986, HPCwire has enjoyed a legacy of world-class editorial and journalism, making it the news source of choice for science, technology, and business professionals interested in high-performance and data-intensive computing. Visit HPCwire at www.hpcwire.com.


Media Contacts:

Kyle Gross, OSG communications lead, kagross@iu.edu
Chelsea Lang, corporate marketing manager, Tabor Communications, 919 749-1895


 Related Links:

The Open Science Grid: https://www.opensciencegrid.org
National Science Foundation: https://www.nsf.gov/
Department of Energy, Office of Science: http://science.energy.gov
Pegasus: https://pegasus.isi.edu
LIGO: http://ligo.org/

Derek's Blog > Running Mixed EL6 & EL7 Clusters

We are entering a new era of transition from Enterprise Linux (EL) 6 to EL7. During this transition, we have to support submitting jobs to clusters that are running one or both of these OSes. In this post, I will describe how we have accomplished this at a few sites.

When GlideinWMS factories submit jobs to a CE, the jobs are configured to run on only a single operating system. Therefore, you must route each job to its designated OS through the HTCondor-CE job routes.

HTCondor-CE Configuration

An HTCondor-CE must be configured to submit to multiple clusters or to separate sections of the same cluster. In the examples below, I will show how an HTCondor-CE-Bosco CE can be configured to submit to multiple clusters, each running a different OS, but the examples could be adapted to work for an HTCondor or Slurm cluster.

Changes for an HTCondor or Slurm cluster (sketched after this list):

  • HTCondor: remote_requirements would have to be set so that the jobs only run on startds that advertise the correct OS version.
  • Slurm: Usually nodes with different OS versions are in different partitions. If so, the default_queue argument can be used in the route to send the job to the correct partition.
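
The two bullets above can be sketched as route entries. This is a hedged illustration only: the attribute names (set_remote_requirements, set_default_queue) follow the wording in this post, and the hostname, partition name, and OS expression are placeholders, so verify them against the HTCondor-CE documentation for your batch system before use.

# Sketch only: attribute names follow the text above and all values are
# placeholders; not verified against a live HTCondor-CE.
JOB_ROUTER_ENTRIES = \
     [ \
     TargetUniverse = 5; \
     name = "Local_Condor_el7"; \
     set_remote_requirements = (OpSysMajorVer == 7); \
     requirements = TARGET.distro =?= "RHEL7"; \
     ] \
     [ \
     GridResource = "batch slurm griduser@slurm.example.edu"; \
     TargetUniverse = 9; \
     name = "Local_Slurm_el7"; \
     set_default_queue = "el7"; \
     requirements = TARGET.distro =?= "RHEL7"; \
     ]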

Here is an example route from an HTCondor-CE-Bosco:

JOB_ROUTER_ENTRIES = \
     [ \
     GridResource = "batch pbs griduser@el7.example.edu"; \
     TargetUniverse = 9; \
     name = "Local_BOSCO_el7"; \
     requirements = TARGET.distro =?= "RHEL7"; \
     ] \
     [ \
     GridResource = "batch pbs griduser@el6.example.edu"; \
     TargetUniverse = 9; \
     name = "Local_BOSCO_el6"; \
     requirements = TARGET.distro =?= "RHEL6"; \
     ]

In this configuration, any job that includes the attribute +distro = "RHEL7" or +distro = "RHEL6" will be routed to the corresponding EL7 or EL6 cluster. This can be used by the GlideinWMS factory to route jobs to the correct nodes.
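
For illustration, here is a minimal, hypothetical pilot submit fragment carrying that attribute; the CE hostname and executable are placeholders, and everything except the +distro line will differ at a real site:

# Hypothetical pilot submit fragment; hostname and executable are placeholders.
universe      = grid
grid_resource = condor ce.example.edu ce.example.edu:9619
executable    = glidein_startup.sh
# Custom attribute matched by the route requirements above
+distro       = "RHEL7"
queue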

Factory Configuration

The GlideinWMS factory must also be configured to submit to each of the different OS types. The factory creates two entry points, one for each OS: one entry point has +distro="RHEL7" and the other has +distro="RHEL6". All other attributes, including the CE hostname and contact information, are the same.

Conclusions

With clusters upgrading to EL7 at a rapid pace, the OSG must support mixed-OS clusters. Hopefully this post provides useful recipes for HTCondor-CEs that are used to access mixed-OS clusters.

In the next post, I hope to discuss how Singularity can be used to hide the local OS. It also allows users to create and distribute complicated environments without downloading and configuring software on each worker node.


Pegasus news feed > Pegasus @ AGU Fall Meeting 2016

Are you going to attend the AGU Fall Meeting in San Francisco, CA on December 12-16, 2016? We will be presenting two posters in the sessions below. Please join us in the poster hall (Moscone South) and let's have some coffee and very interesting discussions. Related Publication: …

News and Announcements from OSG Operations > Announcing OSG Software version 3.3.18

We are pleased to announce OSG Software version 3.3.18

Changes to OSG 3.3.18 include:
* GlideinWMS 3.2.16: can specify BOSCO user, start glidein manually, bug fixes
* SSLv3 is now disabled on several Globus tools
* Fix to prevent Globus GridFTP server process hangs
* Fixed edg-mkgridmap on Enterprise Linux 7
* PKI tools now generate CSRs using SHA2
* Fixed blahp qstat call to support torque-4.2.9
* Fixed crash in blahp when using glexec and limited proxies
* Augment Gratia PBS probe: process "exec_host" when the "ALLPROCS" flag is present
* GridFTP server script returns correct value when the service isn't running
* HTCondor-CE 2.0.11: Minor fixes

Release notes and pointers to more documentation can be found at:
https://www.opensciencegrid.org/bin/view/Documentation/Release3/Release3318

Need help? Let us know:
https://www.opensciencegrid.org/bin/view/Documentation/Release3/HelpProcedure

We welcome feedback on this release!

Pegasus news feed > Pegasus 4.7.1 Released

We are happy to announce the release of Pegasus 4.7.1. Pegasus 4.7.1 is a minor release of Pegasus and includes improvements and bug fixes to the 4.7.0 release.

Improvements in 4.7.1 are

  1. Fix for stage in jobs with repeated portion of LFN [PM-1131]
  2. Fix for pegasus.transfer.bypass.input.staging breaks symlinking on the local site [PM-1135]
  3. Capture the execution site information in pegasus lite [PM-1134]
  4. Added ability to check CVMFS for worker package


Condor Project News > European HTCondor workshop scheduled for June 6-9, 2017 (November 2, 2016)

The 2017 European HTCondor workshop will be held Tuesday, June 6 through Friday, June 9, 2017 at DESY in Hamburg, Germany. We will provide more details as they become available.

News and Announcements from OSG Operations > GOC Service Update - Tuesday, November 8th at 14:00 UTC

The GOC will upgrade the following services beginning Tuesday, November 8th at 14:00 UTC. The GOC reserves 8 hours in the unlikely event unexpected problems are encountered.

Ticket
  * Configuration changes: tickets exchanged with BNL will be linked on the ticket view page

OIM
  * JIRA OIM-140: Retry CILogon host certificate requests when we receive a "retry" error

VOMS
  * Remove obsolete VOMS for the CSIU, osg, osgcrossce, and UC3 VOs

News and Announcements from OSG Operations > GOC Service Update - Tuesday, October 25th at 13:00 UTC

The GOC will upgrade the following services beginning Tuesday, October 25th at 13:00 UTC.
The GOC reserves 8 hours in the unlikely event unexpected problems are encountered.

Ticket Exchange
* Configuration Changes
* Adding new exchange (XSEDE)

OIM
* Add “Created” timestamp field in OIM downtime table to preserve original log creation timestamp

Oasis
* Update the configuration for ligo.osgstorage.org to work better on Debian hosts.

MyOSG
* Make the “Created” downtime timestamp available through reporting

All Services

* Operating system updates; reboots will be required. The usual HA mechanisms will be used, but some services will experience brief outages.

