r38 - 07 Mar 2008 - 17:53:59 - DougJacobsen

Condor at SCS

About Condor

Condor is a batch system for queuing, scheduling, and prioritizing compute-intensive jobs. It is developed by the Condor team at the University of Wisconsin. In a nutshell, Condor matches user-submitted jobs to available computer resources. These resources could be desktop machines or owner-based clusters. As long as a machine has the resources a job requires, the job will be sent to that machine and run. The Condor documentation is full of useful information about the software and how the system works. This page summarizes the parts of the Condor documentation that are relevant to using Condor at SCS.

Prepare your program

A job run under Condor must be able to run as a background batch job. Condor can redirect console output and keyboard input to and from files for you. Create any files needed to supply the keystrokes the program expects as input. Before submitting your job, make certain the program runs correctly with the input files you have created.
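One way to check this is to run the program by hand with its input, output, and error streams redirected to files, which is roughly what Condor will do for you. The sketch below uses a tiny stand-in script, myprog.sh (a hypothetical name), in place of your real program; the file names test.data, loop.out, and loop.error simply mirror those used in the submit file examples further down.

```shell
# Stand-in for your real program (hypothetical): reads numbers from
# standard input and prints their sum.
cat > myprog.sh <<'EOF'
sum=0
while read -r n; do
    sum=$((sum + n))
done
echo "sum=$sum"
EOF

# The "keystrokes" the program would normally receive from the keyboard.
printf '1\n2\n3\n' > test.data

# Run without a terminal: input comes from a file, output and errors go to files.
sh myprog.sh < test.data > loop.out 2> loop.error

# Inspect the results before submitting anything to Condor.
cat loop.out
```

If loop.out holds what you expect and loop.error is empty, the program is a reasonable candidate for batch execution.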

Choose a Condor universe

Condor has several run-time environments, which are referred to as universes. SCS supports the Standard and the Vanilla universes. The Standard universe allows remote system calls, and jobs can checkpoint and migrate. This is useful when the job is running on a desktop system and the user returns: instead of being killed, the job saves an image of itself and moves to another compatible node that is not being used. The Standard universe requires that the program be linked to the Condor libraries; therefore, if you do not have the object code, you may be restricted to the Vanilla universe. The Vanilla universe does not require that you have the program's object code, but as a consequence jobs run in the Vanilla universe do not checkpoint or migrate. Vanilla jobs will be evicted from a machine if a user returns.

Compiling code for Condor (Standard universe only)

If you choose to use the Standard universe, your program has to be linked to the Condor libraries, so you must have either the object code or the source code for the program. To compile your program, log on to a submit node (e.g., Phoenix) and compile using the following command:
condor_compile cc | CC | gcc | f77 | g++ | ld | ...
That is, put condor_compile in front of the normal command you would use to compile your program.

Example 1

If your source code is called test.c you would compile using this command:
condor_compile gcc test.c -o test
and this will create an executable called test that can now be run in the standard universe for condor.
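To run that executable, the job also has to ask for the Standard universe in its submit description file (these files are described in the next section). A minimal sketch, assuming the executable is the test program built above; the file names test.submit, test.out, test.err, and test.log are illustrative choices, not SCS conventions:

```shell
# Write a small submit description file for the relinked executable.
cat > test.submit <<'EOF'
# Standard-universe job for the program built with condor_compile
universe   = standard
executable = test
output     = test.out
error      = test.err
log        = test.log
queue
EOF

# Show the file that would be handed to condor_submit.
cat test.submit
```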

Create a submit description file

A "submit description file" contains commands and keywords to direct the queuing of jobs. In this file, Condor finds everything it needs to know about the job(s). Items such as the name of the executable to run, the initial working directory, and command-line arguments to the program all go into the description file.

Example 1

A very simple submit description file may take the following form.

#################### 
# 
# Example 1 
# Simple condor job description file 
# 
#################### 
Executable = foo 
Log = foo.log 
Queue 

Example 2

Here's a slightly more complicated submit description file:

#################### 
# 
# Example 2: demonstrate use of multiple 
# directories for data organization. 
# 
#################### 
Executable = mathematica 
Universe = vanilla 
input = test.data 
output = loop.out 
error = loop.error 
Log = loop.log 
Initialdir = run_1 
Queue 
Initialdir = run_2 
Queue 
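In Example 2, the input, output, error, and log file names are resolved relative to each Initialdir, so run_1 and run_2 must exist and each needs its own test.data before you submit. A minimal sketch of that setup, with placeholder contents standing in for your real input:

```shell
# Create one directory per queued job, each with its own input file.
for d in run_1 run_2; do
    mkdir -p "$d"
    # Placeholder input (hypothetical); replace with the real data for this run.
    printf 'placeholder input for %s\n' "$d" > "$d/test.data"
done

# After the jobs finish, loop.out, loop.error, and loop.log appear in each directory.
ls run_1 run_2
```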

Example 3

The Vanilla universe allows jobs to be run on heterogeneous architectures. Instead of naming one executable explicitly, the executable name includes the $$(OpSys) and $$(Arch) macros, which are expanded with the values of the matched machine once a machine is available for your job.

#################### 
# 
# Example 3: demonstrate heterogeneous submit 
# file. 
# 
####################
 
  initialdir =/home/u5/users/jwilgenb/condor/RepFiles
  Rank = kflops 

  Executable   = /usr/common/i686-linux/bin/paup.$$(OpSys).$$(Arch) 
 
  Universe     = vanilla 
  requirements = (OpSys =="OSX" && Arch =="PPC") || \
                 (OpSys =="WINNT51" && Arch =="INTEL") || \
                 (OpSys =="LINUX" && Arch =="INTEL") || \
                 (OpSys =="LINUX" && Arch =="ALPHA")

  should_transfer_files = YES
  when_to_transfer_output = ON_EXIT_OR_EVICT
  transfer_input_files = anolis.nex, rep.$(Process)

  notification = NEVER
  arguments = rep.$(Process) -n -f
  output    = rep_out.$(Process)
  error     = rep_error.$(Process)
  log       = rep.log

Queue 5 
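Because Example 3 ends with Queue 5, $(Process) takes the values 0 through 4, so the job expects anolis.nex plus rep.0 through rep.4 in the initial directory. The sketch below checks for those files before you submit; check_inputs is a hypothetical helper name, not a Condor command.

```shell
# check_inputs DIR COUNT
# Report any input file that "Queue COUNT" would need but that is missing in DIR.
# Succeeds only if every file is present.
check_inputs() {
    dir=$1
    count=$2
    missing=0
    [ -f "$dir/anolis.nex" ] || { echo "missing: $dir/anolis.nex"; missing=$((missing + 1)); }
    i=0
    while [ "$i" -lt "$count" ]; do
        [ -f "$dir/rep.$i" ] || { echo "missing: $dir/rep.$i"; missing=$((missing + 1)); }
        i=$((i + 1))
    done
    [ "$missing" -eq 0 ]
}

# Example: check the current directory for a five-job queue.
check_inputs . 5 || echo "Fix the missing inputs before running condor_submit."
```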

Submitting your job

The program condor_submit is used to submit jobs to the SCS Condor cluster. At SCS, you can submit jobs from submit.scs.fsu.edu. In addition, most of the owner-based cluster head-nodes can be used to submit jobs to the condor "flock."

The condor_submit application requires a submit-description file, which contains the commands needed to match jobs with an execute node.

To submit a job, log in to a submit node (e.g., submit) and type the following:

condor_submit "submit description file" 

If the job is successfully submitted, you will see something like this:

 
  Submitting job(s)..
  Logging submit event(s)..
  2 job(s) submitted to cluster 56836.
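Scripts that submit jobs often need the cluster number for later condor_q or condor_rm calls. A minimal sketch of pulling it out of the message above with sed, assuming the output has exactly the form shown (the wording may differ between Condor versions):

```shell
# Sample output copied from above; in a real script you would capture it with
#   submit_output=$(condor_submit myjob.submit)
# where myjob.submit is a hypothetical file name.
submit_output='Submitting job(s)..
Logging submit event(s)..
2 job(s) submitted to cluster 56836.'

# Keep only the digits that follow "submitted to cluster".
cluster=$(printf '%s\n' "$submit_output" |
    sed -n 's/.*submitted to cluster \([0-9][0-9]*\)\..*/\1/p')

echo "cluster id: $cluster"
```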

Manage your job (frequently used commands)

After you submit a job, Condor provides a number of commands for managing the job and monitoring its status. For example:

  • condor_q displays information about jobs in the queue.
      
         condor_q
                   By default, it queries only the local job queue.
    
         condor_q -global  
                   Query all job queues in the pool.
                   "condor_q -g" has the same function.
                   If the list is too long, you can use "condor_q | less" to page through it.
         
         condor_q  -name "schedd name"
                   Query the queue of the named schedd.
                   Example: "condor_q -name petal"
         
         condor_q  -submitter "submitter name"
                   List jobs of a specific submitter from all the queues in the pool.
                   Example: "condor_q -submitter yanfeng" (yanfeng is my user name) will show you all
                            the jobs submitted by yanfeng.
                   "condor_q yanfeng" does the same for the local queue only.
    
         condor_q -run 
                   Get information about running jobs.
                   Example: "condor_q -run yanfeng" will show you all the running jobs submitted by yanfeng.
                   "condor_q -r" has the same function.
         
         condor_q -help
                   Get a brief description of the supported options.
                   
     
  • condor_status is a versatile tool that may be used to monitor and query the condor pool.

 
      condor_status
               Display the status of the Condor pool
     
      condor_status -avail
               Identify resources that are available.

      condor_status -schedd
               Query condor_schedd ads and display attributes

      condor_status  -help
               get a brief description of the supported options
 

  • condor_rm removes jobs from the Condor queue.

 
      condor_rm username
              Remove all jobs belonging to the specified user.
              Example: "condor_rm yanfeng" removes all the jobs submitted by yanfeng.
              
      condor_rm cluster
              Remove all jobs in the specified cluster.

      condor_rm cluster.process
              Remove the specific job in the cluster.

      condor_rm -help
              get a brief description of the supported options

  • condor_hold hold your job

 
      condor_hold cluster
              Hold all jobs in the specified cluster
      
      condor_hold cluster.process
             Hold the specific job in the cluster
    
      condor_hold user
              Hold all jobs belonging to specified user 
    
      condor_hold -help
              get a brief description of the supported options

  • condor_release release held jobs in the Condor queue

      condor_release cluster
             Release all jobs in the specified cluster

      condor_release cluster.process
             Release the specific job in the cluster

      condor_release user
              Release jobs belonging to specified user 

      condor_release -help
              get a brief description of the supported options

  • condor_prio change priority of jobs in the condor queue

      condor_prio  [{+|-}priority ]  cluster
              Change the priority of all processes belonging to the specified cluster.
              The user can adjust the priority by supplying a + or - immediately followed 
              by a digit. The priority of a job can be any integer, with higher numbers corresponding 
              to greater priority. Only the owner of a job or the super user can change the 
              priority for it.
              Example: "condor_prio +2  56639" raises the priority of the jobs in cluster 56639 by 2. You can check the result with "condor_q".

      condor_prio  [{+|-}priority ] cluster.process
              change the priority of the specified process.
    
      condor_prio  [{+|-}priority ] user
              change priority of all jobs belonging to that user.

      condor_prio -help
              get a brief description of the supported options
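Put together, a typical session with these commands might look like the sketch below. It assumes the cluster number 56836 from the sample condor_submit output earlier on this page, and it only calls the Condor tools if they are installed, so it is safe to try on a machine that is not a submit node.

```shell
cluster=56836   # cluster number printed by condor_submit (from the sample output)

if command -v condor_q >/dev/null 2>&1; then
    condor_q "$cluster"            # show the jobs in this cluster
    condor_hold "$cluster.0"       # hold process 0 of the cluster
    condor_release "$cluster.0"    # let it run again
    condor_prio +2 "$cluster"      # adjust the priority of the whole cluster
    condor_rm "$cluster"           # finally, remove every job in the cluster
else
    echo "Condor commands not found; log in to a submit node first."
fi
```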

For a more complete list of condor submit file examples try visiting the Condor Project Website.

Examples (applications used at SCS)

Migrate

MrBayes

PAUP

CHARMM


Adding your machine to the SCS Pool

If you would like to add your machine to the SCS Condor Pool, please contact TSG.


Condor help

If you need help with Condor, there are several resources that you can tap for useful information. The University of Wisconsin-Madison maintains a mailing list for Condor users, which is regularly monitored by the Condor development team. You can also subscribe to the SCS Condor mailing list. Please read through the documentation on the SCS TSG twiki site, the University of Wisconsin Condor homepage, and the mailing list archives before posting a question.

Page information

Known Issues

If you are submitting a queue of jobs from an NFS mount and all the jobs write to a single log file, you will most likely run into an issue where some of your jobs stall. We have noticed this issue with queues of around 20 to 200 jobs, but it may happen for any number of jobs. The issue essentially takes down the submit node you are working on, and the node stays down until we fix it. To avoid this, you can either not write any log files or replace the log line in your submit description file with the following:
log = log.$(Process)
or something similar. This will produce a separate log file for each job in the queue. If you don't need the log files, though, we recommend just removing the line. Another way to bypass this issue is to write the log file onto a local disk of the submission machine. You can do this with the following line:
log = /tmp/JOBNAME.Log
Then after your job is complete, you can look in /tmp to see your job log and you can remove it when you are done.

Topic attachments
Condor.tgz (1.6 K, 07 Mar 2008 - 16:52, DougJacobsen): Condor Examples
Computing.UsingCondor moved from TechHelp.UsingCondor on 07 Nov 2006 - 13:48 by JimWilgenbusch - put it back