Running PAUP Jobs in the Vanilla Universe (no checkpointing)
Prepare a PAUP batch file
For this example you'll need two files: one containing a NEXUS data set and the other containing the PAUP commands. The data set looks like this:
#NEXUS
Begin data;
Dimensions ntax=8 nchar=200;
Format datatype=dna interleave;
Matrix
A CGAATATAACGGAGCCAGTACTCAGACGCACTGCCAACCCAGCGAAGCCCGATACGCCGT
B CGAATATAACGAAGCCAGTATTCAGACGCACTGCTAACCCAGCGGAGCCCGGTACGCCGT
C CGAATATAACAAAGCCAGTACTCGGACGCACTACCAACCCAGCGGAGCCCGATACGCCAT
D CGAATACAACAAAGCCAGTATTCAGACGCACTGCCAACCCAACAGAGACCGGCGTGCTAT
E CGAATACAACAAAGCCAGTATTCAGACGCACTGCCAACCCAGCAGAGACCCCCACGCTAT
F CGAATACAACAAAGCCAGTATTCAGACGCACTGCCAACCCAGCAGAGACCCACACGCTAT
G CGAATACAACAAAGCCAATATTCAGACGGACTGCCAACCCAGCAGAGACCGACACGTCAT
H CGAATACAACAAAGCCAATATTCAGACGGACTGCCAACCCGGCAGAGACCGACGCGTCAT
...
;
end;
Download a copy of the sample data file here.
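Before running anything, it can help to confirm that the data file declares the dimensions you expect. The following is a minimal Python sketch, not part of PAUP or Condor; the function names are hypothetical, and it assumes the Dimensions statement lists ntax before nchar and that each matrix row is a label followed by one run of sequence, as in the sample above.

```python
import re


def read_dimensions(nexus_text):
    """Return (ntax, nchar) from the first Dimensions statement, or None."""
    match = re.search(
        r"dimensions\s+ntax\s*=\s*(\d+)\s+nchar\s*=\s*(\d+)\s*;",
        nexus_text,
        flags=re.IGNORECASE,
    )
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def first_block_labels(nexus_text):
    """Return the taxon labels found in the first block after 'Matrix'."""
    labels = []
    in_matrix = False
    for line in nexus_text.splitlines():
        stripped = line.strip()
        if not in_matrix:
            in_matrix = stripped.lower() == "matrix"
            continue
        parts = stripped.split()
        # Stop at a blank line, the closing ';', or any row that is not
        # exactly "label sequence".
        if len(parts) != 2:
            break
        labels.append(parts[0])
    return labels
```

To check the sample file, pass it the file contents, for example read_dimensions(open("example_data.nex").read()), and compare the first value with the number of labels returned by first_block_labels.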
The commands used to run a specific analysis are kept in a separate NEXUS file, which references the data set given above.
#NEXUS
begin paup;
set autoclose=yes warnreset=no increase=auto;
[tell paup the data file name]
execute example_data.nex;
[log file]
log file= example_paup.log replace;
[reconstruct a neighbour-joining tree]
nj;
[ save the tree to a file]
savetrees file= example_paup.tre replace;
end;
Copy the PAUP block given above and paste it into a new file named example_paup.nex.
Test the batch file
Before launching your job under Condor, test the batch file at the console to make sure that it works properly. At this point you should have two files: example_data.nex and example_paup.nex. Type:
paup example_paup.nex
Because this is a short analysis, the program should execute and terminate within a second or two, saving a single tree file and a log file to the current directory. In practice, you will often be testing analyses that run for several hours or days before completing. In that case, terminate the analysis once you are sure that PAUP executes the file properly. To interrupt a PAUP process, press Ctrl+C.
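For long analyses you may prefer to put a time limit on the console test instead of watching it yourself. The sketch below is one way to do that from Python; smoke_test is a hypothetical helper, not a PAUP or Condor tool. Note that subprocess.run kills the process outright when the limit passes, which is harsher than an interactive interrupt, so partial output files may be left behind.

```python
import subprocess


def smoke_test(command, seconds):
    """Run command with a time limit.

    Returns the exit code if the command finished in time, or None if it
    was still running when the limit passed (the process is then killed).
    """
    try:
        completed = subprocess.run(command, timeout=seconds)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode


# Example call (assumes paup is on your PATH):
# smoke_test(["paup", "example_paup.nex"], 60)
```

A result of None only means the analysis got far enough to keep running for the whole time limit; it does not prove that the full run will succeed.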
Create a submit file
Now that you know your PAUP job will run without errors, you need to create a Condor submit file. More specific information on how to create a submit description file is given in the UsingCondor topic.
########################################
#
# PAUP run in the Condor vanilla universe
#
#########################################
Universe = vanilla
InitialDir = /home/u5/users/yanfeng/condor_dir/run1
Executable = /usr/common/i686-linux/bin/paup
# Use command "which paup" to find out the path of "paup"
Arguments = example_paup.nex -n -f
requirements = (OpSys =="LINUX" && Arch =="INTEL")
should_transfer_files = YES
WhenToTransferOutput = ON_EXIT_OR_EVICT
transfer_input_files = example_paup.nex, example_data.nex
output = example_paup_condor.out
error = example_paup_condor.error
log = example_paup_condor.log
Queue
Copy the text given above and paste it into a new file named example_paup.cmd.
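When should_transfer_files is YES, the job runs in a scratch directory on the execute machine, so any file it reads has to be listed in transfer_input_files. One easy mistake is to name a file in Arguments without listing it there. The following Python sketch checks for that; both function names are hypothetical, it assumes every argument that does not start with "-" is a file name, and it does not handle submit commands continued across lines with a backslash.

```python
def parse_submit(text):
    """Return a dict of submit commands, keyed by lowercased name."""
    commands = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and bare commands such as Queue.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        commands[name.strip().lower()] = value.strip()
    return commands


def untransferred_arguments(commands):
    """Return file-like arguments that are missing from transfer_input_files."""
    listed = {
        name.strip()
        for name in commands.get("transfer_input_files", "").split(",")
        if name.strip()
    }
    arguments = commands.get("arguments", "").split()
    return [
        arg for arg in arguments
        if not arg.startswith("-") and arg not in listed
    ]
```

Run it on the contents of example_paup.cmd; an empty list means every file named on the command line is also being transferred.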
Log on to an SCS submit node
SCS maintains several submit nodes that give users access to SCS computer resources. The general access submit node is named phoenix. There are also two other submit nodes (anfinsen and petal), which are part of restricted access research clusters. Special permission from the resource owner is required to access the petal and anfinsen submit nodes.
$ ssh <username>@phoenix.scs.fsu.edu
Submit the job
To submit a job to the Condor cluster, use the condor_submit command. For example:
$ condor_submit example_paup.cmd
You should see the following output:
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 56776.
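If you submit jobs from a script, you will usually want to keep the cluster number so that you can look the job up later. A minimal Python sketch, assuming the output has the form shown above (cluster_id is a hypothetical helper):

```python
import re


def cluster_id(submit_output):
    """Return the cluster number reported by condor_submit, or None."""
    match = re.search(r"submitted to cluster (\d+)\.", submit_output)
    return int(match.group(1)) if match else None
```

The function returns None when the expected line is missing, which is a reasonable signal that the submission did not go through.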
Check the status of a job
After submitting your job to the Condor cluster, you can check its status with the condor_q command. For example:
$ condor_q <your user name>
The output from this command should look something like this:
-- Submitter: petal017.csit.fsu.edu : <144.174.160.147:10076> : petal017.csit.fsu.edu
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
56776.0 yanfeng 5/22 18:22 0+00:00:00 R 0 1.8 paup exa
1 jobs; 0 idle, 1 running, 0 held
Don't be surprised if your job remains idle (designated by an I under ST) for several minutes or longer. If your job does not run right away, it most likely means either that you have a low priority on the cluster and the cluster is heavily used, or that someone's job with a lower priority is taking a while to vacate a node so that your job can run. Remember, Condor is based on a High Throughput Computing (HTC) model, not a High Performance Computing (HPC) model.
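To pick the state out of condor_q output in a script, you can read the ST column from each job row. The sketch below is a hypothetical helper that assumes the column layout shown in the sample output above (ID, OWNER, a two-field SUBMITTED date and time, RUN_TIME, ST, ...); other Condor versions may lay the columns out differently.

```python
import re

# Job rows start with an ID of the form cluster.process, e.g. 56776.0
JOB_ID = re.compile(r"^\d+\.\d+$")


def job_states(condor_q_output):
    """Map each job ID to the value of its ST column (for example I or R)."""
    states = {}
    for line in condor_q_output.splitlines():
        fields = line.split()
        # Header lines and the "N jobs; ..." summary do not start with a job ID.
        if len(fields) >= 6 and JOB_ID.match(fields[0]):
            states[fields[0]] = fields[5]
    return states
```

For the sample output above, the job 56776.0 maps to R (running); an idle job would map to I.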
You can also see what has happened to your job by looking at the Condor log file, which was defined by the log entry in the submit file. To look at the file you might use the cat command. For example:
$ cat example_paup_condor.log
The output from this command should look something like this:
000 (617.000.000) 10/19 14:34:02 Job submitted from host: <144.174.160.169:11297>
...
001 (617.000.000) 10/19 14:34:32 Job executing on host: <144.174.160.207:9705>
...
005 (617.000.000) 10/19 14:34:32 Job terminated.
(1) Normal termination (return value 1)
Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
3346 - Run Bytes Sent By Job
2000108 - Run Bytes Received By Job
3346 - Total Bytes Sent By Job
2000108 - Total Bytes Received By Job
...
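In the log above, each event starts with a three-digit code (000 for submission, 001 for execution, 005 for termination), and the termination event reports the program's return value. The following Python sketch pulls out both; summarize_log is a hypothetical helper, and it assumes the log format shown above. A nonzero return value usually means the program reported a problem, so it is worth checking the output and error files when you see one.

```python
import re

# An event header looks like: 005 (617.000.000) 10/19 14:34:32 Job terminated.
EVENT_LINE = re.compile(r"^(\d{3}) \(\d+\.\d+\.\d+\) ")
RETURN_VALUE = re.compile(r"Normal termination \(return value (-?\d+)\)")


def summarize_log(log_text):
    """Return (list of event codes in order, return value or None)."""
    events = []
    return_value = None
    for line in log_text.splitlines():
        event = EVENT_LINE.match(line)
        if event:
            events.append(event.group(1))
        result = RETURN_VALUE.search(line)
        if result:
            return_value = int(result.group(1))
    return events, return_value
```

The return value stays None while the job is still queued or running, because no termination event has been written yet.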
Moving on
After the analysis is complete, you will find *.out and *.error files in your directory. These files contain the standard output and standard error generated by the executable.
This is a barebones example submit file. See the UsingCondor topic for more information on creating submit files.
Condor in a heterogeneous environment
# paup running in Condor vanilla Universe
initialdir = /home/u5/users/johndo/condor/
Rank = kflops
Executable = /usr/common/i686-linux/bin/paup.$$(OpSys).$$(Arch)
Universe = vanilla
requirements = (OpSys =="OSX" && Arch =="PPC") || \
(OpSys =="WINNT51" && Arch =="INTEL") || \
(OpSys =="LINUX" && Arch =="INTEL") || \
(OpSys =="LINUX" && Arch =="ALPHA")
should_transfer_files = YES
when_to_transfer_output = ON_EXIT_OR_EVICT
transfer_input_files = primates.nex, rep.$(Process)
notification = NEVER
arguments = rep.$(Process) -n -f
output = rep_out.$(Process)
error = rep_error.$(Process)
log = rep.log
Queue 100
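In the submit file above, Queue 100 creates 100 jobs, and Condor replaces $(Process) with the job's number, counting from 0 to 99. Each job therefore needs its own input file (rep.0 through rep.99) to exist in the initial directory before you submit. The short Python sketch below lists the names a template expands to, so you can check them against your directory; expand_process is a hypothetical helper, not a Condor command.

```python
def expand_process(template, count):
    """Return the names that a $(Process) template expands to under Queue count."""
    return [template.replace("$(Process)", str(n)) for n in range(count)]


# Example: the input files expected by the submit file above.
input_files = expand_process("rep.$(Process)", 100)
```

The same call with "rep_out.$(Process)" or "rep_error.$(Process)" gives the output and error file names that the jobs will write.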