University of Nebraska–Lincoln

Research Computing Facility

Condor FAQ

  1. What is Condor?
  2. How do I create a Condor Script?
  3. How do I submit a job?
  4. Different Applications running on Condor
  5. My job needs user intervention. Is there an interactive mode for Condor?
  6. Are there queues in Condor like in PBS?
  7. Can I just forget about PBS?
  8. Now that I have submitted my job, how do I check the status?
  9. Is there a way to see all of the jobs I have run through Condor on PrairieFire?
  10. I don't see any output from my job. What's going on?
  11. How do I kill a job?
  12. My job was killed. What should I do if I don't know why?
  13. Why shouldn't I just ask for the maximum amount of resources possible all the time?
  14. Is there a user manual available?
  15. What do I need to run through Condor and what can I run on the head node?
  16. How can I run in the standard universe?

What is Condor?

The Condor project is designed to implement High Throughput Computing. Condor is a bundle of software that schedules applications and checks for available computing resources in a clustered/grid environment. RCF uses Condor for its ability to cycle-scavenge: jobs may be run through Condor on processors that PBS is not using. Once PBS schedules jobs on those processors, Condor either checkpoints its jobs or moves them out of the way. This way, machines that are reserved but idle can still be used for computation. Right now, Condor on Prairiefire can only handle serial jobs.

How do I create a Condor Script?

Condor, much like PBS, needs a script that tells it how to run the user's job. This is a basic script that should handle most jobs submitted to Condor.
#Example of a condor script
#with executable, stdout, stderr and log
Universe = vanilla
Executable = a.out
Arguments = file_name 12
Output = a.out.out
Error = a.out.err
Log = a.out.log
Queue

#
Lines starting with # are comments in Condor files.

Universe
is the way Condor manages the different ways it can run jobs, or what the Condor documentation calls a runtime environment. The standard and vanilla universes are available on Prairiefire. The vanilla universe is where most jobs should be run. To run in the standard universe, programs must be recompiled.

Executable
is the name of the executable you want to run on Condor.

Arguments
are the command line arguments for your program. For example, to run ls -l / on Condor, the Executable would be ls and the Arguments would be -l /.

Output
is the file where the information printed to stdout is going to be sent.

Error
is the file where the information printed to stderr is going to be sent.

Log
is the file where information about your Condor job will be sent, such as whether the job is running, whether it was halted or, if running in the standard universe, whether it was checkpointed or moved.

Queue
is the command to send the job to Condor's scheduler.

To submit a job, like a Monte-Carlo simulation, where the same program needs to be run several times with the same parameters, the script above can be used with one modification. The Queue command can be given the number of times the job should be queued in Condor. If the Queue command is changed to the one below, a.out will be run 5 times with the exact same parameters.

Queue 5
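
One thing to watch with Queue 5: all five runs share the Output and Error names from the script, so every run ends up writing to the same a.out.out. A common way around this is the $(Process) macro used in the later examples, which Condor replaces with the number of each run, counting from 0. The sketch below writes such a script and previews the resulting file names. The name montecarlo.submit is chosen only for illustration.

```shell
#!/bin/sh
# Sketch: queue five runs whose output files do not collide.
# montecarlo.submit is an illustrative file name, not part of the FAQ.
# The quoted 'EOF' keeps the shell from touching $(Process).
cat > montecarlo.submit <<'EOF'
Universe   = vanilla
Executable = a.out
Arguments  = file_name 12
Output     = a.out.$(Process).out
Error      = a.out.$(Process).err
Log        = a.out.log
Queue 5
EOF

# Preview the output names Condor would use for runs 0 through 4.
for process in 0 1 2 3 4; do
    echo "a.out.$process.out"
done
```

The Log file can stay shared, since Condor records which run each log entry belongs to.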

To submit the same job with different parameters, Condor accepts files with multiple Queue statements. Only the parameters that differ from one run to the next need to be restated before each Queue statement.

#Example of a condor script
#with executable, stdout, stderr and log
#and multiple Argument parameters
Universe = vanilla
Executable = a.out
Arguments = file_name 10
Output = a.out.$(Process).out
Error = a.out.$(Process).err
Log = a.out.$(Process).log
Queue
Arguments = file_name 20
Queue
Arguments = file_name 30
Queue
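
For longer parameter lists, writing each Arguments and Queue pair by hand gets tedious. The sketch below generates an equivalent script with a shell loop. The file name sweep.submit and the values 10 20 30 are illustrative choices; a.out and file_name come from the example above.

```shell
#!/bin/sh
# Sketch: build a submit script with one Queue statement per parameter.
# sweep.submit and the list 10 20 30 are illustrative choices.
submit_file=sweep.submit

{
    echo 'Universe   = vanilla'
    echo 'Executable = a.out'
    # Single quotes keep $(Process) literal for Condor to expand.
    echo 'Output     = a.out.$(Process).out'
    echo 'Error      = a.out.$(Process).err'
    echo 'Log        = a.out.$(Process).log'
    for n in 10 20 30; do
        echo "Arguments  = file_name $n"
        echo 'Queue'
    done
} > "$submit_file"

echo "wrote $submit_file; submit it with: condor_submit $submit_file"
```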

To submit a job to a Windows machine, add a requirement that specifies the operating system. If files need to be transferred, use the file transfer commands:

#Example of a condor script
#for a Windows machine
#with executable, stdout, stderr, log and file transfer
Universe = vanilla
Executable = a.out
Requirements = (OpSys == "WINNT51")
Arguments = file_name 10
Output = a.out.$(Process).out
Error = a.out.$(Process).err
Log = a.out.$(Process).log
TRANSFER_FILES = ALWAYS
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = file1, file2, file3
transfer_output_files = outputfile

Queue

How do I submit a job?


condor_submit condor_script
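
Before submitting, it can help to catch simple mistakes in the script. The sketch below is one possible pre-submission check, not part of Condor: check_submit_file is a hypothetical helper that verifies the script has a Queue statement and that its Executable exists and is executable. It assumes the Executable path is valid relative to the directory you run it from.

```shell
#!/bin/sh
# check_submit_file is a hypothetical helper, not a Condor command.
# It catches two common mistakes before condor_submit is run:
# a missing Queue statement and an Executable that cannot be run.
check_submit_file() {
    script=$1
    if [ ! -r "$script" ]; then
        echo "cannot read $script" >&2
        return 1
    fi
    # Command names are matched case-insensitively here.
    if ! grep -Eiq '^[[:space:]]*queue([[:space:]]|$)' "$script"; then
        echo "$script has no Queue statement" >&2
        return 1
    fi
    # Take the value of the first Executable line, trimmed of spaces.
    exe=$(grep -Ei '^[[:space:]]*executable[[:space:]]*=' "$script" |
          head -n 1 |
          sed -e 's/^[^=]*=[[:space:]]*//' -e 's/[[:space:]]*$//')
    if [ -z "$exe" ] || [ ! -x "$exe" ]; then
        echo "Executable '$exe' is missing or not executable" >&2
        return 1
    fi
    return 0
}

# Usage: only submit when the checks pass.
# check_submit_file condor_script && condor_submit condor_script
```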

Different Applications running on Condor


Can I run BLAST on Condor?

BLAST can be run on Condor in the vanilla universe. The NCBI toolkit is installed in /home/programs/ncbi.

To make life easier, agree with the other people in your research group on a shared directory to hold the formatted databases. Why this makes things easier will become clear in a moment. Let's say someone makes a directory in their group directory called db_formatted and gives everyone else in their group access to it. Example:
>$ mkdir /home/swanson/db_formatted
>$ chmod g+rwx /home/swanson/db_formatted


Now create a file called .ncbirc in your home directory. It should contain this:

[NCBI]
Data = /home/programs/ncbi/data

[BLAST]
BLASTMAT = /home/programs/ncbi/data
BLASTDB = /home/swanson/db_formatted

This file contains the path to the matrices needed by the NCBI tools and the path to the formatted databases. If the formatted databases are going to be placed somewhere else, change the value of BLASTDB.

Now look at the sections on how to write a Condor script and how to submit a job. The BLAST Condor script should look something like this. I'm using the nr database as an example.

#BLAST submission script
Universe = vanilla
Executable = /home/programs/ncbi/bin/blastall
Arguments = -p blastp -d nr -i test.seq -o blastout.ncbi
Log = BLAST.log
Queue


If you have any questions, contact the administrators.

Can I run Java on Condor?

Java can be run on Condor in the vanilla universe.

Prairiefire has JDK 1.4 and 1.5 installed in /home/programs/java. For all examples I'll be using the path to JDK 1.4 because some code won't compile on 1.5.

Because there is supposed to be a Java universe in Condor, Java support under the vanilla universe is a little awkward. A wrapper has to be written around Java. The easiest way, at least for me, is to write it in BASH. Create a file called java.sh.

#!/bin/bash
# In case I need to point to some special libraries
# uncomment the line below and add the path
#export CLASSPATH=
/home/programs/java/j2sdk1.4.2_10/bin/java "$@"


Make this file executable.

>$ chmod u+x java.sh

Look at the sections on how to write a Condor script and how to submit a job. For the Executable entry use java.sh, and for Arguments enter the name of the Java program followed by all its parameters.

#Example of a condor script
#for Java
#with executable, stdout, stderr and log
Universe = vanilla
Executable = java.sh
Arguments = Hello_World 10
Output = hello.out
Error = hello.err
Log = hello.log
Queue

If you have any questions contact the administrators.

My job needs user intervention. Is there an interactive mode for Condor?


No.

Are there queues in Condor like in PBS?


Even though Condor is a scheduler of sorts, there are no queues implemented in Condor. We would like Condor to be used more as a cycle-scavenging system for serial processes. The Condor scheduler works per user, so everyone, no matter how many jobs they submit, will have access to time on the cluster.

Can I just forget about PBS?


No. As stated in the first section, Condor on Prairiefire cannot run parallel jobs. In addition, PBS will preempt or kill Condor jobs. Condor is therefore most appropriate for the following cases:

1. Serial code of short duration.
2. Serial code compiled under the standard universe.

Now that I've submitted my job, how do I check the status?


condor_q
will show you all processes in the Condor scheduler.

condor_q username
will show you all jobs in the scheduler from user username.

condor_status
will show you the status of the entire Condor pool.

Is there a way to see all of the jobs I have run through Condor on PrairieFire?


No, but this may be added in the future.

I don't see any output from my job. What's going on?


You probably did not add the Output line in your Condor script file. If you have any questions contact the administrators.
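
A quick way to check is to search the script for the relevant lines. The sketch below is a minimal example of that check: missing_streams is a hypothetical helper, not a Condor command, and it prints whichever of Output, Error and Log the script never sets.

```shell
#!/bin/sh
# missing_streams is a hypothetical helper, not a Condor command.
# It prints each of Output, Error and Log that the script never sets.
missing_streams() {
    script=$1
    for name in Output Error Log; do
        # Match the command name at the start of a line, in any case.
        if ! grep -Eiq "^[[:space:]]*$name[[:space:]]*=" "$script"; then
            echo "$name"
        fi
    done
}

# Usage: missing_streams condor_script
```

If the helper prints nothing, all three lines are present and the problem lies elsewhere.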

How do I kill a job?


Use condor_q to find out what job number your job is and use condor_rm to kill the job.
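
The job number appears in the ID column of condor_q and has the form CLUSTER.PROCESS, for example 1234.0. As a rough sketch of the two steps, the hypothetical wrapper below checks that its argument looks like a job number before handing it to condor_rm. The CONDOR_RM variable is an illustrative addition that allows a dry run.

```shell
#!/bin/sh
# remove_job is a hypothetical wrapper around condor_rm.
# It accepts a job number as shown in the ID column of condor_q,
# either CLUSTER or CLUSTER.PROCESS (for example 1234 or 1234.0).
# CONDOR_RM is an illustrative override used for dry runs.
remove_job() {
    id=$1
    case $id in
        ''|*[!0-9.]*|*.*.*|.*|*.)
            echo "usage: remove_job CLUSTER[.PROCESS]" >&2
            return 2
            ;;
    esac
    "${CONDOR_RM:-condor_rm}" "$id"
}

# Dry run: print the job number instead of removing the job.
# CONDOR_RM=echo remove_job 1234.0
```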

My job was killed. What should I do if I don't know why?


Check the Condor queue with condor_q. If your job is in the queue, it wasn't killed; it was probably just restarted on another machine. As long as jobs are in the Condor queue, they are going to be run at some point. If your job is not in the Condor queue and you didn't kill it, contact the administrators.
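
The file named on the Log line of your script can also show what happened to the job. The sketch below filters a log for lines that mention key events. job_events is a hypothetical helper, and the keywords are assumptions about how the log words its entries, so adjust them to match your own log files.

```shell
#!/bin/sh
# job_events is a hypothetical helper, not a Condor command.
# It prints the lines of a Condor log that mention key job events.
# The keywords are assumptions about the log wording; adjust them
# to match what your own log files contain.
job_events() {
    grep -Ei 'submitted|executing|evicted|terminated|aborted' "$1"
}

# Usage, with the Log file named in the example script:
# job_events a.out.log
```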

Why shouldn't I just ask for the maximum amount of resources possible at all times?


On Prairiefire, most of the nodes are homogeneous, and the nodes with more RAM are usually very busy, so using ClassAds to try to get these nodes will most likely leave your job stuck in the queue forever. If you need more resources than the nodes Condor routinely uses, contact the administrators for help.

Is there a user's manual available?


There is a Condor user manual on-line at Condor's website.

What do I need to run through Condor and what can I run on the head node?


As stated before, Condor on Prairiefire is a cycle-scavenging system. If you don't mind your job not running 'right now', the job is serial, or you don't want to use PBS, then use Condor. If you're willing to experiment with the standard universe (see "How can I run in the standard universe?"), the code can be compiled on the head node.

How can I run in the standard universe?


The standard universe, as of the writing of this document, only runs 32-bit binaries even though Prairiefire has 64-bit machines. This is because the Condor libraries are 32-bit. So far, Condor has been tested with gcc, g++ and g77.

To compile against the Condor libraries, change the compiler in the Makefile. Find the compiler declaration, which in most Makefiles is CC = gcc, and change it to CC = condor_compile gcc. If your program uses C++, use g++ in place of gcc; for Fortran, use f77. The -m32 flag MUST also be given. It tells the compiler to create 32-bit binaries, and compilation will break without it. Most Makefiles pass a -O option to the compiler somewhere, usually in CFLAGS; find it and add -m32 alongside it. The standard universe is still under testing, and most programs will not even build under it, so the vanilla universe is usually the best option.
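
For a simple Makefile, the two edits can be scripted. The sketch below is one way to do it with sed, writing a modified copy so the original stays untouched. condorize_makefile and Makefile.condor are hypothetical names, and the sketch assumes plain "CC = gcc" and "CFLAGS = ..." assignments. Makefiles that set the compiler differently (g++, f77, or other variable names) need the edit done by hand.

```shell
#!/bin/sh
# condorize_makefile is a hypothetical helper, not a Condor command.
# It copies a Makefile, rewriting "CC = gcc" to "CC = condor_compile gcc"
# and appending -m32 to the CFLAGS line. It assumes plain "CC = gcc"
# and "CFLAGS = ..." assignments; other Makefiles need manual edits.
condorize_makefile() {
    sed -e 's/^CC[[:space:]]*=[[:space:]]*gcc[[:space:]]*$/CC = condor_compile gcc/' \
        -e '/^CFLAGS[[:space:]]*=/ s/[[:space:]]*$/ -m32/' \
        "$1" > "$2"
}

# Usage: write the modified copy, then build with it.
# condorize_makefile Makefile Makefile.condor
# make -f Makefile.condor
```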

Created By: Cesar Delgado

Last Modified: February 28, 2006