Getting the best I/O performance for your computational jobs

Several factors can slow your compute jobs. To make them run as fast as possible, eliminate as many of these limiting factors as you can. A factor that limits a job’s speed is often called a bottleneck. Common bottlenecks include:

  • Processor speed
  • Storage (disk, network storage)
  • Available RAM (RAM is fast workspace memory)
  • Serial code

When a job is limited by a certain factor, it is said to be bound by that factor. For example, a process that is heavy on computation and light on disk input/output is limited by the processing power of the computer’s processor; it is said to be CPU-bound. A job that transfers large amounts of information to and from disks is said to be I/O-bound.
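One rough way to tell which case applies is to compare the CPU time a step consumes with the wall-clock time it takes. The sketch below is illustrative only: cpu_fraction, busy_loop, and waiting are hypothetical names, and the sleep merely stands in for time spent blocked on slow storage. A ratio near 1 suggests a CPU-bound step; a ratio near 0 suggests the step is mostly waiting, which is typical of I/O-bound work.

```python
import time


def cpu_fraction(work):
    """Run work() and return CPU seconds used divided by wall-clock seconds."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0


def busy_loop():
    # Stand-in for a computation-heavy step: pure arithmetic, no I/O.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


def waiting():
    # Stand-in for a step blocked on slow storage: the process sleeps
    # and uses almost no CPU while the clock keeps running.
    time.sleep(0.2)


if __name__ == "__main__":
    print(f"busy loop: {cpu_fraction(busy_loop):.2f}")
    print(f"waiting:   {cpu_fraction(waiting):.2f}")
```

Note that time.process_time only counts CPU used by the current Python process, so work done in child processes is not included in the ratio.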

I/O-bound jobs and what to do about them

Most research groups have Isilon storage space. When you run a job, you choose where its input and output data reside while it runs. If your job reads and writes files directly on Isilon, please reconsider. Isilon is a network-attached file system built for reliability, not speed, so such a job is likely to be I/O-bound: it spends a large portion of its time waiting for the system to read from an input file or write to an output file, and it can take several times as long to complete.

To eliminate that problem when working with Isilon data, copy the data to faster local storage, process it there, and then copy the results back to Isilon for safekeeping.

Choosing a location

Biology-IT machines (biocrunch, bigram, speedy, etc.)

If you are on speedy or speedy2, and have large I/O requirements, consider using /tmp during your job because the /tmp directory is physically a very fast solid state disk drive.

When copying to local storage, the most logical option is to copy to your home directory, defined as /home/net-id where net-id is your ISU net-id.

  1. Copy to /home/net-id
  2. Run job
  3. Copy results back to Isilon for permanent storage
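The three steps above can be sketched as a small helper. This is a minimal illustration, not a site-provided tool: run_with_local_staging and its parameters are hypothetical, and it assumes the job takes its input file as the last command-line argument and writes its output files into its current directory.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_with_local_staging(input_path, results_dir, command, local_root):
    """Copy input to local storage, run the command there, copy outputs back.

    input_path  : file on network storage (for example, Isilon)
    results_dir : directory on network storage that receives the outputs
    command     : argument list; the local input path is appended to it
    local_root  : fast local directory (for example, /home/net-id or /tmp)
    """
    input_path = Path(input_path)
    results_dir = Path(results_dir)
    # A private work directory avoids clashes with other jobs and is
    # deleted automatically when the block exits.
    with tempfile.TemporaryDirectory(dir=local_root) as work:
        work = Path(work)
        local_input = work / input_path.name
        shutil.copy2(input_path, local_input)  # step 1: copy to local storage
        # Step 2: run the job with the local work directory as its cwd.
        subprocess.run(list(command) + [str(local_input)], cwd=work, check=True)
        # Step 3: copy results back (top-level output files only).
        results_dir.mkdir(parents=True, exist_ok=True)
        for item in work.iterdir():
            if item.is_file() and item != local_input:
                shutil.copy2(item, results_dir / item.name)
    return results_dir
```

Because check=True raises an error if the job fails, nothing is copied back from a failed run; adjust that choice if you need partial output for debugging.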

Condo

In order from fastest storage to slowest:

  1. $TMPDIR (you can run the copy from your job script)
  2. /ptmp
  3. /work/LAS/net-id
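A job script could pick the first of these locations that is actually available. The helper below is a hedged sketch: pick_scratch_dir is a hypothetical name, and the check it performs (the directory exists and is writable) is an assumption about what "usable" means on your system.

```python
import os


def pick_scratch_dir(net_id, environ=None):
    """Return the first usable directory, checked from fastest to slowest."""
    environ = os.environ if environ is None else environ
    candidates = [
        environ.get("TMPDIR"),   # fastest; may be unset outside a job
        "/ptmp",
        f"/work/LAS/{net_id}",   # slowest of the three
    ]
    for path in candidates:
        # Skip unset entries and anything missing or not writable.
        if path and os.path.isdir(path) and os.access(path, os.W_OK):
            return path
    return None
```

For example, pick_scratch_dir("net-id") returns the value of $TMPDIR when that variable points at a writable directory, and None when none of the three locations can be used.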

Remember to use the data transfer node

For Condo, remember to use condodtn (condodtn.its.iastate.edu) to transfer data into and out of the Condo cluster.
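As an illustration, a transfer through the data transfer node might be scripted as follows. This sketch assumes that rsync over SSH is available on both ends and that you log in with your net-id; neither is confirmed here, and dtn_rsync_command is a hypothetical helper that only builds the command line.

```python
DTN_HOST = "condodtn.its.iastate.edu"  # host name given in the text above


def dtn_rsync_command(net_id, local_path, remote_path, to_condo=True):
    """Build an rsync argument list that moves data through the transfer node."""
    remote = f"{net_id}@{DTN_HOST}:{remote_path}"
    # Direction: local -> Condo when to_condo is True, otherwise Condo -> local.
    source, dest = (local_path, remote) if to_condo else (remote, local_path)
    # -a (archive mode) recurses into directories and preserves
    # timestamps and permissions.
    return ["rsync", "-a", source, dest]
```

The returned list can be passed to subprocess.run(..., check=True) from the machine that holds the local copy of the data.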

Learn how to mount Isilon on condodtn.

Questions you may have

Q.  This is inconvenient; why should I do this?

A. Reads and writes will not go over the network and will be much faster. Also, you will not be clogging up the network with traffic that should be local to a machine.

Q.  Won’t it just take the same amount of time to copy it to local storage and copy it back as it would to just run it from Isilon?

A.  In most cases, no. For jobs with significant I/O, the lower latency of the local storage will result in a large speedup.
