Diviner UCLA Account FAQ

This Frequently Asked Questions (FAQ) page covers user accounts on the Diviner UCLA computers.
  1. Which computers will I be using?

    You will be able to log in to and use these computers:

    luna1.diviner.ucla.edu - This is the primary computer you will use. It is a single node running CentOS 6. Single jobs can be compiled and run here, but if you intend to run more than 1-2 at a time, please use the div5 cluster!

    div5.diviner.ucla.edu - For running multiple single jobs as well as parallel processing jobs. It is a cluster front-end running the Rocks 6.1 Operating System. Here you will submit jobs to the Rocks cluster. Please note that since luna1 and div5 have subtly different operating systems, programs compiled on one may not work on the other.

    It's fine to compile on div5, but if you want to test your code by running it, please log in from div5 to one of the following compute nodes:

        compute-0-0
        compute-0-1
        compute-0-2
        compute-0-3
    
    You can see whether anyone else is running jobs on a node by doing:
        ps -ef | grep -v root
    
    If it is busy, you might need to look for a freer node.
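
    For example, here is a minimal pre-flight check of a node ('uptime' and 'ps' are standard tools; compute-0-0 is just the first node in the list above):

        ssh compute-0-0
        uptime                    # load averages near the CPU count mean the node is busy
        ps -ef | grep -v root     # see which users are running what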


  2. How do I login?

    You will login using SSH, either from the command line, e.g.:

         ssh username@luna1.diviner.ucla.edu

    or via an SSH client program. I recommend PuttyCyg for Windows.
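
    If you log in often, an entry in your client's ~/.ssh/config file (a standard OpenSSH feature) saves typing; USERNAME is a placeholder for your own username:

        Host luna1
            HostName luna1.diviner.ucla.edu
            User USERNAME

    After that, 'ssh luna1' is all you need.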


  3. How do I change my password?

    Use the command:

         passwd

    Be sure to change your password upon your first login, as any password given to you by the system administrators is only temporary!

    Note: Your passwords are not shared between luna1 and div5. Changing one does not affect the other, so be sure to log in to both computers and run 'passwd' on each.


  4. What are each of the computers used for?

    luna1 - Home directories. All small files (less than 50 megabytes) are backed up. Store any important small files like source code, docs, hand-edited things, etc. here. Try not to put big files here. This is the only disk array that gets partially archived.
    luna2 - Selected users have disk space here. Some luna1 backup.
    luna3 - Selected users have disk space here.
    luna4 - RDR Recalibration Effort.
    luna5 - Selected users have disk space here.
    luna6 - Selected users have disk space here.
    luna7 - luna1 backup.
    luna8 - FDS data, MCS data, some luna1 backup.
    luna9 - FDS data, backups of divdata.
    lunap - Archiving.
    div5 - Cluster front end.

    If you need more space for creating large amounts of data, email a request to Mark Sullivan.

  5. How do I copy files to/from these computers?

    Use "scp". You can run this remotely on our computers, or on your client computer (Cygwin prompt, or graphical SCP client). Example usage:

       Use 'scp' to copy files:
          scp USER@HOST:/path/to/file .
    
       or directories:
          scp -r USER@HOST:/path/to/dir .
    

    Type 'man scp' for more, or Google that string.

    If you want to regularly copy or back up files from a particular directory, it is recommended you use 'rsync'. 'scp -r' will always re-copy and overwrite files, whereas 'rsync' will only copy files if they differ. Here's an example, run from your home computer or a Cygwin prompt:

        rsync -avz USERNAME@luna1.diviner.ucla.edu:/path/to/dir .
    
    Unlike 'scp', an 'rsync' command can be restarted as necessary without losing progress. See 'man rsync' for more.
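
    'rsync' works in the other direction too. For example, to push a local directory up to luna1 (the paths here are placeholders):

        rsync -avz /path/to/dir USERNAME@luna1.diviner.ucla.edu:backups/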

  6. Is my data being backed up?

    Your data is NOT being backed up. While key components of the SOC (like source code, configuration, small home directory files) are backed up regularly, we do not have the capacity to back up most data.

    As with any account you own, anywhere, you should always make sure your important small files are copied to other computers/accounts you own and can safely protect. Be redundant. Put them on multiple safe computers.

        You are responsible for your own data

    What sorts of files should you backup?

        - Source code
        - Configuration files
        - Input files
        - Makefiles
        - Scripts
        - Documentation you wrote
        - Canonical information
        - Anything needed to regenerate your giant datasets.
    

    How do you do backups?

    Multiple programs will help you do this. You can use 'scp' as specified above. 'rsync' is also a good program for this as it can backup files incrementally, resuming where it left off if it got disconnected or something went down. 'rsync' may already be on your computer in a shell, terminal, Cygwin prompt, or the like. From your client computer, you can backup a whole directory like this:

        rsync -avz USERNAME@luna1.diviner.ucla.edu:/path/to/directory .
    
       (Note the . at the end, to copy it to your current directory)
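
    To make backups automatic, you can schedule the same 'rsync' command with cron on your client machine. A sketch (the schedule and paths are placeholders, and it assumes passwordless SSH keys are set up so the job can run unattended):

        # crontab entry: pull ~/src from luna1 every night at 2:00 AM
        0 2 * * * rsync -az USERNAME@luna1.diviner.ucla.edu:src/ $HOME/backups/src/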
    

  7. Where's the Diviner data?

    Our data can be accessed by logging in to your Linux computer accounts (luna1 and the div5 cluster) and via the web.

    Login to Linux computer accounts:

    Data: Level 0 EDR, 1-hour ASCII text
    Location: /q/marks/feidata/DIV:opsEdr/data
    Comments: ASCII data files are named *.TAB, and labels are *.LBL

    Data: Level 1A, 1-hour ASCII text
    Location: /q/marks/feidata/DIV:opsL1A/data
    Comments: ASCII data files are named *.TAB, and labels are *.LBL

    Data: Level 1b RDR, 1-hour ASCII text
    Location: /q/marks/feidata/DIV:opsRdr/data
    Comments: ASCII data files are named *.TAB.zip (use "unzip" to view), and labels are *.LBL

    Data: Level 1b RDR, 1-hour binary format, split and sorted by latitude and channel
    Location: /q/divdata/ees/e[1-9]*/marks (via the "divdata" program)
    Comments: Use the program divdata to pipe Diviner data into pipes programs.

    Data: "rdrs" - EDR/L1A/RDR plus Spice geometry, combined into individual records; 1-hour pipes-compatible binary format
    Location: /luna5/marks/rdrs_data
    Comments: See readme_rdrs.txt

    Data: Level 2 and 3 GDR, gridded data records
    Location: /q/marks/gdr_db (via the "divgdr" program)
    Comments: Use the program divgdr to pipe Diviner data into pipes programs.

    Web:

    Data: EDR and RDR documentation
    Location: Diviner Data Docs
    Comments: Software Interface Specification (SIS) documents.

    Data: All data, plus the Data Viewer and the Diviner Data Web Query
    Location: UCLA Diviner Data (requires password)

    We have a new tool for displaying and constraining RDR data, the Diviner Data Web Query.

    You must log in to your account to get the username and password. Log in to luna1, and type:

        cat /u/paige/marks/team_info/diviner_data.txt
    
    Do not email the information you find in the above file, or transmit it by any other insecure means! The data files are not yet meant for public consumption. Email is insecure; don't send anything sensitive over it.
    Data: RDR Processing Status
    Location: RDR Processing Status
    Comments: Shows the status of (re-)processing RDR data received from JPL, including calibration versions.


  8. Can I use the web server on luna1 to serve files from my home directory?

    Yes. Simply create a directory called "WWW" in your home directory and put your files there. The address to use is:

        http://luna1.diviner.ucla.edu/~USERNAME/
    
    (Note: Replace USERNAME with your username)
    By default, the above address will list all the files in your WWW directory. If you don't want that, create a file there called "index.html" to serve as the starting point.

    Note that your web pages will be visible to anyone. If you want to restrict access to your directory, you can password protect it by creating two files in that directory:

        .htaccess
        .htpasswd
    
    Note that they both start with a ".". How to make these files is well documented on the web. Here's one page that shows you how to do it:

    Password Protect a Directory with .htaccess
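
    As a rough sketch, the two files might look like this. The AuthUserFile path must be the full path to your own .htpasswd; the path below is a placeholder that assumes home directories live under /u/paige/ as other examples on this page suggest:

        .htaccess:
            AuthType Basic
            AuthName "Restricted"
            AuthUserFile /u/paige/USERNAME/WWW/.htpasswd
            Require valid-user

    Create the matching .htpasswd with the standard Apache 'htpasswd' utility (if it is installed):

        htpasswd -c .htpasswd USERNAME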


  9. How do I run graphical programs remotely?

    If your computer runs Linux, this should work pretty much automatically.

    For Windows, I recommend using Cygwin's "startx" to start an X-windows environment. Download and install Cygwin and make sure to install the "X11->xinit" package. From a Cygwin command prompt (preferably that of PuttyCyg, see above), type:

    	startx
    
    This should start an X-windows emulator window. Switch to that window, and in one of its terminals type:
    	ssh -Y USERNAME@luna1.diviner.ucla.edu
    
    The -Y switch is important! Once you have logged in this way, you can run graphical programs. Try 'xclock' as a simple test.


  10. What compilers do we have on this cluster?

    We have the GNU compilers 'gcc' (for C), 'g++' (for C++), and 'g77' (for Fortran 77), as well as 'ifort' (Intel Fortran 90) and 'icc' (Intel C/C++).

    On div5, the cluster front-end, you can compile MPI jobs using mpicc (C), mpiCC (C++), and mpif77 (Fortran 77). These are built off the Intel compilers and use OpenMPI as their parallel environment.
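
    For example, here are generic invocations with a placeholder source file:

        gcc -O2 -o myprog myprog.c        # C, GNU compiler
        ifort -O2 -o myprog myprog.f90    # Fortran 90, Intel compiler
        mpicc -O2 -o myprog myprog.c      # MPI C program, compiled on div5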


  11. How can I view/process the binary datasets?

    The program divdata initiates a flow of binary RDR data into a series of programs called "Pipes" - please view the Pipes Documentation.


  12. What are the names of the compute nodes that run my jobs?

    The compute nodes are named like so:

    compute-0-4
    compute-0-5
    compute-0-6
    ...
    

    When you submit a job to the scheduler (see below), it will be assigned to a subset of these nodes.

    The compute nodes use the private IP address space 10.255.255.* , and are reachable only from div5 and from each other. The number of nodes varies as we add and delete them - see the file /etc/hosts for an idea of how many there are.
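
    For a rough count of the nodes listed there (each node normally occupies one line):

        grep -c compute /etc/hosts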


  13. Where can I compile, debug, and test jobs for the div5 Rocks cluster?

    When not using the Rocks queue to run jobs, you can do this sort of thing on one of the first few nodes. We have four machines set aside for this purpose that are not in any queue. They are:

    compute-0-0
    compute-0-1
    compute-0-2
    compute-0-3
    

    Log in from div5 to any of these hosts to do your compiling, debugging, and testing. Please do NOT do this sort of thing on the front-end (div5) itself, as it will bog down the scheduler and reduce performance for everyone.


  14. What are the basic commands for submitting a job and checking its status?

    Most commands, such as checking a job's status or deleting a running job, must be run on the cluster front-end, div5. The lone exception is job submission: you can submit your jobs easily and with little setup by running the command "clusterit".

       clusterit COMMAND ARG1 ARG2 ...
    

    clusterit takes your command and arguments and produces a command script named COMMAND.cmd.[DATE-TIME info], then submits it to the "one" queue.
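
    For example, with a hypothetical program ./myprog:

        clusterit ./myprog input.dat

    This would generate and submit a script named something like myprog.cmd.[DATE-TIME info].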

    Normally, on a Rocks cluster, you would create your own job script containing job-control directives and submit it to the cluster with 'qsub'. You can still do this (a minimal script is sketched below), but "clusterit" simplifies the process.
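
    If you go the 'qsub' route, a minimal Grid Engine job script might look like this (the script name, job name, and program are hypothetical):

        #!/bin/bash
        #$ -cwd            # run the job from the directory you submit from
        #$ -S /bin/bash    # interpret the script with bash
        #$ -N myjob        # job name shown by qstat
        ./myprog input.dat

    Submit it from div5 with 'qsub myjob.sh'.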

    "clusterit" only works for simple commands, not for multi-processor MPI jobs. It runs from either luna1 or div5 with appropriate messages. It preserves your environment, including your current working directory and your PATH. It tells you the expected name of your output file and how to check the queue status. (Note: If you somehow missed this message, the output file that contains any unredirected standard output/error will be named COMMAND.cmd.[DATE-TIME info].o#####, where #'s are numbers).

    To check the status of your running job, ssh to div5 and run the command:

        qstat
    

    Note: If you compile your own software and intend to run it on the cluster, it is a good idea to compile it on one of our interactive compute nodes. Please see the FAQ item: "Where can I compile, debug, and test jobs for the div5 Rocks cluster?"


  15. Where can I find examples of how to submit MPI jobs to the cluster?

    In the directory /u/paige/marks/mpi_examples you will find instances of code that can be used as templates. There are examples for basic shell script, Perl, C, and f77.


  16. How do I see the overall status of the cluster, e.g. jobs, load, memory, usage, etc.?

    The Ganglia page on div5 provides a graphical view of the cluster status:

    div5 Cluster Status


  17. I am running an mpi job, but qstat won't show me which nodes it is running on, just the node where it started. How do I view this information?

    Use 'qstat -g t'.


  18. What other options can I specify to make my job run a certain way?

    See 'man qsub' for more information.

    For the best source of information on how to submit jobs and get the full use out of this cluster, check out the "N1 Grid Engine 6 Collection" link at the bottom of this page.


  19. How do I find out the amount of memory, swap space, and other similar characteristics of our compute nodes?

    Use the command 'qhost'. Without arguments, you will get the processor type, number of CPUs, load, total and used memory, and total and used swap space. You can get a longer listing with the -F option. There are other options to tailor your output; see 'man qhost' for more information.


  20. When I run qstat, the state of my job is listed as "Eqw", and the job never gets run. What does this mean?

    "Eqw" means that an error occurred when the system tried to queue or run your job. There can be many possible reasons why this happened. One really obvious thing to check is: do you have write permission in the directory in which your job should be run, i.e. did you attempt to run the job in someone else's home directory? This will usually result in an Eqw.

    In any event, you should clean up after yourself by removing your Eqw jobs from the queue so that others won't see the clutter.


  21. How do I remove my jobs from the queue?

    By typing 'qstat', you will see a list of all jobs in the queue. To see just your own jobs, do 'qstat -u USERID'. Each job has a number at the beginning of the line. To delete a specific job, use 'qdel NUMBER'. To delete ALL your jobs in the queue, use 'qdel -u USERID'.

    Note: Sometimes jobs just won't delete. This usually happens if they are running (as opposed to waiting) when they are deleted. In this case you should figure out which node(s) they are running on (use qstat or 'qstat -g t' (for mpi jobs)), login to the nodes and "kill" your processes. If this still doesn't terminate the job, try using the "force" option: 'qdel -f NUMBER'. If that doesn't work, contact one of the administrators below and have them do it as root.
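
    As a sketch of that manual cleanup (the node name is hypothetical; use the one qstat reports):

        qstat -g t              # find the node(s) the job is running on
        ssh compute-0-12        # log in to the reported node
        ps -ef | grep USERID    # find your process IDs
        kill PID                # kill them; try 'qdel -f NUMBER' from div5 if the job still lingers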


  22. Where can I find more information on the Rocks operating system, cluster documentation, user guides, etc.?

    div5 Cluster Home Page - Includes cluster status and other links.

    Sun Grid Engine - Basic information on how to set up, submit, and monitor jobs.

    N1 Grid Engine 6 Collection - The most complete information, check out the User's Guide.


  23. Who do I contact if I need help?

    Mark Sullivan   marks@mars.ucla.edu