All,

Presenting "divdata", the program.

This is the new method by which we read and process the Diviner RDR
dataset at UCLA.

The old method involved doing something like this:

    cat [a bunch of files from the /d* disks] | PIPES COMMANDS

Now, we start our command like this:

    divdata ARGUMENTS | PIPES COMMANDS

Two examples:

    divdata daterange=2010040000,2010040400 c=7,7 clat=10,30 cloctime=6,18 | pprint > output.txt

    divdata daterange=200912 c=5,5 clat=70,90 tb=0,100 | pgetranges > ranges

(see the Usage statement below this email, or type "divdata"
 with no arguments)

You don't have to mess with "cat", references to the /d disks,
descriptor files, or pcons.   divdata takes care of all that.
Pipe divdata into your favorite pipes program: pprint, pbin3d,
pgetranges, etc.

This program is fast, anywhere from two to hundreds of times faster
than the old method.   The program takes your constraints
and matches them against a database of minimum and maximum
values for each hour in the dataset.   How much of a performance
increase you get depends on how tightly you constrain your data.
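
As a rough illustration of the min/max matching idea (this is not the
actual divdata code), the Python sketch below skips any hour whose stored
range cannot overlap a constraint.  The index layout, the hour keys, and
the names hour_may_match and select_hours are all assumptions made for the
example, and only inclusive (MIN<=MAX) constraints are handled.

```python
from typing import Dict, List, Tuple

Range = Tuple[float, float]

def hour_may_match(hour_stats: Dict[str, Range],
                   constraints: Dict[str, Range]) -> bool:
    """Return False only when the hour provably holds no matching data."""
    for field, (lo, hi) in constraints.items():
        if field not in hour_stats:
            continue  # no index entry for this field: cannot rule the hour out
        hmin, hmax = hour_stats[field]
        if hmax < lo or hmin > hi:
            return False  # stored range and requested range do not overlap
    return True

def select_hours(index: Dict[str, Dict[str, Range]],
                 constraints: Dict[str, Range]) -> List[str]:
    """List the hours that still have to be read from disk."""
    return [hour for hour, stats in sorted(index.items())
            if hour_may_match(stats, constraints)]

# Hypothetical index: one (min, max) pair per field per hour.
index = {
    "2010040100": {"clat": (-85.0, -20.0), "cloctime": (0.0, 5.9)},
    "2010040101": {"clat": (5.0, 40.0), "cloctime": (6.5, 7.2)},
    "2010040102": {"clat": (25.0, 88.0), "cloctime": (19.0, 23.0)},
}
constraints = {"clat": (10.0, 30.0), "cloctime": (6.0, 18.0)}

selected = select_hours(index, constraints)
print(selected)
```

Tighter constraints rule out more hours up front, which is why the
speedup varies so much from query to query.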

You may see some differences in your data.   Since the
new dataset stores most numbers as four-byte floats
(as opposed to eight-byte doubles), a number may
sometimes be off by about 0.001 from the previous dataset.
This is expected.   If you see a much greater difference
than this, please let me know.
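
To get a feel for the size of this effect, here is a small Python sketch
that rounds a double to the nearest four-byte float and reports the
change.  The sample values are made up and the helper name to_float32 is
an assumption; nothing here reads the real dataset.  Rounding to a
four-byte float introduces a relative error of at most 2^-24 (about 6e-8),
which may be enough to flip the last printed digit of a value shown to
three decimals.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (8-byte double) to the nearest 4-byte float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Made-up sample values, for illustration only.
samples = [273.15, 95.123, 12345.678]
errors = [abs(to_float32(v) - v) for v in samples]

for v, err in zip(samples, errors):
    print(f"{v!r:>12} -> {to_float32(v)!r:<20} abs error {err:.2e}")
```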

There may be bugs in this program.   Dave has advocated
having you, the Science Team, be our loyal beta testers.
Scrutinize your output.   Do you get the same results as before?
It is vital that we uncover any bugs so I can fix them.

It is very important that you compare your results
with those of the previous method.   To this end,
I've created a script that compares the
ranges and number of data points using the
old and new methods.  This script is:

    /u/marks/c38/rel/comp2

That creates two ranges files, ran.1 and ran.2.
I then run "diff ran.1 ran.2" to see whether anything changed
beyond the precision difference we expect.
You can use this as a template or do your own thing.
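
Since a plain diff also flags the harmless precision differences, a
tolerance-aware comparison can help.  The Python sketch below is one way
to do it.  It assumes the ranges files are whitespace-separated text with
the same number of tokens per line, which may not match the real
pgetranges output, and the 0.0015 tolerance and the function names are
likewise assumptions.

```python
import math
from itertools import zip_longest
from typing import List, Tuple

def lines_differ(a: str, b: str, tol: float) -> bool:
    """True if two lines differ by more than tol in any numeric token."""
    ta, tb = a.split(), b.split()
    if len(ta) != len(tb):
        return True
    for x, y in zip(ta, tb):
        if x == y:
            continue  # identical text, nothing to check
        try:
            fx, fy = float(x), float(y)
        except ValueError:
            return True  # non-numeric tokens must match exactly
        if not math.isclose(fx, fy, rel_tol=0.0, abs_tol=tol):
            return True
    return False

def compare(old_lines: List[str], new_lines: List[str],
            tol: float = 0.0015) -> List[Tuple[int, str, str]]:
    """Return (line number, old line, new line) for each real difference."""
    diffs = []
    pairs = zip_longest(old_lines, new_lines, fillvalue="")
    for n, (a, b) in enumerate(pairs, start=1):
        if lines_differ(a, b, tol):
            diffs.append((n, a.rstrip("\n"), b.rstrip("\n")))
    return diffs

# Made-up file contents: field name, min, max, number of points.
old = ["clat 10.000 30.000 1520", "tb 95.123 240.500 1520"]
new = ["clat 10.000 30.001 1520", "tb 95.123 241.500 1520"]

diffs = compare(old, new)
for n, a, b in diffs:
    print(f"line {n}: {a!r} != {b!r}")
```

To run it on real files, pass open("ran.1").readlines() and
open("ran.2").readlines() in place of the two lists.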

Other things:

* The daterange=BEGIN,END argument is flexible.  Each end can be a month
  (YYYYMM), a day (YYYYMMDD), or an hour (YYYYMMDDHH).
  See the usage statement below this email.

* There is a debugging mode.   Use "debug=1" to get more info
  on what the program is doing.  You can even "debug=2" to
  get extra verbosity.

* It is useful to sometimes run it in "noindex" mode (see below),
  as this bypasses all the min,max checking and gives you
  all the files in your date range.  This slows the program down,
  but it is a useful sanity check if you think you are not
  getting all the data you requested.

* I've yet to incorporate this into the Web Query Tool, so
  that tool still runs pretty slowly by comparison.
  I will do this fairly soon.

* divdata fields    ---> Shows you all the fields you can constrain on.

Scrutinize your output, make sure you get the same number of data
points and numbers as before, and send me any bugs you find.

-Mark Sullivan

-------------------------------------------------------------------
THE MANUAL
-------------------------------------------------------------------

Quick info:

divdata      (No arguments, prints this usage statement)

divdata [type=datatype] fields   (to just print out selectable fields)

----

Piping data into other pipes commands:

divdata [type=datatype] [noindex] daterange=BEGIN,END [clat=MIN,MAX] [c=MIN,MAX]
           [FIELD=MIN,MAX FIELD=MIN,MAX ...] | PIPES_COMMANDS ...

   BEGIN and END for daterange can each be in any of the following formats:
       YYYYMM     - A month, gets you all the days in that month.
       YYYYMMDD   - A day,   gets you all the hours in that day.
       YYYYMMDDHH - An hour, gets you all the minutes in that hour.
   If BEGIN and END are equal, e.g. 200907, you can just use: daterange=200907
   Multiple daterange=BEGIN,END arguments specify disjoint times.
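
   One reading of these rules, sketched in Python: BEGIN expands to the
   start of its month, day, or hour, and END expands to the end of its
   month, day, or hour.  The function names and the half-open
   (start, stop) return convention are assumptions for illustration, not
   divdata internals.  The sketch also validates against the calendar,
   which may be stricter than divdata itself.

```python
import calendar
from datetime import datetime, timedelta
from typing import Tuple

def unit_start(token: str) -> datetime:
    """Start of the month, day, or hour named by the token."""
    if len(token) not in (6, 8, 10) or not token.isdigit():
        raise ValueError(f"expected YYYYMM, YYYYMMDD or YYYYMMDDHH: {token!r}")
    year, month = int(token[:4]), int(token[4:6])
    day = int(token[6:8]) if len(token) >= 8 else 1
    hour = int(token[8:10]) if len(token) == 10 else 0
    return datetime(year, month, day, hour)

def unit_stop(token: str) -> datetime:
    """First instant after the month, day, or hour named by the token."""
    start = unit_start(token)
    if len(token) == 6:
        days = calendar.monthrange(start.year, start.month)[1]
        return start + timedelta(days=days)
    if len(token) == 8:
        return start + timedelta(days=1)
    return start + timedelta(hours=1)

def parse_daterange(value: str) -> Tuple[datetime, datetime]:
    """Turn 'BEGIN,END' or a lone 'BEGIN' into a half-open time span."""
    begin, _, end = value.partition(",")
    return unit_start(begin), unit_stop(end or begin)

whole_month = parse_daterange("200912")
mixed = parse_daterange("20100401,2010040400")
print(whole_month)
print(mixed)
```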

   Other fields (only one instance of each is allowed, except for 'c'):

   clat=MIN,MAX  - Center latitude of observation, greatly improves performance

   c=MIN,MAX     - Channel number, greatly improves performance
                   You can specify multiple arguments for this, e.g.:
                       c=1,1 c=5,6 c=8,9
                   but don't mix inclusive (MIN<=MAX) with exclusive (MIN>MAX)

   FIELD=MIN,MAX - Any other FIELD in the dataset,
                       moderately improves performance

   type=DATATYPE - Output data format.  Default is 'div38'.

   noindex - Do not use indexing to match data constraints.
             Significantly SLOWS DOWN your data access.
             Use only for debugging, a sanity check to make
             sure you are getting all your data.   Using this
             option *should* not alter your results except in
             terms of speed.  Let us know otherwise.

   nodel - Do not delete the catfile this program creates.

   debug=N - Debug level where N is one of:
               0 - Normal, only high level messages.
               1 - Detailed
               2 - Extra detailed.

             All debugging messages are printed to standard error.

A note on constraints:

   When specifying a MIN,MAX, if MIN<=MAX, you get all the data
   between MIN and MAX, inclusively.   If MIN>MAX, you get all
   the data OUTSIDE of the inclusive range from MAX to MIN.   Examples:

   clat=-70,50 - All latitudes between -70 and 50, inclusively.
   clat=50,-70 - (-90,-70.0000000001) + (50.0000000001,90)

   c=3,3 - Channel 3 only.
   c=3,5 - Channels 3,4,5
   c=5,3 - Channels 1,2,6,7,8,9
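
The rule above can be written as a short Python predicate.  This is a
sketch of the documented behavior only, not divdata's own code.  The
function name matches is an assumption, and the channel list 1 through 9
is taken from the c=5,3 example.

```python
def matches(value: float, cmin: float, cmax: float) -> bool:
    """Apply a FIELD=MIN,MAX constraint to a single value."""
    if cmin <= cmax:
        # Inclusive form: keep values inside [MIN, MAX].
        return cmin <= value <= cmax
    # Exclusive form (MIN > MAX): keep values outside [MAX, MIN].
    return value < cmax or value > cmin

channels = list(range(1, 10))  # channels 1..9, as in the examples above

only_3 = [c for c in channels if matches(c, 3, 3)]
three_to_five = [c for c in channels if matches(c, 3, 5)]
outside = [c for c in channels if matches(c, 5, 3)]

print(only_3, three_to_five, outside)
```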