All,

Presenting "divdata", the program. This is the new method by which we read and process the Diviner RDR dataset at UCLA.

The old method involved doing something like this:

    cat [a bunch of files from the /d* disks] | PIPES COMMANDS

Now, we start our command like this:

    divdata ARGUMENTS | PIPES COMMANDS

Two examples:

    divdata daterange=2010040000,2010040400 c=7,7 clat=10,30 cloctime=6,18 | pprint > output.txt
    divdata daterange=200912 c=5,5 clat=70,90 tb=0,100 | pgetranges > ranges

(See the usage statement below this email, or type "divdata" with no arguments.)

You don't have to mess with "cat", references to the /d disks, descriptor files, or pcons; divdata takes care of all that. Pipe divdata into your favorite pipes program: pprint, pbin3d, pgetranges, etc.

This program is fast: anywhere from two to hundreds of times faster than the old method. It takes your constraints and matches them against a database of minimum and maximum values for each hour in the dataset, so it can skip hours that cannot contain matching data. How much of a performance increase you get depends on how tightly you constrain your data.

You may see some differences in your data. Since the new dataset stores most numbers as four-byte floats (as opposed to eight-byte doubles), a number may sometimes be off by 0.001 from the previous dataset. This is expected. If you see a much greater difference than this, please let me know.

There may be bugs in this program. Dave has advocated having you, the Science Team, be our loyal beta testers. Scrutinize your output. Do you get the same results as before? It is vital that we uncover any bugs so I can fix them.

It is very important that you compare your results with those of the previous method. To this end, I've created a script that compares the ranges and number of data points using the old and new methods. This script is:

    /u/marks/c38/rel/comp2

It creates two ranges files, ran.1 and ran.2. I then run "diff ran.1 ran.2" to see whether anything changed beyond the precision difference we expect.
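A plain diff flags every textual change, including the expected float-versus-double noise. If you pull the matched values out yourself, the check "beyond the precision difference we expect" can be sketched as below. This is a hypothetical Python helper, not part of divdata or comp2: the names as_float32 and differing_indices are illustrative, and it assumes the old and new values are already loaded as lists aligned record by record. The 0.001 tolerance is the figure quoted above.

```python
from __future__ import annotations

import math
import struct


def as_float32(x: float) -> float:
    """Round-trip a Python float (an 8-byte double) through a 4-byte IEEE float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def differing_indices(old: list[float], new: list[float], tol: float = 1e-3) -> list[int]:
    """Return the positions where old and new differ by more than tol.

    A length mismatch means the number of data points changed, which is
    always worth reporting, so it raises instead of comparing values.
    """
    if len(old) != len(new):
        raise ValueError(f"number of data points differs: {len(old)} vs {len(new)}")
    return [
        i
        for i, (a, b) in enumerate(zip(old, new))
        # Absolute tolerance only: |a - b| <= tol counts as "the same".
        if not math.isclose(a, b, rel_tol=0.0, abs_tol=tol)
    ]
```

For example, differing_indices([1.0, 2.0], [1.0004, 2.5]) reports only index 1. You may want a tighter or looser tolerance for fields with very large or very small magnitudes.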
You can use the comp2 script as a template or do your own thing.

Other things:

* The daterange=BEGIN,END argument is flexible. You can use a month (YYYYMM), a day (YYYYMMDD), or an hour (YYYYMMDDHH). See the usage statement below this email.

* There is a debugging mode. Use "debug=1" to get more info on what the program is doing. You can even use "debug=2" for extra verbosity.

* It is sometimes useful to run it in "noindex" mode (see below), as this bypasses all the min,max checking and gives you all the files in your date range. This slows the program down, but it is a useful sanity check if you think you are not getting all the data you requested.

* I have yet to incorporate this into the Web Query Tool, so that still runs pretty slowly by comparison. I will do this fairly soon.

* divdata fields ---> Shows you all the fields you can constrain on.

Scrutinize your output, make sure you get the same number of data points and the same numbers as before, and send me any bugs you find.

-Mark Sullivan

-------------------------------------------------------------------
THE MANUAL
-------------------------------------------------------------------

Quick info:

    divdata                           (No arguments, prints this usage statement)
    divdata [type=datatype] fields    (To just print out selectable fields)

----

Piping data into other pipes commands:

    divdata [type=datatype] [noindex] daterange=BEGIN,END [clat=MIN,MAX] [c=MIN,MAX]
            [FIELD=MIN,MAX FIELD=MIN,MAX ...] | PIPES_COMMANDS ...

BEGIN and END for daterange can be in the following formats:

    YYYYMM      - A month, gets you all the days in that month.
    YYYYMMDD    - A day, gets you all the hours in that day.
    YYYYMMDDHH  - An hour, gets you all the minutes in that hour.

If BEGIN and END are equal, e.g. 200907, you can just use:

    daterange=200907

Multiple daterange=BEGIN,END arguments specify disjoint times.
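To make the daterange rules concrete, here is a hypothetical Python sketch that expands a daterange argument into hour stamps, the unit the min/max index is kept in. It is not divdata's code: it assumes END covers its whole period (as "daterange=200907" returning the full month suggests) and that multiple daterange arguments are combined as a union. It validates calendar dates strictly, which the real program may not do; the function names are illustrative only.

```python
from __future__ import annotations

import calendar
from datetime import datetime, timedelta


def period_bounds(stamp: str) -> tuple[datetime, datetime]:
    """Return (start, end) of a YYYYMM, YYYYMMDD or YYYYMMDDHH stamp; end is exclusive."""
    if not stamp.isdigit() or len(stamp) not in (6, 8, 10):
        raise ValueError(f"bad daterange value: {stamp!r}")
    year, month = int(stamp[0:4]), int(stamp[4:6])
    if len(stamp) == 6:  # a whole month
        start = datetime(year, month, 1)
        return start, start + timedelta(days=calendar.monthrange(year, month)[1])
    day = int(stamp[6:8])
    if len(stamp) == 8:  # a whole day
        start = datetime(year, month, day)
        return start, start + timedelta(days=1)
    start = datetime(year, month, day, int(stamp[8:10]))  # a single hour
    return start, start + timedelta(hours=1)


def hours_in_daterange(arg: str) -> list[str]:
    """Expand 'BEGIN,END' (or a lone 'BEGIN') into YYYYMMDDHH hour stamps."""
    begin, _, end = arg.partition(",")
    start, _ = period_bounds(begin)
    _, stop = period_bounds(end or begin)  # END's whole period is included
    hours = []
    t = start
    while t < stop:
        hours.append(t.strftime("%Y%m%d%H"))
        t += timedelta(hours=1)
    return hours


def hours_for_dateranges(args: list[str]) -> list[str]:
    """Union of several daterange arguments, sorted and de-duplicated."""
    return sorted({h for arg in args for h in hours_in_daterange(arg)})
```

With these assumptions, daterange=200907 expands to 744 hours (31 days x 24), and BEGIN and END may use different granularities, e.g. "200907,20090802".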
Other fields (except for 'c', only one instance of each is allowed):

    clat=MIN,MAX   - Center latitude of observation; greatly improves performance.
    c=MIN,MAX      - Channel number; greatly improves performance.
                     You can specify multiple arguments for this, e.g.:
                         c=1,1 c=5,6 c=8,9
                     but don't mix inclusive (MIN<=MAX) with exclusive (MIN>MAX) ranges.
    FIELD=MIN,MAX  - Any other FIELD in the dataset; moderately improves performance.
    type=DATATYPE  - Output data format. Default is 'div38'.
    noindex        - Do not use indexing to match data constraints. This significantly
                     SLOWS DOWN your data access. Use it only for debugging, as a sanity
                     check to make sure you are getting all your data. Using this option
                     *should* not alter your results except in terms of speed. Let us
                     know if it does.
    nodel          - Do not delete the catfile this program creates.
    debug=N        - Debug level, where N is one of:
                         0 - Normal, only high-level messages.
                         1 - Detailed.
                         2 - Extra detailed.
                     All debugging messages are printed to standard error.

A note on constraints:

When specifying MIN,MAX, if MIN<=MAX, you get all the data between MIN and MAX, inclusive. If MIN>MAX, you get all the data OUTSIDE of the inclusive MIN,MAX range.

Examples:

    clat=-70,50  - All latitudes between -70 and 50, inclusive.
    clat=50,-70  - (-90,-70.0000000001) + (50.0000000001,90)
    c=3,3        - Channel 3 only.
    c=3,5        - Channels 3,4,5
    c=5,3        - Channels 1,2,6,7,8,9
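The MIN,MAX rules above, together with the per-hour min/max index described in the email, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not divdata's implementation: the function names and the dictionary layout of the per-hour statistics are hypothetical, and it assumes several ranges on one field (e.g. c=1,1 c=5,6) are OR'ed together while different fields are AND'ed.

```python
from __future__ import annotations


def value_matches(value: float, lo: float, hi: float) -> bool:
    """Record-level test: inclusive if lo <= hi, otherwise outside [hi, lo]."""
    if lo <= hi:
        return lo <= value <= hi
    # MIN > MAX: keep everything strictly outside the inclusive [MAX, MIN] band.
    return value < hi or value > lo


def hour_may_match(hour_min: float, hour_max: float, lo: float, hi: float) -> bool:
    """Index-level test: could any value in [hour_min, hour_max] satisfy (lo, hi)?"""
    if lo <= hi:
        return hour_max >= lo and hour_min <= hi  # the two intervals overlap
    return hour_min < hi or hour_max > lo  # part of the hour lies outside the band


def keep_hour(
    hour_stats: dict[str, tuple[float, float]],
    constraints: dict[str, list[tuple[float, float]]],
) -> bool:
    """Keep an hour only if every constrained field could match within it."""
    for field, ranges in constraints.items():
        if field not in hour_stats:
            continue  # no index entry for this field, so it cannot rule the hour out
        hmin, hmax = hour_stats[field]
        if not any(hour_may_match(hmin, hmax, lo, hi) for lo, hi in ranges):
            return False
    return True
```

Because an hour's min and max only bound its values, the index check is conservative: a kept hour may still contain no matching records, so the record-level test is still needed afterward. That is also why "noindex" should change only the speed, not the results.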