Storr Consulting - sc510-ada013



Last update: 7 January 2005

READ / FIND / HISTOGRAM

What is better?

People often ask whether it is better to use READ, READ(1), FIND, FIND(1), or HISTOGRAM. Kelly Jones, an ADABAS expert I have known since the 80s, explained the differences very well.

ADABAS, as of 7.4.2, does not store the total number of ISNs for each value in the inverted list. A Normal Index block contains the value, the count of ISNs in that block, and a list of ISNs, but that count is not the total number of records with the same index value. When a value's ISN list spans several blocks, the total can only be obtained by reading all of them.

As a test I ran a HISTOGRAM(1) against a descriptor (single field) which returned the value 10,200 in *NUMBER. As reported by APAS, the program issued 17 Associator I/Os the first time I ran it, 11 on a rerun a few minutes later, and 0 on an immediate rerun after that.

Based on discussions over the years, there is no Golden Rule when attempting to choose between READ(1), HISTOGRAM(1) and FIND NUMBER.

All commands will require I/O (at least logical) for

FCB
FDT
Upper Indices
First Normal Index block with an index-value >= that specified.

Note that any or all of these blocks may already be in memory and not require a physical I/O. The file I ran the test against is seldom used in our test database, so the first run probably had to physically read every block, and some of those blocks remained in memory for a while.

Differences between the commands include some of the following.

A READ(1) and a HISTOGRAM(1) will always be followed by an RC command. Both commands establish an entry in the table of sequential commands and an entry in the format pool; these entries are released by the RC command.
The HISTOGRAM(1) and FIND NUMBER will require an I/O for every Normal Index (N/I) block containing one or more ISNs with the index value being returned. It is also possible that multiple Upper Index blocks will be read.
The READ(1) will require one (logical) I/O to the Address Converter and one to Data Storage.
The FIND NUMBER will require I/O to Work if the number of ISNs is large.

Examples of when the different commands are useful include

When a lot of records probably exist for a specific value

   MOVE FALSE TO #RECORDS-EXIST-FLAG
   READ (1) EMPLOYEE BY MARITAL-STATUS FROM 'x' THRU 'x'
      MOVE TRUE TO #RECORDS-EXIST-FLAG
   END-READ

When doing an edit check using a unique key value

   FS. FIND NUMBER STATE-TABLE WITH STATE-CODE = 'xx'
   IF *NUMBER(FS.) = 0
      REINPUT 'INVALID STATE CODE'
   END-IF

When looking for the 'next' value or a list of values

   HISTOGRAM (1) EMPLOYEE DESCENDING JOBCODE-SALARY-KEY
     WRITE 'HIGHEST SALARY FOR JOB-CODE' #JOB-CODE 'IS' #SALARY
           'BEING PAID TO' *NUMBER 'EMPLOYEES'
   END-HISTOGRAM
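
The example above covers the 'next' value case. For the 'list of values' case, a minimal sketch might look like the following. It assumes a view of EMPLOYEE that contains only the descriptor MARITAL-STATUS used earlier; the view layout is an assumption, not taken from a real file.

   HISTOGRAM EMPLOYEE FOR MARITAL-STATUS     /* no limit: walk every value in the index
     DISPLAY MARITAL-STATUS *NUMBER          /* one line per distinct value and its record count
   END-HISTOGRAM

Because HISTOGRAM works from the inverted list, the I/O considerations for Normal Index blocks described above apply to each value returned.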

Jim Wisdom from Boston University added some excellent remarks on this issue:

Without knowing the specific data on any file, the question we are trying to answer is "Is record X on a file" not "Is record X on a file for which I may write the program 'look up' routine one way for file A but another way for file B". The READ solution should never require more than 7 physical I/Os. Any other Natural approach can lead to easily requiring more, depending on the data. How many programmers walk around knowing whether or not ISN lists span blocks? Or how these lookups work when the pointer you are looking for is NOT in the list? Since I don't want to be intrinsically involved in specific data records nor have the program work better in 2004 than it will in 2005, I have always concluded that the overall best approach is READ (1) or READ with ESCAPE.
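
The "READ with ESCAPE" variant mentioned above is not shown in the examples. A minimal sketch might look like the following. It reuses the EMPLOYEE view, the MARITAL-STATUS descriptor and #RECORDS-EXIST-FLAG from the first example; how those are defined is assumed here.

   MOVE FALSE TO #RECORDS-EXIST-FLAG
   READ EMPLOYEE BY MARITAL-STATUS STARTING FROM 'x'
      IF MARITAL-STATUS = 'x'                /* first record at or after 'x' really has the value
         MOVE TRUE TO #RECORDS-EXIST-FLAG
      END-IF
      ESCAPE BOTTOM                          /* leave the loop after the first record either way
   END-READ

Like READ (1), this examines only the first record in descriptor sequence, so the lookup is written the same way regardless of how many records share the value.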
