Home The Company Publications Products Links Tips Jobs

Compression

ADABAS, DB2 Viper, Oracle, MS SQL Server

By Dieter W. Storr

Last update: 6 June 2007

General

Different databases compress their records in different ways and with different success. But ADABAS is top. The ADABAS values are not based on the same files/tables:
Databases      Compression Rates
---------      -----------------
ADABAS:        60% - 80% (92.5%)
Oracle:        18% - 38%
DB2 Viper:     58% - 60%
MS SQL Server: none 

ADABAS

Data Compression

Fields (columns) in an ADABAS file (table) are normally compressed. ADABAS eliminates all trailing blanks from alphanumeric fields and all leading zeros in numeric fields. As a result, unpacked fields are packed, multiple empty fields within a record are compressed, and all empty fields at the end of a record are completely suppressed.

ADABAS automatically compresses the data on field and record level.

  • Trailing blanks are removed for fields defined with A format.
  • Leading zeros are removed for numeric fields (fields defined with B, F, P or U format).
  • If the field is defined with U (unpacked) format, the value is converted to packed (P) format.
  • Trailing zeros in floating-point (G format) fields are removed.
  • If the field is defined with the NU option and the value is a null value, a one-byte indicator is stored. Hexadecimal 'C1' indicates one empty field follows, 'C2' indicates that two empty fields follow, and so on, up to a maximum of 63 before the indicator byte is repeated. For SQL null value (NC option field) compression, see Representing SQL Null Values in the COMPRESS Essential Data Definition Syntax section.
  • Empty fields located at the end of the record are not stored, and therefore not compressed.
Source: ADABAS Design Manual (Software AG)

ADABAS compresses files between 60% and 80%. Depends on the field attributes, the compression rate can be more than 80%, for example files with many alpha, MU and PE fields.

I unloaded (ADAUNLD) a file/table with many fields/columns in one PE group, decompressed (ADACMP DECOMPRESS), and compressed (ADACMP COMPRESS).

The report of the compress process for this file showed:

COMPRESS Processing Statistics (data only -- not index):
--------------------------------------------------------

Number of records read              8,489,961
Number of incorrect records                 0
Number of Compressed records        8,489,961
  
Raw data                       47,678,957,988 bytes  
Compressed data                 3,776,461,073 bytes  
Compression rate                         92.5 %      
Largest Compressed Record               1,781 bytes  

To see the entire output of the utilties, please click here.

Some data compression examples from other companies, published on SAG-L:

- Customer A = 60 %
- Customer B = 86.4% -- number of records = 3,981,371
                        largest compressed record = 292 bytes
- Customer C = 77.8% -- number of records = 2,999,582 
                        largest compressed record = 346 bytes
Index Compression

In addition to the standard index (ASSO) compression, ADABAS version 7 uses Forward Index Compression (parameter INDEXCOMPRESSION=YES). Within an index block, the part of the index value that is identical to the forward part of the previous index value is suppressed: for the Normal Index approx. up to 70%.

Oracle

Chris Eaton, Product Manager, IBM, writes:

Oracle of course has table compression (and has had it for a while now). However, there is a big difference in the implementation which appears to result in big differences in the compression rates. In Oracle, compression occurs at the data block (page) level. So as you use the bulk loader, Oracle will look for repeating column values on the page and replace those values with a key. The key is stored in the page header and a pointer is stored in the row on that page. The problem with this method is that not only is it not looking for larger patterns that may occur across page boundaries but if there is a very common string, it is stored multiple times in every page header that contains that row value.

TPC-H is a well-known data warehouse benchmark in which each vendor uses the same tables and same data. Oracle published a paper at the VLDB conference in 2003 on their compression capabilities. In this paper they showed compression rates for TPC-H tables.

Lineitem table:  38% compression
Orders table:    18% compression
Entire database: 29% compression

More on Oracle's Web site

More on Ordix News 1/2007 about Oracle compression.
More on Ordix News 3/2006 about Oracle compression.
Here is an abstract and a translation from German to English:
"Oracle knows Index and Data Compression. Both are based on a table of values in the block header. It really looks like the forward compression of ADABAS. Regarding the data, multiple attributes (fields) might be replaced and this can be very costly. ORDIX measured during their Oracle data compression benchmarks an increase of nearly four times for both CPU and PGA memory." And as a user wrote, he is using the forward index compression over the data compression because of its really efficient algorithm regarding performance and space.

DB2 Viper

Chris Eaton, Product Manager, IBM, writes:

In DB2 Viper, we look at the entire table to determine the repeated patterns, store them once, and then compress those values out of each page. In addition, DB2 will honor compression keys during insert and load while only the direct path loader in Oracle supports compression. Do any of these internal differences matter? They wouldn't if they produced the same results. But from publicly available compression results there is a big difference. Oracle license terms do not allow anyone to publish benchmarks without their prior consent so I can't tell you the differences that other customers are seeing. But I can tell you what Oracle published themselves and how that compares to DB2 Viper.

TPC-H is a well-known data warehouse benchmark in which each vendor uses the same tables and same data. Oracle published a paper at the VLDB conference in 2003 on their compression capabilities. In this paper they showed compression rates for TPC-H tables. IBM ran the same tests on the same tables and here are the comparison results:

Lineitem table:  58% compression
Orders table:    60% compression
Entire database: 59% compression

Microsoft SQL Server

Chris Eaton, Product Manager, IBM, writes:

Compression in Microsoft SQL Server doesn't exist (at least not natively). In fact, according to support article, placing a SQL Server database on compressed volumes is not recommended and not supported. I have seen some references to using .Net compression libraries to call out from the database to some application code to do some compression but I don't think that's the same as DB2, Oracle or any other vendor's table compression.

To read the entire BLOG about compression of Oracle, DB2 Viper and MS SQL Server, click here

Top Page


Back to ADABAS Tips, Tricks, Techniques -- Overview