This chapter explains how Analytic Services stores and compresses data on disk.
This chapter includes the following sections:
Analytic Services uses a data file to store data blocks. By default, a data file is located in its associated database folder. Data files follow the naming convention essn.pag, where n is greater than or equal to one and less than or equal to 65,535.
Analytic Services uses an index file to store the index for a database. By default, an index file is located in its associated database folder. Index files follow the naming convention essn.ind, where n is greater than or equal to one and less than or equal to 65,535.
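For illustration only, the following Python sketch (not part of Analytic Services) generates file names that follow this convention:

def essbase_file_name(n, kind):
    # kind is "ind" for an index file or "pag" for a data file
    if not 1 <= n <= 65535:
        raise ValueError("n must be between 1 and 65,535")
    return f"ess{n:05d}.{kind}"

print(essbase_file_name(1, "pag"))   # ess00001.pag
print(essbase_file_name(2, "ind"))   # ess00002.ind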
Analytic Services automatically allocates storage for data and index files. You can use disk volumes to control how storage is allocated for these files.
To specify disk volumes so that you control how storage is allocated, use this procedure:
To view index file (.ind file) and data file (.pag file) names, counts, sizes, and totals, and to determine whether each file is presently open in Analytic Services, use either of the following methods:
Tool | Topic | Location
---|---|---
Note: The file size information that is provided by the Windows NT operating system for index and data files that reside on NTFS volumes may not be accurate. The file size information provided by Administration Services and by LISTFILES is accurate.
Use disk volumes to specify where you want to store Analytic Services index files (essn.ind) and data files (essn.pag). If you do not use the disk volumes setting, Analytic Services stores data only on the volume where the ARBORPATH directory resides. If the ARBORPATH variable is not set, Analytic Services stores data only on the volume where the server was started. For instructions on setting ARBORPATH, see the Essbase Analytic Services Installation Guide.
Note: For information about how to check the size of the index and data files, see Checking Index and Data File Sizes.
You can specify disk volumes using Administration Services, MaxL, or ESSCMD. When you use disk volumes, Analytic Services provides the following options for each volume:
For example, when ess00001.ind is filled to its maximum file size, Analytic Services creates ess00002.ind.
Caution: If you specify a volume name but not a volume size, Analytic Services uses all available space on the volume.
Analytic Services creates new data files and index files in either of these situations:
For example, suppose you want to use up to 12 GB for Analytic Services files on volume E, 16 GB on volume F, and 16 GB on volume G. Analytic Services creates a new file on volume F when the sizes of the index and data files reach 12 GB on volume E and more data needs to be written out to disk.
On volume G, Analytic Services creates file ess00001.ind and fills it to the default limit of 2 GB. On volume G, Analytic Services creates file ess00001.pag and fills it to 1 GB.
You have specified a limit of 16 GB on volume G, and you have used 3 GB. You have 13 GB left to use on volume G, but ess00001.ind has reached the maximum file size of 2 GB. The next time Analytic Services needs storage space when writing index files to disk, Analytic Services creates a new file on volume G and names it ess00002.ind. Analytic Services then fills ess00002.ind to its 2 GB limit and creates ess00003.ind. Analytic Services follows the same procedures for data files.
Figure 230: Example of How Analytic Services Stores Files Across Volumes
Analytic Services names files consecutively, starting with ess00001.xxx, where xxx is ind for an index file and pag for a data file, and continuing up to ess65535.xxx. This naming convention applies to each volume, so in the above example, volumes E, F, and G each have files named ess00001.pag and ess00001.ind.
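To make the arithmetic in this example concrete, here is a rough Python sketch of the rollover behavior described above (an illustration only, not Analytic Services code; it tracks a single file type and assumes the 2 GB default file size limit):

GB = 2**30

def allocate(total_needed, volumes, file_limit=2 * GB):
    # volumes: list of (name, capacity_in_bytes) pairs, used in order.
    # Returns (volume, file name, bytes) for each file created.
    files = []
    remaining = total_needed
    for name, capacity in volumes:
        used, n = 0, 0
        while remaining > 0 and used < capacity:
            n += 1  # file numbering restarts on each volume
            size = min(file_limit, capacity - used, remaining)
            files.append((name, f"ess{n:05d}.ind", size))
            used += size
            remaining -= size
        if remaining <= 0:
            break
    return files

# Volumes E (12 GB), F (16 GB), and G (16 GB), as in the example above.
for vol, fname, size in allocate(30 * GB, [("E", 12 * GB), ("F", 16 * GB), ("G", 16 * GB)]):
    print(vol, fname, size // GB, "GB")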
Keep in mind the following guidelines when specifying disk volumes:
To specify disk volumes with Administration Services, see "Setting Disk Volumes" in Essbase Administration Services Online Help.
To allocate a new volume, enter SETDBSTATEITEM 23 in ESSCMD and either follow the prompts or supply the required values on the command line.
ESSCMD prompts you for the number of new disk volumes you want to add, unless you supply the number on the command line.
Then, for each new volume, ESSCMD prompts you for the following values, unless you supply them on the command line.
When you use ESSCMD, you can specify volume size in bytes (B), kilobytes (K), megabytes (M), gigabytes (G), or terabytes (T). ESSCMD displays the minimum, maximum, and current values; a value of 0 means unlimited.
When you use ESSCMD, you can specify file size in bytes (B), kilobytes (K), megabytes (M), or gigabytes (G). ESSCMD displays minimum, maximum, and current values.
The following example allocates up to 10 gigabytes on Volume E, sets a maximum file size of 2 gigabytes, and specifies that data files should be stored only on E:
SETDBSTATEITEM 23 "SAMPLE" "BASIC" "1" "E" "10G" "2" "2G"
To change the settings on an allocated volume, enter SETDBSTATEITEM 24 in ESSCMD and either follow the prompts or supply the required values on the command line.
ESSCMD prompts you for the following values, unless you supply them on the command line:
The following example allocates up to 20 gigabytes on Volume C and sets a maximum file size of 2 gigabytes:
SETDBSTATEITEM 24 "SAMPLE" "BASIC" "1" "C" "20G" "3" "2G"
To stop Analytic Services from storing additional files on a volume, enter SETDBSTATEITEM 25 in ESSCMD and either follow the prompts or supply the required values on the command line. Analytic Services continues accessing files on the deallocated volume, but does not write new files to it.
ESSCMD prompts you for the following value, unless you supply it on the command line: the number of the volume definition to delete. Use the GETDBSTATE command in ESSCMD to see a list of the currently defined disk volumes and the number assigned to each volume.
The following example deallocates the volume that is specified as fourth:
SETDBSTATEITEM 25 "SAMPLE" "BASIC" "4"
Note: If you delete an application or database, Analytic Services does not remove the directory containing the application or database on a disk volume. The computer's operating system still shows the folder and file labels on the disk. However, you can reuse the name of the deleted application or database for a new application or database on that disk volume.
For more syntax information, see the Technical Reference.
On UNIX, volume_name is a mounted UNIX file system. You must enter a fully qualified path name up to the name of the directory you are using for Analytic Services. Analytic Services automatically appends the app directory to the path; you do not specify the app directory.
Consider the following example:
/vol2/essbase 10M
Volume size is the maximum space, in kilobytes, allocated to the volume. The default value is unlimited, in which case Analytic Services uses all available space on that volume.
Assume you want to use up to 20 GB for Analytic Services files on Volume E, 25 GB on Volume F, and 25 GB on Volume G. You are using the default file size limit of 2 GB.
When you load data, Analytic Services stores up to 20 GB on Volume E; if the database is larger than 20 GB, Analytic Services stores the next 25 GB on Volume F, and so on.
Figure 231: Example of Using Disk Volumes
Analytic Services allows you to choose whether or not data blocks that are stored on disk are compressed, as well as which compression scheme to use. When data compression is enabled, Analytic Services compresses data blocks when it writes them out to disk. Analytic Services fully expands the compressed data blocks, including empty cells, when the blocks are swapped into the data cache.
Generally, data compression optimizes storage use. You can check compression efficiency by checking the compression ratio statistic. See Checking the Compression Ratio for a review of methods.
Analytic Services provides several options for data compression:
Because Analytic Services compresses data blocks as they are written to disk, it is possible for bitmap, RLE, and uncompressed data blocks to coexist in the same data file. Keep in mind the following rules:
You may want to disable data compression if blocks have very high density (90% or greater) and have few consecutive, repeating data values. Under these conditions, enabling compression consumes resources unnecessarily.
With bitmap compression, Analytic Services uses a bitmap to represent data cells, and stores only the bitmap, the block header, and the other control information. A bitmap uses one bit for each cell in the data block, whether the cell value is missing or non-missing. When a data block is not compressed, Analytic Services uses 8 bytes to store every non-missing cell.
When using bitmap compression, Analytic Services stores only non-missing values and does not compress repetitive values or zeros (contrast with RLE compression, described in RLE Data Compression). When Analytic Services pages a data block into the data cache, it fully expands the data block, using the bitmap to recreate the missing values.
Because the bitmap uses one bit for each cell in the data block, the bitmap scheme provides a fixed overhead for data compression. Figure 232 represents a portion of a data block, as an example. In this example, Analytic Services uses 64 bytes to store the data in the fully expanded block, but uses one byte (eight bits) to store the bitmap of the compressed data on disk. (Analytic Services also uses a 72-byte block header for each block, whether the block is compressed or not.)
Figure 232: Bitmap Data Compression
In most cases, bitmap compression conserves disk space more efficiently than RLE compression. However, much depends on the configuration of the data.
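As a rough illustration of this arithmetic, the following Python sketch estimates the on-disk size of one block under bitmap compression, using the figures stated above (8 bytes per cell, one bit per cell for the bitmap, and a 72-byte block header); the cell counts are a hypothetical example:

import math

def bitmap_compressed_size(total_cells, nonmissing_cells):
    header = 72                          # fixed block header
    bitmap = math.ceil(total_cells / 8)  # one bit per cell
    values = 8 * nonmissing_cells        # 8 bytes per non-missing cell
    return header + bitmap + values

def uncompressed_size(total_cells):
    return 72 + 8 * total_cells          # every cell stored, plus header

# A hypothetical 8-cell fragment with 3 non-missing cells: 97 bytes versus 136 bytes.
print(bitmap_compressed_size(8, 3), uncompressed_size(8))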
When using the run-length encoding (RLE) compression scheme, Analytic Services compresses consecutive, repetitive values: any value that repeats three or more times consecutively, including zero. Analytic Services keeps track of each repeating value and the number of times it is repeated consecutively.
In the example in Figure 233, Analytic Services uses 64 bytes to store the data in the fully expanded block, but uses 56 bytes to store the compressed data on disk. (Analytic Services also uses a 72-byte block header for each block, whether the block is compressed or not.)
Figure 233: RLE Data Compression
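A minimal Python sketch of run-length encoding in this spirit (illustrative only; the actual on-disk format is internal to Analytic Services) encodes runs of three or more consecutive equal values as (value, count) pairs:

def rle_encode(values):
    # Runs of three or more equal values become ("run", value, count);
    # shorter runs are emitted as literals.
    out, i = [], 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        if j - i >= 3:
            out.append(("run", values[i], j - i))
        else:
            out.extend(("literal", v) for v in values[i:j])
        i = j
    return out

print(rle_encode([0, 0, 0, 0, 5, 7, 7, 7, 2]))
# [('run', 0, 4), ('literal', 5), ('run', 7, 3), ('literal', 2)]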
zlib compression uses the compression method found in packages such as PNG, Zip, and gzip. Calculation and data loading are faster with direct I/O and zlib compression than with buffered I/O and zlib compression. If data storage is your greatest limiting factor, use zlib, but be aware that, under some circumstances, data loads may be up to 10% slower than with bitmap compression. On the other hand, the size of the database is generally significantly smaller when you use zlib compression.
In contrast to bitmap compression, which uses an algorithm to track which values are missing and does not interact with any other type of data, zlib compression builds a data dictionary based on the actual data being compressed (including any missing values). Therefore, zlib compression should provide greater compression ratios than bitmap compression for extremely dense data. However, because the effectiveness of the zlib algorithm depends, at the bit level, on the actual data being compressed, there are no general guidelines, based solely on density, for when zlib compression outperforms bitmap compression. Unlike the other compression methods, the storage space saved has little or no relationship to the number of missing cells or the number of contiguous cells of equal value. Generally, the more dense or heterogeneous the data, the better zlib compresses it in comparison to bitmap or RLE compression. However, under some circumstances zlib may not yield better results than bitmap or RLE compression. It is best to test with a representative sample of data.
To estimate the storage savings you may obtain with zlib, create a small database using your normal compression technique (bitmap or RLE) with a small sampling of real data and shut down Analytic Server. Note the size of the created data files. Then clear the data in the sample database, change the compression setting to zlib, reload the same sample data, and shut down Analytic Server again. Now note the difference in the storage used. You can also use the small sample database to estimate any changes in calculation or data loading speed.
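You can also get a feel for this behavior with Python's standard zlib module. This is a rough analogy only, since Analytic Services' internal block format differs, but it shows how compressibility depends on the actual data rather than on density alone:

import struct, zlib

def block_bytes(cells):
    # Pack cell values as 8-byte doubles, like a fully expanded block.
    return struct.pack(f"<{len(cells)}d", *cells)

heterogeneous = block_bytes([float(i % 17) for i in range(1000)])
repetitive = block_bytes([0.0] * 990 + [1.0] * 10)

for name, raw in (("heterogeneous", heterogeneous), ("repetitive", repetitive)):
    print(name, len(raw), "bytes ->", len(zlib.compress(raw)), "bytes")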
Index Value Pair compression addresses databases with larger block sizes, where the blocks are highly sparse. This compression algorithm is not selectable, but is automatically used by the database whenever appropriate. The user must still choose among the compression types None, bitmap, RLE, and zlib through Administration Services.
For example, if the user selects RLE, Analytic Services reviews each block and evaluates the following compression types for highest compression: RLE, bitmap, or Index Value Pair. If the user chooses zlib, on the other hand, zlib is the only compression type applied.
The following table lists the compression types the user can choose and the compression types that Analytic Services then evaluates and applies.
Chosen Compression Type | Evaluated Compression Type
---|---
Bitmap | Bitmap, Index Value Pair
RLE | RLE, Bitmap, Index Value Pair
zlib | zlib
None | None
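Conceptually, the per-block evaluation amounts to picking the smallest result among the candidates in the table. The following Python sketch is a hypothetical illustration only (the candidate sets mirror the table above, and the sizes are made up):

CANDIDATES = {
    "bitmap": ["bitmap", "index-value-pair"],
    "rle": ["rle", "bitmap", "index-value-pair"],
    "zlib": ["zlib"],
    "none": ["none"],
}

def best_encoding(chosen, size_of):
    # size_of maps an encoding name to the bytes it would produce.
    return min(CANDIDATES[chosen], key=size_of)

# Hypothetical sizes for one large, highly sparse block:
sizes = {"rle": 900, "bitmap": 600, "index-value-pair": 120}
print(best_encoding("rle", sizes.__getitem__))  # index-value-pair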
You can choose from four compression settings: bitmap (the default), RLE, zlib, or None.
In most cases, you need not worry about choosing a compression setting. Bitmap compression almost always provides the best combination of fast performance and small data files. However, much depends on the configuration of the data.
Data compression is CPU-intensive. You need to consider the trade-offs of computation costs versus I/O costs and disk space costs when determining which compression setting to use.
In general, a database compresses better with the RLE setting than with the bitmap setting if a large number of repeated non-missing data cells for a given block have the same value. Using RLE compression is computationally more expensive than using bitmap compression. However, if your database shrinks significantly with RLE compression, you may actually see a performance improvement due to decreased I/O costs.
Most databases shrink when using zlib compression, but not always. Using zlib compression significantly increases CPU processing. For most databases, this extra CPU processing outweighs the benefits of the decreased block size. However, if your database shrinks significantly using zlib compression, you may see a performance improvement due to decreased I/O costs.
The None compression setting does not reduce the disk usage of a database compared to bitmap compression. In fact, disabling compression may not improve performance at all, because bitmap compression is extremely fast.
Remember that each database is unique; the previous statements describe the general characteristics of each compression type you can choose. Although the default bitmap compression works well for almost every database, the most reliable way to determine which compression setting is best for your database is to try each one.
Changes to the data compression setting take effect immediately as Analytic Services writes data blocks to disk. For blocks already on disk, Analytic Services does not change compression schemes or enable or disable compression. When you change the data compression settings of blocks already on disk, Analytic Services uses the new compression scheme the next time Analytic Services accesses, updates, and stores the blocks.
To view or change the current settings, use any of the following methods:
Tool | Topic | Location
---|---|---
To enable or disable data compression, enter SETDBSTATEITEM 14 in ESSCMD and either follow the prompts or supply the required values on the command line.
ESSCMD prompts you for the following values, unless you supply them on the command line:
To specify the data compression type, enter SETDBSTATEITEM 15 in ESSCMD and either follow the prompts or supply the required values on the command line. ESSCMD prompts you for a value of "1" (run length encoding) or "2" (bitmap, the default).
The following example enables bitmap compression:
SETDBSTATEITEM 14 "SAMPLE" "BASIC" "Y" "2"
For more syntax information, see the Technical Reference.
The compression ratio represents the ratio of the compressed block size (including overhead) to the uncompressed block size, regardless of the compression type in effect. Overhead is the space required by mechanisms that manage compression/expansion.
To check the compression ratio, use either of the following methods:
Tool | Topic | Location
---|---|---
Note: The smaller the ratio, the greater the compression. The compression ratio can vary widely from block to block.
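For example, the ratio for a single block is simply the compressed size (including overhead) divided by the uncompressed size, as this small Python sketch with hypothetical byte counts shows:

def compression_ratio(compressed_bytes, uncompressed_bytes):
    # Compressed size already includes compression overhead.
    return compressed_bytes / uncompressed_bytes

# Hypothetical block: 97 bytes on disk versus 136 bytes fully expanded.
print(round(compression_ratio(97, 136), 2))  # 0.71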
Data block size is determined by the amount of data in a particular combination of dense dimensions. For example, when you change the dense or sparse configuration of one or more dimensions in the database, the data block size changes. Data block size is 8n bytes, where n is the number of cells that exist for that combination of dense dimensions.
Note: Eight to 100 kilobytes is the optimum size range.
For information about and instructions for determining the size of a data block, see Size of Expanded Data Block.
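The 8n rule is straightforward to compute. In this Python sketch, the dense-dimension member counts are hypothetical:

from math import prod

def block_size_bytes(dense_member_counts):
    # n cells = product of the stored members of each dense dimension;
    # each cell occupies 8 bytes.
    return 8 * prod(dense_member_counts)

# Hypothetical outline: dense dimensions with 12, 17, and 31 stored members.
size = block_size_bytes([12, 17, 31])
print(size, "bytes")  # 50592 bytes, within the 8 KB to 100 KB optimum range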
To view the block size for a database, use either of the following methods:
Tool | Topic | Location
---|---|---