Problem:
It has been growing very rapidly and I am trying to figure out what parts are growing fast.
Solution:
This is not an easy question to answer. With the tools that are available
computing an upper bound on the space that is utilitzed by table data is
possible. Getting an exact amount of space usage is not currently
possible, and getting an estimate usage of space is not practical to do.
I'll explain these facts in the next paragraphs.
When a record in a table is stored it is stored with references to a trail
to all of the back versions of the record. When a record is stored there
is also some RLE (run length encoding) of the record to save some space,
so the record is stored with "mild" compression be done to it. In terms
of the transaction being used one may be looking at the current record or
a back version of the record. If we take a very nice case where a backup
and restore has been done to the database, there are only current versions
of a record. To go through each record and record the space taken up by
the compressed record takes some time and expense (it is not an easy
global calculation), while this is a judgement call it is not likely that
there will be a return on an investment in calculating this disk usage.
Add the elements of doing UPDATEs and DELETEs which produce back versions
of a record and READ_COMMITTED and REPEATABLE_READ transactions, this
space usage calculation becomes much more complex.
When a record is stored on a data page it may be occupying only a portion
of a page, or if it is large may spill over onto a second or third page
and may occupy all or a portion of these pages. A record may also be
placed on a page that contains data belonging to another record. If a
record has back versions (or delta records) these records versions may be
stored on data pages that contain data from other records. A record back
version can also spill over to other data pages as well.
In maintaining data pages, each data page has some management overhead
space that will take away from some of the storage capacity of the page.
A restore typically fills each page only up to 80% capacity to leave some
space on the page for UPDATEs and DELETEs and for delta records.
There is one final caveat in this situation on how data pages are used,
this is when new data is INSERTed. New records are inserted first into
empty spaces on existing data pages and then unallocated pages are
converted to data pages before new data pages are allocated.
As seen from the last paragraphs, obtaining the exact space usage by a
table is next to impossible to calculate. What is possible is to obtain
an upper bound for the space that is occupied. To get this data requires
running the gstat command line utility, or by running the Database
Statistics process in the Server Manager Utility. When running a Database
Statistics go the the View menu and choose the Database Analysis choice.
The data given here will show the number of data pages currently occupied
per table, and the average fill percentage. When the average fill
percentage starts to get relatively low with regard to the number of data
pages used, this can used as a sign that it is time to backup and restore
the database. Same applies to indexes.
Some other things that can be done is to check that the OIT (oldest
interesting transaction) is moving forward, or table data may be
proliferating back versions of records. Another tactic that can be done
if appropriate is to use the GBAK command line executable, there is a
"use_all_space" parameter that will cause the data pages to be restored
with a 100% fill capacity (if possible).
Posted below is an excerpted portion of the Database Statistics to look
at, here are the sections of the output that can be used to determine the
upper bound of the data pages used by a table, and the index pages used by
an index:
EMPLOYEE (131)
Primary pointer page: 270, Index root page: 271
Data pages: 5, data page slots: 5, average fill: 71%
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 1
60 - 79% = 4
80 - 99% = 0
Index NAMEX (3)
Depth: 1, leaf buckets: 1, nodes: 42
Average data length: 15.00, total dup: 0, max dup: 0
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 0
60 - 79% = 0
80 - 99% = 1
Index RDB$FOREIGN8 (1)
Depth: 1, leaf buckets: 1, nodes: 42
Average data length: 0.00, total dup: 23, max dup: 4
Fill distribution:
0 - 19% = 0
20 - 39% = 1
40 - 59% = 0
60 - 79% = 0
80 - 99% = 0
Index RDB$FOREIGN9 (2)
Depth: 1, leaf buckets: 1, nodes: 42
Average data length: 6.00, total dup: 15, max dup: 4
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 1
60 - 79% = 0
80 - 99% = 0
Index RDB$PRIMARY7 (0)
Depth: 1, leaf buckets: 1, nodes: 42
Average data length: 1.00, total dup: 0, max dup: 0
Fill distribution:
0 - 19% = 0
20 - 39% = 1
40 - 59% = 0
60 - 79% = 0
80 - 99% = 0
Connect with Us