Tivoli Blog


BACKUP DB – DEDUPDEVice=YES (and NUMStreams, and COMPress)

Posted in Storage Management

In TSM v6.2 and earlier, backing up the TSM database (DB) meant thinking about just two things: the device class to use and the type of backup you wanted. You can still work that way, but TSM v6.3 and later introduced additional options to cope with growing TSM DB backups. The TSM DB typically grows noticeably when you use TSM native deduplication (client-side or server-side): all the metadata about chunks, pointers, dereferenced chunks and so on has to be stored somewhere. A larger TSM DB also means larger space requirements for the database backups (DBBs).

Two of the methods introduced:

NUMStreams (v6.3.0)

Introduced in v6.3.0 and named “Multistream database backup and restore processing”. The NUMStreams parameter specifies the number of parallel data movement streams to use when you back up the DB. The default is 1, the maximum is 4. This will not reduce the space requirements for the DBBs, but it might shorten the overall time a DBB takes. Typically you should only use it if your TSM DB is big enough: multistream backups will not save much time with a small TSM DB, while the concurrent data streams leave you with several volumes that are not fully utilized. As always, there is a tradeoff to consider.
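To put a number on that tradeoff, here is a rough back-of-the-envelope sketch in Python. It assumes that every stream writes to its own volume and that the backup is split evenly across the streams; neither is guaranteed in practice. The function name and the sizes are made up for illustration.

```python
import math


def volumes_used(backup_gb, volume_gb, streams):
    """Estimate volume count and utilization for a multistream DB backup.

    Assumptions (not from the TSM documentation): each stream writes to
    its own volume(s), and the data is split evenly across the streams.
    """
    if not 1 <= streams <= 4:  # range stated in the article for NUMStreams
        raise ValueError("NUMStreams must be between 1 and 4")
    per_stream_gb = backup_gb / streams
    volumes_per_stream = math.ceil(per_stream_gb / volume_gb)
    total_volumes = volumes_per_stream * streams
    utilization = backup_gb / (total_volumes * volume_gb)
    return {"volumes": total_volumes, "utilization": utilization}


# Hypothetical sizes: a 300 GB DBB written to 800 GB volumes.
for n in (1, 2, 4):
    print(n, volumes_used(300, 800, n))
```

With these made-up numbers a single stream fills one volume to 37.5%, while four streams occupy four volumes that are each filled to less than 10%.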

 

COMPress (v7.1.1)

Introduced recently in TSM v7.1.1 and named “Compress database backups”. The same release also added related functionality named “Compress archive logs”. The default is NO. As expected, COMPress specifies whether volumes created by the BACKUP DB command are compressed, and it is valid for all types of database backup (full, incremental, snapshot). This reduces the space requirements, but the compression overhead might increase the time a DBB takes. If you’re backing up to virtual or physical tape, skip this and leave compression to the tape drive.

SET DBRECOVery can be used to set the values of NUMStreams and/or COMPress to be used for automatic backups. It cannot be used to set the value of DEDUPDEVice for automatic backups.
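As a small illustration of which parameter goes where, the following Python sketch builds the two administrative commands as plain strings. The helper names and the device class name VTLCLASS are hypothetical, and the limits simply mirror what is described in this article. Check the command reference for your server level before using any of it.

```python
def _yes_no(flag):
    return "yes" if flag else "no"


def backup_db_command(devclass, backup_type="full", numstreams=1,
                      compress=False, dedupdevice=False):
    """Build a BACKUP DB command string (manual backup).

    COMPress needs a v7.1.1 server and DEDUPDEVice a v6.3.2 server,
    according to the article.
    """
    if backup_type not in ("full", "incremental", "dbsnapshot"):
        raise ValueError("unknown backup type: " + backup_type)
    if not 1 <= numstreams <= 4:
        raise ValueError("NUMStreams must be between 1 and 4")
    return ("backup db devclass={} type={} numstreams={} "
            "compress={} dedupdevice={}".format(
                devclass, backup_type, numstreams,
                _yes_no(compress), _yes_no(dedupdevice)))


def set_dbrecovery_command(devclass, numstreams=1, compress=False):
    """Build a SET DBRECOVERY command string (automatic backups).

    There is deliberately no dedupdevice argument here: per the article,
    DEDUPDEVice cannot be set for automatic backups.
    """
    if not 1 <= numstreams <= 4:
        raise ValueError("NUMStreams must be between 1 and 4")
    return "set dbrecovery {} numstreams={} compress={}".format(
        devclass, numstreams, _yes_no(compress))


# VTLCLASS is a made-up device class name.
print(backup_db_command("VTLCLASS", numstreams=2, dedupdevice=True))
print(set_dbrecovery_command("VTLCLASS", numstreams=2))
```

The resulting strings could be pasted into an administrative command-line session or a server script.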

The remainder of this article is not about NUMStreams or COMPress; it is about DEDUPDEVice.

DEDUPDEVice (v6.3.2)

Introduced in v6.3.2. Use this parameter to specify that the target storage device supports hardware data deduplication. Most likely, this will be a virtual tape library (VTL). When set to YES, the format for backup images is optimized for data deduplication devices, making backup operations more efficient.

Now what exactly is happening when you specify that you want your TSM DBB to be written in a better “deduplicatable” form? The documentation states: “….the format for backup images is optimized for data deduplication devices, making backup operations more efficient”. This sounds very generic (backup images, backup operations) but it is only applicable to the TSM DBB itself.

This parameter is also recommended in several ProtecTIER publications (“Harnessing the power of ProtecTIER and TSM” and “IBM ProtecTIER Implementation and Best Practices Guide”):

“Use the parameter DEDUPDEVice=YES for backing up TSM Database to the sequential access device class created for the database backup using the ProtecTIER. This option specifies that a target storage device supports data deduplication. When set to YES, the format for backup images is optimized for data deduplication devices, making backup operations more efficient.”

So, now what is this “optimized” format, dedicated to data deduplication devices?

Be aware that DEDUPDEVice is meant to improve the deduplication factor; it will not necessarily shorten the DBB itself. The hardware deduplication of the DBB data stream, however, will be faster because of the optimized format. In truth this is not a TSM feature but DB2 functionality that TSM exploits. IBM introduced the DEDUP_DEVICE option for the BACKUP DATABASE command in DB2 v9.7 FixPack 3 and improved its behavior in FixPack 4. Beginning with TSM v6.3.2, the bundled DB2 version includes the DEDUP_DEVICE option.

“The primary reason for the introduction of the DEDUP_DEVICE option in DB2 v9.7, FixPack 3, was to optimize DB2 backup images for deduplication devices and to simplify the backup operation when such devices are used as a target for DB2 backup operations….”

“The use of the DEDUP_DEVICE on the backup invocation will result in a backup image that is optimized for data deduplication devices.”

Sounds very similar to the explanation in the TSM documentation. More interesting DB2 information about this can be found here:
Optimizing Backup Images for Data Deduplication Devices
and, with some overlap on IBM developerWorks, here:
Integrated support for data deduplication devices in DB2 for Linux, UNIX, and Windows

I will now try to reuse portions of these links to explain what is happening in a nutshell. For a regular (BACKUP DB with DEDUPDEVice=NO – the default) DB2 backup:

“….normally data retrieved by buffer manipulator db2bm threads is read and multiplexed across all of the output streams being used by the media controller db2med threads.”

In the case when the DEDUP_DEVICE function of DB2 is used (BACKUP DB with DEDUPDEVice=YES):

“….data retrieved by the buffer manipulator db2bm threads is no longer read and multiplexed across the output streams…..all of that table space’s data is sent to one, and only one, output stream….”

This will result in the optimized format that deduplication devices just love.

[Figure: tsmblog_dedup] Source: http://ibmdatamag.com/2012/03/optimizing-backup-images-for-data-deduplication-devices/

“All data for a particular table space is always written in table space page order, from lowest to highest. This predictable and deterministic pattern of the data in each output stream makes it easy for a deduplication device to identify chunks of data that have been backed up previously.”
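To make the difference tangible, here is a toy simulation in Python. It is not how DB2 or any deduplication device actually works. The random interleaving that stands in for thread timing, the fixed-size chunking, the page size and the table space names are all assumptions made for this sketch. It backs up the same unchanged table spaces twice and measures how many chunks of the second backup the "device" has already seen.

```python
import hashlib
import random

PAGE_SIZE = 64        # toy page size in bytes
PAGES_PER_CHUNK = 4   # toy fixed-size chunking done by the "device"


def make_table_spaces(names, pages_each):
    """Create deterministic page contents for each table space."""
    return {
        name: [("%s:%d" % (name, i)).encode().ljust(PAGE_SIZE, b".")
               for i in range(pages_each)]
        for name in names
    }


def multiplexed_backup(table_spaces, num_streams, seed):
    """Default layout (toy model): pages of all table spaces are interleaved
    in a timing-dependent order and spread over every output stream."""
    rng = random.Random(seed)  # stands in for unpredictable thread timing
    pending = {name: list(pages) for name, pages in table_spaces.items()}
    streams = [[] for _ in range(num_streams)]
    written = 0
    while pending:
        name = rng.choice(sorted(pending))
        streams[written % num_streams].append(pending[name].pop(0))
        written += 1
        if not pending[name]:
            del pending[name]
    return streams


def dedup_device_backup(table_spaces, num_streams):
    """Optimized layout (toy model): each table space goes to exactly one
    stream, in page order from lowest to highest."""
    streams = [[] for _ in range(num_streams)]
    for index, name in enumerate(sorted(table_spaces)):
        streams[index % num_streams].extend(table_spaces[name])
    return streams


def chunk_hashes(streams):
    """Cut every stream into fixed-size chunks and fingerprint each chunk."""
    hashes = set()
    for stream in streams:
        for start in range(0, len(stream), PAGES_PER_CHUNK):
            chunk = b"".join(stream[start:start + PAGES_PER_CHUNK])
            hashes.add(hashlib.sha256(chunk).hexdigest())
    return hashes


def dedup_hit_ratio(first_run, second_run):
    """Fraction of chunks in the second backup already stored by the first."""
    known = chunk_hashes(first_run)
    new = chunk_hashes(second_run)
    return len(new & known) / len(new)


spaces = make_table_spaces(["USERSPACE1", "IDXSPACE1", "LARGESPACE1"], 40)
print("multiplexed:", dedup_hit_ratio(multiplexed_backup(spaces, 2, seed=1),
                                      multiplexed_backup(spaces, 2, seed=2)))
print("dedup layout:", dedup_hit_ratio(dedup_device_backup(spaces, 2),
                                       dedup_device_backup(spaces, 2)))
```

In the optimized layout the two backups produce identical streams, so every chunk of the second run is already known to the device. In the multiplexed layout the same pages end up in different chunks on each run, so far fewer fingerprints match even though the data itself has not changed.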

The first link has some tips for increasing the deduplication ratio of the back-end deduplication device when you are not yet running TSM v6.3.2 (that is, when the underlying DB2 is older than v9.7 FixPack 3). These might not be usable at all, as they are applied outside of TSM. If you want to use the DEDUPDEVice functionality, upgrade your TSM server.

Set the DEDUPDEVice parameter to NO for physical tape libraries and for VTLs that don’t support hardware deduplication. An interesting remark is that it should also be set to NO for all devices defined with a FILE device class. This means it should not be used with TSM native deduplication, which requires a FILE storage pool. Client-side or server-side TSM native deduplication is for data originating from the clients; it cannot be used for the DBB. When there is no VTL, the COMPress parameter is an alternative way to reduce the DBB size. You might even use both: a DBB in the VTL, written in a form that is optimized for deduplication, and a local DBB that is compressed. I’m sure you can come up with a use case for this.
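The guidance above can be condensed into a small lookup, sketched here in Python. The target labels ("tape", "vtl", "file") and the function name are invented for this example, and the values are only my reading of the recommendations in this article, not an official rule set.

```python
def dbb_options(target, hardware_dedup=False):
    """Suggest DEDUPDEVice and COMPress values for a DB backup target.

    target is one of the made-up labels "tape", "vtl" or "file";
    hardware_dedup says whether a VTL deduplicates in hardware.
    """
    if target == "vtl":
        # Optimized format only pays off when the VTL really deduplicates;
        # compression is left to the (virtual) tape drive.
        return {"dedupdevice": hardware_dedup, "compress": False}
    if target == "tape":
        # Physical tape: no deduplication, drive handles compression.
        return {"dedupdevice": False, "compress": False}
    if target == "file":
        # FILE device class: DEDUPDEVice=NO; COMPress is the alternative
        # way to shrink the DBB.
        return {"dedupdevice": False, "compress": True}
    raise ValueError("unknown target: " + target)


print(dbb_options("vtl", hardware_dedup=True))
print(dbb_options("file"))
```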

-- Tommy Hueber
