Tuesday, 9 August 2011

Top 10 SQL Server Integration Services Best Practices


Authors: ASHISH MISHRA
Technical Reviewers: Burzin Patel, Kevin Cox, Peter Carlin, Mark Souza, Richard Tkachuk, Len Wyatt, Lindsey Allen, Prem Mehra, Nicholas Dritsas, Carl Rabeler, Kun Cheng, Stuart Ozer, Donald Farmer
Published: 7/1/2011
Updated: 8/12/2011
How many of you have heard the myth that Microsoft® SQL Server® Integration Services (SSIS) does not scale? The first question we would ask in return is: “Does your system need to scale beyond 4.5 million sales transaction rows per second?” SQL Server Integration Services is a high performance Extract-Transform-Load (ETL) platform that scales to the most extreme environments. And as documented in SSIS ETL world record performance, SQL Server Integration Services can process at the scale of 4.5 million sales transaction rows per second.
1. SSIS is an in-memory pipeline, so ensure that all transformations occur in memory.
The purpose of including Integration Services within SQL Server is to provide a flexible, robust pipeline that can efficiently perform row-by-row calculations and parse data entirely in memory.

While the extract and load phases of the pipeline will touch disk (read and write respectively), the transformation itself should process in memory. If transformations spill to disk (for example with large sort operations), you will see a big performance degradation. Construct your packages to partition and filter data so that all transformations fit in memory.

A great way to check if your packages are staying within memory is to review the SSIS performance counter Buffers spooled, which has an initial value of 0; above 0 is an indication that the engine has started swapping to disk. For more information, please refer to Something about SSIS Performance Counters.

2. Plan for capacity by understanding resource utilization.
SQL Server Integration Services is designed to process large amounts of data row by row in memory with high speed. Because of this, it is important to understand resource utilization, i.e., the CPU, memory, I/O, and network utilization of your packages.

CPU Bound
Seek to understand how much CPU is being used by Integration Services and how much CPU is being used overall by SQL Server while Integration Services is running. This latter point is especially important if you have SQL Server and SSIS on the same box, because if there is a resource contention between these two, it is SQL Server that will typically win – resulting in disk spilling from Integration Services, which slows transformation speed.

The perfmon counter that is of primary interest to you is Process / % Processor Time (Total). Measure this counter for both sqlservr.exe and dtexec.exe. If SSIS is not able to drive close to 100% CPU load, this may be indicative of:
  • Application contention: For example, SQL Server is taking on more processor resources, making them unavailable to SSIS.
  • Hardware contention: A common scenario is that you have suboptimal disk I/O or not enough memory to handle the amount of data being processed.
  • Design limitation: The design of your SSIS package is not making use of parallelism, and/or the package uses too many single-threaded tasks.

Network Bound
SSIS moves data as fast as your network is able to handle it. Because of this, it is important to understand your network topology and ensure that the path between your source and target has both low latency and high throughput.

The following Network perfmon counters can help you tune your topology:
  • Network Interface / Current Bandwidth: This counter provides an estimate of current bandwidth.
  • Network Interface / Bytes Total / sec: The rate at which bytes are sent and received over each network adapter.
  • Network Interface / Transfers/sec: Indicates how many network transfers per second are occurring. If this approaches 40,000 IOPs, add another NIC card and use teaming between the NIC cards.
These counters enable you to analyze how close you are to the maximum bandwidth of the system. Understanding this will allow you to plan capacity appropriately whether by using gigabit network adapters, increasing the number of NIC cards per server, or creating separate network addresses specifically for ETL traffic.

I/O Bound
If you ensure that Integration Services is minimally writing to disk, SSIS will only hit the disk when it reads from the source and writes to the target. But if your I/O is slow, reading and especially writing can create a bottleneck.

Because tuning I/O is outside the scope of this technical note, please refer to Predeployment I/O Best Practices. Remember that an I/O system is specified not only by its size (“I need 10 TB”) but also by its sustainable speed (“I want 20,000 IOPs”).

Memory Bound
A very important question that you need to answer when using Integration Services is: “How much memory does my package use?”

The key counters for Integration Services and SQL Server are:
  • Process / Private Bytes (DTEXEC.exe) – The amount of memory currently in use by Integration Services. This memory cannot be shared with other processes.
  • Process / Working Set (DTEXEC.exe) – The total amount of memory allocated by Integration Services.
  • SQL Server: Memory Manager / Total Server Memory: The total amount of memory allocated by SQL Server. Because SQL Server has another way to allocate memory using the AWE API, this counter is the best indicator of total memory used by SQL Server. To understand SQL Server memory allocations better, refer to Slava Ok’s Weblog.
  • Memory / Page Reads / sec – Represents the total memory pressure on the system. If this consistently goes above 500, the system is under memory pressure.

3. Baseline source system extract speed.
Understand your source system and how fast you extract from it. After all, Integration Services cannot be tuned beyond the speed of your source – i.e., you cannot transform data faster than you can read it.
Measure the speed of the source system by creating a very simple package that reads data from your source with a destination of “Row Count”.
Execute the package from the command line (DTEXEC) and measure the time it took for it to complete its task. Use the Integration Services log output to get an accurate calculation of the time. You want to calculate rows per second:
Rows / sec = Row Count / Time(Data Flow)
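For example (the figures here are purely illustrative): if the Row Count package reads 9,000,000 rows and the Data Flow task log shows it ran for 4 seconds, the baseline is 9,000,000 / 4 = 2,250,000 rows per second.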
Based on this value, you now know the maximum number of rows per second you can read from the source – this is also the upper limit on how fast you can transform your data. To increase this Rows / sec calculation, you can do the following:
  • Improve drivers and driver configurations: Make sure you are using the most up-to-date driver configurations for your network, data source, and disk I/O. Often the default network drivers on your server are not configured optimally for the network stack, which results in performance degradations when there are a high number of throughput requests. Note that for 64-bit systems, at design time you may be loading 32-bit drivers; ensure that at run time you are using 64-bit drivers.
     
  • Start multiple connections: To overcome limitations of drivers, you can try to start multiple connections to your data source. As long as the source can handle many concurrent connections, you may see an increase in throughput if you start several extracts at once. If concurrency is causing locking or blocking issues, consider partitioning the source and having your packages read from different partitions to more evenly distribute the load.
     
  • Use multiple NIC cards: If the network is your bottleneck and you’ve already ensured that you’re using gigabit network cards and routers, then a potential solution is to use multiple NIC cards per server. Note that you will have to be careful when you configure multiple NIC environments; otherwise you will have network conflicts.

4. Optimize the SQL data source, lookup transformations, and destination.
When you execute SQL statements within Integration Services (for example, through the Data access mode option of an OLE DB source), whether to read a source, to perform a lookup transformation, or to change tables, some standard optimizations significantly help performance (see the example query after this list):
  •  Use the NOLOCK or TABLOCK hints to remove locking overhead.
  • To optimize memory usage, SELECT only the columns you actually need. If you SELECT all columns from a table (e.g., SELECT * FROM), you will needlessly use memory and bandwidth to store and retrieve columns that do not get used.
     
  • If possible, perform your datetime conversions at your source or target databases, as it is more expensive to perform them within Integration Services.
     
  • In SQL Server 2008 Integration Services, the new shared lookup cache feature provides a high-speed, shared cache when you use parallel pipelines (see points #8 and #10 below).
     
  • If Integration Services and SQL Server run on the same server, use the SQL Server destination instead of the OLE DB destination to improve performance.
     
  • Commit size 0 is fastest on heap bulk targets, because only one transaction is committed. If you cannot use 0, use the highest possible commit size to reduce the overhead of multiple-batch writing. Commit size = 0 is a bad idea if you are inserting into a B-tree, because all incoming rows must be sorted at once into the target B-tree, and if your memory is limited, you are likely to spill. Batch size = 0 is ideal for inserting into a heap. For an indexed destination, test batch sizes between 100,000 and 1,000,000.
     
  • Use a commit size of <5000 to avoid lock escalation when inserting; note that in SQL Server 2008 you can now enable/disable lock escalation at the object level, but use this wisely.
     
  • Heap inserts are typically faster than inserts into a clustered index. This means that you may want to drop and rebuild indexes if you are changing a large part of the destination table; test your inserts both with indexes in place and with all indexes dropped and rebuilt afterward, to validate which is faster.
     
  • Use partitions and the partition SWITCH command; i.e., load a work table that contains a single partition and SWITCH it into the main table after you build the indexes and put the constraints on.
     
  • Another great reference from the SQL Performance team is Getting Optimal Performance with Integration Services Lookups.
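As an example of the first two points above, here is a minimal sketch of a source query (the table and column names are hypothetical) that reads only the columns the package needs and uses a hint to remove locking overhead:

    SELECT  SalesOrderID,
            OrderDate,
            CustomerID,
            TotalDue
    FROM    dbo.FactSales WITH (NOLOCK)  -- or TABLOCK, depending on your concurrency requirements
    WHERE   OrderDate >= '20080101';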

5. Tune your network.
A key network property is the packet size of your connection. By default this value is set to 4,096 bytes. This means a new network packet must be assembled for every 4 KB of data. As noted in the SqlConnection.PacketSize Property topic in the .NET Framework Class Library, increasing the packet size will improve performance because fewer network read and write operations are required to transfer a large data set.
If your system is transactional in nature, with many small data size read/writes, lowering the value will improve performance.

Since Integration Services is all about moving large amounts of data, you want to minimize the network overhead. This means that the value 32K (32767) is the fastest option. While it is possible to configure the network packet size on a server level using sp_configure, you should not do this. The database administrator may have reasons to use a different server setting than 32K. Instead, override the server settings in the connection manager as illustrated below.
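As an illustration, a minimal connection-string sketch, assuming the SQL Server Native Client OLE DB provider (the server and database names are placeholders), that overrides the packet size at the connection manager level:

    Provider=SQLNCLI10.1;Data Source=MyServer;Initial Catalog=MyDatabase;Integrated Security=SSPI;Packet Size=32767;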
Another network tuning technique is to use network affinity at the operating system level. At high throughputs, you can sometimes improve performance this way.

For the network itself, you may want to work with your network specialists to enable jumbo frames to increase the default payload of 1,500 bytes to 9,000 bytes. By enabling jumbo frames, you will further decrease the number of network operations required to move large data sets.

6. Use data types – yes, back to data types! – wisely.
Of all the points on this top 10 list, this is perhaps the most obvious. Yet, it is such an important point that it needs to be made separately. Follow these guidelines:
  •  Make data types as narrow as possible so you will allocate less memory for your transformation.
     
  • Do not perform excessive casting of data types – it will only degrade performance. Match your data types to the source or destination and explicitly specify the necessary data type casting (see the sketch after this list).
     
  • Watch precision issues when using the money, float, and decimal types. Also, be aware that money is faster than decimal, and money has fewer precision considerations than float.
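As a hedged sketch of explicit casting at the source (the table and column names are illustrative), narrow the types in the SELECT statement so the pipeline allocates only what it needs:

    SELECT  CAST(OrderQuantity AS smallint)      AS OrderQuantity,
            CAST(UnitPrice     AS money)         AS UnitPrice,
            CAST(OrderDate     AS smalldatetime) AS OrderDate
    FROM    dbo.FactSalesSource;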

7. Change the design.
There are some things that Integration Services does well – and other tasks where using another tool is more efficient. Your tool choice should be based on what is most efficient and on a true understanding of the problem. To help with that choice, consider the following points:
  • Do not sort within Integration Services unless it is absolutely necessary. In order to perform a sort, Integration Services allocates memory for the entire data set that needs to be transformed. If possible, presort the data before it enters the pipeline. If you must sort data, try to sort only small data sets in the pipeline. Instead of using Integration Services for sorting, use a SQL statement with ORDER BY to sort large data sets in the database, and mark the output as sorted by changing the Integration Services pipeline metadata on the data source (see the sketch after this list).
     
  • There are times where using Transact-SQL will be faster than processing the data in SSIS. As a general rule, any and all set-based operations will perform faster in Transact-SQL because the problem can be transformed into a relational (domain and tuple) algebra formulation that SQL Server is optimized to resolve. Also, the SQL Server optimizer will automatically apply high parallelism and memory management to the set-based operation – an operation you may have to perform yourself if you are using Integration Services. Typical set-based operations include:
     
    • Set-based UPDATE statements - which are far more efficient than row-by-row OLE DB calls.
       
    • Aggregation calculations such as GROUP BY and SUM. These are typically also calculated faster using Transact-SQL instead of in-memory calculations by a pipeline.
       
  • Delta detection is the technique where you change existing rows in the target table instead of reloading the table. To perform delta detection, you can use a change detection mechanism such as the new SQL Server 2008 Change Data Capture (CDC) functionality. If such functionality is not available, you need to do the delta detection by comparing the source input with the target table. This can be a very costly operation requiring the maintenance of special indexes and checksums just for this purpose. Often, it is fastest to just reload the target table. A rule of thumb is that if the target table has changed by >10%, it is often faster to simply reload than to perform the logic of delta detection. 
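To make the sorting and aggregation points above concrete, here is a minimal sketch (table and column names are hypothetical) of work pushed into the database instead of the pipeline:

    -- Presort at the source so the pipeline can skip the Sort transformation;
    -- remember to set IsSorted and the sort key columns on the source output.
    SELECT  CustomerID, OrderDate, TotalDue
    FROM    dbo.FactSales
    ORDER BY CustomerID, OrderDate;

    -- Let SQL Server perform the set-based aggregation rather than the Aggregate transformation.
    SELECT  CustomerID, SUM(TotalDue) AS TotalSales
    FROM    dbo.FactSales
    GROUP BY CustomerID;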

8. Partition the problem.
One of the main tenets of scalable computing is to partition problems into smaller, more manageable chunks. This allows you to more easily handle the size of the problem and make use of running parallel processes in order to solve the problem faster.

For ETL designs, you will want to partition your source data into smaller chunks of equal size. The equal size matters because if the chunks differ in size, you will end up waiting for one process to complete its task. For example, with four processes executed on partitions of equal size, all four finish processing January 2008 at the same time and then together continue to process February 2008. But with partitions of different sizes, the first three processes finish and then wait for the fourth, which takes much longer. The total run time will be dominated by the largest chunk.

To create ranges of equal-sized partitions, use time period and/or dimensions (such as geography) as your partitioning mechanism. If your primary key is an incremental value such as an IDENTITY or another increasing value, you can use a modulo function (a sketch appears after the list below). If you do not have any good partition columns, create a hash of the value of the rows and partition based on the hash value. For more information on hashing and partitioning, refer to the Analysis Services Distinct Count Optimization white paper; while the paper is about distinct count within Analysis Services, the technique of hash partitioning is treated in depth too.

Some other partitioning tips:
  • Use partitioning on your target table. This way you will be able to run multiple versions of the same package, in parallel, that insert data into different partitions of the same table. When using partitioning, the SWITCH statement is your friend. It not only increases parallel load speeds, but also allows you to efficiently transfer data. Please refer to the SQL Server Books Online article Transferring Data Efficiently by Using Partition Switching for more information.
     
  • As implied above, you should design your package to take a parameter specifying which partition it should work on. This way, you can have multiple executions of the same package, all with different parameter and partition values, so you can take advantage of parallelism to complete the task faster.
     
  • From the command line, you can run multiple executions by using the “START” command. A quick code example of running multiple robocopy statements in parallel can be found within the Sample Robocopy Script to custom synchronize Analysis Services databases technical note.
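A hedged sketch of the modulo approach (table, column, and parameter names are illustrative): each execution of the package receives a chunk number, typically mapped from an SSIS variable, and reads only its share of the rows:

    -- Run N copies of the package; pass @ChunkCount = N and a different
    -- @ChunkNumber (0 .. N-1) to each execution.
    DECLARE @ChunkCount  int = 4;
    DECLARE @ChunkNumber int = 0;

    SELECT  SalesOrderID, OrderDate, CustomerID, TotalDue
    FROM    dbo.FactSales
    WHERE   SalesOrderID % @ChunkCount = @ChunkNumber;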

9. Minimize logged operations.
When you insert data into your target SQL Server database, use minimally logged operations if possible. When data is inserted into the database in fully logged mode, the log will grow quickly because each row entering the table also goes into the log.

Therefore, when designing Integration Services packages, consider the following:
  • Try to perform your data flows in bulk mode instead of row by row. By doing this in bulk mode, you will minimize the number of entries that are added to the log file. This reduction will improve the underlying disk I/O for other inserts and will minimize the bottleneck created by writing to the log.
     
  • If you need to perform delete operations, organize your data so that you can TRUNCATE the table instead of running a DELETE. A DELETE places an entry in the log for each row deleted, while a TRUNCATE simply removes all of the data in the table with a small log entry representing the fact that the TRUNCATE occurred. Contrary to popular belief, a TRUNCATE statement can participate in a transaction (see the sketch after this list).
     
  • Use the SWITCH statement and partitioning. If partitions need to be moved around, you can use the SWITCH statement (to switch in a new partition or switch out the oldest partition), which is a minimally logged statement.
     
  • Be careful when using DML statements; if you mix DML statements in with your INSERT statements, minimal logging is suppressed.
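A minimal sketch of the TRUNCATE and SWITCH patterns above (the table names and partition number are hypothetical):

    -- Remove all rows with a minimally logged operation instead of DELETE.
    TRUNCATE TABLE dbo.StageSales;

    -- After reloading and indexing the work table, switch it into the target table.
    -- The work table must match the target's structure, indexes, constraints, and filegroup.
    ALTER TABLE dbo.StageSales SWITCH TO dbo.FactSales PARTITION 5;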

10. Schedule and distribute it correctly.
After your problem has been chunked into manageable sizes, you must consider where and when these chunks should be executed. The goal is to avoid one long running task dominating the total time of the ETL flow.

A good way to handle execution is to create a priority queue for your package and then execute multiple instances of the same package (with different partition parameter values). The queue can simply be a SQL Server table. Each package should include a simple loop in the control flow:
  1.  Pick a relevant chunk from the queue:
    1. “Relevant” means that it has not already been processed and that all chunks it depends on have already run.
    2. If no item is returned from the queue, exit the package.
  2. Perform the work required on the chunk.
  3. Mark the chunk as “done” in the queue.
  4. Return to the start of the loop.
Picking an item from the queue and marking it as “done” (steps 1 and 3 above) can be implemented as stored procedures, for example (see the sketch below).

The queue acts as a central control and coordination mechanism, determining the order of execution and ensuring that no two packages work on the same chunk of data. Once you have the queue in place, you can simply start multiple copies of DTEXEC to increase parallelism.
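A minimal sketch of such a queue, assuming hypothetical table and procedure names; the READPAST hint lets parallel package executions skip chunks that another execution has already claimed:

    CREATE TABLE dbo.WorkQueue
    (
        ChunkId      int IDENTITY(1,1) PRIMARY KEY,
        PartitionKey nvarchar(50) NOT NULL,
        Status       tinyint      NOT NULL DEFAULT 0   -- 0 = pending, 1 = in progress, 2 = done
    );
    GO

    CREATE PROCEDURE dbo.GetNextChunk
    AS
    BEGIN
        SET NOCOUNT ON;
        -- Atomically claim one pending chunk and return it to the package.
        UPDATE TOP (1) q
           SET Status = 1
        OUTPUT inserted.ChunkId, inserted.PartitionKey
          FROM dbo.WorkQueue AS q WITH (ROWLOCK, UPDLOCK, READPAST)
         WHERE Status = 0;
    END;
    GO

    CREATE PROCEDURE dbo.MarkChunkDone
        @ChunkId int
    AS
        UPDATE dbo.WorkQueue SET Status = 2 WHERE ChunkId = @ChunkId;
    GO

Each package execution would call dbo.GetNextChunk at the top of its loop, exit when no row is returned, and call dbo.MarkChunkDone when the chunk is finished.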

Improving the Performance of the Data Flow



SQL Server 2008

This topic provides suggestions about how to design Integration Services packages to avoid common performance issues. This topic also provides information about features and tools that you can use to troubleshoot the performance of packages.

To configure the Data Flow task for better performance, you can configure the task's properties, adjust buffer size, and configure the package for parallel execution.

Configure the Properties of the Data Flow Task

You can configure the following properties of the Data Flow task, all of which affect performance:
  • Specify the locations for temporary storage of buffer data (BufferTempStoragePath property) and of columns that contain binary large object (BLOB) data (BLOBTempStoragePath property). By default, these properties contain the values of the TEMP and TMP environment variables. You might want to specify other folders to put the temporary files on a different or faster hard disk drive, or to spread them across multiple drives. You can specify multiple directories by delimiting the directory names with semicolons.
  • Define the default size of the buffer that the task uses, by setting the DefaultBufferSize property, and define the maximum number of rows in each buffer, by setting the DefaultBufferMaxRows property. The default buffer size is 10 megabytes, with a maximum buffer size of 100 megabytes. The default maximum number of rows is 10,000.
  • Set the number of threads that the task can use during execution, by setting the EngineThreads property. This property provides a suggestion to the data flow engine about the number of threads to use. The default is 5, with a minimum value of 3. However, the engine will not use more threads than it needs, regardless of the value of this property. The engine may also use more threads than specified in this property, if necessary to avoid concurrency issues.
  • Indicate whether the Data Flow task runs in optimized mode (RunInOptimizedMode property). Optimized mode improves performance by removing unused columns, outputs, and components from the data flow.

Adjust the Sizing of Buffers

The data flow engine begins the task of sizing its buffers by calculating the estimated size of a single row of data. Then it multiplies the estimated size of a row by the value of DefaultBufferMaxRows to obtain a preliminary working value for the buffer size.
  • If the result is more than the value of DefaultBufferSize, the engine reduces the number of rows.
  • If the result is less than the internally-calculated minimum buffer size, the engine increases the number of rows.
  • If the result falls between the minimum buffer size and the value of DefaultBufferSize, the engine sizes the buffer as close as possible to the estimated row size times the value of DefaultBufferMaxRows.
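For example (the row sizes here are purely illustrative): with an estimated row size of 1,200 bytes and the default DefaultBufferMaxRows of 10,000, the preliminary value is 1,200 × 10,000 = 12,000,000 bytes, which exceeds the default DefaultBufferSize of 10 megabytes, so the engine reduces the number of rows per buffer to roughly 8,700. With a 500-byte row, 500 × 10,000 = 5,000,000 bytes fits comfortably within the default, so each buffer holds the full 10,000 rows.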
When you begin testing the performance of your data flow tasks, use the default values for DefaultBufferSize and DefaultBufferMaxRows. Enable logging on the data flow task, and select the BufferSizeTuning event to see how many rows are contained in each buffer.
Before you begin adjusting the sizing of the buffers, the most important improvement that you can make is to reduce the size of each row of data by removing unneeded columns and by configuring data types appropriately.
When sufficient memory is available, you should use a smaller number of large buffers, rather than a larger number of small buffers. In other words, you can improve performance by reducing the total number of buffers required to hold your data, and by fitting as many rows of data into a buffer as possible. To determine the optimum number of buffers and their size, experiment with the values of DefaultBufferSize and DefaultBufferMaxRows while monitoring performance and the information reported by the BufferSizeTuning event.
Do not increase buffer size to the point where paging to disk starts to occur. Paging to disk hinders performance more than a buffer size that has not been optimized. To determine whether paging is occurring, monitor the "Buffers spooled" performance counter in the Performance snap-in of the Microsoft Management Console (MMC). 

Configure the Package for Parallel Execution

Parallel execution improves performance on computers that have multiple physical or logical processors. To support parallel execution of different tasks in the package, Integration Services uses two properties: MaxConcurrentExecutables and EngineThreads.

The MaxConcurrentExecutables Property

The MaxConcurrentExecutables property is a property of the package itself. This property defines how many tasks can run simultaneously. The default value is -1, which means the number of physical or logical processors plus 2.
To understand how this property works, consider a sample package that has three Data Flow tasks. If you set MaxConcurrentExecutables to 3, all three Data Flow tasks can run simultaneously. However, assume that each Data Flow task has 10 source-to-destination execution trees. Setting MaxConcurrentExecutables to 3 does not ensure that the execution trees inside each Data Flow task run in parallel.

The EngineThreads Property

The EngineThreads property is a property of each Data Flow task. This property defines how many threads the data flow engine can create and run in parallel. The EngineThreads property applies equally to both the source threads that the data flow engine creates for sources and the worker threads that the engine creates for transformations and destinations. Therefore, setting EngineThreads to 10 means that the engine can create up to ten source threads and up to ten worker threads.
To understand how this property works, consider the sample package with three Data Flow tasks. Each Data Flow task contains ten source-to-destination execution trees. If you set EngineThreads to 10 on each Data Flow task, all 30 execution trees can potentially run simultaneously.

To configure individual data flow components for better performance, there are some general guidelines that you can follow. There are also specific guidelines for each type of data flow component: source, transformation, and destination.

General Guidelines

Regardless of the data flow component, there are two general guidelines that you should follow to improve performance: optimize queries and avoid unnecessary sorting.

Optimize Queries

A number of data flow components use queries, either when they extract data from sources, or in lookup operations to create reference tables. The default query uses the SELECT * FROM <tableName> syntax. This type of query returns all the columns in the source table. Having all the columns available at design time makes it possible to choose any column as a lookup, pass-through, or source column. However, after you have selected the columns to be used, you should revise the query to include only those selected columns. Removing superfluous columns makes the data flow in a package more efficient because fewer columns create a smaller row. A smaller row means that more rows can fit into one buffer, and less work is required to process all the rows in the dataset.
To construct a query, you can type the query or use Query Builder.
Note
When you run a package in Business Intelligence Development Studio, the Progress tab of SSIS Designer lists warnings. These warnings include identifying any data column that a source makes available to the data flow, but is not subsequently used by downstream data flow components. You can use the RunInOptimizedMode property to remove these columns automatically.

Avoid Unnecessary Sorting

Sorting is inherently a slow operation, and avoiding unnecessary sorting can enhance the performance of the package data flow.
Sometimes the source data has already been sorted before being used by a downstream component. Such pre-sorting can occur when the SELECT query used an ORDER BY clause or when the data was inserted into the source in sorted order. For such pre-sorted source data, you can provide a hint that the data is sorted, and thereby avoid the use of a Sort transformation to satisfy the sorting requirements of certain downstream transformations. (For example, the Merge and Merge Join transformations require sorted inputs.) To provide a hint that the data is sorted, you have to do the following tasks:
  • Set the IsSorted property on the output of an upstream data flow component to True.
  • Specify the sort key columns on which the data is sorted.
For more information, see How to: Sort Data for the Merge and Merge Join Transformations.
If you have to sort the data in the data flow, you can improve performance by designing the data flow to use as few sort operations as possible. For example, if the data flow uses a Multicast transformation to copy the dataset, sort the dataset once before the Multicast transformation runs instead of sorting each of the multiple outputs after the transformation.

Sources

OLE DB Source

When you use an OLE DB source to retrieve data from a view, select "SQL command" as the data access mode and enter a SELECT statement. Accessing data by using a SELECT statement performs better than selecting "Table or view" as the data access mode.

Transformations

Use the suggestions in this section to improve the performance of the Aggregate, Fuzzy Lookup, Fuzzy Grouping, Lookup, Merge Join, and Slowly Changing Dimension transformations.

Aggregate Transformation

The Aggregate transformation includes the Keys, KeysScale, CountDistinctKeys, and CountDistinctScale properties. These properties improve performance by enabling the transformation to preallocate the amount of memory that the transformation needs for the data that the transformation caches. If you know the exact or approximate number of groups that are expected to result from a Group by operation, set the Keys and KeysScale properties, respectively. If you know the exact or approximate number of distinct values that are expected to result from a Distinct count operation, set the CountDistinctKeys and CountDistinctScale properties, respectively.
If you have to create multiple aggregations in a data flow, consider creating multiple aggregations that use one Aggregate transformation instead of creating multiple transformations. This approach improves performance when one aggregation is a subset of another aggregation because the transformation can optimize internal storage and scan incoming data only once. For example, if an aggregation uses a GROUP BY clause and an AVG aggregation, combining them into one transformation can improve performance. However, performing multiple aggregations within one Aggregate transformation serializes the aggregation operations, and therefore might not improve performance when multiple aggregations must be computed independently.
For more information, see Aggregate Transformation.

Fuzzy Lookup and Fuzzy Grouping Transformations

For information about optimizing the performance of the Fuzzy Lookup and Fuzzy Grouping transformations, see the white paper, Fuzzy Lookup and Fuzzy Grouping in SQL Server Integration Services 2005.

Lookup Transformation

Minimize the size of the reference data in memory by entering a SELECT statement that looks up only the columns that you need. This option performs better than selecting an entire table or view, which returns a large amount of unnecessary data.

Merge Join Transformation

The Merge Join transformation includes the MaxBuffersPerInput property, which suggests the maximum number of buffers that you want to be active for each input at the same time. However, the value that you set for the MaxBuffersPerInput property is only a suggestion. The Merge Join transformation almost always has to increase the number of buffers to a value that is larger than this suggestion. This increase is typically necessary for the following reasons:
  • To obtain enough data for merging from each input.
  • To avoid the risk of a deadlock between the threads that are processing the data.
Note
The number of buffers that the Merge Join transformation uses can become very large, which in turn can cause Integration Services to consume a large amount of memory on the computer.
When the transformation has enough data from each input, and the danger of a threading deadlock is not present, the transformation resumes using the value that the MaxBuffersPerInput property suggests.
The default value of the MaxBuffersPerInput property is 5, which is the number of buffers that works well in most scenarios. To improve performance, you might try increasing the number of buffers. To reduce memory pressure, you might try decreasing the number of buffers. However, a very small number of buffers might adversely affect performance, and a value of 0 (zero) disables all throttling and should not be used.

Slowly Changing Dimension Transformation

The Slowly Changing Dimension Wizard and the Slowly Changing Dimension transformation are general-purpose tools that meet the needs of most users. However, the data flow that the wizard generates is not optimized for performance.
Typically, the slowest components in the Slowly Changing Dimension transformation are the OLE DB Command transformations that perform UPDATEs against a single row at a time. Therefore, the most effective way to improve the performance of the Slowly Changing Dimension transformation is to replace the OLE DB Command transformations. You can replace these transformations with destination components that save all rows to be updated to a staging table. Then, you can add an Execute SQL task that performs a single set-based Transact-SQL UPDATE against all rows at the same time.
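As a hedged sketch of that staging approach (the dimension, staging table, and column names are hypothetical), the Execute SQL task could run a single set-based statement such as:

    -- Apply all pending Type 1 attribute changes in one statement
    -- instead of one OLE DB Command execution per row.
    UPDATE d
       SET d.City        = s.City,
           d.PhoneNumber = s.PhoneNumber
      FROM dbo.DimCustomer         AS d
      JOIN staging.CustomerUpdates AS s
        ON s.CustomerBusinessKey = d.CustomerBusinessKey;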
Advanced users can design a custom data flow for slowly changing dimension processing that is optimized for large dimensions. For a discussion and example of this approach, see the section, "Unique dimension scenario," in the white paper, Project REAL: Business Intelligence ETL Design Practices.

Destinations

To achieve better performance with destinations, consider using a SQL Server destination and testing the destination's performance.

SQL Server Destination

When a package loads data to an instance of SQL Server on the same computer, use a SQL Server destination. This destination is optimized for high-speed bulk loads.

Test the Performance of Destinations

You may find that saving data to destinations takes more time than expected. To identify whether the slowness is caused by the inability of the destination to process data quickly enough, you can temporarily replace the destination with a Row Count transformation. If the throughput improves significantly, it is likely that the destination that is loading the data is causing the slowdown.

Integration Services includes tools and features that you can use to monitor the performance of a package. For example, logging captures run-time information about a package, and performance counters let you monitor the data flow engine. Use the following suggestions to determine which parts of the package are having the greatest effect on performance. 

Review the Information on the Progress Tab

SSIS Designer provides information about both control flow and data flow when you run a package in Business Intelligence Development Studio. The Progress tab lists tasks and containers in order of execution and includes start and finish times, warnings, and error messages for each task and container, including the package itself. It also lists data flow components in order of execution and includes information about progress, displayed as percentage complete, and the number of rows processed.
To enable or disable the display of messages on the Progress tab, toggle the Debug Progress Reporting option on the SSIS menu. Disabling progress reporting can help improve performance while running a complex package in BI Development Studio.

Configure Logging in the Package

Integration Services includes a variety of log providers that allow packages to log information at run time to different types of files, or to SQL Server. You can enable log entries for packages and for individual package objects such as tasks and containers. Integration Services includes a wide variety of tasks and containers, and each task and container has its own set of descriptive log entries. For example, a package that includes an Execute SQL task can write a log entry that lists the SQL statement that the task executed, including parameter values for the statement.
The log entries include information such as the start and finish times of packages and package objects, making it possible to identify slow running tasks and containers. For more information, see Logging Package Execution, Implementing Logging in Packages, and Custom Messages for Logging.

Configure Logging for Data Flow Tasks

The Data Flow task provides many custom log entries that can be used to monitor and adjust performance. For example, you can monitor components that might cause memory leaks, or keep track of how long it takes to run a particular component. For a list of these custom log entries and sample logging output, see Data Flow Task.

Use the PipelineComponentTime Event

Perhaps the most useful custom log entry is the PipelineComponentTime event. This log entry reports the number of milliseconds that each component in the data flow spends on each of the five major processing steps. The following list describes these processing steps. Integration Services developers will recognize these steps as the principal methods of a PipelineComponent.
  • Validate: The component checks for valid property values and configuration settings.
  • PreExecute: The component performs one-time processing before it starts to process rows of data.
  • PostExecute: The component performs one-time processing after it has processed all rows of data.
  • ProcessInput: The transformation or destination component processes the incoming rows of data that an upstream source or transformation has passed to it.
  • PrimeOutput: The source or transformation component fills the buffers of data to be passed to a downstream transformation or destination component.
When you enable the PipelineComponentTime event, Integration Services logs one message for each processing step performed by each component. The following log entries show a subset of the messages that the Integration Services CalculatedColumns package sample logs:
The component "Calculate LineItemTotalCost" (3522) spent 356 milliseconds in ProcessInput.
The component "Sum Quantity and LineItemTotalCost" (3619) spent 79 milliseconds in ProcessInput.
The component "Calculate Average Cost" (3662) spent 16 milliseconds in ProcessInput.
The component "Sort by ProductID" (3717) spent 125 milliseconds in ProcessInput.
The component "Load Data" (3773) spent 0 milliseconds in ProcessInput.
The component "Extract Data" (3869) spent 688 milliseconds in PrimeOutput filling buffers on output "OLE DB Source Output" (3879).
The component "Sum Quantity and LineItemTotalCost" (3619) spent 141 milliseconds in PrimeOutput filling buffers on output "Aggregate Output 1" (3621).
The component "Sort by ProductID" (3717) spent 16 milliseconds in PrimeOutput filling buffers on output "Sort Output" (3719).
These log entries show that the data flow task spent the most time on the following steps, shown here in descending order:
  • The OLE DB source that is named "Extract Data" spent 688 ms. loading data.
  • The Derived Column transformation that is named "Calculate LineItemTotalCost" spent 356 ms. performing calculations on incoming rows.
  • The Aggregate transformation that is named "Sum Quantity and LineItemTotalCost" spent a combined 220 ms—141 in PrimeOutput and 79 in ProcessInput—performing calculations and passing the data to the next transformation.

Monitor the Performance of the Data Flow Engine

Integration Services includes a set of performance counters for monitoring the performance of the data flow engine. For example, you can track the total amount of memory, in bytes, that all buffers use and check whether components are out of memory. A buffer is a block of memory that a component uses to store data. For more information, see Monitoring the Performance of the Data Flow Engine.


Stay Up to Date with Integration Services
For the latest downloads, articles, samples, and videos from Microsoft, as well as selected solutions from the community, visit the Integration Services page on MSDN or TechNet.
For automatic notification of these updates, subscribe to the RSS feeds available on the page.