|
|
Chapter 7 - Your First Script Component
Wednesday, July 6, 2005
|
|
|
To order this book or to find additional details, go to www.rationalpress.com.
The Integration Services feature of Microsoft® SQL Server 2005 brings together data from diverse sources in a high-performance data integration platform, to enable users to work across multiple applications that may not have been designed to cooperate. In The Rational Guide To Scripting SQL Server 2005 Integration Services Beta Preview, Donald Farmer, Microsoft's Group Program Manager for SQL Server Integration Services, clearly explains how to build practical and useful scripts for this exciting new SQL Server 2005 feature. His systematic approach, friendly tutorials, and numerous examples are suitable for both developers and IT professionals alike. You'll learn how scripting in SQL Server Integration Services can enable you to easily add new, fully-integrated functionality to achieve your enterprise data integration needs.
This book takes a rational, no-nonsense approach in a compact guide - only 224 pages. The book is written to give you the basics…fast!
Technical accuracy is assured by Ashvini Sharma, Development Lead, SQL Server Integration Services, Microsoft Corporation.
Copyright © 2005 by Mann Publishing Group.
www.mannpublishing.com
By Donald Farmer
In Chapter 2, you learned about the three types of Data Flow components in SSIS: sources,
transformations and destinations. This chapter describes these different types and show you
how to create your first transformation component using script. Once you have mastered
the Script Component, you will find that you can readily integrate almost any functionality
into the SSIS Data Flow. Custom scripted data sources can handle otherwise unsupported
file formats; custom transformations can call functions in managed assemblies, including
the .NET Framework; and custom scripted destinations enable SSIS to output data in very
flexible ways. In fact, it is quite possible to write sophisticated data flows for ETL and data
integration using only script components!
Script Component Types
You will typically use script somewhat differently in each of the three component types.
Script Source Component
Source components have no input columns, but do have output columns. The purpose of
the Script Component in this case is to deliver data to the output columns. For example,
the component author could write a script using file and string handling functions to parse
a complex text file whose format is not suitable for parsing with the Flat File Connection
Manager.
Script Transformation Component
Transformation components have input columns and output columns. In these components,
the script typically will transform the data in some way between inputs and outputs. What
these transformations are is up to you-that is why scripts are so fl exible. Chapter 2
discussed two different patterns of transformation: synchronous and asynchronous. Script
components can handle both patterns quite easily.
A synchronous component, you will remember, is particularly useful for row-by-row
transformations. For example, I may have incoming data that includes customer names.
Perhaps for easy cross-referencing with other customer records, I would like to calculate
a SOUNDEX value for each customer name. SOUNDEX is an algorithm invented by
the US Census for codifying names to take account of different spellings. T-SQL has a
SOUNDEX function, but SSIS does not.
An SSIS developer can code a SOUNDEX function in Visual Basic .NET script. With
this, the script can transform every incoming customer name and emit a SOUNDEX value
at the output.

An asynchronous component is useful for performing operations which change the shape
of the data signifi cantly, or where incoming rows do not have related rows at the output.
A good use of an asynchronous script would be to aggregate text. The SSIS Aggregate
component is very powerful, but it can only perform Min and Max calculations against
numeric columns. If you need to be able to calculate the Min and Max values of a string
column, this can easily be achieved in Visual Basic .NET script, as we shall see in Chapter
8.
Script Destination Component
As mentioned previously, it is possible to have a text fi le that the Flat File Connection
Manager cannot parse, but which a script source component can handle. In an enterprise
where text fi les like this are important to legacy applications, you may also need the
ability to write data to a fi le in this format for the legacy application to read. The script
destination component is useful in these circumstances. You will have guessed by now
that a script destination component has input columns but no output columns. Instead, the
Visual Basic .NET script handles the data, perhaps using fi le and string routines from the
.NET Framework to output text fi les in the appropriate format.
Adding a Script Component to Your Package
The first Script Component we are going to look at it is a transformation component. In fact,
we are going to build a simple SOUNDEX component to transform a column containing a
name to a codifi ed value representing the sound of the name, just as the US Census would
do. As this fi rst component is a transformation, it requires some data to work with. We can
quickly build a package containing a Data Flow and a source component to get started.

Preparing the Package
Use the following steps to prepare the package:
- Create a new SSIS package.
- In the designer, drag a Data Flow Task from the Control Flow Items tab of
the Toolbox to the Control Flow design surface.
- Double-click the Data Flow Task to open the Data Flow design surface.
- Drag an OLE DB Source component from the Data Flow Sources tab of
the Toolbox.
- Double-click the OLE DB Source component shape on the design surface to
open the OLE DB Source Editor.
- Click the New button to create a new OLE DB Connection Manager.
- Select an existing connection to your AdventureWorksDW database, or
create a new one now. This assumes you have installed the sample databases
with SQL Server.
- Click the OK button to return to the OLE DB Source Editor.
- Select the Table or View Data Access Mode.
- Select the [dbo].[DimCustomer] table. (DimCustomer stands for Customer
Dimension and is not a reflection on the intelligence of AdventureWorks
customers! DimEmployee, however, may be a different matter.)
- Click the Preview button if you would like to see the data in this table. Note
the Last Name field that we shall be using later.
- Click the OK button to close the OLE DB Source Editor.
Now we have some source data to work with. At this stage, your package design should
look like the example in Figure 7.1.

Adding the Script Component
- In the designer, drag a Script Component from the Data Flow
Transformations tab of the Toolbox to the Data Flow design surface.
- When you drag the Script Component, the Select Script Component Type
Dialog will appear, prompting you to select the kind of component you
want to create (see Figure 7.2). This is important, as the configuration of the
Script Component is somewhat different for each type. This dialog box sets
up the component for you automatically, saving some work in creating this
configuration yourself.

- Select Transformation (the default) and click the OK button.
- Now connect the output of the OLE DB Source to the Script Component.
At this stage your package should look like Figure 7.3 Note the warning icon
that appears in the Script Component. You can mouse over that to see the
text of the warning. In this case, we have not yet added any script code, so
the component at this point is in an invalid state.

Adding Columns to Your Script Component
The next step is to tell your Script Component which input columns to work with. The Script
Component requires you explicitly to say in advance which columns are required. This
is because the component creates a wrapper that exposes these columns to the scripting
environment. It would be expensive and redundant to expose all the input columns by
default when you may only be using one of them.
Selecting the input columns is easy. Just follow these steps:
- Double-click the Script Component shape on the design surface to open the
Script Transformation Editor.
- The default view is of the Input Columns tab. In this case, check the box
next to the Last Name column. The component editor should now look like
Figure 7.4.

Now we can use the Last Name column in our component. Note that we have left the
Usage Type of the column as Read Only. This is because we do not want to alter the last
name. Instead, we want to calculate a new column from the values in this column. But
where will these new values go? We need to create a new column to hold these new values.
This new column will not appear at the input. It is being calculated by the script, so it can
only appear at the output.
We should add an Output Column, as follows:
- Select the Inputs and Outputs tab of the Script Transformation Editor.
- Expand the Output node of the Inputs and Outputs tree view.
- Select the Output Columns folder under the Output node.
- Click the Add Column button. This will add a new Output Column, named
Column by default. The data type will be a four-byte signed integer. You can
see this in Figure 7.5.

This default column is not quite what we need. We can, however, edit its
properties.
- Edit the Name property of the column to be Soundex. The column name
should change in the tree view, too.
- SOUNDEX codes are strings, so select the Data Type property of the
column and use the drop-down list to select the type String [DT_STR].

At this stage, your component should look Figure 7.6.

We have selected an Input Column (LastName) and created an Output Column (Soundex)
to hold the transformed value. Now we are ready to write some code!
Elements of the Script
Navigate to the Script tab of the component. The component editor will now look like
Figure 7.7

There are some useful properties available in the property grid, but they are, in fact,
optional. However, it is a good idea to set the PreCompile property to True.

To get started, just click the Design Script button. The VSA editor will open, looking like
Figure 7.8

In Chapter 3, you learned about the VSA environment and its various elements. These
should be familiar to you now, but some aspects of the script for a Script Component
will certainly be new, compared to the Script Task we looked at earlier. First, look at the
Imports statements for the Script Component in Listing 7.1.
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Listing 7.1: Imports Statements for a Script Component.
You can see that rather than importing the Dts.Runtime namespace, there are two
"wrappers" to be imported. These wrappers enable a script within a Data Flow Task to
"see out" from the task to objects such as variables. In Chapter 8, you will learn that this
introduces a slightly different way of accessing variables.
In the Script Task, the Main() subroutine was where most of the work was done, but here
in the Script Component, we see something a little more involved, as in Listing 7.2
Public Class ScriptMain
Inherits UserComponent
Public Overrides Sub Input_ProcessInputRow(ByVal Row As InputBuffer)
'
' Add your code here
'
End Sub
End Class
Listing 7.2: ScriptMain Class in the Script Component.
The ScriptMain class performs those duties we mentioned earlier – wrapping and
exposing the Data Flow columns and objects to the script. Within ScriptMain you can
see the following subroutine: Input_ProcessInputRow(ByVal Row As InputBuffer)
does this mean? We will cover this in more detail later, but for now, all we need to know is
that when the input to the Script Component is processed, SSIS in effect calls this routine
for every row. In practice, this means that we can immediately start writing useful script
code.
Editing the Script Component
The example we are going to implement is a function to convert the customer's last name
to a SOUNDEX code. Listing 7.3 shows the function that we will use for SOUNDEX,
written in Visual Basic .NET code.
Function CalcSoundex(ByVal sName As String) As String
Dim i, acode, dcode, prevCode As Integer
Const codes As String = "01230120022455012623010202"
SName = UCase(SName)
CalcSoundex = Left(SName, 1)
prevCode = Asc(Mid(codes, Asc(sName) - 64))
For i = 2 To Len(SName)
acode = Asc(Mid(SName, i, 1)) - 64
' we are not interested in symbols or numbers
If acode >= 1 And acode <= 26 Then
' convert the character to a digit based on soundex code
dcode = Asc(Mid(codes, acode, 1))
' ignore repeats
If dcode <> 48 And dcode <> prevCode Then
CalcSoundex = CalcSoundex & Chr(dcode)
If Len(CalcSoundex) = 4 Then Exit For
End If
prevCode = dcode
End If
Next
End Function
Listing 7.3: A Simple SOUNDEX Function in VB.NET Script.
We do not really need to understand much about the internals of this function for this
example. It returns a code based on the consonants in a string according to the SOUNDEX
algorithm described earlier. For now, write this function into your script code, immediately
after the Input_ProcessInputRow function, but still within ScriptMain().
The next step is to reference the function in our script. For each row coming in to the
component (in the input buffer) we will set the value of the Soundex column to the
value returned by CalcSoundex based on the LastName column. Just below the line Add
your code here, type: Row. That is, Row followed by a period. As shown in Figure 7.9,
IntelliSense (within the editor) offers a list of options to complete this object, including the
available columns (in this case LastName and Soundex).

In this case, select Soundex and complete the rest of the statement as shown in Listing 7.4,
using IntelliSense if you like to complete the other references in this statement:
Row.Soundex = CalcSoundex(Row.LastName)
That is all! Your script is ready to run. Close the scripting environment. SSIS automatically
saves the script for you. Now click the OK button in the Script Transformation Editor and
return to the Data Flow design surface. The validation icon has now gone-your script is
valid and you are ready to complete the Data Flow for testing.
Debugging the Script Component
In Chapter 5, you learned how to debug the Script Task by setting breakpoints within the
script itself. The VSA design environment does enable you to set breakpoints in a Script
Component, but the SSIS designer and debugger ignores them when executing a package.
There are a number of reasons for this, but it is mainly due to the nature of the Script
Component , which (unlike the Script Task) does not call a script only once, but many
times within an execution.
However, SSIS does provide some very elegant techniques for debugging Data Flows
in general, and we can use these techniques to understand if our script is performing as
expected. Later we will look at more fine-grained, row-by-row debugging. For now, we
will see how to debug the script working against the Data Flow as many rows pass through
it.
Debugging a Data Flow with a RowCount Component
You might think that you could execute the SSIS package right now, with just an OLE DB
Source and a Script Component. In fact, this is possible, but not much will happen. If you
were to try it, you would see no activity. You could find the explanation on the Progress
tab of the designer, where you would see the following message:
Warning: Source "OLE DB Source Output" () will not be
read because none of its data ever becomes visible
outside the Data Flow Task.
SSIS has recognized that your data is not going anywhere, so it has optimized the Data
Flow by removing the redundant components! Of course, the data from your Script
Component (including your Soundex column) is not going anywhere, so you have no
opportunity to see this data.
In most data-integration applications, you would send the data to a temporary destination
in order to examine it and see if the process had transformed the data correctly. SSIS
provides an excellent way to debug your data without having to create temporary tables
or files.
The Row Count component simply counts the passing rows wherever you add it to the
Data Flow. This component can be very useful for debugging. It gives you a count of the
rows at a particular point and SSIS will not optimize it out of the Data Flow. This enables
you to execute the Data Flow with a Row Count component at the end of the flow, even
though you have added no destination. In other words, you can execute your SSIS package
and debug it, even though the data is not going anywhere-it is just counting rows!
In Chapter 5, you learned how to create SQL Server Integration Services variables. Create
a variable now, named RowCount of Int32 type, as follows:
- Select SSIS
Variables to show the Variables window.
- Click the New Variable button to create a new variable. This variable will
be of type Int32 by default.
- Rename it to RowCount (see Figure 7.10). You can now close the Variables
window if you like.

- Drag a Row Count Component from the Data Flow Transformations tab
of the Toolbox to the Data Flow design surface.
- Now connect the output of the Script Component to the Row Count
Component.
- Edit the Row Count component by double-clicking the component shape in
the designer.
- In the Advanced Editor for Row Count, set the VariableName property to
the name of the variable that you created: RowCount.
If you were to execute the package now, the number of rows passing through the Row
Count component would be written to the named variable.

Now that you can execute the package and the Data Flow within it, you can start to debug
the output of the script. You shall see whether the script has calculated the SOUNDEX
value correctly. The best way to see the output from the Script Component is to add a data
viewer to the Data Flow on the path between the output of the Script Component and the
input of the Row Count Component.
- Right-click the line between path between the output of the Script
Component and the input of the Row Count Component.
- Select Data Viewers.
- The Data Flow Path Editor will appear, with the Data Viewers tab visible
by default (see Figure 7.11).

- Click the Add... button in the Data Flow Path Editor to show the
Configure Data Viewer dialog box (see Figure 7.12).

- Select the Grid type of Data Viewer, then click the Grid tab.
- The Grid tab shows all the columns currently available selected on the right
hand side. In fact, we only need to see the LastName and Soundex columns
in order to debug our script. Use the arrow buttons to move all the other rows
to the Unused Columns list on the left-hand side. In fact, it may be easier to
move all the columns over to the left-hand side and reselect LastName and
Soundex (see Figure 7.13).

- Click the button to close the Configure Data Viewer dialog, and click
OK again to close the Data Flow Path Editor.
Now the Data Flow is ready for debugging. The Data Flow designer shows the Data
Viewer on the path between the Script Component and the Row Count Component with a
"spectacles" icon (see Figure 7.14).

Running a Data Flow with a Script Component
To execute the package, and thus the Data Flow, you can use the F5 key, or select
Debug Start Debugging from the main menu, or right-click the package itself in the
Solution Explorer and select Execute Package. When the package executes, you will
notice that the >Data Viewer window appears. You can drag and dock this Data Viewer to
any size or convenient location in your designer or on your screen.
On execution, the Data Flow task reads a buffer of data from the OLE DB Source
component. The Script Component acts on this buffer, transforming each row using our
function. The Row Count component should next count these rows, but before it can do so,
the Data Viewer will pause execution of the Data Flow and the package.
At this point, all the components in the flow will be colored yellow, to indicate that they
are in progress and have not yet completed. When the Data Viewer has received a buffer of
data, it displays the columns you selected in the viewer. As shown in Figure 7.15, the grid
now shows the results of our SOUNDEX calculation for the first buffer of data-4872
rows.

You can examine all the values to ensure that the function is working correctly. In this
example they certainly appear to be correct. If necessary, you can even copy the data to
the clipboard and paste it into other applications, such as Microsoft Excel, for further
analysis.
To see the next buffer of data, click the green Continue button in the top left-hand corner
of the Data Viewer. The Data Viewer will show the next buffer of data and pause again.
Alternatively, if you have finished examining the data, you could click the Detach button
and the flow will continue without pausing. Or, you could just close the Data Viewer.

When the Data Flow is complete, all the components in the Data Flow should be colored
green to show that they have all completed successfully. You can now stop debugging,
using F5 if you like, just like any other SQL SSIS package.
Summary
As you can see, script components are versatile. They are easy to add to your Data Flow,
quite simple to program, and give excellent performance. Add to that the power of visual
debugging and you can see why they are such an exciting feature of SSIS. Mastering the
Script Component can be your key to a vast range of data integration functionality.
|
|
|
81 people have rated this page.
Average rating: 4 out of 9
|
PUBLISHED BY
Rational Press - An imprint of the Mann Publishing Group
710 Main Street, 6th Floor
PO Box 580
Rollinsford, NH 03869, USA
www.rationalpress.com
www.mannpublishing.com
+1 (603) 601-0325
Copyright © 2005 by Mann Publishing Group.
All rights reserved. No part of the contents of this book may be reproduced in any form or by any means without the written permission of the publisher. For questions about rights and permissions, send e-mail to permissions@mannpublishing.com.
ISBN: 1-932577-21-1
Library of Congress Control Number (LCCN): 2005928284
Printed and bound in the United States of America.
Trademarks
Mann Publishing, Mann Publishing Group, Agility Press, Rational Press, Inc.Press, NetImpress,
Farmhouse Press, BookMann Press, The Rational Guide To, Rational Guides, ExecuGuide,
AdminExpert, From the Source, the Mann Publishing Group logo, the Agility Press logo, the
Rational Press logo, the Inc.Press logo, Timely Business Books, Rational Guides for a Fast-Paced
World, and Custom Corporate Publications are all trademarks or registered trademarks of Mann
Publishing Incorporated.
All brand names, product names, and technologies presented in this book are trademarks or
registered trademarks of their respective holders.
Disclaimer of Warranty
While the publisher and author(s) have taken care to ensure accuracy of the contents of this book,
they make no representation or warranties with respect to the accuracy or completeness of the
contents of this book and specifically disclaim any implied warranties or merchantability or fitness
for a specific purpose. The advice, strategies, or steps contained herein may not be suitable for your
situation. You should consult with a professional where appropriate before utilizing the advice,
strategies, or steps contained herein. Neither the publisher nor author(s) shall be liable for any
loss of profit or any other commercial damages, including but not limited to special, incidental,
consequential, or other damages.
All Mann Publishing Group books may be purchased at bulk discounts. Visit us on the web at www.mannpublishing.com or call (877) 877-MANN.
|
|