Archive for March, 2007

DTS Designer.

Posted: March 10, 2007 in Uncategorized

Welcome to DTS
A company’s data is its life. Since the evolution of the mainframe and relational databases, companies
have been evolving and increasing their productivity through database products like SQL Server and
Oracle. The problem with evolution, however, is that some database systems are left behind. They are
just too expensive to convert to the newer systems.
In the past, as a company upgraded its database system, a programmer would have to reprogram the
communication layer through complex code. To remedy the time-to-market problem this created,
Microsoft invented different ways to communicate with legacy and modern relational databases via an
open standard layer called OLE DB. This breakthrough allowed programmers to communicate with
IBM’s DB2 database using code similar to that used to communicate with a Microsoft SQL Server
database. Suddenly portability became much easier.
This solved the problem of a program’s portability, but what happens when you have a DB2 database
that you need to convert to a SQL Server database? For example, your company purchases another
company that still uses a legacy VSAM (mainframe driven) system. Your company, however, has its
corporate database infrastructure in SQL Server. Converting the VSAM system may be too expensive
and time consuming. In the meantime, you need the data that a sales representative is entering into a
dumb terminal (DT) from the company you bought to flow to your SQL Server system in the shipping
department. In the past, this process would have taken quite a while to program and employee turnover
would be so bad from the merger that by the time you finished the development of a workflow system,
you would have no one to support it.
This scenario is being repeated throughout the data processing industry daily. As companies merge,
they need a way to rapidly develop an application to transition data from any source to any destination.
Doing this on the mainframe is expensive and the human resources to do this are dwindling.
Developers also need to be able to transform data. This means that you can have the data look one way
on the source and pass the data through a cleansing process to appear a different way on the destination.
We can, for example, perform calculations to generate new values for loading, combine multiple
columns into a single column, or conversely, break a single column into multiple columns. Imagine that
the company you purchased calls sales territories by one set of names, and you’d like to call them by
another; this could cause problems when the data was queried, as the values would appear to belong to
different territories. However, you can allow them to keep their current system and just scrub the data
(that is, make it consistent) as it enters your tracking systems. Another common example is seen when
the fields FirstName and LastName need to become a single field named Name. In either case,
programmers would have had to spend months staring at their computer screens, writing line after line of code to perform even this simple translation.
DTS to the rescue! Data Transformation Services (DTS) was first introduced with the release of SQL
Server 7.0. It was immediately hailed as a revolution in data transformation, since it was, after all, built
into SQL Server.
DTS is a collection of utilities and objects that allow you to import, export, and
convert data from any data source to any data source, whether to/from another
database system (heterogeneous) or to/from another SQL Server. DTS isn’t just about
data! DTS can move SQL Server objects, execute programs, FTP files, and even has
the flexibility to be expanded with your own custom COM components.
Data source here refers to any source in an OLE DB, ODBC, or text file format.
In this chapter we will:
❑ Provide an introduction to DTS
❑ Explain the different types of connection methods to SQL Server using OLE DB and ODBC
❑ Dive into the DTS Architecture
❑ Explore the new DTS features in SQL Server 2000
❑ Examine the tools we can use to create packages, concentrating on the wizards to introduce
the basics of DTS
Let’s start by looking at exactly what we can do with DTS.
The Next Generation in Database Solutions
In SQL Server 6.5, data movement was done through clunky utilities such as Transfer Manager.
Transfer Manager allowed us to transfer data with a single step from one SQL Server to another SQL
Server. However, we had no access to any other data source, such as Oracle or Microsoft Access.
DBAs also had access to the Bulk Copy Program (BCP). BCP performs a non-logged operation that inserts data into SQL Server in bulk fashion without any type of transformation. A BCP file’s columns can be delimited by a character such as a comma or a tab. BCP files can also use fixed column widths to mark where each column begins and ends. The problem with BCPing files was that the column layout in the BCP file and on the server had to match exactly. For example, Column1 in the flat file had to be Column1 in the SQL Server table.
Products such as Data Junction, produced by Data Junction Corporation, were released to fill the void.
Data Junction allowed you to transfer data from just about any Database Management System (DBMS)
or flat file, to any other DBMS. It was an extremely powerful tool, but expensive.
Today, developers and DBAs have access to the same power that existed in Data Junction, but through
a tool that ships with all editions of SQL Server 7.0 and 2000 (including MSDE). DTS can act
independently of SQL Server using a set of programs from a command line, or integrate tightly with
SQL Server. This means that Oracle and Access users, for example, can use DTS for their own data
movement needs without having to buy a SQL Server license. However, if you do have a SQL license,
you get the benefit of using the DTS Designer, which we’ll look at more later on in this chapter. DTS
Designer is an integral part of Enterprise Manager and no standalone program is available. If you don’t
have the DTS Designer, then you will have to develop DTS workflows on a system that has a SQL
Server license and then port them to the other system, or program the workflows using the DTS object
model, which we will discuss later in the book, and a COM compliant programming language. Keep in
mind that if you connect to SQL Server in your workflow, you’ll still need to have a SQL Server Client
Access License (CAL). We will go into further detail about how to make DTS standalone in Chapter 2.
DTS allows users to convert data from any OLE DB compliant data source to any data source. This is
done through a workflow system where you can create a process that sequentially moves data. It also
allows you to expand the functionality by adding your own COM components. These components can
be in any programming language that supports COM, such as Visual C++ or Visual Basic, for example.
What is OLE DB?
OLE DB is an API that allows COM applications, such as DTS, to communicate with almost any data
storage system, including non-relational databases, e-mail, directories, and DTS packages. With OLE
DB, services are made available via a provider, which is a dynamic link library (DLL) that allows you to
connect to a data source. The OLE DB provider actually contains the API that is used by OLE DB.
OLE DB is also extensible. This means that you can install new providers to connect to a countless
number of data sources. For example, you can install the OLE DB provider for OLAP to access Microsoft
Analysis Services cubes. OLE DB is the only connectivity tool that will allow you to connect to such
systems. OLE DB has truly opened the door for open communication between database systems and
allowed applications to become much more portable. For example, a web application could simply change
its connection string in one file (GLOBAL.ASA) to move from connecting to Access to SQL Server.
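As a rough illustration of that kind of switch, here is a minimal VBScript/ADO sketch; the file path, server name, and database name are made up, and the only change needed when moving back ends is the provider and its connection string:

Dim oConn
Set oConn = CreateObject("ADODB.Connection")

' Original connection to an Access (Jet) database:
' oConn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\data\app.mdb"

' After moving to SQL Server, only the connection string changes:
oConn.Open "Provider=SQLOLEDB;Data Source=MyServer;" & _
           "Initial Catalog=AppDb;Integrated Security=SSPI"

oConn.Close
Set oConn = Nothing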
A different type of common communication layer is Open Database Connectivity (ODBC). ODBC was
Microsoft’s first attempt at an open standard of data access. It was widely accepted around the industry
as the data access standard years ago, and even gained acceptance in the UNIX world. An application
that uses ODBC generally connects using Remote Data Objects (RDO). Most new development at
Microsoft surrounds OLE DB because it offers more compatibility with other systems.
One of the OLE DB providers is the OLE DB Provider for ODBC. This substantially expands the list of
database systems you can access. The OLE DB Provider for ODBC is considerably slower than the
straight OLE DB one, however.
OLE DB is able to communicate with almost any system because it exposes the database’s
communication layer to the programmer through open COM APIs. A programmer can easily
communicate with the provider through ActiveX Data Objects (ADO). Because OLE DB uses these lower-level COM API calls, you can communicate with almost any data source.
In the diagram below, you can see the application using ADO to communicate to the database through
OLE DB. Optionally, you can use the OLE DB Provider for ODBC to communicate to systems that
may not have converted to OLE DB yet.
[Figure: an application or browser uses ADO to talk to OLE DB, which connects directly to SQL data (SQL Server, Oracle, Jet, FoxPro, and others) and to non-SQL data (directory services, video, text, and others), and, through the ODBC provider, to mainframe and legacy data.]
Additionally, you can install programs such as Microsoft Host Integration Services (formerly SNA Server) to expand the available providers. For example, after installing Host Integration Services, you can communicate with DB2 and legacy systems. Some providers do take a little more work to configure. By default, the Oracle providers are installed when you install SQL Server 2000; however, you must install the Oracle client tools (SQL*Net) to establish connectivity.
The DTS Package
The core component of DTS is the package. The package is what you create and execute when you are using
DTS. Every other object in DTS is a child to a package. A package is used to accomplish a particular goal
using the various children that it may have. It contains all the connection objects and items that DTS will
need to connect and transform the data. A package may not contain a connection to a database at all, instead
executing a program or an ActiveX script. All of these objects are defined in a simple GUI called the DTS
Designer. You can also design a package in any COM compliant programming language.
DTS packages can be saved to the local SQL Server, the Microsoft Repository (Meta Data Services), Visual Basic files (SQL Server 2000 only), or COM-structured storage files. You can also execute packages without saving them.
It is only compulsory to save packages if you plan to schedule them for later execution.
A package has four major components. The four components, which are shown in the following figure,
don’t all have to exist for a package to be executable. For example, you can have a package with one
task that executes an external custom program or process. In this case, you don’t have any of the other
three components.
[Figure: a package and its four major components – connections, tasks, steps, and global variables.]
Let’s take a look at each of these components in turn.
Connections
DTS allows you to connect to any OLE DB compliant data source. Natively, DTS can facilitate OLE DB
providers such as:
❑ SQL Server
❑ Microsoft Access
❑ Microsoft Excel
❑ Visual FoxPro
❑ Paradox
❑ dBase
❑ Text Files
❑ HTML Files
❑ Oracle
DTS also supports Microsoft Data Link files. Data link files are physical files that can be written to
connect to a database at run time (SQL 7.0 compiles these at design time). Because they’re files, they
can be transported from system to system. The list of OLE DB providers can also be expanded. For
example, on the Microsoft SNA CD-ROM (or HIS CD-ROM) there is an installation for OLE DB
providers for IBM’s DB2 database. Sybase providers are also available through the vendor’s website.
More information about OLE DB providers can be found at http://www.microsoft.com/data.
We’ll examine establishing connections in more detail in Chapter 2.
Tasks
The package holds a number of instructions called tasks. Without tasks, a package would be like a car without an engine. There were 8 built-in tasks in SQL Server 7.0, which could do a number of things, such as:
❑ Transform data
❑ Bulk insert data
❑ Execute a SQL script
❑ Execute a program
❑ Execute an ActiveX script
❑ Move a database
There are 17 built-in tasks for SQL Server 2000, which have additional functionality including the ability to:
❑ FTP a file
❑ Execute another package
❑ Send or receive MSMQ (Microsoft Message Queue) messages from another package
We’ll cover all of these tasks in Chapters 2 and 3. If you find that the built-in tasks do not fit your needs,
you can expand the capabilities by registering your own custom tasks. A custom task can be written in
any language that uses COM. Creating custom tasks will also be covered in a later chapter.
Steps
A step gives a package its logic. The step object allows you to connect tasks in a sequential order. Each task has one step associated with it, which can execute either sequentially or in parallel, depending on how you have the package configured. The key difference between a task and a step is
that the task object holds the information about the individual function that you’re doing. For example,
a task would contain what file you’re executing. A step would tell the task when to execute.
You can also set constraints on tasks, which are called precedence constraints. Precedence constraints
allow you to dictate if a task will be executed in the event of a failure, success, or completion of another
task. For example, as shown below, step 1, which creates the table, will have to execute and succeed before step 2, which transfers data from Excel to SQL Server, can execute.
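To give a feel for how precedence constraints look in the DTS object model, here is a minimal Visual Basic sketch. It assumes a package named Ch1_CopyData already saved on the local server with steps named CreateTableStep and CopyDataStep (both step names are hypothetical), and it requires a reference to the Microsoft DTSPackage Object Library:

Dim oPkg As New DTS.Package
Dim oStep As DTS.Step
Dim oPrec As DTS.PrecedenceConstraint

' Load the existing package over a trusted connection
oPkg.LoadFromSQLServer "(local)", "", "", DTSSQLStgFlag_UseTrustedConnection, _
                       "", "", "", "Ch1_CopyData"

' CopyDataStep should only run if CreateTableStep succeeds
Set oStep = oPkg.Steps("CopyDataStep")
Set oPrec = oStep.PrecedenceConstraints.New("CreateTableStep")
oPrec.PrecedenceBasis = DTSStepPrecedenceBasis_ExecResult
oPrec.Value = DTSStepExecResult_Success
oStep.PrecedenceConstraints.Add oPrec

' Save the modified package back to the server
oPkg.SaveToSQLServer "(local)", "", "", DTSSQLStgFlag_UseTrustedConnection
oPkg.UnInitialize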
Global Variables
Global variables can extend the dynamic abilities of DTS. Global variables allow you to set a variable in a
single area in your package, and use the variable over and over throughout the package, whether in an
ActiveX script or a data transformation. They allow you to communicate with other DTS tasks and pass
messages between them. Rowsets can also be stored into a global variable for later use by a different task.
An ideal case for using these variables would be to set up a dynamic database load. For example, say
you receive an extract each day from a mainframe. The extract file name changes daily based on the
client the extract is for and the date: for example, the filename CLIENT1-010198.txt would mean
client 1 run on January 1 1998. By using an ActiveX script, you can read the file name, and change the
global variable for the client number and run date based on the name you read. You can later read
these two global variables to determine where to insert the data and what security to allow. We’ll see a
complete example of how to do this in Chapter 8 (Dynamic Configuration of Package Objects).
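To make that concrete, here is a minimal sketch of such an ActiveX Script task in VBScript. The hard-coded file name and the global variable names (ClientNumber, RunDate) are assumptions for illustration; in a real package the name would be read from the incoming extract directory:

Function Main()
    Dim sFile, iDash

    ' In practice you would read this from the extract directory (for example
    ' with Scripting.FileSystemObject); it is hard-coded here for clarity.
    sFile = "CLIENT1-010198.txt"

    iDash = InStr(sFile, "-")
    ' The characters between "CLIENT" and the dash are the client number
    DTSGlobalVariables("ClientNumber").Value = Mid(sFile, 7, iDash - 7)
    ' The six characters after the dash are the run date (MMDDYY)
    DTSGlobalVariables("RunDate").Value = Mid(sFile, iDash + 1, 6)

    Main = DTSTaskExecResult_Success
End Function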
In SQL Server 2000, DTS uses global variables to a greater extent. Various tasks have
been added and modified to allow input and output parameters. Global variables act
as a holding tank for these until they’re needed. This is still possible in SQL Server 7.0, but there you will have to write an ActiveX script to perform the same action that the GUI completes in one step.
A Simple Package
Much of your package creation, as we will discuss later in the book, will be done in DTS Designer. An
example of this is shown below:
In this package, data is transformed from SQL Server 1 to SQL Server 2. The execution of this task is
considered a step. This is shown as a solid line with an arrowhead pointing at the destination. If that
step fails, then the operator (system administrator) is e-mailed – the failure line (the one between SQL
Server 2 and the Email Operator) is displayed in red in the Designer. If the transfer succeeds – the
lower horizontal line, displayed in green in the Designer – then an ActiveX script is fired off to move
the log files into an archive directory. Once the move is complete, a batch file is executed to send a
broadcast message (using net send) to all network administrators of the package’s completion.
This just displays the steps in the package – the result of the package execution is not displayed in the
Designer. In the above example, the network administrators are informed of the package’s completion,
not its success or failure. The administrator would then have to go to the log files and determine if it was
properly executed. You can also program more complex logic in ActiveX to send a message regarding
the package’s success or failure.
What’s New in 2000?
SQL Server 2000 has expanded the power of DTS substantially. The DTS engine improvements that are
discussed in Chapters 2 and 3 include:
❑ Ability to save packages to Visual Basic files. You can then place the .BAS file into your VB program.
❑ Integration with Windows 2000 security (Kerberos) and the ability for Windows 2000 users to
cache packages.
❑ Ability to execute individual steps. In SQL Server 7.0, you had to disable all other steps to
debug problems in one step. In SQL Server 2000, that’s no longer necessary.
❑ Ability to run packages asynchronously.
❑ Support for Microsoft data link files (.udl). This allows you to create a .udl file that
expresses the connection strings (the ones you define in your package to point to the source
and destination) and to use it for your connection. In SQL Server 7.0, the .udl file was
compiled at design time, now it’s not compiled until runtime. This allows you to dynamically
configure the .udl at run time.
❑ Addition of an HTML Web page source.
❑ Multi-phase data pump, which allows a failed insertion attempt to not fail the entire package,
as well as giving you the ability to break a transformation into a number of stages, which are
completely customizable. This is discussed in great detail in Chapter 3.
❑ Ability to edit a package disconnected from the network.
❑ Ability to save your DTS files as template (.DTT) files.
The most exciting addition to DTS in SQL Server 2000 is the new tasks. The complete list of tasks is
discussed in Chapter 2. The added tasks are:
❑ File Transfer Protocol (FTP)
❑ Microsoft Message Queue
❑ Dynamic Properties
❑ Execute Package
❑ Move Database
❑ Move Users
❑ Move Messages
❑ Move Jobs
❑ Move Master Stored Procedures
There are also some added tasks once you install Analysis Services. These include one to reprocess an
OLAP cube and one to train a data mining model. Although the DTS user interface (the DTS Designer)
has not had a major facelift, it does have some added functionality.
The DTS Toolbox
Microsoft has given us several tools to create our packages. A DTS programmer can create a package with:
❑ Import and Export Wizards – which automatically create the package for you after asking a
series of questions.
❑ DTS Designer – a graphical user interface (GUI) that allows you to design and execute packages.
❑ DTS Programming Interfaces – a series of APIs that are accessible through COM interfaces and any COM compliant programming language, such as Visual Basic or C++, as well as scripting languages. Built into DTS is the ability to program in VBScript and JavaScript (a minimal sketch follows this list). We’ll look at this much more in Chapter 8. This can be expanded to any installed scripting language, however, such as PerlScript.
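As a small taste of that programming interface, the following VBScript sketch loads and runs the Ch1_CopyData package we save later in this chapter. The server name and the use of a trusted connection are assumptions; 256 corresponds to the storage flag that requests Windows Authentication:

Dim oPkg
Set oPkg = CreateObject("DTS.Package")

' Load a package saved on the local SQL Server (256 = use a trusted connection)
oPkg.LoadFromSQLServer "(local)", "", "", 256, "", "", "", "Ch1_CopyData"

oPkg.Execute
oPkg.UnInitialize
Set oPkg = Nothing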
As you might expect, the simplest way to create a package is through the wizards. As the wizards are
also the easiest way to get a little hands-on experience of DTS, we’ll take a look at these wizards here.
However, as we’ll see later, this method is also the weakest in terms of functionality. That’s why, in the
remainder of the book, we’ll be using the wizards as little as possible and be exploring the alternatives.
In DTS, the easiest way to move data is through the built-in wizards. The wizards allow you to quickly create DTS packages, or to run a transfer immediately without ever saving a package at all.
They give you the basics you need to create a package to transform data, but lack depth. Most of what
we discuss in the further chapters can’t be done with the wizards. The wizards are only meant to
transform data; any other functionality you might want is missing.
The wizards do provide a great way to create a basic package. The last step the wizard provides is the ability
to save the package for later execution, and at that point, you can modify the package to add your own logic
and the features that are missing. For example, you may know a flat file named transform.txt is going to
be in a certain location daily for transformation. However, you’ll need to fetch the file using the FTP task.
You can use the wizard to create the transformation part of the package and then save it. After you save it,
you can edit the package in the DTS Designer and add the FTP functionality.
Using the Wizards
SQL Server provides an array of wizards to guide us through exporting and importing data. There are
two core wizards that help you move or copy data: The Import/Export Wizard and the Copy Database
Wizard (CDW). The wizards are an ideal way to migrate data from development to production or to
create the root of a package to add logic to later. In this section, we will work our way through examples
that will teach us how to:
❑ Export and import data from a database with DTS Wizards (we only go through the export
wizard – although the import wizard is almost identical).
❑ Export and import database objects such as stored procedures and triggers.
❑ Create packages through the wizards that you can later edit in Designer.
❑ Save and schedule packages for later execution.
In the examples, we will provide sections that are hands-on, so that you can begin to get a feel for DTS.
Interspersed between these are sections describing the functionality of the other options available, to
provide a comprehensive overview of what we can achieve with the wizards.
In SQL Server 6.5, we were able to transfer database objects using Transfer Manager. Transfer Manager
is a distant relative to DTS, but it lacks some of the functionality that DTS provides. This functionality
has carried over into SQL Server 2000 with the Copy objects and data between SQL Server
databases option that we will discuss in a moment. Some of the differences include:
❑ A direct way to save jobs to add custom components into them
❑ Access to OLE DB compliant data sources other than SQL Server
❑ Customization with ActiveX scripts
❑ A workflow type system
Transferring Data Using the Import/Export Wizard
Accessing the Wizard
In this example, we’ll import data from the SQL Northwind database into a database called Wrox. To
create the empty Wrox database, click your right mouse button in Enterprise Manager under the
database tree node and select New Database. It will not matter where your database is located.
You can also transform your data in this example into the TempDB database. The next time your SQL
Server stops and starts, the tables you create in this example will be purged. The TempDB database is a
nice way to test transformations before you move them over to the live database, and it automatically
cleans itself out after the server cycles. Never use the TempDB database for this purpose in production,
however, since you could slow the performance of some queries by doing this.
As with most wizards, there are many ways of accessing them. You can access the wizards (as long as you have highlighted Databases) through the Tools menu, under Wizards and then Data Transformation Services (or Management for the Copy Database Wizard), or directly from Tools, then Data Transformation Services. However, the primary way to access the DTS Import and Export Wizards is to open SQL
Server Enterprise Manager, and click with your right mouse button on the database you want to
import/export from – in this case, the Northwind database. Then select All Tasks, and Export Data:
Once you’re in the wizard, make sure you read each question very carefully. The wrong answer could
result in you purging records from your tables unintentionally. If you’re transforming into a
production database or any database that contains data you care about, make sure you back up the
database before running the wizard. Note also that entries are added into the log, so that we can
recover the data, if necessary.
The reason we choose to access the Import/Export Wizard by clicking our right mouse button on the database is that it takes a step out of the process. Since we did this, the first screen overleaf is already fully
completed for us. If you came into the wizard another way, and had not highlighted the Northwind
database first, you would need to select Northwind as the source connection database in the drop down
box at the bottom of the screen. You will need to enter the necessary security information (username
and password). As in ODBC data sources, the OLE DB provider for SQL Server supports either using
standard SQL Server or Windows Authentication. Those using Windows 98 on their machines will have
to click the Refresh button to see the database listings populate.
The first screen you see prompts you for your Data Source, defaulting to the OLE DB Provider for
SQL Server:
Advanced Options
If you need more advanced OLE DB connection options in your transformation procedure, click the
Advanced… button to reveal the screen below:
The dialog box that will appear will present you with a number of OLE DB connection options that are
specific to the provider. In the OLE DB provider for SQL Server you will see options such as increasing
the connection timeout, or specifying an IP address for the source server. Note that if you need to
specify any Boolean values, they must be represented with a 0 (False) or 1 (True).
The Application Name and Workstation ID options are nice for auditing purposes. Each database
management system (DBMS) has its own way of auditing who is connected, but in SQL Server if you
adjust these two options, you can run the stored procedure SP_WHO2 on the server and see which application and which workstation each connection is coming from.
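For instance, a connection string along the following lines (the server, database, and application names are made up) would make a load easy to spot in SP_WHO2’s ProgramName and HostName columns:

' A minimal VBScript/ADO sketch – hypothetical names throughout
Dim oConn
Set oConn = CreateObject("ADODB.Connection")
oConn.Open "Provider=SQLOLEDB;Data Source=MyServer;Initial Catalog=Wrox;" & _
           "Integrated Security=SSPI;" & _
           "Application Name=NightlyDTSLoad;Workstation ID=ETL01"
' ... run the load ...
oConn.Close
Set oConn = Nothing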
Note that you can also view connection information in Enterprise Manager. If you open Enterprise
Manager and drill down into the Management node, you can select Current Activity, then Process Info to
see the current connections. The Process Info item, as shown below, is the only item that contains the
level of granularity to let you see with which applications users are connecting. This is a handy tool to
debug connectivity problems and to determine if your connection request is even making it to the server.
Active connections performing a query are lit, while sleeping connections are gray.
Choosing the Destination Database
Once you’ve set all the options in the Choose a Data Source page, click Next. The Wizard will then
prompt you to select the destination database. Repeat the above steps for the destination database, this
time selecting Wrox as the database. Again, you may have to click Refresh to see the Wrox database in
the drop-down box.
On clicking Next, you’ll be presented with the following options:
We’ll look at what each of these does in turn. For the purposes of the current example, choose the Copy
table(s) and view(s) from the source database option.
Copy Tables and Views Option
The easiest of the three options is to select Copy table(s) and view(s) from the source database. This
option transfers data from the source database, but it does lack in selectivity. In other words, this option
will select all data from any given table. If you wish to selectively transfer data, use the next option (Use
a query to specify the data to transfer), which we will examine later in the chapter.
The first dialog box in this section of the wizard will ask you which tables and views you’d like to
transfer into the Wrox database. For the purpose of our example, select only the Suppliers table from
the source column:
The destination column will automatically be filled after you check the source. This does not mean that
the table exists in the destination database. By default, if the table does not exist in the destination
database, then the DTS engine will create it for you. If the table does already exist, then DTS will by
default append the new data to the existing data. The destination column will also allow you to select
another table from a drop-down list if you’ve already created one that’s more suitable.
Advanced Options – Column Mappings Tab
In the Transform column, you can click on the three dots to select some more advanced options. If your
destination table does not have the same schema as the source table or view, then you could specify which
columns to transfer and their mapping here. You can ignore certain columns by clicking on them, then selecting the <ignore> option from the Source drop-down box. The reason you’ll see a drop-down box for
only the Source column is that if you adjusted the Destination, you would receive an error. The error
would be generated by DTS sending data to a column that’s being ignored. A typical reason you’d ignore
columns is if you only wanted a subset of the data vertically, or if you wanted the destination table to
assign the identity column and not take on the identity from the source.
The options that are not needed are grayed out. For example, you can’t append data to a table that
doesn’t exist. Let’s have a look at options provided by the Column Mappings tab (although we will not
apply any of them in this example):
❑ Create destination table. This will also allow you to customize the transformation into that
table. If the table already exists, you are still given the option; however, you would need to
select the option to Drop and recreate the destination table. Otherwise you would receive an
error on execution complaining about a name conflict in creating the table.
❑ Append rows to destination table. If you were receiving incremental updates from a source,
you would use this option.
❑ Delete rows in destination table. Use this if you want to complete a data refresh without
dropping and recreating the table.
❑ Enable identity insert. This option will turn off identity fields temporarily during the insert. This is only needed if you’re inserting into a table that has an identity column (that is, a column that auto-numbers its rows). You would receive an error upon inserting data into a table that has an identity column unless you have this checked, because the system wants to auto-assign the new data a number. This option is automatically checked if the destination table contains an identity column.
❑ Drop and recreate the destination table. Use this option if you want to delete the table and
recreate it with a different schema.
As you can see, the append rows and delete rows options are not available when you have the Create destination table radio button selected. The Create destination table radio button is automatically selected when the table does not exist on the destination. If you select Drop and recreate the destination table and the table has not yet been created, you’ll receive an error when executing the package, as shown below. This is due to DTS issuing a drop command against a table that doesn’t exist yet. The DTS engine will detect that the table does not exist on the destination and not allow you to select, append, or delete from the table.
You can also click the Edit SQL button to modify the script used to create the table by hand. For
example, you could add statements to this to add the necessary primary and foreign keys. You will also
need to add any constraints or identity column information in here as shown below. The dialog box
does not support GO statements, which you will receive by default if you generate a script from
Enterprise Manager. There is an unusual workaround that you must do to create primary keys in this
fashion. After you try to exit this dialog box and go back to the transformation screen, DTS performs a
check on your SQL code. When it can’t find the table, DTS will return an error since you can’t issue an
ALTER TABLE statement (which is used to create constraints and primary keys) on a table that doesn’t
exist yet. The workaround is to create a shell of the table on the destination with one column:
It is because of this workaround that I would recommend that you just create tables that need Primary
Keys and constraints on the destination before you try to transform the data into them. You can also run
through the wizard and then add the constraints afterwards. You can also select the Copy objects and
data between SQL Server databases branch of the wizard, which we will cover shortly in this chapter.
This feature is still nice if you’d like to modify the schema slightly on the destination. Those who
participated in the Beta program will remember that this wizard had the Foreign and Primary key capabilities in it during the beta phases. It was pulled from the production release, but Microsoft has promised to revisit this ability in a later release of SQL Server.
Advanced Options – Transformations Tab
You can also perform more customized transformations. By going to the Transformations tab, as shown
in the following screenshot, you can write ActiveX logic to transform the data before it is committed to
the destination. For more complex logic, use DTS Designer. It offers more flexibility than the wizard’s
transformation tab will offer. To enable this option, select the option to Transform information as it is
copied to the destination.
As we will discuss in a later chapter, after you select this option, your transformation of data will slow
considerably because records must pass through logic in a script versus the built-in COM object that
DTS provides. You can write your script in any scripting language that is installed on the client running
the package. The only exception is if you want to schedule the package. In that case, you can only use
scripting languages that are installed on the server, since SQL Server Agent is actually executing the
package. To change the scripting language, select the new language from the drop-down box seen
above. The fastest of the scripting methods is VBScript, followed by JavaScript. If you choose a language other than those two, REXX for example, you take on a slight risk because DTS may not have been tested with that scripting language.
Why would you use this feature then if it is considerably slower? You can use this to perform a number
of functions that you couldn’t perform with a straight copy. For example, if you received two fields on
the source, FirstName and LastName, and wanted to merge the fields onto the destination as just
Name, you could do so with the transformation script shown below. Make sure you add a space between the two fields by concatenating a string containing a single space (" "). You can join fields with the plus sign (+).
DTSDestination("Name") = DTSSource("FirstName") + " " + DTSSource("LastName")
You can also use this feature to make fields upper case or to convert a zip code to the proper format. For
example, the Customers table in the Northwind database has a number of postal codes from around the
world, some alphanumeric, some numeric. With VBScript, you could determine if the postal code met a
certain requirement, like length, and then transform it. Other fields would be copied straight through. The
code below shows an example of how to perform such logic. Don’t worry about the code quite yet. Several later chapters cover this in more detail.
If Len(DTSSource("PostalCode")) = 9 Then
    DTSDestination("PostalCode") = Left(DTSSource("PostalCode"), 5) + "-" + _
        Right(DTSSource("PostalCode"), 4)
Else
    DTSDestination("PostalCode") = DTSSource("PostalCode")
End If
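A note on form: when you enter a script on the Transformations tab, DTS expects it to be wrapped in a Main function that returns a status constant to the data pump. A minimal skeleton (assuming a simple column-to-column copy) looks like this:

Function Main()
    ' One assignment per destination column; add logic such as the
    ' PostalCode reformatting shown above where needed.
    DTSDestination("PostalCode") = DTSSource("PostalCode")

    ' DTSTransformStat_OK tells the data pump the row transformed successfully
    Main = DTSTransformStat_OK
End Function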
As we mentioned earlier, the DTS Designer is a much better place to be performing this type of
transformation. Quite a few pre-defined transformations have already been set up for you in the Designer.
For example, you can easily make a field upper case in Designer with one click and no programming.
Saving and Scheduling the Package
After clicking OK and Next, you will be prompted to save, schedule, and execute the package. For the
purpose of this example, check the box that causes the package to Run immediately, and save the
package onto the local SQL Server by selecting the following options:
We’ll go into details about the various places to save your package in Chapter 2.
Advanced Options
If you check Use replication to publish destination data, the Create Publication Wizard will begin after the data Import/Export Wizard finishes transforming the data. You can then set up your destination database to replicate to other sources. You can also schedule the package for later execution; scheduling will only work if SQL Server Agent is running. If it is not running, the wizard will insert the record into the MSDB database, which holds job information, and schedule the job, but the job will not be executed.
You will have to start it manually from Enterprise Manager or wait until the next scheduled cycle
arrives. A scheduled job can be manually executed by drilling down to the Jobs node under SQL
Server Agent (which is under Management).
You will have to save the package before it can be scheduled. If you do not save the package, you will
receive the below error:
If you schedule a package for later execution, SQL Server Agent will execute the package as a CmdExec
job using a utility called DTSRUN, which we will discuss later in the book. If your source and destination
connections use Windows Authentication, then you will need to ensure that the account that starts the SQLServerAgent service has permissions to the source and destination databases, and that the service is started. You can change the permissions for SQL Server Agent by drilling down under the Management group and
clicking on SQL Server Agent with your right mouse button, then selecting Properties, to reveal the
screen shown overleaf:
Naming and Executing the Package
When you have completed the information required on the save, schedule and replicate package
dialog, and clicked Next, you will be prompted to name the DTS package that you have just created.
Name this example package Ch1_CopyData. You can also specify an owner and user password here. We will discuss these in the next chapter, so in the meantime leave them blank.
Make sure that you also add a meaningful description in the appropriate box. As you add more and
more packages to your server, naming conventions and descriptions become increasingly important.
The descriptions and names you type in these examples can later be loaded into the Microsoft Meta
Data Services as meta data (data about your data). This meta data will allow you to view details about
your package, the connections that it has and other descriptive data.
Click Next, and you will be presented with a summary screen. Clicking on Finish will cause the package
to execute, first saving the package in the event that an error occurs. If an error occurs, you will be
given an opportunity to go back and correct the problem. Data that has already been transformed will
stay committed on the destination database.
You should find that the Suppliers table has been added to your Wrox database.
The Import/Export Wizard is really designed to transfer data rapidly from source to destination. It does
not provide much room for customization. Once your data has been transferred, if you didn’t alter the
SQL statement used to create the table, you will have to create the primary key and foreign key
relationships. Also any type of other information, like identity fields, will need to be created. You will
not have to worry about this if you’re transferring into a table that already exists.
If you recall, Copy table(s) and view(s) was only one option that our Wizard offered us for transferring
our data. What about the other options?
Using Queries to Transfer Data
Follow the steps above to get back to the Specify Table Copy or Query screen. This time we’ll export
only a select set of records into the Wrox database from Northwind.
In this example, we’ll create a table that will be used for bulk faxing. Some of our data in the source
table is useless for this purpose and needs to be excluded.
Select Use a query to specify the data to transfer. You can transfer data from one or more views or
tables with this option. You are then given the option to type a query into the dialog box (Query
statement) or use Query Builder to help you create the query. We will go into more detail on the Query
Builder in a few moments. Since our query is simple, just type the following query into the Query statement box. This query will select all the suppliers that have entered a fax number:
select supplierid, country, companyname, contactname, fax
from suppliers where fax is not null
order by supplierid
The Parse button will confirm that your query is a valid one; it checks all object names to make sure that everything exists. The Browse button will allow you to find a prewritten script.
After you have continued on to the screen below by clicking the Next button, you may want to click on
the “…” button under the Transform column to adjust the destination table schema, as shown earlier.
The default table name that the data will be transferred into is Results, as shown below. You can only
do one query at a time with the Import/Export Wizard.
Execute your package as you did in the previous example (choose to run it immediately and save it to the local SQL Server), but this time save it as Ch1_QueryResult. Again, you will see the same screen as you did before as it executes step after step. First the package will be saved as before, then it will transform the data into your new table named Results.
Query Builder
We briefly touched on the Query Builder in the last example. The Query Builder allows you to easily
build queries that can include multiple tables from the same source database. You cannot go outside the
source database or select views in the Query Builder. It is a superb way to build quick queries for those who do not like the rigors of programming a nasty inner join in T-SQL. You will see that Query Builder is available in a number of DTS tasks.
Once again, repeat the steps to get back to the Query statement screen seen previously. This time open
Query Builder, choose the columns you want in your query and double-click on them. The interface will
only allow you to select one column at a time.
In this example, we are de-normalizing the database slightly. This means that we’re translating all the
foreign keys into their real data. We’ve taken CustomerID in the Orders table and joined it with the Customers table to find out the customer’s CompanyName.
Begin by selecting the columns shown in the next screenshot:
Next, select the column(s) on which you want your query to be ordered. Unless you change this later in
the Query statement screen, the query will be sorted in ascending order. In our example, the
CompanyName column will be ordered first, beginning with the ‘A’s, followed by the OrderID. This
performs the same action as an ORDER BY clause. Query Builder does not have the ability to do GROUP BY
clauses. GROUP BY clauses allow you to see the combined orders that each customer made. If you want to
do a GROUP BY, then you will have to make the adjustments in the Query statement screen we saw earlier.
In the next screen, you can set the criteria for the query, by filtering your query horizontally based on
another column, or data in the column. This screen performs the same action as the WHERE clause.
The first Column: drop-down box will set the column you want to filter. Then, the Oper: drop-down box sets the operator that will be used (such as =, <>, <, or >). Finally, setting the Value/Column: drop-down box makes the comparison.
To select an actual value, you can select the browse button (“…”) next to the Value/Column: drop-down
box. To select a value, double-click on the value and it will populate the drop-down box.
You are then returned to the Query statement screen, where you can modify any part of the query by hand, now that the hard part of your query is written:
Some of the limitations of Query Builder stop most programmers from using it. The inability to perform
a GROUP BY clause or sort in a descending manner is a big restriction. Most programmers find
themselves writing their query in Enterprise Manager then copying and pasting the query into the
Query statement screen. By right-clicking on any table in Enterprise Manager and selecting Query from
the Open Table option, you can write a very effective query without any of the restrictions.
Transferring SQL Objects
We will now look at the final option on the Specify Table Copy or Query screen. The Copy objects and
data between SQL Server databases option is a close relative to Transfer Manager in SQL Server 6.5
and gives you the ability to transfer SQL Server objects between databases. You can transfer any SQL
Server object from any SQL Server 2000 instance to any SQL Server 2000 instance. You can also go
from any SQL Server 7.0 instance to another 7.0 instance, or upgrade to 2000, with this branch of the wizard. To demonstrate how it works, we’re going to copy the Orders table and its dependent objects from Northwind into the Wrox database.
Again, follow the steps to get back to the Specify Table Copy or Query screen, and select Copy objects
and data between SQL Server databases. You should see the screen below as the first screen. This
screen is similar to Transfer Manager.
Advanced Options
You can configure this branch of the wizard to perform the following functions:
❑ The Create destination objects option will create the objects you wish to transfer on the
destination server. Uncheck this option if you wish to only transfer data.
    ❑ The Drop destination objects first option will drop all the objects on the destination SQL Server
before it issues the commands to create them again. Check this option if you think that the
objects you’re trying to transfer may already be on the destination server, and you’d like to recreate
them. If you run through the wizard and experience errors, you may want to check this
option as well. Otherwise, you may see the error shown above opposite which states that the
object already exists on the destination server. This is because you went part way through the
transfer and there is no rollback performed.
    ❑ The Include all dependent objects option will transfer objects that depend on the table you’re
trying to transfer. For example, any views that depend on the table will be transferred if you
select this option.
    ❑ The Include extended properties option will transfer all extended properties on SQL Server 2000
databases. This option does not apply if you’re transferring objects from a SQL Server 7.0 database.
❑ The Copy data option will enable you to transfer the data from the source to the destination server.
    ❑ The Replace existing data option will purge the destination tables before it transfers the new data into them.
    ❑ The Append data option will add the new data from the source at the end of the table on the
destination server.
❑ The Use collation option will enable you to transfer data between servers of different collations.
❑ The Copy all objects option will transfer all objects in the database and not allow you to select
certain objects to transfer. This includes all tables, views, stored procedures, functions,
defaults, rules, and user-defined data types. If you uncheck this option, you are given the
option to select which objects you would like to transfer. The Select Objects button allows
you to check the objects you’d like to transfer from the source SQL Server, as shown below:
❑ The Use default options checkbox will transfer the other SQL Server objects like indexes,
users, and primary and foreign keys. If you wish to specify other attributes to transfer, such as
logins, you must uncheck this option and select the Options button. This will open the screen
overleaf, which will allow you to specify more advanced options.
Generally, it is not a good idea to specify options other than the default. We recommend that if you want to transfer logins, for example, you just create those on the destination server by hand. The Copy SQL Server logins option does not give you control over which logins you transfer. Another option that is not checked by default is the Generate Scripts in Unicode checkbox. This option is nice if you have a number of Unicode fields on the source (nchar).
❑ The final option on the Select Objects to Copy screen is the Script file directory. This option
denotes where the scripts are written to on the computer executing the wizard. These scripts
are run on the destination server to produce and transfer the objects.
You have the option to generate a script that will automatically create the SQL Server objects
in the destination database, or you can just transfer the data. By checking Include All
Dependent Objects, DTS will transfer all the tables that a view refers to.
Transferring SQL Objects Continued
For our example, we will need to ensure that referential integrity is kept by transferring all dependent
objects. To do this, check the Include All Dependent Objects option.
Next, uncheck Copy all objects, and click on Select Objects. You’ll be presented with a list of objects
in the database to transfer. In this example, we’re only concerned with the Orders table and its data, so
select just the Orders table:
It is generally a good idea to leave the default setting enabled for this branch of the wizard, so you can
ensure that the proper indexes and keys are also transferred. Valid objects that you can transfer include:
❑ Tables
❑ Views
❑ Extended Properties
❑ Stored Procedures
❑ Defaults
❑ Rules
❑ User-defined data types
❑ Logins and their object-level permissions
Save this package as Ch1_TransferObject and execute it. After executing the package, view the Wrox
database in Enterprise Manager. You will notice that the Orders table has been transferred, as well as its
dependents, which are Employees, Shippers, and Customers.
We’ve seen how we can transfer data between two databases on the same server. What about moving entire databases between servers?
The Copy Database Wizard (CDW)
The Copy Database Wizard (CDW) allows you to copy or move one or more databases, and all their
related objects, to another server. CDW can be executed from the source server, destination server, or
workstation. Typical applications for using CDW include:
❑ Merging databases from several SQL Server instances into one instance, to consolidate licenses
❑ Moving a database from an Alpha processor machine to an Intel-based server
❑ Copying a database from development to production
❑ Cloning a database to many servers
❑ Upgrading a database from SQL Server 7.0 to 2000
CDW uses DTS on the backend to perform its operations. The database is detached from the source, and reattached
to the new server. Through a number of new tasks added to DTS, CDW can transfer a database and
all associated logins, jobs, messages, and history. There are some rules that apply to using CDW:
❑ The destination server must be a SQL Server 2000 server
❑ The source server can be a SQL Server 7.0 or 2000 machine
❑ No sessions on the source database can be active
❑ The user running CDW must be a member of the sysadmin role and must have NT administrator rights on the system
❑ There cannot be a database on the destination server with the same name as the source you’re
trying to copy
Accessing the Copy Database Wizard
To begin the Wizard, go to Tools | Wizards… in Enterprise Manager, and under Management, select
the Copy Database Wizard. The wizard will only exist on workstations running the SQL Server 2000
client and is a part of the base SQL Server installation. Specify both your source and destination servers
as you did in the Import/Export Wizard. Note that, since the database names can’t be the same on the
source and the destination, you cannot use the same instance of a server as both the source and the
destination. At that point SQL Server will scan the source server to find out up front which databases can be transferred, as seen above opposite:
After clicking Next, CDW will scan the directory to make sure there aren’t any file name conflicts.
Conflicts occur when a file name on the destination is the same as one on the source. Conflicts are represented with a red X:
Advanced Options
Conflicts can be corrected by clicking on the Modify button. You can also adjust where the files will be
placed on the target server. If a conflict occurs, CDW will not allow you to leave the below screen:
If you’re moving a database to a new server, the physical files are not deleted automatically – once you have verified that the database is fine on the destination, you have to delete the source files yourself after you finish the move. If you’re moving a database from one instance to another instance on the same SQL Server machine, then the files will be physically moved to the new location, and no manual delete is necessary.
Specifying Items and Executing the Package
The next screen is one of the most important in CDW. This dialog box is where you specify which SQL
Server items you’d like to move. By default, every SQL Server login will be copied. If you would like
CDW to be more selective about what it moves, select the relevant option. For the purpose of this
exercise, just accept the defaults:
Next, select when you’d like the package to execute, as shown in the screen below. The Name property
designates what name the package will be saved as on the destination server. The package is saved in
case you receive an error during the copying of the database. If you receive an error, you can execute
the package manually later from Enterprise Manager. The most common error is that users may be
logged into the source server’s database that you’re trying to move or copy. You can also schedule the
package as you did in the Import/Export Wizard.
Since all the users must be disconnected from the source database being transferred, when you’re doing
this in the “real world”, you may want to schedule the package to be executed at off-peak hours.
After executing the package, you should see a dialog box that will display the status of the move:
The More Info >>> button will show you the step details. For example, in the following screenshot you
can see that we received an error from a user being connected to the source database:
If you select more than one database to move, only one database is moved at a time. Again, once you
have validated the move, you can delete the physical files from the source, if they’re no longer needed:
Because CDW uses detach and attach processes, it can move a database much faster than the DTS
Import/Export Wizard. CDW also physically creates the database. The advantage of the Import/Export Wizard is that it can handle heterogeneous data and can import into databases that already exist.
The DTS Designer
We said earlier that the DTS Designer is a GUI that allows us to design and execute packages. We will
examine it in much more detail in the next chapter, but before we take a preliminary look here, you
need to realize that this is a very powerful utility. Not only can you use the DTS Designer to create new
packages, but you can also edit existing packages, including those created with the wizards. Let’s begin
by seeing how we access this Designer.
Accessing the DTS Designer
To access the DTS Designer, open SQL Server Enterprise Manager, then click your right mouse button on the Data Transformation Services group and click New Package; alternatively, click Action on the Enterprise Manager menu and select New Package.
Viewing DTS Packages
In the examples in this chapter, we’ve saved our packages locally onto our SQL Server. To view a
package after you’ve saved with the Wizard, select Local Packages under the Data Transformation
Services group in Enterprise Manager. In the right pane, you will see the packages that you have saved:
We will cover the advantages of saving packages in the various formats in the next chapter. If you have
saved a package as a Meta Data Services package, you can open it again by clicking Meta Data
Services Packages under the Data Transformation Services group in Enterprise Manager. To open
COM-structured file packages, click your right mouse button on the Data Transformation Services
group, then select Open Package.
You can execute a package in Enterprise Manager by clicking your right mouse button on the package,
then selecting Execute Package.
Double-clicking on any package opens it in the DTS Designer. For example, our Ch1_CopyData package looks like the screenshot shown opposite. In this package, you can see that the first step creates the table we need, and the second step transforms the data from Connection1 to Connection2.
Although the wizards do not offer much flexibility, they are a good building block if you wish to edit
them in DTS Designer later. We’ll go into more detail about the various options and DTS Designer in
the next chapter.
A Note About NT Authentication Security
SQL Servers running Windows 98 will not be able to use Windows Authentication. However, if you’re
signing into a domain and try to access a SQL Server on the network running Windows NT or 2000,
then this option is available. It is important to note that Windows Authentication is the preferred
method of accessing SQL Server, since no passwords are passed over the network or saved. The majority of the SQL Server security bugs that Microsoft has announced relate to standard SQL Server security; for example, several involve SQL Server saving the sa password as clear text in the registry.
Summary
DTS is a collection of objects and tools that allow you to transform data easily, and is an integral part of
SQL Server. In the past, you would have had to buy expensive programs to do such a task. DTS is not
just for transforming data though. You can develop a workflow system to execute programs or scripts.
You can even develop your own custom COM components to plug into DTS.
We’ve seen that the package is the primary component of DTS and is what you create and execute.
Inside the package is a set of objects called tasks, which are a set of instructions, and steps, which tell
DTS which order to execute the tasks in.
After our basic introduction to DTS, we moved on to look at the Wizards that DTS provides to help
simplify some of the most fundamental tasks. They also provide us with the essentials to create a
package. You can create a package in the wizards, then go back into it in Enterprise Manager (after
saving the package in the wizard) to add your own logic into the package.
One of the most efficient ways to move data is through the Copy Database Wizard, which will move
the database and all related items. The DTS Import/Export Wizard also provides an easy way to convert
data quickly from other sources such as Excel or DB2. We have an entire chapter dedicated to
heterogeneous data conversion.
There are three ways of transforming our data through the wizard:
❑ The copying data option is the fastest way to transfer your data, but lacks selectivity.
❑ Using a query to select data to be transferred allows you to copy select data. In this chapter,
we had an example where we filtered the Suppliers table to only return suppliers in the USA.
❑ The transfer of SQL Server objects allows you to transfer objects like tables, stored
procedures, and views.
In the next chapter we will discuss more advanced ways of manually creating a package through the DTS Designer.


Indexes

Posted: March 10, 2007 in Uncategorized

What are Indexes?
Indexes are database objects that speed up data retrieval and help enforce data integrity. With the help of indexes, the database finds the shortest path to the requested data.
Why are indexes created?

Indexes are created to fetch information fast. They are pointers to the information.

Indexes should be created on the most appropriate or most frequently queried columns.

CREATE INDEX IndexName
ON TableName (ColumnName)

SQL Server stores database information in three kinds of files that are visible even from Windows Explorer: the primary data file (extension .mdf), secondary data files (extension .ndf), and the log file (extension .ldf).
SQL Server stores data in base units called pages, each of which is 8 KB in size. A maximum of 8060 bytes can be used for storing data in each page, which accounts for the maximum row size constraint: the maximum row size in SQL Server is 8060 bytes, because that is the most that can be stored in a single page. Pages are grouped into extents, each a combination of 8 pages. Special pages, called Index Allocation Maps, store information about which extents are used by a particular table.
Information about SQL Server indexes is stored in the sysindexes system table, which also records where to start reading the actual data. If you have defined a clustered index on a table, the data is stored in ordered format and the index can be used to identify the exact storage location. This means SQL Server doesn’t have to do a complete table scan to find the data that was requested; because a pointer leads directly to the data, queries execute faster.
Types of indexes:
Indexes are of two types: clustered indexes and non-clustered indexes.
Clustered indexes:
When you create a clustered index on a table, all the rows in the table are stored in the order of the clustered index key. So, there can be only one clustered index per table.
The pages in the data chain and the rows in them are ordered on the value of the clustered index key. All inserts are made at the point the key value in the inserted row fits in the ordering sequence.
Indexes are organized as B-trees. Each page in an index holds a page header followed by index rows. Each index row contains a key value and a pointer to either a lower-level page or a data row. Each page in an index is called an index node. The top node of the B-tree is called the root node. The bottom layer of nodes in the index is called the leaf nodes. The pages in each level of the index are linked together in a doubly-linked list. In a clustered index, the data pages make up the leaf nodes. Any index levels between the root and the leaves are collectively known as intermediate levels.
SQL Server navigates down the index to find the row corresponding to a clustered index key. To find a range of keys, SQL Server navigates through the index to find the starting key value in the range, and then scans through the data pages using the previous or next pointers.
The average size of a clustered index is about five percent of the table size, although it varies depending on the size of the indexed column.
During index creation, SQL Server temporarily uses disk space from the current database; creating a clustered index requires working space of about 1.2 times the table size. This space is reclaimed automatically once the index has been created.
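As a quick illustration, the following sketch creates a clustered index with Transact-SQL. The table and column names (Orders, OrderDate) are assumptions for the example only, not objects referenced above:

-- Create a clustered index; the table's rows will be physically ordered by OrderDate
CREATE CLUSTERED INDEX IX_Orders_OrderDate
ON Orders (OrderDate)

Because a table can have only one clustered index, you would normally reserve it for the column (or small set of columns) most often used for range lookups or sorting.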
Non-clustered Indexes
Nonclustered indexes have the same B-tree structure as clustered indexes, with two significant differences:
The data rows are not sorted and stored in order based on their nonclustered keys.
The leaf layer of a nonclustered index does not consist of the data pages.
Instead, the leaf nodes contain index rows. Each index row contains the nonclustered key value and one or more row locators that point to the data row (or rows if the index is not unique) having the key value.
If the table is a heap (does not have a clustered index), the row locator is a pointer to the row. The pointer is built from the file identifier (ID), page number, and number of the row on the page. The entire pointer is known as a Row ID.
If the table does have a clustered index, or the index is on an indexed view, the row locator is the clustered index key for the row. If the clustered index is not a unique index, SQL Server 2000 makes duplicate keys unique by adding an internally generated value. This value is not visible to users; it is used to make the key unique for use in nonclustered indexes. SQL Server retrieves the data row by searching the clustered index using the clustered index key stored in the leaf row of the nonclustered index.
Because nonclustered indexes store clustered index keys as their row locators, it is important to keep clustered index keys as small as possible. Do not choose large columns as the keys to clustered indexes if a table also has nonclustered indexes.
In short: if the table does not have a clustered index, the row locator is the combination of file number, page number, and row number within the page; if the table does have a clustered index, the row locator is the clustered index key value.
Non-clustered indexes are particularly handy when we want to return a single row from a table.
The order of the leaf pages of a non-clustered index differs from the physical order of the table; the leaf-level pages are sorted in ascending order of the index key. A table can have up to 249 non-clustered indexes.
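A minimal sketch of creating a non-clustered index follows; the Customers table and City column are hypothetical and used only for illustration:

-- Nonclustered index: each leaf row holds the key value plus a row locator,
-- so keeping the table's clustered index key small keeps this index compact
CREATE NONCLUSTERED INDEX IX_Customers_City
ON Customers (City)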
Disadvantages:

If you create an index on each column of a table, query performance may improve, as the query optimizer can choose from all the existing indexes to come up with an efficient execution plan. At the same time, data modification operations (such as INSERT, UPDATE, and DELETE) become slower, because every time data changes in the table, all the indexes need to be updated. Another disadvantage is that indexes need disk space; the more indexes you have, the more disk space is used.

What Are Recovery Models?

Posted: March 10, 2007 in Uncategorized

Although you and I might work on very different databases, when it comes down to backups, most people’s requirements are met by a similar set of needs. For example, many people have development databases that they don’t need to back up that often. Likewise, many of us have worked on systems where every minute of activity is important and needs to be backed up as soon as possible. For each of these situations, a different backup strategy should be applied.
In order to make backups easier to implement, Microsoft has grouped the various strategies that can be employed into three stereotypes or models: simple, full, and bulk-logged. These SQL Server-specific models not only help you think about your needs, but also simplify the implementation of the appropriate strategy.
Understanding Transaction Logs and Bulk Copy
Before I delve into recovery models, let me first briefly review two related features of SQL Server: the transaction log and the bulk copy utility. How the recovery models treat these two features will be one of the main differences between them.
Every database must have at least one file that is used as the transaction log. When a change is made to any information in a database, the change is first written to the transaction log before the transaction is committed. The change is only written to the database itself when a “checkpoint” occurs. During a checkpoint, all committed transactions are written to the database itself. (SQL Server periodically initiates a checkpoint for each database.).
SQL Server uses the transaction log, among other things, to ensure that committed transactions are not lost due to a power failure. A transaction log also allows transactions to be aborted and rolled back before they are committed.
Think about a transaction that moves money from your checking accounting to your savings account. The transaction requires two separate actions—a debit to your checking account and a credit to your savings account. If any change were written directly to the database, a failure between the debit and the credit would result in an imbalance in your accounts. Instead, the change is first written to the transaction log. Only after both operations have been successfully recorded in the transaction log and the transaction is committed are the changes written to the database.
SQL Server comes with a utility (the bulk copy program) that is very useful when loading large amounts of data from a file into the database. One of the ways that this utility can be faster than a similar number of INSERT statements is by allowing the bulk copy program to operate in non-logged mode. In this mode, each inserted row will not be written to the transaction log, thereby speeding up the operation. (Similarly, updates to text fields can be done in either logged or non-logged mode.)
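As a rough sketch, a minimally logged load can also be performed from Transact-SQL with BULK INSERT, a close relative of the bcp utility described above. The table name and file path here are hypothetical and would need to be adjusted for your environment:

-- Load a comma-delimited text file into a table; subject to the usual conditions,
-- this load can be minimally logged, much like a fast bcp load
BULK INSERT Northwind.dbo.Orders
FROM 'C:\Data\orders.txt'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')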
Now let’s look at the various recovery models and see where these features come into play.

Exploring the Recovery Models
SQL Server 2000 has three models: simple, full, and bulk-logged. Let’s look at what each model means, starting with the easiest.
Simple: The simple model describes just that: the simplest and easiest situation to manage. When a database is set to this model, there is no way you can back up only the changes made since the last backup. Instead, only full backups are allowed. One benefit of this model is that the transaction log won’t become full from transactions occurring between the full backups. Whenever the database performs a checkpoint, space in the log is reclaimed. Additionally, non-logged operations such as bulk copy are permitted.
Full: The full model allows you to create not only complete backups of the database, but also incremental backups of only the changes that have occurred since the last full backup. An added benefit is that it’s possible to recover the database to a particular point in time. For example, if a user accidentally deletes all accounts in a database at 1 PM, it’s possible to restore the database up to 12:59 PM, right before the deletion of the accounts occurred. Under this mode, space in the transaction log is only reclaimed when a backup of the transaction log is made. When this occurs, all the changes stored in the transaction log are written to the backup and the space is freed up. Therefore, databases in this mode need to have enough space available for the transaction log to store all the transactions that occur between each backup. Additionally, non-logged operations are not allowed.
Bulk-Logged: The bulk-logged model lies between the other two models. On the one hand, incremental backups of the database are possible. The transaction log is treated the same way in this model as in the full model. However, bulk copy operations are only minimally logged. Instead of logging each insert into the table, SQL Server only logs the minimum necessary to recover the data if the backup is needed. However, because of this, if a bulk copy operation occurs, point-in-time recovery (described in the previous paragraph) is not possible. Recovery can only stop at the end of a transaction log.

Choosing the Right Model for Your Needs
Now that you understand the recovery models, it should be relatively straightforward for you to choose one for a particular situation. Ask yourself these questions.
When Does Data Change in Your System? If changes only occur during scheduled batches, then the simple model of operation might be the easiest. You will only need to execute a full backup after each batch is run. For example, if your reporting database loads data each evening, this model will be the simplest and allow fast loading of data when using BULK COPY. However, if it is crucial to capture each and every change to the database—and such changes occur throughout the day—the full model would be the most appropriate. This applies in most systems that run business operations—for example, order-taking systems. In this case, you need to go to the next step before choosing your model.
Do You Need Fast Loading of Data? If you need to capture changes that occur to your database throughout the day, you have a choice between using the full or the bulk-logged model. Both allow incremental backups. If you need to load data quickly using BULK COPY, then choose the bulk-logged model. Otherwise, choose the full model. However, before finalizing your decision, you have one more question to answer.
Do You Need Point-in-Time Recovery? If you need to be able to recover to a particular point in time, you must choose the full model. The bulk-logged model only allows you to recover to the end of a transaction log.
Applying a Model to the Database
Now that you’ve chosen a model, apply it to the database of your choice. To do so, issue an ALTER DATABASE statement:
ALTER DATABASE [database name]
SET RECOVERY { FULL | BULK_LOGGED | SIMPLE }
In order to tell what model is being used by a particular database, execute this command: SP_HELPDB [database name]
This stored procedure returns a wealth of information regarding the database. You will find the recovery model of the database under the Status column where it says, “RECOVERY=[name of model]”.
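For example, assuming the Northwind sample database (used here purely for illustration), the two commands together look like this:

-- Switch the database to the full model, then confirm the setting
ALTER DATABASE Northwind SET RECOVERY FULL
EXEC sp_helpdb Northwind    -- the Status column reports RECOVERY=FULL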
Where Have All My Options Gone?

Those of you familiar with previous versions of SQL Server might be wondering what’s become of sp_dboption. Read on…

For those of you who have used SQL Server in the past and were expecting to use the SP_DBOPTION command, see the sidebar on this topic.
Choosing a recovery model doesn’t mean that you are stuck with your choice. On the contrary, it is easy to switch from one to another simply by issuing another ALTER DATABASE command. For example, you might choose to switch to BULK_LOGGED mode right before you perform a bulk copy in order to speed up that task, and then switch back to FULL mode immediately afterwards in order to minimize the possible loss of data. Not only is the switch easy and painless, but backups can continue as scheduled.
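A minimal sketch of that switch, again assuming the Northwind sample database and a hypothetical backup path, might look like this:

ALTER DATABASE Northwind SET RECOVERY BULK_LOGGED
-- ...perform the bulk copy or BULK INSERT here...
ALTER DATABASE Northwind SET RECOVERY FULL
-- Back up the log afterwards so the minimally logged work is captured
BACKUP LOG Northwind TO DISK = 'C:\Backup\Northwind_Log.trn'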
Your Next Step
Choosing a model does not automatically generate backups for you. You still need to set up and schedule them. In a future article, I’ll discuss the various types of backups—full, differential, transaction, and file backups—that SQL Server provides, and demonstrate how to choose among them.
If you need some immediate information on the subject you can look in Books Online in the chapter called “Backing Up and Restoring Databases.”
Joe Lax has spent the last 10 years working in various database environments and has been a practicing DBA on all versions of SQL Server from version 4.2 onward. Joe is also a MCSE and an MCT. Recently, he has started to learn Oracle, which affords him no end of fun.

Introduction To Backup & Recovery.

Posted: March 10, 2007 in Uncategorized

Introduction
One of the most important aspects for a database environment is ensuring reliable backups are being executed and a dependable recovery plan is established in the event of a system failure or data corruption. Several options are available for defining your backup and recovery model and your choices will determine the reliability and the amount of data loss your company can acceptably incur.
This document analyzes the various options available for your backup and recovery process with SQL Server 2000, as well as an enhancement to that process using SQL LiteSpeed, a highly efficient backup and restore utility that provides significant time and disk space savings.
Recovery Model
Which recovery model is best for the databases in your environment? This setting depends on the criticality of the data and the acceptable data loss in the event of a system failure. SQL Server 2000 offers three recovery models that can be implemented for your databases. The appropriate choice depends on your applications and the criticality of your data. These settings can be configured either through Enterprise Manager or through T-SQL using the ALTER DATABASE command.
The three database recovery model options are:
Simple
With the Simple Recovery model, data is recoverable only to the most recent full database or differential backup. Transaction log (or incremental changes) backups are not available. The Simple Recovery model is easier to manage than the Full or Bulk-Logged models, but at the expense of higher data loss because the contents of the database transaction log are truncated each time a checkpoint is issued for the database.
Full
The Full Recovery model uses database backups and transaction log backups to provide complete protection against failure. If one or more data files are damaged, restoring the backups permits recovery of all committed transactions using a combination of the database and transaction log backups.
Full Recovery provides the ability to recover the database to the point of failure or to a specific point in time. All operations, including bulk operations such as SELECT INTO, CREATE INDEX, and bulk loading data, are fully logged and recoverable.
Bulk Logged
The Bulk-Logged Recovery model provides protection against failure combined with the best performance. In order to get better performance, the following operations are minimally logged and not fully recoverable: SELECT INTO, Bulk load operations, CREATE INDEX as well as text and image operations.
Under the Bulk-Logged Recovery model, a damaged data file can result in having to redo work manually based on the operations above that are not fully logged. In addition, the Bulk-Logged Recovery model only allows the database to be recovered to the end of a transaction log backup when the log backup contains bulk changes. Point-in-time recovery is not supported.
SQL Server 2000 Enterprise Manager directions to configure the database Recovery Model:
In SQL Server Enterprise Manager, open the ‘Databases’ folder. Once the database folder is expanded, right click on the database and select the ‘Properties’ option. The ‘Database Properties’ window will open. Click on the ‘Options’ tab and the recovery model will be listed in the middle of the screen. Click on the drop down box to select the needed recovery model. On the bottom of the screen click ‘OK’ to save the Recovery Model.
SQL Server 2000 Transact-SQL directions for ALTER DATABASE commands to configure the database Recovery Model:
ALTER DATABASE Northwind
SET RECOVERY FULL
GO
In this example the Northwind database is set to the ‘Full’ Recovery Model.
Backup Options
Once the database recovery model has been identified, it is necessary to decide which backup method needs to be instituted for your backup and recovery procedures. There are several options and each has advantages and disadvantages. The backup options can be configured with the Maintenance Plan Wizard, Enterprise Manager, or T-SQL commands. The available backup options are outlined below:
Database
This option creates a full copy of the database. A complete snapshot of your database is created at the time the backup occurs.
Transaction
This option provides a copy of the active transaction log. Transaction log backups operate in conjunction with database backups to allow you to append transactions that have occurred since the last database backup. If successive logs are created, each log contains the set of new transactions since the last transaction log backup.
Differential
This option copies only the database pages which have been modified after the last database backup. If successive differential backups are created, only the most recent differential backup is required for the recovery process. Differential backups are leveraged in combination with full backups. It is necessary to execute a full backup first and then execute the Differential backups on the needed interval. In addition, it is possible to use transaction log backups with differential backups based on the backup schedule.
File or Filegroup
For very large databases, an option is available for executing database file or filegroup backups. These backups allow you to backup a single data file at a time. One of the drawbacks with this option is that it requires more effort in planning the backup and recovery process as well as your overall database design. In most instances you only have one data file and one log file for each database and therefore this option does not make sense. Also, in order to use filegroup backups you must use transaction log backups in conjunction with this backup method.
Snapshot Backups
Using third party tools, such as Storage Area Network (SAN) solutions, you have the ability to capture file level snapshots of the database to replicate the complete database files to other disk drives on the SAN. Unfortunately, this method is expensive and not an option for most database installations.
Backup Commands
There are primarily two options when constructing backup commands, either backing up the database or the transaction log. In conjunction with these commands, there are several options which can be specified when constructing your backup commands. These additional options can be found in SQL Server Books Online in an article entitled ‘BACKUP’.
In the commands below, the {device} reference can specify either a logical or physical device. In constructing the commands you can reference the name of a physical file or you can specify a logical device that has been setup through Enterprise Manager or T-SQL. More information about this can be found in SQL Server Books Online.
DATABASE
This option specifies backing up the data portion of the database. For this command there are options to specify the full database, a list of files/filegroups or differential backups. The backup commands are constructed as follows:
Database:
BACKUP DATABASE {databasename} TO {device}
Differential:
BACKUP DATABASE {databasename} TO {device} WITH DIFFERENTIAL
Filegroup:
BACKUP DATABASE {databasename} FILE = {filename}, FILEGROUP = {filegroup} TO {device}
LOG
This option specifies a backup of the active transaction log. The log is backed up from the last successfully executed LOG backup to the end of the log. The command is constructed as follows:
BACKUP LOG {databasename} TO {device}
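To make the syntax concrete, here are the full, differential, and log backup commands above written against the Northwind sample database with hypothetical disk paths (substitute your own file locations or logical devices):

BACKUP DATABASE Northwind TO DISK = 'C:\Backup\Northwind_Full.bak'
BACKUP DATABASE Northwind TO DISK = 'C:\Backup\Northwind_Diff.bak' WITH DIFFERENTIAL
BACKUP LOG Northwind TO DISK = 'C:\Backup\Northwind_Log.trn'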
Tracking Tables
Several tables exist in the msdb database that track the backup operations which occurred on the server.
These tables include:
backupfile – Contains one row for each data or log file that is backed up
backupmediafamily – Contains one row for each media family
backupmediaset – Contains one row for each backup media set
backupset – Contains a row for each backup set
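For instance, a simple query such as the following sketch (not required by any of the procedures above) lists the most recent backups recorded in msdb:

-- D = full database, I = differential, L = log, F = file/filegroup
SELECT database_name, type, backup_start_date, backup_finish_date
FROM msdb.dbo.backupset
ORDER BY backup_finish_date DESC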
Restore Commands
The restore commands are equivalent to the backup commands in terms of syntax. You have the option to execute database or transaction log restores. In addition, there are more commands available that permit checking the validity of the backup file as well as reading the contents of the backup file prior to executing a restore.
DATABASE
Specifies the complete restore of the database from a backup device. This can either be a full database, differential or a filegroup restoration. If a list of files and filegroups is specified, only those files and filegroups are restored.
Database:
RESTORE DATABASE {databasename} FROM {device}
Database and Differential:
RESTORE DATABASE {databasename} FROM {device} WITH NORECOVERY
RESTORE DATABASE {databasename} FROM {device}
Filegroup:
RESTORE DATABASE {databasename} FILE = {filename}, FILEGROUP = {filegroup} FROM {device} WITH NORECOVERY
RESTORE LOG {databasename} FROM {device}
LOG
Specifies a transaction log restore is to be applied to the database. Transaction logs must be applied in sequential order from the oldest backup to the most recent backup. SQL Server checks the backed up transaction log to ensure that the transactions are being loaded in the correct database and in the correct sequence. To apply multiple transaction logs, use the NORECOVERY option on all restore operations except the last restore command where the database recovery is needed. In addition, a transaction log restore must be executed following the database restore.
RESTORE DATABASE {databasename} FROM {device} WITH NORECOVERY
RESTORE LOG {databasename} FROM {device} WITH NORECOVERY
RESTORE LOG {databasename} FROM {device}
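A concrete version of that sequence, assuming the Northwind sample database and hypothetical backup files, would be:

RESTORE DATABASE Northwind FROM DISK = 'C:\Backup\Northwind_Full.bak' WITH NORECOVERY
RESTORE LOG Northwind FROM DISK = 'C:\Backup\Northwind_Log1.trn' WITH NORECOVERY
RESTORE LOG Northwind FROM DISK = 'C:\Backup\Northwind_Log2.trn' WITH RECOVERY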
VERIFYONLY
Verifies the validity of the backup, but does not restore the backup. This process confirms that the backup set is complete and that all volumes are readable for SQL Server to restore the backup in the future. However, RESTORE VERIFYONLY does not attempt to verify the structure of the data contained in the backup volumes. If the backup is valid, the following message is returned: “The backup set is valid.”
RESTORE VERIFYONLY FROM {device}
FILELISTONLY
Returns a result set with a list of the database and log files contained in the backup set.
RESTORE FILELISTONLY FROM {device}
HEADERONLY
Retrieves the backup header information for all backup sets on a particular backup device.
RESTORE HEADERONLY FROM {device}
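Assuming a backup file at a hypothetical path, the three inspection commands above can be run back to back before attempting a restore:

RESTORE VERIFYONLY FROM DISK = 'C:\Backup\Northwind_Full.bak'
RESTORE FILELISTONLY FROM DISK = 'C:\Backup\Northwind_Full.bak'
RESTORE HEADERONLY FROM DISK = 'C:\Backup\Northwind_Full.bak'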
Tracking Tables
Several tables in the msdb database house all of the restore operations that occurred on the server. These tables are as follows:
restorefile – Contains one row for each restored file, including files restored indirectly by filegroup name
restorefilegroup – Contains one row for each restored filegroup
restorehistory – Contains one row for each restore operation
Best Practices
Selecting the recovery model and backup options can be simple for your SQL Server implementation. The best scenario is to select the options that provide the most flexibility. The following are some guidelines that can be used for selecting the appropriate backup and recovery model as well as some additional considerations to institute.
Recovery Model Selection
If you are unsure what recovery model to use, the best bet is to implement the FULL recovery model. This option is the most flexible and gives you the most options. It allows recovery for the most amount of data in the event of a failure. Even if the FULL recovery model is selected, you are still free to choose the individual implementation backup options.
Backup Options
The best method is to perform full backups as often as possible depending on the size of your database, along with differential backups and lastly with a combination of transaction log backups. The frequency is dependent on your environment and the acceptable data loss for your company. If you have extremely large databases it will be difficult to execute frequent full backups, so you need to look at a combination of options.
A good starting point might be the following backup schedule:
Execute a full database backup on a daily basis
Perform transaction log backups every 15 minutes during the business day
Complete differential backups every 4 hours
Rationale – The differential backups will minimize the number of transaction log restores needed. If you back up the transaction logs every 15 minutes, a maximum of 15 transaction logs would need to be restored. The worst-case scenario would be 18 restorations in order to bring your database online and running. The 18 restorations would be one full, one differential, the 15 transaction log restores, and one last transaction log. This last log would come from your attempt to back up the active transaction log, if possible, before you begin your restores.
Backup to Disk First
Backing up databases to disk first gives you the fastest means for performing database backups and restores. In the event that a database restore is needed, the most recent backups are on disk instead of having to request tapes to complete the restoration.
Disk backups give the DBA the most control. As a DBA you will have more control over the backup schedule. You know exactly when backups are going to start and exactly when they should finish. You do not need to worry about other variables outside of your database server to determine when and if good backups occurred. It is still necessary to coordinate with your Backup Operator in your organization to make sure the tape backup occurs after the database disk backup, so the latest backup is stored on tape.
When backing up the databases to disk, ensure you are backing up the database to a different physical drive on a different controller card. In the event of a disk failure, if you use the same drives for the database and backup you risk the loss of your databases and backups.
If you have the ability to back up to a different machine or a network appliance, this option allows for a superior level of redundancy in the event of a failure.
Archive to Tape
Several backup vendors in the marketplace can assist with managing and verifying the success of your tape backups. These products should be part of your backup strategy, but should read the backup files from disk instead of executing backups directly from your databases. Several vendors offer external agents to read the SQL Server databases directly, but the recommendation is to write the backup to disk first and then to tape second. This method also gives you two full sets of backup files.
Test
Regardless of the backup method, it is advantageous to periodically test the validity of the backups. It is in your best interest to randomly select databases to restore onto test servers to ensure the restore functionality works properly and is meeting your expectations. The more frequently restoration testing is conducted, the better prepared you will be for a real recovery.
Verify
Take the time to verify that the backup is valid. The verify option listed above gives you peace of mind that the backup was successful. The last thing you want is to find out that the backup cannot be used for the restoration. Take the extra time to run RESTORE with the VERIFYONLY option to ensure the backup was successful and is available when needed.
System and User Databases
Ensure the backup procedures include all system and user databases. In the event of a complete system failure, it will be necessary to restore the system databases along with your user databases in order to recreate the SQL Server environment.
Faster and Smaller Backups
Tight on time and disk resources for your SQL Servers? SQL LiteSpeed is a superior tool to address both of these issues, backup/restore time and minimal disk space. SQL LiteSpeed delivers the same functionality as the native BACKUP and RESTORE functions that ship with SQL Server, but with an incredible amount of time and disk savings.
SQL LiteSpeed mimics the commands used for all BACKUP and RESTORE functionality through a set of SQL Server extended stored procedures. Basically all of the functionality mentioned above is available through these extended stored procedures. The only difference is how the commands are constructed and the significant time and disk space savings.
This product is available for SQL Server 2000 and 7.0 and runs on both Windows NT and Windows 2000.
Time Savings
The following chart demonstrates the time savings gained by leveraging SQL LiteSpeed compared to traditional backup commands. As you can see in the chart below, the time is reduced by 50% or more. Actual time savings depend on your hardware and your databases, but in most cases you will see a reduction of 50% or more in the time needed to complete the backup.
SQL LiteSpeed offers configuration options to improve the speed of the backup and restore process that are not available from any other vendor in the SQL Server marketplace. The optimal configuration is accomplished by setting the number of threads, the priority, and the latency for SQL LiteSpeed backups and restorations. The configuration depends on the server resources, including the number of CPUs and the amount of memory that can be devoted to the backup process alongside the remainder of the SQL Server processing. These configuration options enable a DBA to schedule backups as needed and to determine the appropriate amount of resources for the server based on the processing load. This performance tuning feature is not available with other backup utilities.
In addition to the time savings for backups, there is also a time savings for restores. During a critical failure, every second is valuable!
System Config                              | Traditional Backup (mins) | SQL LiteSpeed Backup (mins) | Speed Gain | Test
1 CPU, 3 GB Database, Single EMC Disk      | 3                         | 38 sec                      | 473%       | LiteSpeed Test
4 CPU, 64 GB Database, Striped Local Disk  | 38                        | 13                          | 292%       | LiteSpeed Test
8 CPU, 50 GB Database, Striped EMC Disk    | 55                        | 23                          | 240%       | LiteSpeed Test
4 CPU, 50 GB Database                      | 178                       | 38                          | 481%       | Our Test
1 CPU, 1.4 GB Database                     | 3.36                      | 1                           | 336%       | Our Test
4 CPU, 34 GB Database                      | 16                        | 8                           | 200%       | Our Test
Source – SQL LiteSpeed – Advanced SQL Server Backup – http://www.sqllitespeed.com/slsdefault.asp
Disk Savings
The following chart demonstrates the disk savings by implementing SQL LiteSpeed compared to traditional backup commands. As you can see in the chart, the disk space needs are reduced by 60% or more. You can see that even with small databases there is still a large space savings. If you add up all of your 1GB and smaller databases you can save a significant amount of disk space. In addition, these smaller files will benefit you when you need to copy these files around your network.
There are also options to allow you to further compress the size of the backup, but our testing shows that the increased time it takes for the backup with further compression does not offset the space savings enough. The product is fairly well optimized using the default settings for compression, but you should test different options in your environment.
System Config                              | Traditional Size (GB) | SQL LiteSpeed Size (GB) | Space Saving | Test
1 CPU, 3 GB Database, Single EMC Disk      | 1.89                  | 0.23                    | 88%          | LiteSpeed Test
4 CPU, 64 GB Database, Striped Local Disk  | 44.5                  | 7.8                     | 82%          | LiteSpeed Test
8 CPU, 50 GB Database, Striped EMC Disk    | 34                    | 5.2                     | 85%          | LiteSpeed Test
4 CPU, 50 GB Database                      | 33                    | 12                      | 63%          | Our Test
1 CPU, 1.4 GB Database                     | 1.1                   | 8 MB                    | 99%          | Our Test
4 CPU, 34 GB Database                      | 22                    | 5                       | 77%          | Our Test
1 CPU, Northwind 3.9 MB                    | 2.8 MB                | 692 KB                  | 75%          | Our Test
Source – SQL LiteSpeed – Advanced SQL Server Backup – http://www.sqllitespeed.com/slsdefault.asp
Encryption Capabilities
SQL LiteSpeed utilizes 128 bit encryption which provides one of the strongest levels of protection against data misuse either on site or while a tape is stored off site for disaster recovery purposes. Unfortunately, if the encryption key is lost or forgotten it is not possible to restore the backup. Therefore, having the information stored in a secure area with secure staff is crucial for the recovery process.
Command Comparison
The following chart displays a sample of how the native SQL Server commands compare to the SQL LiteSpeed commands. As you can see, the commands are very similar and all the functionality is replaced by using the SQL LiteSpeed extended stored procedures.
(To simplify the illustration, parts of the commands that are very similar have been reduced to …)
Command                        | Native SQL Command                                         | SQL LiteSpeed
Backup Database                | BACKUP DATABASE …                                          | EXEC master.dbo.xp_backup_database …
Backup Log                     | BACKUP LOG …                                               | EXEC master.dbo.xp_backup_log …
Backup FileGroup               | BACKUP DATABASE … FILEGROUP = ‘PRIMARY’ …                  | EXEC master.dbo.xp_backup_database … , @filegroup = ‘PRIMARY’
Differential Backup            | BACKUP DATABASE … WITH DIFFERENTIAL                        | EXEC master.dbo.xp_backup_database … , @with = ‘DIFFERENTIAL’
Restore Database               | RESTORE DATABASE                                           | EXEC master.dbo.xp_restore_database
Restore without recovery       | RESTORE DATABASE MyDB … WITH NORECOVERY                    | EXEC master.dbo.xp_restore_database … , @with = ‘NORECOVERY’
Restore Log to a point in time | RESTORE LOG … WITH RECOVERY, STOPBEFOREMARK = ‘LogMark’    | EXEC master.dbo.xp_restore_log … , @with = ‘RECOVERY’, @with = ‘STOPBEFOREMARK = “LogMark”’
Restore with move              | RESTORE DATABASE … WITH MOVE … TO … , MOVE … TO …          | EXEC master.dbo.xp_restore_database … , @filename = … , @with = ‘MOVE “…” TO “…”‘ , @with = ‘MOVE “…” TO “…”‘
Restore with verify only       | RESTORE VERIFYONLY FROM DISK = …                           | EXEC master.dbo.xp_restore_verifyonly @filename = …
Summary
Based on this analysis, a few items must be addressed when developing the backup and recovery procedures for your SQL Server environment. It is necessary to think about what you are trying to recover from, how much downtime users can endure and the acceptable amount of data loss in the event of a failure.
Several options are available that can be leveraged when constructing the BACKUP and RESTORE commands. These additional options can be found in SQL Server 2000 Books Online.
Along with the BACKUP settings, ensure a comprehensive Disaster Recovery (DR) plan has been developed, tested and implemented. Executing backups and working through a restore process is a good start for a SQL Server Disaster Recovery plan. Documentation is a key component to a SQL Server DR plan and should begin with the choices you have selected for your recovery model.
For installations where disk space is limited and time is of the essence take a look at SQL LiteSpeed. The enhancements you get by using this product can benefit every SQL Server installation. You can find more information about SQL LiteSpeed at http://www.sqllitespeed.com/.

Backup/Restore Architecture

Posted: March 10, 2007 in Uncategorized


The backup and restore components of Microsoft® SQL Server™ 2000 allow you to create a copy of a database. This copy is stored in a location protected from the potential failures of the server running the instance of SQL Server. If the server running the instance of SQL Server fails, or if the database is somehow damaged, the backup copy can be used to re-create, or restore, the database.
SQL Server 2000 provides these sophisticated backup and restore capabilities:
· Options for how a database is backed up and restored:
· A full database backup is a full copy of the database.
· A transaction log backup copies only the transaction log.
· A differential backup copies only the database pages modified after the last full database backup.
· A file or filegroup restore allows the recovery of just the portion of a database that was on the failed disk.
These options allow backup and restore processes to be tailored to how critical the data in the database is. Noncritical databases that can be easily re-created from some other source may need no backups; other databases may have simple backups that can re-create the database as of the night before a failure; and critical databases may have sophisticated backups that restore the database right up to the point of failure.
· Control with the BACKUP and RESTORE statements.
Users can execute the BACKUP and RESTORE statements directly from applications, Transact-SQL scripts, stored procedures, and triggers. It is more common, however, to use SQL Server Enterprise Manager to define a backup schedule, and then let SQL Server Agent run the backups automatically according to the schedule. The Database Maintenance Plan Wizard can be used to define and schedule a full set of backups for each database. This fully automates the backup process, requiring minimal or no operator action.
· Maintenance of a set of backup history tables in the msdb database.
The backup history tables record the backups for each database. If a database has to be restored, the Restore Database dialog box in SQL Server Enterprise Manager presents the user with a list of all the backups available for the database. The Restore Database dialog box also has logic to display which set of the backups in the history can be used to restore the database in the shortest possible time. When the dialog box is displayed, the backups needed to restore the database are checked. If a user knows that one of the backups is not available (for example, if a tape cartridge was damaged or lost), the user can deselect that backup, and SQL Server Enterprise Manager calculates a new restore process. When the user agrees with the restore process, SQL Server Enterprise Manager restores the database, prompting for tapes as needed.
· Backups that can be performed while the database is in use, allowing backups to be made of systems that must run continuously.
The backup processing and internal data structures of SQL Server 2000 are structured so that backups maximize their rate of data transfer with minimal effect on transaction throughput.
· Fast data transfer rates for backup and restore operations, making SQL Server 2000 capable of supporting very large databases (VLDB).
The data structures in SQL Server 2000 databases and the backup and restore algorithms support high data transfer rates for backup and restore operations. SQL Server backup and restore operations can also run in parallel against multiple backup files or tape drives, which further increases the backup and restore data transfer rates.
· The RESTORE statement re-creates the database automatically if necessary.
This eliminates the need to execute a separate CREATE DATABASE or CREATE DATABASE FOR LOAD statement if the database does not exist at the time the RESTORE statement is executed.
· Interrupted backup and restore operations can be restarted near the point of interruption.
· Verification of a SQL Server 2000 backup before an attempt to restore the database. This includes verifying that the collation of the database is supported by the instance of SQL Server.
Backup and restore processes should be planned together. The administrators must first determine the criticality of the data in the database. They must determine if it is acceptable to just restore the database to a point such as the night before the failure, or if the database must be restored to a point as close as possible to the time of failure. They must also determine how long the database can be unavailable, whether it must be brought back online as quickly as possible, or if it does not need to be restored immediately.
After the restore requirements are determined, the administrators can then plan a backup process that maintains a set of backups that will meet the restore requirements. The administrators can choose the backup processes that can be performed with the minimum effect on the system as it runs, yet still meet the restore requirements. Based on the resource requirements, the administrators also choose the recovery model for the database. The recovery model balances logging overhead against the criticality of fully recovering the data. The recovery models are:
· Full
The data is critical and must be recoverable to the point of failure. All data modifications are logged. All SQL Server 2000 recovery options are available.
· Bulk-logged
Certain bulk operations (bulk copy operations, SELECT INTO, text processing) can be replayed if necessary, so these operations are not fully logged. Can only recover to the end of the last database or log backup.
· Simple
All data modifications made since last backup are expendable, or can be redone. Lowest logging overhead, but cannot recover past the end of the last backup.