Monday, 20 February 2012

Bulk Import (Insert and Update) using BULK INSERT and MERGE Statements in SQL Server 2008


In some projects, we may have bulk import functionality to a table which is already exist with our system. SQL Server given lots of facilities for importing data into the database, some are –
  1. Using bcp Utility
  2. Using BULK INSERT statement
  3. Using OPENROWSET
  4. Etc.,
In this post I am going do bulk import using BULK INSERT and MERGE statements. We use both BULK INSERT and MERGE statements as this implementation not only used for inserting the records, also for updating records if already exist.

So the use case will be as below –
  1. The input for the bulk import would be either .csv file or an .xml file.
  2. The records from the input file will as per the importing table schema. So the values will be able to insert or update without any issue in the same order and data type.
  3. The records from the csv may already present in the table or may not.
  4. When no records present, all the records will be inserted (Ex: initial stage).
  5. If some records are already present, the records will be updated otherwise inserted.

Note: This implementation will work only from SQL Server 2008 and later versions (as MERGE statement introduced from SQL Server 2008).

The implementation as follows-

Step 1: I am taking Customer table for importing the records (Customer table created from Northwind database schema).

Below is the table schema –
IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Customers]') AND type in (N'U'))
DROP TABLE [dbo].[Customers]
GO

CREATE TABLE [dbo].[Customers](
 [CustomerID] [nchar](5) NOT NULL,
 [CompanyName] [nvarchar](40) NOT NULL,
 [ContactName] [nvarchar](30) NULL,
 [ContactTitle] [nvarchar](30) NULL,
 [Address] [nvarchar](60) NULL,
 [City] [nvarchar](15) NULL,
 [Region] [nvarchar](15) NULL,
 [PostalCode] [nvarchar](10) NULL,
 [Country] [nvarchar](15) NULL,
 [Phone] [nvarchar](24) NULL,
 [Fax] [nvarchar](24) NULL,
 CONSTRAINT [PK_Customers1] PRIMARY KEY CLUSTERED ( [CustomerID] ASC )
)
GO
Implementing with a csv file for importing the data. Some of the records in the csv file would be
ALFKI,Alfreds Futterkiste,Maria Anders1,Sales Representative,Obere Str. 57,Berlin,NULL,12209,Germany,030-0074321,030-0076545
ANATR,Ana Trujillo Emparedados y helados,Ana Trujillo,Owner,Avda. de la Constituci¢n 2222,M‚xico D.F.,NULL,5021,Mexico,(5) 555-4729,(5) 555-3745
ANTON,Antonio Moreno Taquer¡a,Antonio Moreno,Owner,Mataderos  2312,M‚xico D.F.,NULL,5023,Mexico,(5) 555-3932,NULL
AROUT,Around the Horn,Thomas Hardy,Sales Representative,120 Hanover Sq.,London,NULL,WA1 1DP,UK,(171) 555-7788,(171) 555-6750
BERGS,Berglunds snabbk”p,Christina Berglund,Order Administrator,Berguvsv„gen  8,Lule†,NULL,S-958 22,Sweden,0921-12 34 65,0921-12 34 67
BLAUS,Blauer See Delikatessen,Hanna Moos,Sales Representative,Forsterstr. 57,Mannheim,NULL,68306,Germany,0621-08460,0621-08924
BLONP,Blondesddsl pŠre et fils,Fr‚d‚rique Citeaux,Marketing Manager,"24, place Kl‚ber",Strasbourg,NULL,67000,France,88.60.15.31,88.60.15.32
BOTTM,Bottom-Dollar Markets,Elizabeth Lincoln,Accounting Manager,23 Tsawassen Blvd.,Tsawassen,BC,T2F 8M4,Canada,(604) 555-4729,(604) 555-3745
BSBEV,B's Beverages,Victoria Ashworth,Sales Representative,Fauntleroy Circus,London,NULL,EC2 5NT,UK,(171) 555-1212,NULL
CACTU,Cactus Comidas para llevar,Patricio Simpson,Sales Agent,Cerrito 333,Buenos Aires,NULL,1010,Argentina,(1) 135-5555,(1) 135-4892
CENTC,Centro comercial Moctezuma,Francisco Chang,Marketing Manager,Sierras de Granada 9993,M‚xico D.F.,NULL,5022,Mexico,(5) 555-3392,(5) 555-7293
The following store procedure is used for getting the data from csv file and importing to the Customers table (both insert and update).
CREATE PROCEDURE SP_ImportCustomerData
AS
BEGIN
    -- Creating a Temproary Table for importing the data from csv file.
    CREATE TABLE #Customers(
        [CustomerID] [nchar](5) NOT NULL,
        [CompanyName] [nvarchar](40) NOT NULL,
        [ContactName] [nvarchar](30) NULL,
        [ContactTitle] [nvarchar](30) NULL,
        [Address] [nvarchar](60) NULL,
        [City] [nvarchar](15) NULL,
        [Region] [nvarchar](15) NULL,
        [PostalCode] [nvarchar](10) NULL,
        [Country] [nvarchar](15) NULL,
        [Phone] [nvarchar](24) NULL,
        [Fax] [nvarchar](24) NULL,
     CONSTRAINT [PK_Customers1] PRIMARY KEY CLUSTERED ( [CustomerID] ASC )
    );
    
    -- Inserting all the from csv to temproary table using BULK INSERT
    BULK INSERT #Customers
    FROM 'D:\Blog\ImportData\CustomerImport.csv'
    WITH ( FIELDTERMINATOR = ',', ROWTERMINATOR = '\n' );
    
    -- Selecting the records from temproary table. This is just to know the records inserted or not.
    -- SELECT * FROM #Customers;
    
    -- By using MERGE statement, inserting the record if not present and updating if exist.
    MERGE Customers AS TargetTable                            -- Inserting or Updating the table.
    USING #Customers AS SourceTable                           -- Records from the temproary table (records from csv file).
    ON (TargetTable.CustomerID = SourceTable.CustomerID)      -- Defining condition to decide which records are alredy present
    WHEN NOT MATCHED BY TARGET                                -- If the records in the Customer table is not matched?
        THEN INSERT (CustomerID, CompanyName, ContactName, ContactTitle, [Address],    -- then INSERT the record
                     City, Region, PostalCode, Country, Phone, Fax)
            VALUES(SourceTable.CustomerID, SourceTable.CompanyName, SourceTable.ContactName, SourceTable.ContactTitle, SourceTable.[Address],
                    SourceTable.City, SourceTable.Region, SourceTable.PostalCode, SourceTable.Country, SourceTable.Phone, SourceTable.Fax)
    WHEN MATCHED                                              -- If not matched then UPDATE
        THEN UPDATE SET
            TargetTable.CustomerID = SourceTable.CustomerID,
            TargetTable.CompanyName = SourceTable.CompanyName,
            TargetTable.ContactName = SourceTable.ContactName,
            TargetTable.ContactTitle = SourceTable.ContactTitle,
            TargetTable.[Address] = SourceTable.[Address],
            TargetTable.City = SourceTable.City,
            TargetTable.Region = SourceTable.Region,
            TargetTable.PostalCode = SourceTable.PostalCode,
            TargetTable.Country = SourceTable.Country,
            TargetTable.Phone = SourceTable.Phone,
            TargetTable.Fax = SourceTable.Fax;
            
    SELECT * FROM Customers;
END
The store procedure does the following points
  1. Create a Temproary Table for importing the data from csv file.
  2. Insert all the records from csv to Temproary Table using BULK INSERT
  3. Insert the records from Temproary Table to the Customers table when not exist and Update when not exist.
Note: I keeping the CustomerImport.csv file under D:\Blog\ImportData\ path and mentioned the same in the store procedure.

Implementation for XML data:

Let us take the same example and explore using XML file as input.

The XML file looks as below:
<?xml version="1.0" encoding="utf-8" ?>
<Customers>
  <Customer>
    <CustomerID>AEHM1</CustomerID>
    <CompanyName>Around the Horn test</CompanyName>
    <ContactName>Thomas Hardy</ContactName>
    <ContactTitle>Owner</ContactTitle>
    <Address>120 Hanover Sq.</Address>
    <City>London</City>
    <Region>SP</Region>
    <PostalCode>WA1 1DP</PostalCode>
    <Country>Mexico</Country>
    <Phone>(171) 555-7788</Phone>
    <Fax>(171) 555-6750</Fax>
  </Customer>
  <Customer>
    <CustomerID>AFGK</CustomerID>
    <CompanyName>Antonio Moreno Taquer-ía</CompanyName>
    <ContactName>Antonio Moreno</ContactName>
    <ContactTitle>Owner</ContactTitle>
    <Address>Mataderos  2312</Address>
    <City>MGÇÜxico D.F.</City>
    <Region>SP</Region>
    <PostalCode>5023</PostalCode>
    <Country>Mexico</Country>
    <Phone>(5) 555-3932</Phone>
    <Fax>(5) 555-3745</Fax>
  </Customer>
<Customers>

The Store Procedure would be
CREATE PROCEDURE SP_ImportCustomerData_XML
AS
BEGIN
    -- Creating a Temproary Table for importing the data from csv file.
    CREATE TABLE #Customers(
        [CustomerID] [nchar](5) NOT NULL,
        [CompanyName] [nvarchar](40) NOT NULL,
        [ContactName] [nvarchar](30) NULL,
        [ContactTitle] [nvarchar](30) NULL,
        [Address] [nvarchar](60) NULL,
        [City] [nvarchar](15) NULL,
        [Region] [nvarchar](15) NULL,
        [PostalCode] [nvarchar](10) NULL,
        [Country] [nvarchar](15) NULL,
        [Phone] [nvarchar](24) NULL,
        [Fax] [nvarchar](24) NULL,
     CONSTRAINT [PK_Customers1] PRIMARY KEY CLUSTERED ( [CustomerID] ASC )
    );
     
    -- Inserting all the rows from xml to temproary table using OPENROWSET
    -- Thanks to http://pratchev.blogspot.com/2008/11/import-xml-file-to-sql-table.html
    INSERT INTO #Customers (CustomerID, CompanyName, ContactName, ContactTitle, [Address], City, Region, PostalCode, Country, Phone, Fax)
    SELECT X.Customer.query('CustomerID').value('.', 'nchar(5)'),       
   X.Customer.query('CompanyName').value('.', 'nvarchar(40)'),
   X.Customer.query('ContactName').value('.', 'nvarchar(30)'), 
   X.Customer.query('ContactTitle').value('.', 'nvarchar(30)'), 
   X.Customer.query('Address').value('.', 'nvarchar(60)'), 
   X.Customer.query('City').value('.', 'nvarchar(15)'), 
   X.Customer.query('Region').value('.', 'nvarchar(15)'), 
   X.Customer.query('PostalCode').value('.', 'nvarchar(10)'), 
   X.Customer.query('Country').value('.', 'nvarchar(15)'), 
   X.Customer.query('Phone').value('.', 'nvarchar(24)'), 
   X.Customer.query('Fax').value('.', 'nvarchar(24)')
 FROM 
  (SELECT CAST(x AS XML)
   FROM OPENROWSET(BULK 'D:\ImportData\XMLData1.xml',SINGLE_BLOB) AS T(x)
  ) AS T(x)
  CROSS APPLY x.nodes('Customers/Customer') AS X(Customer);
     
    -- Selecting the records from temproary table. This is just to know the records inserted or not.
    -- SELECT * FROM #Customers;
     
    -- By using MERGE statement, inserting the record if not present and updating if exist.
    MERGE Customers AS TargetTable                            -- Inserting or Updating the table.
    USING #Customers AS SourceTable                           -- Records from the temproary table (records from csv file).
    ON (TargetTable.CustomerID = SourceTable.CustomerID)      -- Defining condition to decide which records are alredy present
    WHEN NOT MATCHED BY TARGET                                -- If the records in the Customer table is not matched?
        THEN INSERT (CustomerID, CompanyName, ContactName, ContactTitle, [Address],    -- then INSERT the record
                     City, Region, PostalCode, Country, Phone, Fax)
            VALUES(SourceTable.CustomerID, SourceTable.CompanyName, SourceTable.ContactName, SourceTable.ContactTitle, SourceTable.[Address],
                    SourceTable.City, SourceTable.Region, SourceTable.PostalCode, SourceTable.Country, SourceTable.Phone, SourceTable.Fax)
    WHEN MATCHED                                              -- If not matched then UPDATE
        THEN UPDATE SET
            TargetTable.CustomerID = SourceTable.CustomerID,
            TargetTable.CompanyName = SourceTable.CompanyName,
            TargetTable.ContactName = SourceTable.ContactName,
            TargetTable.ContactTitle = SourceTable.ContactTitle,
            TargetTable.[Address] = SourceTable.[Address],
            TargetTable.City = SourceTable.City,
            TargetTable.Region = SourceTable.Region,
            TargetTable.PostalCode = SourceTable.PostalCode,
            TargetTable.Country = SourceTable.Country,
            TargetTable.Phone = SourceTable.Phone,
            TargetTable.Fax = SourceTable.Fax;
            
    SELECT * FROM Customers;
END

[Update 10-Sep-2012]
As Charles Rice comment, I am updating this post with an example for importing data into a table that has Auto increment column.

Requirement 1: There is a table which contains Country, State, City, Remarks columns. It also contains a column Id which is auto increment values. The requirement is to import these values from csv files.

The table script would be:
CREATE TABLE [dbo].[TempCity](
 [Id] [int] IDENTITY(1,1) NOT NULL,
 [Country] [varchar](50) NULL,
 [State] [varchar](50) NULL,
 [City] [varchar](50) NULL,
 [Remarks] [varchar](200) NULL
) ON [PRIMARY]
GO
The .csv file would be:
1,India,Tamil Nadu,Chennai,
2,India,Tamil Nadu,Coimbatore,
3,India,Tamil Nadu,Madurai,testremarks
4,India,Tamil Nadu,Vellore,
5,India,Karnataka,Bangalore,
6,India,Karnataka,Mangalore,
7,India,Gujarat,Ahmedabad,
Note: Add an empty line at the end.

Here, the autoincrement id act as primary key. So to identity which record to update (if exist) or insert (if not exist) - we need to include that id also in csv file (included as first column). It is important to note is, the autoincrement id will be consider to compare with already inserted records id and not for new records. So, there is no need to include exact id in the csv file. It can contain any number which is not the existing record id would be fine.

The SQL SP would be:
CREATE PROCEDURE SP_ImportCustomerData
AS
BEGIN
 CREATE TABLE #TempCity(
  [Id] [int] IDENTITY(1,1) NOT NULL,
  [Country] [varchar](50) NULL,
  [State] [varchar](50) NULL,
  [City] [varchar](50) NULL,
  [Remarks] [varchar](200) NULL
 ) ON [PRIMARY];

    BULK INSERT #TempCity
    FROM 'C:\Blog\test1.csv'
    WITH ( FIELDTERMINATOR = ',', ROWTERMINATOR = '\n' );
     
    MERGE TempCity AS TargetTable
    USING #TempCity AS SourceTable
    ON (TargetTable.Id = SourceTable.Id)
    WHEN NOT MATCHED BY TARGET
        THEN INSERT (Country, State, City, Remarks)
            VALUES(SourceTable.Country, SourceTable.State, SourceTable.City, SourceTable.City)
    WHEN MATCHED
        THEN UPDATE SET
            TargetTable.Country = SourceTable.Country,
            TargetTable.State = SourceTable.State,
            TargetTable.City = SourceTable.City,
   TargetTable.Remarks = SourceTable.Remarks;
             
    SELECT * FROM TempCity;
END
If we run this SP, this will insert/update the TempCity table from the input csv file.

Requirement 2: In some cases, the requirement is to not include the autoincrement id in the input csv/xml file. The comparision must be decided by other columns (may be more the one) to insert or update the records from csv file.

In our example, let us take the combination of Country, State, City columns are considered as identity column (primary column) for a single record. So the merging process needs to consider by combining these three column. So once a record inserted with these three values, the only possibilities is to update the Remarks column. It is important to note is the auto-increment id is not consider anywhere.

The csv file would be
India,Tamil Nadu,Chennai,
India,Tamil Nadu,Coimbatore,
India,Tamil Nadu,Madurai,
India,Tamil Nadu,Vellore,testremarks
India,Karnataka,Bangalore,
India,Karnataka,Mangalore,
India,Gujarat,Ahmedabad,
Here, I am not passing the value of Id column in the csv file. Note: Add an empty line at the end.

The SQL SP would be :
CREATE PROCEDURE SP_ImportCustomerData
AS
BEGIN
 CREATE TABLE #TempCity(
  [Country] [varchar](50) NULL,
  [State] [varchar](50) NULL,
  [City] [varchar](50) NULL,
  [Remarks] [varchar](200) NULL
 ) ON [PRIMARY];

    BULK INSERT #TempCity
    FROM 'C:\Blog\test2.csv'
    WITH ( FIELDTERMINATOR = ',', ROWTERMINATOR = '\n' );
     
    MERGE TempCity AS TargetTable
    USING #TempCity AS SourceTable
    ON (TargetTable.Country = SourceTable.Country)
 AND (TargetTable.State = SourceTable.State)
 AND (TargetTable.City = SourceTable.City)
    WHEN NOT MATCHED BY TARGET
        THEN INSERT (Country, State, City, Remarks)
            VALUES(SourceTable.Country, SourceTable.State, SourceTable.City, SourceTable.Remarks)
    WHEN MATCHED
        THEN UPDATE SET
   TargetTable.Remarks = SourceTable.Remarks;
    
    SELECT * FROM TempCity;
END

Below are the screen shot of execution job
Table Creation

SP execution at first time (with 76 records in the csv file) - All records are inserted

Adding some more records in the csv file and updating some existing records

Inserted the additional records and updated that 2 records

The updated records are pointed out

The output for XML input will the same as csv input, only the difference is the input file.

When using MERGE statement, there are some points required to keep in mind.
  1. Create an index on the join columns in the source table that is unique and covering.
  2. Create a unique clustered index on the join columns in the target table.
Refer the below url for more information http://technet.microsoft.com/en-us/library/cc879317.aspx



6 Responses to “Bulk Import (Insert and Update) using BULK INSERT and MERGE Statements in SQL Server 2008”

  • Charles Rice says:
    6 September 2012 at 02:18

    How would this work with auto incrementing IDs rather than your alphanumeric unique IDs?

  • Thiru says:
    10 September 2012 at 13:24

    Hi Charles, I updated the post merging with Auto incremental Ids. Please review the update portion.
    Many Thanks

  • Anonymous says:
    19 June 2013 at 16:52

    great job

  • vanibasu shree says:
    18 September 2013 at 12:57

    Hello,
    How do I achieve this with Unique Identifier column ..which is a primary key. Please let me know

  • Artem Novikov says:
    22 August 2015 at 03:20

    Thank you very much. The post is awesome. Step by step. The code is very clean and well commented. Great job. I'd like to see some benchmarks, though.

  • Dishan Dayarathna says:
    26 February 2016 at 17:32

    Thanks

Post a Comment