
The .csv files have 57 columns and some of the values are empty, for example: ,Jane,Doe,,35. Let's say the first value (before the comma) is for the automated ID key, which increments every time a row is imported into the database. The problem is that the empty values are not imported, which results in an error. My understanding is that the import cannot read the empty values, so it shifts the remaining values over, like this: Jane,Doe,35, making "Jane" the value for ID. Another concern is that the .csv files' columns don't match the table: for example, the table has an ID column, whereas the .csv files start with a Name column. Is there a way to import the file into the database starting from a particular column?

Note: This is in one table only. Question: Would it be preferable to create a separate table that matches the columns of the .csv files and then join it with the table that has the ID column (from the example above)?

My code so far:

        DataTable dt = new DataTable();

        dt.Columns.AddRange(new DataColumn[54] { new DataColumn("Delay_Code"), /* and so on... */ });

        string csvData = File.ReadAllText(e.FullPath);
        foreach (string row in csvData.Split('\n'))
        {
            if (!string.IsNullOrEmpty(row))
            {
                dt.Rows.Add();
                int i = 0;
                foreach (string cell in row.TrimEnd('\r').Split(','))   // TrimEnd keeps a trailing '\r' out of the last cell
                {
                    dt.Rows[dt.Rows.Count - 1][i] = cell;
                    i++;
                }
            }
        }

        string consString = @"Data Source=blahblah";
        using (SqlConnection con = new SqlConnection(consString))
        {
            using (SqlBulkCopy sqlBulkCopy = new SqlBulkCopy(con))
            {
                //Set the database table name
                sqlBulkCopy.DestinationTableName = "owner.Table";
                con.Open();
                sqlBulkCopy.WriteToServer(dt);
                con.Close();
            }
        }

For the code above, I have created another table that matches the columns of the .csv files. The original table has three additional columns preceding the columns from the .csv files.
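
For reference, SqlBulkCopy also supports explicit column mappings; without any mappings it copies by ordinal position, which is what pushes "Jane" into the ID column. A minimal sketch of that idea, reusing dt and consString from the code above and assuming the DataTable column names match the destination table's column names:

        // Sketch only: map each DataTable column to the destination column with the same name.
        // The unmapped identity column (ID) is then left for SQL Server to fill.
        using (SqlConnection con = new SqlConnection(consString))
        using (SqlBulkCopy sqlBulkCopy = new SqlBulkCopy(con))
        {
            sqlBulkCopy.DestinationTableName = "owner.Table";
            foreach (DataColumn col in dt.Columns)
            {
                sqlBulkCopy.ColumnMappings.Add(col.ColumnName, col.ColumnName);
            }
            con.Open();
            sqlBulkCopy.WriteToServer(dt);
        }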

What's the best approach for this?

1 Answer

Your question might seem too broad from some viewpoints, but I understand it, and I'm sharing a fairly broad (universal) approach that has worked for me and that you can use as inspiration:

  1. Write a quality CSV reader which can also handle a CSV row like

    Value1, Value2, "Value 3", "Value ""4""", "Value
    5",, Value 7, "Value,8"
    
    • if a value starts with a double quote, the reader should accept any characters inside it, including spaces, new-line characters and quotes written as "", and stop only at the next solo quote (")
    • be prepared for different field separators (comma, semicolon or a tab character) and different decimal separators (. or ,); remember it is usually the other party who produces the CSV, so you may need to adapt to their format
    • the easiest way is to use a finite-state machine for this; it is lightweight and lightning-fast, on the order of 50 MB/sec (a minimal sketch of such a reader follows after this list)
  2. [optional] A logical next step is to have configurable import definitions instead of hard-wired ones, e.g.

    { ExternalOrders = "Data From External Orders in CSV",
      CsvFormat = { CsvHasHeaderRow = true,
                    CsvFieldSeparator = ",",
                    DecimalSeparator = ".",
                    DateFormat = "yyyy-MM-dd",
                    DateTimeFormat = "yyyy-MM-dd hh:mm:ss",
                    TimeFormat = "hh:mm:ss" },
      ColumnMap = {
           Column1 = { SourceColumnName = "OrderID",
                       SourceColumnType = "nvarchar(50)",
                       StagingColumn = 1 },
           Column2 = { SourceColumnName = "OrderDate",
                       SourceColumnType = "date",
                       StagingColumn = 2 },
           ColumnAmount = { SourceColumnPosition = 5,
                       SourceColumnType = "decimal(18,6)",
                       StagingColumn = 3 }
           },
      StagingImportSql = "INSERT INTO Orders (Number, OrdDate, Amount)
                              SELECT CAST(c1 AS nvarchar(50)),
                                     CAST(c2 AS date),
                                     CAST(c3 AS money)
                                  FROM StagingTable
                                  WHERE ImportID = {{ImportIDToken}};"
    }
    
    • this can live in the database, in a text configuration file, wherever; I have it in 3 database tables (ImportAction, CsvFormat, ColumnMapping)
  3. Have a staging table in your SQL database with columns

    ID int identity(1,1)
    ImportID int
    CsvRowNumber int
    c1 sql_variant
    c2 sql_variant
    c3 sql_variant
    ...
    c64 sql_variant
    
  4. Create the engine which calls the CSV import (with formats from CsvFormat) to fill the staging table according to the configured ColumnMap. Also have additional columns in your staging table, e.g. ImportID and CsvRowNumber, auto-filled by the engine during the import. Issue one SQL INSERT statement per CSV row to add one record to the staging table. When all rows are done, run StagingImportSql, after replacing the {{ImportIDToken}} token in its SQL command with the number of your actual import (a stripped-down sketch of this staging flow appears further below).

  5. Optionally you can implement

    • purging of the staging table if you do not purge it immediately after the import, e.g.

      • keep certain count of imports AND/OR
      • keep imports not older than given number of days
    • user restrictions - who can see and run which imports
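
To illustrate point 1, here is a minimal sketch of such a finite-state-machine reader (the class and method names are illustrative only; a real reader would also need configurable separators, encoding handling and error reporting):

    // Minimal sketch of a two-state reader: outside quotes / inside quotes.
    using System.Collections.Generic;
    using System.IO;
    using System.Text;

    public static class CsvSketch
    {
        // Reads one CSV record; a quoted field may contain separators, "" escapes and new lines.
        public static List<string> ReadRecord(TextReader reader, char separator = ',')
        {
            var fields = new List<string>();
            var field = new StringBuilder();
            bool inQuotes = false;

            int ch;
            while ((ch = reader.Read()) != -1)
            {
                char c = (char)ch;
                if (inQuotes)
                {
                    if (c == '"')
                    {
                        if (reader.Peek() == '"') { reader.Read(); field.Append('"'); } // escaped ""
                        else inQuotes = false;                                          // closing solo quote
                    }
                    else field.Append(c);                          // anything goes, even a new line
                }
                else if (c == '"' && field.Length == 0) inQuotes = true;   // opening quote
                else if (c == separator) { fields.Add(field.ToString()); field.Clear(); }
                else if (c == '\r') { }                            // ignore, the \n ends the record
                else if (c == '\n') break;                         // end of record
                else field.Append(c);
            }
            fields.Add(field.ToString());                          // last field; empty stays empty
            return fields;
        }
    }

Called in a loop until the reader is exhausted, this keeps empty fields as empty strings, which is exactly what the ,Jane,Doe,,35 example from the question needs.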

I'm not sharing my specific code; in my case it runs to maybe 6000 lines in many classes. I'm just sharing the ideas.
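
As a stripped-down illustration of points 3 and 4, the staging flow might look like this (table and column names follow the examples above; a real engine builds the column list and SQL from the configuration instead of hard-coding it):

    // StagingTable from point 3: ID int identity(1,1), ImportID int, CsvRowNumber int, c1..c64 sql_variant
    using System;
    using System.Collections.Generic;
    using System.Data.SqlClient;

    public static class StagingImportSketch
    {
        public static void Run(string connectionString, int importId,
                               IEnumerable<List<string>> csvRows, string stagingImportSql)
        {
            using (var con = new SqlConnection(connectionString))
            {
                con.Open();

                // One INSERT per CSV row into the staging table (only c1..c3 shown here).
                int rowNumber = 0;
                foreach (var row in csvRows)
                {
                    rowNumber++;
                    using (var cmd = new SqlCommand(
                        "INSERT INTO StagingTable (ImportID, CsvRowNumber, c1, c2, c3) " +
                        "VALUES (@imp, @row, @c1, @c2, @c3)", con))
                    {
                        cmd.Parameters.AddWithValue("@imp", importId);
                        cmd.Parameters.AddWithValue("@row", rowNumber);
                        cmd.Parameters.AddWithValue("@c1", (object)row[0] ?? DBNull.Value);
                        cmd.Parameters.AddWithValue("@c2", (object)row[1] ?? DBNull.Value);
                        cmd.Parameters.AddWithValue("@c3", (object)row[2] ?? DBNull.Value);
                        cmd.ExecuteNonQuery();
                    }
                }

                // Replace the token and run the configured StagingImportSql once, set-based.
                string sql = stagingImportSql.Replace("{{ImportIDToken}}", importId.ToString());
                using (var cmd = new SqlCommand(sql, con))
                {
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }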


Note on parsing:

  • let your CSV reader return a DataTable of strings
    • returning the entire table is not ideal for large CSVs, but up to (let's say) 500 MB it should be fine
  • use methods like Integer.Parse(), Decimal.Parse(), Double.Parse() to parse simple values according to their column data type. You might want to replace a custom decimal or thousands separator with the standard one before parsing.
  • for dates and times, use DateTime.ParseExact()
  • for internal processing of data types, the enum System.Data.SqlDbType is your friend
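
For example, parsing with an explicit culture avoids surprises when the CSV uses a different decimal separator or date format than the server locale (the format strings follow the CsvFormat example above; note that HH is the 24-hour specifier, while hh is 12-hour):

    using System;
    using System.Globalization;

    // Sketch: parse string cells with an explicit format so the machine's locale cannot interfere.
    var numberFormat = new NumberFormatInfo { NumberDecimalSeparator = "." };

    decimal amount    = decimal.Parse("1234.56", numberFormat);
    int quantity      = int.Parse("42", CultureInfo.InvariantCulture);
    DateTime ordDate  = DateTime.ParseExact("2017-05-31", "yyyy-MM-dd",
                                            CultureInfo.InvariantCulture);
    DateTime ordStamp = DateTime.ParseExact("2017-05-31 13:45:00", "yyyy-MM-dd HH:mm:ss",
                                            CultureInfo.InvariantCulture);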