Jump to content

How can I avoid row duplication when importing in the absence of primary keys.


Guest adiazeng
 Share

Recommended Posts

Guest adiazeng

I download a .csv flat file every day from a web site. Then I upload the data into SQL Server 2005 using The Integration Services Tool (Specifically: Business Intelligence Dev Studio)I created a package and when I run the “debug” it imports the data successfully. I guess a question for later might be how can I automate this package. But I digress; My original question is: How do I avoid duplicate data when importing without first cleaning the .csv flat file. Let me back up a bit and clarify. The .csv file that I download contains information such as SALESMAN, DATE, ITEM_NUMBER, NUM_ITEMS_SOLD……ETCThis is a Month to date type of report so every time I run it, it has the same data it had yesterday plus some new data. I only want the new data but I don’t want to open the file in excell, sort by date and delete the un-needed rows. I want SQL to do the duplicate rows for me. The way I figure it if SQL can do a “SELECT DISTINCT” then what I need is some sort of “INSERT DISTINCT INTO” capability. If the data I was downloading had primary key data that would solve the problem but it does not. I was thinking I might query the table with “DISTINCT” and put the results into a temp table or cursor then delete the data from the original table and feed the temp data back into the original table but this sounds like something that SQL should be able to do more effectively if only I was aware of just how! Basically I want to remove duplicate rows after they have been imported. Or omit the rows from importing in the first place at the IST BI Dev level. Don’t be fooled by what seems like a firm grasp of SQL, I rate my SQL knowledge at a 50 on a scale of 1 to 300. So if you can provide “script for dummies” level of examples I would truly appreciate it.Thanks all

Link to comment
Share on other sites

Why not base it on the Date. I assume the date will be unique (if it includes the time aswell). Is it a requirement to only use SQL Server or have you considered a simple script that could easily strip out data that wasn't from today's date?

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

×
×
  • Create New...