Jump to content

Eliminating duplicates into database


rkemunto

Recommended Posts

I am researching on how to eliminate duplicate records entering into our database and would appreciate any input. If anyone has a strategy in place that works for them, I'd also love to know about it too.Data entry into our database is done via:1. Web users - by registering and creating their own user Id and Password (email is their userID) This data updates the persons, web user and company tables automatically.2. Manual Data entry from paper3. Imports from our List Vendors.4. Company recordScenario (a) If a person from (1) above changes companies, their email addresses changes and instead of logging into the web account using previous email address, then changing account info - instead they ignore the step and create a new account user with their new email address which creates a duplicate record in the database. Scenario (:) if one forgets their password - instead of contacting the webmaster to retrieve their old password, they create a new userid (using a different email address) and a new user ID. This creates another record in the database.Scenario © If the person's records exists with our list supplier but with different email address, although the database compares, last names, company etc - it still creates a duplicate record because of the different email address for that person.Our database is composed of members (from persons table - email is Primary Key) from different organizations (from company table - company website is primary Key) and have access to our products on our web site (from Web User table - email is unique key). Anyone knows of a better way to aleviate dups e.g by a different key for web registrants.How should I correct the many dups we already have in the database ( over 20,000 records)I appreciate your input on this.

Link to comment
Share on other sites

:) First of all you should control the input in your database... if an account already exists then you should update it and not insert a new data... Now if you want to clean your database, try some count(distinct) queries to catch the duplicates...
Link to comment
Share on other sites

If the e-mail is the primary key, then how are you recognizing the duplicates? (name, company, etc.). With that information, you could do some different things. But it depends on what the inputs are and how you recognize the duplicates.I think ultimately your best bet would be to come up with better ways to encourage users to update their old usernames/e-mails. I know personally that I don't ever really see that as an option when I need a new username, I create one, I don't edit my old one. So maybe including some information when they are registering an account might work well to inform them of this and prevent some from doing it.Finally, as far as cleaning out the duplicates, do you keep a time stamp of when records were added? if so, you could pull duplicates and delete the oldest one. But are you sure you won't be deleting any records that are actually used? Do you by any chance keep a time stamp of when they last logged in using that User ID? If you do, you might actually be able to just create a policy stating all accounts not used within the last xx days will be deleted...just a thought.Well, let us know some more information and lets see what we can come up with.

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...