Duplicater for Duplicate Search

Hi, today I would like to show you a straightforward idea that is not implemented in any operating system I know. The Duplicater is a simple solution for finding duplicate files in folders: files with different names but the same content.


The solution is very simple. It walks the folder structures, collects all files, and calculates an MD5 sum for each file. When a file is found whose MD5 sum has already been seen, Duplicater moves it to the MoveTo directory (or deletes it if a file with the same name is already there). This way, you can analyze your big storage hard drives and eliminate duplicates of whatever you have: photos, movies, documents, presentations, music. Simply put, it looks for all file duplicates. If you like, you can parallelize the calculations for multi-core systems with a Parallel.For loop, but I am leaving that up to you. The code is very simple and is shown below.

namespace Duplicater
{
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Security.Cryptography;
    using System.Text;
    using System.Windows.Forms;
    public partial class MainForm : Form
    {
        public MainForm()
        {
            InitializeComponent();
        }
        private void btSelect1_Click(object sender, EventArgs e)
        {
            if (fbDialog1.ShowDialog() == System.Windows.Forms.DialogResult.OK)
            {
                tbSelect1.Text = fbDialog1.SelectedPath;
            }
        }
        private void btSelect2_Click(object sender, EventArgs e)
        {
            if (fbDialog2.ShowDialog() == System.Windows.Forms.DialogResult.OK)
            {
                tbSelect2.Text = fbDialog2.SelectedPath;
            }
        }
        private void btSelect3_Click(object sender, EventArgs e)
        {
            if (fbMoveTo.ShowDialog() == System.Windows.Forms.DialogResult.OK)
            {
                tbMoveTo.Text = fbMoveTo.SelectedPath;
            }
        }
        private void btMove_Click(object sender, EventArgs e)
        {
            if (string.IsNullOrEmpty(tbSelect1.Text) ||
                string.IsNullOrEmpty(tbSelect2.Text) ||
                string.IsNullOrEmpty(tbMoveTo.Text))
            {
                MessageBox.Show("Select all folder paths first!");
                return;
            }
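            // Lazily enumerate every file under both source folders.
            var d1 = Directory.EnumerateFiles(tbSelect1.Text, "*.*",
                SearchOption.AllDirectories);
            var d2 = Directory.EnumerateFiles(tbSelect2.Text, "*.*",
                SearchOption.AllDirectories);
            // MD5 sums seen so far, shared across both folders.
            var sums = new HashSet<string>();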
            using (var md5 = MD5.Create())
            {
                var count = 0;
                count += MoveDuplicates(d1, sums, md5);
                count += MoveDuplicates(d2, sums, md5);
                MessageBox.Show(
                    string.Format(
                    "{0} duplicates were moved or deleted.", count));
            }
        }
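        // Hashes every file in the sequence and moves (or deletes) each one
        // whose MD5 sum has already been seen. Returns the number of duplicates.
        private int MoveDuplicates(
            IEnumerable<string> directory,
            HashSet<string> sums, MD5 md5)
        {
            var count = 0;
            foreach (var fileName in directory)
            {
                byte[] key = null;
                using (var stream = File.OpenRead(fileName))
                {
                    key = md5.ComputeHash(stream);
                }
                // Convert the hash bytes to a lowercase hex string.
                var hex = new StringBuilder(key.Length * 2);
                foreach (byte k in key)
                {
                    hex.AppendFormat("{0:x2}", k);
                }
                var sum = hex.ToString();
                if (sums.Contains(sum))
                {
                    // Same content was seen before: this file is a duplicate.
                    var path = Path.Combine(
                        tbMoveTo.Text, Path.GetFileName(fileName));
                    if (!File.Exists(path))
                    {
                        File.Move(fileName, path);
                    }
                    else
                    {
                        // A file with this name is already in MoveTo,
                        // so the duplicate is deleted instead of moved.
                        File.Delete(fileName);
                    }
                    count++;
                }
                else
                {
                    // First file with this content: remember its sum and keep it.
                    sums.Add(sum);
                }
            }
            return count;
        }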
    }
}
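
For the multi-core idea, here is a minimal sketch of how the hashing step might be parallelized with Parallel.For. It is not part of the Duplicater source: the class ParallelDuplicateFinder and its methods HashFile and FindDuplicates are hypothetical names. It assumes each worker creates its own MD5 instance, since a single instance should not be shared across threads. It only reports duplicates and leaves the move or delete step to the caller.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original Duplicater code.
public static class ParallelDuplicateFinder
{
    // Computes the MD5 sum of one file as a lowercase hex string.
    // A fresh MD5 instance is created per call so that calls can run concurrently.
    public static string HashFile(string fileName)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(fileName))
        {
            return BitConverter.ToString(md5.ComputeHash(stream))
                .Replace("-", "").ToLowerInvariant();
        }
    }

    // Hashes all files in parallel, then selects duplicates sequentially
    // so that the "first file seen is kept" rule stays deterministic.
    public static List<string> FindDuplicates(IEnumerable<string> fileNames)
    {
        var files = fileNames.ToList();
        var hashes = new string[files.Count];

        // Each iteration writes only its own array slot, so no locking is needed.
        Parallel.For(0, files.Count, i => hashes[i] = HashFile(files[i]));

        var sums = new HashSet<string>();
        var duplicates = new List<string>();
        for (var i = 0; i < files.Count; i++)
        {
            // HashSet.Add returns false when the sum was already present.
            if (!sums.Add(hashes[i]))
            {
                duplicates.Add(files[i]);
            }
        }
        return duplicates;
    }
}
```

The hashing is usually limited by disk speed rather than CPU, so the gain from this depends a lot on the storage; on a single spinning disk it may not help at all.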

You can download the solution code here: The Duplicater Source Code (2032 downloads). One more thing: please use this solution as a prototype only. For a real product, you could write a service that listens for file system change notification events, calculates the hash sum for everything on your hard drive, and automatically moves duplicates to a folder. When I built this for my wife, who was trying to eliminate duplicate photos, I realized that there is no such thing in any operating system I know. It would be awesome to always have a duplicate-free file system, and for cloud storage systems that hold many files, eliminating duplicates and saving disk space would be nice too. So think about the big picture when you read this code, because it can become something more. Enjoy!
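
As a starting point for the service idea, here is a rough sketch built on FileSystemWatcher. It is only an illustration under stated assumptions, not a finished service: DuplicateIndex, Process, and DuplicateService.Watch are hypothetical names, the MoveTo folder is assumed to lie outside the watched folder, and a file that is still being written when the event arrives is simply skipped instead of retried.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: remembers the first file seen for each MD5 sum.
public sealed class DuplicateIndex
{
    private readonly Dictionary<string, string> firstSeen =
        new Dictionary<string, string>();
    private readonly object gate = new object();
    private readonly string moveTo;

    // Assumption: moveToFolder exists and is outside any watched folder.
    public DuplicateIndex(string moveToFolder)
    {
        moveTo = moveToFolder;
    }

    // Returns true when the file was a duplicate and has been moved or deleted.
    public bool Process(string fileName)
    {
        if (!File.Exists(fileName)) return false; // directory or already gone

        string sum;
        try
        {
            using (var md5 = MD5.Create())
            using (var stream = File.OpenRead(fileName))
            {
                sum = BitConverter.ToString(md5.ComputeHash(stream));
            }
        }
        catch (IOException)
        {
            return false; // probably still being written; a real service would retry
        }

        lock (gate)
        {
            string original;
            if (!firstSeen.TryGetValue(sum, out original) || !File.Exists(original))
            {
                firstSeen[sum] = fileName; // first (or only surviving) copy
                return false;
            }
            // Assumes a case-insensitive file system such as NTFS.
            if (string.Equals(original, fileName, StringComparison.OrdinalIgnoreCase))
            {
                return false; // same file reported twice
            }
            var target = Path.Combine(moveTo, Path.GetFileName(fileName));
            if (!File.Exists(target)) File.Move(fileName, target);
            else File.Delete(fileName);
            return true;
        }
    }
}

public static class DuplicateService
{
    // Wires a watcher to the index; dispose the returned watcher to stop it.
    public static FileSystemWatcher Watch(string folder, DuplicateIndex index)
    {
        var watcher = new FileSystemWatcher(folder) { IncludeSubdirectories = true };
        watcher.Created += (sender, e) => index.Process(e.FullPath);
        watcher.EnableRaisingEvents = true;
        return watcher;
    }
}
```

The watcher only reports new files, so a service like this would first run every existing file through Process once to build the index, and would probably handle Changed and Renamed events as well.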

