Monday, September 14, 2009
What's working so far
I had to work double time over the weekend to catch up on my project and meet some deadlines, but here is what is working so far.
Using Java, I am able to recursively traverse all the subdirectories and files of a parent directory and filter out a subset of those files. For example, the directory c:\music may contain 100 subdirectories and 5000 songs. The program can filter by any file type; in my testing that is usually mp3, but it works for other audio formats as well.
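A minimal sketch of that kind of recursive walk, using only java.io.File, might look like the following. The class AudioFileFinder and the method findFiles are hypothetical names, not my program's actual code, and matching the extension case-insensitively is an assumption on my part.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: recursively collects files with a given extension.
class AudioFileFinder {

    /**
     * Returns every file under root whose name ends with extension
     * (for example ".mp3"), compared case-insensitively.
     */
    static List<File> findFiles(File root, String extension) {
        List<File> matches = new ArrayList<>();
        collect(root, extension.toLowerCase(Locale.ROOT), matches);
        return matches;
    }

    private static void collect(File dir, String ext, List<File> matches) {
        File[] children = dir.listFiles();
        if (children == null) {
            return; // not a directory, or it could not be read
        }
        for (File child : children) {
            if (child.isDirectory()) {
                collect(child, ext, matches); // descend into the subdirectory
            } else if (child.getName().toLowerCase(Locale.ROOT).endsWith(ext)) {
                matches.add(child); // keep only files of the requested type
            }
        }
    }
}
```

Calling `AudioFileFinder.findFiles(new File("c:\\music"), ".mp3")` would return every mp3 below c:\music; passing ".m4a" instead would filter a different audio format.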
Once the program has found all of the files in all of the subdirectories, it looks for files that have an iteration count appended to the end of the filename. For example, Aerosmith's Dream On will be stored as ../Aerosmith/Aerosmith's Greatest Hits/01 Dream On.mp3. The file is stored this way because iTunes keeps my library's folders and files organized.
However, if I accidentally add another copy of Dream On to my library, iTunes will store the new file as 01 Dream On 1.mp3; it has appended the iteration count 1 to the end of the filename. I am taking advantage of this appended number to help find duplicate files in the subdirectories.
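The suffix itself can be recognised with a regular expression. This is a sketch under stated assumptions: the hypothetical class IterationSuffix expects a single digit from 1 to 9, separated from the title by one space and followed by the extension. It only flags candidates; it does not prove a file is a duplicate.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: detects a " 1" ... " 9" suffix just before the extension.
class IterationSuffix {

    // group 1: name without the suffix, group 2: the digit, group 3: the extension
    private static final Pattern SUFFIX = Pattern.compile("^(.+) ([1-9])(\\.[^.]+)$");

    /** Returns the iteration count 1-9, or 0 if the name has no such suffix. */
    static int iterationCount(String fileName) {
        Matcher m = SUFFIX.matcher(fileName);
        return m.matches() ? Integer.parseInt(m.group(2)) : 0;
    }
}
```

For "01 Dream On 1.mp3" this returns 1, and for "01 Dream On.mp3" it returns 0. It would also return 1 for "01 You are the 1.mp3", so a match on its own is not enough to call a file a duplicate.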
There is a problem with searching only for numbers at the end of the filename, however. Suppose the song is titled "You are the 1": iTunes will store it as ../Artist/Album/01 You are the 1.mp3, so the filename ends in a number that is not an iteration count. The trick is to remove the last two characters of the name (the space and the digit before .mp3) and check whether the resulting file exists and whether it is in the list of files found within all of the subdirectories.
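Here is a sketch of that "remove the last two characters" check, assuming the found files are held as java.io.File objects. DuplicateChecker and findDuplicates are hypothetical names. Instead of a separate File.exists() call, it tests membership in a HashSet built from the found list; since every entry came from the directory walk, membership implies the file existed when the library was scanned.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: keeps only files whose trailing digit is a real iteration count.
class DuplicateChecker {

    /** Returns every "name N.ext" file (N = 1..9) whose "name.ext" original was also found. */
    static List<File> findDuplicates(List<File> found) {
        Set<File> all = new HashSet<>(found); // fast lookups by path
        List<File> duplicates = new ArrayList<>();
        for (File f : found) {
            String name = f.getName();
            int dot = name.lastIndexOf('.');
            if (dot < 2) {
                continue; // no extension, or too short to carry " N"
            }
            String base = name.substring(0, dot); // e.g. "01 Dream On 1"
            String ext = name.substring(dot);     // e.g. ".mp3"
            char space = base.charAt(base.length() - 2);
            char digit = base.charAt(base.length() - 1);
            if (space != ' ' || digit < '1' || digit > '9') {
                continue; // no iteration-count suffix
            }
            // Remove the last two characters (" 1") and look for the original file.
            String originalName = base.substring(0, base.length() - 2) + ext;
            File original = new File(f.getParentFile(), originalName);
            if (all.contains(original)) {
                duplicates.add(f); // the number really was an iteration count
            }
        }
        return duplicates;
    }
}
```

For 01 You are the 1.mp3 the check looks for "01 You are the.mp3" in the same folder, finds nothing, and leaves the song alone, while 01 Dream On 1.mp3 is reported because 01 Dream On.mp3 was found next to it.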
So far my program meets both of these requirements: it finds all the duplicates that have iteration counts from 1 to 9, and it verifies that the number at the end of each filename is truly an iteration count.
The screenshot shows output from my program listing a portion of the duplicates in the library. You can see that a family of four, all using the same music directory, has added quite a few duplicate songs over the years!