Wednesday, October 19, 2016

Downloading Ancestry.com Media

Ancestry.com has never created a good system to back up the data that is on their site. The only option people have is to download the GEDCOM file, which is often a mere fraction of the data they have invested in the site.

For years now my family has been investing time in building a well-documented family tree on ancestry.com, and I have made sure to back up the GEDCOM file frequently so we don't lose the data. With thousands of media files on the site, I have become increasingly concerned that we would lose a major portion of our research if I were unable to download them.

To combat this problem I finally wrote a small Chrome extension that allows you to screen scrape your media files in an automated fashion.  It is not perfect, but considering it is the only way to get your data, it is nice to have.

The extension downloads the files en masse and dumps them all into your default downloads folder. Clear everything out of that folder before using the extension so you know that every file in it afterward came from Ancestry.

After installing this extension, you need to:
- log into your ancestry.com account
- open up your tree
- open up the Media page, which is one of the items on the menu inside the tree
- click on either the Photos or Stories tab (the All, Audio, and Video tabs are not currently supported)
- OPTIONAL: go to the page where you want to resume downloading. This is useful if you need to stop downloading for a period of time, say if you only want to run the downloads during the night. You will likely end up with a few duplicate files if you do this.
- click on the extension icon in the upper right corner of your Chrome browser, and click the Download Media button

If your browser asks you where to save every file, the extension will not be able to correctly calculate the required download time, and it will move on to the next page before the current page's downloads are done. It expects each download to start automatically. To fix this problem, go into your browser settings and turn off the option that prompts for a save location on every download, then set a default download location instead. The only way I could fix this in the extension would be to add a setting that lets the user click the next page button manually instead of having it automated, and I think that would defeat the point of the extension.
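To make the timing issue concrete, here is a minimal TypeScript sketch of the kind of calculation involved. It is not the extension's actual code: the function name pageWaitMs and the idea that the wait is simply the file count times a fixed spacing are assumptions for illustration. The default of 2000 ms matches the 2-second spacing mentioned in this post.

```typescript
// Hypothetical helper, not taken from the extension: estimate how long to
// stay on a media page before moving to the next one.
// Assumption: one download starts every `intervalMs` milliseconds, and each
// download begins on its own (no "Save As" prompt holding it up).
function pageWaitMs(fileCount: number, intervalMs: number = 2000): number {
  if (fileCount < 0 || Math.floor(fileCount) !== fileCount) {
    throw new RangeError("fileCount must be a non-negative integer");
  }
  if (intervalMs <= 0) {
    throw new RangeError("intervalMs must be positive");
  }
  return fileCount * intervalMs;
}

// 50 files at 2 seconds each is 100000 ms, or about 1 minute 40 seconds.
console.log(pageWaitMs(50));
```

An estimate like this only holds if every download starts without waiting for you. A save prompt adds a delay the calculation knows nothing about, which is why the page can change before the files are saved.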

If the downloads are occurring too quickly and swamping your computer, let me know. I am planning to make the download speed adjustable in the future. Currently a file is downloaded every 2 seconds, which works well for me, but it will not work as well for people who have uploaded lots of large files.
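For anyone curious what an adjustable download speed could look like, here is a small TypeScript sketch of fixed-interval pacing. Everything in it (planDownloads, runPlan, the types, and the file names) is hypothetical and written for illustration. It is not how the extension is actually implemented.

```typescript
// Hypothetical types and names; none of these come from the extension itself.
type DownloadFn = (url: string) => void;
type ScheduleFn = (task: () => void, delayMs: number) => void;

interface PlannedDownload {
  url: string;
  startAtMs: number; // offset from the moment the run begins
}

// Space the downloads evenly: the first starts immediately, and each later
// one starts `intervalMs` after the one before it.
function planDownloads(urls: string[], intervalMs: number): PlannedDownload[] {
  if (intervalMs <= 0) {
    throw new RangeError("intervalMs must be positive");
  }
  return urls.map((url, index) => ({ url, startAtMs: index * intervalMs }));
}

// Hand each planned download to a timer. The default scheduler is
// setTimeout; a different one can be passed in, for example in tests.
function runPlan(
  plan: PlannedDownload[],
  download: DownloadFn,
  schedule: ScheduleFn = (task, delayMs) => { setTimeout(task, delayMs); }
): void {
  plan.forEach((item) => {
    schedule(() => download(item.url), item.startAtMs);
  });
}

// Made-up file names, spaced 2 seconds apart: offsets 0, 2000, and 4000 ms.
const plan = planDownloads(["photo1.jpg", "photo2.jpg", "photo3.jpg"], 2000);
runPlan(plan, (url) => console.log("would download", url));
```

Raising the interval gives large files more time to finish before the next one starts. In a real extension the download callback might wrap Chrome's chrome.downloads.download call, but that is a guess about one possible design.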

NOTE: There is a small chance that using this is against Ancestry's terms of use. Their terms of use forbid scraping tools; however, the language seems directed at automated scripts that run in the background. This tool is simply a browsing aid that performs clicks for you. Your browser is still open and doing all the browsing, which prevents you from doing other browsing at the same time.

EDIT 10/2018:

I just released a new version that works with Ancestry's latest photo hosting method. To get it working again, the extension needs permission to open and close windows as well as download files. When you first try to use the photo downloading feature, Chrome will likely pop up a few requests asking you to allow this, so expect that as you get it going.

This version also handles the Start/Stop download sensing better, although I have noticed that it sometimes takes one or two tries to get it to stop downloading, depending on the point in its execution at which you click the stop button.