Wednesday, October 19, 2016

Downloading Ancestry.com Media

Ancestry.com has never created a good system to back up the data that is on their site. The only option people have is to download the GEDCOM file, which is often a mere fraction of the data they have invested in the site.

For years now my family has been investing time in building a well-documented family tree on ancestry.com, and I have made sure to back up the GEDCOM file frequently so we didn't lose the data. With thousands of media files on the site, I have become increasingly concerned about losing a major portion of our research if I were unable to download them.

To combat this problem I finally wrote a small Chrome extension that allows you to screen scrape your media files in an automated fashion.  It is not perfect, but considering it is the only way to get your data, it is nice to have.

The extension downloads the files en masse and dumps them all into your default downloads folder. You should therefore clear all files out of your default downloads folder before using this extension, so you know that every downloaded file came from Ancestry.

After installing this extension, you need to:
- log into your ancestry.com account
- open up your tree
- open up the Media page, which is one of the items on the menu inside the tree
- click on either the Photos or Stories tab (the All, Audio, and Video tabs are not currently supported)
- OPTIONAL: click on the page you want to resume downloading at.  This is useful if you need to stop downloading for a period of time, say if you only want to run the downloads during the night.  You will likely end up with a few duplicate files if you attempt this.
- click on the Extension icon in the upper right corner of your Chrome browser, and click the Download Media button
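
Since resuming from a page can leave a few duplicate files behind, one way to spot them afterwards is to compare file contents. The following is a minimal Python sketch of that idea, not part of the extension. The function names `file_digest` and `find_duplicates` are hypothetical, and it assumes the duplicates are byte-for-byte identical copies.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(folder: Path) -> list[list[Path]]:
    """Group files directly inside `folder` that have identical contents.

    Hypothetical helper for illustration; subfolders are not searched.
    """
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(folder.iterdir()):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Keep only the groups that contain more than one file.
    return [group for group in by_digest.values() if len(group) > 1]
```

Calling `find_duplicates(Path.home() / "Downloads")` would return one list per set of identical files, assuming your default downloads folder lives at that path. The sketch only reports duplicates; deciding which copy to delete is left to you.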

If the downloads are occurring too quickly and swamping your computer, let me know. I am planning to make the download speed adjustable in the future. Currently a file is downloaded every 2 seconds, which works well for me, but it may not work as well for people who have uploaded lots of large files.
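
To illustrate what an adjustable download speed could look like, here is a rough Python sketch of fixed-interval pacing. It is not the extension's actual code. `download_paced`, `fetch`, and `interval_seconds` are hypothetical names, and `fetch` stands in for whatever actually saves a file.

```python
import time
from typing import Callable, Iterable


def download_paced(
    urls: Iterable[str],
    fetch: Callable[[str], None],
    interval_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `fetch` for each URL, pausing `interval_seconds` between calls.

    Returns the number of URLs processed. The `sleep` parameter exists only
    so the pause can be replaced in tests.
    """
    count = 0
    for url in urls:
        if count > 0:
            # Pause between downloads, but not before the first one.
            sleep(interval_seconds)
        fetch(url)
        count += 1
    return count
```

With the default of 2 seconds this mirrors the current behavior described above; someone with many large files could pass a larger `interval_seconds` to give each download more time to finish.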

NOTE: There is a small chance that using this is against Ancestry's terms of use. Their terms of use forbid scraping tools; however, the language seems directed at automated scripts that run in the background. This tool is simply a browsing aid that performs clicks for you: your browser is still open and doing all the browsing, which prevents you from doing other browsing simultaneously.

22 comments:

  1. This is awesome! Can you adapt it to also pull down images of documents (census records, birth/death certificates, etc) in an individual's gallery?

  2. In order to answer your question, I need to clarify how data exists on ancestry.com. There are two major categories of Media type data on Ancestry: stuff you upload to your tree, and stuff you attach to your tree.

    Stuff you upload goes into your personal Media Gallery (e.g., Photos, Stories, Audio, Video). Stuff you attach to your tree is saved and cataloged in a central location so that anyone can link to a single copy of it.

    This extension downloads stuff from your personal Media Gallery. I believe you are asking if it can be modified to download stuff you have merely linked to.

    Technically yes, it probably could be. Here is why it will not be. To find all such linked items, it would have to open every single record in your tree. From a time perspective that is just not feasible: no user would be interested in sitting through that unless they had a really small tree.

    The other reason this is a bad idea is that linked data is not your data. It could be removed or changed at any time, and you have no control over it. My standard practice in my trees is to take a screenshot of each census record and upload it to my personal media gallery, which this extension can then download and back up. Once I do this the data is mine: Ancestry.com cannot randomly change it, or lock me out of it if they want to start charging me to see that particular census record in the future.

    Does that answer your question and possibly give a solution?

    Replies
    1. I came across this Python script which goes after all the Ancestry.com Database records that are attached to persons in a GEDCOM file. The author claims it is against the Ancestry.com T&C, but for those interested...

      https://nerok00.github.io/ancestry-image-downloader/

  3. I am grateful for your work in creating this script. I don't understand how to stop and restart without restarting at the beginning since the download will only start from the main page of photo or stories. (I too need to download overnight only.) Would you be able to take a few minutes and explain? Thank you in advance.

  4. Check out the "OPTIONAL" step. Did you attempt to click on the page number before resuming? If so, what error did you get? Don't forget, it will start at the beginning of whatever page you click on, so you will probably get a few duplicate files from that page if that page was already being downloaded prior.

    Replies
    1. It does not seem to preserve the file name. For example, John2000web comes down as 27d275e0-ae78=480c-b8f3-9181ef84fb74b74.jpg?

    2. Unfortunately that is true in many instances. It was a limitation I found in either JavaScript or the browser. The best I was able to do was a mapping file that matches each file name up with its record.

  5. Jereme - Amazing work, and sorely needed! Unfortunately it doesn't seem to work for me - the button text changes, with no obvious error messages, but it doesn't seem to download any files. Could it be because I'm a user of Ancestry.co.uk, not .com?

    Replies
    1. I added a small update I think might make it work for .co.uk. Give it a try and let me know the results.

    2. Unfortunately your latest comment about it still not working for .co.uk does not seem to have saved to the site.
      Would you check the version of the extension you are using? The one with the update is 0.122. I have noticed that sometimes it takes a couple of days for the updates to push out to users.

  6. Yes, you are correct, it was designed specifically for the .com version.

    Is there any way you can use the .com version to log into your account? I think I can modify it to work with .co.uk as well, but I am unlikely to be able to test it easily.

  7. Thank you, Jereme! I downloaded over 850 images in less than one hour, saving them as tmp files. It doesn't seem to be able to download stories as easily, but perhaps I am missing a step. When it downloads stories it also saves them as tmp files, but I can't open them. Am I missing something? Thanks for all your work and for sharing your extension with Ancestry users.

    Replies
    1. Hi Franco,

      It should not be saving any files as tmp files. Pictures should be downloading as whatever extension they are saved in Ancestry with, along with a .csv summary file. And Stories should be downloaded in their original format, or as .txt files.

      Other than some form of browser corruption during download I cannot think of a reason for the results you are seeing. I just tested the extension to make sure it is working correctly for me in downloading both Photos and Stories.

  8. Could you possibly update it (or make a separate version) to work with .ca as well?

  9. I just published an update that works on .ca for me. It should also allow the .co.uk to work which was not working before.

    It looks like any of the Ancestry websites can be logged into with your normal username and password, so a user of a different domain can always log into the .com version to use this extension. It looks like the .com version would show up in your native language.

  10. I have tried to use your extension to retrieve my Ancestry.com images. I cannot get it to work. I followed your instructions and no photos are being copied to my hard drive. I have version 0.123 of your extension. I checked the Temp folders, the default download folder and did a search for them. I did notice that my download folder is filling up with files I have deleted. Any ideas would be appreciated. I am grateful that you have used your talents to figure a way to get my 3000+ images from Ancestry.

  11. Unfortunately it appears as though Ancestry has made some changes to their site breaking this extension. I will fix it as soon as I have time.

  12. I just published version 0.124 which should get things working again.

  13. Works for me. Much appreciated! Saved me a TON of time.

    Replies
    1. Thank you, by the way, for creating the script.

  14. This is awesome!! Works great! Had zero issues and got 100 images downloaded in about 2 minutes! It's not fully automatic, but I have no complaints. Thank you for creating this!

    Gary

