Wednesday, October 19, 2016

Downloading Ancestry.com Media

Ancestry.com has never created a good system to back up the data that is on their site. The only option people have is to download the GEDCOM file, which is often a mere fraction of the data they have invested in the site.

For years now my family has been investing time in building a well-documented family tree on ancestry.com, and I have made sure to back up the GEDCOM file frequently so we don't lose the data. With thousands of media files on the site, I have become increasingly concerned that we would lose a major portion of our research if I were unable to download them.

To combat this problem I finally wrote a small Chrome extension that allows you to screen scrape your media files in an automated fashion.  It is not perfect, but considering it is the only way to get your data, it is nice to have.

The extension downloads the files en masse and dumps them all into your default downloads folder. You should therefore clear out your default downloads folder before using this extension, so you know that every file in it afterwards came from Ancestry.

After installing this extension, you need to:
- log into your ancestry.com account
- open up your tree
- open up the Media page, which is one of the items on the menu inside the tree
- click on either the Photos or Stories tab (the All, Audio, and Video tabs are not currently supported)
- OPTIONAL: click on the page you want to resume downloading at.  This is useful if you need to stop downloading for a period of time, say if you only want to run the downloads during the night.  You will likely end up with a few duplicate files if you attempt this.
- click on the extension icon in the upper right corner of your Chrome browser, and click the Download Media button

If the downloads are occurring too quickly and swamping your computer, let me know. I am planning to make the download speed adjustable in the future. Currently a file is downloaded every 2 seconds, which works well for me, but it will not work as well for people who have uploaded lots of large files.
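
The fixed 2-second spacing described above can be illustrated with a minimal sketch. This is not the extension's actual code: the names `buildSchedule`, `runSchedule`, `startDownload`, and `setTimer` are hypothetical, and the download step is passed in as a function so the sketch does not assume how the extension really fetches files.

```javascript
// Hypothetical helper: assign each URL a start delay so that downloads
// begin a fixed interval apart (2000 ms matches the pace described above).
function buildSchedule(urls, intervalMs = 2000) {
  return urls.map((url, index) => ({ url, delayMs: index * intervalMs }));
}

// Hypothetical runner: start each download after its delay.
// `startDownload` is supplied by the caller. In a Chrome extension with the
// "downloads" permission it could wrap chrome.downloads.download({ url }),
// but that is an assumption, not a description of this extension.
// `setTimer` defaults to setTimeout and can be swapped out for testing.
function runSchedule(schedule, startDownload, setTimer = setTimeout) {
  return schedule.map(({ url, delayMs }) =>
    setTimer(() => startDownload(url), delayMs)
  );
}
```

Making the interval a parameter is what would allow the download speed to be adjusted: a larger `intervalMs` gives large files more time to finish before the next one starts.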

NOTE: There is a small chance that using this is against Ancestry's terms of use. Their terms of use forbid scraping tools; however, the language seems directed at automated scripts that run in the background. This tool is simply a browsing aid that performs the clicks for you: your browser stays open and does all the browsing, which prevents you from doing other browsing at the same time.

49 comments:

  1. This is awesome! Can you adapt it to also pull down images of documents (census records, birth/death certificates, etc) in an individual's gallery?

  2. In order to answer your question, I need to clarify how data exists on ancestry.com. There are two major categories of Media type data on Ancestry: stuff you upload to your tree, and stuff you attach to your tree.

    Stuff you upload goes into your personal Media Gallery (e.g. Photos, Stories, Audio, Video). Stuff you attach to your tree is saved and cataloged in a central location so that anyone can link to a single copy of it.

    This extension downloads stuff from your personal Media Gallery. I believe you are asking if it can be modified to download stuff you have merely linked to.

    Technically yes, it probably could be. Here is why it will not be. To find a list of all such linked items, it would have to open up every single record in your tree. From a time perspective that is just not feasible; no user would be interested in sitting through that unless they had a really small tree.

    The other reason this is a bad idea is that linked data is not your data. It could be removed or changed at any time, and you have no control over it. I have a standard in my trees of taking a screenshot of census records and uploading them to my personal media gallery, which this extension can then download and back up. Once I do this the data is mine: Ancestry.com cannot randomly change the data, or lock me out of it if they want to start charging me to see that particular census record in the future.

    Does that answer your question and possibly give a solution?

    Replies
    1. I came across this Python script which goes after all the Ancestry.com Database records that are attached to persons in a GEDCOM file. The author claims it is against the Ancestry.com T&C, but for those interested...

      https://nerok00.github.io/ancestry-image-downloader/

  3. I am grateful for your work in creating this script. I don't understand how to stop and restart without restarting at the beginning since the download will only start from the main page of photo or stories. (I too need to download overnight only.) Would you be able to take a few minutes and explain? Thank you in advance.

  4. Check out the "OPTIONAL" step. Did you attempt to click on the page number before resuming? If so, what error did you get? Don't forget, it will start at the beginning of whatever page you click on, so you will probably get a few duplicate files from that page if it had already started downloading before you stopped.

    Replies
    1. It does not seem to preserve the file name. For example, John2000web comes down as 27d275e0-ae78=480c-b8f3-9181ef84fb74b74.jpg?

    2. Unfortunately that is true in many instances. It was a limitation I found in either JavaScript or the browser. The best I was able to do was a mapping file matching the file name up with the record.

  5. Jereme - Amazing work, and sorely needed! Unfortunately it doesn't seem to work for me - the button text changes, with no obvious error messages, but it doesn't seem to download any files. Could it be because I'm a user of Ancestry.co.uk, not .com?

    Replies
    1. I added a small update I think might make it work for .co.uk. Give it a try and let me know the results.

    2. Unfortunately your latest comment about it still not working for .co.uk does not seem to have saved to the site.
      Would you check the version of the extension you are using? The one with the update is 0.122. I have noticed that sometimes it takes a couple of days for the updates to push out to users.

  6. Yes, you are correct, it was designed specifically for the .com version.

    Is there any way you can use the .com version to log into your account? I think I can modify it to work with .co.uk as well, but I am unlikely to be able to test it easily.

    Replies
    1. This comment has been removed by the author.

    2. Can you check that your extension still works? It doesn't work for me; it just leaves an Excel file in my downloads folder: AncestryMetaFile_Photos_pg_1

  7. thank you Jereme! I downloaded over 850 images in less than one hour saving them as tmp files. It doesn't seem to be able to download stories as easily but perhaps I am missing a step. When it downloads stories it also saves them as tmp files but I can't open them. Am I missing something? thanks for all your work and for sharing your extension to ancestry users

    Replies
    1. Hi Franco,

      It should not be saving any files as tmp files. Pictures should be downloading as whatever extension they are saved in Ancestry with, along with a .csv summary file. And Stories should be downloaded in their original format, or as .txt files.

      Other than some form of browser corruption during download I cannot think of a reason for the results you are seeing. I just tested the extension to make sure it is working correctly for me in downloading both Photos and Stories.

  8. Could you possibly update it (or make a separate version) to work with .ca as well?

  9. I just published an update that works on .ca for me. It should also allow the .co.uk to work which was not working before.

    It looks like any of the Ancestry websites can be logged into with your normal username and password, so a user of a different domain can always log into the .com version to use this extension. It looks like the .com version would show up in your native language.

  10. I have tried to use your extension to retrieve my Ancestry.com images. I cannot get it to work. I followed your instructions and no photos are being copied to my hard drive. I have version 0.123 of your extension. I checked the Temp folders, the default download folder and did a search for them. I did notice that my download folder is filling up with files I have deleted. Any ideas would be appreciated. I am grateful that you have used your talents to figure a way to get my 3000+ images from Ancestry.

  11. Unfortunately it appears as though Ancestry has made some changes to their site breaking this extension. I will fix it as soon as I have time.

  12. I just published version 0.124 which should get things working again.

  13. Works for me. Much appreciated! Saved me a TON of time.

    Replies
    1. Thank you, by the way for creating the script.

  14. This is awesome!! Works great! Had zero issues and got 100 images downloaded in about 2 minutes! It's not fully automatic but I have no complaints. Thank you for creating this!

    Gary

  15. I agree this is awesome!! However I have 34 pages of photos on one tree and it stopped after page 1 (or 25 photos). No complaints if I have to do it page by page as still is awesome but by reading the description and comments thought it would download all 35 pages or 800 photos without me having to restart it manually. Thoughts? Thanks again!

    Lisa

    Replies
    1. It should automatically page for you. They must have changed something on their site; whenever they do that, it breaks functionality in this extension until I am able to get it fixed. Thanks for letting me know.

  16. This is amazing. you wonderful wonderful person :)

  17. Finding your tool was like receiving a Christmas gift, yay!! I appreciate that you created this, and I will share it with others who may need it as well.

  18. I just published a fix for the extension that should fix the paging issue Lisa found. The updated package should start showing up for users in about an hour, according to Google.
    The symptom fixed was that after clicking Download, the page's contents would be downloaded and the next page would be navigated to, but that next page's contents would not automatically start downloading.

  19. Thanks so much for doing this! I just downloaded 1400 records.

  20. Thank you so much. I have wanted to migrate my photos, but it was too time intensive! Your script was so fast. :)

  21. I did the appropriate steps but it only downloaded a CSV file that includes the links to the desired files:
    AncestryMetaFile_Stories_pg_1.csv
    AncestryMetaFile_Photos_pg_1.csv

    Replies
    1. BTW, I am using a Chromebook with ChromeOS but the Chrome browser is standard.

    2. I just tested downloading both Photos and Stories on my computer and they are both working correctly for me. I have never tested on a Chromebook, but as long as you can download other files normally then I see no reason why this extension would not work for you.

  22. I just tried again with my "stories" tab. I only have three so it is easy. Again, the script downloaded a CSV file with the names and locations of the three files but NOT the files themselves.
    Is it normal for the script to download these files? Or are they used by the script to download the files themselves, and deleted afterwards?
    Since ChromeOS now holds over half the market share of school laptops, it is very significant. More and more people are using ChromeBooks. It is my go-to device.
    Would you like for me to be a test bed for you to make your script work on ChromeOS?

    Replies
    1. I appreciate the offer, however this script is not targeted at a school audience or the ChromeOS. Most people doing significant ancestry research are using traditional desktop computers where they can store all their research.

      Yes, it is normal for the script to download those .csv files. They are used to store metadata about the files being downloaded that I am unable to put in the filename of the file itself during download.

      The main purpose for this script is simply to allow me to backup my ancestry files on my Windows computer. I just released it in case anyone else had the same need.

  23. I downloaded the app today. I am on media files, and hit download. It copied the first picture in a csv file and only the first picture. The csv is all text... how do I get it to continue on to the next picture, and how do I get it in something other than csv? Thanks

    Replies
    1. the stories worked perfectly...this is referring to the photos

    2. That is very odd, I just tested the photos and they are working for me. Someone has had issues doing it on a Chromebook. What type of computer are you using? I test on Windows 10.

    3. I am using a Lenovo computer with Windows 10.

    4. I am at a loss as to why it would not work for you when the other tab does. You could try opening up the page and going straight to photos. Perhaps doing stories first caused a problem when you moved on to photos.

  24. hey, I'm having the same issue as Gina - on the latest version of Chrome on Win 7 and it will download all the story data but not download the images. I just get the csv file. Anything I can do to troubleshoot it?

  25. I just updated my Chrome, and I am now seeing the same issue. It is redirecting to the first image instead of downloading it.
    It is possible this is a Chrome issue or an Ancestry issue. You could try rolling back to an earlier version of Chrome, which does not sound fun.
    Otherwise, as soon as I get time I will try to figure out the issue and fix it.

    If you are a JavaScript developer, note that all Chrome extensions are downloaded to your computer on install, which gives you the source code, so you can modify and re-package the extension on your own.

    Replies
    1. I have the same issue and it used to work until the last week or so

    2. It looks like ancestry.com has made two recent changes to their site. The first change is that they started using a separate subdomain to serve up their images. This breaks the JavaScript I was using.

      However, in doing this change, they also seem to have broken their own site. Clicking on the image in ancestry.com no longer brings it up for me.

      I think we are going to have to wait until they fix their site and things settle down to see if there is going to be an option to bring my plugin back to life.

  26. Hi, I'm using ancestry.com.au and the download button doesn't work at all? Cheers

    Replies
    1. .au is probably not one of the supported domains. Also, ancestry.com has recently changed their site structure to take advantage of a new security protocol introduced in Chrome, which breaks this extension.

      I have not had time to find a successful hack around this. At the moment it looks like it really will have to be a hack to reduce browser security if I am ever able to get it working again.

  27. Hi, Jereme,
    I've just downloaded your extension and followed the instructions for downloading photos from ancestry.com. When I click on "Download Media" (after having navigated to the ancestry.com media photos page), nothing happens. Any ideas?
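
Several replies in the thread above mention that downloaded photos can arrive with opaque file names and that a .csv mapping file ties each file back to its record. The sketch below shows one way such a mapping could be turned into a rename plan. It is an illustration only: the layout of the real AncestryMetaFile .csv is not described in this post, so the `downloadedName` and `title` fields, and the function names `splitExtension` and `planRenames`, are hypothetical.

```javascript
// Split "photo.jpg" into ["photo", ".jpg"]; names without a dot keep an empty extension.
function splitExtension(name) {
  const dot = name.lastIndexOf(".");
  return dot > 0 ? [name.slice(0, dot), name.slice(dot)] : [name, ""];
}

// Hypothetical planner: `rows` stands in for parsed mapping-file rows, each with
// an assumed `downloadedName` (the name on disk) and `title` (the name wanted).
// Returns { from, to } pairs without touching the file system.
function planRenames(rows) {
  const used = new Set();
  return rows.map(({ downloadedName, title }) => {
    const [originalBase, ext] = splitExtension(downloadedName);
    // Replace characters that Windows does not allow in file names.
    const base = title.replace(/[<>:"/\\|?*]/g, "_").trim() || originalBase;
    let candidate = base + ext;
    // Two records may share a title, so add a counter to avoid collisions.
    for (let n = 2; used.has(candidate.toLowerCase()); n++) {
      candidate = `${base} (${n})${ext}`;
    }
    used.add(candidate.toLowerCase());
    return { from: downloadedName, to: candidate };
  });
}
```

Producing a plan first, rather than renaming immediately, lets you review the proposed names before applying them to a backup.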


Please leave your thoughts, I love hearing what you got out of the post. Spam comments will be removed.