archive.org urls of interest:
urls marked with an * require you to be logged into archive.org with a userid
urls marked with a ** require you to be logged into archive.org as the owner of the collection or item

Documentary Film segment about Archive.org
https://archive.org/details/archive_documentary_internet_archive_sequence

Create an Internet Archive account
https://archive.org/account/login.createaccount.php

Interactive Upload Form
https://archive.org/upload/

How archive.org items are structured:
https://blog.archive.org/2011/03/31/how-archive-org-items-are-structured/

Presetting metadata with the new Uploader
https://blog.archive.org/2013/02/08/presetting-metadata-with-the-new-beta-uploader/
https://archive.org/upload/?collection=access_humboldt

Bulk Uploader Program uses a CSV (Comma Separated Value) file for metadata fields
https://github.com/kngenie/ias3upload

A Python Interface to archive.org
https://github.com/jjjake/ia-wrapper
https://internetarchive.readthedocs.io/en/latest/
http://raj.blog.archive.org/2013/08/08/a-python-interface-to-archive-org/
see also:
https://github.com/vmbrasseur/iadownload
https://github.com/vmbrasseur/iaupload

what step are my submitted items in?**
https://archive.org/catalog.php?justme=1
https://catalogd.archive.org/catalog.php?all=1&search_submitter=uploads@accesshumboldt.net

search results page
https://archive.org/search.php?query=access%20humboldt%20AND%20collection%3Aopensource_movies&page=1

advanced search results page: format:mpeg2, most recent 50 items, descending date added order, CC license, mediatype:movies
https://archive.org/advancedsearch.php?q=format%3Ampeg2+AND+mediatype%3Amovies+AND+licenseurl%3A[http%3A%2F%2Fcreativecommons.org%2Fa+TO+http%3A%2F%2Fcreativecommons.org%2Fz]&fl[]=collection&fl[]=description&fl[]=downloads&fl[]=identifier&fl[]=licenseurl&fl[]=publicdate&fl[]=subject&fl[]=title&sort[]=addeddate+desc&sort[]=&sort[]=&rows=50&page=1&callback=callback&save=yes&output=tables

advanced search results page: format:mpeg2, most
recent 500 items, descending date added order, CC license, mediatype:movies
https://archive.org/advancedsearch.php?q=format%3Ampeg2+AND+mediatype%3Amovies+AND+licenseurl%3A[http%3A%2F%2Fcreativecommons.org%2Fa+TO+http%3A%2F%2Fcreativecommons.org%2Fz]&fl[]=collection&fl[]=description&fl[]=downloads&fl[]=identifier&fl[]=licenseurl&fl[]=publicdate&fl[]=subject&fl[]=title&sort[]=addeddate+desc&sort[]=&sort[]=&rows=500&page=1&callback=callback&save=yes&output=tables

details page
https://archive.org/details/AH-949-1-Tour_of_the_Ten_Ninety_One

items page
the "Show All" hyperlink on the right center side of the page

item manager page**
https://archive.org/item-mgr.php?identifier=AH-949-1-Tour_of_the_Ten_Ninety_One

item history page*
https://archive.org/catalog.php?history=1&identifier=AH-949-1-Tour_of_the_Ten_Ninety_One

log_show page, from the task id of the derive.php task on the item history page*
https://catalogd.us.archive.org/log_show.php?task_id=31660938

edit item page**
https://archive.org/editxml.php?type=movies&edit_item=AH-949-1-Tour_of_the_Ten_Ninety_One

item derive queue**
https://archive.org/catalog.php?whereami=1

download MPEG2 url
https://archive.org/download/AH-949-1-Tour_of_the_Ten_Ninety_One/AH-949-1-Tour_of_the_Ten_Ninety_One.mpeg

download the thumbnail used as the icon for an item:
https://archive.org/services/img/AH-949-1-Tour_of_the_Ten_Ninety_One

all file formats page
https://archive.org/services/find_file.php?file=AH-949-1-Tour_of_the_Ten_Ninety_One&loconly=1

all metadata in one place
https://archive.org/services/find_file.php?file=AH-949-1-Tour_of_the_Ten_Ninety_One

google search "access humboldt" site:archive.org
https://www.google.com/search?hl=en&q=%22access+humboldt%22+site%3Aarchive.org&btnG=Search

Wolfram Alpha - Bandwidth Calculation - Time to upload 2.2GB at 330KB/sec - Answer:
1 hour, 51 minutes
https://www.wolframalpha.com/input/?i=time+to+transfer+2.2GB+at+330KB%2Fsec

Wolfram Alpha - Bandwidth Calculation - Bandwidth Needed to Upload 2.2GB in 1 Hour - Answer: 4.89 Mb/s or 611 kB/sec
https://www17.wolframalpha.com/input/?i=2.2GB+in+1+hour

OpenOffice Calc/Microsoft Excel Hyperlink Functions
=HYPERLINK("https://archive.org/details/" & A2) - link to details page for identifier in A2
=HYPERLINK("https://archive.org/download/" & A2 & "/" & D2, "download link") - link to download MPEG2 version when id is in cell A2 and filename is in cell D2
=HYPERLINK("https://archive.org/download/" & A2 & "/format=MPEG2", "download link") - link to MPEG2 version
=HYPERLINK("https://archive.org/download/" & A2 & "/format=MPEG2&avinfo=1", "avinfo link") - link to avinfo for MPEG2 version
=HYPERLINK("https://archive.org/download/" & A2 & "/format=h.264", "stream link") - link to transcoded h.264 MPEG4 version

Internet Archive - Derivatives for Movie Items
https://archive.org/help/derivatives.php

Internet Archive - JSON Interface
https://archive.org/help/json.php

Internet Archive S3-Like API
https://archive.org/help/abouts3.txt

Community Media Archive Wiki
https://accesshumboldt.net/wiki/index.php/Community_media_archive

Community Media Archive on the Roku
to add the private Internet Archive collection (the Nowhere Archive) that contains the CMA to your Roku:
https://owner.roku.com/add/NMJS5
http://accesshumboldt.net/Community_Media_Archive/ACM_2013_Presentation/Roku_Pics_2-Large.m4v

What's New at The Internet Archive Blog
http://blog.archive.org

Internet Archive Forum: Announcements
https://archive.org/iathreads/forum-display.php?forum=announcements

Internet Archive Forum: Forums
https://archive.org/iathreads/forum-display.php?forum=forums

Internet Archive's Video Derive ffmpeg Process
https://github.com/traceypooh/deriver-archive

modern ffmpeg
parameters for letterboxing 16:9 H.264 MPEG-4 video into the 4:3 MPEG-2 format used by TED Talks and PopTech:
ffmpeg -i '/media/data/khanacademy.org/Khan Academy/Finance - 01 - Core Finance/025 Credit Default Swaps (CDS) Intro.mp4' \
  -ac 2 -acodec mp2 -ar 48000 -ab 256k \
  -vcodec mpeg2video -r 29.97 -g 15 -b:v 5000k -maxrate 6000k -minrate 4000k -bufsize 635k \
  -packetsize 2048 -muxrate 10080k \
  -vf scale=640:360,pad=640:480:0:60:black,scale=720:480 -aspect 4:3 \
  '/media/data3/ka_mpegs/Finance - 01 - Core Finance/025 Credit Default Swaps (CDS) Intro.mpg'

getting only certain formats in .zip files from items
https://blog.archive.org/2012/10/30/getting-only-certain-formats-in-zip-files-from-items-new-feature/

getting all identifiers (CSV file output) for a collection:
wget 'https://archive.org/advancedsearch.php?q=collection%3Aaccess_humboldt+&fl[]=identifier&sort[]=identifier+asc&sort[]=&sort[]=&rows=5000&page=1&callback=callback&save=yes&output=csv' -O ah_identifiers.csv

getting an RSS feed for the last 50 items in a collection:
https://archive.org/services/collection-rss.php?collection=access_humboldt

getting an RSS feed for a series of items that have an MPEG2 version (the "series" metadata element must exist for this to work):
https://archive.org/advancedsearch.php?q=series%3A%22Vantage+Point%22+AND+collection%3Aaccess_humboldt+AND+format%3AMPEG2&fl[]=identifier&sort[]=&sort[]=&sort[]=&rows=50&page=1&callback=callback&save=yes&output=rss

getting metadata for all items in a collection:
#!/bin/bash
DATE=$(date +"%Y-%m-%d")
echo "$DATE"
MM=$(date +%m)
YYYY=$(date +%Y)
echo "$MM"
echo "$YYYY"
wget 'http://archive.org/advancedsearch.php?q=collection%3Aveteranshelpingveterans+&fl[]=identifier&sort[]=identifier+asc&sort[]=&sort[]=&rows=5000&page=1&callback=callback&save=yes&output=csv' -O vhvtv_identifiers.csv
# for each identifier, download the avinfo JSON, meta.xml, files.xml, torrent, details JSON output,
# metadata API JSON, all-in-one metadata XML, thumbnail, catalog history, and the log_show output
# for the latest derive.php step
# note: the "archive_cookies" file used in the catalog history extract step needs to be a recent dump
# of the cookies.txt file from a browser used to log in to the Internet Archive with your admin id
sed 's/"//g' vhvtv_identifiers.csv | grep -v '^identifier$' | sort > VHVtv_identifiers_sorted.txt
for i in $(cat VHVtv_identifiers_sorted.txt)
do
  wget -nv "https://archive.org/download/$i/format=MPEG2?avinfo=1" -O "../archive_stats/VHVtv/${i}_avinfo_json.txt"
  wget -nv "https://archive.org/download/$i/${i}_meta.xml" -O "../archive_stats/VHVtv/${i}_meta.xml"
  wget -nv "https://archive.org/download/$i/${i}_files.xml" -O "../archive_stats/VHVtv/${i}_files.xml"
  wget -nv "https://archive.org/download/$i/${i}_archive.torrent" -O "../archive_stats/VHVtv/${i}_archive.torrent"
  wget -nv "https://archive.org/details/$i&output=json" -O "../archive_stats/VHVtv/${i}_json.txt"
  wget -nv "https://archive.org/metadata/$i" -O "../archive_stats/VHVtv/${i}_metadata_json.txt"
  wget -nv "https://archive.org/services/find_file.php?file=$i" -O "../archive_stats/VHVtv/${i}_item.xml"
  wget -nv "https://archive.org/services/img/$i" -O "../archive_stats/VHVtv/${i}.jpg"
  # catalog history needs the login cookies; keep only the derive task rows (task id, date)
  wget -nv --load-cookies=archive_cookies "https://archive.org/catalog.php?history=1&identifier=$i" -O - | \
    tee "${i}_catalog.txt" | grep derive | cut -f1,7 -d, | sed 's/"//g' | sed 's/\[//' > "${i}_derive_taskid.txt"
  head -1 "${i}_derive_taskid.txt" | awk -F, -f ./extract_runtime_from_log_show.awk > ./runtime.sh
  sh ./runtime.sh
  mv -v "${i}"_*.txt ../archive_stats/VHVtv
done

OSCON Presentation about Internet Archive APIs
https://archive.org/details/oscon2015-ia-apis
https://archive.org/download/oscon2015-ia-apis/Internet_Archive-Universal_Access_Open_APIs.pdf
https://www.zotero.org/groups/internet_archive_-_open_apis_and_examples/items

poor man's Granicus solution using archive.org: hyperlinked meeting agenda
https://archive.org/download/AH-fcc_10-5-15/10-5-2015AgendaWithLinks.pdf

"video clip" example showing start= and end= parameters
https://archive.org/download/AH-fcc_10-5-15/AH-fcc_10-5-15.mp4?start=965&end=1334

an example of how to remove a metadata element (general form: <element>:REMOVE_TAG):
ia metadata The_Reverend_Shawn_Amos_on_KDRT-LP_s_Live_on_the_Loam --modify licenseurl:REMOVE_TAG
more detailed documentation:
http://internetarchive.readthedocs.org/en/latest/cli.html

bulk downloading collections from archive.org:
http://gareth.halfacree.co.uk/2013/04/bulk-downloading-collections-from-archive-org
https://blog.archive.org/2012/04/26/downloading-in-bulk-using-wget/

Description of Gift Culture
http://www.catb.org/esr/writings/homesteading/homesteading/ar01s06.html
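
The two Wolfram Alpha bandwidth calculations in these notes are plain arithmetic, so they can also be reproduced offline. A minimal sketch in bash with awk, assuming decimal units (1 GB = 10^9 bytes, 1 KB = 10^3 bytes), which is the convention that reproduces the recorded answers of 1 hour 51 minutes and 4.89 Mb/s (611 kB/sec). The function names `upload_time` and `bandwidth_needed` are hypothetical:

```shell
#!/bin/bash
# Rough upload-time and bandwidth estimates, an offline stand-in for the
# Wolfram Alpha queries. Assumes decimal units (1 GB = 10^9 bytes,
# 1 KB = 10^3 bytes) and a constant transfer rate with no protocol overhead.

# upload_time SIZE_GB RATE_KB_PER_SEC -> prints e.g. "1h 51m"
upload_time() {
  awk -v gb="$1" -v kbs="$2" 'BEGIN {
    secs = (gb * 1e9) / (kbs * 1e3)      # total seconds at the given rate
    h = int(secs / 3600)
    m = int((secs - h * 3600) / 60)      # whole minutes left after the hours
    printf "%dh %dm\n", h, m
  }'
}

# bandwidth_needed SIZE_GB HOURS -> prints megabits/sec and kilobytes/sec
bandwidth_needed() {
  awk -v gb="$1" -v hrs="$2" 'BEGIN {
    bps = (gb * 1e9) / (hrs * 3600)      # bytes per second required
    printf "%.2f Mb/s or %.0f kB/sec\n", bps * 8 / 1e6, bps / 1e3
  }'
}

upload_time 2.2 330       # time to upload 2.2GB at 330KB/sec
bandwidth_needed 2.2 1    # bandwidth needed to upload 2.2GB in 1 hour
```

If an uploader reports speeds in binary units (GiB, KiB), the first estimate grows to about 1h 56m, so it helps to know which convention a speed readout uses.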
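
The advancedsearch.php URLs in these notes are ordinary query strings: q= holds a percent-encoded search expression (for example `collection%3Aaccess_humboldt`), and repeated fl[]= and sort[]= parameters pick the returned fields and their order. A minimal bash sketch for building such a URL from a readable query, assuming ASCII-only queries and using only parameters that already appear in the URLs above; `urlencode` and `advancedsearch_url` are hypothetical helper names:

```shell
#!/bin/bash
# Percent-encode a string for use as a URL query value.
# Assumes ASCII input; multibyte characters are not encoded correctly.
urlencode() {
  local s="$1" out="" c hex i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;                    # unreserved: keep as-is
      *) printf -v hex '%%%02X' "'$c"; out+="$hex" ;;  # anything else: %XX
    esac
  done
  printf '%s' "$out"
}

# advancedsearch_url QUERY [ROWS] [OUTPUT]
# Returns only the identifier field, sorted ascending by identifier.
advancedsearch_url() {
  local query="$1" rows="${2:-50}" output="${3:-csv}"
  printf 'https://archive.org/advancedsearch.php?q=%s&fl[]=identifier&sort[]=identifier+asc&rows=%s&page=1&output=%s\n' \
    "$(urlencode "$query")" "$rows" "$output"
}

url=$(advancedsearch_url 'collection:access_humboldt' 5000 csv)
echo "$url"
# wget "$url" -O ah_identifiers.csv
```

Putting the whole URL in quotes, as in the commented wget line, is an alternative to backslash-escaping every `&` in the shell.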
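
The "video clip" URL in these notes appends start= and end= to a file's download URL. The example values 965 and 1334 correspond to 16:05 and 22:14 if read as seconds, and that is the assumption this sketch makes. It converts H:M:S or M:S timestamps (as read off a player) into seconds and builds the clip URL; `to_seconds` and `clip_url` are hypothetical names:

```shell
#!/bin/bash
# to_seconds "H:M:S" | "M:S" | "S" -> total seconds.
# 10# forces base 10 so fields such as "08" are not read as octal.
to_seconds() {
  local IFS=: total=0 part
  for part in $1; do
    total=$(( total * 60 + 10#$part ))
  done
  echo "$total"
}

# clip_url IDENTIFIER FILENAME START END -> download URL with start=/end= in seconds
clip_url() {
  printf 'https://archive.org/download/%s/%s?start=%s&end=%s\n' \
    "$1" "$2" "$(to_seconds "$3")" "$(to_seconds "$4")"
}

# 16:05 to 22:14 into the meeting recording
clip_url AH-fcc_10-5-15 AH-fcc_10-5-15.mp4 16:05 22:14
```

This prints the same clip URL as the example above, which makes it easy to add one link per agenda item when building a hyperlinked meeting agenda.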
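
Outside a spreadsheet, the per-item URL patterns used by the HYPERLINK formulas and the metadata and thumbnail services listed above can be generated for a list of identifiers in the shell. A minimal sketch; `item_links` is a hypothetical name, and the four links it prints are just a sample of the patterns in these notes:

```shell
#!/bin/bash
# item_links IDENTIFIER -> one URL per line for common per-item pages
item_links() {
  local id="$1" base="https://archive.org"
  printf '%s\n' \
    "$base/details/$id" \
    "$base/metadata/$id" \
    "$base/services/img/$id" \
    "$base/download/$id/format=MPEG2"
}

# Reads one identifier per line; blank lines are skipped.
while IFS= read -r id; do
  [ -n "$id" ] && item_links "$id"
done <<'EOF'
AH-949-1-Tour_of_the_Ten_Ninety_One
EOF
```

Redirecting a file such as VHVtv_identifiers_sorted.txt into the loop instead of the here-document would produce links for a whole collection.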