Research » Bookmarking and Archiving Content


Best Free Site Archiving Services

archive.today

Help:Using archive.today - Wikipedia

Blazingly fast and Memento API compatible.

Link to the most recent archive of a URL: http://archive.is/http://en.wikipedia.org/

archive.today - Wikipedia https://archive.is/?url=https://www.cnc24.com/

Comparison between the Wayback Machine and Archive.Today - EverybodyWiki Bios & Wiki

Internet Archive aka Wayback Machine

Latest URL archive

https://web.archive.org/web/wikipedia.org

Arquivo.pt

Resources:

Web archives that support the Memento protocol natively: Memento Depot


Archive Wallabag Links Forever

Adding a URL to Wallabag

It is more complicated than a simple cURL call:

https://doc.wallabag.org/en/developer/api/oauth.html

Obtain API token

The API token expires. It is obtained using the Client ID, Client Secret, username and user password:

curl -s "https://pocket.cvladan.com/oauth/v2/token?grant_type=password&client_id=<client_id>&client_secret=<client_secret>&username=<username>&password=<urlencoded_password>"

Create a client_id+client_secret to access Wallabag via API and then run the curl (the password must be URL encoded):

$ curl -i "https://pocket.cvladan.com/oauth/v2/token?grant_type=password\
&client_id=<the_client_id>\
&client_secret=<the_client_secret>\
&username=<username>\
&password=<urlencoded_password>"

Again: Password must be URL encoded.
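Because the password has to be URL-encoded before it goes into the query string, a small helper can do the encoding. The following is a minimal sketch in POSIX shell; the function name `urlencode` is my own choice (it is not part of Wallabag or cURL), and it is only meant for ASCII input such as a typical password.

```shell
# urlencode: percent-encode everything except RFC 3986 unreserved characters.
# Hypothetical helper; the body runs in a subshell so its variables stay local.
urlencode() (
  LC_ALL=C
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}        # the string without its first character
    c=${s%"$rest"}     # the first character itself
    case $c in
      [a-zA-Z0-9.~_-]) out=$out$c ;;                  # unreserved: keep as-is
      *) out=$out$(printf '%%%02X' "'$c") ;;          # everything else: %XX
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)
```

It could then be used as `...&password=$(urlencode "$WALLABAG_PASSWORD")`, where `WALLABAG_PASSWORD` is a placeholder variable. A character such as `+` matters here, because an unencoded `+` in a query string is usually decoded as a space.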

Then add a URL

The cURL request must be a POST, not a GET, so the following will NOT work:

curl "https://pocket.cvladan.com/api/entries.json?access_token=<access_token>&url=<url>"

This one works:

curl -d 'access_token=<access_token>&url=<url>' https://pocket.cvladan.com/api/entries.json
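The two steps (get a token, then POST the URL) can be wrapped into small shell functions. This is only a sketch under assumptions: the function names and the `WALLABAG_URL` variable are made up for illustration, and the token response is assumed to be a single-line JSON object containing an `access_token` string field.

```shell
# Hypothetical wrappers; WALLABAG_URL and all function names are placeholders.
WALLABAG_URL="${WALLABAG_URL:-https://pocket.cvladan.com}"

# Read a JSON token response on stdin and print the access_token value.
extract_access_token() {
  sed -n 's/.*"access_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage: wallabag_token <client_id> <client_secret> <username> <password>
# -G sends the fields as a query string; --data-urlencode encodes each value,
# so the password does not need to be encoded by hand.
wallabag_token() {
  curl -s -G "$WALLABAG_URL/oauth/v2/token" \
    --data-urlencode "grant_type=password" \
    --data-urlencode "client_id=$1" \
    --data-urlencode "client_secret=$2" \
    --data-urlencode "username=$3" \
    --data-urlencode "password=$4" | extract_access_token
}

# Usage: wallabag_add <access_token> <url>
# Passing data without -G makes curl send a POST, as the API requires.
wallabag_add() {
  curl -s "$WALLABAG_URL/api/entries.json" \
    --data-urlencode "access_token=$1" \
    --data-urlencode "url=$2"
}
```

A call would look like `wallabag_add "$(wallabag_token "$ID" "$SECRET" "$USER_NAME" "$PASS")" "https://www.cnc24.com/"`, with all four variables being placeholders.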

Archiving URLs that were posted to Wallabag

Everything is bookmarked in Wallabag and then distributed via RSS to other archival engines.

  1. Save to Wallabag and expose an RSS feed of the URLs
  2. Also save to Notion, in a separate database “The New Pocket”, just in case
  3. Also save to Archive.org for good measure

I’ve implemented most of the automation tasks using pipedream.com and ifttt.com.

In the future, I could host my own ArchiveBox, as it is amazing. That would make it only a 2-step process.
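The RSS-driven flow above could also be sketched as a plain shell script instead of a hosted automation. This is an illustration only: it assumes the feed is RSS 2.0 with one `<link>` element per `<item>`, and the function names are hypothetical.

```shell
# Print the <link> of every <item> in an RSS 2.0 document read from stdin.
# Assumes one <link> per item; the channel-level <link> is skipped.
rss_item_links() {
  tr -d '\r\n' |
    sed 's/<item[ >]/\
&/g' |
    sed -n 's/^<item.*<link>\([^<]*\)<\/link>.*/\1/p' |
    sed 's/&amp;/\&/g'
}

# Usage: archive_feed <feed_url>
# Fetch the feed and submit every item link to the Wayback Machine.
archive_feed() {
  curl -s "$1" | rss_item_links | while read -r url; do
    curl -s -o /dev/null "https://web.archive.org/save/$url"
    sleep 5   # be gentle with the archive; the delay is an arbitrary choice
  done
}
```

The parsing is done with `sed` to keep the sketch dependency-free; a real XML parser would be more robust against CDATA sections and unusual formatting.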

Archive.org

Posting URL to Archive.org:

curl https://web.archive.org/save/https://www.cnc24.com/
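The same call can be wrapped so that it reports what happened. A minimal sketch, with hypothetical function names; it uses only the `/save/` URL pattern shown above and standard curl options.

```shell
# Build the save URL for a page (same pattern as the curl call above).
wayback_save_url() {
  printf 'https://web.archive.org/save/%s\n' "$1"
}

# Usage: wayback_save <url>
# Submit the page, follow redirects, and print the HTTP status code
# together with the final URL curl ended up at.
wayback_save() {
  curl -s -L -o /dev/null -w '%{http_code} %{url_effective}\n' \
    "$(wayback_save_url "$1")"
}
```

For example, `wayback_save https://www.cnc24.com/` prints one line with the status code and the final URL.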

For archive.today, it is not that easy:

archiving - How do I archive a webpage to archive.today using wget or curl? - Web Applications Stack Exchange

wabarc/archive.is: A command-line tool and Go package for wayback webpage to archive.today

jjjake/internetarchive: A Python and Command-Line Interface to Archive.org


Wallabag

awesome-selfhosted/awesome-selfhosted: A list of Free Software network services and web applications which can be hosted on your own servers

Quite serious competitors:

Pinboard & Pocket Alternatives



Wallabag Setup on ISPConfig

Wallabag is an open-source, self-hostable Pocket alternative with full-text search, text and media archiving, and a lot of clients for all the mobile platforms and for the Kindle e-book platform.

The internal option “Download images locally” must be enabled.

Nginx options in ISPConfig:

client_max_body_size 3000m;

##subroot web ##

location / {
    try_files $uri /app.php$is_args$args;
}

location ~ \.php$ {
  fastcgi_split_path_info ^(.+\.php)(/.*)$;
  {FASTCGIPASS}
  include /etc/nginx/fastcgi_params;
  fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
  fastcgi_param DOCUMENT_ROOT $document_root;
  internal;
}

Symfony

https://symfony.com/doc/current/setup/web_server_configuration.html#nginx


Installation on a shared hosting

The static package requires --env=prod to be appended to each command, as the static package is only usable as a prod environment. You must create your first user with this command:

php bin/console wallabag:install --env=prod

If an error occurs at this step due to bad settings, you must clear the cache before trying the previous command again:

php bin/console cache:clear --env=prod

The default configuration uses MySQL for the database, and the database settings are in app/config/parameters.yml. Passwords must be surrounded by single quotes ('). The same config file is used for other configuration options such as domain_name.


Importing

https://doc.wallabag.org/en/admin/asynchronous.html#launch-redis-consumer

Make some files executable:

cd web/bin
chmod g+x *

Trigger import:

bin/console wallabag:import:redis-worker --env=prod pinboard -vv >> import-pinboard.log


New entrant in 2023:

Briefkasten Bookmarks

Briefkasten is a very attractive, well-coded open-source project. The only limitation is that the “full-text” search function only searches the description of the links and not the content of the linked pages.

Github: ndom91/briefkasten


How to save a web page on the Internet Archive? | LearnTips


Web Archiving Services | AlternativeTo


List of all archivers:

Archive Aggregators: a service that searches all archives and tells you which ones have the page archived and which do not


Alternatives:

  • Ghostarchive, a website archive: perfect, uses ReplayWeb.page from Webrecorder Tools
  • FreezePage: strange
  • Library of Congress Web Archives: archiving not possible?
  • Stanford Web Archive Portal: archiving not possible?

  • Wayback Machine
  • Wayback Machine (All)
  • Google Cache
  • Google Cache (Text-Only)
  • Bing Cache
  • Yandex Cache
  • Archive.is
  • Archive.is (All)
  • Gigablast Cache
  • Yahoo Japan Cache
  • Megalodon
  • Baidu Snapshot
  • Yahoo Cache
  • Qihoo 360 Search Snapshot
  • Mail.ru Cache

Presumably like this? http://timetravel.mementoweb.org/memento/2022/https://www.cnc24.com/ but it cannot be done later
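The Time Travel URL above follows a simple pattern: a datetime, then the original URL. A small sketch of a builder for it is below. It assumes the service accepts partial datetimes of the form YYYY[MM[DD[hh[mm[ss]]]]], as the year-only URL above suggests; the function name is hypothetical.

```shell
# Usage: timetravel_url <datetime> <url>
# Builds a Memento Time Travel URL; datetime must be 4 to 14 digits
# in steps of two (YYYY, YYYYMM, YYYYMMDD, ...).
timetravel_url() {
  ts=$1
  target=$2
  case $ts in
    ''|*[!0-9]*)
      echo "datetime must contain digits only" >&2
      return 1 ;;
  esac
  case ${#ts} in
    4|6|8|10|12|14) ;;
    *)
      echo "datetime must look like YYYY[MM[DD[hh[mm[ss]]]]]" >&2
      return 1 ;;
  esac
  printf 'http://timetravel.mementoweb.org/memento/%s/%s\n' "$ts" "$target"
}
```

Opening the resulting URL with `curl -s -L -o /dev/null -w '%{url_effective}\n' "$(timetravel_url 2022 https://www.cnc24.com/)"` would show which archive the service redirects to, if any.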


? Archive.St - Free web page archiving service


Archive it myself with wget?

Tutorial: Back Up a Web Page or Web Site – Data Horde
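For a single page, a wget call along these lines might be enough. This is a sketch, not taken from the linked tutorial: the function name, the default target directory `archive`, and the `DRY_RUN` switch are my own additions; the wget options themselves are standard.

```shell
# Usage: mirror_page <url> [target_dir]
# Download one page with the assets it needs (images, CSS, JS), rewrite links
# for offline viewing, and add .html extensions where needed.
# With DRY_RUN set, the command is printed instead of executed.
mirror_page() {
  url=$1
  dest=${2:-archive}
  set -- wget --page-requisites --convert-links --adjust-extension \
    --span-hosts --directory-prefix="$dest" "$url"
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

`--span-hosts` is included because page assets are often served from other hosts such as CDNs; without recursion it only affects the page requisites.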


Common Crawl


jsvine/waybackpack: Download the entire Wayback Machine archive for a given URL.


motherboardgithub/mass_archive: A basic Python tool for pushing a web page to multiple archiving services at once

oduwsdl/archivenow: A Tool To Push Web Resources Into Web Archives


ArchiveReady.com is a very interesting idea: a tool that checks how suitable a website is for archiving. It is known as the “Website Archivability Evaluation Tool”.


Browser Extensions

Archive Site Extensions
Retrieve Archive Extensions

Adds a button to the Mozilla Firefox toolbar. When clicked, it sends the URL of the current tab to archive.today to preserve a snapshot of the page, and opens the result in a new tab.

Tool:

Stephen Ostermiller’s Cache Bookmarklets

JS Small: Stephen Ostermiller’s Cache Bookmarklets

Android add-ons

  • Share2Archive will open web page in Archive.Today using the default browser


Update Wallabag

make update only runs fine if you installed wallabag using git. If you installed it using the shared hosting way (the tar archive) you have to follow the manual process: https://doc.wallabag.org/en/admin/upgrade.html#upgrade-on-a-shared-hosting

runuser www-data -s /bin/bash

# standard permissions
chmod 755 ./{web,var,bin,vendor,app/config} -R

# more permissions
chmod -R 775 var/{logs,cache}/

bin/console cache:clear --env=prod

# chown -R www-data:www-data /srv/wallabag/{web,var,bin,vendor,app/config}
cat /etc/group | grep www-data

# add web27 to group www-data
usermod -a -G www-data web27


# log
tail var/logs/prod.log -f -n0


Omnivore

Omnivore is finally the perfect Wallabag alternative: it works with Puppeteer and does exactly everything I want. It is self-hostable, with the repo at omnivore-app/omnivore: Omnivore is a complete, open source read-it-later solution for people who like reading. For now, though, you can use their hosted version for free. The documentation is here: Docs. Note that Omnivore will perform full text search across library item’s content, title, description, and site by default; read here: Search | Omnivore Docs

rubiojr/omnivore-exporter: omnivore.app article exporter

https://github.com/Y2Z/monolith which you could use to archive the entire HTML page (including images, CSS, etc.) without re-inventing the wheel. Very similar is go-shiori/obelisk: Go package and CLI tool for saving web page as single HTML file
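A small wrapper can pick an output file name from the URL and hand it to monolith. This is a sketch with hypothetical function names; it relies only on monolith's `-o` option for the output file.

```shell
# Turn a URL into a safe file name: drop the scheme, replace anything that
# is not a letter, digit, dot or hyphen with "_", and append ".html".
url_to_filename() {
  printf '%s\n' "${1#*://}" |
    sed -e 's/[^A-Za-z0-9.-]/_/g' -e 's/__*/_/g' -e 's/_$//' -e 's/$/.html/'
}

# Usage: save_single_file <url> [target_dir]
# Save the page as one self-contained HTML file using monolith.
save_single_file() {
  monolith "$1" -o "${2:-.}/$(url_to_filename "$1")"
}
```

For example, `save_single_file https://www.cnc24.com/ ~/archive` would write `~/archive/www.cnc24.com.html`, provided the directory exists.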

Monolith – CLI tool for saving complete web pages as a single HTML file | Hacker News


Linkwarden with repo linkwarden/linkwarden: ⚡️⚡️⚡️Self-hosted collaborative bookmark manager to collect, organize, and preserve webpages and articles.


Hoarder with repo hoarder-app/hoarder: A self-hostable bookmark-everything app (links, notes and images) with AI-based automatic tagging and full text search. For AI tagging it uses the Ollama integration.


MyMemo-Empower Your Mind with AI


date 09. Nov 2022 | modified 17. Aug 2024
filename: Research » Bookmarking and Archiving Content