Research » Bookmarking and Archiving Content

Best Free Site Archiving Services

Resources:

Web archives that support the Memento protocol natively: Memento Depot

Archive Wallabag Links Forever

Adding a URL to Wallabag

This is more complicated than a single cURL call.

https://doc.wallabag.org/en/developer/api/oauth.html

Obtain API token

The API token expires. It is obtained from the client ID, client secret, username, and user password. Create a client_id and client_secret for API access in Wallabag, then run curl (the password must be URL-encoded):

curl -s "https://pocket.cvladan.com/oauth/v2/token?grant_type=password&client_id=<the_client_id>&client_secret=<the_client_secret>&username=<username>&password=<urlencoded_password>"

Again: Password must be URL encoded.
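
A minimal sketch of the URL-encoding step, assuming Python's standard library is acceptable for the job. The function name `build_token_url` and the `base` argument are illustrative and not part of Wallabag; the endpoint path and parameter names are the ones used in the curl command above.

```python
from urllib.parse import urlencode


def build_token_url(base, client_id, client_secret, username, password):
    """Build the OAuth token URL, percent-encoding every query value.

    `base` is the root URL of your own Wallabag instance (an assumption of
    this sketch, for example "https://wallabag.example.com").
    """
    query = urlencode({
        "grant_type": "password",
        "client_id": client_id,
        "client_secret": client_secret,
        "username": username,
        # urlencode turns "+" into %2B, "&" into %26 and a space into "+"
        "password": password,
    })
    return f"{base}/oauth/v2/token?{query}"
```

The returned string can be passed to `curl -s` in place of the hand-assembled URL. A password such as `a+b c` ends up as `a%2Bb+c` in the query string, which is the encoding the note above asks for.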

Then add a URL

The cURL request must be a POST, not a GET, so the following will NOT work:

curl "https://pocket.cvladan.com/api/entries.json?access_token=<access_token>&url=<url>"

This one works:

curl -d 'access_token=<access_token>&url=<url>' https://pocket.cvladan.com/api/entries.json
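
The same POST can be sketched in Python, which also takes care of encoding a bookmarked URL that itself contains `&` or `?`. The function name `build_add_entry_request` and the `base` argument are hypothetical; the endpoint and the two form fields are taken from the working curl command above.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_add_entry_request(base, access_token, url):
    """Prepare (but do not send) the form-encoded POST that adds `url`.

    `base` is the root URL of your Wallabag instance (assumed, not from the
    original notes).
    """
    body = urlencode({"access_token": access_token, "url": url}).encode("ascii")
    # Supplying `data` makes this a POST with a form-encoded body
    return Request(f"{base}/api/entries.json", data=body, method="POST")
```

Sending it is then a matter of `urllib.request.urlopen(build_add_entry_request(...))`. With plain curl, `--data-urlencode "url=<url>"` should achieve the same encoding of the bookmarked URL.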

Archiving URLs that were posted to Wallabag

Everything is bookmarked in Wallabag and then distributed via RSS to other archival engines.

Save to Wallabag, which exposes an RSS feed of the saved URLs
Also save to Notion, in a separate database called “The New Pocket”, just in case
Also save to Archive.org, for good measure

I’ve implemented most of the automation tasks using pipedream.com and ifttt.com
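
For illustration only, the fan-out step can be sketched locally in Python. This is not the pipedream.com or ifttt.com setup itself. It assumes an RSS 2.0 layout (`channel/item/link`); the feed your Wallabag version exposes may use a different layout (for example Atom), in which case the element path needs adjusting. The function names and the `already_archived` set are hypothetical.

```python
import xml.etree.ElementTree as ET


def links_from_rss(rss_text):
    """Return the <link> of every <item> in an RSS 2.0 document, in feed order."""
    root = ET.fromstring(rss_text)
    links = []
    for link in root.findall("./channel/item/link"):
        if link.text and link.text.strip():
            links.append(link.text.strip())
    return links


def new_links(rss_text, already_archived):
    """Return feed links that are not yet in `already_archived` (a set of URLs)."""
    return [url for url in links_from_rss(rss_text) if url not in already_archived]
```

Each URL returned by `new_links` would then be handed to the individual archivers (Notion, Archive.org and so on), and added to the set once that succeeds.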

In the future, I could host my own ArchiveBox, as it is amazing. That would be only a two-step process.

Archive.org

Posting a URL to Archive.org:

curl https://web.archive.org/save/https://www.cnc24.com/
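
A small Python sketch of the same request, in case it needs to be scripted. The `/save/` prefix comes from the curl command above; the function names, the User-Agent string and the timeout are assumptions of this sketch, and nothing is claimed here about what the service returns beyond the HTTP status and the final URL after redirects.

```python
from urllib.request import Request, urlopen

SAVE_PREFIX = "https://web.archive.org/save/"


def save_url(page_url):
    """Return the Archive.org "save" URL for `page_url`."""
    return SAVE_PREFIX + page_url


def request_snapshot(page_url, timeout=60):
    """Ask Archive.org to capture `page_url`; return (status, final URL)."""
    req = Request(save_url(page_url), headers={"User-Agent": "archive-notes-sketch"})
    with urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.geturl()
```

`save_url("https://www.cnc24.com/")` yields exactly the URL used in the curl example.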

For archive.today, it is not that easy:

archiving - How do I archive a webpage to archive.today using wget or curl? - Web Applications Stack Exchange
wabarc/archive.is: A command-line tool and Go package for wayback webpage to archive.today

jjjake/internetarchive: A Python and Command-Line Interface to Archive.org

Wallabag

awesome-selfhosted/awesome-selfhosted: A list of Free Software network services and web applications which can be hosted on your own servers

Quite serious competitors:

Pinboard & Pocket Alternatives

Shiori, written in Go, has web extensions; the phone problem is solved with the amazing HTTP Shortcuts
shaarli/Shaarli: The personal, minimalist, super-fast, database free, bookmarking service in PHP
MarceauKa/shaark: Self-hosted platform to keep and share your content: web links, posts, passwords and pictures. inspired by Shaarli, built with Laravel and Vue.js
ArchiveBox Web Archiving Community
Conifer
Reminiscence, written in Python
Webrecorder | Webrecorder Webrecorder | Tools
WebCrate is a tool that runs on Deta as a personal cloud
WebBites - A modern bookmarking service
Pincone: Bookmark manager for teams. Pincone but Personal: Pincone is 100% free.

Paid

Pocket costs about 3.4€/month
Raindrop.io: Keep your favorites handy. Not expensive: about 2.7€/month
Dropmark | Organize, collaborate, and share online
REVISIT.IO - Better bookmarking
Bookmarks.io - Better Bookmarking is defunct
WorldBrain’s Memex
booky.io
Matter
WebSatchel
Conifer = Webrecorder.io

An excellent article on choosing an article extraction library, and on the keyword extractors they used: Unsupervised Auto-labeling of Websites • Pincone. By the way, the authors are the Zagreb-based company Ars Futura (product design, mobile and web app development agency).

scrapinghub/article-extraction-benchmark: Article extraction benchmark: dataset and evaluation scripts

Wallabag Setup on ISPConfig

Wallabag is an open-source, self-hostable Pocket alternative with full-text search, text and media archiving, and many clients for all the mobile platforms and for Kindle e-readers.

The internal option “Download images locally” must be enabled.

Nginx options in ISPConfig:

client_max_body_size 3000m;

##subroot web ##

location / {
    try_files $uri /app.php$is_args$args;
}

location ~ \.php$ {
  fastcgi_split_path_info ^(.+\.php)(/.*)$;
  {FASTCGIPASS}
  include /etc/nginx/fastcgi_params;
  fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
  fastcgi_param DOCUMENT_ROOT $document_root;
  internal;
}

Symfony

https://symfony.com/doc/current/setup/web_server_configuration.html#nginx

Installation on a shared hosting

The static package requires --env=prod to be appended to each command, because the static package is only usable as a prod environment. You must create your first user with the command php bin/console wallabag:install --env=prod. If an error occurs at this step due to bad settings, you must clear the cache with php bin/console cache:clear --env=prod before you try the previous command again.

The default configuration uses MySQL for the database, and the database settings are in app/config/parameters.yml. Passwords must be surrounded by single quotes ('). The same config file is used for other configuration options such as domain_name.

Importing

https://doc.wallabag.org/en/admin/asynchronous.html#launch-redis-consumer

Make some files executable:

cd web/bin
chmod g+x *

Trigger import:

bin/console wallabag:import:redis-worker --env=prod pinboard -vv >> import-pinboard.log

New entrant in 2023:

Briefkasten Bookmarks: Briefkasten is a very attractive, well-coded open-source project. Its only limitation is that the “full-text” search only covers the link descriptions, not the content of the linked pages.

GitHub: ndom91/briefkasten

How to save a web page on the Internet Archive? | LearnTips

Web Archiving Services | AlternativeTo

List of all archivers:

Archive aggregators: services that search all archives and tell you which ones hold an archived copy and which do not

Memento Time Travel - an aggregator; it only lists results from the other archives
Tutorial: Back Up a Web Page or Web Site – Data Horde
Cached Pages
CachedView

Alternatives

Ghostarchive, a website archive - perfect, uses ReplayWeb.page from Webrecorder Tools
FreezePage - odd
Library of Congress Web Archives - cannot be used for archiving?
Stanford Web Archive Portal - cannot be used for archiving?

Wayback Machine, Wayback Machine (All), Google Cache, Google Cache (Text-Only), Bing Cache, Yandex Cache, Archive.is, Archive.is (All)

Gigablast Cache, Yahoo Japan Cache, Megalodon, Baidu Snapshot, Yahoo Cache, Qihoo 360 Search Snapshot, Mail.ru Cache

Presumably like this? http://timetravel.mementoweb.org/memento/2022/https://www.cnc24.com/ (but it does not work with a later date)
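
A tentative sketch of how that lookup URL is put together, mirroring the pattern in the line above: a fixed prefix, a numeric date stamp, then the original URL. The function name `memento_url` is hypothetical, and whether stamps other than a bare year such as `2022` are accepted by the service is not verified here.

```python
TIMETRAVEL_PREFIX = "http://timetravel.mementoweb.org/memento/"


def memento_url(page_url, stamp):
    """Return the Time Travel lookup URL for `page_url` near `stamp`.

    `stamp` is a string of digits such as "2022", as in the note above.
    """
    if not stamp.isdigit():
        raise ValueError("stamp must contain digits only, e.g. '2022'")
    return f"{TIMETRAVEL_PREFIX}{stamp}/{page_url}"
```

Calling `memento_url("https://www.cnc24.com/", "2022")` reproduces the URL from the note.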

? Archive.St - Free web page archiving service

Should I archive it myself with wget?

Tutorial: Back Up a Web Page or Web Site – Data Horde

Common Crawl

jsvine/waybackpack: Download the entire Wayback Machine archive for a given URL.

motherboardgithub/mass_archive: A basic Python tool for pushing a web page to multiple archiving services at once

oduwsdl/archivenow: A Tool To Push Web Resources Into Web Archives

ArchiveReady.com is a very interesting idea and tool that checks how suitable a website is for archiving. It is known as the “Website Archivability Evaluation Tool”.

Browser Extensions

Archive Site Extensions

thefoofighter/The-Archiver-WebExtension both for Firefox and Chrome, will archive on Archive.org and Archive.Today
rahiel/archiveror for Firefox works on Archive.org and Archive.Today plus some more. In Chrome, it can make local copies of webpages in a single MHTML file using Ctrl+Shift+S. For Firefox consider the “Save Page WE” add-on.
Archive Page for Firefox and Chrome is only for Archive.Today, both for archiving and for searching the archive
tjhorner/archivebox-exporter for Firefox and Chrome can send pages from your browser to your ArchiveBox self-hosted archiver
AaronLenoir/SendToArchive is a basic Firefox extension for Archive.org
arantius/resurrect-pages searches through page archives on Firefox; there is also what looks like a fork, Albirew/resurrect-pages-isup-edition
Official Archive.org extensions for Firefox and Chrome
jonathanmccann/archive-url-firefox-addon is Firefox only and only Archive.org

Retrieve Archive Extensions

dessant/web-archives for Firefox and Chrome is an extension for viewing archived and cached versions of web pages

Adds a button to the Mozilla Firefox toolbar. When clicked, it sends the URL of the current tab to archive.today to preserve a snapshot of the page, and opens the result in a new tab.

Tool:

Stephen Ostermiller’s Cache Bookmarklets

JS Small: Stephen Ostermiller’s Cache Bookmarklets

Resurrect Pages arantius/resurrect-pages: A tool to expose cached copies of webpages, especially when they are unavailable.

Android add-ons

Share2Archive opens the web page in Archive.Today using the default browser

PaperSpan is nothing groundbreaking

Update Wallabag

make update only runs fine if you installed wallabag using git. If you installed it using the shared hosting way (the tar archive) you have to follow the manual process: https://doc.wallabag.org/en/admin/upgrade.html#upgrade-on-a-shared-hosting

runuser www-data -s /bin/bash

# standard permissions
chmod 755 ./{web,var,bin,vendor,app/config} -R

# more permissions
chmod -R 775 var/{logs,cache}/

bin/console cache:clear --env=prod

# chown -R www-data:www-data /srv/wallabag/{web,var,bin,vendor,app/config}

cat /etc/group | grep www-data

# add web27 to the www-data group
usermod -a -G www-data web27


# log
tail var/logs/prod.log -f -n0

Omnivore

Omnivore is finally a perfect Wallabag alternative: it works with Puppeteer and does exactly everything I want. It is self-hostable, with the repo at omnivore-app/omnivore: Omnivore is a complete, open source read-it-later solution for people who like reading. For now, you can also use their hosted version for free. The documentation is here: Docs. Note that Omnivore will perform full text search across a library item’s content, title, description, and site by default; read more here: Search | Omnivore Docs

rubiojr/omnivore-exporter: omnivore.app article exporter

https://github.com/Y2Z/monolith, which you could use to archive the entire HTML page (including images, CSS, etc.) without reinventing the wheel. Very similar is go-shiori/obelisk: Go package and CLI tool for saving web page as single HTML file

Monolith – CLI tool for saving complete web pages as a single HTML file | Hacker News

Linkwarden with repo linkwarden/linkwarden: ⚡️⚡️⚡️Self-hosted collaborative bookmark manager to collect, organize, and preserve webpages and articles.

Hoarder with repo hoarder-app/hoarder: A self-hostable bookmark-everything app (links, notes and images) with AI-based automatic tagging and full text search. For AI tagging it uses the Ollama integration.

MyMemo-Empower Your Mind with AI

date 09. Nov 2022 | modified 13. Feb 2025
