Bookmarking and Archiving Content
Best Free Site Archiving Services
archive.today
Help:Using archive.today - Wikipedia
Blazingly fast and compatible with the Memento API. Link to the most recent archive of a URL: http://archive.is/http://en.wikipedia.org/
archive.today - Wikipedia https://archive.is/?url=https://www.cnc24.com/
Comparison between the Wayback Machine and Archive.Today - EverybodyWiki Bios & Wiki
Internet Archive aka Wayback Machine
https://web.archive.org/web/wikipedia.org
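Both "most recent snapshot" links above follow the same pattern: the archive's prefix followed by the original URL. A minimal Python sketch of that pattern, using only the two prefixes that appear in these notes; the function name is made up for illustration, and nothing here contacts either service.

```python
def latest_snapshot_urls(page_url: str) -> dict[str, str]:
    """Build 'most recent snapshot' links from the URL patterns noted above."""
    return {
        # archive.today: prefix followed by the full original URL
        "archive.today": f"http://archive.is/{page_url}",
        # Wayback Machine: /web/ followed by the original URL
        "wayback": f"https://web.archive.org/web/{page_url}",
    }


if __name__ == "__main__":
    for name, link in latest_snapshot_urls("http://en.wikipedia.org/").items():
        print(name, link)
```

Opening either link in a browser should redirect to the newest capture, if one exists.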
Arquivo.pt
Resources:
Web archives that support the Memento protocol natively: Memento Depot
Archive Wallabag Links Forever
Adding a URL to Wallabag
It is more complicated than a simple cURL call:
https://doc.wallabag.org/en/developer/api/oauth.html
Obtain API token
The API token expires; it is obtained from the client ID, client secret, username and user password.
Create a client_id and client_secret to access Wallabag via the API, then request the token with cURL. The password must be URL-encoded.
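The token request can be sketched in Python instead of cURL. This is a minimal sketch, assuming the password-grant endpoint `/oauth/v2/token` described in the Wallabag OAuth documentation linked above; the host name and credentials are placeholders, and the function names are made up for illustration. `urlencode()` takes care of URL-encoding the password.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(base_url, client_id, client_secret, username, password):
    """Prepare the POST request for an OAuth token (password grant)."""
    form = {
        "grant_type": "password",
        "client_id": client_id,
        "client_secret": client_secret,
        "username": username,
        "password": password,
    }
    # urlencode() percent-encodes every value, including the password
    body = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/oauth/v2/token", data=body, method="POST"
    )


def fetch_token(request):
    """Send the request and return the access_token field of the JSON reply."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["access_token"]


if __name__ == "__main__":
    # Placeholder values, not real credentials
    req = build_token_request(
        "https://wallabag.example.org", "1_abc", "secret", "me", "p@ss&word"
    )
    print(req.full_url)
    print(req.data.decode())
```

Since the token expires, a script would typically call `fetch_token()` at the start of every run rather than store the token.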
Then add a URL
The cURL request must be a POST, not a GET: the same request sent as a GET does not add the URL.
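A minimal Python sketch of the POST that adds a URL, assuming the `/api/entries.json` endpoint of the Wallabag API and a bearer token obtained beforehand. The host name and token are placeholders, and the function name is made up for illustration.

```python
import urllib.parse
import urllib.request


def build_add_entry_request(base_url, access_token, page_url):
    """Prepare a POST to the entries endpoint that saves page_url."""
    body = urllib.parse.urlencode({"url": page_url}).encode("ascii")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/entries.json",
        data=body,  # the URL travels as form data in the request body
        headers={"Authorization": f"Bearer {access_token}"},
        method="POST",  # a GET on this endpoint lists entries instead of adding one
    )


if __name__ == "__main__":
    req = build_add_entry_request(
        "https://wallabag.example.org", "TOKEN", "https://www.cnc24.com/"
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```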
Archiving URLs that were posted to Wallabag
Everything is bookmarked using Wallabag and then via RSS distributed to other archival engines.
- Save to Wallabag, which exposes an RSS feed of the saved URLs
- Also save to Notion, in a separate database “The New Pocket”, just in case
- Also save to Archive.org for good measure
I’ve implemented most of the automation tasks using pipedream.com and ifttt.com.
In the future, I could host my own ArchiveBox, as it is amazing. That would make it only a 2-step process.
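The fan-out from the feed to the other archival targets can be sketched in a few lines of Python. This is only an illustration of the idea, not the pipedream.com or ifttt.com setup itself: it assumes a plain RSS 2.0 feed (an Atom feed would use different element names), and the sample feed, target names and function names are all hypothetical.

```python
import xml.etree.ElementTree as ET


def links_from_rss(rss_xml: str) -> list[str]:
    """Return the <link> text of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [item.findtext("link", default="").strip() for item in root.iter("item")]


def fan_out(links, targets):
    """Pair every non-empty link with every archival target, in feed order."""
    return [(target, link) for link in links if link for target in targets]


# Hypothetical feed content, standing in for the Wallabag feed
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>hypothetical Wallabag feed</title>
  <item><title>One</title><link>https://example.org/a</link></item>
  <item><title>Two</title><link>https://example.org/b</link></item>
</channel></rss>"""

if __name__ == "__main__":
    for target, link in fan_out(links_from_rss(SAMPLE_FEED), ["notion", "archive.org"]):
        print(f"{target}: {link}")
```

Each `(target, link)` pair would then be handed to whatever does the actual saving for that target.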
Archive.org
Posting a URL to Archive.org takes a single cURL request.
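A minimal Python sketch of such a request, assuming the public "Save Page Now" address `https://web.archive.org/save/` followed by the page URL. The function names are made up for illustration; the service may rate-limit or change its behavior, so treat this as a starting point rather than a reference.

```python
import urllib.request

SAVE_PREFIX = "https://web.archive.org/save/"


def wayback_save_url(page_url: str) -> str:
    """Build the Save Page Now address for page_url."""
    return SAVE_PREFIX + page_url


def request_capture(page_url: str, timeout: float = 60.0) -> int:
    """Ask the Wayback Machine to capture page_url; returns the HTTP status."""
    with urllib.request.urlopen(wayback_save_url(page_url), timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    # Only prints the address; call request_capture() to trigger a capture
    print(wayback_save_url("https://www.cnc24.com/"))
```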
For archive.today, it is not that easy:
- archiving - How do I archive a webpage to archive.today using wget or curl? - Web Applications Stack Exchange
- wabarc/archive.is: A command-line tool and Go package for wayback webpage to archive.today
jjjake/internetarchive: A Python and Command-Line Interface to Archive.org
Wallabag
Quite serious competitors:
Pinboard & Pocket Alternatives
- Shiori, written in Go, has web extensions; the phone problem is solved with the amazing HTTP Shortcuts
- shaarli/Shaarli: The personal, minimalist, super-fast, database free, bookmarking service in PHP
- MarceauKa/shaark: Self-hosted platform to keep and share your content: web links, posts, passwords and pictures. Inspired by Shaarli, built with Laravel and Vue.js
- Reminiscence, written in Python
- Pincone: Bookmark manager for teams. Personal Pincone is 100% free.
Paid
- Pocket costs about 3.4€/month
- Raindrop.io (Keep your favorites handy) is not expensive: about 2.7€/month
- Bookmarks.io - Better Bookmarking is defunct
- An excellent article on choosing an article extraction library, and on the keyword extractors they used: Unsupervised Auto-labeling of Websites • Pincone. By the way, the authors are the Zagreb company Ars Futura – product design, mobile and web app development agency
Wallabag Setup on ISPConfig
Wallabag is an open-source, self-hostable Pocket alternative with full-text search, text and media archiving, and clients for all the mobile platforms and for Kindle e-readers.
The internal option “Download images locally” must be enabled.
Nginx options in ISPConfig:
client_max_body_size 3000m;
##subroot web ##
location / {
    try_files $uri /app.php$is_args$args;
}
location ~ \.php$ {
    fastcgi_split_path_info ^(.+\.php)(/.*)$;
    {FASTCGIPASS}
    include /etc/nginx/fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_param DOCUMENT_ROOT $document_root;
    internal;
}
Symfony
https://symfony.com/doc/current/setup/web_server_configuration.html#nginx
Installation on a shared hosting
The static package requires --env=prod to be appended to each command, as the static package is only usable as a prod environment.
You must create your first user with the command php bin/console wallabag:install --env=prod
If an error occurs at this step due to bad settings, you must clear the cache with php bin/console cache:clear --env=prod before you retry the previous command.
The default configuration uses MySQL for the database, and the database settings are inside app/config/parameters.yml. Passwords must be surrounded by single quotes (').
The same config file is used for other configuration options such as domain_name.
Importing
https://doc.wallabag.org/en/admin/asynchronous.html#launch-redis-consumer
Make some files executable:
cd web/bin
chmod g+x *
Trigger the import:
bin/console wallabag:import:redis-worker --env=prod pinboard -vv >> import-pinboard.log
New entrant in 2023:
Briefkasten Bookmarks: Briefkasten is a very attractive, well-coded open-source project. The only limitation is that the “full-text” search function only searches the description of the links and not the content of the pages linked to.
Github: ndom91/briefkasten
How to save a web page on the Internet Archive? | LearnTips
Web Archiving Services | AlternativeTo
List of all archivers:
Archive aggregators: services that search all archives and tell you which ones have a page archived and which do not
- Memento Time Travel - an aggregator; it only lists what the other archives have
- Tutorial: Back Up a Web Page or Web Site – Data Horde
- Cached Pages
- CachedView
Alternatives
- Ghostarchive, a website archive - perfect, uses ReplayWeb.page from Webrecorder Tools
- FreezePage - odd
- Library of Congress Web Archives - no way to submit a page for archiving?
- Stanford Web Archive Portal - no way to submit a page for archiving?
Wayback Machine, Wayback Machine (All), Google Cache, Google Cache (Text-Only), Bing Cache, Yandex Cache, Archive.is, Archive.is (All),
Gigablast Cache, Yahoo Japan Cache, Megalodon, Baidu Snapshot, Yahoo Cache, Qihoo 360 Search Snapshot, Mail.ru Cache
Presumably like this? http://timetravel.mementoweb.org/memento/2022/https://www.cnc24.com/ but it does not work later on
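The Time Travel link above is just a prefix, a date and the original URL. A minimal Python sketch that builds it; the function name is made up for illustration, and accepting longer date forms such as YYYYMMDD is an assumption that goes beyond the single `2022` example in these notes.

```python
def timetravel_url(page_url: str, when: str) -> str:
    """Build a Memento Time Travel link; when is a digits-only date such as '2022'."""
    if not when.isdigit():
        raise ValueError("date must contain digits only, e.g. '2022'")
    return f"http://timetravel.mementoweb.org/memento/{when}/{page_url}"


if __name__ == "__main__":
    print(timetravel_url("https://www.cnc24.com/", "2022"))
```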
? Archive.St - Free web page archiving service
Should I archive it myself with wget?
Tutorial: Back Up a Web Page or Web Site – Data Horde
jsvine/waybackpack: Download the entire Wayback Machine archive for a given URL.
motherboardgithub/mass_archive: A basic Python tool for pushing a web page to multiple archiving services at once
oduwsdl/archivenow: A Tool To Push Web Resources Into Web Archives
ArchiveReady.com is a very interesting idea and tool that checks how suitable a website is for archiving; it is known as the “Website Archivability Evaluation Tool”
Browser Extensions
Archive Site Extensions
- thefoofighter/The-Archiver-WebExtension, for both Firefox and Chrome, will archive on Archive.org and Archive.Today
- rahiel/archiveror for Firefox works on Archive.org and Archive.Today plus some more. In Chrome, it can make local copies of webpages in a single MHTML file using Ctrl+Shift+S. For Firefox consider the “Save Page WE” add-on.
- Archive Page for Firefox and Chrome is only for Archive.Today, both for archiving and for searching the archive
- tjhorner/archivebox-exporter for Firefox and Chrome can send pages from your browser to your ArchiveBox self-hosted archiver
- AaronLenoir/SendToArchive is a basic Firefox extension for Archive.org
- arantius/resurrect-pages searches through page archives on Firefox; there is also a fork (?): Albirew/resurrect-pages-isup-edition
- jonathanmccann/archive-url-firefox-addon is Firefox only and only for Archive.org
Retrieve Archive Extensions
- dessant/web-archives for Firefox and Chrome is extension for viewing archived and cached versions of web pages
Adds a button to the Mozilla Firefox toolbar. When clicked, it sends the URL of the current tab to archive.today to preserve a snapshot of the page, and opens the result in a new tab.
Tool:
- ArchiveTeam/grab-site: The archivist’s web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
- birros/web-archives: A web archives reader
Stephen Ostermiller’s Cache Bookmarklets
JS Small: Stephen Ostermiller’s Cache Bookmarklets
- Resurrect Pages arantius/resurrect-pages: A tool to expose cached copies of webpages, especially when they are unavailable.
Android add-ons
- Share2Archive will open a web page in Archive.Today using the default browser
- PaperSpan is nothing groundbreaking
Update Wallabag
make update works only if you installed wallabag using git. If you installed it the shared-hosting way (the tar archive), you have to follow the manual process: https://doc.wallabag.org/en/admin/upgrade.html#upgrade-on-a-shared-hosting
# open a shell as the web server user
runuser www-data -s /bin/bash
# standard permissions
chmod -R 755 ./{web,var,bin,vendor,app/config}
# more permissions
chmod -R 775 var/{logs,cache}/
bin/console cache:clear --env=prod
# chown -R www-data:www-data /srv/wallabag/{web,var,bin,vendor,app/config}
# check who is in group www-data
grep www-data /etc/group
# add web27 to group www-data
usermod -a -G www-data web27
# follow the log
tail -f -n0 var/logs/prod.log
Omnivore
Omnivore is finally the perfect Wallabag alternative: it works with Puppeteer and does exactly everything I want. It is self-hostable, with the repo at omnivore-app/omnivore: Omnivore is a complete, open source read-it-later solution for people who like reading. For now, though, you can use their hosted version for free. The documentation is here: Docs. Note that Omnivore will perform full text search across library item’s content, title, description, and site by default; read more at Search | Omnivore Docs
rubiojr/omnivore-exporter: omnivore.app article exporter
https://github.com/Y2Z/monolith which you could use to archive the entire HTML page (including images, CSS, etc) without re-inventing the wheel. Very similar is go-shiori/obelisk: Go package and CLI tool for saving web page as single HTML file
Monolith – CLI tool for saving complete web pages as a single HTML file | Hacker News
Linkwarden with repo linkwarden/linkwarden: ⚡️⚡️⚡️Self-hosted collaborative bookmark manager to collect, organize, and preserve webpages and articles.
Hoarder with repo hoarder-app/hoarder: A self-hostable bookmark-everything app (links, notes and images) with AI-based automatic tagging and full text search. For AI tagging it uses the Ollama integration.
MyMemo-Empower Your Mind with AI