#+TITLE: SingleFile
#+DATE: 2024-03-20
* Prologue
I am using the [[https://en.wikipedia.org/wiki/Zettelkasten#Use_in_personal_knowledge_management][Zettelkasten]] methodology for personal knowledge
management, with [[https://www.orgroam.com/][Org-roam]] to implement it.

While the Internet does not forget things in general, it can be
difficult to find them again. Maybe someone published some
information in a GitHub gist or hosts their whole blog there.
Suddenly GitHub falls out of favour because it got bought by an
evilcorp, and the information disappears.

For those reasons I store a link to the information directly, a link
to the [[https://web.archive.org/][Wayback Machine]] and a local copy. My Zettelkasten and all
references, including the local copies, are stored in git.

* SingleFile
[[https://github.com/gildas-lormeau/SingleFile][SingleFile]] is a Firefox[fn::Other browsers are supported as well, I
just think Firefox is the least-evil one.] extension to save a website,
including images and CSS, into a single HTML file. It's perfect for
personal archival purposes. When run as a plugin it will save the page
as it is currently rendered. For example, dismissed cookie banners will
not be part of the saved page.

The downside is that it's a semi-manual process to get the backup
into the Zettelkasten note.

* SingleFile-CLI
There is also a [[https://github.com/gildas-lormeau/single-file-cli][CLI]] version; like all modern stuff it runs in
docker. Of course I do not have docker on OpenBSD, nor the alternative
podman.

So I set up a Fedora VM, installed podman and run singlefile-cli via
ssh. On the Fedora VM I use this wrapper:
#+begin_src sh
#! /bin/sh

set -e

# Capture the HTML that the container prints to stdout.
content=$(podman run --privileged -u 0:0 singlefile "$@")

now=$(date --iso-8601=seconds -u)
# First non-empty line of the <title> text, whitespace squeezed, everything
# but alphanumerics replaced by '_', capped at 128 characters.
title=$(printf '%s\n' "${content}" | /home/florian/.cargo/bin/htmlq --text title | \
            awk 'NF { $1 = $1; print; exit }' | \
            tr -d '\n' | tr -c '[:alnum:]' '_' | cut -c -128)

fn="/tmp/${title}_${now}.html"

# Quote the expansions so whitespace and '*' in the HTML survive untouched.
printf '%s\n' "${content}" > "${fn}"
printf '%s\n' "${fn}"
#+end_src

It fetches a copy of the website, uses [[https://github.com/mgdm/htmlq][htmlq]] to extract the title,
removes special characters and saves the page in ~/tmp~. It then spits
out the file-name of the backup.

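The title clean-up can be tried on its own, without podman or htmlq.
The following is a minimal sketch; the function name ~sanitize_title~
is made up for this example and is not part of the wrapper. It takes
the first non-empty line, squeezes whitespace, replaces everything
that is not alphanumeric with an underscore and caps the result at 128
characters.

#+begin_src sh
#! /bin/sh

# Hypothetical helper: turn a page title into a file-name-safe string.
sanitize_title() {
    printf '%s\n' "$1" | \
        awk 'NF { $1 = $1; print; exit }' | \
        tr -d '\n' | tr -c '[:alnum:]' '_' | cut -c -128
}

sanitize_title '  Hello, World!  '   # prints Hello__World_
#+end_src

Every replaced character becomes its own underscore, so punctuation
followed by a space shows up as a double underscore in the file name.
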
On my laptop I have this script:
#+begin_src sh
#! /bin/sh

set -e

# The wrapper on the VM prints the path of the backup it wrote.
fn=$(ssh fedora bin/singlefile "$@")
bn=$(basename "${fn}")
scp "fedora:${fn}" .
open "${bn}"
#+end_src
This makes the Fedora VM fetch a backup of the website, copies it
over and stores it in the current working directory. Finally
it opens it in the browser for inspection.

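One thing to keep in mind with the laptop script: ssh joins its
arguments into a single string that the remote shell parses again, so
a URL containing characters such as ~&~ may need an extra layer of
quoting. A minimal sketch, assuming a hypothetical helper ~quote_arg~
that is not part of the script above:

#+begin_src sh
#! /bin/sh

# Hypothetical helper: wrap one argument in single quotes so that the
# remote shell started by ssh treats it as a single word.
quote_arg() {
    # Each ' inside the argument becomes '\'' before wrapping.
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Possible use in the laptop script (not run here):
#   fn=$(ssh fedora "bin/singlefile $(quote_arg "$1")")
quote_arg 'https://example.com/?a=1&b=2'
#+end_src

This only covers a single URL argument; passing several arguments
would need one ~quote_arg~ call per argument.
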
* org-cliplink
We still do not have links stored in the Zettelkasten note.

For that I use [[https://github.com/rexim/org-cliplink][org-cliplink]] to insert Org mode links from the
clipboard.

A bit of elisp code massages the link text to add "Archive: " or
"Local: " in front of the title. It also rewrites the URL of the local
backup to be relative, so that it works correctly no matter where
the git repo with the Zettelkasten is checked out.

#+begin_src elisp
(use-package org-cliplink
  :config
  (defun custom-org-cliplink ()
    (interactive)
    (org-cliplink-insert-transformed-title
     (org-cliplink-clipboard-content)     ;take the URL from the CLIPBOARD
     (lambda (url title)
       (let* ((parsed-url (url-generic-parse-url url)) ;parse the url
              (turl
               (cond
                ;; if type is file make the url relative to zettelkasten
                ((string= (url-type parsed-url) "file")
                 (url-unhex-string (replace-regexp-in-string "\\(.+\\)\/zettelkasten\/\\(.+\\)" "file:\\2" url)))
                ;; otherwise keep the original url
                (t url)))
              (ttitle
               (cond
                ;; if type is file, add Local: to title
                ((string= (url-type parsed-url) "file")
                 (replace-regexp-in-string "\\(.+\\)" "Local: \\1" title))
                ;; otherwise keep the original title
                (t title))))
         ;; forward to the default org-cliplink transformer
         (org-cliplink-org-mode-link-transformer turl ttitle)))))
  :custom
  (org-cliplink-title-replacements
   '(("https://github.com/.+/?"
      ("\\(.*\\) · \\(?:Issue\\|Pull Request\\) #\\([0-9]+\\) · \\(.*\\) · GitHub" "\\3#\\2 \\1"))
     ("https://twitter.com/.+/status/[[:digit:]]+/?"
      (".+ on Twitter: \\(.+\\)" "\\1"))
     ("https://web.archive.org/.+/?"
      ("\\(.+\\)" "Archive: \\1"))))
  :bind (:map org-mode-map
              ("C-c C-S-L" . custom-org-cliplink)))
#+end_src

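To see what the rewrite of local URLs does, the same regular
expression can be tried outside of Emacs. This is only a sketch in
shell, with a made-up function name and a made-up example path; unlike
the elisp above it does not decode percent-escapes.

#+begin_src sh
#! /bin/sh

# Hypothetical helper mirroring the elisp regexp: drop everything up to
# and including "/zettelkasten/" from file: URLs, leave other URLs alone.
relativize_url() {
    case "$1" in
        file:*) printf '%s\n' "$1" | sed -E 's|.+/zettelkasten/(.+)|file:\1|' ;;
        *)      printf '%s\n' "$1" ;;
    esac
}

relativize_url 'file:///home/user/zettelkasten/references/example.html'
# prints file:references/example.html
#+end_src

Because the first ~.+~ is greedy, the cut happens at the last
~/zettelkasten/~ in the path, which is also how the elisp regexp
behaves.
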
* Epilogue
I only recently added singlefile-cli to my workflow; so far it is a
big improvement. Time will tell if the backups are cluttered with
banners and I have to go back to semi-manual mode.