#+TITLE: SingleFile
#+DATE: 2024-03-20
* Prologue
I use the [[https://en.wikipedia.org/wiki/Zettelkasten#Use_in_personal_knowledge_management][Zettelkasten]] methodology for personal knowledge
management, with [[https://www.orgroam.com/][Org-roam]] to implement it.

While the Internet rarely forgets anything completely, it can be hard
to find things again. Maybe someone published some information in a
GitHub gist, or hosts their whole blog there. Then GitHub suddenly
falls out of favour because it got bought by an evilcorp, and the
information disappears.

For these reasons I store three references: a direct link to the
information, a link to the [[https://web.archive.org/][Wayback Machine]] copy, and a local copy.
My Zettelkasten and all references, including the local copies, are
stored in git.

* SingleFile
[[https://github.com/gildas-lormeau/SingleFile][SingleFile]] is a Firefox[fn::Other browsers are supported as well; I
just think Firefox is the least evil one.] extension that saves a web page,
including images and CSS, into a single HTML file. It is perfect for
personal archival purposes. When run as a browser extension it saves the
page as it is currently rendered; for example, a cookie banner you have
already dismissed will not be part of the saved page.

The downside is that getting the backup into the Zettelkasten note is a
semi-manual process.

* SingleFile-CLI
There is also a [[https://github.com/gildas-lormeau/single-file-cli][CLI]] version; like all modern stuff, it runs in
Docker. Of course I have neither Docker nor its alternative, Podman, on
OpenBSD.

So I set up a Fedora VM, installed Podman, and run singlefile-cli there
via ssh. On the Fedora VM I use this wrapper:
#+begin_src sh
#! /bin/sh

set -e

# Fetch the page with the singlefile image running in a Podman container
content=$(podman run --privileged -u 0:0 singlefile "$@")

now=$(date --iso-8601=seconds -u)

# Extract the <title>, squeeze whitespace, keep the first line, replace
# every non-alphanumeric character with "_" and cut to 128 characters
title=$(printf '%s' "${content}" | /home/florian/.cargo/bin/htmlq --text title | \
            awk '{$1=$1};1' | \
            head -1 | tr -d '\n' | tr -c '[:alnum:]' '_' | cut -c -128)

fn="/tmp/${title}_${now}.html"

# Quoted, so newlines, whitespace and "*" in the HTML are kept verbatim
printf '%s\n' "${content}" > "${fn}"
echo "${fn}"
#+end_src

It fetches a copy of the website, uses [[https://github.com/mgdm/htmlq][htmlq]] to extract the title,
replaces special characters with underscores, and saves the page in
~/tmp~ under a name made of the title and a UTC timestamp. It then
prints the file name of the backup.
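
The title clean-up is the fiddliest part of the wrapper. As a minimal
sketch, assuming GNU coreutils and awk as on the Fedora VM, the same
pipeline can be wrapped in a function and fed a made-up title instead of
htmlq output, so it can be tried without Podman or htmlq. The function
name ~sanitize_title~ and the sample title are hypothetical.

#+begin_src sh
#! /bin/sh
# Sketch: the title clean-up from the wrapper, fed with a fixed string
# instead of htmlq output. sanitize_title is a hypothetical helper name.
sanitize_title() {
    printf '%s\n' "$1" |
        awk '{$1=$1};1' |        # trim and squeeze whitespace
        head -1 |                # keep only the first line
        tr -d '\n' |             # drop the newline
        tr -c '[:alnum:]' '_' |  # every other character becomes "_"
        cut -c -128              # limit the length to 128 characters
}

sanitize_title '  SingleFile: save a page!  '
#+end_src

Run as is, it prints ~SingleFile__save_a_page_~: the colon, the spaces
and the exclamation mark each become an underscore, and longer titles
are cut at 128 characters.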

On my laptop I have this script:
#+begin_src sh
#! /bin/sh

set -e

# Let the Fedora VM create the backup and print its path. ssh hands the
# arguments to the remote shell, so URLs containing & need extra quoting.
fn=$(ssh fedora bin/singlefile "$@")
bn=$(basename "${fn}")
# Copy the backup into the current working directory and open it
scp "fedora:${fn}" .
open "${bn}"
#+end_src
This makes the Fedora VM fetch a backup of the website, copies it into
the current working directory and finally opens it in the browser for
inspection.

* org-cliplink
We still do not have the links stored in the Zettelkasten note.

For that I use [[https://github.com/rexim/org-cliplink][org-cliplink]] to insert Org mode links from the
clipboard.

A bit of elisp massages the link text: titles of Wayback Machine pages
get "Archive: " in front (via ~org-cliplink-title-replacements~), and
titles of local files get "Local: ". It also rewrites the URL of the
local backup into a link relative to the Zettelkasten directory, so that
it works no matter where the git repo with the Zettelkasten is checked
out.
#+begin_src elisp
(use-package org-cliplink
  :config
  (defun custom-org-cliplink ()
    (interactive)
    (org-cliplink-insert-transformed-title
     (org-cliplink-clipboard-content)     ;take the URL from the CLIPBOARD
     (lambda (url title)
       (let* ((parsed-url (url-generic-parse-url url)) ;parse the url
              (turl
               (cond
                ;; if type is file make the url relative to zettelkasten
                ((string= (url-type parsed-url) "file")
                 (url-unhex-string (replace-regexp-in-string "\\(.+\\)\/zettelkasten\/\\(.+\\)" "file:\\2" url)))
                ;; otherwise keep the original url
                (t url)))
              (ttitle
               (cond
                ;; if type is file, add Local: to title
                ((string= (url-type parsed-url) "file")
                 (replace-regexp-in-string "\\(.+\\)" "Local: \\1" title))
                ;; otherwise keep the original title
                (t title))))
         ;; forward to the default org-cliplink transformer
         (org-cliplink-org-mode-link-transformer turl ttitle)))))
  :custom
  (org-cliplink-title-replacements
   '(("https://github.com/.+/?"
      ("\\(.*\\) · \\(?:Issue\\|Pull Request\\) #\\([0-9]+\\) · \\(.*\\) · GitHub" "\\3#\\2 \\1"))
     ("https://twitter.com/.+/status/[[:digit:]]+/?"
      (".+ on Twitter: \\(.+\\)" "\\1"))
     ("https://web.archive.org/.+/?"
      ("\\(.+\\)" "Archive: \\1"))))
  :bind (:map org-mode-map
              ("C-c C-S-L" . custom-org-cliplink)))
#+end_src
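
To see what the file-URL rewrite in ~custom-org-cliplink~ does, here is
a rough shell analogue that uses ~sed~ instead of Emacs regexps. It is
only an illustration under stated assumptions: ~rewrite_local_url~ is a
hypothetical helper, the example path under ~/home/florian/zettelkasten/~
is made up, and the percent-decoding done by ~url-unhex-string~ is left
out.

#+begin_src sh
#! /bin/sh
# Rough analogue of the elisp URL rewrite (hypothetical helper, not part
# of org-cliplink): drop everything up to the last "/zettelkasten/" and
# turn the rest into an Org "file:" link. No percent-decoding is done.
rewrite_local_url() {
    printf '%s\n' "$1" |
        sed 's|^.\{1,\}/zettelkasten/\(.\{1,\}\)$|file:\1|'
}

rewrite_local_url 'file:///home/florian/zettelkasten/SingleFile.html'
#+end_src

For the example path this prints ~file:SingleFile.html~. A ~file:~ URL
outside the ~zettelkasten~ directory comes back unchanged, just as
~replace-regexp-in-string~ returns its input when nothing matches.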
* Epilogue
I only recently added singlefile-cli to my workflow, and so far it is a
big improvement. Time will tell whether the backups end up cluttered
with banners, forcing me back to the semi-manual mode.