defanor's notes

Distributed systems

2015-05-16T12:00:00Z

There are quite a few definitions of "distributed" and "decentralized" in use; in this note I'm using the following ones:

Centralized: Clients interacting with a single server (either physical or controlled by the same entity).
Decentralized: Clients interacting with multiple servers (controlled by different entities), which often build a federated network.
Distributed: Clients interacting with other clients directly, acting as servers themselves.

A "system" may also mean different things; here I focus on network protocols, on systems of network-connected independent actors.

Distributed systems are useful for various purposes, but the commonly considered and achievable niceties are:

No single point of failure.
No need for a central authority.
Potentially good software: more motivation to work on it if it doesn't mean putting time and effort into assisting unethical activities and wasting it once a service is discontinued.

These are mostly shared with federated systems, but take it further.

The common advantages of centralized systems over these seem to be search/discovery, often sort-of-free hosting for end users, greater UI uniformity in some cases, easier/faster introduction of new features.

Usable systems

Actually usable (reliably working, specified, having users and decent software) systems so far are usually federated/decentralized; those can, in principle, be quite close to distributed systems (simply by setting their servers on user machines). So, generally it seems more useful to focus on those if the intention is to get things done: SMTP (email, possibly with OpenPGP), NNTP (Usenet), XMPP (jabber), IRC, and HTTP (World Wide Web; possibly together with RSS or Atom, and their aggregators, and/or RDF) are relatively well-supported, standardized, and usable for various kinds of communication.

Sometimes even centralized but non-commercial projects and services are okay: OpenStreetMap, The Internet Archive, Wikimedia Foundation projects (Wikipedia, Wiktionary, Wikidata, Wikibooks, etc), arXiv, FLOSS projects, possibly LibGen and Sci-Hub (though they infringe copyright), possibly Libera.Chat (they had issues arising out of centralization, which is why it is not Freenode anymore, but it was also a good example of handling such issues well). As long as they are easy (and legal, and free) to fork and are not in a position to extort users, centralization can be fine. Conversely, there can be technically distributed systems effectively controlled by a single entity (e.g., a distributed PKI with a single root, or anything legally restricted). While this note is mostly about distributed network protocols, they are neither necessary nor sufficient for a community control over a system, but rather just may be a useful tool to achieve it.

Existing systems

There are quite a few of them; I am going to write mostly about those that work over the Internet. There's also the "Distributed computing architecture" Wikipedia category, including things like cluster computing, grid computing, etc.

Generic networks

Tor and I2P: both support "hidden services", on top of which many regular protocols can be used, but it is more about privacy (and a bit about routing) than about decentralisation: they provide NAT traversal, encryption, and static addresses. Tor documentation is relatively nice, and there are I2P docs. Tor provides a nice C client, I2P uses Java.

Mesh networks

Some mesh networks, like Telehash, provide routing as well, though advantages for decentralisation seem to be similar to those of Tor and I2P; just better in that they extend it beyond the existing networks, aiming to build more. Telehash documentation is also pretty nice and full of references.

Cjdns (or its name, at least) seems to be relatively well-known, but it relies on node.js. Netsukuku and B.A.T.M.A.N. are two more protocols the names of which are known.

One of the large Wi-Fi mesh networking projects is Freifunk, but apparently it's only widespread in DACH countries.

Those would be nice to get someday, but they would require quite a lot of users to function, and various government restrictions seem to complicate their usage (this varies from jurisdiction to jurisdiction and from year to year, but seems to be pretty bad in Russia in 2018, and even worse by 2023).

And then there are the ones working over the Internet, building overlay networks, usually with technologies similar to those used for VPNs (though yet again, in Russia by 2023 they seem to be about to start blocking protocols used for VPNs, with occasional outages/likely testing reported; since about that time, IPsec or WireGuard connections to the outside Internet are interrupted, based on DPI). Yggdrasil is like that. There is an overview of similar mesh networks: "Easily Accessing All Your Stuff with a Zero-Trust Mesh VPN".

IM and other social services

Tox implements its own network (DHT, onion routing, NAT traversal, etc), and has some documentation. Works, though not particularly easy to build, and toxic (apparently the primary implementation) ceases to work after a few days here, requiring a restart.
Rival Messenger and Bleep are based on Telehash and BitTorrent, respectively. Have not tried those.
RetroShare provides a bunch of features, but with a web-based UI, and I gave up on building it.
Matrix seems to be getting relatively popular, but uses HTTP APIs, the specification is not available without JS, there are SDKs (I wonder whether it's ever a useful thing to provide an SDK instead of a single documented library; usually it's just additional pain to work with), web-based clients, etc – seems to be pretty unpleasant overall, following poor practices. Though it's federated, not distributed; functionally it's similar to XMPP with a few XEPs included into the core. Apparently awkward security issues happen.
Ricochet reuses Tor network, its protocol is documented and doesn't seem to be bloated. Unfortunately, it's bundled with GUI, apparently there is no separate library, and it's in C++ anyway, which would make bindings harder if there was one. Probably it wouldn't be that hard to reimplement, or to extract the non-GUI code bits and make C bindings, to get a reusable library.
XMPP is nice and is supported relatively widely (with a choice of servers, clients, and libraries), but federated, rather than distributed, though the former may be converted into the latter. The XMPP Standards Foundation is prosecuted in Russia in 2025, for not complying with a law intended for information distribution systems, and some of the XMPP clients' websites (conversations.im, xabber.com) are blocked already.
Email: likewise, but using it in a distributed fashion wouldn't be interoperable with common deployments in most cases, and some software may assume a federated setting.
ActivityPub: federated, replaces OStatus, partially supported by Mastodon (which seems to be getting popular); used for both microblogging and private messaging. RDF-compatible (though awkward JSON-LD is used in Activity Streams), W3C Recommendation. Hence a good specification, and generally doesn't look too bad, but the specification doesn't include authentication and authorization as of now (January 2018), and the existing implementations seem to be all awkward: rather poor web UIs, languages such as JS. I finally gave Mastodon a try in 2023, as a user; not bad and generally works, but that "RDF compatibility" (as opposed to actually using RDF) shows: for instance, to add metadata even within a single instance, the Mastodon Glitch edition appends emojis to textual messages. I hear it is done that way to keep it compatible with the vanilla version. The primary web-based UI is pretty awkward and buggy. Another somewhat popular ActivityPub-based project is Lemmy, a federated link aggregator and forum.
Secure Scuttlebutt is akin to NNTP, RSS, or Atom feeds with signed posts, which include hashes of previous ones, but uses a gossip protocol, rather than a fixed address per feed. Perhaps it is more like a VCS repository with signed commits, where posts are only added. But apparently the primary client is in JS and buggy, and it does not seem to be actively developed (as of 2024), though there is the Tilde Friends client. The protocol itself is JSON-based (relatively awkward for implementations in some languages), while not quite human-readable/writable, relies on Ed25519 (no key agility).
Briar's Bramble protocol suite uses a custom binary data format, and custom cryptographic protocols, yet it piggybacks on Tor for Internet connections, and alternatively supports direct Bluetooth or Wi-Fi connections. Somewhat similar to Secure Scuttlebutt in incorporating sneakernet elements. Apparently will not help with private message relaying, but should work for that of news groups. Similar to people talking to each other. The messenger's website has been blocked in Russia since 2024.
Other social networking tools: the secushare's capability comparison of privacy-oriented distributed networking tools and the Wikipedia comparison of software and protocols for distributed social networking.
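The append-only, hash-linked feed structure mentioned for Secure Scuttlebutt above can be sketched roughly as follows. This is only an illustration of the idea, not the actual SSB message format: the helper names and the entry layout are made up for the example, and an HMAC with a shared key stands in for the Ed25519 signatures of the real protocol, since Python's standard library does not provide Ed25519.

```python
import hashlib
import hmac
import json


def entry_hash(entry):
    """Hash an entry's JSON encoding (keys sorted, to make it stable)."""
    encoded = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_tag(body, key):
    """Stand-in for a signature (assumption: HMAC instead of Ed25519)."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hmac.new(key, encoded, hashlib.sha256).hexdigest()


def append_post(feed, key, content):
    """Append a post that includes the hash of the previous one."""
    body = {
        "previous": entry_hash(feed[-1]) if feed else None,
        "sequence": len(feed) + 1,
        "content": content,
    }
    feed.append(dict(body, tag=make_tag(body, key)))


def verify_feed(feed, key):
    """Check every tag and every link to the preceding entry."""
    previous = None
    for number, entry in enumerate(feed, start=1):
        body = {k: v for k, v in entry.items() if k != "tag"}
        if not hmac.compare_digest(make_tag(body, key), entry["tag"]):
            return False
        if entry["previous"] != previous or entry["sequence"] != number:
            return False
        previous = entry_hash(entry)
    return True
```

Since each post commits to the hash of the previous one, editing or dropping an earlier post invalidates the rest of the feed, which is what makes it resemble a VCS history with signed commits.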

File sharing and websites

BitTorrent, of course, with Mainline DHT.
IPFS seems to be getting, well, maybe not popular, but mentioned here and there. There are papers and it is documented, but the implementations are currently in Go (reference), JS (incomplete), and Python (started). So, that would involve setting the whole Go thing to try, but the IPFS whitepaper looks nice. There is documentation, and a few separate parts (which can be and are isolated into libraries; though would be more helpful if they were actually reusable C libraries), but they still are a part of a single project, which is not small or simple. There's a growing number of projects using it, such as OrbitDB, and then distributed IMs like Berty (though these projects tend to continue the awkward theme of semi-broken websites, Go + JS, poor interoperability and documentation). Though later it was merged with a cryptocurrency.
Freenet is a distributed data store, apparently not very interactive. Or maybe it is; it's in Java, and I didn't try it myself.
ZeroNet: haven't tried it, and it's in Python, but apparently it's popular enough to at least mention. Apparently it doesn't care much about security (see a HN thread). There are other similar projects (e.g., Beaker Browser), which seem to market slightly disguised WWW as a new invention.
HTTP, rsync, Gopher, etc, possibly over Tor or a similar network to mitigate CGNAT. RSS and Atom can be quite useful there, along with their aggregators working as hubs (relays) for both distribution and discovery, and maybe with the help of RDF.
Gnutella: see below.
GNUnet: see below.
Dat protocol uses small public keys for addressing, and various discovery methods, somewhat similar to using regular file transfer protocols over Tor. The primary implementation is in JS, and the documentation suggests installing it with curl ... | bash. Apparently gets praised for its documentation, most of which is just awkward raster images.

Search

Web crawling

YaCy and a few more (some of which are dead by now) distributed search engines exist. I have only tried YaCy, and it works, though haven't managed to find its technical documentation – so it's not clear how it works.

Other information

These networks include search for files, but by their names, not content-addressable (so they can't be easily verified, which brings additional challenges).

Gnutella again: used for file sharing, with query-based search (an unstructured system, as opposed to DHT-based and content-addressable structured ones). Somewhat limited and hardly secure/reliable for search, but seemed to work in practice. The first version used query flooding, while gnutella2 uses a random walk.
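To illustrate the difference between the two search strategies, here is a toy model of both over an adjacency-list graph. It is a sketch under simplifying assumptions, not the actual Gnutella or gnutella2 wire protocols: the function names, the `has_item` predicate, and the message accounting are made up for the example.

```python
import random
from collections import deque


def flood_query(graph, start, has_item, ttl):
    """Query flooding with a hop limit; returns (hits, messages sent)."""
    seen = {start}
    queue = deque([(start, ttl)])
    hits, messages = set(), 0
    while queue:
        node, hops_left = queue.popleft()
        if has_item(node):
            hits.add(node)
        if hops_left == 0:
            continue
        for neighbour in graph[node]:
            messages += 1  # every forwarded copy costs a message
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops_left - 1))
    return hits, messages


def random_walk_query(graph, start, has_item, max_steps, rng=random):
    """A single random walker; returns (hit or None, messages sent)."""
    node, messages = start, 0
    while True:
        if has_item(node):
            return node, messages
        if messages == max_steps:
            return None, messages
        node = rng.choice(graph[node])
        messages += 1


# A ring of six nodes, with the wanted file on node 3.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(flood_query(ring, 0, lambda n: n == 3, ttl=3))
print(random_walk_query(ring, 0, lambda n: n == 3, max_steps=50))
```

Flooding reaches everything within the hop limit but its message count grows with the number of links, while a random walk sends one message per step and may simply miss the target.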

Cryptocurrencies

Plenty of those popped up recently. Bitcoin-like ones (usually with a proof of work and block chaining) look like quite a waste of resources (and perhaps a pyramid scheme) to me, though the idea itself is interesting. I was rather interested in "digital cash" payment systems before, but those didn't quite take off so far.

As of 2021, Bitcoin-like cryptocurrencies seem to be eating other distributed projects: many of those are merged with their custom cryptocurrencies, or occasionally piggyback on existing ones, but either way they become more complicated and commercialized. As of 2022, the "crypto" clipping seems to be associated more widely with cryptocurrencies and related technologies than with cryptography in general. But as of 2024, it seems that the hype wave is mostly over, with "AI" filling up all the hype slots.

General P2P networking tools

GNUnet

Not sure how to classify it, but here are some links: gnunet.org, GNUnet article in Wikipedia, "A Secure and Resilient Communication Infrastructure for Decentralized Networking Applications". Seems promising, but tricky to build, to figure how it all works, and to do anything with it now (a lack of documentation seems to be the primary issue, though probably there are others). Apparently it is also being blocked in Russia by 2024, at least the gnunet.org website is (via TSPU, it seems), which makes it yet harder to debug. Apparently it is easier to set up in a single-user mode, but none of the retrieved bootstrap peer addresses seem to be available. An up-to-date hostlist can be found (having to use some proxying to access lists.gnu.org from Russia, where it is blocked as well), and then bootstrapping works.

Taler and secushare (using PSYC) are getting built on top of it, but it's not clear how it's going, how abandoned or alive it is, etc. Their documentation also seems to be obsolete/outdated/abandoned/incomplete. Update (January 2018): apparently secushare prototype won't be released this year.

libp2p

libp2p apparently provides common primitives needed for peer-to-peer networking in the presence of NATs and other obstructions. At the time of writing there's no C API (so it's only usable from a few languages) and its website is quite broken. At the same time worldwide IPv6 adoption reaches more than 32%, so possibly NATs will disappear before the workarounds become usable.

General tools useful for P2P networking

Many networking-related tools can be used for peer-to-peer networking. socat(1) is among particularly flexible tools for relaying, which can be combined with many other Unix tools for ad hoc networking: openssl, gnutls-cli, and netcat for data encryption and transmission, sox, opusenc, rec, play, pw-record, pw-play, ffplay for audio capture, encoding, decoding, and playback.

Generic protocols

There are more or less generic network protocols that may be used, possibly together with Tor, to get working and secure peer-to-peer services.

SSH is quite nice and layered. Apparently its authentication is not designed for distributed systems (such as distributed IMs or file sharing), its connection layer looks rather bloated, and generally it's not particularly simple. Those are small bits of a large protocol, but they seem to make it not quite usable for peer-to-peer communication.

TLS may provide mutual authentication, and there are readily available tools to work with it.
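As an illustration of the mutual authentication mentioned above, here is a minimal sketch using Python's standard ssl module. The function name and its parameters are made up for the example, and disabling host name checks on the client side is an assumption that suits peers identified by pinned certificates rather than by DNS names; it is not a general recommendation.

```python
import ssl


def mutual_tls_context(server_side, certfile=None, keyfile=None, cafile=None):
    """Build a context that presents a certificate and demands one back."""
    if server_side:
        context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    else:
        context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
        # Assumption: peers are recognized by their certificates, not by
        # host names, so the host name check is turned off.
        context.check_hostname = False
    # Servers do not ask for client certificates by default; require one.
    context.verify_mode = ssl.CERT_REQUIRED
    if cafile is not None:
        # Certificates (or CAs) of the peers we are willing to talk to.
        context.load_verify_locations(cafile=cafile)
    if certfile is not None:
        # Our own certificate and private key, presented to the peer.
        context.load_cert_chain(certfile, keyfile)
    return context
```

With a small set of known peers, the file passed as cafile can simply contain their self-signed certificates, so that no certificate authority is involved.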

IPsec has uses similar to those of TLS, but is generally a better way to solve the same problems. Individual addresses (which IPv6 should bring) are needed to use it for P2P widely though. IPv6 gets adopted, but slowly. Once computers become individually addressable (again), and transport layer encryption is there by default, it may render plenty of the contemporary higher-level network protocols obsolete.

Pretty much every distributed IM tries to reinvent everything, and virtually none are satisfactory, but at least some of the problems are already solved separately: one can use dynamic DNS, Tor, or a VPN to obtain reachable addresses (even if the involved IP addresses change, and/or are behind NAT), and then use any basic/common communication protocol on top. Or even set a VM and rely on SSH access, communicating inside that system then.

Search, FOAF, and the rest of RDF

Some kind of a distributed search/directory may connect small peer-to-peer islands into a usable network. While it is hard to decide on an algorithm, lists of known and/or somewhat trusted nodes are common for both structured and unstructured networks, as well as for use of social graphs: if those would be provided by peers, a client may decide by itself which algorithm to apply. This reduces the task to just including known nodes into local directory entries, which can be shipped over any other protocols (e.g., HTTP, possibly over Tor).

Knowledge representation, which is needed for a generic directory structure, is tricky, but there is RDF (resource description framework) already. There is FOAF (friend of a friend ontology), specifically for describing persons, their relationships (including linking the persons they know), and other social things. A basic FOAF search engine should be fairly straightforward to set up: basically a triple store filled with FOAF data. See also: Semantic Web.
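A very rough sketch of "a triple store filled with FOAF data" may look like the following. It is an in-memory toy with wildcard matching rather than a real RDF store or SPARQL engine; the class, the helper function, and the example.org URIs are all made up for illustration, while the FOAF namespace and the foaf:knows property are the real ones.

```python
FOAF = "http://xmlns.com/foaf/0.1/"


class TripleStore:
    """A set of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Yield the triples matching a pattern; None is a wildcard."""
        for s, p, o in self.triples:
            if ((subject is None or s == subject)
                    and (predicate is None or p == predicate)
                    and (obj is None or o == obj)):
                yield s, p, o


def friends_of_friends(store, person):
    """People known by those whom the given person knows."""
    knows = FOAF + "knows"
    direct = {o for _, _, o in store.match(person, knows)}
    indirect = {o for friend in direct
                for _, _, o in store.match(friend, knows)}
    return indirect - direct - {person}


store = TripleStore()
store.add("http://example.org/alice", FOAF + "knows", "http://example.org/bob")
store.add("http://example.org/bob", FOAF + "knows", "http://example.org/carol")
print(friends_of_friends(store, "http://example.org/alice"))
```

A usable search engine would additionally need to fetch and parse FOAF documents published by peers, and to index the triples instead of scanning all of them on each query.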

Hubs and addressing

As mentioned in the "usable systems" section above, the systems relying on peering seem to fare better in practice: they are still distributed, on the level of servers (or hubs generally), which then take care of tricky parts on behalf of the users. This is also how postal systems, telephone ones, and the Internet itself are organized. And some of those federated systems can be quite close to distributed ones: for instance, it is easy and viable to set an XMPP or a WWW server on one's personal machine, although normally addressing is centralized in those cases.

The Magnet URI scheme combines content addressing, which is not centralized, with a list of addresses to bootstrap from. Perhaps one can similarly use public keys, with claims signed by those, which would be very similar to certificates and key servers. No nice and human-readable addresses that way, as usually is the case with distributed addressing, but this creates a decentralized identity, decoupled from any particular nodes.
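For illustration, here is how the two parts of a magnet link can be pulled apart with Python's standard library. The info hash and the tracker address in the example are made up, and only the common xt (content identifier), dn (display name), and tr (tracker address) parameters are handled; treat it as a sketch rather than a complete parser.

```python
from urllib.parse import parse_qs, urlsplit

# A made-up example: neither the hash nor the tracker is real.
EXAMPLE = ("magnet:?xt=urn:btih:0123456789abcdef0123456789abcdef01234567"
           "&dn=example.txt"
           "&tr=udp%3A%2F%2Ftracker.example.org%3A6969")


def parse_magnet(uri):
    """Split a magnet URI into content identifiers and bootstrap addresses."""
    parts = urlsplit(uri)
    if parts.scheme != "magnet":
        raise ValueError("not a magnet URI")
    params = parse_qs(parts.query)
    return {
        # Content addressing: independent of any particular node.
        "content_ids": params.get("xt", []),
        # Optional human-readable name.
        "name": params.get("dn", [None])[0],
        # Addresses to bootstrap from (trackers here).
        "bootstrap": params.get("tr", []),
    }


print(parse_magnet(EXAMPLE))
```

An address based on a public key could follow the same shape, with a key fingerprint in place of the content hash.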

There is the similar concept of self-sovereign identity, with decentralized identifiers (DIDs) as a fairly generic framework. Similarly to Activity Streams, they are based on the awkward (but RDF-compatible) JSON-LD. See DID Methods for more specific specifications, though many of those are blockchain-based (probably because DID appeared when those were particularly hyped/popular).

GNUnet's GNS (RFC 9498) has a DID method defined. It combines local "pet names" (aliases) and memorable labels (subdomains), with public keys as unique zones (identifiers). For DID identifiers, they simply use GNS zone keys, and store DID documents as records of type DID_DOCUMENT under the "apex label". Zone delegation is similar to that of regular DNS. Both GNS and R5N (GNUnet's DHT) look fine. But TLSA records don't seem to work with its dns2gns, and even if they did, they would not be trusted without DNSSEC, while CAs do not support GNS. So the software would have to support GNS explicitly, at which point it could as well use GNUnet's CADET instead of TLS. But the main GNUnet implementation is under AGPL, which is not likely to help a wide adoption via embedding into existing software.

Another effort to organize name lookups not dependent on ICANN is OpenNIC, but there is an alternative DNSSEC hierarchy, including the keys at root, which breaks usual validation for ICANN domains. And it is still a centralized system. Maybe memorizable and human-readable addresses are not that important anyway: it seems that people rarely remember those, do not operate those directly (using non-unique nicknames instead), and happily use phone numbers, sometimes even preferring those over memorizable addresses.

But back to more practical (readily usable) systems, OpenPGP certificates actually are quite similar to Magnet links, in that they ship a public key, along with one or more identities, which usually are email addresses, and those can be retrieved by various means (DANE, WKD, various key servers, manual exchange, etc). I think it keeps being "pretty good", for many use cases.

Weather data

Except for common messaging and file sharing, one of the distributed (or at least federated) system applications I keep considering is weather data sharing: it'd be useful, and it's quite different from those other applications.

Weather data is commonly of interest to people, and it's right out there, not encumbered by patents or copyright laws, just has to be measured and distributed. But commercial organizations working on that try to extract some profit, so they don't simply share that data with anyone for free. There are state agencies too, paid out of taxes, but at least in Russia apparently you can't easily get weather data out of it either -- only a lot of bureaucracy, and even if it was possible, there are many awkward custom formats and ways to access the data, which won't make a reliable system. People sharing this data with each other would solve that problem.

Though there is at least one nice exception: the Norwegian Meteorological Institute shares weather data freely and for the whole globe. While Germany has Deutscher Wetterdienst: API, and the US has weather.gov. Also open-meteo.com appeared recently.

The challenges/requirements also differ from those with messaging or file sharing, since there's a lot of data regularly updated by many people, and potentially being requested many times, but confidentiality isn't needed. There already are protocols somewhat suitable for that: NNTP (which is occasionally used for weather broadcasts, just in a free form), DNS, and IRC explicitly aim at relaying; SMTP (with mailing lists) and XMPP (with pubsub) may be suitable too, possibly with ad hoc relaying.

For reference, as of 2022 there are about 1200 cities with a population of more than 500 thousand people; individual hourly measurements from each of those would constitute a message per 3 seconds. Wouldn't harm to have more than one weather station per city, to cover smaller cities, and so on, but the order seems to be manageable even with modest resources and without much caching or relaying, assuming that there are not too many clients receiving all the data just as it arrives.
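The estimate above is simple arithmetic, spelled out here for playing with other numbers. The function names are made up, and the figure of ten stations per city is just an arbitrary assumption for the second example.

```python
def seconds_between_messages(stations, reports_per_hour=1):
    """Average interval between messages across the whole network."""
    return 3600 / (stations * reports_per_hour)


def messages_per_day(stations, reports_per_hour=1):
    """Total number of messages per day across the whole network."""
    return stations * reports_per_hour * 24


# One hourly report from each of about 1200 large cities.
print(seconds_between_messages(1200))   # 3.0
print(messages_per_day(1200))           # 28800

# Assumption: ten stations per city instead of one.
print(messages_per_day(1200 * 10))      # 288000
```

Even with ten times as many stations, that is a few messages per second on average, which a single modest server can relay.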

The links/peering can be set manually, and/or data can be signed (DNSSEC, OpenPGP, etc) and verified by end users with a PKI/WOT; the former may just be simpler, and appears to work in practice.

Collaboration/coordination/organization is likely to be tricky, though possible: plenty of people contribute their computing resources to BOINC projects, OONI, file sharing networks, and so on. But weather collection is different in requiring special equipment (at least a temperature sensor) being set outside, complicating contribution.

Post-quantum cryptography

Many of the protocols mentioned here rely on asymmetric cryptography, which is particularly vulnerable to attacks by a quantum computer, and it seems that at this rate we may have usable quantum computers before widely used distributed systems. Use of symmetric cryptography, or at least cryptographic agility of the protocols, is needed to mitigate that.

As a side note, it seems to me that asymmetric cryptography is often used where symmetric cryptography would fit better: even for messaging confidentiality, there are multiple competing standards using asymmetric cryptography of varied complexity and awkward failure modes, most of them requiring keys to be verified over a safe channel (usually in person) to actually be reliable, at which point the parties can as well establish a secret shared key. Meanwhile, I am not aware of any standard for simple PSK-based messaging. Perhaps OpenPGP comes close, as a generic format supporting symmetric encryption, though anything else capable of symmetric encryption would be similar, and still with no special support in email or IM clients.

Beyond technologies

Primarily technologies are covered here, but non-technical means may be quite helpful as well. Social skills and connections may be more useful to stay connected, and to actually engage in social activities. And a decent government is supposed to help people, rather than to be a threat actor, both online and offline. Throw in good ISPs, and a few centralized systems maintained by well-meaning and competent people, and one wouldn't even need any channel encryption for most tasks.

People don't quite work that way though, with governments apparently trying to turn into autocracies, any non-awful ISPs being acquired by awful ones, people in general being prone to mischief, and some of them engaging in crime, so some technical measures are needed, but some social and organizational help is important as well.

Additionally, the combination of social connections and relatively basic technologies allows building friend-to-friend networks, reducing network abuse.

Yet another approach to consider is the focus on a more delayed communication, through years or centuries, via books and similar larger works: near-real-time communication can be blocked or otherwise disrupted relatively easily, but if the delays are already longer than the regimes that impose network blocking, or than transient network issues, such communication is unaffected. Related notes: personal data storage.

Users

Distributed systems, particularly when used for social activities, require users – so that there would be somebody to send messages to in case of an IM. It is quite a problem, since even by sticking to federated protocols it is easy to lose or decrease contact with people.

People in general are capable of dealing with even more complicated and less sensible systems, as digital bureaucracies demonstrate, but apparently not motivated enough. I am somewhat interested and motivated myself, yet occasionally after looking at software with many dependencies, which reinvents many parts, and generally goes against what I view as good practices, I do not feel motivated enough to try it.

Search in particular is tricky in such systems, though usually some form of communication with strangers and self-organization (e.g., via multi-user chats, web pages) is possible, so that people can find groups with shared interests. Perhaps being sociable is easier and more useful than technical solutions there, too.

Politics

As observed and stated by Aristotle ("Politics", book V, chapter XI), Niccolo Machiavelli ("The Prince"), Michel Foucault ("Discipline and Punish"), James C. Scott ("Seeing Like a State"), touched on by Benedict Anderson ("Imagined Communities"), and likely many others, surveillance is an important tool for control (including tyranny, but not only that), while orderly, simplified, measured, uniform, predictable structures and societies are legible, pliable, and manipulable. Distributed systems may help to limit or slow down the spread of unjust power (of either governments or commercial companies, as two common examples), but using this perspective, one may additionally consider other means for the same purpose: diverse, custom, and obscure protocols, varied and changing means of communication, heterogeneous networks, including overlay networks and steganography. Anything complicating mapping and analysis, turning the infrastructure into jungles instead of cities.

Meantime, it is also important to advance towards a more just society, which is generally desirable, and any network protocols will be irrelevant if an oppressive government ends up shutting down the Internet access or creating other more pressing issues for its citizens. John Rawls's "A Theory of Justice" is a book I enjoyed on the topic, which also happens to remind me of distributed network protocol designs.

Debian 11 (to 13) workstation

2021-08-17T18:00:00Z

These are my notes on setting and maintaining a desktop/workstation system, a successor to the older CentOS 7 workstation, to be used--among other things--with the private server setup and simpler server setup.

Installation

My goals were a working setup, along with an old system, simple and close to the standard one, and with encrypted /home (see also: personal data storage). To avoid possible confusion during installation or when some repairs are needed, I keep a sheet of paper with partitions listed on it.

I went for Unofficial non-free images including firmware packages, since I need GNU documentation and the Nvidia proprietary driver anyway (unnecessary as of Debian 12, since proprietary firmware is included into official images, and that Nvidia card is not supported anymore), and it is more suitable for a rescue USB stick. Picked a live Xfce image, to be able to poke it briefly (and ensure that it works fine with the hardware) before installation, as well as for possible later use as a rescue system. Though live images come with a drawback of installing live-task-* packages, including localization ones for all the supported languages, so you end up with hundreds of additional and unused packages to upgrade regularly; netinst produces a cleaner system, but they can also be removed manually afterwards. Xfce is not as bloated and broken as GNOME and KDE, but not as half-baked and broken as most of the others. Apparently MATE and Cinnamon aim for a similar level of complexity, and I hear good things about those, too. I downloaded the image via BitTorrent, and as the Installation Guide suggests, did the equivalent of cp debian.iso /dev/sdX && sync.

There is a graphical installer available from the live system itself, which is handy for looking up documentation on the web while installing, but its functionality differs from that of the regular installer: there is no option to make an EFI system partition (ESP) explicitly, so I rebooted and used the regular installer. Although while installing Debian on another machine a bit later, I noticed that it would handle fine a FAT32 partition mounted into /boot/efi, without requiring to mark it explicitly as ESP.

As usual, I wanted to keep the old system usable and independent, so I have set this one on a separate disk, with a separate ESP, which I had to add (about 500 MB in size); the installer presented a warning about possibly making other systems hard to boot into if EFI is forced, but I've installed it on a separate disk (and adjusted UEFI boot priorities accordingly), so it was fine.

I used btrfs for a while, but decided to go with ext4 this time, since I use btrfs's advanced features less and less, while a simpler filesystem may be more reliable. Decided to minimize dealing with partitioning in the installer, and just made a single 500 GB partition for everything (not counting ESP, and while having 1.5 TB unpartitioned on the disk). No swap partition either, since in my experience it's not helpful and only freezes the system when something goes wrong. Didn't choose a network mirror to download new packages either, so the installation went quickly and smoothly.

While the en_US.UTF-8 locale is very common, C.UTF-8 may be better to set at once, since it has a 24-hour time format, sensible string sorting, and DBMSes (particularly PostgreSQL) are more portable when set with it, not running into collation version mismatches on replication between databases hosted on different operating systems. This is simply adjusted in /etc/default/locale.

Initial setup

As with CentOS about 7 years prior to this setup, apparently the nouveau driver was causing the system to freeze, so I installed the NVIDIA Proprietary Driver.

Then I've added my user into the sudo group, have set the keyboard layout to colemak with sudo dpkg-reconfigure keyboard-configuration (since the installer doesn't provide that option), have set it in Xfce's settings to use the system layout (actually in a couple of places, not sure why there are so many). While at it, removed the useless bottom panel (application launcher), have set a dark theme, nicer icons, disabled icons on the desktop.

As with servers, and perhaps more importantly than with those, decent and varied nameservers should be set. In this case /etc/resolv.conf mentions that it's generated by NetworkManager (which is rather awkward and unnecessary, and an example of the little bloat that task-xfce-desktop pulls in), so one can adjust nameservers with nm-connection-editor.

Then I've set the previously mentioned encrypted /home (this method is a bit verbose, since I've checked that things work as intended):

sudo fdisk /dev/sda
# created another 500 GB partition for /home, sda3
sudo apt install cryptsetup
sudo cryptsetup luksFormat /dev/sda3
sudo cryptsetup luksOpen /dev/sda3 enchome
sudo mkfs.ext4 -L home /dev/mapper/enchome
sudo cryptsetup close enchome
sudo blkid | grep sda3
sudo -e /etc/crypttab
# added the following:
# enchome		UUID=PARTITION_UUID_HERE none luks
sudo -e /etc/fstab
# added the following:
# /dev/mapper/enchome   /mnt/home          ext4    defaults        0       2

Then rebooted to ensure that /mnt/home mounts fine, moved the files from /home there (with cp -a), renamed /home, have set fstab to mount it into /home. Rebooted again, checked again that everything is fine, and removed the old /home.

One may also mount /tmp into memory, reducing the data leaking to the unencrypted root filesystem, slightly speeding up some tasks, and reducing disk usage; it works for me and I like it, but there is plenty of criticism of that, and there are possible issues with it:

tmpfs           /tmp            tmpfs   size=1g,nosuid      0       0

Moved/imported my SSH and GPG keys, ~/.authinfo, some other files.

I had to remap the "menu" key (keycode 135) to left alt, which is always awkward and different; in Xfce I had to enter the GUI settings, then "session and startup", and add the xmodmap -e "keycode 135 = Alt_L" command there. Also had to unmap C-M-f to be able to use it in Emacs, in "settings" - "keyboard" - "application shortcuts".

Xfce's default key bindings for basic tiling functionality target a numpad, which I do not have, but those can be adjusted in "settings" - "window manager" - "keyboard".

To disable GnuPG's annoying requirement to use non-alpha characters in a passphrase (which is contrary to NIST SP 800-63B, and complains about passwords in the style of XKCD #936, such as those generated with xkcdpass), echo 'min-passphrase-nonalpha 0' >> ~/.gnupg/gpg-agent.conf.

More software: sudo apt install emacs emacs-common-non-dfsg telnet vlc tor mu4e isync rsync xsltproc clementine git elpa-magit elpa-haskell-mode cabal-install lynx whois nmap ncat dnsutils knot-dnsutils tmux fbreader inkscape blender godot3 gimp darktable lmms musescore texlive texlive-plain-generic auctex texlive-latex-extra texlive-science python3-sympy octave octave-symbolic libxml2-utils jmtpfs xkcdpass, and better-defaults, mu4e-alert, and cdlatex via Emacs's package manager (since they weren't in the system repositories). Generally it's a good idea to stick to a single package manager, since then you shouldn't run into version mismatches. update-alternatives --config editor to set vim as the default editor (running a new emacs instance may be a bit slow for quick sudo -e edits, emacsclient won't always work, setting a small emacs clone just for that seems excessive, and the default nano is just awkward, so vim is an okay option; though perhaps one can also set emacs -Q -nw). Over time a bunch of other things were added, including mpd (running as a user service) and mpc, strongSwan, likely more development tools.

Then I set xterm and Emacs themes (.Xresources, Elisp), from my dotfiles repository.

By 2022, I had to start using Tor bridges (since Tor is being blocked around here, and Internet connectivity is crippled in general, with Tor helping to fix some of it): install obfs4proxy, then append to /etc/tor/torrc:

UseBridges 1
ClientTransportPlugin obfs4 exec /usr/bin/obfs4proxy managed

And bridge records received from bridges.torproject.org or by other means, prefixed with "Bridge" (Bridge obfs4 ...). Though by 2024, many of those are blocked.

Configured Firefox: Sans Serif font, disallowed pages to choose their own fonts, increasing monospace font size to be the same as others (16), setting a minimal font size equal to those, "wp" keyword for Wikipedia search and "wt" for Wiktionary search, installing uBlock Origin (with "annoyance" lists additionally enabled) to cut out junk, NoScript to cut out more junk, FoxyProxy to use Tor for websites blacklisted around here and the ones I don't want to track me, HTTPS Everywhere to mitigate local data retention practices (superseded by Firefox's built-in HTTPS-Only Mode, which should be enabled in settings), Stylus to set a global dark theme for comfortable browsing when it is dark around.

Configured isync and Emacs, later installed rexmpp's xmpp.el. Attempted a minimal Emacs configuration this time (though most likely it'll grow), so used the built-in rcirc (with rcirc-track-minor-mode and just setting rcirc-server-alist), not much of mu4e configuration. Something like this:

(require 'package)
(add-to-list 'package-archives '("melpa" . "https://melpa.org/packages/") t)
(package-initialize)

(require 'better-defaults)
(global-set-key [mode-line mouse-4] 'previous-buffer)
(global-set-key [mode-line mouse-5] 'next-buffer)

;; https://github.com/defanor/cyrillic-colemak
(require 'cyrillic-colemak)
(add-to-list 'custom-theme-load-path "~/.emacs.d/elisp/")
(load-theme 'blueish t)

(setq org-preview-latex-default-process 'dvisvgm
      org-babel-python-command "python3"
      org-src-preserve-indentation t)
(with-eval-after-load 'org
  (plist-put org-format-latex-options :scale 1.5)
  (require 'ob-python))

(rcirc-track-minor-mode t)
(setq rcirc-buffer-maximum-lines 2000
      rcirc-server-alist
      '(("irc.libera.chat" :port 6697 :encryption tls
         :user-name "defanor" :channels ("#emacs")))
      rcirc-authinfo
      '(("libera.chat" sasl "defanor" "password-here")))

(require 'haskell-interactive-mode)
(require 'haskell-process)
(add-hook 'haskell-mode-hook 'interactive-haskell-mode)
(add-hook 'haskell-mode-hook 'haskell-decl-scan-mode)

(require 'html-wysiwyg)
(add-hook 'html-mode-hook 'html-wysiwyg-mode)

(add-hook 'after-init-hook #'mu4e-alert-enable-mode-line-display)
(setq mail-user-agent 'mu4e-user-agent
      read-mail-command 'mu4e)
(with-eval-after-load "mu4e"
  (require 'smtpmail)
  (setq mml-secure-openpgp-encrypt-to-self t)
  (defun suppress-messages (old-fun &rest args)
    (cl-flet ((silence (&rest args1) (ignore)))
      (advice-add 'message :around #'silence)
      (unwind-protect
          (apply old-fun args)
        (advice-remove 'message #'silence))))
  (advice-add 'mu4e-update-mail-and-index :around #'suppress-messages)
  (advice-add 'mu4e-index-message :around #'suppress-messages)
  (advice-add 'progress-reporter-done :around #'suppress-messages)
  (setq mu4e-change-filenames-when-moving t)
(add-to-list
   'mu4e-contexts
   (make-mu4e-context
    :name "thunix"
    :enter-func (lambda ()
                  (mu4e-message "Switch to the thunix IMAP context")
                  ;; (mu4e~request-contacts)
                  )
    :leave-func (lambda () (mu4e-clear-caches))
    :match-func (lambda (msg)
                  (when msg
                    (mu4e-message-contact-field-matches
                     msg
                     :to "defanor@thunix.net")))
    :vars '( (user-mail-address            . "defanor@thunix.net")
             (user-full-name               . "defanor")
             (smtpmail-default-smtp-server . "thunix.net")
             (smtpmail-local-domain        . "thunix.net")
             (smtpmail-smtp-user           . "defanor")
             (smtpmail-smtp-server         . "thunix.net")
             (smtpmail-stream-type         . starttls)
             (smtpmail-smtp-service        . 25)
             (message-send-mail-function   . message-smtpmail-send-it)
             (mu4e-get-mail-command        . "mbsync -q thunix")
             (mu4e-update-interval         . 300)
             (mu4e-view-show-addresses     . t)
             (mu4e-maildir                 . "~/Maildir/thunix/")
             (mu4e-mu-home                 . "~/.mu/thunix")
             (mu4e-user-mail-address-list  . ("defanor@thunix.net"))
             )))
;; more contexts here
)

And .mbsyncrc records like this:

IMAPAccount thunix
Host thunix.net
Port 993
User defanor
SSLType IMAPS
Pass "password-here"
AuthMechs *

IMAPStore thunix-remote
Account thunix

MaildirStore thunix-local
Path ~/Maildir/thunix/
Inbox ~/Maildir/thunix/inbox/

Channel thunix
Far :thunix-remote:
Near :thunix-local:
Patterns * !drafts
Create Both
Remove Both
Expunge Both
SyncState *

Then mu stores can be initialized with commands like mu init --muhome=~/.mu/thunix --maildir=~/Maildir/thunix --my-address=defanor@thunix.net.

This was a sufficient setup to listen to a radio (vlc 'http://s3.radionetz.de/1a-rock.mp3'; as of 2025-10-27 and 2026-01-14, that is blocked here, along with many CDNs and hosting companies, some of the alternatives are vlc 'http://113fm.cdnstream1.com/1740_128', vlc 'https://s8.yesstreaming.net:7099/RblLgn', see dir.xiph.org for other online radios), local music collection (which I keep on a separate partition, so just mounted it via fstab into the same path as before, and the playlist also stored on it contained correct paths), communicate (IRC, XMPP, email), do Haskell programming, browse WWW relatively comfortably, play Discworld MUD over telnet, and publish these notes. At that point I've adjusted dwproxy to be able to build it using only dependencies from the system repositories (for related rants and musings, see the notes on software packaging and deployment and everyday programming in Haskell), and built a few work projects: since it's Cabal 3 now, had to set cabal.project in order to use internal libraries, and made some other minor adjustments to handle newer versions of dependencies. C projects (rexmpp in particular) also required minor adjustments to handle newer versions of the compiler and libraries, but fairly straightforward.

Adjustments

Realtime Policy and Watchdog Daemon (rtkit) can be quite spammy in the logs with its debug messages, but that can be fixed by overriding its systemd service (sudo systemctl edit rtkit-daemon.service, followed by sudo systemctl daemon-reload and sudo systemctl restart rtkit-daemon.service to apply it) with the following:

[Service]
LogLevelMax=info

Update to Debian 12

Following the instructions (Chapter 4. Upgrades from Debian 11 (bullseye)), I executed apt full-upgrade to find out that my graphics card (GTX 660) is not supported by the NVIDIA proprietary driver anymore. Chose to not install the new nvidia-driver, but that interrupted the process, so had to apt --fix-broken install, and then apt full-upgrade again. Afterwards removed nvidia-driver, chose mesa-diverted in update-glx --config glx in order to de-blacklist nouveau drivers, rebooted, the system only worked for some minutes before freezing, rendering it unusable. Fortunately I have integrated graphics here (Xeon E3-1275 v2 on ASUS P8C WS), which I picked precisely because this sort of thing keeps happening; took the graphics card out, connected the display to the motherboard's DVI output. Apparently I disconnected the system disk while taking the graphics card out, so failed to boot; then reconnected it, and saw it via UEFI, but failed to boot still, with different priorities (possibly messed up the UEFI boot settings while poking them without the disk connected properly). Managed to boot into the system by booting grub from a live USB stick, then pointing it to the system's grub.cfg using grub shell's configfile command. Tried to fix it with efibootmgr, that did not work, but it worked to just do grub-install and update-grub, leading to a working system into which I can boot directly, albeit without a graphics card. See GrubEFIReinstall for more options.

Additionally, some texlive packages failed to update, and some fcitx5 ones were kept back.

Afterwards I did apt autoremove, which removed telnet, so had to apt install telnet again.

mu4e broke as well: had to update mu4e-alert via Emacs, since it came from melpa, but then it kept failing with "Mu server process ended with exit code 1". Dug the approximate command out of the sources (/usr/bin/mu server --debug --muhome=~/.mu/thunix), executed it manually, saw the error message: "error: expected schema-version 465, but got 451; cannot auto-upgrade; please use 'mu init'", "Please (re)initialize mu with 'mu init' see mu-init(1) for details". Did mv ~/.mu/ ~/.mu-old/, then mu init --muhome=~/.mu/thunix --maildir=~/Maildir/thunix --my-address=defanor@thunix.net (and similar ones, for other mailboxes), and then it worked. Like many other programs, mbsync deprecated "master/slave" terminology, introducing its unique alternative: "far/near".

Had to M-x customize-group RET ansi-colors RET, since ansi-color-names-vector became obsolete.

I had an unused PostgreSQL 13 (used primarily for local testing), and PostgreSQL 15 was installed by the system upgrade, so I just cleaned up the old version: sudo pg_dropcluster --stop 13 main, sudo apt remove postgresql-13 postgresql-client-13.

Then I was left with a bunch of other "installed,local" packages (apt list '?narrow(?installed, ?not(?origin(Debian)))'), so cleaned some of those up, after checking that they do not seem to be necessary: sudo apt remove haskell-platform gcc-10 gcc-9-base gcc-10-base clang-11 python-numpy-doc openjdk-11-jre openjdk-11-jdk openjdk-11-jre-headless openjdk-11-jdk-headless libx264-160 libx265-192 libwebp6 libvpx6 libswresample3 libssl1.1 libsepol1 firmware-intelwimax linux-image-5.10.0-8-amd64 linux-image-5.10.0-23-amd64 iukrainian libffi7 libbpf0 libprocps8.

Had to use a workaround for the FBReader's hyphenation-after-each-word bug.

Update to Debian 13

Similarly to the previous update, following the Debian 13 release notes chapter on Upgrades from Debian 12 (bookworm), I upgraded it to 13, which went mostly smoothly, but slowly, taking a couple of hours (with an HDD and more than 3000 packages to upgrade or install, even though I tried to clean them up a little, and generally try to avoid installing unnecessary ones).

After the upgrade and reboot, I ran sudo apt autoremove, and cleaned up a little more of the leftover NVIDIA packages, which I noticed that I still have, and which I picked with a combination of apt search and aptitude why: sudo apt remove xserver-xorg-video-nvidia nvidia-vdpau-driver nvidia-kernel-dkms, followed by another sudo apt autoremove.

mu4e once again required re-running mu init as described above. And as before, apt list '?narrow(?installed, ?not(?origin(Debian)))' listed a bunch of dated packages, some of which I cleaned out manually: sudo apt remove janus clang-14 freerdp2-x11 openjdk-17-{jdk,jre}{,-headless} postgresql-15 postgresql-client-15.

Then I noticed that debian.map.fastlydns.net (199.232.174.132) is blocked here, so had to replace deb.debian.org with an unblocked (local) mirror in /etc/apt/sources.list.

I had magit installed from melpa in Emacs, in addition to the one installed from system repositories (IIRC I had to install it to keep up with git), but after the update it ceased to work, with odd "transient" library issues. I tried to switch back to magit from system repositories by removing that from melpa, and was able to remove melpa magit itself, but not its dependencies, since package.el kept seeing those as dependencies (despite there being other versions available). So I had to remove ~/.emacs.d/elpa/{magit-section,transient}*, as well as ~/.emacs.d/eln-cache/30.1-c7a97098/{transient,magit-section,magit}-*, restart Emacs, and then it worked fine.

Following this update, I have also upgraded from bird (version 1.*, which is no longer in repositories, but not automatically replaced by a newer version) to bird3, which required (in my basic configuration, for use with a VPN) to wrap import and export directives within protocol blocks into ipv4 and ipv6 blocks, as shown in examples in /etc/bird/bird.conf itself.

Servers

It is handy to host servers locally, particularly for communication: they are always available from the primary system then, the latency is reduced, regular TLS allows for peer-to-peer connections. As a downside, issues with the primary system also lead to downtime of those.

XMPP server

Eventually I decided that having a properly configured XMPP server locally is useful as a backup, for lower-latency calls, and to decrease load on remote servers. Having just an A record pointing to my static IP address (a free dyndns service in this case, to avoid dependencies on domain names at once), and port forwarding configured on the router for ports 80, 5222, 5269, 5281, 3478, 49152-49155, I have set nginx and uacme to obtain an X.509 certificate for TLS, configured nftables to decrease spam in the logs (only accepting connections on port 80 when renewing a certificate), then configured Prosody and coturn. sudo apt install nginx uacme nftables prosody coturn. My /etc/nftables.conf, slightly abridged to focus on relevant parts:

#!/usr/sbin/nft -f

flush ruleset

table inet filter {
  set not-clients {
    type ipv4_addr
    flags interval
    elements = { 1.0.0.0/8 }
  }
  set blocks {
    type ipv4_addr
    flags interval
    elements = { 1.1.1.1 }
  }
  set open-ports-s2s {
    type inet_service
    flags interval
    elements = { 5269 }
  }
  set open-ports-c2s {
    type inet_service
    flags interval
    elements = { 5222, 5281, 3478, 49152-49155 }
  }
  chain input {
    type filter hook input priority 0; policy drop;

    # Mitigate TCP reset attacks performed by the ISP.
    ip saddr @blocks tcp sport 443 tcp flags rst drop;

    # Allow traffic from established and related packets.
    ct state established,related accept

    # Allow loopback traffic.
    iifname lo accept

    # Allow incoming TCP and UDP packets on @open-ports-s2s.
    tcp dport @open-ports-s2s accept;
    udp dport @open-ports-s2s accept;

    # Drop connections from spammy addresses.
    ip saddr @not-clients drop;

    # Allow incoming TCP and UDP packets on @open-ports-c2s.
    tcp dport @open-ports-c2s accept;
    udp dport @open-ports-c2s accept;
  }
  chain forward {
    type filter hook forward priority 0;
  }
  chain output {
    type filter hook output priority 0;
  }
}
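The order of the rules in the input chain matters: server-to-server ports are accepted before the @not-clients drop, while client-facing ports are only accepted after it. A minimal Python model of that decision, assuming a new incoming connection and deliberately ignoring the conntrack, loopback, and RST rules, may make the ordering easier to follow (the verdict function is purely illustrative and not part of nftables):

```python
import ipaddress

# Values copied from the sets in the ruleset above.
NOT_CLIENTS = [ipaddress.ip_network("1.0.0.0/8")]
OPEN_PORTS_S2S = {5269}
OPEN_PORTS_C2S = {5222, 5281, 3478, *range(49152, 49156)}


def verdict(src: str, dport: int) -> str:
    """Return "accept" or "drop" for a new incoming connection."""
    addr = ipaddress.ip_address(src)
    # 1. s2s ports are open to everyone, including "spammy" ranges.
    if dport in OPEN_PORTS_S2S:
        return "accept"
    # 2. Addresses in @not-clients are dropped before any c2s rule.
    if any(addr in net for net in NOT_CLIENTS):
        return "drop"
    # 3. c2s ports are open to the remaining addresses.
    if dport in OPEN_PORTS_C2S:
        return "accept"
    # 4. Everything else falls through to the chain policy.
    return "drop"


print(verdict("1.2.3.4", 5269))      # s2s from a not-clients address
print(verdict("1.2.3.4", 5222))      # c2s from a not-clients address
print(verdict("203.0.113.7", 5222))  # c2s from any other address
```

Since the ACME hook below adds port 80 to open-ports-s2s, it is temporarily reachable from all addresses, which is what certificate validation needs.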

Then set /usr/local/bin/uacme-hook.sh, modifying /usr/share/uacme/uacme.sh:

--- /usr/share/uacme/uacme.sh   2023-02-15 23:31:43.000000000 +0300
+++ /usr/local/bin/uacme-hook.sh        2024-01-30 09:49:06.505761694 +0300
@@ -16,7 +16,7 @@
 # You should have received a copy of the GNU General Public License
 # along with this program.  If not, see .
 
-CHALLENGE_PATH="${UACME_CHALLENGE_PATH:-/var/www/.well-known/acme-challenge}"
+CHALLENGE_PATH="${UACME_CHALLENGE_PATH:-/var/www/html/.well-known/acme-challenge}"
 ARGS=5
 E_BADARGS=85
 
@@ -37,6 +37,8 @@
         case "$TYPE" in
             http-01)
                 printf "%s" "${AUTH}" > "${CHALLENGE_PATH}/${TOKEN}"
+                # Temporarily allow connections to port 80
+                sudo nft add element inet filter open-ports-s2s {80}
                 exit $?
                 ;;
             *)
@@ -48,7 +50,10 @@
     "done"|"failed")
         case "$TYPE" in
             http-01)
+                sudo nft delete element inet filter open-ports-s2s {80}
                 rm "${CHALLENGE_PATH}/${TOKEN}"
                 exit $?
                 ;;
             *)

Then:

sudo mkdir -p /var/www/html/.well-known/acme-challenge
sudo mkdir /etc/prosody/certs/example.com/
sudo touch /etc/prosody/certs/example.com/{fullchain,privkey}.pem
sudo chmod 640 /etc/prosody/certs/example.com/{fullchain,privkey}.pem
sudo chown root:prosody /etc/prosody/certs/example.com/{fullchain,privkey}.pem
sudo uacme -v new
sudo uacme -h /usr/local/bin/uacme-hook.sh issue example.com
sudo -e /etc/cron.daily/uacme-cert-update
sudo chmod +x /etc/cron.daily/uacme-cert-update

With the following in /etc/cron.daily/uacme-cert-update:

#!/bin/sh
set -e
/usr/bin/uacme -h /usr/local/bin/uacme-hook.sh issue example.com
cp /etc/ssl/uacme/example.com/cert.pem /etc/prosody/certs/example.com/fullchain.pem
cp /etc/ssl/uacme/private/example.com/key.pem /etc/prosody/certs/example.com/privkey.pem

In /etc/turnserver.conf I have only set external-ip, static-auth-secret, use-auth-secret, max-port=49154.

Relevant lines of /etc/prosody/prosody.cfg.lua:

interfaces = { "192.168.1.8", "127.0.0.1", "::1" }
modules_enabled = {
--- [...]
	-- Other modules
                "turn_external";
                "http";
}
-- TURN
turn_external_host = "example.com"
turn_external_secret = "secret here"

-- HTTP
http_host = "example.com"

VirtualHost "example.com"

Component "upload.example.com" "http_file_share"

Then restart or reload the services, add users with sudo prosodyctl adduser, and it works.
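The shared secret set in both /etc/turnserver.conf (static-auth-secret with use-auth-secret) and turn_external_secret is not sent to clients: instead, short-lived credentials are derived from it. A sketch of that derivation, assuming the commonly used "TURN REST API" scheme (the username is an expiry timestamp, the password is a base64-encoded HMAC-SHA1 of the username keyed with the secret); the function name and default lifetime are made up for the example:

```python
import base64
import hashlib
import hmac
import time


def turn_credentials(secret, ttl=86400, now=None):
    """Return (username, password) valid for `ttl` seconds.

    Assumption: the TURN server checks that the username is a Unix
    timestamp in the future and recomputes the same HMAC to verify
    the password.
    """
    expiry = int(time.time() if now is None else now) + ttl
    username = str(expiry)
    digest = hmac.new(secret.encode(), username.encode(), hashlib.sha1).digest()
    return username, base64.b64encode(digest).decode()


username, password = turn_credentials("secret here", ttl=3600, now=1700000000)
print(username, password)
```

This can be handy for testing the TURN server by hand, since the XMPP server normally hands such credentials out to clients on its own.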

Voice conferences

For voice conferences, apparently a particularly easy to set up and properly working option is Mumble. sudo apt install mumble-server mumble, set a password in /etc/mumble-server.ini, open UDP and TCP ports, and it is ready to use with desktop clients or Mumla on Android.

IRC

Similarly to XMPP and voice conferences, one may set an IRC server (or a small network) for private chatting. InspIRCd is available from Debian repositories and easy to configure, simply by setting the desired hosts, names, and passwords in its configuration file, plus links (the spanningtree module) for use with multiple servers. Anope IRC services seem popular, and are also available from Debian repositories, but are perhaps unnecessary for a small private (and possibly local) network. To make it available over the Internet, one may want to both enforce TLS and add restrictions for those connection classes; to do so, one may define a single connection class allowing no connections, then inherit one for plain connections, and one for TLS connections on a different port (corresponding to the Internet-facing endpoint), with additional restrictions (e.g., requiring a password).

Shared machines

If a machine is shared among multiple users, one may prefer to encrypt home directories, or at least subdirectories within those, individually in addition to the block device (LUKS/dm-crypt) encryption. That can be done with fscrypt or eCryptfs (an older option; there are also other stacked file systems). For instance, to create an encrypted directory:

# with fscrypt
# Enable and check the "encrypt" feature for the target ext4 file system
sudo tune2fs -O encrypt /dev/sda1
sudo dumpe2fs /dev/sda1 | grep features
# Install fscrypt and its libpam module at once
sudo apt install fscrypt libpam-fscrypt
# Setup fscrypt for the root partition (globally)
sudo fscrypt setup
# Create and encrypt a directory
mkdir private
fscrypt encrypt private/

# with eCryptfs
sudo apt install ecryptfs-utils
# Load the module
sudo modprobe ecryptfs
# Load it on boot as well
echo ecryptfs | sudo tee /etc/modules-load.d/ecryptfs.conf
# Setup a private directory, in ~/Private/
ecryptfs-setup-private
# Mount it
ecryptfs-mount-private

See ecryptfs-migrate-home(8) for encryption of the whole home directory.

Public data backups

2026-05-27T09:00:00Z

These notes grew out of those on personal data storage, which cover the technical means. I have kept a local music collection since the times before broadband and unmetered connectivity around here, and generally preferred to avoid reliance on online services, particularly commercial ones, since those tend to let users down. As the local censorship advanced, complete with a partial Internet blackout and the threat of a complete one, while inexpensive storage device capacities increased, I started storing more public data, in addition to my private data backups.

Apart from Internet blackouts or individual resource blocking by a government, usual data sources may become unavailable because of a technical issue (along with the rest of the Internet if the issue is near the user), or due to the publisher changing their policies. These notes include suggestions on the kinds of public data to back up, along with links to some of them and their size estimates.

Texts

Written works tend to be the most information-dense, making it easy to collect and store much more of those than one could hope to read in a lifetime.

Kiwix (with its OpenZIM archives) is a nice project. Its primary viewer may seem awkward for use in normal circumstances, but apparently it aims to be useful to the general public and in bad circumstances: it provides archives as packages, while the viewer, with versions for every common OS, can also serve those to others in a local network via a web browser. library.kiwix.org provides, among others, indexed archives of Project Gutenberg (about 75,000 public domain books by 2026), Wikipedia, Wikisource, Wikibooks, Wikiversity, Wiktionary, ready.gov, WikiHow, various StackExchange projects, Khan Academy, and many smaller bits like ArchWiki, RationalWiki, Explain XKCD (contains the comics).

textfiles.com provides archives of files grouped by category, which are well-compressed, curious, and entertaining. RFC Editor bulk retrieval ceased to serve readily available archives by 2026, but one can rsync it, optionally archiving and compressing afterwards, e.g.:

rsync -avz --delete rsync.rfc-editor.org::rfcs-text-only/ rfcs-text-only/
tar --group=nogroup --owner=nobody -czf rfcs-text-only.tgz rfcs-text-only/

The POSIX (SUS) specification is useful to have at hand: POSIX.1-2024 is available as an archive (see "Downloads"). Along those lines, there are programming language specifications (reports), and other relevant specifications and references: ISO C, Haskell Language Report, Scheme Reports, Python documentation downloads, RISC-V specification, Intel 64 and IA-32 Architectures Software Developer's Manual, AMD64 Architecture Programmer's Manual, Linux Foundation Referenced Specifications, USB specifications, Bluetooth specifications, ACPI and UEFI specifications, PostgreSQL manual, XMPP Extension Protocols, etc.

Then there are copyright-infringing but much larger libraries like Library Genesis (a trimmed down, txt-only version used to be available at offlineos.com, but apparently not anymore), the-eye.eu books, Anna's Archive, Z-library. The Pirate Bay or similar torrent trackers may help to find book collections, including MIT mathematics and physics books, Cambridge Histories and philosophy companion books, Oxford "Very Short Introductions", and Routledge books, as well as works grouped by author (e.g., Gardner, Feynman). Other topics to consider acquiring modern (text)books on: major philosophy works, electronics and radio, engineering, sociology, economics, computing, cooking, physical exercises, survival, fiction, medicine (e.g., the Merck manual), other sciences, and any topics of interest; also other individual books on physics, mathematics, and history. Consider the list of books complementary to Wikisource and PG (about 30 GB). Literary awards and charts can be handy for finding books: Pulitzer, Nebula, Locus, Bentley, Booker, Nature's analysis of the 100 most cited papers, The Guardian's top 100 books of all time, The Guardian's 100 best novels written in English, The NYT's 100 Best Books of the 21st Century, and similar lists. UN and other organizations' reports may also be of interest.

One can reduce PDF size (compress the images) with GhostScript or ImageMagick, among others, sometimes reducing the size by an order of magnitude: see "Efficient PDF optimization with Ghostscript CLI". For instance: gs -q -sDEVICE=pdfwrite -dPDFSETTINGS=/screen -dCompatibilityLevel=1.4 -o out.pdf in.pdf (possibly with -dCompressFonts=true and other options). Its -dFirstPage=$START -dLastPage=$END options are also handy sometimes, to extract pages of interest (including cases when some crackpottery is attached to books: that is one of the ways in which the crackpots try to promote it). EPUBs (basically ZIP archives with HTML and images) can be shrunk by compressing individual images within them. Sometimes files can be removed from an EPUB archive, and it can be trimmed down by passing it through pandoc (which would remove included fonts, for instance).
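For batch processing, the Ghostscript invocation above can be wrapped in a small helper. This is only a sketch: the function names and the default settings value are chosen for the example, while the options themselves are the ones quoted above.

```python
import subprocess


def gs_command(src, dst, first_page=None, last_page=None, settings="/screen"):
    """Build the argument list for the gs invocation described above."""
    cmd = [
        "gs", "-q", "-sDEVICE=pdfwrite",
        f"-dPDFSETTINGS={settings}",
        "-dCompatibilityLevel=1.4",
    ]
    # Optional page range, to extract only the pages of interest.
    if first_page is not None:
        cmd.append(f"-dFirstPage={first_page}")
    if last_page is not None:
        cmd.append(f"-dLastPage={last_page}")
    cmd += ["-o", dst, src]
    return cmd


def compress_pdf(src, dst, **kwargs):
    """Run Ghostscript (assumed to be installed as "gs"); raise on failure."""
    subprocess.run(gs_command(src, dst, **kwargs), check=True)


print(" ".join(gs_command("in.pdf", "out.pdf", first_page=3, last_page=120)))
```

Calling compress_pdf("in.pdf", "out.pdf") would then write the reduced file; comparing file sizes before and after is advisable, since already optimized PDFs may not shrink at all.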

OpenStax provides good and freely available textbooks under the CC BY license, available for download in PDF. See OpenStax GitHub repositories for their CNXML sources and related tools, though in 2024 I found it tricky to build HTML out of those, and then it still was not good enough for printing. LibreTexts is supposed to be similar, though the licensing information is unclear in some cases, some links lead to HTTP 404 errors, and some of the books are quite messy (attempting to embed YouTube videos into PDFs, having every other page filled with listings of undeclared licenses, or with "welcome" messages). While its subdomains (math, phys, etc) geo-block direct requests from Russia, the books are available without proxying via commons.libretexts.org. One can also search for libre book sources on platforms like GitHub, possibly querying for TeX sources: there are occasional seemingly decent and not well-known textbooks, like Introductory Physics: Building Models to Describe Our World, An Infinitely Large Napkin.

As of 2026, all those (Wikipedia, Wikisource, Wiktionary, Project Gutenberg, OpenStax and other complementary books) would take just 400 to 500 GB, even with images and some non-English versions added. Much programming documentation, particularly manuals, library references, and sources, is also available from system repositories.

Software

Apart from censoring books and the Internet, dictatorships like to issue "national operating systems" and mandate their spyware, or simply disrupt connections to system repositories as collateral damage, so backing up software can also be useful.

Software sources are particularly useful to back up for potential isolated usage, ensuring the ability to study and customize those, but one needs some binaries to bootstrap a system. Some of the options to consider are (with size estimates from January of 2026):

Debian archive mirroring: about 230 GB when done with debmirror, for amd64 trixie (13.3) with sources. The Mirror Size page lists numbers for mirroring all suites. One may also consider usage of a caching proxy server, apt-cacher-ng, and Modifying Debian CD. Unlike most others, Debian repositories contain all the source packages, which include upstream sources.
Slackware downloads: a whole mirror (for a single version) is under 20 GB, but it has few packages, and those are rather dated as well. It seems to be one of the few distributions with complete sources, though.
Gentoo source mirrors, particularly distfiles, almost 600 GB. Those include multiple versions of the same programs.
Arch Linux Mirrors take a little over 110 GB for packages ("pool"), and 31 GB for sources (though the wiki claims it is 80 GB and 110 GB, respectively; also most mirrors do not seem to host sources); apparently sources for many packages are not present.
Fedora mirroring: about 356 GB for "Everything" x86_64 packages, 123 GB for source ones.
OpenBSD mirrors: the sources may be in distfiles directories (as used by OpenBSD ports), but I have not found mirrors with such directories available via rsync.
NetBSD mirrors: about 200 GB in distfiles, under 70 GB for precompiled amd64 packages. Those include multiple versions of the same programs.
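To get a feeling for the required disk space, the estimates from the list above can be summed up. The figures are the ones given there (with "under 20" and "almost 600" rounded up); the grouping and labels are only for this example.

```python
# Sizes in GB, taken from the list above (January 2026 estimates).
mirror_sizes_gb = {
    "Debian trixie amd64 + sources": 230,
    "Slackware (single version)": 20,
    "Gentoo distfiles": 600,
    "Arch packages + sources": 110 + 31,
    "Fedora Everything x86_64 + sources": 356 + 123,
    "NetBSD distfiles + amd64 packages": 200 + 70,
}

total_gb = sum(mirror_sizes_gb.values())
print(f"all of the above: about {total_gb} GB")
print(f"Debian alone: {mirror_sizes_gb['Debian trixie amd64 + sources']} GB")
```

So even mirroring all of these would fit on a single 2 TB disk, while a single distribution with sources takes a fraction of that.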

Debian, in addition to being an all-around good system, seems to be a good option for such mirroring as well. The mirroring itself is done rather easily:

sudo apt install debmirror debian-keyring
gpg --no-default-keyring --keyring trustedkeys.gpg --import /usr/share/keyrings/debian-archive-keyring.gpg
gpg --list-keys --keyring trustedkeys.gpg
debmirror -v -d trixie -a amd64 --source -h mirror.mephi.ru --method=rsync /mnt/backup/debian/mirror/

An up-to-date live Debian CD/USB image is useful to store along with it, and perhaps a Debian wiki dump, as well as any additional firmware needed for one's hardware, and possibly firmware for devices other than regular computers, such as OpenWRT images for routers, GrapheneOS or LineageOS images for phones and tablets (along with individual program distributions, APKs; some software I use is listed in the note on mobile computing), KOReader for e-readers. Consider F-Droid mirroring and OpenWRT source code saving, or backups of individual packages.

Audio

As mentioned in the introduction, I always kept a music collection, and probably this is quite common. While musical records may seem less important than books and other written works, they still have cultural value and provide entertainment. My music discovery note seems relevant here.

Audiobooks (including BBC radio collections) may also be useful to collect, even if one does not listen to those normally.

Video

Also for cultural and entertainment purposes, there are movies, and particularly long TV series may be suitable for hoarding; out of nice sci-fi ones, there are Doctor Who, Star Trek, Red Dwarf, Farscape, Lexx, Firefly, Defiance, Battlestar Galactica, Babylon 5, The X-Files, First Wave; plenty more can be found in Wikipedia; for humorous ones, see Black Books, The IT Crowd, Taskmaster, plenty of sitcoms.

Music videos are nice to have around, for the same reasons.

Lectures and educational videos on varied subjects can be both useful in addition to books (such as 3blue1brown, providing useful illustrations and intuitive explanations, or various arts and crafts, or exercises, demonstrating how to do something), and work as book substitutes, to share with those who do not read much, or in case there are not many books on a given topic (say, recent local legal practices). Unfortunately those are often hosted on YouTube, which, in addition to being blocked here (and in other places, see censorship of YouTube), tries to prevent downloads itself, but there is yt-dlp, which may work. I usually download videos for archival at 480p if the visual details matter (perhaps 2 to 5 MB per minute), or even 360p if it is mostly a speaker standing and talking for the whole long video (under 2 MB per minute). The resolution is chosen with the -S option, e.g., -S "res:480". I have collected some video links, including interesting YouTube channels. One may consider relatively information-dense ones (lectures, online lessons) first, possibly followed by entertainment-education, pop-sci, and documentaries.
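As a rough sketch of what those per-minute sizes mean for storage planning, the shell function below multiplies hours of video by an average size per minute. The name video_gb is a hypothetical helper (not part of yt-dlp), sizes are decimal, and the arithmetic is integer-only, so results round down.

```shell
# Estimate storage in GB (decimal) for a number of hours of video,
# given an average size in MB per minute. Integer arithmetic, rounds down.
# video_gb is a hypothetical helper name used only for this illustration.
video_gb() {
    hours=$1
    mb_per_minute=$2
    echo $(( hours * 60 * mb_per_minute / 1000 ))
}

# 100 hours of lectures at the 480p range of 2 to 5 MB per minute:
video_gb 100 2   # prints 12
video_gb 100 5   # prints 30
```

So a hundred hours of 480p lectures would take roughly 12 to 30 GB under these assumptions; actual sizes depend on the codec and the content.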

Other

Other large and legal archives to consider for backing up: Wikimedia Downloads, Complete OSM Data, arXiv and other Open Access sources. If one gets into tape storage, Common Crawl can be considered. For select website downloads, I use wget --mirror --page-requisites --convert-links --no-parent --continue --adjust-extension https://example.com/~foo/, occasionally adding something like --exclude-directories=photos,pictures or just listing URLs manually (since it can be hard to separate heavy bits of little interest from the others otherwise), and sometimes having to add --compression=gzip if wget gets confused otherwise, or --max-redirect=0 if there are redirects to semi-blocked websites with freezing connections (and while trying to download those directly, given that wget does not support SOCKS proxies). But some websites make archives available (as mine does, see ../files/archive.tgz), or they are hosted at GitHub/Codeberg/Tilde/etc "pages", making the archive available for download (also as mine does, see codeberg.org/defanor/pages). Some wiki-based websites also provide data dumps, static HTML or database ones.

Statistical ("ML", "AI") models for LLMs (llama.cpp) and speech recognition (whisper.cpp) may be useful to collect as well. LLMs in particular, while they do hallucinate, also contain plenty of information, and in a way that may make it easier to retrieve in some cases.

Computer hardware

2019-04-16T00:00:00Z

Computer hardware

The following is my hardware shopping list, more or less. Observations and rants are included.

Workstation

The term "workstation" can mean many things, but for brevity, here I use it to denote a relatively reliable desktop computer for daily usage and work, rather than for gaming, with ECC memory.

CPU: (Low-end) Intel Xeon processors are generally nice and suitable for a workstation: ECC memory support, fine TDP, and all the perks of being mainstream. There are security vulnerabilities, potential backdoors (particularly enterprise features, ME), vulnerabilities in backdoors, and numerous backwards compatibility warts, but there are comparable ones in other affordable CPUs suitable for common computing tasks (PSB in AMD CPUs). As of 2019, though, it seems that AMD CPUs may be a generally better option: ECC is not disabled even in Ryzen (desktop, unlike the considerably more expensive EPYC or Threadripper) CPUs, and they seem to beat Intel in benchmarks/specifications at the same price. After 2022, intel.com geo-blocks me, nudging even closer to AMD. cpubenchmark.net provides a variety of benchmarks, including "best value" ones, useful for budget builds. Tom's Hardware has the "best picks" category with good pointers, aimed at different needs (budget, workstation, gaming). As a side note, some suggest choosing by performance/watt, rather than by announced TDP, and then possibly throttling a CPU with software.
Memory: Software keeps taking all the available memory, and even if one manages to avoid memory hogs, it is still nice to cache more. So it is usually a good idea to have plenty of memory. Kingston seems to be relatively reliable and produces ECC memory (not only their server-oriented models, but also the embarrassingly named "FURY Renegade Pro" line); Crucial and SuperMicro seem fine; personally I have only had issues with Corsair (which makes non-ECC memory anyway). All DDR5 memory has in-chip ECC, but the "ECC" versions still come with additional lanes, to allow detection of in-transit errors. Dual rank (possibly double-sided) memory tends to be a little faster, more expensive, and possibly to heat up more. A common suggestion is to prefer using 2 modules over 4, to put less stress on the memory controller. Some suggest avoiding mixing different memory kits, though it is unclear how risky it actually is; ECC memory rarely comes in kits, and when it does, those are large ones: 4 or 8 modules.
Storage: Probably it is the time to move to SSDs, but I am still using HDDs as the primary storage. There are reliability statistics around (usually it is, from least reliable to most: Seagate, WD, Hitachi and Toshiba, which is also reflected in prices); it's hard to deduce reliability by a vendor, but WD Red disks work fine for me: by 2024, I only had one faulty WD disk, after about 15 years of regular usage.
Graphics card: Integrated CPU graphics are useful as a backup, and sufficient if you do not do heavy gaming, video editing, or things like that. They also take the price down and reduce the number of components, including moving parts, so there is less noise, less heat, lower power consumption, fewer possible failures. As for discrete video cards, the primary issue for me is software support (both drivers and higher-level software such as X compositors). NVIDIA is most problematic: proprietary drivers are not supported for long, and reverse-engineered libre ones are not usable at all for some cards, and slow for others. AMD is better: in addition to proprietary drivers, there are mostly working open ones. Integrated Intel graphics seem to be the most reliable. h-node.org listing alone does not guarantee that drivers will work smoothly.
Motherboard: ASUS motherboards seem to be fine, and usually there are a few to choose from, with ECC support. Non-workstation ones tend to come with LEDs and other things one may prefer to not have. Though generally it is better to check reviews and benchmarks for motherboards on a chosen chipset at the time of buying. As of 2024, AMD "workstation" ASUS motherboards are quite expensive, while non-workstation ones support ECC as well. With other manufacturers, workstation motherboards (e.g., those with sockets and chipsets for Threadripper CPUs) also tend to support ECC and to be more expensive, but it is harder to pick a motherboard for a more modest yet reliable computer. Though there are cheaper mATX ASRock Pro ones, also with ECC support.
CPU heat sinks and fans: Noctua is nice. Painless CPU mounting is great, it is silent, and cools CPUs well. Newer AMD stock coolers are not so bad either (except for LEDs), though still behind Noctua.
Power supply: Since a PSU malfunction can fry a motherboard and components on it, it may be a good idea to attempt to pick a reliable one, which would easily handle the used hardware. "80 Plus" ratings can be consulted, and Thermaltake PSUs are not the worst, though their newer models are covered in gaudy LEDs. ATX PSUs are most common for desktop computers, but SFX ones may be preferable for smaller builds, like those with microATX motherboards.
Chassis: Full-tower metal cases are good for building and for cooling, and often come with handy features that are less common on smaller cases (e.g., front panel ports for SATA HDDs and other I/O, large/slow/silent fans), though tend to be heavy. Unfortunately annoying and ugly LEDs are common these days, especially on full-towers. Proper internal 3.5-inch bays for HDDs are increasingly hard to find on computer cases, as of 2025, with online stores counting places for bolting HDDs onto the case's walls as "bays", but adding a filter for cases having a 5.25-inch external bay helps to find those with proper internal 3.5-inch bays in front.
UPS: APC by Schneider Electric is nice (except for its software, as usual for software shipped by hardware vendors, but it is usable without that software). An RBC7 battery lasts for about 3 to 5 years (and it is recommended to change them every 3 years), though it is a pain to recycle one properly. I hear Falcon Electric and Eaton are nice as well. But APC ones tend to make regular beeping noises, and may not be quite suitable for bedrooms. Also heavier ones are quite inconvenient to deal with: even if you rarely move them or their batteries, it happens sometimes, and it is nice to have something more manageable then. After my larger APC UPS started malfunctioning (after about 15 years of usage), I switched to more home-oriented, quieter, and lighter CyberPower (1300 VA, which is still an overkill). This model (CP1300EPFCLCD) was handled by Debian 12 easily, without any tweaking, and estimated to keep my computer setup (85 W) running on battery (while it is new) for about 40 minutes.
Keyboard: The "Truly Ergonomic" keyboard has a relatively nice layout, though custom keyboards may suit one better (and are fun to build). Split keyboards seem nice too, but I haven't tried them yet.
Mouse: Gaming hardware tends to be unreliable, but mice advertised as gaming ones tend to be handy. Logitech mice seem to live longer than others (and particularly than those made by gaming companies, like Razer). They have gaudy LED lights, but those can be controlled with Piper (available from Debian repositories), at least on G102.
Home router: So far I had D-Link and ASUS routers that died, Linksys that lived until it got outdated, a TP-Link router (TL-WR841N/ND v8) that worked for a while and started hanging up after years of use, followed by another TP-Link router (Archer C7 v5). Apparently Zyxel shipped backdoored firmware, so it may be better to avoid. LibreCMC and OpenWRT maintain supported hardware lists, which are handy for choosing from. OpenWRT seems to be better at supporting router models long-term, while LibreCMC drops support sooner and supports much fewer models. And there are interesting router projects like Turris Omnia (open and quite overpowered, by CZ.NIC). OpenWrt One looks like a particularly nice option in 2025, though only has a single LAN Ethernet port.
Printer: I don't have a printer, but apparently Brother makes nice and inexpensive black-and-white laser printers with working Linux drivers. Unlike HP, without chipped and locked down ink cartridges: third-party ones can be used, its own can be refilled. Though by 2025, Brother started locking down the ink cartridges as well, via forced firmware updates. And there are horror stories about HP printers.
Computer speakers: Heavy computer speakers are heavy to move around, loud ones malfunction loudly, small ones tend to make annoying noises. So now I prefer medium-sized ones, with a volume knob and an accessible on/off switch, maybe a headphone output, and a sensible volume range. Though there are many more aspects of both the speakers and the overall setup (including the room around them), and there is the whole "audiophile" group of people occasionally overdoing it in weird and silly ways.
Microphone: While not using a dedicated microphone, I've investigated those. Apparently (and as one may expect) decent microphones are standalone (not embedded into headsets, cameras, etc) and fully analog (that is, don't include sound cards and USB interfaces, but just focus on being microphones, usually with an XLR interface). Dynamic microphones are said to be more suitable for non-studio setups, and condenser/capacitor ones -- for studio setups. Condenser microphones require phantom power, so a suitable audio interface is required; for dynamic ones one may get away with just an XLR-to-TRRS cable (although a preamplifier is commonly recommended, so it may be better to get a basic audio interface anyway). The popular options (for speech, basic and inexpensive ones) seem to be Shure SM58 for a dynamic microphone, Audio-Technica AT2020 and plenty of others for a condenser microphone, Focusrite Scarlett external audio interfaces.
Power cords: Apparently accidental unplugging is a fairly common issue, so IEC locks may be nice to have (even though the IEC 60320 appliance coupling has no interlocking, unlike the industrial IEC 60309): locks on C13 work like finger traps, on C14 they work like tension sleeves, but perhaps they are better than nothing. APC also makes cords, but they come either with no locking at all, or with non-standard interlocking locks (requiring support on both ends). It also seems that contacts become loose with older female connectors, so occasionally replacing those may be useful. They all are supposed to handle 10A, but one may also check current-carrying capacity tables, as well as their claimed certification (some companies, including Cablexpert/Gembird, violate the standard and make C13-C14 cord versions for other maximum currents as well). Apparently APC cords are good and expensive, Cisco ones are similarly priced, Tripp Lite is inexpensive and seemingly okay, others (not counting weird audiophile ones) are inexpensive and their quality varies. Since C13 and C14 connectors can be rewirable, one can also acquire those and make cords of a desired length (and potentially be more picky about the connectors and wires themselves, paying more attention to plating, insulation, etc), but they can be fiddly, and it may be challenging to find good ones (just as with premade cords).

Generally it is a good idea to look up the models on websites of vendors in order to get accurate and complete specifications, though it doesn't guarantee availability in local stores, and may take a few iterations. As of 2019, tech companies didn't adopt structured/machine-readable data exchange/publishing, so hardware search/picking services tend to provide and use incomplete information. Though they still may be easier to get information from, since official websites tend to be infested with JS and marketing. I've considered composing a table with various vendors, indicating whether they cover hardware in LEDs, make websites unusable and drivers hard to download, etc, but it's basically as bad as it gets for every major vendor.

Might be worthwhile to pay attention to capacitors on motherboards and in PSUs, and possibly it is even more important to keep them relatively cool and dry in order to prolong their lifespan.

One can also get a small server rack and server hardware, which generally aims for reliability and is less prone to silly designs, but it may be more challenging to keep it quiet than a desktop computer, and there are likely to be minor annoyances: for instance, usually there's no analog audio I/O in server motherboards.

Media/gaming/entertainment centre

A basic setup can be quite similar to that of a workstation: a computer, a screen, speakers, some input devices. The major issues are content retrieval and manipulation (documented separately, in the Home entertainment centre note), and awkward hardware (documented below).

A computer

It is much easier to begin with giving up on workstation priorities (such as ECC memory and not having gaudy LEDs), since there are plenty of compromises to be made even without those. In the end of 2019, I went for a build with Ryzen 7 3700X (because of a relatively low TDP, and a stock cooler; although later that turned out to be quite annoying, with its bright LEDs), ASUS TUF GAMING X570-PLUS (WI-FI), HX432C16PB3K2/32 memory (which seemed a bit strange, with my workstation from 2012 also having 32 GiB, though this memory is faster), GV-R57XTGAMING OC-8GD graphics card, Corsair HX750 PSU, a couple of NVMe SSDs, and just a voltage stabilizer instead of an UPS (which probably was a mistake: brief power cuts happen quite frequently here; or possibly it's just voltage going too far down sometimes, but either way it's not quite fixable and leads to computers losing power). Finally tried an NZXT case (H710); it's indeed quite nice, though heavy for a mid-tower.

Input devices

The Xbox One controller works easily with MS Windows 10 over Bluetooth (though the batteries only lasted for 40 hours of gaming, and one has to select "mice, keyboards, etc" when adding a device, despite MS Windows suggesting to pick a separate option for Xbox controllers) and over a USB cable (Micro-USB). For some reason (which I have no idea how to debug with a reasonable effort, and likely it would violate long and unreadable game licenses) games lag when it vibrates, but disabling vibration gets rid of the lags. Seems to work well on Linux as well.

Wireless input devices may be particularly convenient for a setup like that, but one should keep in mind that they tend to use proprietary protocols, which are almost always insecure (see, for instance, Penetration testing wireless keyboards from 2022, and HN comments, though I think it was pretty much common knowledge before that).

M-Audio Keystation 88 MK3 is an inexpensive MIDI keyboard; I don't have other MIDI keyboards to compare it to, and only played a regular piano before, but it seems fine. Both Yoshimi and LMMS work easily with it, on both Windows and Linux. Synthesia mostly works with it on Android too (though apparently misses some events, especially key releases, and then almost hangs; no idea where the issue is). Z-shaped keyboard stands are sometimes recommended for their stability and independent height and width adjustments, which indeed seem nice (I went for an OnStage one, which seems nice -- but once again, I don't have much to compare it to). I've also acquired an M-Audio SP-2 pedal, with its switch either being broken before it arrived or breaking on the first attempt to use it (and given that it's pretty cheap, attempting to replace it looks like more trouble than it's worth); fortunately a MIDI pedal is just a basic on-off switch, so one can try to replace it with a paperclip or two, but that's rather junky.

A screen

OLED matrices seem to be used relatively commonly for media-oriented "TVs", but modern "TVs" are monitors with built-in computers, loaded with proprietary software, malware, and even advertisements (see also: HN thread discussing spyware on smart TVs). Apparently there are similar screens marketed as "conference room" or "commercial" ones, and perhaps non-OLED can be fine too. With comparable specifications, regular screens seem to be quite a bit more expensive than TVs; possibly that's because TVs can feature frame interpolation and double frame rate in their specifications, and/or advertise resolutions with interlacing. Though it's commonly suggested that preinstalled spyware and adware lead to lower prices as well.

I went for a gaming LG screen (32GK850F-B, VA matrix) in 2019, which seems rather nice and not particularly expensive.

Old cable television

While OTT services may make more sense these days, one may want to preserve regular TV (such as DVB-C). There are receivers (aka "set-top box") that can output video over HDMI and sound separately (e.g., over RCA), as well as speakers with dual inputs (e.g., also RCA), and computer screens commonly support multiple inputs, so that both DVB-C receiver and a computer can be connected to both a screen and speakers (so that TV can function independently of a computer). There are PCI and USB TV tuners too, but according to comments on the Internet their quality is very low (both hardware and software), so solving it with additional wires seems like a better option. See also: MythTV, LinuxTV, DVB-C devices in LinuxTV wiki. See the home entertainment centre notes for more on those.

Builds

I decided to put together approximate builds I would consider, so that I will have those at hand in case I need to replace a computer urgently, and just as a reference. I have not tried those though, so there may be compatibility issues. Historical ones (which I built) are explicitly marked. The approximate prices I refer to are taken mostly from Russian stores, where hardware is more expensive, but pcpartpicker links with similar builds are provided for reference.

2012, a workstation (built), $2000: Xeon E3 1275 v2, 32 GB (4 * 8) of memory (Kingston, ECC, DDR3, unbuffered), ASUS P8C WS (ATX) motherboard, 1 old WD Green HDD (2 TB, died after 12 years), 3 new WD Red HDDs (3 TB each), GeForce GTX 660 (ASUS; switched to integrated after Nvidia EOL'd it; their proprietary drivers were always a pain), an overkill Thermaltake PSU, Noctua NH-D14 CPU cooler, Thermaltake Overseer RX-I case.
2019, a gaming computer (built), about $2000 including peripherals: Ryzen 7 3700X with a stock cooler, 32 GB of DDR4 non-ECC memory (HX432C16PB3K2/32), ASUS TUF GAMING X570-PLUS (WI-FI) motherboard, Radeon RX 5700 XT (8 GB, GV-R57XTGAMING OC-8GD), Corsair HX750 PSU, two NVMe SSDs, NZXT H710 case (okay for SSDs, but perhaps too big, and would not work well for HDDs or optical disc drives).
2024, budget computer, $600 (pcpartpicker): AMD Ryzen 5 5600GT, 64 GB non-ECC memory, the B550 chipset (e.g., MSI PRO B550M-VC WIFI, microATX), Kingston NV2 1 TB M.2 SSD. Virtually any PSU (300 W should suffice), CPU cooler (TDP 65 W), and case.
2024, cheap computer, $250: AMD Athlon 200GE, 8 or 16 GB non-ECC memory, MSI A520M-A PRO (microATX), maybe a 500 GB SSD, possibly SATA (some motherboards may not support NVMe with this CPU), any PSU, CPU cooler, case.
2024, modest workstation, $900 to $1800 (pcpartpicker): AMD Ryzen 5 9600X, 32 to 128 GB of ECC memory (e.g., KSM48E40BD8KM-32HM), ASUS TUF GAMING B650M-E WIFI or ASUS PRIME B650M-A WIFI II (microATX, ECC support), Kingston NV2 1 TB M.2 and optionally WD Red 4 TB. PSU, CPU cooler, and case do not matter much (CPU TDP 65/88 W, relatively little overall power consumption).
2025, prebuilt computers: mini PC, $120 to $240: Intel N150, 16 to 32 GB DDR4, 0.5 to 1 TB SSD, okay I/O including Wi-Fi. Apparently Lenovo ThinkPad are good and Linux-friendly options for laptops ($900 to $1000 for the low-end E series), IdeaPad are similar (but cheaper and without Ethernet ports, $600 to $900), and Lenovo ThinkCentre are small desktop computers (nettops, slim desktop) with similar qualities.

Personal data storage

2021-03-23T12:00:00Z

Personal data storage

These are my data storage notes, targeting primarily personal data backups: regular files (documents, photo and music collections, not databases), moderate volume, added or edited rarely, backups are managed manually.

Notes on public data backups are extracted into a separate document.

General approach

The "3-2-1 rule" for backups suggests keeping at least 3 copies of data, on at least 2 different storage devices, with at least one copy off-site.

The exact requirements and methods to achieve those may depend on one's threat model: in addition to device failures, bit rot, and unauthorized access by scrapers, one may have to consider fire or flooding, burglaries and robberies, book burning campaigns and censorship with isolation, hardware seizures and imprisonment without ability to maintain the remaining backups for years, inability--or a limited ability--to acquire replacement storage devices, and even uncommon and hypothetical scenarios, such as a global high energy EMP.

Considering the information security "CIA" triad (confidentiality, integrity, availability), we need encryption, so that lost or decommissioned drives will not leak personal data (i.e., crypto-shredding can be employed); integrity checking, so that we will either read back the data that was written or detect data corruption (and preferably even repair it); varied and common technologies (hardware interfaces, drivers, filesystems, file formats), so that there will be a good chance that at least some of the backups can be accessed with reasonable effort in different situations in the future.

Most of the technologies covered here are usable for both backups and working storage. I prefer to use more general tools, since they tend to be better maintained, and learning them usually is a more useful time investment than learning specialized backup systems (but for those, see Bacula, Borg, restic, DAR), some of which are quite similar to actual file systems (e.g., Borg is), while apparently often lacking error correction codes and redundancy within a single repository, but those may still be suitable for the task. Fortunately in this case the variety is preferable, and one can combine those. See also: Debian Reference Manual - 10. Backup and recovery, BackupAndRecovery - Debian Wiki, Synchronization and backup programs - ArchWiki.

As for portability, judging by experimentation in 2024, Android (as on Google Pixel phones) and Windows only support single (Ex)FAT partitions on USB drives, and probably only with MBR or without a partition table; no LUKS or filesystems such as Btrfs and ext4. So I had to give up on compatibility with those for my regular backups, though for data transfer, or when it is unavoidable otherwise, one can use VeraCrypt (open-source, but not always considered FLOSS, for Windows, also supported for opening by cryptsetup, but creation would require additional tools: e.g., VeraCrypt itself or zuluCrypt) and exFAT. The /\:*?"<>| characters must be avoided in file names to stay compatible with exFAT.
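Before copying a directory onto an exFAT drive, one can look for names that would not survive the transfer. The following is a minimal sketch, assuming a find implementation with the usual -name glob matching; exfat_unsafe_names is a hypothetical helper name, and it only checks the characters listed above, not other exFAT limits.

```shell
# List files and directories under a directory whose names contain
# characters that exFAT does not allow: \ : * ? " < > |
# ("/" cannot appear in a file name on Linux in the first place.)
# exfat_unsafe_names is a hypothetical helper name for this illustration.
exfat_unsafe_names() {
    find "$1" -name '*[\\:*?"<>|]*' -print
}
```

Running it on a directory prints one offending path per line, and prints nothing if all the names are fine. Names containing newlines would be split across lines, which is ignored here for simplicity.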

Hardware

Reliable computer hardware is desirable to minimize errors and hardware failures: an UPS, ECC memory, and quality hardware (including storage) in general.

External HDDs (or combinations of internal ones and external boxes) are inexpensive and handy for local backups, allowing to keep them safely disconnected most of the time, and to easily plug into virtually any computer when needed.

USB flash drives seem more suitable for off-site backups, being more robust for physical transfer. Flash memory is not suited for long-term storage without power though, so it is suggested to have them powered up at least for a few hours per year, letting the controllers do maintenance, or even to do data scrubbing (via a filesystem, if it supports that, or simply by forcing reading of all the files, possibly by verifying checksums) to nudge the rewrites. Writing onto cheap Kingston USB thumb drives (e.g., 256 GB DT Exodia) can be very slow, especially once about 2/3 of space is used and with ext4 on top of LUKS: writing at about 200 KB/s (less than 1 GB per hour). Even if you are not in a hurry, it makes one wonder whether the device malfunctions, so perhaps it is better to not neglect the write speed completely, even for backup storage devices. I saw Apacer USB flash drives of the same capacity, which are even cheaper, having sustained write speeds of about 10 MB/s, at least with exFAT.
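To put those write speeds into perspective, here is a small arithmetic sketch. The function name hours_to_write is hypothetical, sizes are decimal (1 GB taken as 1000000 KB), and the integer arithmetic rounds down.

```shell
# Hours needed to write a given amount of data at a sustained speed.
# Sizes are decimal (1 GB = 1000000 KB); integer arithmetic, rounds down.
# hours_to_write is a hypothetical helper name for this illustration.
hours_to_write() {
    size_gb=$1
    speed_kb_per_s=$2
    echo $(( size_gb * 1000000 / speed_kb_per_s / 3600 ))
}

# MB written per hour at 200 KB/s:
echo $(( 200 * 3600 / 1000 ))   # prints 720
# Filling a 256 GB drive at 200 KB/s and at 10 MB/s (10000 KB/s):
hours_to_write 256 200          # prints 355 (about 15 days)
hours_to_write 256 10000        # prints 7
```

So at the slow rate a full 256 GB write would take about two weeks, while at 10 MB/s it fits into a working day.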

Having an erratic USB port, bus, or wires (built into a Thermaltake chassis) that occasionally disconnects devices during active writing, I had a Transcend JetFlash (64 GB) thumb drive apparently dying (hanging on any writing attempt, "Device not responding to setup address.") after such a disconnect, while Kingston ones survived a few of those. As a side note, this seems more hazardous than non-ECC memory.

Optical drives (CD, DVD, Blu-ray) are commonly suggested for archival, though they seem less convenient for updates and for usage in general, and it is not quite clear whether the recordable ("burned" with a laser and a dye, as opposed to being stamped at a factory) CDs and DVDs are that long-lasting, but apparently they are still quite durable (see 2024 Optical Media Durability Update). And some aim at archival storage explicitly (e.g., M-DISC, mostly with BD).

Paper backups may be useful as well, and quite reliable, particularly for texts and images. Acid-free paper should be used for those, and one may play with bookbinding then. Some use QR codes and other two-dimensional barcodes to store arbitrary digital data on paper. Out of hardware, one would need a printer and a scanner for those, though I should investigate that better. To combine human-readability with relative machine-readability, special fonts like OCR-A and OCR-B can be useful, possibly combined with error correction codes.

One may also consider keeping backup storage devices and related items in a specialized storage shelf, a Faraday cage, or a fire-resistant and/or waterproof safe.

To go further than that, including storage of physical items, one may also look into general archival- and collection-related materials, such as the Preservation Self-Assessment Program.

Backup operating system

I find it useful (for the peace of mind, at least) to set a bootable operating system on at least one of the backup drives, with all the necessary software to read the backups. So there usually is an EFI system partition (ESP), an unencrypted partition for /boot (GRUB2 can handle encrypted ones, but it would not make much difference), an encrypted partition for the rest of the system (to prevent possible data leaks via cache, for instance, after backups are accessed from it), and a separate encrypted partition for the backup itself.

When installing a system using an installer, on a machine with more than one disk and some existing systems present, the installer would often use a seemingly random ESP on one of the internal disks, instead of the one on the backup drive. Fixing it may involve booting via the GRUB shell after GRUB fails to find or access its config from the /boot partition, remounting (and fixing in /etc/fstab) /boot/efi/, to point to the correct drive's ESP, and then running grub-install to install it there. Also removing undesirable directories from ESP manually, and adjusting things with efibootmgr. Or one can opt for a more involved/manual installation, setting it properly at once: see, for instance, "Installing Debian GNU/Linux from a Unix/Linux System" and "Full disk encryption, including /boot: Unlocking LUKS devices from GRUB".

Alternatively, or additionally, one may set a personalized live system image, as described in the Debian Live Manual and similar documents for other systems.

Storage setups

I do partitioning with fdisk, mostly because other common tools (or at least their fancy user interfaces) tend to be buggy, and/or to hide technical information, neither of which is desirable when partitioning storage devices. fdisk is nice, commonly available, and works well. With the setups described below, it works to set LUKS or an encrypted filesystem directly on a block device, without any partitioning, but it may also be desirable to store some public data backups on a separate partition of the same storage device, unencrypted.

RAID 1 (or possibly 5, 6) is nice to set if there are spare disks, but usually not as critical for redundant personal backups as it is, for instance, for a production server.

As of 2021 and for Linux-based systems, some of the common software options are:

LUKS and friends: LVM or mdadm (software RAID), cryptsetup/dm-crypt (encryption), integritysetup/dm-integrity (integrity)
ZFS (software RAID, encryption, integrity, added redundancy)
Btrfs (software RAID, integrity, added redundancy)
Regular checksums, such as sha256sum (integrity)

Those can be combined, even the ones serving the same purpose: for instance, storing file checksums would not harm even if the underlying filesystem supports those already. Likewise, it should not harm to encrypt the more important files (cryptographic keys, passwords), even while storing those on encrypted disks.

Below are notes and command cheatsheets for the setups I use.

LUKS and ext4

This is probably the most basic and widely supported setup for Linux-based systems. Only authenticated integrity checks are supported by cryptsetup (and those are experimental), so no CRC and no recovery from minor errors without RAID. Perhaps dm-integrity can be set separately to use CRC32C, but that would complicate the setup. Or it can be skipped altogether, since integrity checking is experimental, and wiping can slow down the process considerably (while skipping the wiping easily leads to errors).

Initial setup:

# Optionally, add: --type luks2 --integrity hmac-sha256
cryptsetup luksFormat /dev/sdXY
cryptsetup open /dev/sdXY backup2
mkfs.ext4 /dev/mapper/backup2
cryptsetup close backup2
mkdir /var/lib/backup2

A typical session (CLI-based, though this is also handled by graphical file managers, such as Thunar):

cryptsetup open /dev/sdXY backup2
mount -t ext4 /dev/mapper/backup2 /var/lib/backup2/
# synchronize backups
umount /var/lib/backup2/
cryptsetup close backup2

When done, in order to safely eject a device, run eject /dev/sdX, or possibly udisksctl power-off -b /dev/sdX.

To change a passphrase, cryptsetup luksChangeKey /dev/sdXY.

For RAID with mdadm, see "dm-crypt + dm-integrity + dm-raid = awesome!".

ZFS

ZFS is not modular like LUKS and friends, there are license compatibility issues, and it is rather unusual overall, but it is apparently a good filesystem with all the features needed here. Be warned that installing it on Debian involves building a kernel module, which takes noticeable time on updates and heats up the CPU (leading to laptop fans spinning loudly and the battery draining quickly), so it may be a good idea to have one dedicated machine to deal with it, but avoid it on others.

Initial setup:

# Ensure that linux headers are installed, needed for zfs-dkms
apt install linux-headers-amd64
# Install zfsutils-linux (from "contrib" repositories)
apt install zfsutils-linux
# Find a partition ID
ls -l /dev/disk/by-id/ | grep sda4
# Use that ID to create a single-device pool. The "mirror" keyword
# should be added to set RAID 1.
zpool create tank usb-WD_Elements_...-part4
# Create an encrypted file system.
mkdir /var/lib/backup/
# For redundancy within a dataset, add to the command below: -o copies=2
zfs create -o encryption=on -o keyformat=passphrase -o mountpoint=/var/lib/backup tank/backup

ZFS comes with its own mounting and unmounting commands, and if it is to be used from different systems, the pools should be exported and imported (or just force-imported). A typical session, assuming that it is used from different systems:

# List pools available for import
zpool import
# Import the pool
zpool import tank
# Mount an encrypted file system
zfs mount -l tank/backup
# (Synchronize backups here)
# Unmount the file system (or it will happen on export)
zfs unmount tank/backup
# Unmount the pool (also unnecessary to do manually though)
zfs unmount tank
# Export the pool
zpool export tank
# And eject or udisksctl power-off -b, as mentioned above

To change a passphrase, zfs change-key tank/backup.

LUKS with Btrfs

This one is set with the DUP profile for both metadata and data, adding redundancy, and with sha256 checksums (instead of the default crc32c), to reduce chances of collisions.

Initial setup:

# LUKS, as with ext4
cryptsetup luksFormat /dev/sdXY
cryptsetup open /dev/sdXY backup
# The file system
mkfs.btrfs --csum sha256 -m dup -d dup -L backup /dev/mapper/backup
cryptsetup close backup
mkdir /mnt/backup

A session:

cryptsetup open /dev/sdXY backup
mount -t btrfs /dev/mapper/backup /mnt/backup/
# synchronize backups here
umount /mnt/backup/
cryptsetup close backup
eject /dev/sdX
udisksctl power-off -b /dev/sdX

Bit rot

As mentioned above, it is important to be able to detect errors with some integrity checks, but one may also aim for single-device redundancy, allowing recovery using that single device (and a better overall chance of successful data recovery), as well as calculate checksums on top of a filesystem (e.g., for ext4, which does not support those on its own).

For integrity checking with basic checksums, one can use find and sha256sum or similar tools:

# Store checksums
mkdir checksums
find . -type f ! -path './checksums*' -exec sha256sum {} \; \
  > checksums/sha256
# Check them
sha256sum --quiet --check checksums/sha256
# Add new ones
find . -type f -newer checksums/sha256 ! -path './checksums*' \
  -exec sha256sum {} \; >> checksums/sha256

Alternatively:

# Store (new) checksums
mkdir -p checksums
find . -type f ! -path './checksums*' -exec sha256sum {} \; \
  > checksums/$(date -I).sha256
# Compare them to old ones
diff -U 0 <(sort checksums/2026-01-12.sha256) \
  <(sort checksums/2026-01-23.sha256) | less
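When comparing two dated checksum lists, the unified diff mixes added, removed, and modified files. As a rough sketch (the function name changed_files is made up for this note, and it assumes sha256sum's default output format and file names without newlines or backslashes), the files present in both lists but with different checksums can be singled out with awk:

```shell
# Print paths listed in both checksum files ($1 old, $2 new) whose hashes differ.
# sha256sum lines look like "<64 hex digits><space><space or *><path>".
changed_files() {
  awk '
    { hash = substr($0, 1, 64); path = substr($0, 67) }
    FILENAME == ARGV[1] { old[path] = hash; next }
    (path in old) && old[path] != hash { print path }
  ' "$1" "$2"
}

# Example: changed_files checksums/2026-01-12.sha256 checksums/2026-01-23.sha256
```

A file that shows up here without having been edited on purpose is a candidate for restoring from another copy.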

For redundant error correction codes (forward error correction, FEC), with the ability to repair, one may employ par2, dvdisaster (aimed at optical discs), zfec (a library with Python, C, Haskell APIs), libfec (a C library), GNU Radio FEC API, though those may be quite inefficient to use for collections of files that are updated. There are projects like blockyarchive (blkar), but just like specialized backup systems, they tend to require specialized tools to access the files backed up with them at all. A software RAID (1, 5, or 6) set on different partitions of the same device is a more time-efficient way to achieve some redundancy within a storage device, though less space-efficient, and protecting against different bit rot patterns. ZFS's "copies" parameter and Btrfs's DUP profile (for both data and metadata) do something similar, storing multiple copies of blocks within a dataset.

Other useful tools

S.M.A.R.T. monitoring and testing can be done with smartmontools, and is usually supported even by external and older USB drives.

I normally use just rsync --archive for the initial backup, then rsync --exclude='lost+found' --archive --verbose --checksum --dry-run --delete to compare backups and for data scrubbing, and without --dry-run afterwards, if everything looks fine. Using -rt or -r instead of -a may be preferable sometimes though, if file permissions and ownership data are not to be preserved.

For data erasure, dd is handy for wiping both disks and partitions (before decommissioning drives, or if there were unencrypted partitions before), e.g.:

dd status=progress if=/dev/urandom of=/dev/sdX bs=1M
dd status=progress if=/dev/urandom of=/dev/sdXY bs=1M

GnuPG is there for individual file encryption, as well as for signing. In some cases it may be useful together with tar and gzip.

For more compact music backups, one may wish to back up just the files referenced from a playlist, and not the whole archive. An example command for counting the total size of files involved in a playlist:

xmllint --xpath '//*[local-name()="location"]/text()' music.xspf |
  sed -E 's/&amp;/\&/g' |
  tr '\n' '\0' |
  du -s --files0-from=- |
  awk '{ sum += $1 } END { print sum }'

While rsync has the --files-from option, to work with a given list of files only:

xmllint --xpath '//*[local-name()="location"]/text()' music.xspf |
  sed -E 's/&amp;/\&/g' |
  rsync --dry-run -avz --files-from=- . ~/mnt/

Remote backups

When backing up private data to a remote (and usually less trusted) machine, it should be encrypted and verified client-side (so options like plain rsync over SSH are not suitable), but preferably still allowing for incremental backups (so tar and gpg are not suitable in general, either). One can still employ LUKS or ZFS though, by accessing remote block devices via iSCSI (in particular, tgt and open-iscsi seem to work smoothly on Debian), NBD, or similar protocols, possibly on top of IPsec or WireGuard (though as of 2024, those are blocked in Russia between local and foreign machines), tunnels made with SSH port forwarding, TLS (e.g., with stunnel), or anything else establishing a secure channel, to add encryption and a more secure authentication.

A test iSCSI setup example:

# server (192.168.1.2)
apt install tgt
dd if=/dev/zero of=/tmp/iscsi.disk bs=1M count=128
tgtadm --lld iscsi --op new --mode target --tid 1 --targetname iqn:2024-07:com.example:tmp-iscsi.disk
tgtadm --lld iscsi --op show --mode target
tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /tmp/iscsi.disk
tgtadm --lld iscsi --op new --mode account --user foo --password bar
tgtadm --lld iscsi --op show --mode account
tgtadm --lld iscsi --op bind --mode target --tid 1 --initiator-address 192.168.1.3 --initiator-name foo
tgtadm --lld iscsi --op unbind --mode target --tid 1 --initiator-address 192.168.1.3 --initiator-name foo
tgtadm --lld iscsi --op bind --mode target --tid 1 --initiator-address 192.168.1.3

# client (192.168.1.3)
apt install open-iscsi lsscsi
iscsiadm --mode discovery --type sendtargets --portal 192.168.1.2
iscsiadm --mode node --targetname iqn:2024-07:com.example:tmp-iscsi.disk --portal 192.168.1.2 --login
iscsiadm --mode session --print=1
lsscsi
# a block device is available at this point
iscsiadm --mode node --targetname iqn:2024-07:com.example:tmp-iscsi.disk --portal 192.168.1.2 --logout

Apart from one's own (or rented) remote machines, such a setup can be used with "backup buddies", exchanging some of your local storage space for someone else's. Sneakernet-based backup buddies (that is, occasionally exchanging storage devices) are a fine and easier option for remote backup storage.

A popular option for remote backups is online services (aka "the cloud" and a few other names), with many people relying on those even in place of local backups, or any local storage (as with music and video streaming, hosted photo albums, password managers, book collections, general document storage), delegating all those worries to somebody else. It seems convenient, but decreases direct control over the data, introduces dependencies on the service providers' continued existence and continued acceptable terms of service, on network connectivity to them, on ability to transfer payments. In my--possibly unrepresentative--experience, all those are unreliable, but it may still work as a redundant backup copy for some, particularly in predictable democratic countries, with a reputable service provider. Throw in the rule of law and sensible laws (or some kind of a hypothetical anarchist or communist utopia), and one may worry less about keeping some information private, as well as about aiming long-term isolated backups of public information.

Data sharing

For less private data (perhaps for almost everything but cryptographic keys and passwords -- that is, explicit secrets), a good way to preserve it is by sharing with others: for instance, pictures from an event or gathering are commonly shared among all the participants, while creative works (particularly books and music) can be shared among people with similar interests or tastes. Everything work-related can be backed up on work machines. And the data that is not private at all, like this very note, or other own creative works under permissive licenses, is generally useful to publish, sharing it even more widely.

Adverse services

One may consider use of relatively adverse services for both storage and transfer, such as censored and monitored ones. Usually they are best to avoid, but they may still be useful for redundancy, or when there are no other working options.

For file storage or sharing services, storage on block devices, as described above, can be handled with a file-backed loop device. GnuPG and other file- or stream-oriented methods would also work, but since encrypted data may attract unwanted attention, it is better at least to not advertise the encryption with headers. An easy option to do that (using a passphrase without additional files, the widely available openssl CLI tool) with a single file or stream is openssl enc -aes-256-ctr -nosalt -pbkdf2, but PBKDF2 is relatively weak (argon2id is recommended), and this skips a salt entirely, so the passphrase must be high-entropy then. Alternatively, one may consider using and storing the salt, and employing argon2 manually, e.g.:

sudo apt install openssl argon2
SALT=$(openssl rand -hex 16) # 128 bits
# https://www.rfc-editor.org/rfc/rfc9106#name-parameter-choice
# The first recommended option
argon2 $SALT -id -t 1 -p 4 -m 21 -l 32
# The second recommended option
argon2 $SALT -id -t 3 -p 4 -m 16 -l 32
# Use -e or -r options for scripting
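For the simpler openssl enc variant mentioned before the argon2 commands, a minimal round-trip sketch follows. The function name and the PASSPHRASE variable are made up for this note, and the passphrase is a placeholder. It also checks the property that matters here: the output carries no header and is exactly as long as the input.

```shell
# Encrypt (-e) or decrypt (-d) stdin to stdout. The passphrase is taken from
# the PASSPHRASE environment variable (a name chosen for this sketch).
headerless_enc() {
  openssl enc -aes-256-ctr -nosalt -pbkdf2 -pass env:PASSPHRASE "$@"
}

export PASSPHRASE='replace with a high-entropy passphrase'
tmp=$(mktemp -d)
printf 'some data\n' > "$tmp/plain"
headerless_enc -e < "$tmp/plain" > "$tmp/cipher"
headerless_enc -d < "$tmp/cipher" > "$tmp/decrypted"
cmp "$tmp/plain" "$tmp/decrypted" && echo "round trip OK"
# Without a salt there is no "Salted__" header, and CTR adds no padding,
# so both sizes are the same.
wc -c < "$tmp/plain"
wc -c < "$tmp/cipher"
```

Note that without a salt the same passphrase always yields the same key and IV, so with CTR mode a passphrase should not be reused for more than one file.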

Another option is cryptsetup (dm-crypt) without LUKS: in its plain mode, on a loop device, with some basic non-journaling file system (such as ext2, or ext4 with journal disabled) on it. The passphrase must also be high-entropy, since it does not use a KDF: ideally matching the 128-bit key size, which would take 8 random words picked out of a dictionary of 65536 words. Or a separately derived (or generated, simply random) key must be supplied, perhaps with the --key-file option. An example:

dd if=/dev/urandom of=test.img bs=1M count=128
sudo losetup --find --show test.img
sudo cryptsetup open --type plain /dev/loop0 test
# No journal, no reservation
sudo mke2fs -t ext4 -O ^has_journal -m 0 /dev/mapper/test
mkdir test
sudo mount /dev/mapper/test ./test
# Add the files here
sudo umount ./test
sudo cryptsetup close test
sudo losetup -d /dev/loop0
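The passphrase size estimate given before this example (8 words out of 65536 giving 128 bits) follows from N * log2(D) bits for N words picked uniformly at random out of a dictionary of D words. A small sketch for checking other dictionary sizes, with a helper name made up for this note:

```shell
# Entropy, in bits (rounded to the nearest integer), of a passphrase made of
# $1 words picked uniformly at random from a dictionary of $2 words.
passphrase_bits() {
  awk -v n="$1" -v d="$2" 'BEGIN { printf "%d\n", n * log(d) / log(2) + 0.5 }'
}

passphrase_bits 8 65536   # prints 128
passphrase_bits 10 4096   # prints 120
```

The estimate only holds if the words are actually chosen at random, not picked by hand.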

To make use of audio channels or audio data storage services (including the ones that re-encode audio files), a straightforward way is to use modem software, such as minimodem. That can be combined with forward error correction for reliability, and mixed with another audio stream to stay more covert. One may also try setting a system in the style of numbers stations, using TTS (text-to-speech, festival or espeak) and STT (speech-to-text, CMU Sphinx or more advanced ones).

For data encoding as text, one can use plain base64, some words-based encoding, maybe Markov chains (which would require custom tools and data though).
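A minimal sketch of the plain base64 option, assuming GNU coreutils (the function names and the 64-character line width are arbitrary choices for this note):

```shell
# Turn a binary file ($1) into text ($2), 64 characters per line.
to_text() { base64 -w 64 "$1" > "$2"; }
# Turn the text ($1) back into the original bytes ($2).
from_text() { base64 -d "$1" > "$2"; }

tmp=$(mktemp -d)
head -c 1000 /dev/urandom > "$tmp/data.bin"
to_text "$tmp/data.bin" "$tmp/data.txt"
from_text "$tmp/data.txt" "$tmp/data.out"
cmp "$tmp/data.bin" "$tmp/data.out" && echo "round trip OK"
```

The encoded text is about a third larger than the original data, since every 3 bytes become 4 characters.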

For video, a similarly ad hoc and basic approach is to encode a sequence of QR codes (or other matrix barcodes). But as for other options, one may also consider more involved steganography.

HMAC can be useful for authenticated integrity checks with such services. Common CLI tools for that include hmac256(1) from the libgcrypt20-dev Debian package and openssl-dgst(1ssl): openssl dgst -sha256 -hmac KEY [file ...].
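As a sketch of how such a check may look with openssl dgst (the function names are made up for this note, and the key is passed as a command-line argument for brevity, which is an assumption that would need revisiting on a shared machine):

```shell
# Print the hex HMAC-SHA256 of a file ($2) under a key ($1).
# The key is passed on the command line here, which makes it visible to other
# local users in the process list; acceptable for a sketch only.
hmac_file() {
  openssl dgst -sha256 -hmac "$1" "$2" | awk '{ print $NF }'
}

# Succeed only if the file ($2) still matches the previously stored tag ($3).
hmac_check() {
  [ "$(hmac_file "$1" "$2")" = "$3" ]
}

# Example:
#   tag=$(hmac_file "$KEY" backup.img)    # before uploading; keep the tag locally
#   hmac_check "$KEY" backup.img "$tag"   # after downloading
```

Unlike a plain checksum stored next to the data, the tag cannot be recomputed by the service without the key.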

I would not recommend relying on online services generally, but using them for added redundancy, particularly for public data, may be okay, and potentially fun to play with. If arbitrary and random-looking data storage is not explicitly allowed by a service, it may lead to account suspension, including other services on that account (though it occasionally happens with online services even without such a trigger).

Network abuse

2022-09-07T09:00:00Z

Network abuse

Here is my log of spotted and reported network abuse incidents. It started as private notes aiming to keep track of those being fixed, and to block the hosts if they keep spamming. I decided to make it public, since there is no private information in it (though I'm omitting the bits I may discover that aren't public, such as server administrator email addresses), and it may be of interest for people trying to decide whether reporting is worthwhile.

Spam messages

Below are incidents with spam messages that got through the usual filters: dates, hosts, the abuse contact and other report information, other notes.

XMPP

2021-09-12, 188.243.192.232, abuse@sknt.ru: no response and spam kept coming, submitted a JabberSPAM blacklist PR.
2021-09-12, 138.201.50.174, stian@barmen.nu: replied that he will investigate. Probing from ether@jabber.no.
2021-09-12, 54.36.115.48, info@xmpp.gg and abuse@ovh.net: no reply from either and the spam kept coming, submitted a blacklist PR. Probing from ink@jabber.gg.
2022-08-25, 138.201.25.9, abuse@hetzner.com followed by Hetzner abuse reporting form. Subscription requests and OMEMO-encrypted messages, similar ones from multiple services and JIDs, with occasional plaintext being just silly. This one is from klassic@isgeek.info. Those kept coming for at least a month.
2022-08-25, 185.146.232.56, vesselwave@protonmail.com: they deleted the user and started looking more closely for spammers. From klassic@satisprivacy.org.
2022-08-25, 95.168.217.72, support@jabbim.zendesk.com and abuse@superhosting.cz (since the first one had no effect). From multiks@jabbim.sk.
2022-09-06, 170.187.181.190, abuse@linode.com. From multiks@rows.im.
2022-09-10, 86.250.242.174. Did not notice at first, and then it ceased. Probing (presence subscription requests) from multiks@im.azurs.fr.
2022-10-01, 89.147.108.127, info@outerrealm.net on 2022-10-06, within 30 minutes received a reply saying that it will be looked into, and apparently it was solved. From ehf@msg.outerrealm.net: subscription requests at first, an odd message saying "Request Subscription" (followed by opportunistic OTR's whitespaces, similarly to some of the past spammy/probing messages) on 2022-10-06.
2022-10-18, 78.72.102.36. Have not reported, but then it disappeared; possibly somebody else did. From swe@qwik.space, a subscription request.
2022-10-18, 78.72.102.36. Same as above: haven't reported, but then it disappeared. From basik@qwik.space, a subscription request.
2022-11-01, 138.201.50.174, stian@barmen.nu. From floki@jabber.no: "Hi there, free for chat?". Then a subscription request from the same JID arrived on 2023-01-03.
2022-11-16, 138.201.25.9, the Hetzner reporting form (since have not found administrator contact information). Received an acknowledgement on 2023-01-11, a reply from the XMPP server administrator on 2023-01-13 saying that it doesn't look like spam; described the issue in more detail, another reply saying that it sounds like "complete nonsense" and suggesting to use iptables. Asked on operators@muc.xmpp.org to ensure that my approach is sensible, and replied to abuse@hetzner.com, asking about their policy on XMPP spam; no reply, as of 2023-05-05. Unexpected presence subscription request and no message (likely probing) from basik@isgeek.info.
2022-12-13, 138.201.50.174, stian@barmen.nu. Then again on 2023-03-08 (after an additional message from the same XMPP address). From prtship@jabber.no/_, a presence subscription request, and a "Hi, Free for chat?" message 3 months later.
2023-01-18, 167.179.180.180, abuse@octothorn.com (on 2023-01-19). Received a reply on 2023-02-15, mentioning that the user is being kicked off, and the account had more than 1000 contacts in the roster, most of which were pending a subscription approval. From aus@jabber.octothorn.com/_, a presence subscription request. The last one arrived on 2023-01-31.

Email

2021-02-09, 103.66.105.237, noc@cmjainimpex.in.
2021-03-31, 205.201.133.233, abuse@mailchimp.com.
2021-06-24, 2a00:1450:4864:20::641, Gmail abuse reporting form. Apparently reporting didn't work, nothing happened on "submit".
2021-06-25, 91.223.3.194, admin@skynode.pl.
2022-04-25, 146.19.173.107, abuse@ipconnect.services.
2022-04-28, 5.181.80.128, noc@4vendeta.com.
2022-05-29, 200.93.248.119, rolfex@powerfast.net.
2022-05-30, 193.218.204.206, abuse@heficed.com. The client replied that it was solved a long time ago.
2022-05-31, 2607:f8b0:4864:20::e41, Gmail abuse reporting form.
2022-06-30, 211.100.47.38. A Chinese ISP, probably not worth reporting. Blacklisted in postscreen_access.cidr.
2022-08-15, 159.183.196.221, abuse@sendgrid.com.
2022-11-01, 2607:5500:3000:1176::2, support@hostwinds.com.
2023-05-05, 106.75.10.112, ipas@cnnic.cn. From ucmail25.sendcloud.io.
2023-05-30, 69.12.91.126, abuse@quadranet.com.
2023-06-16, 117.50.66.12, ipas@cnnic.cn. From ucmail17.sendcloud.io, added sendcloud.io REJECT spammers into the file referenced by postfix's check_client_access. dnswl.org returned 127.0.15.0 for it, reported it to them as spam.
2023-06-22, 192.119.65.137, abuse@hostwinds.com. Their mail server (Gmail) rejects messages with the spam message attached, reported without an attachment.
2023-07-21, 220.133.13.91, hostmaster@twnic.net.tw. According to the received mail headers, it originated from 185.225.74.219.
2023-09-15, 46.17.43.50, noc@baxet.ru. With valid SPF for tiaohu.net: apparently a Chinese organization's domain name, but a Russian hoster's IP address. Quickly received a reply saying "Blocked" from support@justhost.asia.
2023-09-15, 2607:f8b0:4864:20::935, Gmail abuse reporting form.
2023-09-22, 2607:f8b0:4864:20::72c, Gmail abuse reporting form. Same address as the previous one (polachek@squadhelp.co), a follow-up.
2023-09-23, 2607:f8b0:4864:20::72a, Gmail abuse reporting form. Same address as the previous two, the spammer claimed it is the last message.
2023-09-25, 2607:f8b0:4864:20::f29, Gmail abuse reporting form. A new subdomain, polachekg@go.squadhelp.co, but continuation of the previous 3, and Gmail does nothing; blacklisted the domain in postfix (check_sender_access).
2023-10-19, 209.85.128.177, Gmail abuse reporting form. From masonlambert190@gmail.com.
2023-11-01, 209.85.128.172, Gmail abuse reporting form. From katherinesophia523@gmail.com.
2023-12-05, 31.192.235.11, abuse@profitserver.ru. Phishing, envelope-from abuse@q03.1cooldns.com, with valid DKIM and SPF.
2023-12-11, 31.192.237.60, abuse@profitserver.ru. Phishing again, envelope-from abuse@origin.1cooldns.com.
2023-12-11, 209.85.219.180, Gmail abuse reporting form. From haileyjtanner@gmail.com, asking to add a link to some furniture selling website (which supposedly has a blog post on astronomy) from my "links" page.
2023-12-18, 209.85.128.170, Gmail abuse reporting form. From haileyjtanner@gmail.com again, Gmail does not seem to do much about outgoing spam.
2023-12-19, 31.192.239.9, abuse@profitserver.ru. Phishing yet again, envelope-from=no-replies@batixtaneve.com this time. Blacklisted 31.192.232.0/21.
2023-12-26, 209.85.128.169, Gmail abuse reporting form. From haileyjtanner@gmail.com yet again, Gmail still does nothing. Blacklisted the address in postfix (check_sender_access).
2024-02-29, 204.152.197.177, abuse@quadranet.com. Spam about electric bicycles.
2024-03-12, 185.218.100.84, abuse@ipxo.com.
2024-03-18, 194.53.136.174, abuse@virtono.com. Spam about electric bicycles, same as on 2024-03-12.
2024-03-20, 104.223.121.26, abuse@quadranet.com. Same as the last two, and as on 2024-02-29: e-bikes.
2024-04-25, 2024-04-26, 216.9.224.143, abuse@dchost.com. Scam, 3 messages. And one more message from the misconfigured mail server, notifying about a failed delivery (the "from" address matched the "to" address).
2024-05-09, 173.249.144.124, abuse@liquidweb.com. Posing as a Docusign notification.
2024-06-12, 193.188.192.139, abuse@pipenet.hu.
2024-07-31, 47.90.198.34, abuse@alibaba-inc.com.
2024-08-08, 103.224.90.82, abuse@nexcess.net. Phishing.
2024-09-23, 208.234.3.27, abuse@verizon.net, abuse@ait.com. A scam, as described in "Beware of Chinese Domain Scams" or "Chinese domain registration emails". Verizon pointed to AIT.com, I wrote there, the "support ticket" was closed quickly without a comment.
2024-09-24, 2a00:1450:4864:20::42b, Gmail abuse reporting form. From saracody9@gmail.com, a request to link some irrelevant website from mine.
2024-10-27, 219.134.170.101, anti-spam@chinatelecom.cn. Router advertisements.
2024-11-18, 46.23.108.219, abuse@bullethost.net. Electric bicycle advertisement.
2024-11-19, 192.154.230.159, abuse@host4yourself.com. Electric bicycle advertisement.
2024-11-22, 181.214.99.201, abuse@ipxo.com. E-bikes.
2024-11-30, 188.127.247.224, abuse@smartape.net (though SmartApe is reported to be a Russian hosting for cybercriminals itself). Probing.
2024-12-01, 120.241.40.88, abuse@chinamobile.com. Spam about shipping from China.
2024-12-04, 91.193.18.13, abuse@hostzealot.com. E-bikes.
2024-12-06, 181.214.99.132, report@abuseradar.com. E-bikes.
2024-12-10, 84.32.41.141, report@abuseradar.com. E-bikes.
2024-12-13, 162.250.189.12, complaints@servarica.com. The ticket was automatically created and automatically closed without response in 36 hours; blacklisted its subnet in postscreen_access.cidr.
2024-12-29, 222.125.131.176, xujing@topway.cn. Shipping from China.
2025-01-12, 39.189.22.39, abuse@chinamobile.com.
2025-01-14, 45.147.167.60, abuse@thinkhuge.net. E-bikes.
2025-01-15, 217.12.203.132, abuse@greenfloid.com. E-bikes.
2025-02-03, 39.189.22.212, abuse@chinamobile.com. Blocked SMTP connections from Chinese IP addresses via nftables at this point, since there is a lot of spam and no ham at all coming from those.
2025-02-11, 85.120.223.178, abuse-nav@rnc.ro and abuse@nav.ro. E-bikes. Connections to the rnc.ro mail server time out.
2025-02-15, 209.85.208.176, Gmail abuse reporting form. From svcodie@gmail.com, again about adding some supposedly astronomy-related links to my "links" page (the URL has been dead for a few months), with a shady "unsubscribe" link.
2025-02-17, 85.120.223.139, abuse@nav.ro. E-bikes again.
2025-02-23, 209.85.218.42, Gmail abuse reporting form. From jessigfrost@gmail.com, again on astronomy links.
2025-02-24, 85.120.223.179, abuse@nav.ro. E-bikes again, repeatedly from the same ISP. nav.ro rejected my report as spam. Added a rejection rule for 85.120.223.0/24 into postscreen_access.cidr.
2025-02-27, 209.85.208.47, Gmail abuse reporting form. From the previously reported jessigfrost@gmail.com.
2025-02-27, 194.102.104.66. Phishing, but nowhere to report: it lists abuse-alexhost@rnc.ro, and rnc.ro's mail servers are not responsive, as discovered recently. Blacklisted the subnet, as with the other rnc.ro one.
2025-03-08, 209.85.214.175, Gmail abuse reporting form. Probing of active mailboxes, apparently (sent to two of my email addresses), from insangleeq@gmail.com.
2025-03-14, 209.85.222.53, Gmail abuse reporting form. Follow-up probing, from mrsirishboudreau86@gmail.com.
2025-03-21, 209.85.160.52, Gmail abuse reporting form. Probing again, from ukpabimberi892@gmail.com.
2025-03-24, 39.189.23.79. Chinese spam again: I had nftables.service disabled, so did not apply the filters after the reboot.
2025-03-26, 2607:f8b0:4864:20::b44, Gmail abuse reporting form. From mberiukpabi611@gmail.com.
2025-03-28, 209.85.208.66, Gmail abuse reporting form. Still probing, which seems to be quite regular (weekly), from ukpabimberi353@gmail.com. It was sent to two of my email addresses.
2025-04-04, 179.61.221.11, report@abuseradar.com. E-bikes.
2025-04-09, 209.85.208.196, Gmail abuse reporting form. Probing yet again, sent to at least two of my email addresses, from noorawilliams015@gmail.com.
2025-04-17, 209.85.219.171, Gmail abuse reporting form. Some probing again, from quydai079@gmail.com, referencing a phone number for use with Telegram.
2025-05-31, 193.52.142.199, certsvp@renater.fr.
2025-06-20, 185.130.249.144, abuse@smartape.ru. "Unpaid invoice" scam.
2025-06-23, 209.85.160.43, Gmail abuse reporting form. From mrsirishboudreau5@gmail.com, probing for live email addresses.
2025-07-09, 193.42.36.71, abuse@hostzealot.com. E-bikes.
2025-07-09, 209.85.219.196, Gmail abuse reporting form. From mrsirishboudreau288@gmail.com, probing for live email addresses.
2025-07-28, 38.45.89.36, abuse@cogentco.com. Posing as a Docusign notification.
2025-07-29, 191.252.13.209, abuse@locaweb.com.br. Posing as booking.com.
2025-07-29, 191.252.13.197, 191.252.12.56, 177.153.3.113, 179.188.6.145 (all Locaweb, as the one above). Blacklisted 191.252.0.0/16, 177.153.0.0/16, 179.188.0.0/16.
2025-08-04, 192.154.230.149, abuse@host4yourself.com. E-bikes.
2025-08-05, 79.141.174.230, abuse@hostzealot.com.
2025-08-07, 45.86.230.19, abuse@bluevps.com. E-bikes. They promptly responded "This client is blocked". I added /e-bike/i REJECT E-bike spam into postfix's body_checks.
2025-08-08, 179.61.221.2, report@abuseradar.com. E-bikes, adjusted the body_checks rule to /e-?bike/ REJECT E-bike spam (the i flag actually turns case-insensitivity off), blacklisted 179.61.221.0/24 in postscreen_access.cidr.
2025-08-12, 108.165.213.11, abuse@dartnode.com. Phishing.
2025-08-22, 209.85.217.68, Gmail abuse reporting form. From mrs.info.jashok@gmail.com.
2026-02-24, 77.83.39.16, abuse@lanedo.net.
2026-03-08, 209.85.216.48, Gmail abuse reporting form. From maviswanczykp82@gmail.com.
2026-03-11, 163.223.211.186, hm-changed@vnnic.vn.
2026-03-11, 163.223.211.186, hm-changed@vnnic.vn. Same message as earlier, blacklisted 163.223.210.0/23 in postscreen_access.cidr.
2026-04-28, 160.30.136.30 and 160.30.136.43, hm-changed@vnnic.vn. The messages pretended to be from gmail, with a DKIM signature failing verification. Blacklisted 160.30.136.0/23 as well.
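The e-bike body_checks pattern from the 2025-08-08 entry can be tried against sample lines before deploying it. The sketch below only approximates it with grep -E -i, relying on Postfix regexp and pcre tables matching case-insensitively unless the i flag is given; the function name and the sample lines are made up for this note.

```shell
# Succeed if a line of text would be caught by the /e-?bike/ pattern.
looks_like_ebike_spam() {
  printf '%s\n' "$1" | grep -E -i -q 'e-?bike'
}

looks_like_ebike_spam 'New E-Bike models in stock' && echo reject
looks_like_ebike_spam 'Ebike sale' && echo reject
looks_like_ebike_spam 'Bicycle repair notes' || echo pass
```

The actual table can be queried with postmap -q, for instance postmap -q "New E-Bike models" regexp:/etc/postfix/body_checks, where the path is an assumption about the local configuration.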

General observations

A lot of network abuse (spam, vulnerability scans, brute-force attacks) comes from China, plenty from Russia as well. As a side note, Chinese researchers similarly spam the world with fabricated research papers (though apparently they try to combat it, up to a death penalty for researchers who commit fraud if it harms people). Apparently wider agreements, policies, and cultures help to fight network abuse about as well as technological methods do. I think it is okay to rate-limit regional IP address blocks (as described in the private server setup and simpler server setup notes), though one should think twice before blocking them completely: if there are non-abusive users, it would be unfair to them. And then there are large mail providers, particularly Gmail, not caring much about outgoing spam, while blocking them is a bad option, given the number of legitimate users: the ham-to-spam ratio is less than 1, but more than 0.

Information security basics

2025-03-15T12:00:00Z

Information security basics

There are information security guides for different audiences around, including EFF's Surveillance Self-Defense and Email Self-Defense, NIST's Cybersecurity. But I fail to find concise, relatively general, and sensible guidelines aimed at personal information security and information literacy to refer people to, so I wrote down the suggestions I would normally share.

I am not a security expert, but a programmer and a small-scale system administrator paying attention to security. So it is a good idea to consider these suggestions critically, just as any others, but I think that they will improve the average state of such guides.

General advice

Question things, do not trust blindly, require evidence and verifiability of claims, check those, do not share personal information or give away control without a good reason to, assume that "anything that can go wrong will go wrong" (Murphy's law). That is, employ scientific and engineering approaches, and try to stay honest: do not nudge things to look better (e.g., more trustworthy or certain) than they are; better to err on the side of safety, assuming that they may be worse than they seem. A lack of understanding makes one vulnerable to deception, so study the relevant subjects: how computers, banks, online stores, governments and scammers work, how software and relevant systems are developed, how the research used by those is done. Computing context is a part of it. Try to avoid fallacies and cognitive biases, as they tend to be exploited by adversaries.

Do not shy away from learning. It is tempting (and commonly suggested) to stick to certain newbie-friendly tools, but that is a very fragile approach: without sufficient understanding, people easily lose the tools (e.g., when the government blocks their secure messengers), or manage to misuse them (e.g., by ascribing strange and unexpected properties to the tools: no user-friendly UI or API will protect from a user assuming that anything going through a system becomes "secure" in all senses and for all purposes, for instance).

Conversely, when providing a service, publishing software, asking for information, sharing information or software, it is nice to make it easy for others to follow that: provide references, evidence, source code, explain why the requested information is required (and ensure that it actually is required); generally, do not ask to believe or trust blindly, do not encourage and normalize dangerous practices.

And as with any other pursuit, give it a try, do not give up, do not view it as "all or nothing": learning a little, paying some attention to security, and avoiding some of the potential losses that way is already better than being successfully attacked all the time.

Threats

Information security includes a few areas, but personal security usually revolves around privacy and confidentiality. Some of the common threat actors targeting individuals are scammers, oppressive governments, and thrill seekers. All those seem to be commonly underestimated: scammers' victims think that they cannot be scammed, and are surprised afterwards; thrill seekers are often neglected because "why would anybody want to do that?"; governments are often ignored because of one's political views (loyalty to the regime, beliefs that it will not turn authoritarian, is not authoritarian even after it turned so, abandoning presidential term limits, introducing numerous censorship laws and persecution of dissent, and so on; belief that they will not reach you) or learned helplessness. "I have nothing to hide" is another common sentiment, often extended to the private information of one's friends and family that they possess, useful to threat actors. That usually implies a certainty that the government is on your side and will stay that way, in addition to one's immunity to the other risks. And then there are the likes of "the world is just, I am good, so nothing bad can happen to me"; a variety of denial strategies and excuses, religious beliefs.

Entities collecting information, even if they do not use it against you intentionally and immediately, may also be viewed as threats, since they tend to leak it via data breaches, or to abuse it themselves later. Those include commercial companies, government organizations, and individuals.

People may also engage in a crime of opportunity if the conditions for that are created: e.g., someone picking up or buying a discarded unencrypted storage device may access (recover) the private data stored on it. Same with information made available online: apparently even IT professionals manage to accidentally allow unauthenticated access to databases quite regularly, making it a common source of data breaches.

Mitigation

Principle of least privilege

The principle of least privilege is generally useful: share the minimum required information to receive a service, or give minimal required and controlled access to your system. E.g., buying most items, using most public transport, or visiting most public places should not require identifying yourself: doing so imposes an unnecessary risk. Likewise with running custom software to access online services, especially if it is closed-source (and possibly proprietary), so you cannot (and possibly are not allowed to by the license) check what it is doing with your system. Communicating over the Internet does not require providing your full name or phone number, or identifying yourself at all. Identifying yourself by sending pictures of documents is one of the sillier and more dangerous practices. Software should not run with superuser (root) privileges, and generally the usual security mechanisms must not be bypassed, unless there is a good reason to.

If someone asks you to take unnecessary risks like that, that itself is a cause for suspicion, and to look for other options. Often it involves accepting inconveniences (such as visiting places and standing in queues instead of using proprietary software, dealing with paper documents, possibly with cash, missing some online conversations), resisting peer pressure (e.g., "just set a sensible password like 1234", "install our software with curl | sh and run its custom updater to be up to date", "let's run everything as root to avoid dealing with permissions").

If the private information is not requested by a service, or superuser privileges are not requested by software, it is safest to not volunteer to provide those: e.g., use screen names for online services and as a system user name (which is used as the default name for information sent online occasionally: the best way to ensure that the real name is not leaked accidentally is to never enter it), use dedicated system users or sandboxing facilities to run programs.

Cryptography

Cryptography provides useful tools, perhaps encryption being the most notable one, useful for personal data storage (including encrypted backups), as well as for communication (over email or instant messengers, such as XMPP), and for channel security (for network connections). Another common use of cryptography is for data integrity checks.
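As an illustration of the integrity checks mentioned above, here is a minimal sketch using sha256sum from GNU coreutils. The file name backup.tar is a hypothetical stand-in. A bare checksum only helps if the checksum itself is kept somewhere trustworthy; otherwise a signature is more appropriate.

```shell
# Work in a scratch directory so nothing outside it is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A stand-in for a file that needs protection (hypothetical name).
printf 'important data\n' > backup.tar

# Record the checksum, ideally storing it separately from the file.
sha256sum backup.tar > backup.tar.sha256

# Later: verify that the file still matches the recorded checksum.
sha256sum -c backup.tar.sha256 && echo "intact"

# Simulate corruption: the check now fails with a non-zero exit status.
printf 'tampered\n' >> backup.tar
sha256sum -c backup.tar.sha256 || echo "modified"
```

The exit status of sha256sum -c is what scripts should act on; the printed messages are only for humans.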

Following the general advice given above, one should look for trustworthy (transparent, verifiable, openly developed) tools, ideally using free and open-source software exclusively, retrieving it from trusted sources (such as the operating system's repositories, where the packages are signed), preferably checking the code, but at least preferring the tools used and inspected by many.

I personally use mostly LUKS for disk encryption and OpenPGP for file and mail encryption and signing, on a Debian system, and TLS, SSH, IPsec, and WireGuard for channel security. Those are widely available, well-known tools.

The usage of LUKS with cryptsetup(1) is described in the personal data storage notes linked above, while that of OpenPGP is described in GnuPG's user guides; it is supported out of the box in mail clients such as mu4e (an Emacs client), mutt (a standalone TUI client), Thunderbird (a standalone GUI client), and GnuPG's gpg(1) command-line tool is fairly easy to use. For email, one may want to ensure that the messages are encrypted not just for recipients, but also for the sender, so that the sender can read them later: mutt does it by default (the pgp_self_encrypt option), for mu4e one should enable mml-secure-openpgp-encrypt-to-self.
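Outside of mail clients, the same encrypt-to-self idea can be sketched with gpg directly: the sender's own key is simply listed as one more recipient. The function name and the addresses in the usage note are hypothetical; this assumes the relevant public keys, and the sender's secret key, are already in the keyring.

```shell
# Encrypt and sign a file for a recipient, adding one's own key as an
# extra recipient so that the sender can decrypt the result later.
encrypt_with_self() {
    # $1: own key ID or address, $2: recipient, $3: file to encrypt
    gpg --armor --sign --encrypt \
        --local-user "$1" --recipient "$1" --recipient "$2" \
        --output "$3.asc" "$3"
}
```

For instance, encrypt_with_self me@example.org alice@example.org note.txt would write note.txt.asc, readable by both parties. GnuPG's encrypt-to option in gpg.conf is another way to get a similar effect for every message.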

There are endless alternatives, which tend to incorporate newest and shiniest algorithms (which is dangerous by itself: better to stick to heavily analyzed ones), to be written in this month's most trendy language (possibly to be abandoned soon), clean of the backwards- or standards-compatibility cruft accumulated by older tools, and supposedly easier to use, providing fun colors and supportive emojis. Some also like to write their own software, but there are many gotchas and cryptographic attacks that basic algorithm descriptions do not mention, which may easily compromise the system. Both scammers and governments like to advertise malware as security software, occasionally to disguise attacks as security measures, while more legitimate commercial companies tend to sell virtually useless security products, which are not necessarily malware, but perhaps more of a placebo. Security theater is a shady practice along those lines.

OpenPGP is criticized quite persistently, and it is indeed imperfect, as even its name points out: merely "pretty good". But as with other things, the "best" kind, as judged for a particular situation, is often that which is actually used at all, while OpenPGP usually beats the proposed alternatives in its applicability and (continued) availability, and in many cases its issues are irrelevant. There is room for improvement though. For an alternative OpenPGP implementation, see Sequoia-PGP. Out of standalone (but incompatible with OpenPGP) encryption and signing alternatives, age and Minisign are somewhat prominent, while the OpenSSL CLI tool is more widely available and versatile. And then there are OTR, OMEMO, and MLS for IMs specifically. But I think it can be quite a rabbit hole, while GnuPG is versatile and good enough for most tasks, so at least it is worthwhile to look into first.

Other tactics

There are minor tactics and useful habits, some of which can be described as simply common sense:

Use strong passwords (e.g., generate those with xkcdpass), do not reuse those across services, maybe do not reuse logins and other identifying information, either. That may include things like the IP address, web browser fingerprints, and so on.
Update software (including firmware) regularly to ensure that known vulnerabilities are fixed in it, and pick reputable FLOSS options in the first place. Look into software projects such as GNU, Linux, Debian, OpenWrt, F-Droid. And security-focused alternatives such as OpenBSD and QubesOS, though be careful: some people jump into rather radical, demanding, and possibly experimental setups, do not study those sufficiently, run back into bloated and proprietary systems, and possibly keep switching between those. I personally use Debian stable with Xfce.
Think twice before publishing or otherwise sharing any private or sensitive information, as it is practically irreversible.
If you have to use public services and expose sensitive information (possibly correspondence) to them, prefer the ones that are not easily accessible by entities that can harm you. For instance, it would be reckless to discuss civil liberties over unencrypted email while living in a dictatorship and under surveillance, and using a domestic mail server on top of that.
Rely on yourself, do not assume that arbitrary systems are properly designed and make sense: it may seem like systems (software, services) made by professionals are supposed to be that way, but often they are not. Not only because of programmers' incompetence or malice, but also because odd decisions are made when multiple developers, managers, multiple interacting commercial companies, poorly composed requirements, cost-cutting, hurried development and following changes, pressure to make things more "user-friendly" are involved: there can be mostly competent and well-meaning people creating an insecure mess. Common and visible issues include password restrictions, mandatory recovery mechanisms, their silly combinations with multi-factor authentication. So do not rely on others for security, try to ensure it yourself: do not hand them private information, use end-to-end encryption when applicable, ensure that the software does not run with unnecessarily high privileges, etc.
Try to reduce the impact of possible compromises: do not "put all your eggs in one basket", do employ other risk management tactics. For instance, do not tie all your online accounts to a single email address, domain name, identity provider, or phone number. And reduce the amount of sensitive information that you have written down (even encrypted), especially on Internet-connected devices.
Pay attention to incentives. In particular, marketing of security-related services or software, often employed by commercial companies, tends to focus on selling things that are free otherwise (such as X.509 certificates, usually called "SSL certificates" by those, many years after SSL was renamed to TLS), or features that are not particularly useful, but help them to stand out, since they are not used by others, at which point their usefulness may be exaggerated. Even non-commercial projects may engage in a light version of that, with their developers looking for ways to improve existing systems, convincing themselves that some properties they could add are desirable, then promoting them.
Avoid unnecessary risks (complexity): as with engineering in general, the more complexity there is, the harder it is to analyze, and the more likely it is that something will go wrong. As an example, to turn on the lights, usually a basic mechanical switch would suffice: there is no need for complex controllers, Wi-Fi, Internet connection, some remote servers controlling your lights, and you asking them to operate those, using additional software. Yet such Rube Goldberg machines seem to be worryingly common these days. This is also related to unnecessary loss of control, and to poor availability, which is another aspect of security. In programming, this usually amounts to avoiding unnecessarily complex architectures and tools, as well as unnecessary dependencies.
Employ proper (usually standard and built-in) mechanisms when available: database roles and security policies, system users (with properly set file permissions) and capabilities. Often those are neglected by programmers, who implement such mechanisms from scratch, usually poorly, with risks and consequences similar to implementing custom cryptographic software.
Employ defense in depth.
Reduce the attack surface.
Keep learning, extending and revising practices.
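
Two of the habits above (strong passwords and properly set file permissions) can be sketched with standard tools alone. This is a fallback for when xkcdpass is not at hand, not a replacement for it; the temporary file is a hypothetical stand-in for wherever the secret would actually be kept, and stat -c assumes GNU coreutils.

```shell
# Make files created below readable and writable by the owner only.
umask 077

# 24 random bytes from the kernel CSPRNG, base64-encoded: 32 characters,
# about 192 bits of entropy.
password=$(head -c 24 /dev/urandom | base64)

# Store it in a scratch location (hypothetical; a password manager or an
# encrypted file would be the usual place).
secret_file=$(mktemp)
printf '%s\n' "$password" > "$secret_file"

# Set the mode explicitly and check it, rather than relying on defaults.
chmod 600 "$secret_file"
stat -c '%a %n' "$secret_file"
```

Setting both the umask and the explicit mode is redundant on purpose, as a small instance of defense in depth.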

Further application

Sharing

Ensuring secure practices can be interesting and fun, and one may be enthusiastic about it, which helps to follow them. Then it is tempting to share that with others, improve their security practices, which is what I am trying to do by writing this. But keep in mind that people may simply not care about it, as many do not care about their health enough to take care of it, of environment (ecology, as well as politics), of self-improvement, and of a variety of other topics that yet others do care about. Even among those who do care about information security, the threat models and views on ways to achieve it may differ considerably, also as with the other mentioned topics. And it can be difficult to idly observe people you care about doing what you think is bad for them. I think a fine balance between being unhelpful and annoying is to let people know that you are willing to help, to answer and explain things when asked to, but not to try to force those onto others. And maybe to work on useful tools, infrastructure, and documentation in order to satisfy the impulses to share and help, as well as to learn more in the process.

At work

The same principles apply to information security in organizations, when setting up a company's servers or developing enterprise software, just as with software and hardware generally. There may be more bureaucratic approaches (with occasional checklists for compliance checks), scales are different, NIST's frameworks are more useful there, but it is basically the same thing.

Mobile computing

2017-07-08T12:00:00Z

Mobile computing

Mobile computing can be a pain, especially when done in uncomfortable positions, on downsized and/or underpowered hardware, possibly in a noisy environment and while being distracted. Unsuitable conditions can also make it much harder to focus on computing-related activities. Yet a mobile computer is often better than nothing, and a comfortable workplace is not always available exactly where you want or need it to be.

These are my notes on dealing with mobile computers over time: mostly the software for underpowered computers with poor input and output capabilities, focusing on Linux-based systems.

A netbook in 2017

I've been stuck with an old netbook (Intel Atom, 1 GB of main memory) for a couple of weeks, so wrote down some of the things I've learned. That's on Debian stable (Stretch was just released; using it with "non-free" repositories to get GNU documentation), with i3 window manager, and using Emacs for most of the tasks.

Wi-Fi is one of the most important things to set. This time, both wpa_cli and wicd claimed that the password is wrong, but nmtui (NetworkManager TUI) has connected just fine – though maybe it has messed up some settings for others somehow. Wicd was hogging resources even while not doing anything useful, as Python programs tend to do, so I've disabled it – it rarely worked anyway. wpa_supplicant writes log messages such as "result=4" and doesn't document those codes in its man page, requiring source code to see what's going on. And NetworkManager just repeats those.

Firefox just starts for 30-40 seconds, and then lags even without JS. I gave up on it, and switched to w3m (emacs-w3m); web services such as online banking don't work with it, but it is keyboard-friendly, generally works, and does not lag too much. To use DDG for search, one should customize w3m-search-default-engine.

As for maps, there is FoxtrotGPS – an OpenStreetMap client that can cache and pre-download maps. It's pretty lightweight and usable.

For video playback, VLC appears to be more reliable than mplayer, even though it has its issues (including bloat, lack of documentation, and resource hogging even while idle). Unfortunately, many videos are not available via bittorrent, being only hosted on youtube.com or similar websites; youtube-dl works to extract those.

One of the painful tasks to perform without a mouse is to copy and paste things between a terminal emulator and other programs (such as GUI Emacs). Actually it's somewhat awkward with a mouse, but even worse without it. Well, Emacs-to-terminal-emulator is easy: there are M-w to copy from Emacs and shift + insert to paste into a TE. Copying from a TE can be done by selecting with a touchpad, and then M-: (mouse-yank-primary (point)) RET in Emacs, though it won't work to insert into a TE; but it turns out that one can emulate the middle mouse button by pressing the two touchpad keys simultaneously. It's not great, but works; perhaps a nicer way is to use a terminal multiplexer functionality for that, though then one may have to use nested terminal multiplexers, if they are also using those remotely. Or one could use an Emacs TE instead of a separate one, but that could also get awkward.

Speaking of terminal multiplexers: even though normally I'm not using tmux, it is more useful to run remotely with an unstable connection: a remote persistent session partially compensates for the lack of a persistent connection and/or local session.
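
A minimal sketch of such a persistent remote session: tmux's new-session -A attaches to the named session if it already exists and creates it otherwise, so the same command works both for the first connection and after every reconnect. The function name and the default session name "main" are arbitrary choices.

```shell
# Attach to a named tmux session on a remote host, creating it if it does
# not exist yet. The default session name "main" is an arbitrary choice.
remote_tmux() {
    # $1: SSH destination, $2: optional session name (without spaces)
    ssh -t "$1" tmux new-session -A -s "${2:-main}"
}
```

After the connection drops, running remote_tmux user@example.org again (with a placeholder host) would bring back the same windows and running programs.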

Doing Haskell programming would be a pain on a netbook because the REPL and cabal would require too many resources, so I've planned to use a remote server for that: just run both Emacs and a REPL process there. Didn't have to do that in those two weeks though.

xpdf, mupdf, and zathura are relatively lightweight and portable PDF viewers. Xpdf has ugly GUI buttons and a mostly useless left pane that takes space, others use partially qwerty-oriented (vi-style) key bindings (while I'm using Colemak), and the scrolling is quite messy in both mupdf and zathura (in mupdf, there's no way to tell whether you're at the end of a page or not, but scrolling by a little amount would jump a page if you're at the end; zathura may skip a line when scrolling with spacebar). Both xpdf and mupdf allow adjusting colors, zathura doesn't. So I've used both mupdf and zathura, but then discovered Emacs pdf-tools; didn't try it on a netbook, but it works nicely on a desktop: the colors are adjustable, it is keyboard-friendly, and there are no notable issues like those with scrolling in others.

Bittorrent clients are not so nice to set and use: both rTorrent and Transmission (transmission-daemon with transmission-cli) have broken Emacs interfaces, which I gave up on after brief attempts to debug, since using a netbook doesn't make debugging more fun. Transmission is nicer in that it uses a daemon, which is more suitable for a program like that. To simplify authentication, one should either use netrc (.authinfo.gpg), or disable authentication and only allow local connections:

"rpc-authentication-required": false,
"rpc-bind-address": "127.0.0.1",

Then it's not so bad to control with transmission-remote: -a and -w options to add a torrent and write files into a specified path, -l to list tasks, etc. The Transmission IRC channel (#transmission at Freenode/Libera.chat) is quite helpful, and minor bugs get fixed quickly there.
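
The transmission-remote options mentioned above can be wrapped into small helpers. This is only a sketch: the function names are made up, and it assumes the daemon listens on the default local address with authentication disabled as in the settings shown earlier.

```shell
# Add a torrent and download it into the given directory.
torrent_add() {
    # $1: .torrent file or magnet link, $2: download directory
    transmission-remote -a "$1" -w "$2"
}

# List the current torrents and their progress.
torrent_list() {
    transmission-remote -l
}
```

For example, torrent_add some.torrent ~/Downloads followed by torrent_list (the file name and directory being placeholders).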

The situation with music players is pretty similar. I've tried mpd multiple times before, and it never worked, but worked this time (well, after mpc update); mpc is usable to control it, even if not that fancy (i.e., plain CLI). There are some Emacs packages: emms supports mpd, but tries to handle all kinds of players, so the support is not so great; bongo seems to have nicer UI, but doesn't support mpd at all; mingus appears to work, but it refreshes its whole buffer all the time, resulting in annoying blinking and rendering it unusable. And there is ncmpc, which is fine; though ncmpc-lyrics has a lot of dependencies, including Ruby. Music playback seems to be one of the most CPU intensive tasks in a system with relatively little bloat.

The rest of my regular software is keyboard-oriented and lightweight: mu4e with mbsync for mail, circe/erc/rcirc for IRC, bitlbee and circe (later rexmpp) for XMPP, org-mode for notes and things like that, and other Emacs-based and CLI/TUI tools.

Later, in 2023, I have installed Debian 12.2 with Xfce on it. It takes almost 600 MB of main memory, leaving 400 for work. But by 2025, even Debian 13 dropped support for 32-bit systems.

A tablet computer in 2022

During the unfortunate events in Russia in early 2022, I decided to finally get a tablet computer while they are still available here and while I can afford one. At first I've looked into ones supported by LineageOS, but those were rather old ones, so I went for a model that is newer, and possibly can be supported later -- Samsung Galaxy Tab A8. I don't have much to compare it to (only used one Android phone out of similar devices, and just as a phone, for calls), but it appears to work and to be a tablet.

Samsung groups the awkward software required to be installed by the local government into the "law" group, so it's easy to remove it all at once. Avoiding Google and Samsung account creation, and aiming its usage as both a general household appliance (maybe for use in the kitchen, to read in bed, etc) and a useful device in an isolated wasteland if/when desktop computers will break and have no replacement, I've set F-Droid by downloading its APK, and then installed most of the software from it (though occasionally with APKs from their official websites too): OsmAnd for maps (including offline ones, from OSM); KOReader (as I use on an e-ink reader), Librera, OpenDocument Reader, and Kiwix to read things; VLC as a music and video player; Fennec (a Firefox version available from F-Droid); Sketches for basic sketching; Notes for note taking; a couple of fancier calculators with graphing; Conversations as an XMPP client; the Wikipedia client out of curiosity, but it turned out to be handy. Also Synthesia to try it out with a MIDI keyboard, which mostly worked, but that's proprietary. Termux provides plenty of regular GNU/Linux system functionality, including Emacs in its repositories.

A laptop in 2022

I hear that ThinkPad (IBM originally, Lenovo now) laptops are nice for Linux, but they are expensive; Dell and Lenovo ones are commonly suggested for Linux-based systems too. Lenovo IdeaPad laptops seem to be Linux-compatible, but cheaper than ThinkPad ones, with less advanced I/O (targeting consumers, not businesses). Here is one of the articles on the topic, linking more: On modern laptop requirements.

Issues with Wi-Fi hardware support are common; see Existing Linux Wireless drivers, ensure that there are drivers for a given laptop's hardware. Linux Hardware Database is another potentially helpful database.

One can also look into fwupd's vendor list to estimate Linux driver support from vendors, or perhaps the Linux on Laptops website, and other relevant websites linked from the linuxhardware subreddit.

I've picked a relatively inexpensive Dell Vostro 3515, which seems suitable for non-gaming tasks: a 15.6-inch display, plastic, no discrete graphics card, Ryzen 5 3450U and 8 GB of main memory (2 of those are used as video memory, leaving about 6 for the rest of the system), 512 GB SSD, and an 8P8C/Ethernet port (many laptops don't have those anymore), in addition to the common set of I/O ports.

To boot from a USB stick with a Debian 11 installer, I tried to add it in the boot options in the UEFI menu, but that was rather confusing: it asked to choose an exact .efi file, and then failed with a "Something has gone seriously wrong: shim_init() failed" message. Apparently that's common on laptops, with different Linux distributions and laptop vendors, but I haven't found descriptions of any working solutions, except for installing an older version first. What worked for me is just to choose a different .efi file, and then hold F12 during the boot to enter a boot menu, selecting the USB stick from it.

I'm always uncertain about the size of a boot partition (and sometimes about that of the ESP partition too), and how exactly to set encryption (e.g., apparently one can encrypt even the boot partition while using grub, but it doesn't seem that useful, and would lead to double password prompts). And about the swap partition too: usually just disabling it, but perhaps it's more useful on a laptop, and it's commonly suggested to use. I've settled on about 500 MB for ESP (/boot/efi), 500 MB for /boot, encrypted swap and ext4 root partition (/), without a separate /home. Then tried Debian's guided partitioning, and it did exactly that (after selecting use of encryption and of a single partition), so I just went with it. Though as of 2024, some recommend 1 GB or 2 GB for /boot, with Ubuntu apparently defaulting to almost 2 GB, and it is likely to be a pain to change later in such a setup, without reinstalling everything. After updating to Debian 13, which suggests at least 768 MB for /boot, I recreated those, reducing EFI to 200 MB, and increasing the boot partition to 800 MB.

In this case it was a Debian Xfce Live version, with non-free software and documentation (just as for the Debian 11 workstation). It is nice and almost everything works well out of the box, though DPI tends to be wrong on laptops: it is 96 by default, while laptop screens have something closer to 144. That can be adjusted in the "Appearance" settings, the "Fonts" tab. I have also adjusted the touchpad behaviour.
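
The appropriate DPI can be estimated from the panel's resolution and diagonal: the diagonal length in pixels divided by the diagonal length in inches. A minimal sketch, using the 16-inch, 1920 by 1200 screen of the 2026 laptop described further below as the example (the function name is made up):

```shell
# Pixels per inch from resolution (pixels) and diagonal size (inches).
screen_dpi() {
    # $1: width, $2: height, $3: diagonal
    awk -v w="$1" -v h="$2" -v d="$3" \
        'BEGIN { printf "%.0f\n", sqrt(w * w + h * h) / d }'
}

screen_dpi 1920 1200 16   # prints 142
```

Here sqrt(1920^2 + 1200^2) is about 2264 pixels, and 2264 / 16 is about 141.5, which rounds to the 142 used for that laptop.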

In 2023, after hardly any use, the laptop ceased to charge the battery (it is on the "pending-charge" status all the time, even at 0% charge, with any UEFI charging settings), unclear why. I have not found a way to fix it so far. Also attempts to update the UEFI/BIOS firmware via "BIOS flash update" lead to an "invalid file" error. Some suggest running it from FreeDOS, but it relies on BIOS, and the laptop appears to only support UEFI boot. Another option is Windows (possibly the live and lightweight version, Windows PE), though microsoft.com bans Russian addresses from downloading it, and bans hoster addresses where proxies are hosted as well, as of 2023 (while dell.com also refuses to serve requests from Russian addresses, but proxies work with it). Plenty of images on The Pirate Bay (which is blocked in Russia, but at least does not refuse to serve requests coming from non-residential addresses, so proxies work) though. I managed to install Windows ADK on a Windows 10 machine, then to prepare a Windows PE USB stick from it. Had to add firmware files into the "media" directory (actually added into a few locations, initially failing to find any), then to run diskpart.exe and its rescan command to find the firmware (I think it was on disk C). The firmware complained that "The AC adapter and battery must be plugged in before the system bios can be flashed", had to run it with /forceit option. Then it seemed to be working, but got stuck on "update progress: completed". I ended up resetting the laptop, then it complained that "battery pack is removed or less than 10%". I turned it off, unplugged the cable, plugged it back again, and the charging LED finally stayed on. Waited for half an hour, turned it on, it ran the BIOS (UEFI) and EC update process again, but then rebooted itself. It forgot where the boot media is, I pointed it manually to a Debian's .efi file again. Then it booted and was charging. Better to look for laptops with a sane firmware update process.

With this laptop, I have also experienced odd touchpad issues, which unfortunately seem quite common: in this case, it ceases to move the cursor after a seemingly random time after the boot, though clicking works, and it is fine again after a reboot. Sounds similar to the "Touchpad stops working after a while" issue, but there is no touchpad mode setting in this laptop's BIOS/UEFI settings. Later noticed that Bluetooth does not work well, either, at least with a Bluetooth speaker: there are occasional audible interrupts, and a stream of kernel module error messages in the logs.

A smartphone in 2022

I acquired a Google Pixel 6a (not exported here officially, so without a warranty, and no spare parts available; but at least not certified in Russia, so no mandatory malware installed on it), which has a plain Android system, and is supported by most of the alternative Android distributions. The software to set on it is similar to that on a tablet: F-Droid (with Guardian Project repositories), then Conversations, ConnectBot, OsmAnd+ (with pale road style, 150% text size), Compass (com.bobek.compass), Wikipedia, VLC, Fennec (+ uBlock Origin, noscript, HTTPS everywhere), Tor Browser (with a bridge set manually), Notes, Librera, Yaaic, Termux (with Emacs on it, as well as openssh and rsync, and allowing it access to storage, so that pictures and other files can be transferred over SSH with rsync: for instance, to synchronize the pictures -- rsync -av -e 'ssh -p 8022' --exclude='.trashed-*' user@host:storage/dcim/OpenCamera/ ~/Pictures/OpenCamera/; but by 2025 it ceased to work, since Android increasingly locks everything down). Later I added strongSwan and WG Tunnel (to connect to a home network as a "road warrior") and baresip (though still mostly using Conversations for calls), Just Another Workout Timer, Open Camera, WiFiAnalyzer, Kiwix, Aegis Authenticator (HOTP/TOTP), Orgzly (an org-mode viewer/editor), Material Files (a file manager with WebDAV, FTP, SFTP, SMB support), a couple of games (Shattered Pixel Dungeon, Mindustry), K-9 Mail, Briar.

The camera on this phone appears to produce rather bleak (washed out, desaturated) pictures, which is particularly apparent after enabling raw (DNG) picture writing. There are multiple ways to saturate it in darktable, but "tone curve" with independent CIELAB channels in particular is handy and versatile; the "denoise" module then helps to get rid of the produced noise. Perhaps one may also change the input color profile: colors look almost fine with sRGB instead of the embedded one; apparently it is a common problem with Pixel phones, see "Colours washed out from Pixel 7 DNG".

A laptop in 2026

Lenovo IdeaPad Slim 3 16AHP10 looks like a fine option: the I/O is not as good as on ThinkPad or IdeaBook ones, but it is inexpensive, has a power-efficient CPU, okay specifications, and Debian runs well on it. I have set it to "battery saving mode" in the UEFI settings, booted from a Debian live Xfce USB stick, partitioned its 512 GB disk on installation (using Debian's regular installer, not the GUI Calamares one) as follows: 1 GB for ESP, 1 GB for /boot, then LUKS with LVM on top: 80 GiB for the root file system, the rest for home; used ext4 this time, with noatime,nodiratime mount options (see SSDOptimization in Debian Wiki). Then have set a more suitable DPI (142 for a 16-inch screen with 1920 by 1200 resolution; see also: Arch Wiki HIDPI, Debian Wiki MonitorDPI, Arch Wiki LightDM HIDPI) and larger fonts (12) in Xfce settings, as well as xft-dpi=142 in /etc/lightdm/lightdm-gtk-greeter.conf, ran sudo dpkg-reconfigure console-setup to set a larger tty font size (DejaVu, 16x30), added /usr/bin/setxkbmap -option "ctrl:nocaps" into Xfce startup commands (another option is to set it via /etc/default/keyboard, e.g.: XKBOPTIONS="ctrl:nocaps,grp:shifts_toggle", XKBLAYOUT="us,us,ru", XKBVARIANT="colemak,,"), have set locale to C.UTF-8 in /etc/locale.conf, disabled the loud PC speaker with sudo rmmod pcspkr && echo 'blacklist pcspkr' | sudo tee /etc/modprobe.d/nobeep.conf, set additional DNS servers (74.82.42.42, 208.67.222.222, 8.8.8.8) for NetworkManager, installed a few Firefox extensions (uBlock Origin, noscript, FoxyProxy), configured input methods and touchpad behavior in Xfce settings, configured some of its panels, generated an SSH key with ssh-keygen, added it to the agent with ssh-add ~/.ssh/id_ed25519, disabled menu access keys in Xfce terminal preferences (so that it does not intercept shortcuts like M-f), removed some of the unnecessary packages, installed useful ones, configured a little more:

sudo apt install task-laptop task-english smartmontools dkms
sudo apt remove live-task-localisation live-task-localisation-desktop
sudo apt autoremove
sudo apt remove 'hunspell*'
sudo -e /etc/apt/sources.list # Add "contrib non-free non-free-firmware"
sudo apt update
sudo apt install systemd-timesyncd openssh-server rsync emacs mu4e isync git \
  elpa-{magit,haskell-mode,nov} ghc cabal-install texinfo mtr-tiny nftables \
  mpv vlc telnet xsltproc clementine lynx mutt irssi whois nmap ncat dnsutils \
  knot-dnsutils tmux fbreader inkscape gimp lmms musescore libxml2-utils \
  xkcdpass wireguard tinc tor obfs4proxy shadowsocks-libev kiwix kiwix-tools \
  autoconf autoconf-doc libtool pkgconf libexpat1-dev libgsasl-dev \
  libssl-dev libcurl4-openssl-dev build-essential dino-im goldendict \
  dict-freedict-{deu-eng,fra-eng,lat-eng,eng-rus,deu-rus,fra-rus,eng-deu}  \
  dict-gcide transmission pandoc audacity festival sox postgresql \
  nginx libnginx-mod-http-dav-ext aptitude jmtpfs emacs-common-non-dfsg \
  texlive texlive-plain-generic texlive-xetex texlive-lang-cyrillic \
  blueman sqlite3 libsqlite3-dev gcc gcc-doc glibc-doc-reference \
  python3-{sympy,scipy,numpy,matplotlib,psycopg,doc} \
  info guile-3.0 guile-3.0-doc oathtool iotop \
  opus-tools vorbis-tools flac cuetools wavpack ffmpeg
# ... darktable blender librecad freecad kicad evince
# prosody coturn uacme inspircd mumble-server mumble qemu-system icecast2
# libvirt-clients libvirt-daemon-system virtinst dnsmasq-base bridge-utils
# debian-reference-en debian-kernel-handbook linux-doc user-mode-linux-doc
sudo -e /etc/ssh/sshd_config # "PasswordAuthentication no"
# Disable some services: going to run them manually, as needed.
sudo systemctl disable --now tor tinc shadowsocks-libev nginx postgresql \
  bluetooth
killall pulseaudio # restart to load pulseaudio-module-bluetooth
# (for PipeWire, use libspa-0.2-bluetooth instead)
# Optionally, upload hardware information to the linux hardware database
sudo hw-probe --all --upload
# Enable battery conservation mode (remembered between boots),
# so it does not charge above 80%.
# Also can be done with TLP, STOP_CHARGE_THRESH_BAT0=1.
echo 1 | sudo tee /sys/bus/platform/drivers/ideapad_acpi/VPC2004:00/conservation_mode

Then it was left to copy personal files (dotfiles, documents, books, music, etc) onto it, and configure things further, but this is a basic initial setup.
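
The DPI value used above (142) follows from the screen geometry: the number of pixels along the diagonal divided by the diagonal length in inches. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
import math


def screen_dpi(width_px, height_px, diagonal_in):
    """Pixels per inch, computed along the screen diagonal."""
    return math.hypot(width_px, height_px) / diagonal_in


# A 16-inch screen with 1920 by 1200 resolution
print(round(screen_dpi(1920, 1200, 16)))
```

The result is only as precise as the advertised diagonal, so rounding to a whole number is reasonable.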

One may try to select "Advanced install options", "Text installer", "Expert Install" in the installer, so that there will be an option to install a "normal" system instead of "live", but it still installs some of those "live-task" packages.

Though the hardware seems to work well generally, eventually I noticed an I/O error during reading from its SSD, reporting a timeout and a controller reset, similar to "Ubuntu 24.04 freezes with "nvme nvme0: I/O" timeout error" or "NVME timeout woes". I have not yet tried adding pcie_aspm=off or nvme_core.default_ps_max_latency_us=100 nvme_core.io_timeout=3000 into /etc/default/grub myself, but probably will if it keeps happening. Apparently those things happen on some laptops. Wi-Fi, Bluetooth, touchpad, and SD card reader, on the other hand, work smoothly.

I have also set up Debian 13 and Windows 11 on another 16AHP10 laptop (not for myself), and surprisingly, while Wi-Fi worked on Debian out of the box, on Windows the drivers had to be installed manually.

E-readers

Kobo devices are supported by Koreader, among a few others. Apparently both Kobo and reMarkable are suitable for running Linux and custom software on them; see "E-ink is so Retropunk".

Power

Some devices, particularly infrequently used and mostly stationary ones, such as radio receivers, may be awkward to power: batteries are wasteful for stationary devices and inconvenient to keep in a usable state for infrequently used ones, while inbuilt cheap AC-to-DC converters are unreliable and occasionally hum, and mimicking batteries with external AC-to-DC converters is tricky (they tend to provide higher voltages than the 1.5 V of common batteries, and it is hard to connect them snugly). A good option is to pick devices relying on external AC-to-DC converters, as laptops, phones, and e-readers do: this works for both direct usage and battery recharging, and battery chargers can be very simple, not adding complexity; many devices use USB for DC power input these days.

By 2026, there are laptops supporting USB-C for power input.

Email

2016-06-28T12:00:00Z

I quite like email: perhaps not so much because of its design or technical qualities, but because nice tools exist and there are plenty of users, so it can be used for communication easily. Though even the design is not bad: SMTP by itself is quite usable, OpenPGP is better than plain text messages (though it could be much better, and there is criticism), and it is all open and federated. Some of the email criticism goes as far as to propose replacing it with something, but without proposing any viable alternative, so it does not seem like the time to abolish it yet, and here are some email-related notes.

Server

Configure (and install if needed – though usually it's present, but barely used) Postfix or other MTA. There are guides around, it is pretty simple, and actually that's it: the rest builds around it.
To not look like a spammer to other servers:
- Set DKIM: DNS record and OpenDKIM
- Set SPF DNS record
- Set DMARC DNS record
- Set reverse DNS records
- Get into DNSWL
- If IPv6 is used, make sure that a /64 subnet is assigned (as per RFC6177)
To filter spam, set postscreen and regular Postfix settings (see Postfix Anti-UCE Cheat Sheet and rob0's postscreen(8) configuration; a local caching DNS server is useful to speed things up a bit). It works well to filter the spam, while SpamAssassin (via spamass-milter, for instance) may hog too much memory for a small VM, leading to OOM killer rage. Other options include bogofilter, which would require training, and Rspamd. Postgrey may also be used.
Use Let's Encrypt to obtain X.509 certificates for TLS. ACME clients are mostly poor, but uacme and certbot are fine after some tweaking (particularly setting them to run as a dedicated user, rather than root).
Dovecot or something else for IMAP, possibly for SMTP submission, and/or synchronization over SSH (as an alternative, one can read messages via SSH on the server, retrieve them into a local maildir with rsync, or just read and compose them on the server).
Optionally, set Web Key Directory, DANE (RFC 7929), or other OpenPGP key discovery method.
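
For orientation, the SPF, DKIM, and DMARC records from the list above are all TXT records with a fixed naming scheme. A minimal sketch of their shape follows; the domain, the "mail" selector, the report address, and the chosen policies ("mx -all", "p=quarantine") are placeholder assumptions, and the actual DKIM public key comes from the key generated for OpenDKIM:

```python
def mail_dns_records(domain, selector, dkim_public_key_b64, report_address):
    """Return (name, type, value) tuples for SPF, DKIM, and DMARC."""
    return [
        # SPF: allow the domain's MX hosts to send, reject everything else
        (domain, "TXT", "v=spf1 mx -all"),
        # DKIM: public key published under <selector>._domainkey
        (f"{selector}._domainkey.{domain}", "TXT",
         f"v=DKIM1; k=rsa; p={dkim_public_key_b64}"),
        # DMARC: policy and aggregate report address under _dmarc
        (f"_dmarc.{domain}", "TXT",
         f"v=DMARC1; p=quarantine; rua=mailto:{report_address}"),
    ]


records = mail_dns_records(
    "example.com", "mail", "<base64 public key>", "postmaster@example.com")
for name, rtype, value in records:
    print(f'{name}. IN {rtype} "{value}"')
```

The printed lines resemble zone file entries, but the exact syntax depends on the DNS server or registrar interface in use.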

Dovecot can also be used for SASL (for both Dovecot and Postfix). See the private server setup and simpler server setup documentation for more precise instructions, and possibly the "user authentication" note for more options.

IPv6 and DNSBLs

DNSBL records appear for no apparent (or discoverable) reason in Spamhaus's CSS blacklist (part of ZEN), covering whole /64 IPv6 subnets at once; the delisting procedure is automated, but complicated by Google captcha and partially broken (it reports success without actually delisting, and sometimes reports a captcha error even after solving the captcha, which is quite hard when using Tor). See also: Blacklisted by Spamhaus SBLCSS.

One way to mitigate it is to stick to IPv4: smtp_address_preference = ipv4 in /etc/postfix/main.cf. Another one is to get a /64 IPv6 subnet, assuming that they don't just blacklist subnets at random.
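
To check a listing by hand, it helps to know how DNSBL lookups are formed: the address is reversed (octets for IPv4, nibbles for IPv6) and prepended to the list's zone, and an answer in 127.0.0.0/8 means "listed". A minimal sketch that only builds the query name and the containing /64, without doing any DNS lookups; the function names are hypothetical:

```python
import ipaddress


def dnsbl_query_name(address, zone="zen.spamhaus.org"):
    """Build the DNS name to query for an address in a DNSBL zone."""
    ip = ipaddress.ip_address(address)
    # reverse_pointer ends with ".in-addr.arpa" or ".ip6.arpa";
    # keep only the reversed octets or nibbles.
    reversed_part = ip.reverse_pointer.rsplit(".", 2)[0]
    return f"{reversed_part}.{zone}"


def containing_slash64(address):
    """The /64 network an IPv6 address belongs to."""
    return ipaddress.ip_network(f"{address}/64", strict=False)


print(dnsbl_query_name("192.0.2.1"))
print(dnsbl_query_name("2001:db8::1"))
print(containing_slash64("2001:db8::1"))
```

The resulting names can be resolved with any DNS lookup tool, such as dig or kdig from the packages installed earlier.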

Being marked as spam

Gmail (and maybe other large email providers) would occasionally mark/hide messages coming from smaller servers (and/or just not from themselves) as spam, even with SPF, DKIM, whitelists, messages being sent/delivered from them to you first. Not much can be done about it: once a mail server accepts a message, it is its responsibility to deliver it. Large commercial companies just keep messing up interoperability, as they always do.

Spam that gets through

Not much spam gets through with just configured Postfix and postscreen, but when it does, it should be possible to report the abuse to its ISP. Though spam from those who accept such reports and resolve the issues is unlikely, and as a last resort there are client_checks (or a firewall) to reject messages from spammy IP addresses or subnets. But one should be careful with that, since it is rather frustrating (and all too common) when you're a good actor being treated as a bad one.

Dealing with spam coming from large providers is about as tricky as sending messages to them: they deliver spam just as regular messages, don't get blacklisted by honeypots automatically, and you probably don't want to blacklist them manually because of all the legitimate users. Yet Gmail's abuse report form seems to be broken (simply nothing happens when I hit "submit": no network requests or UI changes, even with JS enabled), and their support is infamously unreachable even by their own users. Then there's the IP address's abuse contact (ripe-contact@google.com), but since they are their own hoster, it's probably also broken (as with the web UI, there's no visible reaction, not even automated; though at least there's a possibility of it working).

For more on incoming spam, see my network abuse notes.

Port 25 redirection

Residential ISPs tend to block incoming SMTP connections, which is supposed to stop spam somehow, but if it was not for that, an IP address without NAT (and preferably static) would be sufficient at least to receive email directly, without a remote server. To get around that, there are services for port redirection, though I have not tried any, and they seem to be odd and/or to cost about as much as a remote VM (similarly to paid email).

Client

Both notmuch and mu4e use xapian, which provides fast search. It is also nice to compose and read messages in Emacs (unless you are a vi user, perhaps), so I target those.

Some prefer mutt, which has a simpler configuration and is less modular, more self-contained. But its default key bindings are based on those of Vim and are QWERTY-oriented, which is awkward if you use a different keyboard layout. Thunderbird is quite bloated, but perhaps more suitable for casual users and for mail services that require OAuth. It also supports OpenPGP now, but Maildir is not quite supported. Evolution looks similar to Thunderbird. Claws Mail looked odd and half-baked all around to me each time I tried it over the years, but it is a relatively lightweight GUI client that supports OpenPGP and Maildir, though not OAuth, in which it is similar to most other lightweight clients. I focus on simpler Emacs clients (such as mu4e) in the following sections.

Option 1: IMAP + SMTP

mbsync can be used to retrieve messages via IMAP, and Postfix can also be set up locally to get more flexibility and better SASL options than the Emacs smtpmail library provides (see the user authentication note).

Option 2: SSH

An SSH-only setup allows using just SSH keys, with no SMTP or IMAP between client and server. Messages can be sent with a remote sendmail, while a remote Maildir can be accessed via sshfs, or messages can be retrieved with, for instance, doveadm sync. An example with relevant mu4e context variables:

(message-send-mail-function   . message-send-mail-with-sendmail)
(sendmail-program             . "/home/defanor/bin/example-sendmail.sh")
(mu4e-get-mail-command        .
 ,(concat "doveadm sync sh -c "
          "\"SSH_AUTH_SOCK=$SSH_AUTH_SOCK ssh mail.example.com doveadm dsync-server\""))

And example-sendmail.sh:

#!/bin/sh
ssh mail.example.com /usr/sbin/sendmail "$@"

Though an issue with this method of synchronisation (as described here, without additional customisations) is that messages removed from mu4e would be reloaded by doveadm sync, and one would have to use doveadm search and doveadm expunge instead, or switch to IMAP for cleanup. Or use the sshfs method.

Another caveat is that even setting the remote sendmail script as sendmail in $PATH won't necessarily make all the programs use it: for instance, git would still require it to be set explicitly (in .gitconfig or as a command-line argument), as "smtp-server":

[sendemail]
    smtpServer = /home/defanor/bin/example-sendmail.sh

Later it turned out to be handy to send mail this way while retrieving it via IMAP: a public provider (Yandex), which I used for work email with my own domain to avoid depending on a personal server, decided to charge for using a custom domain and disabled SMTP. This way the work server still does not have to accept SMTP connections, and it was already configured to send mail notifications from local clients (for both a website hosted there and munin), while incoming mail is handled as it used to be.

OpenPGP

GnuPG can be used with mu4e (and perhaps most of the other common Emacs MUAs) out of the box, does not require any special setup.

mu4e with git

While git-send-email(1) bypasses mu4e, receiving patches still requires to point git (or another DVCS) to a message that is normally first seen in one's MUA. I find it handy to define a custom mu4e message action that simply does (kill-new (mu4e-message-field msg :path)), so that the result can then be fed into git-am(1).

MIME part detachment

Sometimes people attach large files (particularly high-resolution images of their pets) to messages, which quickly inflates the total mail archive size, complicating backups and migrations. When the messages also contain text, it is undesirable to remove the correspondence, but some MUAs can remove individual parts. Mutt in particular can: back up the maildir, save the attachments separately if needed, run mutt -f maildir-path, then open a message, press v to view attachments, select an image, and press d to delete it. Then one may have to rebuild indexes and synchronize messages.
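
Before detaching anything, it may help to find out which messages carry the large attachments. A minimal sketch using only Python's standard email module; the function name and the sample message are made up for illustration, and for a real Maildir one would feed it the bytes of each file under cur/ and new/:

```python
import email
from email import policy
from email.message import EmailMessage


def attachment_sizes(raw_message):
    """List (filename, content type, decoded size) for each attachment."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    sizes = []
    for part in msg.iter_attachments():
        # decode=True undoes base64 or quoted-printable transfer encoding;
        # it yields None for multipart attachments, counted here as empty.
        payload = part.get_payload(decode=True) or b""
        sizes.append(
            (part.get_filename(), part.get_content_type(), len(payload)))
    return sizes


# A made-up sample message with one 1000-byte "image"
sample = EmailMessage()
sample["Subject"] = "cat"
sample.set_content("Look at this cat.")
sample.add_attachment(b"\x00" * 1000, maintype="image", subtype="png",
                      filename="cat.png")
print(attachment_sizes(bytes(sample)))
```

This only reports sizes; the removal itself is still easier to do interactively in mutt, as described above.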

Etiquette

While there are different views and advice on email etiquette, relatively common recommendations are to use plain text, to properly quote relevant parts of messages when needed, to avoid bloating messages with signatures, and of course to adhere to general writing practices. Or, in other words, to be considerate and make minimal assumptions about readers' MUAs. RFC 1855 (Netiquette Guidelines) is worth reading.

Public providers

With seemingly decent email providers (e.g., fastmail.com (banned in Russia), migadu.com), accounts cost like a hosted VM (VPS, VDS, or whatever they are called this year) or more, so it may be desirable to get a remote VM at once. Although there are slightly cheaper (or even partially free) ones as well: mailbox.org (blocked in Russia), runbox.com, mailfence.com (also blocked in Russia), posteo.de, maybe mailo.com (blocked in Russia). As for free ones, there are a few seemingly fine options, though usually they don't seem that nice after an attempt to use them; the ones commonly advertised as secure and/or ethical tend to not even provide SMTP and/or IMAP, not to mention SSH. Domain registrars tend to provide email services, though the quality varies. And there are ones like sdf.org and other pubnixes, including tildeverse ones, financed primarily with donations. Also disroot.org, dismail.de (no new registrations since 2021-05-28 though), riseup.net (rather politicized, blocked in Russia).

In 2024, I registered at Microsoft's hotmail.com, but my account (which only received one confirmation message from OpenStreetMap) was locked in a couple of days, with Microsoft claiming that it violated an unidentified part of the agreement, and that they need my phone number in order to resolve it. Apparently people who provide their phone numbers are unexpectedly locked out of Microsoft accounts as well, and there are regular stories like that about Google's Gmail, too. Though it also looks like many people do manage to use those larger services. As mentioned above, I ran into an unpleasant change of service terms with Yandex as well. I also receive spam from Gmail and report it, but spam from the same addresses keeps coming afterwards. Interaction with those larger commercial IT companies is generally a bad experience.

On reliability

My primary concern with using private email for everything has been reliability, which is actually a broader concern than just email. And if it is set up on a single machine that you also use for everything else, that machine is a single point of failure for many things.

There are potential issues with public services as well: the companies that maintain those can go out of business, usually can do whatever they want with user accounts and data (commonly selling the data, messing up authentication and blocking accounts for strange reasons, with no way to contact customer support, sometimes mangling messages, restricting access to accounts until you provide more of personal data after a policy change), with the services they provide (including turning unlimited plans into limited ones, free into paid, cheap into more expensive), etc. Even technical issues with larger services may be equally or more common: though they have dedicated staff, larger setups tend to be considerably more complex and unusual, hence less reliable.

But private ones require regular payments and maintenance. It is not much harder than maintaining your personal machine, and usually cheaper than paying for an internet connection, electricity, and so on, but it is an additional burden. A very small one, but collecting things like that is always unpleasant: there is no shortage of other ways to get into trouble simply by staying idle.

Using 2-3 servers instead of one and teaming up with others (for both payments and maintenance) may be helpful to mitigate those issues, but that requires some trust. That is a hard part, since not many people seem to care about service providers, control, etc. Maybe it is a good approach though: worrying about all the small things and possibilities may be too much, whether one uses a private or a public service.

It is particularly unfortunate when other online services depend on email, allowing email-based account recovery: that way, the loss of an email address compromises those accounts as well. Sometimes it is possible to set up two-factor authentication, with the second factor being something relatively sensible, like TOTP, which effectively disables email-based account recovery, since that recovery usually only allows resetting the password.
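
TOTP itself is simple enough to compute without a dedicated application: it is an HMAC of the current 30-second time step number, truncated to a few digits (RFC 6238 on top of RFC 4226). A minimal sketch for the common HMAC-SHA1 variant, using only the standard library; the function name is hypothetical, and the secret shown is the RFC test key rather than a real one:

```python
import base64
import hashlib
import hmac
import struct
import time


def totp(secret_base32, at_time=None, step=30, digits=6):
    """Compute a TOTP code (HMAC-SHA1) for a base32-encoded secret."""
    key = base64.b32decode(secret_base32, casefold=True)
    now = time.time() if at_time is None else at_time
    counter = int(now // step)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte select an offset
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# The RFC 6238 test key, base32-encoded
test_secret = base64.b32encode(b"12345678901234567890").decode()
print(totp(test_secret))
```

The oathtool program from the package list above should produce the same codes from the same base32 secret (oathtool --totp -b).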

Music studies

2022-03-13T09:00:00Z

While music itself is pleasant to listen to, the theory behind it, along with maths for processing or synthesizing it, as well as the process of performing it, can be quite fun.

Music theory

Music theory for nerds is a great starting point. "What Makes Music Sound Good?" is another overview and introduction, though perhaps more opinionated.

Some of the related and interesting research areas are those of music origin and purpose, such as evolutionary musicology, and how it's perceived by humans: psychoacoustics, music psychology, music and emotion.

The Ask HN: Tools to learn music theory? discussion contains a few more relevant links.

Open Music Theory looks like a nice textbook.

As with computing and maths, it is useful to study history of the subject as well, so that more of it will make sense, and it will be easier to put into a perspective. Videos on history of music can be found on YouTube, as well as on PeerTube, where some of the PianoTV videos are available.

Generation and processing

The PCM format is to audio basically what netpbm/PPM/PGM/PBM/PNM is to graphics: very simple and straightforward, can be played with ffplay and others, easy to generate programmatically and write into a file without any encoder libraries, as well as to read without a special decoder. Audio I/O libraries (e.g., PortAudio) and codec libraries (e.g., libopus) tend to work with it.
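
As an illustration of how little is needed, here is a minimal sketch that writes one second of a 440 Hz sine wave as raw signed 16-bit little-endian mono PCM. The function name, the file name, and the parameters are arbitrary choices for the example:

```python
import math
import struct


def sine_pcm(frequency_hz, seconds, sample_rate=44100, amplitude=0.5):
    """Return raw signed 16-bit little-endian mono PCM of a sine wave."""
    n_samples = int(seconds * sample_rate)
    samples = (
        int(amplitude * 32767
            * math.sin(2 * math.pi * frequency_hz * i / sample_rate))
        for i in range(n_samples)
    )
    return b"".join(struct.pack("<h", s) for s in samples)


if __name__ == "__main__":
    with open("tone.raw", "wb") as f:
        f.write(sine_pcm(440, 1))
    # Can be played with, e.g.: aplay -f S16_LE -r 44100 -c 1 tone.raw
```

Since raw PCM has no header, the player has to be told the sample format, rate, and channel count.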

DCT/DFT are often involved in processing (and in compression, also similarly to graphics), Mel-frequency cepstrum can be useful and/or interesting to look into.
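
To get a feel for what the DFT does, a naive implementation straight from the definition is enough for short signals (real code would use an FFT library instead). A minimal sketch, with hypothetical function names, that finds the dominant frequency of a signal:

```python
import cmath
import math


def dft(samples):
    """Naive O(n^2) discrete Fourier transform, for illustration only."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples))
        for k in range(n)
    ]


def peak_frequency(samples, sample_rate):
    """Frequency (Hz) of the strongest bin below the Nyquist frequency."""
    spectrum = dft(samples)
    half = len(samples) // 2
    k = max(range(1, half), key=lambda i: abs(spectrum[i]))
    return k * sample_rate / len(samples)


# 256 samples of a 32 Hz sine, sampled at 256 Hz
signal = [math.sin(2 * math.pi * 32 * t / 256) for t in range(256)]
print(peak_frequency(signal, 256))
```

The frequency resolution is the sample rate divided by the number of samples, so tones that fall between bins spread over neighbouring ones.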

Analysis

Audacity is handy for checking the spectrum and notes in it, for music transcription and other checks.

MIDI keyboard

To practice playing piano using a MIDI keyboard, one needs at least a software synthesizer and some music scores.

The keyboard in this case is M-Audio Keystation 88 MK3, which worked easily with Linux (5.10, Debian), Windows 10, and an Android tablet (Samsung Galaxy Tab A8, connected with a USB-A-to-USB-C adapter). For a synthesizer, I've used Yoshimi on Linux, LMMS (mostly with its sf2/soundfont plugin) on Linux and Windows, and Synthesia (not in F-Droid repositories, and I don't have a Google account, but grabbed an apk from their website) on Android.

MuseScore makes it possible to compose sheet music and export it into MIDI rather quickly and easily, and there are more editors and converters of that kind available from Debian repositories.

PianoBooster looks like a nice trainer, akin to GNU Typist, but I found it quite annoying that it counts it as a mistake if you press a key too soon, so switched back to just reading scores and playing from those.

I use my computer screen to read sheet music, with the keyboard stand placed behind my computer chair, so it has to be zoomed in (Xfce's zooming in is quite handy when software can't zoom in on its own), and scrolling is needed for larger compositions, but the regular computer keyboard and mouse are out of reach. The MIDI keyboard has directional keys, messages from which come from a separate MIDI port; I haven't found readily available software (possibly LMMS plugins) helping to scroll the notes from a MIDI keyboard, but it took just a small script to achieve:

import mido
from xdo import xdo

# apt install python3-mido python3-xdo libportmidi-dev python3-rtmidi

# perhaps can be done in bash, with something like amidi + xdotool

# https://gitlab.com/dkg/python-xdo/-/blob/main/xdo/__init__.py
# https://gitlab.com/cunidev/gestures/-/wikis/xdotool-list-of-key-codes

# print(mido.get_input_names())

mapping = {
    96: 'Page_Up',
    97: 'Page_Down',
    98: 'Left',
    99: 'Right',
    100: 'space'
}

x = xdo()

with mido.open_input('Keystation 88 MK3:Keystation 88 MK3 MIDI 2 24:1') as port:
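    for msg in port:
        # print(msg)
        # Only note messages have "note" and "velocity" attributes;
        # react to full-velocity key presses and ignore everything else.
        if (msg.type == 'note_on' and msg.note in mapping
                and msg.velocity == 127):
            x.send_keysequence_window(mapping[msg.note])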

Fingering Scales on the Piano is a handy outline.

Sheet music

IMSLP.org is a nice source of public domain or otherwise freely available scores (including solo piano arrangements). Additionally, there are MIDI music collections around, which are lightweight, but encode melodies, which can then be viewed as scores (e.g., with MuseScore 2). Musopen also provides sheet music, as well as recordings of classical music.

Composition

Music composition seems to be rather similar to poetry, and to arts in general: a creative process, but one can reuse a musical form, learn and use a variety of approaches and tricks (by analyzing existing works, in addition to just reading about techniques), experiment and try things out.

Music appreciation seems useful to study as well; "Inside the Score" is one of the YouTube channels focusing on that.

Ryan Leach on YouTube makes nice videos explaining the composition process. David Bennett Piano brings up plenty of interesting subjects and analyzes songs.

On singing

While I don't sing, a brief look into it suggests that, as with most other skills, it's primarily about learning, practicing, and exercising.

Yet even without singing, it is interesting to learn about vocal registers and related topics.

Possibly it is a wrong way to learn and practice, but I found it fun to set a tuner program on a phone (e.g., Tuner from F-Droid repositories), and try hitting notes.

Ear training

For Android, there is the Open Ear program, available from F-Droid: it seems to be a little buggy (making noises), but allows practicing recognition of scale degrees.

The musictheory.net website also provides exercises, including those for ear training. A similar one to train playing from memory is lend-me-your-ears.specr.net.

And I composed a shell script using SoX for practice of identification of scale degrees and intervals.
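
The arithmetic behind such exercises is short. The following is not that script, just a minimal Python sketch of the frequencies involved, assuming twelve-tone equal temperament with A4 at 440 Hz; all the names in it are made up:

```python
A4_HZ = 440.0
# Semitone offsets of the major scale degrees 1..7 from the tonic
MAJOR_SCALE_STEPS = [0, 2, 4, 5, 7, 9, 11]


def note_frequency(semitones_from_a4):
    """Frequency of a note given its distance from A4 in semitones."""
    return A4_HZ * 2 ** (semitones_from_a4 / 12)


def major_scale(tonic_semitones_from_a4):
    """Frequencies of the seven degrees of a major scale."""
    return [note_frequency(tonic_semitones_from_a4 + step)
            for step in MAJOR_SCALE_STEPS]


# C major starting from C4, which is 9 semitones below A4
for degree, freq in enumerate(major_scale(-9), start=1):
    print(degree, round(freq, 2))
```

Each frequency can then be fed to a tone generator; with SoX, something like play -n synth 1 sine 440 plays a single one-second tone.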

Motivation

Sometimes I find myself questioning the usefulness of these amateur music studies, particularly of playing instruments (while the theory and composition may conceivably be applied somehow), but it helps to view it as a recreational activity, quite similar to a game: the process itself should be enjoyable.