11

Brief

Q: How can I cryptographically secure a credentials file that is stored on disk as plaintext?

Or, rather: how can I avoid storing credentials, such as those for Gmail and other API keys, on disk at all, for existing programs that assume such an unencrypted file containing secrets?

I ask this question motivated by wanting to access Gmail using gmi/lieer and notmuch - which AFAICT use an unencrypted credentials file on disk. But there are lots of other programs that require similar credentials files.

Surely there must already be a generic solution to this problem? Something like ssh-agent, that asks the user for a passphrase and then decrypts the secrets into memory for some time. But not necessarily as fancy as ssh-agent... the agent doesn't need to do all of the crypto operations, which might differ by application or API or protocol. IMHO just decrypting the credentials file into memory would be of value.

TL;DR - You might be able to stop here without reading the rest


Some people will understand what I'm asking for from the above BRIEF section.

Others, probably not.

Surely there must be a generic solution to this problem?

Surely there is something like ssh-agent that reads such secrets from an encrypted file, asks the user for password (or better), decrypts the secrets, and keeps them only in memory for some time, so that you don't constantly have to reenter the password/etc?

Doesn't have to go quite as far as ssh-agent, where the agent does all or most of the cryptographic operations - and hence the protocol between ssh client and ssh-agent is not just "give me the credential", but must also describe the operations to be performed. Since there are lots of different protocols that have lots of different credentials with lots of slightly different operations, there may be an obstacle to creating a custom agent for each right away. But simply having a persistent agent ask the user and then decrypt credentials from disk into memory would be an improvement over nothing at all.

Surely this has already been done, in a manner that can work with lots of different apps XYZ?

But I certainly don't know of anything like this. Nor, for that matter, do any AI assistants that I have tried - although it might be a question of me not phrasing the LLM prompt or Google search correctly.

For that matter, ChatGPT suggested that I do the following:

  • encrypt the credentials file on disk
  • when I want to use it
    • temporarily create an unencrypted credentials file - on disk
    • let the client program like gmi/lieer access the unencrypted credential file while it is running
    • and when I no longer am running the client, delete the unencrypted temporary credentials file

I hope I don't need to explain how unsatisfactory this is.

Could this be done using UNIX domain sockets or FUSE? Has it been done already?

If I knew that the client application always read or replaced the entire credentials file, I could imagine having an XYZ-agent write the unencrypted secrets to the socket all at once. Or, if I do not know the access pattern, e.g. if the secret is large enough that seeks or random accesses are performed, I could imagine using a user-space filesystem like FUSE.

Q: has anyone created such a generic tool that will "decrypt secrets into memory, so it looks like an unencrypted credentials file to software that cannot handle an encrypted credentials file"?

  • Using UNIX domain sockets
  • or FUSE
  • or whatever

Even better if such a change to the namespace were limited to a parent process and its children, as you might be able to do in OSes like Plan 9 or Brazil, although AFAIK existing UNIXes like Linux do not make this easy to do.

Details

As is my wont, I provide way too much background detail for my question. For many people reasonably knowledgeable about security this much detail should not be necessary. But sometimes it may not be clear exactly what I am talking about. Sometimes I may be using incorrect terms. And so on.

Hence, I provide all this extra detail hoping to short-circuit misunderstandings.

If you truly know of an answer to my question, you can probably stop without reading all the rest.

Heck, I might as well admit it: I'm trying to short-circuit stupid nonanswers to my question. But my previous attempts to do this have not always been successful.

Motivating Example: gmi/lieer access to gmail uses an unencrypted credential file

E.g. lieer, a program to synchronize gmail with local storage, stores an unencrypted credentials file for Gmail in the filesystem. This file, .credentials.gmailieer.json, is completely unencrypted ordinary plaintext.

Excerpting: https://github.com/gauteh/lieer?tab=readme-ov-file#usage

gmi init will now open your browser and request limited access to your e-mail. … The access token is stored in .credentials.gmailieer.json in the local mail repository. If you wish, you can specify your own api key that should be used.

Of course file system permissions should make it accessible only by my UNIX login id. It is used by the gmi/lieer program to access my gmail account. But unless I am totally missing something, any program running as me can access this file. E.g. one of the umpteen sandbox escapes in web browsers might allow it to access this file. Or I might have filesystem permissions set incorrectly. Or I might have misconfigured filesystem/disk encryption, and other user IDs on my machine may be able to access it. Etc.

I thought that it was standard/best practice for security that plaintext secrets should never be stored on disk. I have long been somewhat surprised by how many software systems require credentials like API keys to be stored on disk. I have usually avoided using such systems, although it gets in the way of doing things like Google API development that require such API keys. Or I might use such systems for work, but resist using them for stuff that is personal. However, I really do want to use such systems for personal stuff. Not just personal software development, but for gmi/lieer to access my Gmail account, which is about as personal as I can get, much more sensitive to me than a GitHub project.

This is not just an issue with gmi/lieer. Many programs, many software systems, require you to store credentials like API keys on disk. I don't think I've encountered any of them that keep them encrypted on disk.

Except, of course, for ssh/ssh-agent and gpg/gpg-agent, where the credential files are protected not only by file system permissions but also by a passphrase, and are decrypted only within the agent's process memory.

ssh-agent => no plaintext credentials on disk

Except, of course, for ssh/ssh-agent and gpg/gpg-agent.

  • Where the key files are protected
    • not only protected by file system permissions, but are also encrypted by a passphrase.
      • When you load an ID into ssh-agent
        • it asks you for the passphrase,
        • reads the encrypted key file(s),
        • and decrypts them into its process memory.
        • ssh-agent is persistent, so you only have to do this once in a while
      • ssh, if configured appropriately,
        • won't be able to run without asking ssh-agent to "do stuff".
        • communicates with ssh-agent via a UNIX domain socket
        • ssh-agent actually does all or most of the public key computations
        • => ssh itself does not have the private keys

I thought that it was standard/best practice for security that

  1. Plaintext secrets should never be stored on disk

    • with the possible exception of swap files,
      • but that should be a solved problem
    • so even if somebody can access the raw data on disk you should be safe
      • e.g. if you don't have full disk or filesystem encryption
      • and the disk drive is accessed outside of its "home" OS
  2. Plaintext private keys may be stored in ssh-agent's process memory

    • not in the ssh client program
    • and should not be accessible by any other programs,
      • even running as the same user in the same machine
      • possibly also not by more privileged users like root or admin
        • with the possible exception of debuggers
        • but that also should be a solved problem (although not so much in my experience)

Going beyond ssh-agent…

Skip this section. It isn't really necessary for my question, except that it helps me organize the issues in my mind. Also, if somebody can tell me that items (3) and (4) below are in much wider use than I currently know, I'd love to hear about it.

Items (1) and (2) are, AFAIK, the state of the art, or at least of practice. But they leave some holes open to hardware attacks, e.g. with a logic analyzer. These have been addressed by certain academic and industry projects, but as far as I know such protections are much less common:

  3. In most present-day systems plaintext secrets may be stored in DRAM

    • unless the programmer has been very careful to keep them only in registers
      • and has control of context switches that might save the registers to memory
    • but various hardware memory encryption proposals and products prevent even this from happening
      • e.g. data may be stored unencrypted in cache, but may be encrypted between cache and DRAM.
  4. and similarly various proposals and products ensure that all of the traffic on buses and connections etc. where you could attach a logic analyzer is encrypted.

  • 2.5: I'm actually a little bit uncomfortable that the ssh client/agent communication is done via a UNIX domain socket
    • AFAIK any process running with the appropriate user ID can access that socket, and can get the ssh-agent to do stuff
    • AFAIK the UNIX domain socket is protected only by filesystem permissions
      • AFAIK the ssh-agent and ssh program do not talk via an encrypted channel
    • Although the fact that the UNIX domain socket's name can be somewhat random reduces exposure. And I know that some operating systems - not standard Linux, AFAIK - allow permissions to be restricted not just by user ID but also by executable ID, or position in the process tree.
    • You can of course use UNIX user IDs to accomplish this, but as far as I know this is not commonly done.

Example: plaintext file containing gmail credentials for gmi/lieer credential

gmi/lieer, a program to synchronize gmail with local storage, stores an unencrypted credentials file for Gmail in the filesystem

Excerpting: https://github.com/gauteh/lieer?tab=readme-ov-file#usage

gmi init will now open your browser and request limited access to your e-mail. … The access token is stored in .credentials.gmailieer.json in the local mail repository. If you wish, you can specify your own api key that should be used.

The credential lieer stores looks like the below. I hope that I have edited out anything that is sensitive. "Gibberish" is of course what looks like random letters and numbers with occasional punctuation, the sort of stuff one associates with a credential.

{"access_token": "xyzzy ~200 bytes of gibberish",
 "client_id": "~40 bytes of gibberish.apps.googleusercontent.com",
 "client_secret": "~20 bytes of gibberish",
 "refresh_token": "~100 bytes of gibberish",
 "token_expiry": "2024-09-15T07:16:24Z",
 "token_uri": "https://accounts.google.com/o/oauth2/token",
 "user_agent": "Lieer",
 "revoke_uri": "https://oauth2.googleapis.com/revoke",
 "id_token": null,
 "id_token_jwt": null,
 "token_response": {"access_token": "~200 bytes of gibberish",
 "expires_in": 3599,
 "scope": "https://www.googleapis.com/auth/gmail.readonly https://www.googleapis.com/auth/gmail.modify https://www.googleapis.com/auth/gmail.labels",
 "token_type": "Bearer"},
 "scopes": ["https://www.googleapis.com/auth/gmail.labels",
 "https://www.googleapis.com/auth/gmail.readonly",
 "https://www.googleapis.com/auth/gmail.modify"],
 "token_info_uri": "https://oauth2.googleapis.com/tokeninfo",
 "invalid": false,
 "_class": "OAuth2Credentials",
 "_module": "oauth2client.client"
}

This is completely plaintext, although of course it is accessible only by my Linux user id.

It is used by the gmi program to authenticate to Gmail. If it is not present, I cannot access my Gmail. If it is present, I don't get asked for my password, etc.

Unless I am missing something, this credential could allow almost any program that can read this file to access my Gmail. This concerns me. It's not just gmi/lieer; many programs do this. I'm not going to bother listing more examples, but just googling API KEY should yield a lot of them.

Is it just obsolete legacy software?

Possibly, but IMHO not completely.

E.g. the gmi/lieer source code and/or documentation indicates that it is using an old Gmail API, and should be upgraded. Possibly more recent APIs solve this problem - but not as far as I can tell. Possibly there is already a generic OpenAuth-agent - but not that I can find. AFAICT Google really prefers to keep the OpenAuth stuff in its own libraries, used by Google Chrome and other web browsers, and has not really done much to support command line or other non-browser utilities. They would really prefer that you did not use such utilities, unless Google wrote them. They only grudgingly support such utilities, allowing you to obtain API KEYs, etc. If there are security holes caused by storing such credentials unencrypted on disk, they will just use that as more evidence to justify locking things down, and locking other software out.

Anyway: If there is a generic OpenAuth-agent (probably not called that) - I would love to hear about it.

But anyway, furthermore: even if there is a generic OpenAuth-agent, there are a lot of existing programs that assume an unencrypted credentials file on disk. There would be value in having a generic solution for these, until they can be upgraded. Assuming they can be.

10
  • I emphasize that this is a generic issue, not just gmi/lieer related. It would be nice to have a solution that worked for any program that expects an unencrypted credential file. But a gmi/lieer solution would be better than nothing. Where would I post such a question? unix.stackexchange.com has several lieer threads, as does emacs.stackexchange.com. But none is about credentials. AFAIK crossposting is discouraged. For that matter, OpenAuth might also have a not completely generic solution. And of course this is a security issue. Commented Oct 20, 2024 at 23:45
  • 1
    Could you state that as a question? Could you create an encrypted filesystem to store your credentials, and control access to which processes can mount the decrypting overlay? Commented Oct 21, 2024 at 1:00
  • I think this should be quite straightforward to implement generically with a named pipe? A named pipe can usually be provided in all places where a process expects a file. And you can code whatever logic you want when anything reads from the named pipe. One possible solution would be a resident process , which can be accessed via a command to provide the decryption password. If it is present then a read from the named pipe will decrypt the data on the fly, if not it will just error out with a message "please provide password before accessing this file" Commented Oct 21, 2024 at 8:45
  • I don't know if a unix process could also block on the read and prompt for a password. It would need to put the currently running process (gmail) in the background and capture STDIN to read the password and after that put the original process (gmail) back into the foreground. Commented Oct 21, 2024 at 8:47
  • 4
    What is your threat model? I.e., what attacks are you trying to defend against? Someone stealing the disk drives and reading your password? Someone compromising a process on your server and reading your password? Something else? Questions on security measures really can't be answered without understanding the relevant threats. Commented Oct 22, 2024 at 12:11

4 Answers

8

You've gone down a lot of possible paths on this and done a lot of investigation of the problem space; well done (plus vote on the question).

But I think you've got the answer; there is no generic solution. Tools like ssh-agent (or gpg-agent) require the software to be written to work with them. If the software is written to just read a file off disk then it won't ever talk to those agents.

And realise these agents aren't secure in a multi-user environment; root on your machine can access the credentials stored in your agent.

Other tools (eg rclone) can encrypt their config files and require you to enter a password to use them.

Even on Apple devices, applications need to be written to talk to the keychain to securely store credentials.

So can we kludge this? It depends on your paranoia levels (and the software).

For example, you could create an encrypted filesystem. It doesn't need to be a partition; it could be a file you create in the existing filesystem and mount. See luks and related tools. This will keep your data secure from someone stealing your disk because it requires you to enter your password to unencrypt the disk. Then you can potentially symlink or bindmount the config files into this filesystem.

As before, this won't protect you from root users or malware that can read your files, because the file is readable once mounted. But it does protect against theft (e.g. a lost laptop). Although you might want to consider encrypting the whole filesystem if that's your concern. If everything is encrypted then your config files are too!

If you want to require a reauthentication event every so often ("oh it's been an hour since I last entered my password") then things get harder. I don't know of any 'out of box' solution. I would consider writing a FUSE filesystem that can be mounted and would require a password prompt once an hour to continue to decrypt.
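The "re-enter your password every hour" part can be illustrated separately from FUSE. Below is a minimal Python sketch, not an existing tool: `ExpiringSecret` is a hypothetical name, and how the plaintext gets decrypted in the first place is assumed to happen elsewhere. It only shows the bookkeeping an agent or FUSE daemon would need to forget a secret after a time limit.

```python
import time


class ExpiringSecret:
    """Hold a decrypted secret in memory for at most `ttl` seconds.

    Hypothetical helper for illustration only. The clock is injectable
    so that the expiry logic can be exercised without waiting.
    """

    def __init__(self, ttl, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._data = None
        self._loaded_at = None

    def load(self, plaintext: bytes) -> None:
        # Keep a mutable copy so it can be overwritten on expiry.
        self._data = bytearray(plaintext)
        self._loaded_at = self._clock()

    def get(self) -> bytes:
        expired = (
            self._data is None
            or self._clock() - self._loaded_at >= self._ttl
        )
        if expired:
            self.forget()
            raise PermissionError("secret expired; re-enter the passphrase")
        return bytes(self._data)

    def forget(self) -> None:
        if self._data is not None:
            # Best-effort wipe; the interpreter may still hold other copies.
            for i in range(len(self._data)):
                self._data[i] = 0
        self._data = None
        self._loaded_at = None
```

A caller would catch `PermissionError`, prompt for the passphrase, decrypt again, and call `load()`. Note that this does nothing about swap or about copies made by whoever receives the bytes from `get()`.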

0

Encrypted filesystem and appropriate filesystem access control measures. Unless you have an application written to keep the credentials secure across the entire path, you can't get better than that without impacting the entire system.

An encrypted filesystem ensures that the data is encrypted at rest, and that the data is only accessible to trusted OS functionality. We can't get better than this. Because of swap, any data that isn't specifically flagged to be kept in memory, can leak to disk.

Appropriate file access control mechanisms, whether that is in the form of ACLs, dedicated users, or some other mechanism, can ensure that only the relevant programs (and root) are allowed to access the files from the OS. If you assume the OS and root are trusted, which are necessary prerequisites, then this will protect you from any other source having access to that data.
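As a small illustration of the access-control half of this, here is a Python sketch that reports obviously loose permissions on a credentials file. `permission_problems` is a hypothetical helper, and it only looks at classic owner/group/other mode bits, not at ACLs or at what root can do.

```python
import os
import stat


def permission_problems(path: str) -> list:
    """Return human-readable problems with a credentials file's permissions."""
    st = os.stat(path)
    problems = []
    if not stat.S_ISREG(st.st_mode):
        problems.append("not a regular file")
    if st.st_uid != os.getuid():
        problems.append("file is owned by another user")
    if st.st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append("group or others have access bits set")
    return problems
```

An empty list means the file is a regular file, owned by the calling user, with no group or other bits set (for example mode 0600). It says nothing about other processes running as the same user, which is the threat raised in the comments below.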

5
  • One of the most common vulnerabilities is to code running as the user themself - e.g. a web browser running as the user, within a sandbox, but where there has been a sandbox escape that allows read access to files outside the sandbox. Commented Nov 6, 2024 at 0:25
  • @KrazyGlew At which point, from the standard Unix security model, it's game over. The OS does not protect processes running as the same user from intentionally reading each other's memory. And the credentials are going to exist in an unencrypted form in memory, and with a high likelihood on disk. The traditional Unix answer to that threat is to run your entire sensitive application under a different user from the high risk ones. Commented Nov 6, 2024 at 2:05
  • What you say reminds me of a computer architect who once said "Why do we care about buffer overflows and stack smashing for ordinary user programs? The code only runs as the user that was broken into." Does your web browser run as a setuid process, or does it run as you? (Probably you.) Do you want JavaScript code that has escaped the user mode sandbox in your web browser to be able to read arbitrary files on your file system and exfiltrate them? Commented Nov 6, 2024 at 4:20
  • @KrazyGlew: consider ssh/ssh-agent: encrypting private keys in ~/.ssh doesn't protect against root with a debugger. Heck, I see that ssh-agent on my system runs as me, so unless there's something else that prevents me from reading that process's memory, it's insecure by your lights. Nevertheless, it's worth encrypting. ssh-agent now uses mlock/mlockall, but the early versions did not. Nevertheless, ssh-agent without mlock* was better than nothing, and with mlock* is better still. Commented Nov 6, 2024 at 4:38
  • I'm saying the real answer is to not run the browser as you. Run it as a dedicated user. So when it does escape? It's limited to only accessing stuff in the browser. ssh-agent is specifically written to minimise the amount of time that stuff lives unencrypted in memory, and to use mlock to keep it there, minimising the window. Something like gmi? It isn't; you have to assume the credential will be sitting unencrypted for the lifetime of the application, spilling to unallocated heap and swap, and that it will be recoverable. Commented Nov 6, 2024 at 11:09
0

I thought of this approach after I posted my question, and I will post it here as answer not because I think it's the right or best answer - although it may be - but because I would appreciate comments, thoughts as to whether it affords any security advantages, or problems.

possible approach: xyz-agent spawns client(s) with file descriptors

Start off much like ssh-agent or gpg-agent:

  • Credentials file encrypted on disk.
  • Persistent agent - I'll call it xyz-agent.
  • User asks agent to load encrypted file, provides passphrase
  • Decrypted in memory. Possibly only briefly.

Here's where it differs from ssh-agent:

Rather than starting the client howsoever you may want, and having the client connect to the agent by a channel that is explicitly visible as something in the filesystem, eg a UNIX domain socket or a named pipe...

Instead let the user who started the persistent agent ask the agent[*] to start an instance of the client.

The agent starts the client instance with a file descriptor across which the unencrypted data can be communicated.

The file for the unencrypted data NOT being present in the filesystem.

My first thought was that the file descriptor could be something like stdout from the agent piped to stdin on the client. Not necessarily fildes 0 and 1 -- the streaming nature of pipes being the main thing.

  • Pipe file descriptors are of course not necessarily visible in the filesystem
  • Nor are they easily accessible by other processes
  • Except of course via /dev/stdin... but that's process private
  • or by /proc/PID/fd/0, etc - which is not process private, but which is normally configured to be readable only with global privilege like root, or possibly by the uid/euid of the running process

Of course, in the UNIX programming model it is typically impossible to secure against root. But if /proc/pid/fd is not accessible by the user, then you have secured against other user processes. Somewhat. Depending on other aspects, like how the user asks the agent to spawn a new client instance.

Of course, many clients require a file pathname for the unencrypted credentials file. Coding such clients to use a file descriptor is possible and reasonably generic: it could work with multiple clients (x1-client, x2-client, etc.) and would not require the agent to know the nature of the client or the protocols involved. But it doesn't help with completely legacy clients that still require a file/path NAME rather than a file descriptor.

/dev/stdin, /dev/fd/N, /proc/self/fd/N etc. come to our assistance here - if the client requires a pathname, but allows it to be specified rather than hardwired. Or if it is hardwired, e.g. in a place like the current working directory, and if a symlink to /dev/stdin etc. works. That should probably work, unless the app expects to be able to unlink the file/pathname it is provided.
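To make the idea concrete, here is a minimal Python sketch of an agent-like parent handing plaintext to a child over an inherited pipe, with the child given a `/dev/fd/N` pathname. It assumes a Linux-like system where `/dev/fd/N` refers to the process's own descriptor N. `spawn_with_secret` and the inline demo client are hypothetical, and the secret is assumed to be already decrypted and small.

```python
import os
import subprocess
import sys


def spawn_with_secret(argv_template, secret: bytes):
    """Run a client that reads `secret` from an inherited pipe.

    Each "{path}" in argv_template is replaced with /dev/fd/N, where N is
    the pipe's read end. Returns (exit status, client stdout).
    """
    read_fd, write_fd = os.pipe()
    try:
        argv = [a.replace("{path}", f"/dev/fd/{read_fd}") for a in argv_template]
        # pass_fds keeps read_fd open (same number) in the child; every
        # other descriptor above 2, including write_fd, is closed there.
        proc = subprocess.Popen(argv, pass_fds=(read_fd,), stdout=subprocess.PIPE)
    except BaseException:
        os.close(write_fd)
        raise
    finally:
        os.close(read_fd)  # the agent itself never reads from the pipe
    with os.fdopen(write_fd, "wb") as pipe_in:
        pipe_in.write(secret)  # closing the write end gives the client EOF
    out, _ = proc.communicate()
    return proc.returncode, out


if __name__ == "__main__":
    # Stand-in for a legacy client that insists on a credentials pathname.
    client = [sys.executable, "-c",
              "import sys; sys.stdout.write(open(sys.argv[1]).read())", "{path}"]
    print(spawn_with_secret(client, b'{"refresh_token": "not-a-real-token"}'))
```

The plaintext never gets a name in the regular filesystem, but a pipe cannot be seeked and can only be read once, so this only suits clients that read the file straight through a single time.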

When I first started thinking about this I was worried that the file descriptor would have to be treated as a stream or pipe. Which would break if the client expects to seek.

Then I started thinking about whether sockets can be passed like this. TBD: I should look this up when I have time.

But I subsequently realized that the agent could create an actual file on disk, open file descriptors for both read and write, and then immediately unlink the file so that it no longer appears in the filesystem. Vanilla UNIX systems keep the file storage around even after unlinking, as long as there are file descriptors around. (Even before /dev/stdin.) This breaks in some slightly non-vanilla UNIX-like systems, but you can't keep everyone happy.
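A minimal Python sketch of the create-then-unlink trick follows. `unlinked_file_with` is a hypothetical name; it uses the standard `tempfile.mkstemp`, which creates the file exclusively with mode 0600, and then removes the name straight away.

```python
import os
import tempfile


def unlinked_file_with(data: bytes, directory=None) -> int:
    """Return a read/write descriptor to a file that no longer has a name."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.unlink(path)  # the storage lives on while a descriptor is open
        view = memoryview(data)
        while len(view) > 0:
            view = view[os.write(fd, view):]  # handle short writes
        os.lseek(fd, 0, os.SEEK_SET)  # rewind so a reader starts at byte 0
    except BaseException:
        os.close(fd)
        raise
    return fd
```

Unlike a pipe, the returned descriptor supports seeking. The caveat is that the plaintext is written to whatever filesystem holds `directory`, so unless that is a memory-backed filesystem the bytes can reach the disk even though the name is gone.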

I was worried about the brief window of vulnerability after the agent has created the file in the filesystem, before it is unlinked. A bad guy could conceivably open such a file themself, and keep the file descriptors around.

But then I remembered that this is a well-known race condition for temporary files. Surely there should be a syscall to atomically create a temporary file that does not exist in the filesystem. fd = mkstemp(template) doesn't quite do that, since the file briefly has a name, but it does address the race condition where the bad guy has write access to the directory containing the file, and that may be good enough for government work.

And perhaps a syscall has been created that atomically creates a temporary file without being present in the filesystem.
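On Linux there are two such calls that I am aware of: open() with the O_TMPFILE flag (since Linux 3.11), which creates an unnamed file in a given directory, and memfd_create() (since Linux 3.17), which creates an anonymous memory-backed file. A minimal Python sketch of the second, using the standard library's `os.memfd_create` (Python 3.8+, Linux only); the function name `secret_memfd` and the label passed to the kernel are hypothetical:

```python
import os


def secret_memfd(data: bytes, name: str = "xyz-agent-secret") -> int:
    """Return a seekable descriptor backed by anonymous memory (Linux only)."""
    if not hasattr(os, "memfd_create"):
        raise OSError("os.memfd_create is not available on this platform")
    # MFD_CLOEXEC: not inherited across exec unless explicitly passed on,
    # e.g. via subprocess's pass_fds.
    fd = os.memfd_create(name, os.MFD_CLOEXEC)
    try:
        view = memoryview(data)
        while len(view) > 0:
            view = view[os.write(fd, view):]  # handle short writes
        os.lseek(fd, 0, os.SEEK_SET)
    except BaseException:
        os.close(fd)
        raise
    return fd
```

The file never has a name in any directory, so there is no create-then-unlink window. Its pages are ordinary memory, though, so they can still be written to swap unless swap is encrypted or disabled.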

How does the user ask the agent to spawn a client instance

Must be done carefully, but looks doable.

One approach might be for the agent to act as a shell in which the user types commands. This naturally restricts access to the child process subtree. I think that works, but I dislike the complexity of implementing a shell. Q: can a generic agent spawn a generic shell so that a generic user can spawn a generic command that talks back to the generic agent? (Without going through the filesystem, using the same sort of file descriptor trick.)

Or, the user (i.e. a user process that is not necessarily a child of the agent) could send requests to the agent via the usual UNIX domain socket, visible in the filesystem. I suspect this is not much better than having the xyz-client app talk to the xyz-agent through sockets visible in the filesystem. Slightly better - at least the plaintext data stays out of the visibly named filesystem.

summing up: ssh-agent topology vs this xyz-agent concept

Skipping past how ssh-agent actually does the encryption work, whereas the xyz-agent passes decrypted data. That's orthogonal: the xyz-agent approach could do the crypto work just like ssh-agent. It might actually be more secure. That's just not the problem I set out to address.

Both approaches have encrypted data on disk, decrypted in agent memory.

The main difference is that the persistent ssh-agent approach only needs the agent to respond to ssh-client requests, and user requests to load or unload credentials once running.

Whereas the xyz-agent approach needs the agent to manage one more channel: (1) requests to load/unload credentials; (2) requests to spawn new xyz-client instances; and (3) the actual communication with the spawned client, which may be restricted to child process subtree, depending.

Arguably a bit more complicated. But arguably a bit more secure than ssh-agent. And certainly more secure than having unencrypted credentials in the filesystem at rest.

conclusion

I haven't tried coding it up yet. (I'm typing this on my phone in a hospital waiting room, and won't have time to code it up for a few days.)

But I would appreciate comments - can you see any problems with this approach? If fundamentally broken, save me wasting time coding it up.

14
  • Two key flaws: Seeking, and swap. If the application requires the ability to seek, that will error on a pipe. And if the application loads the credential into memory, and hasn't been implemented to protect against the threat environment that would justify this effort, then it's likely just going to leak the credential to the swap file and thus disk at some point. Commented Oct 22, 2024 at 12:27
  • Just because something is in application memory doesn't mean it won't end up on disk. An application must be specifically written to specify to the kernel that a particular chunk of data must not touch disk. Commented Oct 22, 2024 at 12:29
  • Unless I missed it, you don't have the use case where a program may update a file (eg refreshing oauth tokens). Many programs don't overwrite the existing file; they create a new file, verify it's written correctly, then move the new file over the old. At this point the unprotected file is now on the file system. So programs would need to written explicitly to use your method of "secrets access"... and at this point it sounds like you're re-inventing keyrings. Commented Oct 23, 2024 at 9:18
  • Thanks, user1937198 and Stephen_Harris for relevant comments. As opposed to blindly parroting stuff people learned in security 101 about threat models. Commented Nov 5, 2024 at 23:55
  • 1
    Even if mlockall survived exec, it is unreasonable to expect arbitrary code to never require any paging services. Actually, IMHO mlock and mlockall are shortsighted: user specific encrypted swapping would be more general. But of course harder to implement. Commented Nov 6, 2024 at 1:01
-1

The question is so long. For a security question you have to say what you are trying to protect against. I'm assuming you are trying to protect against someone who may physically steal your computer, but will not return it in a secretly backdoored state. You are not asking about hackers and malware, because this would be the wrong place to ask. And you physically own the machine or you trust its physical owner, because otherwise you shouldn't expect there to be a solution.

What you have described is exactly tmpfs. It's automatically mounted at /dev/shm, and sometimes also at /tmp. You could create more by:

mount -t tmpfs not_used_could_be_anything path_to_existing_empty_directory

It causes everything later written into the directory to be stored only in memory (and also swap, unfortunately). So you could decrypt the credentials into it. You could link to it using ln -s if the file has to be in a specific location that has other files.
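A minimal Python sketch of that last step, assuming the credentials have already been decrypted by some other tool and that `/dev/shm` is a tmpfs mount. `place_on_tmpfs` and the file name `credentials.json` are hypothetical:

```python
import os
import tempfile


def place_on_tmpfs(plaintext: bytes, link_path: str, tmpfs_dir: str = "/dev/shm") -> str:
    """Write plaintext to a private file under tmpfs_dir and symlink it to link_path."""
    # mkdtemp creates a directory readable only by the current user (0700).
    private_dir = tempfile.mkdtemp(prefix="creds-", dir=tmpfs_dir)
    target = os.path.join(private_dir, "credentials.json")
    # O_EXCL refuses to reuse an existing file; 0600 keeps other users out.
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(plaintext)
    os.symlink(target, link_path)  # e.g. where the client expects its file
    return target
```

The caller is responsible for removing the symlink and the private directory when the client is done. Any process running as the same user can still read the file while it exists.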

There are also solutions that also included the encryption and decryption part:

  • encrypts a file system.
  • encrypts a directory.
  • fscrypt on ext4, the builtin encryption of a file system.

It's recommended to also encrypt the swap partition or swap file using dm-crypt, while using any of the solutions. It's only painful if you do this, and also want to use hibernation. It's possible but I don't know a distro that makes this easy to use.

It's also recommended to mount /tmp as tmpfs if you work with the files using programs that were not written with security in mind (such as LibreOffice).

2
  • What you really need to do is evaluate a whole list of threat models all at the same time, and see which ones you cover and which ones you don't. People who only think about 1 threat model at a time are people who create security holes. Commented Nov 6, 2024 at 0:22
  • @KrazyGlew It's your question. It's your responsibility. I wrote in the first paragraph what you shouldn't expect from this answer, and you should check whether it fits your use cases. Those assumptions are mostly there because you compared it with ssh-agent, and I assume you shouldn't expect it to do better than ssh-agent. If you already blindly believed ssh-agent could solve those problems without thinking about the threat model, that's really your problem, not mine. Commented Nov 6, 2024 at 8:02
