reconstructing ext4 inode structure after folder deletion

Question

The folder holding my WhatsApp data on an ext4 partition was deleted in its entirety (a file/folder delete, not a partition delete). As I understand it, inodes hold the metadata tree for files, including their names, locations, etc. The problem is that the raw files (without original file names and paths) can be recovered, but the original file names and paths are missing. It was the app that deleted the folder. The folder was mounted on an ext4 partition on an Android phone. An image of the partition has already been copied to a backup.

What I did

I tried the following:

- TestDisk: no valid partition found.
- extundelete and ext4magic: no luck. The inode data came out empty.
- foremost: recovered some data, but the metadata is missing (explained further down).
- Third-party tools (EaseUS Data Recovery, Hetman, R-Studio): please see the screenshot. The folder and files I am looking for show up as 0 bytes.

Running a full deep scan takes about 8 hours and gives me RAW files without their original file names (e.g. filexxx.jpg), which is of no use. I need the file names and metadata, i.e. the original file names and their paths. Somehow the recovered raw files should be mapped to the 0-byte file names, or the original file names should be recovered some other way. See the attached picture.

Is there any utility that can reconstruct the deleted inodes, or any methodology for doing it?

genericuser99 · Accepted Answer · 2024-07-18 16:10:37Z

tl;dr: While you may theoretically be able to recover some file data/content, you're unlikely to recover full file paths and file names.

This may be a little late as a response, but perhaps someone will find the information useful.

I think you may not be fully understanding what an inode is and how they're used within an EXT4 file system. File names are not metadata in the sense that inodes store metadata. Inodes have no reference to a file name, and given an inode alone, you cannot recover a file name, or file path. File names are really only of benefit to a user.

EXT4 (like most file systems) splits data up into manageable logical blocks of a fixed number of bytes (e.g. 4096 bytes). These blocks are grouped into block groups (e.g. 32768 blocks each). Typically (depending on how the file system was configured when it was created; this won't be the case if the FLEX_BG feature is enabled, for example), every block group will have a group descriptor table (describing where the bitmaps, inode table and data blocks start within that block group), a block bitmap (which tracks which blocks within that block group are in use), followed by an inode bitmap (which tracks which inodes within that block group are in use), followed by an index node (inode) table, which is just an array of inode structures, where the inode number of a file is effectively just an index into these tables (inode numbers start at 1 and run consecutively across the block groups). The remaining blocks within that block group are then used for storing file data. An inode stores things like the permissions for a file, the size of the file, file attributes, file owner, etc., and importantly, where (i.e. which block, usually within that block group) the file data is stored (which may be split across multiple blocks). The inode does not store the name of the file (because the file might not actually have a name, for example because it's not intended for a user to interact with it, such as the EXT4 JBD2 journal).
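The "inode number is an index into the inode table" idea can be sketched with a little arithmetic. This is only an illustration: the `Geometry` values are assumed defaults (on a real image they come from the superblock), and `inode_table_block` stands in for the value you would read from the relevant group descriptor.

```python
from dataclasses import dataclass


@dataclass
class Geometry:
    """Assumed example values; a real image stores these in its superblock."""
    block_size: int = 4096
    inodes_per_group: int = 8192
    inode_size: int = 256


def inode_group_and_index(ino, geom):
    """Map an inode number to (block group, index inside that group's table)."""
    if ino < 1:
        raise ValueError("inode numbers start at 1")
    return (ino - 1) // geom.inodes_per_group, (ino - 1) % geom.inodes_per_group


def inode_byte_offset(index, inode_table_block, geom):
    """Byte offset of the inode within the image.

    inode_table_block is the first block of that group's inode table,
    as recorded in the group descriptor (hypothetical value here).
    """
    return inode_table_block * geom.block_size + index * geom.inode_size


geom = Geometry()
group, index = inode_group_and_index(2, geom)       # inode 2 is usually "/"
offset = inode_byte_offset(index, 1000, geom)       # 1000 is a made-up block
print(group, index, offset)
```

Note that nothing in this lookup involves a file name: the inode is found purely by number.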

Every file on an EXT4 file system has an associated inode, and every inode has some associated data, which may be data blocks, or data within the inode itself. Inodes don't get deleted as such, only the content is modified and reused. A directory is a file - it has an associated inode, and some data. A directory is essentially an array of directory entry (dentry) structures, and each entry essentially consists of an inode number, the size of that entry, the file name length, and the file name itself. In effect, a directory just maps an inode number to a file name. A directory always contains, as a minimum, an entry for itself (the . entry, so it knows its own inode number), and an entry for its parent directory (the .. entry). This produces a hierarchical structure going back to the root (the / directory; typically inode number 2). There's some caching magic and similar that happens, but effectively, to find the data associated with the file path /etc/passwd for example (e.g. to read this file), we parse the directory entries for / to find the inode number for the "etc" entry, then parse that inode to find where the data is for the "etc" directory, then parse those directory entries to find the inode number for the "passwd" entry in there, and can then parse the inode for "passwd" to find where the data for that file is, and finally read that data. Given only the inode for "passwd", we cannot work our way backwards to find the file name, or the path to that file.
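As a rough illustration of that lookup, here is a minimal sketch that parses directory blocks and walks /etc/passwd. The 8-byte entry header (inode number, record length, name length, file type, all little-endian) follows the usual ext4 on-disk directory entry layout; everything else is a simplification for the demo: the tiny 64-byte blocks, the inode numbers 12 and 34, and the `dir_blocks` dictionary, which maps a directory's inode number straight to its data instead of going through the inode's block mapping.

```python
import struct

HEADER = "<IHBB"  # inode, rec_len, name_len, file_type


def make_entry(ino, name, rec_len, file_type):
    """Build one directory entry (demo helper; 1 = regular file, 2 = directory)."""
    raw = name.encode()
    return struct.pack(HEADER, ino, rec_len, len(raw), file_type) + raw.ljust(rec_len - 8, b"\0")


def parse_dir_block(block):
    """Return {file name: inode number} for the live entries in one block."""
    entries = {}
    pos = 0
    while pos + 8 <= len(block):
        ino, rec_len, name_len, _ftype = struct.unpack_from(HEADER, block, pos)
        if rec_len < 8 or pos + rec_len > len(block):
            break  # corrupt or truncated entry
        if ino != 0:  # inode 0 marks an unused entry
            name = block[pos + 8:pos + 8 + name_len].decode("utf-8", "replace")
            entries[name] = ino
        pos += rec_len  # rec_len is what chains one entry to the next
    return entries


def resolve(path, dir_blocks, root_ino=2):
    """Walk a path one component at a time, starting at the root inode."""
    ino = root_ino
    for part in (p for p in path.split("/") if p):
        ino = parse_dir_block(dir_blocks[ino])[part]  # KeyError if missing
    return ino


# Made-up 64-byte "blocks": / is inode 2, /etc is inode 12, passwd is inode 34.
dir_blocks = {
    2: make_entry(2, ".", 12, 2) + make_entry(2, "..", 12, 2) + make_entry(12, "etc", 40, 2),
    12: make_entry(12, ".", 12, 2) + make_entry(2, "..", 12, 2) + make_entry(34, "passwd", 40, 1),
}
print(resolve("/etc/passwd", dir_blocks))
```

The mapping only runs one way. The name "passwd" lives in the data of the "etc" directory, so once that entry is gone, inode 34 on its own says nothing about what the file was called.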

I am going to assume that the WhatsApp directory was deleted from the device in a standard way. Generally, what happens when a file is deleted on an EXT4 file system is that the respective directory which contains the file or directory is looked up, and the file is unlinked from the directory. This consists of removing the directory entry for that file (though it is slightly more complicated: the previous entry is made to point past the removed one to the next entry instead, and the file name is then possibly zero-ed out, depending on settings). The result is that there is now no way to associate a file path/file name with the file. Next, the inode associated with the deleted file is modified. The mapping where the inode points to where the file data is on disk is zero-ed out. Although the file data is still on disk, there is now nothing that points to where it is. The corresponding bits in the block and inode bitmaps are also cleared, link counts decremented, block count / file size zero-ed, etc. At this point (again, depending on how the file system was created, but typically), the file data still exists on disk but there is nothing pointing to where it starts and ends, etc., though it is still technically recoverable.

This is where file carving software like testdisk/photorec, foremost, scalpel, etc. can help. Such software typically looks for known file signatures, like known file headers (e.g. "MZ" for a Windows executable, "PK" for a zip file, the bytes FF D8 FF (often followed shortly by "JFIF") for a JPEG file, etc.), and continues carving until it finds a likely end (if the file has a known tail signature, this can be easy; otherwise, it'll often just keep extending the candidate and storing each iteration of the file until one seems valid - this is why things like photorec will result in multiple versions of the same file).

Typically, for efficiency, new files will be stored in the same block group as their parent, or an adjacent one if no blocks are free to accommodate them. For example, let's assume I create the file /usr/local/share/myfile.txt. If the inode and data for the directory /usr/local/share are in block group 42, the file system will typically try to create the inode and data for myfile.txt in block group 42, too, and then move on to 41 or 43 if there are no free blocks or inodes. Thus, we can limit our search and carving to those specific block groups rather than the whole block device if we're trying to recover data.
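To make the carving idea and the "limit the search to block group 42" idea concrete, here is a deliberately naive sketch. It assumes 4096-byte blocks and 32768 blocks per group (the example figures used above), assumes the file is stored contiguously, and stops at the first JPEG end marker it sees, which real carvers do not do because embedded thumbnails carry their own end marker. The function names are made up for this example.

```python
JPEG_SOI = b"\xff\xd8\xff"  # start-of-image signature
JPEG_EOI = b"\xff\xd9"      # end-of-image marker


def group_byte_range(group, block_size=4096, blocks_per_group=32768):
    """Byte range (start, end) that one block group covers in the image.

    Assumes the first data block is block 0, which is usual for 4 KiB blocks.
    """
    size = block_size * blocks_per_group
    return group * size, (group + 1) * size


def carve_jpegs(data, max_size=20 * 1024 * 1024):
    """Return (start, end) offsets of JPEG-looking byte runs in data."""
    found = []
    pos = 0
    while True:
        start = data.find(JPEG_SOI, pos)
        if start < 0:
            break
        end = data.find(JPEG_EOI, start + len(JPEG_SOI), start + max_size)
        if end < 0:
            pos = start + 1  # header without a plausible tail: skip it
            continue
        found.append((start, end + len(JPEG_EOI)))
        pos = end + len(JPEG_EOI)
    return found


# Synthetic data standing in for bytes read from the image.
sample = b"junk" + JPEG_SOI + b"abc" + JPEG_EOI + b"more" + JPEG_SOI + b"x" + JPEG_EOI
print(carve_jpegs(sample))
print(group_byte_range(42))
```

On a real image you would seek to the start offset returned by `group_byte_range`, read that region, and pass it to the carver. What comes back is still only content: the offsets tell you where a picture sits, not what it was called.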

However, this also means that if we delete myfile.txt, its blocks are released, and we then create another file in /usr/local/share, we'll likely reuse the deleted file's inode and overwrite its data blocks with new data. In this instance, we won't be able to recover any file data.

Either way, once a file is deleted, or even just unlinked, the file name, and thus the full path to that file, will usually be lost. There is a potential niche scenario where this may not be the case, however. When a directory grows sufficiently large, it changes from a simple linear list of directory entries (which has to be scanned from start to finish to find a name) to hashed directory indexing (an htree, the dir_index feature). This allows a directory to cope with significantly more files, and to be searched more quickly and efficiently, by calculating a hash of the file name and using it to find the block that holds the matching entry. In switching to hashed indexes, depending on how many files are present, some of the original directory entry data will be overwritten, but not necessarily all of it (some may remain as obsolete slack, and not be overwritten). In such a case, theoretically, old file names could be pulled from here, but I'm not sure any automated tools actually look for such edge cases.
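For what it's worth, pulling old names out of directory slack could look something like the sketch below. It relies on the unlink behaviour described earlier (the previous entry's record length is stretched over the removed entry, so the old bytes may survive) and on the usual 8-byte ext4 directory entry header. The plausibility checks, the function names and the tiny 64-byte demo block are assumptions made for illustration, not how any particular tool works.

```python
import struct

HEADER = "<IHBB"  # inode, rec_len, name_len, file_type


def needed_len(name_len):
    """Bytes an entry really needs: 8-byte header plus name, rounded up to 4."""
    return (8 + name_len + 3) & ~3


def find_ghost_entries(block):
    """Look for stale (inode, name) pairs in the slack after live entries."""
    ghosts = []
    pos = 0
    while pos + 8 <= len(block):
        ino, rec_len, name_len, _ = struct.unpack_from(HEADER, block, pos)
        if rec_len < 8 or pos + rec_len > len(block):
            break
        cur = pos + (needed_len(name_len) if ino else 8)  # slack starts here
        end = pos + rec_len
        while cur + 8 <= end:
            g_ino, g_rec, g_nlen, g_type = struct.unpack_from(HEADER, block, cur)
            name = block[cur + 8:cur + 8 + g_nlen]
            plausible = (
                g_ino != 0 and g_nlen > 0 and g_type <= 7
                and g_rec >= needed_len(g_nlen)
                and cur + needed_len(g_nlen) <= end
                and b"\0" not in name and b"/" not in name
            )
            if plausible:
                ghosts.append((g_ino, name.decode("utf-8", "replace")))
                cur += needed_len(g_nlen)
            else:
                cur += 4  # entries are 4-byte aligned
        pos += rec_len
    return ghosts


def entry(ino, rec_len, name, ftype=1):
    """Demo helper: build one directory entry padded to rec_len bytes."""
    return struct.pack(HEADER, ino, rec_len, len(name), ftype) + name.ljust(rec_len - 8, b"\0")


# "b.jpg" (inode 41) was unlinked: the rec_len of "a.jpg" now spans 40 bytes,
# but the 24 bytes of the old entry are still sitting behind it.
block = (entry(12, 12, b".", 2) + entry(2, 12, b"..", 2)
         + entry(40, 40, b"a.jpg")[:16] + entry(41, 24, b"b.jpg"))
print(find_ghost_entries(block))
```

Even when this works, it only gives you a name and an inode number. If that inode's block mapping has been zero-ed, or the inode has since been reused, you still have to tie the name back to carved content some other way.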

The other, final possibility is the JBD2 journal. EXT4 has a journal that essentially records transactions: when files are written to disk, deleted, etc. This is typically only metadata, but the journal can be configured to also journal actual file data, too. The idea is that if the file system becomes corrupt in some way, and something bad happens before file data is written out to disk, the system has a record of what data was to be written to which block on disk. Generally, each journal transaction has some associated data blocks (typically metadata such as inode table blocks, but could be file data if that's enabled), and the location (block address) where that data should appear on disk. By parsing the journal, we could theoretically find some data which may be of value, as well as where that should've been written to on disk (which we can then use to better target for carving, for example, assuming the data isn't in the journal itself).
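A first step in parsing the journal is simply finding its administrative blocks. The sketch below scans a journal dump for the JBD2 block header, which is three big-endian 32-bit fields (magic number 0xC03B3998, block type, transaction sequence number). It assumes the journal has already been extracted to a byte string (it typically lives in inode 8) and only lists the blocks it finds. Decoding the descriptor tags that say where each logged block belongs is left out. The function name and the 16-byte demo blocks are made up for the example.

```python
import struct

JBD2_MAGIC = 0xC03B3998
BLOCK_TYPES = {
    1: "descriptor",     # lists target locations of the data blocks that follow
    2: "commit",         # marks the end of a transaction
    3: "superblock v1",
    4: "superblock v2",
    5: "revoke",
}


def scan_journal(journal, block_size=4096):
    """Return (block index, block type, sequence) for each JBD2 header found."""
    hits = []
    for off in range(0, len(journal) - 11, block_size):
        magic, btype, seq = struct.unpack_from(">III", journal, off)
        if magic == JBD2_MAGIC and btype in BLOCK_TYPES:
            hits.append((off // block_size, BLOCK_TYPES[btype], seq))
    return hits


def fake_block(btype, seq, block_size=16):
    """Demo helper: a tiny journal block carrying only a header."""
    return struct.pack(">III", JBD2_MAGIC, btype, seq).ljust(block_size, b"\0")


# Synthetic 4-block journal: superblock, descriptor, one logged block, commit.
journal = fake_block(4, 0) + fake_block(1, 7) + b"\0" * 16 + fake_block(2, 7)
print(scan_journal(journal, block_size=16))
```

Blocks sitting between a descriptor and its commit are the logged copies. If one of those happens to be an older copy of the WhatsApp directory's data, it can be fed through a directory entry parser to pull names out.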

I think that should be a fairly comprehensive overview of why it's difficult to recover deleted files on an EXT4 file system (assuming there is no "recycle bin" set up on the respective system, or it's not a VM with snapshots or something similar!), and why it's even less feasible to actually recover full file names and paths.
