Fsearch, a fast file search utility for Unix-like systems
This is a great tool, I use it every day, but it's far from its Windows-based original, Everything.
Also this is apparently abandoned, which makes me extra sad, because it lacks a few crucial features:
- being able to just remove a file from the index if you delete it from the app directly (instead it shows a window saying this is "soon" gonna be implemented)
- while I understand that an indexing service is a more complex job, at least caching the index would be nice, because right now when I start the app I have to wait for it to index everything again, but usually I search for files that have existed for a long time, not ones that were created between my FSearch uses
So yeah. A cool, dead, and incomplete piece of software ¯\_(ツ)_/¯ From time to time I look for a better alternative; if you happen to know one, let me know.
Hi, I'm the author of this little piece of software.
> Also this is apparently abandoned, which makes me extra sad, because it lacks a few crucial features:
Personally I wouldn't call it abandoned. I'm still working on it — not as often as I'd like to, but I'm still making progress towards the next release. Though it's still months away from being released.
> - being able to just remove a file from the index if you delete it from the app directly (instead it shows a window saying this is "soon" gonna be implemented)
That feature is already implemented, but there are no official builds with it yet, because other parts of the software haven't been updated after the rewrite of the database engine (e.g. loading/saving the database file is broken at the moment). Once the old feature set is working again, I'll publish the first official dev builds of the 0.3 release.
> - while I understand that an indexing service is a more complex job, at least caching the index would be nice, because right now when I start the app I have to wait for it to index everything again, but usually I search for files that have existed for a long time, not ones that were created between my FSearch uses
This is already supported and part of the stable releases. The index is cached and loaded on application start, so you can search right away, even while the new index is being built. You can also disable automatic index updates at application launch, if you prefer manual or scheduled index updates instead. Or do you mean something else?
What's the best way to help you with this project?
Most definitely code and documentation contributions, and to a degree donations — although I clearly prefer the former, simply because it keeps me engaged the most: talking with others about this project, getting new ideas, etc.
But I really welcome any sort of contribution. For example, there are also things like improving the main interface language (English isn't my first language, so there's likely room for improvement there), helping with support questions and bug reports, artwork, ...
Last commit 2 weeks ago? Doesn't look dead. Perhaps not actively developed by original author but seems they're still acting as a maintainer and willing to take PRs.
Me too - strangely, one of the reasons I stay on Windows is Everything (https://www.voidtools.com/) - it is just so useful.
I had Everything installed but didn't use it as much as expected. I've gravitated toward FileLocator Pro instead, which uses extremely fast metadata table searching instead of requiring an index (I don't use the Agent Ransack features).
Not a Linux expert, but out of curiosity did you try Recoll when you looked at other platforms? (https://www.lesbonscomptes.com/recoll/pages/index-recoll.htm...)
What is the pattern syntax you use most often?
Just regular fuzzy finding is the most common. For example, to attach a file instead of faffing around clicking in the open dialog, I type into Fluent Search (a frontend to Everything) and it enters it in the current context, i.e. the open dialog.
It has a few other qualifiers, like big:, huge:, etc. to search for large files. The same goes for time and so on.
On the rare occasion you need it, you prefix with regex: and you get everything you need.
It's very useful that both Everything and Fluent Search integrate into Explorer: if you right-click on a search result you get the same context menu as you would in Explorer, drag and drop works, etc. The issue with every other tool on every other OS is the lack of this feature. You basically get a fully featured virtual directory for every search result; you simply cannot beat that.
Wait, I didn't know Everything can do full-text search? Is it like the search preview in macOS?
Sorry, I meant that you can fuzzy find the filenames, not the contents.
For the latter, the Explorer top-right search bar can do it, surprisingly. As for a tool with an index, Recoll comes to mind.
You can use the "si:" prefix in Everything alpha to query Windows index.
https://www.voidtools.com/forum/viewtopic.php?f=12&t=9793
I have a use scenario (finding backlinks in a folder containing thousands of Markdown files) where this method fits perfectly and returns results instantaneously (even ripgrep takes a second or two to find all the backlinks, as it doesn't use an index).
Unfortunately the text snippet showing the context that you get in Explorer is not easily retrievable via the Windows API. I made a suggestion [1] to the author and he seems open to implementing it.
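For reference, the unindexed version of that backlink search can be sketched like this (plain grep shown so it runs anywhere; ripgrep takes the same fixed-string pattern via `rg -F -l`; the vault path and note name are made up):

```shell
# Tiny stand-in for a folder of markdown notes (hypothetical names).
rm -rf /tmp/vault
mkdir -p /tmp/vault
printf 'See [[Project Plan]] for details.\n' > /tmp/vault/daily.md
printf 'Nothing relevant here.\n' > /tmp/vault/other.md

# List every markdown file that links back to "Project Plan".
# ripgrep equivalent: rg -F -l '[[Project Plan]]' /tmp/vault
grep -rFl --include='*.md' '[[Project Plan]]' /tmp/vault
```

An indexed search (like the si: method above) skips the per-file scan entirely, which is why it stays instant on thousands of files.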
Wow, that's amazing!
Let me know how recoll works for you, if you try it. It can search within XML stuff like word docs, PDFs and can even do OCR.
I use AstroGrep for searching file contents on Windows; it's pretty fast. Not nearly as fast as Everything, but I guess that's the nature of searching full contents. Just restrict by file type and filter out node_modules and it's fast enough.
dnGrep is great. It can also use Everything.
The most important thing is to sort by date modified by default. Usually, the file you want is very new.
After that I mostly just use "pic:" or "path:".
Try this one
fd (fdfind on some distributions)
rg or fd?
rg if you want to find stuff in files
fd if you want to find stuff in filenames
fzf for when you want a fuzzy menu type of search on top of this.
I can't recommend fzf enough; you can do some really powerful stuff with it. If you don't know it: it gives you a fuzzy search over whatever you pipe into it. It's powerful because it can also run commands ("preview") on the currently selected entry/line and display their output in a separate pane.
So you could build a thing that, e.g., lets you search and multi-select (enqueue) your music collection and display audio metadata for each entry using a custom script.
Or a blazingly fast PDF-content searcher that opens the PDF in the end. The possibilities are endless.
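As a concrete sketch of that PDF-searcher idea (assuming `fzf`, `pdftotext` from poppler-utils, and `xdg-open` are installed; the directory default is a placeholder). The script is only written to a temp path here, since fzf is interactive:

```shell
# Minimal fzf-based PDF picker: choose a PDF by name, preview its text, open it.
# Assumes fzf, pdftotext (poppler-utils) and xdg-open are available.
cat > /tmp/pdfpick <<'EOF'
#!/bin/sh
dir="${1:-$HOME/Documents}"
find "$dir" -name '*.pdf' |
  fzf --preview 'pdftotext {} - 2>/dev/null | head -40' |
  xargs -r xdg-open
EOF
chmod +x /tmp/pdfpick
```

Searching *inside* the PDFs rather than by filename would mean grepping the `pdftotext` output per file first, but the fzf half stays the same.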
Edit: Here's a short video showing my basic git log alias: https://youtu.be/9W27D8lrn-s
gl is aliased to:

    git log --all --pretty=oneline --pretty=format:"%Cgreen%h%Creset %s %Cred%d%Creset" --color=always | fzf --ansi --preview 'git show --pretty=medium --color=always $(echo {} | cut -d" " -f1)' | cut -d" " -f1

If you want to put this (fantastic, thank you very much for it!) command in your git toolbox, personally I did it like this:
1) Create a script called `git-l`, put it somewhere in your PATH, and make it executable.
2) Use `git l` to invoke it.
You will avoid escaping hell and you can expand/complicate it much further. The same extension principle works with other CLI tools like `kubectl`.
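For illustration, the script body is just the pipeline from above, un-escaped (written to /tmp here for the sake of the example; in practice it would live on your PATH as `git-l`, and the redundant `--pretty=oneline` from the alias is dropped since `--pretty=format:` overrides it anyway):

```shell
# The git log + fzf pipeline as a standalone `git-l` script (no alias escaping).
cat > /tmp/git-l <<'EOF'
#!/bin/sh
git log --all --color=always \
    --pretty=format:'%Cgreen%h%Creset %s %Cred%d%Creset' |
  fzf --ansi \
      --preview 'git show --pretty=medium --color=always $(echo {} | cut -d" " -f1)' |
  cut -d' ' -f1
EOF
chmod +x /tmp/git-l
```

git treats any `git-<name>` executable on PATH as the subcommand `git <name>`, which is the whole trick.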
rg isn't really a file finder, it's a grepper.
rg --files
"Performance. On Windows I really like to use Everything Search Engine. It provides instant results as you type for all your files and lots of useful features (regex, filters, bookmarks, ...). On Linux I couldn't find anything that's even remotely as fast and powerful."
https://www.lesbonscomptes.com/recoll/pages/index-recoll.htm...
"Recoll finds documents based on their contents as well as their file names."
"Recoll will index an MS-Word document stored as an attachment to an e-mail message inside a Thunderbird folder archived in a Zip file (and more… ). It will also help you search for it with a friendly and powerful interface, and let you open a copy of a PDF at the right page with two clicks. There is little that will remain hidden on your disk."
Recoll serves a different purpose, as it's primarily built to index and search within your personal documents. That's why it doesn't work well when you point it at the root folder in an attempt to search the entire system of millions of files, and that's also the reason it's not as fast: it's doing more work (parsing complex file formats, searching within a more complex database structure and more data, ...).
FSearch is primarily built to find files on the entire system instantly (by that I mean that all results should be ready by the time you press the next character while typing), based on their name, size, time, filetype, etc. This is less work than what Recoll does and that's why it is much faster.
That's why I also use both tools.
"FSearch is primarily built to find files on the entire system instantly "
I am not sure, but I think my Bodhi "everything starter" solves this problem for me. If I look for something more specific, I use recoll.
Do any of the modern filesystems on Linux cover this very important use case of instant search anywhere, like good old NTFS does (that's how you get Everything's awesome performance)?
Between mlocate and rg I've never felt like I needed anything else.
Have you tried anything else, like Everything? It's common to feel no need if you haven't experienced the awesomeness of immediate feedback, where many refinements are a shortcut away while keeping the same responsiveness. E.g., toggling case sensitivity: no need to pull your query from history, edit flags, and rerun it. Or adding another column with extra data: same thing, no need to remember rarely used flags or run a help command to find them.
FSearch is the best locate front-end for Linux, but sadly I've got many gripes with it: crappy drag-and-drop, no daemonization to minimize to the system tray, closing the app resets the clipboard for some reason (EXTREMELY annoying when you open it to copy a file path); the list is long. Not to mention that locate itself doesn't auto-update the index with filesystem changes.
I miss Everything :(
Not to detract from your overall point, but the app is not resetting the clipboard. In X11, if you close any app, you lose anything you copied from it. This is because X11 windows do message passing to emulate a clipboard: your browser, say, and FSearch send messages to each other to transfer data in few-KB chunks (yes, there's an incremental transfer protocol where you have to support all kinds of irrelevant clients, even if the last such client died before the turn of the century). So when one of the windows closes, it's gg. It's a pretty convoluted, over-engineered idea when a single file like ~/.clipboard would have sufficed.
For a more articulated rant from 20 years ago, see the rant file in https://github.com/porridgewithraisins/x11cp. Also shameless plug.
To fix this, use a clipboard manager. Also shameless plug https://github.com/porridgewithraisins/coffee-pesto :)
How does this search filesystems quickly on linux?
Author here. The app works in two steps:
Step one is building an index of the file system. This is simply done by walking the filesystem. The resulting index is stored in RAM and in a file. On the next app start the index is loaded from that file, which is much quicker than walking the file system.
Step two is using this in-RAM index for searching. This scales really well with the number of CPU cores, and on modern systems a normal case-insensitive substring search should finish almost instantly even with a few million files.
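The two steps amount to the classic cached-list approach, which can be sketched in shell (a scratch directory stands in for the whole filesystem here; the real index is an in-memory structure, not a flat file):

```shell
# Step 1: walk the filesystem once and cache the result (the "index").
rm -rf /tmp/fsdemo
mkdir -p /tmp/fsdemo/docs
touch /tmp/fsdemo/docs/Report-2024.txt /tmp/fsdemo/docs/notes.md
find /tmp/fsdemo -type f > /tmp/fsdemo.index

# Step 2: answer searches from the cached index, not the filesystem.
# A case-insensitive substring match over the cached paths is what
# makes results appear instantly.
grep -i 'report' /tmp/fsdemo.index
```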
The next release will support file system monitoring with inotify and fanotify to keep the index updated. Although this has some drawbacks.
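The monitoring idea can be sketched with `inotifywait` from inotify-tools (this is not FSearch's actual implementation, just the general shape of event-driven index updates; the script is written to a file rather than run, since it loops forever):

```shell
# Sketch: keep a flat-file index in sync with filesystem events.
# Uses inotifywait (inotify-tools); not FSearch's real code, only the idea.
cat > /tmp/index-monitor <<'EOF'
#!/bin/sh
index="$HOME/.cache/demo.index"
inotifywait -m -r -e create,delete,moved_to,moved_from \
            --format '%e %w%f' "$HOME" |
while read -r event path; do
  case "$event" in
    CREATE|MOVED_TO)   printf '%s\n' "$path" >> "$index" ;;
    DELETE|MOVED_FROM) grep -vxF "$path" "$index" > "$index.tmp" &&
                       mv "$index.tmp" "$index" ;;
  esac
done
EOF
chmod +x /tmp/index-monitor
```

One of the drawbacks mentioned: events only arrive while the monitor is running, so changes made while it's down still have to be discovered some other way.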
> This is simply done by walking the filesystem.
This is the part I'm wondering about. Everything scans the filesystem very fast and there is no way it is just using 'stat' on every file then diving into the directories.
Are you just using stat from C to walk the filesystem or are you doing something else?
I've used sqlite to cache filesystem results and it is also extremely fast once everything is in there, but I think a lot of approaches should work once the file attributes are cached.
On NTFS Everything reads the MFT, which is sequential on disk.
Then on subsequent starts it reads the NTFS update journal to see what changed.
> Everything scans the filesystem very fast and there is no way it is just using 'stat' on every file then diving into the directories.
The last time I checked, Everything worked by using the AV calls Microsoft provides; any time a file is written, the name (and other metadata) can be written to a log that Everything can check once every 5 seconds or so.
If I thought there was any money at all to be made from providing an Everything equivalent[1] on Linux, I'd spend the week or so to write it, but as far as I can tell there's just no market for something like this.
[1] By that I mean "similar in performance and query capabilities"; I would obviously need more time than that to hook into the common file-open dialog widgets (Gnome/KDE/etc) so that users could run their queries straight from existing file dialog widgets.
What you are talking about is file change notifications. A huge part of Everything's speed comes from reading the master file table that other people mentioned, so you would need a way to quickly read file table entries on linux.
https://learn.microsoft.com/en-us/windows/win32/devnotes/mas...
> What you are talking about is file change notifications. A huge part of Everything's speed comes from reading the master file table that other people mentioned, so you would need a way to quickly read file table entries on linux.
Not a problem. And no, I'm not talking about inotify either, and I'll additionally index the contents of (text) files as well with a negligible additional performance hit. It can be done as fast as, or faster than, `Everything`.
TBH, if I thought I could make even $100 in donations from this, I'd start it tomorrow, but absolutely no one misses ultra-fast searching when they don't have it.
Even on Windows, the number of users who go out and look for something that searches as fast as Everything is a rounding error - statistical noise. Now go and divide that fractional percentage of Everything users on Windows by 100 to get the number of Linux users who might use this.
> Not a problem. And no, I'm not talking about inotify either, and I'll additionally index the contents of (text) files as well with a negligible additional performance hit. It can be done as fast as, or faster than, `Everything`.
Please enlighten us how that would work.
> TBH, if I thought I could make even $100 in donations from this, I'd start it tomorrow, but absolutely no one misses ultra-fast searching when they don't have it.
You can easily make $100 in donations with this. I did it with this piece of software while it was still less performant and powerful and without an official release and by only mentioning it on one or two forums.
If the software delivers what you're saying, I guarantee you that this will lead to more than $100 per month in donations.
Firstly, I appreciate you taking the time to engage with me. I hope that I didn't come off as dismissive of your hard work or disrespectful of what you have delivered.
My point was that the incentive to produce something like `Everything` on Linux just isn't aligned with what the target market wants or needs. I think that what you have produced satisfies what the target market wants.
> You can easily make $100 in donations with this.
Honestly, I'm still very skeptical that even a $100 target is possible. I have to also admit that I've looked at stuff in the past, gone "No one could possibly want that, at that price point" and been horribly wrong.
I feel like I should test the claim of how many people want an `Everything` equivalent on Linux: I'll make it, package it with a MVP GUI, and mention it on a few forums in addition to posting a show HN here.
For ideal reproducibility, let me know which forum(s) you initially got traction on. I'll try to mirror your marketing as closely as possible.
I'd also like to know how you went about benchmarking performance against existing stuff for your project; for comparison against `Everything` I was thinking that the metric to beat is delta between file creation/removal time and the time that the file shows up in the results set (or index).
Like the other responder here, I also think that once something is in the index, retrieval time should be almost instant, so there's not much point in benchmarking "How long does it take to update results after every keypress" once that metric falls below 100ms or so.
> I hope that I didn't come off as dismissive of your hard work or of being disrespectful of what you have delivered.
Not at all, I'm just incredibly curious how you'd solve the issue of creating an index of a filesystem as fast as Everything, because I've thought and read a lot about it in the last couple of years and haven't found any solution at all, nor did I find any other software that achieved something like that on Linux systems.
> For ideal reproducibility, let me know which forum(s) you initially got traction on. I'll try to mirror your marketing as closely as possible.
One post on the Arch Linux forum and one on the r/linux sub on Reddit. From there I got enough users to get more than $100 in donations. Nowadays it's obviously more.
> I'd also like to know how you went about benchmarking performance against existing stuff for your project;
Everything has an extensive debug mode with detailed performance information about pretty much everything it's doing. That's how I know exactly how long it took to create the index, perform a specific search, update the index with x file creations, deletions or metadata changes etc.
> for comparison against `Everything` I was thinking that the metric to beat is delta between file creation/removal time and the time that the file shows up in the results set (or index).
That's not particularly interesting, because it's quite straightforward to achieve similar performance.
The crucial metric is how long it initially takes to create the index and then update it when the application starts (i.e. finding all changes to the filesystem which happened while the application wasn't running). That's where Everything excels, and it's the part for which I and others haven't found a solution on non-Windows systems (without making significant changes to the kernel, of course). The best and pretty much only solution I'm aware of is the brute-force method of walking the filesystem and calling stat, which obviously is much slower.
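For illustration, that brute-force walk amounts to something like this with GNU find, which stats each entry as it goes, so `-printf` can emit path, size, and mtime in one pass (a scratch directory stands in for `/` here):

```shell
# The slow path: recursively visit every entry and stat it.
rm -rf /tmp/walkdemo
mkdir -p /tmp/walkdemo
touch /tmp/walkdemo/example.txt
# %p = path, %s = size in bytes, %T@ = mtime as a Unix timestamp (GNU find).
find /tmp/walkdemo -printf '%p\t%s\t%T@\n'
```

On millions of files this means millions of directory reads and stat calls, whereas reading the MFT is one mostly sequential scan.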
> The crucial metric is how long it initially takes to create the index and then update it when the application starts (i.e. finding all changes to the filesystem which happened while the application wasn't running)
That's what I meant by " delta between file creation/removal time and the time that the file shows up in the results set (or index)."
Basically, how fast can we update the index?
> That's where Everything excels and to which I and others haven't found a solution on non-Windows systems (without making significant changes to the kernel of course).
I've got a couple of out-there ideas which may or may not pan out, one of which was, indeed, a kernel module.
Another idea is to deploy the indexer as a daemon with the applications all using IPC to query and update it. This will give the query applications a significant advantage on startup compared to Everything.
As for updating the index timeously, I've got a few ideas there as well. Walking the filesystem starting at `/` for each update will result in only performing index updates once a day or so (hence, the reason I expressed the metric as a delta) so I feel that that is no good.
I'll do an implementation and try to message you (if you want to check it out) because code talks louder than words :-)
> Basically, how fast can we update the index?
The two core issues are:
1) How do you quickly get a list of all files and their attributes from the filesystem, without recursively visiting all directories? The kernel has no such functionality and neither do most filesystems (except NTFS with the MFT, which is how Everything solves that).
2) How do you know which files have been modified on a filesystem since it was last mounted on the system or since your monitoring daemon/application was running the last time? This information also needs to be stored persistently on the filesystem (like the USN journal, which Everything is using) if you want to avoid slow recursive traversals.
> I've got a couple of out-there ideas which may or may not pan out, one of which was, indeed, a kernel module.
Well, the problem is, my kernel isn't the only kernel that changes the filesystems I'm using. Hence a kernel module only works if your system is the only one modifying the data you're working with, or if most other systems use the same kernel module, which isn't realistic.
> Another idea is to deploy the indexer as a daemon with the applications all using IPC to query and update it. This will give the query applications a significant advantage on startup compared to Everything.
Everything uses a daemon as well and it's not a solution to that issue, because somehow the daemon also has to get the list of files/folders and their attributes out of a filesystem without walking it. How else would the daemon know which files belong to the volume which was just mounted moments ago?
> As for updating the index timeously, I've got a few ideas there as well. Walking the filesystem starting at `/` for each update will result in only performing index updates once a day or so (hence, the reason I expressed the metric as a delta) so I feel that that is no good.
Walking the filesystem shouldn't be done at all, because it's just too slow.
> I'll do an implementation and try to message you (if you want to check it out) because code talks louder than words :-)
Of course, I'd appreciate that.
> How else would the daemon know which files belong to the volume which was just mounted moments ago?
I wasn't intending to include transient filesystems in the index.
> Of course, I'd appreciate that.
Gimme about a week :-)
> I wasn't intending to include transient filesystems in the index.
There's absolutely no difference between transient and persistent filesystems in regard to that problem. Every time a filesystem gets mounted, you have no idea what you're going to get. The last time it was mounted there could have been 13 million files on it, and now when you mount it all of them could be gone or renamed. This is also super common on modern Linux systems, because many of them boot into a minimal boot environment to perform system updates and hence alter the filesystem heavily while daemons such as a filesystem monitor aren't running.
So the question is: how do you know whether /some/random/file has been modified while your daemon or application wasn't running or the filesystem wasn't mounted on your system, without performing a stat call on it? If you don't have an answer to that, one which is also orders of magnitude faster, then you'll never match the performance of Everything. And that's not some uncommon situation, because your daemon/app has to figure that out every time it gets launched, for every file and folder.
> So the question is: how do you know, whether /some/random/file has been modified while your daemon or application wasn't running or the filesystem wasn't mounted on your system, without performing a stat call on it? If you don't have an answer to that, which also needs to be orders of magnitudes faster, then you'll never match the performance of Everything.
Well, my intention is to match the feature list of Everything, but on Linux, and as far as I knew, Everything did not have full support for external drives - you'd have to convert them to NTFS, or add them to be indexed manually.
The use case I've seen for Everything has always been a local user searching their local PC; I wasn't even sure until now that Everything can sometimes search transient filesystems, because no one I ever saw using it used it for files on a transient filesystem.
You're correct: what I cannot do is monitor transient filesystems; but doing permanent filesystems at a speed better than or equal to Everything's is still better than anything I've used on Linux, many of which don't even search system files, never mind transient filesystems. And they all use the locate db, which is always a day or so out of date.
And yes, it can be done purely by monitoring filesystem changes. Sure, a full index needs to be built the first time, but that's a one-off cost - index updates after that should be fast enough to do for each write/remove/move operation that you can update the index dozens of times per second.
For non-transient filesystems, performance should be the same as, or better than, Everything.
> And yes, it can be done purely by monitoring filesystem changes. Sure, a full index needs to be built the first time, but that's a one-off cost
And how do you build the full index initially without recursively walking the filesystem? Otherwise you're not going to match Everything's performance on initial index creation.
And regarding the second crucial question: How do you know that a file you saw the last time your app or daemon was running, hasn't been modified in the meantime?
You still haven't answered those two fundamental questions. Everything else is a solved issue anyway.
> index updates after that should be fast enough to do for each write/remove/move operation that you can update the index dozens of times per second
Like I already said, that has never been a problem. My app can currently update the index several thousand times per second and there's still a lot of room for improvements with many low hanging fruits.
> For non-transient filesystems, performance should be the same as, or better than, Everything.
You keep saying that, but you're also not giving an answer to how you're going to solve the two major and pretty much only issues.
Since it seems we hit the max thread limit (I can't reply to your reply to me), I'll post my reply here, quoting your post as best as I can.
>> I wasn't planning to; it's a once-off cost - the user experience while using any software isn't degraded by the installation time, is it?
> This whole topic started with you claiming that you can even beat Everything in that regard
Nope.
I never claimed that I can beat Everything in "reading the metadata when the app starts". I claimed that I can match the startup and search performance of Everything.
Those are two different claims, and the latter is obviously possible if the application performs queries by querying a daemon that is always running with an in-memory index.
> Remember, your response to:
>> A huge part of Everything's speed comes from reading the master file table that other people mentioned, so you would need a way to quickly read file table entries on linux.
> Was
>> Not a problem. And no, I'm not talking about inotify either, and I'll additionally index the contents of (text) files as well with a negligible additional performance hit. It can be done as fast as, or faster than, `Everything`.
It's "not a problem because the index will already be in RAM and available before the user launches the app". You read that to mean "not a problem because there is a fast way to read file metadata on startup".
I think that there's a difference in that. My proposal is to never have the app need to read anything on startup (other than configuration settings).
> And btw. indexing content will obviously only put you even further behind. The cost is not negligible.
How will it put me further behind? I did say that it will only be done during software installation, right?
>> It's a daemon. if it isn't running while the user is using their desktop system, then it's not working or the user has turned it off.
>> My desktop system currently has 2.5m files. There are maybe a dozen files which will be modified during a maintenance-mode bootup, which has happened exactly zero times in the last decade.
I have many users, myself included, who use things like shared filesystems, which get modified by multiple systems. And like I already said, modern Linux systems also perform all of their updates in such a maintenance mode. So your app will give thousands of false positives or miss thousands of files completely on those systems.
There's two things there:
1. Shared filesystems - I don't care about this because Everything doesn't care about this being performant: In Everything, as far as I understand it, the user manually indexes network shares.
2. Maintenance modes won't give you thousands of false positives; at most you're looking at a diff of maybe dozens of index entries, if that.
> Sigh... So you're also not going to solve the second issue. I mean I clearly asked you these questions multiple times and I tried to make it clear, that this is where the problem is, to save you and me time, and still you kept it a secret up until now that you're not even attempting to fix those problems.
I didn't keep it a secret - I made it clear that a daemon will hold the index, and the app will talk to it, and that the index will be built once during software installation.
> So I'll have to take back my claim: Under these circumstances I can't guarantee that you'll make a lot of donations, because your app won't do anything special compared to others.
Well, it will be a few orders of magnitude faster to start up than checking for filesystem changes on startup, no?
>> For a Linux desktop file finding utility, monitoring all file writes, moves and deletes pretty much puts you ahead of any game in town right now, right?
> Well, kind of, but it's not particularly difficult to solve that issue. The dev versions of FSearch can already do that.
If you don't mind me asking, how do you do it? Because inotify is out of the question if you want to monitor 2.5m files. Even for just the home directory you will run the risk of exhausting file descriptors by using inotify.
>> Issue 1 - Initial index creation: I will create the index during the s/ware installation process and never create it again unless it is missing. To speed the creation during installation, I will use the mlocate.db file if it is found.
> So you're doing exactly what everyone else is doing.
All the existing utilities create the index only during installation?
> You can also ignore the mlocate index, because it doesn't contain enough information (size, date modified, etc. aren't indexed by it, so you'd need to stat all of those files anyway).
>> Issue 2 - Files that are changed/moved/removed when daemon is turned off: I don't really care, mostly. Those files a) have such a small probability of both existing and being of interest to the desktop user that lottery jackpots have a higher chance of happening to the user
> Like I already said, you're ignoring the hard and important problem. That's fine, but you suggested otherwise and now you're again doing nothing out of the ordinary.
Which hard and important problem? That changes made in maintenance mode aren't seen?
>> I believe that this is enough to satisfy my original claim[1] of " "similar in performance and query capabilities""[2].
> Well, it depends; you're not going to beat Everything in the areas that I and others care about, and in an attempt to get anywhere near that, you're trading accuracy for speed.
Going from 100.0000000% accurate to 99.9999999% accurate is hardly "sacrificing accuracy for speed", considering that you're still in the statistical rounding error group.
> That's fine, but this is nothing new or special, so I'm not really interested in that.
"Faster than existing Linux tools" would, actually, be something new and novel. "Faster than Everything in some specific areas" almost certainly counts, especially when accuracy is within error bars.
I have one last batch of questions, after which I will simply shut up and get to coding something. I kinda hope that you will answer these questions.
A major feature of Everything when people wax on about its speed is how quickly new entries in the filesystem show up in the application's query results.
Even while the results list is open, the user can see files that were added since the last keystroke.
1. How does FSearch handle this common and obvious use-case?
2. What's the newest filesystem change you can expect to see when performing a query in FSearch? Is it "the last change made prior to the application startup"? Is it "The last change made prior to the query"? Is it "The last change made since we walked the filesystem"?
3. What's the p99 for startup time in FSearch? The p99 for query results of N (where N is a suitably large number)?
4. You mentioned "areas that you and others care about". Can you briefly list the areas, other than complete and 100% accuracy during maintenance mode? All I know about is what Everything users appear to care about, and they simply aren't caring about USB memory sticks, cameras plugged in, network drives, maintenance mode diffs, etc. They do appear to care that it is responsive.
> Those are two different claims, and the latter is obviously possible if the application performs queries by querying a daemon that is always running with an in-memory index.
But the daemon also has to start at one point (you're just shifting the problem down that stack) and that's where it gets expensive IF you want to be as accurate as Everything. But of course, if you don't care about accuracy, starting up the daemon isn't time consuming. I've already discussed this with my users in the past and we settled for a toggle switch where users can opt-in to that behavior of more speed at the cost of having false results.
> How will it put me further behind? I did say that it will only be done during software installation, right?
Everything also only does this whenever a filesystem is first detected and scanned; still people care about the performance in those cases. Especially when you're often plugging in USB HDDs and such.
> 1. Shared filesystems - I don't care about this because Everything doesn't care about this being performant: In Everything, as far as I understand it, the user manually indexes network shares.
This is not only about network shares, but also about dual-boot systems, where multiple OSes use the same filesystem, and about USB HDDs/SSDs.
> 2. Maintenance modes won't give you thousands of false positives; at most you're looking at a diff of maybe dozens of index entries, if that.
Of course it does. Just in the last week ~13,000 files and folders were modified on my system with the system update (which ran in a maintenance boot environment where other daemons don't get started). That's 13,000 files and folders which will either be missing in your indexing solution or show up as false positives (because you're using outdated metadata, like their old size or timestamps).
> Well, it will be a few orders of magnitude faster to start up than checking for filesystem changes on startup, no?
Of course, but again that's not the problem. The problem is doing what Everything does: Start up a few orders of magnitude faster AND at the same time checking for filesystem changes on startup.
> If you don't mind me asking, how do you do it? Because inotify is out of the question if you want to monitor 2.5m files. Even for just the home directory you will run the risk of exhausting file descriptors by using inotify.
I'm using fanotify by default and inotify as a fallback in the case the filesystem or kernel doesn't support fanotify with the feature set I need. Running out of file descriptors is usually not an issue, because you don't need to keep file descriptors open for all files. My system has more than 3 million files and even using just inotify for that does work.
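To make the descriptor math concrete, here is a rough Python/ctypes sketch of the per-directory inotify approach (FSearch itself is written in C and prefers fanotify, so treat every name below as illustrative only): a single inotify instance is one file descriptor, and each watched *directory* consumes a watch slot bounded by `fs.inotify.max_user_watches`, not another descriptor.

```python
import ctypes
import ctypes.util
import os
import struct

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

# Selected constants from <sys/inotify.h> (Linux)
IN_NONBLOCK = 0x800
IN_CREATE, IN_DELETE, IN_MOVED_FROM, IN_MOVED_TO = 0x100, 0x200, 0x40, 0x80
WATCH_MASK = IN_CREATE | IN_DELETE | IN_MOVED_FROM | IN_MOVED_TO

def watch_tree(root):
    """Place one inotify watch per directory under root.

    One inotify instance is one file descriptor; each watched directory
    uses a watch slot (limited by fs.inotify.max_user_watches), so
    millions of files don't mean millions of open descriptors."""
    fd = libc.inotify_init1(IN_NONBLOCK)
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init1 failed")
    wd_to_path = {}
    for dirpath, _dirs, _files in os.walk(root):
        wd = libc.inotify_add_watch(fd, dirpath.encode(), WATCH_MASK)
        if wd >= 0:
            wd_to_path[wd] = dirpath
    return fd, wd_to_path

def read_events(fd, wd_to_path):
    """Drain pending events into (directory, name, mask) tuples."""
    try:
        buf = os.read(fd, 4096)
    except BlockingIOError:
        return []
    events, off = [], 0
    while off < len(buf):
        # struct inotify_event: int wd; u32 mask, cookie, len; char name[]
        wd, mask, _cookie, name_len = struct.unpack_from("iIII", buf, off)
        name = buf[off + 16 : off + 16 + name_len].split(b"\0", 1)[0]
        events.append((wd_to_path.get(wd), name.decode(), mask))
        off += 16 + name_len
    return events
```

The upfront cost is exactly the problem discussed elsewhere in this thread: you must walk the whole tree once to find the directories before monitoring can start.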
> All the existing utilities create the index only during installation?
Obviously not all, because some don't even create an index to begin with, but many do.
And by the way, I doubt that your solution of creating an index only once even works, because sooner or later you need to rescan larger parts of the filesystem, when the inconsistencies become too frequent (like when you suddenly receive filesystem change notifications for files which you didn't even know about).
> Which hard and important problem? That changes made in maintenance mode aren't seen?
Getting the index in a consistent state with the filesystem after boot.
> A major feature of Everything when people wax on about its speed is how quickly new entries in the filesystem show up in the application's query results.
> Even while the results list is open, the user can see files that were added since the last keystroke.
> 1. How does FSearch handle this common and obvious use-case?
It detects filesystem events with fanotify, queues some of them for batch processing, then applies them to the index and results.
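The "queues some of them for batch processing" step can be illustrated with a toy sketch (my own code, not FSearch's): events are accumulated and applied to the index in one batch, so a burst of thousands of changes costs one index update rather than thousands.

```python
import threading
from collections import deque

class EventBatcher:
    """Toy illustration of batching filesystem events before applying
    them to a search index. A real implementation would flush on a
    timer or once the queue grows past a threshold."""

    def __init__(self, apply_batch):
        self.apply_batch = apply_batch  # callback: takes a list of events
        self._pending = deque()
        self._lock = threading.Lock()

    def push(self, event):
        # Called from the filesystem-monitoring thread for each event.
        with self._lock:
            self._pending.append(event)

    def flush(self):
        # Swap the queue out under the lock, then apply outside of it,
        # so event producers are never blocked by index updates.
        with self._lock:
            batch, self._pending = list(self._pending), deque()
        if batch:
            self.apply_batch(batch)
```

For example, pushing three events and then calling `flush()` invokes the callback exactly once with all three.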
> 2. What's the newest filesystem change you can expect to see when performing a query in FSearch? Is it "the last change made prior to the application startup"? Is it "The last change made prior to the query"? Is it "The last change made since we walked the filesystem"?
In the development version with monitoring support, changes to the filesystem show up in the results almost immediately; it's usually less than a second. Only in the rare case when many thousands of files get modified almost simultaneously can it take a few more seconds. Hence, when you sort your results by date modified, you can live-monitor all the recent changes being made on your system.
> 3. What's the p99 for startup time in FSearch? The p99 for query results of N (where N is a suitably large number)?
This depends on the storage type. But on modern SSDs with a few million files it's usually a second or so to load the index from the database file. You can then search right away, and depending on whether you've configured the system to also be accurate or not, a rescan might be triggered in the background, which obviously takes much longer to finish, but then you'll be guaranteed to have correct results.
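The startup behavior described here (load the cached index immediately, then optionally rescan in the background for accuracy) can be sketched like this; all four callables are hypothetical stand-ins, not FSearch's API:

```python
import threading

def start_search_service(db_path, load_index, rescan, swap_in):
    """Sketch: searching becomes available as soon as the cached index
    is loaded; a background rescan then reconciles the index with the
    filesystem and swaps in the fresh result when it finishes."""
    index = load_index(db_path)  # fast: deserialize the cached index

    def _background_rescan():
        fresh = rescan()         # slow: walk the filesystem again
        swap_in(fresh)           # atomically replace the stale index

    t = threading.Thread(target=_background_rescan, daemon=True)
    t.start()
    return index, t  # index is usable immediately; t finishes later
```

This is the trade-off the toggle switch mentioned earlier exposes: skip the rescan for speed, or run it for guaranteed-correct results.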
> 4. You mentioned "areas that you and others care about". Can you briefly list the areas, other than complete and 100% accuracy during maintenance mode? All I know about is what Everything users appear to care about, and they simply aren't caring about USB memory sticks, cameras plugged in, network drives, maintenance mode diffs, etc. They do appear to care that it is responsive.
I'll have to answer that in a few hours if you don't mind, I have to get going now.
> And how do you build the full index initially without recursively walking the filesystem? Otherwise you're not going to match Everything's performance on initial index creation.
I wasn't planning to; it's a once-off cost - the user experience while using any software isn't degraded by the installation time, is it?
> And regarding the second crucial question: How do you know that a file you saw the last time your app or daemon was running, hasn't been modified in the meantime?
It's a daemon. If it isn't running while the user is using their desktop system, then it's not working or the user has turned it off.
In any case, if a component of the software is not running, then the software is not running.
I mean, seriously, even during regular updates, daemons still run. Even during distro upgrades daemons are still running. The rare cases where files are removed/changed/moved while daemons are turned off are fractions of fractions of a percentage.
My desktop system currently has 2.5m files. There are maybe a dozen files which will be modified during a maintenance-mode bootup, which has happened exactly zero times in the last decade.
For a Linux desktop file finding utility, monitoring all file writes, moves and deletes pretty much puts you ahead of any game in town right now, right?
> You keep saying that, but you're also not giving an answer to how you're going to solve the two major and pretty much only issues.
Perfect is the enemy of good.
Issue 1 - Initial index creation: I will create the index during the s/ware installation process and never create it again unless it is missing. To speed the creation during installation, I will use the mlocate.db file if it is found.
Issue 2 - Files that are changed/moved/removed when daemon is turned off: I don't really care, mostly. Those files a) have such a small probability of both existing and being of interest to the desktop user that lottery jackpots have a higher chance of happening to the user, and b) After an MVP, if the userbase requests those files, I'll either hardcode their locations and always check only for those dozens of files that can possibly be changed when daemons are turned off, or allow the user to specify via configuration, the pathname patterns to always check.
I believe that this is enough to satisfy my original claim[1] of " "similar in performance and query capabilities""[2].
[1] https://news.ycombinator.com/item?id=38686022 [2] I don't recall making any claim along the lines of "walking the filesystem tree is never used".
> I wasn't planning to; it's a once-off cost - the user experience while using any software isn't degraded by the installation time, is it?
This whole topic started with you claiming that you can even beat Everything in that regard, which is why I even got involved in that discussion.
Remember, your response to:
> A huge part of Everything's speed comes from reading the master file table that other people mentioned, so you would need a way to quickly read file table entries on linux.
Was
> Not a problem. And no, I'm not talking about inotify either, and I'll additionally index the contents of (text) files as well with a negligible additional performance hit. It can be done as fast as, or faster than, `Everything`.
And btw. indexing content will obviously only put you even further behind. The cost is not negligible.
> It's a daemon. if it isn't running while the user is using their desktop system, then it's not working or the user has turned it off.
> My desktop system currently has 2.5m files. There are maybe a dozen files which will be modified during a maintenance-mode bootup, which has happened exactly zero times in the last decade.
I have many users, myself included, who use things like shared filesystems, which get modified by multiple systems. And like I already said, modern Linux systems also perform all of their updates in such a maintenance mode. So your app will give thousands of false positives or miss thousands of files completely on those systems.
Sigh... So you're also not going to solve the second issue. I mean, I clearly asked you these questions multiple times, and I tried to make it clear that this is where the problem is, to save both of us time, and still you kept it a secret until now that you're not even attempting to fix those problems.
So I'll have to take back my claim: Under these circumstances I can't guarantee that you'll make a lot of donations, because your app won't do anything special compared to others.
> For a Linux desktop file finding utility, monitoring all file writes, moves and deletes pretty much puts you ahead of any game in town right now, right?
Well, kind of, but it's not particularly difficult to solve that issue. The dev versions of FSearch can already do that.
> Issue 1 - Initial index creation: I will create the index during the s/ware installation process and never create it again unless it is missing. To speed the creation during installation, I will use the mlocate.db file if it is found.
So you're doing exactly what everyone else is doing. You can also ignore the mlocate index, because it doesn't contain enough information (size, date modified, ... aren't indexed by it, so you'd need to stat all of those files anyway).
> Issue 2 - Files that are changed/moved/removed when daemon is turned off: I don't really care, mostly. Those files a) have such a small probability of both existing and being of interest to the desktop user that lottery jackpots have a higher chance of happening to the user
Like I already said, you're ignoring the hard and important problem. That's fine, but you suggested otherwise and now you're again doing nothing out of the ordinary.
> I believe that this is enough to satisfy my original claim[1] of "similar in performance and query capabilities"[2].
Well, it depends; you're not going to beat Everything in the areas I and others care about, and in an attempt to get anywhere near that, you're trading accuracy for speed (what makes Everything special is that it's both fast and accurate/reliable). That's fine, but this is nothing new or special, so I'm not really interested in that.
It can be done as fast as, or faster than, `Everything`.
Then how would you do it? That's what I'm asking, how would you get the file attributes off of the disk as fast as everything on linux? Once you get them off the disk any modern computer can burn through them, but getting that data into memory in the first place is the problem.
Yes, it's simply using stat on every file/folder. There's probably some room for improvement there with clever parallelization, but it'll remain a bottleneck.
Everything parses a file called the MFT (Master File Table) to build its index. This is much more efficient, but unfortunately this file is only present on NTFS volumes, which makes it super useful on Windows systems, but not so much everywhere else.
Another benefit you get on Windows is the USN journal, which allows Everything to keep the index updated much more efficiently.
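To make the stat bottleneck concrete, here is a minimal sketch of the walk-and-stat approach described above (my own illustration, not FSearch's code). `os.scandir` is about the only cheap win available: it reads directory entries in bulk, can often answer `is_dir()` from the entry type without an extra syscall, and caches each `stat()` result.

```python
import os

def build_index(root):
    """Recursively walk the tree and record (path, size, mtime) for
    every entry -- the per-entry stat call is the bottleneck this
    thread is talking about, since no MFT-like structure exists to
    read the metadata in bulk on most Linux filesystems."""
    index = []
    stack = [root]
    while stack:
        d = stack.pop()
        try:
            with os.scandir(d) as it:
                for entry in it:
                    try:
                        st = entry.stat(follow_symlinks=False)
                    except OSError:
                        continue  # entry vanished or is unreadable
                    index.append((entry.path, st.st_size, st.st_mtime))
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
        except OSError:
            continue  # directory unreadable: skip it
    return index
```

Parallelizing the walk across directories helps somewhat on fast SSDs, but as noted above, the metadata still has to come off the disk one entry at a time.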
I've never used fsearch, but I use a CLI tool that replaces locate (https://plocate.sesse.net/). Do you have an idea of how the performance and index format compares with fsearch?
I'm not familiar with the internals of plocate, but I'll have a brief look at it.
Is it possible to use eBPF for this task instead of inotify?
Maybe, but I'm not sure if there's much benefit to that. The most inefficient part of the inotify or fanotify solution is that you have to walk the file system before monitoring can even start, because you first need to know which folders and files are there to begin with. And unfortunately this can't be avoided with eBPF.
I was wondering what this adds over mlocate. It seems it's a GUI only / GUI first tool.
The GitHub page recommends mlocate for a CLI version.
Hi, author here.
Likely the most significant benefit is the more powerful query language. For example you can also search by file modification date or size and use boolean operators. https://github.com/cboxdoerfer/fsearch/wiki/Search-syntax
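To illustrate what such a query language buys you, here is a toy version of the idea (this is NOT FSearch's actual grammar; see the wiki link above for that): plain terms match names, a `size:` term filters on bytes, and terms are implicitly AND-ed. Each file is modeled as a `(name, size, mtime)` tuple.

```python
import re

def compile_query(query):
    """Compile a toy query string into a predicate over
    (name, size, mtime) tuples. Hypothetical mini-grammar:
      - 'size:>N' / 'size:<N' filter by byte size
      - any other term is a case-insensitive name substring match
      - all terms must match (implicit AND)"""
    preds = []
    for term in query.split():
        m = re.fullmatch(r"size:([<>])(\d+)", term)
        if m:
            op, n = m.group(1), int(m.group(2))
            # Bind op/n as defaults so each closure keeps its own values.
            preds.append(lambda f, op=op, n=n:
                         f[1] > n if op == ">" else f[1] < n)
        else:
            preds.append(lambda f, t=term.lower(): t in f[0].lower())
    return lambda f: all(p(f) for p in preds)
```

For example, `compile_query("pdf size:>100")` matches files whose name contains "pdf" and which are larger than 100 bytes.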
Ah thanks for that, I can see the benefit there alright.
For example it updates after every character you type. Sometimes you don't know exactly what you are looking for or you are exploring.
My favorite search tool on windows is Agent Ransack https://www.mythicsoft.com/agentransack/
It searches not only file names but file contents as well. Also blazing fast in my experience.
Just noting this is Windows software not unix-like.
Unix-like apparently does not include macOS here
Edit: Or does it? https://ports.macports.org/port/fsearch/details/
Do you really need it on Mac? We've got mdfind
That's just a cli for Spotlight, right? I have found Spotlight to be increasingly unreliable over the years, to the point of being essentially useless now. Most recently, I discovered that absolutely nothing in my iCloud drive can be found with Spotlight, even if the file is on disk and I'm just trying to match its name. For extra fun, I found an Apple support page that suggested the Windows-in-1999-esque procedure of blocking and then unblocking the directory from indexing in the Spotlight preferences. Unsurprisingly, this didn't work.
Testing out mdfind and trying to simply find a file in a specific directory (which only contains five files):
  mdfind -onlyin . name-of-my-file.md

Nothing.

  mdfind -onlyin . -name name-of-my-file.md

Nothing.

  mdfind -onlyin /absolute/path/to/cwd -name name-of-my-file.md

Nothing.

This is pretty reflective of my experience with the Spotlight GUI. Every search turns up something, but the file I want is almost never in the results.
Spotlight progressively getting worse was one of the reasons I switched to Linux.
I think MacOS is great. Many of the features are brilliant. The UI is second to none to this day.
But holy moly, Spotlight has degraded. From searching my files, to searching my files and the internet, to consistently providing me with nothing I am looking for.
When Apple first started selling OS X, its UNIX heritage was part of the selling point, but my vibe is that they only really consider text input a feature for devs. Which, thinking of the Mac's history, is on brand, but it leaves you down many a frustrating dead end.
If you do not have "show all file extensions" enabled, a `.md` query in Spotlight will not show anything.
Try it without the extension (e.g. just the name).
I even have scripts to locate other scripts using mdfind; it is pretty robust, to be honest...
E.g.:
  source $(mdfind -name spacelatte-bashlib.sh)

It also doesn't work without the extension. It does work on my work laptop, so I expect that something is broken with my index. But this is also part of the problem. The whole thing is opaque, breaks, and offers no clear recourse for how to resolve any issues.
And FindAnyFile (https://apps.tempel.org/FindAnyFile/index.php), disclosure: happy user with no affiliation.
How does it compare to Baloosearch? Baloo and KDE Plasma go nicely together!
Does it support find in files? I am using catfish and looking for alternatives but find in files would be a must have. Some are recommending fzf (rg, fd). What is your search workflow and what tools do you use?
> Does it support find in files?
No, not yet.
It’d be interesting if this could integrate with plocate (mlocate’s replacement) which is incredibly quick at indexing and returning results but relatively basic.
You mean like being able to read the plocate database with FSearch? I don't see much point in that, because the plocate database is missing some crucial data, which FSearch uses to make searching and sorting quicker. For example file attributes like size or modification date and the sort order by various attributes (name, path, size, ...) aren't indexed by plocate.
If plocate is faster at building the index, it probably makes more sense to look at the reason behind that and add those improvements to FSearch.
I like that the author has not done away with the menu, like certain software projects which shall not be named but begin with GN and end with ME.
I gave up on Desktop Linux unfortunately, but this is a great Everything replacement for those who love it on Windows.
What made you give up?
Too much research and extremely low level troubleshooting required (i.e. source code reading) to get things working. Especially around wayland, multi-monitor, multi-GPU, Nvidia, etc.
I hate what Microsoft has been doing with Windows, but Linux just isn't practical for my setup yet.
You are either extremely unlucky or chose a very strange setup.
The problems you describe were common in early 2000s, but haven't been common in Linux desktop for a decade or so.
For those reading above and thinking "I'll skip Linux, if that's the current status": it's not. Just pick Ubuntu LTS. Use it on common hardware (e.g. not bleeding edge) and stick with the defaults. Don't try to make it exactly like your Mac or Windows machine, but lean into how it does things. They are different. They may be uncomfortable. Then, once familiar, feel free to tinker and hack.
I've been on Linux since 1996. I hacked and tweaked everything in my younger years. Now I'm on a boring, hardly configured Ubuntu LTS. Well, my shell and nvim are tuned beyond recognition, I guess. The rest: boring.
> For those reading above and thinking "I'll skip Linux, if that's the current status": it's not.
I politely disagree. I recently installed Fedora on my desktop PC because Microsoft decided that displaying a full-screen ad for Windows 11 that prevented my PC from booting was acceptable behaviour. Anyway, one of the first things I noticed on Fedora was that video playback was stuttery. After ages spent digging around, I discovered Fedora had disabled GPU hardware video decoding for legal reasons. Around the same time I made the mistake of trying to delete a directory with lots of files in it on an NTFS drive. The operation failed and corrupted the filesystem, and I had to spend a week or so downloading and restoring backups. Needless to say I'm back on Windows now.
I'm not new to all this - I used Slackware and Red Hat on the desktop in the early 2000s, and I use Linux on the server side daily. But on two separate laptops recently, manufactured 10 years apart, I've had all kinds of glitches and performance problems with (Arch) Linux, on a variety of desktops (Sway, KDE Plasma, Gnome) especially around video and GPU. You could say just don't buy Nvidia - but one, too late, and two, if you want to do anything interesting in AI these days they're hard to compete with.
YMMV and if you're just in the shell and in VIM all day you might not notice video glitches and performance problems (computers did text terminals in the 70s so it's not a high bar). But as a lapsed game dev I have an eye for stutter, missed frames, etc, and those were pretty constant in my configs, despite sinking probably a hundred plus hours in, time I won't get back.
I have wasted so much time troubleshooting just basic things on Linux DEs. Example: mouse wheel scroll speed. In general it's a mishmash of various egos, various low efforts, various high efforts, etc. Not usable, in my opinion, if you just want to use a computer and have the OS stay out of your way.
They are quite common on laptops, here is a second anecdote on the matter.
We even have an internal how-to about what works and what doesn't, for those who want to try a path outside the official ThinkPad/Windows and MacBook IT options.
I feel you. As a subscriber to Linux Journal during its print lifetime, I eventually found peace in Windows 7 and later, alongside desktop VMs, rather than ensuring everything on a laptop does indeed work properly.
Even Linux-branded laptops keep having issues. For example, on the one I still have around, I never got around to fixing its habit of dropping WLAN connections, so for large OS updates it needs to be plugged into the LAN.
Nvidia is indeed not well supported, for which Nvidia is mostly to blame (see also Linus Torvalds giving Nvidia the finger). That is too bad, because the community can't do much about it.
Multi monitor support should generally be fine in most desktop environments, at least for 2 screens. More than that can indeed be quirky, dependent on the desktop environment and window manager (X or Wayland) you're using.
Multi-GPU is probably a bit niche to have good driver coverage, partly probably because of the Nvidia issue.
Looks quite nice, brownie points for being native.
For anyone looking for a Mac equivalent, there's GoToFile[0].
As far as I've seen, this is the only app for Mac that doesn't just reuse Spotlight search (which I find to be terrible). I looked for exactly this type of app for years before finding it, and when I did, it didn't seem real with the old-fashioned website and zero mentions on sites like HN. But I can assure that it works great and it's maintained. I just wish the author would promote it better so it gets more attention and isn't so hard to find.
+1 on the need for non-Spotlight options. Source code makes a mess of spotlight for everyday usage. I like Leap a lot for multimedia browsing with more precise search capabilities than spotlight: https://ironicsoftware.com/leap/
You can remove folders from spotlight under "Spotlight Privacy". I removed my source code folders, which vastly improved search results.
100%. That's when I discovered I needed non-spotlight search to bulk search what I just excluded from spotlight.
Soma-zone software in general is pretty neat for Mac power users. I especially like Launch Control, which lets you configure and inspect launchd (the mac service daemon) services.
How does a program like this work? Is it indexing every file on the system and monitoring all events for updates?
Foxtrot Search (expensive) uses its own index and also indexes content, not just names
There are a couple of other filename-only apps with their own index, but I don't remember right now which of the alternatives do that (HoudahSpot?).