PSA: Potential libstdc++ hang in std::filesystem::symlink_status
Something I caught while running TLGS's crawler. Because of how drogon works, it must always be able to open a new socket for each TCP connection; otherwise the entire process is killed. But Gemini does not support multiple requests on the same connection, so it's easy to end up with a lot of open sockets (though most are waiting to be closed). My hack to resolve this conflict is to periodically walk /proc/self/ and check how many sockets are open, and rest if there are too many. The socket counter looks like this:
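    #include <cstddef>
    #include <filesystem>

    size_t countOpenSockets()
    {
        namespace fs = std::filesystem;
        size_t count = 0;
        // Every entry in /proc/self/fd/ is a symlink describing one open descriptor.
        for(const auto& fd : fs::directory_iterator(fs::path("/proc/self/fd/"))) {
            // Socket descriptors link to a target of the form "socket:[<inode>]".
            if(fs::is_symlink(fd)
                && fs::read_symlink(fd).generic_string().starts_with("socket:["))
                count++;
        }
        return count;
    }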
This kind of works. Sometimes, for unknown reasons, the crawler hangs completely and consumes ~25% of CPU (mostly in the kernel). Attaching a debugger to the process and dumping the stack trace shows the following:
    Thread 2 (Thread 0x7fef8d775640 (LWP 1439864) "DrogonIoLoop"):
    #0 __GI___fstatat64 (fd=-100, file=0x7fef88ebeb10 "/proc/self/fd/647", buf=0x7fef8d772290, flag=256) at ../sysdeps/unix/sysv/linux/fstatat64.c:166
    #1 0x00007fef8e464a82 in std::filesystem::symlink_status(std::filesystem::__cxx11::path const&, std::error_code&) () from /lib/x86_64-linux-gnu/libstdc++.so.6
    #2 0x00007fef8e464deb in std::filesystem::symlink_status(std::filesystem::__cxx11::path const&) () from /lib/x86_64-linux-gnu/libstdc++.so.6
    #3 0x00005628fe869fd3 in countOpenSockets() ()
    #4 0x00005628fe876c6d in GeminiCrawler::dispatchCrawl()::{lambda()#1}::operator()(GeminiCrawler::dispatchCrawl()::{lambda()#1}::operator()()::_ZZN13GeminiCrawler13dispatchCrawlEvENUlvE_clEv.Frame*) [clone .actor] ()
    #5 0x00005628fe996aa8 in trantor::TimerQueue::handleRead() ()
    #6 0x00005628fe9832e0 in trantor::Channel::handleEventSafely() ()
    #7 0x00005628fe9789b0 in trantor::EventLoop::loop() ()
    #8 0x00005628fe979ba8 in trantor::EventLoopThread::loopFuncs() ()
    #9 0x00007fef8e3c22c3 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
    #10 0x00007fef8e130b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
    #11 0x00007fef8e1c2a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
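This doesn't look right at first. fstatat is being called with a negative fd, although on Linux -100 is the value of AT_FDCWD (resolve the path relative to the current directory) and flag=256 is AT_SYMLINK_NOFOLLOW, so the call itself is an ordinary lstat-style lookup. Either way, it shouldn't cause a hang. So I assume this is not a bug in the Linux kernel itself. Instead I'm guessing libstdc++ is misbehaving somewhere around this call.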
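The arguments in frame #0 can be checked against the Linux headers, where -100 is AT_FDCWD and 256 (0x100) is AT_SYMLINK_NOFOLLOW. Below is a minimal sketch, assuming Linux with glibc, that compares such a call with a plain lstat(). The helper name fstatatMatchesLstat is made up for illustration and is not part of the crawler.

```cpp
#include <fcntl.h>     // AT_FDCWD, AT_SYMLINK_NOFOLLOW
#include <sys/stat.h>  // fstatat, lstat, struct stat

// Linux-specific values; these match fd=-100 and flag=256 in the backtrace.
static_assert(AT_FDCWD == -100, "AT_FDCWD is -100 on Linux");
static_assert(AT_SYMLINK_NOFOLLOW == 0x100, "AT_SYMLINK_NOFOLLOW is 256 on Linux");

// Hypothetical helper: returns true when fstatat(AT_FDCWD, path, ...,
// AT_SYMLINK_NOFOLLOW) and lstat(path, ...) report the same result for path.
bool fstatatMatchesLstat(const char* path)
{
    struct stat viaFstatat{};
    struct stat viaLstat{};
    const int r1 = fstatat(AT_FDCWD, path, &viaFstatat, AT_SYMLINK_NOFOLLOW);
    const int r2 = lstat(path, &viaLstat);
    if(r1 != 0 || r2 != 0)
        return r1 == r2; // both calls are expected to fail together
    return viaFstatat.st_dev == viaLstat.st_dev
        && viaFstatat.st_ino == viaLstat.st_ino
        && viaFstatat.st_mode == viaLstat.st_mode;
}
```

If this holds, the fd value in the trace is not invalid on its own, which leaves the cause of the hang open.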
To work around the issue, I replaced the check that used C++'s filesystem module with one that uses C's stat function. Unfortunately, my crawler still hangs. But this time the culprit seems to be my homebrew lock-free algorithm.
![Author's profile. Photo taken in VRChat by my friend Tast+](/images/profile6.webp)
Martin Chang
Systems software, HPC, GPGPU and AI. I mostly write stupid C++ code. Sometimes does AI research. Chronic VRChat addict
I run TLGS, a major search engine on Gemini. Used by Buran by default.
- marty1885 \at protonmail.com
- Matrix: @clehaxze:matrix.clehaxze.tw
- Jami: a72b62ac04a958ca57739247aa1ed4fe0d11d2df