PSA: Potential libstdc++ hang in std::filesystem::symlink_status
Something I caught while running TLGS's crawler. Due to how drogon works, it needs to always successfully open a new socket on a TCP connection; otherwise the entire process is killed. But Gemini does not support multiple requests on the same connection, so it's easy to end up with a lot of open sockets (though most are waiting to be closed). My hack to resolve this conflict is to periodically walk /proc/self/ and check how many sockets are open, and to rest if there are too many. The socket counter looks like this:
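#include <cstddef>
#include <filesystem>
#include <string>

// Count the entries in /proc/self/fd/ whose symlink target is a socket.
// Needs C++20 for std::string::starts_with.
size_t countOpenSockets()
{
    namespace fs = std::filesystem;
    size_t count = 0;
    for(const auto& fd : fs::directory_iterator(fs::path("/proc/self/fd/"))) {
        // Socket descriptors show up as links of the form "socket:[<inode>]"
        if(fs::is_symlink(fd)
            && fs::read_symlink(fd).generic_string().starts_with("socket:["))
            count++;
    }
    return count;
}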
This kind of works. Sometimes, for unknown reasons, the crawler hangs completely and consumes ~25% of CPU (mostly in the kernel). Attaching a debugger to the process and dumping the stack trace shows the following:
Thread 2 (Thread 0x7fef8d775640 (LWP 1439864) "DrogonIoLoop"):
#0 __GI___fstatat64 (fd=-100, file=0x7fef88ebeb10 "/proc/self/fd/647", buf=0x7fef8d772290, flag=256) at ../sysdeps/unix/sysv/linux/fstatat64.c:166
#1 0x00007fef8e464a82 in std::filesystem::symlink_status(std::filesystem::__cxx11::path const&, std::error_code&) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#2 0x00007fef8e464deb in std::filesystem::symlink_status(std::filesystem::__cxx11::path const&) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00005628fe869fd3 in countOpenSockets() ()
#4 0x00005628fe876c6d in GeminiCrawler::dispatchCrawl()::{lambda()#1}::operator()(GeminiCrawler::dispatchCrawl()::{lambda()#1}::operator()()::_ZZN13GeminiCrawler13dispatchCrawlEvENUlvE_clEv.Frame*) [clone .actor] ()
#5 0x00005628fe996aa8 in trantor::TimerQueue::handleRead() ()
#6 0x00005628fe9832e0 in trantor::Channel::handleEventSafely() ()
#7 0x00005628fe9789b0 in trantor::EventLoop::loop() ()
#8 0x00005628fe979ba8 in trantor::EventLoopThread::loopFuncs() ()
#9 0x00007fef8e3c22c3 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00007fef8e130b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#11 0x00007fef8e1c2a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
This doesn't look right. fstatat shouldn't be given a negative fd. Even if one is passed in, it shouldn't cause a hang. So I assume this is not a bug in the Linux kernel itself. Instead, I'm guessing it's libstdc++ not correctly handling an invalid fd.
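One caveat when reading frame #0: on Linux, -100 is the value of AT_FDCWD and 256 (0x100) is AT_SYMLINK_NOFOLLOW. With an absolute path, that combination is an ordinary lstat-style call, so the negative fd by itself may not point to an invalid descriptor. Below is a minimal sketch that issues a call with the same argument shape. The helper name statWithoutFollowing is hypothetical, and the static_asserts assume Linux headers.

```cpp
#include <fcntl.h>     // AT_FDCWD, AT_SYMLINK_NOFOLLOW
#include <sys/stat.h>  // fstatat, struct stat

// Linux-specific values; they match fd=-100 and flag=256 in frame #0.
static_assert(AT_FDCWD == -100, "AT_FDCWD is -100 on Linux");
static_assert(AT_SYMLINK_NOFOLLOW == 0x100, "AT_SYMLINK_NOFOLLOW is 0x100 on Linux");

// Hypothetical helper: query a path without following a final symlink,
// using the same arguments as the call seen in the stack trace.
// Returns true and fills `out` on success, false on any error.
bool statWithoutFollowing(const char* path, struct stat& out)
{
    return fstatat(AT_FDCWD, path, &out, AT_SYMLINK_NOFOLLOW) == 0;
}
```

Calling statWithoutFollowing("/proc/self/fd/0", st) should report a symlink (S_ISLNK) while the descriptor is open. This only decodes the arguments; it does not explain why the call in the trace never seems to return.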
To work around the issue, I replaced the check, switching from C++'s filesystem module to C's stat function. My crawler still hangs, unfortunately. But this time it seems to be my homebrew lock-free algorithm.
Martin Chang
Systems software, HPC, GPGPU and AI. I mostly write stupid C++ code. Sometimes does AI research. Chronic VRChat addict
I run TLGS, a major search engine on Gemini. Used by Buran by default.
- marty1885 \at protonmail.com
- Matrix: @clehaxze:matrix.clehaxze.tw
- Jami: a72b62ac04a958ca57739247aa1ed4fe0d11d2df