social.weyr.org.uk | Ian Molton @ social.weyr.org.uk

Ian Molton

spyro@social.weyr.org.uk

Ian Molton

1 year ago

(Cheshire, UK)

@Friendica Support

My instance appears to have a problem - the first load of the network page takes so long that I've had to increase the timeout for php-fpm in my nginx config, or else get 504 errors.

It seems like later attempts to load the page are a bit faster, but I expect my users will have given up waiting long ago.

I've seen that others have had similar issues in the past, and there seems to be no solution.

iotop shows that even AFTER my profile page has loaded, a further 545MB of disk reads due to mysql goes on.

I'd noticed this sort of extreme disk activity going on before, albeit not as bad.

Is this a scalability issue in friendica, or is my node under attack?

Suggestions?

utzer [Friendica] likes this.

Ian Molton

1 year ago

This seems very similar to the activitypub-troll attack - I'm seeing an ever-growing set of UpdateContact requests.

Crazy-to-Bike likes this.

Sean Riley

1 year ago •

Make sure you block those activity pub trolls, they will grind you to a halt.

Also, if this is a new instance, give it time to settle down. Friendica does a lot of work in the first week or two to build out its understanding of other hosts, their accounts, etc.

Increasing the workers is also helpful. And run it as a daemon.

I found that shutting down mariadb each evening for backups and then restarting helped as well.

Crazy-to-Bike likes this.

Ian Molton

1 year ago

Is there a good way to identify them ?

Ian Molton

1 year ago

and clean up the damage...

Crazy-to-Bike likes this.

Sean Riley

1 year ago •

*.activitypub-troll.cf

Ian Molton

1 year ago

Yeah, I blocked that one ages ago.

Is it really the only one? Seems unlikely.

Crazy-to-Bike

1 year ago •

@Sean Riley @Ian Molton

I have these domains in my server block list since my instance was under attack from *.activitypub-troll.cf:

*.activitypub-proxy.cf
*.activitypub-troll.cf
*.gab.best
*.misskey-forkbomb.cf

Sean Riley

1 year ago •

@crazy2bike
That looks pretty good, how many jobs are in the worker queue? How is CPU utilization? Memory usage?

Ian Molton

1 year ago

did I miss a reply here? I can't see what the above post is in reference to?

Sean Riley

1 year ago •

@crazy2bike
Your list of 4 troll exclusions

Ian Molton

1 year ago

I already had the .cf one

none of them were the issue in this instance though.

jesuiSatire …ᘛ⁐̤ᕐᐷ

1 year ago •

> Your list of 4 troll exclusions

* me
* myself
* my first secondary account
* my second secondary account

@dogriley
@spyro @crazy2bike

Michael 🇺🇦

1 year ago •

This is totally unrelated. The workers are not affecting the frontend processing and aren't a very good indicator for any frontend performance issue.

Ian Molton

1 year ago

I think the workers were pushing the machine into swap when large queries are triggered by the frontend (I've now seen a query over 600MiB).

There is no way the system can handle >1 user if it's going to cause 600MiB of disk reads every time they access the site!

Either something is wrong, or there is an incredibly inefficient query running somewhere, or a query is being run over an unexpectedly huge data set - but I have no way to know.

Whatever it is, it's crippling the machine - it seems to cripple it gradually enough that it doesn't immediately OOM (sounds long-query-ish to me), and it doesn't prevent more workers running - the machine is being killed by multiple threads seemingly eating RAM whilst being "blocked" by a large running query.

Eventually the machine gets so bogged down that it can't serve anything. I can still log in (just - last time it had a load average of >30), and if I kill / restart mariadb, it will carry on, but it seems to be living right on the wire - a good push will cause it to start losing the battle and bog down again.

Crazy-to-Bike likes this.

Michael 🇺🇦

1 year ago •

I like to go for an analytic approach. First, please let us trace the requests. Then we can quickly distinguish whether the delays are caused by a lack of available Apache/Nginx/PHP connections or whether the database is the bottleneck.

Depending on the result, we then can change some parameters.

Ian Molton

1 year ago

*how* ?

Crazy-to-Bike

1 year ago •

I had this problem too. But even banning those spam domains and cleaning their entries out of the database didn't get my server working properly again.

The load stayed high and the response time was bad.

So I'm sure that in my case the move to another server, with dumping and restoring the database, was part of the problem.

@Ian Molton

Michael 🇺🇦

1 year ago •

Please have a look at the requests that your system receives. Switch the log level to "debug" and then run this command:
tail -f friendica.log | grep -a 'Request.*"runFrontend"' (of course, change friendica.log to the name of your log file)

With that command you will see incoming requests. "Request received" tells you that the request has just arrived, and both "Request processed sucessfully" and "Request processed with exception" tell you that the processing is done. BTW: Don't be confused by that "with exception"; if the response code is in the 200 range, then everything is fine.

With these log entries you will see whether there are delays before the processing starts or during the processing. Depending on that, we can see what to do next.
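
To make the two kinds of entries easier to compare, the "Request received" and "Request processed" lines can be paired by their request-id. This is only a minimal sketch. It assumes that "Request received" entries carry the same "request-id" field that shows up in "Request processed" entries, and that every log line starts with a timestamp. pair_requests is a hypothetical helper name, not part of Friendica.

```shell
# pair_requests: read debug log lines on stdin and print, for every finished
# request, "<received-time> <finished-time> <request-id>".
# Assumption: both entry types contain "request-id":"..." and the first
# whitespace-separated field of each line is the timestamp.
pair_requests() {
  awk '
    # Return the value of "request-id" from the line, or "" if it is absent.
    function request_id(line,    s) {
      if (match(line, /"request-id":"[^"]*"/) == 0) return ""
      s = substr(line, RSTART, RLENGTH)
      gsub(/"request-id":"|"/, "", s)
      return s
    }
    /Request received/ {
      id = request_id($0)
      if (id != "") started[id] = $1   # remember when this request arrived
      next
    }
    /Request processed/ {
      id = request_id($0)
      if (id == "") next
      start = (id in started) ? started[id] : "unknown"
      print start, $1, id
      delete started[id]
    }
  '
}

# Possible usage (adjust the log file name):
#   grep -a "runFrontend" friendica.log | pair_requests
```

A large gap between the two timestamps points at slow processing. A "Request received" time that is already much later than the moment the browser sent the request points at the web server or PHP queueing the connection.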

Ian Molton likes this.

Ian Molton

1 year ago

There is a LOT of activity when I do that - I can't really tell what's going on.

I can see that most things seem to be getting 200 responses, but there can be long pauses.

Michael 🇺🇦

1 year ago •

You have to trace individual requests. For example you can filter for network requests. Then you can see, when you sent your request via browser, when it appeared in the system and when it has been processed. We need that data to really see the origin.

Ian Molton

1 year ago

How?

Is there a set of tools? is there a debugging guide?

Michael 🇺🇦

1 year ago •

See the command that I mentioned above. You can extend the filtering to only see requests to the network page. Then you can see the results.

Ian Molton

1 year ago

Michael... please remember...

I ***don't know what I'm looking at*** here.

I don't know how friendica works internally.

Things like... "What is a Gserver? Why is it updating? What's the difference between updating gserver*s* or in the singular?"

What is Delivery? Why is there APDelivery?

What do all the things in the queue *MEAN*?

I can add a "grep network" to the suggestion you made earlier, but it //tells me nothing//.

Currently, my instance is responsive enough to the "home" button, but the network page is still causing 504 errors.

This *reeks* of runaway query, but I don't have any idea where to start looking for it.

Michael 🇺🇦

1 year ago •

Okay. When you get these 504s, please run this command:
tail -f friendica.log | grep -a 'Request.*GET /network.*"runFrontend"'
This will show you all the network requests. You will see the first entry once the system has received the request. If no entry appears there (or only sometimes), then please check your apache/nginx configuration and increase the number of possible connections or possible PHP processes.

Michael 🇺🇦

1 year ago •

Filter for example this way:
tail -f friendica.log | grep -a 'Request processed sucessfully.*"runFrontend"'
There you can see the successful frontend requests and their performance (in the field "duration"). Perform some frontend requests on the network page and see if the duration is high (multiple seconds) or not.
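
If the "duration" field is present, the slow requests can be picked out automatically instead of being read off by eye. This is a rough sketch only. It assumes the field is written as a plain number of seconds (quoted or unquoted) inside the JSON part of the line; that format is an assumption. slow_requests is a hypothetical helper name.

```shell
# slow_requests MIN_SECONDS: read log lines on stdin and print
# "<duration> <request>" for every line whose "duration" is >= MIN_SECONDS.
# Assumed field shape: "duration":1.234 or "duration":"1.234".
# Lines without a "duration" field are skipped.
slow_requests() {
  awk -v min="$1" '
    match($0, /"duration":"?[0-9.]+/) {
      d = substr($0, RSTART, RLENGTH)
      sub(/^"duration":"?/, "", d)        # keep only the number
      if (d + 0 < min + 0) next           # faster than the threshold
      req = "-"
      if (match($0, /"request":"[^"]*"/)) {
        # strip the leading "request":" (11 chars) and the closing quote
        req = substr($0, RSTART + 11, RLENGTH - 12)
      }
      print d, req
    }
  '
}

# Possible usage (adjust the log file name):
#   grep -a "Request processed" friendica.log | slow_requests 2
```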

Ian Molton

1 year ago

Trying this, I see nothing at the start of the request, and ~20 *seconds* later, I see the following.

I can't see any "duration" field.

2023-11-12T15:12:15Z app [DEBUG]: Request processed sucessfully {"response":200,"address":"81.187.24.115","request":"GET /network HTTP/2.0","referer":"https://social.weyr.org.uk/admin/queue","user-agent":"Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/113.0"} - {"file":"App.php","line":718,"function":"runFrontend","request-id":"a6f17cc40134fbcb0a68ccd553c15016","uid":"750f76","process_id":1224165}

Ian Molton

1 year ago

grepping for "duration" it does come up occasionally.

Crazy-to-Bike

1 year ago •

I had such problems after migrating to another server by dumping and restoring the database.

Server load up to 14 with 8 cores, long page loading times, no reload of the timeline with apps like Fedilab or Tusky.

In the end, I made a fresh installation on the new server and now everything is smooth.

Server load of < 1, quick responses, timeline refreshes in apps.

@Ian Molton

Ian Molton

1 year ago

Yeah, but I've got like 30 users - periodically deleting the system and reinstalling it is not an option.

Frankly, if there is an O(N^2) or worse algorithm in play, then this is only going to happen again.

The culprit is looking like low memory - not something I thought would be a problem on a machine with 4GiB of RAM, with little else to do...

Ian Molton

1 year ago

Update:

I've not done much - I've increased the php-fpm timeout in nginx so that some of the longer requests complete, but this is looking like a database issue.

The VPS I'm running friendica on is essentially idle otherwise, and has 4GiB RAM and two decent CPU cores.

Requests into the server aren't taking up much of its time: bmon shows ~4-15 PPS, with a burst or two of 50-300 PPS every 60 seconds or so. Peak bandwidth is about 2MB/s; typical is more like 10-100K/s.

Last night, mariadb crashed - I had increased innodb_buffer_pool_size from 2G to 16G (yes, I know this is > RAM+SWAP). I have since reduced it to 4G, and it has seemingly ploughed through the worker queue as of this morning, but the whole site is still VERY slow.

Right now, from an admin point of view, the biggest problem I see in friendica is that there is literally ZERO ability to introspect its behaviour.

I can look at the "worker queue", but there is no way to know what any of that means short of dismantling the code, which I don't have time to do.

I can't search the worker queue for issues in any meaningful way, and even if I could there is no mechanism to inspect / suspend / pause / cancel entries in the worker queue.

If I could suspend the entire queue, and selectively "gate through" certain entries, it'd be enormously helpful.

Lastly - of the "known" threats out there like activitypub-troll, is there a list of workarounds or a blocklist? HOW can we identify these malicious sites and keep our instances safe?

Ian Molton

1 year ago

What is an (honest) minimum spec for a friendica node? I have to admit, whilst I was expecting a largish storage requirement, the RAM requirements are very much higher than anticipated.

Crazy-to-Bike

1 year ago •

As a reference point:
For testing I installed friendica on shared webspace with 1 GB RAM and 16 customers per core, as a single-user instance.

It worked, but it was not good enough for a permanent production instance.

Now my instance (so far with no other users) is on a VPS with 8 cores and 12 GB RAM with a load < 1.

@Ian Molton

Ian Molton

1 year ago

was that including the database? 1GB RAM? how many users?

I couldn't get friendica to stay online in that scenario - the database (mariadb) just ate all the RAM.

Crazy-to-Bike

1 year ago •

Yes. As described, it was a first single-user instance for testing whether I would really switch from mastodon to friendica. It worked, but it wasn't performant.

I think that my problems on the new server came from problems with the database that began on the shared webspace.

I dumped and restored the database there to test the recovery scenario, and the virtual view tables couldn't be restored. I installed them using the friendica CLI and then kept using that database. And I dumped that database again to migrate to the new VPS.

Besides the problems from friendica being under attack on the new server, I think the database not having been restored properly was the reason I had to reinstall friendica from scratch 🤷‍♂️

@Ian Molton

Sean Riley

1 year ago •

@crazy2bike
I wouldn’t run it on a shared server. While it's possible, I found the resource requirements for a performant instance to be high.

Ian Molton

1 year ago

(btw - the instance seems to have resumed normal (but sluggish) operations - most stuff seems to happen in <10 seconds.)

alastair87

1 year ago •

I just tried to sign in and it's been loading for a lot longer than ten seconds. It hasn't got to the 2FA authentication page yet.

alastair87

1 year ago •

Now it's 504'd.

alastair87

1 year ago •

For Firefish I was able to restore from an earlier backup than the most recent and it was able to recreate most of its state and carry on working. I don't know if Friendica can do that. A caveat being I don't have anyone else with a local account and could live with my profile pic reverting etc.

Ian Molton

1 year ago

yeah, I'd like to not start over. but 2 mins to load the profile page is going to have users giving up on it...

alastair87

1 year ago •

Or it didn't load at all when I tried it.

This is the resource usage for Firefish if you want to compare (Meilisearch is not necessary, it's for the full text search and I should probably switch to the much lighter weight Sonic):

alastair87

1 year ago •

Also using 56GB on disk, just for me as a single user, but then it's newer than Friendica and probably correspondingly less efficient . . .

Ian Molton

1 year ago

It'd be hard to be less efficient than this!

alastair87

1 year ago •

Indeed, but it sounds as if Friendica/MySQL is actually broken in some way rather than functioning as intended.

alastair87

1 year ago •

Ideally you'd just go back a day or a week or something. I had a corrupted backup archive on my cloud storage so I restored from an older backup kept on my local machine that I sync less often and it didn't revert everything.

alastair87

1 year ago •

If you do end up starting over I wonder whether it's worth looking for instances of the problem happening with postgres in case the issue is mysql/mariadb specific for some reason.

alastair87

1 year ago •

That's if it supports postgres, which it looks like it might not, so it may be moot (and assuming that would even help anyway).

Ian Molton

1 year ago

It doesn't.

AndiS 🌞🍷🇪🇺

1 year ago •

Have you seen the following issue on Github?

https://github.com/friendica/friendica/issues/13373

There are two separate problems described there. But if you are not running an old version of MariaDB (<10.6), you might have the same problem I experienced. I solved it by increasing the innodb_buffer_pool_size to 16 GB; the original description is here:

https://github.com/friendica/friendica/issues/13373#issuecomment-1712594367

Very slow after the latest 2023.05 update · Issue #13373 · friendica/friendica

@Ian Molton

Ian Molton

1 year ago

Hi, this was the reason I tried increasing innodb_buffer_pool_size to 16GB...

What's the cause of this (colossal!!!) increase in memory usage / requirements?

AndiS 🌞🍷🇪🇺

1 year ago •

The answer to this question also interests me. In my case it happened at the 2023.05 update. This fixed my issue on a system with 8GB RAM. Did it help in your case? @Ian Molton

Ian Molton likes this.

Ian Molton

1 year ago

I'm not lucky enough to have any way to try.

I can say that increasing it from 2G to 4G (still less than the 8G default in debian) made a substantial difference.

I *cannot believe* that a minimum requirement for a database of this type is over 8GB.

That's insane. I wrote production systems using mysql in 1998, back when 8GiB RAM was a total fantasy, that ran large scale SMS <-> Email gateways, and my machine could saturate 100MBit ethernet.

it had a 266MHz PII and 128MiB RAM, running FreeBSD IIRC.

What the actual?

AndiS 🌞🍷🇪🇺

1 year ago •

I'm with you on "what the actual...."

But is innodb_buffer_pool_size really in RAM? Because my server only has 8GB of physical RAM and it still works "good enough" for my single user instance - while having the innodb_buffer_pool_size set to 16GB.
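
One way to put numbers on this is to compare the configured pool size with the machine's physical memory. This is a minimal sketch, assuming a Linux host with /proc/meminfo and a working mysql (or mariadb) client login. pool_vs_ram is a hypothetical helper name. It only compares the two figures; it does not show how much of the pool is actually resident in RAM.

```shell
# pool_vs_ram POOL_BYTES RAM_KIB: report whether the configured InnoDB buffer
# pool is larger than physical memory. POOL_BYTES is the value of
# innodb_buffer_pool_size; RAM_KIB is MemTotal from /proc/meminfo (in kB).
pool_vs_ram() {
  pool_mib=$(( $1 / 1024 / 1024 ))
  ram_mib=$(( $2 / 1024 ))
  if [ "$pool_mib" -gt "$ram_mib" ]; then
    echo "pool ${pool_mib} MiB exceeds RAM ${ram_mib} MiB"
  else
    echo "pool ${pool_mib} MiB fits in RAM ${ram_mib} MiB"
  fi
}

# Possible usage on a live system (commented out, it needs database access):
#   pool=$(mysql -N -B -e "SELECT @@innodb_buffer_pool_size")
#   ram=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
#   pool_vs_ram "$pool" "$ram"
```

With a 16 GB pool on a machine with 4 GiB of RAM, this reports "pool 16384 MiB exceeds RAM 4096 MiB".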

@Ian Molton

Ian Molton

1 year ago

I don't know. But 2GB is not enough, and 8+GB seem to be too much...

The actual database is far larger anyway (41G) but the machine is 64 bit, so mapping that in (virtual) memory is a non-issue.

Ian Molton

1 year ago

I'm not running an old mysqld btw - mine reports 10.11.4

Ian Molton

1 year ago

Just tested after midnight for the first time - 680MiB of mysql reads caused by opening /profile/spyro

That seems like ... a lot?

Ian Molton

1 year ago

I'm still experiencing extremely slow page loading.

I restarted mysql when it stopped working (again), and after it restarted, loading the home page used well over 1GiB of RAM (I watched the RSS in top as the queries executed).

At this level of resource consumption, my instance can scale to maybe 2 simultaneous users.

This is clearly insane.

Can anyone actually help, or is this just going to be another case of the problem going unfixed?

If so, has anyone got experience migrating their timeline from friendica to something that doesn't require a datacentre to support more than one concurrent user?
