

One thing I've enjoyed about hosting my own #friendica server on the #fediverse is being able to dabble with the addons, such as the language detection feature, which auto-hides posts in languages that aren't on your list of known languages. It works pretty well.
It's still hit-and-miss with posts that are mostly images and with quote-share posts, but I'm still happy with it.
Yes, very. I was surprised it triggered on those at all, since their text is below the threshold, but I think it is counting the raw text rather than just the displayable text (i.e., it goes through all the markup, URLs, etc.).
Yeah, we don't have a good way to extract the displayable text from the markup yet.
Is the plugin running client-side or server-side? If client-side, the post could be run through the HTML parser, which should be able to separate the markup from the non-markup, and only the non-markup would go through the detector. It would then only run on the text added on top of the quoted text. As a side effect, though, a simple reshare of a German post would not trigger it at all.
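A minimal Python sketch of that idea, using the standard library's html.parser: it keeps only text nodes and skips anything nested inside a <blockquote>. Wrapping the quoted content in <blockquote> is an assumption for illustration (the real share markup may differ), and DisplayTextExtractor and own_text are hypothetical names, not part of the plugin.

```python
from html.parser import HTMLParser


class DisplayTextExtractor(HTMLParser):
    """Collects text nodes, skipping anything nested inside <blockquote>.

    Assumption: quoted/shared content is wrapped in <blockquote>.
    """

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities arrive as plain text
        self.quote_depth = 0  # how many <blockquote> elements we are inside
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote":
            self.quote_depth += 1

    def handle_endtag(self, tag):
        if tag == "blockquote" and self.quote_depth > 0:
            self.quote_depth -= 1

    def handle_data(self, data):
        # Only keep text the poster wrote themselves, not the quoted part.
        if self.quote_depth == 0:
            self.parts.append(data)


def own_text(html):
    parser = DisplayTextExtractor()
    parser.feed(html)
    parser.close()  # flush any trailing text
    # Join the text nodes and collapse whitespace left over from the markup.
    return " ".join(" ".join(parser.parts).split())
```

With this approach a plain reshare yields an empty string, which is exactly the "would not trigger at all" side effect mentioned above.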
It runs server-side, but we have the HTML at this stage, so we could possibly do what you're saying.

Hmm, looking into it, it seems we already either strip the tags from the HTML output or convert the BBCode to plain text if we don't have the HTML output. The latter is imperfect, since image and link URLs end up in the text we then parse to guess the language.
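To illustrate why the BBCode fallback lets URLs leak into the detector's input, here is a hedged sketch with two hypothetical helpers: naive_plaintext only drops the [tag] markers, while displayable_plaintext also discards [img] contents and bare [url] links and keeps just the label of [url=...]label[/url]. The regexes are simplified assumptions, not the converter the server actually uses.

```python
import re

# Matches any remaining [tag], [/tag] or [tag=value ...] marker.
TAG = re.compile(r"\[/?[a-z]+[^\]]*\]", re.IGNORECASE)


def naive_plaintext(bbcode):
    # Drops the tag markers only, so URLs inside [img] and bare [url] survive.
    return TAG.sub("", bbcode)


def displayable_plaintext(bbcode):
    flags = re.IGNORECASE | re.DOTALL
    # Images carry no readable words: drop the whole [img]...[/img] element.
    text = re.sub(r"\[img[^\]]*\].*?\[/img\]", " ", bbcode, flags=flags)
    # A bare [url]...[/url] is nothing but a URL: drop it too.
    text = re.sub(r"\[url\].*?\[/url\]", " ", text, flags=flags)
    # A labelled [url=...]label[/url]: keep only the label.
    text = re.sub(r"\[url=[^\]]*\](.*?)\[/url\]", r"\1", text, flags=flags)
    # Strip any other tags and collapse leftover whitespace.
    return " ".join(TAG.sub(" ", text).split())


SAMPLE = (
    "Nice view [img]https://example.com/photo.jpg[/img] via "
    "[url=https://example.com/post]this post[/url] [url]https://example.com/raw[/url]"
)
```

On SAMPLE, naive_plaintext still contains both example.com URLs, while displayable_plaintext reduces it to the words a reader would actually see.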

Hmm.

Ok, I found the issue. Removing the tags doesn't remove the whitespace, so we run language detection on messages that have very little content but still reach the minimum length thanks to spaces and tabs. I'll have a fix shortly; it should prevent most false positives, especially with share posts that are heavy on HTML tags with lots of indentation whitespace.
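A sketch reproducing the bug and the general shape of the fix described above, assuming a simple regex tag stripper and a hypothetical 50-character minimum (the real threshold and code may differ): measuring the length only after collapsing runs of spaces, tabs and newlines keeps indented share markup from counting as content.

```python
import re

MIN_LENGTH = 50  # hypothetical minimum; the addon's real threshold may differ


def strip_tags(html):
    # Removes tags only: the newlines and indentation between them survive.
    return re.sub(r"<[^>]+>", "", html)


def should_detect_before_fix(html):
    # Buggy check: whitespace from indented markup counts toward the minimum.
    return len(strip_tags(html)) >= MIN_LENGTH


def should_detect_after_fix(html):
    # Collapse runs of spaces, tabs and newlines before measuring.
    text = " ".join(strip_tags(html).split())
    return len(text) >= MIN_LENGTH


# A share post that is mostly indented markup around very little text.
SHARE = (
    '<div class="share">\n'
    '            <div class="author">\n'
    '                <a href="https://example.com/profile">Bob</a>\n'
    "            </div>\n"
    "            <p>Hallo!</p>\n"
    "        </div>"
)
```

With this sample, the stripped text is well over 50 characters but almost all whitespace, so the old check passes; after collapsing, only "Bob Hallo!" remains and detection is skipped.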
Haven't found a solution for Wordle posts interpreted as Dutch, though.

Yes, hit-and-miss with image posts and quote-share posts.

But some languages are also completely beyond it. Portuguese is one example: almost all text posts in Portuguese are consistently identified as Spanish.

Does Italian get conflated for Spanish too?
Nope, it’s less similar than you seem to think 😉
I know my Italian grandfather could watch Spanish shows, but I had heard that the written languages were too different. I just started some Portuguese Duolingo exercises last month and was surprised how different it is from Spanish, not that my Spanish is great to begin with.
The language filter works by identifying common letter associations in a given language. So it's possible that Spanish and Portuguese words differ while their letter associations are similar; conversely, Italian words may be close enough to Spanish for humans to guess their meaning while their letter associations are very different.
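The thread doesn't say exactly which algorithm the filter uses, so as a hedged illustration of "letter associations", here is the classic character n-gram technique (Cavnar and Trenkle's out-of-place ranking), simplified to trigrams only. Each language is represented by its most frequent trigrams in rank order, and a text is assigned to the language whose ranking is closest. The function names and the tiny sample texts are hypothetical.

```python
from collections import Counter


def trigram_profile(text, top=300):
    """Return the `top` most frequent character trigrams, most frequent first.

    Words are padded with '_' so word starts and endings get their own trigrams.
    """
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"
        for i in range(len(padded) - 2):
            counts[padded[i:i + 3]] += 1
    return [gram for gram, _ in counts.most_common(top)]


def out_of_place_distance(doc_profile, lang_profile):
    """Sum of rank differences; trigrams missing from the language cost the maximum."""
    ranks = {gram: rank for rank, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(rank - ranks[gram]) if gram in ranks else max_penalty
        for rank, gram in enumerate(doc_profile)
    )


def guess_language(text, profiles):
    """Pick the language whose trigram ranking is closest to the text's."""
    doc_profile = trigram_profile(text)
    return min(profiles, key=lambda lang: out_of_place_distance(doc_profile, profiles[lang]))


# Hypothetical, tiny training samples; a real filter would use far more text.
SAMPLES = {
    "es": "el perro come la comida en la casa con los niños",
    "pt": "o cachorro come a comida na casa com as crianças",
    "it": "il cane mangia il cibo nella casa con i bambini",
}
PROFILES = {lang: trigram_profile(text) for lang, text in SAMPLES.items()}
```

Because this compares trigram rankings rather than looking up words, two languages whose frequent trigrams overlap heavily can score close to each other even when their vocabularies differ, which fits the Portuguese/Spanish confusion described above.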
Oh I thought it was doing a dictionary lookup of some sort. Good to know.
No, it's cheaper this way, but it's also less accurate, even before the weird message text we feed it.