YSK: Your Lemmy activities (e.g. downvotes) are far from private

redtea@lemmygrad.ml · 3 years ago

3 years ago

Well, this is a bit uncomfortable. I wasn’t aware that votes were shared as well; IIRC an admin said that the occasional downvote brigades we’ve had couldn’t be reversed without losing posts, so I assumed the votes were anonymized somehow 😐

I wonder how feasible it would be to only send an anonymous vote count for every post/comment to other instances, keeping the details within the instance’s database only. Federation is already “broken” when an older database has to be restored or even when someone is banned from an instance, so it doesn’t seem to be fundamentally necessary for all instances to match
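A minimal sketch of that idea in Python, assuming votes arrive as per-user activities loosely modelled on ActivityPub `Like`/`Dislike` objects. The field layout, actors, and URLs here are invented for illustration and are not taken from Lemmy's code; the point is only that the actor can be dropped before anything leaves the instance.

```python
from collections import Counter

# Hypothetical per-user vote activities. All actors and URLs are made up.
votes = [
    {"type": "Like", "actor": "https://a.example/u/alice", "object": "https://a.example/post/1"},
    {"type": "Dislike", "actor": "https://a.example/u/bob", "object": "https://a.example/post/1"},
    {"type": "Like", "actor": "https://a.example/u/carol", "object": "https://a.example/post/1"},
    {"type": "Like", "actor": "https://a.example/u/alice", "object": "https://a.example/post/2"},
]


def anonymous_tallies(activities):
    """Collapse per-user votes into per-object counts, dropping the actor."""
    tallies = {}
    for activity in activities:
        counts = tallies.setdefault(activity["object"], Counter())
        counts[activity["type"]] += 1
    # Only the totals survive; no actor field is copied into the result.
    return {
        obj: {"upvotes": c["Like"], "downvotes": c["Dislike"]}
        for obj, c in tallies.items()
    }


summary = anonymous_tallies(votes)
print(summary)
```

The trade-off is that a receiving instance would have to trust the totals it is given, since it could no longer check for duplicate or fake votes itself.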

More importantly, is more sensitive data also shared? I would hope that IP addresses, which posts you’ve viewed, etc. aren’t stored anywhere, or at least not forwarded to other instances

redtea@lemmygrad.ml · 3 years ago

That sounds like a good solution to me.

Just knowing that any such data is collected or shared, and when, could help to improve one’s privacy.

savoy@lemmygrad.ml · 3 years ago

There’s a lot of info and discussion on this post that explains why. Pretty much, voting has never been private on other platforms either: votes must be tied to users, otherwise users could add more than one vote per post. And this data must also be federated so that other instances’ posts are also safeguarded.
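As a rough illustration of why a vote has to be tied to a user, here is a toy Python vote store keyed on the (user, post) pair. The class and its method names are hypothetical, not Lemmy's actual schema; a repeat vote from the same account overwrites the earlier one instead of adding to the score.

```python
class VoteBook:
    """Toy vote store: at most one vote per (user, post) pair."""

    def __init__(self):
        self._votes = {}  # (user, post) -> +1 or -1

    def cast(self, user, post, value):
        if value not in (1, -1):
            raise ValueError("value must be +1 or -1")
        # Keyed on the pair, so a repeat vote replaces the old one.
        self._votes[(user, post)] = value

    def score(self, post):
        return sum(v for (_, p), v in self._votes.items() if p == post)


book = VoteBook()
book.cast("alice", "post-1", 1)
book.cast("alice", "post-1", 1)  # repeat vote: still counted once
book.cast("bob", "post-1", -1)
book.cast("carol", "post-1", 1)
print(book.score("post-1"))
```

Without the user half of the key there is nothing to deduplicate on, which is the constraint described above.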

Lemmy isn’t designed as a privacy platform; it’s a social-media-style link aggregator powered by ActivityPub. Federation brings decentralization: it’s possible to not share data with other instances, but it will have to be shared in some way with any linked instances. There are pros/cons to each style: the current issues with Reddit show the problems with centralization, and there’s going to be an adjustment period as more people join Lemmy who don’t already know about the Fedi.

redtea@lemmygrad.ml · edit-2 3 years ago

I see that, but I didn’t vote on other platforms. I knew I’d be giving data directly to the owners and their five eyes operatives. I know this platform is public, so I’m careful with my words. Now I know I should be careful with voting, too.

The people who want our details are incredibly creative at how to interpret data that seems innocuous.

Edit: that link is very informative, thanks. I should confirm that I assume everything on the internet is public in one way or another and confirm that I don’t have any major concerns with the general security architecture of the Lemmy software or the way Lemmygrad is run. I just thought it’s something that we should talk about to make sure that we’re not increasing the chances of being doxxed by giving away useful metadata.

Edit 2: when I say, ‘I don’t have any major concerns with the general security architecture’, people should know that I’m not qualified to judge this from the coding side of things!

savoy@lemmygrad.ml · 3 years ago

For sure, people definitely should be educated on what data is open (posts/comments), closed (votes on Lemmy, though kbin seems to show them publicly), “private” (DMs, which are explicitly described as not private, with the advice to use Matrix etc. for actual encryption), or secure (Matrix). I feel like a lot of us on Lemmygrad are aware of privacy more than the average netizen, but it wouldn’t hurt to have a primer for new users.

I think for social media the best thing would just be compartmentalization of identities, so the usual advice of don’t give away too much of who you are and keep usernames separate unless you want them to be connected/known.

Marxine@lemmy.world · 3 years ago

A pinned post with this information would probably go a long way for new users. I didn’t know that until you pointed it here :')

But definitely, being careful with this data can help against brigading and other risks.

redtea@lemmygrad.ml · 3 years ago

A primer would be useful. Especially from someone who knows what they’re talking about! My knowledge has served me well enough but it’s basic.

The Free Penguin@lemmygrad.ml · 3 years ago

“or giving away too many personal details” but hey! take our demographics survey!

redtea@lemmygrad.ml · 3 years ago

I see the irony. I suppose two differences are that, one, the survey was optional, and two, although participants might not necessarily know who was reading the answers, the answers weren’t (I don’t think) generally accessible.

Voting data is not generally accessible, either, but it appears to be accessible to admins of other instances. If this is the case, it wouldn’t be difficult to set up an instance, make a post through any account about a specific topic, and observe the data to see who up/downvotes it. This could narrow down the list of people that a bad actor might want to target for further data harvesting.

Considering how many billions are put into surveillance, data collection, and controlling (social) media, I don’t think we should discount this fear as farfetched. Even if, on its own, it’s fairly benign data.

I’ve also noticed that some of the new instances/users are a lot happier with ‘local’ communities. If this grows into, say, a list of ‘things to do in my area’ communities, it wouldn’t take much for a fascist to identify local targets by a series of what I’m going to call ‘voting traps’.

Maybe I’m paranoid.

Black AOC@lemmygrad.ml · 3 years ago

Exactly why I didn’t; I still don’t know how much data flows through lemmy-actual’s veins to be comfortable like that.

Red Wizard 🪄@lemmygrad.ml · 3 years ago

The nice thing about these projects is that the community can review what kind of data is collected because it is FOSS. Obviously, things like votes, comments, DMs, posts, and posts marked as read are all logged in the DB and tied to your account. One could extrapolate a lot about a person by extracting and doing data analysis against that data. Because of how federation works, it also means you can’t just rely on your instance operators to be trustworthy.

DMs, I think, operate more like an email message and likely are not federated in a way that lets hosts other than the origin and destination view them. But the origin host and the destination host obviously could do a database query and pull your DMs. It should be noted that this is also true of email unless you are using encrypted mail.

At some point, DMs could be built to be end-to-end encrypted with PGP if the devs/community desire that, but that’s not how it works now.

I’m sure that as a Lemmy instance operator, you can also use your host server to log connections (read: IP addresses), but I’m unsure if Lemmy itself logs that information in its database alongside your account information (probably not?). As a good operator you would probably want to log the connections, so you can find patterns and remove bad actors trying to, say, DDoS your box.

However, if more robust moderation tools were to be implemented, which include an IP-based ban, then that would have to be tied to your account to make it work.

There are platforms like Nostr for example, where everything is encrypted, even the primary content, and you have to provide the system some kind of encryption key to even view the feed.

redtea@lemmygrad.ml · 3 years ago

This is useful to know.

I’m fairly sure that Lemmygrad admins don’t collect any data except for knowing your country code.

Is that right? Maybe you know, @CriticalResist8@lemmygrad.ml?

CriticalResist8@lemmygrad.ml · 3 years ago

The only person with any access to under-the-hood stuff is muad’dibber, who I don’t think really cares about your personal info haha

otherwise the rest of the admin team doesn’t see anything you don’t, except the names of people who took actions in the modlog.

redtea@lemmygrad.ml · 3 years ago

I did think this was the case. It’s reassuring to have it confirmed, though, thanks.

Red Wizard 🪄@lemmygrad.ml · 3 years ago

To be clear, if I’m the operator of the Linux server on which Lemmy is deployed, I can use tools like tcpspy to pull information about the TCP/IP connections to my server on a given port. Obviously, this is simply the TCP/IP packets, which is going to be a sea of bots, internet crawlers, and other automated systems attempting to access my box because it’s on the internet, which will likely dwarf the legitimate connections to the box.

I guess it’s more likely, though, that you’ll be monitoring the box between the outside internet and the actual server (like an Nginx reverse proxy). I could probably extract which IPs were going to Mylemmy.ml instead of to my box’s IP address directly, but that’s outside the scope of my knowledge. I simply know that, once you own the box that someone else is connecting to, there is networking data you now have access to that can lead to identifying a person.

redtea@lemmygrad.ml · 3 years ago

Do you know if this would work if I viewed information on that instance/server through Lemmygrad? I.e. could the other instance associate my IP with my LG account even if LG doesn’t collect that data?

Red Wizard 🪄@lemmygrad.ml · 3 years ago

Content is effectively “Synced” to Lemmygrad. So if you are reading something from !technology@lemmy.world, you’re actually reading it from the Lemmygrad server located at lemmygrad.ml /c/technology@lemmy.world. So no, unless you are connecting to their domain address explicitly, those domains are not getting your TCP/IP connection.

I’ll note, this is why you should probably use a VPN to access the internet generally if you are concerned about privacy.

Your home network provider knows your IP address and what domain address (but not the pages contained within) you visit.
Your mobile provider knows your phone’s IMEI number and its assigned IP address (though I’m not 100% sure about this exactly because you might get a new IP depending on the tower or network you connect to while moving around the world). But your IMEI number uniquely identifies your device to the carrier, and some app analytics systems have captured that data. It should be assumed they have a historical record not unlike the one your ISP has.

Using a VPN means all your ISP/mobile provider sees is you sending data to a VPN server through an encrypted tunnel, and that’s it. Services like NordVPN claim they do not keep logs on their system, so if a government were to request your account’s history, they likely couldn’t provide it. They operate a warrant canary, which in principle should give you faith that they have upheld that mission statement.

I use Nord on my phone and via their browser extension, and I have them set to auto-connect so they never turn off, and use their local discovery and bypass features to allow list sites and local services that get angry about my devices being on a VPN. I’ve been doing this for maybe two years now, and I’ve never noticed any material impact on the quality and speed of my connection.

This is not an ad for Nord 😅, I just think they’re practicing what they preach.

redtea@lemmygrad.ml · 3 years ago

That’s reassuring to know.

I would assume that security services can log everything even for VPN users. Plus, as Sakai observes, relying on technology for the security of revolutionaries is only part of the solution. The main risk comes from other people (who may appear to be comrades but are spies).

The state isn’t the only one to be worried about, though. With reactionary politics sliding into open fascism more and more, there are ordinary members of the public who don’t have access to the same powers as the state but who could get access to similar data if we’re not careful.

One thing I’ve always wondered is whether using a VPN puts you in a list for trying to hide your internet activity. They’re paranoid. I’m paranoid. It’s the age of paranoia!

RedTed@lemmygrad.ml · 3 years ago

I don’t see why anyone should care about votes, but if IP addresses were shared it would be concerning.

redtea@lemmygrad.ml · 3 years ago

It logs the timing as well, which could be sensitive data. For example, if an employer were to gain access to this information and tie it to an account that someone thinks is anonymous, they’ll know when you weren’t working but getting paid to be at work. Or it could be used to determine when someone is at home or out. Or be used as evidence for holding certain views.
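To make the timing point concrete, here is a small Python sketch that buckets vote timestamps by hour of day. The timestamps are invented for one hypothetical account; nothing here reflects how Lemmy actually stores or exposes this data.

```python
from collections import Counter
from datetime import datetime, timezone

# Made-up vote timestamps for one hypothetical account (UTC).
vote_times = [
    "2023-06-12T09:15:00+00:00",
    "2023-06-12T09:48:00+00:00",
    "2023-06-13T09:05:00+00:00",
    "2023-06-13T21:30:00+00:00",
    "2023-06-14T09:40:00+00:00",
]


def activity_by_hour(timestamps):
    """Count how many votes fall into each UTC hour of the day."""
    hours = Counter()
    for ts in timestamps:
        moment = datetime.fromisoformat(ts).astimezone(timezone.utc)
        hours[moment.hour] += 1
    return hours


profile = activity_by_hour(vote_times)
print(profile.most_common(1))
```

Even five data points already suggest a habit (most activity in the 09:00 hour), which is why timing that looks benign on its own can still be worth protecting.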

It’s unlikely for a single employer to get that data. But I wouldn’t put it past the five eyes to set up an instance, mine data, and use analysts/AI to cross-reference it with other user metadata.

It’s like J Sakai says in his security pamphlet. It’s bad practice to give any information to feds, even ‘benign’ data because it helps them to build profiles on you and others. To offer a rather extreme example, if they know you were upvoting a comment at 16.07 EST, they can guess it wasn’t you at a protest at the same time. This means they can narrow down the suspect list to the other handful of people with your build, etc, who go to protests. Being lax with data means the feds have an easier time undermining the efforts of people who are tight with their data.

It’s a privacy issue that I hadn’t considered. I knew our admins could see this data. I didn’t realise it was visible to other instances.

RedTed@lemmygrad.ml · 3 years ago

Good points, I didn’t consider all of that either.

pnwml [she/her]@lemmygrad.ml · 3 years ago

Aaand this is particularly why I have a separation between my lefty lemmygrad account and my general use account. Don’t need to be witch hunted from other instances because of some liberal mod.

Darc@lemmy.world · 3 years ago

To be honest, as a dev very experienced in SQL, I’d love an opportunity to query info like this online, like SELECT-only permissions on a browser-based editor that removes any sensitive info before hitting the client. I’d love to play with this dataset and find cool trends. Am I weird?
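One way such a sandbox might look, sketched in Python with the standard-library `sqlite3` module. The table and column names are hypothetical and this is not Lemmy's schema: the idea is to copy only aggregated rows into a separate public database, then switch that connection to query-only so that nothing but reads will run.

```python
import sqlite3

# Private side: hypothetical raw vote table that includes user identities.
private = sqlite3.connect(":memory:")
private.executescript("""
    CREATE TABLE votes (user_id TEXT, post_id TEXT, score INTEGER);
    INSERT INTO votes VALUES
        ('alice', 'post-1', 1),
        ('bob',   'post-1', -1),
        ('carol', 'post-1', 1),
        ('alice', 'post-2', 1);
""")

# Public side: receives only per-post totals, never the user_id column.
public = sqlite3.connect(":memory:")
public.execute(
    "CREATE TABLE vote_totals (post_id TEXT, upvotes INTEGER, downvotes INTEGER)"
)
rows = private.execute(
    "SELECT post_id, SUM(score = 1), SUM(score = -1) FROM votes GROUP BY post_id"
).fetchall()
public.executemany("INSERT INTO vote_totals VALUES (?, ?, ?)", rows)
public.commit()

# From here on, statements that would modify the public copy are rejected.
public.execute("PRAGMA query_only = ON")

totals = public.execute(
    "SELECT post_id, upvotes, downvotes FROM vote_totals ORDER BY post_id"
).fetchall()
print(totals)
```

Stripping the sensitive columns before the data reaches the queryable copy is safer than filtering query results afterwards, because no cleverly written SELECT can reach data that was never copied.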