twitter/the-algorithm
github.com
Will Norris, who works on OSS at Twitter, posted this[0]: "watch this space https://github.com/twitter/the-algorithm"
[0]: https://twitter.com/willnorris/status/1518694675909013504
Oh hey - I didn't know he'd left Google. He was a big part of open source there too.
same team. different company.
Go team!
yehh
hello world
did the repo ever exist or was he joking, or teasing?
I just checked and it's gone. But when it did exist it was just an empty repo, no README even.
I don't understand the concept of open-sourcing "the algorithm".
First of all, "the algorithm" is probably hundreds of thousands of lines of code, including all the tedious boilerplate like cache policies and multi-AZ logic.
And second of all, doesn't the algorithm include machine learning components, which are trained on terabytes of data? That data will likely be impossible to open source. And open sourcing the neural nets without the training data is mostly meaningless from a transparency perspective?
Open sourcing in this case is not about the implementation or the CS algorithm. It is more about transparency. I think the idea is that the public should know how tweets are ranked, why tweets show up in timelines and which timelines, what makes tweets popular etc. Imagine Google publishing a document detailing how their system ranks pages, i.e. officially publishing their internal SEO rules. I don't know if it is a good idea or not. People with enough resources might be able to game the system (if they don't do it already).
The only change is that people with resources would stop guessing how to game the system, and start employing people to ensure they systematically game the system.
> if they don't do it already
That's an interesting point. A practical description of the algorithm from the perspective of someone trying to game it may be more useful than anything Twitter or Google would release.
Probably right but I think trying to be transparent is better than not trying at all.
Gesture carries weight to the users too.
Not sure any big company has tried this before. I could be wrong, but either way looking forward to it / FWIW hope it catches on.
Or is it six cats in a basement with a laser pointer and a mouse?
The point of releasing it is to let people know exactly why they see the tweets they do in the order they do. I hope Elon just goes back to time-based ordering of tweets.
I personally know a FAANG employee whose full time job is building tools to try and help understand why the company's recommendation algorithm picks the things it does (and more specifically, predict how changes to the algo will affect that).
Even the people who build these systems barely know what the algorithm is going to do, much less why. It will be a herculean task to try and convey that to an average user.
Right. Also, it takes a new eng hire N years to gain a working understanding of the small subset of code that their team produces.
Disclaimer: I'm pretty new to twitter, so I may be misunderstanding something. On my Twitter Home Screen there are three little stars on the upper right of the central container which allow you to toggle between "top tweets" (ie. 'The Algorithm') and "latest tweets" which is the time based ordering.
What you are misunderstanding is that there is more social clout to be gained by complaining about twitter than exploring the available features. The ability to sort by latest doesn't stop people from feeling self righteous by demanding this feature exist.
You understand it, it's just like instagram, facebook, reddit etc, you let them pick something for you via some algo or you order by date.
Maybe it's a room full of lava lamps.
This is pretty standard in the machine learning world. You'll open-source the code and weights trained on a public data set (these are often licensed specifically for non-commercial use). But in production, you'll be using different weights trained on a proprietary data set.
The whole point of the repo seems to be to show that there's nothing called "the algorithm". It's probably something like 100s of 1000s of algorithms doing different things.
You can open source the model code.
And developers will be able to train a model using it on a subset of Twitter data. Just that the quality of the outcome won't be the same as having the full set of Twitter data.
If it's too complicated, there is a good chance Elon will ask to simplify it until it can be open sourced.
Ever heard of pseudocode, which happens to be human-readable? If they are really going to do it they will release source code for programmers and computer scientists in general to analyse, and on top of that they will release pseudocode which non-technical people can somewhat understand.
This seems to be a practical joke by a Twitter engineer as opposed to an actual release.
Could you take it any other way? I mean obviously you could, because most here appear to be taking 'the-algorithm' very seriously .. but, seriously? It's funny. No joke.
It seems more like a symbolic gesture.
Or a commit is being prepared? It's only 20 minutes old.
It just seems unlikely that the algorithm would be open-sourced right after a deal for Twitter is agreed upon (but before it actually goes through). I've never seen a buyout of this scale done by an individual, but I imagine the SEC and several other parties will need to be involved.
At the minimum, I would make a private Github repo first, add all relevant commits, and then make it public once there's actually content.
Sure it's fast but it seems the deal is done. A practical joke by an employee seems more unlikely than an official release.
The engineer from the OSS team at Twitter linked to it and said "watch this space"
https://twitter.com/willnorris/status/1518694675909013504
Sure I guess it could be that engineer gearing up for a joke, but I think it's more likely that it's a real release.
https://twitter.com/elonmusk/status/1518677066325053441 states that Elon Musk is looking to "make the algorithms open source."
Is this supposed to be a joke? It's clearly an empty repo.
Either this is a mistake, or this is a really, really misguided attempt at a joke from Twitter.
The tweet announcing it was captioned "watch this space":
https://twitter.com/willnorris/status/1518694675909013504
Which seems like a promise they intend to actually open source something there.
Who is that guy even? Is he even associated with Twitter? Calling it an announcement seems like a stretch.
His website claims he’s open source lead at Twitter: https://willnorris.com/
For anyone wanting a non-empty version, check out my article on how Twitter's algorithmic feed works from last week https://transitivebullsh.it/oss-twitter-algorithm-part-1
How you _think_ it works from 1000 miles above.
It wouldn't surprise me. It wouldn't be the first joke about him and his buyout.
It was created 20 minutes ago. Maybe a WIP?
For a company of Twitter's scale and resources, any public repos are supposed to go through legal to clear (this is how things work at every FAANG I've worked at).
So if it was a WIP, it'd be a private repo until it's ready to release publicly.
Well, all FAANG companies are public.
Many people have commented that it is empty. However what they do not realize is that there has never actually been an algorithm and that is why it is empty.
This is most likely the correct answer. I doubt there is a single piece of code or algorithm that controls all of twitter. From my limited perspective it's just a catch phrase to simplify what is likely tens of thousands of lines of code.
Well, that's an algorithm. If you have to draw a diagram showing how edge cache nodes affect a user's results based on whether or not they signed in from New York or Phoenix, or if the downtime of a cluster affects the relevance rankings of tweets for a photography enthusiast, then guess what...
That's an algorithm.
This shouldn't be downvoted. Any black box with inputs and outputs is a function over discrete data, otherwise known as an algorithm.
> a function over discrete data, otherwise known as an algorithm
'A function over discrete data' is a far broader concept / far larger set than 'an algorithm'.
I think it is unfortunate that "the algorithm" has become the popularized phrase for the thing, though. Yes, clearly an algorithm which evaluates the entire state of the universe and uses that as an input is still an algorithm. But it is also clearly not what people are looking for.
I'm pretty sure people expect to be able to look at The Algorithm and find the part that is trying to destroy <whatever their hobby-horse is>. I bet whatever we see will lean more toward "big dumb pile of A/B tests."
I don't disagree, I might even argue that twitter is comprised of many algorithms. We can argue all software is algorithms, or we can just call it software.
That just moves the terminology around and doesn't answer people's questions. Everyone would simply start asking, "what software or lines of code directly affect my timeline?"
The "timeline algorithm" is a perfectly normal way to describe what people ask about.
Yes it's an algorithm, but as software engineers we shouldn't pander to people who actually think there is a small flow chart dictating what happens in your search results. It's idiotic or dishonest.
I mean I'm sure Google has somewhere a UML chart of the entire process of ordering results. Your insistence otherwise is disingenuous at best, either that or you've never actually worked on any sort of major software, at least with any level of reliability/dependability.
> I mean I'm sure Google has somewhere a UML chart of the entire process of ordering results.
No, there isn't.
Sure they have a chart but it's not "if !idpol -> make woke instead".
The level of discourse I'm trying to convey is like what we saw in the Zuck/Pichai hearings in Congress. No understanding of the domain at all.
You or I know how software actually works, your average politician or man on the street does not.
Yes there is an algorithm. It may not be concise, it may not be found in one place, and nobody alive may be able to explain it to you, but the existence of timelines that are ordered certainly implies the existence of 'the algorithm' that is used to order timelines.
I mean even if timelines were totally random, or based on some external facts, there is an algorithm that is being used to order them.
This isn't just an academic distinction. Claiming 'there is no algorithm' because the algorithm is intentionally or unintentionally obfuscated or complicated has implications if that claim of 'no algorithm' is accepted. If my algorithm for approving mortgage applications is explicitly racist, I can just spread its functionality across myriad services owned by lots of teams, make it almost impossible to figure out how it works, and then avoid any responsibility by saying 'there is no algorithm to decide loan approvals'? That would be bullshit!
There's not an algorithm to order timelines?
If it’s a list of tweets ordered based on a kajillion ML data points that varies per user is it still an algorithm?
And does every user have their own algorithm?
And could it be made readable to a human?
Yes, irrelevant, and the code somewhere is readable.
Take an example like gravity: is there an algorithm for how gravity acts even though it varies per each molecule based on a kajillion other molecules? Of course there is.
If I have your DNA can I recreate a perfect copy of you as you are now including your mind? It is after all the code you were made with.
The point is not to recreate it or explain it, or to have a way to say "tweet X will get rank Y" but to open source what there is. It doesn't matter if no single person can explain it, it'll still be open source and readable and the types of inputs will be known. But this doesn't look like a discussion on open source and more about politics at this point.
I for one haven't worked at FAANG and think it'll be super interesting to read this code, I can't believe software engineers are complaining about a potentially super complex bleeding edge codebase for timeline recommendations / ranking being open sourced. This is going to be great reading material if it ever gets published!
Nobody is trying to “recreate a perfect copy” of Twitter. Do you think that code which produces random numbers is also not an algorithm because it doesn’t recreate a perfect copy of a previous random number?
No you can't, you can only make a twin, but the mind is shaped by experience also and not just genetics. As is the body.
In that case, the definition of “the algorithm” is the training set.
> the definition of “the algorithm” is the training set
Or the trained model itself. There are people looking for intentional bias. But the insidiousness of the problem likely arises from unintentional bias. Letting researchers brute force the ranking models with hypotheticals could be a win win.
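For what it's worth, "brute forcing the ranking models with hypotheticals" could look roughly like the sketch below: score two inputs that differ in exactly one field and look at the gap. `toy_model` is a made-up stand-in for a black box, and the field names are assumptions for illustration, not anything Twitter has published.

```python
def probe(model, base_input, field, value_a, value_b):
    """Score two inputs that differ only in `field` and return the score gap."""
    a = dict(base_input, **{field: value_a})
    b = dict(base_input, **{field: value_b})
    return model(a) - model(b)


def toy_model(tweet):
    """Hypothetical black box: halves the score of one topic."""
    s = float(tweet["likes"])
    if tweet["topic"] == "sports":
        s *= 0.5
    return s


base = {"likes": 10, "topic": "news"}

# A non-zero gap means the field influences the ranking score.
topic_gap = probe(toy_model, base, "topic", "news", "sports")

# A zero gap means this model ignores the field entirely.
author_gap = probe(toy_model, base, "author", "alice", "bob")
```

A researcher would only need query access to the model for this, not the training data.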
Is there any reason to not think that like most human-created systems it has a mixture of intentional and unintentional bias?
Twitter itself is probably the training set for twitter.
In that case, the algorithm is the labels.
Turns out, it's been RNG this entire time.
If that's the case, let it be client side RNG and not an RNG run server side, thanks.
what ups
ORDER BY timestamp DESC;
no self respecting PM would ever allow something so trite and simple. this was obviously copy&pasted from stackoverflow. something like this would never web scale, so obviously they need something written from scratch preferably in Go/Rust/Node/anythingButSQL
The one true algorithm!
There probably is not "an algorithm" on a site as large and complex as Twitter, no. There are probably dozens if not hundreds of algorithms spread throughout the codebase which affect the timeline for individual users, possibly even code entirely self-generated by ML systems.
Well sure there is "ORDER BY" something, and there are batch jobs to calculate this something from different variables. Maybe it is partially a ML black box, but I'm pretty sure they are not flying completely blind. There's got to be an internal Wiki or some PPTs where they list what goes into the mix. Number of likes, retweets. Attached media, hashtags, trending or not. Do they do sentiment analysis and use positivity as a factor? Do they do PageRank so that likes from important people have more weight? Are there manual debuffs for certain topics? Are certain posts removed or added in a second pass? These are all answerable questions in principle.
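To make the "ORDER BY something" idea concrete, here is a minimal sketch comparing plain chronological order with a weighted engagement score. The `Tweet` fields and the weights are invented for illustration; this is not a claim about what Twitter actually computes.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    id: int
    timestamp: int   # seconds since epoch (hypothetical field)
    likes: int
    retweets: int
    has_media: bool


# Hypothetical weights; a real system would tune these rather than hard-code them.
WEIGHTS = {"likes": 1.0, "retweets": 2.0, "has_media": 5.0}


def chronological(tweets):
    """Equivalent of ORDER BY timestamp DESC."""
    return sorted(tweets, key=lambda t: t.timestamp, reverse=True)


def score(t):
    """The 'something' in ORDER BY something: a weighted sum of signals."""
    return (WEIGHTS["likes"] * t.likes
            + WEIGHTS["retweets"] * t.retweets
            + WEIGHTS["has_media"] * (1 if t.has_media else 0))


def ranked(tweets):
    """Equivalent of ORDER BY score DESC."""
    return sorted(tweets, key=score, reverse=True)


tweets = [
    Tweet(1, 100, likes=50, retweets=1, has_media=False),
    Tweet(2, 200, likes=3, retweets=0, has_media=False),
    Tweet(3, 150, likes=10, retweets=30, has_media=True),
]
```

With these toy numbers the newest tweet comes first chronologically but last by score, which is the kind of difference people are asking to have explained.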
Maybe not. It could all be event-driven, distributed, and just pulling off various prioritized queues.
that's still an algorithm
Well, it's ok if you feel that way given this is entirely subjective, but I disagree based on experience.
I disagree b/c: That's a system. An algorithm is designed, a system will emerge from pieces. An algorithm can be defined, a system's behavior has to be characterized post-hoc. You can characterize a system, but only as a black/gray box. An algorithm has invariants and stateful steps, a system has nearly infinite state and nearly zero meaningful invariants.
The trouble is, there could be literally millions of ORDER BY queries. Hell, each user could be given a fine-tuned cnn…
In the end, it's still code that is being executed, which can be called "an algorithm". Algorithms are under no obligation to be simple or only include "one algorithm".
So then "open sourcing the algorithm" likely means open sourcing the entire Twitter codebase?
At this point, what harm would that do to Twitter? The company's tech has been stagnant for years, and the only real features recently have been attempts to exploit Twitter's network effects to ape Clubhouse & Substack. (Seems successful for the former, kind of a failure for the latter.)
do we know? has any angry ex-twitter engineer revealed the secret sauce to the world?
The aggregate of a set of systems that maps inputs and state to outputs and state updates is...
still a system that maps inputs and state to outputs and state updates.
A collection of algorithms is just a bigger algorithm.
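A tiny sketch of that claim, assuming each stage is just a function from a list to a list: chaining the stages gives you one callable that is itself an algorithm. The stage names are hypothetical.

```python
from functools import reduce


def compose(*stages):
    """Chain stages left to right into a single callable."""
    return lambda items: reduce(lambda acc, stage: stage(acc), stages, items)


def drop_empty(items):
    """Remove blank entries."""
    return [s for s in items if s.strip()]


def newest_first(items):
    """Assume the input is oldest-first and reverse it."""
    return list(reversed(items))


def top_two(items):
    """Keep only the first two entries."""
    return items[:2]


# Three small algorithms, one bigger algorithm.
timeline = compose(drop_empty, newest_first, top_two)
result = timeline(["a", " ", "b", "c"])
```

The composed function is harder to read than any single stage, but it is still a well-defined mapping from input to output.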
Turns out they never actually scaled their mailbox pattern. As a result, what you see is what they delivered before timeout.
just birds poking random tweets
This would be some pretty funny semantics... but, however it works, message is the same: Transparency for how these work
- Search results
- Comment Order
- Timeline Order
- Trends
- Human vs code
Personalization in general. Big gigantic “why” when it happens to you
Of note for anyone confused and reading the comments now, the link was to an empty, but real, twitter repository at https://github.com/twitter/the-algorithm. Now it is a 404.
Here’s the play: interpret “the algorithm” in a stupid way, and then claim it doesn’t exist, and then make a joke about it on company repo.
You know what? This does demonstrate the internal problems inside Twitter and shows the need for shakeup.
That sounds like a variant of this joke: https://picturesofpeoplescanningqrcodes.tumblr.com/
Joke's on them, in the last 2 years QR codes made the greatest upset of all times
If there is a total set of data, and a subset of it is produced for users, there is an algorithm. `SELECT TOP` is an algorithm. A ML model with a trillion parameters is an algorithm.
There is no "no algorithm."
You're not wrong. But would it be useful to define better what we mean with algorithm then? Like: we mean "non-obvious algorithm"?
You and I might be technical enough people to suss out little details, but I suspect the average person probably also wants to know whether or not something like Twitter testing a feature flag (Yes, really!) is affecting their timeline.
If it affects results in the slightest, I think that's what people are asking about.
Yeah, if the new senior dev screwed up a deployment and now football fans in Dallas are suddenly getting fewer posts about the most recent athlete scandal all because a regional deployment was the only thing doing relevance record keeping, then yes, that's a part of "the algorithm."
We have a leak revealing a "trends blacklist" at the very least. Some of their measures are detectable too: https://taishin-miyamoto.com/ShadowBan/
As far as I'm concerned, the algorithm is and can only ever be an implementation detail. If the leadership of Twitter changes overnight, it'd be highly likely the types of content that gets recommended will subtly/not-so-subtly change... because the algorithm has been changed.
It's an interchangeable function, it would only be publicized if it's clear to leadership that it wouldn't affect their revenue if people started trying to game towards the published algorithm.
I feel like this is a line from The Program (https://www.programaudioseries.com/)
You don't think recent news of the EU compelling social media companies to disclose their algorithms has anything to do with this?
No, the repository is a joke (IME). Regarding the EU, the details haven’t even been nailed down yet, and the timeline is that it will only become a requirement in 2024 I believe. The goal also isn’t to publish “the algorithm”, but to give researchers and civil society representatives access to the training data and information about the most important factors controlling the algorithm, so they can make assessments on the basis of that. It is not about open-sourcing code.
They don't necessarily have to publish code to the repo though, they could just as well publish papers that document their algorithm.
This. Personally, I was thinking that in light of the upcoming sale, the current Twitter staff might decide to 'open source' some algorithm, which would be an effective method of protest against a sale like that.
But there definitely is a relationship algo that could be considered theirs, like all social medias inflating the bubbles users all feel.
There's clearly a (probably daily shifting) algorithm. I'm not sure what statement the site is trying to make. Even a random number generator that selects tweets at random for your feed is an algorithm.
Twitter is mostly an infrastructure operation imho, someone school me. It’s a scale-based problem space, how do you get all these tweets out in real time at minimum, and at best, how do you do some level of topic bucketing on top of it.
Right but "the algorithm" is inextricably tied in with scaling. "the algorithm" is designed to handle updates at scale and the model probably has different parts updating to different events on different cadences.
I've worked on very large scale recommendation systems at a FAANG. If Twitter's system resembles anything like ours, the concept of publishing or open sourcing "the algorithm" doesn't make sense.
Even if we were to open source all associated code and publish all related documents it would be very difficult to make sense of the entire system. That is precisely why companies such as Twitter A/B test the hell out of everything. What most people think of as "the algorithm" is a complex system that receives many inputs (maybe hundreds) and has dependencies on many other internal Twitter services. Tweets likely pass through multiple filtering steps as well as scoring before you ever see them. Each of these steps is highly contextual, depending on: location, past tweets, verification status, etc. You can attempt to predict the effect of a certain change, but you never know the actual outcome until you test it.
I think what will ultimately happen is that _some_ details will be published. Elon will parade that around as a victory for free speech as Twitter is now more "open". In reality, nothing of value will be gained as "the algorithm" isn't a simple function.
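As a rough illustration of the filter-then-score shape described above, here is a toy pipeline where the same three tweets come out in a different order depending on who is looking. Every field, boost value, and filter rule is an assumption made up for the example, not a description of Twitter's system.

```python
def passes_filters(tweet, viewer):
    """Hypothetical filtering steps: muted authors, then a language check."""
    if tweet["author"] in viewer["muted"]:
        return False
    return tweet["lang"] == viewer["lang"]


def score(tweet, viewer):
    """Hypothetical contextual score: base engagement plus viewer-specific boosts."""
    s = float(tweet["likes"])
    if tweet["author"] in viewer["follows"]:
        s += 100.0
    if tweet["region"] == viewer["region"]:
        s += 10.0
    return s


def build_timeline(tweets, viewer):
    """Filter candidates, then order the survivors by score, highest first."""
    candidates = [t for t in tweets if passes_filters(t, viewer)]
    return sorted(candidates, key=lambda t: score(t, viewer), reverse=True)


tweets = [
    {"id": "a", "author": "alice", "lang": "en", "region": "NY", "likes": 40},
    {"id": "b", "author": "bob", "lang": "en", "region": "AZ", "likes": 5},
    {"id": "c", "author": "carol", "lang": "en", "region": "AZ", "likes": 60},
]
viewer_ny = {"lang": "en", "region": "NY", "follows": {"bob"}, "muted": set()}
viewer_az = {"lang": "en", "region": "AZ", "follows": set(), "muted": {"alice"}}

ny_order = [t["id"] for t in build_timeline(tweets, viewer_ny)]
az_order = [t["id"] for t in build_timeline(tweets, viewer_az)]
```

Even at this size you need the viewer's state to predict the output, which is the point about context: publishing the code alone tells you the rules, not what any one person will see.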
As someone who’s also worked in this area, I disagree with this take.
There is typically a clear objective function for a recommendation system.
What Twitter is optimizing for is what’s of interest here. And some of the hidden business rules. It’s likely these are specified in the code in an obvious way.
How exactly they achieve that is the part that is complex and relatively indecipherable.
It’s possible that it’s designed in such a way the optimization objectives are also unclear, but that would indicate a bad design and be to the detriment of the company and users.
Yeah seriously. If they are not able to put into words or diagrams what the algorithm is doing then the company itself has no idea how it works. And that to me would suggest it’s far from optimal.
Many complicated research papers have had no issues describing their models at a high level. This should be no different.
Twitter PMs would argue they've already communicated the objective function: relevance and engagement.
My point is that the devil is in the details and implementation. These details are likely something that no one person understands and no one person is able to fix. The concept of being able to extract "the algorithm", factor it out from the codebase and share it with the public doesn't make sense to me. It won't be possible to fully understand how Twitter serves recommendations and ranks posts without understanding how all the different services at Twitter interact. Are they planning on open sourcing all of Twitter? Highly doubtful.
Yeah, this makes no sense. There is no golden algorithm that Facebook, Tiktok or Twitter has figured out.
All these feed rankings are complex combinations of features, models coupled with weights and filters. On top of this abuse detection layers are added.
Unless Musk is planning to open source user data to show what all the "scores" and "features" for all the entities are and how they were arrived at, this will make no sense. The whole argument against some people being downranked has been, why me? Just writing a whitepaper to tell the general methodology is not going to make that go away.
On top of that, exposing every vector through which you measure and stop abuse, will just allow for more sophisticated abuse.
Can you mention the source of your information?
The openness for the algorithm is to make public how the algorithm works internally. It has nothing to do with how "novel" or "good" or "bad" the algorithm is. It is just a way to check if there is anything fishy going on in there. I don't see what the fuss is about if it only makes the defacto town square of the internet more transparent and open.
Twitter isn't the town square of the internet. The internet is the town square of the internet and you aren't owed a fair ranking in relation to a certain search term user or combination thereof because no singular definition of fair.
> Twitter isn't the town square of the internet
Maybe to you. You are entitled to your opinion. I won't censor you for having a contrary opinion.
> owed a fair ranking
Well we are getting it whether you like it or not.
> because no singular definition of fair
Of course there is. If New York Post's tweet about Hunter Biden's story gets blocked, I would like to see Taylor Lorenz's story about LibsofTikTok get blocked in a "fair" system. Since both are doxxing after all.
I don't believe the point was ever to convey a complete understanding of the system, but to allow for the identification of "red flags": Specific functions that would be deemed controversial or counter to equal opportunity speech.
Is the idea somewhere in the-algorithm there's a function called `derankGOPMembers()`?
Twitter's been pretty transparent in how it "deranks" certain accounts [1]. What more would come from opening the code, which would certainly not include the actual database of "no no terms" (if you were to believe that exists)?
[1] https://blog.twitter.com/en_us/topics/company/2022/our-ongoi...
The idea is that people should be able to find out if there is such a function. It's taking a stance that Twitter has nothing to hide. In my opinion, it's a great way to build trust with users, something that all of the popular social media platforms lack.
Twitter is transparent in how they do shadow bans? They still deny they are a thing but it's clear to see the network effects of them. Twitter is transparent - that's the most hilarious thing I have read today!
Has there ever been any actual indication of twitter shadow bans being a real thing, or are we putting it in the same "weird conspiracy theory" category as "facebook listens using microphone to target ads"?
I follow a few people in "gamer/twitch twitter" and every now and then this meme pops up of twitter secretly deranking "go live" twitch.tv tweets, which is much more palatable to these people than the reality of just no one caring about their boring tweets.
> Has there ever been any actual indication of twitter shadow bans being a real thing
Please stop living in this bubble. There have been plenty of reports of Twitter shadow banning speech it doesn't like. It has even suspended accounts multiple times only to reinstate them by saying it was an "algorithm error" or it being "wrongly flagged". In India we regularly see Twitter suspending accounts that have not violated any Indian law.
All because it doesn't align with the left-leaning ideology of Twitter. Even trends are boosted to favor left-leaning news portals over right-leaning news portals (which is visible in the trends section).
An example: https://www.freepressjournal.in/viral/bringbacktrueindology-...
^ This account specifically was suspended a total of 3 times. Ultimately the anonymous account owner decided to come out in the open and open a Twitter account in his real name. He got banned again very recently. In all those instances, there was not even one tweet of his that "crossed the line" when it came to anything illegal. Be it speech, instigation of violence or even a curse word. Nothing at all. The suspension was purely ideological.
What you linked to is not shadow banning, and was explicit banning/suspension that happens transparently (in the sense that anyone can see that it happened).
> What you linked to is not shadow banning
Of course this is evident too. And let me quote directly from an Opposition leader in India (whose party/ideology I completely oppose):
"“I have been reliably, albeit discreetly, informed by people at Twitter India that they are under immense pressure by the government to silence my voice. My account was even blocked for a few days for no legitimate reason,” Gandhi said in his letter."
"“For example, in May 2021, my account gained roughly 640,000 new followers. This had been the case for several years until July 2021. Then something strange happened. Since August 2021, the average number of my new monthly Twitter followers has fallen to nearly zero,” he claimed."
Article in question: https://www.businesstoday.in/latest/in-focus/story/rahul-gan...
Now just because I am opposed to his ideology (which is Left-Centrist) doesn't mean I want him banned/shadow-banned from the platform. He has every right to continue to voice his opinions. Now he claims that it was the Government (whose ideology I support) which supposedly interfered and pressurized Twitter to manipulate follower count. I would like transparency on this if my Government, which I support, did indeed do that. Or did Twitter itself decide to shadow ban his account. Or was Twitter doing it legitimately because he had a lot of bots following him and when they banned those bots his follower count decreased. Either way, I want the facts in the open.
As far as shadow banning content goes. It happens on a fairly regular basis. Open any Tweet that has quite a few comments right now and scroll to the end. You typically have a "Show additional replies, including those that may contain offensive content". Take this tweet as an example: https://twitter.com/Bob_Mayo/status/1518679097672617990
When you click more replies it shows "Show additional replies". Clicking on it is an innocuous tweet with laughing emoji. But the tweet above it isn't put behind a collapsible card. How many would click on the "Show" button and read those replies that are hidden by default? This is throttling reach a.k.a shadow banning. It is not like there is some abusive word being used in the tweet. It is just laughing emojis.
Yes, there is. I had a friend accidentally shadow ban himself by posting nsfw art without marking it as nsfw. After that his future posts wouldn't show up when searching hashtags he had tweeted on. He had to make a new account until the older one was good again.
take a look at the Twitch leak sometime. There was a hard-coded list of bad words (largely racial epithets & other 4chan constructions). Presumably there is a way to create some kind of config file to penalize "bad users" of Twitter, I know these things existed in other orgs I've worked at
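A config-driven penalty of the kind speculated about here might be as simple as the sketch below. The config keys, the placeholder terms, and the multiplier are all hypothetical; nothing in the thread confirms that Twitter has such a file.

```python
import json

# Hypothetical config, inlined as a JSON string so the example is self-contained.
CONFIG = json.loads('{"penalized_terms": ["spamword", "scamlink"], "penalty": 0.5}')


def apply_penalty(text, base_score, config=CONFIG):
    """Scale the score down if the text contains any penalized term."""
    words = set(text.lower().split())
    if words & set(config["penalized_terms"]):
        return base_score * config["penalty"]
    return base_score


flagged = apply_penalty("Click this scamlink now", 10.0)
clean = apply_penalty("hello world", 10.0)
```

Note that opening the code would show `apply_penalty` but not necessarily the contents of the config, which is where the contested decisions would live.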
This is likely in response to upcoming EU legislation on algorithm transparency[0]. It's not useful, but they'll need to do it eventually.
[0]: https://mashable.com/article/eu-digital-services-act-big-tec...
It's so difficult as someone who works in technology to tolerate this idea that there is "an algorithm". Maybe this will get flagged on HN but the overwhelming feeling is to just scream "don't be such a fucking idiot, this is a $43Bn company it doesn't boil down to 50 lines of code" and hell, there are plenty of examples of 50 lines of code at my company I could spend a month really understanding and even then not understand the full ramifications. It's a stupid person's idea of how tech works.
I don't think that most people actually believe "the algorithm" is like a Python script or something.
The average person on twitter isn't a software engineer. They won't know what python is. They don't know what an algorithm is. So let's be clear about what the baseline is here. There's no point talking to them about how you distribute queries using micro-services or whatever bullshit twitter's engineering team bought into this season.
This idea that "the algorithm" is something you can just "publish" is a pernicious lie told by people like Musk - who knows it isn't true - to the general public who don't know better. The "algorithm" in reality is probably farcical calls to custom APIs that no current employees understand well enough to modify which is why Twitter hasn't changed anything in coming up to a decade - which is when all the engineering talent left.
I believe all the public wants is a report with an overview of the mechanisms that dictate how and in which order tweets are shown to different users. Even if no current Twitter employee knows the system well enough to write it, they can still create a taskforce to do so. Sure, 3rd party APIs and machine learning stuff may obfuscate part of the system, but I'm sure the best they can do is good enough for starters.
This is a bad take. Just because it is a stupid unaccountable heap of shit now doesn't mean it has to remain that way in the future.
I'm not talking about whether it's accountable or not, I'm saying that to actually share the algorithm you're basically saying that you're going to open source Twitter's entire code base. Oh and when you do that the average person will be no better off because they don't read code anyway. And when engineers read through the code it's going to be like "Where did you get all these different variable values from?" and the answer isn't "We came up with a method for valuing tweets from first principles", it's going to be "We showed 7 billion people tweet X and 7 billion people tweet Y and tweet X caused 5% more people to engage so we tweaked this value".
And sure, you can say "Well that's a bad way of designing the algorithm" but then what you're really saying is that you don't want to open source the algorithm at all, you want to re-write the algorithm to satisfy your sense of how the world should work with no evidence it'll actually work.
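To make the "tweet X caused 5% more people to engage so we tweaked this value" step concrete, here is a minimal sketch of how such a decision could be computed. The function names, the counts, and the `min_lift` threshold are all hypothetical; nothing here is taken from Twitter's systems.

```python
def relative_lift(control_engaged, control_shown, variant_engaged, variant_shown):
    """Relative change in engagement rate of the variant versus the control."""
    control_rate = control_engaged / control_shown
    variant_rate = variant_engaged / variant_shown
    return (variant_rate - control_rate) / control_rate


def pick_weight(current_weight, candidate_weight, lift, min_lift=0.0):
    """Keep the candidate ranking weight only if the experiment showed a lift.

    min_lift is a hypothetical threshold; a real experiment would also
    check statistical significance, which this sketch leaves out.
    """
    return candidate_weight if lift > min_lift else current_weight


# Hypothetical experiment: 10,000 impressions per arm.
lift = relative_lift(1000, 10000, 1050, 10000)
new_weight = pick_weight(current_weight=1.0, candidate_weight=1.2, lift=lift)
```

The point of the sketch is that the resulting weight has no first-principles justification in the code itself; the only "why" is the experiment log that produced it.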
> , it's going to be "We showed 7 billion people tweet X and 7 billion people tweet Y and tweet X caused 5% more people to engage so we tweaked this value".
There is an entire new subfield of ML that is tackling this problem. There are now conferences dedicated to this topic. It is not an easy problem, but it is not impossible.
There are hundreds of researchers working on fairness, interpretability, trust and explainability in ML and a lot of them are working on models much much bigger than what Twitter's feed might involve.
This is a good starting point:
> And sure, you can say "Well that's a bad way of designing the algorithm" but then what you're really saying is that you don't want to open source the algorithm at all, you want to re-write the algorithm to satisfy your sense of how the world should work with no evidence it'll actually work.
You can still open source multiple steaming piles of shit and then let the community improve that so that it is more widely understandable and trusted. See [1] again.
No, it's much worse. The general public thinks it's a one line mathematical formula. The secret sauce.
Agreed. I see this more as a simplification of how a network of interdependent systems operates. Just because it's not simple doesn't mean there isn't a fundamental structure the farther and farther you zoom out.
> it doesn't boil down to 50 lines of code
No one said that. You created a straw man and are arguing with it.
This comment says more about you than you think.
Go on then, what do you estimate the complexity of the algorithm to be? Because I'm going to ballpark it and say that to fully understand the algorithm behind Twitter's feed, you probably need to open source Twitter's entire production code base and its research framework.
> source twitter's entire production code base and its research framework.
So why is that impossible? Mastodon exists. Is Twitter engineering so horrible that they are orders of magnitude worse than Mastodon's engineer quality?
Understood and several of us have dealt with large complex systems.
Open sourcing the algorithm or code is not about everyone going and analyzing it. Instead, when controversy or issues arise, it'll be readily available for independent experts to review.
My point isn't that it's impossible to analyze. It _is_ impossible to analyze without the context of all of Twitter's other services though. Twitter are not going to open source all of their services.
I agree. I doubt the Twitter moderation system is more complicated than the Linux kernel or any of the open source compilers. Sufficiently motivated individuals contribute, and find and fix bugs all the time.
Even just releasing the audit trail of shadowbans, upranking, downranking would be meaningful. Full transparency of actions taken in the past and going forward.
This will never happen, not even under Elon. It makes them wide open to litigation.
> It makes them wide open to litigation
How? What specifically opens them up to litigation? It is a private company. It can do anything it wants as long as it is within the bounds of law. They have every right to ban anyone. Even on flimsy grounds. The idea here is to expose all the moral wrongs and bring it to the fore. Not that they are legally in the wrong (most cases they aren't). An audit is a good place to start.
Not every country follows section 230, actually only one does.
I don't think transparency of internal Company controls/audits has anything to do with Section 230.
Section 230, in brief, only provides immunity for social media providers from being responsible for the content that is posted by users on their platform. It has nothing to do with internal company policies. Rather, Section 230 actually enables internet companies to moderate content through the Good Samaritan protection.
So, it is actually worse that Twitter was moderating content in the majority of countries (every one except the USA) where Section 230 wasn't even recognized. Revealing what exactly happened behind the scenes would not be a sufficient reason for litigation.
When Twitter was pulled up in India for its opaque moderation policies, it tried to quote American laws for its defense. In fact, when Jan 6th happened in USA, Twitter was quick to ban accounts of those who took part in the protests. 70,000 accounts were purged from Twitter. However, Twitter refused to ban accounts of those Khalistani terrorists who vandalized the Red Fort on Jan 26th in India and only partially complied with Government of India's orders. This double standard was visible to majority of Indians. So it is not like Twitter actually follows the laws set by the Country it is operating in either. There are multiple instances where Twitter has refused to follow directions by Government of India or by the Courts in India. You can read more about it here: https://www.hindustantimes.com/india-news/have-to-follow-ind...
So yes, moderation policies cannot be opaque. It has to be transparent. The reason for transparency is so that we know exactly why and for what reason was an account banned/shadow banned. If the Government of India sends a legal request for take down of accounts, it has to be complied with. Twitter cannot decide to invoke US laws in India.
Also, read this to understand more about this issue: https://techcrunch.com/2021/08/10/twitter-now-in-compliance-...
I guess I kind of think about these things not as algorithms, but as a collection of frontends and backends. Collectively, any human request (typing [ ramen shops near me ] as an example) will be handled by a bunch of different code, typically that code is structured as RPCs.
We can think of the main interaction as being a query which is an RPC payload. The contents contain the user request and a wide amount of other context (either referenced by a collection of keys like cookies, or materialized like fields that specify the user's age) and the response is a web page which contains sections (the web search response to the query, as well as the ads; either these could be rendered to two different frames, or interspersed, by the result presentation engine).
That query -> frontend translates into a tree or a graph of requests which collect up various bits of contextual data required to satisfy the query. For example, the query terms might be rewritten slightly and then sent to a web search backend which searches/ranks documents and returns the top matching documents on the organic web, or sent to an ads backend that returns the top matching bidders for those query terms. Again, just RPC/response, although the actual context that the frontend and backend systems are dealing with, and use to modify the result, is truly enormous.
Each of those backend systems itself was produced with an enormous amount of data processing and contextual data that is available at serving time. All of this is implemented using various algorithms; everything from the TCP algorithms that manage bandwidth to the neural networks doing inference on the joint product of the user context and the query context and the ad context, and the logging system that writes the queries and their clicks to centralized storage for more ML training.
In theory though you could set up a system that compiled the full web stack, and ran the end to end of a user query, dumping all the intermediate RPCs, etc, from a modestly sized instantiation of the production system, and people could sit down and inspect what terms affected query result order, or which pages were omitted at which part of the filtering, or what data was logged.
It would be hell for a team to maintain and keep up to date wrt the production system, but many folks do this any way to have a simple version of the system around so they can make quick changes and see if it breaks part of the complex system without doing a full deployment.
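As a rough illustration of the "dump all the intermediate RPCs" idea above, here is a toy sketch of a frontend that fans a query out to a rewrite step, a web search backend, and an ads backend, while recording every call in a trace. All names, the in-memory "backends", and the example data are hypothetical stand-ins, not any real search or Twitter service.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Records every intermediate call so the request path can be inspected."""
    calls: list = field(default_factory=list)

    def rpc(self, backend, handler, payload):
        # In a real system this would be a network call; here it is a function call.
        response = handler(payload)
        self.calls.append({"backend": backend, "request": payload, "response": response})
        return response


def rewrite_query(query):
    # Hypothetical rewrite step: normalize case and whitespace.
    return " ".join(query.lower().split())


def web_search_backend(payload):
    # Toy index standing in for document search and ranking.
    index = {"ramen shops near me": ["ramen-ya.example", "noodle-bar.example"]}
    return index.get(payload["terms"], [])


def ads_backend(payload):
    # Toy bid table: (advertiser, bid) pairs for the query terms.
    bids = {"ramen shops near me": [("instant-noodles.example", 0.40)]}
    return bids.get(payload["terms"], [])


def frontend(query, context, trace):
    terms = trace.rpc("rewrite", rewrite_query, query)
    payload = {"terms": terms, "context": context}
    organic = trace.rpc("web_search", web_search_backend, payload)
    ads = trace.rpc("ads", ads_backend, payload)
    return {"organic": organic, "ads": [advertiser for advertiser, _ in ads]}


trace = Trace()
page = frontend("Ramen  Shops near me", {"age": 30}, trace)
for call in trace.calls:
    print(call["backend"], "->", call["response"])
```

Someone auditing the system would read `trace.calls` rather than the code: it shows which terms reached which backend and what each one returned before the page was assembled.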
Even if publishing the underlying algorithm isn't useful, they should be able to do research on what the end result of endless A/B testing has created in practice. I'm sure you could find the highest level heuristics and that would be extremely valuable to both the public and decision makers internally.
You won’t know what they were A/B testing though.
I get what you're saying. I'd imagine this repo will be closer to pseudo code than "complete" code for all systems that a tweet flows through. For example, the system that flags/removes pictures of hotdogs would likely be represented as "if image contains hotdog: weight: -500"
That could be a starting point for a DSL that realizes the algorithm.
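A minimal sketch of what such a rule-based description might look like, assuming a tweet is a plain dict and each rule is a (description, predicate, weight) triple. The rule names, fields, and every weight other than the -500 from the comment above are invented for illustration.

```python
# Each rule: (human-readable description, predicate over a tweet dict, weight).
# All rules and field names here are hypothetical.
RULES = [
    ("image contains hotdog", lambda t: "hotdog" in t.get("image_labels", []), -500),
    ("author is followed by viewer", lambda t: t.get("followed", False), 100),
]


def score(tweet, rules=RULES):
    """Return the total weight and the list of (rule, weight) pairs that fired."""
    fired = [(name, weight) for name, predicate, weight in rules if predicate(tweet)]
    return sum(weight for _, weight in fired), fired


total, fired = score({"image_labels": ["hotdog"], "followed": True})
print(total, fired)
```

Returning the rules that fired alongside the score is the part that matters for transparency: it gives a per-tweet explanation rather than just a number.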
As you probably know, the algorithm is NOT the code, never has been.
It can be pseudo-code or diagram or whatever that can be used to understand what logic lies behind decision making.
Just because it has evolved to be a complex monster that is unaccountable doesn't mean it has to be that way in the future.
There are ways to translate trained ML models and associated systems into understandable hierarchical rules.
Twitter's timeline is NOT AGI.
it’s almost like this thread is nothing but people being pedantic and saying, there isn’t just one algorithm, it’s multiple algorithms. yes do you really think elon doesn’t know that? it’s almost like he’s just trying to get the point across in the most simple and basic way possible. pointing out that the recommendation algorithm isn’t just one algorithm isn’t profound at all. this entire conversation is mostly just people who want to argue and point out that they know something about how large tech companies work. congrats, why don’t you tell elon since you’re obviously so much smarter
For someone who worked on recommendation systems, you really don't seem to understand the concept of an "algorithm" across the abstraction of multiple systems and at different layers of the stack you worked on.
In fact, a lot of people here really think what people are talking about is the equivalent of what is handled in a subroutine.
No, what people are talking about when they talk about "the algorithm" is anything affecting the result set they're reading. Concepts like eventual consistency and edge computing are... well... a part of a model which laypeople, and even reasonably technical people call an "algorithm."
Being pedantic about whether or not this happens in an SQL query, or across multiple codebases, or by region, doesn't escape the question.
Absolutely agree with what you say, but I wanted to pick a nit here:
> Being pedantic about whether or not this happens in an SQL query, or across multiple codebases, or by region, doesn't escape the question.
Actually, epistemic ~"muddying of the waters" is a well proven technique to control perceptions and public discourse. If it works on HN folks, I expect it would work much more easily on amateurs.
Did you just tell me to go write a map/reduce function in Erlang?
How did you work with a system that couldn't be understood even with access to full source and documentation? Surely your engineering process wasn't purely stochastic.
Let alone those 10s of GB of embeddings which might be user identifiable, without those, how do they claim to 'open source' the algorithm?
It would be gamed 100%.
theres gonna be a few crazy guys who devote their entire time to understanding some obscure part of this algorithm
I'm looking forward to seeing what's going to be on it.
I don’t understand people like you. Just because Elon Musk used the term “the algorithm” in conversation, it must be interpreted literally? And someone like Elon Musk doesn’t know the complexity and needs you to point it out?
It's so weird. We get it, data scientists, your implementations are a garbage mess of last minute paper submission levels and you yourself are blind to half the things happening. That doesn't mean an implicit algorithm doesn't still exist. The term is perfectly applicable and it would probably do you a world of good to abstract it to the point where you can legitimately put it into a single repo.
Of course it doesn't make sense. I think it's just a dog whistle to the people who believe google have a guy in a room somewhere turning the "conservative search results" lever down a notch during elections.
Not saying Google is turning down conservative search results, but they absolutely can. Using the same cycle of human raters and tweaking weights they used to push down comparison shopping sites.
It's the man in a room part I was emphasising. It's simultaneously massively incorrect about how software actually works, misunderstands social dynamics inside companies, and massively stokes the fire when it comes to conspiracy theories along these lines (in this case at least)
There absolutely are explicit lists of controversial words that get weights on the models for up/down ranking or hiding content or flagging for manual review.
It's also true that these lists are only a small part of what affects your fringe website or twitter accounts.
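For readers who have not seen one of these lists, here is a toy sketch of how per-term weights could feed into down-ranking, hiding, or flagging for manual review. The terms, weights, thresholds, and action names are all placeholders made up for this example, not taken from any real moderation system.

```python
import re

# Hypothetical term weights; negative values push content down.
TERM_WEIGHTS = {"spamword": -2.0, "slurword": -10.0}
HIDE_AT_OR_BELOW = -8.0      # hypothetical threshold for hiding outright
REVIEW_AT_OR_BELOW = -1.0    # hypothetical threshold for manual review


def moderate(text, base_score=0.0):
    """Return (score, action) for a piece of text using the term list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = base_score + sum(TERM_WEIGHTS.get(token, 0.0) for token in tokens)
    if score <= HIDE_AT_OR_BELOW:
        action = "hide"
    elif score <= REVIEW_AT_OR_BELOW:
        action = "flag_for_review"
    else:
        action = "rank_normally"
    return score, action


print(moderate("buy spamword now"))
```

In a real ranking pipeline a score like this would be one input among many, which is consistent with the point above that such lists are only a small part of what affects an account.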
Are you claiming that you have comprehensive objective knowledge of what did and did not happen in privileged Twitter operations across time, or is this more like your opinion on the matter? And if the latter, how is it that some people's opinions "don't make sense" or "are dog whistles", but yours are epistemically flawless (assuming you actually believe them to be true, of course)?
So you're saying Google doesn't have infrastructure to allow a human-specified list of keywords or domain names to "Twiddle" the results returned in various services such as news, search, and YouTube for the purpose of artificial promotion or de-boosting?
"Among the elite, and within Twitter specifically, there is much more inclination to ban the right.
I say this as someone whose political views, if you force them onto the left-right spectrum, probably end up about 80% toward the left. E.g. I've spent millions over the past several elections supporting the Democrats.
It used to be that censorship was something the right did, and free speech was something the left were in favor of. But over the last few decades, banning "problematic" ideas has become a huge component of left culture (http://paulgraham.com/heresy.html).
Plus tech companies in general, and especially Twitter, lean to the left. Imagine walking around Twitter pre-Covid. You'd find plenty of openly far-left employees. How many openly far-right employees would you find? I don't think you'd find any.
The combination of (a) the left's recent focus on banning heretical ideas, (b) the leftward lean of tech companies generally, and (c) the leftward lean of Twitter even among tech companies, means that right-wing speech is much more likely to get banned on Twitter than left.
That's why people on the far right keep starting lame Twitter alternatives. You don't see people on the far left doing that. They don't need to. They have Twitter."
I’m pro censorship. We have and need speech norms.
The left and right are not equal. The left does not rely as much on lies to advocate its positions, and the left is not as oriented around destruction and regression as the right.
We need a more robust understanding of speech than “allowed or not”. Emphasis & volume matters.
I'm someone on the left and I fundamentally disagree with literally every point you made.
Good for you.
I'm a firm believer in the horseshoe theory.
If you listen to the intolerant wing of the right, the parts that would legitimately support overturning elections and arresting political opponents, they sound eerily similar to your sentiment.
The horse shoe theory makes a lot of sense until you actually research the right and left and discover that they are in fact very different.
It’s a thought stopping theory that allows you to remain smug and correct despite being deeply ignorant.
those modern lefts.. just look at russia. censorship is how it all started there.
The fact is that humans are not smart enough to reliably tell fact from fiction. Half of this website would just as soon look the other way at Trump's track record.
Something needs to be done. Free speech maximalism is a threat.
"does not rely as much on lies to advocate its positions"
To be honest, that's hilariously false. The left suppressed Hunter Biden's laptop story for two years, saying it was misinformation, before admitting it was true. The IRS targeted right-wing and Christian non-profit organizations deliberately for auditing. Obama claimed that forcing nuns to buy health insurance covering birth control was necessary, and claimed you could keep your doctor under Obamacare, which even PolitiFact called a Pants-on-Fire Lie. The left claimed that a baker needed to make a personalized cake supporting gay rights, or it would open the waves to discrimination, even though the baker was willing to sell any other cake to them that wasn't customized with that particular label.
I can go on and on.
Please, do go on and on. You can stop when you get to the left attempting to overturn the American presidential election.
Edit: are our neoliberal overlords a bunch of cold hearted shit heads? Yes. Should we elect and promote barely disguised fascists in response? No.
Huh. I remember when the left had Maxine Waters saying to publicly harass voters of the opposite political party, and during the 2016 election when they tried to harass and dox the electors in hope that would prevent Trump from being elected.
Do you remember when the sitting democratic president of the United States claimed to have won the election and attempted to persuade his colleagues to overturn it?
Who the fuck cares about Maxine waters?
They framed Trump for treason and thereby completely undermined him before he even formally took office. Are we going to memoryhole that? That’s the same thing as stealing the election.
One is an indisputable coup attempt and the other is exaggerated.
> it’s not the one you think
Yes it is.
> That's why people on the far right keep starting lame Twitter alternatives. You don't see people on the far left doing that.
They don't need to because they have cool Twitter alternatives. Like Mastodon.
Which is also the Trump alternative, iirc. Just the far left seems to respect the license.
No, it's not THE trump alternative.
It existed before.
Open source, such that anyone can use it
I meant used by the Trump alternative.
That's fair, although part of the reason I call Mastodon the "cool" alternative is because of its federation. AFAIK the right wing forks don't generally have that.
Which side is on a book banning tear again?
Both! The difference is in what they target.
"On the right, banning books means trying to prevent their kids from reading them. But on the left it often means trying to prevent anyone from reading them." https://twitter.com/paulg/status/1515563419386191880
The example he cites is from a tweet made by a guild of writers for the Oxford University Press.
> We, the members of the OUP USA Guild, are calling upon our colleagues and authors to take a stand against the upcoming publication of “Gender-Critical Feminism”. Sign the petition below (OUP employees and authors only).
Yeah, that's not a "ban", and "gender critical feminism" = TERFs. Ie, "people who think trans women are faking it so they can rape women and children in bathrooms or 'cheat' 'the system'"
Remind me which side thinks businesses have a constitutional right to refuse to serve people if it violates their "beliefs"...yet finds it unacceptable that unionized workers are objecting to publishing of research on a subject?
Also, remind me which side is boycotting a major entertainment company...because said company's CEO voiced displeasure...at a bill that bans any language referencing non-straight gender orientations?
That reads like those "UI vs UX" images where the creator just takes 2 random photos and sticks them next to each other—while simultaneously being deeply ignorant of what’s going on in the country right now.
Paul Graham's tweet with zero corroborating evidence other than "believe me I'm paul graham" is not a source.
Mastodon is from a far right?
> I think it's just a dog whistle to the people who believe google have a guy in a room somewhere turning the "conservative search results" lever down a notch during elections.
Even if that's precisely true, is it not good to be creating a more trusted space for everyone? The grievances, regardless of merit, are mostly coming from the right. If you want to create a service that caters to all you're going to have to address their concerns. If he can do that in a way that is fair to all, it sounds like a win to me.
Perception is reality and right now the perception is a clear bias toward the left on Twitter. Anything that can be done to combat this fairly is a good thing.
Everyone seems to think that, except for the left on Twitter, who believe there's a clear bias towards the right.
My whole point is that the task is sufficiently impossible (either technically or in terms of satisfying people) that aiming to do it feels like pure bluster. It will never be fair - you can interpret that sentence however you see fit.
> It will never be fair
But couldn't it at least be better?
And that hasn’t happened? Google got fined by the EU for manipulating search results. It definitely happens.
https://amp.usatoday.com/amp/1248099002
https://www.vox.com/2017/6/27/15878980/europe-fine-google-an...
That EU fine is a totally different issue, about defining the barrier between Google Search and other Google products.
A comparison I did from four months ago across google, ddg, yandex, and search.brave, searching for "are conservatives being silenced by big tech"
https://i.imgur.com/MVlshAT.png
You don't have to be conservative to see there's a pretty significant bias, just in the headlines. I'm a Pacific Green and I can still see it.
I think HN, as a largely technical audience, should see the clear problem with that query. Of course search engines return certain results when you use terms like "big tech". It's a term that is largely used by conservative groups in criticism of the tech industry and for little else, and so the bias there is hardly surprising.
Like, say I drum up a conspiracy about how the government is putting Glupkleins in the water. Since it's not a real word, the only results that'll show up will be the conspiracy nonsense that I myself am peddling. People who "do their own research" on Glupkleins by punching it into a search engine will come away with the impression that the entire community is unified on whatever stance I want, purely because nobody else knows or cares enough about it to write their own articles debunking it.
This is the same thing here, just at a murkier scale. Nobody uses the term big tech except in this context, and so using that term to find something which disproves the context around the term is a losing battle
Top result from google for me is: "Big Tech’s Conservative Censorship Inescapable and Irrefutable" from the Heritage foundation.
that's kind of a silly meta search query, not checking for bias in coverage of climate or race or religion or whatever.
Oh the racial bias is much easier to find, just start typing for "are white people". The suggestion list is perfectly happy with "are white p" and "are white pe", but literally vanishes as you type "peo". It does not do the same thing as you type "are black peo" (...ple more athletic, "what is a blue black person", "is wearing black bad", "how does wearing black affect you"). It's like they don't think white people are people, or don't want you talking about them as a group.
If it were objective, you'd think it would try and suggest something for all queries just to be maximally useful. But it doesn't. I mean surely people have asked questions that start with "are white people" why not show the most common one, or the most controversial, or the "highest quality" whatever that is. But no, absolutely nothing.
Someone clearly has their thumb on the scale.
>religion
Nah, this one is super obvious in the lack of an Easter doodle, but every other religion gets a shoutout on their important days. Just like the lack of one for International Men's day. Google's bias is as predictable as every other (D-CA).
Was it the algorithm or was it a "guy in a room" which/who decided to block New York Post's article on Hunter Biden's laptop scandal?
To those that are probably triggered by this and likely even uninformed, the "left" view has been that the laptop story was fake, but it turns out it was real.
Yep! But irrespective of whether the story was real/fake, the GP comment said that people "believe" that `Google have a guy in a room somewhere turning the "conservative search results" lever down a notch during elections`.
Now if you truly want to address this concern of the people, the question obviously would be if it was the algorithm that blocked the article or was it "a guy in a room"? I don't see what is the difficulty in admitting that it was "a guy in a room" because our values/ideology does not align with ideology of New York Post.
When you start to say that it was the algorithm that did it, and that there was zero human interference, without any proof to back up those assertions, then it is much the same as saying that it was a "popup" that triggered when one is caught watching porn. "I did not do it, it was the machine."
Then it becomes even more important to open source the algorithm so everyone can see what is happening internally. It at least puts some doubts to rest. It is better than saying "believe me it was the algorithm not me".
Oh, was this finally confirmed? I stopped following that story a long time ago.
What was “confirmed” was that many emails placed onto the hard drive were, in fact, real emails. (The content of those emails is fairly innocuous, neither illegal nor even particularly corrupt. But a year of continuous drumbeat spun GOP partisans into a frothy outrage about it.)
Those who investigated thoroughly found that the chain of custody of the hard drive was incredibly sloppy and the content of the hard drive was changed multiple times, making it impossible to confirm where the hard drive came from or who put which content onto it.
The speculation is that someone obtained Hunter Biden’s emails in some unrelated way (e.g. from other hacks) and then placed them onto the hard drive in question as a way to obscure the source/method.
Here’s the clearest story explaining the details: https://www.washingtonpost.com/technology/2022/03/30/hunter-...
Washington Post was one of the news outlets that denied Hunter Biden's laptop story and cast shadow on its authenticity until it had to do a major U-turn one year later. I wouldn't touch that publication with a 6-foot pole when it comes to unearthing Truth about Hunter Biden laptop case at the very least.
The Washington Post (among others) were rightly skeptical of a tabloid fluff story with no details, corroboration, or expert analysis about a mysterious harddrive with an implausible backstory somehow in the custody of Steve Bannon and Rudy Giuliani, dropping immediately before a presidential election.
Their skepticism turned out to be well founded. There’s really no story there. (This was similar to the non-story of Hillary Clinton’s email server, Russian-hacked DNC emails, etc. of 2016 which turned out to be completely anodyne and routine, but became the intense focus of months of news coverage and whipped partisans into a frothy frenzy based on wild lies/speculation.)
> There’s really no story there
That's your opinion and you are entitled to it. But it doesn't turn away from the fact that the story was suppressed. Real or not.
> This was similar to the non-story of Hillary Clinton’s email server, Russian-hacked DNC emails, etc. of 2016 which turned out to be completely anodyne and routine
At least it got proper coverage. There was no suppression of either the email gate or the Russia gate. Whether it was fake, true, hoax doesn't matter. It got the coverage it was due.
> were rightly skeptical of a tabloid fluff story with no details, corroboration, or expert analysis about a mysterious harddrive
Joe Biden said it was based on a "bunch of garbage". That it was "Russian disinformation". All news media outlets, including Washington Post, carried that forward in all their news headlines and editorial posts (except right wing news media outlets of course). What were the details, corroboration or expert analysis that they went through before labelling the Laptop as "Russian disinformation"?
> Their skepticism turned out to be well founded
I missed the part of their skepticism where they called it "Russian disinformation" and it turned out to be true. Can you highlight to me where it was proved that the Laptop was part of a smear campaign against Joe Biden initiated through "Russian disinformation"?
What Washington Post has finally done is admitted that the Laptop and its contents are real. Now when it comes to the meat of the matter, the actual contents of the Laptop, it hasn't gone through sufficient scrutiny yet. It also hasn't taken into account the whistleblower's account of what dealings the Biden family had with the Chinese and Ukrainians. On who the "Big Guy" was who received the payments; which is mentioned in a March 2017 email conversation between James Gilliar (Hunter's associate) and a Chinese energy firm where he says: "10 held by H for the big guy?". Who is the "Big Guy" here? Who is "H" here? I am assuming H is Hunter and "Big Guy" is Joe Biden. It is still an assumption and can only be proved in some Court of law in USA. But if you aren't even going to give this evidence a chance to see the light of the day you'll never be able to get to the bottom of the Truth.
Now I am not saying any of it is 100% true as it is still under scrutiny by a Grand Jury (though a lot of it is coming out to be true now). I am just saying that the media actively suppressed this information from the general public out of "fears" of it being "Russian disinformation" which still hasn't been proved. Or maybe, just maybe, they did not want their favorite candidate to lose.
The claim that it was Russian disinformation (Russian-hacked emails placed on a random hard drive along with gigabytes of other tampered-with content, then passed along to friendly partisans who had literally distributed Russian-intelligence produced disinfo multiple times in the recent past) is entirely plausible, indeed more plausible than the story about the abandoned laptop.
There is unfortunately not enough verifiable information about the sources and chain of custody of the hard drive to distinguish between these two alternatives.
Various media sources (and the American public in general) were burned multiple times by credulously repeating Bannon- and Giuliani-pitched (completely made up) bullshit, some of which actually was later demonstrated to be Russian government produced disinformation. Even if the hard drive turned out to be what they said (again, this is still dubious), it would be a good example of a “boy who cried wolf” situation.
Yep it was!
"it" is carrying a lot of weight.
What do you mean? The laptop has been proven to be hunter's, and most sources agree that the contents are real (especially the emails, which have been proven to be true). So what gives?
I have seen the videos of Hunter nude and engaging in sexual intercourse. You are telling me that what I saw is fake? The emails are legit as well. What exactly is "not true" except for your assertion?
Even the laptop has been proven to belong to him.
The onus is on you to prove it isn't Hunters laptop.
Twitter has a clear policy about posting and linking to doxxing / stolen private files / revenge porn. They never blocked based on commentary about the laptop or the Burisma allegations, and only blocked the NY Post account until they stopped trying to post the doxxing / revenge porn content.
> Twitter has a clear policy about posting and linking to doxxing
Excellent. So when is Twitter going to ban Taylor Lorenz for doxxing an anonymous Twitter user LibsofTikTok? This is her tweet right here: https://twitter.com/TaylorLorenz/status/1516399663305297920
It has been up since April 19th, 2022. Twitter immediately blocked New York Post's tweet.
I wonder why the algorithm is so slow when it comes to Taylor Lorenz? Maybe we'll find out when the code is open sourced. Then we can all legitimately blame "the algorithm" for causing all this divide between the left and right wing in a neutral, faultless, perfect digital town square called Twitter.
It's not a dog whistle, it's just a simplification because it's easier to say "the algorithm" than describing the system. Moreover, this nomenclature was mainstreamed by left-wing concern about "racist algorithms", especially in law enforcement, so it's not a partisan phrase at all.
That isn't very fair though, because it's a very different concern, isn't it?
The racist algorithms critique isn't that there is a shadowy conspiracy of people pushing buttons behind the curtain, getting the results they want with conscious decisions; it is a concern mainly about datasets, as well as sociological questions about unconscious bias in testing and verifying the correctness of programs, which can cascade into social effects.
It is a completely different thing, and one grounded in actual research.
My point wasn’t that these theories are equal, but that “the algorithm” isn’t a right-wing dog whistle because it’s used in left-wing contexts as well.
> turning the "conservative search results" lever
It may not have been algorithmic, but it definitely happened.
You know this how? Because you were that guy?
A quick Google (ironically) confirms this has happened, beyond the anecdotal evidence of people (including me) who remember it on several occasions.
https://www.mediaite.com/news/ex-google-engineer-says-glitch...
I'm not making any comment on the theory in the article, but it has certainly happened.
Edit: more context in thread here:
Mellosouls posted a link to a toxic source, and the article doesn’t even have anything in it except empty speculation based on nonsense. Brutal. Either the psyops have reached HN or the level of intellectual honesty about these topics is trash. Or both.
I lifted the source indirectly from Google's own results (it was linked to from the Mail online), I have no idea of the general toxicity or otherwise of the website itself.
Update: here's a Twitter thread linked from Google's Tweet denying any bias in the incident. Make of it what you will.
> the concept of publishing or open sourcing "the algorithm" doesn't make sense
Whatever. He paid for it. Private company. Do what it wants.
> it would be very difficult to make sense of the entire system
No. Not buying that. Difficult isn't the same as impossible and, if only to game the system (harder,) people will figure it out. And even if it isn't 100% possible to reproduce the results based on what is released significant insights will still emerge.
Further, there is some ceiling on the complexity. Twitter operates at scale and that means they can't actually burn 52kWh of power for every tweet or store TiBs of metadata for every user to do the analysis or take 30 minutes to publish. Likely it's a pretty efficient system and, therefore, limited in complexity.
Imagine having something like this for Google's and YouTube's algorithms; $100bn+ SEO industry would go bankrupt or at least they would pivot to some sort of advising but there wouldn't be the mayhem that we have today.
Scientific knowledge is written down and freely available but I still don’t understand most of it. I think a public algorithm would increase SEO business if anything because it would get more effective once the bullshit was debunked.
Results would also become an absolute cesspool. Say what you want about how they are now, but if the people gaming it could see the exact rules, it would become completely useless.
Makes me wonder how Twitter employees internally are handling the news. If they are celebrating or commiserating?
Twitter Locks Down Product Changes After Agreeing to Musk Bid - https://finance.yahoo.com/news/twitter-locks-down-product-ch...
Looks like a wise move to keep potential destructive protests at bay.
Yep. I imagine Twitter has an employee base with a very strong activist streak. I'm sure some of them wouldn't mind martyring themselves to make a virtuous statement.
only because the job market is so good.
Substack made it quite clear: https://twitter.com/lulumeservey/status/1511376638487019524?...
Personally, I think this is great. Leave your activism at home and don't bring that to work.
I don't think it's very good to saboteurs.
> Twitter imposed the temporary ban to keep employees who may be miffed about the deal from “going rogue,” according to one of the people.
What could a rogue employee do?
Once upon a time they could perform all sorts of admin-level actions. One of my friends was there back in the day and said you could literally delete someone's account or use the built-in impersonation to see someone else's DMs. Another thing they could do back then was leak PII of specific accounts. I think it was the Saudis who paid someone to get someone's name. A friend of mine was part of the FBI team that caught one of those spies.
Insider risk is hard as heck.
There was a case in 2017 of a Twitter employee who deleted Donald Trump’s account for 11 minutes.
In 2020 hackers who gained access to Twitter admin tools that were - at the time - apparently accessible to thousands of employees were able to use that access to compromise multiple verified (blue check) Twitter accounts including Jeff Bezos, Barack Obama and Bill Gates. They used it to promote a lame Bitcoin scam.
In the aftermath of that things were supposedly tightened up a little but don’t doubt a Twitter employee could do some damage.
Create a repo for sharing the algorithm!
Twitter has a history with this.
- Rogue Twitter Employee Briefly Shuts Down Trump's Account [0]
[0] https://www.nytimes.com/2017/11/02/us/politics/trump-twitter...
look up confidential user information for the saudi government?
Are there precedents for this? This looks like there is not much trust internally. I doubt this was Elon’s first decision
The level of trust is to be expected from a company like this imo. We're in an information war all around us. Spies, saboteurs, etc, are all part of a war, and we just witnessed a coup of leadership.
Well you’re reading about it here so clearly there are employees who aren’t above breaking the rules a bit ;)
Thanks! Good to know.
I was actually wondering whether some people may want to remove traces of what they have been doing.
I wish someday we can see the internal communications that led to the Hunter Biden laptop story ban.
Wonder if their RSU grants have a clause that converts them to cash in the event of a takeover.
And having been in a company that was taken over, it's a mixture of emotions - is my job safe, will this be the same culture I joined for etc. etc.
>"Wonder if their RSU grants have a clause that converts them to cash in the event of a takeover"
This is an interesting question, since RSUs are a big part of total comp, but how are unvested RSUs dealt with when the stock is retired? Are those put on a future cash comp schedule? And if so, at what conversion rate?
The last place I worked had an explicit clause in the RSU agreement that essentially said "if we get bought the price per share for unvested stock will be given to you as cash".
I'm curious was the complete cash out immediate or did it assume the same schedule as the vesting? I'm guessing if the cash out is immediate the company would then run the risk that those people might exit?
I believe it was immediate, but I don't have the grant document anymore to check.
I am sure there is a small set of vocal employees blasting internal communications with end-of-the-world messages, a majority of employees just wanting to keep working and get their salary, and a different small set of employees hoping Elon purges the first set, making the workplace better.
There are probably some secret celebrations.
I am sure those with in-the-money options are happy
Sell RSUs and get out.
I tried to make a pull request already, haha.
error forking repo: HTTP 403: The repository exists, but it contains no Git content. Empty repositories cannot be forked. (https://api.github.com/repos/twitter/the-algorithm/forks)
My thoughts:
- Explicit rules for temporary and permanent bans
- Edit button
- More fun and thoughtful conversations like HN
- Less thought bubble Brooklyn based reporters, less VC and side grind hustle snake oil, maybe more comedians and memes?
Care to explain "thought bubble Brooklyn based reporters"? Did you choose Brooklyn for the alliteration? Or is there something about my home I should know about?
Large amount of blue checkmark journalists live there, simple as that.
But just imagine if those journalists had to live without twitter ? They would have to report things from "the real world" ! I've been there. It's awfully complicated, and sometimes people disagree with you and you have to talk. Insane.
I’m confused, doesn’t this already happen on Twitter? Isn’t one of the popular criticisms of Twitter that if you say the “wrong thing” a mob will come for you?
It was just a joke. I have many friends in Brooklyn. I just really dislike the culture of Twitter addicted journalists who see the world through a myopic lens of whatever is trending on Twitter is important to cover, and there’s quite a lot of them in New York.
I know the algorithm i use, it ends with ORDER BY date DESC.
Nice for some things. Absolute death for discovery in a lot of cases though. Either trim your follows down dramatically, or miss some incredible stuff...
there is nothing to discover anymore. everyone is retweeting the same stories. and the best bet at discovering is via someone you follow
Same here, and it blows my mind that we're in such a minority.
This is what YouTube's Subscriptions tab does and I hate it. Rarely uploading quality creators are hidden in a sea of daily uploads from channels which focus on regular uploads rather than quality.
I think there is a place for a smarter algorithm than "ORDER BY date DESC", but one that is not designed to manipulate users into addiction.
Yeah it's not good for youtube because youtube doesn't have retweets, so people subscribe to everything they like. On twitter you can subscribe to a few people and they'll still retweet most of the stuff that the algorithm would give u
Is that actually possible on Twitter? (I have an account but don't use it.) I do believe them that when a user follows several 100+ other users, it would probably be impossible to read every tweet chronologically.
i always use chronological, atm following very few people, but in the past following too many to keep up w/.
even when following too many to read everything, i preferred chrono because it would yield a coherent slice of what was happening. an unbiased sample.
twitter is basically a medium for conversation.
imagine there's a large party. would you rather listen to an out-of-order "most important" set parts of the conversation, or just a slice of conversation from a particular time?
well, actually, both can be interesting, but generally the slice is more coherent. :-)
Assuming we're talking about the timeline "latest" mode (rather than "top"), IIRC in the past they used to do the home timeline processing on the read path, but they have long since opted to optimize for read latency, so the heavy lifting is performed during insertion. In broad strokes, when someone posts a tweet, the three main things that happen are:
1. Insertion of tweet to tweets table.
2. Insertion of that tweet-id to the home timelines of all that user's followers.
3. Insertion of that tweet-id to the user-timeline of that user.
On the read path, if I'm not mistaken, the only join that happens is between the requested timeline and the tweets table (which is replicated across a cluster of machines but not partitioned, or at least I remember reading that was the case not many years ago).
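For anyone who wants to see the shape of that write path, here is a minimal in-memory sketch of fan-out on write. It is only an illustration of the three steps listed above, not Twitter's actual code; the class, method and field names (`TimelineStore`, `home_timelines`, `read_home` and so on) are made up for the example.

```python
from collections import defaultdict
from itertools import count


class TimelineStore:
    """Toy in-memory model of fan-out on write (all names are hypothetical)."""

    def __init__(self):
        self.tweets = {}                         # tweet_id -> (author, text)
        self.followers = defaultdict(set)        # author -> set of followers
        self.home_timelines = defaultdict(list)  # user -> tweet ids, oldest first
        self.user_timelines = defaultdict(list)  # author -> own tweet ids
        self._next_id = count(1)

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def post(self, author, text):
        tweet_id = next(self._next_id)
        # 1. Insert the tweet into the tweets table.
        self.tweets[tweet_id] = (author, text)
        # 2. Insert the tweet id into the home timeline of every follower.
        for follower in self.followers[author]:
            self.home_timelines[follower].append(tweet_id)
        # 3. Insert the tweet id into the author's own user timeline.
        self.user_timelines[author].append(tweet_id)
        return tweet_id

    def read_home(self, user, limit=20):
        # Read path: the only "join" is timeline ids against the tweets table.
        ids = self.home_timelines[user][-limit:]
        return [self.tweets[i] for i in reversed(ids)]  # newest first


store = TimelineStore()
store.follow("bob", "alice")
store.post("alice", "first")
store.post("alice", "second")
print(store.read_home("bob"))  # [('alice', 'second'), ('alice', 'first')]
```

The trade-off is visible even in the toy: `post` does work proportional to the number of followers, while `read_home` is a cheap lookup.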
Yes. It is. It's been an option since... forever? Many many years at least. https://i.imgur.com/JM6saa2.png
For about a week they made a change that prevented that chronological timeline from being the default, but they reasonably quickly rolled that back. https://www.theverge.com/2022/3/14/22977782/twitter-default-...
I use lists by subject matter, including a list of personal contacts.
I then use tweet deck which shows a column of tweets per list.
As these are separated by subject and are chronological, it makes it far easier to follow.
Impossible to read, maybe. Impossible to display, definitely not.
Sort by recent is an available setting. Whether you can keep up with the volume is up to how many people you choose to follow.
Assuming Twitter is serious about publishing their feed algorithm [1], it's possible they're merely anticipating the EU's upcoming Digital Services Act which was finalized over the weekend. Among other things, the Act will compel large platforms to "make the working of their recommender algorithms (used for sorting content on the News Feed or suggesting TV shows on Netflix) transparent to users." [2]
Twitter's EU user base is probably [3] above the 45 million threshold that triggers the strictest transparency requirements under the Act. So perhaps they figure if they're going to be forced to disclose anyway, they might as well do it proactively.
[1] If it's even coherent to talk about their feed ranking system as a single algorithm — see the other comments in this thread.
[2] https://www.theverge.com/2022/4/23/23036976/eu-digital-servi...
[3] https://www.statista.com/statistics/242606/number-of-active-...
Seems weird to start as a non-private repo until there's some content. Also bit of an unusual name. Can't tell if this is internal trolling or the future
The Twitter board has unanimously approved Musk's purchase, pending approval by stockholders, and Musk has stated that once he owns the company they will open source the "algorithms" for transparency.
So not a troll, but yes it is odd to put up an empty repo, and announce the repo before there is anything in it.
Seems like a perfect troll/joke, especially given everything.
Trolling your new boss is a sure way to get your ass fired.
EDIT: in case you're not aware, Musk has stated that one of the first things he'll be doing after taking Twitter private is to open source its algorithm.
Firing someone for making an innocent joke doesn’t seem very “free speech absolutist”!
Creating a repository on the official account of your employer is not an innocent joke. Media will pick up on it expecting content and when it doesn’t show up/gets removed, you’ll have a whole bunch of loser journalists writing news stories bashing Elon.
In fact, any action that appears to have been made on behalf of a company is not and will never be considered an innocent joke. I’m sure you’ve heard everyone disclosing something they say or write is “their own opinion and not of their employer’s”. Now please show me that kind of disclosure for this repo and then maybe it can be considered a joke.
If you think doing this kind of thing is OK regardless of context, then you’re in for a world of surprises when shit goes south for you.
They probably have their own internal Git hosting service. Pushing to the public Github repo could be done at a later point, but the Git repo hasn't actually been created yet here, just the project on Github.
Surely you guys don’t think that Twitter’s sorting algorithm is already factored out into its own repo. Of course it’s empty.
That doesn’t mean it’s a joke, I see it as a show of goodwill — that there are a handful of people inside Twitter that are excited for transparency and for a revenue model that isn’t entirely based on ads, that are excited to get to work on this right away.
Does it remind anyone of Po and the Dragon scroll from Kung Fu Panda?
Wait until Musk finds out it's a bunch of gnarly PHP 5.4 code, much of which is a black box everyone is afraid to touch.
The Timeline is actually just a SQL expression with 500 sub-queries
I'm going to guess some engineers at Twitter with Github org permissions are having fun with the "release the algorithm" discussion.
haha that's what I figured, someone decided to quit if Musk acquired Twitter and figured they'd just leave one last practical joke
whatever will show up in this repo, I hope people realize that depending on what data you put into some algorithm you can get whatever output you want, and twitter is never going to (and neither can or should they) publish everyone's personal information and interaction on the site.
So I'm not sure what the ultimate point of this exercise is other than producing faux-transparency.
I still think there could be lots of interest, even without the data. In fact some of the most interesting parts would be the algorithm in the broadest sense — how tech interacts with company policy and SOP. For one small example, what aspects of moderation/banning happen automatically without any further human intervention?
I can't believe they missed the chance to make it a rick roll. Such a wasted opportunity.
There are elements of their algo that I think should be openly defined, and perhaps there should be some regulatory branch that reports to Congress that has full access. However, obfuscation is often necessary to countering bad actors.
>perhaps there should be some regulatory branch that reports to Congress
I think only if you offer twitter users the level of first amendment protection they'd expect with a government body. Otherwise reporting to congress would be a bald-faced circumvention of the first amendment. Twitter is a privately held company with no need to report to congress.
On the other hand, wouldn't open sourcing the algorithm help accelerate the identification of possible exploits?
"algorithm" here isn't some fancy, hard to debug code. It is the business logic of weighting tweets and how recommendations are made.
There is great opportunity to abuse this by Twitter, yes. There is also a lot of money to be made. But in defense of some of that being secret, is the fact that any publicly known ruleset (with no hidden exceptions) _will_ be exploited by bad actors. Imagine if search engines told spam sites exactly why their site dropped in page rankings.
The government, at federal, state and local levels, all rely on Twitter to conduct official taxpayer funded work. Taxpayer funded work should not happen on proprietary systems that operate with zero oversight or public transparency.
Elon polled Twitter users about this and the response was overwhelmingly in favor of open source and transparency. Everyone on Twitter got a vote.
If you oppose transparency, as many now are, you lose your credibility. So it’s another one of Elon’s people hacks, and look at all the morons falling for it.
Kind of unrealistic but I hope Twitter now open-sources not only the algorithm but also the Rails monolith itself. Would be kind of interesting to see how everything is done
The rails monolith is long gone.
In that case the same applies for the microservices
Isn't Twitter written in Scala now?
That gives even more reason to open-source it, right? If Twitter isn't using that codebase anymore.
I have literally no idea how a "twitter algorithm" could be published on github. Maybe I've been doing recommender systems wrong.
You can publish data flows and models to github in addition to source code components.
I'm very technical and I think it would still be valuable to have a list of all the things that weight into the timeline view, even without the models or underlying data.
Like, there's no public admission right now of whether "shadow banning" or "ghost banning" is even officially a thing!
Some transparency seems unquestionably more powerful than none, and we can work from there.
There is something vaguely threatening about this.
Perhaps Twitter will be the new Mozilla if it decides to open-source 'everything' then.
Maybe that is where it is going.
I don't get it ¯\_(ツ)_/¯
At the time of posting, Will Norris (the open source lead at twitter, admin of their github account presumably) posted this. It has 44 retweets, 193 likes, 17 quote tweets, on github it has 1.6k stars.
That seems... bizarre to me?
That's more stars than I'll ever have on any of my repos. Maybe I made a mistake not joining a faang
Nope. People are just excited that the Twitter cesspool might finally improve.
I agree that there is no such thing as "the algorithm." It is Twitter in its entirety. And with that I have a wild question. Can Musk make Twitter fully open-source on GitHub?
Can someone explain this to me? All I can see from this link is an empty GitHub repository. Not sure what I'm missing here.
It's just a declaration of intent at this point.
It's 404 today.
Anyone who actually uses Twitter already knows the algorithm:
* Chronological - reverse sort by date
* Home - for all of the followed topics, recommended topics, retweets and tweets in the past day determine the estimated level of engagement, include the highest and reverse sort by date. This is likely to be a fairly basic ML model.
It will be uncontroversial, technically unsophisticated and of no practical use to anyone - users, developers or researchers.
This is not going to be PageRank where some genuine new insight was discovered.
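To make the guess above concrete, here is a small sketch of the two modes as described in this comment. It is the commenter's speculation turned into code, not anything Twitter has published; `predicted_engagement` is a placeholder for whatever a real model would output, and `top_k` and `window` are made-up parameters.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    tweet_id: int
    created_at: int              # seconds since some arbitrary epoch
    predicted_engagement: float  # stand-in for a model's output


def chronological(tweets):
    """'Latest' mode: reverse sort by date."""
    return sorted(tweets, key=lambda t: t.created_at, reverse=True)


def home(tweets, now, top_k=2, window=86_400):
    """Guessed 'Home' mode: keep the top_k most engaging tweets from the
    last `window` seconds, then reverse sort the survivors by date."""
    recent = [t for t in tweets if now - t.created_at <= window]
    best = sorted(recent, key=lambda t: t.predicted_engagement, reverse=True)[:top_k]
    return chronological(best)


tweets = [
    Tweet(1, created_at=100, predicted_engagement=0.9),
    Tweet(2, created_at=200, predicted_engagement=0.1),
    Tweet(3, created_at=300, predicted_engagement=0.5),
]
print([t.tweet_id for t in chronological(tweets)])   # [3, 2, 1]
print([t.tweet_id for t in home(tweets, now=400)])   # [3, 1]
```

Everything contentious is hidden inside `predicted_engagement`, which is exactly the part the replies argue is not simple.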
What are you basing this on? I wouldn't assume response prediction is using a "fairly basic ML model," it can be a lot more than that.
If it's a simple, constrained problem where the number of available features is low then inherently the complexity can never be high.
I've built hundreds of models and run a ML company and I don't believe it's technically possible for this rule not to be the case.
I work in basically this field. Tweets have text and often images and users with relationships with other users posting and liking them, the potential feature space was never going to be low dimensional.
You forgot the gigantic amount of human intervention required to unperson people who tweet against Twitter‘s political interests
What does "unperson" mean?
It’s lazy shorthand for “belatedly apply their policies against hate speech to prominent politicians”.
Like PewDiePie, Azealia Banks, Babylon Bee, and Carpe Donktum?
By the way, in the USA there is no legal support for the term hate speech. It is covered under the umbrella of the first amendment.
No, pewdiepie was banned for five minutes for saying he was going to join isis. He’s still on the site but doesn’t tweet.
https://mobile.twitter.com/pewdiepie
Hate speech doesn’t need legal protection as a term to be against the rules of the site. I can’t remember if that’s the actual term twitter uses, but it’s the spirit of their rules.
We could have a wider debate about whether our public square should be privately owned (I don’t think it should) but that wasn’t your point.
Edit: more details and reason why his tweets are (self)deleted
I brought up the first amendment because Elon has pretty much said that that is how they will decide on what continent gets published.
I never came close to saying they don’t have the right to restrict speech anyway they wish
* content not continent. Apparently I am suffering from literary incontinence
it comes from Orwell's 1984 - someone who has been removed from all records/books/photographs and killed. I don't think the kill part applies here.
So nobody is being shadowbanned or suppressed?
Last I heard, Twitter has internal tooling that allows moderators to shadowban or suppress.
"The algorithm" could mean a lot of things. Whatever it means, it probably spans hundreds or even thousands of services. That doesn't mean it cannot be made open-source.
I imagine they'd probably start with documentation and white-papers that communicate "here's how we intend for it to work".
It's seriously unlikely anyone in Twitter knows how any non-trivial algorithm in the company actually works. To figure THAT out, they could decide to do a company-wide documentation and instrumentation push like they probably would've had to do for GDPR anyway, which is painful and boring and going to take a very long time.
Failing that, they could just say 'the algorithm as it stands is no longer fit for purpose, given part of its core requirement has become that it needs to be transparent and publishable, and presumably legible. We need to make a new one. Publish the core algorithm. We probably won't deploy it in that exact state, it's going to span multi-services and so on, you obviously don't get the data we used to train the models, but we will work backwards from it and here's an open mechanism to measure how true-to-form it actually is'
I could see GPT-3 being added in the empty space.
Why do we want to know the "algorithm"?
Clearly it will contain a straightforward bias against our pet interest, confirming a grand conspiracy and validating our paranoia.
if ($has_blue_checkmark) show_post_to($everyone);
Ugh, php...
I’ve spent the better part of a decade writing open source projects for few to see. An empty repo gets hundreds of stars immediately. It’s all a popularity contest.
Apples are red. The sky is blue. Twitter shadowbans and tinkers with who sees who. I wonder what the old guard will do with the codebase over the next few months.
It's empty.
It's a performance.
Did it get taken down already? I think open-sourcing the algo would materially change the value of the deal.
Twitter's main value is its user base (especially including most journalists, "stars", politicians, companies,...)
I would say the algo itself is worthless, but my estimate would be nowhere near even 1B$.
If open-sourcing the algo allows bots to game the system then it has value to the bot wranglers; if the feed becomes full of garbage that gradually destroys Twitter's value.
Any worth that "the algorithm" has is not in its implementation (probably pretty standard ML prediction systems), but in the training data and weights. In isolation from the data and the rest of the code it is probably pretty useless as anything other than an example of real world ML use.
Taken down by whom? It was posted on the official github account of twitter.
it's probably just a ripoff of pagerank with a separate spam filtering and banning system along with an army of contractors manually fixing it up.
if twitter is a game, sinking $43bn into it is kinda like winning or losing the grand final boss level. (unclear which)
wish elon would get back to facilitating the building of useful things. we still don't have a great clean energy generation story.
Musk's first order of business?
clean house
Musk has repeatedly talked about "open sourcing" twitter's algorithm. Given Musk is (understandably) super impatient, this repo may be his first move. I expect this to start with a bunch of READMEs and other high level docs and evolve into details and eventually code.
Seems like the most reasonable take. This move feels in-character for Musk.
#drama?
Nice, making it much easier to game!
is this performance art?
It was all in your head ;)
Not "the algorithm", but you can check if twitter is silently suppressing your account here: https://taishin-miyamoto.com/ShadowBan/
It's already gone.