02/27/2018 | Episode 17
Fred Sadaghiani is the CTO of Sift Science.
Evan: Welcome to “Trust & Safety in Numbers” presented by Sift Science. I’m your host, Evan Ramzipoor. About six years ago, Fred Sadaghiani joined a scrappy little startup you might have heard of, called “Sift Science,” as the CTO. In Part 1 of my interview with Fred, which you definitely should check out if you haven’t already, we chatted about how Fred developed his innovative approach to building systems. In today’s interview, we’re talking about what’s on the horizon for machine learning technology and how it’s changing the nature of the internet itself. But first, let’s warm up with a quick fraud fact.
Evan: Did you know that on average, victims of romance scams stand to lose a little over $10,000? To learn more, check out our “Fraud in the World of Online Dating” infographic on the Sift Science blog. Now, on to the interview. As many of our audience members probably already know, Sift Science fights online fraud using machine learning technology. But beyond Sift, companies like Facebook and organizations like “The New York Times” have also started using machine learning to improve user experience and safety on the internet. Do you think machine learning is somehow fundamentally changing the way we interact online?
Fred: Absolutely. I think machine learning is transforming many different use cases online. You know, five years ago, when Sift was at its first ever MRC, the Merchant Risk Council, we had a booth that said, “Fight fraud with machine learning.” And nobody else had any messaging that was anything similar to that. And notably, the heads of risk departments from e-commerce companies and retailers came up to us and asked, “Well, what is machine learning?” They hadn’t even heard of it. Fast-forward to today, and there’s no shortage of machine learning surfacing in your media feed, in your newsfeed, whatever it is, or of big companies talking about how they’re applying machine learning to a particular task, improving their user experience or reducing pain, and so on and so forth.
Evan: Fred’s right. In fact, machine learning is being used for everything from diagnosing psychopathologies to translating dense legal language into more digestible text, to protecting endangered species, to predicting hospital wait times. The list goes on indefinitely.
Fred: I think five years from now, we may not be talking about machine learning anymore. And the reason is, it’s just gonna be so ingrained in the fiber of all of the technology and systems that we have. Right now, we’re kind of at this peak of machine learning and AI as top subjects. But in the same way that today, we probably don’t talk about big data because it’s just a given: you’re a company, you’re online, and you’re dealing with many, many different users. Well, you have big data. In five years, we probably won’t be talking about machine learning anymore. And so what does that mean? I think it means a couple of different things.
Evan: For one thing, we can expect the use of machine learning to become more democratized, both in terms of who is using it and who can claim expertise in it. Five years ago, machine learning probably wouldn’t have been as accessible to an organization like, let’s say “The New York Times,” as it is today.
Fred: So for The New York Times, it may be about recommending what article to read. If you’re reading something that is interesting, you may find some other articles interesting. For Facebook and companies like Google, of course, it’s pretty well-known that they’re applying machine learning across their entire platform, across many different use cases. And so I think we’re just gonna be seeing this more and more.
Evan: One of the, kind of, core ideas that motivates everything you do at Sift is that trust is the currency driving how businesses and users interact online. But now that machine learning is powering so many digital services, do you think users will find it easier or more difficult to trust what they consume online?
Fred: There’s a couple of ways you can think about this. In industry today, you know, people talk about AI, and I think that’s really where this concern of, “Can we trust the system? Can we understand what’s happening?” is really coming into perspective. So if you really, kind of, like, separate what machine learning is from what AI is, it may help to better understand where the concerns and fears may come about. So if you think about machine learning as an algorithm that produces a prediction, it says, “Here’s what we think will happen.” And then you think about AI as a system that, kind of, subsumes machine learning, that is, it creates predictions, but then, further, it automatically acts on those, I think that’s where the fear comes. That’s where the uncertainty comes from.
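The distinction Fred draws can be made concrete in a few lines. This is a minimal sketch, not anyone’s real system: the model is a stand-in function and the 0.9 blocking threshold is invented for illustration.

```python
# Illustrative only: a "machine learning" component stops at a prediction;
# an "AI" system, in Fred's sense, also acts on that prediction on its own.

def predict_fraud_probability(order):
    # Stand-in for any trained model's output.
    return 0.92 if order["amount"] > 500 else 0.05

def ml_only(order):
    # Machine learning: surface the prediction, let a human decide.
    return predict_fraud_probability(order)

def ai_system(order, block_threshold=0.9):
    # AI: the same prediction, acted on automatically with no human in the loop.
    p = predict_fraud_probability(order)
    return "blocked" if p >= block_threshold else "allowed"

order = {"amount": 700}
print(ml_only(order))    # 0.92 -- a score for a human to review
print(ai_system(order))  # blocked -- the system decided by itself
```

The code is the same either way; the uncertainty Fred describes comes from the last step, where the prediction is acted on automatically.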
Evan: Fred says, “The thing to keep in mind is that machine learning systems have actually been around for a long time. Decades, in fact.” Now, [inaudible 00:04:52] decision trees, neural networks, these systems have been around for a while. Only the fear of them is new because people are conflating machine learning with artificial intelligence.
Fred: These systems have been around. The question of, “Can you explain what the result of that algorithm is?” is, kind of, the area where I think there’s a little bit of uncertainty and question as to, “Can we apply these algorithms in a way to meaningfully improve the lives of people, the use cases, and improve the businesses’ ability to add value?” It’s not the algorithm that’s the concern. The real areas of concern are, one, AI systems that act autonomously. And then two is that explainability component, “Can you really understand why the algorithm recommended a particular set of actions or a particular set of results?” And I think that’s an area of machine learning that is woefully behind the current state of the art, where we’re seeing new algorithms emerge, especially in the deep learning space. And progress there would help make people more comfortable with what’s happening in this space.
Evan: So what’s on the horizon? Fred thinks that in the next few years, this fear might dissipate as we make advances in explainability. That is, instead of just building algorithms that can do things, we’ll come to understand why they do what they do: the story of how they arrived at a given outcome.
Fred: [If] we had a system that could just surface, “Here’s Fred, and he’s the fraudster,” well, that might be great. But I’d like to understand why. “So I can appreciate that this sophisticated algorithm was able to surface Fred the fraudster. Now, tell me why.” And so we go to great lengths to clarify what features and what signals contributed to a particular score and really understand that humans need to be comfortable with the results. And so that’s a really important part of, kind of, creating and retaining that comfort with what the systems are producing.
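The kind of explanation Fred describes, surfacing which features and signals contributed to a score, is easiest to see with a linear model, where each feature’s contribution is just its weight times its value. The feature names and weights below are invented for illustration; they are not Sift’s actual features or model.

```python
# Illustrative only: a toy linear fraud score that reports not just
# "Fred is the fraudster" but *why* -- each feature's contribution.

def score_with_explanation(features, weights):
    """Return a fraud score plus each feature's contribution to it."""
    contributions = {
        name: weights[name] * value for name, value in features.items()
    }
    total = sum(contributions.values())
    # Sort so a human reviewer sees the strongest signals first.
    ranked = sorted(contributions.items(), key=lambda kv: -abs(kv[1]))
    return total, ranked

# Hypothetical signals; weights chosen as exact binary fractions.
weights = {"mismatched_billing": 0.5, "new_account": 0.25, "vpn_ip": 0.125}
features = {"mismatched_billing": 1.0, "new_account": 1.0, "vpn_ip": 0.0}

score, why = score_with_explanation(features, weights)
print(score)   # 0.75
print(why[0])  # ('mismatched_billing', 0.5) -- the top signal
```

Deep models need heavier machinery to produce the same kind of ranked explanation, which is exactly the gap Fred says the field is still closing.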
Evan: We’ve, kind of, touched on this already. But machine learning is becoming democratized, in the sense that it’s easier for companies to implement machine learning fraud solutions or just to use the technology more broadly. How does Sift stand out in an increasingly crowded market?
Fred: I think Sift stands out in a couple of ways. So again, just to underscore your point, these machine learning algorithms are, essentially, public domain. They’re really well-documented. You can go to tensorflow.org and download a 100-line program that will implement a deep learning model and do object recognition, and show in a picture where there’s a human, or there’s a bicycle or a truck. There is no secret sauce there. Those algorithms are pretty well-understood. And, you know, I’d even wager that the algorithms being used at Facebook and Google, and Amazon, and Microsoft are all the same.
Evan: In other words, no one of them has any, kind of, strategic advantage over the others, at least as far as the technology is concerned. It’s a totally level playing field.
Fred: I think the real advantage Sift has is twofold. One, we have data. And so what we do is we take the union of all the traffic from our customers and create models that span beyond just the local data that any one customer would see. We provide both that local data and group it with global data that is built across a battery of different kinds of models, so that we can give this union of knowledge, this lens into the contrast between good and bad, that any one given customer wouldn’t be able to see. And that really provides a meaningful lift in our ability to provide accurate scores. In some of our cohorts, we’re seeing well over a 20% improvement in accuracy.
And what this means is, if you were to, kind of, train a set of models, models that, you know, you could go get the code easily from any one of these open source repositories, and if you were to leave out the global knowledge that Sift had, imagine your accuracy as x. You know, you’re able to get a model that accurate. If you then mix in the data from the global model, we’re seeing close to a 20% lift.
Evan: That’s a lot to digest. So let me break it down. Pretty much anyone, whether you’re a company or an individual, can implement a machine learning system to solve a particular problem. Let’s say the problem is fraud, since that’s the problem that Sift is trying to solve. But Sift’s advantage is that to fight fraud, Sift draws on a global network of data about fraudsters and potential fraudsters from all of its customers. So just because you’re using the same basic algorithms as Sift doesn’t mean your results will be as accurate. They won’t be.
Fred: So we evaluate and test that pretty methodically to make sure we’re validating this thesis, that this global lens is really the meaningful differentiator behind being able to apply machine learning effectively. So that’s the first. The second is, I think there’s a misapprehension about what machine learning is. It’s not a “set it and forget it” system. You can’t just go and install a machine learning algorithm and assume, “Ah, I have now and forevermore solved all of my problems. And any other use case or problem that comes, well, the machine learning algorithm will maximally solve that problem for me, and I’ll be happy.” Well, unfortunately, the reality isn’t like that.
Evan: Machine learning is needy. It’s like that friend who’s constantly seeking validation and always needs to hear every little thing that’s going on in your life. Machine learning demands constant attention: its parameters must be continually updated and improved for it to keep functioning properly.
Fred: What that means is the infrastructure that powers machine learning is often more important than the algorithms themselves. So the ability to, kind of, like, train and test and evaluate models, that infrastructure and the pipeline for building those models, taking all the data, transforming it, extracting features, and putting them into a form that can be put into a model, and then looking at the evaluations and trying to reason about, “Does the model perform better? Under what circumstances? For what class of the problem? Across what kinds of dimensions?” And then surfacing and shipping those models online to customers. That’s actually the harder problem than implementing and encoding or selecting the algorithm that you want to use.
And so what we’ve done at Sift is, kind of, novel, in the sense that we’ve built this infrastructure that allows us to train, test, evaluate. And, core to our business, this ability to go and do this rapidly and at scale for a problem that is really, really tough already, finding fraud, but is only getting tougher. Because fraudsters are getting more and more motivated, and more and more sophisticated in their approaches.
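The loop Fred describes, transform the data, extract features, train, evaluate, and only then ship, can be sketched end to end in miniature. Every function body and number below is a toy stand-in; a real pipeline runs each stage at scale, with far richer evaluation slices.

```python
# A minimal sketch of the train/evaluate/ship pipeline described above.
# The model, features, and data are invented purely for illustration.

def extract_features(event):
    # Transform a raw event into model-ready features.
    return {"amount": event["amount"],
            "is_new_account": int(event["account_age_days"] < 7)}

def train(featurized, labels):
    # Toy "model": flag anything at or above the smallest fraud amount seen.
    fraud_amounts = [f["amount"] for f, y in zip(featurized, labels) if y]
    threshold = min(fraud_amounts)
    return lambda f: int(f["amount"] >= threshold)

def evaluate(model, featurized, labels):
    # Accuracy against labels; real systems also slice by segment and class.
    hits = sum(model(f) == y for f, y in zip(featurized, labels))
    return hits / len(labels)

def maybe_ship(candidate_accuracy, production_accuracy):
    # Gate deployment: only replace the live model on a measured win.
    return candidate_accuracy > production_accuracy

events = [{"amount": 20, "account_age_days": 400},
          {"amount": 900, "account_age_days": 2},
          {"amount": 30, "account_age_days": 90},
          {"amount": 800, "account_age_days": 1}]
labels = [0, 1, 0, 1]

feats = [extract_features(e) for e in events]
model = train(feats, labels)
acc = evaluate(model, feats, labels)
print(acc)                   # 1.0 on this toy data
print(maybe_ship(acc, 0.9))  # True: the candidate beats production
```

Notice that the modeling step is the smallest piece; most of the code is the surrounding plumbing, which is Fred’s point that the infrastructure often matters more than the algorithm.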
Evan: So, Fred says, “Sift has recognized exactly how needy machine learning is. And the architecture of Sift’s system has been designed specifically to support that.”
Fred: You need to back up that algorithm and that approach with an infrastructure and an investment that allows you to continually improve it.
Evan: Thanks for joining me on “Trust & Safety in Numbers.” Until next time, stay vigilant, fraud fighters.
Learn more about what sets Sift Science’s machine learning apart.