Alex Gamero-Garrido, Ph.D. Candidate in Computer Science at University of California San Diego, now Northeastern Faculty Fellow
Can you tell us about your project and why you approached us?
I was completing my Ph.D. at the University of California San Diego on how countries get their internet traffic from other countries and the implications for privacy, security, and reliability. The reason I approached you is that you have access to one of the most extensive web data platforms available to researchers. Compared to other platforms, it is much more far-reaching in terms of coverage and countries.
Can you go into detail about your project? What were you looking for and why did you need access to web data?
Sure, the high-level idea is that the internet is a very large database, and so it’s composed of many networks. So, if you’re a person in any country, say in Ethiopia, you connect to the internet through your mobile provider or your fixed provider, and for the vast majority of people, that’s where their understanding of the model of the network ends – that’s my device, that’s my network model, that’s it. Now in reality there are tens of thousands of networks on the internet, and a small number of them play a critical role in terms of delivering traffic to many countries. So, identifying which networks are in that critical, privileged position is important in terms of understanding how users get their traffic, who may be able to observe their traffic, and who may even be able to tamper with their traffic. Now at face value that is an interesting problem, and I think it’s very straightforward to see why that’s the case, but there are a lot of technical challenges within.
The biggest challenge is that we see an incomplete network, and this is because networks exchange traffic under business agreements. Just like in any industry, they have some parameters about how much traffic they are going to exchange for how much money, or for free, and so on. Those agreements are usually secret, and so what we have to do as researchers is try to find ways to reveal them – to look at those networks with transparency. The way we do that is by running active measurements and by relying on some companies that cooperate with us and share their information with us.
Bright Data, through The Bright Initiative, comes in because if you have measurement locations or measurement sources in more networks and in more countries, then you are more likely to reveal the correct networks that are delivering traffic. Having more places to measure from significantly expands our coverage and improves our accuracy as a result.
What do you think is the benefit The Bright Initiative provided you with?
To answer the question, I think it’s important to understand the alternatives. The biggest public platform from which you can run internet measurements is run by the European registry, RIPE NCC. That platform covers approximately 5% of the world’s networks. The number for Bright Data is a much larger percentage of the world’s networks, and the ability to essentially explore the public internet from more networks gives us the opportunity to reveal more of these crucial business agreements that are central to the connectivity of the countries that we have studied. So, as I said, I am in the preliminary stages of incorporating it, but I can already tell that the density of information that we can reveal through your platform is much bigger than the density of information that we can reveal from the alternatives that are available to the public. And so, it is really a big jump, qualitatively and quantitatively, in terms of how much visibility we gain.
You mentioned data protection and the data compliance process here – what can you tell us about it?
The global culture of privacy is changing. I think people, the general public, and slowly also regulators, are taking privacy more seriously. It is good to see that the private sector is doing it voluntarily, which is unfortunately not always the case. So, I was happy to see that there were some hurdles for me to clear when joining The Bright Initiative and using Bright Data, which for some other researchers may be slightly annoying, but I was happy that that process was in place, given that the really big goal here is to protect people’s information.
How has your experience been so far?
My experience has actually been great. It has been very straightforward, everybody has been friendly and helpful with advice, and the process was very quick. You do take data protection seriously, which is again a good sign, and I’m just happy to say it has been a very positive experience.