Podcast

Harvard Business School Professors Bill Kerr and Joe Fuller talk to leaders grappling with the forces reshaping the nature of work.

01 Oct 2025
Managing the Future of Work

How web data is fueling the robot revolution

Bright Data's Or Lenchner on the evolving ground rules for harnessing the web's data. Charting the boundaries of fair use in training AI systems and robots. Also, the data gathering and analytics workforce.

Bill Kerr: The internet has always combined elements of the commons, private content we post online, and commercially controlled material. Those pieces have never fit neatly together, and large-scale collection of data to train AI is bringing tensions to the surface. Dynamic data, especially, is speeding the spread of automation—reshaping clerical, service and even creative work, while opening up new opportunities for engineers, compliance experts, and specialists who can put these systems to use. The rules being set now will influence not just who owns the data but the trajectory of AI and work itself.

Welcome to the Managing the Future of Work podcast from Harvard Business School. I’m your host, Bill Kerr. My guest today is Or Lenchner, CEO of Bright Data, whose data collection, analytics, and web tools put the firm at the center of the debate. We’ll talk about how the rapid emergence of generative AI has changed the data collection market. We’ll drill into the policy landscape, and we’ll discuss the ethical issues involved. We’ll also consider what all this means for skills, jobs, and workforce strategy. Or, welcome to the podcast.

Or Lenchner: Thanks for having me. Excited to be here.

Kerr: Or, why don’t you begin with a little bit about your background and what got you into this industry?

Lenchner: My background was always products. I always built products, and it was always somewhere around the internet. Started probably 15 years ago with some websites that I was able to monetize fairly quickly. I sold a bunch, and I just kept building these web-based products until I found this company, Bright Data, around a decade ago and joined as a product manager and became CEO in late 2018.

Kerr: Yeah. Well, I’m curious, just for one thing: we have a lot of students and entrepreneurs that are trying to get into a very fast-moving space due to things like generative AI that we’ll talk about. Just looking back over those 15 years and having seen so many different evolutions of the internet, what are a couple of things you wish you had known at the beginning or you’ve learned along the way?

Lenchner: So obviously everything is changing all the time and evolving and AI is just shifting, pushing everything faster than before, but it’s the same old thing of looking straight into this new revolution—whether it’s the internet 20 years ago or AI today or many things in between—and not telling too many lies to yourself as everything is moving so fast. It’s pretty similar to my product management philosophy in a way. I mean, just look at the numbers and forget about anything else. So just sticking with that doesn’t matter what happens in the world.

Kerr: That seems like pretty timeless advice. Don’t tell yourself too many lies.

Lenchner: It’s easy to start lying to yourself.

Kerr: So let’s talk about Bright Data, which is the venture that has stuck the longest with you here. Tell us a bit about the company and what are some of the offerings, customer use cases, that you most deliver.

Lenchner: Sure. So the most general way to explain what we’re doing is that we’re the largest web data collection company in the world, obviously putting Google aside, but when we’re referring to a B2B business, in the scale of us collecting the data, in the revenues, in the number of employees. And it sounds maybe trivial. I mean there’s a lot of data on the web. Yes, someone collects it. I mean, it’s now more trivial with AI, but actually when we’re talking about scale and speed, it’s becoming incredibly difficult to do. And this is the technology that we build, allowing roughly 20,000 organizations, including the largest LLM labs and e-commerce platforms and banks and cybersecurity firms, to collect this precious asset, which is pretty much everything that is happening in the world. When this podcast is online in a few days, I assume, that’s new fresh data that is being updated into the largest database in the history of humanity, and it is being collected and we are the enablers of that, or one of the enablers.

Kerr: We always like to hear that people are listening when we say stuff on this podcast. But let’s stay with this a little bit longer. So you mentioned speed, so I’m guessing you’re channeling here that there’s a financial services firm that may want to have very quick data on which to be able to make trading decisions or something like that. That’s kind of some of the information that you’re able to provide to those companies. Is that the sense of the speed that’s involved?

Lenchner: Absolutely, but AI just took it to a completely new level. So what you said is absolutely true and we serve a lot of these, for example, hedge funds. And they need to know what’s going on in the markets to place their bids right now and not after they read the quarterly reports. They need to know what’s going on in the world right now, and this is by, for example, reading customer reviews and user sentiment about the product that a publicly traded company is selling, or knowing what Reddit is saying about this company or use case. But actually we thought this is what speed means until AI started. And now just think about you using a chatbot. It’s about getting that knowledgeable, intelligent output with the right fresh data. And it’s very easy to understand when you’re talking about prices of products. You can get the best deep research on a specific product, I don’t know, milk. But if you actually want to buy milk right now and price matters, shipping time matters, you have no idea. And so now we’re serving a lot of these folks and that’s a whole new level of speed and super low latency that we need to support. So you actually, as a user, will get the right price, which was updated just a few seconds ago, and you’ll have it in your favorite chatbot, whatever it is.

Kerr: If you had to say what share of use cases are wanting to understand things from the internet that were there a week ago and obviously before, the share that were there up until a minute ago to a week ago, and then the share that are at this very frontier edge, how does that tend to play out for you right now?

Lenchner: I would say that many of these things, it’s like a classic Pareto, so 80 percent needs to be fresh, and 20 percent can be more outdated. But the freshness... within the freshness part you will see a Pareto again. You can divide it again and I can keep going.
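
The "Pareto within a Pareto" idea can be made concrete with a little arithmetic. The sketch below is only an illustration: it assumes the same 80/20 split applies at every level, which the conversation does not state, and the function name `nested_pareto` is hypothetical.

```python
def nested_pareto(levels, fresh_ratio=0.8):
    """Split demand repeatedly: at each level, (1 - fresh_ratio) of what
    remains tolerates older data and fresh_ratio needs fresher data.

    Returns the share of total demand in each tier, from the least
    freshness-sensitive tier to the most. The constant ratio is an
    assumption made for illustration only.
    """
    shares = []
    remaining = 1.0
    for _ in range(levels):
        shares.append(remaining * (1 - fresh_ratio))  # tier that can be staler
        remaining *= fresh_ratio                      # tier that is split again
    shares.append(remaining)  # the most freshness-sensitive tier
    return shares


if __name__ == "__main__":
    for tier, share in enumerate(nested_pareto(3)):
        print(f"tier {tier}: {share:.1%}")
```

Under that assumption, three rounds of splitting leave roughly half of all demand in the most freshness-sensitive tier.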

Kerr: Yeah. And I assume part of where you’re going with this is that we’re moving to a world where you could have agentic commerce, so I’m not only just using the LLM to learn about the ingredients in a milk brand, but I’m saying, go buy me some milk, and I need to have it and these other few things in the next few hours. It’s got to make all these choices as to what the prices are, how quickly they can get the inventory aligned and delivered, and that’s where you’re going to come in and play a special role.

Lenchner: Absolutely. And just think about the internet today, like 7 billion people able to browse the web. But to your point, just maybe a few months, for sure a few years ahead, we’re talking about hundreds of billions of web agents that will automate us humans. That’s like a completely different scale that we’re talking about of internet because you’re unlimited in the amount of agents that will browse the web. It’s already huge, but it’s just the starting point.

Kerr: Wow. And you’ve talked about, I think, the company Bright Data as being an infrastructure provider. And so, is that in the sense that you’re imagining this organization to be the infrastructure that facilitates this information flow across the web to a business that needs to participate?

Lenchner: For sure and that’s already the case. We’re an infrastructure company, a critical infrastructure company, and until, I would say, a year ago I would call us a data collection company. But now we’re also a web access company, and that’s to your latter point about these agents that need to browse the web always, all the time, 24/7, large scale. They need that infrastructure. It was kind of the natural evolution for us because it’s pretty much the same technology: to get into a website without being blocked, many times over, at large scale, super-fast, then understand the structure of the website and take actions in the website. Just think about understanding the price of a product in an e-commerce platform. You need to search something in the search bar. Then you need to load the search results, to scroll, to do pagination, to click on a result page. Inside a product page, you need to pick and choose the different variations of that product. That’s a lot of actions that need to happen automatically. We’re already there. It is working the same if you want to book an Uber ride or book a restaurant or whatever it is, just another click. So from an infrastructure perspective, we, and actually our whole industry, are kind of already there. It’s the user adoption of these agents that is not there yet, but it’s going to be there in a few months.
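
The sequence of steps described above (search, load results, scroll, paginate, click a result, pick a variation) can be written down as an ordered plan. This is a minimal sketch for illustration only: it does not touch the network, and `Action` and `price_lookup_plan` are hypothetical names, not part of any Bright Data product or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str    # e.g. "type", "load", "scroll", "paginate", "click", "select", "read"
    target: str  # human-readable description of what the action applies to


def price_lookup_plan(query, pages=2, variation="default"):
    """Build the ordered list of browser actions needed to read one price.

    Mirrors the steps named in the conversation; a real agent would
    execute each step against a live page, which this sketch does not do.
    """
    plan = [
        Action("type", f"search bar: {query}"),
        Action("load", "search results"),
    ]
    for page in range(1, pages + 1):
        plan.append(Action("scroll", f"results page {page}"))
        if page < pages:
            plan.append(Action("paginate", f"to results page {page + 1}"))
    plan.append(Action("click", "chosen result"))
    plan.append(Action("select", f"product variation: {variation}"))
    plan.append(Action("read", "price"))
    return plan


if __name__ == "__main__":
    for step, action in enumerate(price_lookup_plan("milk"), start=1):
        print(step, action.kind, "->", action.target)
```

Even this toy plan takes eight actions for a single price, which gives a sense of why running it continuously for many products is mainly a scale and speed problem.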

Kerr: But the pipes have already been laid, is what I’m hearing. The underlying IT web infrastructure is going to be able to keep up with the consumer adoption as that begins to scale up. Okay, let’s go back and continue on with generative AI. So you described the speed aspect. So just as this technology becomes such a potent and critical part of our web environment and the way the web gets used, are there things beyond just the speed that is necessary for you to keep up with that world?

Lenchner: Yeah, it’s always speed and scale. The requirements for speed and scale of our customers. For example, on day one when Covid started, we saw the world shifting from travel to e-commerce. Now everyone saw that a few years into Covid. We saw that on the first day because our customers that are serving billions of internet users started to cancel flights, and we’re serving all of the travel tech industry, and started to buy more online, and we’re serving all of the e-commerce industry, and we had to make sure that speed and scale would increase just to support all of these motions. And so every evolution and revolution in the world that is taking shape impacts speed and scale. It’s always the same thing and it’s always bigger than we planned and thought.

Kerr: So this is a world where there’s a lot of legal and policy issues swirling around. It’s involving publicly available web data, there’s privacy issues and so forth. So maybe, I think, for myself and for listeners, take a step back and tell us about where the policy environment was pre-Covid—like five years ago—and then what are some of the important recent issues that have been arising, especially as we look at generative AI content and how that’s being deployed.

Lenchner: We were always very confident that, because the value of data is so big, there will come a day that it will be regulated. So we decided to be a self-regulated company right from the beginning, which is mostly two things in our industry. First thing is about private data or PII [personally identifiable information], which is data that can identify you as a private person. And that was fairly easy because I think it was 2016, ’17 when we were still smaller and younger, the GDPR regulation, and in California the CCPA, started taking shape. And that was a good thing. You know what you can and cannot do. So that’s one thing and we were always kind of understanding that and making sure we’re aligned, but the more complicated stuff is about what you can and cannot collect from the web. And we were always regulating ourselves on that matter and pretty much defining what’s considered to be public information and what’s not, and fast-forward, we proved that in court.

Kerr: Give us a couple of examples to kind of ground, like places where this was in tension, like whether you can or should take something off of the web that’s otherwise public, nonpersonal data.

Lenchner: So we never wanted to be the judge in the sense of the use case because the same data can mean different things for different entities. The price of a product on a large e-commerce platform can serve as data for competitors to understand how they need to price their products, but it can also serve a hedge fund in analyzing if they should acquire that company or what the trends are in this industry. I have a story I like to mention of a hedge fund from New York that is a Bright Data customer, and they tracked the prices of bleach on e-commerce platforms for many, many quarters. And we couldn’t understand why until we saw that they bought a chemical company one day. But the same data point, which is price of bleach, can be used for competition, which is all good things. So we decided to take a broader decision, which is just defining what’s public and what’s not. That’s easier and we can take responsibility for that as a vendor. So since the early days we defined that everything that does not require a login into a website is public. Just think about you opening an incognito browser. So you’re always logged out, everything that you can see is out there for everyone. If you [as the website operator] want only a group of people to be able to see this information, not everyone, you can put it behind the login and then whoever logged in, you can give them permissions and so on. And long story short, end of 2023, one of our customers, Meta, sued us in California, trying to stop us from collecting publicly available information, by our definition, from their assets, Instagram and Facebook. And we won that case and the judge in California decided or kind of accepted our definition for what’s public information on the web and what’s not. The same happened then with X. They were a customer as well. And we won this case as well. And this industry is now being regulated, slowly, and mostly through court cases that then pass to the legislators.
We’re tracking these court cases. We’re seeing a lot of rulings around how copyright-protected material can or cannot be used for training models. And what we’re seeing is that judges, again, everything is happening in California, are taking the approach, which I think is the logical approach, that you can’t really stop all of this innovation from happening. You shouldn’t, and you can’t. And their rulings are a lot in favor of, yes, you can take this content for training. Very interesting times. I think that we need a couple more years for all of this different litigation to end before we will see proper regulation in place.

Kerr: Tell us a little bit about some of the questions about anti-bot measures and how that is playing out.

Lenchner: In the web scraping world, you can either hire a lot of cheap workforce in some countries and give them all laptops and they can manually collect the prices of products over and over and over again. Not very useful. Or you can automate all of them into a machine that will be a lot more useful and can do everything, as I keep saying, in larger scale, faster speed. Since the early days of the internet, there were also companies that are trying to block these bots and trying to identify or separate between a real human and a bot. That’s, I would say, a cat and mouse game that we’ve been playing for years with these companies. I will always claim, and happy to take this debate, that opening access to public information will always be a positive thing rather than closing access to public information. Especially if you want to innovate, build the technologies or just have transparency for the sake of competition.

Kerr: Certainly, you’re identifying even just access to data or information. What are the prices? There’s going to be increasing and going back to that agentic world where one platform may want to buy the products off of another platform and bundle them up into their services, but they might be blocked by the competitor if the competitor is trying to remove that barrier. So there’s going to be both the actual information flow and access to the visibility, but then the actions themselves are going to be increasingly under some questions there.

Lenchner: It will evolve. Once it goes from being a theory into production, then we’ll see. And even in the standard scraping world, there are use cases that are not cool, and we won’t allow them. For example, creating fake accounts on social media networks. It’s just all virtual, and it’s automating human behavior. But you can’t do that with Bright Data. We won’t allow it because we can’t understand what good value will come out from that to the world. I’m sure that you will find such use cases that are just a lose-lose situation, and then some will support them, some won’t. But these things actually need to start to happen before, I think, in my opinion, anyone tries to regulate that.

Kerr: Yeah, go a little further on that just broader ethical point. We’ve talked about the specifics of policy and the lawsuits, but you have an AI-for-good program. Tell us about the ethical approach of the organization towards these important questions.

Lenchner: So in Covid, we were scratching our heads: “What can we do to help the world in this situation?” From a business perspective, it was a fairly good time because everything moved online and e-commerce was exploding, and we enjoyed that, but everyone was stuck at home. We partnered up with a few scientists and helped them to find data on the web that helped them in their real-world research. And this is when we realized that, okay, we can probably do that also when the world is not in Covid mode and just like standard days, if there’s such a thing in this crazy world. And we started the Bright Initiative, which pretty much gives the Bright Data toolsets, everything, with our expertise completely for free, pro bono, for any do-good use case, including academic research and anything else. A lot of NGOs. I’ll just share one example. So we are serving a bunch of NGOs that are trying to find and fight human trafficking. And surprisingly or maybe not, it all happens on websites that we’re all on. Classified boards, popular classified boards, where you can find human trafficking and sex trafficking. But if you don’t have the ability to collect massive amounts of data and then to analyze them, then you won’t find it. As a police officer with access to the internet, you will never find these things.

Kerr: That’s terrible. And thank you for efforts to try to curb some of those behaviors there. Let’s move and think some about the workforce. And I’d like to begin internally for you and then we’ll go to your customers and clients in a second. But tell us a little bit about your current hiring strategy. What is the workforce that you need to assemble and how is that shifting with generative AI and with the last five years and some of the changes that have been underway?

Lenchner: To start with, I was always kind of obsessed with keeping the company as small as possible. We’re roughly 450 employees right now. I don’t know if it sounds little or a lot, but compared to our revenues, it’s nothing. We’re a fast-growing, very profitable company, and we are obsessed with recruiting only what we must have. And I am still approving every single job opening in the company. And most of my executive team just hates me for that. But it proves itself, because it helps us to stay focused. And as for everyone who is talking about reducing the workforce now because you can replace a lot with AI, I’m not there yet. We’re using a lot. It’s magic, but it’s not really replacing any one of the engineers, and it’s kind of helping people just to do more. But I still don’t see how these tools replace humans, and I’m saying it from a perspective that I’m the largest web data vendor to all of these LLM labs and AI companies that are building these products.

Kerr: You’re in an environment, though, where also the skills of your 450 need to be replenished at a very rapid rate. Do you mostly do that through internal training? Do the quality of the engineers and someone that you’re hiring lead you to just rely on them themselves to kind of continually upskill them? Do you bring new people in? How do you think about the overall skills evolution?

Lenchner: Yeah, that’s a good point. I think that a few months ago I kind of realized that for the first time in my career I failed, in that new joiners to the company are smarter than me in my domain. I always try to recruit smarter people, but I always felt that I know better than them in my domain because I’ve built most of these products in our industry. I just need to make sure I recruit the right people. We’re able to take the junior employees in our team, who, for example, are usually starting in the support team. They’re still students of computer science or just finished their degree and they need a starting point. So starting in the support team, deployment team is a good starting point. So we found a few of them that kind of know AI better than any veteran engineer in the company just because it’s their time to shine and they are now a pivotal part of the company. They are releasing the newest integrations that we are having with the hottest, trendiest AI companies and I have no idea how to do all of these things that they’re doing. And a second thing that we’re seeing is that recruiting for a specific skill set is something that we just need to master right now. So far, until a few months ago, a great engineer who passed all of our exams was just welcome. We don’t care where he is, where he’s working from in the world, and it just works. If he doesn’t know a specific coding language, he will learn it on the fly. That’s okay. But for the AI part, we’re now understanding that there’s a specific skill set that we need to look for, which is kind of new for us.

Kerr: Let’s swivel around and think about your customers. And many people used to, and probably in the world more broadly, still do have people on their team that go out and try to snoop around their competitors’ websites and understand what’s going on. But you bring a lot more, obviously, firepower to that. How does this typically change their workforce strategy? Who’s using the data the most?

Lenchner: So yeah, we’re seeing a pretty clear trend. I’m not sure if it’s driven by an actual attempt at value creation or just by sheer fear of falling behind. I’m talking from the customer perspective, but we’re seeing that usually like the CTO level, it depends on the company size, but more of the visionary level in the company will take a new role of trying to figure out the next direction. There will be a group of people that needs to think about the next few years and how everything is going to change. Some of them have cool ideas that make sense and some of them are just, they got the job and they need to figure something out. But I think it’s something good to have, a good practice and experiment to have.

Kerr: A different way I may also frame the question is, what’s the number one thing that when you hear a potential client talking about it or they express, that you worry, you’re not ready for this, your data sophistication is not where it needs to be. Is there something that is kind of like a red flag for you that says you need to mature yourself a little bit more before this can really be a useful tool?

Lenchner: It’s less about you’re thinking about data the wrong way, because actually the tools that we’ve been building for the last decade fit exactly the same new use cases that everyone is talking about. But what problem does it solve? Who needs this solution?

Kerr: We’re in such a fast-moving environment, with respect to gen AI and the web and e-commerce in that domain. What are some of the things that you’ve—either you’ve earmarked that these are going to happen, get ready for it, or you’re really watching for a signal, something that will indicate that something you could have imagined being 10 years away is no, it’s actually here. We need to start being ready to act upon it now.

Lenchner: Yeah, so I would say the trillion-dollar question, I think bigger than what we call or perceive as AI today, is robotics. And I’m seeing that not as a theoretical vision or attempt to understand what’s going to happen. I’m seeing it because I’m talking to the model builders’ teams. Those who are actually writing the code and need me to support their capacity for future years. All of the AI that we’re using today, eventually we’ll keep using it as is, as a chatbot, as a video model that is doing cool stuff. But eventually it’s all going to be condensed into a brain of a robot that will be in the physical world with us. I’m not sure if it’s going to be available for everyone from a trust perspective, from a cost perspective, in the next year or two. But this is by far the strongest trend I’m seeing into the future. Robots need a lot of data to understand the world.

Kerr: And just to continue forward for Bright Data, how would you play a role in that future? And I’m going to also second that robotics as just, even in the greater Boston area, you tend to see so many more very basic in many kinds, but robots that are in many of the supermarkets and so forth and it’s coming quickly. But what do you think Bright Data is going to do in that environment?

Lenchner: Just think about all these models that are winning gold medals in mathematics Olympiads. Their intelligence is at the top, PhD-level math, but their knowledge hardly exists. And that’s always where we’re playing. It’s a lot of data to train the models and the robots to be as intelligent as possible. But also the real-time data, as I keep saying, large scale, super-fast, in order to have the relevant knowledge at the right time to make decisions. Otherwise, it will be completely useless for us if we really want to live together with these, whatever they will be, these robots or humanoids that will be here.

Kerr: Or, I’ve got a final question for you. Your day job keeps you very close to the frontier, but for yourself, how do you stay the most relevant person you can be toward this future technology? And do you have any other, for people that maybe their day job doesn’t keep them in such bleeding contact with the frontier, advice as to how they should be able to see better around that corner? Be able to look ahead a little bit further?

Lenchner: Yeah, maybe it’s a bit funny, but it’s easy to remember. Click buttons, and what I mean is, when a new tool comes out, click every button you can see. That’s the best way to learn. I’m clicking every button I can see in our platform and in every tool I’m using. I’m breaking a bunch of things while doing so, but this is how I keep learning because I don’t have time to read. That’s basically the reality.

Kerr: Great. Well, Or Lenchner is the CEO of Bright Data. Or, thanks so much for joining us today.

Lenchner: Thanks a lot.

Kerr: We hope you enjoy the Managing the Future of Work podcast. If you haven’t already, please subscribe and rate the show wherever you get your podcasts. You can find out more about the Managing the Future of Work Project at our website hbs.edu/managingthefutureofwork. While you’re there, sign up for our newsletter.
