Partner Article

The History Of Web Scraping And What The Future Holds

By Or Lenchner, CEO, Bright Data

Web scraping has existed for almost as long as the internet itself and plays an important role in today’s digital world. Although the first web scraping API was launched at the turn of the millennium, the conversation around web scraping is as topical as ever following a recent decision by the U.S. Court of Appeals for the Ninth Circuit. In an interim decision following a legal challenge from LinkedIn, the court concluded that the collection of publicly accessible data does not breach the Computer Fraud and Abuse Act (CFAA). So what are the origins of this technology, what role does it play in today’s business landscape, and what will the outcome of the ruling mean for the future?

The Origins Of Web Scraping

The first instance of web crawling goes back to 1993, a significant year for this technology. In June of that year, Matthew Gray developed the World Wide Web Wanderer to measure the size of the internet. Later that year, it was used to generate an index called the “Wandex”, which allowed the first web search engine to be created. Today, we take that for granted, with major search engines providing a wealth of results almost instantly. Remarkably, before JumpStation’s web scraping technology was launched, data collection was carried out manually by an administrator who would collect and format data sets in the hope that they would align with what users were searching for.

Information Is Power In The Digital Age

Nearly thirty years on, the concept of collecting publicly available data is a key foundation for many businesses across a wealth of sectors. That’s because the internet has become the biggest data resource in the world, and business insights no longer come solely from legacy channels, such as reports and manual databases, but also, in near real time, from the web. Public web scraping lets leaders make better-informed decisions that strongly impact their organizational and operational strategies as well as business outcomes.

There are plenty of compelling academic and business use cases that highlight the importance of collecting and analyzing public web data. For example, leading businesses use this technology to gather information on the state of markets, competitor intelligence such as pricing and stock levels, and consumer sentiment. Researchers, academics, investors, and journalists also use public web scraping in their data strategies to gain real-time insights and base their reporting on credible data points. These insights cover public sentiment and wellbeing, organizational team structures, growth prospects, and the competitive landscape for target audience engagement.

Challenges To Web Scraping

Despite the clear, wide-ranging benefits of web scraping, in 2017 LinkedIn attempted to block hiQ Labs, a data analytics company that collects publicly available data from LinkedIn profiles, from accessing its website. hiQ’s technology is used by companies to retain highly desirable employees, as well as to identify knowledge and skill gaps within the organization. LinkedIn’s ban prevented hiQ Labs from operating any of its services, and a legal battle followed in the US. A district court ruled in favor of hiQ, which triggered a string of appeals over the following years, after which the case was sent back to the Ninth Circuit. In April 2022, the Ninth Circuit again upheld the preliminary injunction in hiQ’s favor, meaning LinkedIn could not block hiQ from accessing its website. The court found LinkedIn’s claims that hiQ had breached laws such as the CFAA to be unwarranted, as the data in question is publicly available.

What’s Next?

The Ninth Circuit’s ruling reaffirms the foundation on which the internet, the largest database ever created, was built: democratizing information for everyone. The ruling clearly states that scraping data that is publicly accessible on the internet is not a violation of the CFAA. Although the final outcome of this case is not yet known and there could be more legal challenges to come, the latest ruling by the US courts is a big win for archivists, academics, researchers, journalists, and businesses that rely on the insights web scraping can provide. The future is bright for web scraping: the amount of online data continues to explode, and that data can be turned into insights and harnessed by users around the world.

Disclaimer: The information provided herein does not and is not intended to constitute legal advice. All information, content, and materials available herein are for general informational purposes only. Information herein may not constitute the most up-to-date legal or other information.

This was posted in Bdaily's Members' News section by Bright Data.