October 17, 2023

Navigating Data Ownership in the AI Age, Part 1: Types of Big Data and AI-Derived Data

Blogs, Online and On Point

Author(s) Erin Jane Illman, Sinan Pismisoglu

The emergence of big data, artificial intelligence (AI), and the Internet of Things (IoT) has fundamentally transformed our understanding and utilization of data. While the value of big data is beyond dispute, its management introduces intricate legal questions, particularly concerning data ownership, licensing, and the protection of derived data. This article, the first installment in a two-part series, outlines challenges and opportunities presented by AI-processed and IoT-generated data. The second part, to be published Thursday, October 19, will discuss the complexities of the legal frameworks that govern data ownership.

Defining Big Data and Its Legal Implications

Big data serves as a comprehensive term for large, dynamically evolving collections of electronic data that often exceed the capabilities of traditional data management systems. This data is not merely voluminous but also possesses two key attributes with significant legal ramifications. First, big data is a valuable asset that can be leveraged for a multitude of applications, ranging from decoding consumer preferences to forecasting macroeconomic trends and identifying public health patterns. Second, the richness of big data often means it contains sensitive and confidential information, such as proprietary business intelligence and personally identifiable information (PII). As a result, the management and utilization of big data require stringent legal safeguards to ensure both the security and ethical handling of this information.

Legal Frameworks Governing Data Ownership

Navigating the intricate landscape of data ownership necessitates a multi-dimensional understanding that encompasses legal, ethical, and technological considerations. This complexity is further heightened by diverse intellectual property (IP) laws and trade secret statutes, each of which can confer exclusive rights over specific data sets. Additionally, jurisdictional variations in data protection laws, such as the European Union’s General Data Protection Regulation (GDPR) and, in the United States, the California Consumer Privacy Act (CCPA), introduce another layer of complexity. These laws empower individuals with greater control over their personal data, granting them the right to access, correct, delete, or port their information. However, the concept of “ownership” often varies depending on the jurisdiction and the type of data involved, whether personal or anonymized.

Machine-Generated Data and Ownership

The issue of data ownership extends beyond individual data to include machine-generated data, which introduces its own set of complexities. Whether it’s smart assistants generating data based on human interaction or autonomous vehicles operating independently of human input, ownership often resides with the entity that owns or operates the machine. This is typically defined by terms of service or end-user license agreements (EULAs). Moreover, IP laws, including patents and trade secrets, can also come into play, especially when the data undergoes specialized processing or analysis.

Derived Data and Algorithms

Derived and derivative algorithms refer to computational models or methods that evolve from, adapt, or draw inspiration from pre-existing algorithms. These new algorithms must introduce innovative functionalities, optimizations, or applications to be considered derived or derivative. Under U.S. copyright law, the creator of a derivative work generally holds the copyright for the new elements that did not exist in the original work. However, this does not extend to the foundational algorithm upon which the derivative algorithm is based. The ownership of the original algorithm remains with its initial creator unless explicitly transferred through legal means such as an assignment or exclusive license.

In the field of patent law, derivative algorithms could potentially be patented if they meet the criteria of being new, non-obvious, and useful. However, the patent would only cover the novel aspects of the derivative algorithm, not the foundational algorithm from which it was derived. The original algorithm’s patent holder retains their rights, and any use of the derivative algorithm that employs the original algorithm’s patented aspects would require permission or licensing from the original patent holder.

Derived and derivative algorithms may also be subject to trade secret protection, which safeguards confidential information that provides a competitive advantage to its owner. Unlike patents, trade secrets do not require registration or public disclosure but do necessitate reasonable measures to maintain secrecy. For example, a company may employ non-disclosure agreements, encryption, or physical security measures to protect its proprietary algorithms.

AI-Processed and Derived Data

The advent of AI has ushered in a new era of data analytics, presenting both unique opportunities and challenges in the domain of IP rights. AI’s ability to generate “derived data” or “usage data” has far-reaching implications that intersect with multiple legal frameworks, including copyright, trade secrets, and potentially even patent law. This overlap adds a layer of complexity to the issue of data ownership, underscoring the critical need for explicit contractual clarity in licensing agreements and Data Use Agreements (DUAs).

AI-processed and derived data can manifest in various forms, each with unique characteristics. Extracted data refers to data culled from larger datasets for specific analyses. Restructured data has been reformatted or reorganized to facilitate more straightforward analysis. Augmented data is enriched with additional variables or parameters to provide a more comprehensive view. Inferred data involves the creation of new variables or insights based on the analysis of existing data. Lastly, modeled data has been transformed through machine learning (ML) models to predict future outcomes or trends. Importantly, these data types often contain new information or insights not present in the original dataset, thereby adding multiple layers of value and utility.
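For readers who find a concrete illustration helpful, the five categories above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the thermostat readings, the `device_region` lookup, the 2.0 kWh threshold, and the `fit_line` helper are all hypothetical, and a simple least-squares trend line stands in for whatever ML model a real system would use.

```python
from collections import defaultdict

# Hypothetical source dataset: daily energy readings from two IoT thermostats.
readings = [
    {"device": "A", "day": 1, "kwh": 2.0},
    {"device": "A", "day": 2, "kwh": 3.0},
    {"device": "A", "day": 3, "kwh": 4.0},
    {"device": "B", "day": 1, "kwh": 1.0},
    {"device": "B", "day": 2, "kwh": 1.5},
]
# Hypothetical external lookup used only for the augmentation step.
device_region = {"A": "north", "B": "south"}

# Extracted data: a subset culled from the larger dataset.
extracted = [r for r in readings if r["device"] == "A"]

# Restructured data: the same values, reorganized by device.
restructured = defaultdict(list)
for r in readings:
    restructured[r["device"]].append(r["kwh"])

# Augmented data: each record enriched with an additional variable.
augmented = [{**r, "region": device_region[r["device"]]} for r in readings]

# Inferred data: a new variable that does not appear in the source
# (here, whether a device's average daily usage exceeds 2.0 kWh).
heavy_user = {d: sum(v) / len(v) > 2.0 for d, v in restructured.items()}


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


# Modeled data: a predicted future value (device A's usage on day 4).
slope, intercept = fit_line(
    [r["day"] for r in extracted], [r["kwh"] for r in extracted]
)
predicted_day4 = slope * 4 + intercept
```

Note that `heavy_user` and `predicted_day4` contain information that appears nowhere in `readings`, which is precisely why contracts benefit from stating who holds rights in each category rather than in "the data" as a whole.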

The benefits of using AI-processed and derived data can be encapsulated in three main points. First, AI algorithms can clean, sort, and enrich data, enhancing its quality. Second, the insights generated by AI can add significant value to the original data, rendering it more useful for various applications. Third, AI-processed data can catalyze new research, innovation, and product development avenues.

Conversely, the challenges in data ownership are multifaceted. First, AI-processed and derived data often involves a complex web of multiple stakeholders, including data providers, AI developers, and end users, which can complicate the determination of ownership rights. Second, the rapidly evolving landscape of AI and data science leads to a lack of clear definitions for terms like “derived data,” thereby introducing potential ambiguities in legal agreements. Third, because multiple parties are involved, agreements must define these terms clearly and consistently and must spell out the rights and responsibilities of each stakeholder.