Patenting “Big Data”

Traditionally, “Big Data” referred to solving computationally intensive problems using massive amounts of data. Because of the costs involved, Big Data was initially limited to government agencies or large academic institutions that had access to the most advanced computing resources and sought to solve complex challenges such as predicting weather patterns or sequencing DNA. In the ’80s and ’90s, however, certain consumer industries such as telephone companies and credit card providers learned to mine their massive databases of call records and charge receipts to find “nuggets” of information. Subtle trends and aggregated statistics gave marketing analysts insight into how to price their services or predict when a cardmember was primed to purchase that trip to Hawaii. More recently, the advent of cloud storage and processing services offered by Amazon and Google, coupled with large-scale open source data storage and processing frameworks such as Hadoop, has drastically reduced the costs to capture, store and analyze large amounts of data.

The sheer abundance of available data suggests that the importance of “Big Data” will only continue to grow. For example, at the end of 2011, an estimated 30 billion individual items were being tracked with unique RFID tags, and Twitter and Facebook were together creating 40 TB of log data every day.[1] In addition, the increased willingness of companies and consumers to share data about their daily operations and activities creates an enormous amount of user-generated data. By combining traditional transactional data (e.g., sales, calls, trades) with user-supplied data that often refers to products, places and events, companies increase the dimensionality of their data along with its sheer volume. Companies are no longer simply drilling down into their own data; they are asking questions that cut “across” datasets, looking for trends and opportunities that would have gone unrecognized without the marriage of multiple data sources. Moreover, companies that hold much of this consumer data (posts, tweets, check-ins, networking data, etc.) are realizing its value. The availability of valuable data, and the important trends that its analysis can reveal, have created opportunities for companies to monetize access to the data. All of this has brought Big Data to the forefront of current IT trends.

The increased focus on Big Data has brought with it challenging questions about how companies operating in this realm can protect their proprietary intellectual property and, in particular, what is “patentable.” Conventional database and storage patents typically focused on the hardware (e.g., high-throughput network storage systems) or on the database management software used to implement the transactional and/or analytical processes that stored and accessed the data. But as companies move to standard, off-the-shelf (often open source) platforms and cloud-provisioned hardware and storage, the IP created by this new breed of companies becomes more nebulous, more difficult to identify and even harder to patent. Even after clearing the somewhat esoteric (and often controversial) patent-eligibility requirements for business methods implemented in software, a process must still be novel (no one has practiced it before) and non-obvious (it is not simply a combination of known processes operating as intended).

One approach to identifying potentially patentable subject matter in a “Big Data” environment is to break the process down into three phases: ingestion and cohesion; analysis; and provision. By way of example, consider a large retail chain that collects internal data from its point-of-sale, supply chain and customer database systems, and external data from market research companies, social media sites (e.g., its Facebook page), third-party credit card companies, and its suppliers and vendors. Collecting and organizing this volume of data on a daily basis is certainly a challenge. Whether the data arrives through custom programming interfaces, proprietary data services or other custom ingestion processes, the mere task of getting all of it into one place at one time, in a format that allows data from disparate sources to be used together, can be fertile ground for patents. Processes for normalizing disparate data, filling in “missing” data and formatting data into a common construct for easier storage and/or analysis are just a few possible areas to consider. In the example above, information about the retailer’s customers’ purchasing and payment history may come from data sources that use seemingly incompatible formats and are organized along different dimensions (time, product, source, etc.). Bringing that information together thus requires a system that ingests data structured in various formats and uses various analytics processes to generate normalized, structured metadata that describes the data.[2]
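To make the normalization step concrete, the following is a minimal Java sketch of the general idea, not the method of the patent cited in footnote [2]. It assumes two hypothetical feeds, a point-of-sale export with US-style dates and amounts in cents and a card-issuer export with ISO-8601 dates and dollar amounts, and maps both into one common `Purchase` record, filling in a placeholder when the customer ID is missing. The class name, field names, date formats and the `UNKNOWN` placeholder are all illustrative assumptions.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Map;

public class NormalizeDemo {
    // Common construct shared by every source (hypothetical schema).
    record Purchase(String customerId, LocalDate date, BigDecimal amount, String source) {}

    // Hypothetical placeholder used when a source omits the customer ID.
    static final String UNKNOWN_CUSTOMER = "UNKNOWN";
    private static final DateTimeFormatter US_DATE = DateTimeFormatter.ofPattern("MM/dd/yyyy");

    // Hypothetical point-of-sale feed: "03/15/2012"-style dates, amounts in cents.
    static Purchase fromPointOfSale(Map<String, String> row) {
        return new Purchase(
                row.getOrDefault("cust", UNKNOWN_CUSTOMER),
                LocalDate.parse(row.get("date"), US_DATE),
                new BigDecimal(row.get("cents")).movePointLeft(2), // cents -> dollars
                "pos");
    }

    // Hypothetical card-issuer feed: ISO-8601 dates, amounts already in dollars.
    static Purchase fromCardIssuer(Map<String, String> row) {
        return new Purchase(
                row.getOrDefault("customer_id", UNKNOWN_CUSTOMER),
                LocalDate.parse(row.get("txn_date")),
                new BigDecimal(row.get("amount_usd")),
                "card");
    }

    public static void main(String[] args) {
        Purchase a = fromPointOfSale(Map.of("cust", "C42", "date", "03/15/2012", "cents", "1999"));
        // This row has no customer ID, so the placeholder is filled in.
        Purchase b = fromCardIssuer(Map.of("txn_date", "2012-03-15", "amount_usd", "19.99"));
        System.out.println(a);
        System.out.println(b);
    }
}
```

Once both feeds share one record type, they can be stored together and queried along common dimensions such as date and amount. A production system would, of course, handle far more formats and error cases than this sketch does.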

Once the data is stored and structured in a usable form, the task of making sense of it begins. For example, the retailer may want to understand the correlation between an uptick in activity on its Facebook page (posts, likes, comments, etc.) about particular products and subsequent sales of those products. Conversely, poor ratings or complaints may prompt customer service calls or the discontinuation of a product. While database marketing and analysis methods have been used for many years to uncover trends and predict purchasing habits, the application of these techniques to such disparate data sources, or the processes for finding and extracting the key data in a timely and cost-efficient manner, may be new. The complexity and volume of the data may require novel query techniques, such as mapping meaningful data entities to underlying database elements or breaking complex queries into more manageable, interdependent queries.[3]
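In its simplest form, the retailer’s question can be framed as a lagged correlation: does Facebook activity about a product in one week track that product’s sales in a later week? The Java sketch below computes a Pearson correlation between weekly activity counts and sales shifted by a chosen lag. It illustrates only the kind of analysis involved, not the query-decomposition technique of the patent cited in footnote [3]; the class name, the sample figures and the one-week lag are assumptions.

```java
public class LaggedCorrelation {
    // Pearson correlation between activity in week t and sales in week t + lag.
    // Assumes both arrays are indexed by the same calendar weeks.
    // Returns NaN if either overlapping series is constant.
    static double laggedPearson(double[] activity, double[] sales, int lag) {
        int n = Math.min(activity.length, sales.length - lag); // overlapping weeks
        if (lag < 0 || n < 2) {
            throw new IllegalArgumentException("need a non-negative lag and at least two overlapping weeks");
        }
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += activity[i];
            meanY += sales[i + lag];
        }
        meanX /= n;
        meanY /= n;
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < n; i++) {
            double dx = activity[i] - meanX;
            double dy = sales[i + lag] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        // Hypothetical weekly figures: page activity (posts, likes, comments), then unit sales.
        double[] activity = {120, 340, 310, 90, 400, 150};
        double[] sales = {50, 55, 95, 90, 40, 110};
        System.out.printf("r = %.3f%n", laggedPearson(activity, sales, 1));
    }
}
```

A coefficient near 1 suggests that the two series move together a week apart; it does not by itself show that the online activity caused the sales. At Big Data scale, the hard part is usually assembling the two series from disparate sources, not the arithmetic.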

In addition to collecting data for their own use, many companies now recognize the value the data has to others, whether for market research, benchmarking studies or other analysis. In some instances, providing this data may in fact be a company’s only source of revenue. However, the need to track where the data originated and what restrictions are placed on its use, and to deliver the data in a manner that complies with those restrictions, can lead to innovative, patentable subject matter. This can include techniques for anonymizing or aggregating the data so that personally identifiable data or other proprietary information is not compromised.[4] Systems for graphically representing large volumes of data, providing access to the data or transmitting the data are also fertile ground for patentable subject matter. Patenting how the data is provided to third parties can often prove to be the most valuable means of protecting a process, because these techniques are usually the only “customer facing” aspects of such implementations and are therefore the easiest to identify when copied by others.
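One common, generic way to release aggregated data without exposing individuals is to drop direct identifiers, report only group totals and suppress any group too small to hide a single customer. The Java sketch below does this for sales grouped by ZIP code, with a minimum group size `k`. It is a simplified illustration of that general technique, not the method claimed in the patent cited in footnote [4], and the record fields, sample data and threshold are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class AggregateDemo {
    // Raw transaction containing a customer identifier (hypothetical fields).
    record Sale(String customerId, String zip, double amount) {}

    // Released summary: no customer identifiers, only a count and a total.
    record ZipSummary(String zip, int customers, double total) {}

    // Groups sales by ZIP code and drops any group with fewer than k distinct customers.
    static List<ZipSummary> aggregate(List<Sale> sales, int k) {
        Map<String, Set<String>> customersByZip = new TreeMap<>(); // sorted for stable output
        Map<String, Double> totalByZip = new HashMap<>();
        for (Sale s : sales) {
            customersByZip.computeIfAbsent(s.zip(), z -> new HashSet<>()).add(s.customerId());
            totalByZip.merge(s.zip(), s.amount(), Double::sum);
        }
        List<ZipSummary> released = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : customersByZip.entrySet()) {
            int distinct = e.getValue().size();
            if (distinct >= k) { // small groups are suppressed entirely
                released.add(new ZipSummary(e.getKey(), distinct, totalByZip.get(e.getKey())));
            }
        }
        return released;
    }

    public static void main(String[] args) {
        List<Sale> sales = List.of(
                new Sale("c1", "10001", 5.00),
                new Sale("c2", "10001", 7.00),
                new Sale("c3", "10001", 1.00),
                new Sale("c4", "94105", 100.00)); // lone customer: suppressed when k = 3
        aggregate(sales, 3).forEach(System.out::println);
    }
}
```

Thresholding of this kind is not by itself a guarantee of anonymity, because released totals can sometimes be combined to recover suppressed values. That gap is one reason anonymization and aggregation remain an area for innovation.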

In summary, while many of the technical challenges of storing and processing massive amounts of data have been addressed for years, only recently has the ability to capture, store, process and provide data across so many domains and with such speed been fully realized. As companies collect and build these databases and begin to recognize the value the data has not only to their business but to others, the need to protect such systems becomes even more important.

[2] See, for example, U.S. Patent No. 7,822,768, “System and Method for Automating Data Normalization Using Text Analytics”.

[3] See, for example, U.S. Patent No. 7,949,685, “Modeling and Implementing Complex Data Access Operations Based on Lower Level Traditional Operations”.

[4] See, for example, U.S. Patent No. 7,444,655, “Anonymous Aggregated Data Collection”.

Court Limits Software Copyright Protection in Android Litigation

On May 31, 2012, the U.S. District Court for the Northern District of California held as a matter of law that the “structure, sequence, and organization” of Oracle’s Java Application Programming Interface (“API”) was not copyrightable because it consisted of functional material and names. The court’s decision obviated a jury verdict that Google had infringed Oracle’s copyright and is significant for several reasons:

  • It demonstrates the court’s willingness to rein in software copyright cases in which the plaintiff alleges infringement of non-literal elements, i.e., without actual copying of lines of software code. Such claims are typically brought under the “structure, sequence, and organization” theory of infringement.
  • It explains the evolution of software copyright law in a single decision, including a reconciliation of the court’s findings with purportedly contrary case law.
  • The decision articulates a clear framework for limiting the copyrightability of computer interfaces.
  • Although the court explicitly denied that the “structure, sequence, and organization” doctrine was dead, its case law summary shows that the doctrine has been diminishing for decades. The appeal of this decision could either strengthen or further diminish the doctrine.

The District Court’s Reasoning in Oracle v. Google

The Java API is a set of pre-written programs in the Java programming language for carrying out various commands, such as determining which of two numbers is larger. Anyone is free to use the Java language to write a program, but the API is protected by Oracle’s copyright. Thus, when Google began developing its Android operating system for smartphones, it tried to obtain a license to the API. But negotiations fell through, and no license issued. Instead, Google wrote its own source code to implement the same functions that 37 API packages performed. It gave the programs the same names, arranged them in the same packages, and had them take the same input and output. Due to the manner in which the Java language works, this required that 3% of Google’s implementation of these API packages be identical to the Java API code. The district court found that, despite this literal copying, Google had not infringed Oracle’s copyright.
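For readers unfamiliar with the terminology, the Java sketch below illustrates the distinction the case turns on, using the article’s example of determining which of two numbers is larger. The first line of each method, `public static int max(int a, int b)`, is the method header: the declaration that fixes the method’s name, inputs and output. The code between the braces is the implementation. The two classes are hypothetical stand-ins for independently written implementations, not code from either party. Their headers match, so a call written against one has the same form as a call written against the other, while their bodies express the same function in different ways.

```java
public class HeaderDemo {
    // Hypothetical "original" implementation.
    static final class FirstImpl {
        // Header: the name, parameter types and return type that callers rely on.
        public static int max(int a, int b) {
            // Body: one way to pick the larger number.
            return (a >= b) ? a : b;
        }
    }

    // Hypothetical independent reimplementation with an identical header.
    static final class SecondImpl {
        public static int max(int a, int b) {
            // Body: a different way of expressing the same function.
            if (a < b) {
                return b;
            }
            return a;
        }
    }

    public static void main(String[] args) {
        // The calls have the same form; only the (hypothetical) class name differs.
        System.out.println(FirstImpl.max(3, 7));
        System.out.println(SecondImpl.max(3, 7));
    }
}
```

In the actual case, Google also kept the package and class names the same, so that calls written for the Java API take the same form on Android.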

The court’s decision articulated several key principles of copyright law in the context of software copyright cases:
  • The merger doctrine means that if something can only be expressed one way, the expression of it cannot be subject to copyright protection.
  • Ideas, procedures, processes, systems, methods of operation and concepts cannot be copyrighted.
  • Names and short phrases are not copyrightable.
  • The fact that effort or investment was required to produce something does not make that thing copyrightable.
  • The scenes a faire doctrine means that functional elements essential for interoperability are not copyrightable because such elements are dictated by external factors, e.g., they are provided merely to ensure functional compatibility. 

The court found that the copied portions of the code were primarily functional. The nature of the Java language made it impossible to write a method that generated the same output from the same input without using the same or nearly the same expression. Of the copied portions, only the method and variable names could have been changed without affecting function. But because names are not copyrightable, the use of the names was not an infringement.

Certain Interfaces Are Not Copyrightable

The Oracle decision applies to a specific interface — the Java API. The court based its holding on a careful analysis of how that API worked and a factual determination that it could not accomplish the same function if it were expressed differently. Given the wide variety of interfaces that exist — from hardware buses to graphical user interfaces — this decision does not mean that interfaces are never copyrightable.

But importantly, Oracle shows that certain interfaces are not copyrightable, and it sets out a framework for deciding whether a particular interface is copyrightable. Courts should consider how an interface works to determine whether the same function can be accomplished in more than one way. If it can (and if the differences are not merely names or some other uncopyrightable element), the interface should be copyrightable. But if the only changes that can be made without harming functionality are to elements not subject to copyright, or if no changes can be made at all, then the interface is not subject to copyright.

Thus, software interfaces which can only function if the required inputs, outputs and keywords are put together in a particular way are much less likely to be found copyrightable. Similarly, an electronic interface, such as one that complies with a particular standard, is also less likely to be copyrightable, although nothing prevents a developer from copyrighting his version of software that implements a particular electronic interface. A user interface, in which a designer could choose, for instance, whether to have users click on buttons or type numbers to enter a PIN, is more likely to remain protectable by copyright.

An Appeal Is Likely, But Its Effects Are Uncertain

Oracle has announced that it intends to appeal, although as of press time it had not yet filed a notice of appeal. The deadline for Oracle to file will depend on when the district court decides a pending motion.

An appeal will most likely go to the Court of Appeals for the Federal Circuit, because Oracle’s complaint included claims for patent infringement as well as copyright infringement.[1] The Federal Circuit applies regional circuit law to non-patent issues, however, and therefore Ninth Circuit law will govern the copyright issues. Regional circuit law, rather than Federal Circuit law, also remains the binding authority for district courts on non-patent issues, and a Federal Circuit ruling on copyright would be merely persuasive, not binding, on the other circuits. This is not to say that a Federal Circuit opinion will carry no weight. As the first appellate court to address the issue on the basis of such a complete and well-articulated opinion, the Federal Circuit will likely issue a decision that is quite influential, both within the Ninth Circuit and in other circuits across the country.

The District Court Presented a Strong Case . . .

The district court opinion, authored by Judge William Alsup, is laudable for its exceptional clarity. Judge Alsup’s explanation of the technology at issue could be a model for introductory courses in computer science. He provides a detailed summary of the law of software copyright that belongs in every copyright casebook. His reasoning is clear and well-supported. In short, this opinion begs to be affirmed.

Appellate courts rule on orders, not the judges who write them, but Judge Alsup’s impressive track record is also worth noting: nearly 75% of his decisions that have been appealed in the last five years have been affirmed in full. To provide some perspective on this statistic, by contrast, in 2001 through 2010, the Federal Circuit’s rate of affirming patent infringement decisions in full was only around 55%.[2] In addition, Judge Alsup has an unusually strong technical background for a federal district court judge, including coding experience.

It will be difficult to attack that aspect of the court’s opinion which rests on the doctrine that names and short phrases are never copyrightable. Under federal regulations, “[w]ords and short phrases such as names, titles, and slogans” are “not subject to copyright.” 37 C.F.R. § 202.1. And the Ninth Circuit — like most courts — acknowledges this limitation on the scope of copyright. Thus, while the Copyright Act itself is silent on the issue of whether words and short phrases may be protected, the Federal Circuit, applying Ninth Circuit law, would be hard-pressed to say they are.

. . . But There Is No Guarantee the Federal Circuit Will Affirm

The portion of the opinion resting on the functionality of the interface is on shakier ground for several reasons. First, the law is less clear. The opinion acknowledges tension between early software copyright cases, which more enthusiastically protected “structure, sequence, and organization,” and recent cases, which have not used the phrase. Yet, as this opinion notes, “structure, sequence, and organization” is not a dead doctrine.

Second, Oracle may argue that Google did not have to organize its methods in the same groups as the methods in the Java API. The district court’s answer to this argument was that programs written in Java use the hierarchical location of a method to call that method; thus, the hierarchical organization is necessary to ensure interoperability between Android and millions of existing lines of code in Java. Yet Java programs are not entirely compatible with Android, as Google used the interfaces from only 37 of the 168 Java API packages. The Federal Circuit might question the necessity of using the hierarchical organization in light of that choice.
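The district court’s point about hierarchical location can be illustrated with a short, self-contained Java program. Each call below names the package and class in which the method lives (`java.lang.Math`, `java.util.Collections`), so source code written this way compiles only against a library that places methods with those names in those same packages and classes. The example uses real Java standard library methods chosen for illustration; it is not drawn from the record in the case, and the alternative package name in the comment is hypothetical.

```java
public class HierarchyDemo {
    // Each call spells out package -> class -> method.
    static int[] lookups() {
        // package java.lang -> class Math -> method max
        int larger = java.lang.Math.max(3, 7);

        // package java.util -> class Collections -> method max
        java.util.List<Integer> values = java.util.List.of(4, 9, 2);
        int largest = java.util.Collections.max(values);

        return new int[] {larger, largest};
    }

    public static void main(String[] args) {
        int[] r = lookups();
        System.out.println(r[0] + " " + r[1]);
        // If a reimplementation moved max() into a differently named package
        // (say, a hypothetical "android.math"), both calls above would fail to
        // compile against it unless every caller were rewritten.
    }
}
```

This is why the organization of methods into packages was treated as necessary for interoperability. It also frames Oracle’s likely counterpoint: compatibility was only partial, because Google implemented 37 of the 168 packages.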

The appellate court may also consider the issue of non-literal infringement and feel compelled to reconcile this decision with other non-literal copyright doctrines. Courts have long held that a book, for example, can be infringed without being copied word for word if the characters and plot are the same. Thus, while the district court opinion focused on the three percent of code that was literally the same, on appeal the Federal Circuit might examine the code in its entirety to determine whether it was substantially similar, even though not literally identical.

Of course, the Federal Circuit could avoid the issue entirely. In addition to arguing that the API was not copyrightable, Google raised equitable defenses such as laches and implied license. The Federal Circuit could apply any of these defenses and dismiss Oracle’s copyright claim without ever reaching the question of copyrightability.

What the Holding Is Not

Despite some commentary to the contrary, this decision is not an “open source” victory. From a legal standpoint, open source refers to a form of licensing. Open source software is not immune from copyright laws; rather, the owners of copyright for open source software choose to license it openly, albeit with certain conditions. Typically, the license grants rights to copy, modify and even sell the code but requires that the licensee grant the same rights to downstream users of the original code and make available any modified source code. In this case, licenses were not at issue. Thus, this is not a decision about whether open source software is per se subject to copyright.

This decision also does not give carte blanche to the public to copy Java API packages. Rather, it permits anyone who wishes to write an original implementation of the same functions to use the same section of code as Google did — specifically, the method headers.


  • Copyright does not provide complete protection for interfaces. Interface developers should explore patents or contractual agreements to protect against non-literal infringement.
  • If this decision is affirmed, organizations wishing to use or implement interfaces without permission of the developer should seek legal advice as to whether the type of interface at issue is copyrightable.
  • Software copyright owners will need to more carefully assess the risk of pursuing non-literal copying theories in software copyright enforcement actions. 

[1] If all patent claims were dismissed without prejudice, the Federal Circuit would have no jurisdiction and the appeal would instead go to the Ninth Circuit.

[2] See Jason Rantanen, “Federal Circuit Statistics – FY 2011,” Patently-O (Oct. 26, 2011).

Sounds Like Teen Spirit: Musicians and Advertisers at Odds Over Use of Sound-Alikes in Advertising

An increasingly important source of revenue for songwriters and performing artists has been licensing their compositions or sound recordings for use by commercial advertisers. Even when such advertising licenses are not particularly lucrative for the band, the use of its song in a national advertising campaign can be a valuable source of exposure. At the same time, advertisers have recognized the potential value of associating their brands with a particular band or musical work, or using a work to conjure a particular mood or theme and make the advertising message more powerful and memorable. According to one recent article, “[i]n recent years in a relentless quest for young customers, ad agencies have begun trolling among buzzed-about indie rock groups for musical ideas to use in ads for, say, cars and restaurants.”  James M. McKinley, Jr., “To Singers, Ad Sounds Too Familiar,” New York Times (June 7, 2012).

On some occasions, however, advertising agencies may resort to creating sound-alike tracks when a band declines to allow its music to be licensed for advertising or asks for more money than the advertiser is willing to pay. Several controversies have arisen recently over the use of such “sound-alike” music in advertisements. As Alex Scally of the indie rock duo Beach House put it (regarding the incident described below), “it feels like something close to what we have made. A feeling and a sentiment and an energy has been copied and is being used to sell something we didn’t want to sell.” Id. In that instance, Beach House objected to a Volkswagen advertisement that used music alleged to be highly evocative of its “dream pop” musical style and, in particular, its song “Take Care.” An advertising agency had previously sought a license to use that song for the Volkswagen commercial, but the band is reported to have repeatedly declined the offer. Volkswagen has denied that the music was deliberately made to imitate the Beach House song. In another incident, the rock duo The Black Keys reportedly sued Pizza Hut and Home Depot over commercials that the band alleges used its single “Gold on the Ceiling.”

Numerous other such instances have arisen over the years, including the alleged use by a Spanish sporting goods company of a song that sounded like Fleet Foxes’ “White Winter Hymnal,” and Audi’s alleged use of a song that sounded like one by Sigur Rós. Other artists or their music publishers, such as Santana and Eminem, have likewise protested the use of sound-alikes in commercial advertisements. As discussed below, in the United States, the most prominent cases dealing with the potential liability of advertisers involved a Frito-Lay advertisement using a Tom Waits sound-alike and a Mercury automobile advertisement that used a Bette Midler sound-alike. It should also be noted that, when an advertisement is shown outside the United States, the laws of those countries may provide the artist with broader rights against infringement.

Such allegations raise interesting questions of liability, because federal copyright and trademark protections often do not prohibit such uses and, moreover, federal copyright law may be held to preempt state law claims in some instances. While the federal copyright statute precludes the unlicensed synchronization of a musical composition or sound recording, the statute generally does not prohibit another band’s performance of a different composition that simply resembles the overall style and mood of the plaintiff artist’s work. Over the years, numerous recording artists and songwriters have achieved fame even though their style was highly evocative of someone who came before them. Showing infringement of a musical composition generally requires proof of copying through a substantial similarity analysis (in essence, akin to plagiarism), usually involving a note-by-note comparison undertaken in legal proceedings by expert musicologists.

Further, the copyright statute provides a mechanical licensing scheme that generally permits the creation of cover versions of compositions without any need for permission from the artist who made the earlier sound recording. Thus, when the songwriter and recording artist are different parties and the songwriter has licensed the work for use in an advertisement, precluding a second recording merely because it sounds like the first artist’s recording of the same song would seem to conflict with the statute. Indeed, the legislative history of Section 114(b) of the Copyright Act states, “[m]ere imitation of a recorded performance would not constitute a copyright infringement even where one performer deliberately sets out to simulate another’s performance as exactly as possible.” H.R. REP. NO. 94-1476, at 106.

Likewise, trademark law is generally not sufficient legal protection for the style or sound of a musical performer. Trademark law likely will not provide a cause of action based on a use expressly permitted by the copyright statute, such as when an advertiser has obtained synchronization rights for the use of a musical composition. See, e.g., Butler v. Target Corp., 323 F. Supp.2d 1052, 1058-59 (C.D. Cal. 2004) (musical composition cannot be protected as a trademark; “[a] contrary conclusion would allow any copyright claim for infringement of rights in a musical composition to be converted automatically into a Lanham Act cause of action”) (citation omitted); Davis v. Trans World Airlines, 297 F. Supp. 1145 (C.D. Cal. 1969) (no cause of action existed over broadcast commercials imitative of plaintiffs’ recorded performance where defendants had acquired license to use composition); but see Waits v. Frito-Lay, 978 F.2d 1093, 1106-11 (9th Cir. 1992) (affirming verdict that use of sound-alike created false association between plaintiff and products advertised, but vacating damages as duplicative of right of publicity claim).

As a result, plaintiffs in cases of this nature have more commonly relied on a common law right of publicity claim. Both the Waits and Midler cases were decided in favor of the artists on state right of publicity grounds, which were held not to be preempted by the federal copyright statute. In Midler, decided in 1988, the Ninth Circuit Court of Appeals held that vocalists have common law property rights in their distinctive voices. Midler v. Ford Motor Co., 849 F.2d 460, 463 (9th Cir. 1988). Like the allegations made in the Beach House controversy, the Midler case also involved a situation in which the advertising agency sought Midler’s permission but was refused. Id. at 461. The agency then enlisted Midler’s former back-up singer, who, according to the allegations, was played Midler’s version of the song and instructed to imitate it. Id.

The Ninth Circuit noted that the use of the song was licensed, and therefore the copyright statute provided Midler with no remedy on that point. Also, the parties were not in competition (and therefore the state unfair competition statute provided no remedy), and, because the commercial did not use Midler’s actual voice, California Civil Code Section 3344 (which generally prohibits the unlicensed use of a person’s voice or likeness in commercial advertising) also was of no assistance to Midler. Id. at 462.

The court held, however, that summary judgment for the advertiser was improper, because the California common law right of publicity recognized “an injury from ‘an appropriation of the attributes of one’s identity,’” which included the imitation of Midler’s recognizable voice. Although the court cautioned that not every imitation of a voice to advertise merchandise was actionable, it held that “when a distinctive voice of a professional singer is widely known and is deliberately imitated in order to sell a product, the sellers have appropriated what is not theirs and have committed a tort in California.”  Id. at 463. As the court stated:

Why did the defendants ask Midler to sing if her voice was not of value to them?  Why did they studiously acquire the services of a sound-alike and instruct her to imitate Midler if Midler’s voice was not of value to them?  What they sought was an attribute of Midler’s identity. Its value was what the market would have paid for Midler to have sung the commercial in person. Id.

Four years later, the Ninth Circuit held that the use of a sound-alike to perform an advertising jingle was actionable. Waits, 978 F.2d at 1100-04. In the Waits case, the artist won a judgment of more than $2 million (consisting mostly of a punitive damages award) against Frito-Lay for mimicking his voice in Doritos ads. As in Midler, the defendant had sought out a singer who could accurately imitate Waits’ distinctive voice. Id. at 1097-98. The court stated that while a vocal style “per se” is not protectable under the right of publicity, the imitation of a distinctive voice that identifies a person is.

The Midler and Waits cases serve as important reminders that state right of publicity claims may impose liability even where federal copyright and trademark laws do not. Such claims, however, raise complex questions of possible preemption by the federal copyright statute, as well as the factual issue of when a performer’s voice or combination of voices is distinctive. Moreover, while most states recognize a right of publicity, not all states extend their right of publicity laws to vocal imitations. See, e.g., Romantics v. Activision Pub., Inc., 574 F. Supp.2d 758, 764 (E.D. Mich. 2008) (Michigan common law has not recognized a right of publicity in the sound of a voice or combination of voices, even if distinctive). Further, the analysis in these cases is likely to be more complicated when what is imitated is the musical style or genre of a group (especially if purely instrumental), rather than simply a recognizable singer’s distinctive voice. Courts will likely be wary of extending the right of publicity in a way that would give a band an effective monopoly over a particular musical style, sound or genre.


Advertisers and their agencies should use caution and carefully consider the legal issues when contemplating the use of sound-alike music for an advertisement. The opportunity to engage with the consumers’ buzz and enthusiasm for a musical performer or a particular song is a powerful lure, and when the artist refuses the license it may be tempting to resort to a sound-alike musical track. While this may seem to be an inexpensive way to appeal to a particular demographic, an accusation of musical plagiarism may engender negative reactions among the very consumers the advertiser seeks to appeal to, and result in legal consequences. Even when federal copyright and trademark laws are not likely to provide a legal basis for an artist’s infringement claim, the possibility of state common law protections, such as the right of publicity, may remain viable.


“Federal Circuit Refines Willful Patent Infringement”
The New York Law Journal
July 25, 2012
Author: Parker Bagley

“When HIPAA Is Not Enough: Tougher Texas Privacy Laws”
May 14, 2012
Authors: Jackie Klosek, Julia Holczer, Achal Oza

“Combating Trade Secret Theft Abroad Through Legal Action At Home”
Executive Counsel
May 2, 2012
Author: Michael Strapp