Attribution v. Citation

There's an interesting tweet about attribution in the data web. It raises a tension I run into often but haven't seen much written about: the shifting meaning of the word "attribution."

We have a fairly common understanding of attribution in our daily lives. Mine is "credit where credit is due," and I suspect that's what most people think, whether they're musicians, scientists, teachers, or anyone else who does creative or innovative work. We like getting credit for our work. No problem there.

This idea of attribution includes the idea that we should get credit for our ideas: if I'm the first one to realize that a certain gene knockout cures death, that idea is linked to me forever, the way we link Watson and Crick to the discovery of the structure of DNA. In this sense, attribution is very similar to the scholarly concept of citation.

However, the word "attribution" in a copyright license is a different beast. It even has a different Wikipedia entry (which I did not create, and have not edited, despite the temptation!). I don't much like its first sentence, because it doesn't make clear that in copyright, attribution is triggered by making a copy, not by using the ideas in the copyrighted work.

That's the thing about the law. It's narrow in a lot of places, and it's often not what we think it is, mainly because it was written by lawyers, not regular people.

Let's look at the legal code of the Creative Commons Attribution license. It's interesting.

The license grants the following rights:

- to Reproduce the Work, to incorporate the Work into one or more Collections, and to Reproduce the Work as incorporated in the Collections;
- to create and Reproduce Adaptations provided that any such Adaptation, including any translation in any medium, takes reasonable steps to clearly label, demarcate or otherwise identify that changes were made to the original Work. For example, a translation could be marked "The original work was translated from English to Spanish," or a modification could indicate "The original work has been modified.";
- to Distribute and Publicly Perform the Work including as incorporated in Collections; and,
- to Distribute and Publicly Perform Adaptations.

See? It's about reproducing, adapting, and distributing the work. I don't need these rights to read a work or study a data set and take the ideas from it. I only need them to make copies and derivatives. The law doesn't allow ideas or facts to be covered by copyright. But don't take it from me, take it from the US Government:

"Copyright does not protect facts, ideas, systems, or methods of operation, although it may protect the way these things are expressed."

Now, because copyright doesn't protect these things, "attribution" in the sense of the license doesn't apply to ideas or facts either. The rights listed above are conditional on my compliance with the terms of the license, and Section 4 of the legal code lays out those conditions. If I fail to provide proper attribution, I lose the right to make and distribute copies and derivatives. I do NOT lose the right to "steal" the ideas in the article and claim them as my own, because those ideas are not subject to copyright and cannot be made subject to the attribution requirement.

This is where it matters to understand that, to the law, attribution is a very specific term of art, quite different from what we mean casually and commonly. Citation is much closer to the way we think than anything enabled by public copyright licenses or, for that matter, private ones.

This is why we recommend waiving attribution in the Science Commons protocol for open access to data. It's a narrow legal term that can screw with interoperability, while at the same time failing to provide what people really want, which is credit where credit is due.

Puneet Kishor, one of our fellows, got it right. We shouldn't use the law to make it hard to do the wrong thing. We should use technology to make it easy to do the right thing.

When it comes to data, and in particular to data interoperability, enabling citation and provenance that is easy to track and cite will serve the scientific goals far better than an attempt to port open source "principles" into a world where they fundamentally don't fit.

More like this

Yup. Copyright was originally about the right to print copies of an artistic work and sell them. It's about sellable recordings on physical media. To keep up with the new realities, the meaning of the word has been stretched beyond all recognition.

What the world needs is for us all to ditch the notion of copyright as such and invent new words for rights to the results of intellectual and creative effort.

Note that the phrase "intellectual property" is an attempt to bypass this much-needed discussion. It's an attempt to *define* ideas as property, like my boots or my car, and by that definition to sneakily present the usual ideas about property as necessarily applying to ideas.

We need to discuss and argue as to whether video piracy (for instance) really is "theft", in any meaningful sense.