Indie Preserves


Embedded Boogaloo: Audio Metadata II: The Batching

6/22/2016

Embedded Boogaloo is an Indie Preserves series that looks at different kinds of embedded metadata formats for the digital file formats most likely to be used by music labels.

Scott here. When last we talked about embedding metadata in your label’s digital content, we talked about PDFs, photo metadata, and before that, audio metadata. Today we’re going to return to the world of audio meta tags and talk about ways that you can speed up your own metadata processes.

In our previous coverage, we talked about the many varied forms of metadata, which depend on the particular file type being saved. To recap our larger points:

  1. Embedding metadata in audio files = a way to hedge your bets for future metadata readability.
  2. Making meta embedding a regular practice = highly recommended.
  3. A plethora of graphical user-interface music taggers (both open source and proprietary) exist.

If you read between the lines of that piece, you'll notice we were subtly aiming at people just starting out with a label, or those without much audio data to steward.

But what happens if you have -- as we have previously termed it -- a buttload of data that needs to be cleaned (or a buttload of files without any tags at all)? Starting from scratch is a daunting process that can really put you off the job.

Luckily, when it comes to batch-automating otherwise boring work, we humans have pretty much written the book and bought the t-shirt, and metadata automation is just one example. There are a couple of ways to do it, some more technically involved than others, but we'll show you how to get started.

WAVs
When it comes to metadata for WAV, the de facto standard for high-res audio, BWF MetaEdit is the gold standard. While some freeware (and many hostageware) programs claim to handle both imports and exports, most will only export the existing metadata from a directory. The great thing about BWF MetaEdit is that once you've exported and sussed out what metadata needs to be fixed (or, in most cases, added from scratch), you can import the same spreadsheet of data back into BWF MetaEdit to overlay it on your WAV files.

To wit:
From left to right: export the core (WAV) metadata from BWF to a CSV file, then perform some painful metadata cleaning, and finally re-import your audio metadata using BWF.
If you want some hints as to what goes into editing such metadata, I would recommend downloading OpenRefine, an open source application that helps you clean messy metadata.
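For those comfortable with a little code, the cleaning step in the middle can also be scripted. The sketch below is not part of BWF MetaEdit; it's a minimal Python example, assuming your export is a UTF-8 CSV with a header row and that the artist and copyright columns are named `IART` and `ICOP` (the RIFF INFO field IDs). Those column names and the placeholder default values are assumptions, so check the headers in your own export and work on a copy of the CSV.

```python
import csv

# Hypothetical defaults: replace with your own label's details. The column
# names (RIFF INFO IDs IART = artist, ICOP = copyright) are assumptions about
# your export's headers.
DEFAULTS = {"IART": "Your Label Name", "ICOP": "(c) Your Label Name"}


def clean_rows(rows, defaults=DEFAULTS):
    """Trim stray whitespace and fill empty fields with default values."""
    cleaned = []
    for row in rows:
        # None shows up when a row has fewer cells than the header.
        new_row = {key: (value or "").strip() for key, value in row.items()}
        for field, default in defaults.items():
            if field in new_row and not new_row[field]:
                new_row[field] = default
        cleaned.append(new_row)
    return cleaned


def clean_csv(in_path, out_path):
    """Read an exported CSV, clean it, and write a new CSV for re-import."""
    with open(in_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        rows = clean_rows(reader)
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Something like `clean_csv("export.csv", "cleaned.csv")` leaves your original export untouched; look over `cleaned.csv` before importing it back into BWF MetaEdit.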
MP3

The same principle applies to MP3 files. God knows you won't be saving your master audio as MP3s (riiiiight?), but if you offer complimentary digital downloads, it's probably a good idea to embed those meta tags for your paying customers. (It's just common courtesy!) More importantly, if you send out MP3s as promotional or review copies, I can't think of anything more embarrassing than asking people to give your label's music a chance, only for them to find out that your digital audio is as bereft of information as a redacted CIA dossier.


Luckily, I got you covered. Over at GitHub, I’ve posted a fairly simple Python script that will allow you to revise and embed the ID3 tags in your MP3s in almost exactly the same way as BWF MetaEdit (except without the snazzy graphic user interface). You might need to ask one of your nerdier friends to help you install Python on your computer (along with Mutagen, a Python library for editing digital audio metadata), but once you have it up and running, it’s a breeze.
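This isn't the GitHub script itself, just a minimal sketch of the same idea, assuming Mutagen is installed and that your spreadsheet is saved as a UTF-8 CSV with a hypothetical `filename` column plus columns named after Mutagen's EasyID3 keys (`title`, `artist`, `album`, `date`, `genre`, `tracknumber`). It edits the MP3s in place, so try it on copies first.

```python
import csv

# Hypothetical CSV layout: a "filename" column plus columns named after
# Mutagen's EasyID3 keys. Any other columns are ignored.
EASY_KEYS = {"title", "artist", "album", "date", "genre", "tracknumber"}


def rows_to_tags(rows):
    """Map each filename to the non-empty tag values in its CSV row."""
    plan = {}
    for row in rows:
        plan[row["filename"]] = {
            key: value.strip()
            for key, value in row.items()
            if key in EASY_KEYS and value and value.strip()
        }
    return plan


def embed_tags(csv_path):
    """Write the tags from csv_path into each listed MP3 (edits files in place)."""
    # Imported here so rows_to_tags() works even without Mutagen installed.
    import mutagen
    from mutagen.easyid3 import EasyID3
    from mutagen.id3 import ID3NoHeaderError

    with open(csv_path, newline="", encoding="utf-8") as f:
        plan = rows_to_tags(csv.DictReader(f))

    for path, tags in plan.items():
        try:
            audio = EasyID3(path)
        except ID3NoHeaderError:
            # The MP3 has no ID3 tag yet, so create an empty one first.
            audio = mutagen.File(path, easy=True)
            audio.add_tags()
        for key, value in tags.items():
            audio[key] = value
        audio.save()
```

Blank cells are skipped rather than written, so an empty column won't wipe out a tag that's already there.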

Embedded Boogaloo: PDFs & Exiftool

8/7/2015

Embedded Boogaloo looks at different kinds of embedded metadata formats for the digital file formats most likely to be used by music labels.

Adobe's Portable Document Format can contain two kinds of metadata. The first is the Document Information Dictionary, a set of fields such as author, title, subject, and creation and modification dates that have been part of the PDF format for (almost) ever. When you view the document properties in Adobe Reader, these are the metadata fields you're looking at:

The metadata in these fields can be expanded with the Extensible Metadata Platform (aka XMP, also mentioned in our post on photo metadata). Note that this extended metadata isn't visible in the free Adobe Reader, but it does show up when you view PDF properties in the full Adobe Acrobat application:
If you click on Additional Metadata in Acrobat...
...Here be dragons!
Going Deeper (Warning: Technical Crap Ahead)

Programs such as Microsoft Word, Adobe InDesign and Photoshop allow users to embed metadata when saving or exporting to the PDF format. But what about after the file is created? It should be noted that free PDF viewers treat PDFs as "read only," so you won't be able to edit this metadata unless you have access to an editing tool, such as the commercial Adobe Acrobat. And what about making changes to large batches?


Like the other formats, there are plenty of tools and workarounds (both free and not-so-free) that can help you with embedding PDF metadata. But as you go deeper into a metadata k-hole, you may find some degree of satisfaction in leveraging the Do-It-Yourself ethic by… uh, doing it yourself.

At our library, we use a great (and free!) command-line tool for reading and manually changing PDF file metadata called Exiftool. ExifTool is platform-independent, so it'll work on a range of operating systems -- we've used it on projects using both Mac and Windows platforms and the environment is (more or less) comparable.

Exiftool will be scary to some because, as we said, it's a command-line tool. But once you know what you're doing (command-line skills aside, Exiftool is pretty easy to use), you can take metadata editing into your own damn hands, whether for one file or many. (If you want to jump in head first, try the University of Surrey's UNIX Tutorial For Beginners.)

Let's say, for example, you wanted to add an author and a title to a PDF that's on a PC. If you already have Exiftool installed, you could open a command window in the folder containing the PDF and type in the following command:

exiftool  -Title="Proposal"  -Author="Indie Preserves"  -overwrite_original  Proposal.pdf

The command tells Exiftool to overwrite the Title and Author tags in the local file "Proposal.pdf." The -overwrite_original option tells it not to keep the backup copy it would otherwise save next to the file.

We use this process on hundreds of PDFs at a time before they are ingested into our institutional repository. We start with a spreadsheet that automatically pulls file names and descriptive information from another sheet and assembles them into individual Exiftool commands:
All of this is copied to a text file and saved as a .BAT file, a Windows batch script that runs the commands one after another in the order they appear. We save that BAT file to the folder of PDFs, run it, and let Exiftool overwrite the necessary metadata.
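If you'd rather skip the spreadsheet-to-BAT step (or you're not on Windows), the same loop can be driven from Python. This is a minimal sketch, not our library's workflow, assuming Exiftool is on your PATH and that the spreadsheet is exported as a UTF-8 CSV with hypothetical `filename`, `title`, and `author` columns, where file names are relative to the folder you run it from.

```python
import csv
import subprocess


def build_command(row):
    """Turn one spreadsheet row into an Exiftool argument list."""
    # Hypothetical column names: "filename", "title", "author".
    return [
        "exiftool",
        f"-Title={row['title']}",
        f"-Author={row['author']}",
        "-overwrite_original",
        row["filename"],
    ]


def run_batch(csv_path):
    """Run one Exiftool command per CSV row, stopping at the first failure."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # A list of arguments (rather than one command string) avoids
            # quoting problems with titles that contain spaces or quotes.
            subprocess.run(build_command(row), check=True)
```

Because each row becomes its own argument list, a title like `Demo "B-Sides"` reaches Exiftool intact, which is exactly where hand-built BAT lines tend to trip up.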

PDFs aren't the only format Exiftool works with: it can read metadata from over a hundred different file types. (Writing metadata to these files is a different story; you'll want to check out the list of supported types to see what you can do.)

Beyond PDFs, Exiftool was born to run on image files, which means you can read and edit formats such as TIFF, JPEG, PSD, PNG, Panasonic RAW, and many others. AVPreserve has a great blog post on analyzing embedded image metadata using Exiftool. For those of you not ready for command-line prime time, two different GUI apps (ExifToolGUI and pyExiftoolGUI) exist to shepherd you through the editing process (though you will still need to have Exiftool downloaded somewhere on your computer).
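To audit a batch before you start editing, Exiftool's `-json` option prints every file's tags as JSON, which is easy to pick apart in Python. A minimal sketch, assuming Exiftool is on your PATH; `missing_tag` is a hypothetical helper for spotting files that lack a tag you care about. In this mode Exiftool reports plain tag names such as `Copyright` or `Artist`, without group prefixes.

```python
import json
import subprocess


def parse_exiftool_json(raw):
    """Map each file in Exiftool's -json output to its dict of tags."""
    return {record["SourceFile"]: record for record in json.loads(raw)}


def missing_tag(metadata, tag):
    """List (sorted) the files whose metadata lacks the given tag name."""
    return sorted(path for path, tags in metadata.items() if tag not in tags)


def read_metadata(paths):
    # "-json" makes Exiftool print an array with one object per file, each
    # carrying a "SourceFile" key plus the tags it found. check=True raises
    # if Exiftool exits with an error status.
    result = subprocess.run(["exiftool", "-json", *paths],
                            capture_output=True, text=True, check=True)
    return parse_exiftool_json(result.stdout)
```

For example, `missing_tag(read_metadata(["cover.jpg", "promo.jpg"]), "Copyright")` gives you a to-do list of photos that still need a copyright notice.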

Metadata enrichment takes a bit of planning and practice, but whatever your technical level, you have the power to make your files as detailed as you want them to be. Go get 'em, tigers!

Hat-tip to David Riecks for the AVPreserve tutorial -- be sure to visit his site Photometadata.org!

Embedded Boogaloo: Photo Metadata

8/5/2015

Embedded Boogaloo looks at different kinds of embedded metadata formats for the digital file formats most likely to be used by music labels.

After audio files, we arrive at photos. At a time when people can feasibly keep thousands of photos on their phone, it becomes ever-so-important to make sure you know what-was-taken-when-and-where. (And, just as importantly, who took the photo, and does that person own the rights to it?)


Learning about photo metadata could be ugly, but fortunately, there are a number of great web sites and advocacy groups spreading the word about embedding metadata in your photos.
PhotoMetadata.org -- a joint project between the Stock Artists Alliance and the Library of Congress -- offers free tutorials (in web content, PDFs, or videos) for embedding metadata using Adobe Photoshop, Adobe Bridge, Camera Bits Photo Mechanic and Microsoft Expression Media. As PhotoMetadata.org points out, JPEG, TIFF, Photoshop, and Raw file formats can include several types of metadata in the same file: IPTC-IIM, IPTC Core & Extension, PLUS, Extensible Metadata Platform (XMP), Exif, Dublin Core, and so on. Eager-beaver readers can learn about each of these in greater detail by going through the Guidelines for Handling Image Metadata, published by the Metadata Working Group.

The rest of you, shuddering at the idea of reading the 70-page guidelines, are probably asking us to recommend a schema. The American Society of Media Photographers says IPTC Core and Extension (originally designed by the International Press Telecommunications Council) is the baseline schema used by image editing and cataloging software to describe the content and ownership of the pictures; the latest version of the schema was released in 2014.

So what do you record? Honestly, that’s up to you. The IPTC and IPTC Extension fields provide an embarrassment of photo metadata -- so much that it might be overwhelming. As we’ve said previously, your best bet will be to select the most pertinent chunks of metadata and apply them consistently throughout all of your photos. 
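Once you've picked your chunks, Exiftool can stamp the same set onto a whole folder of photos in one pass. A minimal sketch, assuming Exiftool is on your PATH; the creator and copyright values are placeholders, and these four tags are just one possible house set covering both IPTC Core (stored as XMP `dc:creator` and `dc:rights`) and the older IPTC-IIM fields. Exiftool keeps `_original` backup copies by default, which is a sensible safety net the first time around.

```python
import subprocess

# Hypothetical house style: the handful of fields you decided to apply to
# every photo. Tag names follow Exiftool's GROUP:Tag convention.
HOUSE_FIELDS = {
    "XMP-dc:Creator": "Jane Doe",
    "XMP-dc:Rights": "(c) 2015 Example Records",
    "IPTC:By-line": "Jane Doe",
    "IPTC:CopyrightNotice": "(c) 2015 Example Records",
}


def house_style_command(folder, fields=HOUSE_FIELDS):
    """Build one Exiftool call that writes the same fields to a whole folder."""
    args = ["exiftool"]
    args += [f"-{tag}={value}" for tag, value in fields.items()]
    # -r recurses into subfolders of the given folder.
    args += ["-r", folder]
    return args


def apply_house_style(folder):
    subprocess.run(house_style_command(folder), check=True)
```

Keeping the field set in one place is the point: every photo gets the same fields with the same wording, which is the consistency we keep harping on.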

IPTC Extension fields in Photoshop.
(In addition to embedding metadata, the Library of Congress recommends naming your photo files with a brief description and the date of the photo. After all, chances are you won't remember what was taken when in 10 years...)
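That naming advice is easy to automate, too. A minimal sketch, assuming you want the date first (so files sort chronologically) followed by a short, lowercase description; the exact pattern is our assumption, not a Library of Congress rule. Note that it drops accented and other non-ASCII characters from the description.

```python
import re
from datetime import date


def photo_filename(description: str, taken: date, ext: str = "jpg") -> str:
    """Build a name like '2015-08-05_band-practice.jpg' (date first, then description)."""
    # Lowercase, then collapse every run of non-letters/digits into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"{taken.isoformat()}_{slug}.{ext}"
```

For example, `photo_filename("Band practice @ the Garage!", date(2015, 8, 5))` returns `2015-08-05_band-practice-the-garage.jpg`.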

Embedded Boogaloo will wrap up later this week with PDF metadata and some advanced editing techniques. Stay tuned.

Embedded Boogaloo: Audio Metadata

8/3/2015

Embedded Boogaloo looks at different kinds of embedded metadata formats for the digital file formats most likely to be used by music labels.

Our first Embedded Boogaloo is a no-brainer. Audio is probably a label's most important kind of digital file, and chances are most of you already have experience with audio metadata. Even if you're new to putting out music, most of us have used iTunes in the past 15 years and know what a pain in the ass it is to discover that our music files didn't come with any metadata. Of course, most labels will be primarily concerned with master audio files rather than MP3 and M4A files: according to the survey we conducted earlier in the year, the most popular choice is the WAV format, but some also reported storing their master audio as FLAC and AIFF.

Let's start with the go-to format. According to the WAV Metadata Guide, there is a misconception that embedding metadata will “break” WAV files; on the contrary, the format can hold several different kinds of metadata formats and any “well-behaved WAV reader” should handle them. Thus, the kind of metadata you choose to embed depends on what program you anticipate using to "read" the file's metadata.

For example, the seven default metadata tags in the open source Audacity software (Artist, Track Title, Album, Track Number, Year, Genre, and Comments) can be exported into WAV files, where they are written in two formats: LIST-INFO and ID3 tags. LIST-INFO was the original specification for WAV audio and AVI video, published in 1991; ID3 is the de facto standard for MP3 metadata. So why are both exported into each file? It's a way to hedge your bets for future metadata readability: "Many player programs cannot read LIST INFO tags, but applications that can read ID3 tags in WAV files will be able to read the ID3 tags instead."
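If you're curious which of these formats a given WAV actually carries, you can list its top-level RIFF chunks. A minimal sketch using only Python's standard library: LIST-INFO metadata shows up as a `LIST` chunk of type `INFO`, while embedded ID3 tags typically sit in a chunk named `id3 ` or `ID3 ` (the exact name depends on the program that wrote them). It reads chunk headers only and seeks past the audio data, so it won't load a whole master into memory.

```python
import struct


def list_wav_chunks(f):
    """List top-level chunk IDs in a RIFF/WAVE file opened in binary mode."""
    header = f.read(12)
    if len(header) < 12 or header[:4] != b"RIFF" or header[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks = []
    while True:
        head = f.read(8)
        if len(head) < 8:
            break  # end of file
        chunk_id, size = struct.unpack("<4sI", head)
        name = chunk_id.decode("latin-1")
        skip = size + (size % 2)  # chunk bodies are padded to an even length
        if chunk_id == b"LIST" and size >= 4:
            # LIST chunks start with a 4-byte list type such as b"INFO".
            name += "-" + f.read(4).decode("latin-1")
            skip -= 4
        chunks.append(name)
        f.seek(skip, 1)  # jump over the rest of the chunk body
    return chunks
```

Usage: `with open("master.wav", "rb") as f: print(list_wav_chunks(f))`. A result like `['fmt ', 'data', 'LIST-INFO', 'id3 ']` would mean the file carries both kinds of tags.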

AIFF files can have embedded ID3 tags as well. FLAC files, on the other hand, use the Vorbis comment metadata container. So when enriching your metadata (or adding it from scratch), your best bet is to pick a tagging schema and stick with it consistently for each format -- for example, using embedded ID3 tags in all of your master WAV audio rather than switching between ID3 and LIST-INFO. (We’ll be hammering on consistency for the rest of the Embedded Boogaloo posts, so remember that word well.)

In terms of actually embedding the metadata, a plethora of graphical user-interface music taggers (both open source and proprietary) exist. We recommend finding an open source program that will provide one-stop embedding for all of your file formats. Some options include:
  • Kid3, an open-source ID3 tag editor for FLAC, WAV and AIFF files (Mac/Windows/Others)
  • BWF MetaEdit, a tool specializing in editing INFO metadata in Broadcast WAVE Format (BWF) files that also supports traditional WAV files (Mac/Windows/Others)
  • IDTE Tag Editor, which works with FLAC Vorbis and WAV INFO tags and also supports "forced" ID3 tag editing on any file (Windows/Linux)

Next, we'll cover photo metadata. Stay tuned.


The Importance of Earnest Metadata Part 2: Embedded Boogaloo

7/31/2015

For the sake of argument, let’s say we’ve convinced you that not only is robust metadata important for your label’s digital material, but that your files could stand to have a better pedigree than they do now.

Great! So now what?

Today, we're going to talk about enriching the metadata of your label’s digital material. With metadata, you can go as deep as you want, providing as much detail as humanly possible, but as you assert more control over your metadata, the time you spend doing it increases. We know that a great many of you are busy people with busy lives, on top of working as the sole person behind your label, and don't have the time to fully commit to a project like improving the metadata of your files. For some people, a descriptive file name is all they need. Others have set up spreadsheets with file names and directory paths of the material they want to record information about. The spreadsheet method, they say, allows you to decide what’s important about the files and record that information in an easy, low-impact way.


We get it. We commiserate.

But.

We want to make it very clear that we do not advise these methods of recording metadata. In fact, we strongly advise that you do not keep track of your metadata in these ways. For one thing, file names have a preset limit on how many characters they can have. On top of that, production hard drives are in constant use, meaning that any directory path recorded in a spreadsheet is impermanent. (Plus, all it would take to wipe out your metadata is the accidental deletion of a single file.)

By themselves, these methods are an eyelash above a waste of time. If you’re going to commit to enriching your files, commit to it by embedding your metadata.

Embedded metadata is exactly what it sounds like: information embedded within the file it describes, travelling with it for (hopefully) its entire life. The website Embedded Metadata Manifesto says: “In the online world, there can be many copies of a single image or video file, and with millions of images or videos on the internet, metadata is essential for identification and copyright protection. We should ensure this metadata travels with the content as a digital label, and remains with it over its lifetime.”

You can tell librarians are serious about metadata -- we have a manifesto.
Right about now, we can hear a few of you out there rolling your eyes at what sounds like a lot of work -- and it most likely will be. But as David Riecks at the Controlled Vocabulary blog has argued, the time it takes to embed metadata (especially copyright and contact info) is worth the 'trail of breadcrumbs' you create, either for yourself or anyone else who might need to access your data.

So, we’re going to embed our metadata -- problem solved, right? Nope. Different file types carry different methods and types of internal metadata. Sometimes, as we’ll see, there’s far more than a single useful method (or schema) for embedding metadata.

Next week, we'll show you how to embed metadata into the major kinds of files that most labels use: audio files, photos, and PDFs.

    Who are we?

    Just a couple of library professionals. One of us specializes in special collections; the other in metadata. We both care passionately about preservation -- be it physical objects or files on a hard drive. We also care about music -- especially the music being made by local bands and musicians recording out of their bedrooms.



    Indie Preserves is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
    Indie Preserves supports the Embedded Metadata Manifesto!