FADGI’s embARC: Extending embedded metadata support and validation for DPX and MXF files
Today’s guest post is from Kate Murray, Digital Projects Coordinator in Digital Collections Management and Services at the Library of Congress, and Bertram Lyons, Partner and Managing Director for Software at AVP.
Note: This is the last in a series of updates from the Federal Agencies Digital Guidelines Initiative (FADGI) Audio-Visual working group. See That’s Our Cue! Updates for the FADGI Embedded Metadata Guidelines and BWF MetaEdit for the Cue Chunk in Broadcast Wave Files and Reading the (Same) Signals: Using FADGI’s ADCTest for Quality Control in Outsourced Audio Digitization for the previous installments.
embARC, short for “metadata embedded for archival content,” is a free, open source software application that enables users to audit and correct embedded metadata to comply with FADGI guidelines for DPX (Guidelines for Embedded Metadata within DPX File Headers for Digitized Motion Picture Film) and MXF (SMPTE RDD 48: MXF Archive and Preservation Format) files (figure 1).
DPX, short for Digital Picture Exchange, is a pixel-based (raster) file format intended for very high quality moving image content with attributes defined in a binary file header. MXF, short for Material Exchange Format, is an object-based file format that wraps video, audio, and other bitstreams (“essences”), optimized for content interchange or archiving by creators and/or distributors, and intended for implementation in devices ranging from cameras and video recorders to computer systems.
embARC was first released in 2019 and is developed with support from the Library of Congress and FADGI (Federal Agencies Digital Guidelines Initiative) and in collaboration with AVP and PortalMedia.
Recent development in 2020-2021 has expanded the scope of embARC to meet the evolving user needs and workflows of the audiovisual preservation community. This most recent release is an important milestone, introducing the first beta release of the CLI as well as the first official release of the GUI, which now includes functionality for the MXF file format.
New CLI beta version released
While developing the CLI (command line interface) version was already in the project work plan, discussions with local and international colleagues seeking to integrate embARC’s robust functionality to fill gaps in digital preservation workflows caused us to prioritize this work and bump it up in the timeline. Give the people what they want, FADGI says!
The CLI version lets users include embARC services in automated workflows, alongside other applications, without requiring user interface interaction, so that files can complete specific tasks and move on to the next step more easily. For DPX files, embARC supports individual DPX files or an entire DPX sequence without affecting the image data. For MXF files, which have much more complex metadata than DPX, embARC supports single-file analysis. For this blog post, we’ll look at the CLI functionality for DPX files, but a more complete explanation, including MXF files, is available in the embARC CLI User Guide.
embARC CLI users can read and extract metadata from selected files. The output begins with a summary of the total files processed, the number that were in DPX format, and any non-DPX files found. The summary relies on triage format detection, so embARC reports when a non-DPX file is passed as an argument in place of a DPX file.
When users request to process a sequence of DPX files, the summary result also includes the results of nine custom tests that embARC runs to produce boolean (pass/fail) outputs. These tests look for file sequencing and duplication errors, as well as file truncation errors, and these are reported to the user.
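To illustrate the kind of pass/fail check described above, here is a minimal Python sketch that looks for gaps and duplicate frame numbers in a list of filenames. It is not embARC's implementation, and it does not reproduce the nine tests; the function name `check_sequence` and the filename pattern (a frame number immediately before the `.dpx` extension) are assumptions made for the example.

```python
import re

def check_sequence(filenames):
    """Return boolean (pass/fail) flags for gaps and duplicate frames.

    Assumes, hypothetically, that each name ends in a frame number
    before the .dpx extension, for example reel_0001.dpx.
    """
    frames = []
    for name in filenames:
        match = re.search(r"(\d+)\.dpx$", name, re.IGNORECASE)
        if match:
            frames.append(int(match.group(1)))
    unique = sorted(set(frames))
    # A contiguous sequence has as many distinct frames as its range spans.
    no_gaps = bool(unique) and (unique[-1] - unique[0] + 1 == len(unique))
    # Any repeated frame number shrinks the set relative to the list.
    no_duplicates = len(frames) == len(unique)
    return {"no_gaps": no_gaps, "no_duplicates": no_duplicates}

# Frame 3 is missing, so the gap test fails while the duplicate test passes.
print(check_sequence(["reel_0001.dpx", "reel_0002.dpx", "reel_0004.dpx"]))
```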
After the summary results, the output lists the file metadata for the single file that was processed, or a comparative metadata analysis for all files if a sequence was processed. The information is organized according to the standard SMPTE structures defined in the DPX specification (ST 268). The data is delivered in three columns: byte offset from byte 0, field name, and field value. Users can also have this data written to a target file in JSON or CSV format for use in other applications.
For sequences, the comparative analysis examines every metadata field and value in each file. It gives a quick view of the values that stay the same across the sequence and flags any field that holds more than one value, so that you can evaluate those fields in the CSV or JSON output if desired.
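The three-column layout can be pictured with a short Python sketch that reads a few fields from the start of a DPX file header and returns (byte offset, field name, field value) rows. This is an illustration only, not how embARC is written: the `FIELDS` table covers just five fields, the field names are paraphrased, and the offsets reflect our reading of the ST 268 file information header, so check them against the specification before relying on them.

```python
import struct

# (byte offset, field name, struct format) for a handful of fields.
# Offsets are assumed from the ST 268 file information header layout.
FIELDS = [
    (0, "Magic number", "4s"),
    (4, "Offset to image data", "I"),
    (8, "Version number of header format", "8s"),
    (16, "Total image file size", "I"),
    (36, "Image filename", "100s"),
]

def read_header_rows(header: bytes):
    """Return (offset, name, value) rows for the fields listed above."""
    magic = header[:4]
    if magic == b"SDPX":      # header written big-endian
        endian = ">"
    elif magic == b"XPDS":    # header written little-endian
        endian = "<"
    else:
        raise ValueError("not a DPX header")
    rows = []
    for offset, name, fmt in FIELDS:
        (value,) = struct.unpack_from(endian + fmt, header, offset)
        if isinstance(value, bytes):
            # Text fields are NUL-padded; keep only the leading text.
            value = value.split(b"\x00", 1)[0].decode("ascii", "replace")
        rows.append((offset, name, value))
    return rows

# Build a synthetic header buffer instead of reading a real file.
header = bytearray(768)
struct.pack_into(">4sI8sI", header, 0, b"SDPX", 2048, b"V2.0", 4096)
name = b"frame_0001.dpx"
header[36:36 + len(name)] = name

for row in read_header_rows(bytes(header)):
    print(*row, sep="\t")
```

The same rows could be passed to Python's `csv` or `json` modules to produce a file for use in other applications.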
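As a rough sketch of this comparison, assuming each file's metadata has already been read into a dictionary, the Python below separates fields with a single value across the sequence from fields with several. The function name `compare_fields` and the sample field names and values are made up for the example and are not taken from embARC.

```python
from collections import defaultdict

def compare_fields(per_file_metadata):
    """Split fields into static (one value) and varying (many values)."""
    values = defaultdict(set)
    for metadata in per_file_metadata.values():
        for field, value in metadata.items():
            values[field].add(value)
    static = {f: next(iter(v)) for f, v in values.items() if len(v) == 1}
    # Sort by string form so mixed value types do not raise an error.
    varying = {f: sorted(v, key=str) for f, v in values.items() if len(v) > 1}
    return static, varying

# Hypothetical two-frame sequence.
sequence = {
    "frame_0001.dpx": {"Creator": "Example Scanner", "Image filename": "frame_0001.dpx"},
    "frame_0002.dpx": {"Creator": "Example Scanner", "Image filename": "frame_0002.dpx"},
}
static, varying = compare_fields(sequence)
print("static:", static)
print("flagged:", varying)
```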
Another new feature we have added for CLI is the process of conformance checking for one or more DPX files. With the optional conformance checking flags, embARC allows users to submit a set of rules using a conformance JSON template (see example below, figure 3). embARC will evaluate one file or a sequence of files based on the submitted rules and will provide a summary conformance report in the terminal as well as a CSV list of test results for each file and each test evaluated.
The embARC conformance template is a JSON document with the following elements.
- Rules – an array that contains all Rule objects to be evaluated.
- Rule – each Rule is an object that contains a Column, Operator, and Value property.
- Column – this property specifies the metadata field to target for the Rule. See Appendix B for a controlled list of possible column values.
- Operator – this property specifies the evaluation operator for the Rule, for example Min, Max, Equals, or Contains. See Appendix B for a controlled list of possible operators.
- Value – this property provides the value that will be evaluated against for the particular Column and Operator in the Rule.
Following (figure 4) is an example valid conformance rule set:
The conformance document can contain as many Rules as needed as long as they follow the pattern consistently.
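The template elements described above can be sketched in Python as follows. Treat this as a hedged illustration and not as embARC's own logic or an official template: the exact key spelling and nesting, the column names, the values, and the reading of Min and Max as "at least" and "at most" are all assumptions. The controlled lists in the CLI user guide are the authority.

```python
import json

# Hypothetical rule set: column names and values are placeholders.
TEMPLATE = """
{
  "Rules": [
    {"Column": "Creator", "Operator": "Equals", "Value": "Example Archive"},
    {"Column": "Image filename", "Operator": "Contains", "Value": ".dpx"},
    {"Column": "Total image file size", "Operator": "Min", "Value": 2048}
  ]
}
"""

# Assumed meaning of each operator named in the post.
OPERATORS = {
    "Equals": lambda actual, expected: actual == expected,
    "Contains": lambda actual, expected: str(expected) in str(actual),
    "Min": lambda actual, expected: actual >= expected,
    "Max": lambda actual, expected: actual <= expected,
}

def evaluate(metadata, template_text):
    """Return one (column, operator, PASS/FAIL) row per rule."""
    rules = json.loads(template_text)["Rules"]
    results = []
    for rule in rules:
        actual = metadata.get(rule["Column"])
        test = OPERATORS[rule["Operator"]]
        # A missing field fails the rule instead of raising an error.
        passed = actual is not None and test(actual, rule["Value"])
        results.append((rule["Column"], rule["Operator"], "PASS" if passed else "FAIL"))
    return results

# Hypothetical metadata for one file; the size rule fails here.
metadata = {
    "Creator": "Example Archive",
    "Image filename": "frame_0001.dpx",
    "Total image file size": 1024,
}
for row in evaluate(metadata, TEMPLATE):
    print(",".join(row))
```

Keeping the operators in a lookup table mirrors the idea of a controlled list: a rule that names an unknown operator fails loudly with a `KeyError` and is not silently skipped.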
As a result of conformance testing, embARC provides a summary result in the terminal that includes a count of files tested and files failed, as well as a list of files that failed any test. Additionally, embARC outputs a CSV file containing a row for each test carried out and the result (PASS/FAIL).
Examples for using the CLI version of embARC, including MXF workflows, are available in Appendix C of the CLI user guide.
embARC GUI version now includes MXF!
The second major update is the inclusion of the MXF format in the embARC GUI (graphical user interface) version which, along with existing DPX functionality, now also supports the FADGI sponsored SMPTE RDD 48: MXF Archive and Preservation Format guidelines. Adding MXF functionality was a big ask as MXF is a data-rich and metadata-complex file format.
With this new expanded support in embARC, the user interface has changed so that users are now presented with a splash page to load files upon launching the application. Because embARC’s UI (user interface) supports DPX and MXF in different ways, the splash page gives the application an opportunity to identify the desired file format from the user before launching the full UI. Users can now load DPX or MXF (but not both at the same time!) and the system will select the appropriate UI to show.
In the new MXF UI, embARC supports reading of the following MXF file structures: track descriptors; AS 07 Core Descriptive Metadata Scheme (DMS); Text Data Generic Stream Partitions (GSP); and Binary Data GSP present in the MXF file. Clicking on different tabs will display the fields present in that file section. Note: In-depth explanations for these terms are available in SMPTE RDD 48: MXF Archive and Preservation Format in section 4, Acronyms and Terms.
embARC reads a variety of track “descriptors” or metadata about picture essences, audio essences, and data essences. The Descriptors section of the embARC GUI user guide lists all the currently supported descriptors. These descriptors are “read only” in the UI and cannot be edited.
embARC supports creating or editing a single AS 07 Core DMS for any supported MXF file. See Annex D of RDD 48 for more information about the Core Descriptive Metadata Scheme. If no AS 07 Core DMS is present, embARC will allow a user to embed one. If one is already present, embARC will read the existing data and allow a user to edit, delete, or add data to it.
Additionally, embARC supports reading and downloading stored text-data GSPs (such as XML-based supplementary metadata) and binary-data GSPs (such as still images) in supported MXF files.
These are newly available open-source features for working with MXF files that are not easily found in other tools. We are excited to share them with the community and look forward to continued input and collaboration as we move forward. Comments are always welcome!
Where to find embARC