
New File Format Research and Documentation on the Sustainability of Digital Formats

Today’s guest post is from Kate Murray, Liz Caringola, Genevieve Havemeyer-King and Liz Holdzkom of the Digital Collections Management & Services Division at the Library of Congress.


This is the eighth installment of our File Format Friends series! You can start at the very beginning with Fun with File Formats from December 2021 and work your way through to today. Let’s get caught up on what’s happening with the Sustainability of Digital Formats, where we love nothing more than documenting the stories behind your favorite file formats (like how to pronounce “gif” [FDD 133] and–spoiler alert–that you can pronounce it however you want).

New Format Descriptions and Analysis

The list of completed FDDs that were published between June 1, 2024 and May 31, 2025, as posted on the Sustainability of Digital Formats site.

Since our last update in December 2024, we’ve completed 13 new format description documents (FDDs). Several focus on audiovisual content, including PTX_PTF_PTS, Pro Tools Session Files (FDD 639); DAMF, Dolby Atmos Master File (FDD 646); and LOGICX, Logic Pro Project Format (FDD 640). A series related to MPEG-2 video came about in response to a new collection acquisition, so we completed the research for our custodial unit colleagues on M2TS, MPEG-2 Transport Stream for Blu-ray Discs (BDAV) and AVCHD (FDD 636); MPEG-TS, MPEG-2 Transport Stream (FDD 635); and MPEG-PS, MPEG-2 Program Stream (FDD 637) to help them make decisions in their processing workflows.

Another group of new entries relates to legacy Apple disk imaging formats. WOZ, WOZ Disk Image (FDD 642), and A2R, Apple II Flux Disk Image (FDD 643), complete the trio of Applesauce-supporting formats alongside MOOF, MOOF Disk Image (FDD 612). Fun fact: the WOZ format was named after Apple co-founder Steve Wozniak. ReActiveMicro’s Applesauce entry says, “The original proof of concept image file extension was ‘.A2D’. However at KFEST 2017 [KansasFest – the largest and longest running annual Apple II conference], Jason Scott made a request…that the extension be changed to ‘.WOZ’ as an homage to Wozniak.”

We also continue to expand our email format expertise with a new entry about HEML, HTML Email Markup Language (FDD 638), as well as our coverage of serialization formats with both YAML (the recursive acronym for YAML Ain’t Markup Language, previously known as Yet Another Markup Language and pronounced to rhyme with “camel”) and CBOR, Concise Binary Object Representation (FDD 647). CBOR is pronounced “sea boar” and, according to the IETF 87 Proceedings, is not in fact named after co-creator Carsten Bormann, although the acronym fits. It may be seeing a lot of new adoption because it is the recommended core format for Coalition for Content Provenance and Authenticity (C2PA) assertion data, which helps document and verify the authenticity and provenance of media, including content that is created or impacted by artificial intelligence (AI).

Speaking of AI and machine learning (ML), the new entry for PyTorch, PyTorch Serialized File Format (FDD 644), describes the format used for PyTorch models, which are used for neural network development, natural language processing, computer vision, and reinforcement learning.

Our final new entry is for PAR_Family, Parity Volume Set File Format Family (FDD 634). Funny story about this one. One of our Library of Congress custodial units contacted us about a new collection with “.par” files, so we began our research. It turns out that there are several very different file types with this same extension. And, as luck would have it, another custodial unit also had some “.par” files. But they are not the same type of “.par” files! Our FDD focuses on the Parity set, which is “an open-source recovery file format designed to accompany a set of files intended for transfer or storage, verify the data integrity of those files, and if needed, repair corrupted files or reconstruct missing files.” But, as explained in Tyler Thorsted’s blog post on the topic, there are many file formats that use .par, or variations of .par, as a file extension, including Apache Parquet (.parquet) (FDD 575), Solid Edge 3D models (.par), DVD Studio Pro parse files (.par), and Reflexw data-format (.par). We very much enjoyed solving this file format mystery with our colleagues both internal to the Library of Congress and in the wider digital preservation community.
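For readers who run into the same ambiguity, here is a minimal sketch in Python of telling look-alike files apart by their leading bytes rather than by extension. It is an illustration only, not the method used in the FDD research. It covers just two signatures: the "PAR2\0PKT" bytes that open a PAR2 packet and the "PAR1" bytes that open an Apache Parquet file. The function names and labels are hypothetical, and the other “.par” formats mentioned above are not covered, so anything else is reported as unknown.

```python
# Leading-byte signatures (assumption: only these two formats are checked).
# PAR2 packets begin with the 8 bytes "PAR2\0PKT"; Apache Parquet files
# begin with the 4 bytes "PAR1".
SIGNATURES = {
    b"PAR2\x00PKT": "Parity Volume Set (PAR2)",
    b"PAR1": "Apache Parquet",
}


def guess_par_type(header: bytes) -> str:
    """Return a label for the first bytes of a file, or "unknown"."""
    for magic, label in SIGNATURES.items():
        if header.startswith(magic):
            return label
    return "unknown"


def guess_par_type_from_file(path: str) -> str:
    """Read the first 8 bytes of the file at `path` and classify them."""
    with open(path, "rb") as handle:
        return guess_par_type(handle.read(8))
```

A result of "unknown" only means that neither signature matched. Confirming one of the other formats would still take format-specific research of the kind described above.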

Documenting Digital Accessibility Features

In support of the yearly update of the Recommended Formats Statement (RFS), we completed a project to document digital accessibility support for formats listed as “acceptable” in the RFS. The June 2024 blog post, More Formats and More About Formats: New Entries, Format Accessibility Features and Other Updates, explains the details of this work. To summarize that post, we use the Self Documentation sustainability factor to document a format’s potential support for features such as tags; structured data and alt text for screen readers; and the ability to support captions, subtitles and audio description of key visual elements in a video program for viewers who are blind or have a visual impairment, or simply prefer to have the option.

It’s important to note that the RFS does not require these accessibility features to be enabled for a format. Our additions provide information on the format’s capacity to support these features, which is one of the evaluation factors used to help RFS Content Teams determine whether a format is preferred or acceptable under the RFS guidance.

We completed updates on the 40 or so acceptable formats listed in the RFS across all content categories and we continue to update the monthly report available on Documenting Accessibility Features.

New Work for 2025 and Beyond

We have a variety of new projects planned for 2025 and forward, including a focus on formats related to AI and ML. We’re also excited to explore the new EA-PDF format, which is a profile of PDF specifically designed to support email archiving. For direct links to all our new FDDs and to see what we have planned for the rest of 2025, take a look at our draft workplan which includes a publication log.

The list of FDDs that are planned to be added between June 2025 and May 2026, as posted on the Sustainability of Digital Formats site.

As we say often on the Sustainability of Digital Formats site, comments welcome!
