Check out this link discussing work by NARA and the NCAST Advanced Research Partners at the National Center for Supercomputing Applications (NCSA) related to the Data Format Description Language (DFDL). DFDL is an XML-based language for describing the format, structure, and metadata of a file in such a way that the file's content can be viewed without the software that created it or an existing viewer. As Mark Conrad states in the blog post, “This research may be useful to NARA in addressing the problem of providing access to electronic records that are stored in thousands of different formats as those formats become obsolete over time.”
http://blogs.archives.gov/online-public-access/?p=3618
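
To give a rough feel for the idea behind a format description language, here is a minimal Python sketch. It is not DFDL and does not use DFDL syntax; the record layout, the field names, and the `parse_record` helper are all made up for illustration. The point is only that once a file's layout is written down as data, a generic reader can decode the bytes without the program that originally wrote them.

```python
import struct

# Hypothetical description of a binary record, written as plain data.
# This is an analogy only; real DFDL descriptions are XML schemas.
RECORD_DESCRIPTION = {
    "byte_order": "<",  # little-endian, no padding
    "fields": [
        ("record_id", "H"),    # unsigned 16-bit integer
        ("temperature", "f"),  # 32-bit float
        ("label", "4s"),       # 4 bytes of ASCII text
    ],
}


def parse_record(data, description):
    """Decode raw bytes using only the declarative description."""
    codes = "".join(code for _, code in description["fields"])
    values = struct.unpack(description["byte_order"] + codes, data)
    record = {}
    for (name, code), value in zip(description["fields"], values):
        if code.endswith("s"):
            value = value.decode("ascii")  # turn text fields into str
        record[name] = value
    return record


# Stand-in for a file written by some long-gone application.
sample = struct.pack("<Hf4s", 7, 21.5, b"NARA")
print(parse_record(sample, RECORD_DESCRIPTION))
```

The reader here knows nothing about the record beyond what the description tells it, which is the property that makes such descriptions attractive for long-term access to obsolete formats.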