It’s very difficult to recruit people to work on open source. In fact, it’s so difficult that I wonder if open source can survive in the future. Lots of people have made small contributions to Exiv2; however, only a handful have made a sustained effort. Furthermore, contributors can disappear for months with no indication of their intentions. I’m not criticising anybody for how they behave; however, it makes planning and scheduling simply impossible. When folks are paid in an office, you can reasonably expect them to turn up regularly and to accept assigned tasks. This model is invalid in open source.
I believe the large open-source projects such as Apache, Clang and Mozilla employ engineers to undertake the work. I don’t know how they are funded. However, when paychecks are on offer, recruitment in the market is possible.
A modest project such as Exiv2 has no money. In fact, I personally pay the expenses, such as hosting the website and running a build server on a Mac Mini.
The major recruitment success came when Dan and Luis arrived in the summer of 2017. We adopted GitHub in April 2017 when Exiv2 v0.26 was released. I wondered if the move to GitHub had increased the project’s visibility and whether Dan and Luis would be the first of many contributors. Four years later, we have enjoyed contributions from Alex, Kev, Leo, Miloš, Rosen and two Peters.
The OpenHub report provides interesting insight. Andreas was the top contributor until I overtook him in 2020. OpenHub does not monitor the book or the release scripts; my contributions to those parts of the project put me well ahead of Andreas.
A boss once asked me “Do you know the 80/20 rule? 80% of the project is done by 20% of the people!”. For sure, this is true in Open Source.
So, how are contributors recruited? The answer is I don’t know. For sure, I appreciate the work done by Andreas, Luis, Dan, Neils and about 20 other wonderful people. Curiously, I’m not aware of any lady contributors. I only recall one support question asked by a lady.
11.18 Scheduling
This is a major and important topic. Apart from writing code, I’ve spent more time thinking about project scheduling than any other aspect of Software Engineering.
There are two worlds. There is the perfect world inhabited by management. They live in a world which is quite different from mine. In their world, the specification is clear, the schedule is realistic, nobody makes mistakes and everything works. It’s a wonderful place. Sadly, I’ve never had the good fortune to live in that world.
I worked in a company which, to hide their identity, I will call “West Anchors”. A colleague was giving a board presentation in which they had a slide:
It is the Policy of West Anchors ® to get it right first time, every time.
There we have it. Nothing is ever late, nothing is more difficult than expected, all suppliers deliver on time. Every specification is faultless. All modules work together perfectly. Nobody is ever sick.
When I discussed the project schedule with my boss, I asked him why there was no time in the schedule for bugs and fixes. His response was “There better not be any”. Five years later, West Anchors were closed by their owners. Presumably the owners were bored by perfection.
I also had the misfortune to work at a company where the boss was an expert in planning. He explained to me that the only challenge in Software Engineering was getting the schedule right. Everything else was trivial.
So, if you live in the perfect world, you’ll not find anything interesting or useful in this part of the book, because I’m talking about the less than perfect world in which I live. I usually call it Reality.
Another challenge is that many users pretend to live in the mythical world where everything works. Those users are hostile to open-source projects populated by people who live in my depressing world of Reality.
Scheduling an open-source project is almost impossible. You are dealing with volunteers. You might think you know the volunteers, however you don’t. It’s unusual to have even met the people. How can you understand the pressure and stress in another person’s life when you know so little about their circumstances? And remember: they are volunteers. They can walk off the job if they wish. In a business, management have tools such as reviews, salary, careers, vacations, bonuses, promotions and lay-offs to manipulate the employees. In open source, you have none of those tools.
Here are my thoughts about how to solve the scheduling problem.
The Problem
The problem is really simple to state: how do we plan large projects and deliver them on time and on budget?
The state of the game
Currently, planning is based mostly on PERT and its descendant technology. Products such as Microsoft Project are designed to schedule resources and tasks. Indeed, this works for some projects and fails hopelessly for others.
When London bid for the 2012 Olympic Games, the budget was $3 billion. The final cost has been stated as $20 billion. I have no data to say whether there were other costs, such as policing, which are not included in the $20 billion.
This is rather common. The aircraft carriers HMS Queen Elizabeth and HMS Prince of Wales are other high-visibility projects in which the plan and reality were very different.
The reason for cost over-runs is that new work items are required that were not known early enough to be in the plan. We cannot know what we do not know. However there may be a way to calculate the size of the unknowns at the beginning of a project.
A project is recursive and requires recursive handling
When a project is simple - for example painting your house - it is possible to measure the size of the task and calculate the quantity of materials and labour required. This method works fine for a well defined project with quantifiable tasks.
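For example (with invented numbers): 200m² of wall, two coats, and paint that covers 10m² per litre requires 2 × 200 / 10 = 40 litres; at a painting rate of 8m² per hour, that is roughly 50 hours of labour.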
However, if you want your house to be painted in an extraordinary way, this method totally fails. Think of Michelangelo in the Sistine Chapel in Rome. Michelangelo and the Pope came close to blows in a 20-year struggle to produce one of the wonders of man’s creativity. Nobody gives a hoot today about the cost. Nobody cares about how long it took. Nobody can understand why the customer and the contractor were divided over something as trivial as money.
The reason for the cost over-runs is that the project is recursive. In a simple project, you have a sequence (or connected graph) of tasks:
| | Task | Task | Task | |
|---|---|---|---|---|
| Begin | Remove furniture | Apply N litres of paint | Restore furniture | Done |
If the project had many rooms (say 10-20), you have to schedule resources (people). You have a finite set of painters, and you may have more than 1 team of painters. However the basic linear model is not affected.
When you scale to painting something large (like an Aircraft Carrier), two items rapidly emerge to invalidate the simple model.
1) Requirements Change
The Aircraft Carrier requires stealth paint that hasn’t been invented.
2) The paint task is large
You require training and inspection services to manage quality.
And many other things arrive which were not in the simple model. In the worst case, new tasks can be larger than the original task. You have an exploding, complex challenge.
To deal with this, you have to start a project inside the project. Something like “Remove furniture” is obvious in a house. But what would that mean on an Aircraft Carrier?
So, we stay calm and add more items to the project plan. And that’s when and why everything goes wrong. The plan gets longer and more detailed. However it’s still the same old linear model.
My observation is that the project is an assembly of projects. As you develop the project, every line item in the simple model is a project. And then there are projects inside the projects. For example if special paint is required for the Aircraft Carrier, that task is probably a complex network of projects involving chemistry, special machines to apply the paint, and maintenance processes for the ship in service.
What does this have to do with Fractals?
Everything. A project is a recursive entity that must be handled with recursive techniques. Fractal Geometry deals with recursion.
As a retired Software Engineer, I have often been told: “The project is late because you (the engineer) did not itemise the project properly at the outset.”. Wrong. It’s the inflexible PERT model that cannot handle recursion.
The State of Project Planning Today
The software industry has a huge and sad collection of projects which have come off the rails. If the bosses had known at the outset, things would have been different. There are two things we care about passionately:
- How long is this going to take?
- How much is this going to cost?
Notice, we don’t get overly bothered about what “it” is. We care about time and money.
If we do not know about the special paint for the Aircraft Carrier, are we upset? No. However the cost and schedule damage is painful for everybody involved.
Now, we can’t know what we do not know. Is there a way to calculate the size of the unknowns? There might be, as I will explain.
When you plan a project, you ask “How long did it take to do the last one?”, take inflation into account, and apply optimism: “We won’t make the same mistakes again.”. This is very bad thinking. The United Kingdom had not built an aircraft carrier for almost 40 years. Most of the engineers working on HMS Queen Elizabeth were not born when HMS Invincible sailed to the Falklands.
A whole collection of project planning tools are now used in the software industry. Together they go under the banner: “Agile” or “Scrum”. Scrum imposes a regime of meetings and reviews on the project team. Several of these techniques are interesting.
| Measure | Description | Observation |
|---|---|---|
| Story Points / Task Size / Task Poker | The size of a task is not 1, 2, 3, 4, 5 as difficulty increases. Scrum uses the Fibonacci series: 1, 2, 3, 5, 8, 13, 21 ... so big tasks rapidly increase their allocation of resources and time. | The Fibonacci series is recursive: X(n) = X(n-1) + X(n-2), where X(1) = X(2) = 1. |
| The Sprint (step-wise linear) | Scrum says “we can’t plan everything at once; however, we can complete well-defined tasks on sprint schedules (typically 2 weeks)”. | I don’t know how Scrum deals with tasks that are longer than one sprint. |
| Velocity | The team velocity (average story points completed over the last 3 sprints) is monitored and used to verify that the team is being neither optimistic nor pessimistic in their determination of story points for tasks. | Velocity is not predicted; it is measured from project performance. In a nutshell, it is recursive. |
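The recursion is easy to see in code. Here is a minimal sketch (my own illustration, not part of any Scrum tool; the sprint history is invented) which prints the story-point scale and computes the velocity:

// story_points.cpp - illustrative sketch of the Fibonacci scale and velocity
#include <iostream>
#include <vector>

int main()
{
    // Story points follow X(n) = X(n-1) + X(n-2) with X(1) = X(2) = 1
    std::vector<int> scale { 1, 1 };
    while ( scale.back() < 100 ) {
        auto n = scale.size();
        scale.push_back(scale[n-1] + scale[n-2]);
    }
    for ( int points : scale ) std::cout << points << ' ';
    std::cout << std::endl;                        // 1 1 2 3 5 8 13 21 34 55 89 144

    // Velocity = average story points completed over the last 3 sprints
    std::vector<int> completed { 13, 21, 17, 25 }; // invented sprint history
    auto n = completed.size();
    double velocity = (completed[n-1] + completed[n-2] + completed[n-3]) / 3.0;
    std::cout << "velocity = " << velocity << std::endl; // 21
    return 0;
}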
However, Scrum has a fatal weakness: nobody knows the size of the total project. The model is fundamentally inadequate because it is a monitoring tool, not a predictive one.
Can we have a single unified model for project planning?
I believe there’s a measure in Fractals called “Roughness” which measures some feature of the recursive item. If you measure the roughness of animal lungs (which are, of course, recursive), it is about the same in elephants, humans and mice. A value of 1.0 implies that the object is perfectly smooth. Higher values represent the chaotic nature of the item.
I think it’s possible to measure the roughness of past projects in addition to their historical performance. So, although we have never built HMS Queen Elizabeth, we could know from other naval projects:
- How much paint per painter per unit of time (the only measure in Microsoft Project)
- The roughness of painting naval ships (the projects hiding inside the project)
Both are required to estimate the size of the task. PERT models assume a roughness of 1.0, and that is why they fail on large projects. No large project has a roughness of 1.0.
So how can we use this?
We need to do three things:
- Add roughness to every item in the project plan
- Collect data to estimate roughness
- We need a pot of time and money, which I call Contingencies
Contingencies are a percentage of the whole project, to be used to assign resources as sub-projects emerge. Every item in the project should have a contingency from which additional resources can be allocated. This is non-confrontational and does not require blame and finger-pointing. We knew about the roughness and must plan for it.
In the past, I have applied contingencies in broad brush strokes to the complete project. If the project is similar to the last one, contingencies are 10%. If the project involves many unknowns, perhaps they are 300% of the project.
An ex-boss favoured 414.159%: a factor of 3.14159 to walk around the object, plus 1.0 for the task itself! The point is that when you do something for the first time, you will spend time doing work that is subsequently abandoned. Nothing new can be achieved without trial and error.
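Here is a minimal sketch of how a roughness-weighted plan might look. This is my own illustration: the task names, hours and roughness values are invented. Each task carries a roughness multiplier, and tasks recursively contain the projects hiding inside them:

// estimate.cpp - illustrative sketch of recursive, roughness-weighted estimation
#include <iostream>
#include <string>
#include <vector>

struct Task {
    std::string       name;
    double            baseHours; // the PERT-style estimate
    double            roughness; // 1.0 = perfectly smooth, higher = more chaos
    std::vector<Task> subTasks;  // the projects inside the project
};

// PERT implicitly assumes roughness == 1.0 for every task
double estimate(const Task& task)
{
    double hours = task.baseHours;
    for ( const Task& sub : task.subTasks ) hours += estimate(sub);
    return hours * task.roughness;
}

int main()
{
    Task paint { "paint the carrier", 10000, 1.5,
               { { "invent stealth paint",    4000, 4.14159, {} }, // pi + 1.0
                 { "training and inspection", 2000, 1.1,     {} } } };
    std::cout << estimate(paint) << " hours" << std::endl; // ~43149, not 10000
    return 0;
}

With a roughness of 1.0 everywhere, the estimate collapses to the familiar PERT sum of 16,000 hours; the roughness is the size of the unknowns.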
Research is required to measure task roughness in past projects to validate this approach.
Other serious limitations with PERT
I only intend to investigate the use of fractals in planning, and to ignore other serious limitations of PERT such as:
| Trouble | Observation |
|---|---|
| PERT assumes that you can itemise and quantify every task in the project. | If you are investigating something new, you can probably do neither of these things. |
| Many projects cannot be quantified. | Why isn’t PERT used: 1. In crime investigation. 2. In medical treatment. 3. In investment management. |
| You will do abortive work and encounter road blocks. | Program Managers never plan time for this. Every innovative project has abortive work. |
| People are not interchangeable. | People leave, or are assigned to other projects. New team members require time to come up to speed with the project. |
| Some tasks have a gestation period. | If you are having a baby, more women won’t reduce the 9 month wait. Adding people is often counter-productive. |
| Management, and other project stakeholders, change goals and objectives. | The circumstances surrounding the project can change and have major implications for the project. |
Because of the recursive nature of projects, there are serious limitations hiding inside these limitations.
So what am I going to do about this?
When I retired, I was thinking about doing a PhD in this area and thought it might take 10,000 hours over 5 years. The only tasks that I could define in 2014 were:
- Write an outline of the project
- Find a University willing to mentor/supervise the effort
- Learn all about Fractals
- Research and publish a thesis
- Graduate
What is the roughness of these tasks? Unknown. Graduating is simple. Or is it? Do I need a new kilt? Who’s going to attend? Where will everybody stay? Even little tasks can grow into projects.
One thing is certain: getting a better approach to project estimation is of enormous importance. We have to do better. I have tried to set out here an area of investigation that is worthy of attention.
Final words about this. I didn’t undertake a PhD. Instead I have spent 10,000 hours working on Exiv2. This book is my thesis. The presentation at LGM in Rennes would have been my defence. My reward is to know that I’ve done my best.
11.19 Enhancements
I’m not sure there is anything very interesting to be said about this. Requests come in very different sizes. For example, adding recognition for one lens may only require one line of C++, a test image and a 10-line python test script. This is straightforward and can be fixed within hours. At the other extreme is the request to support BMFF files. That project involved research, code, test, build and documentation changes. And to make it even more difficult, the Community challenged the legality of providing the feature. The feature took one year to complete. Probably 500 hours were spent on legal discussion.
In principle, anybody can develop a feature and submit a PR. In reality, this seldom happens. When this does happen, the effort required by me and the developer is often about the same. So, code offered in a PR often doubles my work-load.
11.20 Tools
Every year brings new tools. For example: CMake, git, Markdown, Conan and C++20. One of the remarkable properties of tools you have never used is that they are perfect, bug free, and solve all known issues. Until you use them. Or so I am told.
I had an issue with the release bundles for Exiv2 v0.26. My primary development platform is macOS. Remarkably, the version of tar shipped by Apple puts hidden files into bundles to store extended file attributes. I didn’t know this until the bundles shipped and a bug report appeared. You cannot see those files on macOS, because tar on macOS recreates the extended attributes. However, there were thousands of hidden files in the source bundle on Linux. I recreated the bundles as Exiv2 v0.27a and shipped them. There is an option to suppress this; I believe it is: TAR_WRITER_OPTIONS=--no-mac-metadata.
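If I have remembered correctly, the fix looks something like this using the command-line flag. The bundle name is illustrative, and --no-mac-metadata is specific to Apple’s bsdtar. The stray entries are easy to find on Linux because AppleDouble files begin with ._
$ tar --no-mac-metadata -czf exiv2-0.26.tar.gz exiv2-0.26/
$ tar -tzf exiv2-0.26.tar.gz | grep '\._'   # should print nothing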
Case closed. Except for very critical emails about changing bundle checksums.
For v0.27 we adopted CMake to do the packaging. Very nice. Works well. Guess what? CMake’s packaging is CPack, and it produces .tar.gz files which have these hidden files. Several people emailed to say “You wouldn’t have this problem if you used CPack.”. 100% wrong. It is a known, documented issue in CPack; the issue resurfaced precisely because we used CPack. Additionally, we had three release candidates for v0.27, published on 27 October, 15 November and 7 December 2018. v0.27 shipped on 20 December and the bug report arrived the day after Christmas Day.
I rebuilt the bundles as Exiv2 v0.27.0a and shipped them on 2 January 2019. I updated the build script to ensure that source bundles are created on Linux.
Please understand that I have nothing against using new tools. However, most of the hype surrounding new tools is nonsense. This has been studied: there are five stages in adopting new tools.

There is one recent tool which has surprised and pleased me. I have written this book using Markdown and am very pleased with the experience. As Americans say, “your mileage may vary!”.
11.21 Licensing and Legal
Licensing is a legal minefield. Exiv2 is licensed under GPLv2. Until Exiv2 v0.26, Andreas offered a commercial license for Exiv2. The contract between Andreas and users is not the concern of the Exiv2 open-source project.
In the days of the Commercial license, I made no distinction between open-source and commercial license users when it came to dealing with support and other requests. I felt that the commercial license freed the user from the obligations of GPL. However, it did not provide priority support, enhancement requests or any other benefit.
The general subject of the legality of Exiv2 hasn’t been explored. There has been an enormous discussion about the legality of reading BMFF files. See https://github.com/Exiv2/exiv2/issues/1229.
The BMFF legal issue has caused me to wonder if Exiv2 is legal. I also wonder if any open source is legal! What makes something legal or illegal? Is everything legal until there is a law which declares it as illegal, or everything illegal until permitted by legislation? I suspect everything is legal until there is a legal precedent such as legislation or a court ruling to the contrary.
Dealing with legal matters is not like reporting a bug. Exiv2 is an open-source project and we get a regular stream of issues reported on https://github.com/exiv2/exiv2. I acknowledge, investigate, reply and close the issue. By design, the process is focused on resolution. Legal processes are very different. When you ask for legal advice, you are instigating an open-ended process which will expand endlessly.
Closely related to licensing are patents. There is a patent non-aggression group called OIN, the Open Invention Network. Members agree to give toll-free use of their patents to other members. They have an impressive list of members including Facebook, Google, SAP, Microsoft, IBM and Canon. They also maintain the Linux System Definition. https://openinventionnetwork.com
It makes sense for Exiv2 to belong to OIN and be included in the Linux System Definition. The process to join is to fill in a form and sign it. And that’s a problem. What is the legal status of Exiv2? I don’t know. When you have a child, you register the birth and the child is recognised in law as a legal entity. If you register a company, it is a legal entity with a board of directors. Board members can sign documents on behalf of the company. However, open source is neither of these things. It’s a collection of source on a server. I think it belongs to the authors. According to openhub.net there are 85 contributors. Presumably I need the consent of all 85 contributors. Even if I could contact all 85, it’s possible that somebody would withhold their permission to join OIN.
There is reason to believe it’s not possible to get everybody to agree that Exiv2 should join OIN. Andreas used to offer a commercial license for Exiv2: in return for a modest fee, companies could ship products using the library and be free of the obligations of GPLv2. I had the idea of asking Google to purchase the commercial rights from Andreas. At least one team member objected because they didn’t want anybody to “own” their work. With some thought and discussion in my dining room, the team reached the conclusion that the Exiv2 Open Source Project was under no obligation concerning the Commercial License, and a “Position Paper” was written. I had the good fortune on vacation to meet a lawyer who specialised in Intellectual Property. I sent him the “Position Paper” and he concurred with our reasoning. So, we were able to release the project from the Commercial License with no consequence for Team Exiv2.
I was surprised by the Team’s resistance to the idea of asking Google to buy the commercial rights. My conclusion is that unless you have the contributor’s support, you should assume that you do not have their support. I don’t intend to sign the OIN License. I give permission for a future maintainer to sign the OIN License on my behalf.
11.22 Back-porting
I believe there are some folks maintaining back-ports of Exiv2. Our friend Pascal works on darktable and has back-ported many features and fixes. Thank You, Pascal for undertaking that chore.
I have to say that the inertia of the Linux Distros is considerable. It can take several years for new releases to arrive on the platform. I don’t know anything about the distros and I’m not going to judge why it is so sluggish.
11.23 Partners
Without question, dealing with some other projects which use Exiv2 has been very difficult. Folks who have adopted Exiv2 in their product may feel they are entitled to make enhancement requests, demand fixes, superior support and other privileges. In a nutshell, they feel entitled. They are not. They are entitled to the same as all other stakeholders. No more. No less.
11.24 Development
As this is the first and last book I will ever write, I’d like to close the discussion of Project Management with some thoughts and opinions about how software is developed. Management have been searching for the silver bullet that will cause projects to deliver on time, to budget, with great performance, few bugs and low cost maintenance. This search has been proceeding for more than 50 years. We’ve made some progress. However system complexity out-strips our management and control tools. The challenges are immense.
I’ve seen different approaches used. In the IT world, people involved in system development adopted and modified the drawing office model. In the drawing office, you have draftsmen working on drawing boards and engineers working at desks. The engineers did the design and the draftsmen drew it. The systems analyst was the designer and the programmers created the code. They worked in a strict regime of SSADM - the Structured Systems Analysis and Design Method. This is often called “The Waterfall Method”. It’s horrible: inflexible, slow and very expensive. It’s amazing that anything can be delivered this way.
When I worked at West Anchors, the analysts promoted all the programmers to programmer/analyst. So the programmer had to do the programming and the work of the analyst. This enabled the analyst to concentrate on office politics. The QE team at West Anchors didn’t test anything. They approved the test plans written by the programmer/analyst and they inspected the test logs required to prove that the programmer/analyst had done all the work. The parrot phrase of everybody who wasn’t a programmer/analyst was “I’m not technical” which meant “I’m not going to help you, so don’t ask. And, by the way, I am superior to you and you will do exactly what I tell you to do.”.
Before I retired, the circus started adopting Scrum. Loads of meetings. The project is divided into two-week sprints. There were two days of review meetings at the end of every sprint. Two days of planning meetings at the start of every sprint. Daily stand-up meetings which were usually about 1 hour. And I’m sure I’ve forgotten other pointless meetings. Sometimes people say they are agile. I haven’t figured out what that is. I think it’s some kind of “Let’s not bother looking ahead. It’ll be great if or when it’s delivered.”. And of course, all software development engineers (except me) are geniuses who create perfect code and therefore have no reason to document or help lesser co-workers.
In the last 10 years we have seen AI move out of the lab and into our homes, cars and phones. Probably 50% of code development time is spent on test related activity. Perhaps in future, AI will undertake more of that work. Remember it works 7x24, never takes a vacation and works very quickly. I have high hopes that AI can be used to automate testing in future. However, all coins have two sides and the AI may drown the engineer with very obscure bugs. In some way, we see this with CVEs discovered by automatic fuzzing libraries.
There is a method of developing code that works for me and that’s to do everything myself. This model doesn’t scale. However it is effective. Do I create bugs? Of course, I do. However I find and fix them. Many of the best people with whom I worked in Silicon Valley use this approach. And when I think about it, that’s exactly how Andreas created Exiv2.
Another method that I believe is very effective is prototyping. Working in a sand-box with a small amount of code is a great way to explore and learn. I can say with certainty that I have learned more about metadata in 12 weeks by writing this book than I discovered by working on the Exiv2 code for 12 years. Program Management people hate prototyping because it doesn’t have a specification, milestones, deliverables or a schedule.
If you have good folks on the team, the development will be enjoyable and the results will be good. However, Software Development in a large, chaotic company such as West Anchors is Russian Roulette with a bullet in every chamber. Good Luck. I’m happy to be retired.
12 Code discussed in this book
The latest version of this book and the programs discussed are available for download from:
svn://dev.exiv2.org/svn/team/book

To download and build these programs:
$ svn export svn://dev.exiv2.org/svn/team/book
$ mkdir book/build
$ cd book/build
$ cmake ..
$ make

I strongly encourage you to download, build and install Exiv2. The current (and all earlier) releases are available from: https://exiv2.org.
There is substantial documentation provided with the Exiv2 project. This book does not duplicate the project documentation, but complements it by explaining how and why the code works.
tvisitor.cpp
The syntax for tvisitor is:
$ tvisitor
usage: tvisitor -{ U | S | R | X | C | I }+ path+
$

The options are:
| Option | Exiv2 Equivalent | Meaning | Description |
|---|---|---|---|
| U | --unknown | Unknown | Reports unknown tags |
| S | -pS | Structure | This is the default |
| R | -pR | Recursive | Descend as deeply as possible |
| X | -pX | XMP | Report XMP |
| C | -pC | Icc | Report ICC Profiles |
| I | -pi | Iptc | Report IPTC Data |
The options can be in any order and undefined characters are ignored. The most common option that I use is -pR which is equivalent to exiv2’s -pR. It could be simply stated as tvisitor -R foo. In the test harness, I use -pRU.
The option U prints Unknown (and known) tags. An unknown tag is an item of metadata for which tvisitor does not know the name. A TIFF ‘tag’ is identified by a 16-bit integer, and these are defined in the TIFF-EP and Exif specifications. The MakerNote tags are not standardised, and unknown tags are reported in the following style:
address | tag | type | count | offset | value
382 | 0x003b Exif.Nikon.0x3b | RATIONAL | 4 | 1519 | 256/256 256/256 256/256 256/256

The option S prints the Structure of the file. For example, using the file https://clanmills.com/Stonehenge.jpg, we see the structure of the JPEG file:
$ tvisitor -pS ~/Stonehenge.jpg
STRUCTURE OF JPEG FILE (II): /Users/rmills/Stonehenge.jpg
address | marker | length | signature
0 | 0xffd8 SOI
2 | 0xffe1 APP1 | 15288 | Exif__II*_.___._..._.___.___..._.___.___
15292 | 0xffe1 APP1 | 2610 | http://ns.adobe.com/xap/1.0/_<?xpacket b
17904 | 0xffed APP13 | 96 | Photoshop 3.0_8BIM.._____'..__._...Z_..%
18002 | 0xffe2 APP2 | 4094 | MPF_II*_.___.__.._.___0100..._.___.___..
22098 | 0xffdb DQT | 132 | _.......................................
22232 | 0xffc0 SOF0 | 17 | ....p..!_........
22251 | 0xffc4 DHT | 418 | __........________............_.........
22671 | 0xffda SOS | 12 | .._...._?_..
END: /Users/rmills/Stonehenge.jpg
$

The option R performs a Recursive descent of the file and dumps embedded structures, such as the TIFF which contains the Exif metadata. It also descends into IPTC data and ICC Profiles.
$ tvisitor -pR ~/Stonehenge.jpg
STRUCTURE OF JPEG FILE (II): /Users/rmills/Stonehenge.jpg
address | marker | length | signature
0 | 0xffd8 SOI
2 | 0xffe1 APP1 | 15288 | Exif__II*_.___._..._.___.___..._.___.___
STRUCTURE OF TIFF FILE (II): /Users/rmills/Stonehenge.jpg:12->15280
address | tag | type | count | offset | value
10 | 0x010f Exif.Image.Make | ASCII | 18 | 146 | NIKON CORPORATION
22 | 0x0110 Exif.Image.Model | ASCII | 12 | 164 | NIKON D5300
....
END: /Users/rmills/Stonehenge.jpg:12->15280
15292 | 0xffe1 APP1 | 2610 | http://ns.adobe.com/xap/1.0/_<?xpacket b
17904 | 0xffed APP13 | 96 | Photoshop 3.0_8BIM.._____'..__._...Z_..%
STRUCTURE OF 8BIM FILE (MM): /Users/rmills/Stonehenge.jpg:17922->78
offset | kind | tagName | len | data |
0 | 0x0404 | PSD.8BIM.IPTCNAA | 30 | 12+1 | ..__._...Z_..%G..__._...x_.___
STRUCTURE OF IPTC FILE (MM): /Users/rmills/Stonehenge.jpg:17922->78:12->39
Record | DataSet | Name | Length | Data
1 | 0 | Iptc.Envelope.ModelVersion | 2 | _.
1 | 90 | Iptc.Envelope.CharacterSet | 3 | .%G
2 | 0 | Iptc.Application.RecordVersion | 2 | _.
2 | 120 | Iptc.Application.Caption | 12 | Classic View
END: /Users/rmills/Stonehenge.jpg:17922->78:12->39
END: /Users/rmills/Stonehenge.jpg:17922->78
18002 | 0xffe2 APP2 | 4094 | MPF_II*_.___.__.._.___0100..._.___.___..
22098 | 0xffdb DQT | 132 | _.......................................
22232 | 0xffc0 SOF0 | 17 | ....p..!_........
22251 | 0xffc4 DHT | 418 | __........________............_.........
22671 | 0xffda SOS | 12 | .._...._?_..
END: /Users/rmills/Stonehenge.jpg
$

There is no plan to write a man page for tvisitor because it has a 200-page book! tvisitor isn’t intended for production use; it has been written to explain how Exiv2 works. In less than 4,000 lines of code it decodes the metadata in all formats supported by Exiv2, plus the BMFF formats .CR3, .HEIC and .AVIF. Additionally, it supports BigTIFF, extended JPEG, dumping ICC profiles and many other features which are not supported in Exiv2.
tvisitor is currently being tested using more than 10,000 images harvested from ExifTool, raw.Pixls.us, RawSamples.ch and images collected from issues reported to Exiv2. My aim is to successfully read 9,990 of them, which is 99.9% reliability. I fully expect the Community to attack me concerning the 0.1% that are not successfully decoded. On-line abuse from the Community is the reason that I am retiring.
I have written the book for two purposes:
- To explain how Exiv2 works.
- To explain how to parse metadata.
Exiv2 provides a unique capability to the Community, and its long-term maintenance is important to Linux. To my knowledge, no book has been written about metadata. The tvisitor code would provide a good resource from which to develop a new metadata library.
dmpf.cpp
The purpose of this program is to inspect files. It’s od on steroids combined with parts of dd.
$ dmpf -h
usage: ./dmpf [-]+[key=value]+ path+
options: bs=1 count=0 dryrun=0 endian=0 hex=1 skip=0 start=0 verbose=0 width=32
$

There are numerous examples of this utility in the book. The options are:
| Option | Description | Default | Comment |
|---|---|---|---|
| bs= | block size | 1 | 1,2 or 4 |
| count= | number of bytes to dump | rest of file | Accumulative |
| dryrun= | report options and quit | 0 | |
| endian= | endian 0=little, 1=big | 0 | toggles native endian |
| hex= | 0=int or 1=hex | 0 | |
| skip= | number of bytes to skip | 0 | Accumulative |
| start= | number of bytes to start | 0 | Set by path:offset->length |
| verbose= | echo the settings | 0 | |
| width= | width of output | 32 |
path can be - or path or “path:offset->length”
The term Accumulative means you may use the option more than once and the values will be added. Non-Accumulative settings are set when encountered, so the last setting prevails.
Here are some examples, using files/Stonehenge.jpg for which the structure is:
$ tvisitor -pS files/Stonehenge.jpg
STRUCTURE OF JPEG FILE (II): files/Stonehenge.jpg
address | marker | length | signature
0 | 0xffd8 SOI
2 | 0xffe1 APP1 | 15272 | Exif__II*_.___._..._.___.___..._.___.___
15276 | 0xffe1 APP1 | 2786 | http://ns.adobe.com/xap/1.0/_<?xpacket b
18064 | 0xffed APP13 | 96 | Photoshop 3.0_8BIM.._____'..__._...Z_..%
18162 | 0xffe2 APP2 | 4094 | MPF_II*_.___.__.._.___0100..._.___.___..
22258 | 0xffdb DQT | 132 | _.......................................
22392 | 0xffc0 SOF0 | 17 | ....p..!_........
22411 | 0xffc4 DHT | 418 | __........________............_.........
22831 | 0xffda SOS | 12 | .._...._?_..
END: files/Stonehenge.jpg
$

| Purpose | Command | Output |
|---|---|---|
| Dump first 16 bytes | $ dmpf count=16 files/Stonehenge.jpg | 0 0: ....;.Exif__II*_ -> ff d8 ff e1 3b a8 45 78 69 66 00 00 49 49 2a 00 |
| Dump 12 bytes of Exif data | $ dmpf count=12 skip=2 skip=4 skip=6 files/Stonehenge.jpg | 0xc 12: II*_.___._.. -> 49 49 2a 00 08 00 00 00 0b 00 0f 01 |
| Dump 12 bytes of Exif data | $ dmpf 'files/Stonehenge.jpg:12->24' | 0xc 12: II*_.___._.. -> 49 49 2a 00 08 00 00 00 0b 00 0f 01 |
csv.cpp
The purpose of this program is to “pretty-print” csv files. Its only use is to format csv output from taglist for presentation in this book. For example:
$ taglist ALL | csv - | head -3
[Image.ProcessingSoftware] [11] [0x000b] [Image] [Exif.Image.ProcessingSoftware] ...
[Image.NewSubfileType] [254] [0x00fe] [Image] [Exif.Image.NewSubfileType] [Long] ...
[Image.SubfileType] [255] [0x00ff] [Image] [Exif.Image.SubfileType] [Short] ...
$

CMakeLists.txt
Here is the CMakeLists.txt for the code that accompanies the book. It’s a similar, simpler version of the CMake code in Exiv2.
cmake_minimum_required(VERSION 3.8)
project(book VERSION 0.0.1 LANGUAGES CXX)
include(CheckCXXCompilerFlag)
set(CMAKE_CXX_STANDARD 11 )
set(CMAKE_CXX_EXTENSIONS ON )
set(CMAKE_OSX_ARCHITECTURES "x86_64" )
set(CMAKE_XCODE_ARCHS "x86_64" )
# build for the current version of macOS
if ( APPLE )
execute_process(COMMAND sw_vers -productVersion
OUTPUT_STRIP_TRAILING_WHITESPACE
OUTPUT_VARIABLE SW_VERS
)
set(CMAKE_OSX_DEPLOYMENT_TARGET "${SW_VERS}")
endif()
# don't build for 32bit (size_t/sprintf unhappiness)
if("${CMAKE_SIZEOF_VOID_P}" STREQUAL "4")
message(FATAL_ERROR "32 bit build is not supported")
endif()
# the programs
add_executable(visitor visitor.cpp )
add_executable(tvisitor tvisitor.cpp )
add_executable(dmpf dmpf.cpp )
add_executable(csv csv.cpp )
add_executable(parse parse.cpp )
# options
option( EXIV2_TEAM_USE_SANITIZERS "Enable ASAN when available" OFF )
if ( MSVC )
option( EXIV2_ENABLE_PNG "Build compressed/png support (requires libz)" OFF )
else()
option( EXIV2_ENABLE_PNG "Build compressed/png support (requires libz)" ON )
endif()
if( EXIV2_ENABLE_PNG )
find_package( ZLIB REQUIRED )
endif( )
if ( EXIV2_ENABLE_PNG AND ZLIB_FOUND )
target_link_libraries ( tvisitor PRIVATE ${ZLIB_LIBRARIES} )
target_compile_options( tvisitor PUBLIC -DHAVE_LIBZ)
endif()
if(WIN32)
find_library(WSOCK32_LIBRARY wsock32)
find_library(WS2_32_LIBRARY ws2_32)
target_link_libraries(parse wsock32 ws2_32)
target_link_libraries(tvisitor wsock32 ws2_32)
endif()
# ASAN (not on Windows)
if ( EXIV2_TEAM_USE_SANITIZERS AND NOT (CYGWIN OR MINGW OR MSYS OR MSVC) )
check_cxx_compiler_flag( -fno-omit-frame-pointer HAS_NO_EMIT)
if(HAS_NO_EMIT)
add_compile_options( -fno-omit-frame-pointer )
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -fno-omit-frame-pointer")
endif()
check_cxx_compiler_flag( -fsanitize=address,undefined HAS_FSAU)
if(HAS_FSAU)
add_compile_options( -fsanitize=address,undefined )
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -fsanitize=address,undefined")
endif()
check_cxx_compiler_flag( -fno-sanitize-recover=all HAS_FSRA)
if(HAS_FSRA)
add_compile_options( -fno-sanitize-recover=all )
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -fno-sanitize-recover=all")
endif()
endif()
# Test harness (in ../test)
add_custom_target(test COMMAND ../test/run.sh )
add_custom_target(tests COMMAND ../test/run.sh )
# This is intentionally commented off
# See Chapter 8 Testing for discussion about building libtiff
if ( 0 )
include_directories(/usr/local/include)
link_directories(/usr/local/lib)
add_executable(create_tiff create_tiff.cpp)
target_link_libraries(create_tiff z tiff jpeg)
endif()
# That's all Folks!
make test
The code in the book has a simple test harness in test/run.sh. When you build, you can run the tests with the command:
586 rmills@rmillsmbp:~/gnu/exiv2/team/book/build $ make tests
Scanning dependencies of target tests
20200717_221452.avif passed
args passed
avi.avi passed
avif.avif passed
Canon.cr2 passed
Canon.crw passed
Canon.jpg passed
cr3.cr3 passed
csv passed
dmpf passed
heic.heic passed
IMG1.HEIC passed
IMG_3578.HEIC passed
mrw.mrw passed
NEF.NEF passed
NikonD5300.dcp passed
ORF.ORF passed
Stonehenge.jpg passed
Stonehenge.tiff passed
webp.webp passed
-------------------
Passed 20 Failed 0
-------------------
Built target tests
587 rmills@rmillsmbp:~/gnu/exiv2/team/book/build $

The code to implement the tests is in test/run.sh:
#!/usr/bin/env bash
pass=0
fail=0
# Create reference and tmp directories
if [ ! -e ../test/data ]; then mkdir ../test/data ; fi
if [ ! -e ../test/tmp ]; then mkdir ../test/tmp ; fi
report()
{
stub=$1
# if there's no reference file, create one
# (make it easy to add tests or delete and rewrite all reference files)
if [ ! -e "../test/data/$stub" ]; then
cp "../test/tmp/$stub" ../test/data
fi
diff -q "../test/tmp/$stub" "../test/data/$stub" >/dev/null
if [ "$?" == "0" ]; then
echo "$stub passed";
pass=$((pass+1))
else
echo "$stub failed"
fail=$((fail+1))
fi
}
# test every file in ../files
for i in $( ls ../files/* | sort --ignore-case ) ; do
stub=$(basename $i)
# dmpf and csv are utility tests
if [ "$stub" == dmpf -o "$stub" == csv ]; then
./$stub ../files/$stub > "../test/tmp/$stub" 2>&1
else
./tvisitor -pRU "$i" > "../test/tmp/$stub" 2>&1
fi
report $stub
done
echo -------------------
echo Passed $pass Failed $fail
echo -------------------
# That's all Folks
The Last Word
I hope you found this book interesting. More to the point, I hope you found the book useful. I hope Exiv2 will live into the future, or this book inspires somebody to write a new library.

I’m going off to cut the grass and to run in the beautiful countryside around my home in Camberley, England. And I’m going to play the Euphonium and the Piano.
