TITLE: abstractions and data conversion
(Newsgroups: comp.lang.c++.moderated, 19 Jan 97)

[ I'm happy to see John Skaller posting again to the news groups. As always, his comments leave me scratching my head. I sometimes even read them twice. Enjoy. -adc ]

MALAK: malak@access.digex.net (Michael Malak)
>Where should one put functions which convert data from type A to
>type B?

SKALLER: skaller@maxtal.com.au (John (Max) Skaller)
This is a good question. It shows IMHO that OO methodology has serious faults because it cannot properly answer the question. I will show the correct, categorical, solution, after discussing the "OO" ones, which are all inferior.

MALAK:
>Example 1: Color spaces. We have classes RGB, HSV, and YIQ. RGB
>can be considered the "baseline" format since it's closest to the
>engineering domain

SKALLER:
Good example.

MALAK:
>Example 2: Image file formats.

SKALLER:
Also a good example. There is a very important difference between these examples. They are very good examples for this reason.

Example 2 involves specific standardised file (data) formats. There is no "abstraction" here except that these formats represent images.

Example 1 involves interfaces modelling specific standardised _interfaces_. Abstractions are modelled (represented) too, even if by interfaces rather than data. So the issues are the same at a different level: an interface is no more or less abstract than a data structure, inherently: it depends on your viewpoint.

This is the problem with OO: it presents a uniform monolithic observer independent representation of an abstraction (a _specific_ interface for all clients).

OTOH a pure data structure is completely observer dependent, you cannot DO anything with it, you have to write code to manipulate it. You can write anything that the data structure supports. The interface is almost entirely observer dependent, and, it is also (consequently) bound to specific data.
This latter is without structure; take it as a _thesis_. OO is the reaction against it, the _antithesis_. It fails for the same reasons inverted. Categories provide BOTH facilities in a way that lets the programmer engineer the split between data and function as desired. They're the _synthesis_. The formula

    thesis -> antithesis
                  |
                  V
              synthesis

is usually attributed to Hegel. It represents transcendence, revolution, or paradigm shift. (Depending on your religion :-)

-----------------------------

I have done example 1 by using a single abstract class with ALL the accessors to support RGB, HSV, YIQ etc. This method requires reopening classes to support a new colour metric as a native method. It is not so good. How the data is represented is irrelevant -- in fact one can derive a class for each representation, and even some which _cache_ some computable values (which I have also done, for floating point colour spaces, since floating point can be expensive). But still, the interface is NOT entirely abstract: it is concrete just like data, albeit one level up.

MALAK:
>Here are the possible solutions:
>
>1) Establish a policy where all conversion functions go into the
>   "baseline" format class (such as RGB and Image). (BTW, this
>   seems to me to be the least desirable of the three possible
>   solutions.)

SKALLER:
This is a lousy solution. Why? It breaks the open/closed principle. See Meyer's OOSC.

[ "Object-Oriented Software Construction", by Bertrand Meyer. -adc ]

MALAK:
>2) Establish a policy where no conversion functions go into the
>   "baseline" format class. They all go into the "other"
>   classes (such as HSV, YIQ, and LegacyImage).

SKALLER:
This is better. But it is still not good. It is not good to standardise one particular interface like this, unless there is very strong consensus. Choosing RGB makes sense for luminous colour (displays) (today, anyhow) but not for reflected colour (printing).
MALAK:
>3) Implement a double-dispatch mechanism into free subprograms
>   (which could be all grouped in a namespace).

SKALLER:
I do not understand. Double dispatch is a myth. You cannot (in general) implement a matrix of methods using two sequences. IF you can interconvert EVERYTHING to a common type, this mechanism works. At least for image formats, this is not the case: even the notion of a 2D array of pixels does NOT provide the only description of an image. (Eg vector graphics, palettes, etc etc and on and on). So for images, in general, there is NO universal type. (In general conversions will be "lossy", and many are needed to minimise the loss).

MALAK:
>Option #3 seems to me to be the most OO, while #2 seems to be the
>more natural fit for C++ (less cumbersome). And #1 would seem to
>lead to fat classes.

SKALLER:
Let us take Example 2. Each file format is a PUBLIC external data format. It is represented by one or more internal data structures. Each may have convenient methods but the data must be PUBLIC.

Now write conversion routines. As global functions. NOT members. You can have any number of them. There is no need to break encapsulation and no way to break encapsulation because there isn't any. There is no need to "coerce" the function inappropriately into one or the other class. Perhaps conversion is an isomorphism for some formats, (preserves all the information) and perhaps not. This is crucial structural information.

So now we have algorithms and data structures. All exposed to public view. So we need to HIDE:

1) the details of the internal representations of the external file formats
2) the implementation details of the conversion functions

Because C++ does not support the correct unit of modularity directly, namely the category, we need to find a tricky engineering solution built of the available tools. Namespaces could be useful. Another method is to use a dummy class. In Java, such modules ARE supported directly by the compiler.
(See Java protection system).

    namespace impl {
      struct GIF { .. };
      struct JPG { .. };
      JPG gif_to_jpg(GIF) { .. }  // etc
    }

    namespace ImageCategory {
      class GIF { impl::GIF gif; ..... };
      class JPG { impl::JPG jpg; ..... };
      // must be a friend of GIF and JPG to reach the wrapped members
      JPG gif_to_jpg(GIF g) { return JPG(impl::gif_to_jpg(g.gif)); }
      ..
    }

Here in the namespace ImageCategory all the implementation details are hidden by wrapping. The CONTRACT is:

1) IF the CLIENT uses ImageCategory and _not_ impl then changes to implementation details in impl will be transparent to CLIENT code.
2) The SERVER must maintain the implementation space and faithfully implement the conversions wrapped in the client interface. Changes will not impact any CLIENT code.
3) The COMPILER will prevent the CLIENT modifying the implementation accidentally. (Provided rule 1 is kept).

The SERVER (implementation) space is OPEN: new data structures can be added, whether new representations of file formats already handled, or addition of new file formats. Similarly, new conversions can be implemented, whether better versions of those already implemented, or new ones not before implemented. It is also CLOSED in the sense that existing formats and algorithms can be left alone and used.

The interface of the CLIENT space is OPEN and CLOSED: it can be used now, it can be extended, and it can be modified transparently to the client to hook better internal representations.

This model is categorical, and it obeys the open/closed principle. It hides information properly by separating the interface space from the implementation space. (Note that the interfaces chosen are themselves implementation details at a higher level)

A weakness is that there is no compiler support for hiding the implementation space from the client. (I.e. no enforcement) Using classes instead of namespaces solves this problem at the expense of having to break open classes to extend them (namespaces are extensible by design).
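
A minimal compilable sketch of the wrapping scheme above, for readers who want to try it. The field names (width, height, palette_pixels, dct_blocks), the constructors, the accessors and the friend declarations are all assumptions added for illustration; the original leaves those parts elided. The conversion body is a placeholder, not a real GIF or JPG codec.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Implementation space: plain public data and free conversion functions.
// All field names are hypothetical stand-ins for real file-format data.
namespace impl {
    struct GIF { int width; int height; std::string palette_pixels; };
    struct JPG { int width; int height; std::string dct_blocks; };

    // Placeholder conversion: a real one would decode and re-encode.
    JPG gif_to_jpg(const GIF& g) {
        return JPG{g.width, g.height, "jpg(" + g.palette_pixels + ")"};
    }
}

// Client space: each wrapper hides its impl object. Only the conversion
// function, declared a friend, may look inside.
namespace ImageCategory {
    class GIF;
    class JPG;
    JPG gif_to_jpg(const GIF& g);

    class GIF {
        impl::GIF gif;
        friend JPG gif_to_jpg(const GIF&);
    public:
        GIF(int w, int h, std::string data) : gif{w, h, std::move(data)} {}
        int width() const { return gif.width; }
        int height() const { return gif.height; }
    };

    class JPG {
        impl::JPG jpg;
        explicit JPG(impl::JPG j) : jpg(std::move(j)) {}
        friend JPG gif_to_jpg(const GIF&);
    public:
        int width() const { return jpg.width; }
        int height() const { return jpg.height; }
    };

    // Unwrap, convert in the implementation space, wrap again.
    JPG gif_to_jpg(const GIF& g) {
        return JPG(impl::gif_to_jpg(g.gif));
    }
}

// Client code names only ImageCategory, so impl may change underneath it.
void client_example() {
    ImageCategory::GIF g(4, 3, "abc");
    ImageCategory::JPG j = ImageCategory::gif_to_jpg(g);
    assert(j.width() == 4 && j.height() == 3);
}
```

Note that the friend declarations are exactly the cost mentioned above: adding a new conversion in the client space means touching the wrapper classes, while the impl namespace can simply be reopened.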
Many languages providing separate module constructions provide the requisite enforcement of the access contract, but at the expense of a separate construction from the class (loss of unification and hence scalability and reusability), and usually with loss of openness (you have to break open the modules to extend them).

The point of this design is that the correct level of modularity is the CATEGORY and NOT THE CLASS. Classes can be used to model categories, but the results are not properly scalable. The same applies to namespaces and most "module" constructions in popular languages. (Possibly excepting SML??)

In the image format example, it is NOT the image which is the unit. It is the _collection_ of images and the maps between them TOGETHER which should exist at multiple levels of abstraction. (Indeed, "levels" is the wrong word because it smacks of hierarchy.) It is this CATEGORY which defines the abstraction "Image".

An image is NOT a single type. The types of various images are distinct but related. Categories represent images by the relationships between them NOT just by attributes. In particular, categorical structure defines what an image is: images can be bitmapped, vectored, or classified in various useful ways WHICH IS REFLECTED IN THE ABSTRACT RELATIONS BETWEEN THE (CONVERSION) FUNCTIONS.

(In reality, one would need to represent output and input devices as well, to truly distinguish an image from, say, a sound -- and possibly even model the eye, since a dog, colour blind human, superman, weather satellite, stellar interferometer, or Netscape may "see" something quite different :-)
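
One way to picture "the collection of types and the maps between them" in C++ is a namespace of plain types plus free conversion functions, reopened later to add a new type. The following is only a sketch using the colour example: the Grey type and every function name are hypothetical, and the RGB/YIQ coefficients are the commonly quoted approximate NTSC values, so the round trip is approximately, not exactly, the identity.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// The "objects": plain, public colour types.
namespace Colour {
    struct RGB { double r, g, b; };
    struct YIQ { double y, i, q; };

    // The "arrows": free functions, members of neither type.
    YIQ rgb_to_yiq(RGB c) {
        return { 0.299 * c.r + 0.587 * c.g + 0.114 * c.b,
                 0.596 * c.r - 0.274 * c.g - 0.322 * c.b,
                 0.211 * c.r - 0.523 * c.g + 0.312 * c.b };
    }
    RGB yiq_to_rgb(YIQ c) {
        return { c.y + 0.956 * c.i + 0.621 * c.q,
                 c.y - 0.272 * c.i - 0.647 * c.q,
                 c.y - 1.106 * c.i + 1.703 * c.q };
    }
}

// Later, possibly in another file: reopen the namespace to add a new
// type and new arrows without editing RGB or YIQ.
namespace Colour {
    struct Grey { double level; };   // hypothetical extra format

    // Lossy: chroma is discarded, so there is no inverse arrow.
    Grey yiq_to_grey(YIQ c) { return { c.y }; }

    // Built by composing two existing arrows.
    Grey rgb_to_grey(RGB c) { return yiq_to_grey(rgb_to_yiq(c)); }
}

// Largest per-component error after RGB -> YIQ -> RGB.
double round_trip_error(Colour::RGB c) {
    Colour::RGB back = Colour::yiq_to_rgb(Colour::rgb_to_yiq(c));
    return std::max({ std::fabs(back.r - c.r),
                      std::fabs(back.g - c.g),
                      std::fabs(back.b - c.b) });
}

// Usage sketch: the invertible pair nearly round-trips; Grey keeps
// only the luma component.
void colour_example() {
    assert(round_trip_error({0.2, 0.5, 0.8}) < 0.01);
    assert(std::fabs(Colour::rgb_to_grey({1.0, 1.0, 1.0}).level - 1.0) < 1e-9);
}
```

Which arrows have inverses (RGB and YIQ here) and which do not (anything into Grey) is the kind of structural information the text says belongs to the category rather than to any single class.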