tardiff - archive comparison and patch generation utility
v2.1.4 released April 21, 2005
tardiff [z|y|j|Z][c|r|u][i|b] output originalTAR
tardiff [z|y|j|Z][c|r|u][i|b][d[m][R]][C[M][F][C|B][o]] output originalTAR toTAR
[--duplicates [--ask-remove] [--no-movement|--circleName=
tardiff is a utility designed for comparing an existing TAR archive to its equivalents in and under the current directory, or for comparing two TAR archives. It is useful for making patches (e.g. a patch from 1.0.0 to 2.0.0, where each version is self-contained in its own TAR file) or for storing changes between an original TAR file and a final result (for example, between an original source archive and your changes to its source, or between the state before compilation and the state after compilation).
The command line is interpreted in two ways: long and short. The method used is determined by looking at the first argument. If it begins with a dash (-), the long interpreter is used. Otherwise, the short interpreter is used, and the first argument is expected to be either the desired options strung together without dashes or the name of the TAR file to create or append to.
The short interpretation mode makes the arguments position-dependent -- they have to be in the order given above (the first and third forms). The options are all represented by the -letter form given (minus the dash).
The long interpretation mode has no restrictions on the order of the arguments.
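The rule for choosing an interpreter can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not tardiff's own (Perl) code; the function name interpretation_mode is hypothetical.

```python
def interpretation_mode(args):
    """Return "long" or "short" for an argument list (program name excluded).

    Hypothetical helper: it only mirrors the rule stated in the text, namely
    that a leading dash on the first argument selects the long interpreter.
    """
    if args and args[0].startswith("-"):
        return "long"
    return "short"
```

In the short case the first argument may be either the option letters or the name of the output file; how tardiff tells those two apart is not described here, so the sketch does not try to.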
The presence or absence of toTAR affects what options are available; see the long forms above.
- output
- The TAR (or ZIP) file you want to write to.
- originalTAR
- The base TAR file to be compared against.
When toTAR is present, and the B checksum option is given, this is instead a checksum file that a previous run of tardiff generated from the base TAR file to be compared against.
- toTAR
- A TAR file that should result from extracting originalTAR, then extracting output and running finish.sh or finish.bat, if present, and erasing those two shell scripts.
In other words, if a + b = c, originalTAR is a, toTAR is c, and we're trying to find b, which will be output.
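The a + b = c relationship can be made concrete with a toy model in which each archive is just a mapping from file name to contents. This is a sketch under that assumption only; the names compute_patch and apply_patch are hypothetical, and tardiff's real comparison works on TAR data and also tracks moves and duplicates.

```python
def compute_patch(original, target):
    """Given a (original) and c (target), return b as (changed_files, deletions)."""
    # Anything new or different in the target has to travel in the patch.
    changed = {name: data for name, data in target.items()
               if original.get(name) != data}
    # Anything that vanished has to be removed by the finish script.
    deletions = sorted(name for name in original if name not in target)
    return changed, deletions


def apply_patch(original, changed, deletions):
    """Rebuild c from a and b: extract the patch, then run the deletions."""
    result = dict(original)
    result.update(changed)
    for name in deletions:
        result.pop(name, None)
    return result
```

The deletions list plays the part of the generated finish.sh or finish.bat script.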
When toTAR is not given, originalTAR defines which files are considered for the output archive in the following way:
- Are there any directories inside originalTAR? If so, and any equivalent directories exist, all files and directories inside those directories will be considered. Please remember that many archives contain a "top-level" directory (for example, gzip-1.2.4a.tar.gz (from GNU) contains the gzip-1.2.4a directory, and everything inside the archive is placed in there).
- All files with equivalent names to files specifically mentioned inside originalTAR will be considered in every case.
When comparing against only one archive, the directory tardiff is run from is important: if it's not run from the same directory the archive was originally extracted into, i.e. if the directory structures don't match exactly, the result will probably be an empty archive and a shell script to erase everything.
When toTAR is given, every file in the toTAR archive will be considered. More importantly, perhaps, tardiff will "speculate" if it can't find any intersecting files between the archives: It will see if it can find any intersection if it drops the top directory level in either of, but not both of, the archives. If its speculation doesn't produce any intersections, it (effectively) decides that the user is always right, and drops the idea. If duplicate checking is not selected (--duplicates or -d), it will generate an archive that is a duplicate of toTAR except for a couple of scripts for erasing the old file versions.
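The "speculation" step might look something like the following sketch. The order in which the two archives are tried, and the names drop_top_level and speculate, are assumptions made for illustration; only the idea of dropping the top directory level from one archive but not both comes from the text.

```python
def drop_top_level(names):
    """Strip the first path component from every name that has one."""
    stripped = set()
    for name in names:
        _head, sep, rest = name.partition("/")
        if sep and rest:
            stripped.add(rest)
    return stripped


def speculate(original_names, target_names):
    """Return (decision, original_set, target_set) for two lists of member names."""
    original, target = set(original_names), set(target_names)
    if original & target:
        return "as-is", original, target
    dropped_original = drop_top_level(original)
    if dropped_original & target:
        return "drop-original-top", dropped_original, target
    dropped_target = drop_top_level(target)
    if original & dropped_target:
        return "drop-target-top", original, dropped_target
    # No luck either way: assume the user is right and keep the names as given.
    return "no-intersection", original, target
```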
- --gzip or -z
- Causes the output TAR data to be compressed with the gzip program. This says nothing about if toTAR or originalTAR are compressed or how.
- --bzip2, -y or -j
- Causes the output TAR data to be compressed with the bzip2 program. If toTAR is not being used, tardiff will play "guess the option" with tar to discover how to request a bzip2-compressed archive.
- --zip or -Z
- Causes tardiff to create a ZIP file instead of a TAR file by using the zip program, with either just the finish.bat script or no script at all. No compression options can be used in conjunction with this option. Files, directories, and symbolic links will be stored into the archive, but not devices, pipes, etc. When toTAR is not given, symbolic links will be stored as the file pointed to; when toTAR is given, however, symbolic links are stored as symbolic links, and all files being saved into the archive will be extracted into a temporary directory (a subdirectory of /tmp/, generally) first.
- --create or -c
- If output already exists, causes tardiff to overwrite it. If output doesn't exist, this is the default.
- --append or -r
- This option is not available if the --zip option is in use.
If output already exists, causes tardiff to append to it. If toTAR is given, output does not need to be a TAR file, and tardiff does not check if it is compressed or not, so this mode can be used to write the output onto a stub file; this can be used to create self-extracting archives. However, without toTAR, the tar program will be examining output, and it does care.
Be aware that, without toTAR, tardiff will uncompress the output before adding to it. This is because at least one version of tar (the ancient one the author uses) refuses to add to compressed archives.
If output exists, and the --zip option isn't in use, this is the default.
- --update or -u
- Not available when toTAR is given. Causes tardiff to pass the u option to tar, with all the results that entails. tardiff will decompress output before passing the work off to tar.
If the --zip option is in force, causes zip to update the archive. In this case, this option is available when toTAR is given, and is the default if output exists.
- --noScripts or -i
- Prevents tardiff from writing finish.sh and finish.bat to the archive. This disables --duplicates, and removed files can not be noted in the archive without scripts.
- --makeBAT or -b
- Add an equivalent Windows .BAT file to the created archive (named finish.bat) in addition to finish.sh (the Unix shell script version).
- --duplicates or -d
- Cause tardiff to search for duplicate files in toTAR, or between originalTAR and toTAR, to generate move and copy commands into the generated finish.sh shell script. Without this option, tardiff will generate delete commands for where the files moved from, and an entire copy of the same file at its new place into the archive. However, this does increase the number of passes taken through the source archives (unless canonical checksums are in use; any checksums at all will reduce this increase). The size of the increase is dependent on the maximum "carry"ing amount; see --carry for information on changing this maximum.
- --ask-remove or -R
- Causes the generated shell script (and batch file, if included) to pause before deleting any files, asking the user: Do you want to delete the files that vanished between these two versions? If yes, press Enter. If no, press Ctrl-C.
- --no-movement or -m
- Causes tardiff to skip the movement scan and avoid generating any move commands into the shell script. This is useful when the user may have customized some of their files that have since been moved, but any such customizations should be dropped.
- Causes tardiff to handle circular movement dependencies by renaming one of the files in the circle into an "out of the way" place -- the temporary file name given as part of this option. tempName must be unique; if it already exists in the "wrong place" in the patch applier's file system, the existing file will probably be overwritten. If a file with this name already exists anywhere in either source archive, tardiff will refuse to continue past the initial archive scans by displaying an appropriate message. tempName may not contain any slashes (/); backslashes are allowed, but will cause difficulties to any Windows user who tries to apply the patch.
- Set the maximum amount of data tardiff will "carry" before it will no longer take on more "weight". maximumCarry can be specified in kilobytes or megabytes by appending "K" or "M" to the number. tardiff actually tracks the "weight" in units of 512 bytes; the default is 1M. As long as the carried "weight" is less than the maximum, tardiff will add to it, even if the addition would put the total weight far above this maximum; this is not a ceiling.
In short, the lower the maximum given, the longer tardiff will require to make comparisons at a lower amount of memory used -- it's a trade-off between speed and memory usage. The trade-off is not linear; a small increase in maximumCarry could yield a lot of speed improvement or only a little, depending on the sizes of the files in the archives to be compared. Further, carrying's time benefit is minimal when neither source archive is compressed, although it does reduce the amount of disk I/O.
Carrying can be turned off completely by using --carry=0, or it can be limited to a single file at a time by using --carry=1 (all the way up to --carry=512).
At the moment, carry only affects duplicate scanning. In the future, its use will probably be expanded to improve the worst-case time of the comparison scans.
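The arithmetic behind maximumCarry can be sketched as follows. Two points are assumptions rather than documented behaviour: that a bare number is a byte count rounded up to whole 512-byte units (which is consistent with --carry=1 through --carry=512 all meaning "one file at a time"), and that "K" and "M" mean 1024 and 1024*1024 bytes. The function names are hypothetical.

```python
UNIT = 512  # bytes per unit of "weight", as stated in the text


def to_units(byte_count):
    """Round a byte count up to whole 512-byte units."""
    return (byte_count + UNIT - 1) // UNIT


def parse_carry(value):
    """Convert a maximumCarry string such as "0", "300", "64K" or "1M" to units."""
    multiplier = 1
    if value.endswith("K"):
        multiplier, value = 1024, value[:-1]
    elif value.endswith("M"):
        multiplier, value = 1024 * 1024, value[:-1]
    return to_units(int(value) * multiplier)


def files_carried(file_sizes, maximum_units):
    """Return the sizes picked up before the carried weight reaches the maximum."""
    carried = 0
    taken = []
    for size in file_sizes:
        # The check happens before adding, so the total may end up far above
        # the maximum: it is a threshold for taking on more, not a ceiling.
        if carried >= maximum_units:
            break
        carried += to_units(size)
        taken.append(size)
    return taken
```

With the default of 1M (2048 units), a single 2,000,000-byte file is still carried, because nothing was being carried when it was picked up; the two small files after it are not.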
- --checksum=longChecksumOptions or -CshortChecksumOptions
- Checksum options.
If the short command line interpretation mode is being used, just add a C followed by whatever checksum options may be desired to the options argument.
If the long interpretation mode is being used, separate the checksum options with commas (,).
In either interpretation mode, all checksum options need to be specified together. Checksums are only available when toTAR is included.
- md5 or M
- Generate checksums using the md5sum program instead of the sum program.
- Generate checksums using some other checksum generation program. The program must accept the data to operate on via standard input and must return a checksum via standard output; only the data returned up to the first whitespace character will be returned as the checksum.
If parameters to the command are required, the spaces must be escaped somehow; the means of doing this will depend on the shell in use. On some shells, the easiest way may be to enclose the entire checksum options list in quotation marks. Note that no part of the command may contain a comma (,) character, as it is used to set off the next checksum option. As the command will not be passed through the shell, pipes are not supported.
If desired, more than one checksum program may be used by separating them with the semicolon (;) character; most shells see this character as a separator for the next command to run in a sequence, so escape it accordingly. The returned checksums will be the set of all checksum programs' checksums separated by spaces. The order in which the checksum programs are given is significant, especially if you want to use the checksums again later (i.e. by the output option).
If both the md5 and this option are given, the md5 checksum will always be considered the first checksum program in the list, in addition to the program(s) given in this option.
This option has no short equivalent.
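The contract for an external checksum program (data in on standard input, checksum out on standard output, everything after the first whitespace ignored, no shell involved) can be sketched like this. The helper names are hypothetical, and the two stand-in "checksum programs" are tiny Python one-liners used only so the example runs anywhere; they are not programs tardiff knows about.

```python
import subprocess
import sys


def run_checksum(argv, data):
    """Feed data to one program on stdin; keep its first whitespace-delimited field."""
    # argv is a list, so no shell is involved and pipes are not available.
    completed = subprocess.run(argv, input=data, stdout=subprocess.PIPE, check=True)
    fields = completed.stdout.split()
    return fields[0].decode("ascii", "replace") if fields else ""


def combined_checksum(programs, data):
    """Join the checksums of several programs with spaces, in the order given."""
    return " ".join(run_checksum(argv, data) for argv in programs)


# Stand-in programs (assumptions for the demo): a byte sum and a byte count.
BYTE_SUM = [sys.executable, "-c",
            "import sys; print(sum(sys.stdin.buffer.read()) % 65536, 'ignored')"]
LENGTH = [sys.executable, "-c",
          "import sys; print(len(sys.stdin.buffer.read()), 'bytes')"]
```

Because the combined checksum is just the individual results in sequence, swapping the order of the programs produces a different string for the same data, which is why the order matters when checksum files are reused.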
- fast or F
- Speed up checksum generation by not checking if each checksum program started properly. Without this option, each time a checksum program is started, tardiff pauses up to 5 seconds (to verify that the pipe opened properly; this usually takes much less than a second) and then waits 0.01 seconds to give up its time slice (it hopes) so the new process can try to execute the needed checksum program. In this way, if the program fails to start for some reason, tardiff will have an opportunity to realize a problem exists and try again, up to 10 tries per checksum program start. Because each checksum program is started for each file read from both the base and target archives, these small waits can add up...
With this option, tardiff will no longer pause at all, and can not make multiple tries at starting a checksum program. Except when the base option is used, this doesn't have to be such a big deal; tardiff can deal with missing a checksum or two, although each missing checksum will add the file as a potential duplicate of every other file of the same size. When the base option is used, however, complete checksum information is required -- there is no other way to compare the potential duplicates -- and the file will be considered a mismatch (which is the safest available decision). If the main --duplicates option is active, and not all checksum information is available for base comparisons, a warning message will be displayed.
- canon or C
- Treat checksums as canonical -- if the checksums are identical, the files are identical. Without this option, checksums will be used to determine if two files are not identical, eliminating them from the pool of files that require comparison, but the files will still be compared under the assumption that this may be a false positive result.
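The difference between canonical and non-canonical checksums can be shown with a small sketch. Here each file is represented as a (checksum, content) pair; that representation and the name find_duplicates are assumptions for illustration and say nothing about how tardiff stores its data.

```python
from collections import defaultdict


def find_duplicates(files, canonical=False):
    """files maps name -> (checksum, content). Return sorted pairs of duplicate names."""
    by_checksum = defaultdict(list)
    for name, (checksum, _content) in files.items():
        by_checksum[checksum].append(name)

    pairs = []
    for names in by_checksum.values():
        names.sort()
        for i, first in enumerate(names):
            for second in names[i + 1:]:
                # Differing checksums never reach this point: they are ruled out.
                # Matching checksums are trusted outright only when canonical;
                # otherwise the contents are compared to catch false positives.
                if canonical or files[first][1] == files[second][1]:
                    pairs.append((first, second))
    return sorted(pairs)
```

With a weak checksum that collides for different contents, the canonical setting reports duplicates that the comparing mode correctly rejects.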
- base or B
- The originalTAR file is actually a checksum output file, and should be treated as such. Because the original is not available for comparison, checksums will be treated as canonical.
tardiff will not determine the checksum method used in the referenced output file for you. It expects the user to tell it what method to use, and the same method must be used for this to work properly. (If a different method is used, only directories, empty files, and non-file objects will be properly compared.)
- output or o
- If this option is used, tardiff will write out checksum files (suitable for use with the base option) for each source archive after the first pass through them.
The names of the checksum files are determined by the checksum program being used and the name of the archive the checksums were generated from. If the md5 option is being used, these checksum files will have .md5 appended to the name of their archive; otherwise, .sum will be appended to the name of their archive (even if the "method" option is used).
To generate the checksums for an archive without creating an output archive or wasting time making comparisons, give any value for output, the name of the archive of interest for originalTAR, /dev/null for toTAR, and activate the noScripts option. tardiff will take the result archive to be empty, meaning that all files in originalTAR were erased, but it won't realize this until after the checksums for originalTAR were generated and written to disk, and it will exit with return code 0 with the message "No archivable changes detected."
When toTAR is used,
- tardiff may not work properly with archives generated with tar's --sparse option; this situation has not been tested, but is believed by the author, upon reading tar's info pages, to be an obvious point of failure. Fortunately, the believed failure mode in this case should result in tardiff beeping a lot...
With or without toTAR,
- Extracting archives made with tardiff may have interesting consequences if any items changed type -- i.e. files became directories, directories became devices, devices became pipes, pipes became sockets, or sockets became stock brokers. The actual consequences will depend on the extraction utility, and if it erases the object occupying the name first. Strictly speaking, this really isn't tardiff's problem; doing the Right Thing is more the responsibility of the extractor.
- tardiff assumes that the file system the patch is to be extracted on is “CasE sEnsItIve”, and does not suffer from other differing filename conflict bugs (in the file system or in the basic file manipulation tools). If the assumptions are wrong, and there are filenames in the patch that rely on these assumptions, applying the patch may fail, possibly silently.
The author believes that these should only be a problem when the --duplicates option is in use; however, it is also possible that file deletions may hit the wrong targets on sufficiently problematic systems!
A partial fix may (eventually) be added to correct deletions in any generated Windows .BAT file, but the author recommends against holding one's breath.
If tardiff beeps, there's definitely trouble, and it's trying to tell the user about it.
no error message
- If you see no unusual message on the screen, and tardiff is making the initial pass, the beeping is the sound of tardiff complaining about one of your source archives. Either it isn't an archive, there are pieces missing, or there was a long symbolic link in there. In any of these cases, the output archive may not be quite right -- extra files included, some files left out, or even some corruption of the created archive. This error should probably be given text and made fatal...
Giving a checksum file as an archive and then neglecting to tell tardiff about it is another good reason for tardiff to complain.
Newly discovered problems:
It appears that an unknown TAR file creator doesn't include the "ustar" signature in its file headers in the created TAR archive. This causes tardiff to beep constantly, discard all the (otherwise valid) headers, and finally conclude that the archive in question is empty. I'm not sure what to do about this, since the signature's presence could be considered an important check to ensure that tardiff is not getting out of sync with the archive it's reading...
It appears that very old versions of tar generate spaces (hex 20) instead of NULL bytes (hex 00) in some parts of the file headers generated. As soon as I figure out what I'm doing with the "ustar" signature problem, I'll fix this at the same time.
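For readers unfamiliar with the header layout, the two problems can be illustrated in Python. In the POSIX ustar format each header is a 512-byte block with the magic string at byte offset 257, and numeric fields are octal text. The sketch below shows a signature check and one tolerant way of reading a numeric field padded with either spaces or NUL bytes; it is not the fix the author has in mind, and the function names are hypothetical.

```python
import io
import tarfile

BLOCK = 512
MAGIC_OFFSET = 257  # position of the magic field in a ustar header block


def has_ustar_magic(header_block):
    """True if a 512-byte header block carries the "ustar" signature."""
    return (len(header_block) == BLOCK
            and header_block[MAGIC_OFFSET:MAGIC_OFFSET + 5] == b"ustar")


def parse_octal_field(field):
    """Read an octal header field padded with spaces and/or NUL bytes."""
    text = field.strip(b" \0").decode("ascii")
    return int(text, 8) if text else 0


def build_sample_archive():
    """Create a one-file ustar archive in memory, for demonstration only."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as archive:
        payload = b"hello\n"
        info = tarfile.TarInfo("hello.txt")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()
```

A reader that insists on the signature rejects headers written without it, which matches the behaviour described above; relaxing the check trades that safety for compatibility.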
- Malformed UTF-8 character (overflow at 0xffffffff, byte 0xe5, after start byte 0xff) in pattern match (m//)...
- If you see this message, tardiff's output will be corrupted. This is Perl's announcement that it is modifying the data passing through the program, and appears when the bzip2 header check is used. Unfortunately, perl considers this a warning (and not an error), and does not stop the program from running...
If this message is seen, either reinstall tardiff (the install script should detect that the version of Perl is new enough to need this fix) or remove the hash (#) just before the line "use bytes;", near the beginning of the program. If the version of Perl in use is too old to have this problem, though, removing that commenting hash will prevent the program from compiling (and thus running) at all, because older versions of Perl do not recognize the "use bytes" directive.
- Received untargeted error from tar:
- It's received this message from tar (while comparing an archive to the current directory), and it has no idea what to do with it. The file listed will be left out of the created archive.
If you believe that this file should go into the archive, look in the source for @msgUnkn, find the line matching the message from tar, and move it into one of the other sections. @msgSkip is for messages that don't indicate that the file should go into the output archive, @msgRedo is for those that you do want placed into the output archive, and @msgBad is for those that should be erased (i.e. adding delete commands into the shell script).
- Some problem occurred during tar diff execution!
- This means that tar returned a blank line. It may be because the archive is corrupted, or it may be some other reason.
- Failed to run program
- This means that the program listed couldn't be started for some reason.
- Checksum program ... not found; can't continue
- The named program couldn't be found for generating checksums, and tardiff needs to be able to run that program to try to use the checksum file given. Obviously, the checksum program needs to be installed somewhere in the path.
- Temporary file name exists in a source archive
- This means that tardiff noticed that the filename given with the --circleName= option already exists in one of the source archives (i.e. the source, the target, or both). Try again with a different tempName.
- Warning: Incomplete/missing checksums detected
- This means that a checksum program didn't return a result properly, and (as the base checksum option is active), tardiff is unable to compensate by simply comparing the possible duplicates.
- Failed to finish generating the movement instructions!
- This means that the author didn't manage to cover every possible combination of movements that might be needed, including the one tardiff just encountered. If this error is ever encountered, please send a bug report to the author, as the error message directs.
As long as everything the error message asks for is included, and the most recent version of tardiff is being used, there should be no need to include anything else, so there's no need to read about how to give good bug reports in this case.
- Failed to properly generate the movement instructions!
- This error is nearly the twin of the last error; it means that tardiff managed to finish generating the movement instructions (for inclusion into the finish script(s)). However, when tardiff was checking its work, it discovered that something is wrong with the instructions it generated! If this error is ever encountered, definitely send a bug report to the author, as directed by the error message -- if this condition is ever triggered, it could indicate that other errors in generating movement instructions are not being caught, because it should not be possible for this error to occur, ever!
As long as everything the error message asks for is included, and the most recent version of tardiff is being used, there should be no need to include anything else, so there's no need to read about how to give good bug reports in this case.
- Can't get current working directory!
- This error means that tardiff couldn't determine what the current directory (i.e. the directory it was being run from) is. Check that the pwd program is in the path, or allow tardiff to use the /proc directory to determine its current directory.
- Couldn't create temporary directory!
- For tardiff to generate a ZIP file, it has to extract the files that will become part of that ZIP file to disk somewhere. It first tries the /tmp directory. If it can't create a directory there, it looks for environmental variables called TEMP (in that order) and tries those locations. This error means that tardiff couldn't create a temporary directory in any of those locations.
- Failed to extract data to disk
- While extracting data to disk in preparation of asking zip to create the ZIP file, something went wrong. Usually, this means that the temporary disk is full, or an inode couldn't be created there.
It is possible to prevent tardiff from using /tmp on its next run. To do so, first set the environmental variable TMP to where you want the temporary directory to be created. Next, start tardiff as before. Then, while tardiff is running, determine its process ID and create either a file or a directory named /tmp/tardiff-pid, where pid is the running tardiff's process ID. Now, when tardiff tries to create its temporary directory (it tries the name just given first), it will fail and move on to the environmental variables for a directory to use.
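The fallback behaviour that this trick relies on can be sketched as follows. The directory name tardiff-pid under /tmp comes from the text; using the same name under the fallback locations, and the exact list and order of environment variables, are assumptions made for the sketch, as are the function names.

```python
import os
import tempfile


def create_temp_dir(pid, base_dirs):
    """Try base/tardiff-<pid> in each base directory; return the first one created."""
    for base in base_dirs:
        if not base:
            continue  # unset environment variable
        candidate = os.path.join(base, "tardiff-%d" % pid)
        try:
            os.mkdir(candidate, 0o700)
        except OSError:
            continue  # name already taken, base missing, or not writable
        return candidate
    raise RuntimeError("Couldn't create temporary directory!")


def default_base_dirs(environ=os.environ):
    """Assumed search order: /tmp first, then the TMP and TEMP variables."""
    return ["/tmp", environ.get("TMP"), environ.get("TEMP")]


def blocked_tmp_demo(pid=4242):
    """Show that a pre-existing tardiff-<pid> entry forces the fallback location."""
    blocked = tempfile.mkdtemp()   # stands in for /tmp
    fallback = tempfile.mkdtemp()  # stands in for the directory named by TMP
    open(os.path.join(blocked, "tardiff-%d" % pid), "w").close()
    return create_temp_dir(pid, [blocked, None, fallback]), fallback
```

Since mkdir fails when anything already occupies the name, a plain file works just as well as a directory for blocking the first location.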
- Fatal: Couldn't write to output file [number] (size)
- tardiff was trying to write a data segment to the output file, but it failed. size is how many bytes it was trying to write, and should always be 512. number 4 is generated when adding the extra shell scripts to the archive, so any size less than 512 is caused by a problem talking with tar. number 2 is generated when writing data from the toTAR archive, so any size less than 512 is because of a short read from there. The only time when you should see this error is when size is 512 and the disk being written to is full.
- Fatal: Couldn't write to output file [number]
- Similar to the previous error, these indicate a problem writing a file header. number 3 is from the shell scripts, and number 1 is from the toTAR archive. Usually this means that the disk is full at a very convenient place.
If the problem encountered isn't shown above, or was only discovered when the tar program was used to interact with the created TAR archive (i.e. tar ztvf outputTAR.tar.gz returns an error message while reporting on the contents of the archive), and isn't due to a known problem (listed above), it may be a bug.
Before reporting bugs, please read and understand this web page; it has some good suggestions about how to make useful bug reports.
It has been said that the only good system administrator is a truly paranoid system administrator. From personal experience, the only truly good programmer is a very, very paranoid programmer who fully expects that the system is out to get him (or her)... because, sometimes, it is! This is the sort of programmer who keeps testing and re-testing her (or his) program after every single, tiny change, just on the off chance that this one will be the one that will cause the computer to show its true colors. In case anyone was wondering, this is the case for this programmer on this program.
Unfortunately, of course, not every possible problem will surface with testing, no matter how many truly devious test cases are created and tested against. Additionally, it seems that it's not only system administrators and programmers who can be paranoid with cause; apparently, some packagers are too!
To reassure these people, this section includes methods with which any unknown problems with tardiff can be discovered before releasing a patch. For those who have suggested that perhaps tardiff could do more tests on its own, during patch creation, it may be reassuring to know that tardiff itself can not reliably do any more tests than it already does. To date, the only (unknown) bug in a released version could not possibly have been caught by tardiff itself, so far as the author is aware. (In that problem, Perl itself was causing the problems, and [as this help was meant to be transparent to the programs written in it] it is very difficult to imagine how, especially knowing nothing of this extra "help" at the time and no access to a system running an affected [i.e. new] version of Perl, this problem could have been caught by tardiff. In the course of creating a patch archive, tardiff compares parts of the original archives several times over using the same means of orientation as it will use when writing the archive to disk, given that canonical checksums are not in use -- in that case, Perl invisibly diddled with the data in the exact same way every time.)
As long as the user is paying attention while tardiff is operating, listening for beeps (which are the only non-fatal errors, at this point; though they may become fatal errors in the future...), any problems with the results that tardiff is at all aware of should be obvious; in most cases, tardiff will not even generate the patch archive if it is aware of a problem.
The most obvious test is running the generated patch archive through tar (given --zip isn't used); the command "tar tvf archiveName" (with appropriate compression options added in, of course) will catch nearly all corruption errors, if not absolutely all. The most likely failures, including Perl diddling with the data, will cause misalignments of the blocks written to the archive. These errors will cause tar to issue warnings (though not always errors) while reading through the archive. As tar is not written in Perl, it does not suffer from Perl's invisible, and not program-requested, features. Additionally, this test is relatively fast, and requires only CPU time, not additional disk space. A related test with zip (unzip -t archiveName) won't do any good, as creating that archive is handled by zip.
The most complete test eats disk space (or memory, if something like tmpfs is in use) for breakfast, but it is the most complete test that can be used. Full copies of both versions are needed to run this test, which is run on every version of tardiff even considered for release, both on the author's machine and (now) on a SourceForge shell account as well, against a variety of archives. To run this test:
- First, create a new (temporary) directory, and extract the old (source) version of the package into this directory, referred to here as "original".
- Second, apply the generated patch to the version just extracted.
- Third, create another new (temporary) directory, and extract the new (target) version of the package into this directory, referred to here as "target".
- Finally, diff the directories against each other. If the directory structure has shifted between original and target, be sure to adjust the invocation appropriately:
diff -r original target
- If diff produced any output at all, there's a problem.
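For those who would rather script the final comparison than read diff output, the last step can be approximated in Python. This is a sketch, not a replacement for diff -r: like diff, it looks only at names and file contents, not at permissions or ownership. The function names are hypothetical.

```python
import filecmp
import os
import tempfile


def tree_differences(left, right):
    """Return a sorted list of relative paths that differ between two trees."""
    differences = []

    def walk(comparison, prefix):
        odd = comparison.left_only + comparison.right_only + comparison.common_funny
        for name in odd:
            differences.append(os.path.join(prefix, name))
        for name in comparison.common_files:
            # dircmp's own check is stat-based; compare the bytes to be sure.
            if not filecmp.cmp(os.path.join(comparison.left, name),
                               os.path.join(comparison.right, name),
                               shallow=False):
                differences.append(os.path.join(prefix, name))
        for name, sub in comparison.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    # ignore=[] stops dircmp from silently skipping names such as CVS or .git.
    walk(filecmp.dircmp(left, right, ignore=[]), "")
    return sorted(differences)


def demo():
    """Build two small trees, compare them, then change one and compare again."""
    original = tempfile.mkdtemp()
    target = tempfile.mkdtemp()
    for root in (original, target):
        os.mkdir(os.path.join(root, "src"))
        with open(os.path.join(root, "src", "main.c"), "w") as handle:
            handle.write("int main(void) { return 0; }\n")
    before = tree_differences(original, target)
    with open(os.path.join(target, "src", "main.c"), "a") as handle:
        handle.write("/* changed */\n")
    with open(os.path.join(target, "NEWS"), "w") as handle:
        handle.write("new file\n")
    return before, tree_differences(original, target)
```

As with diff, an empty result is the only passing outcome; any path in the list is a reason to look closer before releasing the patch.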
Properly applied, paranoia can be a very good thing. If your paranoia leads you to discover a bug in tardiff, please inform the author about it. In general, the more details you can provide, the better; but simply a description of whatever is known about a problem from the tests above is much better than nothing at all. Of course, the more specific a bug report is -- if this and that option cause the bug or not, for example -- the faster the bug can be isolated and eliminated. If you are worried that reporting a bug will label you as a paranoid person, don't; the author, as mentioned above, is quite paranoid, and it's not a bad thing to be.
- Successful completion
- The arguments didn't make sense, or some other problem occurred
- When the --zip option is used, tardiff will return zip's return code if zip was running at the time of the final error.
tardiff uses the following programs:
Used for everything when toTAR isn't given, but only required for generating script files when toTAR is given.
Used for creating and updating ZIP archives when the --zip option is used.
- Compression Support
To use any patch created, the user may use the following programs:
The generated archive is nothing more than a regular tar (or zip) archive with, possibly, a script or two attached to deal with deletions, duplications, and movements. Any program that is capable of undoing the compression used (by itself or by invoking an external program) and reading the tar archive's entries out to disk will work.
- tar (or unzip) - to extract the archive
- Compression Support - to decompress the archive (used automatically by tar when compression options are given)
- sh or cmd.exe (Windows) - to run the finish.sh or finish.bat script
I've heard that there are Windows programs capable of doing this, and I built a Windows self-extract header for gzip'd tar files myself; it really wasn't all that difficult to merge my own (years old) work on extracting uncompressed tar archives with bits and pieces of the gzip 1.2.4 source code, which is easily available.
Although actually applying a patch isn't that difficult, I do understand that some people might have trouble. To that end, feel free to point any users over here.
The next version of tardiff is not currently being worked on. This project has been shelved for the immediate future. However, any bugs reported will receive fixes as quickly as possible.
Various features have been requested, but are not currently being implemented:
- Some sort of interface with cvs (or cvsfs) to download changed files from cvs and automatically make a patch archive from them.
- Modifying tardiff so it could compare two checksum files and download the required changed files from cvs or via some other source (such as cvsfs).
- A means to generate a unified diff patch from the differences between two tar archives or between the file system and a tar archive.
- The case where toTAR isn't used could use a complete rewrite, so it could generate checksums out of originalTAR and the "equivalent" directories, find duplicates and movements between them, etc.
Finally, there are features which have been requested, but are probably not going to be implemented:
- Someone suggested diffing every textual file in the archive against every other textual file in the archive and choosing the smallest result. And the resulting patch was going to be applied on a Windows box how, exactly? (I've been thinking about this one, and yes, it's possible to do the comparisons, but...)
The first version of tardiff was completed before December 29, 2002 (by which time system backups show it was in active use). This version did not support toTAR or script generation (and therefore could not note removed files), and totalled a mere 6,353 bytes. As it depended entirely on tar for the comparison of an archive to the current directory, as well as for generating the patch archive -- and as its only real task was to generate lists of files to include in the patch archive from the messages produced by tar -- this version of tardiff worked properly on all versions of perl at or after 5.005_03 (i.e. the version on the author's system). Unfortunately, it had some difficulty working with a version of tar newer than 1.13, wherein tar directed some of its results to standard error rather than standard output, bypassing the pipe; and the options accepted by tar to indicate a bzip2 archive apparently changed from y (on the Slackware version of tar used by the author) to I and, later, to j. This version escaped into the wild on or around July 28, 2004. Unfortunately, if the author was asked about this, he has since forgotten about it...
This initial version contained no copyright notices or version information; after all, why would they be needed on a program that would only be used in one place? And besides, the initial version was not very difficult, nor (in the author's opinion) new; it was simply what anyone presented with the problem probably would have written; why even make the pretense of having copyright on something that, in the author's opinion, did not even merit a copyright at all?
Perhaps inevitably, it was not long before someone asked for a new feature. On or around July 29, 2004, Boris Koenig requested what have become toTAR support, checksums, removal scripts, and (although the author didn't really realize it when he found out about this request) fixes for problems with newer versions of tar. He also requested the ability for tardiff to interface with cvs to generate a patch file from a CVS version, which has not (yet) been implemented. The next day, he also suggested that tardiff should identify moved or duplicated files (now implemented). Beginning around the 31st of July, a long e-mail exchange about the utility and possible means of implementation commenced. (At some point within this exchange, he mentioned that he was "looking for an EXE-stub to TGZ files" that could automatically run batch files, etc.... so I [thinking that he was asking me to write one] wrote one out of an old OS/2 untar program I wrote and the gzip source, significantly snipped down to size, finished on August 9, 2004. As it turns out, it sounds like such stubs already exist... I suppose he mentioned as much in an e-mail the day after he suggested it, and I just missed it...?)
On July 31, 2004, Arnt Karlsen brought up an interoperability problem between tardiff and tar: the y (for bzip2) option. Until this message, the author had absolutely no idea that (apparently) all mainstream versions of tar use either I (in the older versions) or j (in the newer versions) for this option. It appears that this change (y) was part and parcel of the Slackware distribution used by the author (?). In response, tardiff was modified to "discover" the proper option to use by experimentation; however, if tar ever changes its j option to another letter, and starts using either of I or y, this approach will no longer work. (If I and y remain unused, at least an error will be reported, rather than trying to use an inappropriate option!)
The next version released ("v2.0") was much more complex -- in order to compare two archives without unpacking them first, the script had to do the comparison work on its own. Written on a system with perl 5.005_03, it produced unusable archives on any version of perl newer than 5.006, though the author was not to find this out until after it was released on or around August 1, 2004. (The cause was perl's then-new Unicode support: perl treated every string read from or written to a file as possibly containing Unicode characters, which led to miscounting and re-encoding of bytes that, according to the manual pages, should have been left untouched.) This version, now 40,067 bytes, included a better syntax help message, the "discovery" response to tar's changing bzip2 option, checksums (md5sum or sum only), TAR to TAR comparison, top-level "speculation", and removal script generation.
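The byte-versus-character problem described above can be illustrated with a minimal sketch. It is written in Python rather than perl and is only an analogy, not tardiff's code; the names pad_to_block and BLOCK_SIZE are hypothetical. It shows how data that is treated as text and then re-encoded no longer fits the 512-byte blocks a tar archive is built from.

```python
BLOCK_SIZE = 512  # tar archives are built from 512-byte blocks


def pad_to_block(data: bytes) -> bytes:
    """Pad raw file data with NUL bytes up to the next 512-byte boundary."""
    remainder = len(data) % BLOCK_SIZE
    if remainder:
        data += b"\x00" * (BLOCK_SIZE - remainder)
    return data


# 512 bytes with the high bit set: exactly one tar block of file data.
raw = bytes([0xE9]) * BLOCK_SIZE

# Treating the bytes as characters keeps the count at 512...
as_text = raw.decode("latin-1")

# ...but writing those characters back out as UTF-8 doubles the byte count,
# so any size recorded from the character count no longer matches the data.
reencoded = as_text.encode("utf-8")

print(len(raw), len(as_text), len(reencoded))
```

Keeping the data as raw bytes from read to write, as pad_to_block does, avoids the mismatch.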
The first hint of a problem (in hindsight) with generating patch archives using this newer version of tardiff was received from Boris Koenig on August 1, 2004, but the problem could easily have been his source archive... and, of course, the author couldn't replicate the problem at the time.
On August 1, 2004, Boris Koenig also suggested that the invocation format could use some work. This has resulted in the new "long argument form," as described above and available in this version.
On or around July 31, 2004, Boris Koenig suggested setting up a SourceForge account and project for tardiff. Later, on August 1, he said that he couldn't find anything like tardiff, which I still find somewhat hard to credit. Fortunately, he actually went out and registered the project for the author. By August 2, 2004, the tardiff SourceForge project was registered; and Boris originally populated it with the page from GeoCities, where tardiff was previously being made available, by September 4th. Even better, he sent a number of e-mails with instructions on how to access the SourceForge account! (Thanks!)
The SourceForge account turned out to be crucial in identifying and fixing the corruption bug, as they run a much newer version of perl than the author. Unfortunately, the author didn't do much with the new account for a considerable length of time, being rather busy working on the requested new features and being unaware of there being any problem...
On August 3, 2004, Boris (in using tardiff on the SourceForge shell account) noted odd differences between the versions of the patch archive created on his system and those created on SourceForge. In retrospect, if he had run "tar ztvf" against those archives, the bug would have been glaringly obvious at that point (i.e. a fatal error message, not far into the archive), and perhaps it would have been fixed sooner. The author should have asked for this check in his follow-up e-mail about these differences, but didn't. On the other hand, it would have been quite reasonable to believe that the difference was entirely due to Boris's use of a CVS version on his local system, and the official archive on SourceForge... The author's initial questions centered around the arguments used with tardiff and any checksum files generated, which was reasonable (even in hindsight).
The author finally got to trying to use the SourceForge shell account on August 9, 2004, and began testing tardiff on that account on August 11, 2004. The author quickly discovered the problem (thanks to the report of odd differences) and had it fixed by August 12, 2004.
Unfortunately, the fixes make tardiff useless on older versions of perl, so an install script needed to be written to deal with the situation. A complete release (v2.0.1) and the first version of this web page were made available on August 14, 2004.
Over the following month and a half, the author continued working on a duplication and movement scanner and handler. During this time, the author finally found the time to read through some of the archives of the FlightGear-developers mailing list, which has sparked a few ideas in the author's mind (hmmm, let's build a GUI, a scripting language, ... onto my Windows tar.gz extraction stub!) that will, fortunately for the author, most emphatically NOT be implemented any time soon. (Although some people around here were rather worried about this for a while...)
Around August 16, 2004, the author started hearing of potential problems with the algorithms used for md5, and so support for using other programs to generate checksums was added to tardiff. For good measure, the author also added the ability to run several checksum programs against each file, as well as various other changes to checksum generation. These changes don't affect the format of the generated checksum files except when more than one checksum program is used, so older checksum files are still usable without problems. By this time, the long argument form and the change to the created scripts' names had already been implemented.
Within the first few weeks of this period, the means by which tardiff recognizes a compressed source archive was changed; tardiff no longer relies on the filename because it now checks the beginning of the file instead.
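Checking the beginning of a file for a compression signature can be sketched as follows. This is a Python illustration, not tardiff's perl code; the function names and the returned labels are hypothetical, and which formats tardiff actually probes for is an assumption. The signature bytes themselves are the well-known ones for gzip, bzip2, and compress.

```python
# Leading "magic" bytes of common compressed formats.
MAGIC = (
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"\x1f\x9d", "compress"),
)


def sniff_compression(head: bytes):
    """Return a label for the compression format, or None if unrecognized."""
    for magic, name in MAGIC:
        if head.startswith(magic):
            return name
    return None


def sniff_file(path):
    """Read only the first few bytes of a file and classify them."""
    with open(path, "rb") as fh:
        return sniff_compression(fh.read(4))
```

With this approach a gzip'd archive named patch.tar is still recognized as gzip, and an uncompressed archive named patch.tar.gz is not mistaken for one.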
On September 11, 2004, Boris (after being silent for a month) mentioned that it might be useful for tardiff to include the version used to create a patch in that patch's shell script, as a comment. This was simple to implement; obviously, though, a patch that doesn't include a shell script won't include any information on the version of tardiff used to generate it.
On September 16, 2004, Erik Hofman and Arnt Karlsen expressed concerns about users trying to apply patches incorrectly, and Boris wondered if tardiff could check itself more. By September 20, 2004, these resulted in the creation of the Applying Patches page and the Paranoia section, above. Additionally (and perhaps more worrying), the author considered adding some means of pre-patch application verifier to his Windows tar.gz extraction stub... but (wisely?) decided that, if this sort of functionality is going to be added, it should be added at the same time as the GUI, etc.; which means, not any time soon.
Around September 20, 2004, someone asked me to add an option for having the generated scripts present a prompt for the user before deleting any files. It was a simple request, and quickly implemented. By this point, an additional change had been made to the scripts generated -- the Unix shell script will finish by erasing itself and the equivalent Windows batch file (if that file was also created by tardiff). The Windows batch file will erase the equivalent Unix shell script, but not itself (the last time the author tried having Windows erase the batch file it was running, Windows requested that the disk containing the file be reinserted...).
On October 4, 2004, handling mechanisms for circular movement dependencies were implemented and tested. Final touches for the release of version 2.1 of tardiff were completed on October 5, 2004.
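A circular movement dependency arises when, for example, a is renamed to b while b is renamed to a: neither rename can go first without destroying a file that is still needed. One common way to handle this is sketched below in Python. It is not necessarily how tardiff resolves cycles; the function name and the temporary file name are hypothetical, and the sketch assumes every source and every destination appears only once.

```python
def order_moves(moves, temp_name="tardiff.tmp"):
    """Order renames so no step overwrites a file that still has to be moved.

    moves maps each source name to its destination name.
    Returns a list of (source, destination) rename steps.
    """
    # Ignore files that do not actually move.
    pending = {src: dst for src, dst in moves.items() if src != dst}
    steps = []
    while pending:
        # A move is safe when its destination is not a pending source.
        safe = [src for src, dst in pending.items() if dst not in pending]
        if safe:
            for src in safe:
                steps.append((src, pending.pop(src)))
            continue
        # No safe move is left, so the remaining moves form cycles.
        # Break one cycle by parking a file under the temporary name.
        src = next(iter(pending))
        dst = pending.pop(src)
        steps.append((src, temp_name))
        pending[temp_name] = dst
    return steps


print(order_moves({"a": "b", "b": "a"}))
```

For the two-file swap, the steps come out as: park a under the temporary name, rename b to a, then rename the temporary file to b.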
By October 7, 2004, the author realized that he had made a couple of mistakes in version 2.1. One was the in-program help -- the short versions of two options weren't mentioned (although, strictly speaking, the help did say that there were other options available...). The other was in the handling of --duplicates in conjunction with --no-movement; though (in some sense) not a bug, the effect was unintended. Fortunately, these were quickly fixed.
While examining the problems with tar files containing long filenames and long symbolic links, and the known problems comparing differing object types, the author decided that these should be easy to fix, and so delayed releasing an immediate fix for the new problems. He did, however, update the web page to reflect those mistakes.
By the end of the day on October 7, 2004, the fixes for tardiff's trouble dealing with long filenames, long symbolic links, and differing object type comparisons were implemented, but not yet tested. Additionally, tardiff now synthesizes a "checksum" for non-file, non-directory objects; this should allow tardiff to compare these objects even when nothing but the checksum file is available as source data. To round out users' anticipated needs in checksum files, the author also added a commentary line to the beginning of the checksum files generated when custom checksum methods are given -- the line is literally the custom method information, and (being at the top of the file) suitable for quickly discovering the custom method that should be used with these files. tardiff does not attempt to enforce the method stored, however.
During testing, late on October 7, 2004, the author discovered another silly mistake -- the inclusion of some debug output when generating Unix shell scripts. This was, fortunately, very easy to track down and eliminate -- after all, it was meant to be easy to find!
Testing of the fix was completed before midnight on October 7, 2004, and version 2.1.1 was released that night.
On October 10, 2004, the author added support for creating ZIP files when the toTAR argument isn't used, and had a plan for supporting the creation of ZIP files when toTAR is used. Enhancements to archive modifications done when toTAR isn't used were also completed.
By October 11, 2004, the author had evidence that the method he had envisioned for supporting ZIP file creation when toTAR is used was hopeless. In practice, it was workable (though with some humorous consequences) with version 2.2 of zip (the version on the author's computer) but it doesn't work at all with version 2.3 of zip (the version in use on SourceForge). Finally, a compromise of sorts was decided upon -- tardiff does support the creation of ZIP files with toTAR, but this is done by first extracting the files of interest. In any case, only regular files, directories, and symbolic links will be written into ZIP files.
On October 12, 2004, the modifications for allowing the tardiff user to use zip when toTAR is in use were usable, but the author is still uncertain if symbolic links should be handled differently in this case. The author is considering expanding ZIP handling to add extra commands to the generated scripts to deal with symbolic links. How other kinds of objects (pipes, sockets, devices...) could be handled by the script on a Windows system is, unfortunately, beyond the author. Version 2.1.2 was released on this day.
On December 29, 2004, a user discovered that tardiff added quotation marks around the name of each uncompressed TAR archive created from two TAR archives. This problem was fixed in version 2.1.3, released the same day.
On April 21, 2005, it was discovered that tardiff-generated patch archives that tried to remove or rename files with spaces in their names (via the finish.sh script only) didn't work as expected. The 2.1.4 version causes tardiff to appropriately encase these file names in quotes, solving this problem for newly created patch archives.
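The effect of quoting can be shown with a small Python sketch that builds the kind of lines a finish.sh script might contain. The exact commands and quoting style tardiff emits are not given here, so the rm and mv forms and the function names below are assumptions; shlex.quote is the standard Python helper for quoting a single shell word.

```python
import shlex


def removal_line(path):
    """Build a shell line that removes one file, quoting the name if needed."""
    return "rm -f -- " + shlex.quote(path)


def rename_line(src, dst):
    """Build a shell line that renames one file, quoting both names."""
    return "mv -- {} {}".format(shlex.quote(src), shlex.quote(dst))


print(removal_line("docs/read me.txt"))
print(rename_line("old name.txt", "new name.txt"))
```

Without the quotes, the shell would split "read me.txt" into two arguments and try to remove two files that don't exist.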
On June 2, 2005, it was confirmed that an unknown TAR file creator (not any version of GNU tar, as far as the author can tell) doesn't include the "ustar" signature in its file headers in the created TAR archive. As a result, tardiff treats all these blocks as invalid. Additionally, very old versions of tar (1.12 confirmed, 1.13 is fine) generate spaces (hex 20) instead of NULL bytes (hex 00) in some parts of the file headers, and tardiff wasn't set up to deal with this. This problem hasn't been fixed yet; the author has a potential fix written, but it could cause other problems with valid TAR archives. The author fears that sparse TAR archives, in particular, could be improperly dealt with under the current "fix"; but there could easily be other problems...
Update: Fixing the problem with older tar archives isn't difficult, and I think I have a "solution" of sorts for archives without the file header signatures. I intend to make it a fatal error when one of the archives does not have a signature in its first file header, presenting an error message and a switch to turn off the warning. When this switch is activated, any archives that are missing the first file header's signature will not expect any signature in further headers in that archive, but an archive that includes a signature in the first header will expect one in all subsequent headers. (If the unknown program can "append to" a tar-generated archive, archives so affected will still have problems, though... so perhaps this isn't the best solution.)
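The proposed behaviour can be sketched in Python as follows. This is an illustration of the idea only, not tardiff's code: the function names and the allow_unsigned switch are hypothetical. The sketch relies on two facts about the tar format: file headers are 512-byte blocks, and the "ustar" signature, when present, starts at byte offset 257.

```python
def has_ustar_magic(header: bytes) -> bool:
    """True if a 512-byte file header carries the "ustar" signature."""
    return header[257:262] == b"ustar"


def parse_octal(field: bytes) -> int:
    """Parse a numeric header field padded with NUL bytes or spaces."""
    text = field.strip(b" \x00")
    return int(text.decode("ascii"), 8) if text else 0


def make_header_validator(first_header: bytes, allow_unsigned: bool = False):
    """Decide from the first header whether later headers need a signature."""
    signed = has_ustar_magic(first_header)
    if not signed and not allow_unsigned:
        raise ValueError("first file header has no ustar signature")

    def is_valid(header: bytes) -> bool:
        # A signed archive must stay signed; an unsigned one is not checked.
        return has_ustar_magic(header) if signed else True

    return is_valid
```

parse_octal covers the older-tar case as well, since it accepts both space-padded and NUL-padded fields.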
Click here to read a discussion on this problem. If you have an opinion, please let me know! Use the author's address below.
Last updated 2005 June 6