help-tar@gnu.org • 6 October 2009 • 10:06PM -0400

[Help-tar] tar + lbzip2 proposal
by ERSEK Laszlo


Dear GNU Tar Maintainer,


here's an idea for adding support for lbzip2 and other parallel bzip2
implementations to GNU Tar. I'm asking for your opinion. I'm willing to
implement the suggested functionality if you accept the proposal (after
the necessary amendments, of course).

In my preliminary understanding, the name of the compression program to use
is pointed to by the "use_compress_program_option" global variable. This
variable can be set up in a multitude of ways:

1 -j / --bzip2 and the like set it up by a call to
   set_use_compress_program_option() with a fixed argument -- specifying more
   than one distinct value via this function (e.g. with -j -z) makes tar exit
   with an error.

2 -I / --use-compress-program allows the user to specify the argument to
   set_use_compress_program_option() directly.

3 -a / --auto-compress selects the compression program at archive creation
   time from the name suffix of the file-to-be-created, by calling
   set_compression_program_by_suffix(). If this attempt fails because of an
   unknown suffix, then tar doesn't override a compression program specified
   otherwise (see 1 and 2 above). Thus, when creating an archive, if both -j
   (or --use=bzip2) and -a are specified, and -f has argument "file.tar.gz",
   then -a takes precedence and gzip will be selected. If -f has argument
   "file.tar.qqq", then -j takes effect.

4 If the user didn't specify a compression program via methods 1 or 2, then
   at testing/extraction time tar selects the compression program according
   to the magic signature stored in the file. If that fails, tar falls back
   to the suffix-based method.

   open_compressed_archive()
     -> compress_program()
       -> magic[].program
     -> set_compression_program_by_suffix()
       -> find_compression_program()
         -> compression_suffixes[].program
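
To make the interplay of cases 1 to 3 concrete, here is a minimal,
self-contained sketch in C. It is not tar's code: the "_sketch" names, the
two-entry suffix table and the -1 return value (real tar exits with an error
instead) are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for tar's global of the same name; NULL means "nothing chosen". */
const char *use_compress_program_option = NULL;

/* Cases 1 and 2: record the program. Returns 0 on success and -1 when a
   different program was already chosen (hypothetical error convention). */
int set_program_sketch(const char *program)
{
  assert(program != NULL);
  if (use_compress_program_option != NULL
      && strcmp(use_compress_program_option, program) != 0)
    return -1;
  use_compress_program_option = program;
  return 0;
}

/* Hypothetical, shortened suffix table. */
struct suffix_map { const char *suffix; const char *program; };
static const struct suffix_map suffix_map_sketch[] = {
  { "gz", "gzip" },
  { "bz2", "bzip2" },
};

/* Return the program for the last suffix of NAME, or NULL if unknown. */
const char *program_by_suffix_sketch(const char *name)
{
  const char *dot = strrchr(name, '.');
  size_t i;

  if (dot == NULL)
    return NULL;
  for (i = 0; i < sizeof suffix_map_sketch / sizeof suffix_map_sketch[0]; i++)
    if (strcmp(dot + 1, suffix_map_sketch[i].suffix) == 0)
      return suffix_map_sketch[i].program;
  return NULL;
}

/* Case 3: with -a, a known suffix wins; otherwise keep the explicit choice. */
const char *select_for_create_sketch(int auto_compress, const char *archive_name)
{
  if (auto_compress)
    {
      const char *p = program_by_suffix_sketch(archive_name);
      if (p != NULL)
        return p;
    }
  return use_compress_program_option;
}
```

After set_program_sketch("bzip2"), select_for_create_sketch(1, "file.tar.gz")
yields "gzip", while select_for_create_sketch(1, "file.tar.qqq") falls back
to "bzip2", matching the example in case 3.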

This list is possibly incomplete and/or inaccurate. It would be important to
identify all write access sites to "use_compress_program_option"; please
verify the list! Thank you.

The array "compression_suffixes" could be static, I think, just like "magic"
is.

In general, --use-compress-program cannot be added to TAR_OPTIONS.


Proposal:

* Introduce new global variable "bzip2_filter", with default value "bzip2".
   The variable has type "const char *".

* Introduce new command line option "--bzip2-filter" to change the value of
   the variable "bzip2_filter". The option therefore requires an argument. The
   option can be passed only once on the command line and only before setting
   "use_compress_program_option" in any way.

* The character array pointed to by "bzip2_filter" lives in either static
   storage (default "bzip2") or automatic storage (parameter to main()). It
   can't be modified or freed.

* Modify case 1 (-j / --bzip2) to pass the value of "bzip2_filter" to
   set_use_compress_program_option(), instead of a fixed "bzip2" string.

* Case 2 is unchanged.

* Change the compress_program() macro definition into a real static function
   that handles the bz2 magic value as an exception, and returns the value of
   "bzip2_filter". The strings currently returned by compress_program() from
   magic[] also have static storage class.

* Change set_compression_program_by_suffix() to handle the bzip2 suffixes as
   exceptions, and to return the value of "bzip2_filter". The strings
   currently returned by this function from compression_suffixes[] also have
   static storage class.

* Due to the last two points, the auto-selection methods in 3 and 4 will use
   the program passed by --bzip2-filter (or bzip2 by default) where bzip2 is
   auto-selected now.

* As development advances, more and more multi-threaded alternatives might be
   added to tar, with --gzip-filter for pigz, for example. Once the
   exceptions in compress_program() and set_compression_program_by_suffix()
   start to proliferate, flat tables would become desirable again, i.e.
   extending the current magic[] and compression_suffixes[] arrays with
   pointers to global variables, each holding the selected alternative for
   that family of compression. Maybe this is the preferred way to start out
   with even now.

* Usage: user prepends "--bzip2-filter=lbzip2" to her TAR_OPTIONS.

* On Debian, the tar source could be patched so that "bzip2_filter"
   defaults to "/etc/alternatives/bzip2-filter", which would be a symlink to
   /bin/bzip2 by default. Packages like "lbzip2" and "pbzip2" would add
   alternatives.


I'm greatly interested in your opinion,
thanks,
lacos


