> On Sat, Jun 9, 2012 at 2:32 PM, David Kastrup <dak@gnu....> wrote:
> One principal distinguishing feature, like with a Scheme
> hashtable, is the ability to grow on-demand.
> Scheme/Guile vectors are fixed size.
> It is a bit of a nuisance that one can grow a hashtable
> efficiently and on-demand, but not so an array.
> After all, there already _is_ such a mechanism underlying hash
> tables so it seems somewhat peculiar not to have it available for
> vectors as well.
> I don't know how much you know about data structures,
I do list the various implementations and details.
> and I must confess I'm not very educated on Guile or Lua
And I do list the details here. Since I do it in free prose, chances
are that I am not just quoting material I have not understood.
> Based on what you are writing I would assume that the scheme
> hashtables aren't growable in the same way as a vector has to be
> grown.
I don't see anything supporting this assumption in what I wrote. Nor in
5.6.12 Hash Tables
Hash tables are dictionaries which offer similar functionality as
association lists: They provide a mapping from keys to values. The
difference is that association lists need time linear in the size of
elements when searching for entries, whereas hash tables can normally
search in constant time. The drawback is that hash tables require a
little bit more memory, and that you can not use the normal list
procedures (*note Lists::) for working with them.
Guile provides two types of hashtables. One is an abstract data type
that can only be manipulated with the functions in this section. The
other type is concrete: it uses a normal vector with alists as
elements. The advantage of the abstract hash tables is that they will
be automatically resized when they become too full or too empty.
-- Scheme Procedure: make-hash-table [equal-proc hash-proc #:weak
Create and answer a new hash table with EQUAL-PROC as the equality
function and HASH-PROC as the hashing function.
As a legacy of the time when Guile couldn't grow hash tables,
START-SIZE is an optional integer argument that specifies the
approximate starting size for the hash table, which will be
rounded to an algorithmically-sounder number.
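For concreteness, basic use of the abstract (auto-resizing) table type
from the section quoted above looks like this — a minimal sketch,
assuming a Guile where make-hash-table needs no arguments (1.8-style);
hash-ref/hash-set! are the standard equal?-based accessors:

```scheme
;; Minimal sketch of the abstract hash-table type quoted above;
;; the table resizes itself as entries are added.
(define t (make-hash-table))       ; optional start-size left out
(hash-set! t "apple" 1)            ; equal?-based key comparison
(hash-set! t "pear" 2)
(hash-ref t "apple")               ; => 1
(hash-ref t "plum" 'not-found)     ; default returned for absent keys
```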
> The number of elements in a hashtable isn't limited by its "size".
> They are often implemented with each position (where the hashtable's
> size is the number of positions) being a linked list, giving the
> hashtable (in theory) limitless actual size.
However, if the number of hash buckets is not grown along with the
number of entries, hashtable access is O(n) in cost rather than O(1)
since after the initial split into hash buckets, the cost is that of
linear search. This is the difference in behavior between hashtables in
Guile 1.4 (?) with fixed size, and hashtables in 1.6+ with variable
size.
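A sketch of what goes wrong with a frozen bucket count (helper names
table-set!/table-ref are hypothetical; `hash' is Guile's equal?-based
hash function):

```scheme
;; Sketch: a table whose bucket count never grows.  With n entries in a
;; fixed number of buckets, each bucket's alist holds about n/size
;; entries, so table-ref degenerates into linear search: O(n), not O(1).
(define size 8)                          ; frozen forever
(define buckets (make-vector size '()))

(define (table-set! key val)
  (let ((i (hash key size)))             ; bucket index in [0, size)
    (vector-set! buckets i
                 (acons key val (vector-ref buckets i)))))

(define (table-ref key)
  (assoc-ref (vector-ref buckets (hash key size)) key))
```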
> Growing a vector/array involves having to allocate new contiguous
> memory and copying all the elements there, so for example in C++ (I
> think) the std::vector is increased by half its current size each
> time, meaning that the more expensive the copying gets, the more
> elements you can insert into the vector before it has to resize.
Sure: since the growth happens with exponential backoff, the total cost
for n insertions is O(n), i.e. amortized O(1) per insertion.
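For the record, that doubling scheme is easy to sketch on top of a plain
Guile vector (cap/len/buf/push! are hypothetical names, not an existing
API):

```scheme
;; Sketch of a growable vector with capacity doubling.  A resize copies
;; all len elements into fresh contiguous storage, but since capacity
;; doubles each time, the total copying over n pushes stays below 2n:
;; O(n) overall, amortized O(1) per push.
(define cap 4)
(define len 0)
(define buf (make-vector cap #f))

(define (push! x)
  (if (= len cap)                          ; full: allocate and copy
      (let ((new (make-vector (* 2 cap) #f)))
        (do ((i 0 (1+ i))) ((= i len))
          (vector-set! new i (vector-ref buf i)))
        (set! buf new)
        (set! cap (* 2 cap))))
  (vector-set! buf len x)
  (set! len (1+ len)))
```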
> I would assume it wouldn't be that difficult to implement a pretty
> efficient growable vector for scheme.
Since that already is what is used internally in hashtables it can't be
difficult... The advantage of growing a hashtable is that you don't
have waste: if you double the size of a hashtable, it means that you
split each bucket in two, and potentially any bucket after the split can
contain new data. In contrast, after a similar vector resize, half of
the slots are _guaranteed_ to be empty. You can reduce the waste by
using less than exponential backoff, but then the total cost is no
longer O(n): growing by a fixed increment, for instance, makes the
copying cost O(n^2) overall.
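The rehash step behind that difference can be sketched like this
(grow-table is a hypothetical helper, operating on a vector of alists):

```scheme
;; Sketch of the rehash step when a table's bucket count doubles from m
;; to 2m: every entry is redistributed over all 2m buckets, so any new
;; bucket may immediately receive data -- unlike the top half of a
;; freshly doubled vector, which starts out necessarily empty.
(define (grow-table buckets)             ; buckets: vector of alists
  (let* ((m (vector-length buckets))
         (new (make-vector (* 2 m) '())))
    (do ((i 0 (1+ i))) ((= i m) new)
      (for-each
       (lambda (kv)
         (let ((j (hash (car kv) (* 2 m))))  ; rehash into [0, 2m)
           (vector-set! new j (cons kv (vector-ref new j)))))
       (vector-ref buckets i)))))
```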
Anyway: your answer was based on the assumption that I did not do my
homework before asking, and that two people not reading documentation
might guess better than one person not reading documentation.
I hope I have now provided adequate coverage concerning this hypothesis
so that it should be possible to focus on the smaller set of remaining
issues.