classification
Title: Deprecate bsddb
Components: Extension Modules
Versions: Python 2.3
process
Status: closed
Resolution: accepted
Assigned To: skip.montanaro
Nosy List: drbits, gtk, jackjansen, loewis, skip.montanaro
Priority: normal
Keywords: patch

Created on 2002-05-07 03:46 by gtk, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Files
File name Uploaded Description
deprecate_bsddb.diff gtk, 2002-05-07 03:46 patch to deprecate bsddb and remove from anydbm's list of candidates
bsddb.diff skip.montanaro, 2002-06-14 03:33
Messages (13)
msg39908 - Author: Garth T Kidd (gtk) Date: 2002-05-07 03:46
Large numbers of inserts break bsddb, as first 
discovered in Python 1.5 (bug 408271). 

According to Barry Warsaw, "trying to get the bsddb 
module that comes with Python to work is a hopeless 
cause." 

If it's broken, let's discourage people from using it. 
In particular, let's ensure that people importing 
shelve or anydbm don't end up using it by default. 

The submitted patch adds a DeprecationWarning to the 
bsddb module and removes bsddb from the list of db 
module candidates in anydbm. 
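
A minimal sketch of the two ideas in the patch description, not the patch itself: emit a DeprecationWarning when a module is used, and pick the first importable backend from a candidate list that leaves bsddb out. The names `CANDIDATES`, `warn_deprecated` and `first_available` are hypothetical, the candidate names are assumptions rather than anydbm's actual list, and the code is written for a current Python rather than the 2.x interpreter discussed here.

```python
import importlib
import warnings

# Hypothetical candidate list with bsddb left out; the real names and
# their order in anydbm may differ.
CANDIDATES = ["gdbm", "dbm", "dumbdbm"]


def warn_deprecated(module_name):
    """Emit a DeprecationWarning attributed to the caller of this function."""
    warnings.warn(
        "the %s module is deprecated" % module_name,
        DeprecationWarning,
        stacklevel=2,
    )


def first_available(candidates):
    """Return the first module in `candidates` that imports, else None."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


# Capture the warning instead of printing it, to show what gets raised.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_deprecated("bsddb")
```

With this shape, a caller that only asks for "any dbm" never lands on the deprecated backend, while code that imports it directly still works but sees the warning.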
msg39909 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2002-05-08 09:01
I'm in favour of this change, but I'd like to simultaneously
incorporate bsddb3.
msg39910 - Author: Garth T Kidd (gtk) Date: 2002-05-09 03:12
Let's not turn a simple patch into something requiring a 
PEP, compulsory thrashing on comp.lang.python, SleepyCat 
being willing to change their distribution model, lawyers 
(to make sure the licences are compatible), and so on. 

I'd hate it if other people spent the kind of time I did 
trying to get shelve to work only to find that a known-
broken bsddb was causing all the problems, and that a patch 
was there to gently guide them to gdbm, but it got jammed 
because of scope-creep. 

Let's get this one, very simple and necessary (bsddb IS 
broken) change out of the way, and THEN start negotiating, 
thrashing, and integrating. :) 

I firmly believe bsddb3 should be one of the included 
batteries. Let's do it, but let's guide people away from 
broken code first. 
msg39911 - Author: Martin D Katz, Ph.D. (drbits) Date: 2002-05-16 23:10
I am not sure there is a reason to deprecate bsddb. The 
btopen format appears to be stable enough for normal work. 
Maybe 2.3 should change dbhash to use btopen?
msg39912 - Author: Martin D Katz, Ph.D. (drbits) Date: 2002-05-20 18:14
#!/bin/python
# Test for Python bug report 553108
# This program shows that bsddb seems to work reliably with
# the btopen database format.

# This is based on the test program
# in the discussion of bug report 445862
# This has been enhanced to perform read, modify,
# write operations in random order.

# This is only one of several tests I performed.
# This included 4,000,000 read, modify, write operations to 90,909 records
# (an average of 44 writes for each record).
# Note: This program took approximately 50 hours to run
# on my 930MHz Pentium 3 under Windows 2000 with
# ActiveState Python version 2.1.1 build 212
import unittest, sys, os, math, time

LIMIT=4000000
DISPLAY_AT_END=1

USE_RANDOM=100  # If set, number of keys is approximately LIMIT/USE_RANDOM
AUTO_RANDOM=1
if USE_RANDOM and AUTO_RANDOM:
    USE_RANDOM=int(math.sqrt(math.sqrt(LIMIT)))
    if USE_RANDOM < 2:
        USE_RANDOM = 2
##  The format of the value string is
##      count|hash|hash...|b
##  Where
##      count is an 8 byte hexadecimal count of the number of times
##          this record has been written.
##      hash is the md5 hash of the random value that created this record.
##          It is the key for this record. It is appended once for each
##          time the record is written (that is, it occurs count times).
##      b is 129 '!'
## if USE_RANDOM is set, its value should be >= 2

class BreakDB(unittest.TestCase):
    def runTest(self):
        import md5, bsddb, os
        if USE_RANDOM:
            import random
            random.seed()
            max_key=int(LIMIT / USE_RANDOM)
        m = md5.new()
        b = "!" * 129       # small string to write
        db = bsddb.btopen(self.dbname, 'c')
        try:
            self.db = db
            for count in xrange(1, LIMIT+1):
                if count % 100==0:
                    print >> sys.stderr, " %10d\r" % (count),
                if USE_RANDOM:
                    r = random.randrange(0, max_key)
                    m = md5.new(str(r))
                    key = m.hexdigest()
                    if db.has_key(key):
                        rec = db[key]
                        old_count = int(rec[0:8], 16)
                        should_be = '%08X|%s%s'% (old_count,
                                                  ((key+'|')*old_count), b)
                        if rec != should_be:
                            self.fail("Mismatched data: db["+repr(key)+"]="+
                                repr(db[key])+". Should be "+repr(should_be))
                            return 1
                    else: # New record
                        rec = '00000000|'+b
                        old_count = 0
                    new_count = old_count+1
                    new_rec = '%08X|%s%s'% (new_count, key, rec[8:], )
                    db[key] = new_rec
                else:
                    m.update(str(count))
                    db[m.digest()] = b
            try:
                db.sync()
            except:
                pass
            if DISPLAY_AT_END:
                rec = db.first()
                count = 0
                while 1:
                    print >> sys.stderr, "  count = %6i db[%s]=%s" % (
                        count, rec[0], rec[1], )
                    count += 1
                    try:
                        rec = db.next()
                    except KeyError:
                        break
        finally:
            db.close()

    def unlinkDB(self):
        import os
        if os.path.exists(self.dbname):
            os.unlink(self.dbname)

    def setUp(self):
        self.dbname = 'test.db'
        self.unlinkDB()

    def tearDown(self):
        self.db.close()
        self.unlinkDB()

if __name__ == '__main__':
    runner = unittest.TextTestRunner()
    runner.run(unittest.TestSuite([BreakDB()]))
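
The record format that the test checks can be exercised without bsddb. The following is a minimal sketch, assuming a plain dict as a stand-in for the `bsddb.btopen()` handle and `hashlib` in place of the old `md5` module; `expected_record` and `rewrite` are hypothetical helper names, and the code targets a current Python rather than 2.1.

```python
import hashlib

PAYLOAD = "!" * 129  # same filler string as `b` in the test program


def expected_record(key, count):
    """Return the value a key should hold after `count` writes."""
    return "%08X|%s%s" % (count, (key + "|") * count, PAYLOAD)


def rewrite(store, key):
    """Perform one read, verify, modify, write step on `store`."""
    if key in store:
        rec = store[key]
        old_count = int(rec[0:8], 16)
        if rec != expected_record(key, old_count):
            raise AssertionError("Mismatched data for %r" % key)
    else:
        # New record: zero count followed by the payload only.
        rec = "00000000|" + PAYLOAD
        old_count = 0
    # Prepend the key once more and bump the 8-digit hex counter.
    store[key] = "%08X|%s%s" % (old_count + 1, key, rec[8:])
    return store[key]


store = {}  # stand-in for the database handle (an assumption of this sketch)
key = hashlib.md5(b"7").hexdigest()
for _ in range(3):
    rewrite(store, key)
```

Each write grows the record by one copy of the key, so any lost or truncated write shows up as a mismatch on the next read of that key.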
msg39913 - Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2002-06-11 16:09
I think deprecating bsddb is too drastic.  In the first place, the problems
you refer to are in the underlying Berkeley DB library, not in the bsddb
code itself.  In the second place, later versions of the library fix the
problem.

The attached patch attempts to modify setup.py and configure.in to
solve the problem.  It does a couple things differently than the current
CVS version:

  1. It only searches for versions 2 and 3 of the Berkeley DB library by
   default.  People who know what they are doing can uncomment the
   information relevant to version 1.

  2. It moves all the checking code into setup.py.  The header file checks
  in configure.in were deleted.

  3. The ndbm lookalike stuff for the dbm module is done differently.  This
  has not really been tested yet.  I anticipate further changes will be
  necessary with this code.

I'm sure it's not perfect.  Please give it a try and let me know how it
works for you.

All that said, I think a better migration path is to replace the current
module with the bsddb3/pybsddb stuff.  I think that would effectively
restrict you to versions 3 or 4 of the underlying Berkeley DB library, so
it probably couldn't be done with impunity. 

Skip
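
A rough sketch of the kind of check described in item 2, where the build script rather than configure looks for headers. This is not the code from the attached patch: `find_header` and `dbm_macros` are hypothetical names, the directory layout is invented for the demonstration, and the code targets a current Python. `HAVE_NDBM_H` is the define that configure used to set.

```python
import os
import tempfile


def find_header(name, dirs):
    """Return the first directory in `dirs` containing header `name`, else None."""
    for d in dirs:
        if os.path.exists(os.path.join(d, name)):
            return d
    return None


def dbm_macros(include_dirs):
    """Return (name, value) defines for the dbm module based on available headers."""
    if find_header("ndbm.h", include_dirs) is not None:
        return [("HAVE_NDBM_H", None)]
    return []


# Demonstration against a throwaway tree instead of real system paths.
with tempfile.TemporaryDirectory() as root:
    db3 = os.path.join(root, "db3")
    os.makedirs(db3)
    open(os.path.join(db3, "db.h"), "w").close()
    # Hypothetical search order; the first directory does not exist.
    search = [os.path.join(root, "db4"), db3]
    found_is_db3 = find_header("db.h", search) == db3
    macros = dbm_macros(search)  # no ndbm.h anywhere, so no define
```

Leaving a library version out of the search list is then just a matter of not listing its directories, which is how version 1 could stay opt-in.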
msg39914 - Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2002-06-13 07:35
Here's an updated patch.  It's different in a couple ways:

  * Support for Berkeley DB 4.x was added.  You will need to
    configure Berkeley DB with the 1.85 compatibility stuff.

  * I cleaned up the dbm build code a bit.

  * I added a diff for the configure file for people who don't
    have autoconf handy.

Skip
msg39915 - Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2002-06-14 03:33
A couple more tweaks... I forgot to include dbmmodule.c in
previous patches.  This version of the patch also includes a 
modified README file that adds a section about building the 
bsddb and dbm modules.
msg39916 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2002-06-14 07:16
The patch looks good, please apply it.
msg39917 - Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2002-06-14 20:32
Implemented in
  setup.py 1.93
  README 1.147
  configure 1.315
  configure.in 1.325
  pyconfig.h.in 1.42
  Modules/dbmmodule 2.30
msg39918 - Author: Jack Jansen (jackjansen) * (Python committer) Date: 2002-07-02 21:52
Skip,
I'm reopening this bug report: the fix breaks builds on Mac OS X, and I haven't a clue as to how to fix this so I hope you can help. MacOSX has /usr/include/ndbm.h (implemented with Berkeley DB, I think) but it doesn't have any of the libraries (I assume everything needed is in libc).

Everything worked fine until last week, when configure still took care of defining HAVE_NDBM_H.
msg39919 - Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2002-07-02 22:17
Jack,

Sorry to hear you're having trouble.  Alas, my MacOS X system is with 
my wife at the moment, so I can't dig into the problem much.  Can you 
provide me with some background info?  If you can send me your copy 
of ndbm.h (I doubt it's using Berkeley DB) and figure out which library 
dbm_open resides in, that would be great.  Also, can you provide me 
with the output of the build process so I can see just what errors are 
being generated?

Skip
msg39920 - Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2002-08-06 17:43
Closing this again.  I think Jack's running okay on MacOSX once again.
History
Date                 User   Action  Args
2022-04-10 16:05:18  admin  set     github: 36567
2002-05-07 03:46:04  gtk    create