Initial checkin of Planet WebKit
author aroben@apple.com <aroben@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Sat, 1 Dec 2007 01:37:47 +0000 (01:37 +0000)
committer aroben@apple.com <aroben@apple.com@268f45cc-cd09-0410-ab3c-d52691b4dbfc>
Sat, 1 Dec 2007 01:37:47 +0000 (01:37 +0000)
        Rubberstamped by Sam.

        * README: Added.
        * config.ini: Added.
        * planet/AUTHORS: Added.
        * planet/INSTALL: Added.
        * planet/LICENCE: Added.
        * planet/NEWS: Added.
        * planet/PKG-INFO: Added.
        * planet/README: Added.
        * planet/THANKS: Added.
        * planet/TODO: Added.
        * planet/examples/atom.xml.tmpl: Added.
        * planet/examples/basic/config.ini: Added.
        * planet/examples/basic/index.html.tmpl: Added.
        * planet/examples/fancy/config.ini: Added.
        * planet/examples/fancy/index.html.tmpl: Added.
        * planet/examples/foafroll.xml.tmpl: Added.
        * planet/examples/opml.xml.tmpl: Added.
        * planet/examples/output/images/edd.png: Added.
        * planet/examples/output/images/evolution.png: Added.
        * planet/examples/output/images/feed-icon-10x10.png: Added.
        * planet/examples/output/images/jdub.png: Added.
        * planet/examples/output/images/keybuk.png: Added.
        * planet/examples/output/images/logo.png: Added.
        * planet/examples/output/images/opml.png: Added.
        * planet/examples/output/images/planet.png: Added.
        * planet/examples/output/images/thom.png: Added.
        * planet/examples/output/planet.css: Added.
        * planet/examples/rss10.xml.tmpl: Added.
        * planet/examples/rss20.xml.tmpl: Added.
        * planet/planet-cache.py: Added.
        * planet/planet.py: Added.
        * planet/planet/__init__.py: Added.
        * planet/planet/atomstyler.py: Added.
        * planet/planet/cache.py: Added.
        * planet/planet/compat_logging/__init__.py: Added.
        * planet/planet/compat_logging/config.py: Added.
        * planet/planet/compat_logging/handlers.py: Added.
        * planet/planet/feedparser.py: Added.
        * planet/planet/htmltmpl.py: Added.
        * planet/planet/sanitize.py: Added.
        * planet/planet/tests/__init__.py: Added.
        * planet/planet/tests/data/simple.tmpl: Added.
        * planet/planet/tests/data/simple2.tmpl: Added.
        * planet/planet/tests/test_channel.py: Added.
        * planet/planet/tests/test_main.py: Added.
        * planet/planet/tests/test_sanitize.py: Added.
        * planet/planet/tests/test_sub.py: Added.
        * planet/planet/timeoutsocket.py: Added.
        * planet/runtests.py: Added.
        * planet/setup.py: Added.
        * templates/atom.xml.tmpl: Added.
        * templates/foafroll.xml.tmpl: Added.
        * templates/index.html.tmpl: Added.
        * templates/opml.xml.tmpl: Added.
        * templates/rss10.xml.tmpl: Added.
        * templates/rss20.xml.tmpl: Added.
        * wwwroot/images/feed-icon-10x10.png: Added.
        * wwwroot/images/planet.png: Added.
        * wwwroot/planet.css: Added.

git-svn-id: https://svn.webkit.org/repository/webkit/trunk@28268 268f45cc-cd09-0410-ab3c-d52691b4dbfc

60 files changed:
PlanetWebKit/ChangeLog [new file with mode: 0644]
PlanetWebKit/README [new file with mode: 0644]
PlanetWebKit/config.ini [new file with mode: 0644]
PlanetWebKit/planet/AUTHORS [new file with mode: 0644]
PlanetWebKit/planet/INSTALL [new file with mode: 0644]
PlanetWebKit/planet/LICENCE [new file with mode: 0644]
PlanetWebKit/planet/NEWS [new file with mode: 0644]
PlanetWebKit/planet/PKG-INFO [new file with mode: 0644]
PlanetWebKit/planet/README [new file with mode: 0644]
PlanetWebKit/planet/THANKS [new file with mode: 0644]
PlanetWebKit/planet/TODO [new file with mode: 0644]
PlanetWebKit/planet/examples/atom.xml.tmpl [new file with mode: 0644]
PlanetWebKit/planet/examples/basic/config.ini [new file with mode: 0644]
PlanetWebKit/planet/examples/basic/index.html.tmpl [new file with mode: 0644]
PlanetWebKit/planet/examples/fancy/config.ini [new file with mode: 0644]
PlanetWebKit/planet/examples/fancy/index.html.tmpl [new file with mode: 0644]
PlanetWebKit/planet/examples/foafroll.xml.tmpl [new file with mode: 0644]
PlanetWebKit/planet/examples/opml.xml.tmpl [new file with mode: 0644]
PlanetWebKit/planet/examples/output/images/edd.png [new file with mode: 0644]
PlanetWebKit/planet/examples/output/images/evolution.png [new file with mode: 0644]
PlanetWebKit/planet/examples/output/images/feed-icon-10x10.png [new file with mode: 0644]
PlanetWebKit/planet/examples/output/images/jdub.png [new file with mode: 0644]
PlanetWebKit/planet/examples/output/images/keybuk.png [new file with mode: 0644]
PlanetWebKit/planet/examples/output/images/logo.png [new file with mode: 0644]
PlanetWebKit/planet/examples/output/images/opml.png [new file with mode: 0644]
PlanetWebKit/planet/examples/output/images/planet.png [new file with mode: 0644]
PlanetWebKit/planet/examples/output/images/thom.png [new file with mode: 0644]
PlanetWebKit/planet/examples/output/planet.css [new file with mode: 0644]
PlanetWebKit/planet/examples/rss10.xml.tmpl [new file with mode: 0644]
PlanetWebKit/planet/examples/rss20.xml.tmpl [new file with mode: 0644]
PlanetWebKit/planet/planet-cache.py [new file with mode: 0755]
PlanetWebKit/planet/planet.py [new file with mode: 0755]
PlanetWebKit/planet/planet/__init__.py [new file with mode: 0644]
PlanetWebKit/planet/planet/atomstyler.py [new file with mode: 0644]
PlanetWebKit/planet/planet/cache.py [new file with mode: 0644]
PlanetWebKit/planet/planet/compat_logging/__init__.py [new file with mode: 0644]
PlanetWebKit/planet/planet/compat_logging/config.py [new file with mode: 0644]
PlanetWebKit/planet/planet/compat_logging/handlers.py [new file with mode: 0644]
PlanetWebKit/planet/planet/feedparser.py [new file with mode: 0644]
PlanetWebKit/planet/planet/htmltmpl.py [new file with mode: 0644]
PlanetWebKit/planet/planet/sanitize.py [new file with mode: 0644]
PlanetWebKit/planet/planet/tests/__init__.py [new file with mode: 0644]
PlanetWebKit/planet/planet/tests/data/simple.tmpl [new file with mode: 0644]
PlanetWebKit/planet/planet/tests/data/simple2.tmpl [new file with mode: 0644]
PlanetWebKit/planet/planet/tests/test_channel.py [new file with mode: 0755]
PlanetWebKit/planet/planet/tests/test_main.py [new file with mode: 0755]
PlanetWebKit/planet/planet/tests/test_sanitize.py [new file with mode: 0755]
PlanetWebKit/planet/planet/tests/test_sub.py [new file with mode: 0755]
PlanetWebKit/planet/planet/timeoutsocket.py [new file with mode: 0644]
PlanetWebKit/planet/runtests.py [new file with mode: 0755]
PlanetWebKit/planet/setup.py [new file with mode: 0755]
PlanetWebKit/templates/atom.xml.tmpl [new file with mode: 0644]
PlanetWebKit/templates/foafroll.xml.tmpl [new file with mode: 0644]
PlanetWebKit/templates/index.html.tmpl [new file with mode: 0644]
PlanetWebKit/templates/opml.xml.tmpl [new file with mode: 0644]
PlanetWebKit/templates/rss10.xml.tmpl [new file with mode: 0644]
PlanetWebKit/templates/rss20.xml.tmpl [new file with mode: 0644]
PlanetWebKit/wwwroot/images/feed-icon-10x10.png [new file with mode: 0644]
PlanetWebKit/wwwroot/images/planet.png [new file with mode: 0644]
PlanetWebKit/wwwroot/planet.css [new file with mode: 0644]

diff --git a/PlanetWebKit/ChangeLog b/PlanetWebKit/ChangeLog
new file mode 100644 (file)
index 0000000..71775fe
--- /dev/null
@@ -0,0 +1,66 @@
+2007-11-30  Adam Roben  <aroben@apple.com>
+
+        Initial checkin of Planet WebKit
+
+        Rubberstamped by Sam.
+
+        * README: Added.
+        * config.ini: Added.
+        * planet/AUTHORS: Added.
+        * planet/INSTALL: Added.
+        * planet/LICENCE: Added.
+        * planet/NEWS: Added.
+        * planet/PKG-INFO: Added.
+        * planet/README: Added.
+        * planet/THANKS: Added.
+        * planet/TODO: Added.
+        * planet/examples/atom.xml.tmpl: Added.
+        * planet/examples/basic/config.ini: Added.
+        * planet/examples/basic/index.html.tmpl: Added.
+        * planet/examples/fancy/config.ini: Added.
+        * planet/examples/fancy/index.html.tmpl: Added.
+        * planet/examples/foafroll.xml.tmpl: Added.
+        * planet/examples/opml.xml.tmpl: Added.
+        * planet/examples/output/images/edd.png: Added.
+        * planet/examples/output/images/evolution.png: Added.
+        * planet/examples/output/images/feed-icon-10x10.png: Added.
+        * planet/examples/output/images/jdub.png: Added.
+        * planet/examples/output/images/keybuk.png: Added.
+        * planet/examples/output/images/logo.png: Added.
+        * planet/examples/output/images/opml.png: Added.
+        * planet/examples/output/images/planet.png: Added.
+        * planet/examples/output/images/thom.png: Added.
+        * planet/examples/output/planet.css: Added.
+        * planet/examples/rss10.xml.tmpl: Added.
+        * planet/examples/rss20.xml.tmpl: Added.
+        * planet/planet-cache.py: Added.
+        * planet/planet.py: Added.
+        * planet/planet/__init__.py: Added.
+        * planet/planet/atomstyler.py: Added.
+        * planet/planet/cache.py: Added.
+        * planet/planet/compat_logging/__init__.py: Added.
+        * planet/planet/compat_logging/config.py: Added.
+        * planet/planet/compat_logging/handlers.py: Added.
+        * planet/planet/feedparser.py: Added.
+        * planet/planet/htmltmpl.py: Added.
+        * planet/planet/sanitize.py: Added.
+        * planet/planet/tests/__init__.py: Added.
+        * planet/planet/tests/data/simple.tmpl: Added.
+        * planet/planet/tests/data/simple2.tmpl: Added.
+        * planet/planet/tests/test_channel.py: Added.
+        * planet/planet/tests/test_main.py: Added.
+        * planet/planet/tests/test_sanitize.py: Added.
+        * planet/planet/tests/test_sub.py: Added.
+        * planet/planet/timeoutsocket.py: Added.
+        * planet/runtests.py: Added.
+        * planet/setup.py: Added.
+        * templates/atom.xml.tmpl: Added.
+        * templates/foafroll.xml.tmpl: Added.
+        * templates/index.html.tmpl: Added.
+        * templates/opml.xml.tmpl: Added.
+        * templates/rss10.xml.tmpl: Added.
+        * templates/rss20.xml.tmpl: Added.
+        * wwwroot/images/feed-icon-10x10.png: Added.
+        * wwwroot/images/planet.png: Added.
+        * wwwroot/planet.css: Added.
+
diff --git a/PlanetWebKit/README b/PlanetWebKit/README
new file mode 100644 (file)
index 0000000..6100f1f
--- /dev/null
@@ -0,0 +1,13 @@
+This directory contains files for Planet WebKit <http://planet.webkit.org/>.
+
+Some directories/files of interest are:
+
+* config.ini
+    Contains the configuration for Planet WebKit
+* planet/
+    Contains the Planet 2.0 software <http://planetplanet.org/>
+* templates/
+    Contains the templates used to build the HTML and feeds for Planet WebKit
+* wwwroot/
+    Serves as the document root on http://planet.webkit.org/
+
diff --git a/PlanetWebKit/config.ini b/PlanetWebKit/config.ini
new file mode 100644 (file)
index 0000000..1541814
--- /dev/null
@@ -0,0 +1,102 @@
+# Planet configuration file
+#
+# This illustrates some of Planet's fancier features by example.
+
+# Every planet needs a [Planet] section
+[Planet]
+# name: Your planet's name
+# link: Link to the main page
+# owner_name: Your name
+# owner_email: Your e-mail address
+name = Planet WebKit
+link = http://planet.webkit.org/
+owner_name = Adam Roben
+owner_email = aroben@apple.com
+
+# cache_directory: Where cached feeds are stored
+# new_feed_items: Number of items to take from new feeds
+# log_level: One of DEBUG, INFO, WARNING, ERROR or CRITICAL
+# feed_timeout: number of seconds to wait for any given feed
+cache_directory = cache
+new_feed_items = 2
+log_level = DEBUG
+feed_timeout = 20
+
+# template_files: Space-separated list of output template files
+template_files = templates/index.html.tmpl templates/atom.xml.tmpl templates/rss20.xml.tmpl templates/rss10.xml.tmpl templates/opml.xml.tmpl templates/foafroll.xml.tmpl
+
+# The following provide defaults for each template:
+# output_dir: Directory to place output files
+# items_per_page: How many items to put on each page
+# days_per_page: How many complete days of posts to put on each page
+#                This is the absolute, hard limit (over the item limit)
+# date_format: strftime format for the default 'date' template variable
+# new_date_format: strftime format for the 'new_date' template variable
+# encoding: output encoding for the file; Python 2.3+ users can use the
+#           special "xml" value to output ASCII with XML character references
+# locale: locale to use for (e.g.) strings in dates; the default is taken from
+#         your system. You can specify multiple locales separated by ':';
+#         planet will use the first available one
+output_dir = wwwroot
+items_per_page = 60
+days_per_page = 0
+date_format = %B %d, %Y %I:%M %p
+new_date_format = %B %d, %Y
+encoding = utf-8
+# locale = C
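The two strftime patterns configured above produce output like the following. A quick illustration with Python's standard datetime (the timestamp is an arbitrary example, not from the feed data):

```python
from datetime import datetime

# Arbitrary example timestamp, formatted with the config's two patterns.
stamp = datetime(2007, 11, 30, 17, 37)
date_value = stamp.strftime("%B %d, %Y %I:%M %p")  # the 'date' variable
new_date_value = stamp.strftime("%B %d, %Y")       # the 'new_date' variable
print(date_value)      # November 30, 2007 05:37 PM
print(new_date_value)  # November 30, 2007
```

Month and AM/PM names follow the configured locale, which is why the commented-out `locale` option above matters for non-English planets.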
+
+
+# To define a different value for a particular template you may create
+# a section with the same name as the template file's filename (as given
+# in template_files).
+
+# Provide no more than 14 days' articles on the front page
+[templates/index.html.tmpl]
+days_per_page = 14
+
+# If non-zero, all feeds which have not been updated in the indicated
+# number of days will be marked as inactive
+activity_threshold = 0
+
+
+# Options placed in the [DEFAULT] section provide defaults for the feed
+# sections.  Placing a default here means you only need to override the
+# special cases later.
+[DEFAULT]
+# Hackergotchi default size.
+# If we want to put a face alongside a feed, and it's this size, we
+# can omit these variables.
+facewidth = 65
+faceheight = 85
+
+
+# Any other section defines a feed to subscribe to.  The section title
+# (in the []s) is the URI of the feed itself.  A section can also
+# have any of the following options:
+# 
+# name: Name of the feed (defaults to the title found in the feed)
+#
+# Additionally any other option placed here will be available in
+# the template (prefixed with channel_ for the Items loop).  We use
+# this trick to make the faces work -- this isn't something Planet
+# "natively" knows about.  Look at fancy-examples/index.html.tmpl
+# for the flip-side of this.
+
+[http://www.atoker.com/blog/category/webkit/feed/]
+name = Alp Toker
+nick = alp
+
+[http://zecke.blogspot.com/feeds/posts/full/-/WebKit]
+name = Holger Freyther
+nick = zecke
+
+[http://blog.justinhaygood.com/category/webkit/feed/]
+name = Justin Haygood
+nick = jhaygood
+
+[http://labs.trolltech.com/blogs/feed/atom/?cat=9&author=2]
+name = Simon Hausmann
+nick = tronical
+
+[http://webkit.org/blog/feed/]
+name = Surfin' Safari
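The convention above — every section other than `[Planet]`, `[DEFAULT]`, and the per-template sections names a feed by its URI, and its options (plus the `[DEFAULT]` fallbacks such as `facewidth`) become `channel_`-prefixed template variables — can be sketched with Python's standard configparser. The config text below is a trimmed, hypothetical fragment, and modern `configparser` stands in for the 2007-era parsing Planet actually does:

```python
import configparser

# Trimmed, hypothetical fragment of a Planet config.ini.
CONFIG_TEXT = """
[Planet]
name = Planet WebKit
link = http://planet.webkit.org/

[templates/index.html.tmpl]
days_per_page = 14

[DEFAULT]
facewidth = 65
faceheight = 85

[http://webkit.org/blog/feed/]
name = Surfin' Safari
"""

def feed_sections(text):
    """Return {feed_url: options} for every section that names a feed.

    Sections named [Planet] or after a template file are configuration,
    not feeds; everything else is treated as a feed URI.  [DEFAULT]
    values are inherited by every feed section, which is how the
    facewidth/faceheight trick works.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    feeds = {}
    for section in parser.sections():
        if section == "Planet" or section.endswith(".tmpl"):
            continue
        feeds[section] = dict(parser[section])
    return feeds

feeds = feed_sections(CONFIG_TEXT)
print(feeds["http://webkit.org/blog/feed/"]["name"])       # Surfin' Safari
print(feeds["http://webkit.org/blog/feed/"]["facewidth"])  # 65 (inherited)
```

The inherited `facewidth` shows why placing defaults in `[DEFAULT]` means only the special cases need overriding per feed.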
diff --git a/PlanetWebKit/planet/AUTHORS b/PlanetWebKit/planet/AUTHORS
new file mode 100644 (file)
index 0000000..8cd722c
--- /dev/null
@@ -0,0 +1,2 @@
+Scott James Remnant <scott@netsplit.com>
+Jeff Waugh <jdub@perkypants.org>
diff --git a/PlanetWebKit/planet/INSTALL b/PlanetWebKit/planet/INSTALL
new file mode 100644 (file)
index 0000000..354ec69
--- /dev/null
@@ -0,0 +1,151 @@
+Installing Planet
+-----------------
+
+You'll need at least Python 2.1 installed on your system; we recommend
+Python 2.3, though, as there may be bugs in the earlier libraries.
+
+Everything Pythonesque Planet needs should be included in the
+distribution.
+
+ i.
+    First you'll need to extract the files into a folder somewhere.
+    I expect you've already done this, after all, you're reading this
+    file.  You can place this wherever you like, ~/planet is a good
+    choice, but so's anywhere else you prefer.
+
+ ii.
+    Make a copy of the files in the 'examples' subdirectory, and of
+    either the 'basic' or 'fancy' subdirectory within it, and put them
+    wherever you like; I like to use the Planet's name (so
+    ~/planet/debian), but it's really up to you.
+
+    The 'basic' index.html and associated config.ini are pretty plain
+    and boring; if you're after less documentation and more instant
+    gratification you may wish to use the 'fancy' ones instead.  You'll
+    want the stylesheet and images from the 'output' directory if you
+    use it.
+
+ iii.
+    Edit the config.ini file in this directory to taste; it's pretty
+    well documented, so you shouldn't have any problems here.  Pay
+    particular attention to the 'output_dir' option, which should be
+    readable by your web server and especially the 'template_files'
+    option where you'll want to change "examples" to wherever you just
+    placed your copies.
+
+ iv.
+    Edit the various template (*.tmpl) files to taste; a complete list
+    of available variables is at the bottom of this file.
+
+ v.
+    Run it: planet.py pathto/config.ini
+
+    You'll want to add this to cron; make sure you run it from the
+    right directory.
+
+ vi.
+    Tell us about it! We'd love to link to you on planetplanet.org :-)
+
+
+Template files
+--------------
+
+The template files used are given as a space-separated list in the
+'template_files' option in config.ini.  They are named ending in '.tmpl',
+which is removed to form the name of the file placed in the output
+directory.
+
+Reading through the example templates is recommended; they're designed to
+pretty much drop straight into your site with little modification
+anyway.
+
+Inside these template files, <TMPL_VAR xxx> is replaced with the content
+of the 'xxx' variable.  The variables available are:
+
+       name    ....    } the value of the equivalent options
+       link    ....    } from the [Planet] section of your
+       owner_name .    } Planet's config.ini file
+       owner_email     }
+
+       url     ....    link with the output filename appended
+       generator ..    version of planet being used
+
+       date    ....                             { your date format
+       date_iso ...    current date and time in { ISO date format
+       date_822 ...                             { RFC822 date format
+
+
+There are also two loops, 'Items' and 'Channels'.  All of the lines of
+the template and variable substitutions are available for each item or
+channel.  Loops are created using <TMPL_LOOP LoopName>...</TMPL_LOOP>
+and may be used as many times as you wish.
+
+The 'Channels' loop iterates all of the channels (feeds) defined in the
+configuration file; within it the following variables are available:
+
+       name    ....    value of the 'name' option in config.ini, or title
+       title   ....    title retrieved from the channel's feed
+       tagline ....    description retrieved from the channel's feed
+       link    ....    link for the human-readable content (from the feed)
+       url     ....    url of the channel's feed itself
+
+       Additionally the value of any other option specified in config.ini
+       for the feed, or in the [DEFAULT] section, is available as a
+       variable of the same name.
+
+       Depending on the feed, there may be a huge variety of other
+       variables available; the best way to find out what you
+       have is using the 'planet-cache' tool to examine your cache files.
+
+The 'Items' loop iterates all of the blog entries from all of the channels;
+you do not place it inside a 'Channels' loop.  Within it, the following
+variables are available:
+
+       id      ....    unique id for this entry (sometimes just the link)
+       link    ....    link to a human-readable version at the origin site
+
+       title   ....    title of the entry
+       summary ....    a short "first page" summary
+       content ....    the full content of the entry
+
+       date    ....                                  { your date format
+       date_iso ...    date and time of the entry in { ISO date format
+       date_822 ...                                  { RFC822 date format
+
+       If the entry is the first one on its date (no prior entry has
+       taken place on that date), the 'new_date' variable is set to
+       that date.  This allows you to break up the page by day.
+
+       If the entry is from a different channel to the previous entry,
+       or is the first entry from this channel on this day,
+       the 'new_channel' variable is set to the same value as the
+       'channel_url' variable.  This allows you to collate multiple
+       entries from the same person under the same banner.
+       
+       Additionally the value of any variable that would be defined
+       for the channel is available, with 'channel_' prepended to the
+       name (e.g. 'channel_name' and 'channel_link').
+
+       Depending on the feed, there may be a huge variety of other
+       variables available; the best way to find out what you
+       have is using the 'planet-cache' tool to examine your cache files.
+
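The new_date/new_channel rules above boil down to a single pass over the date-sorted entries, comparing each entry with its predecessor. A minimal sketch of that logic, using hypothetical (channel_url, day) pairs rather than Planet's real item objects:

```python
from datetime import date

# Hypothetical (channel_url, day) pairs, sorted newest-first the way
# Planet emits the Items loop.
entries = [
    ("http://webkit.org/blog/feed/", date(2007, 11, 30)),
    ("http://www.atoker.com/blog/category/webkit/feed/", date(2007, 11, 30)),
    ("http://webkit.org/blog/feed/", date(2007, 11, 29)),
]

def annotate(entries):
    """Yield (channel, day, new_date, new_channel) for each entry.

    new_date is set only for the first entry seen on each day;
    new_channel is set when the channel differs from the previous
    entry's, or when a new day starts (so the first entry from a
    channel on a given day always opens a new banner).
    """
    prev_day = prev_channel = None
    for channel, day in entries:
        new_date = day if day != prev_day else None
        new_channel = channel if (new_date or channel != prev_channel) else None
        yield channel, day, new_date, new_channel
        prev_day, prev_channel = day, channel

rows = list(annotate(entries))
# Second entry: same day (no new_date), but a different channel.
print(rows[1][2], rows[1][3])
# None http://www.atoker.com/blog/category/webkit/feed/
```

In a template this is exactly what `<TMPL_IF new_date>` and `<TMPL_IF new_channel>` test for.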
+
+There are also a couple of other special things you can do in a template.
+
+ -  If you want HTML escaping applied to the value of a variable, use the
+    <TMPL_VAR xxx ESCAPE="HTML"> form.
+
+ -  If you want URI escaping applied to the value of a variable, use the
+    <TMPL_VAR xxx ESCAPE="URI"> form.
+
+ -  To only include a section of the template if the variable has a
+    non-empty value, you can use <TMPL_IF xxx>....</TMPL_IF>.  e.g.
+
+    <TMPL_IF new_date>
+    <h1><TMPL_VAR new_date></h1>
+    </TMPL_IF>
+
+    You may place a <TMPL_ELSE> within this block to specify an
+    alternative, or may use <TMPL_UNLESS xxx>...</TMPL_UNLESS> to
+    perform the opposite.
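The substitution and conditional rules documented above can be illustrated with a toy regex-based renderer. This is an illustrative stand-in, not Planet's htmltmpl engine (which also handles loops, ESCAPE= attributes, and <TMPL_ELSE>):

```python
import re

def render(template, variables):
    """Toy renderer for two of the constructs documented above:
    <TMPL_VAR name> substitution and <TMPL_IF name>...</TMPL_IF> blocks.
    """
    # Keep a <TMPL_IF> block's body only when the variable is non-empty.
    def if_block(match):
        name, body = match.group(1), match.group(2)
        return body if variables.get(name) else ""
    out = re.sub(r"<TMPL_IF (\w+)>(.*?)</TMPL_IF>", if_block,
                 template, flags=re.S)
    # Substitute <TMPL_VAR name> with the variable's value.
    out = re.sub(r"<TMPL_VAR (\w+)>",
                 lambda m: str(variables.get(m.group(1), "")), out)
    return out

tmpl = ("<TMPL_IF new_date><h1><TMPL_VAR new_date></h1></TMPL_IF>"
        "<p><TMPL_VAR title></p>")
print(render(tmpl, {"new_date": "November 30, 2007", "title": "Hello"}))
# <h1>November 30, 2007</h1><p>Hello</p>
print(render(tmpl, {"new_date": "", "title": "Hello"}))
# <p>Hello</p>
```

The second call shows the INSTALL example in action: with no new_date, the whole heading block disappears, which is how the templates break the page up by day.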
diff --git a/PlanetWebKit/planet/LICENCE b/PlanetWebKit/planet/LICENCE
new file mode 100644 (file)
index 0000000..1090fa3
--- /dev/null
@@ -0,0 +1,84 @@
+Planet is released under the same licence as Python; here it is:
+
+
+A. HISTORY OF THE SOFTWARE
+==========================
+
+Python was created in the early 1990s by Guido van Rossum at Stichting Mathematisch Centrum (CWI) in the Netherlands as a successor of a language called ABC. Guido is Python's principal author, although it includes many contributions from others. The last version released from CWI was Python 1.2. In 1995, Guido continued his work on Python at the Corporation for National Research Initiatives (CNRI) in Reston, Virginia where he released several versions of the software. Python 1.6 was the last of the versions released by CNRI. In 2000, Guido and the Python core development team moved to BeOpen.com to form the BeOpen PythonLabs team. Python 2.0 was the first and only release from BeOpen.com.
+
+Following the release of Python 1.6, and after Guido van Rossum left CNRI to work with commercial software developers, it became clear that the ability to use Python with software available under the GNU Public License (GPL) was very desirable. CNRI and the Free Software Foundation (FSF) interacted to develop enabling wording changes to the Python license. Python 1.6.1 is essentially the same as Python 1.6, with a few minor bug fixes, and with a different license that enables later versions to be GPL-compatible. Python 2.1 is a derivative work of Python 1.6.1, as well as of Python 2.0.
+
+After Python 2.0 was released by BeOpen.com, Guido van Rossum and the other PythonLabs developers joined Digital Creations. All intellectual property added from this point on, starting with Python 2.1 and its alpha and beta releases, is owned by the Python Software Foundation (PSF), a non-profit modeled after the Apache Software Foundation. See http://www.python.org/psf/ for more information about the PSF.
+
+Thanks to the many outside volunteers who have worked under Guido's direction to make these releases possible.
+
+B. TERMS AND CONDITIONS FOR ACCESSING OR OTHERWISE USING PYTHON
+===============================================================
+
+PSF LICENSE AGREEMENT
+---------------------
+
+1. This LICENSE AGREEMENT is between the Python Software Foundation ("PSF"), and the Individual or Organization ("Licensee") accessing and otherwise using Python 2.1.1 software in source or binary form and its associated documentation.
+
+2. Subject to the terms and conditions of this License Agreement, PSF hereby grants Licensee a nonexclusive, royalty-free, world-wide license to reproduce, analyze, test, perform and/or display publicly, prepare derivative works, distribute, and otherwise use Python 2.1.1 alone or in any derivative version, provided, however, that PSF's License Agreement and PSF's notice of copyright, i.e., "Copyright (c) 2001 Python Software Foundation; All Rights Reserved" are retained in Python 2.1.1 alone or in any derivative version prepared by Licensee.
+
+3. In the event Licensee prepares a derivative work that is based on or incorporates Python 2.1.1 or any part thereof, and wants to make the derivative work available to others as provided herein, then Licensee hereby agrees to include in any such work a brief summary of the changes made to Python 2.1.1.
+
+4. PSF is making Python 2.1.1 available to Licensee on an "AS IS" basis. PSF MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PSF MAKES NO AND DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON 2.1.1 WILL NOT INFRINGE ANY THIRD PARTY RIGHTS.
+
+5. PSF SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON 2.1.1 FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON 2.1.1, OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.
+
+6. This License Agreement will automatically terminate upon a material breach of its terms and conditions.
+
+7. Nothing in this License Agreement shall be deemed to create any relationship of agency, partnership, or joint venture between PSF and Licensee. This License Agreement does not grant permission to use PSF trademarks or trade name in a trademark sense to endorse or promote products or services of Licensee, or any third party.
+
+8. By copying, installing or otherwise using Python 2.1.1, Licensee agrees to be bound by the terms and conditions of this License Agreement.
+
+BEOPEN.COM TERMS AND CONDITIONS FOR PYTHON 2.0
+----------------------------------------------
+
+BEOPEN PYTHON OPEN SOURCE LICENSE AGREEMENT VERSION 1
+
+1. This LICENSE AGREEMENT is between BeOpen.com ("BeOpen"), having an office at 160 Saratoga Avenue, Santa Clara, CA 95051, and the Individual or Organization ("Licensee") accessing and otherwise using this software in source or binary form and its associated documentation ("the Software").
+
+2. Subject to the terms and conditions of this BeOpen Python License Agreement, BeOpen hereby grants Licensee a non-exclusive, royalty-free, world-wide license to reproduce, analyze, test, perform and/or display publicly, prepare derivative works, distribute, and otherwise use the Software alone or in any derivative version, provided, however, that the BeOpen Python License is retained in the Software, alone or in any derivative version prepared by Licensee.
+
+3. BeOpen is making the Software available to Licensee on an "AS IS" basis. BEOPEN MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, BEOPEN MAKES NO AND DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE SOFTWARE WILL NOT INFRINGE ANY THIRD PARTY RIGHTS.
+
+4. BEOPEN SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF THE SOFTWARE FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS A RESULT OF USING, MODIFYING OR DISTRIBUTING THE SOFTWARE, OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.
+
+5. This License Agreement will automatically terminate upon a material breach of its terms and conditions.
+
+6. This License Agreement shall be governed by and interpreted in all respects by the law of the State of California, excluding conflict of law provisions. Nothing in this License Agreement shall be deemed to create any relationship of agency, partnership, or joint venture between BeOpen and Licensee. This License Agreement does not grant permission to use BeOpen trademarks or trade names in a trademark sense to endorse or promote products or services of Licensee, or any third party. As an exception, the "BeOpen Python" logos available at http://www.pythonlabs.com/logos.html may be used according to the permissions granted on that web page.
+
+7. By copying, installing or otherwise using the software, Licensee agrees to be bound by the terms and conditions of this License Agreement.
+
+CNRI OPEN SOURCE GPL-COMPATIBLE LICENSE AGREEMENT
+-------------------------------------------------
+
+1. This LICENSE AGREEMENT is between the Corporation for National Research Initiatives, having an office at 1895 Preston White Drive, Reston, VA 20191 ("CNRI"), and the Individual or Organization ("Licensee") accessing and otherwise using Python 1.6.1 software in source or binary form and its associated documentation.
+
+2. Subject to the terms and conditions of this License Agreement, CNRI hereby grants Licensee a nonexclusive, royalty-free, world-wide license to reproduce, analyze, test, perform and/or display publicly, prepare derivative works, distribute, and otherwise use Python 1.6.1 alone or in any derivative version, provided, however, that CNRI's License Agreement and CNRI's notice of copyright, i.e., "Copyright (c) 1995-2001 Corporation for National Research Initiatives; All Rights Reserved" are retained in Python 1.6.1 alone or in any derivative version prepared by Licensee. Alternately, in lieu of CNRI's License Agreement, Licensee may substitute the following text (omitting the quotes): "Python 1.6.1 is made available subject to the terms and conditions in CNRI's License Agreement. This Agreement together with Python 1.6.1 may be located on the Internet using the following unique, persistent identifier (known as a handle): 1895.22/1013. This Agreement may also be obtained from a proxy server on the Internet using the following URL: http://hdl.handle.net/1895.22/1013".
+
+3. In the event Licensee prepares a derivative work that is based on or incorporates Python 1.6.1 or any part thereof, and wants to make the derivative work available to others as provided herein, then Licensee hereby agrees to include in any such work a brief summary of the changes made to Python 1.6.1.
+
+4. CNRI is making Python 1.6.1 available to Licensee on an "AS IS" basis. CNRI MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, CNRI MAKES NO AND DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON 1.6.1 WILL NOT INFRINGE ANY THIRD PARTY RIGHTS.
+
+5. CNRI SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON 1.6.1 FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON 1.6.1, OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.
+
+6. This License Agreement will automatically terminate upon a material breach of its terms and conditions.
+
+7. This License Agreement shall be governed by the federal intellectual property law of the United States, including without limitation the federal copyright law, and, to the extent such U.S. federal law does not apply, by the law of the Commonwealth of Virginia, excluding Virginia's conflict of law provisions. Notwithstanding the foregoing, with regard to derivative works based on Python 1.6.1 that incorporate non-separable material that was previously distributed under the GNU General Public License (GPL), the law of the Commonwealth of Virginia shall govern this License Agreement only as to issues arising under or with respect to Paragraphs 4, 5, and 7 of this License Agreement. Nothing in this License Agreement shall be deemed to create any relationship of agency, partnership, or joint venture between CNRI and Licensee. This License Agreement does not grant permission to use CNRI trademarks or trade name in a trademark sense to endorse or promote products or services of Licensee, or any third party.
+
+8. By clicking on the "ACCEPT" button where indicated, or by copying, installing or otherwise using Python 1.6.1, Licensee agrees to be bound by the terms and conditions of this License Agreement.
+
+        ACCEPT
+
+CWI PERMISSIONS STATEMENT AND DISCLAIMER
+----------------------------------------
+
+Copyright (c) 1991 - 1995, Stichting Mathematisch Centrum Amsterdam, The Netherlands. All rights reserved.
+
+Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of Stichting Mathematisch Centrum or CWI not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission.
+
+STICHTING MATHEMATISCH CENTRUM DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL STICHTING MATHEMATISCH CENTRUM BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
\ No newline at end of file
diff --git a/PlanetWebKit/planet/NEWS b/PlanetWebKit/planet/NEWS
new file mode 100644 (file)
index 0000000..e985e7c
--- /dev/null
@@ -0,0 +1,4 @@
+Planet 1.0
+----------
+
+ * First release!
diff --git a/PlanetWebKit/planet/PKG-INFO b/PlanetWebKit/planet/PKG-INFO
new file mode 100644 (file)
index 0000000..2b3c30b
--- /dev/null
@@ -0,0 +1,10 @@
+Metadata-Version: 1.0
+Name: planet
+Version: 2.0
+Summary: The Planet Feed Aggregator
+Home-page: http://www.planetplanet.org/
+Author: Planet Developers
+Author-email: devel@lists.planetplanet.org
+License: Python
+Description: UNKNOWN
+Platform: UNKNOWN
diff --git a/PlanetWebKit/planet/README b/PlanetWebKit/planet/README
new file mode 100644 (file)
index 0000000..f5c9ac6
--- /dev/null
@@ -0,0 +1,12 @@
+Planet
+------
+
+Planet is a flexible feed aggregator. It downloads news feeds published by
+web sites and aggregates their content together into a single combined feed,
+latest news first.
+
+It uses Mark Pilgrim's Universal Feed Parser to read from RDF, RSS and Atom
+feeds; and Tomas Styblo's templating engine to output static files in any
+format you can dream up.
+
+Keywords: feed, blog, aggregator, RSS, RDF, Atom, OPML, Python
diff --git a/PlanetWebKit/planet/THANKS b/PlanetWebKit/planet/THANKS
new file mode 100644 (file)
index 0000000..95081e6
--- /dev/null
@@ -0,0 +1,18 @@
+Patches and Bug Fixes
+---------------------
+
+Chris Dolan - fixes, exclude filtering, duplicate culling
+David Edmondson - filtering
+Lucas Nussbaum - locale configuration
+David Pashley - cache code profiling and recursion fixing
+Gediminas Paulauskas - days per page
+
+
+Spycyroll Maintainers
+---------------------
+
+Vattekkat Satheesh Babu
+Richard Jones
+Garth Kidd
+Eliot Landrum
+Bryan Richard
diff --git a/PlanetWebKit/planet/TODO b/PlanetWebKit/planet/TODO
new file mode 100644 (file)
index 0000000..990c46f
--- /dev/null
@@ -0,0 +1,22 @@
+TODO
+====
+
+  * Expire feed history
+
+    The feed cache doesn't currently expire old entries, so it could get
+    large quite rapidly.  We should probably have a config setting for
+    the cache expiry; the trouble is some channels might need a longer
+    or shorter one than others.
+
+  * Allow display normalisation to specified timezone
+
+    Some Planet admins would like their feed to be displayed in the local
+    timezone, instead of UTC.
+
+  * Support OPML and foaf subscriptions
+
+    This might be a bit invasive, but I want to be able to subscribe to OPML
+    and FOAF files, and see each feed as if it were subscribed individually.
+    Perhaps we can do this with a two-pass configuration scheme, first to pull
+    the static configs, second to go fetch and generate the dynamic configs.
+    The more I think about it, the less invasive it sounds. Hmm.
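The cache-expiry item above could be sketched roughly as follows. This is a hypothetical helper, not part of Planet: the shape of the entry dicts, the `updated` key, and the day-based cutoff are all assumptions for illustration.

```python
import time

def expire_entries(entries, max_age_days):
    """Drop cache entries whose 'updated' timestamp (seconds since the
    epoch) is older than max_age_days; return the surviving entries."""
    cutoff = time.time() - max_age_days * 86400
    return {eid: e for eid, e in entries.items() if e["updated"] >= cutoff}

# Example: one fresh entry and one ten-day-old entry, expired at 7 days.
now = time.time()
cache = {
    "fresh": {"updated": now},
    "stale": {"updated": now - 10 * 86400},
}
kept = expire_entries(cache, max_age_days=7)
```

A per-channel override of `max_age_days` would address the note above about some channels needing a longer or shorter expiry than others.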
diff --git a/PlanetWebKit/planet/examples/atom.xml.tmpl b/PlanetWebKit/planet/examples/atom.xml.tmpl
new file mode 100644 (file)
index 0000000..c444d01
--- /dev/null
@@ -0,0 +1,61 @@
+<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
+<feed xmlns="http://www.w3.org/2005/Atom">
+
+       <title><TMPL_VAR name></title>
+       <link rel="self" href="<TMPL_VAR feed ESCAPE="HTML">"/>
+       <link href="<TMPL_VAR link ESCAPE="HTML">"/>
+       <id><TMPL_VAR feed ESCAPE="HTML"></id>
+       <updated><TMPL_VAR date_iso></updated>
+       <generator uri="http://www.planetplanet.org/"><TMPL_VAR generator ESCAPE="HTML"></generator>
+
+<TMPL_LOOP Items>
+       <entry<TMPL_IF channel_language> xml:lang="<TMPL_VAR channel_language>"</TMPL_IF>>
+               <title type="html"<TMPL_IF title_language> xml:lang="<TMPL_VAR title_language>"</TMPL_IF>><TMPL_VAR title ESCAPE="HTML"></title>
+               <link href="<TMPL_VAR link ESCAPE="HTML">"/>
+               <id><TMPL_VAR id ESCAPE="HTML"></id>
+               <updated><TMPL_VAR date_iso></updated>
+               <content type="html"<TMPL_IF content_language> xml:lang="<TMPL_VAR content_language>"</TMPL_IF>><TMPL_VAR content ESCAPE="HTML"></content>
+               <author>
+<TMPL_IF author_name>
+                       <name><TMPL_VAR author_name ESCAPE="HTML"></name>
+<TMPL_IF author_email>
+                       <email><TMPL_VAR author_email ESCAPE="HTML"></email>
+</TMPL_IF author_email>
+<TMPL_ELSE>
+<TMPL_IF channel_author_name>
+                       <name><TMPL_VAR channel_author_name ESCAPE="HTML"></name>
+<TMPL_IF channel_author_email>
+                       <email><TMPL_VAR channel_author_email ESCAPE="HTML"></email>
+</TMPL_IF channel_author_email>
+<TMPL_ELSE>
+                       <name><TMPL_VAR channel_name ESCAPE="HTML"></name>
+</TMPL_IF>
+</TMPL_IF>
+                       <uri><TMPL_VAR channel_link ESCAPE="HTML"></uri>
+               </author>
+               <source>
+<TMPL_IF channel_title>
+                       <title type="html"><TMPL_VAR channel_title ESCAPE="HTML"></title>
+<TMPL_ELSE>
+                       <title type="html"><TMPL_VAR channel_name ESCAPE="HTML"></title>
+</TMPL_IF>
+<TMPL_IF channel_subtitle>
+                       <subtitle type="html"><TMPL_VAR channel_subtitle ESCAPE="HTML"></subtitle>
+</TMPL_IF>
+                       <link rel="self" href="<TMPL_VAR channel_url ESCAPE="HTML">"/>
+<TMPL_IF channel_id>
+                       <id><TMPL_VAR channel_id ESCAPE="HTML"></id>
+<TMPL_ELSE>
+                       <id><TMPL_VAR channel_url ESCAPE="HTML"></id>
+</TMPL_IF>
+<TMPL_IF channel_updated_iso>
+                       <updated><TMPL_VAR channel_updated_iso></updated>
+</TMPL_IF>
+<TMPL_IF channel_rights>
+                       <rights type="html"><TMPL_VAR channel_rights ESCAPE="HTML"></rights>
+</TMPL_IF>
+               </source>
+       </entry>
+
+</TMPL_LOOP>
+</feed>
diff --git a/PlanetWebKit/planet/examples/basic/config.ini b/PlanetWebKit/planet/examples/basic/config.ini
new file mode 100644 (file)
index 0000000..446511f
--- /dev/null
@@ -0,0 +1,88 @@
+# Planet configuration file
+
+# Every planet needs a [Planet] section
+[Planet]
+# name: Your planet's name
+# link: Link to the main page
+# owner_name: Your name
+# owner_email: Your e-mail address
+name = Planet Zog
+link = http://www.planet.zog/
+owner_name = Zig The Alien
+owner_email = zig@planet.zog
+
+# cache_directory: Where cached feeds are stored
+# new_feed_items: Number of items to take from new feeds
+# log_level: One of DEBUG, INFO, WARNING, ERROR or CRITICAL
+cache_directory = examples/cache
+new_feed_items = 2
+log_level = DEBUG
+
+# template_files: Space-separated list of output template files
+template_files = examples/basic/index.html.tmpl examples/atom.xml.tmpl examples/rss20.xml.tmpl examples/rss10.xml.tmpl examples/opml.xml.tmpl examples/foafroll.xml.tmpl
+
+# The following provide defaults for each template:
+# output_dir: Directory to place output files
+# items_per_page: How many items to put on each page
+# days_per_page: How many complete days of posts to put on each page
+#                This is the absolute, hard limit (over the item limit)
+# date_format: strftime format for the default 'date' template variable
+# new_date_format: strftime format for the 'new_date' template variable
+# encoding: output encoding for the file, Python 2.3+ users can use the
+#           special "xml" value to output ASCII with XML character references
+# locale: locale to use for (e.g.) strings in dates, default is taken from your
+#         system. You can specify multiple locales separated by ':'; planet
+#         will use the first available one
+output_dir = examples/output
+items_per_page = 60
+days_per_page = 0
+date_format = %B %d, %Y %I:%M %p
+new_date_format = %B %d, %Y
+encoding = utf-8
+# locale = C
+
+
+# To define a different value for a particular template you may create
+# a section with the same name as the template file's filename (as given
+# in template_files).
+#
+#     [examples/rss10.xml.tmpl]
+#     items_per_page = 30
+#     encoding = xml
+
+
+# Any other section defines a feed to subscribe to.  The section title
+# (in the []s) is the URI of the feed itself.  A section can also
+# have any of the following options:
+# 
+# name: Name of the feed (defaults to the title found in the feed)
+#
+# Additionally any other option placed here will be available in
+# the template (prefixed with channel_ for the Items loop).  You can
+# define defaults for these in a [DEFAULT] section, for example
+# Planet Debian uses the following to define faces:
+#
+#     [DEFAULT]
+#     facewidth = 64
+#     faceheight = 64
+#
+#     [http://www.blog.com/rss]
+#     face = foo.png
+#     faceheight = 32
+#
+# The facewidth of the defined blog defaults to 64.
+
+[http://www.netsplit.com/blog/index.rss]
+name = Scott James Remnant
+
+[http://www.gnome.org/~jdub/blog/?flav=rss]
+name = Jeff Waugh
+
+[http://usefulinc.com/edd/blog/rss91]
+name = Edd Dumbill
+
+[http://blog.clearairturbulence.org/?flav=rss]
+name = Thom May
+
+[http://www.hadess.net/diary.rss]
+name = Bastien Nocera
diff --git a/PlanetWebKit/planet/examples/basic/index.html.tmpl b/PlanetWebKit/planet/examples/basic/index.html.tmpl
new file mode 100644 (file)
index 0000000..42d53d3
--- /dev/null
@@ -0,0 +1,88 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+
+### Planet HTML template.
+### 
+### This is intended to demonstrate and document Planet's templating
+### facilities, and at the same time provide a good base for you to
+### modify into your own design.
+### 
+### The output's a bit boring, though; if you're after less documentation
+### and more instant gratification, there's an example with a much
+### prettier output in the fancy-examples/ directory of the Planet source.
+
+### Lines like this are comments, and are automatically removed by the
+### templating engine before processing.
+
+
+### Planet makes a large number of variables available for your templates.
+### See INSTALL for the complete list.  The raw value can be placed in your
+### output file using <TMPL_VAR varname>.  We'll put the name of our
+### Planet in the page title and again in an h1.
+<head>
+<title><TMPL_VAR name></title>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+<meta name="generator" content="<TMPL_VAR generator ESCAPE="HTML">">
+</head>
+
+<body>
+<h1><TMPL_VAR name></h1>
+
+### One of the two loops available is the Channels loop.  This allows you
+### to easily create a list of subscriptions, which is exactly what we'll do
+### here.
+
+### Note that we can also expand variables inside HTML tags, but we need
+### to be cautious and HTML-escape any illegal characters using the form
+### <TMPL_VAR varname ESCAPE="HTML">
+
+<div style="float: right">
+<h2>Subscriptions</h2>
+<ul>
+<TMPL_LOOP Channels>
+<li><a href="<TMPL_VAR link ESCAPE="HTML">" title="<TMPL_VAR title ESCAPE="HTML">"><TMPL_VAR name></a> <a href="<TMPL_VAR url ESCAPE="HTML">">(feed)</a></li>
+</TMPL_LOOP>
+</ul>
+</div>
+
+### The other loop is the Items loop, which will get iterated for each
+### news item.
+
+<TMPL_LOOP Items>
+
+### Visually distinguish articles from different days by checking for
+### the new_date flag.  This demonstrates the <TMPL_IF varname> ... </TMPL_IF>
+### check.
+
+<TMPL_IF new_date>
+<h2><TMPL_VAR new_date></h2>
+</TMPL_IF>
+
+### Group consecutive articles by the same author together by checking
+### for the new_channel flag.
+
+<TMPL_IF new_channel>
+<h3><a href="<TMPL_VAR channel_link ESCAPE="HTML">" title="<TMPL_VAR channel_title ESCAPE="HTML">"><TMPL_VAR channel_name></a></h3>
+</TMPL_IF>
+
+
+<TMPL_IF title>
+<h4><a href="<TMPL_VAR link ESCAPE="HTML">"><TMPL_VAR title></a></h4>
+</TMPL_IF>
+<p>
+<TMPL_VAR content>
+</p>
+<p>
+<em><a href="<TMPL_VAR link ESCAPE="HTML">"><TMPL_IF author>by <TMPL_VAR author> at </TMPL_IF><TMPL_VAR date></a></em>
+</p>
+</TMPL_LOOP>
+
+<hr>
+<p>
+<a href="http://www.planetplanet.org/">Powered by Planet!</a><br>
+<em>Last updated: <TMPL_VAR date></em>
+</p>
+</body>
+
+</html>
diff --git a/PlanetWebKit/planet/examples/fancy/config.ini b/PlanetWebKit/planet/examples/fancy/config.ini
new file mode 100644 (file)
index 0000000..52084af
--- /dev/null
@@ -0,0 +1,106 @@
+# Planet configuration file
+#
+# This illustrates some of Planet's fancier features with examples.
+
+# Every planet needs a [Planet] section
+[Planet]
+# name: Your planet's name
+# link: Link to the main page
+# owner_name: Your name
+# owner_email: Your e-mail address
+name = Planet Schmanet
+link = http://planet.schmanet.janet/
+owner_name = Janet Weiss
+owner_email = janet@slut.sex
+
+# cache_directory: Where cached feeds are stored
+# new_feed_items: Number of items to take from new feeds
+# log_level: One of DEBUG, INFO, WARNING, ERROR or CRITICAL
+# feed_timeout: number of seconds to wait for any given feed
+cache_directory = examples/cache
+new_feed_items = 2
+log_level = DEBUG
+feed_timeout = 20
+
+# template_files: Space-separated list of output template files
+template_files = examples/fancy/index.html.tmpl examples/atom.xml.tmpl examples/rss20.xml.tmpl examples/rss10.xml.tmpl examples/opml.xml.tmpl examples/foafroll.xml.tmpl
+
+# The following provide defaults for each template:
+# output_dir: Directory to place output files
+# items_per_page: How many items to put on each page
+# days_per_page: How many complete days of posts to put on each page
+#                This is the absolute, hard limit (over the item limit)
+# date_format: strftime format for the default 'date' template variable
+# new_date_format: strftime format for the 'new_date' template variable
+# encoding: output encoding for the file, Python 2.3+ users can use the
+#           special "xml" value to output ASCII with XML character references
+# locale: locale to use for (e.g.) strings in dates, default is taken from your
+#         system. You can specify multiple locales separated by ':'; planet
+#         will use the first available one
+output_dir = examples/output
+items_per_page = 60
+days_per_page = 0
+date_format = %B %d, %Y %I:%M %p
+new_date_format = %B %d, %Y
+encoding = utf-8
+# locale = C
+
+
+# To define a different value for a particular template you may create
+# a section with the same name as the template file's filename (as given
+# in template_files).
+
+# Provide no more than 7 days articles on the front page
+[examples/fancy/index.html.tmpl]
+days_per_page = 7
+
+# If non-zero, all feeds which have not been updated in the indicated
+# number of days will be marked as inactive
+activity_threshold = 0
+
+
+# Options placed in the [DEFAULT] section provide defaults for the feed
+# sections.  Placing a default here means you only need to override the
+# special cases later.
+[DEFAULT]
+# Hackergotchi default size.
+# If we want to put a face alongside a feed, and it's this size, we
+# can omit these variables.
+facewidth = 65
+faceheight = 85
+
+
+# Any other section defines a feed to subscribe to.  The section title
+# (in the []s) is the URI of the feed itself.  A section can also
+# have any of the following options:
+# 
+# name: Name of the feed (defaults to the title found in the feed)
+#
+# Additionally any other option placed here will be available in
+# the template (prefixed with channel_ for the Items loop).  We use
+# this trick to make the faces work -- this isn't something Planet
+# "natively" knows about.  Look at fancy-examples/index.html.tmpl
+# for the flip-side of this.
+
+[http://www.netsplit.com/blog/index.rss]
+name = Scott James Remnant
+face = keybuk.png
+# pick up the default facewidth and faceheight
+
+[http://www.gnome.org/~jdub/blog/?flav=rss]
+name = Jeff Waugh
+face = jdub.png
+facewidth = 70
+faceheight = 74
+
+[http://usefulinc.com/edd/blog/rss91]
+name = Edd Dumbill
+face = edd.png
+facewidth = 62
+faceheight = 80
+
+[http://blog.clearairturbulence.org/?flav=rss]
+name = Thom May
+face = thom.png
+# pick up the default faceheight only
+facewidth = 59
diff --git a/PlanetWebKit/planet/examples/fancy/index.html.tmpl b/PlanetWebKit/planet/examples/fancy/index.html.tmpl
new file mode 100644 (file)
index 0000000..41510ca
--- /dev/null
@@ -0,0 +1,125 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+
+### Fancy Planet HTML template.
+### 
+### When combined with the stylesheet and images in the output/ directory
+### of the Planet source, this gives you a much prettier result than the
+### default examples template and demonstrates how to use the config file
+### to support things like faces.
+### 
+### For documentation on the more boring template elements, see
+### examples/config.ini and examples/index.html.tmpl in the Planet source.
+
+<head>
+<title><TMPL_VAR name></title>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+<meta name="generator" content="<TMPL_VAR generator ESCAPE="HTML">">
+<link rel="stylesheet" href="planet.css" type="text/css">
+<TMPL_IF feedtype>
+<link rel="alternate" href="<TMPL_VAR feed ESCAPE="HTML">" title="<TMPL_VAR channel_title_plain ESCAPE="HTML">" type="application/<TMPL_VAR feedtype>+xml">
+</TMPL_IF>
+</head>
+
+<body>
+<h1><TMPL_VAR name></h1>
+
+<TMPL_LOOP Items>
+<TMPL_IF new_date>
+<TMPL_UNLESS __FIRST__>
+### End <div class="channelgroup">
+</div>
+### End <div class="daygroup">
+</div>
+</TMPL_UNLESS>
+<div class="daygroup">
+<h2><TMPL_VAR new_date></h2>
+</TMPL_IF>
+
+<TMPL_IF new_channel>
+<TMPL_UNLESS new_date>
+### End <div class="channelgroup">
+</div>
+</TMPL_UNLESS>
+<div class="channelgroup">
+
+### Planet provides template variables for *all* configuration options for
+### the channel (and defaults), even if it doesn't know about them.  We
+### exploit this here to add hackergotchi faces to our channels.  Planet
+### doesn't know about the "face", "facewidth" and "faceheight" configuration
+### variables, but makes them available to us anyway.
+
+<h3><a href="<TMPL_VAR channel_link ESCAPE="HTML">" title="<TMPL_VAR channel_title_plain ESCAPE="HTML">"><TMPL_VAR channel_name></a></h3>
+<TMPL_IF channel_face>
+<img class="face" src="images/<TMPL_VAR channel_face ESCAPE="HTML">" width="<TMPL_VAR channel_facewidth ESCAPE="HTML">" height="<TMPL_VAR channel_faceheight ESCAPE="HTML">" alt="">
+</TMPL_IF>
+</TMPL_IF>
+
+
+<div class="entrygroup" id="<TMPL_VAR id>"<TMPL_IF channel_language> lang="<TMPL_VAR channel_language>"</TMPL_IF>>
+<TMPL_IF title>
+<h4<TMPL_IF title_language> lang="<TMPL_VAR title_language>"</TMPL_IF>><a href="<TMPL_VAR link ESCAPE="HTML">"><TMPL_VAR title></a></h4>
+</TMPL_IF>
+<div class="entry">
+<div class="content"<TMPL_IF content_language> lang="<TMPL_VAR content_language>"</TMPL_IF>>
+<TMPL_VAR content>
+</div>
+
+### Planet also makes available all of the information from the feed
+### that it can.  Use the 'planet-cache' tool on the cache file for
+### a particular feed to find out what additional keys it supports.
+### Common extra fields are 'author' and 'category', which we
+### demonstrate below.
+
+<p class="date">
+<a href="<TMPL_VAR link ESCAPE="HTML">"><TMPL_IF author>by <TMPL_VAR author> at </TMPL_IF><TMPL_VAR date><TMPL_IF category> under <TMPL_VAR category></TMPL_IF></a>
+</p>
+</div>
+</div>
+
+<TMPL_IF __LAST__>
+### End <div class="channelgroup">
+</div>
+### End <div class="daygroup">
+</div>
+</TMPL_IF>
+</TMPL_LOOP>
+
+
+<div class="sidebar">
+<img src="images/logo.png" width="136" height="136" alt="">
+
+<h2>Subscriptions</h2>
+<ul>
+<TMPL_LOOP Channels>
+<li>
+<a href="<TMPL_VAR url ESCAPE="HTML">" title="subscribe"><img src="images/feed-icon-10x10.png" alt="(feed)"></a> <a <TMPL_IF link>href="<TMPL_VAR link ESCAPE="HTML">" </TMPL_IF><TMPL_IF message>class="message" title="<TMPL_VAR message ESCAPE="HTML">"</TMPL_IF><TMPL_UNLESS message>title="<TMPL_VAR title_plain ESCAPE="HTML">"</TMPL_UNLESS>><TMPL_VAR name></a>
+</li>
+</TMPL_LOOP>
+</ul>
+
+<p>
+<strong>Last updated:</strong><br>
+<TMPL_VAR date><br>
+<em>All times are UTC.</em><br>
+<br>
+Powered by:<br>
+<a href="http://www.planetplanet.org/"><img src="images/planet.png" width="80" height="15" alt="Planet" border="0"></a>
+</p>
+
+<p>
+<h2>Planetarium:</h2>
+<ul>
+<li><a href="http://www.planetapache.org/">Planet Apache</a></li>
+<li><a href="http://planet.debian.net/">Planet Debian</a></li>
+<li><a href="http://planet.freedesktop.org/">Planet freedesktop.org</a></li>
+<li><a href="http://planet.gnome.org/">Planet GNOME</a></li>
+<li><a href="http://planetsun.org/">Planet Sun</a></li>
+<li><a href="http://fedora.linux.duke.edu/fedorapeople/">Fedora People</a></li>
+<li><a href="http://www.planetplanet.org/">more...</a></li>
+</ul>
+</p>
+</div>
+</body>
+
+</html>
diff --git a/PlanetWebKit/planet/examples/foafroll.xml.tmpl b/PlanetWebKit/planet/examples/foafroll.xml.tmpl
new file mode 100644 (file)
index 0000000..a78e3e8
--- /dev/null
@@ -0,0 +1,31 @@
+<?xml version="1.0"?>
+<rdf:RDF
+       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+       xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
+       xmlns:foaf="http://xmlns.com/foaf/0.1/"
+       xmlns:rss="http://purl.org/rss/1.0/"
+       xmlns:dc="http://purl.org/dc/elements/1.1/"
+>
+<foaf:Group>
+       <foaf:name><TMPL_VAR name ESCAPE="HTML"></foaf:name>
+       <foaf:homepage><TMPL_VAR link ESCAPE="HTML"></foaf:homepage>
+       <rdfs:seeAlso rdf:resource="<TMPL_VAR url ESCAPE="HTML">" />
+
+<TMPL_LOOP Channels>
+       <foaf:member>
+               <foaf:Agent>
+                       <foaf:name><TMPL_VAR name ESCAPE="HTML"></foaf:name>
+                       <foaf:weblog>
+                               <foaf:Document rdf:about="<TMPL_VAR link ESCAPE="HTML">">
+                                       <dc:title><TMPL_VAR title_plain ESCAPE="HTML"></dc:title>
+                                       <rdfs:seeAlso>
+                                               <rss:channel rdf:about="<TMPL_VAR url ESCAPE="HTML">" />
+                                       </rdfs:seeAlso>
+                               </foaf:Document>
+                       </foaf:weblog>
+               </foaf:Agent>
+       </foaf:member>
+</TMPL_LOOP>
+
+</foaf:Group>
+</rdf:RDF>
diff --git a/PlanetWebKit/planet/examples/opml.xml.tmpl b/PlanetWebKit/planet/examples/opml.xml.tmpl
new file mode 100644 (file)
index 0000000..b56ee5f
--- /dev/null
@@ -0,0 +1,15 @@
+<?xml version="1.0"?>
+<opml version="1.1">
+       <head>
+               <title><TMPL_VAR name ESCAPE="HTML"></title>
+               <dateModified><TMPL_VAR date_822></dateModified>
+               <ownerName><TMPL_VAR owner_name></ownerName>
+               <ownerEmail><TMPL_VAR owner_email></ownerEmail>
+       </head>
+       
+       <body>
+               <TMPL_LOOP Channels>
+               <outline type="rss" text="<TMPL_VAR name ESCAPE="HTML">" xmlUrl="<TMPL_VAR url ESCAPE="HTML">" title="<TMPL_IF title><TMPL_VAR title ESCAPE="HTML"></TMPL_IF><TMPL_UNLESS title><TMPL_VAR name ESCAPE="HTML"></TMPL_UNLESS>"<TMPL_IF channel_link> htmlUrl="<TMPL_VAR channel_link ESCAPE="HTML">"</TMPL_IF> />
+               </TMPL_LOOP>
+       </body>
+</opml>
diff --git a/PlanetWebKit/planet/examples/output/images/edd.png b/PlanetWebKit/planet/examples/output/images/edd.png
new file mode 100644 (file)
index 0000000..eefa1c0
Binary files /dev/null and b/PlanetWebKit/planet/examples/output/images/edd.png differ
diff --git a/PlanetWebKit/planet/examples/output/images/evolution.png b/PlanetWebKit/planet/examples/output/images/evolution.png
new file mode 100644 (file)
index 0000000..412dcfb
Binary files /dev/null and b/PlanetWebKit/planet/examples/output/images/evolution.png differ
diff --git a/PlanetWebKit/planet/examples/output/images/feed-icon-10x10.png b/PlanetWebKit/planet/examples/output/images/feed-icon-10x10.png
new file mode 100644 (file)
index 0000000..cc869bc
Binary files /dev/null and b/PlanetWebKit/planet/examples/output/images/feed-icon-10x10.png differ
diff --git a/PlanetWebKit/planet/examples/output/images/jdub.png b/PlanetWebKit/planet/examples/output/images/jdub.png
new file mode 100644 (file)
index 0000000..8a0de0b
Binary files /dev/null and b/PlanetWebKit/planet/examples/output/images/jdub.png differ
diff --git a/PlanetWebKit/planet/examples/output/images/keybuk.png b/PlanetWebKit/planet/examples/output/images/keybuk.png
new file mode 100644 (file)
index 0000000..265dc39
Binary files /dev/null and b/PlanetWebKit/planet/examples/output/images/keybuk.png differ
diff --git a/PlanetWebKit/planet/examples/output/images/logo.png b/PlanetWebKit/planet/examples/output/images/logo.png
new file mode 100644 (file)
index 0000000..f277bf9
Binary files /dev/null and b/PlanetWebKit/planet/examples/output/images/logo.png differ
diff --git a/PlanetWebKit/planet/examples/output/images/opml.png b/PlanetWebKit/planet/examples/output/images/opml.png
new file mode 100644 (file)
index 0000000..3f18190
Binary files /dev/null and b/PlanetWebKit/planet/examples/output/images/opml.png differ
diff --git a/PlanetWebKit/planet/examples/output/images/planet.png b/PlanetWebKit/planet/examples/output/images/planet.png
new file mode 100644 (file)
index 0000000..9606a0c
Binary files /dev/null and b/PlanetWebKit/planet/examples/output/images/planet.png differ
diff --git a/PlanetWebKit/planet/examples/output/images/thom.png b/PlanetWebKit/planet/examples/output/images/thom.png
new file mode 100644 (file)
index 0000000..738179a
Binary files /dev/null and b/PlanetWebKit/planet/examples/output/images/thom.png differ
diff --git a/PlanetWebKit/planet/examples/output/planet.css b/PlanetWebKit/planet/examples/output/planet.css
new file mode 100644 (file)
index 0000000..f8ca042
--- /dev/null
@@ -0,0 +1,146 @@
+body {
+       border-right: 1px solid black;
+       margin-right: 200px;
+
+       padding-left: 20px;
+       padding-right: 20px;
+}
+
+h1 {
+       margin-top: 0px;
+       padding-top: 20px;
+
+       font-family: "Bitstream Vera Sans", sans-serif;
+       font-weight: normal;
+       letter-spacing: -2px;
+       text-transform: lowercase;
+       text-align: right;
+
+       color: grey;
+}
+
+h2 {
+       font-family: "Bitstream Vera Sans", sans-serif;
+       font-weight: normal;
+       color: #200080;
+
+       margin-left: -20px;
+}
+
+h3 {
+       font-family: "Bitstream Vera Sans", sans-serif;
+       font-weight: normal;
+
+       background-color: #a0c0ff;
+       border: 1px solid #5080b0;
+
+       padding: 4px;
+}
+
+h3 a {
+       text-decoration: none;
+       color: inherit;
+}
+
+h4 {
+       font-family: "Bitstream Vera Sans", sans-serif;
+       font-weight: bold;
+}
+
+h4 a {
+       text-decoration: none;
+       color: inherit;
+}
+
+img.face {
+       float: right;
+       margin-top: -3em;
+}
+
+.entry {
+       margin-bottom: 2em;
+}
+
+.entry .date {
+       font-family: "Bitstream Vera Sans", sans-serif;
+       color: grey;
+}
+
+.entry .date a {
+       text-decoration: none;
+       color: inherit;
+}
+
+.sidebar {
+       position: absolute;
+       top: 0px;
+       right: 0px;
+       width: 200px;
+
+       margin-left: 0px;
+       margin-right: 0px;
+       padding-right: 0px;
+
+       padding-top: 20px;
+       padding-left: 0px;
+
+       font-family: "Bitstream Vera Sans", sans-serif;
+       font-size: 85%;
+}
+
+.sidebar h2 {
+       font-size: 110%;
+       font-weight: bold;
+       color: black;
+
+       padding-left: 5px;
+       margin-left: 0px;
+}
+
+.sidebar ul {
+       padding-left: 1em;
+       margin-left: 0px;
+
+       list-style-type: none;
+}
+
+.sidebar ul li:hover {
+       color: grey;
+}
+
+.sidebar ul li a {
+        text-decoration: none;
+}
+
+.sidebar ul li a:hover {
+        text-decoration: underline;
+}
+
+.sidebar ul li a img {
+        border: 0;
+}
+
+.sidebar p {
+       border-top: 1px solid grey;
+       margin-top: 30px;
+       padding-top: 10px;
+
+       padding-left: 5px;
+}
+
+.sidebar .message {
+    cursor: help;
+    border-bottom: 1px dashed red;
+}
+
+.sidebar a.message:hover {
+    cursor: help;
+       background-color: #ff0000;
+       color: #ffffff !important;
+       text-decoration: none !important;
+}
+
+a:hover {
+       text-decoration: underline !important;
+       color: blue !important;
+}
diff --git a/PlanetWebKit/planet/examples/rss10.xml.tmpl b/PlanetWebKit/planet/examples/rss10.xml.tmpl
new file mode 100644 (file)
index 0000000..cdaaa79
--- /dev/null
@@ -0,0 +1,37 @@
+<?xml version="1.0"?>
+<rdf:RDF
+       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+       xmlns:dc="http://purl.org/dc/elements/1.1/"
+       xmlns:foaf="http://xmlns.com/foaf/0.1/"
+       xmlns:content="http://purl.org/rss/1.0/modules/content/"
+       xmlns="http://purl.org/rss/1.0/"
+>
+<channel rdf:about="<TMPL_VAR link ESCAPE="HTML">">
+       <title><TMPL_VAR name ESCAPE="HTML"></title>
+       <link><TMPL_VAR link ESCAPE="HTML"></link>
+       <description><TMPL_VAR name ESCAPE="HTML"> - <TMPL_VAR link ESCAPE="HTML"></description>
+
+       <items>
+               <rdf:Seq>
+<TMPL_LOOP Items>
+                       <rdf:li rdf:resource="<TMPL_VAR id ESCAPE="HTML">" />
+</TMPL_LOOP>
+               </rdf:Seq>
+       </items>
+</channel>
+
+<TMPL_LOOP Items>
+<item rdf:about="<TMPL_VAR id ESCAPE="HTML">">
+       <title><TMPL_VAR channel_name ESCAPE="HTML"><TMPL_IF title>: <TMPL_VAR title_plain ESCAPE="HTML"></TMPL_IF></title>
+       <link><TMPL_VAR link ESCAPE="HTML"></link>
+       <TMPL_IF content>
+       <content:encoded><TMPL_VAR content ESCAPE="HTML"></content:encoded>
+       </TMPL_IF>
+       <dc:date><TMPL_VAR date_iso></dc:date>
+       <TMPL_IF author_name>
+       <dc:creator><TMPL_VAR author_name></dc:creator>
+       </TMPL_IF>
+</item>
+</TMPL_LOOP>
+
+</rdf:RDF>
diff --git a/PlanetWebKit/planet/examples/rss20.xml.tmpl b/PlanetWebKit/planet/examples/rss20.xml.tmpl
new file mode 100644 (file)
index 0000000..81cbffb
--- /dev/null
@@ -0,0 +1,30 @@
+<?xml version="1.0"?>
+<rss version="2.0">
+
+<channel>
+       <title><TMPL_VAR name></title>
+       <link><TMPL_VAR link ESCAPE="HTML"></link>
+       <language>en</language>
+       <description><TMPL_VAR name ESCAPE="HTML"> - <TMPL_VAR link ESCAPE="HTML"></description>
+
+<TMPL_LOOP Items>
+<item>
+       <title><TMPL_VAR channel_name ESCAPE="HTML"><TMPL_IF title>: <TMPL_VAR title_plain ESCAPE="HTML"></TMPL_IF></title>
+       <guid><TMPL_VAR id ESCAPE="HTML"></guid>
+       <link><TMPL_VAR link ESCAPE="HTML"></link>
+       <TMPL_IF content>
+       <description><TMPL_VAR content ESCAPE="HTML"></description>
+       </TMPL_IF>
+       <pubDate><TMPL_VAR date_822></pubDate>
+       <TMPL_IF author_email>
+       <TMPL_IF author_name>
+       <author><TMPL_VAR author_email> (<TMPL_VAR author_name>)</author>
+       <TMPL_ELSE>
+       <author><TMPL_VAR author_email></author>
+       </TMPL_IF>
+       </TMPL_IF>
+</item>
+</TMPL_LOOP>
+
+</channel>
+</rss>
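The rss10/rss20 templates above use htmltmpl's `<TMPL_VAR>`/`<TMPL_LOOP>` markup. As a rough illustration of the variable-expansion half of that mechanism only — loops, conditionals and real HTML escaping are left out, and the snippet names are made up — a minimal Python 3 sketch:

```python
import re

def render(template, values):
    """Tiny emulation of htmltmpl's <TMPL_VAR name> substitution.

    Handles plain variable expansion only; it recognises but does not
    implement the ESCAPE="HTML" attribute, and it has no TMPL_LOOP or
    TMPL_IF support.  The real htmltmpl library used by Planet does
    much more.
    """
    def repl(match):
        # Unknown variables expand to the empty string.
        return str(values.get(match.group(1), ""))
    return re.sub(r'<TMPL_VAR\s+(\w+)(?:\s+ESCAPE="HTML")?>', repl, template)

snippet = '<title><TMPL_VAR name ESCAPE="HTML"></title>'
print(render(snippet, {"name": "Planet WebKit"}))
# -> <title>Planet WebKit</title>
```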
diff --git a/PlanetWebKit/planet/planet-cache.py b/PlanetWebKit/planet/planet-cache.py
new file mode 100755 (executable)
index 0000000..9334583
--- /dev/null
@@ -0,0 +1,194 @@
+#!/usr/bin/env python
+# -*- coding: UTF-8 -*-
+"""Planet cache tool.
+
+"""
+
+__authors__ = [ "Scott James Remnant <scott@netsplit.com>",
+                "Jeff Waugh <jdub@perkypants.org>" ]
+__license__ = "Python"
+
+
+import os
+import sys
+import time
+import dbhash
+import ConfigParser
+
+import planet
+
+
+def usage():
+    print "Usage: planet-cache [options] CACHEFILE [ITEMID]..."
+    print
+    print "Examine and modify information in the Planet cache."
+    print
+    print "Channel Commands:"
+    print " -C, --channel     Display known information on the channel"
+    print " -L, --list        List items in the channel"
+    print " -K, --keys        List all keys found in channel items"
+    print
+    print "Item Commands (need ITEMID):"
+    print " -I, --item        Display known information about the item(s)"
+    print " -H, --hide        Mark the item(s) as hidden"
+    print " -U, --unhide      Mark the item(s) as not hidden"
+    print
+    print "Other Options:"
+    print " -h, --help        Display this help message and exit"
+    sys.exit(0)
+
+def usage_error(msg, *args):
+    print >>sys.stderr, msg, " ".join(args)
+    print >>sys.stderr, "Perhaps you need --help ?"
+    sys.exit(1)
+
+def print_keys(item, title):
+    keys = item.keys()
+    keys.sort()
+    key_len = max([ len(k) for k in keys ])
+
+    print title + ":"
+    for key in keys:
+        if item.key_type(key) == item.DATE:
+            value = time.strftime(planet.TIMEFMT_ISO, item[key])
+        else:
+            value = str(item[key])
+        print "    %-*s  %s" % (key_len, key, fit_str(value, 74 - key_len))
+
+def fit_str(string, length):
+    if len(string) <= length:
+        return string
+    else:
+        return string[:length-4] + " ..."
+
+
+if __name__ == "__main__":
+    cache_file = None
+    want_ids = 0
+    ids = []
+
+    command = None
+
+    for arg in sys.argv[1:]:
+        if arg == "-h" or arg == "--help":
+            usage()
+        elif arg == "-C" or arg == "--channel":
+            if command is not None:
+                usage_error("Only one command option may be supplied")
+            command = "channel"
+        elif arg == "-L" or arg == "--list":
+            if command is not None:
+                usage_error("Only one command option may be supplied")
+            command = "list"
+        elif arg == "-K" or arg == "--keys":
+            if command is not None:
+                usage_error("Only one command option may be supplied")
+            command = "keys"
+        elif arg == "-I" or arg == "--item":
+            if command is not None:
+                usage_error("Only one command option may be supplied")
+            command = "item"
+            want_ids = 1
+        elif arg == "-H" or arg == "--hide":
+            if command is not None:
+                usage_error("Only one command option may be supplied")
+            command = "hide"
+            want_ids = 1
+        elif arg == "-U" or arg == "--unhide":
+            if command is not None:
+                usage_error("Only one command option may be supplied")
+            command = "unhide"
+            want_ids = 1
+        elif arg.startswith("-"):
+            usage_error("Unknown option:", arg)
+        else:
+            if cache_file is None:
+                cache_file = arg
+            elif want_ids:
+                ids.append(arg)
+            else:
+                usage_error("Unexpected extra argument:", arg)
+
+    if cache_file is None:
+        usage_error("Missing expected cache filename")
+    elif want_ids and not len(ids):
+        usage_error("Missing expected entry ids")
+
+    # Open the cache file directly to get the URL it represents
+    try:
+        db = dbhash.open(cache_file)
+        url = db["url"]
+        db.close()
+    except dbhash.bsddb._db.DBError, e:
+        print >>sys.stderr, cache_file + ":", e.args[1]
+        sys.exit(1)
+    except KeyError:
+        print >>sys.stderr, cache_file + ": Probably not a cache file"
+        sys.exit(1)
+
+    # Now do it the right way :-)
+    my_planet = planet.Planet(ConfigParser.ConfigParser())
+    my_planet.cache_directory = os.path.dirname(cache_file)
+    channel = planet.Channel(my_planet, url)
+
+    for item_id in ids:
+        if not channel.has_item(item_id):
+            print >>sys.stderr, item_id + ": Not in channel"
+            sys.exit(1)
+
+    # Do the user's bidding
+    if command == "channel":
+        print_keys(channel, "Channel Keys")
+
+    elif command == "item":
+        for item_id in ids:
+            item = channel.get_item(item_id)
+            print_keys(item, "Item Keys for %s" % item_id)
+
+    elif command == "list":
+        print "Items in Channel:"
+        for item in channel.items(hidden=1, sorted=1):
+            print "    " + item.id
+            print "         " + time.strftime(planet.TIMEFMT_ISO, item.date)
+            if hasattr(item, "title"):
+                print "         " + fit_str(item.title, 70)
+            if hasattr(item, "hidden"):
+                print "         (hidden)"
+
+    elif command == "keys":
+        keys = {}
+        for item in channel.items():
+            for key in item.keys():
+                keys[key] = 1
+
+        keys = keys.keys()
+        keys.sort()
+
+        print "Keys used in Channel:"
+        for key in keys:
+            print "    " + key
+        print
+
+        print "Use --item to output values of particular items."
+
+    elif command == "hide":
+        for item_id in ids:
+            item = channel.get_item(item_id)
+            if hasattr(item, "hidden"):
+                print item_id + ": Already hidden."
+            else:
+                item.hidden = "yes"
+
+        channel.cache_write()
+        print "Done."
+
+    elif command == "unhide":
+        for item_id in ids:
+            item = channel.get_item(item_id)
+            if hasattr(item, "hidden"):
+                del(item.hidden)
+            else:
+                print item_id + ": Not hidden."
+
+        channel.cache_write()
+        print "Done."
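planet-cache.py above parses its options with a hand-rolled loop that rejects a second command option with "Only one command option may be supplied". For comparison, a hypothetical modern sketch of the same interface using argparse, which enforces that mutual exclusion automatically — the flag names mirror the script, everything else is illustrative only:

```python
import argparse

# Hypothetical argparse equivalent of planet-cache's option loop.
parser = argparse.ArgumentParser(
    prog="planet-cache",
    description="Examine and modify information in the Planet cache.")

# All command flags share one dest; the group rejects combinations.
group = parser.add_mutually_exclusive_group()
group.add_argument("-C", "--channel", dest="command",
                   action="store_const", const="channel")
group.add_argument("-L", "--list", dest="command",
                   action="store_const", const="list")
group.add_argument("-K", "--keys", dest="command",
                   action="store_const", const="keys")
group.add_argument("-I", "--item", dest="command",
                   action="store_const", const="item")
group.add_argument("-H", "--hide", dest="command",
                   action="store_const", const="hide")
group.add_argument("-U", "--unhide", dest="command",
                   action="store_const", const="unhide")
parser.add_argument("cache_file")
parser.add_argument("ids", nargs="*")

args = parser.parse_args(["-L", "cache/example.db"])
print(args.command, args.cache_file)
# -> list cache/example.db
```

Note this sketch does not reproduce the script's check that item IDs are only accepted after -I/-H/-U.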
diff --git a/PlanetWebKit/planet/planet.py b/PlanetWebKit/planet/planet.py
new file mode 100755 (executable)
index 0000000..72920d7
--- /dev/null
@@ -0,0 +1,168 @@
+#!/usr/bin/env python
+"""The Planet aggregator.
+
+A flexible and easy-to-use aggregator for generating websites.
+
+Visit http://www.planetplanet.org/ for more information and to download
+the latest version.
+
+Requires Python 2.1, recommends 2.3.
+"""
+
+__authors__ = [ "Scott James Remnant <scott@netsplit.com>",
+                "Jeff Waugh <jdub@perkypants.org>" ]
+__license__ = "Python"
+
+
+import os
+import sys
+import time
+import locale
+import urlparse
+
+import planet
+
+from ConfigParser import ConfigParser
+
+# Default configuration file path
+CONFIG_FILE = "config.ini"
+
+# Defaults for the [Planet] config section
+PLANET_NAME = "Unconfigured Planet"
+PLANET_LINK = "Unconfigured Planet"
+PLANET_FEED = None
+OWNER_NAME  = "Anonymous Coward"
+OWNER_EMAIL = ""
+LOG_LEVEL   = "WARNING"
+FEED_TIMEOUT = 20 # seconds
+
+# Default template file list
+TEMPLATE_FILES = "examples/basic/index.html.tmpl"
+
+
+
+def config_get(config, section, option, default=None, raw=0, vars=None):
+    """Get a value from the configuration, with a default."""
+    if config.has_option(section, option):
+        return config.get(section, option, raw=raw, vars=vars)
+    else:
+        return default
+
+def main():
+    config_file = CONFIG_FILE
+    offline = 0
+    verbose = 0
+
+    for arg in sys.argv[1:]:
+        if arg == "-h" or arg == "--help":
+            print "Usage: planet [options] [CONFIGFILE]"
+            print
+            print "Options:"
+            print " -v, --verbose       DEBUG level logging during update"
+            print " -o, --offline       Update the Planet from the cache only"
+            print " -h, --help          Display this help message and exit"
+            print
+            sys.exit(0)
+        elif arg == "-v" or arg == "--verbose":
+            verbose = 1
+        elif arg == "-o" or arg == "--offline":
+            offline = 1
+        elif arg.startswith("-"):
+            print >>sys.stderr, "Unknown option:", arg
+            sys.exit(1)
+        else:
+            config_file = arg
+
+    # Read the configuration file
+    config = ConfigParser()
+    config.read(config_file)
+    if not config.has_section("Planet"):
+        print >>sys.stderr, "Configuration missing [Planet] section."
+        sys.exit(1)
+
+    # Read the [Planet] config section
+    planet_name = config_get(config, "Planet", "name",        PLANET_NAME)
+    planet_link = config_get(config, "Planet", "link",        PLANET_LINK)
+    planet_feed = config_get(config, "Planet", "feed",        PLANET_FEED)
+    owner_name  = config_get(config, "Planet", "owner_name",  OWNER_NAME)
+    owner_email = config_get(config, "Planet", "owner_email", OWNER_EMAIL)
+    if verbose:
+        log_level = "DEBUG"
+    else:
+        log_level  = config_get(config, "Planet", "log_level", LOG_LEVEL)
+    feed_timeout   = config_get(config, "Planet", "feed_timeout", FEED_TIMEOUT)
+    template_files = config_get(config, "Planet", "template_files",
+                                TEMPLATE_FILES).split(" ")
+
+    # Default feed to the first feed for which there is a template
+    if not planet_feed:
+        for template_file in template_files:
+            name = os.path.splitext(os.path.basename(template_file))[0]
+            if name.find('atom')>=0 or name.find('rss')>=0:
+                planet_feed = urlparse.urljoin(planet_link, name)
+                break
+
+    # Define locale
+    if config.has_option("Planet", "locale"):
+        # The user can specify more than one locale (separated by ":") as
+        # fallbacks.
+        locale_ok = False
+        for user_locale in config.get("Planet", "locale").split(':'):
+            user_locale = user_locale.strip()
+            try:
+                locale.setlocale(locale.LC_ALL, user_locale)
+            except locale.Error:
+                pass
+            else:
+                locale_ok = True
+                break
+        if not locale_ok:
+            print >>sys.stderr, "Unsupported locale setting."
+            sys.exit(1)
+
+    # Activate logging
+    planet.logging.basicConfig()
+    planet.logging.getLogger().setLevel(planet.logging.getLevelName(log_level))
+    log = planet.logging.getLogger("planet.runner")
+    try:
+        log.warning
+    except AttributeError:
+        log.warning = log.warn
+
+    # timeoutsocket allows feedparser to time out rather than hang forever on
+    # ultra-slow servers.  Python 2.3 now has this functionality available in
+    # the standard socket library, so under 2.3 you don't need to install
+    # anything.  But you probably should anyway, because the socket module is
+    # buggy and timeoutsocket is better.
+    if feed_timeout:
+        try:
+            feed_timeout = float(feed_timeout)
+        except (TypeError, ValueError):
+            log.warning("Feed timeout set to invalid value '%s', skipping", feed_timeout)
+            feed_timeout = None
+
+    if feed_timeout and not offline:
+        try:
+            from planet import timeoutsocket
+            timeoutsocket.setDefaultSocketTimeout(feed_timeout)
+            log.debug("Socket timeout set to %d seconds", feed_timeout)
+        except ImportError:
+            import socket
+            if hasattr(socket, 'setdefaulttimeout'):
+                log.debug("timeoutsocket not found, using python function")
+                socket.setdefaulttimeout(feed_timeout)
+                log.debug("Socket timeout set to %d seconds", feed_timeout)
+            else:
+                log.error("Unable to set timeout to %d seconds", feed_timeout)
+
+    # run the planet
+    my_planet = planet.Planet(config)
+    my_planet.run(planet_name, planet_link, template_files, offline)
+
+    my_planet.generate_all_files(template_files, planet_name,
+        planet_link, planet_feed, owner_name, owner_email)
+
+
+if __name__ == "__main__":
+    main()
+
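planet.py above reads its `[Planet]` section through a small `config_get()` helper that falls back to a default when an option is absent. A self-contained Python 3 sketch of that pattern (using the renamed `configparser` module; the option values here are examples only, not the shipped config):

```python
from configparser import ConfigParser

# Sketch of planet.py's config_get() defaulting pattern, ported to
# Python 3 for illustration.
def config_get(config, section, option, default=None):
    """Return the option's value, or the default if it is not set."""
    if config.has_option(section, option):
        return config.get(section, option)
    return default

config = ConfigParser()
config.read_string("""
[Planet]
name = Planet WebKit
""")

print(config_get(config, "Planet", "name", "Unconfigured Planet"))
# -> Planet WebKit
print(config_get(config, "Planet", "owner_name", "Anonymous Coward"))
# -> Anonymous Coward
```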
diff --git a/PlanetWebKit/planet/planet/__init__.py b/PlanetWebKit/planet/planet/__init__.py
new file mode 100644 (file)
index 0000000..929920b
--- /dev/null
@@ -0,0 +1,953 @@
+#!/usr/bin/env python
+# -*- coding: UTF-8 -*-
+"""Planet aggregator library.
+
+This package is a library for developing web sites or software that
+aggregate RSS, CDF and Atom feeds taken from elsewhere into a single,
+combined feed.
+"""
+
+__version__ = "2.0"
+__authors__ = [ "Scott James Remnant <scott@netsplit.com>",
+                "Jeff Waugh <jdub@perkypants.org>" ]
+__license__ = "Python"
+
+
+# Modules available without separate import
+import cache
+import feedparser
+import sanitize
+import htmltmpl
+import sgmllib
+try:
+    import logging
+except ImportError:
+    import compat_logging as logging
+
+# Limit the effect of "from planet import *"
+__all__ = ("cache", "feedparser", "htmltmpl", "logging",
+           "Planet", "Channel", "NewsItem")
+
+
+import os
+import md5
+import time
+import dbhash
+import re
+
+try: 
+    from xml.sax.saxutils import escape
+except ImportError:
+    def escape(data):
+        return data.replace("&","&amp;").replace(">","&gt;").replace("<","&lt;")
+
+# Version information (for generator headers)
+VERSION = ("Planet/%s +http://www.planetplanet.org" % __version__)
+
+# Default User-Agent header to send when retrieving feeds
+USER_AGENT = VERSION + " " + feedparser.USER_AGENT
+
+# Default cache directory
+CACHE_DIRECTORY = "cache"
+
+# Default number of items to display from a new feed
+NEW_FEED_ITEMS = 10
+
+# Useful common date/time formats
+TIMEFMT_ISO = "%Y-%m-%dT%H:%M:%S+00:00"
+TIMEFMT_822 = "%a, %d %b %Y %H:%M:%S +0000"
+
+
+# Log instance to use here
+log = logging.getLogger("planet")
+try:
+    log.warning
+except AttributeError:
+    log.warning = log.warn
+
+# Defaults for the template file config sections
+ENCODING        = "utf-8"
+ITEMS_PER_PAGE  = 60
+DAYS_PER_PAGE   = 0
+OUTPUT_DIR      = "output"
+DATE_FORMAT     = "%B %d, %Y %I:%M %p"
+NEW_DATE_FORMAT = "%B %d, %Y"
+ACTIVITY_THRESHOLD = 0
+
+class stripHtml(sgmllib.SGMLParser):
+    "remove all tags from the data"
+    def __init__(self, data):
+        sgmllib.SGMLParser.__init__(self)
+        self.result=''
+        self.feed(data)
+        self.close()
+    def handle_data(self, data):
+        if data: self.result+=data
+
+def template_info(item, date_format):
+    """Produce a dictionary of template information."""
+    info = {}
+    for key in item.keys():
+        if item.key_type(key) == item.DATE:
+            date = item.get_as_date(key)
+            info[key] = time.strftime(date_format, date)
+            info[key + "_iso"] = time.strftime(TIMEFMT_ISO, date)
+            info[key + "_822"] = time.strftime(TIMEFMT_822, date)
+        else:
+            info[key] = item[key]
+    if 'title' in item.keys():
+        info['title_plain'] = stripHtml(info['title']).result
+
+    return info
+
+
+class Planet:
+    """A set of channels.
+
+    This class represents a set of channels for which the items will
+    be aggregated together into one combined feed.
+
+    Properties:
+        user_agent      User-Agent header to fetch feeds with.
+        cache_directory Directory to store cached channels in.
+        new_feed_items  Number of items to display from a new feed.
+        filter          A regular expression that articles must match.
+        exclude         A regular expression that articles must not match.
+    """
+    def __init__(self, config):
+        self.config = config
+
+        self._channels = []
+
+        self.user_agent = USER_AGENT
+        self.cache_directory = CACHE_DIRECTORY
+        self.new_feed_items = NEW_FEED_ITEMS
+        self.filter = None
+        self.exclude = None
+
+    def tmpl_config_get(self, template, option, default=None, raw=0, vars=None):
+        """Get a template value from the configuration, with a default."""
+        if self.config.has_option(template, option):
+            return self.config.get(template, option, raw=raw, vars=vars)
+        elif self.config.has_option("Planet", option):
+            return self.config.get("Planet", option, raw=raw, vars=vars)
+        else:
+            return default
+
+    def gather_channel_info(self, template_file="Planet"):
+        date_format = self.tmpl_config_get(template_file,
+                                      "date_format", DATE_FORMAT, raw=1)
+
+        activity_threshold = int(self.tmpl_config_get(template_file,
+                                            "activity_threshold",
+                                            ACTIVITY_THRESHOLD))
+
+        if activity_threshold:
+            activity_horizon = \
+                time.gmtime(time.time()-86400*activity_threshold)
+        else:
+            activity_horizon = 0
+
+        channels = {}
+        channels_list = []
+        for channel in self.channels(hidden=1):
+            channels[channel] = template_info(channel, date_format)
+            channels_list.append(channels[channel])
+
+            # identify inactive feeds
+            if activity_horizon:
+                latest = channel.items(sorted=1)
+                if len(latest)==0 or latest[0].date < activity_horizon:
+                    channels[channel]["message"] = \
+                        "no activity in %d days" % activity_threshold
+
+            # report channel level errors
+            if not channel.url_status: continue
+            status = int(channel.url_status)
+            if status == 403:
+               channels[channel]["message"] = "403: forbidden"
+            elif status == 404:
+               channels[channel]["message"] = "404: not found"
+            elif status == 408:
+               channels[channel]["message"] = "408: request timeout"
+            elif status == 410:
+               channels[channel]["message"] = "410: gone"
+            elif status == 500:
+               channels[channel]["message"] = "internal server error"
+            elif status >= 400:
+               channels[channel]["message"] = "http status %s" % status
+
+        return channels, channels_list
+
+    def gather_items_info(self, channels, template_file="Planet", channel_list=None):
+        items_list = []
+        prev_date = []
+        prev_channel = None
+
+        date_format = self.tmpl_config_get(template_file,
+                                      "date_format", DATE_FORMAT, raw=1)
+        items_per_page = int(self.tmpl_config_get(template_file,
+                                      "items_per_page", ITEMS_PER_PAGE))
+        days_per_page = int(self.tmpl_config_get(template_file,
+                                      "days_per_page", DAYS_PER_PAGE))
+        new_date_format = self.tmpl_config_get(template_file,
+                                      "new_date_format", NEW_DATE_FORMAT, raw=1)
+
+        for newsitem in self.items(max_items=items_per_page,
+                                   max_days=days_per_page,
+                                   channels=channel_list):
+            item_info = template_info(newsitem, date_format)
+            chan_info = channels[newsitem._channel]
+            for k, v in chan_info.items():
+                item_info["channel_" + k] = v
+    
+            # Check for the start of a new day
+            if prev_date[:3] != newsitem.date[:3]:
+                prev_date = newsitem.date
+                item_info["new_date"] = time.strftime(new_date_format,
+                                                      newsitem.date)
+    
+            # Check for the start of a new channel
+            if item_info.has_key("new_date") \
+                   or prev_channel != newsitem._channel:
+                prev_channel = newsitem._channel
+                item_info["new_channel"] = newsitem._channel.url
+    
+            items_list.append(item_info)
+
+        return items_list
+
+    def run(self, planet_name, planet_link, template_files, offline = False):
+        log = logging.getLogger("planet.runner")
+
+        # Create a planet
+        log.info("Loading cached data")
+        if self.config.has_option("Planet", "cache_directory"):
+            self.cache_directory = self.config.get("Planet", "cache_directory")
+        if self.config.has_option("Planet", "new_feed_items"):
+            self.new_feed_items  = int(self.config.get("Planet", "new_feed_items"))
+        self.user_agent = "%s +%s %s" % (planet_name, planet_link,
+                                              self.user_agent)
+        if self.config.has_option("Planet", "filter"):
+            self.filter = self.config.get("Planet", "filter")
+
+        # The other configuration blocks are channels to subscribe to
+        for feed_url in self.config.sections():
+            if feed_url == "Planet" or feed_url in template_files:
+                continue
+
+            # Create a channel, configure it and subscribe it
+            channel = Channel(self, feed_url)
+            self.subscribe(channel)
+
+            # Update it
+            try:
+                if not offline and not channel.url_status == '410':
+                    channel.update()
+            except KeyboardInterrupt:
+                raise
+            except:
+                log.exception("Update of <%s> failed", feed_url)
+
+    def generate_all_files(self, template_files, planet_name,
+                planet_link, planet_feed, owner_name, owner_email):
+        
+        log = logging.getLogger("planet.runner")
+        # Go-go-gadget-template
+        for template_file in template_files:
+            manager = htmltmpl.TemplateManager()
+            log.info("Processing template %s", template_file)
+            try:
+                template = manager.prepare(template_file)
+            except htmltmpl.TemplateError:
+                template = manager.prepare(os.path.basename(template_file))
+            # Read the configuration
+            output_dir = self.tmpl_config_get(template_file,
+                                         "output_dir", OUTPUT_DIR)
+            date_format = self.tmpl_config_get(template_file,
+                                          "date_format", DATE_FORMAT, raw=1)
+            encoding = self.tmpl_config_get(template_file, "encoding", ENCODING)
+        
+            # We treat each template individually
+            base = os.path.splitext(os.path.basename(template_file))[0]
+            url = os.path.join(planet_link, base)
+            output_file = os.path.join(output_dir, base)
+
+            # Gather information
+            channels, channels_list = self.gather_channel_info(template_file) 
+            items_list = self.gather_items_info(channels, template_file) 
+
+            # Gather item information
+    
+            # Process the template
+            tp = htmltmpl.TemplateProcessor(html_escape=0)
+            tp.set("Items", items_list)
+            tp.set("Channels", channels_list)
+        
+            # Generic information
+            tp.set("generator",   VERSION)
+            tp.set("name",        planet_name)
+            tp.set("link",        planet_link)
+            tp.set("owner_name",  owner_name)
+            tp.set("owner_email", owner_email)
+            tp.set("url",         url)
+        
+            if planet_feed:
+                tp.set("feed", planet_feed)
+                tp.set("feedtype", planet_feed.find('rss')>=0 and 'rss' or 'atom')
+            
+            # Update time
+            date = time.gmtime()
+            tp.set("date",        time.strftime(date_format, date))
+            tp.set("date_iso",    time.strftime(TIMEFMT_ISO, date))
+            tp.set("date_822",    time.strftime(TIMEFMT_822, date))
+
+            try:
+                log.info("Writing %s", output_file)
+                output_fd = open(output_file, "w")
+                if encoding.lower() in ("utf-8", "utf8"):
+                    # UTF-8 output is the default because we use that internally
+                    output_fd.write(tp.process(template))
+                elif encoding.lower() in ("xml", "html", "sgml"):
+                    # Magic for Python 2.3 users
+                    output = tp.process(template).decode("utf-8")
+                    output_fd.write(output.encode("ascii", "xmlcharrefreplace"))
+                else:
+                    # Must be a "known" encoding
+                    output = tp.process(template).decode("utf-8")
+                    output_fd.write(output.encode(encoding, "replace"))
+                output_fd.close()
+            except KeyboardInterrupt:
+                raise
+            except:
+                log.exception("Write of %s failed", output_file)
+
+    def channels(self, hidden=0, sorted=1):
+        """Return the list of channels."""
+        channels = []
+        for channel in self._channels:
+            if hidden or not channel.has_key("hidden"):
+                channels.append((channel.name, channel))
+
+        if sorted:
+            channels.sort()
+
+        return [ c[-1] for c in channels ]
+
+    def find_by_basename(self, basename):
+        for channel in self._channels:
+            if basename == channel.cache_basename(): return channel
+
+    def subscribe(self, channel):
+        """Subscribe the planet to the channel."""
+        self._channels.append(channel)
+
+    def unsubscribe(self, channel):
+        """Unsubscribe the planet from the channel."""
+        self._channels.remove(channel)
+
+    def items(self, hidden=0, sorted=1, max_items=0, max_days=0, channels=None):
+        """Return an optionally filtered list of items in the channel.
+
+        The filters are applied in the following order:
+
+        If hidden is true then items in hidden channels and hidden items
+        will be returned.
+
+        If sorted is true then the item list will be sorted with the newest
+        first.
+
+        If max_items is non-zero then this number of items, at most, will
+        be returned.
+
+        If max_days is non-zero then any items older than the newest by
+        this number of days won't be returned.  Requires sorted=1 to work.
+
+
+        The sharp-eyed will note that this looks a little strange code-wise;
+        it turns out that Python gets *really* slow if we try to sort the
+        actual items themselves.  Also we use mktime here, but it's ok
+        because we discard the numbers and just need them to be relatively
+        consistent between each other.
+        """
+        planet_filter_re = None
+        if self.filter:
+            planet_filter_re = re.compile(self.filter, re.I)
+        planet_exclude_re = None
+        if self.exclude:
+            planet_exclude_re = re.compile(self.exclude, re.I)
+            
+        items = []
+        seen_guids = {}
+        if not channels: channels=self.channels(hidden=hidden, sorted=0)
+        for channel in channels:
+            for item in channel._items.values():
+                if hidden or not item.has_key("hidden"):
+
+                    channel_filter_re = None
+                    if channel.filter:
+                        channel_filter_re = re.compile(channel.filter,
+                                                       re.I)
+                    channel_exclude_re = None
+                    if channel.exclude:
+                        channel_exclude_re = re.compile(channel.exclude,
+                                                        re.I)
+                    if (planet_filter_re or planet_exclude_re \
+                        or channel_filter_re or channel_exclude_re):
+                        title = ""
+                        if item.has_key("title"):
+                            title = item.title
+                        content = item.get_content("content")
+
+                    if planet_filter_re:
+                        if not (planet_filter_re.search(title) \
+                                or planet_filter_re.search(content)):
+                            continue
+
+                    if planet_exclude_re:
+                        if (planet_exclude_re.search(title) \
+                            or planet_exclude_re.search(content)):
+                            continue
+
+                    if channel_filter_re:
+                        if not (channel_filter_re.search(title) \
+                                or channel_filter_re.search(content)):
+                            continue
+
+                    if channel_exclude_re:
+                        if (channel_exclude_re.search(title) \
+                            or channel_exclude_re.search(content)):
+                            continue
+
+                    if not seen_guids.has_key(item.id):
+                        seen_guids[item.id] = 1
+                        items.append((time.mktime(item.date), item.order, item))
+
+        # Sort the list
+        if sorted:
+            items.sort()
+            items.reverse()
+
+        # Apply max_items filter
+        if len(items) and max_items:
+            items = items[:max_items]
+
+        # Apply max_days filter
+        if len(items) and max_days:
+            max_count = 0
+            max_time = items[0][0] - max_days * 86400
+            for item in items:
+                if item[0] > max_time:
+                    max_count += 1
+                else:
+                    items = items[:max_count]
+                    break
+
+        return [ i[-1] for i in items ]
+
+class Channel(cache.CachedInfo):
+    """A list of news items.
+
+    This class represents a list of news items taken from the feed of
+    a website or other source.
+
+    Properties:
+        url             URL of the feed.
+        url_etag        E-Tag of the feed URL.
+        url_modified    Last modified time of the feed URL.
+        url_status      Last HTTP status of the feed URL.
+        hidden          Channel should be hidden (True if exists).
+        name            Name of the feed owner, or feed title.
+        next_order      Next order number to be assigned to NewsItem
+
+        updated         Correct UTC-Normalised update time of the feed.
+        last_updated    Correct UTC-Normalised time the feed was last updated.
+
+        id              An identifier the feed claims is unique (*).
+        title           One-line title (*).
+        link            Link to the original format feed (*).
+        tagline         Short description of the feed (*).
+        info            Longer description of the feed (*).
+
+        modified        Date the feed claims to have been modified (*).
+
+        author          Name of the author (*).
+        publisher       Name of the publisher (*).
+        generator       Name of the feed generator (*).
+        category        Category name (*).
+        copyright       Copyright information for humans to read (*).
+        license         Link to the licence for the content (*).
+        docs            Link to the specification of the feed format (*).
+        language        Primary language (*).
+        errorreportsto  E-Mail address to send error reports to (*).
+
+        image_url       URL of an associated image (*).
+        image_link      Link to go with the associated image (*).
+        image_title     Alternative text of the associated image (*).
+        image_width     Width of the associated image (*).
+        image_height    Height of the associated image (*).
+
+        filter          A regular expression that articles must match.
+        exclude         A regular expression that articles must not match.
+
+    Properties marked (*) will only be present if the original feed
+    contained them.  Note that the optional 'modified' date field is simply
+    a claim made by the item and parsed from the information given;
+    'updated' (and 'last_updated') are far more reliable sources of
+    information.
+
+    Some feeds may define additional properties beyond those above.
+    """
+    IGNORE_KEYS = ("links", "contributors", "textinput", "cloud", "categories",
+                   "url", "href", "url_etag", "url_modified", "tags", "itunes_explicit")
+
+    def __init__(self, planet, url):
+        if not os.path.isdir(planet.cache_directory):
+            os.makedirs(planet.cache_directory)
+        cache_filename = cache.filename(planet.cache_directory, url)
+        cache_file = dbhash.open(cache_filename, "c", 0666)
+
+        cache.CachedInfo.__init__(self, cache_file, url, root=1)
+
+        self._items = {}
+        self._planet = planet
+        self._expired = []
+        self.url = url
+        # retain the original URL for error reporting
+        self.configured_url = url
+        self.url_etag = None
+        self.url_status = None
+        self.url_modified = None
+        self.name = None
+        self.updated = None
+        self.last_updated = None
+        self.filter = None
+        self.exclude = None
+        self.next_order = "0"
+        self.cache_read()
+        self.cache_read_entries()
+
+        if planet.config.has_section(url):
+            for option in planet.config.options(url):
+                value = planet.config.get(url, option)
+                self.set_as_string(option, value, cached=0)
+
+    def has_item(self, id_):
+        """Check whether the item exists in the channel."""
+        return self._items.has_key(id_)
+
+    def get_item(self, id_):
+        """Return the item from the channel."""
+        return self._items[id_]
+
+    # Special methods
+    __contains__ = has_item
+
+    def items(self, hidden=0, sorted=0):
+        """Return the item list."""
+        items = []
+        for item in self._items.values():
+            if hidden or not item.has_key("hidden"):
+                items.append((time.mktime(item.date), item.order, item))
+
+        if sorted:
+            items.sort()
+            items.reverse()
+
+        return [ i[-1] for i in items ]
+
+    def __iter__(self):
+        """Iterate the sorted item list."""
+        return iter(self.items(sorted=1))
+
+    def cache_read_entries(self):
+        """Read entry information from the cache."""
+        keys = self._cache.keys()
+        for key in keys:
+            if key.find(" ") != -1: continue
+            if self.has_key(key): continue
+
+            item = NewsItem(self, key)
+            self._items[key] = item
+
+    def cache_basename(self):
+        return cache.filename('',self._id)
+
+    def cache_write(self, sync=1):
+        """Write channel and item information to the cache."""
+        for item in self._items.values():
+            item.cache_write(sync=0)
+        for item in self._expired:
+            item.cache_clear(sync=0)
+        cache.CachedInfo.cache_write(self, sync)
+
+        self._expired = []
+
+    def feed_information(self):
+        """
+        Returns a description string for the feed embedded in this channel.
+
+        This will usually simply be the feed url embedded in <>, but in the
+        case where the current self.url has changed from the original
+        self.configured_url the string will contain both pieces of information.
+        This is so that the URL in question is easier to find in logging
+        output: getting an error about a URL that doesn't appear in your config
+        file is annoying.
+        """
+        if self.url == self.configured_url:
+            return "<%s>" % self.url
+        else:
+            return "<%s> (formerly <%s>)" % (self.url, self.configured_url)
+
+    def update(self):
+        """Download the feed to refresh the information.
+
+        This does the actual work of pulling down the feed and if it changes
+        updates the cached information about the feed and entries within it.
+        """
+        info = feedparser.parse(self.url,
+                                etag=self.url_etag, modified=self.url_modified,
+                                agent=self._planet.user_agent)
+        if info.has_key("status"):
+            self.url_status = str(info.status)
+        elif info.has_key("entries") and len(info.entries) > 0:
+            self.url_status = str(200)
+        elif info.bozo and info.bozo_exception.__class__.__name__ == 'Timeout':
+            self.url_status = str(408)
+        else:
+            self.url_status = str(500)
+
+        if self.url_status == '301' and \
+           (info.has_key("entries") and len(info.entries)>0):
+            log.warning("Feed has moved from <%s> to <%s>", self.url, info.url)
+            try:
+                os.link(cache.filename(self._planet.cache_directory, self.url),
+                        cache.filename(self._planet.cache_directory, info.url))
+            except:
+                pass
+            self.url = info.url
+        elif self.url_status == '304':
+            log.info("Feed %s unchanged", self.feed_information())
+            return
+        elif self.url_status == '410':
+            log.info("Feed %s gone", self.feed_information())
+            self.cache_write()
+            return
+        elif self.url_status == '408':
+            log.warning("Feed %s timed out", self.feed_information())
+            return
+        elif int(self.url_status) >= 400:
+            log.error("Error %s while updating feed %s",
+                      self.url_status, self.feed_information())
+            return
+        else:
+            log.info("Updating feed %s", self.feed_information())
+
+        self.url_etag = info.has_key("etag") and info.etag or None
+        self.url_modified = info.has_key("modified") and info.modified or None
+        if self.url_etag is not None:
+            log.debug("E-Tag: %s", self.url_etag)
+        if self.url_modified is not None:
+            log.debug("Last Modified: %s",
+                      time.strftime(TIMEFMT_ISO, self.url_modified))
+
+        self.update_info(info.feed)
+        self.update_entries(info.entries)
+        self.cache_write()
+
+    def update_info(self, feed):
+        """Update information from the feed.
+
+        This reads the feed information supplied by feedparser and updates
+        the cached information about the feed.  These are the various
+        potentially interesting properties that you might care about.
+        """
+        for key in feed.keys():
+            if key in self.IGNORE_KEYS or key + "_parsed" in self.IGNORE_KEYS:
+                # Ignored fields
+                pass
+            elif feed.has_key(key + "_parsed"):
+                # Ignore unparsed date fields
+                pass
+            elif key.endswith("_detail"):
+                # retain name and email sub-fields
+                if feed[key].has_key('name') and feed[key].name:
+                    self.set_as_string(key.replace("_detail","_name"), \
+                        feed[key].name)
+                if feed[key].has_key('email') and feed[key].email:
+                    self.set_as_string(key.replace("_detail","_email"), \
+                        feed[key].email)
+            elif key == "items":
+                # Ignore items field
+                pass
+            elif key.endswith("_parsed"):
+                # Date fields
+                if feed[key] is not None:
+                    self.set_as_date(key[:-len("_parsed")], feed[key])
+            elif key == "image":
+                # Image field: save all the information
+                if feed[key].has_key("url"):
+                    self.set_as_string(key + "_url", feed[key].url)
+                if feed[key].has_key("link"):
+                    self.set_as_string(key + "_link", feed[key].link)
+                if feed[key].has_key("title"):
+                    self.set_as_string(key + "_title", feed[key].title)
+                if feed[key].has_key("width"):
+                    self.set_as_string(key + "_width", str(feed[key].width))
+                if feed[key].has_key("height"):
+                    self.set_as_string(key + "_height", str(feed[key].height))
+            elif isinstance(feed[key], (str, unicode)):
+                # String fields
+                try:
+                    detail = key + '_detail'
+                    if feed.has_key(detail) and feed[detail].has_key('type'):
+                        if feed[detail].type == 'text/html':
+                            feed[key] = sanitize.HTML(feed[key])
+                        elif feed[detail].type == 'text/plain':
+                            feed[key] = escape(feed[key])
+                    self.set_as_string(key, feed[key])
+                except KeyboardInterrupt:
+                    raise
+                except:
+                    log.exception("Ignored '%s' of <%s>, unknown format",
+                                  key, self.url)
+
+    def update_entries(self, entries):
+        """Update entries from the feed.
+
+        This reads the entries supplied by feedparser and updates the
+        cached information about them.  It's at this point we update
+        the 'updated' timestamp and keep the old one in 'last_updated';
+        these provide boundaries for acceptable entry times.
+
+        If this is the first time a feed has been updated then most of the
+        items will be marked as hidden, according to Planet.new_feed_items.
+
+        If the feed does not contain items which, according to the sort
+        order, should be there, those items are assumed to have been
+        expired from the feed or replaced and are removed from the cache.
+        """
+        if not len(entries):
+            return
+
+        self.last_updated = self.updated
+        self.updated = time.gmtime()
+
+        new_items = []
+        feed_items = []
+        for entry in entries:
+            # Try really hard to find some kind of unique identifier
+            if entry.has_key("id"):
+                entry_id = cache.utf8(entry.id)
+            elif entry.has_key("link"):
+                entry_id = cache.utf8(entry.link)
+            elif entry.has_key("title"):
+                entry_id = (self.url + "/"
+                            + md5.new(cache.utf8(entry.title)).hexdigest())
+            elif entry.has_key("summary"):
+                entry_id = (self.url + "/"
+                            + md5.new(cache.utf8(entry.summary)).hexdigest())
+            else:
+                log.error("Unable to find or generate id, entry ignored")
+                continue
+
+            # Create the item if necessary and update
+            if self.has_item(entry_id):
+                item = self._items[entry_id]
+            else:
+                item = NewsItem(self, entry_id)
+                self._items[entry_id] = item
+                new_items.append(item)
+            item.update(entry)
+            feed_items.append(entry_id)
+
+            # Hide excess items the first time through
+            if self.last_updated is None and self._planet.new_feed_items \
+                   and len(feed_items) > self._planet.new_feed_items:
+                item.hidden = "yes"
+                log.debug("Marked <%s> as hidden (new feed)", entry_id)
+
+        # Assign order numbers in reverse
+        new_items.reverse()
+        for item in new_items:
+            item.order = self.next_order = str(int(self.next_order) + 1)
+
+        # Check for expired or replaced items
+        feed_count = len(feed_items)
+        log.debug("Items in Feed: %d", feed_count)
+        for item in self.items(sorted=1):
+            if feed_count < 1:
+                break
+            elif item.id in feed_items:
+                feed_count -= 1
+            elif item._channel.url_status != '226':
+                del(self._items[item.id])
+                self._expired.append(item)
+                log.debug("Removed expired or replaced item <%s>", item.id)
+
+    def get_name(self, key):
+        """Return the key containing the name."""
+        for key in ("name", "title"):
+            if self.has_key(key) and self.key_type(key) != self.NULL:
+                return self.get_as_string(key)
+
+        return ""
+
+class NewsItem(cache.CachedInfo):
+    """An item of news.
+
+    This class represents a single item of news on a channel.  They're
+    created by members of the Channel class and accessible through it.
+
+    Properties:
+        id              Channel-unique identifier for this item.
+        id_hash         Relatively short, printable cryptographic hash of id
+        date            Corrected UTC-Normalised update time, for sorting.
+        order           Order in which items on the same date can be sorted.
+        hidden          Item should be hidden (True if exists).
+
+        title           One-line title (*).
+        link            Link to the original format text (*).
+        summary         Short first-page summary (*).
+        content         Full HTML content.
+
+        modified        Date the item claims to have been modified (*).
+        issued          Date the item claims to have been issued (*).
+        created         Date the item claims to have been created (*).
+        expired         Date the item claims to expire (*).
+
+        author          Name of the author (*).
+        publisher       Name of the publisher (*).
+        category        Category name (*).
+        comments        Link to a page to enter comments (*).
+        license         Link to the licence for the content (*).
+        source_name     Name of the original source of this item (*).
+        source_link     Link to the original source of this item (*).
+
+    Properties marked (*) will only be present if the original feed
+    contained them.  Note that the various optional date fields are
+        simply claims made by the item and parsed from the information
+        given; 'date' is a far more reliable source of information.
+
+    Some feeds may define additional properties beyond those above.
+    """
+    IGNORE_KEYS = ("categories", "contributors", "enclosures", "links",
+                   "guidislink", "date", "tags")
+
+    def __init__(self, channel, id_):
+        cache.CachedInfo.__init__(self, channel._cache, id_)
+
+        self._channel = channel
+        self.id = id_
+        self.id_hash = md5.new(id_).hexdigest()
+        self.date = None
+        self.order = None
+        self.content = None
+        self.cache_read()
+
+    def update(self, entry):
+        """Update the item from the feedparser entry given."""
+        for key in entry.keys():
+            if key in self.IGNORE_KEYS or key + "_parsed" in self.IGNORE_KEYS:
+                # Ignored fields
+                pass
+            elif entry.has_key(key + "_parsed"):
+                # Ignore unparsed date fields
+                pass
+            elif key.endswith("_detail"):
+                # retain name, email, and language sub-fields
+                if entry[key].has_key('name') and entry[key].name:
+                    self.set_as_string(key.replace("_detail","_name"), \
+                        entry[key].name)
+                if entry[key].has_key('email') and entry[key].email:
+                    self.set_as_string(key.replace("_detail","_email"), \
+                        entry[key].email)
+                if entry[key].has_key('language') and entry[key].language and \
+                   (not self._channel.has_key('language') or \
+                   entry[key].language != self._channel.language):
+                    self.set_as_string(key.replace("_detail","_language"), \
+                        entry[key].language)
+            elif key.endswith("_parsed"):
+                # Date fields
+                if entry[key] is not None:
+                    self.set_as_date(key[:-len("_parsed")], entry[key])
+            elif key == "source":
+                # Source field: save both url and value
+                if entry[key].has_key("value"):
+                    self.set_as_string(key + "_name", entry[key].value)
+                if entry[key].has_key("url"):
+                    self.set_as_string(key + "_link", entry[key].url)
+            elif key == "content":
+                # Content field: concatenate the values
+                value = ""
+                for item in entry[key]:
+                    if item.type == 'text/html':
+                        item.value = sanitize.HTML(item.value)
+                    elif item.type == 'text/plain':
+                        item.value = escape(item.value)
+                    if item.has_key('language') and item.language and \
+                       (not self._channel.has_key('language') or
+                       item.language != self._channel.language):
+                        self.set_as_string(key + "_language", item.language)
+                    value += cache.utf8(item.value)
+                self.set_as_string(key, value)
+            elif isinstance(entry[key], (str, unicode)):
+                # String fields
+                try:
+                    detail = key + '_detail'
+                    if entry.has_key(detail):
+                        if entry[detail].has_key('type'):
+                            if entry[detail].type == 'text/html':
+                                entry[key] = sanitize.HTML(entry[key])
+                            elif entry[detail].type == 'text/plain':
+                                entry[key] = escape(entry[key])
+                    self.set_as_string(key, entry[key])
+                except KeyboardInterrupt:
+                    raise
+                except:
+                    log.exception("Ignored '%s' of <%s>, unknown format",
+                                  key, self.id)
+
+        # Generate the date field if we need to
+        self.get_date("date")
+
+    def get_date(self, key):
+        """Get (or update) the date key.
+
+        We check whether the date the entry claims to have been changed
+        falls between the last time we updated this feed and the time we
+        pulled the feed off the site.
+
+        If it does then it's probably not bogus, and we'll sort accordingly.
+
+        If it doesn't then we bound it appropriately; this ensures that
+        entries appear in posting sequence but don't overlap entries
+        added in previous updates and don't creep into the next one.
+        """
+
+        for other_key in ("updated", "modified", "published", "issued", "created"):
+            if self.has_key(other_key):
+                date = self.get_as_date(other_key)
+                break
+        else:
+            date = None
+
+        if date is not None:
+            if date > self._channel.updated:
+                date = self._channel.updated
+#            elif date < self._channel.last_updated:
+#                date = self._channel.updated
+        elif self.has_key(key) and self.key_type(key) != self.NULL:
+            return self.get_as_date(key)
+        else:
+            date = self._channel.updated
+
+        self.set_as_date(key, date)
+        return date
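The clamping in get_date() boils down to: trust the entry's claimed date only when it doesn't sort ahead of the channel's own update time. A minimal standalone sketch of that rule (Python 3, using plain struct_time values rather than cached keys):

```python
import time

def bound_date(claimed, channel_updated):
    # Use the entry's claimed date, but never let it sort ahead of the
    # time we actually pulled the feed (bogus future dates are common).
    if claimed is None:
        return channel_updated
    return min(claimed, channel_updated)

now = time.gmtime()
future = time.gmtime(time.mktime(now) + 86400)  # one day ahead

assert bound_date(future, now) == now   # future claim clamped
assert bound_date(None, now) == now     # no claim at all
```

The commented-out lower bound against last_updated in the method would be a symmetric clamp in the other direction; only the upper bound is active in the code above.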
+
+    def get_content(self, key):
+        """Return the key containing the content."""
+        for key in ("content", "tagline", "summary"):
+            if self.has_key(key) and self.key_type(key) != self.NULL:
+                return self.get_as_string(key)
+
+        return ""
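Channel.items(sorted=1) sorts via decorate-sort-undecorate: each item is wrapped in a (timestamp, order, item) tuple, sorted, and reversed so the newest entries come first, with 'order' breaking ties between same-date items. A rough standalone sketch (Python 3, using dicts as hypothetical stand-ins for NewsItem objects):

```python
import time

# Hypothetical stand-ins for cached NewsItem objects.
entries = [
    {"id": "a", "date": (2007, 11, 28, 0, 0, 0, 0, 0, 0), "order": "1"},
    {"id": "b", "date": (2007, 11, 30, 0, 0, 0, 0, 0, 0), "order": "3"},
    {"id": "c", "date": (2007, 11, 30, 0, 0, 0, 0, 0, 0), "order": "2"},
]

def sorted_items(items):
    # Decorate with (timestamp, order); Python 3 can't compare the dicts
    # themselves, so the sort key is restricted to the first two fields.
    keyed = [(time.mktime(e["date"]), e["order"], e) for e in items]
    keyed.sort(key=lambda t: t[:2])
    keyed.reverse()
    return [t[-1] for t in keyed]

print([e["id"] for e in sorted_items(entries)])  # ['b', 'c', 'a']
```

Same-date items b and c are ordered by their assignment order, newest assignment first, and the older item a sorts last.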
diff --git a/PlanetWebKit/planet/planet/atomstyler.py b/PlanetWebKit/planet/planet/atomstyler.py
new file mode 100644 (file)
index 0000000..9220702
--- /dev/null
@@ -0,0 +1,124 @@
+from xml.dom import minidom, Node
+from urlparse import urlparse, urlunparse
+from xml.parsers.expat import ExpatError
+from htmlentitydefs import name2codepoint
+import re
+
+# select and apply an xml:base for this entry
+class relativize:
+  def __init__(self, parent):
+    self.score = {}
+    self.links = []
+    self.collect_and_tally(parent)
+    self.base = self.select_optimal_base()
+    if self.base:
+      if not parent.hasAttribute('xml:base'):
+        self.rebase(parent)
+        parent.setAttribute('xml:base', self.base)
+
+  # collect and tally cite, href and src attributes
+  def collect_and_tally(self,parent):
+    uri = None
+    if parent.hasAttribute('cite'): uri=parent.getAttribute('cite')
+    if parent.hasAttribute('href'): uri=parent.getAttribute('href')
+    if parent.hasAttribute('src'): uri=parent.getAttribute('src')
+
+    if uri:
+      parts=urlparse(uri)
+      if parts[0].lower() == 'http':
+        parts = (parts[1]+parts[2]).split('/')
+        base = None
+        for i in range(1,len(parts)):
+          base = tuple(parts[0:i])
+          self.score[base] = self.score.get(base,0) + len(base)
+        if base and base not in self.links: self.links.append(base)
+
+    for node in parent.childNodes:
+      if node.nodeType == Node.ELEMENT_NODE:
+        self.collect_and_tally(node)
+    
+  # select the xml:base with the highest score
+  def select_optimal_base(self):
+    if not self.score: return None
+    for link in self.links:
+      self.score[link] = 0
+    winner = max(self.score.values())
+    if not winner: return None
+    for key in self.score.keys():
+      if self.score[key] == winner:
+        if winner == len(key): return None
+        return urlunparse(('http', key[0], '/'.join(key[1:]), '', '', '')) + '/'
+    
+  # rewrite cite, href and src attributes using this base
+  def rebase(self,parent):
+    uri = None
+    if parent.hasAttribute('cite'): uri=parent.getAttribute('cite')
+    if parent.hasAttribute('href'): uri=parent.getAttribute('href')
+    if parent.hasAttribute('src'): uri=parent.getAttribute('src')
+    if uri and uri.startswith(self.base):
+      uri = uri[len(self.base):] or '.'
+      if parent.hasAttribute('cite'): parent.setAttribute('cite', uri)
+      if parent.hasAttribute('href'): parent.setAttribute('href', uri)
+      if parent.hasAttribute('src'): parent.setAttribute('src', uri)
+
+    for node in parent.childNodes:
+      if node.nodeType == Node.ELEMENT_NODE:
+        self.rebase(node)
+
+# convert type="html" to type="plain" or type="xhtml" as appropriate
+def retype(parent):
+  for node in parent.childNodes:
+    if node.nodeType == Node.ELEMENT_NODE:
+
+      if node.hasAttribute('type') and node.getAttribute('type') == 'html':
+        if len(node.childNodes)==0:
+          node.removeAttribute('type')
+        elif len(node.childNodes)==1:
+
+          # replace html entity defs with utf-8
+          chunks=re.split(r'&(\w+);', node.childNodes[0].nodeValue)
+          for i in range(1,len(chunks),2):
+             if chunks[i] in ['amp', 'lt', 'gt', 'apos', 'quot']:
+               chunks[i] ='&' + chunks[i] +';'
+             elif chunks[i] in name2codepoint:
+               chunks[i]=unichr(name2codepoint[chunks[i]])
+             else:
+               chunks[i]='&' + chunks[i] + ';'
+          text = u"".join(chunks)
+
+          try:
+            # see if the resulting text is a well-formed XML fragment
+            div = '<div xmlns="http://www.w3.org/1999/xhtml">%s</div>'
+            data = minidom.parseString((div % text.encode('utf-8')))
+
+            if text.find('<') < 0:
+              # plain text
+              node.removeAttribute('type')
+              text = data.documentElement.childNodes[0].nodeValue
+              node.childNodes[0].replaceWholeText(text)
+
+            elif len(text) > 80:
+              # xhtml
+              node.setAttribute('type', 'xhtml')
+              node.removeChild(node.childNodes[0])
+              node.appendChild(data.documentElement)
+
+          except ExpatError:
+            # leave as html
+            pass
+
+      else:
+        # recurse
+        retype(node)
+
+  if parent.nodeName == 'entry':
+    relativize(parent)
+
+if __name__ == '__main__':
+
+  # run styler on each file mention on the command line
+  import sys
+  for feed in sys.argv[1:]:
+    doc = minidom.parse(feed)
+    doc.normalize()
+    retype(doc.documentElement)
+    open(feed,'w').write(doc.toxml('utf-8'))
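The entity-replacement loop in retype() can be exercised on its own. Below is a rough Python 3 port (the file itself is Python 2 and imports htmlentitydefs; Python 3 moved that table to html.entities) that decodes named HTML entities to characters while keeping the five XML-predefined ones escaped:

```python
import re
from html.entities import name2codepoint

XML_SAFE = ('amp', 'lt', 'gt', 'apos', 'quot')

def decode_entities(text):
    # Odd-numbered chunks hold the entity names captured by the group.
    chunks = re.split(r'&(\w+);', text)
    for i in range(1, len(chunks), 2):
        name = chunks[i]
        if name in XML_SAFE or name not in name2codepoint:
            chunks[i] = '&%s;' % name           # keep escaped / unknown
        else:
            chunks[i] = chr(name2codepoint[name])
    return ''.join(chunks)

print(decode_entities('caf&eacute; &amp; more&hellip;'))  # café &amp; more…
```

Keeping amp, lt, gt, apos, and quot escaped is what lets the result still be parsed as an XML fragment in the well-formedness check above.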
diff --git a/PlanetWebKit/planet/planet/cache.py b/PlanetWebKit/planet/planet/cache.py
new file mode 100644 (file)
index 0000000..dfc529b
--- /dev/null
@@ -0,0 +1,306 @@
+#!/usr/bin/env python
+# -*- coding: UTF-8 -*-
+"""Item cache.
+
+Between runs of Planet we need somewhere to store the feed information
+we parsed; this is so we don't lose information when a particular feed
+goes away or is too short to hold enough items.
+
+This module provides the code to handle this cache transparently enough
+that the rest of the code can take the persistence for granted.
+"""
+
+import os
+import re
+
+
+# Regular expressions to sanitise cache filenames
+re_url_scheme    = re.compile(r'^[^:]*://')
+re_slash         = re.compile(r'[?/]+')
+re_initial_cruft = re.compile(r'^[,.]*')
+re_final_cruft   = re.compile(r'[,.]*$')
+
+
+class CachedInfo:
+    """Cached information.
+
+    This class is designed to hold information that is stored in a cache
+    between instances.  It can act both as a dictionary (c['foo']) and
+    as an object (c.foo) to get and set values and supports both string
+    and date values.
+
+    If you wish to support special fields you can derive a class off this
+    and implement get_FIELD and set_FIELD functions which will be
+    automatically called.
+    """
+    STRING = "string"
+    DATE   = "date"
+    NULL   = "null"
+
+    def __init__(self, cache, id_, root=0):
+        self._type = {}
+        self._value = {}
+        self._cached = {}
+
+        self._cache = cache
+        self._id = id_.replace(" ", "%20")
+        self._root = root
+
+    def cache_key(self, key):
+        """Return the cache key name for the given key."""
+        key = key.replace(" ", "_")
+        if self._root:
+            return key
+        else:
+            return self._id + " " + key
+
+    def cache_read(self):
+        """Read information from the cache."""
+        if self._root:
+            keys_key = " keys"
+        else:
+            keys_key = self._id
+
+        if self._cache.has_key(keys_key):
+            keys = self._cache[keys_key].split(" ")
+        else:
+            return
+
+        for key in keys:
+            cache_key = self.cache_key(key)
+            if not self._cached.has_key(key) or self._cached[key]:
+                # Key either hasn't been loaded, or is one for the cache
+                self._value[key] = self._cache[cache_key]
+                self._type[key] = self._cache[cache_key + " type"]
+                self._cached[key] = 1
+
+    def cache_write(self, sync=1):
+        """Write information to the cache."""
+        self.cache_clear(sync=0)
+
+        keys = []
+        for key in self.keys():
+            cache_key = self.cache_key(key)
+            if not self._cached[key]:
+                if self._cache.has_key(cache_key):
+                    # Non-cached keys need to be cleared
+                    del(self._cache[cache_key])
+                    del(self._cache[cache_key + " type"])
+                continue
+
+            keys.append(key)
+            self._cache[cache_key] = self._value[key]
+            self._cache[cache_key + " type"] = self._type[key]
+
+        if self._root:
+            keys_key = " keys"
+        else:
+            keys_key = self._id
+
+        self._cache[keys_key] = " ".join(keys)
+        if sync:
+            self._cache.sync()
+
+    def cache_clear(self, sync=1):
+        """Remove information from the cache."""
+        if self._root:
+            keys_key = " keys"
+        else:
+            keys_key = self._id
+
+        if self._cache.has_key(keys_key):
+            keys = self._cache[keys_key].split(" ")
+            del(self._cache[keys_key])
+        else:
+            return
+
+        for key in keys:
+            cache_key = self.cache_key(key)
+            del(self._cache[cache_key])
+            del(self._cache[cache_key + " type"])
+
+        if sync:
+            self._cache.sync()
+
+    def has_key(self, key):
+        """Check whether the key exists."""
+        key = key.replace(" ", "_")
+        return self._value.has_key(key)
+
+    def key_type(self, key):
+        """Return the key type."""
+        key = key.replace(" ", "_")
+        return self._type[key]
+
+    def set(self, key, value, cached=1):
+        """Set the value of the given key.
+
+        If a set_KEY function exists, that is called; otherwise the value
+        is stored as a string, falling back to a date value if string
+        conversion fails (it nearly always succeeds).
+        """
+        key = key.replace(" ", "_")
+
+        try:
+            func = getattr(self, "set_" + key)
+        except AttributeError:
+            pass
+        else:
+            return func(key, value)
+
+        if value is None:
+            return self.set_as_null(key, value)
+        else:
+            try:
+                return self.set_as_string(key, value)
+            except TypeError:
+                return self.set_as_date(key, value)
+
+    def get(self, key):
+        """Return the value of the given key.
+
+        If a get_KEY function exists, that is called; otherwise the
+        getter for the key's stored type is called, if one exists.
+        """
+        key = key.replace(" ", "_")
+
+        try:
+            func = getattr(self, "get_" + key)
+        except AttributeError:
+            pass
+        else:
+            return func(key)
+
+        try:
+            func = getattr(self, "get_as_" + self._type[key])
+        except AttributeError:
+            pass
+        else:
+            return func(key)
+
+        return self._value[key]
+
+    def set_as_string(self, key, value, cached=1):
+        """Set the key to the string value.
+
+        The value is encoded as UTF-8 if it is a Unicode string; otherwise
+        it is re-decoded as UTF-8 or ISO-8859-1 where possible (feedparser
+        tries pretty hard), with undecodable characters replaced as a last
+        resort.
+        """
+        value = utf8(value)
+
+        key = key.replace(" ", "_")
+        self._value[key] = value
+        self._type[key] = self.STRING
+        self._cached[key] = cached
+
+    def get_as_string(self, key):
+        """Return the key as a string value."""
+        key = key.replace(" ", "_")
+        if not self.has_key(key):
+            raise KeyError, key
+
+        return self._value[key]
+
+    def set_as_date(self, key, value, cached=1):
+        """Set the key to the date value.
+
+        The date should be a 9-item tuple as returned by time.gmtime().
+        """
+        value = " ".join([ str(s) for s in value ])
+
+        key = key.replace(" ", "_")
+        self._value[key] = value
+        self._type[key] = self.DATE
+        self._cached[key] = cached
+
+    def get_as_date(self, key):
+        """Return the key as a date value."""
+        key = key.replace(" ", "_")
+        if not self.has_key(key):
+            raise KeyError, key
+
+        value = self._value[key]
+        return tuple([ int(i) for i in value.split(" ") ])
+
+    def set_as_null(self, key, value, cached=1):
+        """Set the key to the null value.
+
+        This only exists to make things less magic.
+        """
+        key = key.replace(" ", "_")
+        self._value[key] = ""
+        self._type[key] = self.NULL
+        self._cached[key] = cached
+
+    def get_as_null(self, key):
+        """Return the key as the null value."""
+        key = key.replace(" ", "_")
+        if not self.has_key(key):
+            raise KeyError, key
+
+        return None
+
+    def del_key(self, key):
+        """Delete the given key."""
+        key = key.replace(" ", "_")
+        if not self.has_key(key):
+            raise KeyError, key
+
+        del(self._value[key])
+        del(self._type[key])
+        del(self._cached[key])
+
+    def keys(self):
+        """Return the list of cached keys."""
+        return self._value.keys()
+
+    def __iter__(self):
+        """Iterate the cached keys."""
+        return iter(self._value.keys())
+
+    # Special methods
+    __contains__ = has_key
+    __setitem__  = set_as_string
+    __getitem__  = get
+    __delitem__  = del_key
+    __delattr__  = del_key
+
+    def __setattr__(self, key, value):
+        if key.startswith("_"):
+            self.__dict__[key] = value
+        else:
+            self.set(key, value)
+
+    def __getattr__(self, key):
+        if self.has_key(key):
+            return self.get(key)
+        else:
+            raise AttributeError, key
+
+
+def filename(directory, filename):
+    """Return a filename suitable for the cache.
+
+    Strips dangerous and common characters to create a filename we
+    can use to store the cache in.
+    """
+    filename = re_url_scheme.sub("", filename)
+    filename = re_slash.sub(",", filename)
+    filename = re_initial_cruft.sub("", filename)
+    filename = re_final_cruft.sub("", filename)
+
+    return os.path.join(directory, filename)
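The transformation order in filename() can be sketched as below. Note the
cruft regexes (re_url_scheme, re_slash, re_initial_cruft, re_final_cruft) are
defined earlier in cache.py and are not shown in this hunk; the patterns here
are illustrative stand-ins only:

```python
import os
import re

# Illustrative stand-ins for the real patterns defined earlier in cache.py.
re_url_scheme = re.compile(r"^\w+://")
re_slash = re.compile(r"[/\\]+")
re_initial_cruft = re.compile(r"^[,.]*")
re_final_cruft = re.compile(r"[,.]*$")

def cache_filename(directory, url):
    # Same transformation order as filename() above: drop the URL scheme,
    # turn path separators into commas, then trim leading/trailing cruft.
    name = re_url_scheme.sub("", url)
    name = re_slash.sub(",", name)
    name = re_initial_cruft.sub("", name)
    name = re_final_cruft.sub("", name)
    return os.path.join(directory, name)
```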
+
+def utf8(value):
+    """Return the value as a UTF-8 string."""
+    if type(value) == type(u''):
+        return value.encode("utf-8")
+    else:
+        try:
+            return unicode(value, "utf-8").encode("utf-8")
+        except UnicodeError:
+            try:
+                return unicode(value, "iso-8859-1").encode("utf-8")
+            except UnicodeError:
+                return unicode(value, "ascii", "replace").encode("utf-8")
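The fallback chain in utf8() can be restated in Python 3 terms (a sketch only;
the module itself is Python 2 and works with unicode/str):

```python
def to_utf8(value):
    # Text is simply encoded; bytes are tried as UTF-8, then ISO-8859-1,
    # then ASCII with replacement characters, mirroring utf8() above.
    if isinstance(value, str):
        return value.encode("utf-8")
    try:
        return value.decode("utf-8").encode("utf-8")
    except UnicodeError:
        try:
            return value.decode("iso-8859-1").encode("utf-8")
        except UnicodeError:
            return value.decode("ascii", "replace").encode("utf-8")
```

Since ISO-8859-1 can decode any byte sequence, the final branch is effectively
a safety net, as in the original.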
diff --git a/PlanetWebKit/planet/planet/compat_logging/__init__.py b/PlanetWebKit/planet/planet/compat_logging/__init__.py
new file mode 100644 (file)
index 0000000..3bd0c6d
--- /dev/null
@@ -0,0 +1,1196 @@
+# Copyright 2001-2002 by Vinay Sajip. All Rights Reserved.
+#
+# Permission to use, copy, modify, and distribute this software and its
+# documentation for any purpose and without fee is hereby granted,
+# provided that the above copyright notice appear in all copies and that
+# both that copyright notice and this permission notice appear in
+# supporting documentation, and that the name of Vinay Sajip
+# not be used in advertising or publicity pertaining to distribution
+# of the software without specific, written prior permission.
+# VINAY SAJIP DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING
+# ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL
+# VINAY SAJIP BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR
+# ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
+# IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT
+# OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+
+"""
+Logging package for Python. Based on PEP 282 and comments thereto in
+comp.lang.python, and influenced by Apache's log4j system.
+
+Should work under Python versions >= 1.5.2, except that source line
+information is not available unless 'sys._getframe()' is.
+
+Copyright (C) 2001-2002 Vinay Sajip. All Rights Reserved.
+
+To use, simply 'import logging' and log away!
+"""
+
+import sys, os, types, time, string, cStringIO
+
+try:
+    import thread
+    import threading
+except ImportError:
+    thread = None
+
+__author__  = "Vinay Sajip <vinay_sajip@red-dove.com>"
+__status__  = "beta"
+__version__ = "0.4.8.1"
+__date__    = "26 June 2003"
+
+#---------------------------------------------------------------------------
+#   Miscellaneous module data
+#---------------------------------------------------------------------------
+
+#
+#_srcfile is used when walking the stack to check when we've got the first
+# caller stack frame.
+#
+if string.lower(__file__[-4:]) in ['.pyc', '.pyo']:
+    _srcfile = __file__[:-4] + '.py'
+else:
+    _srcfile = __file__
+_srcfile = os.path.normcase(_srcfile)
+
+# _srcfile is only used in conjunction with sys._getframe().
+# To provide compatibility with older versions of Python, set _srcfile
+# to None if _getframe() is not available; this value will prevent
+# findCaller() from being called.
+if not hasattr(sys, "_getframe"):
+    _srcfile = None
+
+#
+#_startTime is used as the base when calculating the relative time of events
+#
+_startTime = time.time()
+
+#
+#raiseExceptions is used to see if exceptions during handling should be
+#propagated
+#
+raiseExceptions = 1
+
+#---------------------------------------------------------------------------
+#   Level related stuff
+#---------------------------------------------------------------------------
+#
+# Default levels and level names; these can be replaced with any positive set
+# of values having corresponding names. There is a pseudo-level, NOTSET, which
+# is only really there as a lower limit for user-defined levels. Handlers and
+# loggers are initialized with NOTSET so that they will log all messages, even
+# at user-defined levels.
+#
+CRITICAL = 50
+FATAL = CRITICAL
+ERROR = 40
+WARNING = 30
+WARN = WARNING
+INFO = 20
+DEBUG = 10
+NOTSET = 0
+
+_levelNames = {
+    CRITICAL : 'CRITICAL',
+    ERROR : 'ERROR',
+    WARNING : 'WARNING',
+    INFO : 'INFO',
+    DEBUG : 'DEBUG',
+    NOTSET : 'NOTSET',
+    'CRITICAL' : CRITICAL,
+    'ERROR' : ERROR,
+    'WARN' : WARNING,
+    'WARNING' : WARNING,
+    'INFO' : INFO,
+    'DEBUG' : DEBUG,
+    'NOTSET' : NOTSET,
+}
+
+def getLevelName(level):
+    """
+    Return the textual representation of logging level 'level'.
+
+    If the level is one of the predefined levels (CRITICAL, ERROR, WARNING,
+    INFO, DEBUG) then you get the corresponding string. If you have
+    associated levels with names using addLevelName then the name you have
+    associated with 'level' is returned. Otherwise, the string
+    "Level %s" % level is returned.
+    """
+    return _levelNames.get(level, ("Level %s" % level))
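Because this module tracks the standard library's logging package, the
bidirectional level mapping can be demonstrated with stock Python, covering
the three cases the docstring describes:

```python
import logging

# Numeric level to name, name back to numeric level, and the
# "Level %s" fallback for unregistered values.
assert logging.getLevelName(logging.DEBUG) == "DEBUG"
assert logging.getLevelName("WARNING") == logging.WARNING
assert logging.getLevelName(15) == "Level 15"
```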
+
+def addLevelName(level, levelName):
+    """
+    Associate 'levelName' with 'level'.
+
+    This is used when converting levels to text during message formatting.
+    """
+    _acquireLock()
+    try:    #unlikely to cause an exception, but you never know...
+        _levelNames[level] = levelName
+        _levelNames[levelName] = level
+    finally:
+        _releaseLock()
+
+#---------------------------------------------------------------------------
+#   Thread-related stuff
+#---------------------------------------------------------------------------
+
+#
+#_lock is used to serialize access to shared data structures in this module.
+#This needs to be an RLock because fileConfig() creates Handlers, and so
+#may arbitrary user threads. Since Handler.__init__() updates the shared
+#dictionary _handlers, it needs to acquire the lock. But if configuring,
+#the lock would already have been acquired - so we need an RLock.
+#The same argument applies to Loggers and Manager.loggerDict.
+#
+_lock = None
+
+def _acquireLock():
+    """
+    Acquire the module-level lock for serializing access to shared data.
+
+    This should be released with _releaseLock().
+    """
+    global _lock
+    if (not _lock) and thread:
+        _lock = threading.RLock()
+    if _lock:
+        _lock.acquire()
+
+def _releaseLock():
+    """
+    Release the module-level lock acquired by calling _acquireLock().
+    """
+    if _lock:
+        _lock.release()
+
+#---------------------------------------------------------------------------
+#   The logging record
+#---------------------------------------------------------------------------
+
+class LogRecord:
+    """
+    A LogRecord instance represents an event being logged.
+
+    LogRecord instances are created every time something is logged. They
+    contain all the information pertinent to the event being logged. The
+    main information passed in is in msg and args, which are combined
+    using str(msg) % args to create the message field of the record. The
+    record also includes information such as when the record was created,
+    the source line where the logging call was made, and any exception
+    information to be logged.
+    """
+    def __init__(self, name, level, pathname, lineno, msg, args, exc_info):
+        """
+        Initialize a logging record with interesting information.
+        """
+        ct = time.time()
+        self.name = name
+        self.msg = msg
+        self.args = args
+        self.levelname = getLevelName(level)
+        self.levelno = level
+        self.pathname = pathname
+        try:
+            self.filename = os.path.basename(pathname)
+            self.module = os.path.splitext(self.filename)[0]
+        except:
+            self.filename = pathname
+            self.module = "Unknown module"
+        self.exc_info = exc_info
+        self.lineno = lineno
+        self.created = ct
+        self.msecs = (ct - long(ct)) * 1000
+        self.relativeCreated = (self.created - _startTime) * 1000
+        if thread:
+            self.thread = thread.get_ident()
+        else:
+            self.thread = None
+        if hasattr(os, 'getpid'):
+            self.process = os.getpid()
+        else:
+            self.process = None
+
+    def __str__(self):
+        return '<LogRecord: %s, %s, %s, %s, "%s">'%(self.name, self.levelno,
+            self.pathname, self.lineno, self.msg)
+
+    def getMessage(self):
+        """
+        Return the message for this LogRecord.
+
+        Return the message for this LogRecord after merging any user-supplied
+        arguments with the message.
+        """
+        if not hasattr(types, "UnicodeType"): #if no unicode support...
+            msg = str(self.msg)
+        else:
+            try:
+                msg = str(self.msg)
+            except UnicodeError:
+                msg = self.msg      #Defer encoding till later
+        if self.args:
+            msg = msg % self.args
+        return msg
+
+def makeLogRecord(dict):
+    """
+    Make a LogRecord whose attributes are defined by the specified dictionary.
+    This function is useful for converting a logging event received over
+    a socket connection (which is sent as a dictionary) into a LogRecord
+    instance.
+    """
+    rv = LogRecord(None, None, "", 0, "", (), None)
+    rv.__dict__.update(dict)
+    return rv
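The dict-to-record round trip works the same way in the stock library this
module mirrors, e.g. for an event received over a socket:

```python
import logging

# Rebuild a record from a plain dict, e.g. one unpickled from a socket.
rec = logging.makeLogRecord({"name": "planet", "levelno": logging.INFO,
                             "msg": "fetched %d feeds", "args": (3,)})
print(rec.getMessage())  # fetched 3 feeds
```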
+
+#---------------------------------------------------------------------------
+#   Formatter classes and functions
+#---------------------------------------------------------------------------
+
+class Formatter:
+    """
+    Formatter instances are used to convert a LogRecord to text.
+
+    Formatters need to know how a LogRecord is constructed. They are
+    responsible for converting a LogRecord to (usually) a string which can
+    be interpreted by either a human or an external system. The base Formatter
+    allows a formatting string to be specified. If none is supplied, the
+    default value of "%s(message)\\n" is used.
+
+    The Formatter can be initialized with a format string which makes use of
+    knowledge of the LogRecord attributes - e.g. the default value mentioned
+    above makes use of the fact that the user's message and arguments are
+    pre-formatted into a LogRecord's message attribute. Currently, the useful
+    attributes in a LogRecord are described by:
+
+    %(name)s            Name of the logger (logging channel)
+    %(levelno)s         Numeric logging level for the message (DEBUG, INFO,
+                        WARNING, ERROR, CRITICAL)
+    %(levelname)s       Text logging level for the message ("DEBUG", "INFO",
+                        "WARNING", "ERROR", "CRITICAL")
+    %(pathname)s        Full pathname of the source file where the logging
+                        call was issued (if available)
+    %(filename)s        Filename portion of pathname
+    %(module)s          Module (name portion of filename)
+    %(lineno)d          Source line number where the logging call was issued
+                        (if available)
+    %(created)f         Time when the LogRecord was created (time.time()
+                        return value)
+    %(asctime)s         Textual time when the LogRecord was created
+    %(msecs)d           Millisecond portion of the creation time
+    %(relativeCreated)d Time in milliseconds when the LogRecord was created,
+                        relative to the time the logging module was loaded
+                        (typically at application startup time)
+    %(thread)d          Thread ID (if available)
+    %(process)d         Process ID (if available)
+    %(message)s         The result of record.getMessage(), computed just as
+                        the record is emitted
+    """
+
+    converter = time.localtime
+
+    def __init__(self, fmt=None, datefmt=None):
+        """
+        Initialize the formatter with specified format strings.
+
+        Initialize the formatter either with the specified format string, or a
+        default as described above. Allow for specialized date formatting with
+        the optional datefmt argument (if omitted, you get the ISO8601 format).
+        """
+        if fmt:
+            self._fmt = fmt
+        else:
+            self._fmt = "%(message)s"
+        self.datefmt = datefmt
+
+    def formatTime(self, record, datefmt=None):
+        """
+        Return the creation time of the specified LogRecord as formatted text.
+
+        This method should be called from format() by a formatter which
+        wants to make use of a formatted time. This method can be overridden
+        in formatters to provide for any specific requirement, but the
+        basic behaviour is as follows: if datefmt (a string) is specified,
+        it is used with time.strftime() to format the creation time of the
+        record. Otherwise, the ISO8601 format is used. The resulting
+        string is returned. This function uses a user-configurable function
+        to convert the creation time to a tuple. By default, time.localtime()
+        is used; to change this for a particular formatter instance, set the
+        'converter' attribute to a function with the same signature as
+        time.localtime() or time.gmtime(). To change it for all formatters,
+        for example if you want all logging times to be shown in GMT,
+        set the 'converter' attribute in the Formatter class.
+        """
+        ct = self.converter(record.created)
+        if datefmt:
+            s = time.strftime(datefmt, ct)
+        else:
+            t = time.strftime("%Y-%m-%d %H:%M:%S", ct)
+            s = "%s,%03d" % (t, record.msecs)
+        return s
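The per-instance converter swap the docstring describes can be exercised with
the stock library's Formatter, which shares this implementation:

```python
import logging
import time

fmt = logging.Formatter(datefmt="%Y-%m-%dT%H:%M:%S")
fmt.converter = time.gmtime  # per-instance override, as described above
rec = logging.LogRecord("n", logging.INFO, __file__, 1, "m", (), None)
stamp = fmt.formatTime(rec, fmt.datefmt)
```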
+
+    def formatException(self, ei):
+        """
+        Format and return the specified exception information as a string.
+
+        This default implementation just uses
+        traceback.print_exception()
+        """
+        import traceback
+        sio = cStringIO.StringIO()
+        traceback.print_exception(ei[0], ei[1], ei[2], None, sio)
+        s = sio.getvalue()
+        sio.close()
+        if s[-1] == "\n":
+            s = s[:-1]
+        return s
+
+    def format(self, record):
+        """
+        Format the specified record as text.
+
+        The record's attribute dictionary is used as the operand to a
+        string formatting operation which yields the returned string.
+        Before formatting the dictionary, a couple of preparatory steps
+        are carried out. The message attribute of the record is computed
+        using LogRecord.getMessage(). If the formatting string contains
+        "%(asctime)", formatTime() is called to format the event time.
+        If there is exception information, it is formatted using
+        formatException() and appended to the message.
+        """
+        record.message = record.getMessage()
+        if string.find(self._fmt,"%(asctime)") >= 0:
+            record.asctime = self.formatTime(record, self.datefmt)
+        s = self._fmt % record.__dict__
+        if record.exc_info:
+            if s[-1] != "\n":
+                s = s + "\n"
+            s = s + self.formatException(record.exc_info)
+        return s
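The format() flow above (merge args via getMessage(), then apply the format
string to the record's attribute dictionary) is observable with stock Python:

```python
import logging

fmt = logging.Formatter("%(levelname)s %(name)s: %(message)s")
rec = logging.LogRecord("planet", logging.INFO, __file__, 42,
                        "cache miss for %s", ("feed.xml",), None)
print(fmt.format(rec))  # INFO planet: cache miss for feed.xml
```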
+
+#
+#   The default formatter to use when no other is specified
+#
+_defaultFormatter = Formatter()
+
+class BufferingFormatter:
+    """
+    A formatter suitable for formatting a number of records.
+    """
+    def __init__(self, linefmt=None):
+        """
+        Optionally specify a formatter which will be used to format each
+        individual record.
+        """
+        if linefmt:
+            self.linefmt = linefmt
+        else:
+            self.linefmt = _defaultFormatter
+
+    def formatHeader(self, records):
+        """
+        Return the header string for the specified records.
+        """
+        return ""
+
+    def formatFooter(self, records):
+        """
+        Return the footer string for the specified records.
+        """
+        return ""
+
+    def format(self, records):
+        """
+        Format the specified records and return the result as a string.
+        """
+        rv = ""
+        if len(records) > 0:
+            rv = rv + self.formatHeader(records)
+            for record in records:
+                rv = rv + self.linefmt.format(record)
+            rv = rv + self.formatFooter(records)
+        return rv
+
+#---------------------------------------------------------------------------
+#   Filter classes and functions
+#---------------------------------------------------------------------------
+
+class Filter:
+    """
+    Filter instances are used to perform arbitrary filtering of LogRecords.
+
+    Loggers and Handlers can optionally use Filter instances to filter
+    records as desired. The base filter class only allows events which are
+    below a certain point in the logger hierarchy. For example, a filter
+    initialized with "A.B" will allow events logged by loggers "A.B",
+    "A.B.C", "A.B.C.D", "A.B.D" etc. but not "A.BB", "B.A.B" etc. If
+    initialized with the empty string, all events are passed.
+    """
+    def __init__(self, name=''):
+        """
+        Initialize a filter.
+
+        Initialize with the name of the logger which, together with its
+        children, will have its events allowed through the filter. If no
+        name is specified, allow every event.
+        """
+        self.name = name
+        self.nlen = len(name)
+
+    def filter(self, record):
+        """
+        Determine if the specified record is to be logged.
+
+        Is the specified record to be logged? Returns 0 for no, nonzero for
+        yes. If deemed appropriate, the record may be modified in-place.
+        """
+        if self.nlen == 0:
+            return 1
+        elif self.name == record.name:
+            return 1
+        elif string.find(record.name, self.name, 0, self.nlen) != 0:
+            return 0
+        return (record.name[self.nlen] == ".")
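The hierarchy check in the docstring ("A.B" admits "A.B.C" but not "A.BB")
can be verified against the stock library's Filter, which behaves the same:

```python
import logging

def record_named(name):
    # Minimal record carrying only the logger name the filter inspects.
    return logging.LogRecord(name, logging.INFO, __file__, 1, "m", (), None)

f = logging.Filter("A.B")
assert f.filter(record_named("A.B"))        # exact match
assert f.filter(record_named("A.B.C"))      # descendant
assert not f.filter(record_named("A.BB"))   # prefix but not a child
assert not f.filter(record_named("B.A.B"))  # different subtree
```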
+
+class Filterer:
+    """
+    A base class for loggers and handlers which allows them to share
+    common code.
+    """
+    def __init__(self):
+        """
+        Initialize the list of filters to be an empty list.
+        """
+        self.filters = []
+
+    def addFilter(self, filter):
+        """
+        Add the specified filter to this handler.
+        """
+        if not (filter in self.filters):
+            self.filters.append(filter)
+
+    def removeFilter(self, filter):
+        """
+        Remove the specified filter from this handler.
+        """
+        if filter in self.filters:
+            self.filters.remove(filter)
+
+    def filter(self, record):
+        """
+        Determine if a record is loggable by consulting all the filters.
+
+        The default is to allow the record to be logged; any filter can veto
+        this and the record is then dropped. Returns a zero value if a record
+        is to be dropped, else non-zero.
+        """
+        rv = 1
+        for f in self.filters:
+            if not f.filter(record):
+                rv = 0
+                break
+        return rv
+
+#---------------------------------------------------------------------------
+#   Handler classes and functions
+#---------------------------------------------------------------------------
+
+_handlers = {}  #repository of handlers (for flushing when shutdown called)
+
+class Handler(Filterer):
+    """
+    Handler instances dispatch logging events to specific destinations.
+
+    The base handler class. Acts as a placeholder which defines the Handler
+    interface. Handlers can optionally use Formatter instances to format
+    records as desired. By default, no formatter is specified; in this case,
+    the 'raw' message as determined by record.message is logged.
+    """
+    def __init__(self, level=NOTSET):
+        """
+        Initializes the instance - basically setting the formatter to None
+        and the filter list to empty.
+        """
+        Filterer.__init__(self)
+        self.level = level
+        self.formatter = None
+        #get the module data lock, as we're updating a shared structure.
+        _acquireLock()
+        try:    #unlikely to raise an exception, but you never know...
+            _handlers[self] = 1
+        finally:
+            _releaseLock()
+        self.createLock()
+
+    def createLock(self):
+        """
+        Acquire a thread lock for serializing access to the underlying I/O.
+        """
+        if thread:
+            self.lock = thread.allocate_lock()
+        else:
+            self.lock = None
+
+    def acquire(self):
+        """
+        Acquire the I/O thread lock.
+        """
+        if self.lock:
+            self.lock.acquire()
+
+    def release(self):
+        """
+        Release the I/O thread lock.
+        """
+        if self.lock:
+            self.lock.release()
+
+    def setLevel(self, level):
+        """
+        Set the logging level of this handler.
+        """
+        self.level = level
+
+    def format(self, record):
+        """
+        Format the specified record.
+
+        If a formatter is set, use it. Otherwise, use the default formatter
+        for the module.
+        """
+        if self.formatter:
+            fmt = self.formatter
+        else:
+            fmt = _defaultFormatter
+        return fmt.format(record)
+
+    def emit(self, record):
+        """
+        Do whatever it takes to actually log the specified logging record.
+
+        This version is intended to be implemented by subclasses and so
+        raises a NotImplementedError.
+        """
+        raise NotImplementedError, 'emit must be implemented '\
+                                    'by Handler subclasses'
+
+    def handle(self, record):
+        """
+        Conditionally emit the specified logging record.
+
+        Emission depends on filters which may have been added to the handler.
+        Wrap the actual emission of the record with acquisition/release of
+        the I/O thread lock. Returns whether the filter passed the record for
+        emission.
+        """
+        rv = self.filter(record)
+        if rv:
+            self.acquire()
+            try:
+                self.emit(record)
+            finally:
+                self.release()
+        return rv
+
+    def setFormatter(self, fmt):
+        """
+        Set the formatter for this handler.
+        """
+        self.formatter = fmt
+
+    def flush(self):
+        """
+        Ensure all logging output has been flushed.
+
+        This version does nothing and is intended to be implemented by
+        subclasses.
+        """
+        pass
+
+    def close(self):
+        """
+        Tidy up any resources used by the handler.
+
+        This version does nothing and is intended to be implemented by
+        subclasses.
+        """
+        pass
+
+    def handleError(self, record):
+        """
+        Handle errors which occur during an emit() call.
+
+        This method should be called from handlers when an exception is
+        encountered during an emit() call. If raiseExceptions is false,
+        exceptions get silently ignored. This is what is mostly wanted
+        for a logging system - most users will not care about errors in
+        the logging system, they are more interested in application errors.
+        You could, however, replace this with a custom handler if you wish.
+        The record which was being processed is passed in to this method.
+        """
+        if raiseExceptions:
+            import traceback
+            ei = sys.exc_info()
+            traceback.print_exception(ei[0], ei[1], ei[2], None, sys.stderr)
+            del ei
+
+class StreamHandler(Handler):
+    """
+    A handler class which writes logging records, appropriately formatted,
+    to a stream. Note that this class does not close the stream, as
+    sys.stdout or sys.stderr may be used.
+    """
+    def __init__(self, strm=None):
+        """
+        Initialize the handler.
+
+        If strm is not specified, sys.stderr is used.
+        """
+        Handler.__init__(self)
+        if not strm:
+            strm = sys.stderr
+        self.stream = strm
+        self.formatter = None
+
+    def flush(self):
+        """
+        Flushes the stream.
+        """
+        self.stream.flush()
+
+    def emit(self, record):
+        """
+        Emit a record.
+
+        If a formatter is specified, it is used to format the record.
+        The record is then written to the stream with a trailing newline
+        [N.B. this may be removed depending on feedback]. If exception
+        information is present, it is formatted using
+        traceback.print_exception and appended to the stream.
+        """
+        try:
+            msg = self.format(record)
+            if not hasattr(types, "UnicodeType"): #if no unicode support...
+                self.stream.write("%s\n" % msg)
+            else:
+                try:
+                    self.stream.write("%s\n" % msg)
+                except UnicodeError:
+                    self.stream.write("%s\n" % msg.encode("UTF-8"))
+            self.flush()
+        except:
+            self.handleError(record)
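The handle()/emit() flow above can be exercised against an in-memory stream
using the stock library, which this module mirrors:

```python
import io
import logging

buf = io.StringIO()
handler = logging.StreamHandler(buf)  # stream handler writing to a buffer
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))
rec = logging.LogRecord("planet", logging.ERROR, __file__, 1,
                        "fetch failed for %s", ("feed.xml",), None)
handler.handle(rec)  # filter, acquire lock, emit, release lock
```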
+
+class FileHandler(StreamHandler):
+    """
+    A handler class which writes formatted logging records to disk files.
+    """
+    def __init__(self, filename, mode="a"):
+        """
+        Open the specified file and use it as the stream for logging.
+        """
+        StreamHandler.__init__(self, open(filename, mode))
+        self.baseFilename = filename
+        self.mode = mode
+
+    def close(self):
+        """
+        Closes the stream.
+        """
+        self.stream.close()
+
+#---------------------------------------------------------------------------
+#   Manager classes and functions
+#---------------------------------------------------------------------------
+
+class PlaceHolder:
+    """
+    PlaceHolder instances are used in the Manager logger hierarchy to take
+    the place of nodes for which no loggers have been defined [FIXME add
+    example].
+    """
+    def __init__(self, alogger):
+        """
+        Initialize with the specified logger being a child of this placeholder.
+        """
+        self.loggers = [alogger]
+
+    def append(self, alogger):
+        """
+        Add the specified logger as a child of this placeholder.
+        """
+        if alogger not in self.loggers:
+            self.loggers.append(alogger)
+
+#
+#   Determine which class to use when instantiating loggers.
+#
+_loggerClass = None
+
+def setLoggerClass(klass):
+    """
+    Set the class to be used when instantiating a logger. The class should
+    define __init__() such that only a name argument is required, and the
+    __init__() should call Logger.__init__().
+    """
+    if klass != Logger:
+        if not issubclass(klass, Logger):
+            raise TypeError, "logger not derived from logging.Logger: " + \
+                            klass.__name__
+    global _loggerClass
+    _loggerClass = klass
+
+class Manager:
+    """
+    There is [under normal circumstances] just one Manager instance, which
+    holds the hierarchy of loggers.
+    """
+    def __init__(self, rootnode):
+        """
+        Initialize the manager with the root node of the logger hierarchy.
+        """
+        self.root = rootnode
+        self.disable = 0
+        self.emittedNoHandlerWarning = 0
+        self.loggerDict = {}
+
+    def getLogger(self, name):
+        """
+        Get a logger with the specified name (channel name), creating it
+        if it doesn't yet exist.
+
+        If a PlaceHolder existed for the specified name [i.e. the logger
+        didn't exist but a child of it did], replace it with the created
+        logger and fix up the parent/child references which pointed to the
+        placeholder to now point to the logger.
+        """
+        rv = None
+        _acquireLock()
+        try:
+            if self.loggerDict.has_key(name):
+                rv = self.loggerDict[name]
+                if isinstance(rv, PlaceHolder):
+                    ph = rv
+                    rv = _loggerClass(name)
+                    rv.manager = self
+                    self.loggerDict[name] = rv
+                    self._fixupChildren(ph, rv)
+                    self._fixupParents(rv)
+            else:
+                rv = _loggerClass(name)
+                rv.manager = self
+                self.loggerDict[name] = rv
+                self._fixupParents(rv)
+        finally:
+            _releaseLock()
+        return rv
+
+    def _fixupParents(self, alogger):
+        """
+        Ensure that there are either loggers or placeholders all the way
+        from the specified logger to the root of the logger hierarchy.
+        """
+        name = alogger.name
+        i = string.rfind(name, ".")
+        rv = None
+        while (i > 0) and not rv:
+            substr = name[:i]
+            if not self.loggerDict.has_key(substr):
+                self.loggerDict[substr] = PlaceHolder(alogger)
+            else:
+                obj = self.loggerDict[substr]
+                if isinstance(obj, Logger):
+                    rv = obj
+                else:
+                    assert isinstance(obj, PlaceHolder)
+                    obj.append(alogger)
+            i = string.rfind(name, ".", 0, i - 1)
+        if not rv:
+            rv = self.root
+        alogger.parent = rv
+
+    def _fixupChildren(self, ph, alogger):
+        """
+        Ensure that children of the placeholder ph are connected to the
+        specified logger.
+        """
+        for c in ph.loggers:
+            if string.find(c.parent.name, alogger.name) <> 0:
+                alogger.parent = c.parent
+                c.parent = alogger
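The placeholder fix-up is observable in the stock library, which shares this
Manager design: requesting a child before its parent leaves a PlaceHolder,
and the later getLogger() call splices the real logger into the chain
(the logger names below are arbitrary examples):

```python
import logging

child = logging.getLogger("pwk.demo.child")  # "pwk.demo" becomes a placeholder
parent = logging.getLogger("pwk.demo")       # placeholder replaced by a logger
assert child.parent is parent                # _fixupChildren repaired the link
```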
+
+#---------------------------------------------------------------------------
+#   Logger classes and functions
+#---------------------------------------------------------------------------
+
+class Logger(Filterer):
+    """
+    Instances of the Logger class represent a single logging channel. A
+    "logging channel" indicates an area of an application. Exactly how an
+    "area" is defined is up to the application developer. Since an
+    application can have any number of areas, logging channels are identified
+    by a unique string. Application areas can be nested (e.g. an area
+    of "input processing" might include sub-areas "read CSV files", "read
+    XLS files" and "read Gnumeric files"). To cater for this natural nesting,
+    channel names are organized into a namespace hierarchy where levels are
+    separated by periods, much like the Java or Python package namespace. So
+    in the instance given above, channel names might be "input" for the upper
+    level, and "input.csv", "input.xls" and "input.gnu" for the sub-levels.
+    There is no arbitrary limit to the depth of nesting.
+    """
+    def __init__(self, name, level=NOTSET):
+        """
+        Initialize the logger with a name and an optional level.
+        """
+        Filterer.__init__(self)
+        self.name = name
+        self.level = level
+        self.parent = None
+        self.propagate = 1
+        self.handlers = []
+        self.disabled = 0
+
+    def setLevel(self, level):
+        """
+        Set the logging level of this logger.
+        """
+        self.level = level
+
+#   def getRoot(self):
+#       """
+#       Get the root of the logger hierarchy.
+#       """
+#       return Logger.root
+
+    def debug(self, msg, *args, **kwargs):
+        """
+        Log 'msg % args' with severity 'DEBUG'.
+
+        To pass exception information, use the keyword argument exc_info with
+        a true value, e.g.
+
+        logger.debug("Houston, we have a %s", "thorny problem", exc_info=1)
+        """
+        if self.manager.disable >= DEBUG:
+            return
+        if DEBUG >= self.getEffectiveLevel():
+            apply(self._log, (DEBUG, msg, args), kwargs)
+
+    def info(self, msg, *args, **kwargs):
+        """
+        Log 'msg % args' with severity 'INFO'.
+
+        To pass exception information, use the keyword argument exc_info with
+        a true value, e.g.
+
+        logger.info("Houston, we have a %s", "interesting problem", exc_info=1)
+        """
+        if self.manager.disable >= INFO:
+            return
+        if INFO >= self.getEffectiveLevel():
+            apply(self._log, (INFO, msg, args), kwargs)
+
+    def warning(self, msg, *args, **kwargs):
+        """
+        Log 'msg % args' with severity 'WARNING'.
+
+        To pass exception information, use the keyword argument exc_info with
+        a true value, e.g.
+
+        logger.warning("Houston, we have a %s", "bit of a problem", exc_info=1)
+        """
+        if self.manager.disable >= WARNING:
+            return
+        if self.isEnabledFor(WARNING):
+            apply(self._log, (WARNING, msg, args), kwargs)
+
+    warn = warning
+
+    def error(self, msg, *args, **kwargs):
+        """
+        Log 'msg % args' with severity 'ERROR'.
+
+        To pass exception information, use the keyword argument exc_info with
+        a true value, e.g.
+
+        logger.error("Houston, we have a %s", "major problem", exc_info=1)
+        """
+        if self.manager.disable >= ERROR:
+            return
+        if self.isEnabledFor(ERROR):
+            apply(self._log, (ERROR, msg, args), kwargs)
+
+    def exception(self, msg, *args):
+        """
+        Convenience method for logging an ERROR with exception information.
+        """
+        apply(self.error, (msg,) + args, {'exc_info': 1})
+
+    def critical(self, msg, *args, **kwargs):
+        """
+        Log 'msg % args' with severity 'CRITICAL'.
+
+        To pass exception information, use the keyword argument exc_info with
+        a true value, e.g.
+
+        logger.critical("Houston, we have a %s", "major disaster", exc_info=1)
+        """
+        if self.manager.disable >= CRITICAL:
+            return
+        if CRITICAL >= self.getEffectiveLevel():
+            apply(self._log, (CRITICAL, msg, args), kwargs)
+
+    fatal = critical
+
+    def log(self, level, msg, *args, **kwargs):
+        """
+        Log 'msg % args' with the severity 'level'.
+
+        To pass exception information, use the keyword argument exc_info with
+        a true value, e.g.
+
+        logger.log(level, "We have a %s", "mysterious problem", exc_info=1)
+        """
+        if self.manager.disable >= level:
+            return
+        if self.isEnabledFor(level):
+            apply(self._log, (level, msg, args), kwargs)
+
+    def findCaller(self):
+        """
+        Find the stack frame of the caller so that we can note the source
+        file name and line number.
+        """
+        f = sys._getframe(1)
+        while 1:
+            co = f.f_code
+            filename = os.path.normcase(co.co_filename)
+            if filename == _srcfile:
+                f = f.f_back
+                continue
+            return filename, f.f_lineno
+
+    def makeRecord(self, name, level, fn, lno, msg, args, exc_info):
+        """
+        A factory method which can be overridden in subclasses to create
+        specialized LogRecords.
+        """
+        return LogRecord(name, level, fn, lno, msg, args, exc_info)
+
+    def _log(self, level, msg, args, exc_info=None):
+        """
+        Low-level logging routine which creates a LogRecord and then calls
+        all the handlers of this logger to handle the record.
+        """
+        if _srcfile:
+            fn, lno = self.findCaller()
+        else:
+            fn, lno = "<unknown file>", 0
+        if exc_info:
+            exc_info = sys.exc_info()
+        record = self.makeRecord(self.name, level, fn, lno, msg, args, exc_info)
+        self.handle(record)
+
+    def handle(self, record):
+        """
+        Call the handlers for the specified record.
+
+        This method is used for unpickled records received from a socket, as
+        well as those created locally. Logger-level filtering is applied.
+        """
+        if (not self.disabled) and self.filter(record):
+            self.callHandlers(record)
+
+    def addHandler(self, hdlr):
+        """
+        Add the specified handler to this logger.
+        """
+        if not (hdlr in self.handlers):
+            self.handlers.append(hdlr)
+
+    def removeHandler(self, hdlr):
+        """
+        Remove the specified handler from this logger.
+        """
+        if hdlr in self.handlers:
+            #hdlr.close()
+            self.handlers.remove(hdlr)
+
+    def callHandlers(self, record):
+        """
+        Pass a record to all relevant handlers.
+
+        Loop through all handlers for this logger and its parents in the
+        logger hierarchy. If no handler was found, output a one-off error
+        message to sys.stderr. Stop searching up the hierarchy whenever a
+        logger with the "propagate" attribute set to zero is found - that
+        will be the last logger whose handlers are called.
+        """
+        c = self
+        found = 0
+        while c:
+            for hdlr in c.handlers:
+                found = found + 1
+                if record.levelno >= hdlr.level:
+                    hdlr.handle(record)
+            if not c.propagate:
+                c = None    #break out
+            else:
+                c = c.parent
+        if (found == 0) and not self.manager.emittedNoHandlerWarning:
+            sys.stderr.write("No handlers could be found for logger"
+                             " \"%s\"\n" % self.name)
+            self.manager.emittedNoHandlerWarning = 1
+
+    def getEffectiveLevel(self):
+        """
+        Get the effective level for this logger.
+
+        Loop through this logger and its parents in the logger hierarchy,
+        looking for a non-zero logging level. Return the first one found.
+        """
+        logger = self
+        while logger:
+            if logger.level:
+                return logger.level
+            logger = logger.parent
+        return NOTSET
+
+    def isEnabledFor(self, level):
+        """
+        Is this logger enabled for level 'level'?
+        """
+        if self.manager.disable >= level:
+            return 0
+        return level >= self.getEffectiveLevel()
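The effective-level walk implemented by `getEffectiveLevel()` and `isEnabledFor()` can be sketched with the modern stdlib `logging`, whose behavior matches this compat code: a child logger left at NOTSET inherits the first non-zero level found walking up the hierarchy.

```python
import logging

app = logging.getLogger("app")
db = logging.getLogger("app.db")   # level defaults to NOTSET (0)

app.setLevel(logging.WARNING)

# "app.db" has no level of its own, so the walk stops at "app".
print(db.getEffectiveLevel() == logging.WARNING)  # True
print(db.isEnabledFor(logging.ERROR))             # True: ERROR >= WARNING
print(db.isEnabledFor(logging.INFO))              # False: INFO < WARNING
```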
+
+class RootLogger(Logger):
+    """
+    A root logger is not that different to any other logger, except that
+    it must have a logging level and there is only one instance of it in
+    the hierarchy.
+    """
+    def __init__(self, level):
+        """
+        Initialize the logger with the name "root".
+        """
+        Logger.__init__(self, "root", level)
+
+_loggerClass = Logger
+
+root = RootLogger(WARNING)
+Logger.root = root
+Logger.manager = Manager(Logger.root)
+
+#---------------------------------------------------------------------------
+# Configuration classes and functions
+#---------------------------------------------------------------------------
+
+BASIC_FORMAT = "%(levelname)s:%(name)s:%(message)s"
+
+def basicConfig():
+    """
+    Do basic configuration for the logging system by creating a
+    StreamHandler with a default Formatter and adding it to the
+    root logger.
+    """
+    if len(root.handlers) == 0:
+        hdlr = StreamHandler()
+        fmt = Formatter(BASIC_FORMAT)
+        hdlr.setFormatter(fmt)
+        root.addHandler(hdlr)
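`basicConfig()` only installs a handler when the root logger has none, which is what lets the module-level convenience functions below configure themselves lazily. A minimal sketch using the modern stdlib equivalent (BASIC_FORMAT is the same string defined above):

```python
import logging

logging.basicConfig(format="%(levelname)s:%(name)s:%(message)s")
logging.warning("disk at %d%%", 93)   # WARNING:root:disk at 93%
logging.debug("not shown")            # below the root's default WARNING level
```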
+
+#---------------------------------------------------------------------------
+# Utility functions at module level.
+# Basically delegate everything to the root logger.
+#---------------------------------------------------------------------------
+
+def getLogger(name=None):
+    """
+    Return a logger with the specified name, creating it if necessary.
+
+    If no name is specified, return the root logger.
+    """
+    if name:
+        return Logger.manager.getLogger(name)
+    else:
+        return root
+
+#def getRootLogger():
+#    """
+#    Return the root logger.
+#
+#    Note that getLogger('') now does the same thing, so this function is
+#    deprecated and may disappear in the future.
+#    """
+#    return root
+
+def critical(msg, *args, **kwargs):
+    """
+    Log a message with severity 'CRITICAL' on the root logger.
+    """
+    if len(root.handlers) == 0:
+        basicConfig()
+    apply(root.critical, (msg,)+args, kwargs)
+
+fatal = critical
+
+def error(msg, *args, **kwargs):
+    """
+    Log a message with severity 'ERROR' on the root logger.
+    """
+    if len(root.handlers) == 0:
+        basicConfig()
+    apply(root.error, (msg,)+args, kwargs)
+
+def exception(msg, *args):
+    """
+    Log a message with severity 'ERROR' on the root logger,
+    with exception information.
+    """
+    apply(error, (msg,)+args, {'exc_info': 1})
+
+def warning(msg, *args, **kwargs):
+    """
+    Log a message with severity 'WARNING' on the root logger.
+    """
+    if len(root.handlers) == 0:
+        basicConfig()
+    apply(root.warning, (msg,)+args, kwargs)
+
+warn = warning
+
+def info(msg, *args, **kwargs):
+    """
+    Log a message with severity 'INFO' on the root logger.
+    """
+    if len(root.handlers) == 0:
+        basicConfig()
+    apply(root.info, (msg,)+args, kwargs)
+
+def debug(msg, *args, **kwargs):
+    """
+    Log a message with severity 'DEBUG' on the root logger.
+    """
+    if len(root.handlers) == 0:
+        basicConfig()
+    apply(root.debug, (msg,)+args, kwargs)
+
+def disable(level):
+    """
+    Disable all logging calls less severe than 'level'.
+    """
+    root.manager.disable = level
+
+def shutdown():
+    """
+    Perform any cleanup actions in the logging system (e.g. flushing
+    buffers).
+
+    Should be called at application exit.
+    """
+    for h in _handlers.keys():
+        h.flush()
+        h.close()
diff --git a/PlanetWebKit/planet/planet/compat_logging/config.py b/PlanetWebKit/planet/planet/compat_logging/config.py
new file mode 100644 (file)
index 0000000..d4d08f0
--- /dev/null
@@ -0,0 +1,299 @@
+# Copyright 2001-2002 by Vinay Sajip. All Rights Reserved.
+#
+# Permission to use, copy, modify, and distribute this software and its
+# documentation for any purpose and without fee is hereby granted,
+# provided that the above copyright notice appear in all copies and that
+# both that copyright notice and this permission notice appear in
+# supporting documentation, and that the name of Vinay Sajip
+# not be used in advertising or publicity pertaining to distribution
+# of the software without specific, written prior permission.
+# VINAY SAJIP DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING
+# ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL
+# VINAY SAJIP BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR
+# ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
+# IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT
+# OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+
+"""
+Logging package for Python. Based on PEP 282 and comments thereto in
+comp.lang.python, and influenced by Apache's log4j system.
+
+Should work under Python versions >= 1.5.2, except that source line
+information is not available unless 'inspect' is.
+
+Copyright (C) 2001-2002 Vinay Sajip. All Rights Reserved.
+
+To use, simply 'import logging' and log away!
+"""
+
+import sys, logging, logging.handlers, string, socket, struct, os, types
+
+try:
+    import thread
+    import threading
+except ImportError:
+    thread = None
+
+from SocketServer import ThreadingTCPServer, StreamRequestHandler
+
+
+DEFAULT_LOGGING_CONFIG_PORT = 9030
+if sys.platform == "win32":
+    RESET_ERROR = 10054   #WSAECONNRESET
+else:
+    RESET_ERROR = 104     #ECONNRESET
+
+#
+#   The following code implements a socket listener for on-the-fly
+#   reconfiguration of logging.
+#
+#   _listener holds the server object doing the listening
+_listener = None
+
+def fileConfig(fname, defaults=None):
+    """
+    Read the logging configuration from a ConfigParser-format file.
+
+    This can be called several times from an application, allowing an end user
+    the ability to select from various pre-canned configurations (if the
+    developer provides a mechanism to present the choices and load the chosen
+    configuration).
+    In versions of ConfigParser which have the readfp method [typically
+    shipped in 2.x versions of Python], you can pass in a file-like object
+    rather than a filename, in which case the file-like object will be read
+    using readfp.
+    """
+    import ConfigParser
+
+    cp = ConfigParser.ConfigParser(defaults)
+    if hasattr(cp, 'readfp') and hasattr(fname, 'readline'):
+        cp.readfp(fname)
+    else:
+        cp.read(fname)
+    #first, do the formatters...
+    flist = cp.get("formatters", "keys")
+    if len(flist):
+        flist = string.split(flist, ",")
+        formatters = {}
+        for form in flist:
+            sectname = "formatter_%s" % form
+            opts = cp.options(sectname)
+            if "format" in opts:
+                fs = cp.get(sectname, "format", 1)
+            else:
+                fs = None
+            if "datefmt" in opts:
+                dfs = cp.get(sectname, "datefmt", 1)
+            else:
+                dfs = None
+            f = logging.Formatter(fs, dfs)
+            formatters[form] = f
+    #next, do the handlers...
+    #critical section...
+    logging._acquireLock()
+    try:
+        try:
+            #first, lose the existing handlers...
+            logging._handlers.clear()
+            #now set up the new ones...
+            hlist = cp.get("handlers", "keys")
+            if len(hlist):
+                hlist = string.split(hlist, ",")
+                handlers = {}
+                fixups = [] #for inter-handler references
+                for hand in hlist:
+                    sectname = "handler_%s" % hand
+                    klass = cp.get(sectname, "class")
+                    opts = cp.options(sectname)
+                    if "formatter" in opts:
+                        fmt = cp.get(sectname, "formatter")
+                    else:
+                        fmt = ""
+                    klass = eval(klass, vars(logging))
+                    args = cp.get(sectname, "args")
+                    args = eval(args, vars(logging))
+                    h = apply(klass, args)
+                    if "level" in opts:
+                        level = cp.get(sectname, "level")
+                        h.setLevel(logging._levelNames[level])
+                    if len(fmt):
+                        h.setFormatter(formatters[fmt])
+                    #temporary hack for FileHandler and MemoryHandler.
+                    if klass == logging.handlers.MemoryHandler:
+                        if "target" in opts:
+                            target = cp.get(sectname,"target")
+                        else:
+                            target = ""
+                        if len(target): #the target handler may not be loaded yet, so keep for later...
+                            fixups.append((h, target))
+                    handlers[hand] = h
+                #now all handlers are loaded, fixup inter-handler references...
+                for fixup in fixups:
+                    h = fixup[0]
+                    t = fixup[1]
+                    h.setTarget(handlers[t])
+            #at last, the loggers...first the root...
+            llist = cp.get("loggers", "keys")
+            llist = string.split(llist, ",")
+            llist.remove("root")
+            sectname = "logger_root"
+            root = logging.root
+            log = root
+            opts = cp.options(sectname)
+            if "level" in opts:
+                level = cp.get(sectname, "level")
+                log.setLevel(logging._levelNames[level])
+            for h in root.handlers[:]:
+                root.removeHandler(h)
+            hlist = cp.get(sectname, "handlers")
+            if len(hlist):
+                hlist = string.split(hlist, ",")
+                for hand in hlist:
+                    log.addHandler(handlers[hand])
+            #and now the others...
+            #we don't want to lose the existing loggers,
+            #since other threads may have pointers to them.
+            #existing is set to contain all existing loggers,
+            #and as we go through the new configuration we
+            #remove any which are configured. At the end,
+            #what's left in existing is the set of loggers
+            #which were in the previous configuration but
+            #which are not in the new configuration.
+            existing = root.manager.loggerDict.keys()
+            #now set up the new ones...
+            for log in llist:
+                sectname = "logger_%s" % log
+                qn = cp.get(sectname, "qualname")
+                opts = cp.options(sectname)
+                if "propagate" in opts:
+                    propagate = cp.getint(sectname, "propagate")
+                else:
+                    propagate = 1
+                logger = logging.getLogger(qn)
+                if qn in existing:
+                    existing.remove(qn)
+                if "level" in opts:
+                    level = cp.get(sectname, "level")
+                    logger.setLevel(logging._levelNames[level])
+                for h in logger.handlers[:]:
+                    logger.removeHandler(h)
+                logger.propagate = propagate
+                logger.disabled = 0
+                hlist = cp.get(sectname, "handlers")
+                if len(hlist):
+                    hlist = string.split(hlist, ",")
+                    for hand in hlist:
+                        logger.addHandler(handlers[hand])
+            #Disable any old loggers. There's no point deleting
+            #them as other threads may continue to hold references
+            #and by disabling them, you stop them doing any logging.
+            for log in existing:
+                root.manager.loggerDict[log].disabled = 1
+        except:
+            import traceback
+            ei = sys.exc_info()
+            traceback.print_exception(ei[0], ei[1], ei[2], None, sys.stderr)
+            del ei
+    finally:
+        logging._releaseLock()
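`fileConfig()` above expects three index sections — `[formatters]`, `[handlers]`, `[loggers]` — each with a `keys` option, plus one section per named entity (`formatter_X`, `handler_X`, `logger_X`). A minimal configuration, shown parsed with `RawConfigParser` (no `%`-interpolation, as the logging machinery requires) purely to illustrate the layout; the entity names here are illustrative:

```python
from configparser import RawConfigParser

CONFIG = """
[loggers]
keys=root

[handlers]
keys=console

[formatters]
keys=plain

[logger_root]
level=DEBUG
handlers=console

[handler_console]
class=StreamHandler
level=DEBUG
formatter=plain
args=(sys.stderr,)

[formatter_plain]
format=%(levelname)s:%(name)s:%(message)s
"""

cp = RawConfigParser()
cp.read_string(CONFIG)
# fileConfig() reads "keys" from each index section, then looks up the
# matching per-entity section, e.g. "handler_console".
print(cp.get("handlers", "keys"))          # console
print(cp.get("handler_console", "class"))  # StreamHandler
```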
+
+def listen(port=DEFAULT_LOGGING_CONFIG_PORT):
+    """
+    Start up a socket server on the specified port, and listen for new
+    configurations.
+
+    These will be sent as a file suitable for processing by fileConfig().
+    Returns a Thread object on which you can call start() to start the server,
+    and which you can join() when appropriate. To stop the server, call
+    stopListening().
+    """
+    if not thread:
+        raise NotImplementedError, "listen() needs threading to work"
+
+    class ConfigStreamHandler(StreamRequestHandler):
+        """
+        Handler for a logging configuration request.
+
+        It expects a completely new logging configuration and uses fileConfig
+        to install it.
+        """
+        def handle(self):
+            """
+            Handle a request.
+
+            Each request is expected to be a 4-byte length,
+            followed by the config file. Uses fileConfig() to do the
+            grunt work.
+            """
+            import tempfile
+            try:
+                conn = self.connection
+                chunk = conn.recv(4)
+                if len(chunk) == 4:
+                    slen = struct.unpack(">L", chunk)[0]
+                    chunk = self.connection.recv(slen)
+                    while len(chunk) < slen:
+                        chunk = chunk + conn.recv(slen - len(chunk))
+                    #Apply new configuration. We'd like to be able to
+                    #create a StringIO and pass that in, but unfortunately
+                    #1.5.2 ConfigParser does not support reading file
+                    #objects, only actual files. So we create a temporary
+                    #file and remove it later.
+                    file = tempfile.mktemp(".ini")
+                    f = open(file, "w")
+                    f.write(chunk)
+                    f.close()
+                    fileConfig(file)
+                    os.remove(file)
+            except socket.error, e:
+                if type(e.args) != types.TupleType:
+                    raise
+                else:
+                    errcode = e.args[0]
+                    if errcode != RESET_ERROR:
+                        raise
+
+    class ConfigSocketReceiver(ThreadingTCPServer):
+        """
+        A simple TCP socket-based logging config receiver.
+        """
+
+        allow_reuse_address = 1
+
+        def __init__(self, host='localhost', port=DEFAULT_LOGGING_CONFIG_PORT,
+                     handler=None):
+            ThreadingTCPServer.__init__(self, (host, port), handler)
+            logging._acquireLock()
+            self.abort = 0
+            logging._releaseLock()
+            self.timeout = 1
+
+        def serve_until_stopped(self):
+            import select
+            abort = 0
+            while not abort:
+                rd, wr, ex = select.select([self.socket.fileno()],
+                                           [], [],
+                                           self.timeout)
+                if rd:
+                    self.handle_request()
+                logging._acquireLock()
+                abort = self.abort
+                logging._releaseLock()
+
+    def serve(rcvr, hdlr, port):
+        server = rcvr(port=port, handler=hdlr)
+        global _listener
+        logging._acquireLock()
+        _listener = server
+        logging._releaseLock()
+        server.serve_until_stopped()
+
+    return threading.Thread(target=serve,
+                            args=(ConfigSocketReceiver,
+                                  ConfigStreamHandler, port))
+
+def stopListening():
+    """
+    Stop the listening server which was created with a call to listen().
+    """
+    global _listener
+    if _listener:
+        logging._acquireLock()
+        _listener.abort = 1
+        _listener = None
+        logging._releaseLock()
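A client for `listen()` frames the configuration text as a 4-byte big-endian length followed by the body, matching the `struct.unpack(">L", ...)` in `ConfigStreamHandler.handle()` above. A hedged sketch — the framing helper runs standalone, while `send_config()` is shown but needs a running listener:

```python
import socket
import struct

def frame_config(config_text):
    """Prefix the config bytes with a 4-byte big-endian length."""
    data = config_text.encode("utf-8")
    return struct.pack(">L", len(data)) + data

def send_config(config_text, host="localhost", port=9030):
    """Send a framed config to a listen()-style server
    (9030 is DEFAULT_LOGGING_CONFIG_PORT above)."""
    with socket.create_connection((host, port)) as s:
        s.sendall(frame_config(config_text))

payload = frame_config("[loggers]\nkeys=root\n")
print(len(payload))  # 4-byte header + length of the body
```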
diff --git a/PlanetWebKit/planet/planet/compat_logging/handlers.py b/PlanetWebKit/planet/planet/compat_logging/handlers.py
new file mode 100644 (file)
index 0000000..26ca8ad
--- /dev/null
@@ -0,0 +1,728 @@
+# Copyright 2001-2002 by Vinay Sajip. All Rights Reserved.
+#
+# Permission to use, copy, modify, and distribute this software and its
+# documentation for any purpose and without fee is hereby granted,
+# provided that the above copyright notice appear in all copies and that
+# both that copyright notice and this permission notice appear in
+# supporting documentation, and that the name of Vinay Sajip
+# not be used in advertising or publicity pertaining to distribution
+# of the software without specific, written prior permission.
+# VINAY SAJIP DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING
+# ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL
+# VINAY SAJIP BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR
+# ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
+# IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT
+# OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+
+"""
+Logging package for Python. Based on PEP 282 and comments thereto in
+comp.lang.python, and influenced by Apache's log4j system.
+
+Should work under Python versions >= 1.5.2, except that source line
+information is not available unless 'inspect' is.
+
+Copyright (C) 2001-2002 Vinay Sajip. All Rights Reserved.
+
+To use, simply 'import logging' and log away!
+"""
+
+import sys, logging, socket, types, os, string, cPickle, struct, time
+
+from SocketServer import ThreadingTCPServer, StreamRequestHandler
+
+#
+# Some constants...
+#
+
+DEFAULT_TCP_LOGGING_PORT    = 9020
+DEFAULT_UDP_LOGGING_PORT    = 9021
+DEFAULT_HTTP_LOGGING_PORT   = 9022
+DEFAULT_SOAP_LOGGING_PORT   = 9023
+SYSLOG_UDP_PORT             = 514
+
+
+class RotatingFileHandler(logging.FileHandler):
+    def __init__(self, filename, mode="a", maxBytes=0, backupCount=0):
+        """
+        Open the specified file and use it as the stream for logging.
+
+        By default, the file grows indefinitely. You can specify particular
+        values of maxBytes and backupCount to allow the file to rollover at
+        a predetermined size.
+
+        Rollover occurs whenever the current log file is nearly maxBytes in
+        length. If backupCount is >= 1, the system will successively create
+        new files with the same pathname as the base file, but with extensions
+        ".1", ".2" etc. appended to it. For example, with a backupCount of 5
+        and a base file name of "app.log", you would get "app.log",
+        "app.log.1", "app.log.2", ... through to "app.log.5". The file being
+        written to is always "app.log" - when it gets filled up, it is closed
+        and renamed to "app.log.1", and if files "app.log.1", "app.log.2" etc.
+        exist, then they are renamed to "app.log.2", "app.log.3" etc.
+        respectively.
+
+        If maxBytes is zero, rollover never occurs.
+        """
+        logging.FileHandler.__init__(self, filename, mode)
+        self.maxBytes = maxBytes
+        self.backupCount = backupCount
+        if maxBytes > 0:
+            self.mode = "a"
+
+    def doRollover(self):
+        """
+        Do a rollover, as described in __init__().
+        """
+
+        self.stream.close()
+        if self.backupCount > 0:
+            for i in range(self.backupCount - 1, 0, -1):
+                sfn = "%s.%d" % (self.baseFilename, i)
+                dfn = "%s.%d" % (self.baseFilename, i + 1)
+                if os.path.exists(sfn):
+                    #print "%s -> %s" % (sfn, dfn)
+                    if os.path.exists(dfn):
+                        os.remove(dfn)
+                    os.rename(sfn, dfn)
+            dfn = self.baseFilename + ".1"
+            if os.path.exists(dfn):
+                os.remove(dfn)
+            os.rename(self.baseFilename, dfn)
+            #print "%s -> %s" % (self.baseFilename, dfn)
+        self.stream = open(self.baseFilename, "w")
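The rename cascade in `doRollover()` can be sketched as a pure function: for a backupCount of N, it renames `app.log.N-1` to `app.log.N` down to `app.log` to `app.log.1`, so the oldest backup falls off the end. A hedged helper, not part of the handler itself, assuming every backup file already exists:

```python
def rollover_renames(base, backup_count):
    """Return the (src, dst) renames doRollover() performs, in order."""
    renames = []
    # Shift existing backups up by one, highest index first.
    for i in range(backup_count - 1, 0, -1):
        renames.append(("%s.%d" % (base, i), "%s.%d" % (base, i + 1)))
    # Finally the live file becomes backup .1.
    renames.append((base, base + ".1"))
    return renames

for src, dst in rollover_renames("app.log", 3):
    print(src, "->", dst)
# app.log.2 -> app.log.3
# app.log.1 -> app.log.2
# app.log -> app.log.1
```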
+
+    def emit(self, record):
+        """
+        Emit a record.
+
+        Output the record to the file, catering for rollover as described
+        in doRollover().
+        """
+        if self.maxBytes > 0:                   # are we rolling over?
+            msg = "%s\n" % self.format(record)
+            self.stream.seek(0, 2)  #due to non-posix-compliant Windows feature
+            if self.stream.tell() + len(msg) >= self.maxBytes:
+                self.doRollover()
+        logging.FileHandler.emit(self, record)
+
+
+class SocketHandler(logging.Handler):
+    """
+    A handler class which writes logging records, in pickle format, to
+    a streaming socket. The socket is kept open across logging calls.
+    If the peer resets it, an attempt is made to reconnect on the next call.
+    The pickle which is sent is that of the LogRecord's attribute dictionary
+    (__dict__), so that the receiver does not need to have the logging module
+    installed in order to process the logging event.
+
+    To unpickle the record at the receiving end into a LogRecord, use the
+    makeLogRecord function.
+    """
+
+    def __init__(self, host, port):
+        """
+        Initializes the handler with a specific host address and port.
+
+        The attribute 'closeOnError' is set to 0 by default - when it is
+        set to 1, a socket error causes the socket to be silently closed
+        and then reopened on the next logging call.
+        """
+        logging.Handler.__init__(self)
+        self.host = host
+        self.port = port
+        self.sock = None
+        self.closeOnError = 0
+
+    def makeSocket(self):
+        """
+        A factory method which allows subclasses to define the precise
+        type of socket they want.
+        """
+        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+        s.connect((self.host, self.port))
+        return s
+
+    def send(self, s):
+        """
+        Send a pickled string to the socket.
+
+        This function allows for partial sends which can happen when the
+        network is busy.
+        """
+        if hasattr(self.sock, "sendall"):
+            self.sock.sendall(s)
+        else:
+            sentsofar = 0
+            left = len(s)
+            while left > 0:
+                sent = self.sock.send(s[sentsofar:])
+                sentsofar = sentsofar + sent
+                left = left - sent
+
+    def makePickle(self, record):
+        """
+        Pickles the record in binary format with a length prefix, and
+        returns it ready for transmission across the socket.
+        """
+        s = cPickle.dumps(record.__dict__, 1)
+        #n = len(s)
+        #slen = "%c%c" % ((n >> 8) & 0xFF, n & 0xFF)
+        slen = struct.pack(">L", len(s))
+        return slen + s
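The frame produced by `makePickle()` can be decoded on the receiving side by unpacking the 4-byte ">L" length prefix, reading that many bytes, and unpickling the record's attribute dict; the modern stdlib offers `logging.makeLogRecord()` to rebuild a record from such a dict. A sketch using `pickle` (Python 3's replacement for `cPickle`):

```python
import logging
import pickle
import struct

# Sender side: mimic makePickle() for a record's __dict__.
record = logging.LogRecord("demo", logging.INFO, "app.py", 1,
                           "hello %s", ("world",), None)
body = pickle.dumps(record.__dict__, 1)
frame = struct.pack(">L", len(body)) + body

# Receiver side: unpack the length prefix, then rebuild the record.
(length,) = struct.unpack(">L", frame[:4])
attrs = pickle.loads(frame[4:4 + length])
rebuilt = logging.makeLogRecord(attrs)
print(rebuilt.getMessage())  # hello world
```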
+
+    def handleError(self, record):
+        """
+        Handle an error during logging.
+
+        An error has occurred during logging. Most likely cause -
+        connection lost. Close the socket so that we can retry on the
+        next event.
+        """
+        if self.closeOnError and self.sock:
+            self.sock.close()
+            self.sock = None        #try to reconnect next time
+        else:
+            logging.Handler.handleError(self, record)
+
+    def emit(self, record):
+        """
+        Emit a record.
+
+        Pickles the record and writes it to the socket in binary format.
+        If there is an error with the socket, silently drop the packet.
+        If there was a problem with the socket, re-establishes the
+        socket.
+        """
+        try:
+            s = self.makePickle(record)
+            if not self.sock:
+                self.sock = self.makeSocket()
+            self.send(s)
+        except:
+            self.handleError(record)
+
+    def close(self):
+        """
+        Closes the socket.
+        """
+        if self.sock:
+            self.sock.close()
+            self.sock = None
+
+class DatagramHandler(SocketHandler):
+    """
+    A handler class which writes logging records, in pickle format, to
+    a datagram socket.  The pickle which is sent is that of the LogRecord's
+    attribute dictionary (__dict__), so that the receiver does not need to
+    have the logging module installed in order to process the logging event.
+
+    To unpickle the record at the receiving end into a LogRecord, use the
+    makeLogRecord function.
+
+    """
+    def __init__(self, host, port):
+        """
+        Initializes the handler with a specific host address and port.
+        """
+        SocketHandler.__init__(self, host, port)
+        self.closeOnError = 0
+
+    def makeSocket(self):
+        """
+        The factory method of SocketHandler is here overridden to create
+        a UDP socket (SOCK_DGRAM).
+        """
+        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
+        return s
+
+    def send(self, s):
+        """
+        Send a pickled string to a socket.
+
+        Unlike the stream case there are no partial sends to worry about
+        here: the datagram goes out in one piece. Note that UDP does not
+        guarantee delivery and can deliver packets out of sequence.
+        """
+        self.sock.sendto(s, (self.host, self.port))
+
+class SysLogHandler(logging.Handler):
+    """
+    A handler class which sends formatted logging records to a syslog
+    server. Based on Sam Rushing's syslog module:
+    http://www.nightmare.com/squirl/python-ext/misc/syslog.py
+    Contributed by Nicolas Untz; minor refactoring changes have been
+    made since.
+    """
+
+    # from <linux/sys/syslog.h>:
+    # ======================================================================
+    # priorities/facilities are encoded into a single 32-bit quantity, where
+    # the bottom 3 bits are the priority (0-7) and the top 28 bits are the
+    # facility (0-big number). Both the priorities and the facilities map
+    # roughly one-to-one to strings in the syslogd(8) source code.  This
+    # mapping is included in this file.
+    #
+    # priorities (these are ordered)
+
+    LOG_EMERG     = 0       #  system is unusable
+    LOG_ALERT     = 1       #  action must be taken immediately
+    LOG_CRIT      = 2       #  critical conditions
+    LOG_ERR       = 3       #  error conditions
+    LOG_WARNING   = 4       #  warning conditions
+    LOG_NOTICE    = 5       #  normal but significant condition
+    LOG_INFO      = 6       #  informational
+    LOG_DEBUG     = 7       #  debug-level messages
+
+    #  facility codes
+    LOG_KERN      = 0       #  kernel messages
+    LOG_USER      = 1       #  random user-level messages
+    LOG_MAIL      = 2       #  mail system
+    LOG_DAEMON    = 3       #  system daemons
+    LOG_AUTH      = 4       #  security/authorization messages
+    LOG_SYSLOG    = 5       #  messages generated internally by syslogd
+    LOG_LPR       = 6       #  line printer subsystem
+    LOG_NEWS      = 7       #  network news subsystem
+    LOG_UUCP      = 8       #  UUCP subsystem
+    LOG_CRON      = 9       #  clock daemon
+    LOG_AUTHPRIV  = 10      #  security/authorization messages (private)
+
+    #  other codes through 15 reserved for system use
+    LOG_LOCAL0    = 16      #  reserved for local use
+    LOG_LOCAL1    = 17      #  reserved for local use
+    LOG_LOCAL2    = 18      #  reserved for local use
+    LOG_LOCAL3    = 19      #  reserved for local use
+    LOG_LOCAL4    = 20      #  reserved for local use
+    LOG_LOCAL5    = 21      #  reserved for local use
+    LOG_LOCAL6    = 22      #  reserved for local use
+    LOG_LOCAL7    = 23      #  reserved for local use
+
+    priority_names = {
+        "alert":    LOG_ALERT,
+        "crit":     LOG_CRIT,
+        "critical": LOG_CRIT,
+        "debug":    LOG_DEBUG,
+        "emerg":    LOG_EMERG,
+        "err":      LOG_ERR,
+        "error":    LOG_ERR,        #  DEPRECATED
+        "info":     LOG_INFO,
+        "notice":   LOG_NOTICE,
+        "panic":    LOG_EMERG,      #  DEPRECATED
+        "warn":     LOG_WARNING,    #  DEPRECATED
+        "warning":  LOG_WARNING,
+        }
+
+    facility_names = {
+        "auth":     LOG_AUTH,
+        "authpriv": LOG_AUTHPRIV,
+        "cron":     LOG_CRON,
+        "daemon":   LOG_DAEMON,
+        "kern":     LOG_KERN,
+        "lpr":      LOG_LPR,
+        "mail":     LOG_MAIL,
+        "news":     LOG_NEWS,
+        "security": LOG_AUTH,       #  DEPRECATED
+        "syslog":   LOG_SYSLOG,
+        "user":     LOG_USER,
+        "uucp":     LOG_UUCP,
+        "local0":   LOG_LOCAL0,
+        "local1":   LOG_LOCAL1,
+        "local2":   LOG_LOCAL2,
+        "local3":   LOG_LOCAL3,
+        "local4":   LOG_LOCAL4,
+        "local5":   LOG_LOCAL5,
+        "local6":   LOG_LOCAL6,
+        "local7":   LOG_LOCAL7,
+        }
+
+    def __init__(self, address=('localhost', SYSLOG_UDP_PORT), facility=LOG_USER):
+        """
+        Initialize a handler.
+
+        If address is specified as a string, a UNIX socket is used.
+        If facility is not specified, LOG_USER is used.
+        """
+        logging.Handler.__init__(self)
+
+        self.address = address
+        self.facility = facility
+        if type(address) == types.StringType:
+            self.socket = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
+            # syslog may require either DGRAM or STREAM sockets
+            try:
+                self.socket.connect(address)
+            except socket.error:
+                self.socket.close()
+                self.socket = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
+            self.socket.connect(address)
+            self.unixsocket = 1
+        else:
+            self.socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
+            self.unixsocket = 0
+
+        self.formatter = None
+
+    # curious: when talking to the unix-domain '/dev/log' socket, a
+    #   zero-terminator seems to be required.  this string is placed
+    #   into a class variable so that it can be overridden if
+    #   necessary.
+    log_format_string = '<%d>%s\000'
+
+    def encodePriority (self, facility, priority):
+        """
+        Encode the facility and priority. You can pass in strings or
+        integers - if strings are passed, the facility_names and
+        priority_names mapping dictionaries are used to convert them to
+        integers.
+        """
+        if type(facility) == types.StringType:
+            facility = self.facility_names[facility]
+        if type(priority) == types.StringType:
+            priority = self.priority_names[priority]
+        return (facility << 3) | priority
+
+    def close (self):
+        """
+        Closes the socket.
+        """
+        if self.unixsocket:
+            self.socket.close()
+
+    def emit(self, record):
+        """
+        Emit a record.
+
+        The record is formatted, and then sent to the syslog server. If
+        exception information is present, it is NOT sent to the server.
+        """
+        msg = self.format(record)
+        # We need to convert record level to lowercase; maybe this will
+        # change in the future.
+        msg = self.log_format_string % (
+            self.encodePriority(self.facility,
+                                string.lower(record.levelname)),
+            msg)
+        try:
+            if self.unixsocket:
+                self.socket.send(msg)
+            else:
+                self.socket.sendto(msg, self.address)
+        except:
+            self.handleError(record)
+
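The encoding described in the syslog.h comment above — severity in the bottom 3 bits, facility in the remaining bits — can be checked in isolation. This sketch restates encodePriority outside the class (the constants are copied from the tables above; the standalone function name is illustrative):

```python
def encode_priority(facility, severity):
    # bottom 3 bits carry the severity (0-7); the remaining bits the facility
    return (facility << 3) | severity

LOG_USER, LOG_LOCAL0 = 1, 16      # facilities
LOG_ERR, LOG_INFO = 3, 6          # severities

# "<14>" is the PRI prefix seen on a typical user.info syslog line
assert encode_priority(LOG_USER, LOG_INFO) == 14
assert encode_priority(LOG_LOCAL0, LOG_ERR) == 131
```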
+class SMTPHandler(logging.Handler):
+    """
+    A handler class which sends an SMTP email for each logging event.
+    """
+    def __init__(self, mailhost, fromaddr, toaddrs, subject):
+        """
+        Initialize the handler.
+
+        Initialize the instance with the from and to addresses and subject
+        line of the email. To specify a non-standard SMTP port, use the
+        (host, port) tuple format for the mailhost argument.
+        """
+        logging.Handler.__init__(self)
+        if type(mailhost) == types.TupleType:
+            host, port = mailhost
+            self.mailhost = host
+            self.mailport = port
+        else:
+            self.mailhost = mailhost
+            self.mailport = None
+        self.fromaddr = fromaddr
+        if type(toaddrs) == types.StringType:
+            toaddrs = [toaddrs]
+        self.toaddrs = toaddrs
+        self.subject = subject
+
+    def getSubject(self, record):
+        """
+        Determine the subject for the email.
+
+        If you want to specify a subject line which is record-dependent,
+        override this method.
+        """
+        return self.subject
+
+    weekdayname = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
+
+    monthname = [None,
+                 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
+                 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
+
+    def date_time(self):
+        """Return the current date and time formatted for a MIME header."""
+        year, month, day, hh, mm, ss, wd, y, z = time.gmtime(time.time())
+        s = "%s, %02d %3s %4d %02d:%02d:%02d GMT" % (
+                self.weekdayname[wd],
+                day, self.monthname[month], year,
+                hh, mm, ss)
+        return s
+
+    def emit(self, record):
+        """
+        Emit a record.
+
+        Format the record and send it to the specified addressees.
+        """
+        try:
+            import smtplib
+            port = self.mailport
+            if not port:
+                port = smtplib.SMTP_PORT
+            smtp = smtplib.SMTP(self.mailhost, port)
+            msg = self.format(record)
+            msg = "From: %s\r\nTo: %s\r\nSubject: %s\r\nDate: %s\r\n\r\n%s" % (
+                            self.fromaddr,
+                            string.join(self.toaddrs, ","),
+                            self.getSubject(record),
+                            self.date_time(), msg)
+            smtp.sendmail(self.fromaddr, self.toaddrs, msg)
+            smtp.quit()
+        except:
+            self.handleError(record)
+
+class NTEventLogHandler(logging.Handler):
+    """
+    A handler class which sends events to the NT Event Log. Adds a
+    registry entry for the specified application name. If no dllname is
+    provided, win32service.pyd (which contains some basic message
+    placeholders) is used. Note that use of these placeholders will make
+    your event logs big, as the entire message source is held in the log.
+    If you want slimmer logs, you have to pass in the name of your own DLL
+    which contains the message definitions you want to use in the event log.
+    """
+    def __init__(self, appname, dllname=None, logtype="Application"):
+        logging.Handler.__init__(self)
+        try:
+            import win32evtlogutil, win32evtlog
+            self.appname = appname
+            self._welu = win32evtlogutil
+            if not dllname:
+                dllname = os.path.split(self._welu.__file__)
+                dllname = os.path.split(dllname[0])
+                dllname = os.path.join(dllname[0], r'win32service.pyd')
+            self.dllname = dllname
+            self.logtype = logtype
+            self._welu.AddSourceToRegistry(appname, dllname, logtype)
+            self.deftype = win32evtlog.EVENTLOG_ERROR_TYPE
+            self.typemap = {
+                logging.DEBUG   : win32evtlog.EVENTLOG_INFORMATION_TYPE,
+                logging.INFO    : win32evtlog.EVENTLOG_INFORMATION_TYPE,
+                logging.WARNING : win32evtlog.EVENTLOG_WARNING_TYPE,
+                logging.ERROR   : win32evtlog.EVENTLOG_ERROR_TYPE,
+                logging.CRITICAL: win32evtlog.EVENTLOG_ERROR_TYPE,
+            }
+        except ImportError:
+            print "The Python Win32 extensions for NT (service, event "\
+                        "logging) appear not to be available."
+            self._welu = None
+
+    def getMessageID(self, record):
+        """
+        Return the message ID for the event record. If you are using your
+        own messages, you could do this by passing an ID to the logger as
+        the msg, rather than a formatting string. Then, in here, you
+        could use a dictionary lookup to get the message ID. This version
+        returns 1, which is the base message ID in win32service.pyd.
+        """
+        return 1
+
+    def getEventCategory(self, record):
+        """
+        Return the event category for the record.
+
+        Override this if you want to specify your own categories. This version
+        returns 0.
+        """
+        return 0
+
+    def getEventType(self, record):
+        """
+        Return the event type for the record.
+
+        Override this if you want to specify your own types. This version does
+        a mapping using the handler's typemap attribute, which is set up in
+        __init__() to a dictionary which contains mappings for DEBUG, INFO,
+        WARNING, ERROR and CRITICAL. If you are using your own levels you will
+        either need to override this method or place a suitable dictionary in
+        the handler's typemap attribute.
+        """
+        return self.typemap.get(record.levelno, self.deftype)
+
+    def emit(self, record):
+        """
+        Emit a record.
+
+        Determine the message ID, event category and event type. Then
+        log the message in the NT event log.
+        """
+        if self._welu:
+            try:
+                id = self.getMessageID(record)
+                cat = self.getEventCategory(record)
+                type = self.getEventType(record)
+                msg = self.format(record)
+                self._welu.ReportEvent(self.appname, id, cat, type, [msg])
+            except:
+                self.handleError(record)
+
+    def close(self):
+        """
+        Clean up this handler.
+
+        You can remove the application name from the registry as a
+        source of event log entries. However, if you do this, you will
+        not be able to see the events as you intended in the Event Log
+        Viewer - it needs to be able to access the registry to get the
+        DLL name.
+        """
+        #self._welu.RemoveSourceFromRegistry(self.appname, self.logtype)
+        pass
+
+class HTTPHandler(logging.Handler):
+    """
+    A class which sends records to a Web server, using either GET or
+    POST semantics.
+    """
+    def __init__(self, host, url, method="GET"):
+        """
+        Initialize the instance with the host, the request URL, and the
+        method ("GET" or "POST").
+        """
+        logging.Handler.__init__(self)
+        method = string.upper(method)
+        if method not in ["GET", "POST"]:
+            raise ValueError, "method must be GET or POST"
+        self.host = host
+        self.url = url
+        self.method = method
+
+    def mapLogRecord(self, record):
+        """
+        Default implementation of mapping the log record into a dict
+        that is sent as the CGI data. Override this in a subclass.
+        Contributed by Franz Glasner.
+        """
+        return record.__dict__
+
+    def emit(self, record):
+        """
+        Emit a record.
+
+        Send the record to the Web server as a URL-encoded dictionary.
+        """
+        try:
+            import httplib, urllib
+            h = httplib.HTTP(self.host)
+            url = self.url
+            data = urllib.urlencode(self.mapLogRecord(record))
+            if self.method == "GET":
+                if (string.find(url, '?') >= 0):
+                    sep = '&'
+                else:
+                    sep = '?'
+                url = url + "%c%s" % (sep, data)
+            h.putrequest(self.method, url)
+            if self.method == "POST":
+                h.putheader("Content-length", str(len(data)))
+            h.endheaders()
+            if self.method == "POST":
+                h.send(data)
+            h.getreply()    #can't do anything with the result
+        except:
+            self.handleError(record)
+
+class BufferingHandler(logging.Handler):
+    """
+    A handler class which buffers logging records in memory. Whenever a
+    record is added to the buffer, a check is made to see if the buffer
+    should be flushed. If it should, then flush() is expected to do
+    what's needed.
+    """
+    def __init__(self, capacity):
+        """
+        Initialize the handler with the buffer size.
+        """
+        logging.Handler.__init__(self)
+        self.capacity = capacity
+        self.buffer = []
+
+    def shouldFlush(self, record):
+        """
+        Should the handler flush its buffer?
+
+        Returns true if the buffer is up to capacity. This method can be
+        overridden to implement custom flushing strategies.
+        """
+        return (len(self.buffer) >= self.capacity)
+
+    def emit(self, record):
+        """
+        Emit a record.
+
+        Append the record. If shouldFlush() tells us to, call flush() to process
+        the buffer.
+        """
+        self.buffer.append(record)
+        if self.shouldFlush(record):
+            self.flush()
+
+    def flush(self):
+        """
+        Override to implement custom flushing behaviour.
+
+        This version just zaps the buffer to empty.
+        """
+        self.buffer = []
+
+class MemoryHandler(BufferingHandler):
+    """
+    A handler class which buffers logging records in memory, periodically
+    flushing them to a target handler. Flushing occurs whenever the buffer
+    is full, or when an event of a certain severity or greater is seen.
+    """
+    def __init__(self, capacity, flushLevel=logging.ERROR, target=None):
+        """
+        Initialize the handler with the buffer size, the level at which
+        flushing should occur and an optional target.
+
+        Note that without a target being set either here or via setTarget(),
+        a MemoryHandler is no use to anyone!
+        """
+        BufferingHandler.__init__(self, capacity)
+        self.flushLevel = flushLevel
+        self.target = target
+
+    def shouldFlush(self, record):
+        """
+        Check for buffer full or a record at the flushLevel or higher.
+        """
+        return (len(self.buffer) >= self.capacity) or \
+                (record.levelno >= self.flushLevel)
+
+    def setTarget(self, target):
+        """
+        Set the target handler for this handler.
+        """
+        self.target = target
+
+    def flush(self):
+        """
+        For a MemoryHandler, flushing means just sending the buffered
+        records to the target, if there is one. Override if you want
+        different behaviour.
+        """
+        if self.target:
+            for record in self.buffer:
+                self.target.handle(record)
+            self.buffer = []
+
+    def close(self):
+        """
+        Flush, set the target to None and lose the buffer.
+        """
+        self.flush()
+        self.target = None
+        self.buffer = []
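Usage of the buffering pair above can be sketched against the modern stdlib, where `logging.handlers.MemoryHandler` keeps this same interface. The list-capturing target handler here is purely illustrative:

```python
import logging
import logging.handlers

captured = []

class ListHandler(logging.Handler):
    # illustrative target that records formatted messages in a list
    def emit(self, record):
        captured.append(record.getMessage())

# buffer up to 100 records; flush to the target when an ERROR (or worse) arrives
memory = logging.handlers.MemoryHandler(100, flushLevel=logging.ERROR,
                                        target=ListHandler())
log = logging.getLogger("memory-demo")
log.setLevel(logging.DEBUG)
log.addHandler(memory)

log.warning("buffered, not yet delivered")
log.error("this flushes the whole buffer to the target")
memory.close()
```

After the error is logged, both messages reach the target in order, since flush() replays the whole buffer rather than just the triggering record.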
diff --git a/PlanetWebKit/planet/planet/feedparser.py b/PlanetWebKit/planet/planet/feedparser.py
new file mode 100644 (file)
index 0000000..615ee7e
--- /dev/null
@@ -0,0 +1,2931 @@
+#!/usr/bin/env python
+"""Universal feed parser
+
+Handles RSS 0.9x, RSS 1.0, RSS 2.0, CDF, Atom 0.3, and Atom 1.0 feeds
+
+Visit http://feedparser.org/ for the latest version
+Visit http://feedparser.org/docs/ for the latest documentation
+
+Required: Python 2.1 or later
+Recommended: Python 2.3 or later
+Recommended: CJKCodecs and iconv_codec <http://cjkpython.i18n.org/>
+"""
+
+__version__ = "4.1"# + "$Revision: 1.92 $"[11:15] + "-cvs"
+__license__ = """Copyright (c) 2002-2006, Mark Pilgrim, All rights reserved.
+
+Redistribution and use in source and binary forms, with or without modification,
+are permitted provided that the following conditions are met:
+
+* Redistributions of source code must retain the above copyright notice,
+  this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright notice,
+  this list of conditions and the following disclaimer in the documentation
+  and/or other materials provided with the distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS'
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGE."""
+__author__ = "Mark Pilgrim <http://diveintomark.org/>"
+__contributors__ = ["Jason Diamond <http://injektilo.org/>",
+                    "John Beimler <http://john.beimler.org/>",
+                    "Fazal Majid <http://www.majid.info/mylos/weblog/>",
+                    "Aaron Swartz <http://aaronsw.com/>",
+                    "Kevin Marks <http://epeus.blogspot.com/>"]
+_debug = 0
+
+# HTTP "User-Agent" header to send to servers when downloading feeds.
+# If you are embedding feedparser in a larger application, you should
+# change this to your application name and URL.
+USER_AGENT = "UniversalFeedParser/%s +http://feedparser.org/" % __version__
+
+# HTTP "Accept" header to send to servers when downloading feeds.  If you don't
+# want to send an Accept header, set this to None.
+ACCEPT_HEADER = "application/atom+xml,application/rdf+xml,application/rss+xml,application/x-netcdf,application/xml;q=0.9,text/xml;q=0.2,*/*;q=0.1"
+
+# List of preferred XML parsers, by SAX driver name.  These will be tried first,
+# but if they're not installed, Python will keep searching through its own list
+# of pre-installed parsers until it finds one that supports everything we need.
+PREFERRED_XML_PARSERS = ["drv_libxml2"]
+
+# If you want feedparser to automatically run HTML markup through HTML Tidy, set
+# this to 1.  Requires mxTidy <http://www.egenix.com/files/python/mxTidy.html>
+# or utidylib <http://utidylib.berlios.de/>.
+TIDY_MARKUP = 0
+
+# List of Python interfaces for HTML Tidy, in order of preference.  Only useful
+# if TIDY_MARKUP = 1
+PREFERRED_TIDY_INTERFACES = ["uTidy", "mxTidy"]
+
+# ---------- required modules (should come with any Python distribution) ----------
+import sgmllib, re, sys, copy, urlparse, time, rfc822, types, cgi, urllib, urllib2
+try:
+    from cStringIO import StringIO as _StringIO
+except:
+    from StringIO import StringIO as _StringIO
+
+# ---------- optional modules (feedparser will work without these, but with reduced functionality) ----------
+
+# gzip is included with most Python distributions, but may not be available if you compiled your own
+try:
+    import gzip
+except:
+    gzip = None
+try:
+    import zlib
+except:
+    zlib = None
+
+# If a real XML parser is available, feedparser will attempt to use it.  feedparser has
+# been tested with the built-in SAX parser, PyXML, and libxml2.  On platforms where the
+# Python distribution does not come with an XML parser (such as Mac OS X 10.2 and some
+# versions of FreeBSD), feedparser will quietly fall back on regex-based parsing.
+try:
+    import xml.sax
+    xml.sax.make_parser(PREFERRED_XML_PARSERS) # test for valid parsers
+    from xml.sax.saxutils import escape as _xmlescape
+    _XML_AVAILABLE = 1
+except:
+    _XML_AVAILABLE = 0
+    def _xmlescape(data,entities={}):
+        data = data.replace('&', '&amp;')
+        data = data.replace('>', '&gt;')
+        data = data.replace('<', '&lt;')
+        for char, entity in entities:
+            data = data.replace(char, entity)
+        return data
+
+# base64 support for Atom feeds that contain embedded binary data
+try:
+    import base64, binascii
+except:
+    base64 = binascii = None
+
+# cjkcodecs and iconv_codec provide support for more character encodings.
+# Both are available from http://cjkpython.i18n.org/
+try:
+    import cjkcodecs.aliases
+except:
+    pass
+try:
+    import iconv_codec
+except:
+    pass
+
+# chardet library auto-detects character encodings
+# Download from http://chardet.feedparser.org/
+try:
+    import chardet
+    if _debug:
+        import chardet.constants
+        chardet.constants._debug = 1
+except:
+    chardet = None
+
+# ---------- don't touch these ----------
+class ThingsNobodyCaresAboutButMe(Exception): pass
+class CharacterEncodingOverride(ThingsNobodyCaresAboutButMe): pass
+class CharacterEncodingUnknown(ThingsNobodyCaresAboutButMe): pass
+class NonXMLContentType(ThingsNobodyCaresAboutButMe): pass
+class UndeclaredNamespace(Exception): pass
+
+sgmllib.tagfind = re.compile('[a-zA-Z][-_.:a-zA-Z0-9]*')
+sgmllib.special = re.compile('<!')
+sgmllib.charref = re.compile('&#(x?[0-9A-Fa-f]+)[^0-9A-Fa-f]')
+
+SUPPORTED_VERSIONS = {'': 'unknown',
+                      'rss090': 'RSS 0.90',
+                      'rss091n': 'RSS 0.91 (Netscape)',
+                      'rss091u': 'RSS 0.91 (Userland)',
+                      'rss092': 'RSS 0.92',
+                      'rss093': 'RSS 0.93',
+                      'rss094': 'RSS 0.94',
+                      'rss20': 'RSS 2.0',
+                      'rss10': 'RSS 1.0',
+                      'rss': 'RSS (unknown version)',
+                      'atom01': 'Atom 0.1',
+                      'atom02': 'Atom 0.2',
+                      'atom03': 'Atom 0.3',
+                      'atom10': 'Atom 1.0',
+                      'atom': 'Atom (unknown version)',
+                      'cdf': 'CDF',
+                      'hotrss': 'Hot RSS'
+                      }
+
+try:
+    UserDict = dict
+except NameError:
+    # Python 2.1 does not have dict
+    from UserDict import UserDict
+    def dict(aList):
+        rc = {}
+        for k, v in aList:
+            rc[k] = v
+        return rc
+
+class FeedParserDict(UserDict):
+    keymap = {'channel': 'feed',
+              'items': 'entries',
+              'guid': 'id',
+              'date': 'updated',
+              'date_parsed': 'updated_parsed',
+              'description': ['subtitle', 'summary'],
+              'url': ['href'],
+              'modified': 'updated',
+              'modified_parsed': 'updated_parsed',
+              'issued': 'published',
+              'issued_parsed': 'published_parsed',
+              'copyright': 'rights',
+              'copyright_detail': 'rights_detail',
+              'tagline': 'subtitle',
+              'tagline_detail': 'subtitle_detail'}
+    def __getitem__(self, key):
+        if key == 'category':
+            return UserDict.__getitem__(self, 'tags')[0]['term']
+        if key == 'categories':
+            return [(tag['scheme'], tag['term']) for tag in UserDict.__getitem__(self, 'tags')]
+        realkey = self.keymap.get(key, key)
+        if type(realkey) == types.ListType:
+            for k in realkey:
+                if UserDict.has_key(self, k):
+                    return UserDict.__getitem__(self, k)
+        if UserDict.has_key(self, key):
+            return UserDict.__getitem__(self, key)
+        return UserDict.__getitem__(self, realkey)
+
+    def __setitem__(self, key, value):
+        for k in self.keymap.keys():
+            if key == k:
+                key = self.keymap[k]
+                if type(key) == types.ListType:
+                    key = key[0]
+        return UserDict.__setitem__(self, key, value)
+
+    def get(self, key, default=None):
+        if self.has_key(key):
+            return self[key]
+        else:
+            return default
+
+    def setdefault(self, key, value):
+        if not self.has_key(key):
+            self[key] = value
+        return self[key]
+        
+    def has_key(self, key):
+        try:
+            return hasattr(self, key) or UserDict.has_key(self, key)
+        except AttributeError:
+            return False
+        
+    def __getattr__(self, key):
+        try:
+            return self.__dict__[key]
+        except KeyError:
+            pass
+        try:
+            assert not key.startswith('_')
+            return self.__getitem__(key)
+        except:
+            raise AttributeError, "object has no attribute '%s'" % key
+
+    def __setattr__(self, key, value):
+        if key.startswith('_') or key == 'data':
+            self.__dict__[key] = value
+        else:
+            return self.__setitem__(key, value)
+
+    def __contains__(self, key):
+        return self.has_key(key)
+
+def zopeCompatibilityHack():
+    global FeedParserDict
+    del FeedParserDict
+    def FeedParserDict(aDict=None):
+        rc = {}
+        if aDict:
+            rc.update(aDict)
+        return rc
+
+_ebcdic_to_ascii_map = None
+def _ebcdic_to_ascii(s):
+    global _ebcdic_to_ascii_map
+    if not _ebcdic_to_ascii_map:
+        emap = (
+            0,1,2,3,156,9,134,127,151,141,142,11,12,13,14,15,
+            16,17,18,19,157,133,8,135,24,25,146,143,28,29,30,31,
+            128,129,130,131,132,10,23,27,136,137,138,139,140,5,6,7,
+            144,145,22,147,148,149,150,4,152,153,154,155,20,21,158,26,
+            32,160,161,162,163,164,165,166,167,168,91,46,60,40,43,33,
+            38,169,170,171,172,173,174,175,176,177,93,36,42,41,59,94,
+            45,47,178,179,180,181,182,183,184,185,124,44,37,95,62,63,
+            186,187,188,189,190,191,192,193,194,96,58,35,64,39,61,34,
+            195,97,98,99,100,101,102,103,104,105,196,197,198,199,200,201,
+            202,106,107,108,109,110,111,112,113,114,203,204,205,206,207,208,
+            209,126,115,116,117,118,119,120,121,122,210,211,212,213,214,215,
+            216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,
+            123,65,66,67,68,69,70,71,72,73,232,233,234,235,236,237,
+            125,74,75,76,77,78,79,80,81,82,238,239,240,241,242,243,
+            92,159,83,84,85,86,87,88,89,90,244,245,246,247,248,249,
+            48,49,50,51,52,53,54,55,56,57,250,251,252,253,254,255
+            )
+        import string
+        _ebcdic_to_ascii_map = string.maketrans( \
+            ''.join(map(chr, range(256))), ''.join(map(chr, emap)))
+    return s.translate(_ebcdic_to_ascii_map)
+cp1252 = {
+  unichr(128): unichr(8364), # euro sign
+  unichr(130): unichr(8218), # single low-9 quotation mark
+  unichr(131): unichr( 402), # latin small letter f with hook
+  unichr(132): unichr(8222), # double low-9 quotation mark
+  unichr(133): unichr(8230), # horizontal ellipsis
+  unichr(134): unichr(8224), # dagger
+  unichr(135): unichr(8225), # double dagger
+  unichr(136): unichr( 710), # modifier letter circumflex accent
+  unichr(137): unichr(8240), # per mille sign
+  unichr(138): unichr( 352), # latin capital letter s with caron
+  unichr(139): unichr(8249), # single left-pointing angle quotation mark
+  unichr(140): unichr( 338), # latin capital ligature oe
+  unichr(142): unichr( 381), # latin capital letter z with caron
+  unichr(145): unichr(8216), # left single quotation mark
+  unichr(146): unichr(8217), # right single quotation mark
+  unichr(147): unichr(8220), # left double quotation mark
+  unichr(148): unichr(8221), # right double quotation mark
+  unichr(149): unichr(8226), # bullet
+  unichr(150): unichr(8211), # en dash
+  unichr(151): unichr(8212), # em dash
+  unichr(152): unichr( 732), # small tilde
+  unichr(153): unichr(8482), # trade mark sign
+  unichr(154): unichr( 353), # latin small letter s with caron
+  unichr(155): unichr(8250), # single right-pointing angle quotation mark
+  unichr(156): unichr( 339), # latin small ligature oe
+  unichr(158): unichr( 382), # latin small letter z with caron
+  unichr(159): unichr( 376)} # latin capital letter y with diaeresis
+
+_urifixer = re.compile('^([A-Za-z][A-Za-z0-9+-.]*://)(/*)(.*?)')
+def _urljoin(base, uri):
+    uri = _urifixer.sub(r'\1\3', uri)
+    return urlparse.urljoin(base, uri)
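The _urifixer step exists to collapse surplus slashes after the scheme before handing the URI to urljoin. A standalone check (Python 3's `urllib.parse` substituted for `urlparse`; the `fix_and_join` name is illustrative) behaves like this:

```python
import re
from urllib.parse import urljoin

_urifixer = re.compile('^([A-Za-z][A-Za-z0-9+-.]*://)(/*)(.*?)')

def fix_and_join(base, uri):
    # collapse extra slashes after "scheme://", then defer to urljoin
    uri = _urifixer.sub(r'\1\3', uri)
    return urljoin(base, uri)

# surplus slashes after the scheme are stripped before joining
assert fix_and_join('http://example.com/feed/',
                    'http://////example.com/x') == 'http://example.com/x'
# relative references are untouched by the fixer and resolve normally
assert fix_and_join('http://example.com/feed/',
                    'entry.html') == 'http://example.com/feed/entry.html'
```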
+
+class _FeedParserMixin:
+    namespaces = {'': '',
+                  'http://backend.userland.com/rss': '',
+                  'http://blogs.law.harvard.edu/tech/rss': '',
+                  'http://purl.org/rss/1.0/': '',
+                  'http://my.netscape.com/rdf/simple/0.9/': '',
+                  'http://example.com/newformat#': '',
+                  'http://example.com/necho': '',
+                  'http://purl.org/echo/': '',
+                  'uri/of/echo/namespace#': '',
+                  'http://purl.org/pie/': '',
+                  'http://purl.org/atom/ns#': '',
+                  'http://www.w3.org/2005/Atom': '',
+                  'http://purl.org/rss/1.0/modules/rss091#': '',
+                  
+                  'http://webns.net/mvcb/':                               'admin',
+                  'http://purl.org/rss/1.0/modules/aggregation/':         'ag',
+                  'http://purl.org/rss/1.0/modules/annotate/':            'annotate',
+                  'http://media.tangent.org/rss/1.0/':                    'audio',
+                  'http://backend.userland.com/blogChannelModule':        'blogChannel',
+                  'http://web.resource.org/cc/':                          'cc',
+                  'http://backend.userland.com/creativeCommonsRssModule': 'creativeCommons',
+                  'http://purl.org/rss/1.0/modules/company':              'co',
+                  'http://purl.org/rss/1.0/modules/content/':             'content',
+                  'http://my.theinfo.org/changed/1.0/rss/':               'cp',
+                  'http://purl.org/dc/elements/1.1/':                     'dc',
+                  'http://purl.org/dc/terms/':                            'dcterms',
+                  'http://purl.org/rss/1.0/modules/email/':               'email',
+                  'http://purl.org/rss/1.0/modules/event/':               'ev',
+                  'http://rssnamespace.org/feedburner/ext/1.0':           'feedburner',
+                  'http://freshmeat.net/rss/fm/':                         'fm',
+                  'http://xmlns.com/foaf/0.1/':                           'foaf',
+                  'http://www.w3.org/2003/01/geo/wgs84_pos#':             'geo',
+                  'http://postneo.com/icbm/':                             'icbm',
+                  'http://purl.org/rss/1.0/modules/image/':               'image',
+                  'http://www.itunes.com/DTDs/PodCast-1.0.dtd':           'itunes',
+                  'http://example.com/DTDs/PodCast-1.0.dtd':              'itunes',
+                  'http://purl.org/rss/1.0/modules/link/':                'l',
+                  'http://search.yahoo.com/mrss':                         'media',
+                  'http://madskills.com/public/xml/rss/module/pingback/': 'pingback',
+                  'http://prismstandard.org/namespaces/1.2/basic/':       'prism',
+                  'http://www.w3.org/1999/02/22-rdf-syntax-ns#':          'rdf',
+                  'http://www.w3.org/2000/01/rdf-schema#':                'rdfs',
+                  'http://purl.org/rss/1.0/modules/reference/':           'ref',
+                  'http://purl.org/rss/1.0/modules/richequiv/':           'reqv',
+                  'http://purl.org/rss/1.0/modules/search/':              'search',
+                  'http://purl.org/rss/1.0/modules/slash/':               'slash',
+                  'http://schemas.xmlsoap.org/soap/envelope/':            'soap',
+                  'http://purl.org/rss/1.0/modules/servicestatus/':       'ss',
+                  'http://hacks.benhammersley.com/rss/streaming/':        'str',
+                  'http://purl.org/rss/1.0/modules/subscription/':        'sub',
+                  'http://purl.org/rss/1.0/modules/syndication/':         'sy',
+                  'http://purl.org/rss/1.0/modules/taxonomy/':            'taxo',
+                  'http://purl.org/rss/1.0/modules/threading/':           'thr',
+                  'http://purl.org/rss/1.0/modules/textinput/':           'ti',
+                  'http://madskills.com/public/xml/rss/module/trackback/':'trackback',
+                  'http://wellformedweb.org/commentAPI/':                 'wfw',
+                  'http://purl.org/rss/1.0/modules/wiki/':                'wiki',
+                  'http://www.w3.org/1999/xhtml':                         'xhtml',
+                  'http://www.w3.org/XML/1998/namespace':                 'xml',
+                  'http://schemas.pocketsoap.com/rss/myDescModule/':      'szf'}
+    _matchnamespaces = {}
+
+    can_be_relative_uri = ['link', 'id', 'wfw_comment', 'wfw_commentrss', 'docs', 'url', 'href', 'comments', 'license', 'icon', 'logo']
+    can_contain_relative_uris = ['content', 'title', 'summary', 'info', 'tagline', 'subtitle', 'copyright', 'rights', 'description']
+    can_contain_dangerous_markup = ['content', 'title', 'summary', 'info', 'tagline', 'subtitle', 'copyright', 'rights', 'description']
+    html_types = ['text/html', 'application/xhtml+xml']
+    
+    def __init__(self, baseuri=None, baselang=None, encoding='utf-8'):
+        if _debug: sys.stderr.write('initializing FeedParser\n')
+        if not self._matchnamespaces:
+            for k, v in self.namespaces.items():
+                self._matchnamespaces[k.lower()] = v
+        self.feeddata = FeedParserDict() # feed-level data
+        self.encoding = encoding # character encoding
+        self.entries = [] # list of entry-level data
+        self.version = '' # feed type/version, see SUPPORTED_VERSIONS
+        self.namespacesInUse = {} # dictionary of namespaces defined by the feed
+
+        # the following are used internally to track state;
+        # this is really out of control and should be refactored
+        self.infeed = 0
+        self.inentry = 0
+        self.incontent = 0
+        self.intextinput = 0
+        self.inimage = 0
+        self.inauthor = 0
+        self.incontributor = 0
+        self.inpublisher = 0
+        self.insource = 0
+        self.sourcedata = FeedParserDict()
+        self.contentparams = FeedParserDict()
+        self._summaryKey = None
+        self.namespacemap = {}
+        self.elementstack = []
+        self.basestack = []
+        self.langstack = []
+        self.baseuri = baseuri or ''
+        self.lang = baselang or None
+        if baselang:
+            self.feeddata['language'] = baselang
+
+    def unknown_starttag(self, tag, attrs):
+        if _debug: sys.stderr.write('start %s with %s\n' % (tag, attrs))
+        # normalize attrs
+        attrs = [(k.lower(), v) for k, v in attrs]
+        attrs = [(k, k in ('rel', 'type') and v.lower() or v) for k, v in attrs]
+        
+        # track xml:base and xml:lang
+        attrsD = dict(attrs)
+        baseuri = attrsD.get('xml:base', attrsD.get('base')) or self.baseuri
+        self.baseuri = _urljoin(self.baseuri, baseuri)
+        lang = attrsD.get('xml:lang', attrsD.get('lang'))
+        if lang == '':
+            # xml:lang could be explicitly set to '', we need to capture that
+            lang = None
+        elif lang is None:
+            # if no xml:lang is specified, use parent lang
+            lang = self.lang
+        if lang:
+            if tag in ('feed', 'rss', 'rdf:RDF'):
+                self.feeddata['language'] = lang
+        self.lang = lang
+        self.basestack.append(self.baseuri)
+        self.langstack.append(lang)
+        
+        # track namespaces
+        for prefix, uri in attrs:
+            if prefix.startswith('xmlns:'):
+                self.trackNamespace(prefix[6:], uri)
+            elif prefix == 'xmlns':
+                self.trackNamespace(None, uri)
+
+        # track inline content
+        if self.incontent and self.contentparams.has_key('type') and not self.contentparams.get('type', 'xml').endswith('xml'):
+            # element declared itself as escaped markup, but it isn't really
+            self.contentparams['type'] = 'application/xhtml+xml'
+        if self.incontent and self.contentparams.get('type') == 'application/xhtml+xml':
+            # Note: probably shouldn't simply recreate localname here, but
+            # our namespace handling isn't actually 100% correct in cases where
+            # the feed redefines the default namespace (which is actually
+            # the usual case for inline content, thanks Sam), so here we
+            # cheat and just reconstruct the element based on localname
+            # because that compensates for the bugs in our namespace handling.
+            # This will horribly munge inline content with non-empty qnames,
+            # but nobody actually does that, so I'm not fixing it.
+            tag = tag.split(':')[-1]
+            return self.handle_data('<%s%s>' % (tag, self.strattrs(attrs)), escape=0)
+
+        # match namespaces
+        if tag.find(':') != -1:
+            prefix, suffix = tag.split(':', 1)
+        else:
+            prefix, suffix = '', tag
+        prefix = self.namespacemap.get(prefix, prefix)
+        if prefix:
+            prefix = prefix + '_'
+
+        # special hack for better tracking of empty textinput/image elements in illformed feeds
+        if (not prefix) and tag not in ('title', 'link', 'description', 'name'):
+            self.intextinput = 0
+        if (not prefix) and tag not in ('title', 'link', 'description', 'url', 'href', 'width', 'height'):
+            self.inimage = 0
+        
+        # call special handler (if defined) or default handler
+        methodname = '_start_' + prefix + suffix
+        try:
+            method = getattr(self, methodname)
+            return method(attrsD)
+        except AttributeError:
+            return self.push(prefix + suffix, 1)
+
+    def unknown_endtag(self, tag):
+        if _debug: sys.stderr.write('end %s\n' % tag)
+        # match namespaces
+        if tag.find(':') != -1:
+            prefix, suffix = tag.split(':', 1)
+        else:
+            prefix, suffix = '', tag
+        prefix = self.namespacemap.get(prefix, prefix)
+        if prefix:
+            prefix = prefix + '_'
+
+        # call special handler (if defined) or default handler
+        methodname = '_end_' + prefix + suffix
+        try:
+            method = getattr(self, methodname)
+            method()
+        except AttributeError:
+            self.pop(prefix + suffix)
+
+        # track inline content
+        if self.incontent and self.contentparams.has_key('type') and not self.contentparams.get('type', 'xml').endswith('xml'):
+            # element declared itself as escaped markup, but it isn't really
+            self.contentparams['type'] = 'application/xhtml+xml'
+        if self.incontent and self.contentparams.get('type') == 'application/xhtml+xml':
+            tag = tag.split(':')[-1]
+            self.handle_data('</%s>' % tag, escape=0)
+
+        # track xml:base and xml:lang going out of scope
+        if self.basestack:
+            self.basestack.pop()
+            if self.basestack and self.basestack[-1]:
+                self.baseuri = self.basestack[-1]
+        if self.langstack:
+            self.langstack.pop()
+            if self.langstack: # and (self.langstack[-1] is not None):
+                self.lang = self.langstack[-1]
+
+    def handle_charref(self, ref):
+        # called for each character reference, e.g. for '&#160;', ref will be '160'
+        if not self.elementstack: return
+        ref = ref.lower()
+        if ref in ('34', '38', '39', '60', '62', 'x22', 'x26', 'x27', 'x3c', 'x3e'):
+            text = '&#%s;' % ref
+        else:
+            if ref[0] == 'x':
+                c = int(ref[1:], 16)
+            else:
+                c = int(ref)
+            text = unichr(c).encode('utf-8')
+        self.elementstack[-1][2].append(text)
+
+    def handle_entityref(self, ref):
+        # called for each entity reference, e.g. for '&copy;', ref will be 'copy'
+        if not self.elementstack: return
+        if _debug: sys.stderr.write('entering handle_entityref with %s\n' % ref)
+        if ref in ('lt', 'gt', 'quot', 'amp', 'apos'):
+            text = '&%s;' % ref
+        else:
+            # entity resolution graciously donated by Aaron Swartz
+            def name2cp(k):
+                import htmlentitydefs
+                if hasattr(htmlentitydefs, 'name2codepoint'): # requires Python 2.3
+                    return htmlentitydefs.name2codepoint[k]
+                k = htmlentitydefs.entitydefs[k]
+                if k.startswith('&#') and k.endswith(';'):
+                    return int(k[2:-1]) # not in latin-1
+                return ord(k)
+            try: codepoint = name2cp(ref)
+            except KeyError: text = '&%s;' % ref
+            else: text = unichr(codepoint).encode('utf-8')
+        self.elementstack[-1][2].append(text)
+
+    def handle_data(self, text, escape=1):
+        # called for each block of plain text, i.e. outside of any tag and
+        # not containing any character or entity references
+        if not self.elementstack: return
+        if escape and self.contentparams.get('type') == 'application/xhtml+xml':
+            text = _xmlescape(text)
+        self.elementstack[-1][2].append(text)
+
+    def handle_comment(self, text):
+        # called for each comment, e.g. <!-- insert message here -->
+        pass
+
+    def handle_pi(self, text):
+        # called for each processing instruction, e.g. <?instruction>
+        pass
+
+    def handle_decl(self, text):
+        pass
+
+    def parse_declaration(self, i):
+        # override internal declaration handler to handle CDATA blocks
+        if _debug: sys.stderr.write('entering parse_declaration\n')
+        if self.rawdata[i:i+9] == '<![CDATA[':
+            k = self.rawdata.find(']]>', i)
+            if k == -1: k = len(self.rawdata)
+            self.handle_data(_xmlescape(self.rawdata[i+9:k]), 0)
+            return k+3
+        else:
+            k = self.rawdata.find('>', i)
+            return k+1
+
+    def mapContentType(self, contentType):
+        contentType = contentType.lower()
+        if contentType == 'text':
+            contentType = 'text/plain'
+        elif contentType == 'html':
+            contentType = 'text/html'
+        elif contentType == 'xhtml':
+            contentType = 'application/xhtml+xml'
+        return contentType
+    
+    def trackNamespace(self, prefix, uri):
+        loweruri = uri.lower()
+        if (prefix, loweruri) == (None, 'http://my.netscape.com/rdf/simple/0.9/') and not self.version:
+            self.version = 'rss090'
+        if loweruri == 'http://purl.org/rss/1.0/' and not self.version:
+            self.version = 'rss10'
+        if loweruri == 'http://www.w3.org/2005/atom' and not self.version:
+            self.version = 'atom10'
+        if loweruri.find('backend.userland.com/rss') != -1:
+            # match any backend.userland.com namespace
+            uri = 'http://backend.userland.com/rss'
+            loweruri = uri
+        if self._matchnamespaces.has_key(loweruri):
+            self.namespacemap[prefix] = self._matchnamespaces[loweruri]
+            self.namespacesInUse[self._matchnamespaces[loweruri]] = uri
+        else:
+            self.namespacesInUse[prefix or ''] = uri
+
+    def resolveURI(self, uri):
+        return _urljoin(self.baseuri or '', uri)
+    
+    def decodeEntities(self, element, data):
+        return data
+
+    def strattrs(self, attrs):
+        return ''.join([' %s="%s"' % (t[0],_xmlescape(t[1],{'"':'&quot;'})) for t in attrs])
+
+    def push(self, element, expectingText):
+        self.elementstack.append([element, expectingText, []])
+
+    def pop(self, element, stripWhitespace=1):
+        if not self.elementstack: return
+        if self.elementstack[-1][0] != element: return
+        
+        element, expectingText, pieces = self.elementstack.pop()
+
+        if self.version == 'atom10' and self.contentparams.get('type','text') == 'application/xhtml+xml':
+            # remove enclosing child element, but only if it is a <div> and
+            # only if all the remaining content is nested underneath it.
+            # This means that the divs would be retained in the following:
+            #    <div>foo</div><div>bar</div>
+            if pieces and (pieces[0] == '<div>' or pieces[0].startswith('<div ')) and pieces[-1]=='</div>':
+                depth = 0
+                for piece in pieces[:-1]:
+                    if piece.startswith('</'):
+                        depth -= 1
+                        if depth == 0: break
+                    elif piece.startswith('<') and not piece.endswith('/>'):
+                        depth += 1
+                else:
+                    pieces = pieces[1:-1]
+
+        output = ''.join(pieces)
+        if stripWhitespace:
+            output = output.strip()
+        if not expectingText: return output
+
+        # decode base64 content
+        if base64 and self.contentparams.get('base64', 0):
+            try:
+                output = base64.decodestring(output)
+            except (binascii.Error, binascii.Incomplete):
+                pass
+                
+        # resolve relative URIs
+        if (element in self.can_be_relative_uri) and output:
+            output = self.resolveURI(output)
+        
+        # decode entities within embedded markup
+        if not self.contentparams.get('base64', 0):
+            output = self.decodeEntities(element, output)
+
+        # remove temporary cruft from contentparams
+        try:
+            del self.contentparams['mode']
+        except KeyError:
+            pass
+        try:
+            del self.contentparams['base64']
+        except KeyError:
+            pass
+
+        # resolve relative URIs within embedded markup
+        if self.mapContentType(self.contentparams.get('type', 'text/html')) in self.html_types:
+            if element in self.can_contain_relative_uris:
+                output = _resolveRelativeURIs(output, self.baseuri, self.encoding)
+        
+        # sanitize embedded markup
+        if self.mapContentType(self.contentparams.get('type', 'text/html')) in self.html_types:
+            if element in self.can_contain_dangerous_markup:
+                output = _sanitizeHTML(output, self.encoding)
+
+        if self.encoding and type(output) != type(u''):
+            try:
+                output = unicode(output, self.encoding)
+            except:
+                pass
+
+        # address common error where people take data that is already
+        # utf-8, presume that it is iso-8859-1, and re-encode it.
+        if self.encoding=='utf-8' and type(output) == type(u''):
+            try:
+                output = unicode(output.encode('iso-8859-1'), 'utf-8')
+            except:
+                pass
+
+        # map win-1252 extensions to the proper code points
+        if type(output) == type(u''):
+            output = u''.join([cp1252.get(c, c) for c in output])
+
+        # categories/tags/keywords/whatever are handled in _end_category
+        if element == 'category':
+            return output
+        
+        # store output in appropriate place(s)
+        if self.inentry and not self.insource:
+            if element == 'content':
+                self.entries[-1].setdefault(element, [])
+                contentparams = copy.deepcopy(self.contentparams)
+                contentparams['value'] = output
+                self.entries[-1][element].append(contentparams)
+            elif element == 'link':
+                self.entries[-1][element] = output
+                if output:
+                    self.entries[-1]['links'][-1]['href'] = output
+            else:
+                if element == 'description':
+                    element = 'summary'
+                self.entries[-1][element] = output
+                if self.incontent:
+                    contentparams = copy.deepcopy(self.contentparams)
+                    contentparams['value'] = output
+                    self.entries[-1][element + '_detail'] = contentparams
+        elif (self.infeed or self.insource) and (not self.intextinput) and (not self.inimage):
+            context = self._getContext()
+            if element == 'description':
+                element = 'subtitle'
+            context[element] = output
+            if element == 'link':
+                context['links'][-1]['href'] = output
+            elif self.incontent:
+                contentparams = copy.deepcopy(self.contentparams)
+                contentparams['value'] = output
+                context[element + '_detail'] = contentparams
+        return output
+
+    def pushContent(self, tag, attrsD, defaultContentType, expectingText):
+        self.incontent += 1
+        self.contentparams = FeedParserDict({
+            'type': self.mapContentType(attrsD.get('type', defaultContentType)),
+            'language': self.lang,
+            'base': self.baseuri})
+        self.contentparams['base64'] = self._isBase64(attrsD, self.contentparams)
+        self.push(tag, expectingText)
+
+    def popContent(self, tag):
+        value = self.pop(tag)
+        self.incontent -= 1
+        self.contentparams.clear()
+        return value
+        
+    def _mapToStandardPrefix(self, name):
+        colonpos = name.find(':')
+        if colonpos != -1:
+            prefix = name[:colonpos]
+            suffix = name[colonpos+1:]
+            prefix = self.namespacemap.get(prefix, prefix)
+            name = prefix + ':' + suffix
+        return name
+        
+    def _getAttribute(self, attrsD, name):
+        return attrsD.get(self._mapToStandardPrefix(name))
+
+    def _isBase64(self, attrsD, contentparams):
+        if attrsD.get('mode', '') == 'base64':
+            return 1
+        if self.contentparams['type'].startswith('text/'):
+            return 0
+        if self.contentparams['type'].endswith('+xml'):
+            return 0
+        if self.contentparams['type'].endswith('/xml'):
+            return 0
+        return 1
+
+    def _itsAnHrefDamnIt(self, attrsD):
+        href = attrsD.get('url', attrsD.get('uri', attrsD.get('href', None)))
+        if href:
+            try:
+                del attrsD['url']
+            except KeyError:
+                pass
+            try:
+                del attrsD['uri']
+            except KeyError:
+                pass
+            attrsD['href'] = href
+        return attrsD
+    
+    def _save(self, key, value):
+        context = self._getContext()
+        context.setdefault(key, value)
+
+    def _start_rss(self, attrsD):
+        versionmap = {'0.91': 'rss091u',
+                      '0.92': 'rss092',
+                      '0.93': 'rss093',
+                      '0.94': 'rss094'}
+        if not self.version:
+            attr_version = attrsD.get('version', '')
+            version = versionmap.get(attr_version)
+            if version:
+                self.version = version
+            elif attr_version.startswith('2.'):
+                self.version = 'rss20'
+            else:
+                self.version = 'rss'
+    
+    def _start_dlhottitles(self, attrsD):
+        self.version = 'hotrss'
+
+    def _start_channel(self, attrsD):
+        self.infeed = 1
+        self._cdf_common(attrsD)
+    _start_feedinfo = _start_channel
+
+    def _cdf_common(self, attrsD):
+        if attrsD.has_key('lastmod'):
+            self._start_modified({})
+            self.elementstack[-1][-1] = attrsD['lastmod']
+            self._end_modified()
+        if attrsD.has_key('href'):
+            self._start_link({})
+            self.elementstack[-1][-1] = attrsD['href']
+            self._end_link()
+    
+    def _start_feed(self, attrsD):
+        self.infeed = 1
+        versionmap = {'0.1': 'atom01',
+                      '0.2': 'atom02',
+                      '0.3': 'atom03'}
+        if not self.version:
+            attr_version = attrsD.get('version')
+            version = versionmap.get(attr_version)
+            if version:
+                self.version = version
+            else:
+                self.version = 'atom'
+
+    def _end_channel(self):
+        self.infeed = 0
+    _end_feed = _end_channel
+    
+    def _start_image(self, attrsD):
+        self.inimage = 1
+        self.push('image', 0)
+        context = self._getContext()
+        context.setdefault('image', FeedParserDict())
+            
+    def _end_image(self):
+        self.pop('image')
+        self.inimage = 0
+
+    def _start_textinput(self, attrsD):
+        self.intextinput = 1
+        self.push('textinput', 0)
+        context = self._getContext()
+        context.setdefault('textinput', FeedParserDict())
+    _start_textInput = _start_textinput
+    
+    def _end_textinput(self):
+        self.pop('textinput')
+        self.intextinput = 0
+    _end_textInput = _end_textinput
+
+    def _start_author(self, attrsD):
+        self.inauthor = 1
+        self.push('author', 1)
+    _start_managingeditor = _start_author
+    _start_dc_author = _start_author
+    _start_dc_creator = _start_author
+    _start_itunes_author = _start_author
+
+    def _end_author(self):
+        self.pop('author')
+        self.inauthor = 0
+        self._sync_author_detail()
+    _end_managingeditor = _end_author
+    _end_dc_author = _end_author
+    _end_dc_creator = _end_author
+    _end_itunes_author = _end_author
+
+    def _start_itunes_owner(self, attrsD):
+        self.inpublisher = 1
+        self.push('publisher', 0)
+
+    def _end_itunes_owner(self):
+        self.pop('publisher')
+        self.inpublisher = 0
+        self._sync_author_detail('publisher')
+
+    def _start_contributor(self, attrsD):
+        self.incontributor = 1
+        context = self._getContext()
+        context.setdefault('contributors', [])
+        context['contributors'].append(FeedParserDict())
+        self.push('contributor', 0)
+
+    def _end_contributor(self):
+        self.pop('contributor')
+        self.incontributor = 0
+
+    def _start_dc_contributor(self, attrsD):
+        self.incontributor = 1
+        context = self._getContext()
+        context.setdefault('contributors', [])
+        context['contributors'].append(FeedParserDict())
+        self.push('name', 0)
+
+    def _end_dc_contributor(self):
+        self._end_name()
+        self.incontributor = 0
+
+    def _start_name(self, attrsD):
+        self.push('name', 0)
+    _start_itunes_name = _start_name
+
+    def _end_name(self):
+        value = self.pop('name')
+        if self.inpublisher:
+            self._save_author('name', value, 'publisher')
+        elif self.inauthor:
+            self._save_author('name', value)
+        elif self.incontributor:
+            self._save_contributor('name', value)
+        elif self.intextinput:
+            context = self._getContext()
+            context['textinput']['name'] = value
+    _end_itunes_name = _end_name
+
+    def _start_width(self, attrsD):
+        self.push('width', 0)
+
+    def _end_width(self):
+        value = self.pop('width')
+        try:
+            value = int(value)
+        except (TypeError, ValueError):
+            value = 0
+        if self.inimage:
+            context = self._getContext()
+            context['image']['width'] = value
+
+    def _start_height(self, attrsD):
+        self.push('height', 0)
+
+    def _end_height(self):
+        value = self.pop('height')
+        try:
+            value = int(value)
+        except (TypeError, ValueError):
+            value = 0
+        if self.inimage:
+            context = self._getContext()
+            context['image']['height'] = value
+
+    def _start_url(self, attrsD):
+        self.push('href', 1)
+    _start_homepage = _start_url
+    _start_uri = _start_url
+
+    def _end_url(self):
+        value = self.pop('href')
+        if self.inauthor:
+            self._save_author('href', value)
+        elif self.incontributor:
+            self._save_contributor('href', value)
+        elif self.inimage:
+            context = self._getContext()
+            context['image']['href'] = value
+        elif self.intextinput:
+            context = self._getContext()
+            context['textinput']['link'] = value
+    _end_homepage = _end_url
+    _end_uri = _end_url
+
+    def _start_email(self, attrsD):
+        self.push('email', 0)
+    _start_itunes_email = _start_email
+
+    def _end_email(self):
+        value = self.pop('email')
+        if self.inpublisher:
+            self._save_author('email', value, 'publisher')
+        elif self.inauthor:
+            self._save_author('email', value)
+        elif self.incontributor:
+            self._save_contributor('email', value)
+    _end_itunes_email = _end_email
+
+    def _getContext(self):
+        if self.insource:
+            context = self.sourcedata
+        elif self.inentry:
+            context = self.entries[-1]
+        else:
+            context = self.feeddata
+        return context
+
+    def _save_author(self, key, value, prefix='author'):
+        context = self._getContext()
+        context.setdefault(prefix + '_detail', FeedParserDict())
+        context[prefix + '_detail'][key] = value
+        self._sync_author_detail()
+
+    def _save_contributor(self, key, value):
+        context = self._getContext()
+        context.setdefault('contributors', [FeedParserDict()])
+        context['contributors'][-1][key] = value
+
+    def _sync_author_detail(self, key='author'):
+        context = self._getContext()
+        detail = context.get('%s_detail' % key)
+        if detail:
+            name = detail.get('name')
+            email = detail.get('email')
+            if name and email:
+                context[key] = '%s (%s)' % (name, email)
+            elif name:
+                context[key] = name
+            elif email:
+                context[key] = email
+        else:
+            author = context.get(key)
+            if not author: return
+            emailmatch = re.search(r'''(([a-zA-Z0-9\_\-\.\+]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?))''', author)
+            if not emailmatch: return
+            email = emailmatch.group(0)
+            # probably a better way to do the following, but it passes all the tests
+            author = author.replace(email, '')
+            author = author.replace('()', '')
+            author = author.strip()
+            if author and (author[0] == '('):
+                author = author[1:]
+            if author and (author[-1] == ')'):
+                author = author[:-1]
+            author = author.strip()
+            context.setdefault('%s_detail' % key, FeedParserDict())
+            context['%s_detail' % key]['name'] = author
+            context['%s_detail' % key]['email'] = email
+
+    def _start_subtitle(self, attrsD):
+        self.pushContent('subtitle', attrsD, 'text/plain', 1)
+    _start_tagline = _start_subtitle
+    _start_itunes_subtitle = _start_subtitle
+
+    def _end_subtitle(self):
+        self.popContent('subtitle')
+    _end_tagline = _end_subtitle
+    _end_itunes_subtitle = _end_subtitle
+            
+    def _start_rights(self, attrsD):
+        self.pushContent('rights', attrsD, 'text/plain', 1)
+    _start_dc_rights = _start_rights
+    _start_copyright = _start_rights
+
+    def _end_rights(self):
+        self.popContent('rights')
+    _end_dc_rights = _end_rights
+    _end_copyright = _end_rights
+
+    def _start_item(self, attrsD):
+        self.entries.append(FeedParserDict())
+        self.push('item', 0)
+        self.inentry = 1
+        self.guidislink = 0
+        id = self._getAttribute(attrsD, 'rdf:about')
+        if id:
+            context = self._getContext()
+            context['id'] = id
+        self._cdf_common(attrsD)
+    _start_entry = _start_item
+    _start_product = _start_item
+
+    def _end_item(self):
+        self.pop('item')
+        self.inentry = 0
+    _end_entry = _end_item
+
+    def _start_dc_language(self, attrsD):
+        self.push('language', 1)
+    _start_language = _start_dc_language
+
+    def _end_dc_language(self):
+        self.lang = self.pop('language')
+    _end_language = _end_dc_language
+
+    def _start_dc_publisher(self, attrsD):
+        self.push('publisher', 1)
+    _start_webmaster = _start_dc_publisher
+
+    def _end_dc_publisher(self):
+        self.pop('publisher')
+        self._sync_author_detail('publisher')
+    _end_webmaster = _end_dc_publisher
+
+    def _start_published(self, attrsD):
+        self.push('published', 1)
+    _start_dcterms_issued = _start_published
+    _start_issued = _start_published
+
+    def _end_published(self):
+        value = self.pop('published')
+        self._save('published_parsed', _parse_date(value))
+    _end_dcterms_issued = _end_published
+    _end_issued = _end_published
+
+    def _start_updated(self, attrsD):
+        self.push('updated', 1)
+    _start_modified = _start_updated
+    _start_dcterms_modified = _start_updated
+    _start_pubdate = _start_updated
+    _start_dc_date = _start_updated
+
+    def _end_updated(self):
+        value = self.pop('updated')
+        parsed_value = _parse_date(value)
+        self._save('updated_parsed', parsed_value)
+    _end_modified = _end_updated
+    _end_dcterms_modified = _end_updated
+    _end_pubdate = _end_updated
+    _end_dc_date = _end_updated
+
+    def _start_created(self, attrsD):
+        self.push('created', 1)
+    _start_dcterms_created = _start_created
+
+    def _end_created(self):
+        value = self.pop('created')
+        self._save('created_parsed', _parse_date(value))
+    _end_dcterms_created = _end_created
+
+    def _start_expirationdate(self, attrsD):
+        self.push('expired', 1)
+
+    def _end_expirationdate(self):
+        self._save('expired_parsed', _parse_date(self.pop('expired')))
+
+    def _start_cc_license(self, attrsD):
+        self.push('license', 1)
+        value = self._getAttribute(attrsD, 'rdf:resource')
+        if value:
+            self.elementstack[-1][2].append(value)
+        self.pop('license')
+        
+    def _start_creativecommons_license(self, attrsD):
+        self.push('license', 1)
+
+    def _end_creativecommons_license(self):
+        self.pop('license')
+
+    def _addTag(self, term, scheme, label):
+        context = self._getContext()
+        tags = context.setdefault('tags', [])
+        if (not term) and (not scheme) and (not label): return
+            value = FeedParserDict({'term': term, 'scheme': scheme, 'label': label})
+            if value not in tags:
+                tags.append(value)
+
+    def _start_category(self, attrsD):
+        if _debug: sys.stderr.write('entering _start_category with %s\n' % repr(attrsD))
+        term = attrsD.get('term')
+        scheme = attrsD.get('scheme', attrsD.get('domain'))
+        label = attrsD.get('label')
+        self._addTag(term, scheme, label)
+        self.push('category', 1)
+    _start_dc_subject = _start_category
+    _start_keywords = _start_category
+        
+    def _end_itunes_keywords(self):
+        for term in self.pop('itunes_keywords').split():
+            self._addTag(term, 'http://www.itunes.com/', None)
+        
+    def _start_itunes_category(self, attrsD):
+        self._addTag(attrsD.get('text'), 'http://www.itunes.com/', None)
+        self.push('category', 1)
+        
+    def _end_category(self):
+        value = self.pop('category')
+        if not value: return
+        context = self._getContext()
+        tags = context['tags']
+        if value and len(tags) and not tags[-1]['term']:
+            tags[-1]['term'] = value
+        else:
+            self._addTag(value, None, None)
+    _end_dc_subject = _end_category
+    _end_keywords = _end_category
+    _end_itunes_category = _end_category
+
+    def _start_cloud(self, attrsD):
+        self._getContext()['cloud'] = FeedParserDict(attrsD)
+        
+    def _start_link(self, attrsD):
+        attrsD.setdefault('rel', 'alternate')
+        attrsD.setdefault('type', 'text/html')
+        attrsD = self._itsAnHrefDamnIt(attrsD)
+        if attrsD.has_key('href'):
+            attrsD['href'] = self.resolveURI(attrsD['href'])
+        expectingText = self.infeed or self.inentry or self.insource
+        context = self._getContext()
+        context.setdefault('links', [])
+        context['links'].append(FeedParserDict(attrsD))
+        if attrsD['rel'] == 'enclosure':
+            self._start_enclosure(attrsD)
+        if attrsD.has_key('href'):
+            expectingText = 0
+            if (attrsD.get('rel') == 'alternate') and (self.mapContentType(attrsD.get('type')) in self.html_types):
+                context['link'] = attrsD['href']
+        else:
+            self.push('link', expectingText)
+    _start_producturl = _start_link
+
+    def _end_link(self):
+        value = self.pop('link')
+        context = self._getContext()
+        if self.intextinput:
+            context['textinput']['link'] = value
+        if self.inimage:
+            context['image']['link'] = value
+    _end_producturl = _end_link
+
+    def _start_guid(self, attrsD):
+        self.guidislink = (attrsD.get('ispermalink', 'true') == 'true')
+        self.push('id', 1)
+
+    def _end_guid(self):
+        value = self.pop('id')
+        self._save('guidislink', self.guidislink and not self._getContext().has_key('link'))
+        if self.guidislink:
+            # guid acts as link, but only if 'ispermalink' is not present or is 'true',
+            # and only if the item doesn't already have a link element
+            self._save('link', value)
+
+    def _start_title(self, attrsD):
+        self.pushContent('title', attrsD, 'text/plain', self.infeed or self.inentry or self.insource)
+    _start_dc_title = _start_title
+    _start_media_title = _start_title
+
+    def _end_title(self):
+        value = self.popContent('title')
+        context = self._getContext()
+        if self.intextinput:
+            context['textinput']['title'] = value
+        elif self.inimage:
+            context['image']['title'] = value
+    _end_dc_title = _end_title
+    _end_media_title = _end_title
+
+    def _start_description(self, attrsD):
+        context = self._getContext()
+        if context.has_key('summary'):
+            self._summaryKey = 'content'
+            self._start_content(attrsD)
+        else:
+            self.pushContent('description', attrsD, 'text/html', self.infeed or self.inentry or self.insource)
+
+    def _start_abstract(self, attrsD):
+        self.pushContent('description', attrsD, 'text/plain', self.infeed or self.inentry or self.insource)
+
+    def _end_description(self):
+        if self._summaryKey == 'content':
+            self._end_content()
+        else:
+            value = self.popContent('description')
+            context = self._getContext()
+            if self.intextinput:
+                context['textinput']['description'] = value
+            elif self.inimage:
+                context['image']['description'] = value
+        self._summaryKey = None
+    _end_abstract = _end_description
+
+    def _start_info(self, attrsD):
+        self.pushContent('info', attrsD, 'text/plain', 1)
+    _start_feedburner_browserfriendly = _start_info
+
+    def _end_info(self):
+        self.popContent('info')
+    _end_feedburner_browserfriendly = _end_info
+
+    def _start_generator(self, attrsD):
+        if attrsD:
+            attrsD = self._itsAnHrefDamnIt(attrsD)
+            if attrsD.has_key('href'):
+                attrsD['href'] = self.resolveURI(attrsD['href'])
+        self._getContext()['generator_detail'] = FeedParserDict(attrsD)
+        self.push('generator', 1)
+
+    def _end_generator(self):
+        value = self.pop('generator')
+        context = self._getContext()
+        if context.has_key('generator_detail'):
+            context['generator_detail']['name'] = value
+            
+    def _start_admin_generatoragent(self, attrsD):
+        self.push('generator', 1)
+        value = self._getAttribute(attrsD, 'rdf:resource')
+        if value:
+            self.elementstack[-1][2].append(value)
+        self.pop('generator')
+        self._getContext()['generator_detail'] = FeedParserDict({'href': value})
+
+    def _start_admin_errorreportsto(self, attrsD):
+        self.push('errorreportsto', 1)
+        value = self._getAttribute(attrsD, 'rdf:resource')
+        if value:
+            self.elementstack[-1][2].append(value)
+        self.pop('errorreportsto')
+        
+    def _start_summary(self, attrsD):
+        context = self._getContext()
+        if context.has_key('summary'):
+            self._summaryKey = 'content'
+            self._start_content(attrsD)
+        else:
+            self._summaryKey = 'summary'
+            self.pushContent(self._summaryKey, attrsD, 'text/plain', 1)
+    _start_itunes_summary = _start_summary
+
+    def _end_summary(self):
+        if self._summaryKey == 'content':
+            self._end_content()
+        else:
+            self.popContent(self._summaryKey or 'summary')
+        self._summaryKey = None
+    _end_itunes_summary = _end_summary
+        
+    def _start_enclosure(self, attrsD):
+        attrsD = self._itsAnHrefDamnIt(attrsD)
+        self._getContext().setdefault('enclosures', []).append(FeedParserDict(attrsD))
+        href = attrsD.get('href')
+        if href:
+            context = self._getContext()
+            if not context.get('id'):
+                context['id'] = href
+            
+    def _start_source(self, attrsD):
+        self.insource = 1
+
+    def _end_source(self):
+        self.insource = 0
+        self._getContext()['source'] = copy.deepcopy(self.sourcedata)
+        self.sourcedata.clear()
+
+    def _start_content(self, attrsD):
+        self.pushContent('content', attrsD, 'text/plain', 1)
+        src = attrsD.get('src')
+        if src:
+            self.contentparams['src'] = src
+        self.push('content', 1)
+
+    def _start_prodlink(self, attrsD):
+        self.pushContent('content', attrsD, 'text/html', 1)
+
+    def _start_body(self, attrsD):
+        self.pushContent('content', attrsD, 'application/xhtml+xml', 1)
+    _start_xhtml_body = _start_body
+
+    def _start_content_encoded(self, attrsD):
+        self.pushContent('content', attrsD, 'text/html', 1)
+    _start_fullitem = _start_content_encoded
+
+    def _end_content(self):
+        copyToDescription = self.mapContentType(self.contentparams.get('type')) in (['text/plain'] + self.html_types)
+        value = self.popContent('content')
+        if copyToDescription:
+            self._save('description', value)
+    _end_body = _end_content
+    _end_xhtml_body = _end_content
+    _end_content_encoded = _end_content
+    _end_fullitem = _end_content
+    _end_prodlink = _end_content
+
+    def _start_itunes_image(self, attrsD):
+        self.push('itunes_image', 0)
+        self._getContext()['image'] = FeedParserDict({'href': attrsD.get('href')})
+    _start_itunes_link = _start_itunes_image
+        
+    def _end_itunes_block(self):
+        value = self.pop('itunes_block', 0)
+        self._getContext()['itunes_block'] = (value == 'yes') and 1 or 0
+
+    def _end_itunes_explicit(self):
+        value = self.pop('itunes_explicit', 0)
+        self._getContext()['itunes_explicit'] = (value == 'yes') and 1 or 0
+
+if _XML_AVAILABLE:
+    class _StrictFeedParser(_FeedParserMixin, xml.sax.handler.ContentHandler):
+        def __init__(self, baseuri, baselang, encoding):
+            if _debug: sys.stderr.write('trying StrictFeedParser\n')
+            xml.sax.handler.ContentHandler.__init__(self)
+            _FeedParserMixin.__init__(self, baseuri, baselang, encoding)
+            self.bozo = 0
+            self.exc = None
+        
+        def startPrefixMapping(self, prefix, uri):
+            self.trackNamespace(prefix, uri)
+        
+        def startElementNS(self, name, qname, attrs):
+            namespace, localname = name
+            lowernamespace = str(namespace or '').lower()
+            if lowernamespace.find('backend.userland.com/rss') != -1:
+                # match any backend.userland.com namespace
+                namespace = 'http://backend.userland.com/rss'
+                lowernamespace = namespace
+            if qname and qname.find(':') > 0:
+                givenprefix = qname.split(':')[0]
+            else:
+                givenprefix = None
+            prefix = self._matchnamespaces.get(lowernamespace, givenprefix)
+            if givenprefix and (prefix == None or (prefix == '' and lowernamespace == '')) and not self.namespacesInUse.has_key(givenprefix):
+                raise UndeclaredNamespace, "'%s' is not associated with a namespace" % givenprefix
+            if prefix:
+                localname = prefix + ':' + localname
+            localname = str(localname).lower()
+            if _debug: sys.stderr.write('startElementNS: qname = %s, namespace = %s, givenprefix = %s, prefix = %s, attrs = %s, localname = %s\n' % (qname, namespace, givenprefix, prefix, attrs.items(), localname))
+
+            # qname implementation is horribly broken in Python 2.1 (it
+            # doesn't report any), and slightly broken in Python 2.2 (it
+            # doesn't report the xml: namespace). So we match up namespaces
+            # with a known list first, and then possibly override them with
+            # the qnames the SAX parser gives us (if indeed it gives us any
+            # at all).  Thanks to MatejC for helping me test this and
+            # tirelessly telling me that it didn't work yet.
+            attrsD = {}
+            for (namespace, attrlocalname), attrvalue in attrs._attrs.items():
+                lowernamespace = (namespace or '').lower()
+                prefix = self._matchnamespaces.get(lowernamespace, '')
+                if prefix:
+                    attrlocalname = prefix + ':' + attrlocalname
+                attrsD[str(attrlocalname).lower()] = attrvalue
+            for qname in attrs.getQNames():
+                attrsD[str(qname).lower()] = attrs.getValueByQName(qname)
+            self.unknown_starttag(localname, attrsD.items())
+
+        def characters(self, text):
+            self.handle_data(text)
+
+        def endElementNS(self, name, qname):
+            namespace, localname = name
+            lowernamespace = str(namespace or '').lower()
+            if qname and qname.find(':') > 0:
+                givenprefix = qname.split(':')[0]
+            else:
+                givenprefix = ''
+            prefix = self._matchnamespaces.get(lowernamespace, givenprefix)
+            if prefix:
+                localname = prefix + ':' + localname
+            localname = str(localname).lower()
+            self.unknown_endtag(localname)
+
+        def error(self, exc):
+            self.bozo = 1
+            self.exc = exc
+            
+        def fatalError(self, exc):
+            self.error(exc)
+            raise exc
+
+class _BaseHTMLProcessor(sgmllib.SGMLParser):
+    elements_no_end_tag = ['area', 'base', 'basefont', 'br', 'col', 'frame', 'hr',
+      'img', 'input', 'isindex', 'link', 'meta', 'param']
+    
+    def __init__(self, encoding):
+        self.encoding = encoding
+        if _debug: sys.stderr.write('entering BaseHTMLProcessor, encoding=%s\n' % self.encoding)
+        sgmllib.SGMLParser.__init__(self)
+        
+    def reset(self):
+        self.pieces = []
+        sgmllib.SGMLParser.reset(self)
+
+    def _shorttag_replace(self, match):
+        tag = match.group(1)
+        if tag in self.elements_no_end_tag:
+            return '<' + tag + ' />'
+        else:
+            return '<' + tag + '></' + tag + '>'
+        
+    def feed(self, data):
+        data = re.compile(r'<!((?!DOCTYPE|--|\[))', re.IGNORECASE).sub(r'&lt;!\1', data)
+        #data = re.sub(r'<(\S+?)\s*?/>', self._shorttag_replace, data) # bug [ 1399464 ] Bad regexp for _shorttag_replace
+        data = re.sub(r'<([^<\s]+?)\s*/>', self._shorttag_replace, data) 
+        data = data.replace('&#39;', "'")
+        data = data.replace('&#34;', '"')
+        if self.encoding and type(data) == type(u''):
+            data = data.encode(self.encoding)
+        sgmllib.SGMLParser.feed(self, data)
+        sgmllib.SGMLParser.close(self)
+
+    def normalize_attrs(self, attrs):
+        # utility method to be called by descendants
+        attrs = [(k.lower(), v) for k, v in attrs]
+        attrs = [(k, k in ('rel', 'type') and v.lower() or v) for k, v in attrs]
+        return attrs
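The normalization above lowercases every attribute name, but lowercases values only for `rel` and `type`, which HTML defines as case-insensitive. A standalone sketch of the same transformation:

```python
def normalize_attrs(attrs):
    # Lowercase every attribute name; lowercase values only for rel/type,
    # whose values are case-insensitive. Other values (e.g. URLs in href)
    # may be case-significant and are left alone.
    attrs = [(k.lower(), v) for k, v in attrs]
    return [(k, v.lower() if k in ('rel', 'type') else v) for k, v in attrs]

normalize_attrs([('HREF', 'http://Example.COM/x'), ('REL', 'Alternate')])
# [('href', 'http://Example.COM/x'), ('rel', 'alternate')]
```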
+
+    def unknown_starttag(self, tag, attrs):
+        # called for each start tag
+        # attrs is a list of (attr, value) tuples
+        # e.g. for <pre class='screen'>, tag='pre', attrs=[('class', 'screen')]
+        if _debug: sys.stderr.write('_BaseHTMLProcessor, unknown_starttag, tag=%s\n' % tag)
+        uattrs = []
+        # thanks to Kevin Marks for this breathtaking hack to deal with (valid) high-bit attribute values in UTF-8 feeds
+        for key, value in attrs:
+            if type(value) != type(u''):
+                value = unicode(value, self.encoding)
+            uattrs.append((unicode(key, self.encoding), value))
+        strattrs = u''.join([u' %s="%s"' % (key, value) for key, value in uattrs]).encode(self.encoding)
+        if tag in self.elements_no_end_tag:
+            self.pieces.append('<%(tag)s%(strattrs)s />' % locals())
+        else:
+            self.pieces.append('<%(tag)s%(strattrs)s>' % locals())
+
+    def unknown_endtag(self, tag):
+        # called for each end tag, e.g. for </pre>, tag will be 'pre'
+        # Reconstruct the original end tag.
+        if tag not in self.elements_no_end_tag:
+            self.pieces.append("</%(tag)s>" % locals())
+
+    def handle_charref(self, ref):
+        # called for each character reference, e.g. for '&#160;', ref will be '160'
+        # Reconstruct the original character reference.
+        self.pieces.append('&#%(ref)s;' % locals())
+        
+    def handle_entityref(self, ref):
+        # called for each entity reference, e.g. for '&copy;', ref will be 'copy'
+        # Reconstruct the original entity reference.
+        import htmlentitydefs
+        if not hasattr(htmlentitydefs, 'name2codepoint') or htmlentitydefs.name2codepoint.has_key(ref):
+            self.pieces.append('&%(ref)s;' % locals())
+        else:
+            self.pieces.append('&amp;%(ref)s' % locals())
+
+    def handle_data(self, text):
+        # called for each block of plain text, i.e. outside of any tag and
+        # not containing any character or entity references
+        # Store the original text verbatim.
+        if _debug: sys.stderr.write('_BaseHTMLProcessor, handle_text, text=%s\n' % text)
+        self.pieces.append(text)
+        
+    def handle_comment(self, text):
+        # called for each HTML comment, e.g. <!-- insert Javascript code here -->
+        # Reconstruct the original comment.
+        self.pieces.append('<!--%(text)s-->' % locals())
+        
+    def handle_pi(self, text):
+        # called for each processing instruction, e.g. <?instruction>
+        # Reconstruct original processing instruction.
+        self.pieces.append('<?%(text)s>' % locals())
+
+    def handle_decl(self, text):
+        # called for the DOCTYPE, if present, e.g.
+        # <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
+        #     "http://www.w3.org/TR/html4/loose.dtd">
+        # Reconstruct original DOCTYPE
+        self.pieces.append('<!%(text)s>' % locals())
+        
+    _new_declname_match = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9:]*\s*').match
+    def _scan_name(self, i, declstartpos):
+        rawdata = self.rawdata
+        n = len(rawdata)
+        if i == n:
+            return None, -1
+        m = self._new_declname_match(rawdata, i)
+        if m:
+            s = m.group()
+            name = s.strip()
+            if (i + len(s)) == n:
+                return None, -1  # end of buffer
+            return name.lower(), m.end()
+        else:
+            self.handle_data(rawdata)
+#            self.updatepos(declstartpos, i)
+            return None, -1
+
+    def output(self):
+        '''Return processed HTML as a single string'''
+        return ''.join([str(p) for p in self.pieces])
+
+class _LooseFeedParser(_FeedParserMixin, _BaseHTMLProcessor):
+    def __init__(self, baseuri, baselang, encoding):
+        sgmllib.SGMLParser.__init__(self)
+        _FeedParserMixin.__init__(self, baseuri, baselang, encoding)
+
+    def decodeEntities(self, element, data):
+        data = data.replace('&#60;', '&lt;')
+        data = data.replace('&#x3c;', '&lt;')
+        data = data.replace('&#x3C;', '&lt;')
+        data = data.replace('&#62;', '&gt;')
+        data = data.replace('&#x3e;', '&gt;')
+        data = data.replace('&#x3E;', '&gt;')
+        data = data.replace('&#38;', '&amp;')
+        data = data.replace('&#x26;', '&amp;')
+        data = data.replace('&#34;', '&quot;')
+        data = data.replace('&#x22;', '&quot;')
+        data = data.replace('&#39;', '&apos;')
+        data = data.replace('&#x27;', '&apos;')
+        if self.contentparams.has_key('type') and not self.contentparams.get('type', 'xml').endswith('xml'):
+            data = data.replace('&lt;', '<')
+            data = data.replace('&gt;', '>')
+            data = data.replace('&amp;', '&')
+            data = data.replace('&quot;', '"')
+            data = data.replace('&apos;', "'")
+        return data
+        
+    def strattrs(self, attrs):
+        return ''.join([' %s="%s"' % t for t in attrs])
+
+class _RelativeURIResolver(_BaseHTMLProcessor):
+    relative_uris = [('a', 'href'),
+                     ('applet', 'codebase'),
+                     ('area', 'href'),
+                     ('blockquote', 'cite'),
+                     ('body', 'background'),
+                     ('del', 'cite'),
+                     ('form', 'action'),
+                     ('frame', 'longdesc'),
+                     ('frame', 'src'),
+                     ('iframe', 'longdesc'),
+                     ('iframe', 'src'),
+                     ('head', 'profile'),
+                     ('img', 'longdesc'),
+                     ('img', 'src'),
+                     ('img', 'usemap'),
+                     ('input', 'src'),
+                     ('input', 'usemap'),
+                     ('ins', 'cite'),
+                     ('link', 'href'),
+                     ('object', 'classid'),
+                     ('object', 'codebase'),
+                     ('object', 'data'),
+                     ('object', 'usemap'),
+                     ('q', 'cite'),
+                     ('script', 'src')]
+
+    def __init__(self, baseuri, encoding):
+        _BaseHTMLProcessor.__init__(self, encoding)
+        self.baseuri = baseuri
+
+    def resolveURI(self, uri):
+        return _urljoin(self.baseuri, uri)
+    
+    def unknown_starttag(self, tag, attrs):
+        attrs = self.normalize_attrs(attrs)
+        attrs = [(key, ((tag, key) in self.relative_uris) and self.resolveURI(value) or value) for key, value in attrs]
+        _BaseHTMLProcessor.unknown_starttag(self, tag, attrs)
+        
+def _resolveRelativeURIs(htmlSource, baseURI, encoding):
+    if _debug: sys.stderr.write('entering _resolveRelativeURIs\n')
+    p = _RelativeURIResolver(baseURI, encoding)
+    p.feed(htmlSource)
+    return p.output()
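The `_urljoin` call behind `resolveURI` follows standard relative-reference resolution. In modern Python the equivalent is `urllib.parse.urljoin` (shown here as an illustration of the semantics, not the module's own helper):

```python
from urllib.parse import urljoin  # modern equivalent of _urljoin

base = 'http://example.com/blog/entry.html'
urljoin(base, 'images/logo.png')    # resolved against the base's directory
urljoin(base, '/favicon.ico')       # root-relative: keeps only the host
urljoin(base, 'http://other.org/')  # already absolute: returned unchanged
```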
+
+class _HTMLSanitizer(_BaseHTMLProcessor):
+    acceptable_elements = ['a', 'abbr', 'acronym', 'address', 'area', 'b', 'big',
+      'blockquote', 'br', 'button', 'caption', 'center', 'cite', 'code', 'col',
+      'colgroup', 'dd', 'del', 'dfn', 'dir', 'div', 'dl', 'dt', 'em', 'fieldset',
+      'font', 'form', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hr', 'i', 'img', 'input',
+      'ins', 'kbd', 'label', 'legend', 'li', 'map', 'menu', 'ol', 'optgroup',
+      'option', 'p', 'pre', 'q', 's', 'samp', 'select', 'small', 'span', 'strike',
+      'strong', 'sub', 'sup', 'table', 'tbody', 'td', 'textarea', 'tfoot', 'th',
+      'thead', 'tr', 'tt', 'u', 'ul', 'var']
+
+    acceptable_attributes = ['abbr', 'accept', 'accept-charset', 'accesskey',
+      'action', 'align', 'alt', 'axis', 'border', 'cellpadding', 'cellspacing',
+      'char', 'charoff', 'charset', 'checked', 'cite', 'class', 'clear', 'cols',
+      'colspan', 'color', 'compact', 'coords', 'datetime', 'dir', 'disabled',
+      'enctype', 'for', 'frame', 'headers', 'height', 'href', 'hreflang', 'hspace',
+      'id', 'ismap', 'label', 'lang', 'longdesc', 'maxlength', 'media', 'method',
+      'multiple', 'name', 'nohref', 'noshade', 'nowrap', 'prompt', 'readonly',
+      'rel', 'rev', 'rows', 'rowspan', 'rules', 'scope', 'selected', 'shape', 'size',
+      'span', 'src', 'start', 'summary', 'tabindex', 'target', 'title', 'type',
+      'usemap', 'valign', 'value', 'vspace', 'width', 'xml:lang']
+
+    unacceptable_elements_with_end_tag = ['script', 'applet']
+
+    def reset(self):
+        _BaseHTMLProcessor.reset(self)
+        self.unacceptablestack = 0
+        
+    def unknown_starttag(self, tag, attrs):
+        if not tag in self.acceptable_elements:
+            if tag in self.unacceptable_elements_with_end_tag:
+                self.unacceptablestack += 1
+            return
+        attrs = self.normalize_attrs(attrs)
+        attrs = [(key, value) for key, value in attrs if key in self.acceptable_attributes]
+        _BaseHTMLProcessor.unknown_starttag(self, tag, attrs)
+        
+    def unknown_endtag(self, tag):
+        if not tag in self.acceptable_elements:
+            if tag in self.unacceptable_elements_with_end_tag:
+                self.unacceptablestack -= 1
+            return
+        _BaseHTMLProcessor.unknown_endtag(self, tag)
+
+    def handle_pi(self, text):
+        pass
+
+    def handle_decl(self, text):
+        pass
+
+    def handle_data(self, text):
+        if not self.unacceptablestack:
+            _BaseHTMLProcessor.handle_data(self, text)
+
+def _sanitizeHTML(htmlSource, encoding):
+    p = _HTMLSanitizer(encoding)
+    p.feed(htmlSource)
+    data = p.output()
+    if TIDY_MARKUP:
+        # loop through list of preferred Tidy interfaces looking for one that's installed,
+        # then set up a common _tidy function to wrap the interface-specific API.
+        _tidy = None
+        for tidy_interface in PREFERRED_TIDY_INTERFACES:
+            try:
+                if tidy_interface == "uTidy":
+                    from tidy import parseString as _utidy
+                    def _tidy(data, **kwargs):
+                        return str(_utidy(data, **kwargs))
+                    break
+                elif tidy_interface == "mxTidy":
+                    from mx.Tidy import Tidy as _mxtidy
+                    def _tidy(data, **kwargs):
+                        nerrors, nwarnings, data, errordata = _mxtidy.tidy(data, **kwargs)
+                        return data
+                    break
+            except:
+                pass
+        if _tidy:
+            utf8 = type(data) == type(u'')
+            if utf8:
+                data = data.encode('utf-8')
+            data = _tidy(data, output_xhtml=1, numeric_entities=1, wrap=0, char_encoding="utf8")
+            if utf8:
+                data = unicode(data, 'utf-8')
+            if data.count('<body'):
+                data = data.split('<body', 1)[1]
+                if data.count('>'):
+                    data = data.split('>', 1)[1]
+            if data.count('</body'):
+                data = data.split('</body', 1)[0]
+    data = data.strip().replace('\r\n', '\n')
+    return data
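The whitelist approach used by `_HTMLSanitizer` can be sketched with Python 3's `html.parser` (the Python 2 `sgmllib` it builds on was removed in Python 3). The whitelists here are abridged, and unlike the real class this sketch drops all attributes rather than filtering through `acceptable_attributes`:

```python
from html.parser import HTMLParser

# Abridged whitelists (assumption; the real lists are much longer)
ACCEPTABLE = {'a', 'b', 'em', 'i', 'p', 'pre', 'ul', 'li'}
STRIP_WITH_CONTENT = {'script', 'applet'}

class Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__()
        self.pieces = []
        self.depth = 0  # nesting depth inside stripped-with-content elements

    def handle_starttag(self, tag, attrs):
        if tag not in ACCEPTABLE:
            if tag in STRIP_WITH_CONTENT:
                self.depth += 1
            return  # unknown tags are dropped; their text content is kept
        self.pieces.append('<%s>' % tag)  # attributes dropped in this sketch

    def handle_endtag(self, tag):
        if tag not in ACCEPTABLE:
            if tag in STRIP_WITH_CONTENT:
                self.depth -= 1
            return
        self.pieces.append('</%s>' % tag)

    def handle_data(self, data):
        # Text inside script/applet is suppressed entirely
        if not self.depth:
            self.pieces.append(data)

def sanitize(html):
    p = Sanitizer()
    p.feed(html)
    p.close()
    return ''.join(p.pieces)

sanitize('<p>hi<script>evil()</script></p>')  # '<p>hi</p>'
```

Note the two failure modes being distinguished: a merely unknown tag loses its markup but keeps its text, while `script`/`applet` lose their contents as well.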
+
+class _FeedURLHandler(urllib2.HTTPDigestAuthHandler, urllib2.HTTPRedirectHandler, urllib2.HTTPDefaultErrorHandler):
+    def http_error_default(self, req, fp, code, msg, headers):
+        if ((code / 100) == 3) and (code != 304):
+            return self.http_error_302(req, fp, code, msg, headers)
+        infourl = urllib.addinfourl(fp, headers, req.get_full_url())
+        infourl.status = code
+        return infourl
+
+    def http_error_302(self, req, fp, code, msg, headers):
+        if headers.dict.has_key('location'):
+            infourl = urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)
+        else:
+            infourl = urllib.addinfourl(fp, headers, req.get_full_url())
+        if not hasattr(infourl, 'status'):
+            infourl.status = code
+        return infourl
+
+    def http_error_301(self, req, fp, code, msg, headers):
+        if headers.dict.has_key('location'):
+            infourl = urllib2.HTTPRedirectHandler.http_error_301(self, req, fp, code, msg, headers)
+        else:
+            infourl = urllib.addinfourl(fp, headers, req.get_full_url())
+        if not hasattr(infourl, 'status'):
+            infourl.status = code
+        return infourl
+
+    http_error_300 = http_error_302
+    http_error_303 = http_error_302
+    http_error_307 = http_error_302
+        
+    def http_error_401(self, req, fp, code, msg, headers):
+        # Check if
+        # - server requires digest auth, AND
+        # - we tried (unsuccessfully) with basic auth, AND
+        # - we're using Python 2.3.3 or later (digest auth is irreparably broken in earlier versions)
+        # If all conditions hold, parse authentication information
+        # out of the Authorization header we sent the first time
+        # (for the username and password) and the WWW-Authenticate
+        # header the server sent back (for the realm) and retry
+        # the request with the appropriate digest auth headers instead.
+        # This evil genius hack has been brought to you by Aaron Swartz.
+        host = urlparse.urlparse(req.get_full_url())[1]
+        try:
+            assert sys.version.split()[0] >= '2.3.3'
+            assert base64 != None
+            user, passw = base64.decodestring(req.headers['Authorization'].split(' ')[1]).split(':')
+            realm = re.findall('realm="([^"]*)"', headers['WWW-Authenticate'])[0]
+            self.add_password(realm, host, user, passw)
+            retry = self.http_error_auth_reqed('www-authenticate', host, req, headers)
+            self.reset_retry_count()
+            return retry
+        except:
+            return self.http_error_default(req, fp, code, msg, headers)
+
+def _open_resource(url_file_stream_or_string, etag, modified, agent, referrer, handlers):
+    """URL, filename, or string --> stream
+
+    This function lets you define parsers that take any input source
+    (URL, pathname to local or network file, or actual data as a string)
+    and deal with it in a uniform manner.  Returned object is guaranteed
+    to have all the basic stdio read methods (read, readline, readlines).
+    Just .close() the object when you're done with it.
+
+    If the etag argument is supplied, it will be used as the value of an
+    If-None-Match request header.
+
+    If the modified argument is supplied, it must be a tuple of 9 integers
+    as returned by gmtime() in the standard Python time module. This MUST
+    be in GMT (Greenwich Mean Time). The formatted date/time will be used
+    as the value of an If-Modified-Since request header.
+
+    If the agent argument is supplied, it will be used as the value of a
+    User-Agent request header.
+
+    If the referrer argument is supplied, it will be used as the value of a
+    Referer[sic] request header.
+
+    If handlers is supplied, it is a list of handlers used to build a
+    urllib2 opener.
+    """
+
+    if hasattr(url_file_stream_or_string, 'read'):
+        return url_file_stream_or_string
+
+    if url_file_stream_or_string == '-':
+        return sys.stdin
+
+    if urlparse.urlparse(url_file_stream_or_string)[0] in ('http', 'https', 'ftp'):
+        if not agent:
+            agent = USER_AGENT
+        # test for inline user:password for basic auth
+        auth = None
+        if base64:
+            urltype, rest = urllib.splittype(url_file_stream_or_string)
+            realhost, rest = urllib.splithost(rest)
+            if realhost:
+                user_passwd, realhost = urllib.splituser(realhost)
+                if user_passwd:
+                    url_file_stream_or_string = '%s://%s%s' % (urltype, realhost, rest)
+                    auth = base64.encodestring(user_passwd).strip()
+        # try to open with urllib2 (to use optional headers)
+        request = urllib2.Request(url_file_stream_or_string)
+        request.add_header('User-Agent', agent)
+        if etag:
+            request.add_header('If-None-Match', etag)
+        if modified:
+            # format into an RFC 1123-compliant timestamp. We can't use
+            # time.strftime() since the %a and %b directives can be affected
+            # by the current locale, but RFC 2616 states that dates must be
+            # in English.
+            short_weekdays = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
+            months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
+            request.add_header('If-Modified-Since', '%s, %02d %s %04d %02d:%02d:%02d GMT' % (short_weekdays[modified[6]], modified[2], months[modified[1] - 1], modified[0], modified[3], modified[4], modified[5]))
+        if referrer:
+            request.add_header('Referer', referrer)
+        if gzip and zlib:
+            request.add_header('Accept-encoding', 'gzip, deflate')
+        elif gzip:
+            request.add_header('Accept-encoding', 'gzip')
+        elif zlib:
+            request.add_header('Accept-encoding', 'deflate')
+        else:
+            request.add_header('Accept-encoding', '')
+        if auth:
+            request.add_header('Authorization', 'Basic %s' % auth)
+        if ACCEPT_HEADER:
+            request.add_header('Accept', ACCEPT_HEADER)
+        request.add_header('A-IM', 'feed') # RFC 3229 support
+        opener = apply(urllib2.build_opener, tuple([_FeedURLHandler()] + handlers))
+        opener.addheaders = [] # RMK - must clear so we only send our custom User-Agent
+        try:
+            return opener.open(request)
+        finally:
+            opener.close() # JohnD
+    
+    # try to open with native open function (if url_file_stream_or_string is a filename)
+    try:
+        return open(url_file_stream_or_string)
+    except:
+        pass
+
+    # treat url_file_stream_or_string as string
+    return _StringIO(str(url_file_stream_or_string))
+
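The If-Modified-Since header above is built by hand because `time.strftime()` localizes `%a` and `%b`, while RFC 2616 requires English names. A minimal standalone sketch of that technique (illustrative only, not part of the checked-in file):

```python
import time

# RFC 2616 requires English day/month names, so avoid locale-aware strftime().
# List order matches tm_wday from time.gmtime(): Monday is 0.
SHORT_WEEKDAYS = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

def rfc1123_date(gmt_tuple):
    """Format a 9-tuple from time.gmtime() as an RFC 1123 date."""
    return '%s, %02d %s %04d %02d:%02d:%02d GMT' % (
        SHORT_WEEKDAYS[gmt_tuple[6]], gmt_tuple[2], MONTHS[gmt_tuple[1] - 1],
        gmt_tuple[0], gmt_tuple[3], gmt_tuple[4], gmt_tuple[5])

print(rfc1123_date(time.gmtime(0)))  # Thu, 01 Jan 1970 00:00:00 GMT
```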
+_date_handlers = []
+def registerDateHandler(func):
+    '''Register a date handler function (takes string, returns 9-tuple date in GMT)'''
+    _date_handlers.insert(0, func)
+    
+# ISO-8601 date parsing routines written by Fazal Majid.
+# The ISO 8601 standard is very convoluted and irregular - a full ISO 8601
+# parser is beyond the scope of feedparser and would be a worthwhile addition
+# to the Python library.
+# A single regular expression cannot parse ISO 8601 date formats into groups
+# as the standard is highly irregular (for instance is 030104 2003-01-04 or
+# 0301-04-01), so we use templates instead.
+# Please note the order of the templates is significant: longer forms must
+# come first so that the longest matching template wins.
+_iso8601_tmpl = ['YYYY-?MM-?DD', 'YYYY-MM', 'YYYY-?OOO',
+                'YY-?MM-?DD', 'YY-?OOO', 'YYYY', 
+                '-YY-?MM', '-OOO', '-YY',
+                '--MM-?DD', '--MM',
+                '---DD',
+                'CC', '']
+_iso8601_re = [
+    tmpl.replace(
+    'YYYY', r'(?P<year>\d{4})').replace(
+    'YY', r'(?P<year>\d\d)').replace(
+    'MM', r'(?P<month>[01]\d)').replace(
+    'DD', r'(?P<day>[0123]\d)').replace(
+    'OOO', r'(?P<ordinal>[0123]\d\d)').replace(
+    'CC', r'(?P<century>\d\d$)')
+    + r'(T?(?P<hour>\d{2}):(?P<minute>\d{2})'
+    + r'(:(?P<second>\d{2}))?'
+    + r'(?P<tz>[+-](?P<tzhour>\d{2})(:(?P<tzmin>\d{2}))?|Z)?)?'
+    for tmpl in _iso8601_tmpl]
+del tmpl
+_iso8601_matches = [re.compile(regex).match for regex in _iso8601_re]
+del regex
+def _parse_date_iso8601(dateString):
+    '''Parse a variety of ISO-8601-compatible formats like 20040105'''
+    m = None
+    for _iso8601_match in _iso8601_matches:
+        m = _iso8601_match(dateString)
+        if m: break
+    if not m: return
+    if m.span() == (0, 0): return
+    params = m.groupdict()
+    ordinal = params.get('ordinal', 0)
+    if ordinal:
+        ordinal = int(ordinal)
+    else:
+        ordinal = 0
+    year = params.get('year', '--')
+    if not year or year == '--':
+        year = time.gmtime()[0]
+    elif len(year) == 2:
+        # ISO 8601 assumes current century, i.e. 93 -> 2093, NOT 1993
+        year = 100 * int(time.gmtime()[0] / 100) + int(year)
+    else:
+        year = int(year)
+    month = params.get('month', '-')
+    if not month or month == '-':
+        # ordinals are NOT normalized by mktime, we simulate them
+        # by setting month=1, day=ordinal
+        if ordinal:
+            month = 1
+        else:
+            month = time.gmtime()[1]
+    month = int(month)
+    day = params.get('day', 0)
+    if not day:
+        # see above
+        if ordinal:
+            day = ordinal
+        elif params.get('century', 0) or \
+                 params.get('year', 0) or params.get('month', 0):
+            day = 1
+        else:
+            day = time.gmtime()[2]
+    else:
+        day = int(day)
+    # special case of the century - is the first year of the 21st century
+    # 2000 or 2001? The debate goes on...
+    if 'century' in params.keys():
+        year = (int(params['century']) - 1) * 100 + 1
+    # in ISO 8601 most fields are optional
+    for field in ['hour', 'minute', 'second', 'tzhour', 'tzmin']:
+        if not params.get(field, None):
+            params[field] = 0
+    hour = int(params.get('hour', 0))
+    minute = int(params.get('minute', 0))
+    second = int(params.get('second', 0))
+    # weekday is normalized by mktime(), we can ignore it
+    weekday = 0
+    # daylight savings is complex, but not needed for feedparser's purposes
+    # as time zones, if specified, include mention of whether it is active
+    # (e.g. PST vs. PDT, CET). Using -1 is implementation-dependent,
+    # and most implementations have DST bugs
+    daylight_savings_flag = 0
+    tm = [year, month, day, hour, minute, second, weekday,
+          ordinal, daylight_savings_flag]
+    # ISO 8601 time zone adjustments
+    tz = params.get('tz')
+    if tz and tz != 'Z':
+        if tz[0] == '-':
+            tm[3] += int(params.get('tzhour', 0))
+            tm[4] += int(params.get('tzmin', 0))
+        elif tz[0] == '+':
+            tm[3] -= int(params.get('tzhour', 0))
+            tm[4] -= int(params.get('tzmin', 0))
+        else:
+            return None
+    # Python's time.mktime() is a wrapper around the ANSI C mktime(3c)
+    # which is guaranteed to normalize d/m/y/h/m/s.
+    # Many implementations have bugs, but we'll pretend they don't.
+    return time.localtime(time.mktime(tm))
+registerDateHandler(_parse_date_iso8601)
+    
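The template table above is expanded into named-group regexes by chained `replace()` calls. An illustrative sketch of that expansion on a single template, showing why the compact form `20040105` parses cleanly:

```python
import re

# One template from the list above, expanded the same way the code does:
# each placeholder becomes a named group, and '-?' makes separators optional.
tmpl = 'YYYY-?MM-?DD'
pattern = (tmpl.replace('YYYY', r'(?P<year>\d{4})')
               .replace('MM', r'(?P<month>[01]\d)')
               .replace('DD', r'(?P<day>[0123]\d)'))

m = re.match(pattern, '20040105')   # compact form: separators absent
print(m.groupdict())
```

The same pattern also matches the separated form `2004-01-05`, which is why one template covers both spellings.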
+# 8-bit date handling routines written by ytrewq1.
+_korean_year  = u'\ub144' # b3e2 in euc-kr
+_korean_month = u'\uc6d4' # bff9 in euc-kr
+_korean_day   = u'\uc77c' # c0cf in euc-kr
+_korean_am    = u'\uc624\uc804' # bfc0 c0fc in euc-kr
+_korean_pm    = u'\uc624\ud6c4' # bfc0 c8c4 in euc-kr
+
+_korean_onblog_date_re = \
+    re.compile('(\d{4})%s\s+(\d{2})%s\s+(\d{2})%s\s+(\d{2}):(\d{2}):(\d{2})' % \
+               (_korean_year, _korean_month, _korean_day))
+_korean_nate_date_re = \
+    re.compile(u'(\d{4})-(\d{2})-(\d{2})\s+(%s|%s)\s+(\d{,2}):(\d{,2}):(\d{,2})' % \
+               (_korean_am, _korean_pm))
+def _parse_date_onblog(dateString):
+    '''Parse a string according to the OnBlog 8-bit date format'''
+    m = _korean_onblog_date_re.match(dateString)
+    if not m: return
+    w3dtfdate = '%(year)s-%(month)s-%(day)sT%(hour)s:%(minute)s:%(second)s%(zonediff)s' % \
+                {'year': m.group(1), 'month': m.group(2), 'day': m.group(3),\
+                 'hour': m.group(4), 'minute': m.group(5), 'second': m.group(6),\
+                 'zonediff': '+09:00'}
+    if _debug: sys.stderr.write('OnBlog date parsed as: %s\n' % w3dtfdate)
+    return _parse_date_w3dtf(w3dtfdate)
+registerDateHandler(_parse_date_onblog)
+
+def _parse_date_nate(dateString):
+    '''Parse a string according to the Nate 8-bit date format'''
+    m = _korean_nate_date_re.match(dateString)
+    if not m: return
+    hour = int(m.group(5))
+    ampm = m.group(4)
+    if (ampm == _korean_pm):
+        hour += 12
+    hour = str(hour)
+    if len(hour) == 1:
+        hour = '0' + hour
+    w3dtfdate = '%(year)s-%(month)s-%(day)sT%(hour)s:%(minute)s:%(second)s%(zonediff)s' % \
+                {'year': m.group(1), 'month': m.group(2), 'day': m.group(3),\
+                 'hour': hour, 'minute': m.group(6), 'second': m.group(7),\
+                 'zonediff': '+09:00'}
+    if _debug: sys.stderr.write('Nate date parsed as: %s\n' % w3dtfdate)
+    return _parse_date_w3dtf(w3dtfdate)
+registerDateHandler(_parse_date_nate)
+
+_mssql_date_re = \
+    re.compile('(\d{4})-(\d{2})-(\d{2})\s+(\d{2}):(\d{2}):(\d{2})(\.\d+)?')
+def _parse_date_mssql(dateString):
+    '''Parse a string according to the MS SQL date format'''
+    m = _mssql_date_re.match(dateString)
+    if not m: return
+    w3dtfdate = '%(year)s-%(month)s-%(day)sT%(hour)s:%(minute)s:%(second)s%(zonediff)s' % \
+                {'year': m.group(1), 'month': m.group(2), 'day': m.group(3),\
+                 'hour': m.group(4), 'minute': m.group(5), 'second': m.group(6),\
+                 'zonediff': '+09:00'}
+    if _debug: sys.stderr.write('MS SQL date parsed as: %s\n' % w3dtfdate)
+    return _parse_date_w3dtf(w3dtfdate)
+registerDateHandler(_parse_date_mssql)
+
+# Unicode strings for Greek date strings
+_greek_months = \
+  { \
+   u'\u0399\u03b1\u03bd': u'Jan',       # c9e1ed in iso-8859-7
+   u'\u03a6\u03b5\u03b2': u'Feb',       # d6e5e2 in iso-8859-7
+   u'\u039c\u03ac\u03ce': u'Mar',       # ccdcfe in iso-8859-7
+   u'\u039c\u03b1\u03ce': u'Mar',       # cce1fe in iso-8859-7
+   u'\u0391\u03c0\u03c1': u'Apr',       # c1f0f1 in iso-8859-7
+   u'\u039c\u03ac\u03b9': u'May',       # ccdce9 in iso-8859-7
+   u'\u039c\u03b1\u03ca': u'May',       # cce1fa in iso-8859-7
+   u'\u039c\u03b1\u03b9': u'May',       # cce1e9 in iso-8859-7
+   u'\u0399\u03bf\u03cd\u03bd': u'Jun', # c9effded in iso-8859-7
+   u'\u0399\u03bf\u03bd': u'Jun',       # c9efed in iso-8859-7
+   u'\u0399\u03bf\u03cd\u03bb': u'Jul', # c9effdeb in iso-8859-7
+   u'\u0399\u03bf\u03bb': u'Jul',       # c9f9eb in iso-8859-7
+   u'\u0391\u03cd\u03b3': u'Aug',       # c1fde3 in iso-8859-7
+   u'\u0391\u03c5\u03b3': u'Aug',       # c1f5e3 in iso-8859-7
+   u'\u03a3\u03b5\u03c0': u'Sep',       # d3e5f0 in iso-8859-7
+   u'\u039f\u03ba\u03c4': u'Oct',       # cfeaf4 in iso-8859-7
+   u'\u039d\u03bf\u03ad': u'Nov',       # cdefdd in iso-8859-7
+   u'\u039d\u03bf\u03b5': u'Nov',       # cdefe5 in iso-8859-7
+   u'\u0394\u03b5\u03ba': u'Dec',       # c4e5ea in iso-8859-7
+  }
+
+_greek_wdays = \
+  { \
+   u'\u039a\u03c5\u03c1': u'Sun', # caf5f1 in iso-8859-7
+   u'\u0394\u03b5\u03c5': u'Mon', # c4e5f5 in iso-8859-7
+   u'\u03a4\u03c1\u03b9': u'Tue', # d4f1e9 in iso-8859-7
+   u'\u03a4\u03b5\u03c4': u'Wed', # d4e5f4 in iso-8859-7
+   u'\u03a0\u03b5\u03bc': u'Thu', # d0e5ec in iso-8859-7
+   u'\u03a0\u03b1\u03c1': u'Fri', # d0e1f1 in iso-8859-7
+   u'\u03a3\u03b1\u03b2': u'Sat', # d3e1e2 in iso-8859-7   
+  }
+
+_greek_date_format_re = \
+    re.compile(u'([^,]+),\s+(\d{2})\s+([^\s]+)\s+(\d{4})\s+(\d{2}):(\d{2}):(\d{2})\s+([^\s]+)')
+
+def _parse_date_greek(dateString):
+    '''Parse a string according to a Greek 8-bit date format.'''
+    m = _greek_date_format_re.match(dateString)
+    if not m: return
+    try:
+        wday = _greek_wdays[m.group(1)]
+        month = _greek_months[m.group(3)]
+    except:
+        return
+    rfc822date = '%(wday)s, %(day)s %(month)s %(year)s %(hour)s:%(minute)s:%(second)s %(zonediff)s' % \
+                 {'wday': wday, 'day': m.group(2), 'month': month, 'year': m.group(4),\
+                  'hour': m.group(5), 'minute': m.group(6), 'second': m.group(7),\
+                  'zonediff': m.group(8)}
+    if _debug: sys.stderr.write('Greek date parsed as: %s\n' % rfc822date)
+    return _parse_date_rfc822(rfc822date)
+registerDateHandler(_parse_date_greek)
+
+# Unicode strings for Hungarian date strings
+_hungarian_months = \
+  { \
+    u'janu\u00e1r':   u'01',  # e1 in iso-8859-2
+    u'febru\u00e1r':  u'02',  # e1 in iso-8859-2
+    u'm\u00e1rcius':  u'03',  # e1 in iso-8859-2
+    u'\u00e1prilis':  u'04',  # e1 in iso-8859-2
+    u'm\u00e1jus':    u'05',  # e1 in iso-8859-2
+    u'j\u00fanius':   u'06',  # fa in iso-8859-2
+    u'j\u00falius':   u'07',  # fa in iso-8859-2
+    u'augusztus':     u'08',
+    u'szeptember':    u'09',
+    u'okt\u00f3ber':  u'10',  # f3 in iso-8859-2
+    u'november':      u'11',
+    u'december':      u'12',
+  }
+
+_hungarian_date_format_re = \
+  re.compile(u'(\d{4})-([^-]+)-(\d{,2})T(\d{,2}):(\d{2})((\+|-)(\d{,2}:\d{2}))')
+
+def _parse_date_hungarian(dateString):
+    '''Parse a string according to a Hungarian 8-bit date format.'''
+    m = _hungarian_date_format_re.match(dateString)
+    if not m: return
+    try:
+        month = _hungarian_months[m.group(2)]
+        day = m.group(3)
+        if len(day) == 1:
+            day = '0' + day
+        hour = m.group(4)
+        if len(hour) == 1:
+            hour = '0' + hour
+    except:
+        return
+    w3dtfdate = '%(year)s-%(month)s-%(day)sT%(hour)s:%(minute)s%(zonediff)s' % \
+                {'year': m.group(1), 'month': month, 'day': day,\
+                 'hour': hour, 'minute': m.group(5),\
+                 'zonediff': m.group(6)}
+    if _debug: sys.stderr.write('Hungarian date parsed as: %s\n' % w3dtfdate)
+    return _parse_date_w3dtf(w3dtfdate)
+registerDateHandler(_parse_date_hungarian)
+
+# W3DTF-style date parsing adapted from PyXML xml.utils.iso8601, written by
+# Drake and licensed under the Python license.  Removed all range checking
+# for month, day, hour, minute, and second, since mktime will normalize
+# these later
+def _parse_date_w3dtf(dateString):
+    def __extract_date(m):
+        year = int(m.group('year'))
+        if year < 100:
+            year = 100 * int(time.gmtime()[0] / 100) + int(year)
+        if year < 1000:
+            return 0, 0, 0
+        julian = m.group('julian')
+        if julian:
+            julian = int(julian)
+            month = julian / 30 + 1
+            day = julian % 30 + 1
+            jday = None
+            while jday != julian:
+                t = time.mktime((year, month, day, 0, 0, 0, 0, 0, 0))
+                jday = time.gmtime(t)[-2]
+                diff = abs(jday - julian)
+                if jday > julian:
+                    if diff < day:
+                        day = day - diff
+                    else:
+                        month = month - 1
+                        day = 31
+                elif jday < julian:
+                    if day + diff < 28:
+                        day = day + diff
+                    else:
+                        month = month + 1
+            return year, month, day
+        month = m.group('month')
+        day = 1
+        if month is None:
+            month = 1
+        else:
+            month = int(month)
+            day = m.group('day')
+            if day:
+                day = int(day)
+            else:
+                day = 1
+        return year, month, day
+
+    def __extract_time(m):
+        if not m:
+            return 0, 0, 0
+        hours = m.group('hours')
+        if not hours:
+            return 0, 0, 0
+        hours = int(hours)
+        minutes = int(m.group('minutes'))
+        seconds = m.group('seconds')
+        if seconds:
+            seconds = int(seconds)
+        else:
+            seconds = 0
+        return hours, minutes, seconds
+
+    def __extract_tzd(m):
+        '''Return the Time Zone Designator as an offset in seconds from UTC.'''
+        if not m:
+            return 0
+        tzd = m.group('tzd')
+        if not tzd:
+            return 0
+        if tzd == 'Z':
+            return 0
+        hours = int(m.group('tzdhours'))
+        minutes = m.group('tzdminutes')
+        if minutes:
+            minutes = int(minutes)
+        else:
+            minutes = 0
+        offset = (hours*60 + minutes) * 60
+        if tzd[0] == '+':
+            return -offset
+        return offset
+
+    __date_re = ('(?P<year>\d\d\d\d)'
+                 '(?:(?P<dsep>-|)'
+                 '(?:(?P<julian>\d\d\d)'
+                 '|(?P<month>\d\d)(?:(?P=dsep)(?P<day>\d\d))?))?')
+    __tzd_re = '(?P<tzd>[-+](?P<tzdhours>\d\d)(?::?(?P<tzdminutes>\d\d))|Z)'
+    __tzd_rx = re.compile(__tzd_re)
+    __time_re = ('(?P<hours>\d\d)(?P<tsep>:|)(?P<minutes>\d\d)'
+                 '(?:(?P=tsep)(?P<seconds>\d\d(?:[.,]\d+)?))?'
+                 + __tzd_re)
+    __datetime_re = '%s(?:T%s)?' % (__date_re, __time_re)
+    __datetime_rx = re.compile(__datetime_re)
+    m = __datetime_rx.match(dateString)
+    if (m is None) or (m.group() != dateString): return
+    gmt = __extract_date(m) + __extract_time(m) + (0, 0, 0)
+    if gmt[0] == 0: return
+    return time.gmtime(time.mktime(gmt) + __extract_tzd(m) - time.timezone)
+registerDateHandler(_parse_date_w3dtf)
+
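`_parse_date_w3dtf` applies the time zone designator via `time.mktime()` plus `time.timezone`, which depends on the host's local zone. An independent sketch of the same adjustment using `calendar.timegm`, which is host-timezone-independent (the helper name and the minutes-based offset convention are mine):

```python
import calendar
import time

def w3dtf_to_utc_tuple(y, mo, d, h, mi, s, tz_offset_minutes):
    """Convert a W3C-DTF wall-clock time plus its UTC offset (in minutes,
    positive east of UTC) to a UTC 9-tuple, the form the parser returns."""
    # calendar.timegm treats the tuple as UTC, so subtract the offset
    # to move the local wall-clock reading back to UTC.
    stamp = calendar.timegm((y, mo, d, h, mi, s, 0, 0, 0))
    return time.gmtime(stamp - tz_offset_minutes * 60)

# 2003-12-31T10:14:55-08:00 is 18:14:55 UTC
t = w3dtf_to_utc_tuple(2003, 12, 31, 10, 14, 55, -8 * 60)
print(t[:6])
```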
+def _parse_date_rfc822(dateString):
+    '''Parse an RFC822, RFC1123, RFC2822, or asctime-style date'''
+    data = dateString.split()
+    if data[0][-1] in (',', '.') or data[0].lower() in rfc822._daynames:
+        del data[0]
+    if len(data) == 4:
+        s = data[3]
+        i = s.find('+')
+        if i > 0:
+            data[3:] = [s[:i], s[i+1:]]
+        else:
+            data.append('')
+        dateString = " ".join(data)
+    if len(data) < 5:
+        dateString += ' 00:00:00 GMT'
+    tm = rfc822.parsedate_tz(dateString)
+    if tm:
+        return time.gmtime(rfc822.mktime_tz(tm))
+# rfc822.py defines several time zones, but we define some extra ones.
+# 'ET' is equivalent to 'EST', etc.
+_additional_timezones = {'AT': -400, 'ET': -500, 'CT': -600, 'MT': -700, 'PT': -800}
+rfc822._timezones.update(_additional_timezones)
+registerDateHandler(_parse_date_rfc822)    
+
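The handler above normalizes an RFC 822 date to a GMT 9-tuple via `rfc822.parsedate_tz` and `rfc822.mktime_tz`. A sketch of the same normalization using the stdlib's successor API, `email.utils` (which supersedes the `rfc822` module in later Pythons), without the extra whitespace fixups:

```python
import time
from email.utils import parsedate_tz, mktime_tz  # rfc822's modern successor

def parse_rfc822(date_string):
    """Return an RFC 822 date as a 9-tuple in GMT, or None if unparseable."""
    tm = parsedate_tz(date_string)
    if tm:
        # mktime_tz applies the parsed zone offset, yielding a UTC timestamp
        return time.gmtime(mktime_tz(tm))

t = parse_rfc822('Thu, 01 Jan 2004 19:48:21 -0800')
print(t[:6])  # the -0800 offset rolls the date into 02 Jan UTC
```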
+def _parse_date(dateString):
+    '''Parses a variety of date formats into a 9-tuple in GMT'''
+    for handler in _date_handlers:
+        try:
+            date9tuple = handler(dateString)
+            if not date9tuple: continue
+            if len(date9tuple) != 9:
+                if _debug: sys.stderr.write('date handler function must return 9-tuple\n')
+                raise ValueError
+            map(int, date9tuple)
+            return date9tuple
+        except Exception, e:
+            if _debug: sys.stderr.write('%s raised %s\n' % (handler.__name__, repr(e)))
+            pass
+    return None
+
+def _getCharacterEncoding(http_headers, xml_data):
+    '''Get the character encoding of the XML document
+
+    http_headers is a dictionary
+    xml_data is a raw string (not Unicode)
+    
+    This is so much trickier than it sounds, it's not even funny.
+    According to RFC 3023 ('XML Media Types'), if the HTTP Content-Type
+    is application/xml, application/*+xml,
+    application/xml-external-parsed-entity, or application/xml-dtd,
+    the encoding given in the charset parameter of the HTTP Content-Type
+    takes precedence over the encoding given in the XML prefix within the
+    document, and defaults to 'utf-8' if neither are specified.  But, if
+    the HTTP Content-Type is text/xml, text/*+xml, or
+    text/xml-external-parsed-entity, the encoding given in the XML prefix
+    within the document is ALWAYS IGNORED and only the encoding given in
+    the charset parameter of the HTTP Content-Type header should be
+    respected, and it defaults to 'us-ascii' if not specified.
+
+    Furthermore, discussion on the atom-syntax mailing list with the
+    author of RFC 3023 leads me to the conclusion that any document
+    served with a Content-Type of text/* and no charset parameter
+    must be treated as us-ascii.  (We now do this.)  And also that it
+    must always be flagged as non-well-formed.  (We now do this too.)
+    
+    If Content-Type is unspecified (input was local file or non-HTTP source)
+    or unrecognized (server just got it totally wrong), then go by the
+    encoding given in the XML prefix of the document and default to
+    'iso-8859-1' as per the HTTP specification (RFC 2616).
+    
+    Then, assuming we didn't find a character encoding in the HTTP headers
+    (and the HTTP Content-type allowed us to look in the body), we need
+    to sniff the first few bytes of the XML data and try to determine
+    whether the encoding is ASCII-compatible.  Section F of the XML
+    specification shows the way here:
+    http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info
+
+    If the sniffed encoding is not ASCII-compatible, we need to make it
+    ASCII compatible so that we can sniff further into the XML declaration
+    to find the encoding attribute, which will tell us the true encoding.
+
+    Of course, none of this guarantees that we will be able to parse the
+    feed in the declared character encoding (assuming it was declared
+    correctly, which many are not).  CJKCodecs and iconv_codec help a lot;
+    you should definitely install them if you can.
+    http://cjkpython.i18n.org/
+    '''
+
+    def _parseHTTPContentType(content_type):
+        '''takes HTTP Content-Type header and returns (content type, charset)
+
+        If no charset is specified, returns (content type, '')
+        If no content type is specified, returns ('', '')
+        Both return parameters are guaranteed to be lowercase strings
+        '''
+        content_type = content_type or ''
+        content_type, params = cgi.parse_header(content_type)
+        return content_type, params.get('charset', '').replace("'", '')
+
+    sniffed_xml_encoding = ''
+    xml_encoding = ''
+    true_encoding = ''
+    http_content_type, http_encoding = _parseHTTPContentType(http_headers.get('content-type'))
+    # Must sniff for non-ASCII-compatible character encodings before
+    # searching for XML declaration.  This heuristic is defined in
+    # section F of the XML specification:
+    # http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info
+    try:
+        if xml_data[:4] == '\x4c\x6f\xa7\x94':
+            # EBCDIC
+            xml_data = _ebcdic_to_ascii(xml_data)
+        elif xml_data[:4] == '\x00\x3c\x00\x3f':
+            # UTF-16BE
+            sniffed_xml_encoding = 'utf-16be'
+            xml_data = unicode(xml_data, 'utf-16be').encode('utf-8')
+        elif (len(xml_data) >= 4) and (xml_data[:2] == '\xfe\xff') and (xml_data[2:4] != '\x00\x00'):
+            # UTF-16BE with BOM
+            sniffed_xml_encoding = 'utf-16be'
+            xml_data = unicode(xml_data[2:], 'utf-16be').encode('utf-8')
+        elif xml_data[:4] == '\x3c\x00\x3f\x00':
+            # UTF-16LE
+            sniffed_xml_encoding = 'utf-16le'
+            xml_data = unicode(xml_data, 'utf-16le').encode('utf-8')
+        elif (len(xml_data) >= 4) and (xml_data[:2] == '\xff\xfe') and (xml_data[2:4] != '\x00\x00'):
+            # UTF-16LE with BOM
+            sniffed_xml_encoding = 'utf-16le'
+            xml_data = unicode(xml_data[2:], 'utf-16le').encode('utf-8')
+        elif xml_data[:4] == '\x00\x00\x00\x3c':
+            # UTF-32BE
+            sniffed_xml_encoding = 'utf-32be'
+            xml_data = unicode(xml_data, 'utf-32be').encode('utf-8')
+        elif xml_data[:4] == '\x3c\x00\x00\x00':
+            # UTF-32LE
+            sniffed_xml_encoding = 'utf-32le'
+            xml_data = unicode(xml_data, 'utf-32le').encode('utf-8')
+        elif xml_data[:4] == '\x00\x00\xfe\xff':
+            # UTF-32BE with BOM
+            sniffed_xml_encoding = 'utf-32be'
+            xml_data = unicode(xml_data[4:], 'utf-32be').encode('utf-8')
+        elif xml_data[:4] == '\xff\xfe\x00\x00':
+            # UTF-32LE with BOM
+            sniffed_xml_encoding = 'utf-32le'
+            xml_data = unicode(xml_data[4:], 'utf-32le').encode('utf-8')
+        elif xml_data[:3] == '\xef\xbb\xbf':
+            # UTF-8 with BOM
+            sniffed_xml_encoding = 'utf-8'
+            xml_data = unicode(xml_data[3:], 'utf-8').encode('utf-8')
+        else:
+            # ASCII-compatible
+            pass
+        xml_encoding_match = re.compile('^<\?.*encoding=[\'"](.*?)[\'"].*\?>').match(xml_data)
+    except:
+        xml_encoding_match = None
+    if xml_encoding_match:
+        xml_encoding = xml_encoding_match.groups()[0].lower()
+        if sniffed_xml_encoding and (xml_encoding in ('iso-10646-ucs-2', 'ucs-2', 'csunicode', 'iso-10646-ucs-4', 'ucs-4', 'csucs4', 'utf-16', 'utf-32', 'utf_16', 'utf_32', 'utf16', 'u16')):
+            xml_encoding = sniffed_xml_encoding
+    acceptable_content_type = 0
+    application_content_types = ('application/xml', 'application/xml-dtd', 'application/xml-external-parsed-entity')
+    text_content_types = ('text/xml', 'text/xml-external-parsed-entity')
+    if (http_content_type in application_content_types) or \
+       (http_content_type.startswith('application/') and http_content_type.endswith('+xml')):
+        acceptable_content_type = 1
+        true_encoding = http_encoding or xml_encoding or 'utf-8'
+    elif (http_content_type in text_content_types) or \
+         (http_content_type.startswith('text/') and http_content_type.endswith('+xml')):
+        acceptable_content_type = 1
+        true_encoding = http_encoding or 'us-ascii'
+    elif http_content_type.startswith('text/'):
+        true_encoding = http_encoding or 'us-ascii'
+    elif http_headers and (not http_headers.has_key('content-type')):
+        true_encoding = xml_encoding or 'iso-8859-1'
+    else:
+        true_encoding = xml_encoding or 'utf-8'
+    return true_encoding, http_encoding, xml_encoding, sniffed_xml_encoding, acceptable_content_type
+    
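The sniffing cascade above follows section F of the XML spec: the leading bytes of a document betray its encoding family before any `encoding=` attribute can be read. A simplified standalone sketch covering a few of the signatures checked above (it omits the BOM'd UTF-16 cases, which need the extra `!= '\x00\x00'` guard):

```python
# Byte-prefix sniffing per section F of the XML spec. UTF-32 signatures must
# be tested before UTF-16 ones, since '\xff\xfe\x00\x00' begins with the
# UTF-16LE BOM '\xff\xfe'.
BOM_TABLE = [
    (b'\x00\x00\xfe\xff', 'utf-32be'),
    (b'\xff\xfe\x00\x00', 'utf-32le'),
    (b'\x00\x3c\x00\x3f', 'utf-16be'),   # no BOM: '<?' in UTF-16BE
    (b'\x3c\x00\x3f\x00', 'utf-16le'),   # no BOM: '<?' in UTF-16LE
    (b'\xef\xbb\xbf',     'utf-8'),
]

def sniff_xml_encoding(data):
    for prefix, enc in BOM_TABLE:
        if data.startswith(prefix):
            return enc
    return ''   # ASCII-compatible; read the XML declaration instead

print(sniff_xml_encoding(b'\xef\xbb\xbf<?xml version="1.0"?>'))
```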
+def _toUTF8(data, encoding):
+    '''Changes an XML data stream on the fly to specify a new encoding
+
+    data is a raw sequence of bytes (not Unicode) that is presumed to be in %encoding already
+    encoding is a string recognized by encodings.aliases
+    '''
+    if _debug: sys.stderr.write('entering _toUTF8, trying encoding %s\n' % encoding)
+    # strip Byte Order Mark (if present)
+    if (len(data) >= 4) and (data[:2] == '\xfe\xff') and (data[2:4] != '\x00\x00'):
+        if _debug:
+            sys.stderr.write('stripping BOM\n')
+            if encoding != 'utf-16be':
+                sys.stderr.write('trying utf-16be instead\n')
+        encoding = 'utf-16be'
+        data = data[2:]
+    elif (len(data) >= 4) and (data[:2] == '\xff\xfe') and (data[2:4] != '\x00\x00'):
+        if _debug:
+            sys.stderr.write('stripping BOM\n')
+            if encoding != 'utf-16le':
+                sys.stderr.write('trying utf-16le instead\n')
+        encoding = 'utf-16le'
+        data = data[2:]
+    elif data[:3] == '\xef\xbb\xbf':
+        if _debug:
+            sys.stderr.write('stripping BOM\n')
+            if encoding != 'utf-8':
+                sys.stderr.write('trying utf-8 instead\n')
+        encoding = 'utf-8'
+        data = data[3:]
+    elif data[:4] == '\x00\x00\xfe\xff':
+        if _debug:
+            sys.stderr.write('stripping BOM\n')
+            if encoding != 'utf-32be':
+                sys.stderr.write('trying utf-32be instead\n')
+        encoding = 'utf-32be'
+        data = data[4:]
+    elif data[:4] == '\xff\xfe\x00\x00':
+        if _debug:
+            sys.stderr.write('stripping BOM\n')
+            if encoding != 'utf-32le':
+                sys.stderr.write('trying utf-32le instead\n')
+        encoding = 'utf-32le'
+        data = data[4:]
+    newdata = unicode(data, encoding)
+    if _debug: sys.stderr.write('successfully converted %s data to unicode\n' % encoding)
+    declmatch = re.compile('^<\?xml[^>]*?>')
+    newdecl = '''<?xml version='1.0' encoding='utf-8'?>'''
+    if declmatch.search(newdata):
+        newdata = declmatch.sub(newdecl, newdata)
+    else:
+        newdata = newdecl + u'\n' + newdata
+    return newdata.encode('utf-8')
+
+def _stripDoctype(data):
+    '''Strips DOCTYPE from XML document, returns (rss_version, stripped_data)
+
+    rss_version may be 'rss091n' or None
+    stripped_data is the same XML document, minus the DOCTYPE
+    '''
+    entity_pattern = re.compile(r'<!ENTITY([^>]*?)>', re.MULTILINE)
+    data = entity_pattern.sub('', data)
+    doctype_pattern = re.compile(r'<!DOCTYPE([^>]*?)>', re.MULTILINE)
+    doctype_results = doctype_pattern.findall(data)
+    doctype = doctype_results and doctype_results[0] or ''
+    if doctype.lower().count('netscape'):
+        version = 'rss091n'
+    else:
+        version = None
+    data = doctype_pattern.sub('', data)
+    return version, data
+    
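An illustrative sketch of what `_stripDoctype()` does, written as a self-contained function (the sample DOCTYPE is the well-known Netscape RSS 0.91 one):

```python
import re

def strip_doctype(data):
    """Drop inline ENTITY declarations and the DOCTYPE, flagging
    Netscape's RSS 0.91 DTD the way _stripDoctype() does."""
    data = re.sub(r'<!ENTITY([^>]*?)>', '', data)
    doctypes = re.findall(r'<!DOCTYPE([^>]*?)>', data)
    doctype = doctypes[0] if doctypes else ''
    version = 'rss091n' if 'netscape' in doctype.lower() else None
    return version, re.sub(r'<!DOCTYPE([^>]*?)>', '', data)

version, cleaned = strip_doctype(
    '<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" '
    '"http://my.netscape.com/publish/formats/rss-0.91.dtd"><rss/>')
print(version)   # rss091n
```

Stripping inline entities first matters: a DOCTYPE internal subset can smuggle entity definitions that some XML parsers would otherwise expand.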
+def parse(url_file_stream_or_string, etag=None, modified=None, agent=None, referrer=None, handlers=[]):
+    '''Parse a feed from a URL, file, stream, or string'''
+    result = FeedParserDict()
+    result['feed'] = FeedParserDict()
+    result['entries'] = []
+    if _XML_AVAILABLE:
+        result['bozo'] = 0
+    if type(handlers) == types.InstanceType:
+        handlers = [handlers]
+    try:
+        f = _open_resource(url_file_stream_or_string, etag, modified, agent, referrer, handlers)
+        data = f.read()
+    except Exception, e:
+        result['bozo'] = 1
+        result['bozo_exception'] = e
+        data = ''
+        f = None
+
+    # if feed is gzip-compressed, decompress it
+    if f and data and hasattr(f, 'headers'):
+        if gzip and f.headers.get('content-encoding', '') == 'gzip':
+            try:
+                data = gzip.GzipFile(fileobj=_StringIO(data)).read()
+            except Exception, e:
+                # Some feeds claim to be gzipped but they're not, so
+                # we get garbage.  Ideally, we should re-request the
+                # feed without the 'Accept-encoding: gzip' header,
+                # but we don't.
+                result['bozo'] = 1
+                result['bozo_exception'] = e
+                data = ''
+        elif zlib and f.headers.get('content-encoding', '') == 'deflate':
+            try:
+                data = zlib.decompress(data, -zlib.MAX_WBITS)
+            except Exception, e:
+                result['bozo'] = 1
+                result['bozo_exception'] = e
+                data = ''
+
+    # save HTTP headers
+    if hasattr(f, 'info'):
+        info = f.info()
+        result['etag'] = info.getheader('ETag')
+        last_modified = info.getheader('Last-Modified')
+        if last_modified:
+            result['modified'] = _parse_date(last_modified)
+    if hasattr(f, 'url'):
+        result['href'] = f.url
+        result['status'] = 200
+    if hasattr(f, 'status'):
+        result['status'] = f.status
+    if hasattr(f, 'headers'):
+        result['headers'] = f.headers.dict
+    if hasattr(f, 'close'):
+        f.close()
+
+    # there are four encodings to keep track of:
+    # - http_encoding is the encoding declared in the Content-Type HTTP header
+    # - xml_encoding is the encoding declared in the <?xml declaration
+    # - sniffed_encoding is the encoding sniffed from the first 4 bytes of the XML data
+    # - result['encoding'] is the actual encoding, as per RFC 3023 and a variety of other conflicting specifications
+    http_headers = result.get('headers', {})
+    result['encoding'], http_encoding, xml_encoding, sniffed_xml_encoding, acceptable_content_type = \
+        _getCharacterEncoding(http_headers, data)
+    if http_headers and (not acceptable_content_type):
+        if http_headers.has_key('content-type'):
+            bozo_message = '%s is not an XML media type' % http_headers['content-type']
+        else:
+            bozo_message = 'no Content-type specified'
+        result['bozo'] = 1
+        result['bozo_exception'] = NonXMLContentType(bozo_message)
+
+    result['version'], data = _stripDoctype(data)
+
+    baseuri = http_headers.get('content-location', result.get('href'))
+    baselang = http_headers.get('content-language', None)
+
+    # if server sent 304, we're done
+    if result.get('status', 0) == 304:
+        result['version'] = ''
+        result['debug_message'] = 'The feed has not changed since you last checked, ' + \
+            'so the server sent no data.  This is a feature, not a bug!'
+        return result
+
+    # if there was a problem downloading, we're done
+    if not data:
+        return result
+
+    # determine character encoding
+    use_strict_parser = 0
+    known_encoding = 0
+    tried_encodings = []
+    # try: HTTP encoding, declared XML encoding, encoding sniffed from BOM
+    for proposed_encoding in (result['encoding'], xml_encoding, sniffed_xml_encoding):
+        if not proposed_encoding: continue
+        if proposed_encoding in tried_encodings: continue
+        tried_encodings.append(proposed_encoding)
+        try:
+            data = _toUTF8(data, proposed_encoding)
+            known_encoding = use_strict_parser = 1
+            break
+        except:
+            pass
+    # if no luck and we have auto-detection library, try that
+    if (not known_encoding) and chardet:
+        try:
+            proposed_encoding = chardet.detect(data)['encoding']
+            if proposed_encoding and (proposed_encoding not in tried_encodings):
+                tried_encodings.append(proposed_encoding)
+                data = _toUTF8(data, proposed_encoding)
+                known_encoding = use_strict_parser = 1
+        except:
+            pass
+    # if still no luck and we haven't tried utf-8 yet, try that
+    if (not known_encoding) and ('utf-8' not in tried_encodings):
+        try:
+            proposed_encoding = 'utf-8'
+            tried_encodings.append(proposed_encoding)
+            data = _toUTF8(data, proposed_encoding)
+            known_encoding = use_strict_parser = 1
+        except:
+            pass
+    # if still no luck and we haven't tried windows-1252 yet, try that
+    if (not known_encoding) and ('windows-1252' not in tried_encodings):
+        try:
+            proposed_encoding = 'windows-1252'
+            tried_encodings.append(proposed_encoding)
+            data = _toUTF8(data, proposed_encoding)
+            known_encoding = use_strict_parser = 1
+        except:
+            pass
+    # if still no luck, give up
+    if not known_encoding:
+        result['bozo'] = 1
+        result['bozo_exception'] = CharacterEncodingUnknown( \
+            'document encoding unknown, I tried ' + \
+            '%s, %s, utf-8, and windows-1252 but nothing worked' % \
+            (result['encoding'], xml_encoding))
+        result['encoding'] = ''
+    elif proposed_encoding != result['encoding']:
+        result['bozo'] = 1
+        result['bozo_exception'] = CharacterEncodingOverride( \
+            'document declared as %s, but parsed as %s' % \
+            (result['encoding'], proposed_encoding))
+        result['encoding'] = proposed_encoding
+
+    if not _XML_AVAILABLE:
+        use_strict_parser = 0
+    if use_strict_parser:
+        # initialize the SAX parser
+        feedparser = _StrictFeedParser(baseuri, baselang, 'utf-8')
+        saxparser = xml.sax.make_parser(PREFERRED_XML_PARSERS)
+        saxparser.setFeature(xml.sax.handler.feature_namespaces, 1)
+        saxparser.setContentHandler(feedparser)
+        saxparser.setErrorHandler(feedparser)
+        source = xml.sax.xmlreader.InputSource()
+        source.setByteStream(_StringIO(data))
+        if hasattr(saxparser, '_ns_stack'):
+            # work around bug in built-in SAX parser (doesn't recognize xml: namespace)
+            # PyXML doesn't have this problem, and it doesn't have _ns_stack either
+            saxparser._ns_stack.append({'http://www.w3.org/XML/1998/namespace':'xml'})
+        try:
+            saxparser.parse(source)
+        except Exception, e:
+            if _debug:
+                import traceback
+                traceback.print_stack()
+                traceback.print_exc()
+                sys.stderr.write('xml parsing failed\n')
+            result['bozo'] = 1
+            result['bozo_exception'] = feedparser.exc or e
+            use_strict_parser = 0
+    if not use_strict_parser:
+        feedparser = _LooseFeedParser(baseuri, baselang, known_encoding and 'utf-8' or '')
+        feedparser.feed(data)
+    result['feed'] = feedparser.feeddata
+    result['entries'] = feedparser.entries
+    result['version'] = result['version'] or feedparser.version
+    result['namespaces'] = feedparser.namespacesInUse
+    return result
+
+if __name__ == '__main__':
+    if not sys.argv[1:]:
+        print __doc__
+        sys.exit(0)
+    else:
+        urls = sys.argv[1:]
+    zopeCompatibilityHack()
+    from pprint import pprint
+    for url in urls:
+        print url
+        print
+        result = parse(url)
+        pprint(result)
+        print
+
+#REVISION HISTORY
+#1.0 - 9/27/2002 - MAP - fixed namespace processing on prefixed RSS 2.0 elements,
+#  added Simon Fell's test suite
+#1.1 - 9/29/2002 - MAP - fixed infinite loop on incomplete CDATA sections
+#2.0 - 10/19/2002
+#  JD - use inchannel to watch out for image and textinput elements which can
+#  also contain title, link, and description elements
+#  JD - check for isPermaLink='false' attribute on guid elements
+#  JD - replaced openAnything with open_resource supporting ETag and
+#  If-Modified-Since request headers
+#  JD - parse now accepts etag, modified, agent, and referrer optional
+#  arguments
+#  JD - modified parse to return a dictionary instead of a tuple so that any
+#  etag or modified information can be returned and cached by the caller
+#2.0.1 - 10/21/2002 - MAP - changed parse() so that if we don't get anything
+#  be