annotate backend/AbstractFeedUpdater.py @ 166:04c3b9796b89

feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
author dirk
date Sat, 03 Sep 2011 04:12:35 +0200
parents 86f828096aaf
children a3c945ce434c
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
141
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
1
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
2 from datetime import datetime
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
3 import feedparser
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
4 import logging
166
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
5 from urllib2 import ProxyHandler
141
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
6
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
7 STATUS_ERROR = 400
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
8 log = logging.getLogger("FeedUpdater")
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
9
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
10 class AbstractFeedUpdater(object):
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
11 '''
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
12 Abstract base class for FeedUpdater implementations - handles all the parsing of the feed.
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
13 Subclasses need to implement creating and storing the new feed entries.
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
14 '''
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
15
166
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
16 @staticmethod
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
17 def parseFeed(url):
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
18 proxy = ProxyHandler( {"http":"http://your.proxy.here:8080/"} )
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
19 return feedparser.parse(url, handlers = [proxy])
141
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
20
166
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
21 def __init__(self, preferences):
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
22 self.preferences = preferences
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
23
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
24 def update(self, feed):
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
25 self.feed = feed
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
26 log.info("updating " + feed.rss_url)
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
27 result = self._retrieveFeed()
160
86f828096aaf Do not fetch and parse the feed twice when creating a new one. Pass the parsed info into the update method instead to reuse.
dirk
parents: 144
diff changeset
28 self._processEntries(result)
166
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
29 self._setFeedTitle(result)
141
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
30
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
31 def _retrieveFeed(self):
166
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
32 if self.preferences.isProxyConfigured():
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
33 proxyUrl = "http://%s:%i" % (self.preferences.proxyHost(), self.preferences.proxyPort())
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
34 proxyHandler = ProxyHandler({"http" : proxyUrl})
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
35 result = feedparser.parse(self.feed.rss_url, handlers=[proxyHandler])
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
36 else:
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
37 result = feedparser.parse(self.feed.rss_url)
141
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
38 # bozo flags if a feed is well-formed.
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
39 # if result["bozo"] > 0:
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
40 # raise FeedUpdateException()
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
41 status = result["status"]
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
42 if status >= STATUS_ERROR:
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
43 raise FeedUpdateException("HTTP status " + str(status))
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
44 return result
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
45
160
86f828096aaf Do not fetch and parse the feed twice when creating a new one. Pass the parsed info into the update method instead to reuse.
dirk
parents: 144
diff changeset
46 def _processEntries(self, feedDict):
86f828096aaf Do not fetch and parse the feed twice when creating a new one. Pass the parsed info into the update method instead to reuse.
dirk
parents: 144
diff changeset
47 for entry in feedDict.entries:
86f828096aaf Do not fetch and parse the feed twice when creating a new one. Pass the parsed info into the update method instead to reuse.
dirk
parents: 144
diff changeset
48 self._normalize(entry)
86f828096aaf Do not fetch and parse the feed twice when creating a new one. Pass the parsed info into the update method instead to reuse.
dirk
parents: 144
diff changeset
49 self._processEntry(entry)
86f828096aaf Do not fetch and parse the feed twice when creating a new one. Pass the parsed info into the update method instead to reuse.
dirk
parents: 144
diff changeset
50 self._incrementFeedUpdateDate()
86f828096aaf Do not fetch and parse the feed twice when creating a new one. Pass the parsed info into the update method instead to reuse.
dirk
parents: 144
diff changeset
51
141
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
52 def _normalize(self, entry):
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
53 if not hasattr(entry, "id"):
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
54 entry.id = entry.link
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
55 if not hasattr(entry, "updated_parsed"):
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
56 entry.updated_parsed = datetime.today()
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
57 else:
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
58 entry.updated_parsed = datetime(*entry.updated_parsed[:6])
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
59 if not hasattr(entry, "summary"):
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
60 if hasattr(entry, "content"):
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
61 entry.summary = entry.content[0].value
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
62 else:
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
63 entry.summary = ""
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
64
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
65 def _processEntry(self, entry):
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
66 raise Exception("_processEntry is abstract, subclasses must override")
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
67
144
74217db92993 updating feeds on the couchdb backend works now
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 141
diff changeset
68 def _incrementFeedUpdateDate(self):
74217db92993 updating feeds on the couchdb backend works now
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 141
diff changeset
69 raise Exception("_incrementNextUpdateDate is abstract, subclasses must override")
74217db92993 updating feeds on the couchdb backend works now
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 141
diff changeset
70
166
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
71 def _setFeedTitle(self, feedDict):
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
72 if self.feed.title is None:
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
73 if feedDict.feed.has_key("title"):
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
74 self.feed.title = feedDict.feed.title
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
75 else:
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
76 self.feed.title = self.feed.rss_url
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
77
141
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
78
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
79 class FeedUpdateException(Exception):
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
80 pass