annotate backend/AbstractFeedUpdater.py @ 192:9d422fa90096

optimize marking feed entries as read: do not round trip to the database if the entry was already marked as read
author dirk
date Wed, 21 Sep 2011 13:23:25 +0200
parents 2f2016a10f7d
children e604c32f67aa
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
141
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
1
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
2 from datetime import datetime
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
3 import feedparser
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
4 import logging
166
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
5 from urllib2 import ProxyHandler
141
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
6
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
7 STATUS_ERROR = 400
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
8 log = logging.getLogger("FeedUpdater")
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
9
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
10 class AbstractFeedUpdater(object):
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
11 '''
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
12 Abstract base class for FeedUpdater implementations - handles all the parsing of the feed.
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
13 Subclasses need to implement creating and storing the new feed entries.
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
14 '''
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
15
166
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
16 def __init__(self, preferences):
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
17 self.preferences = preferences
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
18
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
19 def update(self, feed):
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
20 self.feed = feed
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
21 log.info("updating " + feed.rss_url)
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
22 result = self._retrieveFeed()
167
a3c945ce434c adjust the sqlalchemy backend to the changes in AbstractFeedUpdater
dirk
parents: 166
diff changeset
23 self._setFeedTitle(result)
160
86f828096aaf Do not fetch and parse the feed twice when creating a new one. Pass the parsed info into the update method instead to reuse.
dirk
parents: 144
diff changeset
24 self._processEntries(result)
141
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
25
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
26 def _retrieveFeed(self):
166
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
27 if self.preferences.isProxyConfigured():
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
28 proxyUrl = "http://%s:%i" % (self.preferences.proxyHost(), self.preferences.proxyPort())
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
29 proxyHandler = ProxyHandler({"http" : proxyUrl})
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
30 result = feedparser.parse(self.feed.rss_url, handlers=[proxyHandler])
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
31 else:
167
a3c945ce434c adjust the sqlalchemy backend to the changes in AbstractFeedUpdater
dirk
parents: 166
diff changeset
32 # when updating to python3 see http://code.google.com/p/feedparser/issues/detail?id=260
166
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
33 result = feedparser.parse(self.feed.rss_url)
141
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
34 # bozo flags if a feed is well-formed.
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
35 # if result["bozo"] > 0:
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
36 # raise FeedUpdateException()
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
37 status = result["status"]
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
38 if status >= STATUS_ERROR:
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
39 raise FeedUpdateException("HTTP status " + str(status))
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
40 return result
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
41
160
86f828096aaf Do not fetch and parse the feed twice when creating a new one. Pass the parsed info into the update method instead to reuse.
dirk
parents: 144
diff changeset
42 def _processEntries(self, feedDict):
86f828096aaf Do not fetch and parse the feed twice when creating a new one. Pass the parsed info into the update method instead to reuse.
dirk
parents: 144
diff changeset
43 for entry in feedDict.entries:
86f828096aaf Do not fetch and parse the feed twice when creating a new one. Pass the parsed info into the update method instead to reuse.
dirk
parents: 144
diff changeset
44 self._normalize(entry)
86f828096aaf Do not fetch and parse the feed twice when creating a new one. Pass the parsed info into the update method instead to reuse.
dirk
parents: 144
diff changeset
45 self._processEntry(entry)
86f828096aaf Do not fetch and parse the feed twice when creating a new one. Pass the parsed info into the update method instead to reuse.
dirk
parents: 144
diff changeset
46 self._incrementFeedUpdateDate()
86f828096aaf Do not fetch and parse the feed twice when creating a new one. Pass the parsed info into the update method instead to reuse.
dirk
parents: 144
diff changeset
47
141
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
48 def _normalize(self, entry):
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
49 if not hasattr(entry, "id"):
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
50 entry.id = entry.link
187
2f2016a10f7d handle a missing updated_parsed attribute in a feed entry gracefully
dirk
parents: 167
diff changeset
51 if not hasattr(entry, "updated_parsed") or entry.updated_parsed is None:
2f2016a10f7d handle a missing updated_parsed attribute in a feed entry gracefully
dirk
parents: 167
diff changeset
52 # TODO try to parse the entry.updated date string
141
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
53 entry.updated_parsed = datetime.today()
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
54 else:
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
55 entry.updated_parsed = datetime(*entry.updated_parsed[:6])
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
56 if not hasattr(entry, "summary"):
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
57 if hasattr(entry, "content"):
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
58 entry.summary = entry.content[0].value
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
59 else:
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
60 entry.summary = ""
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
61
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
62 def _processEntry(self, entry):
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
63 raise Exception("_processEntry is abstract, subclasses must override")
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
64
144
74217db92993 updating feeds on the couchdb backend works now
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 141
diff changeset
65 def _incrementFeedUpdateDate(self):
74217db92993 updating feeds on the couchdb backend works now
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 141
diff changeset
66 raise Exception("_incrementNextUpdateDate is abstract, subclasses must override")
74217db92993 updating feeds on the couchdb backend works now
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 141
diff changeset
67
166
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
68 def _setFeedTitle(self, feedDict):
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
69 if self.feed.title is None:
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
70 if feedDict.feed.has_key("title"):
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
71 self.feed.title = feedDict.feed.title
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
72 else:
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
73 self.feed.title = self.feed.rss_url
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
74
141
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
75
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
76 class FeedUpdateException(Exception):
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
77 pass