Feedworm: annotate backend/AbstractFeedUpdater.py @ 197:e604c32f67aa

normalize the published date if the feed contains none

| field | value |
|---|---|
| author | dirk |
| date | Tue, 24 Jan 2012 10:08:45 +0100 |
| parents | 2f2016a10f7d |
| children | f74fe7cb5091 |
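Changesets that contributed lines to this revision of the file:

| rev | changeset | parent | author | commit message | lines contributed |
|---|---|---|---|---|---|
| 141 | 6ea813cfac33 | | Dirk Olmes <dirk@xanthippe.ping.de> | pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class. | imports of `datetime`, `feedparser`, `logging`; `STATUS_ERROR`; logger; class header and docstring; `def _retrieveFeed`, bozo comment and status check; `def _normalize`; bodies of `_normalizeId` and `_normalizeSummary`; `datetime` conversion in `_normalizeUpdatedDate`; `_processEntry`; `FeedUpdateException` |
| 144 | 74217db92993 | 141 | Dirk Olmes <dirk@xanthippe.ping.de> | updating feeds on the couchdb backend works now | `_incrementFeedUpdateDate` |
| 160 | 86f828096aaf | 144 | dirk | Do not fetch and parse the feed twice when creating a new one. Pass the parsed info into the update method instead to reuse. | `_processEntries` and its call in `update` |
| 166 | 04c3b9796b89 | 160 | dirk | feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored. | `ProxyHandler` import; `__init__`; first lines of `update`; proxy branch of `_retrieveFeed`; `_setFeedTitle` |
| 167 | a3c945ce434c | 166 | dirk | adjust the sqlalchemy backend to the changes in AbstractFeedUpdater | `_setFeedTitle` call in `update`; python3 comment in `_retrieveFeed` |
| 187 | 2f2016a10f7d | 167 | dirk | handle a missing updated_parsed attribute in a feed entry gracefully | `updated_parsed is None` guard and TODO in `_normalizeUpdatedDate` |
| 197 | e604c32f67aa | 187 | dirk | normalize the published date if the feed contains none | calls in `_normalize`; `def` lines of the four `_normalize*` helpers; body of `_normalizePublishedDate` |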
```python
from datetime import datetime
import feedparser
import logging
from urllib2 import ProxyHandler

STATUS_ERROR = 400
log = logging.getLogger("FeedUpdater")

class AbstractFeedUpdater(object):
    '''
    Abstract base class for FeedUpdater implementations - handles all the parsing of the feed.
    Subclasses need to implement creating and storing the new feed entries.
    '''

    def __init__(self, preferences):
        self.preferences = preferences

    def update(self, feed):
        self.feed = feed
        log.info("updating " + feed.rss_url)
        result = self._retrieveFeed()
        self._setFeedTitle(result)
        self._processEntries(result)
```
```python
    def _retrieveFeed(self):
        if self.preferences.isProxyConfigured():
            proxyUrl = "http://%s:%i" % (self.preferences.proxyHost(), self.preferences.proxyPort())
            proxyHandler = ProxyHandler({"http": proxyUrl})
            result = feedparser.parse(self.feed.rss_url, handlers=[proxyHandler])
        else:
            # when updating to python3 see http://code.google.com/p/feedparser/issues/detail?id=260
            result = feedparser.parse(self.feed.rss_url)
        # bozo is set if a feed is NOT well-formed.
        # if result["bozo"] > 0:
        #     raise FeedUpdateException()
        status = result["status"]
        if status >= STATUS_ERROR:
            raise FeedUpdateException("HTTP status " + str(status))
        return result
```
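`_retrieveFeed` gives feedparser a `ProxyHandler` that only covers the `http` scheme, and it reads `result["status"]` unconditionally, although feedparser normally sets `status` only when a document was actually fetched over HTTP (a connection failure typically leaves it out, so the lookup raises `KeyError`). A minimal Python 3 sketch of both steps follows, assuming `urllib.request.ProxyHandler` (the Python 3 location of `urllib2.ProxyHandler`). The helper names `build_proxy_handlers` and `check_status` are hypothetical, and treating an empty host as "no proxy configured" stands in for `preferences.isProxyConfigured()`.

```python
from urllib.request import ProxyHandler

STATUS_ERROR = 400


class FeedUpdateException(Exception):
    pass


def build_proxy_handlers(host, port):
    """Return the handler list to pass as feedparser.parse(url, handlers=...)."""
    if not host:
        # Assumption: an empty host means no proxy is configured.
        return []
    proxy_url = "http://%s:%i" % (host, port)
    # Map both schemes so that https feeds are proxied as well.
    return [ProxyHandler({"http": proxy_url, "https": proxy_url})]


def check_status(result):
    """Raise FeedUpdateException for HTTP errors; tolerate a missing status."""
    # Use .get() because "status" is absent when nothing came back over HTTP;
    # callers can then inspect result["bozo"] for the underlying problem.
    status = result.get("status")
    if status is not None and status >= STATUS_ERROR:
        raise FeedUpdateException("HTTP status " + str(status))
    return result
```

With a feedparser version that accepts the `handlers` argument used above, the two would combine as `check_status(feedparser.parse(url, handlers=build_proxy_handlers(host, port)))`.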
```python
    def _processEntries(self, feedDict):
        for entry in feedDict.entries:
            self._normalize(entry)
            self._processEntry(entry)
        self._incrementFeedUpdateDate()
```
```python
    def _normalize(self, entry):
        self._normalizeId(entry)
        self._normalizePublishedDate(entry)
        self._normalizeUpdatedDate(entry)
        self._normalizeSummary(entry)

    def _normalizeId(self, entry):
        if not hasattr(entry, "id"):
            entry.id = entry.link

    def _normalizePublishedDate(self, entry):
        if not hasattr(entry, "published"):
            if hasattr(entry, "updated"):
                entry.published = entry.updated

    def _normalizeUpdatedDate(self, entry):
        if not hasattr(entry, "updated_parsed") or entry.updated_parsed is None:
            # TODO try to parse the entry.updated date string
            # note: datetime.today() is local time, while feedparser's *_parsed values are UTC
            entry.updated_parsed = datetime.today()
        else:
            # feedparser delivers a time.struct_time; keep year..second as a datetime
            entry.updated_parsed = datetime(*entry.updated_parsed[:6])

    def _normalizeSummary(self, entry):
        if not hasattr(entry, "summary"):
            if hasattr(entry, "content"):
                entry.summary = entry.content[0].value
            else:
                entry.summary = ""
```
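The four `_normalize*` helpers guarantee that every entry reaching `_processEntry` has `id`, `updated_parsed` and `summary` set, plus `published` whenever `updated` exists. The sketch below applies the same fallback rules in Python 3, collapsed into one hypothetical `normalize` function. As an assumption for illustration, `types.SimpleNamespace` stands in for feedparser's `FeedParserDict` entry (both raise `AttributeError` for a missing attribute, so `hasattr` behaves alike), and `updated_parsed` is a `time.struct_time` like the one feedparser produces.

```python
import time
from datetime import datetime
from types import SimpleNamespace


def normalize(entry):
    """Apply the same fallbacks as _normalize() to a feedparser-like entry."""
    # id: fall back to the entry's link
    if not hasattr(entry, "id"):
        entry.id = entry.link
    # published: copy the updated string when the feed has no published date
    if not hasattr(entry, "published") and hasattr(entry, "updated"):
        entry.published = entry.updated
    # updated_parsed: struct_time -> datetime, or local "now" when missing/None
    parsed = getattr(entry, "updated_parsed", None)
    entry.updated_parsed = datetime(*parsed[:6]) if parsed is not None else datetime.today()
    # summary: first content element, else an empty string
    if not hasattr(entry, "summary"):
        entry.summary = entry.content[0].value if hasattr(entry, "content") else ""
    return entry


entry = SimpleNamespace(
    link="http://example.com/post/1",
    updated="Tue, 24 Jan 2012 09:08:45 GMT",
    updated_parsed=time.strptime("2012-01-24 09:08:45", "%Y-%m-%d %H:%M:%S"),
    content=[SimpleNamespace(value="<p>Hello</p>")],
)
normalize(entry)
print(entry.id)              # http://example.com/post/1
print(entry.published)       # Tue, 24 Jan 2012 09:08:45 GMT
print(entry.updated_parsed)  # 2012-01-24 09:08:45
print(entry.summary)         # <p>Hello</p>
```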
```python
    def _processEntry(self, entry):
        raise Exception("_processEntry is abstract, subclasses must override")

    def _incrementFeedUpdateDate(self):
        raise Exception("_incrementFeedUpdateDate is abstract, subclasses must override")

    def _setFeedTitle(self, feedDict):
        if self.feed.title is None:
            if "title" in feedDict.feed:
                self.feed.title = feedDict.feed.title
            else:
                self.feed.title = self.feed.rss_url


class FeedUpdateException(Exception):
    pass
```
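The class docstring leaves creating and storing entries to subclasses, which supply `_processEntry` and `_incrementFeedUpdateDate`. A minimal Python 3 sketch of that template-method split follows, assuming an in-memory store. `FeedUpdaterSketch` is a hypothetical, stripped-down stand-in for `AbstractFeedUpdater` that receives an already parsed result instead of fetching it; `InMemoryFeedUpdater`, its `entries` dict, and the `interval`/`next_update` attributes are invented for illustration and are not the sqlalchemy or couchdb backends.

```python
from datetime import datetime, timedelta
from types import SimpleNamespace


class FeedUpdaterSketch(object):
    """Hypothetical stand-in keeping only the template method of AbstractFeedUpdater."""

    def update(self, feed, parsed):
        self.feed = feed
        for entry in parsed.entries:
            self._processEntry(entry)
        self._incrementFeedUpdateDate()

    def _processEntry(self, entry):
        raise NotImplementedError("_processEntry is abstract, subclasses must override")

    def _incrementFeedUpdateDate(self):
        raise NotImplementedError("_incrementFeedUpdateDate is abstract, subclasses must override")


class InMemoryFeedUpdater(FeedUpdaterSketch):
    """Hypothetical backend that keeps entries in a dict keyed by entry id."""

    def __init__(self, interval=timedelta(minutes=30)):
        self.entries = {}
        self.interval = interval

    def _processEntry(self, entry):
        # Keep the first entry seen for each id; later duplicates are ignored.
        if entry.id not in self.entries:
            self.entries[entry.id] = entry

    def _incrementFeedUpdateDate(self):
        # Hypothetical scheduling attribute: next update one interval from now.
        self.feed.next_update = datetime.now() + self.interval


feed = SimpleNamespace(rss_url="http://example.com/rss")
parsed = SimpleNamespace(entries=[SimpleNamespace(id="a"), SimpleNamespace(id="b"), SimpleNamespace(id="a")])
updater = InMemoryFeedUpdater()
updater.update(feed, parsed)
print(sorted(updater.entries))  # ['a', 'b']
```

The sketch raises `NotImplementedError` in the base hooks, the conventional Python signal for a method a subclass must provide; `abc.ABC` with `@abstractmethod` would go further and refuse to instantiate a subclass that forgets an override.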