annotate backend/AbstractFeedUpdater.py @ 242:03e3ebb1d52f

Disable the use of a proxy when updating feeds - https traffic does not seem to work currently over a proxy.
author Dirk Olmes <dirk@xanthippe.ping.de>
date Mon, 08 Jun 2015 19:20:46 +0200
parents e34c53a3e407
children 51d2c3d55f4b
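The changeset above sidesteps the proxy by calling `feedparser.parse` without any handlers. For reference, here is a minimal Python 3 sketch of the two handler configurations (`urllib2.ProxyHandler` lives in `urllib.request` in Python 3). The helper names and the example host and port are hypothetical. Passing an empty mapping to `ProxyHandler` tells urllib to use no proxy at all, including any configured through environment variables. The sketch does not address the https problem itself.

```python
from urllib.request import ProxyHandler


def proxy_mapping(host, port):
    """Build the scheme -> proxy URL mapping used by the commented-out code."""
    proxy_url = '{0}:{1}'.format(host, port)
    return {'http': proxy_url, 'https': proxy_url}


def make_proxy_handler(host=None, port=None):
    """Return a handler that uses the given proxy, or no proxy if host is None."""
    if host is None:
        # An empty mapping disables proxies, including those set via environment variables.
        return ProxyHandler({})
    return ProxyHandler(proxy_mapping(host, port))


handler = make_proxy_handler('proxy.example', 3128)  # hypothetical host and port
no_proxy = make_proxy_handler()
```

Either handler could then be passed to feedparser through its `handlers` argument, as the commented-out code in `_retrieveFeed` does.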
rev   line source
217
bb3c851b18b1 add source file encoding header
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 213
diff changeset
1 # -*- coding: utf-8 -*-
206
f74fe7cb5091 when updating feeds, only ever create new Feed objects for entries that are younger than the current expire date. This ensures that we do not see old, read, expired entries again
dirk
parents: 197
diff changeset
2 import AbstractBackend
141
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
3 import feedparser
4 import logging
218
699d8f1cebd4 unify imports, especially Qt imports. Use consistent super syntax
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 217
diff changeset
5 from datetime import datetime
6 from urllib2 import ProxyHandler
141
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
7
8 STATUS_ERROR = 400
9 log = logging.getLogger("FeedUpdater")
10
11 class AbstractFeedUpdater(object):
12     '''
13     Abstract base class for FeedUpdater implementations - handles all the parsing of the feed.
14     Subclasses need to implement creating and storing the new feed entries.
15     '''
16
166
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
17     def __init__(self, preferences):
18         self.preferences = preferences
19
20     def update(self, feed):
21         self.feed = feed
22         log.info("updating " + feed.rss_url)
23         result = self._retrieveFeed()
167
a3c945ce434c adjust the sqlalchemy backend to the changes in AbstractFeedUpdater
dirk
parents: 166
diff changeset
24         self._setFeedTitle(result)
160
86f828096aaf Do not fetch and parse the feed twice when creating a new one. Pass the parsed info into the update method instead to reuse.
dirk
parents: 144
diff changeset
25         self._processEntries(result)
141
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
26
27     def _retrieveFeed(self):
242
03e3ebb1d52f Disable the use of a proxy when updating feeds - https traffic does not seem to work currently over a proxy.
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 233
diff changeset
28         # Retrieving https connections over a proxy does not seem to work currently
29         #if self.preferences.isProxyConfigured():
30         #    proxyUrl = '{0}:{1}'.format(self.preferences.proxyHost(), self.preferences.proxyPort())
31         #    proxyHandler = ProxyHandler({'http': proxyUrl, 'https': proxyUrl})
32         #    result = feedparser.parse(self.feed.rss_url, handlers=[proxyHandler])
33         #else:
34         #    # when updating to python3 see http://code.google.com/p/feedparser/issues/detail?id=260
35         result = feedparser.parse(self.feed.rss_url)
36         print(result)
141
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
37         # bozo is set if the feed is not well-formed.
38         # if result["bozo"] > 0:
39         #     raise FeedUpdateException()
40         status = result["status"]
41         if status >= STATUS_ERROR:
42             raise FeedUpdateException("HTTP status " + str(status))
43         return result
44
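The status check in `_retrieveFeed` indexes `result["status"]` directly. feedparser only sets `status` when the feed was fetched over HTTP, so a more defensive variant could look like the following sketch. It uses a plain dict in place of the feedparser result, and treating a missing status as an error is an assumption, not the project's behaviour.

```python
STATUS_ERROR = 400


class FeedUpdateException(Exception):
    pass


def check_status(result):
    """Raise FeedUpdateException for HTTP errors or a missing status."""
    status = result.get("status")
    if status is None:
        # Assumption: no status means the feed could not be fetched at all.
        raise FeedUpdateException("no HTTP status in result")
    if status >= STATUS_ERROR:
        raise FeedUpdateException("HTTP status " + str(status))
    return result


# Stand-in dict for illustration; a real feedparser result carries many more keys.
ok = check_status({"status": 200, "entries": []})
```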
160
86f828096aaf Do not fetch and parse the feed twice when creating a new one. Pass the parsed info into the update method instead to reuse.
dirk
parents: 144
diff changeset
45     def _processEntries(self, feedDict):
46         for entry in feedDict.entries:
47             self._normalize(entry)
206
f74fe7cb5091 when updating feeds, only ever create new Feed objects for entries that are younger than the current expire date. This ensures that we do not see old, read, expired entries again
dirk
parents: 197
diff changeset
48             if not self._isExpired(entry):
49                 self._processEntry(entry)
160
86f828096aaf Do not fetch and parse the feed twice when creating a new one. Pass the parsed info into the update method instead to reuse.
dirk
parents: 144
diff changeset
50         self._incrementFeedUpdateDate()
51
141
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
52     def _normalize(self, entry):
197
e604c32f67aa normalize the published date if the feed contains none
dirk
parents: 187
diff changeset
53         self._normalizeId(entry)
54         self._normalizePublishedDate(entry)
55         self._normalizeUpdatedDate(entry)
56         self._normalizeSummary(entry)
57
58     def _normalizeId(self, entry):
141
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
59         if not hasattr(entry, "id"):
60             entry.id = entry.link
197
e604c32f67aa normalize the published date if the feed contains none
dirk
parents: 187
diff changeset
61
62     def _normalizePublishedDate(self, entry):
63         if not hasattr(entry, "published"):
64             if hasattr(entry, "updated"):
65                 entry.published = entry.updated
66
67     def _normalizeUpdatedDate(self, entry):
187
2f2016a10f7d handle a missing updated_parsed attribute in a feed entry gracefully
dirk
parents: 167
diff changeset
68         if not hasattr(entry, "updated_parsed") or entry.updated_parsed is None:
213
524cbf9e413c use correct TODO tags so they show up in the tasks view in Eclipse
dirk
parents: 206
diff changeset
69             # TODO: try to parse the entry.updated date string
141
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
70             entry.updated_parsed = datetime.today()
71         else:
72             entry.updated_parsed = datetime(*entry.updated_parsed[:6])
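`_normalizeUpdatedDate` converts feedparser's `updated_parsed` (a time tuple in UTC) to a `datetime` and falls back to the current time when it is missing. One way to approach the TODO is to try parsing the raw `updated` string first. The sketch below assumes RFC 822 style dates and uses `email.utils.parsedate_to_datetime` from the Python 3 standard library. The function name `normalize_updated` is hypothetical, and other date formats would still fall through to the current time.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def normalize_updated(updated_parsed=None, updated=None):
    """Return a naive datetime for an entry's update time."""
    if updated_parsed is not None:
        # Same conversion as the original: first six fields of the time tuple.
        return datetime(*updated_parsed[:6])
    if updated:
        try:
            parsed = parsedate_to_datetime(updated)
        except (TypeError, ValueError):
            parsed = None
        if parsed is not None:
            if parsed.tzinfo is not None:
                # Convert to UTC and drop tzinfo so it compares with naive datetimes.
                parsed = parsed.astimezone(timezone.utc).replace(tzinfo=None)
            return parsed
    # Nothing usable in the entry: fall back to now, as the original does.
    return datetime.today()


from_tuple = normalize_updated(updated_parsed=time.gmtime(0))
from_string = normalize_updated(updated="Mon, 08 Jun 2015 19:20:46 +0200")
```

Returning naive datetimes keeps the result comparable with the expire date used in `_isExpired`, assuming that date is naive as well.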
197
e604c32f67aa normalize the published date if the feed contains none
dirk
parents: 187
diff changeset
73
74     def _normalizeSummary(self, entry):
141
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
75         if not hasattr(entry, "summary"):
76             if hasattr(entry, "content"):
77                 entry.summary = entry.content[0].value
78             else:
79                 entry.summary = ""
80
206
f74fe7cb5091 when updating feeds, only ever create new Feed objects for entries that are younger than the current expire date. This ensures that we do not see old, read, expired entries again
dirk
parents: 197
diff changeset
81     def _isExpired(self, entry):
82         expireDate = AbstractBackend.calculateExpireDate(self.preferences)
83         return entry.updated_parsed < expireDate
84
141
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
85     def _processEntry(self, entry):
86         raise Exception("_processEntry is abstract, subclasses must override")
87
144
74217db92993 updating feeds on the couchdb backend works now
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 141
diff changeset
88     def _incrementFeedUpdateDate(self):
89         raise Exception("_incrementFeedUpdateDate is abstract, subclasses must override")
90
166
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
91     def _setFeedTitle(self, feedDict):
92         if self.feed.title is None:
233
e34c53a3e407 fixes from eric's style check
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 218
diff changeset
93             if 'title' in feedDict.feed:
166
04c3b9796b89 feedparser uses the proxy now if one is configured. To implement this the FeedUpdater had to change a bit - sqlalchemy backend is not yet refactored.
dirk
parents: 160
diff changeset
94                 self.feed.title = feedDict.feed.title
95             else:
96                 self.feed.title = self.feed.rss_url
97
141
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
diff changeset
98
99 class FeedUpdateException(Exception):
100     pass
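The class docstring says subclasses supply the storage logic by overriding `_processEntry` and `_incrementFeedUpdateDate`. The sketch below shows that contract with a stripped-down stand-in for the base class and a hypothetical in-memory subclass. It swaps the `raise Exception(...)` convention for Python 3's `abc` module, which rejects incomplete subclasses at instantiation time. The names `BaseUpdater`, `InMemoryUpdater` and `process_entries` are illustrative and not part of the project.

```python
from abc import ABC, abstractmethod
from datetime import datetime


class BaseUpdater(ABC):
    """Stand-in for AbstractFeedUpdater: drives the loop, defers storage."""

    def process_entries(self, entries):
        for entry in entries:
            self._processEntry(entry)
        self._incrementFeedUpdateDate()

    @abstractmethod
    def _processEntry(self, entry):
        """Create and store one feed entry."""

    @abstractmethod
    def _incrementFeedUpdateDate(self):
        """Record when the feed was last updated."""


class InMemoryUpdater(BaseUpdater):
    """Hypothetical subclass that keeps entries in a list."""

    def __init__(self):
        self.stored = []
        self.last_update = None

    def _processEntry(self, entry):
        self.stored.append(entry)

    def _incrementFeedUpdateDate(self):
        self.last_update = datetime.today()


updater = InMemoryUpdater()
updater.process_entries(["first", "second"])
```

With `abc`, a subclass that forgets one of the two methods fails when it is instantiated rather than on the first feed update.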