annotate backend/sqlalchemy/FeedUpdater.py @ 155:a05719a6175e

move common functionality into an abstract backend class, have both backends inherit from it. Implement enough of the couchdb backend that reading feeds (and marking feed entries as read) is possible
author Dirk Olmes <dirk@xanthippe.ping.de>
date Sat, 27 Aug 2011 08:52:03 +0200
parents 74217db92993
children 86f828096aaf
rev   line source
4
e0199f383442 retrieve a feed for the given URL, store entries as feed_entry rows into the database
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
1
141
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 123
2 from backend.AbstractFeedUpdater import AbstractFeedUpdater, FeedUpdateException
5
bfd47f55d85b add the updated date of the feed
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 4
3 from datetime import datetime
4
e0199f383442 retrieve a feed for the given URL, store entries as feed_entry rows into the database
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
4 from Feed import Feed
e0199f383442 retrieve a feed for the given URL, store entries as feed_entry rows into the database
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
5 from FeedEntry import FeedEntry
e0199f383442 retrieve a feed for the given URL, store entries as feed_entry rows into the database
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
6 import feedparser
11
e87c54b3a216 use the logging framework for printing messages
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 10
7 import logging
4
e0199f383442 retrieve a feed for the given URL, store entries as feed_entry rows into the database
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
8
28
72dfae865899 better logging when updating feeds, handle entries that have no id
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 27
9 log = logging.getLogger("FeedUpdater")
9
fd4c8bfa62d6 FeedUpdater throws an exception if the URL could not be retrieved successfully. Includes unit tests.
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 7
10
4
e0199f383442 retrieve a feed for the given URL, store entries as feed_entry rows into the database
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
11 def updateAllFeeds(session):
35
aaec263f07ca Feeds manage the point in time when the next update should happen. FeedUpdater only updates feeds that are due.
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 28
12     allFeeds = findFeedsToUpdate(session)
4
e0199f383442 retrieve a feed for the given URL, store entries as feed_entry rows into the database
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
13     for feed in allFeeds:
10
01a86b178e60 catch the FeedUpdateException that might be raised when updating a feed, print it and continue with next feed
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 9
14         try:
01a86b178e60 catch the FeedUpdateException that might be raised when updating a feed, print it and continue with next feed
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 9
15             FeedUpdater(session, feed).update()
62
abc0516a1c0c FeedEntry provides a static method for creating new entries: better modularization and support for working with the class in interactive mode. FeedUpdater's normalize method is a module function now, again for ease of use in interactive scenarios
dirk@xanthippe.ping.de
parents: 58
16         except FeedUpdateException, fue:
abc0516a1c0c FeedEntry provides a static method for creating new entries: better modularization and support for working with the class in interactive mode. FeedUpdater's normalize method is a module function now, again for ease of use in interactive scenarios
dirk@xanthippe.ping.de
parents: 58
17             log.warn("problems while updating feed " + feed.rss_url + ": " + str(fue))
4
e0199f383442 retrieve a feed for the given URL, store entries as feed_entry rows into the database
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
18     session.commit()
e0199f383442 retrieve a feed for the given URL, store entries as feed_entry rows into the database
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
19
35
aaec263f07ca Feeds manage the point in time when the next update should happen. FeedUpdater only updates feeds that are due.
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 28
20 def findFeedsToUpdate(session):
aaec263f07ca Feeds manage the point in time when the next update should happen. FeedUpdater only updates feeds that are due.
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 28
21     return session.query(Feed).filter(Feed.next_update < datetime.now())
aaec263f07ca Feeds manage the point in time when the next update should happen. FeedUpdater only updates feeds that are due.
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 28
22
123
862760b161b4 restructured adding a feed so that only the URL is passed into the backend - the rest of the operation is backend-internal
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 121
23 def createNewFeed(url, session):
862760b161b4 restructured adding a feed so that only the URL is passed into the backend - the rest of the operation is backend-internal
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 121
24     # when updating to python3 see http://code.google.com/p/feedparser/issues/detail?id=260
862760b161b4 restructured adding a feed so that only the URL is passed into the backend - the rest of the operation is backend-internal
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 121
25     result = feedparser.parse(url)
862760b161b4 restructured adding a feed so that only the URL is passed into the backend - the rest of the operation is backend-internal
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 121
26     if result.has_key("title"):
862760b161b4 restructured adding a feed so that only the URL is passed into the backend - the rest of the operation is backend-internal
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 121
27         title = result["feed"].title
862760b161b4 restructured adding a feed so that only the URL is passed into the backend - the rest of the operation is backend-internal
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 121
28     else:
862760b161b4 restructured adding a feed so that only the URL is passed into the backend - the rest of the operation is backend-internal
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 121
29         title = url
862760b161b4 restructured adding a feed so that only the URL is passed into the backend - the rest of the operation is backend-internal
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 121
30     newFeed = Feed(title, url)
862760b161b4 restructured adding a feed so that only the URL is passed into the backend - the rest of the operation is backend-internal
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 121
31     session.add(newFeed)
862760b161b4 restructured adding a feed so that only the URL is passed into the backend - the rest of the operation is backend-internal
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 121
32
862760b161b4 restructured adding a feed so that only the URL is passed into the backend - the rest of the operation is backend-internal
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 121
33     FeedUpdater(session, newFeed).update()
862760b161b4 restructured adding a feed so that only the URL is passed into the backend - the rest of the operation is backend-internal
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 121
34
862760b161b4 restructured adding a feed so that only the URL is passed into the backend - the rest of the operation is backend-internal
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 121
35
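Line 26 tests the top-level parse result for a `title` key, while line 27 reads the title from its `feed` member, so the URL fallback may be taken even when the feed carries a title. Below is a minimal sketch of a fallback that checks the same mapping it reads from. `pickFeedTitle` is a hypothetical helper name that does not appear in the repository, and plain dicts stand in for the feedparser result:

```python
def pickFeedTitle(result, url):
    """Return the feed's own title, or the URL when no title is present."""
    # feedparser keeps feed-level metadata under the "feed" key
    feed = result.get("feed", {})
    title = feed.get("title")
    # a missing or empty title falls back to the URL
    return title if title else url
```

With this helper, `createNewFeed` could build the feed as `Feed(pickFeedTitle(result, url), url)`; treat that as a suggestion under the assumptions above rather than the author's fix.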
141
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 123
36 class FeedUpdater(AbstractFeedUpdater):
4
e0199f383442 retrieve a feed for the given URL, store entries as feed_entry rows into the database
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
37     def __init__(self, session, feed):
141
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 123
38         AbstractFeedUpdater.__init__(self, feed)
4
e0199f383442 retrieve a feed for the given URL, store entries as feed_entry rows into the database
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
39         self.session = session
e0199f383442 retrieve a feed for the given URL, store entries as feed_entry rows into the database
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
40
141
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 123
41     def _processEntry(self, entry):
4
e0199f383442 retrieve a feed for the given URL, store entries as feed_entry rows into the database
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
42         feedEntry = FeedEntry.findById(entry.id, self.session)
e0199f383442 retrieve a feed for the given URL, store entries as feed_entry rows into the database
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
43         if feedEntry is None:
141
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 123
44             self._createFeedEntry(entry)
100
99807963d9e0 use the URL as feed title if the feed itself does not come with a title
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 85
45
141
6ea813cfac33 pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 123
46     def _createFeedEntry(self, entry):
62
abc0516a1c0c FeedEntry provides a static method for creating new entries: better modularization and support for working with the class in interactive mode. FeedUpdater's normalize method is a module function now, again for ease of use in interactive scenarios
dirk@xanthippe.ping.de
parents: 58
47         new = FeedEntry.create(entry)
5
bfd47f55d85b add the updated date of the feed
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 4
48         new.feed = self.feed
bfd47f55d85b add the updated date of the feed
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 4
49         self.session.add(new)
66
97c4e94f99cf log when creating a new FeedEntry
dirk@xanthippe.ping.de
parents: 62
50         log.info("new feed entry: " + entry.title)
144
74217db92993 updating feeds on the couchdb backend works now
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 143
51
74217db92993 updating feeds on the couchdb backend works now
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 143
52     def _incrementFeedUpdateDate(self):
74217db92993 updating feeds on the couchdb backend works now
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 143
53         self.feed.incrementNextUpdateDate()
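The `FeedUpdater` class in this listing supplies only backend-specific hooks (`_processEntry`, `_incrementFeedUpdateDate`) and leaves the update loop to `AbstractFeedUpdater`, whose source is not shown here. Below is a minimal sketch of how such a base class could drive those hooks. `SketchFeedUpdater`, `InMemoryFeedUpdater` and the `_retrieveEntries` hook are hypothetical names introduced for illustration; the real `AbstractFeedUpdater` may be structured differently:

```python
class SketchFeedUpdater(object):
    """Assumed shape of the shared base class: update() is the template method."""

    def __init__(self, feed):
        self.feed = feed

    def update(self):
        # shared control flow: fetch, process each entry, schedule next update
        for entry in self._retrieveEntries():
            self._processEntry(entry)
        self._incrementFeedUpdateDate()

    def _retrieveEntries(self):
        # hypothetical hook; the real class presumably parses the feed's URL
        raise NotImplementedError

    def _processEntry(self, entry):
        raise NotImplementedError

    def _incrementFeedUpdateDate(self):
        raise NotImplementedError


class InMemoryFeedUpdater(SketchFeedUpdater):
    """Toy backend that stores entries in a dict instead of a database."""

    def __init__(self, feed, entries, store):
        SketchFeedUpdater.__init__(self, feed)
        self.entries = entries
        self.store = store
        self.updateDateIncremented = False

    def _retrieveEntries(self):
        return self.entries

    def _processEntry(self, entry):
        # mirror the listing: only create entries that are not stored yet
        if entry["id"] not in self.store:
            self.store[entry["id"]] = entry

    def _incrementFeedUpdateDate(self):
        self.updateDateIncremented = True
```

Each backend then overrides only the storage-related hooks, which matches the stated aim of having the sqlalchemy and couchdb backends inherit from one abstract class.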