annotate backend/sqlalchemy/FeedUpdater.py @ 155:a05719a6175e
move common functionality into an abstract backend class, have both backends inherit from it. Implement enough of the couchdb backend that reading feeds (and marking feed entries as read) is possible
author | Dirk Olmes <dirk@xanthippe.ping.de> |
---|---|
date | Sat, 27 Aug 2011 08:52:03 +0200 |
parents | 74217db92993 |
children | 86f828096aaf |
rev | line source |
---|---|
4 (e0199f383442: retrieve a feed for the given URL, store entries as feed_entry rows into the database) | |
141 (6ea813cfac33: pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.) | from backend.AbstractFeedUpdater import AbstractFeedUpdater, FeedUpdateException |
5 (bfd47f55d85b: add the updated date of the feed) | from datetime import datetime |
4 | from Feed import Feed |
4 | from FeedEntry import FeedEntry |
4 | import feedparser |
11 (e87c54b3a216: use the logging framework for printing messages) | import logging |
4 | |
28 (72dfae865899: better logging when updating feeds, handle entries that have no id) | log = logging.getLogger("FeedUpdater") |
9 (fd4c8bfa62d6: FeedUpdater throws an exception if the URL could not be retrieved successfully. Includes unit tests.) | |
4 | def updateAllFeeds(session): |
35 (aaec263f07ca: Feeds manage the point in time when the next update should happen. FeedUpdater only updates feeds that are due.) |     allFeeds = findFeedsToUpdate(session) |
4 |     for feed in allFeeds: |
10 (01a86b178e60: catch the FeedUpdateException that might be raised when updating a feed, print it and continue with next feed) |         try: |
10 |             FeedUpdater(session, feed).update() |
62 (abc0516a1c0c: FeedEntry provides a static method for creating new entries: better modularization and support for working with the class in interactive mode. FeedUpdater's normalize method is a module function now, again for ease of use in interactive scenarios) |         except FeedUpdateException, fue: |
62 |             log.warn("problems while updating feed " + feed.rss_url + ": " + str(fue)) |
4 |     session.commit() |
4 | |
35 | def findFeedsToUpdate(session): |
35 |     return session.query(Feed).filter(Feed.next_update < datetime.now()) |
35 | |
123 (862760b161b4: restructured adding a feed so that only the URL is passed into the backend - the rest of the operation is backend-internal) | def createNewFeed(url, session): |
123 |     # when updating to python3 see http://code.google.com/p/feedparser/issues/detail?id=260 |
123 |     result = feedparser.parse(url) |
123 |     if result.has_key("title"): |
123 |         title = result["feed"].title |
123 |     else: |
123 |         title = url |
123 |     newFeed = Feed(title, url) |
123 |     session.add(newFeed) |
123 | |
123 |     FeedUpdater(session, newFeed).update() |
123 | |
123 | |
141 | class FeedUpdater(AbstractFeedUpdater): |
4 |     def __init__(self, session, feed): |
141 |         AbstractFeedUpdater.__init__(self, feed) |
4 |         self.session = session |
4 | |
141 |     def _processEntry(self, entry): |
4 |         feedEntry = FeedEntry.findById(entry.id, self.session) |
4 |         if feedEntry is None: |
141 |             self._createFeedEntry(entry) |
100 (99807963d9e0: use the URL as feed title if the feed itself does not come with a title) | |
141 |     def _createFeedEntry(self, entry): |
62 |         new = FeedEntry.create(entry) |
5 |         new.feed = self.feed |
5 |         self.session.add(new) |
66 |         log.info("new feed entry: " + entry.title) |
144 (74217db92993: updating feeds on the couchdb backend works now) | |
144 |     def _incrementFeedUpdateDate(self): |
144 |         self.feed.incrementNextUpdateDate() |
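
The listing above is Python 2 code from 2011: `except FeedUpdateException, fue`, `dict.has_key()` and `log.warn` all need attention under Python 3, which the comment in `createNewFeed` already anticipates. The following is a minimal sketch of how the update loop and the title fallback might look after such a port. It is not taken from the repository. `feed_title`, `update_all` and `make_updater` are hypothetical names, `FeedUpdateException` is a local stand-in for the class imported from `backend.AbstractFeedUpdater`, and the title lookup assumes the intent was to test the feed-level title in `result["feed"]` rather than a top-level `"title"` key.

```python
import logging

log = logging.getLogger("FeedUpdater")


class FeedUpdateException(Exception):
    """Stand-in for the exception imported from backend.AbstractFeedUpdater."""


def feed_title(result, url):
    """Return the parsed feed's title, falling back to the URL.

    `result` is assumed to be dict-like with a "feed" mapping, the shape
    returned by feedparser.parse(); plain dicts are enough for this sketch.
    """
    feed = result.get("feed", {})
    # Python 3 removed dict.has_key(); .get() or `in` replaces it.
    title = feed.get("title")
    return title if title else url


def update_all(feeds, make_updater):
    """Update every feed, log failures and continue with the next feed.

    `make_updater` is a hypothetical factory standing in for
    FeedUpdater(session, feed); it must return an object with update().
    Returns the feeds whose update raised FeedUpdateException.
    """
    failed = []
    for feed in feeds:
        try:
            make_updater(feed).update()
        except FeedUpdateException as fue:  # Python 3 form of `except X, fue`
            # log.warning replaces the deprecated log.warn
            log.warning("problems while updating feed %s: %s", feed, fue)
            failed.append(feed)
    return failed
```

In the original module the caller would still run `session.commit()` after the loop; the sketch leaves the session out so that it runs without SQLAlchemy or a database.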