Mercurial > hg > Feedworm
annotate backend/sqlalchemy/FeedUpdater.py @ 166:04c3b9796b89
feedparser now uses the proxy if one is configured. Implementing this required some changes to the FeedUpdater; the sqlalchemy backend has not been refactored yet.
author | dirk |
---|---|
date | Sat, 03 Sep 2011 04:12:35 +0200 |
parents | 86f828096aaf |
children | a3c945ce434c |
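
The changeset description says feedparser now goes through a configured proxy, but this page does not show how that is wired in. As a rough sketch only, assuming Python 3 and feedparser's documented `handlers` argument, one way to pass a proxy looks like the following. The helper names `build_proxy_handlers` and `parse_feed` and the `proxy_url` parameter are hypothetical and do not come from the Feedworm source.

```python
import urllib.request


def build_proxy_handlers(proxy_url):
    """Return a list of urllib handlers for an optional proxy URL.

    Hypothetical helper: an empty list means "no proxy configured".
    """
    if not proxy_url:
        return []
    proxies = {"http": proxy_url, "https": proxy_url}
    return [urllib.request.ProxyHandler(proxies)]


def parse_feed(url, proxy_url=None):
    """Fetch and parse a feed, routing the request through the proxy if given."""
    # Imported lazily so build_proxy_handlers works without feedparser installed.
    import feedparser

    return feedparser.parse(url, handlers=build_proxy_handlers(proxy_url))
```

Keeping the handler construction separate from the fetch means the proxy setting can be checked without any network access.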
rev | changeset | line | source |
---|---|---|---|
4 | e0199f383442 | 1 | |
141 | 6ea813cfac33 | 2 | from backend.AbstractFeedUpdater import AbstractFeedUpdater, FeedUpdateException |
5 | bfd47f55d85b | 3 | from datetime import datetime |
4 | e0199f383442 | 4 | from Feed import Feed |
4 | e0199f383442 | 5 | from FeedEntry import FeedEntry |
4 | e0199f383442 | 6 | import feedparser |
11 | e87c54b3a216 | 7 | import logging |
4 | e0199f383442 | 8 | |
28 | 72dfae865899 | 9 | log = logging.getLogger("FeedUpdater") |
9 | fd4c8bfa62d6 | 10 | |
4 | e0199f383442 | 11 | def updateAllFeeds(session): |
35 | aaec263f07ca | 12 | allFeeds = findFeedsToUpdate(session) |
4 | e0199f383442 | 13 | for feed in allFeeds: |
10 | 01a86b178e60 | 14 | try: |
10 | 01a86b178e60 | 15 | FeedUpdater(session, feed).update() |
62 | abc0516a1c0c | 16 | except FeedUpdateException, fue: |
62 | abc0516a1c0c | 17 | log.warn("problems while updating feed " + feed.rss_url + ": " + str(fue)) |
4 | e0199f383442 | 18 | session.commit() |
4 | e0199f383442 | 19 | |
35 | aaec263f07ca | 20 | def findFeedsToUpdate(session): |
35 | aaec263f07ca | 21 | return session.query(Feed).filter(Feed.next_update < datetime.now()) |
35 | aaec263f07ca | 22 | |
123 | 862760b161b4 | 23 | def createNewFeed(url, session): |
123 | 862760b161b4 | 24 | # when updating to python3 see http://code.google.com/p/feedparser/issues/detail?id=260 |
123 | 862760b161b4 | 25 | result = feedparser.parse(url) |
123 | 862760b161b4 | 26 | if result.has_key("title"): |
123 | 862760b161b4 | 27 | title = result["feed"].title |
123 | 862760b161b4 | 28 | else: |
123 | 862760b161b4 | 29 | title = url |
123 | 862760b161b4 | 30 | newFeed = Feed(title, url) |
123 | 862760b161b4 | 31 | session.add(newFeed) |
123 | 862760b161b4 | 32 | |
160 | 86f828096aaf | 33 | FeedUpdater(session, newFeed).update(result) |
123 | 862760b161b4 | 34 | |
123 | 862760b161b4 | 35 | |
141 | 6ea813cfac33 | 36 | class FeedUpdater(AbstractFeedUpdater): |
4 | e0199f383442 | 37 | def __init__(self, session, feed): |
141 | 6ea813cfac33 | 38 | AbstractFeedUpdater.__init__(self, feed) |
4 | e0199f383442 | 39 | self.session = session |
4 | e0199f383442 | 40 | |
141 | 6ea813cfac33 | 41 | def _processEntry(self, entry): |
4 | e0199f383442 | 42 | feedEntry = FeedEntry.findById(entry.id, self.session) |
4 | e0199f383442 | 43 | if feedEntry is None: |
141 | 6ea813cfac33 | 44 | self._createFeedEntry(entry) |
100 | 99807963d9e0 | 45 | |
141 | 6ea813cfac33 | 46 | def _createFeedEntry(self, entry): |
62 | abc0516a1c0c | 47 | new = FeedEntry.create(entry) |
5 | bfd47f55d85b | 48 | new.feed = self.feed |
5 | bfd47f55d85b | 49 | self.session.add(new) |
66 | | 50 | log.info("new feed entry: " + entry.title) |
144 | 74217db92993 | 51 | |
144 | 74217db92993 | 52 | def _incrementFeedUpdateDate(self): |
144 | 74217db92993 | 53 | self.feed.incrementNextUpdateDate() |

rev | changeset | author | parents | description |
---|---|---|---|---|
4 | e0199f383442 | Dirk Olmes <dirk@xanthippe.ping.de> | | retrieve a feed for the given URL, store entries as feed_entry rows into the database |
5 | bfd47f55d85b | Dirk Olmes <dirk@xanthippe.ping.de> | 4 | add the updated date of the feed |
9 | fd4c8bfa62d6 | Dirk Olmes <dirk@xanthippe.ping.de> | 7 | FeedUpdater throws an exception if the URL could not be retrieved successfully. Includes unit tests. |
10 | 01a86b178e60 | Dirk Olmes <dirk@xanthippe.ping.de> | 9 | catch the FeedUpdateException that might be raised when updating a feed, print it and continue with next feed |
11 | e87c54b3a216 | Dirk Olmes <dirk@xanthippe.ping.de> | 10 | use the logging framework for printing messages |
28 | 72dfae865899 | Dirk Olmes <dirk@xanthippe.ping.de> | 27 | better logging when updating feeds, handle entries that have no id |
35 | aaec263f07ca | Dirk Olmes <dirk@xanthippe.ping.de> | 28 | Feeds manage the point in time when the next update should happen. FeedUpdater only updates feeds that are due. |
62 | abc0516a1c0c | dirk@xanthippe.ping.de | 58 | FeedEntry provides a static method for creating new entries: better modularization and support for working with the class in interactive mode. FeedUpdater's normalize method is a module function now, again for ease of use in interactive scenarios |
100 | 99807963d9e0 | Dirk Olmes <dirk@xanthippe.ping.de> | 85 | use the URL as feed title if the feed itself does not come with a title |
123 | 862760b161b4 | Dirk Olmes <dirk@xanthippe.ping.de> | 121 | restructured adding a feed so that only the URL is passed into the backend - the rest of the operation is backend-internal |
141 | 6ea813cfac33 | Dirk Olmes <dirk@xanthippe.ping.de> | 123 | pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class. |
144 | 74217db92993 | Dirk Olmes <dirk@xanthippe.ping.de> | 143 | updating feeds on the couchdb backend works now |
160 | 86f828096aaf | dirk | 144 | Do not fetch and parse the feed twice when creating a new one. Pass the parsed info into the update method instead to reuse. |
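
The control flow of `updateAllFeeds` and `findFeedsToUpdate` (source lines 11 to 21) is hard to read once indentation is lost. The following is a minimal, self-contained sketch of the same pattern in Python 3: select only the feeds that are due, update each one, and log a `FeedUpdateException` without aborting the remaining feeds. It uses in-memory stand-ins rather than SQLAlchemy and feedparser. The `Feed` class, the `broken` and `updated` attributes, `update_feed`, and the returned list of failed URLs are assumptions made for illustration and are not part of the annotated file.

```python
import logging
from datetime import datetime, timedelta

log = logging.getLogger("FeedUpdaterSketch")


class FeedUpdateException(Exception):
    """Stand-in for the exception imported from backend.AbstractFeedUpdater."""


class Feed:
    """Hypothetical in-memory feed; the real Feed is a database-backed model."""

    def __init__(self, rss_url, next_update, broken=False):
        self.rss_url = rss_url
        self.next_update = next_update
        self.broken = broken      # illustration only: simulates a failing fetch
        self.updated = False      # illustration only: records a successful update


def find_feeds_to_update(feeds, now):
    # Same condition as Feed.next_update < datetime.now(): only feeds that are due.
    return [feed for feed in feeds if feed.next_update < now]


def update_feed(feed):
    # Placeholder for FeedUpdater(session, feed).update().
    if feed.broken:
        raise FeedUpdateException("could not retrieve " + feed.rss_url)
    feed.updated = True


def update_all_feeds(feeds, now=None):
    """Update every due feed; a failing feed is logged and skipped."""
    now = now or datetime.now()
    failed = []
    for feed in find_feeds_to_update(feeds, now):
        try:
            update_feed(feed)
        except FeedUpdateException as fue:  # Python 3 form of "except X, fue"
            log.warning("problems while updating feed %s: %s", feed.rss_url, fue)
            failed.append(feed.rss_url)
    return failed
```

The annotated file additionally calls `session.commit()` once after the loop; the sketch has no database session, so that step is omitted.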