Mercurial > hg > Feedworm
annotate backend/sqlalchemy/FeedUpdater.py @ 141:6ea813cfac33
pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class.
author | Dirk Olmes <dirk@xanthippe.ping.de> |
---|---|
date | Wed, 24 Aug 2011 10:53:46 +0200 |
parents | 862760b161b4 |
children | 1524e1cefd39 |
rev | line | source |
---|---|---|
4 | 1 | |
141 | 2 | `from backend.AbstractFeedUpdater import AbstractFeedUpdater, FeedUpdateException` |
5 | 3 | `from datetime import datetime` |
4 | 4 | `from Feed import Feed` |
4 | 5 | `from FeedEntry import FeedEntry` |
4 | 6 | `import feedparser` |
11 | 7 | `import logging` |
4 | 8 | |
28 | 9 | `STATUS_ERROR = 400` |
28 | 10 | `log = logging.getLogger("FeedUpdater")` |
9 | 11 | |
4 | 12 | `def updateAllFeeds(session):` |
35 | 13 | `allFeeds = findFeedsToUpdate(session)` |
4 | 14 | `for feed in allFeeds:` |
10 | 15 | `try:` |
10 | 16 | `FeedUpdater(session, feed).update()` |
62 | 17 | `except FeedUpdateException, fue:` |
62 | 18 | `log.warn("problems while updating feed " + feed.rss_url + ": " + str(fue))` |
4 | 19 | `session.commit()` |
4 | 20 | |
35 | 21 | `def findFeedsToUpdate(session):` |
35 | 22 | `return session.query(Feed).filter(Feed.next_update < datetime.now())` |
35 | 23 | |
62 | 24 | `def normalize(entry):` |
62 | 25 | `if not hasattr(entry, "id"):` |
62 | 26 | `entry.id = entry.link` |
62 | 27 | `if not hasattr(entry, "updated_parsed"):` |
62 | 28 | `entry.updated_parsed = datetime.today()` |
62 | 29 | `else:` |
62 | 30 | `entry.updated_parsed = datetime(*entry.updated_parsed[:6])` |
62 | 31 | `if not hasattr(entry, "summary"):` |
62 | 32 | `if hasattr(entry, "content"):` |
62 | 33 | `entry.summary = entry.content[0].value` |
62 | 34 | `else:` |
62 | 35 | `entry.summary = ""` |
62 | 36 | |
123 | 37 | `def createNewFeed(url, session):` |
123 | 38 | `# when updating to python3 see http://code.google.com/p/feedparser/issues/detail?id=260` |
123 | 39 | `result = feedparser.parse(url)` |
123 | 40 | `if result.has_key("title"):` |
123 | 41 | `title = result["feed"].title` |
123 | 42 | `else:` |
123 | 43 | `title = url` |
123 | 44 | `newFeed = Feed(title, url)` |
123 | 45 | `session.add(newFeed)` |
123 | 46 | |
123 | 47 | `FeedUpdater(session, newFeed).update()` |
123 | 48 | |
123 | 49 | |
141 | 50 | `class FeedUpdater(AbstractFeedUpdater):` |
4 | 51 | `def __init__(self, session, feed):` |
141 | 52 | `AbstractFeedUpdater.__init__(self, feed)` |
4 | 53 | `self.session = session` |
4 | 54 | |
141 | 55 | `def _processEntry(self, entry):` |
4 | 56 | `feedEntry = FeedEntry.findById(entry.id, self.session)` |
4 | 57 | `if feedEntry is None:` |
141 | 58 | `self._createFeedEntry(entry)` |
100 | 59 | |
141 | 60 | `def _createFeedEntry(self, entry):` |
62 | 61 | `new = FeedEntry.create(entry)` |
5 | 62 | `new.feed = self.feed` |
5 | 63 | `self.session.add(new)` |
66 | 64 | `log.info("new feed entry: " + entry.title)` |

rev | changeset | author | parents | description |
---|---|---|---|---|
4 | e0199f383442 | Dirk Olmes <dirk@xanthippe.ping.de> | | retrieve a feed for the given URL, store entries as feed_entry rows into the database |
5 | bfd47f55d85b | Dirk Olmes <dirk@xanthippe.ping.de> | 4 | add the updated date of the feed |
9 | fd4c8bfa62d6 | Dirk Olmes <dirk@xanthippe.ping.de> | 7 | FeedUpdater throws an exception if the URL could not be retrieved successfully. Includes unit tests. |
10 | 01a86b178e60 | Dirk Olmes <dirk@xanthippe.ping.de> | 9 | catch the FeedUpdateException that might be raised when updating a feed, print it and continue with next feed |
11 | e87c54b3a216 | Dirk Olmes <dirk@xanthippe.ping.de> | 10 | use the logging framework for printing messages |
28 | 72dfae865899 | Dirk Olmes <dirk@xanthippe.ping.de> | 27 | better logging when updating feeds, handle entries that have no id |
35 | aaec263f07ca | Dirk Olmes <dirk@xanthippe.ping.de> | 28 | Feeds manage the point in time when the next update should happen. FeedUpdater only updates feeds that are due. |
62 | abc0516a1c0c | dirk@xanthippe.ping.de | 58 | FeedEntry provides a static method for creating new entries: better modularization and support for working with the class in interactive mode. FeedUpdater's normalize method is a module function now, again for ease of use in interactive scenarios |
100 | 99807963d9e0 | Dirk Olmes <dirk@xanthippe.ping.de> | 85 | use the URL as feed title if the feed itself does not come with a title |
123 | 862760b161b4 | Dirk Olmes <dirk@xanthippe.ping.de> | 121 | restructured adding a feed so that only the URL is passed into the backend - the rest of the operation is backend-internal |
141 | 6ea813cfac33 | Dirk Olmes <dirk@xanthippe.ping.de> | 123 | pull out common code for updating a feed into an abstract class, have the sqlalchemy backend use that class. |
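
The `normalize` function on lines 24 to 35 of the annotated source fills in fields that the rest of the updater relies on. The following is a minimal sketch of that step, not the repository's code: it is written for Python 3 (the annotated revision targets Python 2), the nesting is one plausible reading of the listing (the annotate view does not preserve indentation), and `SimpleNamespace` is only a hypothetical stand-in for a parsed feedparser entry.

```python
from datetime import datetime
from time import struct_time
from types import SimpleNamespace


def normalize(entry):
    """Fill in id, updated_parsed and summary when the feed left them out."""
    if not hasattr(entry, "id"):
        # Fall back to the entry's link as its identifier.
        entry.id = entry.link
    if not hasattr(entry, "updated_parsed"):
        entry.updated_parsed = datetime.today()
    else:
        # The first six fields of a time tuple are year, month, day,
        # hour, minute and second, which is what datetime() expects.
        entry.updated_parsed = datetime(*entry.updated_parsed[:6])
    if not hasattr(entry, "summary"):
        if hasattr(entry, "content"):
            entry.summary = entry.content[0].value
        else:
            entry.summary = ""


# Hypothetical entry: no id and no summary, but a link, a timestamp and content.
entry = SimpleNamespace(
    link="http://example.org/post/1",
    updated_parsed=struct_time((2011, 8, 24, 10, 53, 46, 2, 236, 0)),
    content=[SimpleNamespace(value="<p>full text</p>")],
)
normalize(entry)
```

After the call, `entry.id` holds the link, `entry.updated_parsed` is a `datetime` instead of a time tuple, and `entry.summary` carries the first content block. Note that the function mutates the entry in place, as the annotated version does.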