Mercurial > hg > Feedworm
annotate FeedUpdater.py @ 118:0e73adb2dec4 backend
branch for extracting backends
author | Dirk Olmes <dirk@xanthippe.ping.de> |
---|---|
date | Sun, 21 Aug 2011 02:47:25 +0200 |
parents | e4038dd8cc0e |
children |
rev | line source |
---|---|
4 | 1 |
5 | 2 from datetime import datetime |
4 | 3 from Feed import Feed |
4 | 4 from FeedEntry import FeedEntry |
4 | 5 import feedparser |
11 | 6 import logging |
4 | 7 |
28 | 8 STATUS_ERROR = 400 |
28 | 9 log = logging.getLogger("FeedUpdater") |
9 | 10 |
4 | 11 def updateAllFeeds(session): |
35 | 12 allFeeds = findFeedsToUpdate(session) |
4 | 13 for feed in allFeeds: |
10 | 14 try: |
10 | 15 FeedUpdater(session, feed).update() |
62 | 16 except FeedUpdateException, fue: |
62 | 17 log.warn("problems while updating feed " + feed.rss_url + ": " + str(fue)) |
4 | 18 session.commit() |
4 | 19 |
35 | 20 def findFeedsToUpdate(session): |
35 | 21 return session.query(Feed).filter(Feed.next_update < datetime.now()) |
35 | 22 |
27 | 23 def createNewFeed(url, session): |
112 | 24 # when updating to python3 see http://code.google.com/p/feedparser/issues/detail?id=260 |
27 | 25 result = feedparser.parse(url) |
100 | 26 if "title" in result["feed"]: |
100 | 27 title = result["feed"].title |
100 | 28 else: |
100 | 29 title = url |
27 | 30 newFeed = Feed(title, url) |
27 | 31 session.add(newFeed) |
100 | 32 |
45 | 33 FeedUpdater(session, newFeed).update() |
27 | 34 |
62 | 35 def normalize(entry): |
62 | 36 if not hasattr(entry, "id"): |
62 | 37 entry.id = entry.link |
62 | 38 if not hasattr(entry, "updated_parsed"): |
62 | 39 entry.updated_parsed = datetime.today() |
62 | 40 else: |
62 | 41 entry.updated_parsed = datetime(*entry.updated_parsed[:6]) |
62 | 42 if not hasattr(entry, "summary"): |
62 | 43 if hasattr(entry, "content"): |
62 | 44 entry.summary = entry.content[0].value |
62 | 45 else: |
62 | 46 entry.summary = "" |
62 | 47 |
4 | 48 class FeedUpdater(object): |
4 | 49 def __init__(self, session, feed): |
4 | 50 self.session = session |
4 | 51 self.feed = feed |
100 | 52 |
4 | 53 def update(self): |
28 | 54 log.info("updating " + self.feed.rss_url) |
9 | 55 result = self.getFeed() |
4 | 56 for entry in result.entries: |
4 | 57 self.processEntry(entry) |
35 | 58 self.feed.incrementNextUpdateDate() |
4 | 59 |
9 | 60 def getFeed(self): |
9 | 61 result = feedparser.parse(self.feed.rss_url) |
101 | 62 # bozo flags if a feed is well-formed. |
101 | 63 # if result["bozo"] > 0: |
101 | 64 # raise FeedUpdateException() |
101 | 65 status = result["status"] |
101 | 66 if status >= STATUS_ERROR: |
101 | 67 raise FeedUpdateException("HTTP status " + str(status)) |
9 | 68 return result |
9 | 69 |
4 | 70 def processEntry(self, entry): |
62 | 71 normalize(entry) |
4 | 72 feedEntry = FeedEntry.findById(entry.id, self.session) |
4 | 73 if feedEntry is None: |
4 | 74 self.createFeedEntry(entry) |
100 | 75 |
4 | 76 def createFeedEntry(self, entry): |
62 | 77 new = FeedEntry.create(entry) |
5 | 78 new.feed = self.feed |
5 | 79 self.session.add(new) |
66 | 80 log.info("new feed entry: " + entry.title) |
9 | 81 |
9 | 82 class FeedUpdateException(Exception): |
10 | 83 pass |

rev | changeset | description | author | parents |
---|---|---|---|---|
4 | e0199f383442 | retrieve a feed for the given URL, store entries as feed_entry rows into the database | Dirk Olmes <dirk@xanthippe.ping.de> | |
5 | bfd47f55d85b | add the updated date of the feed | Dirk Olmes <dirk@xanthippe.ping.de> | 4 |
9 | fd4c8bfa62d6 | FeedUpdater throws an exception if the URL could not be retrieved successfully. Includes unit tests. | Dirk Olmes <dirk@xanthippe.ping.de> | 7 |
10 | 01a86b178e60 | catch the FeedUpdateException that might be raised when updating a feed, print it and continue with next feed | Dirk Olmes <dirk@xanthippe.ping.de> | 9 |
11 | e87c54b3a216 | use the logging framework for printing messages | Dirk Olmes <dirk@xanthippe.ping.de> | 10 |
27 | bdd1296a4b8c | implemented adding a feed | Dirk Olmes <dirk@xanthippe.ping.de> | 11 |
28 | 72dfae865899 | better logging when updating feeds, handle entries that have no id | Dirk Olmes <dirk@xanthippe.ping.de> | 27 |
35 | aaec263f07ca | Feeds manage the point in time when the next update should happen. FeedUpdater only updates feeds that are due. | Dirk Olmes <dirk@xanthippe.ping.de> | 28 |
45 | 0604e374c1d6 | pass session when creating a new feed | Dirk Olmes <dirk@xanthippe.ping.de> | 35 |
62 | abc0516a1c0c | FeedEntry provides a static method for creating new entries: better modularization and support for working with the class in interactive mode. FeedUpdater's normalize method is a module function now, again for ease of use in interactive scenarios | dirk@xanthippe.ping.de | 58 |
100 | 99807963d9e0 | use the URL as feed title if the feed itself does not come with a title | Dirk Olmes <dirk@xanthippe.ping.de> | 85 |
101 | b2a51c24f209 | Provide a better error message if updating a feed fails. | Dirk Olmes <dirk@xanthippe.ping.de> | 100 |
112 | e4038dd8cc0e | add a comment about a Feedparser bug wrt python3 | Dirk Olmes <dirk@xanthippe.ping.de> | 101 |
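
The title fallback in `createNewFeed` and the status check in `getFeed` can be sketched in isolation. The following is a minimal illustration only, not the project's code: it assumes the parse result behaves like a dictionary with a `"feed"` mapping and an optional `"status"` key, and the helper names `feed_title` and `check_status` are hypothetical. Treating a missing `"status"` as a failure is likewise an assumption made for this sketch.

```python
STATUS_ERROR = 400  # same threshold as in the listing


class FeedUpdateException(Exception):
    """Raised when a feed could not be retrieved."""


def feed_title(result, url):
    """Return the feed's title, falling back to the URL (hypothetical helper)."""
    feed = result.get("feed", {})
    title = feed.get("title")
    return title if title else url


def check_status(result):
    """Raise FeedUpdateException for HTTP error statuses (hypothetical helper).

    Assumption: a result without a "status" key, for example when no HTTP
    response was received, is also reported as a failure.
    """
    status = result.get("status")
    if status is None:
        raise FeedUpdateException("no HTTP status available")
    if status >= STATUS_ERROR:
        raise FeedUpdateException("HTTP status " + str(status))
    return result


if __name__ == "__main__":
    # Plain dictionaries stand in for a parsed feed result here.
    print(feed_title({"feed": {"title": "Example"}}, "http://example.org/rss"))
    print(feed_title({"feed": {}}, "http://example.org/rss"))
    try:
        check_status({"status": 404})
    except FeedUpdateException as fue:
        print("problems while updating feed: " + str(fue))
```

Reporting a missing status as a `FeedUpdateException` would let a caller such as `updateAllFeeds` log the problem and continue with the next feed, whereas an uncaught `KeyError` would stop the whole run.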