annotate backend/sqlalchemy/FeedUpdater.py @ 121:510a5d00e98a backend

re-enabled AddFeed - does not work yet
author Dirk Olmes <dirk@xanthippe.ping.de>
date Sun, 21 Aug 2011 04:17:13 +0200
parents FeedUpdater.py@e4038dd8cc0e
children 862760b161b4
rev   line source
e0199f383442 retrieve a feed for the given URL, store entries as feed_entry rows into the database
Dirk Olmes <dirk@xanthippe.ping.de>
parents:
  4      1
bfd47f55d85b add the updated date of the feed
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 4
  5      2 from datetime import datetime
  4      3 from Feed import Feed
  4      4 from FeedEntry import FeedEntry
  4      5 import feedparser
e87c54b3a216 use the logging framework for printing messages
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 10
 11      6 import logging
  4      7
72dfae865899 better logging when updating feeds, handle entries that have no id
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 27
 28      8 STATUS_ERROR = 400
 28      9 log = logging.getLogger("FeedUpdater")
fd4c8bfa62d6 FeedUpdater throws an exception if the URL could not be retrieved successfully. Includes unit tests.
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 7
  9     10
  4     11 def updateAllFeeds(session):
aaec263f07ca Feeds manage the point in time when the next update should happen. FeedUpdater only updates feeds that are due.
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 28
 35     12     allFeeds = findFeedsToUpdate(session)
  4     13     for feed in allFeeds:
01a86b178e60 catch the FeedUpdateException that might be raised when updating a feed, print it and continue with next feed
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 9
 10     14         try:
 10     15             FeedUpdater(session, feed).update()
abc0516a1c0c FeedEntry provides a static method for creating new entries: better modularization and support for working with the class in interactive mode. FeedUpdater's normalize method is a module function now, again for ease of use in interactive scenarios
dirk@xanthippe.ping.de
parents: 58
 62     16         except FeedUpdateException, fue:
 62     17             log.warn("problems while updating feed " + feed.rss_url + ": " + str(fue))
  4     18     session.commit()
  4     19
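Note on updateAllFeeds: the loop on lines 13-17 isolates failures per feed, so one feed that raises FeedUpdateException is logged and the remaining feeds are still processed. A minimal sketch of that pattern follows. It is written for Python 3 (the annotated file is Python 2), and update_all, fake_update and the example.org URLs are hypothetical stand-ins, not part of the repository.

```python
import logging

log = logging.getLogger("FeedUpdaterSketch")


class FeedUpdateException(Exception):
    """Stand-in for the exception class defined at the end of the file."""


def update_all(urls, update_one):
    """Call update_one for every URL; log a failure and carry on.

    Returns the URLs whose update raised FeedUpdateException.
    """
    failed = []
    for url in urls:
        try:
            update_one(url)
        except FeedUpdateException as error:
            log.warning("problems while updating feed %s: %s", url, error)
            failed.append(url)
    return failed


seen = []


def fake_update(url):
    # Hypothetical updater: records the call and fails for one URL.
    seen.append(url)
    if url.endswith("/broken"):
        raise FeedUpdateException("HTTP status 404")


urls = ["http://example.org/a", "http://example.org/broken", "http://example.org/b"]
failed = update_all(urls, fake_update)
```

Here the third URL is still visited although the second one failed, and failed holds only the broken URL.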
 35     20 def findFeedsToUpdate(session):
 35     21     return session.query(Feed).filter(Feed.next_update < datetime.now())
 35     22
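Note on findFeedsToUpdate: the filter on line 21 selects feeds whose next_update lies strictly before the current time. The sketch below is an in-memory analogue of that comparison in Python 3, not the SQLAlchemy query itself. FeedStub and find_due are hypothetical names, and the timestamps are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FeedStub:
    """Hypothetical stand-in for the mapped Feed class."""
    rss_url: str
    next_update: datetime


def find_due(feeds, now):
    """Return the feeds whose next_update is strictly earlier than now."""
    return [feed for feed in feeds if feed.next_update < now]


now = datetime(2011, 8, 21, 4, 17, 13)
feeds = [
    FeedStub("http://example.org/due", now - timedelta(minutes=5)),
    FeedStub("http://example.org/later", now + timedelta(hours=1)),
    FeedStub("http://example.org/exactly-now", now),
]
due = find_due(feeds, now)
```

Because the comparison is strict, a feed scheduled for exactly the current instant is not picked up until a later run.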
 62     23 def normalize(entry):
 62     24     if not hasattr(entry, "id"):
 62     25         entry.id = entry.link
 62     26     if not hasattr(entry, "updated_parsed"):
 62     27         entry.updated_parsed = datetime.today()
 62     28     else:
 62     29         entry.updated_parsed = datetime(*entry.updated_parsed[:6])
 62     30     if not hasattr(entry, "summary"):
 62     31         if hasattr(entry, "content"):
 62     32             entry.summary = entry.content[0].value
 62     33         else:
 62     34             entry.summary = ""
 62     35
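Note on normalize: the function fills in three fields that a feed entry may lack (id, updated_parsed, summary) and converts the parsed time tuple into a datetime. A Python 3 sketch of the same rules is shown below. It uses types.SimpleNamespace as a stand-in for a feedparser entry, which is an assumption made so the example runs without feedparser. normalize_entry and the sample values are hypothetical.

```python
import time
from datetime import datetime
from types import SimpleNamespace


def normalize_entry(entry):
    """Apply the same defaults as normalize() and return the entry."""
    if not hasattr(entry, "id"):
        # Fall back to the link as a unique identifier.
        entry.id = entry.link
    if hasattr(entry, "updated_parsed"):
        # The first six fields of a time tuple are year..second.
        entry.updated_parsed = datetime(*entry.updated_parsed[:6])
    else:
        entry.updated_parsed = datetime.today()
    if not hasattr(entry, "summary"):
        if hasattr(entry, "content"):
            entry.summary = entry.content[0].value
        else:
            entry.summary = ""
    return entry


# An entry that only carries a link.
bare = normalize_entry(SimpleNamespace(link="http://example.org/post/1"))

# An entry with its own id, a parsed timestamp and a content list.
full = normalize_entry(SimpleNamespace(
    id="tag:example.org,2011:2",
    link="http://example.org/post/2",
    updated_parsed=time.gmtime(0),
    content=[SimpleNamespace(value="<p>body</p>")],
))
```

After the call, bare.id equals its link and bare.summary is the empty string, while full keeps its own id and takes its summary from the first content element.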
  4     36 class FeedUpdater(object):
  4     37     def __init__(self, session, feed):
  4     38         self.session = session
  4     39         self.feed = feed
99807963d9e0 use the URL as feed title if the feed itself does not come with a title
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 85
100     40
510a5d00e98a re-enabled AddFeed - does not work yet
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 112
121     41     # TODO this is a HACK! creating new instances from itself is bad but required due to the storage of the session.
121     42     def createNewFeed(self, url):
121     43         # when updating to python3 see http://code.google.com/p/feedparser/issues/detail?id=260
121     44         result = feedparser.parse(url)
121     45         if result.has_key("title"):
121     46             title = result["feed"].title
121     47         else:
121     48             title = url
121     49         newFeed = Feed(title, url)
121     50         self.session.add(newFeed)
121     51
121     52         FeedUpdater(self.session, newFeed).update()
121     53
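Note on createNewFeed: line 45 looks for a "title" key on the top-level parse result, while line 46 reads the title from result["feed"]. As far as I know, feedparser keeps feed-level metadata under the "feed" key, so if that holds the check on line 45 would never succeed and the URL would always be used as the title. The Python 3 sketch below shows the fallback with the lookup done on the "feed" mapping. It uses plain dicts in place of a real parse result, and pick_title is a hypothetical helper, not code from the repository.

```python
def pick_title(result, url):
    """Return the feed's own title, or the URL when no title is present.

    An empty title is treated like a missing one; that is a choice made
    for this sketch, not something the annotated code specifies.
    """
    feed = result.get("feed", {})
    return feed.get("title") or url


url = "http://example.org/feed.xml"
with_title = pick_title({"feed": {"title": "Example Feed"}}, url)
without_title = pick_title({"feed": {}}, url)
no_feed_key = pick_title({}, url)
```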
  4     54     def update(self):
 28     55         log.info("updating " + self.feed.rss_url)
  9     56         result = self.getFeed()
  4     57         for entry in result.entries:
  4     58             self.processEntry(entry)
 35     59         self.feed.incrementNextUpdateDate()
  4     60
  9     61     def getFeed(self):
  9     62         result = feedparser.parse(self.feed.rss_url)
b2a51c24f209 Provide a better error message if updating a feed fails.
Dirk Olmes <dirk@xanthippe.ping.de>
parents: 100
101     63         # bozo flags if a feed is well-formed.
101     64         # if result["bozo"] > 0:
101     65         #     raise FeedUpdateException()
101     66         status = result["status"]
101     67         if status >= STATUS_ERROR:
101     68             raise FeedUpdateException("HTTP status " + str(status))
  9     69         return result
  9     70
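Note on getFeed: any HTTP status of 400 or above is turned into a FeedUpdateException. Line 66 indexes result["status"] directly. The feedparser documentation describes status as present only when the feed was fetched from a web server, so a local file or a failed connection may leave the key out and raise KeyError instead. The Python 3 sketch below treats a missing status as a failure too. That handling is an assumption of this sketch, check_status is a hypothetical helper, and plain dicts stand in for the parse result.

```python
STATUS_ERROR = 400


class FeedUpdateException(Exception):
    """Stand-in for the exception class defined at the end of the file."""


def check_status(result):
    """Return result unchanged, or raise if the fetch did not succeed."""
    status = result.get("status")
    if status is None:
        # Assumption: no status means the feed was not fetched over HTTP.
        raise FeedUpdateException("no HTTP status available")
    if status >= STATUS_ERROR:
        raise FeedUpdateException("HTTP status " + str(status))
    return result


# Record which sample results pass the check.
outcomes = {}
for name, sample in [("ok", {"status": 200}),
                     ("not_found", {"status": 404}),
                     ("no_status", {})]:
    try:
        check_status(sample)
        outcomes[name] = "accepted"
    except FeedUpdateException as error:
        outcomes[name] = str(error)
```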
  4     71     def processEntry(self, entry):
 62     72         normalize(entry)
  4     73         feedEntry = FeedEntry.findById(entry.id, self.session)
  4     74         if feedEntry is None:
  4     75             self.createFeedEntry(entry)
100     76
  4     77     def createFeedEntry(self, entry):
 62     78         new = FeedEntry.create(entry)
  5     79         new.feed = self.feed
  5     80         self.session.add(new)
97c4e94f99cf log when creating a new FeedEntry
dirk@xanthippe.ping.de
parents: 62
 66     81         log.info("new feed entry: " + entry.title)
  9     82
  9     83 class FeedUpdateException(Exception):
 10     84     pass