Problem solved; see the solution in the accepted answer.
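I'm trying to collect 50 tweets from a specified geographic area. The code below prints 50 tweets, but many of them have no coordinates. Does that mean the tweets showing "None" were not posted from the specified area? Can you explain what is happening here, and how I can collect 50 tweets from that area? Thanks in advance.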
# Import Tweepy, sys, sleep, credentials.py
try:
    import json
except ImportError:
    import simplejson as json
import tweepy, sys
from time import sleep
from credentials import *

# Access and authorize our Twitter credentials from credentials.py
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

# Assign coordinates to the variable
# Order: SW longitude, SW latitude, NE longitude, NE latitude
box = [-74.0,40.73,-73.0,41.73]

#override tweepy.StreamListener to add logic to on_status
class MyStreamListener(tweepy.StreamListener):
    def __init__(self, api=None):
        super(MyStreamListener, self).__init__()
        self.counter = 0

    def on_status(self, status):
        record = {'Text': status.text, 'Coordinates': status.coordinates, 'Created At': status.created_at}
        self.counter += 1
        if self.counter <= 50:
            print record
            return True
        else:
            # Returning False stops the stream after 50 tweets
            return False

    def on_error(self, status_code):
        if status_code == 420:
            #returning False in on_data disconnects the stream
            return False

myStreamListener = MyStreamListener()
myStream = tweepy.Stream(api.auth, listener=myStreamListener)
myStream.filter(locations=box, async=True)
print myStream
Here are the results:
{'Text': u"What?...", 'Created At': datetime.datetime(2017, 3, 12, 2, 55, 6), 'Coordinates': {u'type': u'Point', u'coordinates': [-74.1234567, 40.1234567]}}
{'Text': u'WHEN?...', 'Created At': datetime.datetime(2017, 3, 12, 2, 55, 8), 'Coordinates': None}
{'Text': u'Wooo...', 'Created At': datetime.datetime(2017, 3, 12, 2, 55, 9), 'Coordinates': None}
{'Text': u'Man...', 'Created At': datetime.datetime(2017, 3, 12, 2, 55, 9), 'Coordinates': None}
{'Text': u'The...', 'Created At': datetime.datetime(2017, 3, 12, 2, 55, 10), 'Coordinates': None}
Solution:
From the documentation:
Only geolocated Tweets falling within the requested bounding boxes will be included—unlike the Search API, the user’s location field is not used to filter Tweets.
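In other words, every tweet in the response is guaranteed to come from the bounding box you supplied.
How does the bounding box filter work?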
The streaming API uses the following heuristic to determine whether a given Tweet falls within a bounding box:
- If the coordinates field is populated, the values there will be tested against the bounding box. Note that this field uses geoJSON order (longitude, latitude).
- If coordinates is empty but place is populated, the region defined in place is checked for intersection against the locations bounding box. Any overlap will match.
- If none of the rules listed above match, the Tweet does not match the location query.
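To make the three rules concrete, here is a rough client-side approximation in plain Python. It assumes tweets are plain dicts shaped like the v1.1 Tweet JSON (in Tweepy, status._json), that place.bounding_box is a GeoJSON polygon, and that box uses the same [west, south, east, north] order as the locations parameter in the question. The helper names are hypothetical, and this is a sketch of the documented behaviour, not Twitter's server-side code.

```python
# Rough client-side approximation of the documented bounding-box heuristic.
# Assumes v1.1-style tweet dicts (e.g. status._json in Tweepy); all helper
# names are hypothetical and this is NOT Twitter's actual implementation.

def point_in_box(lon, lat, box):
    """box is [sw_lon, sw_lat, ne_lon, ne_lat], the order used by `locations`."""
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


def boxes_overlap(a, b):
    """True if two [west, south, east, north] boxes share any area or edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def place_to_box(place):
    """Collapse place.bounding_box (a GeoJSON polygon) into [west, south, east, north]."""
    ring = place["bounding_box"]["coordinates"][0]
    lons = [pt[0] for pt in ring]
    lats = [pt[1] for pt in ring]
    return [min(lons), min(lats), max(lons), max(lats)]


def matches_locations(tweet, box):
    coords = tweet.get("coordinates")
    if coords:
        # Rule 1: an exact point exists; test it (GeoJSON order: lon, lat)
        lon, lat = coords["coordinates"]
        return point_in_box(lon, lat, box)
    place = tweet.get("place")
    if place and place.get("bounding_box"):
        # Rule 2: no point, so any overlap between the place and the box matches
        return boxes_overlap(place_to_box(place), box)
    # Rule 3: neither rule applies, so the tweet does not match
    return False


box = [-74.0, 40.73, -73.0, 41.73]
# Made-up place polygon that only partially overlaps the box
place_only = {
    "coordinates": None,
    "place": {"bounding_box": {"type": "Polygon", "coordinates": [[
        [-74.26, 40.49], [-74.26, 40.92], [-73.70, 40.92], [-73.70, 40.49]]]}},
}
print(matches_locations(place_only, box))
```

The place-only tweet in the usage example matches even though much of its (made-up) polygon lies outside the box, which is exactly the case where coordinates comes back as None.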
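So the coordinates field can be None, yet the bounding-box filter still guarantees that every returned tweet comes from the bounding-box region: a tweet without coordinates matched because its place overlaps the box (possibly only partially).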
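If what you actually want is 50 tweets that each carry an exact point, one option is to count only tweets whose coordinates field is populated and skip those that matched through place alone. The sketch below shows that selection logic on plain dicts, assuming v1.1-style tweet JSON (e.g. status._json in Tweepy); take_geotagged is a hypothetical helper, and the iterable stands in for the stream.

```python
def take_geotagged(tweets, limit=50):
    """Return the first `limit` tweets whose `coordinates` field is populated.

    `tweets` is any iterable of v1.1-style tweet dicts (assumption); tweets
    that matched the location filter only via `place` are skipped.
    """
    if limit <= 0:
        return []
    picked = []
    for tweet in tweets:
        if tweet.get("coordinates") is None:
            continue  # matched via `place` only, no exact point
        picked.append(tweet)
        if len(picked) == limit:
            break  # stop pulling from the source once we have enough
    return picked


sample = [
    {"coordinates": None},
    {"coordinates": {"type": "Point", "coordinates": [-73.9, 40.8]}},
    {"coordinates": None},
]
print(len(take_geotagged(sample)))
```

Inside the listener from the question, the equivalent is to return True early from on_status when status.coordinates is None, before incrementing self.counter. Judging by the output above, most matching tweets lack coordinates, so expect collecting 50 this way to take considerably longer.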
Source: https://dev.twitter.com/streaming/overview/request-parameters#locations
Edit: place is a field in the response, just like coordinates.
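Because place can be present even when coordinates is None, a common fallback is to use the centre of the place's bounding box as a coarse location. A minimal sketch, assuming v1.1-style tweet dicts (status._json in Tweepy) whose place.bounding_box is a GeoJSON polygon; approximate_location is a hypothetical helper name. A place can be as large as a whole city, so treat the result as approximate.

```python
def approximate_location(tweet):
    """Return (lon, lat) for a v1.1-style tweet dict, or None.

    Prefers the exact `coordinates` point; otherwise falls back to the
    midpoint of `place.bounding_box`, which is only a coarse estimate.
    """
    coords = tweet.get("coordinates")
    if coords:
        lon, lat = coords["coordinates"]  # GeoJSON order: longitude, latitude
        return lon, lat
    place = tweet.get("place")
    if place and place.get("bounding_box"):
        ring = place["bounding_box"]["coordinates"][0]
        lons = [pt[0] for pt in ring]
        lats = [pt[1] for pt in ring]
        # Midpoint of min/max is robust even if the ring repeats its first corner
        return (min(lons) + max(lons)) / 2, (min(lats) + max(lats)) / 2
    return None  # no location information at all


tweet = {"coordinates": None, "place": {"bounding_box": {"type": "Polygon", "coordinates": [[
    [-74.0, 40.0], [-74.0, 41.0], [-73.0, 41.0], [-73.0, 40.0]]]}}}
print(approximate_location(tweet))  # → (-73.5, 40.5)
```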