I am using PRAW, the Python wrapper for the Reddit API, to do sentiment analysis. My code is as follows:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import praw
from IPython import display
from nltk.sentiment.vader import SentimentIntensityAnalyzer as SIA
from pprint import pprint
import pandas as pd
import nltk
import seaborn as sns
import datetime
sns.set(style='darkgrid', context='talk', palette='Dark2')
reddit = praw.Reddit(client_id='XXXXXXXXXXX',
                     client_secret='XXXXXXXXXXXXXXXXXXX',
                     user_agent='*')
headlines = set()
results = []
sia = SIA()
for submission in reddit.subreddit('bitcoin').new(limit=None):
    pol_score = sia.polarity_scores(submission.title)
    pol_score['headline'] = submission.title
    readable = datetime.datetime.fromtimestamp(submission.created_utc).isoformat()
    results.append((submission.title, readable, pol_score["compound"]))
    display.clear_output()
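Question A: With this code I can only extract the title of each post and a few other keys. I would like to extract everything in JSON format, but after studying the documentation I could not tell whether that is possible.
If I simply iterate over the submissions in reddit.subreddit('bitcoin'), I only get the ID codes. I want to extract all of the information and save it to a JSON file.
Question B: How can I extract the comments/messages from a specific date?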
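Solution:
Question A:
You can simply append .json to the end of a post's full URL to get the complete JSON for that page, which includes the title, author, comments, votes and everything else.
submission.permalink gives you the post's path relative to https://www.reddit.com. Once you have built the full URL from it, you can use requests to fetch the JSON for that page.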
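import requests
# submission.permalink is relative to the site root, so prepend the host
url = 'https://www.reddit.com' + submission.permalink
# Reddit tends to throttle the default requests User-Agent; this value is a placeholder
response = requests.get(url + '.json', headers={'User-Agent': 'my-script/0.1'})
data = response.json()  # the parsed JSON for the page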
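Since the question also asks about saving the result to a file, here is a minimal sketch of the pieces around the request. The helper names (permalink_to_json_url, extract_post_fields, save_json) are made up for illustration, and the sketch assumes the page JSON is a two-element list whose first listing holds the post and whose second holds the comments, so check it against a real response before relying on it.

```python
import json

REDDIT_BASE = "https://www.reddit.com"


def permalink_to_json_url(permalink):
    """Build the .json URL from a site-relative permalink such as '/r/.../'."""
    return REDDIT_BASE + permalink + ".json"


def extract_post_fields(page):
    """Return the post's own field dict from the parsed page JSON.

    Assumes page[0] is the listing that wraps the post itself and
    page[1] is the listing that wraps the comment tree.
    """
    return page[0]["data"]["children"][0]["data"]


def save_json(data, path):
    """Write any JSON-serialisable object to a UTF-8 file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, ensure_ascii=False, indent=2)


# Possible usage with the request shown above (needs network access):
#   url = permalink_to_json_url(submission.permalink)
#   page = requests.get(url, headers={'User-Agent': 'my-script/0.1'}).json()
#   save_json(page, submission.id + '.json')
```

Saving the whole page keeps the comments as well; if only the post's own fields are wanted, pass extract_post_fields(page) to save_json instead.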
Question B:
Unfortunately, Reddit removed timestamp search from its search API sometime last year. The announcement post says:
Besides some minor syntax differences, the most notable change is that searches by exact timestamp are no longer supported on the newer system. Limiting results to the past hour, day, week, month and year is still supported via the ?t= parameter (e.g. ?t=day)
So this currently cannot be done through a PRAW search. You could, however, look into the Pushshift API, which provides this functionality.
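As a narrower fallback that is not part of the original answer, posts that a listing has already returned can be filtered locally by their created_utc attribute. The sketch below assumes each item exposes created_utc as a Unix timestamp in seconds, as PRAW submissions do. The helper name on_date is hypothetical. This only covers whatever the listing returns, so it does not replace a real timestamp search.

```python
import datetime


def on_date(items, day):
    """Yield the items whose created_utc falls on the given UTC calendar day."""
    start = datetime.datetime.combine(
        day, datetime.time.min, tzinfo=datetime.timezone.utc
    )
    end = start + datetime.timedelta(days=1)
    lo, hi = start.timestamp(), end.timestamp()
    for item in items:
        # Half-open interval: midnight of the next day is excluded
        if lo <= item.created_utc < hi:
            yield item


# Possible usage with the reddit object from the question (needs network access):
#   wanted = datetime.date(2019, 1, 1)
#   for submission in on_date(reddit.subreddit('bitcoin').new(limit=None), wanted):
#       print(submission.title)
```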