Questions and the responses they elicit are a ubiquitous and fundamental part of our everyday communication. Through such Questions and Answers (QA), we quench our curiosities, clarify doubts, validate our ideas, and seek advice, among others. It has been established that questions form an integral part in our quest to extend our knowledge boundaries [1]. It has also been observed that useful responses correspond to good questions [2]. This raises the following challenge: what factors constitute a good question which is more likely to elicit a response?
The importance of asking the right questions in specific settings has been previously explored, e.g., in the classroom [3] and in the corporate environment [4]. However, most of these studies either had no empirical evaluation at all or otherwise consisted of very small samples.
Along with the growth of the World Wide Web (WWW), many large online QA sites, such as Yahoo Answers, Stack Overflow, Quora, etc., have been successful in connecting responders to inquirers who post questions on these sites. Such online QA forums may be categorized as Single Inquirer Multiple Responders (SIMR), where a question from a single user may be responded to by multiple other responders. Prior research has used datasets from these sites to analyze which response to a question is most likely to be selected as the best response [5]. However, analyzing factors of a question which are likely to elicit a response has been outside the scope of such prior work.
To address these shortcomings, in this paper we present an empirical analysis to determine factors of a question which are more likely to elicit a response. We make use of the IAmA subreddit of the popular Internet website Reddit.com. In each discussion thread of this online forum, a celebrity answers questions submitted by anonymous users. Thus, a dataset from this subreddit may be categorized as Multiple Inquirers Single Responder (MISR). Such MISR datasets provide an ideal starting point to identify response-eliciting factors of a question, as the undesirable confounds produced by the presence of multiple responders in SIMR datasets are not present in such MISR datasets.
We make the following contributions:
• We address the important problem of automatically identifying response-eliciting factors of a question. We explore the effectiveness of various factors of the question, viz., orthographic, temporal, syntactic, and semantic. To the best of our knowledge, this is the first such analysis of its kind.
• We make use of a novel dataset, questions and responses from the IAmA subreddit of reddit.com. This MISR dataset provides additional benefits compared to SIMR datasets which have been explored in previous related research.
• We provide a sparse, non-negative matrix factorization-based framework to automatically induce semantic factors of a question collection. Through extensive experiments on real datasets, we demonstrate that such factorization-based technique results in significantly more interpretable factors compared to standard topic modeling techniques, such as Latent Dirichlet Allocation (LDA).
• We hope to make all the code and datasets used in the paper publicly available upon publication of the paper.
Studies on questioning techniques date back to Socrates [6], [7], who encouraged a systematic, disciplined, and deep questioning of fundamental concepts, theories, issues and problems. Socratic questioning is widely adopted in education and psychotherapy. Under the Socratic Questioning scheme [8], questions are grouped as follows:
i) Clarifying questions, ones seeking further explanation, ii) Challenging the assumptions, questions that challenge the constraints, iii) Argument based questions, ones that reason behind the underlying theory or seek evidence, iv) Alternate viewpoints, questions that analyze the given scenario with an altogether different perspective, v) Implication and Consequence based questions
Since Socrates, many different taxonomies have been discussed. Bloom’s revised taxonomy given by Krathwohl [9] is based upon dividing questions into levels such that the amount of mental activity required to respond increases after each level. Their categories are — remembering, understanding, applying, analyzing, evaluating and creating. Nam et al.[10] group questions into Factual, Procedural, Opinion-oriented, Task-oriented and Advice related categories.
Role of Socratic Techniques in thinking, teaching and learning has also been explored [11]. Hypothetical questions too have been studied independently and have been found to foster creativity [12]. While there has been considerable thought given over such demarcations and question formulation techniques, none of them are supported by any large datasets as most of the experiments were performed in a typical classroom sized setting.
Unlike the qualitative analyses mentioned above, Whittaker et al. [13] adopted a data-centric approach and uncovered the general demographic patterns among large samples of Usenet newsgroups. Some research has also been done on QA forums like Yahoo! Answers, e.g., [5] proposed solutions to predict whether a particular answer will be chosen as the best by the inquirer.
Although the aforementioned taxonomies are helpful in understanding general questioning paradigms, we are more curious about the qualities of a question that are more likely to generate a response. To the best of our knowledge, there have been no attempts to study questions with the objective of maximizing response rate.
A variety of interesting questions have been studied using the Reddit conversation network. It has been used to understand how people react to online discussions [14], and to model the most reportable events in stories [15]. The domestic abuse analysis in [16] was also based upon Reddit.
Factors underlying successful favor requests online were studied in an empirical case study in [17]. Like that paper, we also make use of a subreddit as our primary dataset. Even though the setting explored in [17] is different from ours, this is probably the closest paper in motivation and spirit.
Unlike other analyses of online forums, including Reddit and Yahoo! Answers, our dataset is unique as it falls into the MISR category, where there is just one responder but multiple inquirers. To the best of our knowledge, this is a first attempt to understand any such dataset.
Reddit is the 26th most popular website, with about 36 million user accounts. It comprises over 9,000 subreddits, which are sub-forums within Reddit focused on specific topics. Subreddits span diverse categories like News, Sports, Machine Learning, etc. Reddit is also home to subreddits like ELI5 (Explain like I’m five), TIL (Today I learnt), AMA (Ask Me Anything), etc.
Various celebrities and noteworthy personalities have used Reddit as a means to interact with the popular internet crowd. Such conversations fall under the Ask-Me-Anything subreddit and its variants. IAmA, AMA and casualama are three of the popular Ask-Me-Anything variants. IAmA is reserved for distinguished personalities, with an exception for people who have a truly interesting and unique event to take questions about. The other two AMAs are open to a wider audience for sharing their life events and allowing other Reddit users to ask questions related to those events.
TABLE I. REDDIT IAMA DATASETS FROM FOUR DOMAINS USED IN THE EXPERIMENTS IN THIS PAPER. SEE SECTION III FOR FURTHER DETAILS
IAmA is one of the most popular subreddits and has featured notable politicians, actors, directors, authors, businessmen, athletes and musicians. IAmA posts gain a lot of attention, and thousands of questions are asked in each IAmA post. But owing to time constraints, not all questions are answered. This gives us a good basis to understand and analyze what gets answered and what does not.
In particular, we study four popular categories of celebrities: actors, authors, directors and politicians. In each category we analyse the top 50 upvoted posts, which together contain over 110,000 questions with an average reply rate of 10.16%. Since some questions arrive after the celebrity has left the conversation, we ignore all the questions after the last successfully answered question. Reddit allows for threaded conversations, where users can comment on other comments. But to avoid any bias from the discourse of the comments in such threads, we ignore questions in deep threaded conversations and constrain ourselves to questions posted at the topmost level only. Since some non-question comments also get posted at the topmost level, we only consider comments that have a question mark in them. Table I summarizes the statistics of the questions we considered as part of our study.
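The three filtering rules above can be sketched as follows. This is a minimal illustration, not the pipeline used in the paper: the `Comment` record and its field names are hypothetical, and "after the last successfully answered question" is assumed to mean "posted later than the latest answered top-level question".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Comment:
    # Hypothetical record; field names are assumptions for illustration.
    text: str
    created: float   # posting time, e.g. seconds since the epoch
    depth: int       # 0 for a top-level comment (assumed convention)
    answered: bool   # True if the celebrity replied to this comment


def select_questions(comments: List[Comment]) -> List[Comment]:
    """Keep top-level comments with a question mark, up to the last answered one."""
    candidates = [c for c in comments if c.depth == 0 and "?" in c.text]
    answered_times = [c.created for c in candidates if c.answered]
    if not answered_times:
        # Assumption: a thread without any answered question contributes nothing.
        return []
    cutoff = max(answered_times)
    kept = [c for c in candidates if c.created <= cutoff]
    return sorted(kept, key=lambda c: c.created)
```

Any real crawl would also need to decide how to treat deleted comments and multi-question comments, which this sketch leaves out.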
In this section, we study various factors of questions that can result in healthy response rates. The factors we consider range from orthographic, temporal, social, and syntactical, to semantic aspects.
A. Orthographic Factors
Length: Do short questions win over their longer variants, as the responder may not be interested in comprehending and then answering long questions? Or are longer questions better as they offer more context? Are shorter and crisper questions more direct and focused, and do they have a better chance of getting answered? We analyze the impact of question length on response rate to answer these questions.
B. Temporal Factors
Time of Question: Does the time at which a question is asked play any role in determining the response rate? We hypothesize that questions that are asked early on have far fewer competing questions and hence should have better chances of soliciting a response.
We capture temporal information in two ways: (1) we note the fraction of questions answered in the IAmA before a given question is posted as an estimate of the time of question; (2) we use the fraction of time elapsed in the IAmA as another indicator of the time of the question. In most cases, we see that the time features complement each other.
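The two temporal indicators can be sketched as below. The text does not pin down the exact computation, so this sketch assumes that reply timestamps are available, that indicator (1) is the share of all replies in the thread that were posted before the question arrived, and that indicator (2) is the question's offset within the thread's time span. The function name and arguments are hypothetical.

```python
from typing import List, Optional, Tuple


def temporal_features(
    post_times: List[float],
    reply_times: List[Optional[float]],
    start: float,
    end: float,
) -> List[Tuple[float, float]]:
    """Return (fraction of answers already given, fraction of time elapsed) per question.

    reply_times[i] is the time the responder answered question i, or None.
    start and end delimit the IAmA session (assumed to be known).
    """
    answers = sorted(t for t in reply_times if t is not None)
    features = []
    for posted in post_times:
        already_answered = sum(1 for t in answers if t < posted)
        frac_answered = already_answered / len(answers) if answers else 0.0
        frac_elapsed = (posted - start) / (end - start) if end > start else 0.0
        features.append((frac_answered, frac_elapsed))
    return features
```

For example, with questions posted at times 0, 10 and 20, replies at times 5 and 25, and a session running from 0 to 20, the second question receives the pair (0.5, 0.5).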
C. Social Factors
Politeness: Are polite questions more likely to generate a response? Or is it the case that the default level of politeness expressed in the IAmA dataset is already sufficient, and hence any additional politeness in the question is unlikely to positively affect response rate?
Politeness has been actively explored in the recent past in a variety of other research settings [18], [19]. We employ the model introduced by Danescu-Niculescu-Mizil et al. [20] to measure the politeness level of questions. This model bases its politeness score on the occurrences of greetings, apologies and hedges in the question.
D. Syntactic Factors
Syntactic: We ask whether questions that are simply formulated have better chances of getting answered. Syntactic features, such as parse tree depth, verb phrase depth, and their ratios [21], have been used in past research as proxies for sentence complexity. In fact, such features have also been recently used to study the syntactic complexity of Reddit comments [15]. After generating constituent parse trees with the Stanford CoreNLP package [22], we employ 16 such features to capture the essence of syntactic complexity in a given question.
TABLE III. A FEW EXAMPLES OF REDUNDANT QUESTIONS ASKED TO A CHEF. SEE SECTION IV-E FOR DETAILS
We look at a few simple and a few complex sentences from the IAmA by President Barack Obama in Table II, and demonstrate how our features capture the varied levels of complexity. Since a given question can contain several sentences and sub-questions, we calculate the average, maximum and minimum values of parse tree depths and verb phrase depths. It is because of such statistical aggregation that we end up with 16 syntax features, but all of these features rest upon three quantities: the parse tree of the sentence, the verb phrase subtree, and their ratios.
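The aggregation described above can be illustrated with a small sketch that works on bracketed constituency parses such as those a parser like Stanford CoreNLP can output. It is not the paper's feature extractor: it computes only nine illustrative features rather than the 16 used in the experiments, and it assumes that "verb phrase depth" means the height of the tallest VP subtree. All function names are hypothetical.

```python
import re
from statistics import mean


def parse(bracketed):
    """Parse a Penn Treebank style string into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", bracketed)
    pos = 0

    def read():
        nonlocal pos
        pos += 1                    # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                pos += 1            # skip the word under a part-of-speech tag
        pos += 1                    # consume ")"
        return (label, children)

    return read()


def height(tree):
    """Number of labelled nodes on the longest root-to-leaf path."""
    return 1 + max((height(child) for child in tree[1]), default=0)


def vp_height(tree):
    """Height of the tallest VP subtree, or 0 if the tree has no VP."""
    own = height(tree) if tree[0] == "VP" else 0
    return max([own] + [vp_height(child) for child in tree[1]])


def syntax_features(parses):
    """Aggregate per-sentence depths over all sentences of one question."""
    trees = [parse(p) for p in parses]
    depths = [height(t) for t in trees]
    vps = [vp_height(t) for t in trees]
    ratios = [v / d for v, d in zip(vps, depths)]
    feats = {}
    for name, values in (("tree_depth", depths), ("vp_depth", vps), ("vp_ratio", ratios)):
        feats[name + "_avg"] = mean(values)
        feats[name + "_max"] = max(values)
        feats[name + "_min"] = min(values)
    return feats
```

For a question made of the two sentences `(ROOT (S (NP (PRP I)) (VP (VBP like) (NP (NN tea)))))` and `(ROOT (S (VP (VB Go))))`, the tree depths are 5 and 4 and the VP depths are 3 and 2 under these definitions.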
E. Forum Factors
Redundancy: Is a question which is very similar to already asked (or answered) questions in a given IAmA forum less likely to get a response? We think that is indeed the case and include factors in our analysis to account for question redundancy. Consider the questions in Table III asked to a popular Chef.
As the first few questions in the above series were not answered, it is nearly certain that the responder is not interested in any such questions. By accounting for redundancy we hope to handle such similar and frequent scenarios.
We estimated the redundancy score of a given question as the maximum similarity score achieved with any of the other questions previously asked in the same IAmA.
Relevance: For each IAmA, the responder usually posts a description to set the tone of the IAmA. We ask whether questions which are more aligned with the posted description are more likely to receive a response. The posted descriptions usually carry information about the celebrity responder’s current affiliation and engagements, and hence the hypothesis is that questions which are in line with such descriptions should outweigh other questions. In other words, relevant questions should attract more responses from the responder.
For both the relevance and redundancy factors, we came up with our own novel extension of Jaccard Similarity to account for sentence similarities. For two given sets A and B, the Jaccard Similarity is given by

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

For our case, let A and B be sets of words corresponding to the two questions to be compared. Strictly, $|A \cap B|$ would translate to the count of the words matched across the sets A and B. But consider the following two sentences:
– How far is your workplace from your house?
– How far is your office from your home?
With the strict definition, we would not be able to capture that the two sentences are, for all practical purposes, completely similar. Hence we use GloVe embeddings [23] and synset hierarchies to extend the scope of our matching. Two words are considered the same if (1) the two words are synonyms of each other and (2) if one word lies in the top-K nearest neighbours of the other word in the GloVe embedding space.
This technique helps us to capture similarity of pairs like <home, house> and <office, workplace> and hence helps us better estimate the similarity of two sentences.
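The extended matching and the redundancy score of Section IV-E can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the word-equivalence test is passed in as a predicate `same`, so that a WordNet lookup or a GloVe nearest-neighbour check could be plugged in; here it is replaced by a tiny hand-written synonym table (`TOY_SYNONYMS`, hypothetical). Matching is one-to-one and greedy, which is also an assumption.

```python
import re
from typing import Callable, List, Set

# Hypothetical stand-in for the WordNet / GloVe based equivalence test.
TOY_SYNONYMS = {frozenset({"home", "house"}), frozenset({"office", "workplace"})}


def toy_same(u: str, v: str) -> bool:
    return frozenset({u, v}) in TOY_SYNONYMS


def words(question: str) -> Set[str]:
    """Lower-cased word set of a question."""
    return set(re.findall(r"[a-z']+", question.lower()))


def soft_jaccard(a: Set[str], b: Set[str], same: Callable[[str, str], bool]) -> float:
    """Jaccard similarity where equivalent words also count as matched."""
    exact = a & b
    matches = len(exact)
    unmatched_b = sorted(b - exact)
    for u in sorted(a - exact):          # sorted for a deterministic greedy pass
        for v in unmatched_b:
            if same(u, v):
                unmatched_b.remove(v)    # each word is matched at most once
                matches += 1
                break
    union = len(a) + len(b) - matches
    return matches / union if union else 0.0


def redundancy_scores(questions: List[str], same: Callable[[str, str], bool]) -> List[float]:
    """Maximum similarity of each question to any question asked before it."""
    sets = [words(q) for q in questions]
    return [
        max((soft_jaccard(sets[i], earlier, same) for earlier in sets[:i]), default=0.0)
        for i in range(len(sets))
    ]
```

With the toy table, the two example sentences about the distance between workplace and house receive a similarity of 1.0, whereas the strict Jaccard Similarity would give 5/9.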
F. Semantic Factors
The factors described so far consider various aspects of the questions being analyzed. However, none of them explicitly looks at the semantic content of the question or performs analysis based on the semantic type of the question. For example, given questions of the following form posed to actors, "what is your favorite movie?", "what is your favorite book?", etc., we would like to group all such preference-probing questions into one category and then determine the response rate for such types of questions from actor responders. However, such a categorization of questions is not readily available, as we only have the list of questions and no additional annotation on top of them.
Ideally, we would like to discover such categorical structure in the data automatically. Topic modeling techniques such as Latent Dirichlet Allocation (LDA) [24] may be employed to discover such latent structure in the question dataset. Given a set of questions, such techniques will induce topics as probability distributions over words. Ultimately, each question is going to be represented in terms of such induced topics. We note that interpretability, i.e., coherence among questions which share a given topic with high weights, is of paramount importance here, as all subsequent response-rate analysis hinges on the label or meaning of each topic. Unfortunately, as we shall see in Section V, topics induced by LDA don’t achieve the desired level of interpretability.
TABLE II. A FEW EXAMPLE SENTENCES FROM PRESIDENT OBAMA’S IAMA AND THEIR CORRESPONDING SYNTAX FEATURES. SEE SECTION IV-D FOR DETAILS
To overcome this limitation, we explore other latent factorization methods. Recently, Non-Negative Sparse Embedding (NNSE) [25], [26] has been proposed, which tends to induce effective as well as interpretable embeddings. In order to apply NNSE to our question dataset, we first represent the data as a co-occurrence matrix X where rows correspond to questions and columns correspond to words. Each question is additionally augmented with word sense-restricted synsets from WordNet. The effect of the synset extension from WordNet can be seen in Table IV. This extended co-occurrence matrix X is usually of very high dimension (e.g., 100k x 1m). We first reduce the dimensionality of the matrix using sparse SVD. The number of dimensions in the SVD space is selected based on knee-plot analysis of the eigenvalues obtained during SVD decomposition. The rank r approximation obtained from SVD is then factorized into two matrices, A and D, using NNSE, which minimizes the following objective [25]:

$$\min_{A \in \mathbb{R}^{n \times k},\; D \in \mathbb{R}^{k \times r}} \; \sum_{i=1}^{n} \left( \lVert X_{i,:} - A_{i,:} D \rVert_2^2 + \lambda \lVert A_{i,:} \rVert_1 \right) \quad \text{subject to} \quad D_{j,:} D_{j,:}^{\top} \le 1 \;\; \forall\, 1 \le j \le k, \qquad A_{i,j} \ge 0 \;\; \forall\, i, j$$

where n is the number of questions, and k is the resulting number of latent factors induced by NNSE. We note that NNSE imposes non-negativity and a sparsity penalty on the rows of matrix A. Although the objective is non-convex, the loss function is convex when we solve for A with a fixed D (and vice versa). In such scenarios, Alternating Minimization has been established to converge to a local optimum [27], [25]. The solution for A is found with the LARS implementation [28] of LASSO regression with non-negativity constraints, and D is found via gradient descent methods. The SPAMS package may be used for this optimization [29]. At the end of this process, $A_{i,j}$ represents the membership weight of question i belonging to latent factor j.
TABLE IV. EFFECT OF EXTENSION USING WORDNET SYNSETS ON THE CO-OCCURRENCE MATRIX. SEE SECTION IV-F
TABLE V. AVERAGE PRECISION (AP) GAINS FOR TEMPORAL AND REDUNDANCY FACTORS OVER A RANDOM BASELINE. SEE SECTION V-A3 FOR DETAILS.
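The alternating scheme can be illustrated with a small pure-Python sketch. It is not the SPAMS-based procedure described above: the update for A uses coordinate descent for the non-negative lasso in place of LARS, the update for D is a single projected gradient step with a conservative step size, the SVD preprocessing is skipped, and the objective is assumed to be the squared reconstruction error plus an L1 penalty on the rows of A, with rows of D constrained to norm at most one. All function names and default values are hypothetical.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def reconstruct(a, D):
    """Return the row vector a @ D."""
    return [sum(a[j] * D[j][c] for j in range(len(D))) for c in range(len(D[0]))]


def objective(X, A, D, lam):
    """Squared reconstruction error plus the L1 penalty on the rows of A."""
    total = 0.0
    for x, a in zip(X, A):
        total += sum((xc - rc) ** 2 for xc, rc in zip(x, reconstruct(a, D)))
        total += lam * sum(abs(v) for v in a)
    return total


def update_codes(X, A, D, lam, sweeps=10):
    """Non-negative lasso for each row of A with D fixed (coordinate descent)."""
    for x, a in zip(X, A):
        residual = [xc - rc for xc, rc in zip(x, reconstruct(a, D))]
        for _ in range(sweeps):
            for j, d in enumerate(D):
                norm = dot(d, d)
                if norm == 0.0:
                    continue
                rho = dot(residual, d) + a[j] * norm       # fit without factor j
                new = max(0.0, (rho - lam / 2.0) / norm)   # soft-threshold, clip at 0
                delta = new - a[j]
                residual = [r - delta * dc for r, dc in zip(residual, d)]
                a[j] = new


def update_dictionary(X, A, D):
    """One projected gradient step on D with A fixed."""
    frob = sum(v * v for a in A for v in a)                # bounds the curvature
    if frob == 0.0:
        return
    R = [[xc - rc for xc, rc in zip(x, reconstruct(a, D))] for x, a in zip(X, A)]
    for j in range(len(D)):
        for c in range(len(D[0])):
            D[j][c] += sum(A[i][j] * R[i][c] for i in range(len(X))) / frob
        norm = dot(D[j], D[j]) ** 0.5
        if norm > 1.0:                                     # project onto the unit ball
            D[j] = [v / norm for v in D[j]]


def nnse(X, k, lam=0.1, iters=20, seed=0):
    """Alternate between the two convex sub-problems; returns (A, D)."""
    rng = random.Random(seed)
    D = []
    for _ in range(k):
        row = [rng.random() for _ in X[0]]
        norm = dot(row, row) ** 0.5
        D.append([v / norm for v in row])
    A = [[0.0] * k for _ in X]
    for _ in range(iters):
        update_codes(X, A, D, lam)
        update_dictionary(X, A, D)
    return A, D
```

Both sub-steps are chosen so that they cannot increase the objective, which mirrors the convergence argument in the text; for matrices of realistic size a dedicated solver would be needed.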
In this section, we evaluate impact of various factors discussed in Section IV on response rate of questions from different domains.
A. Is Response Rate Predictable?
Datasets: We experiment with four popular domains: actors, authors, directors and politicians. These domains covered more than 110,000 questions, and only about 10% of them generated a response. Statistics of the IAmA datasets are presented in Table I.
TABLE VI. ROC AUC VALUES FOR A REGULARIZED LOGISTIC REGRESSION CLASSIFIER USING DIFFERENT FEATURES IN VARIOUS DOMAINS. FOR REFERENCE, PERFORMANCE OF A RANDOM BASELINE IS ALSO SHOWN. APART FROM LENGTH, ALL OTHER FEATURES IMPROVE PERFORMANCE OVER THE RANDOM BASELINE. SEE SECTION V-A FOR DETAILS.
TABLE VII. FOUR RANDOMLY SELECTED LATENT FACTORS INDUCED EACH BY LDA AND NNSE, AND TOP RANKING QUESTIONS IN EACH SUCH FACTOR. THE MAIN PERCEIVED THEME OF EACH QUESTION IS HIGHLIGHTED IN BOLD MANUALLY. WE FIND THAT THE FACTORS INDUCED BY NNSE ARE USUALLY MUCH MORE INTERPRETABLE COMPARED TO LDA. BECAUSE OF THIS INTERPRETABILITY, WE USE SEMANTIC FACTORS INDUCED BY NNSE FOR ALL EXPERIMENTS IN THE PAPER. SEE SECTION V-B FOR DETAILS.
Metric & Classifier: In order to measure the response rate predictive power of a subset of factors, we train an L1 and L2 regularized (i.e., elastic net) logistic regression classifier using only that subset of factors. Hyperparameters of the classifier are tuned on a development set using grid search. We use the area under the receiver operating characteristic curve (ROC AUC) of the classifier on held-out test data as our metric. This metric essentially measures how well the classifier ranks a randomly chosen positive question over a randomly chosen negative question. Please note that the dataset is highly skewed, with significantly more negative questions than positive ones. This measure provides a balanced metric while accounting for the data skew.
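The ranking interpretation of ROC AUC given above can be made concrete with a short sketch that counts, over all positive/negative pairs, how often the positive question receives the higher score. This is only an illustration of the metric (quadratic in the number of questions, with ties counted as half); the function name is hypothetical and the classifier itself is not shown.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive is scored above a random negative."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(positives) * len(negatives))
```

A scorer that assigns the same value to every question obtains 0.5 regardless of how skewed the labels are, which is why the random baseline sits at 0.5 even though only about 10% of the questions are answered.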
Baselines: To evaluate the strength and decisiveness of our probable factors, we test our system against the random and bag-of-words (BoW) baselines. In the Random Baseline, each question is randomly given one of the two labels: answered or not answered.
TABLE VIII. AUTOMATICALLY INDUCED LATENT SEMANTIC FACTORS WITH HIGHEST AND LOWEST RESPONSE RATES IN MULTIPLE DOMAINS ARE SHOWN. BASE RESPONSE RATE FOR THE DOMAIN, AND THE RESPONSE RATE FOR EACH FACTOR IS SHOWN IN BRACKETS. TOP RANKING QUESTIONS IN EACH LATENT FACTOR ALONG WITH THE MOST FREQUENT N-GRAMS IN QUESTIONS BELONGING TO THE PARTICULAR LATENT FACTOR ARE ALSO SHOWN. WE POINT OUT THE INTERPRETABLE NATURE OF EACH SEMANTIC FACTOR (BASED ON HIGH-RANKING QUESTIONS ASSOCIATED WITH IT), WHICH ALLOWS US TO DRAW SAMPLE CONCLUSION AS FOLLOWS: WHILE ACTORS ARE UNWILLING TO ANSWER QUESTIONS RELATING TO THEIR FAVORITES OR REAL LIFE, AUTHORS ARE MORE WILLING TO ANSWER QUESTIONS RELATING TO SUPPORTING ASPIRING NEW AUTHORS. ABILITY TO DISCOVER SUCH INSIGHTS USING AN AUTOMATED PROCESS AND A NOVEL DATASET IS THE MAIN CONTRIBUTION OF THE PAPER. PLEASE SEE SECTION V-B FOR DETAILS.
The bag-of-words model uses every word in the vocabulary as a feature, hence aggregating up to thousands of features for every question. Due to the large number of features, this Unigram model performs reasonably well, but it doesn’t help us answer our general question of which factors help a question get answered, because the unigram features don’t generalize to the factors that we are interested in evaluating.
Experimental results comparing performance of the classifier with different features on multiple datasets are presented in Table VI. Based on this table, we discuss predictive capabilities of various factors below. Please refer to Section IV for description of the factors and how we computed them.
1) Orthographic Factors: From Table VI, we observe that the length of the questions (measured in terms of the number of tokens in the question), the only orthographic factor feature we considered, plays practically no role in influencing response rate. This is evident from the fact that the classifier with length as the only feature achieves an AUC of 0.51 on average across all four domains, compared to an AUC of 0.5 for the random classifier.
2) Syntactic Factors: From Table VI, we clearly see that syntax-based features add very little predictive power to the classifier (0.52 vs 0.50 for random). Though our syntax features are rigorous enough to capture the nuances of complexity (e.g., see Table II), the responses to questions don’t heavily depend on the complexity of the sentence. We observed that combining syntax with orthographic features also didn’t increase predictive power.
3) Temporal Factors: We find that temporal features play a significant role in the response rate. This is evident from Table VI, where the classifier with temporal factor features achieves a significantly higher AUC score of 0.66 compared to 0.5 for random. As we had hypothesized earlier, questions that are asked early tend to be replied to more often than others.
In addition to the classifier’s AUC score, we measured the effect of temporal factors using Average Precision (AP) as well. For questions in a given domain, AP is computed over two orderings of the questions in that domain: (1) an ordering of all questions based on the value of the temporal factor features; and (2) randomly shuffled question sequences. Percentage AP gains of the feature-based ranking over the random ranking (AP averaged over a thousand trials) are summarized in Table V. From this, we observe the clear trend that temporal factor features significantly aid in response prediction, sometimes with gains as high as 218%. We think that the responder is initially exposed to far fewer questions compared to a situation in the middle or towards the end of the IAmA, when the number of questions demanding his or her attention is huge.
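The AP comparison can be sketched as below, assuming AP is computed over the binary answered/unanswered labels taken in ranked order and that the random baseline is the mean AP over repeated shuffles. The function names, the ascending default order, and the fixed seed are assumptions made for illustration.

```python
import random
from typing import Sequence


def average_precision(ranked_labels: Sequence[int]) -> float:
    """Mean of precision@rank taken at the rank of every answered question."""
    hits, total = 0, 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def ap_gain_percent(feature, labels, trials=1000, seed=0, descending=False):
    """Percentage AP gain of a feature-based ranking over random rankings."""
    if not any(labels):
        raise ValueError("need at least one answered question")
    order = sorted(range(len(labels)), key=lambda i: feature[i], reverse=descending)
    ap_feature = average_precision([labels[i] for i in order])
    rng = random.Random(seed)
    shuffled = list(labels)
    ap_random = 0.0
    for _ in range(trials):
        rng.shuffle(shuffled)
        ap_random += average_precision(shuffled)
    ap_random /= trials
    return 100.0 * (ap_feature - ap_random) / ap_random
```

For a temporal feature such as the fraction of time elapsed, the ascending order places early questions first, so a positive gain indicates that early questions are answered more often than chance would suggest.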
4) Forum Factors: Redundancy: Our dataset consists of prominent celebrities, and they gain undeniably high attention among Reddit users. Due to large participation, the number of similar questions is high, as many users wish to know similar facts, preferences, likings and happenings. Redundancy comes out as one of the most promising factors in understanding questions that get answered. Examples of a few redundant questions are shown in Section IV-E.
The original and genuine questions, which are identified by our redundancy factor feature, are heavily preferred over questions that are redundant and stale. This is established by the fact that the classifier which accounts for redundancy achieves a significantly higher AUC score of 0.66 compared to the random baseline.
Relevance: Relevance of the question to the post description by the celebrity responder shows only a faint signal for the response rate. The description given by the celebrities is usually too short to capture the variety of questions. Hence we don’t see any meaningful dependency between relevance and response rate (0.52 AUC).
Overall, with all the three forum features included, the classifier achieves an AUC score of 0.68.
5) Politeness: Politeness, a seemingly important cue for demystifying question qualities, surprisingly didn’t come out as a strong predictor of response rate. In Table VI, the classifier with the politeness factor feature achieves an AUC score of only 0.52. We have observed that the Reddit culture is very informal, frank and open. Hence, making requests extra polite might not help while framing questions in such scenarios. Of all domains, politeness is most important in the case of prominent politicians.
6) Unigram: In addition to the factors mentioned above, we also experimented with the bag-of-words-based unigram model. As mentioned previously, in this case, each token of the question was added as a feature. From Table VI, we observe that the unigram model achieves an AUC of 0.68 which is significantly better than the random baseline of 0.5. However, the Unigram model uses 13704 features (averaged across all four domains). It is encouraging to note that the performance of this Unigram model with thousands of features is surpassed by the classifier using only 4 forum factor features (AUC 0.65 vs 0.68) in the response prediction task.
B. Do Induced Semantic Factors Help Discover Response Trends?
So far, we have tried to handcraft the seemingly most important factors but we can never account for patterns other than what we are looking for. In any large dataset as ours, creating an exhaustive set that can capture all such factors is humanly impossible. Also for each factor, we need to train a system that can well detect and measure it in an unknown question. In such scenarios, the need to automatically discover latent dimensions is essential. As mentioned in Section IV-F, we use LDA and NNSE to induce semantic factors present in the question dataset. First we shall present comparisons between interpretability of factors induced by these two methods. Subsequently, we shall measure the response predictive power of these induced semantic factors.
LDA vs NNSE: We reiterate that finding latent factors that are interpretable is not just a luxury but a bare necessity in our setting, as we need to understand what kind of latent semantic factors play a role in maximizing response rate. For this, we compared the latent factors induced by LDA and NNSE, examples of which are in Table VII. In this table, four randomly selected latent factors induced by each of LDA and NNSE are shown. Also, for each latent factor, the top two most active questions in that dimension are shown. For easy reference, the main theme of each question is manually marked in bold. From this table, we observe that NNSE is able to produce much more interpretable latent semantic factors compared to LDA. Such lack of interpretability in LDA topics was also observed in prior work [17]. Given the interpretability advantage of NNSE, we use the latent factors induced by this method in subsequent analysis.
Having successfully induced interpretable semantic factors using NNSE which have good number of questions attached to them, we analyzed the dimensions of questions with extremely high and extremely low reply rates. Please note that such latent factors are induced separately for each domain. Experimental results comparing NNSE latent factors in three domains, overall response rate in the domain, response rate over questions in the factor, and examples of top questions in each such factor are shown in Table VIII. Based on this table, we list below a few trends. We point out that this analysis and trend recognition would have been impossible without the ability to automatically induce interpretable semantic factors.
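The per-factor comparison against the domain's base rate can be sketched as follows. The text does not state how questions are attached to a factor, so this sketch assumes a question belongs to factor j whenever its membership weight in the matrix A exceeds a threshold; the function name, the threshold, and the minimum group size are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def factor_response_rates(
    A: List[Sequence[float]],
    answered: Sequence[int],
    threshold: float = 0.0,
    min_questions: int = 1,
) -> Tuple[float, Dict[int, float]]:
    """Return the base response rate and the response rate of each latent factor.

    A[i][j] is the membership weight of question i in latent factor j and
    answered[i] is 1 if question i received a reply, else 0.
    """
    base_rate = sum(answered) / len(answered)
    rates = {}
    for j in range(len(A[0])):
        members = [i for i, row in enumerate(A) if row[j] > threshold]
        if len(members) >= min_questions:     # skip factors with too few questions
            rates[j] = sum(answered[i] for i in members) / len(members)
    return base_rate, rates
```

Sorting the returned rates and reading the top questions of the extreme factors yields the kind of comparison reported in Table VIII.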
1) Actors: We found that adulation techniques worked well in eliciting a response from actors: 15.88% response rate in Actor latent factor 524 in Table VIII compared to the domain response rate of 5.19%. Based on the top questions in this factor, we can easily identify that this is a fan-related factor. Actors seem to reply more if the inquirers describe themselves as huge fans or express some liking for their movies and roles. We also learnt that actors weren’t very comfortable when it came to questions diving into their non-camera life (Actor factor 880). Also, many actors were evasive when asked about their favorite actors, movies, meals, etc. (Actor factor 852).
2) Politicians: We observe that politicians were prompt in clarifying all fund-related issues pertaining to their campaigns (Politician factor 927 in Table VIII). In contrast, not many politicians seemed happy to take questions on wage rises and the job situation in the country (Politician factor 304).
3) Author: We observe that many users asked authors how they can pursue a career in writing, and even more asked for writing advice. We found that such questions were generously answered: 36.53% response rate in factor 742 of the Author domain, compared to the domain response rate of 17.62%. Also, authors answered a lot of questions about their ideas, thoughts and preferences (Author factor 136). However, they were a little less responsive when asked about inspiration (factor 4) or favorites (factor 118). This might be attributed to the fact that questions of these types are extremely frequently posed to authors, and due to the redundancy, they may answer only a few of them (please note that the response rate in these factors is not 0).
C. Summary of Results
From Section V-A, we observe that all our designed factors in conjunction beat the Random and the Bag-of-words baselines for all the domains. We also use far fewer features compared to the thousands of features in BoW (Unigram). This demonstrates that we have arrived at a good mix of concise factors that are helpful in understanding response rate.
From Section V-B, we see that our technique was able to capture some hard-to-find semantic factors that resulted in high reply rates. This also allowed us to identify factors in questions that are rarely replied to.
Question-Answering forms an integral part of our everyday communication. While some questions elicit a lot of responses, many others go unanswered. In this paper, we present a large-scale empirical analysis to identify factors underlying response-eliciting questions. To the best of our knowledge, this is the first such analysis of its kind. In particular, we focus on the Multiple Inquirers Single Responder (MISR) online setting where there are multiple users asking questions to a single responder, and where the responder has a choice to not answer any particular question. We used a novel dataset from the website Reddit.com, and considered several factors underlying questions, viz., orthographic, temporal, syntactic, and semantic. For semantic features, we used a sparse non-negative matrix factorization technique to automatically identify interpretable latent factors. Because of this automated analysis, we are able to observe a few interesting and non-trivial trends. For instance, we observed that advice-related questions were generously entertained by Authors, as long as they carried some context about the inquirer’s writing pursuits. Similarly, Actors were keen on making people aware of behind-the-scenes events, whenever asked. These trends are hard to capture otherwise, as designing a system to detect such particular cases requires training over a large annotated corpus.
As part of future work, we hope to explore other factorization techniques, e.g., hierarchical latent factors, to obtain even more effective and interpretable latent factors. Additionally, we hope to use the insights gained in this study to explore how an existing question may be rewritten to elicit a response from voluntary responders. We hope to make all the datasets and code publicly available upon publication of the paper.
This research is supported in part by gifts from Google Research and Accenture Technology Labs.
[1] C. Sammut and R. B. Banerji, “Learning concepts by asking questions,” Machine learning: An artificial intelligence approach, vol. 2, pp. 167–192, 1986.
[2] E. Agichtein, C. Castillo, D. Donato, A. Gionis, and G. Mishne, “Finding high-quality content in social media,” in Proceedings of the 2008 International Conference on Web Search and Data Mining. ACM, 2008, pp. 183–194.
[3] A. King, “Guiding knowledge construction in the classroom: Effects of teaching children how to question and how to explain,” American educational research journal, vol. 31, no. 2, pp. 338–368, 1994.
[4] J. Ross, “How to ask better questions,” Harvard Business Review Blogs, 2009.
[5] L. A. Adamic, J. Zhang, E. Bakshy, and M. S. Ackerman, “Knowledge sharing and yahoo answers: everyone knows something,” in Proceedings of the 17th international conference on World Wide Web. ACM, 2008, pp. 665–674.
[6] R. Paul and L. Elder, “Critical thinking: The art of socratic questioning,” Journal of Developmental Education, vol. 31, no. 1, p. 36, 2007.
[7] T. A. Carey and R. J. Mullan, “What is socratic questioning?” Psychotherapy: Theory, Research, Practice, Training, vol. 41, no. 3, p. 217, 2004.
[8] R. Paul and L. Elder, Thinker’s Guide to the Art of Socratic Questioning. Foundation Critical Thinking, 2006.
[9] L. W. Anderson, D. R. Krathwohl, and B. S. Bloom, A taxonomy for learning, teaching, and assessing: A revision of Bloom’s taxonomy of educational objectives. Allyn & Bacon, 2001.
[10] K. K. Nam, M. S. Ackerman, and L. A. Adamic, “Questions in, knowledge in?: a study of naver’s question answering community,” in Proceedings of the SIGCHI conference on human factors in computing systems. ACM, 2009, pp. 779–788.
[11] L. Elder and R. Paul, “The role of socratic questioning in thinking, teaching, and learning,” The Clearing House, vol. 71, no. 5, pp. 297–301, 1998.
[12] C. F. Newman, “Hypotheticals in cognitive psychotherapy: Creative questions, novel answers, and therapeutic change,” Journal of Cognitive Psychotherapy, vol. 14, no. 2, pp. 135–147, 2000.
[13] S. Whittaker, L. Terveen, W. Hill, and L. Cherny, “The dynamics of mass interaction,” in From Usenet to CoWebs. Springer, 2003, pp. 79–91.
[14] A. Jaech, V. Zayats, H. Fang, M. Ostendorf, and H. Hajishirzi, “Talking to the crowd: What do people react to in online discussions?” arXiv preprint arXiv:1507.02205, 2015.
[15] J. Ouyang and K. McKeown, “Modeling reportable events as turning points in narrative,” in EMNLP, 2015.
[16] N. Schrading, C. O. Alm, R. Ptucha, and C. M. Homan, “An analysis of domestic abuse discourse on Reddit,” in EMNLP, 2015.
[17] T. Althoff, C. Danescu-Niculescu-Mizil, and D. Jurafsky, “How to ask for a favor: A case study on the success of altruistic requests,” ICWSM, 2014.
[18] J.-A. Tsang, “Brief report gratitude and prosocial behaviour: An experimental test of gratitude,” Cognition & Emotion, vol. 20, no. 1, pp. 138–148, 2006.
[19] M. Y. Bartlett and D. DeSteno, “Gratitude and prosocial behavior helping when it costs you,” Psychological science, vol. 17, no. 4, pp. 319–325, 2006.
[20] C. Danescu-Niculescu-Mizil, M. Sudhof, D. Jurafsky, J. Leskovec, and C. Potts, “A computational approach to politeness with application to social factors,” in ACL, 2013.
[21] D. Klein and C. D. Manning, “Accurate unlexicalized parsing,” in Proceedings of the 41st Annual Meeting on Association for Computational Linguistics-Volume 1. Association for Computational Linguistics, 2003, pp. 423–430.
[22] C. D. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. J. Bethard, and D. McClosky, “The stanford corenlp natural language processing toolkit,” in Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2014, pp. 55–60.
[23] J. Pennington, R. Socher, and C. D. Manning, “Glove: Global vectors for word representation,” Proceedings of the Empirical Methods in Natural Language Processing (EMNLP 2014), vol. 12, pp. 1532–1543, 2014.
[24] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent dirichlet allocation,” The Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.
[25] B. Murphy, P. P. Talukdar, and T. M. Mitchell, “Learning effective and interpretable semantic models using non-negative sparse embedding.” in COLING, 2012, pp. 1933–1950.
[26] A. Fyshe, L. Wehbe, P. P. Talukdar, B. Murphy, and T. M. Mitchell, “A compositional and interpretable semantic space,” Proceedings of the NAACL-HLT, Denver, USA, 2015.
[27] J. Mairal, F. Bach, J. Ponce, and G. Sapiro, “Online learning for matrix factorization and sparse coding,” The Journal of Machine Learning Research, vol. 11, pp. 19–60, 2010.
[28] B. Efron, T. Hastie, I. Johnstone, R. Tibshirani et al., “Least angle regression,” The Annals of statistics, vol. 32, no. 2, pp. 407–499, 2004.
[29] F. Bach, J. Mairal, J. Ponce, and G. Sapiro, “Sparse coding and dictionary learning for image analysis,” in Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, 2010.