Online social networks have become an essential part of people's lives. Social network users like to share information by publishing posts about their daily activities, feelings, opinions, interests, or goals. Posts vary in content type and may include text, images, video clips, or URLs. Although discovering human intention from social network posts depends on the reader's understanding, researchers have attempted to extract intention automatically. They regard intention detection within social networks as a valuable source of information for understanding human behaviour and needs, for example those of customers of online businesses [1].
The importance of studying human intention is acknowledged in various research disciplines such as psychology, sociology, and computer science. Consequently, several definitions of intention have been used. For example, [2] defined the intention of a human, or "an agent", as the state of mind at the time of taking action. In this paper, we adopt the following definition, adapted from [3]: in a given system, a user's intention is represented by the user's goal in performing an action or a set of actions. A user's action could be browsing a web page, publishing posts on a social network, or submitting a query to a search engine. This research focuses on users' intentions in posting on social network platforms. Detecting users' purposes and goals from their actions and instructions is known as Intent Mining [3,4]. The goal of Intent Mining is to enhance the services that a system provides to its users. Intent mining focuses on collecting and analysing users' preferences and data from system logs [5–7], browsing history, web search data [8–10], or online social data [3,11–14]. System logs hold system-user interactions such as clicks or browsing history. Web page queries and link accesses also provide rich information about users. In social networks, users publish their desires, wishes, likes, and dislikes and share them with others. The intent can be either explicit or implicit to the reader. Intention can be expressed explicitly in posts on social networks, in queries on web pages, and so on. For example, a user who searches for a restaurant in a particular area, or who posts "heading to city centre for lunch!", displays an explicit intention of travelling somewhere, the "city centre", with the goal of having food.
On the other hand, some user actions indicate intention only implicitly. In the aforementioned example, the user is looking for a restaurant to have lunch even though the word "restaurant" is not mentioned in the post. Furthermore, having a machine extract users' intent with an accurate understanding of their needs and goals is a challenging process. For instance, posts on social networks are not presented in a format that is clean for mining: the language is usually informal, with abbreviations, misspellings, emoticons, and hashtags, and posts may combine multiple data formats such as images, audio, and video. This paper is limited to social network data, and specifically to posts on microblogging social networks.
In this paper, we study the feature space to identify the features that define intention on online social networks. We applied text mining mechanisms to extract features of users' intention from social media in general and microblogging in particular. Knime is used as the data mining tool to implement the feature selection and classification techniques. Section 2 discusses the preliminaries and previous studies on intention mining and online social networks. Section 3 describes the dataset, the data mining tools, the data corpus, and how the social data were retrieved online. The applied schemes are described in section 4, followed by a discussion of the experiments and results in section 5. Finally, the conclusion and future work are presented at the end of this paper.
Analyzing social networks to extract patterns from users' data gives a better understanding of human behaviour in general and of human intentions in particular. In this section, we look into the intention mining literature and review social network microblogging.
2.1 Intention Mining
Intention mining has been an active research area in recent years. A number of researchers have shown interest in discovering individuals' intentions by studying their online behaviour and public data. The users' information is retrieved from web queries, system logs, and social networks. In this paper, we study the features that define intention as a classification problem over social networks, with a particular focus on feature selection. Various data mining techniques and feature selection algorithms are found in the literature. The researchers in [15] studied the textual features of the Web pages that resulted from users' Web searches and queries in order to predict the commercial intention behind a user's online activities. Their prediction technique was based on the Support Vector Machine (SVM) algorithm. Chen et al. [16] worked on identifying users' intents in forum posts using the Information Gain method given by [17] for feature selection together with the Expectation Maximization algorithm. Vineet et al. [10] also worked on the linguistic features of expressions on Yahoo! Answers and Quora to extract users' purchase intention. They used 'bag-of-words' to extract features. Moreover, Ding et al. [4] used word embeddings as features in a Convolutional Neural Network (CNN) to identify the consumption intent of microblogging users in China. Their model of consumption intention was based on a binary classification of posts that decides whether a sentence contains consumption intention or not. Furthermore, Kim et al. [18] studied the travelling intention of social network users. They built a textual feature vector from the information users shared on social networks using word embedding techniques, and validated the resulting vector with classification algorithms such as Random Forest, SVM, Deep Neural Networks (DNN), and Naive Bayes (NB). Zhang et al. [9] applied neural network algorithms to text queries to capture user intention from online medical queries. Another approach, by [19], proposed using intent keywords instead of bag-of-words and applied a graph-based semi-supervised learning technique to mine user intent and classify tweets into six categories. From the literature, it becomes clear that intention mining for users of online social networks generally depends on extracting the textual features that define the users' intention, and that these features are then used in predictive models built with classification techniques.
2.2 Microblogging
Many platforms with dedicated user interfaces enable users to access their social networks easily through different technologies, such as mobile devices. Consequently, social network platforms provide us with input, e.g. Twitter feeds, for our intention mining, as discussed in the following sections. Microblogging platforms are distinguished from other social network platforms by their short text messages, which avoid information overload. In addition, they differ in making all content publicly available to the other users of the platform. They also differ from conventional blogging in limiting the number of characters per post.
Taking Twitter as an instance, it is one of the most well-known micro-blogging services. Users can follow other users on Twitter, or be followed, with no need for reciprocation. Twitter users receive information about "what are you doing or thinking" as tweets from their Twitter friends in real time [20]. One of the differences between Facebook and Twitter is that Facebook is used to help users interact and communicate with their real-world friends and family, while Twitter helps users communicate with anyone who shares the same interests. Twitter has a website, a related mobile application, and associated APIs that support application programmers and enable them to develop new functions and services for the Twitter platform. Moreover, Twitter enables mobile device users to send new tweets to the Twitter website not only through the mobile application but also by short messaging service (SMS).
Tumblr is another example of a micro-blogging platform, whose popularity comes from storytelling using GIF images that users add to their posts to make them more descriptive. It also supports multimedia such as audio and video. Tumblr users can share external content from other sources, such as articles or external URLs, by adding it to their posts [21]. In this paper, our interest is in Twitter due to its wide availability and extensive use by English-speaking users. In addition, its APIs are easy to use and supported by many data mining tools. Our research may be extended to Tumblr in the future, but not in this paper.
As our current research is focused on the micro-blogging feeds in English, we limit our consideration to those services that are primarily used by English speakers. However, the growth of micro-blogging services in a wide range of languages, e.g. Sina Weibo [22], makes the generalisation of our research reported here to other languages an interesting venture.
2.3 Feature Selection
Feature selection is the process of selecting a suitable minimal set of features from all the available features. This is achieved by removing irrelevant features and eliminating redundancy [23]. It reduces the dimensionality of the data and improves the performance of the classification algorithms [23]. Feature selection algorithms can be categorized into supervised, unsupervised, and semi-supervised feature selection. Supervised feature selection methods can be further categorized into wrapper models, filter models, embedded models, and hybrid models. Wrapper-based models generate subsets of features using a search technique and evaluate these subsets with the supervised classification algorithm in terms of accuracy. The wrapper method has some noticeable drawbacks, such as search overhead, over-fitting, and increased runtime. Embedded feature selection models use part of the learning process of the supervised learning algorithm for feature selection. Embedded models have a lower computational cost than wrapper models, yet they suffer from poor generality. Embedded models are categorized into three types, namely pruning methods, built-in mechanisms, and regularization models. Filter feature selection models are independent of the supervised learning algorithm and are therefore considered more general and computationally cheaper than the wrapper and embedded approaches. As a result, filter models are better suited to processing high-dimensional data than wrapper and embedded methods [24]. Among the most representative algorithms of the filter model are Relief, Fisher score, and Information Gain. Hybrid feature selection models combine different approaches, such as the filter and wrapper-based approaches.
Feature selection is considered an initial step in supervised data mining analysis, yet it is a challenging problem, especially for social posts, which are massive, noisy, and sometimes incomplete [25]. There is a need for feature selection algorithms that can deal with such data. In social networks, elements with high-dimensional features are often linked together. Another difficult problem is how to integrate link information to guide feature selection [26]. In this paper, we applied a hybrid approach, using Information Gain as the filter feature selection model followed by a wrapper approach, since the problem is a supervised learning problem.
The focus of our research is social network feeds; therefore, we need a dataset that holds enough information. The required dataset is considered Big Data, which needs to be analysed using special tools. These tools should offer different techniques for extracting the needed information in order to achieve our goal. Examples of the datasets that have been used in the literature are reviewed in this section. Similarly, key data mining tools are discussed.
3.1 Datasets
Twitter is the target platform of this paper. Several datasets have been published online by researchers, but they focus on sentiment analysis and are not closely related to intention mining [27–30]. For example, the Sentiment140 dataset, which is available for research purposes, contains 1,600,000 records for training and 497 records for testing. The data was collected from Twitter in 2009 [27]. The dataset was used in the WASSA shared task on emotion (EmoInt) and is included in a Weka package for analyzing the emotion and sentiment of tweets written in English [28].
SemEval datasets are well-known datasets for carrying out a number of semantic analysis tasks on text, published in a series called SemEval (Semantic Evaluation). At the time of writing, the SemEval-2018 Task 1 dataset had been published for affect in tweets [29]. This dataset consists of 100 million tweet IDs. The tweets were collected based on emotion-related words such as angry, annoyed, panic, happy, elated, surprised, etc. [30]. However, online datasets built from Twitter are usually published in anonymised form. Indexes are used to hide the tweets and users, which makes reusing these datasets a complicated process. A reverse engineering process is needed to retrieve the original tweets, which takes time and effort, and in many cases the tweets cannot be found because they have been deleted by their users. In our work, the original posts must be analysed for intention and processed for feature extraction. In addition, we intended to conduct our experiments in a controlled environment. Therefore, we crawled Twitter using Application Programming Interfaces (APIs) to build our own dataset.
3.2 Data Mining and Analysis Tools
Data mining tools such as Knime, Weka, Orange, and RapidMiner are used by researchers to study and analyze structured and unstructured data collections. For our work, we explored these state-of-the-art tools and evaluated them in relation to their GUI, ease of use, and supported algorithms.
Knime (Konstanz Information Miner) is an open source workflow-based data mining platform built on Java and the Eclipse platform. It runs on different operating systems, including Windows, Linux, and macOS [31]. In addition, it supports big data analysis through a graphical user interface. Its visual interface makes it possible to access data and apply data transformations, and it supports powerful predictive analytics [32]. A Knime workflow consists of connected nodes or extensions [33]. Moreover, Knime supports integration with different data analytics tools such as R, Python scripting, and Weka, and with third-party applications such as Google Analytics. Furthermore, Knime provides nodes for connecting to social media platforms such as Twitter. This integration makes Knime suitable for our work. Each node in Knime takes part in processing the data before passing it to the following nodes through their connections. The data are stored in each node in table format. The tables can be saved permanently at any point to be processed in a different format. Owing to the extensibility of Knime, new nodes can be added at any point to apply a different kind of processing without the need to re-execute the previous nodes. Knime can be downloaded and used freely under an open source license (GPL) [33].
Weka stands for Waikato Environment for Knowledge Analysis and is a free data mining tool based on Java. It is supported on different operating systems. It combines several tools for data preprocessing, machine learning, visualization, and feature selection. The Weka user interface is built of several components, which are accessed through an Explorer, in addition to a component-based Knowledge Flow interface and the command line. One of Weka's features is the Experimenter component, which facilitates the systematic comparison of several machine learning algorithms over a collection of datasets at once. Weka is released under the GNU General Public License, which makes it free to install. However, Weka does not support connecting to non-Java based databases, and sometimes fails to read CSV files [31].
Orange is a data mining tool implemented in C++ and Python. It supports different operating systems and offers a range of data mining and machine learning algorithms. Python libraries must be installed for the tool to run smoothly. Different components are provided for data preprocessing, feature filtering, data modelling and evaluation, and visualisation. However, it has limited reporting capability.
RapidMiner is a stand-alone application with a user-friendly interface that supports various operating systems. It works as an integrated platform and supports different machine learning and data mining techniques, including text mining, together with predictive analytics and business analytics. It adopts graphical ETL (Extract-Transform-Load) process workflows. The simplicity of RapidMiner Studio can be seen in its drag-and-drop handling of operators, parameter setting, and combining of operators. Moreover, it supports different data input and output file formats and can be connected to relational and non-relational databases. Yet, RapidMiner requires knowledge of SQL and database handling. In this paper, we implemented our work using Knime due to the availability of the aforementioned characteristics.
Table 1 summarises our investigation in relation to the following factors: GUI, Ease of Use (EoU), Connectivity (C), supported Data Processing Algorithms (DPA), Machine Learning (ML) techniques available, the availability of Data Visualisation (DV), and Programming Languages (PL) supported.
3.3 Building Social Corpus From Twitter Post
Since we could not find an intention corpus of Twitter posts that serves our research, we built our own corpus, SICorp, and adopted the following formal representation of the corpus intention classes:
Suppose a corpus R consists of a set D of n short-statement documents, where $D = \{d_1, d_2, \ldots, d_n\}$, and a set C of m predefined target classes, $C = \{c_1, c_2, \ldots, c_m\}$. There will be a prediction function f such that
$$f(d_i) = c_j, \quad d_i \in D, \; c_j \in C,$$
which indicates that the intention $c_j$ in the intention set C is present in the record $d_i$ in the corpus R.
The class set for this research is a text vector of intention words. These words were selected as the initial seeds for retrieving social posts.
Table 1: Tools Analysis
Twitter APIs make collecting a large number of tweets a relatively easy task, since they support different programming languages and data mining tools. In order to retrieve published posts, one or more conditions should be set, such as the terms included in the post, the user who created the post, the location of the user, or the language of the retrieved posts. In addition to the retrieved post, Twitter APIs return information such as the tweet id, publication date and time, author's username and id, location, hashtags, number of retweets, number of followers and friends, the language, and other data.
The Twitter API was accessed through Knime to connect to Twitter and collect data; the connection requires an API Key and an Access Token. Specific search queries were used to retrieve the tweets. Our dataset was built under certain conditions. First, we retrieved all the tweets that contain any of the words in the words vector, for a total of 7000 tweets. Second, we filtered the tweets to keep only the English ones and removed all advertisement posts. This reduced the number of tweets to 5896 English-language tweets. This number of tweets is considered sufficient to conduct our experiments on a small scale.
We assumed that each post represents a single sentence, since a post cannot exceed 140 characters under the Twitter platform rules. Based on this assumption, we considered the intention within the post to represent the user's intention, with all the words of the post leading to this intention. The posts are represented in text format and labelled based on the intention word that was introduced in the search vector. The algorithm divided the posts according to the target class set {Yes, No}, where Yes indicates that the text contains intention words and No indicates the opposite. The class distribution over the dataset is 3452 tweets labelled Yes and 2444 tweets labelled No, a difference of 17% in favour of the Yes class.
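The reported class balance can be checked with a few lines of arithmetic. The sketch below uses only the two counts stated above; the variable names are illustrative and not part of the original workflow.

```python
# Class counts reported for the corpus (Yes = intention word present).
yes_count = 3452
no_count = 2444
total = yes_count + no_count

yes_share = yes_count / total
no_share = no_count / total
gap = yes_share - no_share  # difference between the class shares, in favour of Yes

print(f"total={total}, Yes={yes_share:.1%}, No={no_share:.1%}, gap={gap:.1%}")
```

The total matches the 5896 English tweets kept after filtering, and the gap between the two class shares rounds to the 17% quoted in the text.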
Two schemes were conducted, Scheme1 and Scheme2, each with two parts. The first part in both schemes is the selection of features from the dataset. In Scheme1, we used the Information Gain (IG) algorithm for feature selection. In Scheme2, machine-learning algorithms are applied back to back as a hybrid feature selection. In both schemes, feature selection is followed by supervised classification of the dataset based on the selected features, using four machine-learning algorithms: Decision Tree, Naive Bayes, Support Vector Machine, and Feed-Forward Neural Network. The accuracy of these classification algorithms reflects the quality of the feature selection techniques.
4.1 IG Feature Selection
In this part of Scheme1, we focused on extracting the features of the posts that specify intention and on selecting features by applying the Information Gain (IG) algorithm. IG is used as a machine learning technique to rank features according to the terms in the text [17]. It measures the number of bits of information obtained for category prediction by knowing the presence or absence of a term in a text. The IG value for a term t is calculated as follows:
$$IG(t) = -\sum_{i=1}^{m} P(c_i)\log P(c_i) + P(t)\sum_{i=1}^{m} P(c_i \mid t)\log P(c_i \mid t) + P(\bar{t})\sum_{i=1}^{m} P(c_i \mid \bar{t})\log P(c_i \mid \bar{t})$$
where $\{c_i\}_{i=1}^{m}$ stands for the set of categories in the target space; $P(c_i)$ stands for the probability that category $c_i$ occurs; $P(t)$ is the probability that term t occurs; $P(\bar{t})$ is the probability that term t does not occur; and m is the total number of target classes. Using IG reduces the dimension of the feature space and speeds up the classification process. The tweet vector is used to select the set of features that will be used for classifying tweets into two classes: Yes if an intention exists in the text and No if there is no intention.
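To make the IG score concrete, the following is a minimal sketch of how the information gain of a single term could be computed from labelled, tokenised posts, following the standard formulation in [17]. The toy posts, the function name `information_gain`, and the choice of base-2 logarithms are assumptions for illustration; they are not taken from the Knime workflow used in this paper.

```python
import math
from collections import Counter

def information_gain(docs, labels, term):
    """IG of `term`, given docs as sets of terms and one class label per doc."""
    n = len(docs)
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]

    def plogp_sum(subset):
        # sum_i P(c_i | subset) * log2 P(c_i | subset); an empty subset contributes 0
        if not subset:
            return 0.0
        counts = Counter(subset)
        return sum((c / len(subset)) * math.log2(c / len(subset))
                   for c in counts.values())

    p_t = len(with_t) / n  # P(t); P(not t) is 1 - P(t)
    return (-plogp_sum(labels)
            + p_t * plogp_sum(with_t)
            + (1 - p_t) * plogp_sum(without_t))

# Hypothetical toy corpus: two posts with an intention word, two without.
docs = [{"want", "pizza"}, {"want", "coffee"}, {"nice", "day"}, {"rainy", "day"}]
labels = ["Yes", "Yes", "No", "No"]

for term in ("want", "pizza", "unseen"):
    print(term, round(information_gain(docs, labels, term), 4))
```

In this toy corpus the term "want" separates the two classes perfectly and so receives the maximum score, while a term that never occurs carries no information and scores zero.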
A Tweet Vector (VT) represents a word vector of the terms in the tweet space as binary values. The extracted feature vectors are constructed using the Bag of Words (BOW) model and Term Frequency (TF). BOW creates a vector of unigrams for the terms that exist in the text, based on the PoS tagging that is done in the preprocessing phase [34]. TF is used to calculate the frequency of each term in the text; terms are considered the features to be extracted [17,35].
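As an illustration of the two representations, the sketch below builds a unigram vocabulary and derives a binary vector and a term-frequency vector for each post. The example tokens and function names are hypothetical, and PoS-based filtering is assumed to have happened before this step.

```python
from collections import Counter

def build_vocabulary(token_lists):
    """Sorted list of all distinct unigrams across the posts."""
    return sorted({tok for toks in token_lists for tok in toks})

def tf_vector(tokens, vocabulary):
    """Term-frequency vector: raw count of each vocabulary term in the post."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]

def binary_vector(tokens, vocabulary):
    """Binary bag-of-words vector: 1 if the term occurs in the post, else 0."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocabulary]

# Hypothetical, already preprocessed posts.
posts = [["heading", "city", "centre", "lunch"], ["lunch", "lunch", "time"]]
vocabulary = build_vocabulary(posts)
print(vocabulary)
print(binary_vector(posts[0], vocabulary))
print(tf_vector(posts[1], vocabulary))
```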
4.2 Hybrid Feature Selection
In Scheme2, a threshold is set on the number of features, in the form of a terms vector, that reach the maximum score. The scheme is built in two parts. The first part is a hybrid feature selection based on two different algorithms. It starts by selecting features with the IG algorithm, which extracted eighty-two features, as in Scheme1. The second phase of feature selection applies the Forward Feature Selection (FFS) algorithm. FFS starts with an empty set of features, adds one feature at a time to the set, and evaluates the result. The algorithm measures the Leave-One-Out Cross Validation (LOOCV) error of each one-feature subset to find the best individual feature. FFS is applied with four algorithms to select features: NB, SVM, ANN, and DT. The FFS feature threshold is set to the smallest number of features that gives the maximum accuracy score, which in our case differed with each algorithm applied. These features are expected to be words used in the Twitter feed. In the second part, four classification algorithms, DT, SVM, ANN, and NB, are used to test the quality of the selection technique.
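The wrapper phase can be sketched as a greedy loop scored by leave-one-out accuracy. This is a minimal illustration, not the Knime implementation: the wrapped learner here is a simple nearest-centroid classifier standing in for NB, SVM, ANN, or DT, and the toy data and all function names are assumptions.

```python
def loocv_accuracy(X, y, features, fit_predict):
    """Leave-one-out accuracy using only the given feature indices."""
    hits = 0
    for i in range(len(X)):
        train_X = [[row[f] for f in features] for j, row in enumerate(X) if j != i]
        train_y = [lab for j, lab in enumerate(y) if j != i]
        test_x = [X[i][f] for f in features]
        hits += fit_predict(train_X, train_y, test_x) == y[i]
    return hits / len(X)

def nearest_centroid(train_X, train_y, x):
    """Stand-in learner: predict the class whose mean vector is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def forward_feature_selection(X, y, fit_predict):
    """Greedily add the feature that most improves LOOCV accuracy."""
    remaining = list(range(len(X[0])))
    selected, best_score = [], 0.0
    while remaining:
        scored = [(loocv_accuracy(X, y, selected + [f], fit_predict), f)
                  for f in remaining]
        score, feature = max(scored, key=lambda s: (s[0], -s[1]))
        if score <= best_score:
            break  # no remaining feature improves accuracy, so stop here
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score

# Hypothetical binary term vectors: only feature 0 tracks the label.
X = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1], [0, 1, 0]]
y = ["Yes", "Yes", "Yes", "No", "No", "No"]
selected, score = forward_feature_selection(X, y, nearest_centroid)
print(selected, score)
```

Stopping as soon as no candidate improves the score yields the smallest feature set that reaches the best accuracy found by the greedy search, which mirrors the threshold rule described above.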
The experiments were carried out based on the schemes described in section 4. They were designed to test the possibility of mining users' intention by applying data mining techniques to the SICorp corpus. This section describes the setup of the experiments and then explains how the data were preprocessed, given the informal and noisy nature of social posts. Furthermore, we analyse the results for each algorithm used and conclude with a critical analysis of the experimental results. We first look at the experiment on feature selection using IG followed by one of the classification algorithms DT, ANN, NB, and SVM. In the following sections, we look more deeply at the performance of the classification algorithms (DT, ANN, NB, and SVM) after conducting the second experiment (the hybrid feature selection that used IG+DT, IG+ANN, IG+NB, and IG+SVM). Figure 1 illustrates the experiments framework.
Figure 1: Experiments Framework
5.1 Experiments Settings
The Knime 3.4 64-bit platform was used throughout all phases of the experiments. The machine ran the Windows 10 64-bit operating system, with an Intel i7 2.2 GHz processor and 8 GB to 12 GB of RAM. The dataset is described in section 3-3.1. We started our experiments on a machine with 8 GB of RAM. However, we faced difficulties in running the experiments, such as long execution times and OutOfMemory problems. These problems were noticed when executing the machine learning algorithms for extracting features from the 2836 features, which consumed a large portion of memory (6 GB to 8 GB) and led to the OutOfMemory problem. In addition, the same problem occurred when we used the whole set of 2836 features for the classification part of the experiment, especially with the SVM algorithm due to its computational complexity. We therefore increased the machine memory to 12 GB.
5.2 Preprocessing Data
The dataset collected from Twitter is noisy and difficult to analyse due to language informality, misspellings, emoticons, and URLs; therefore, several preprocessing steps are applied using text mining techniques. In the first preprocessing step (Preprocessing-1), all URLs are removed from the text. Otherwise, the URLs could be treated as false words that carry weight when building the word vectors in the following steps. In the second preprocessing step (Preprocessing-2), a set of text filtering techniques is employed. These techniques are Part of Speech (PoS) tagging, which tags each term based on its role in the sentence as a noun, an adjective, a verb, or an adverb, and the removal of all punctuation symbols and stop words from the text. The output of this step is the filtered text of each tweet record, ready for feature extraction and classification.
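The two preprocessing steps can be approximated outside Knime with a short sketch such as the one below. It is an illustration under stated assumptions: the URL pattern and the tiny stop-word list are hypothetical, and PoS tagging is omitted because it requires an external tagger.

```python
import re
import string

# Assumed URL pattern and a tiny illustrative stop-word list.
URL_PATTERN = re.compile(r"(https?://\S+|www\.\S+)")
STOP_WORDS = {"a", "an", "the", "to", "for", "of", "in", "is", "and", "i"}

def remove_urls(text):
    """Preprocessing-1: drop URLs so they do not become weighted terms."""
    return URL_PATTERN.sub(" ", text)

def filter_tokens(text):
    """Preprocessing-2 (without PoS tagging): strip punctuation and stop words."""
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return [tok for tok in text.translate(table).lower().split()
            if tok not in STOP_WORDS]

def preprocess(text):
    return filter_tokens(remove_urls(text))

print(preprocess("Heading to city centre for lunch! http://t.co/abc #food"))
```

The URL must be removed before punctuation is stripped; otherwise its fragments would survive as separate tokens.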
5.3 Experiment 1: IG Feature Selection based Classification
Before applying the feature selection techniques, we trained four well-known classification models, namely Decision Tree (DT), Support Vector Machine (SVM), Feedforward Neural Network (ANN), and Naive Bayes (NB), on the whole set of labelled tweets using all 2836 features; the results are shown in table 2. The goal of this step is to set up a benchmark for testing the impact of feature selection on the classifiers' performance. As table 2 shows, when all the features are provided, ANN achieves the highest F-measure and accuracy, at 86.73% and 84.07% respectively, while NB produces the worst performance, at 61.79% and 72.55% for accuracy and F-measure respectively.
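For reference, the two metrics reported throughout the experiments can be computed from confusion-matrix counts as in the sketch below. The counts in the example are hypothetical and do not correspond to any table in this paper.

```python
def accuracy(tp, fp, fn, tn):
    """Share of all posts that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)

def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive (Yes) class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 80 true Yes, 20 false Yes, 10 missed Yes, 90 true No.
print(round(accuracy(80, 20, 10, 90), 4), round(f_measure(80, 20, 10), 4))
```

Because the F-measure ignores true negatives, a classifier can rank differently on the two metrics, which is why both are reported.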
Information Gain (IG) uses entropy to measure the uncertainty between the text and the target class with and without a feature. This means that the features most important for classifying the tweets are used. It is widely used to extract features from text [36]. In Scheme1, the IG feature reduction technique was applied to the whole set of collected tweets and selected eighty-two of the 2836 features, see figure 2. Because the IG algorithm calculates the mutual information ratio of the dataset, the selected features are those with the highest mutual information ratio, i.e. the information gain of all eighty-two terms is greater than zero (IG(t) > 0), while that of the remaining 2754 features is equal to zero.
Table 2: Experiment results of using all features, before applying the feature selection technique, with each classifier algorithm: NB-Naive Bayes, SVM-Support Vector Machine, and ANN-Neural Network
Figure 2: The Output of Applying Information Gain
The eighty-two features are used to train the classification models (DT), (SVM), (ANN), and (NB) on the labelled tweets. The classification phase uses a ten-fold cross-validation setup with leave-one-out; hence, for each fold of the cross-validation, the algorithm is trained on all the items except one instance. Although feature extraction was not based on the context of the tweets, the IG algorithm reduced the number of features significantly. Table 3 shows that applying IG by itself to extract features gives the highest F-measure when training DT, at 86.14%. This suggests that the selected features hold enough information for prediction.
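The relationship between k-fold cross-validation and leave-one-out can be illustrated with a small index generator. This is a generic sketch, not the Knime cross-validation node; the function name and the contiguous fold layout are assumptions. Leave-one-out is the special case in which the number of folds equals the number of items.

```python
def k_fold_indices(n_items, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_items) if i < start or i >= start + size]
        yield train, test
        start += size

ten_fold = list(k_fold_indices(20, 10))      # each fold holds out 2 of 20 items
leave_one_out = list(k_fold_indices(4, 4))   # each fold holds out a single item
print(ten_fold[0][1], leave_one_out[0])
```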
The accuracy of the classification models over the collected dataset is illustrated in figure 3.
Table 3: Experiment result of using 82 features selected by Information Gain as a feature selection technique followed by a classifier algorithm, which are NB-Naive Bayes, SVM-Support Vector Machine, and ANN- Neural Network
Figure 3: The accuracy of applying Information Gain feature selection in Experiment 1
5.4 Experiment 2: Hybrid Feature Selection based Classification
5.4.1 Decision Tree (DT)
The DT setup for the experiments, which is based on C4.5 [37], was as follows:
• The quality measure is calculated using the Gini Index as the splitting technique, with no pruning [37].
• The minimum number of nodes is 2. The split point value is calculated as the mean of the two attribute values that separate two partitions. Eight cores were used to speed up processing.
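The splitting rule in the setup above can be sketched as follows: candidate thresholds are the midpoints between adjacent distinct attribute values, and the threshold with the lowest weighted Gini index is kept. The function names and toy data are illustrative assumptions and do not reproduce the Knime Decision Tree learner.

```python
from collections import Counter

def gini(labels):
    """Gini index of a set of class labels (0 means the set is pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best binary split of one attribute."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue
        threshold = (v1 + v2) / 2  # mean of the two adjacent attribute values
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical binary term feature that separates the classes perfectly.
print(gini(["Yes", "No"]), best_split([0, 0, 1, 1], ["No", "No", "Yes", "Yes"]))
```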
Since the decision tree learning method predicts the value of the target variable by learning simple decision rules inferred from the data features, it produced relatively high results. It is robust to noisy data; however, since it is a heuristic algorithm, each decision is made locally and is not guaranteed to return the globally optimal solution.
Table 4: Experiment results of using Decision Tree classifier with eight features selected through the Hybrid feature selection
Applying two phases of feature reduction gives relatively close results, even though the features are reduced to eight. Reducing the features reduces data processing time, yet the accuracy is slightly lower. As Table 4 shows, almost the same accuracy values were obtained for DT, with only a very slight difference.
5.4.2 Naive Bayes (NB)
The basic NB classifier is used to decide the class of the input data by referring to the highest probability value calculated by the trained classifier using the Bayes formula. The correct class is taken to be the class with the highest probability value, as the Bayes classification rule states [38, 39]. The class is calculated as follows:
$$c^{*} = \arg\max_{c_j \in C} P(c_j \mid d) = \arg\max_{c_j \in C} \frac{P(d \mid c_j)\,P(c_j)}{P(d)}$$
where $P(d)$ is the same for all the classes. When applying NB for feature selection in the second experiment, the probability of the occurrence of a word feature $w_i$ in a text document is assumed to be independent of the word's position and of the occurrence of other words in the text document. So the probability $P(d \mid c_j)$ would be:
$$P(d \mid c_j) = \prod_{i=1}^{n} P(w_i \mid c_j)^{N_i}$$
where $N_i$ is the number of times that the word $w_i$ occurs in the document, and n is the number of words in the document.
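A minimal sketch of a word-count Naive Bayes classifier following the Bayes classification rule is given below. It works in log space to avoid numerical underflow and uses Laplace (add-one) smoothing; the smoothing, the toy posts, and the function names are assumptions of this sketch and are not stated in the paper.

```python
import math
from collections import Counter, defaultdict

def train_nb(token_lists, labels):
    """Collect class counts, per-class word counts, and the vocabulary."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(token_lists, labels):
        word_counts[label].update(tokens)
    vocabulary = {tok for tokens in token_lists for tok in tokens}
    return class_counts, word_counts, vocabulary

def predict_nb(model, tokens):
    """Return the class with the highest log P(c) + sum_i N_i * log P(w_i | c)."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, doc_count in class_counts.items():
        total_words = sum(word_counts[label].values())
        log_score = math.log(doc_count / total_docs)  # log P(c)
        for tok, n_i in Counter(tokens).items():
            # Laplace-smoothed P(w_i | c); smoothing is an assumption of this sketch
            p_w = (word_counts[label][tok] + 1) / (total_words + len(vocabulary))
            log_score += n_i * math.log(p_w)
        scores[label] = log_score
    return max(scores, key=scores.get)

# Hypothetical training posts.
train_posts = [["want", "pizza"], ["need", "coffee"], ["nice", "day"], ["rainy", "day"]]
train_labels = ["Yes", "Yes", "No", "No"]
model = train_nb(train_posts, train_labels)
print(predict_nb(model, ["want", "coffee", "now"]), predict_nb(model, ["day"]))
```

The denominator of the Bayes formula is dropped because it is identical for every class and so does not change which class scores highest.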
Applying the NB classifier with eighty-two features produced the lowest accuracy among the classifiers in the IG feature selection of the first experiment, with an accuracy of 80.61%, as shown in Table 3. However, the accuracy increased when the feature set was reduced to between eight and eleven features in the second experiment, as shown in Table 5.
Table 5: Experiment results of using Naive Bayes classifier with the features selected through the Hybrid feature selection
5.4.3 Artificial Neural Network (ANN)
The ANN is a feedforward network with two hidden layers of 100 units each and a learning rate of 0.1. The Xavier weight initialization strategy [40] is used with the ReLU activation function. The number of training iterations is one. The optimization algorithm is Stochastic Gradient Descent (SGD), and the loss function is Mean Squared Error. Applying the hybrid feature selection shows an improvement, as the ANN accuracy increases to 82.23%, as seen in Table 6.
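The configuration above can be made concrete with a small sketch of the forward pass and the loss. It is not the implementation used in the experiments. It assumes the uniform variant of Xavier initialization, zero initial biases, and a linear output layer, none of which are stated in the text, and it omits the SGD weight update (each weight moves by 0.1 times its negative gradient). All function names are illustrative.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng):
    """Xavier/Glorot uniform initialisation: U(-a, a) with a = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)] for _ in range(fan_out)]


def dense(weights, biases, inputs):
    """Affine layer: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def build_network(n_features, n_outputs, hidden_units=100, seed=0):
    """Two hidden layers of `hidden_units` each, followed by an output layer."""
    rng = random.Random(seed)
    sizes = [n_features, hidden_units, hidden_units, n_outputs]
    return [(xavier_uniform(a, b, rng), [0.0] * b) for a, b in zip(sizes, sizes[1:])]


def forward(network, inputs):
    """Apply ReLU after each hidden layer; the output layer is left linear (assumption)."""
    activations = inputs
    for index, (weights, biases) in enumerate(network):
        activations = dense(weights, biases, activations)
        if index < len(network) - 1:
            activations = relu(activations)
    return activations


def mse(predicted, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)
```

For an eight-feature input, `forward(build_network(8, 2), features)` returns two output values, one per class, which `mse` compares against a one-hot target.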
Table 6: Experiment results of using FeedForward Neural Network classifier with the features selected through the Hybrid feature selection in experiment 2
5.4.4 Support Vector Machine (SVM)
The SVM, which is based on the LibSVM algorithm [41], was used with the overlapping penalty set to one and a Radial Basis Function (RBF) kernel with gamma equal to one. For the SVM classification technique, the highest accuracy is lower when applying the hybrid feature selection than when applying IG alone, as shown in Tables 3 and 7.
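To show what the RBF kernel with gamma equal to one computes, here is a minimal sketch of the kernel and of the decision function of an already trained kernel SVM. Training, where the overlapping penalty takes effect, is not shown. The support vectors and coefficients passed to `decision_value` are hypothetical inputs and not values from the experiments.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """K(x, y) = exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def decision_value(x, support_vectors, dual_coefs, bias, gamma=1.0):
    """Kernel SVM decision function: sum_i coef_i * K(sv_i, x) + bias.

    Each coef_i is the product of a support vector's multiplier and its label.
    The sign of the result gives the predicted class.
    """
    return sum(
        coef * rbf_kernel(sv, x, gamma) for sv, coef in zip(support_vectors, dual_coefs)
    ) + bias
```

With gamma equal to one the kernel value falls from 1 at zero distance to about 0.37 at unit distance, so each support vector influences only its close neighbourhood in feature space.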
Table 7: Experiment results of using Support Vector Machine classifier on the features selected through the Hybrid feature selection in experiment 2
5.5 Critical Analysis
The two experiments aimed to predict the existence of intentions in the feeds that users post on social networks. The prediction techniques were based on text features.
Figure 3 illustrates the accuracy of the classification learning models over the collected dataset. Using the IG algorithm as the feature selection technique produces higher accuracy than the other techniques used in the second experiment. DT C4.5 produced the highest accuracy among the classification techniques, followed by ANN, SVM, and NB.
In the second experiment, applying a hybrid feature selection to reduce the number of features did not produce a significant difference in classification results. This means that a minimal number of features can give accuracy very close to that obtained with eighty-two features. The features were reduced significantly, from eighty-two to at most eleven.
By observing Tables 3, 4, 5, 6, and 7, only a slight difference in accuracy values between the four classifier algorithms is noticed when applying the different feature selection techniques. Using IG by itself with all the features, or combining it with forward ANN on the reduced features, did not show any difference in accuracy for the DT classifier, as shown in Table 3 and Table 6. Moreover, applying NB and SVM as the second-stage feature selection resulted in the same accuracy and F-measure for DT classification, as shown in Table 4, whereas selecting ANN for the second stage showed a slight improvement in accuracy. Since the DT learning method predicts the value of the target variable by learning simple decision rules inferred from the data features, it produced relatively high results.
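One common way to realise a two-stage scheme of this kind is a filter step followed by a greedy wrapper step, sketched below. Whether the second stage in these experiments is a greedy forward search is an assumption. The per-feature scores and the `evaluate` callable (for example, the validation accuracy of a learner trained on a feature subset) are hypothetical inputs supplied by the caller.

```python
def top_k_by_score(scores, k):
    """Stage 1 (filter): keep the k features with the highest score, e.g. Information Gain."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def forward_selection(candidates, evaluate, max_features):
    """Stage 2 (wrapper): greedily add the feature that most improves `evaluate(subset)`."""
    selected, best_score = [], float("-inf")
    while len(selected) < max_features:
        remaining = [f for f in candidates if f not in selected]
        if not remaining:
            break
        score, feature = max((evaluate(selected + [f]), f) for f in remaining)
        if score <= best_score:
            break  # no remaining feature improves the score
        selected.append(feature)
        best_score = score
    return selected
```

The wrapper stops as soon as no remaining feature improves the score, which is one way a large filtered set can shrink to a handful of features with little loss in accuracy.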
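For reference, the two measures compared across the tables can be computed from the confusion-matrix counts of a binary classifier, as in this minimal sketch. It assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall). The counts in the usage note are made up for illustration and are not results from the experiments.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f_measure(tp, fp, fn):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With 40 true positives, 10 false positives, 10 false negatives, and 40 true negatives, both measures are approximately 0.8. On an imbalanced dataset the two can differ substantially, which is why both are reported.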
The NB classifier showed an improvement from adding another feature selection stage and reducing the features, as shown in Table 5. In Table 3, the accuracy of the NB classifier after applying IG as a single feature selection technique was 80.61%, whereas this accuracy increased to 82.43% with the hybrid feature selection in the second experiment, as Table 5 illustrates.
Figure 4: The accuracy of the four classifier algorithms after using IG and NB feature selection
Figure 5: The accuracy of the four classifier algorithms after using IG and DT feature selection
Table 7 shows that adding the ANN as the second-stage feature selection gives a slight improvement in accuracy for SVM classification. However, applying NB and SVM as the second stage gives the same accuracy and F-measure, which is not the case for DT as the second stage. In addition, the accuracy of the ANN as the learner classifier improved when the SVM was added to the feature selection procedure. Table 6 illustrates the measurements for the ANN classifier. Reducing the features gave an advantage in increasing the speed of processing the data.
Figure 6: The accuracy of the four classifier algorithms after using IG and SVM feature selection
Figure 7: The accuracy of the four classifier algorithms after using IG and ANN feature selection
From Figure 4, we can conclude that the NB classifier can be improved by the two-phase feature selection, especially with NB, SVM, or ANN as the second phase. The ANN classifier improved when DT was added as the second phase of feature selection, whereas SVM showed better performance when ANN was added to the feature selection technique.
However, the eighty-two features that were extracted using IG do not take into account the context, which a human reader can understand. Although most of the results are high, they still do not reflect the context of the tweets in a way that represents all features of social users' intention. Some factors have to be considered in future work, such as the accuracy of labelling. The labelling phase was done by searching for certain strings within the retrieved tweet text; in other words, any tweet containing a phrase from the intention vector (see Section 3) was labelled as holding an intention. Most previous studies made use of human-judge labelling. Another factor is including more search words to retrieve the data from Twitter. More word patterns and terms need to be taken into consideration. Therefore, more experiments are needed to study the effect of applying different features from social networks.
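The string-search labelling described above can be sketched as follows. The case-insensitive substring match and the example phrases are assumptions made for illustration. The actual intention vector is the one defined in Section 3.

```python
def label_tweet(text, intention_phrases):
    """Return 1 if any phrase from the intention vector occurs in the tweet text, else 0."""
    lowered = text.lower()
    return int(any(phrase.lower() in lowered for phrase in intention_phrases))


# Hypothetical phrases, standing in for the intention vector.
example_phrases = ["i want to", "i need to", "planning to"]
```

A call such as `label_tweet("I want to buy a phone", example_phrases)` returns 1, while `label_tweet("Nice weather today", example_phrases)` returns 0. Because the match ignores context, a tweet can receive a positive label whenever a phrase appears, which is the labelling-accuracy concern raised here and the reason human-judge labelling may give cleaner labels.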
Social networks have gained great interest from researchers because they provide a means to study human behaviour from online daily activities. There have been a number of studies that focused on detecting the intention of computer system users. In this paper, we looked at some datasets that are available online and used by other researchers. However, we faced difficulty in dealing with these datasets; therefore, we extracted our own dataset from Twitter as an example of microblogging. Different data mining tools were reviewed, each with its advantages and disadvantages. We used the KNIME tool because of its implementation of text mining techniques and its ability to save results in different formats. In addition, it supports different programming languages that can help in implementing our model. The dataset was preprocessed using text mining filter techniques to remove unneeded data such as URLs and symbols. The resulting dataset was then used in two experiments. Both experiments had two parts, namely feature selection and classification. The first experiment had one feature selection phase using Information Gain. The second had a two-phase hybrid feature selection using Information Gain with three other algorithms. In both experiments, the selected features were used in classification, and the performance of the algorithms was critically reviewed.
[1] F. Khan, S. Borah, and A. Pradhan, “Mining consumption intent from social data: A survey,” International Conference on Computing & Communication (ICCC-2016), no. March, 2016.
[2] M. E. Bratman and J. D. Velleman, “Intention, Plans, and Practical Reason,” The Philosophical Review, vol. 100, no. 2, p. 277, Apr 1991.
[3] H. Purohit, G. Dong, V. Shalin, K. Thirunarayan, and A. Sheth, “Intent classification of short-text on social media,” in 2015 IEEE International Conference on Smart City/SocialCom/SustainCom (SmartCity), Dec 2015, pp. 222–228.
[4] X. Ding, T. Liu, J. Duan, and J.-Y. Nie, “Mining user consumption intention from social media using domain adaptive convolutional neural network,” Proceedings of the 29th AAAI Conference on Artificial Intelligence, pp. 2389–2395, 2015.
[5] Z. Chen, F. Lin, H. Liu, Y. Liu, W.-Y. Ma, and L. Wenyin, “User intention modeling in web applications using data mining,” World Wide Web, vol. 5, no. 3, pp. 181–191, 2002.
[6] E. Horvitz, J. Breese, D. Heckerman, D. Hovel, and K. Rommelse, “The lumiere project: Bayesian user modeling for inferring the goals and needs of software users,” Fourteenth Conference on Uncertainty in Artificial Intelligence, pp. 256–265, 1998.
[7] G. Khodabandelou, C. Hug, and C. Salinesi, “Mining users’ intents from logs,” International Journal of Information System Modeling and Design, vol. 6, no. 2, pp. 43–71, 2015.
[8] L. Chen, D. Zhang, and L. Mark, “Understanding user intent in community question answering,” Ph.D. dissertation, University of London, 2014.
[9] S. Zhang and N. Wang, “Classification model for intent mining in personal website based on support vector machine,” International Journal of Database Theory and Application, vol. 9, no. 1, pp. 145–152, Feb 2016.
[10] G. Vineet, V. Devesh, J. Harsh, K. Deepam, and K. Shweta, “Identifying purchase intent from social posts,” Proceedings of the 8th International Conference on Weblogs and Social Media (ICWSM 2014), pp. 180–186, 2014.
[11] H. M. Salaheldeen, “Detecting, Modeling, and Predicting User Temporal Intention in Social Media,” Ph.D. dissertation, Old Dominion University, 2015.
[12] W. Chaouali, “Once a user, always a user: Enablers and inhibitors of continuance intention of mobile social networking sites,” Telematics and Informatics, vol. 33, no. 4, pp. 1022–1033, 2016.
[13] N. Banerjee, D. Chakraborty, A. Joshi, S. Mittal, A. Rai, and B. Ravindran, “Towards analyzing micro-blogs for detection and classification of real-time intentions.” in ICWSM, no. January, 2012, pp. 391–394.
[14] D. H. Park, Y. Fang, M. Liu, and C. Zhai, “Mobile app retrieval for social media users via inference of implicit intent in social media text,” in Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. ACM, 2016, pp. 959–968.
[15] H. K. Dai, L. Zhao, Z. Nie, J.-R. Wen, L. Wang, and Y. Li, “Detecting online commercial intention (oci),” Proceedings of the 15th international conference on World Wide Web, pp. 829–837, 2006.
[16] Z. Chen, B. Liu, M. Hsu, M. Castellanos, and R. Ghosh, “Identifying Intention Posts in Discussion Forums,” Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, no. June, pp. 1041–1050, 2013.
[17] Y. Yang and J. O. Pedersen, “A comparative study on feature selection in text categorization,” Proceedings of the Fourteenth International Confer- , pp. 412–420, 1997.
[18] Z. M. Kim, Y.-s. Jeong, J. Hyeon, H. Oh, and H.-j. Choi, “Classifying travel-related intents in textual data,” International Journal of Computing, Communication and Instrumentation Engineering, vol. 3, no. 1, pp. 96–101, Jan 2016.
[19] J. Wang, G. Cong, X. Zhao, and X. Li, “Mining User Intents in Twitter: A Semi-Supervised Approach to Inferring Intent Categories for Tweets,” Pro- , pp. 339–345, 2015.
[20] H. Kwak, C. Lee, H. Park, and S. Moon, “What is twitter, a social network or a news media?” The International World Wide Web Conference Committee (IW3C2), pp. 1–10, 2010.
[21] S. Agarwal and A. Sureka, “Characterizing Linguistic Attributes for Automatic Classification of Intent Based Racist/Radicalized Posts on Tumblr Micro-Blogging Website,” arXiv preprint arXiv:1701.04931, 2017.
[22] X. W. Zhao, Y. Guo, Y. He, H. Jiang, Y. Wu, and X. Li, “We know what you want to buy: A demographic-based system for product recommendation on microblogs,” Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1935–1944, 2014.
[23] B. Xue, M. Zhang, W. N. Browne, and X. Yao, “A survey on evolutionary computation approaches to feature selection,” IEEE Transactions on Evolutionary Computation, vol. 20, no. 4, pp. 606–626, 2016.
[24] D. Asir Antony Gnana Singh, S. Appavu alias Balamurugan KLN, and E. Jebamalar Leavline, “Literature Review on Feature Selection Methods for High-Dimensional Data,” International Journal of Computer Applications, vol. 136, no. 1, pp. 975–8887, 2016.
[25] J. Tang and H. Liu, “Feature selection for social media data,” ACM Transactions on Knowledge Discovery from Data (TKDD), vol. 8, no. 4, pp. 19:1–19:27, Oct. 2014.
[26] J. Li, K. Cheng, S. Wang, F. Morstatter, R. P. Trevino, J. Tang, and H. Liu, “Feature selection: A data perspective,” Journal of Machine Learning Research, pp. 1–73, 2016.
[27] A. Go, R. Bhayani, and L. Huang, “Twitter Sentiment Classification using Distant Supervision,” Stanford, Tech. Rep., 2009.
[28] S. M. Mohammad and F. Bravo-Marquez, “Emotion Intensities in Tweets,” in Proceedings of the Joint Conference on Lexical and Computational Semantics (*Sem), Vancouver, Canada, 2017, pp. 65–77.
[29] S. M. Mohammad, F. Bravo-Marquez, M. Salameh, and S. Kiritchenko, “Semeval-2018 Task 1: Affect in tweets,” in Proceedings of International Workshop on Semantic Evaluation (SemEval-2018), New Orleans, LA, USA, 2018.
[30] S. M. Mohammad and S. Kiritchenko, “Understanding emotions: A dataset of tweets to study interactions between affect categories,” in Proceedings of the 11th Edition of the Language Resources and Evaluation Conference, Miyazaki, Japan, 2018.
[31] K. Rangra and K. L. Bansal, “Comparative study of data mining tools,” International Journal of Advanced Research in Computer Science and Software Engineering, vol. 4, no. 6, pp. 2277–128, 2014.
[32] K. Gibert, M. Sánchez-Marré, and B. Sevilla, “Tools for environmental data mining and intelligent decision support,” in International Congress on Environmental Modelling and Software, 2012.
[33] M. R. Berthold, B. Wiswedel, and T. R. Gabriel, “Fuzzy logic in knime modules for approximate reasoning,” International Journal of Computational Intelligence Systems, vol. 6, no. 2013, pp. 34–45, 2013.
[34] J. K. Rout, K. K. R. Choo, A. K. Dash, S. Bakshi, S. K. Jena, and K. L. Williams, “A model for sentiment and emotion analysis of unstructured social media text,” Electronic Commerce Research, vol. 18, no. 1, pp. 181–199, 2018.
[35] S. Baccianella, A. Esuli, and F. Sebastiani, “Using micro-documents for feature selection: The case of ordinal text classification,” Expert Systems with Applications, vol. 40, no. 11, pp. 4687–4696, 2013.
[36] Y. Liu, J. W. Bi, and Z. P. Fan, “Multi-class sentiment classification: The experimental comparisons of feature selection and machine learning algorithms,” Expert Systems with Applications, vol. 80, pp. 323–339, 2017.
[37] J. Shafer, R. Agrawal, and M. Mehta, “Sprint: A scalable parallel classifier for data mining,” in Proc. 1996 Int. Conf. Very Large Data Bases, 1996, pp. 544–555.
[38] J. Chen, H. Huang, S. Tian, and Y. Qu, “Feature selection for text classification with Naïve Bayes,” Expert Systems with Applications, vol. 36, no. 3 PART 1, pp. 5432–5435, 2009.
[39] L. H. Lee and D. Isa, “Automatically computed document dependent weighting factor facility for Naïve Bayes classification,” Expert Systems with Applications, vol. 37, no. 12, pp. 8471–8478, 2010.
[40] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” PMLR, vol. 9, pp. 249–256, 2010.
[41] C.-c. Chang and C.-j. Lin, “LIBSVM: A Library for Support Vector Machines,” ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2, pp. 1–39, 2013.