Abstract
With shipments of 1 billion Android devices expected in 2017, cyber criminals have naturally extended
their malicious activities to Google’s mobile operating system: threat researchers report an
alarming increase in detected Android malware from 2014 to 2015. To keep some control over
the estimated 700 new Android applications released every day, there is a need for a form
of automated analysis that can quickly detect and isolate new malware instances.
Android is an open-source, Linux-based mobile operating system distributed by Google. According
to the latest statistics, Android powers hundreds of thousands of mobile devices in more than 190 countries [26].
Google Play [51] is the official, centralized Android marketplace maintained by Google, where any
independent developer can submit an Android app and make it available to users.
The growing popularity of the Android ecosystem also makes it an attractive target for security and
privacy violations. Highly sensitive and confidential information such as text messages, private and
business contacts, and calendar data may be leaked through an application. Sensors such as GPS
allow applications to provide a context-sensitive user experience, but they also create additional
privacy concerns, because an application can exploit the sensor data for tracking or monitoring.
Apart from these issues, smartphones are also susceptible to various malware threats such as
viruses, Trojan horses, and worms [50].
The Android security model relies heavily on a permission-based mechanism. There are about 130
permissions that govern access to different resources. Whenever a user tries to install a new application,
he/she is prompted to approve or reject all the permissions requested by the application. The application
is installed only after the user accepts all of the requested permissions.
In this work, we use the permissions and API-level information of apps as features to
detect malicious applications. Further, we observe that the Android store [51] defines a category for every
published application. Through extensive study, we found that certain categories are considerably
more prone to malicious applications than others. We explicitly incorporate this information in our
model and learn a naive Bayes classifier for each category, using features that encode information
about permissions and API calls. Given a new test application with a known category, we apply the
corresponding classifier to detect whether the application is malicious. We created a large data set of Android
applications and achieve an improvement of 3-4% by incorporating category-level information.
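The per-category scheme can be illustrated with a minimal sketch. It assumes binary features (a permission or API call is either present or absent), a Bernoulli naive Bayes model with Laplace smoothing, and one model per store category; the class name, function names, and toy data are hypothetical and are not taken from the actual implementation or data set.

```python
import math
from collections import defaultdict


class BernoulliNB:
    """Bernoulli naive Bayes over binary features (present or absent)."""

    def fit(self, samples, labels, vocabulary):
        self.vocabulary = sorted(vocabulary)
        self.classes = sorted(set(labels))
        n = len(labels)
        self.log_prior = {}
        self.prob = {}
        for c in self.classes:
            rows = [s for s, y in zip(samples, labels) if y == c]
            self.log_prior[c] = math.log(len(rows) / n)
            # Laplace smoothing keeps every probability strictly inside (0, 1).
            self.prob[c] = {
                f: (sum(f in r for r in rows) + 1) / (len(rows) + 2)
                for f in self.vocabulary
            }
        return self

    def predict(self, sample):
        def score(c):
            total = self.log_prior[c]
            for f in self.vocabulary:
                p = self.prob[c][f]
                total += math.log(p if f in sample else 1.0 - p)
            return total

        return max(self.classes, key=score)


def train_per_category(apps):
    """Train one model per category from (category, features, label) triples."""
    by_category = defaultdict(list)
    for category, features, label in apps:
        by_category[category].append((set(features), label))
    models = {}
    for category, rows in by_category.items():
        samples = [s for s, _ in rows]
        labels = [y for _, y in rows]
        vocabulary = set().union(*samples)
        models[category] = BernoulliNB().fit(samples, labels, vocabulary)
    return models


def classify(models, category, features):
    """Apply the classifier that belongs to the app's known category."""
    return models[category].predict(set(features))


# Toy data (hypothetical): the same permission can be routine in one
# category and suspicious in another.
TOY_APPS = [
    ("games", {"INTERNET"}, "benign"),
    ("games", {"INTERNET", "VIBRATE"}, "benign"),
    ("games", {"INTERNET", "SEND_SMS"}, "malware"),
    ("games", {"SEND_SMS", "READ_CONTACTS"}, "malware"),
    ("communication", {"INTERNET", "SEND_SMS"}, "benign"),
    ("communication", {"SEND_SMS", "READ_CONTACTS"}, "benign"),
    ("communication", {"INTERNET", "INSTALL_PACKAGES"}, "malware"),
    ("communication", {"INSTALL_PACKAGES", "SEND_SMS"}, "malware"),
]
models = train_per_category(TOY_APPS)
```

On this toy data, an app requesting INTERNET and SEND_SMS is flagged in the games category but not in the communication category, which is the kind of category-dependent signal a single global classifier would average away.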
Secondly, we combine association rule mining and classification rule mining techniques to build
a classifier. The integration is done by mining a special subset of association rules, called
class association rules (CARs).
To select the features that best distinguish malware from benign apps, we rely on API-level
information within the bytecode, since it conveys substantial semantics about
the app's behaviour. More specifically, we focus on critical API calls and their package-level information.
Rather than simply treating individual API calls as items, we represent an item as a combination
of caller and callee API, which captures one level of control flow and context between caller and callee. Each
item in our model is of the form A%B, where A is the caller and B is the callee. We use Androguard [8],
a reverse engineering tool, to perform API-level feature extraction and data flow analysis. In summary,
• We combine association rule mining and classification rule mining for Android malware detection.
• We achieve a detection rate of 85%, compared with 69% for the baseline classifier.
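The A%B item encoding and the idea of class association rules can be sketched as follows. This is a simplified illustration under stated assumptions: the (caller, callee) pairs are taken as already extracted (the Androguard step is not shown), rules are restricted to a single item on the left-hand side, and the function names, thresholds, and toy transactions are hypothetical. Full CAR mining would also consider multi-item antecedents.

```python
from collections import Counter


def make_items(call_edges):
    """Turn (caller, callee) pairs into 'caller%callee' items."""
    return {f"{caller}%{callee}" for caller, callee in call_edges}


def mine_cars(transactions, min_support=0.3, min_confidence=0.8):
    """Mine single-item class association rules of the form item -> label.

    transactions: list of (item_set, label) pairs.
    Returns (item, label, support, confidence) tuples, strongest first.
    """
    n = len(transactions)
    item_count = Counter()
    rule_count = Counter()
    for items, label in transactions:
        for item in items:
            item_count[item] += 1
            rule_count[(item, label)] += 1
    rules = []
    for (item, label), count in rule_count.items():
        support = count / n                   # fraction of apps with item and label
        confidence = count / item_count[item]  # fraction of apps with item that have label
        if support >= min_support and confidence >= min_confidence:
            rules.append((item, label, support, confidence))
    rules.sort(key=lambda r: (-r[3], -r[2], r[0]))
    return rules


def classify_with_rules(rules, items, default="benign"):
    """Label an app with the highest-ranked rule whose item it contains."""
    for item, label, _, _ in rules:
        if item in items:
            return label
    return default


# Illustrative caller and callee names (assumed for this toy example).
RECEIVE = "android.content.BroadcastReceiver.onReceive"
CREATE = "android.app.Activity.onCreate"
START = "android.app.Service.onStartCommand"
SMS = "android.telephony.SmsManager.sendTextMessage"
IMEI = "android.telephony.TelephonyManager.getDeviceId"
LOG = "android.util.Log.d"

TOY_TRANSACTIONS = [
    (make_items([(RECEIVE, SMS), (CREATE, LOG)]), "malware"),
    (make_items([(RECEIVE, SMS), (START, IMEI)]), "malware"),
    (make_items([(CREATE, LOG)]), "benign"),
    (make_items([(CREATE, LOG), (CREATE, SMS)]), "benign"),
]
rules = mine_cars(TOY_TRANSACTIONS)
```

In this toy set the only rule that survives both thresholds ties the SMS call to its caller: sending a text message from a broadcast receiver predicts malware, while the same callee reached from an activity does not. An encoding that used the callee alone could not make that distinction.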