In a social network, a set of nodes that share common characteristics (location, interests, occupation, etc.) can be called a community. For law enforcement, a network drawn from call detail records (CDRs) provides a wealth of information that can help identify suspects: it can reveal an individual's relationships with associates, communication and behaviour patterns, and location data, which together can establish a community of suspects perpetrating a crime. Since relational databases cannot scale with hierarchical queries, direct querying is infeasible for detecting "network" patterns. Traditional community detection is essentially clustering (based on distance or similarity measures). Because network data tends to be "discrete", algorithms instead use graph properties directly (cliques and centrality) to detect communities.
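As a minimal sketch of the clique-based approach mentioned above, the Bron-Kerbosch algorithm enumerates the maximal cliques of an undirected graph. The edge list below is an illustrative toy contact graph, not real CDR data:

```python
# Toy call graph: an undirected adjacency structure built from CDR contact
# pairs. Node names here are hypothetical stand-ins for phone numbers.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "c")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def maximal_cliques(adj):
    """Bron-Kerbosch without pivoting: yields every maximal clique once."""
    def expand(r, p, x):
        if not p and not x:
            yield r
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)
    yield from expand(set(), set(adj), set())

cliques = [sorted(c) for c in maximal_cliques(adj)]
```

On this toy graph the algorithm recovers the two tightly knit groups {a, b, c} and {c, d, e}; on a real CDR graph each maximal clique is a candidate community for further screening.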
The objective of this project is to identify a terrorist network whose communities are tightly knit, with sizes between 3 and 10 nodes, and to uncover calling patterns as identified by criminal psychologists. There are about 2 billion call detail records and 200 million unique contacts in the network. The approach is to initially eliminate the non-suspects (big data processing) and later identify the suspects (machine learning).
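The elimination pass can be sketched as iterative degree pruning (a k-core style reduction): nodes with too few contacts cannot belong to a tightly knit community of the target size, so they are dropped before any expensive clustering. The graph and threshold below are illustrative assumptions:

```python
def prune_low_degree(adj, k):
    """Iteratively drop nodes with fewer than k contacts (k-core style),
    a cheap first pass to eliminate non-suspects before clustering."""
    adj = {u: set(vs) for u, vs in adj.items()}   # work on a copy
    changed = True
    while changed:
        changed = False
        for u in list(adj):
            if len(adj[u]) < k:
                for v in adj.pop(u):
                    adj[v].discard(u)             # keep the graph symmetric
                changed = True
    return adj

# Hypothetical contact graph: node "x" has a single contact and is pruned.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "x"}, "x": {"c"}}
core = prune_low_degree(graph, 2)
```

Because a clique of size 3 requires every member to have degree at least 2 within it, pruning with k = 2 is safe for the smallest target community; larger k values prune more aggressively.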
In the era of Big Data, where unstructured data is growing rapidly, natural language processing (NLP) is of great importance for understanding and personalizing various consumer services and for implementing crime detection strategies. The objective of this project is to develop and implement core text analysis algorithms: sentiment analysis and entity extraction. Sentiment analysis infers information such as opinion, mood, and emotion from text using text analytics techniques (stemming, lemmatization, named entity recognition, relation extraction, etc.). Entity extraction is a statistical technique to identify people, organizations, place names, symbols, certain abbreviations, and so on. These core algorithms are used in two commercial solutions: first, a mobile app ranking and scoring application that applies sentiment analysis to app reviews and comments; second, an application that augments social network analysis for crime detection over CDR data, using entity extraction and relation mining to enrich the network property graphs.
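A minimal lexicon-based sketch of the sentiment analysis step is shown below. The two word lists are illustrative stand-ins for a real sentiment lexicon (such as VADER or SentiWordNet), and a production system would first apply the stemming and lemmatization steps mentioned above:

```python
# Illustrative mini-lexicon; a real system would load a full sentiment lexicon.
POSITIVE = {"great", "love", "excellent", "good", "useful"}
NEGATIVE = {"bad", "crash", "poor", "terrible", "slow"}

def polarity(text):
    """Return 'positive', 'negative', or 'neutral' by counting lexicon hits."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, `polarity("great app I love it")` is classified positive because two positive lexicon words are matched and none negative.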
An app store is a digital distribution platform for mobile apps, curated by its owner through an approval process for all submitted apps. Different app stores serve applications for different platforms, and the same application may appear in several stores. Many new apps are made available on the stores each day, and app discovery is led by the methods these stores employ. The aim of this project is therefore to recommend the best cross-platform application on the basis of ranking and scoring algorithms.
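One standard scoring technique for this kind of ranking is a Bayesian (damped) average, which shrinks an app's mean rating toward a prior so that an app with a handful of five-star reviews cannot outrank a well-established one. The prior mean, prior weight, and app names below are illustrative assumptions:

```python
def bayesian_score(avg_rating, n_ratings, prior_mean=3.0, prior_weight=50):
    """Shrink an app's average rating toward prior_mean; prior_weight acts
    like that many phantom ratings at the prior mean (illustrative values)."""
    return (prior_weight * prior_mean + n_ratings * avg_rating) / (prior_weight + n_ratings)

# Hypothetical (name, average rating, number of ratings) triples.
apps = [("AppA", 5.0, 12), ("AppB", 4.4, 2000), ("AppC", 4.8, 150)]
ranked = sorted(apps, key=lambda a: bayesian_score(a[1], a[2]), reverse=True)
```

Here AppA's perfect 5.0 from only 12 ratings ranks below AppB's 4.4 from 2000 ratings, which is usually the desired behaviour for app discovery.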
Amongst recent developments in computing, mobile technologies make information and data available to users without location or temporal constraints. Various business and personal productivity applications are being increasingly deployed on mobile platforms. As mobile devices proliferate, there will be an increased emphasis on serving the needs of the mobile user in diverse contexts and environments through what are being called data products. Mobility also imposes significant management challenges for IT organizations as they lose control of user endpoint devices. There is a need for a unified mobile deployment platform, or mBaaS (Mobile Backend as a Service), that can integrate various data products and serve new features to all users of an enterprise. When deployed on the cloud, this mBaaS can scale elastically depending on workload characteristics.
A weather forecasting model predicts the state of the atmosphere at a given location. The inputs come from country-based weather services: surface observations from automated weather stations at ground level, over land, and from weather buoys at sea. The observations measure features such as temperature, humidity, wind speed, pressure, and snow volume. This model is built from the past decade of observations for various geographical locations, captured at 5-minute intervals from automated weather stations.
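The simplest form of such a model is a least-squares fit of one observed feature against time. The sketch below fits temperature against hour of day on a few synthetic readings, standing in for a model trained on a decade of 5-minute station observations:

```python
# Ordinary least squares for a single predictor, written out in closed form.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

hours = [6, 9, 12, 15, 18]
temps = [14.0, 18.0, 22.0, 26.0, 30.0]   # synthetic illustrative readings
slope, intercept = fit_line(hours, temps)
predicted_noon = slope * 12 + intercept
```

A real forecasting model would of course be multivariate (pressure, humidity, wind) and nonlinear, but the fit-then-predict structure is the same.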
The Internet of Things (IoT) is an umbrella term used to describe a next step in the evolution of the Internet. While the first phase of the web can be thought of as a combination of an internet of hyper-text documents and an internet of applications (think blogs, online email, social sites, etc.), one of the next steps is an Internet of augmented 'smart' objects, or 'things', accessible to human beings and to each other over network connections. This is the Internet of Things. Underpinning the development of the Internet of Things is the ever-increasing proliferation of networked devices in everyday usage. Such devices include laptops, smart phones, fridges, smart meters, RFIDs, etc. The number of devices in common usage is set to increase worldwide from the current level of 4.5 billion to 50 billion by 2050 and may even include human implants.
Product price comparison services have been evolving since the early days of B2C e-commerce. These services provide quantitative features such as price offers, ratings, and reviews for the product or service in question. The purpose of this project is to give a comparative analysis of products from different e-commerce sites. The main challenge is collecting data on the products to be analyzed from e-commerce sites such as Amazon, Flipkart, and eBay, using web scraping and web crawling techniques. The collected data is stored using Hadoop, and machine learning algorithms are applied to the stored product details to draw conclusions about a given input product. Sentiment analysis plays an important role here, since the data collected from these sources is unstructured. We use lexicons of words, phrases, and idioms to analyze the polarity (positive, negative, or neutral) of terms extracted from reviews, and then count the number of positive, negative, and neutral reviews for a particular product. This analysis adds further weight to the overall product comparison.
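Once the scraped listings are stored, the comparison step itself is straightforward. The sketch below assumes a hypothetical record layout (store, price, review counts) produced by the scraping stage; the numbers are illustrative, not real offers:

```python
# Hypothetical scraped listings for one product across stores; in the real
# pipeline these rows would come from the web-scraping stage and Hadoop storage.
listings = [
    {"store": "amazon",   "price": 499.0, "reviews": {"positive": 120, "negative": 30}},
    {"store": "flipkart", "price": 479.0, "reviews": {"positive": 90,  "negative": 45}},
    {"store": "ebay",     "price": 489.0, "reviews": {"positive": 60,  "negative": 10}},
]

def best_offer(listings):
    """Pick the cheapest listing; ties could be broken by review polarity."""
    return min(listings, key=lambda l: l["price"])

def polarity_ratio(listing):
    """Share of positive reviews, the sentiment weight-age for this listing."""
    r = listing["reviews"]
    return r["positive"] / (r["positive"] + r["negative"])
```

Combining price with `polarity_ratio` (for example as a weighted sum) gives the final comparative score per store.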
Understanding consumer behaviour has been a very successful use case in the retail and marketing domains. However, extending this motif to assess risk by modeling and detecting fraud is a new dimension for ensuring that a consumer's transactions are secure. Traditionally, approving a loan or an e-commerce transaction is based on the customer's probability of repayment, the existence of funds, or verification that the transaction is genuine. In this risk assessment solution, we determine the creditworthiness of a consumer by considering not only their financial background but also their social behaviour. Consumer data is extracted from social media such as Facebook, Twitter, LinkedIn, and Google+. Analyzing the extracted data determines the consumer's behaviour, which helps decide whether to take the risk of approving a loan and reduces fraud in e-commerce.
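A bare-bones sketch of combining financial and social signals is a weighted score over normalized features. The feature names, weights, and approval threshold below are purely hypothetical placeholders, not a validated credit model:

```python
# Hypothetical feature weights: repayment history (financial), account
# balance (financial), and consistency of social media profile (social).
WEIGHTS = {"repayment_history": 0.5, "account_balance": 0.2, "social_consistency": 0.3}

def risk_score(features):
    """Combine features normalized to 0..1 (higher = safer) into one score."""
    return sum(WEIGHTS[k] * features[k] for k in WEIGHTS)

applicant = {"repayment_history": 0.9, "account_balance": 0.6, "social_consistency": 0.8}
approve = risk_score(applicant) >= 0.7   # illustrative decision threshold
```

In practice the weights would be learned from labeled outcomes (for example via logistic regression) rather than fixed by hand.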
Machine learning in the real world has huge application potential across domains such as retail (segmentation), finance and banking (risk and fraud detection), telecom (churn), insurance (claim prediction), and healthcare (cancer prediction). These domains share solution themes such as anomaly detection, ranking/scoring, segmentation, forecasting, prediction, and association. As part of the project, we will also implement various munging and data profiling techniques on the datasets, such as outlier detection, missing value treatment, cast and melt, merges, and joins, since they are essential for dealing with datasets gathered from diverse sources.
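Two of the munging steps named above can be sketched directly: outlier detection via Tukey's 1.5 x IQR fences, and missing value treatment via mean imputation. Both use only the standard library; the fence multiplier 1.5 is the conventional default:

```python
import statistics

def iqr_outliers(values):
    """Flag points outside Tukey's fences: [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    qs = statistics.quantiles(values, n=4)   # returns [Q1, median, Q3]
    q1, q3 = qs[0], qs[2]
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lo or v > hi]

def impute_missing(values):
    """Replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]

readings = [10, 12, 11, 13, 12, 11, 95]   # illustrative column with one spike
outliers = iqr_outliers(readings)
```

Whether an outlier is dropped, capped, or kept depends on the domain; the detection step only flags it.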
Once a machine learning (ML) solution is modeled and the data is prepared, the model needs to be evaluated to know whether it will do a good job of predicting the target on new, future data. Since future instances have unknown target values, accuracy metrics are needed; these measures apply to both classification and regression models. Evaluating a model includes various tasks, viz. building a test harness, choosing a performance measure, and cross-validation. Outcomes can be compared via the confusion matrix, learning curves, lift charts, etc. A data scientist must make decisions at two levels when determining an ML solution: which algorithm best fits a specific condition, and which fine-tuned parameters yield the most accurate result. Often, a data scientist runs a model multiple times to arrive at an optimal solution. In this project, we will automate these runs to eliminate the cumbersome work of identifying the best-fit solution for a given case.
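The confusion matrix and accuracy mentioned above can be computed in a few lines; the sketch below uses a toy pair of label vectors standing in for a real model's holdout predictions:

```python
from collections import Counter

def confusion_matrix(y_true, y_pred):
    """Count (actual, predicted) pairs; for binary labels this gives the
    four cells TP=(1,1), TN=(0,0), FP=(0,1), FN=(1,0)."""
    return Counter(zip(y_true, y_pred))

def accuracy(y_true, y_pred):
    """Fraction of instances where prediction matches the actual label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Illustrative holdout labels and predictions, not a real model's output.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred)
acc = accuracy(y_true, y_pred)
```

Automating the evaluation runs then amounts to looping this harness over each candidate algorithm and parameter setting and keeping the configuration with the best metric.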
Real-time analytics has huge potential in use cases such as applications (intrusion detection, fraud detection, log processing), sensors (malfunction detection, dynamic process optimization, supply chain planning), the web (site analytics, recommendations, sentiment analysis), and mobile applications (network analysis, locational analytics, and ad promotions). These applications have varying solution requirements for performance, latency, and persistence, as well as the ability to handle different workloads. We will implement various real-time frameworks such as Flume, Storm, Kafka, and Spark to meet these different requirements. The goal of this implementation is to assess and build applications that are reliable and robust, yet minimalist.
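Independent of the framework chosen, the core streaming pattern for use cases like malfunction detection is a sliding window over recent events. The framework-agnostic sketch below (window size and threshold are illustrative) shows the logic that would sit inside, say, a Storm bolt or a Spark Streaming operation:

```python
from collections import deque

class SlidingWindowDetector:
    """Keep the last `size` values and flag a new value whose distance from
    the window mean exceeds `threshold` (illustrative anomaly rule)."""
    def __init__(self, size=5, threshold=3.0):
        self.window = deque(maxlen=size)
        self.threshold = threshold

    def observe(self, value):
        anomalous = False
        if len(self.window) == self.window.maxlen:   # wait for a full window
            mean = sum(self.window) / len(self.window)
            anomalous = abs(value - mean) > self.threshold
        self.window.append(value)
        return anomalous

det = SlidingWindowDetector(size=3, threshold=5.0)
flags = [det.observe(v) for v in [10, 11, 10, 30, 11]]
```

The spike to 30 is flagged once the window is full; the state kept per key (here a small deque) is what the persistence and latency requirements above constrain.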
Classification and regression are two predictive data analysis techniques belonging to supervised machine learning. Classification is a technique for labeling a set of objects; regression is used to predict future data trends. In this work we identify various use cases that are representative of processing the collected data. Depending on the goal of a predictive model, several algorithmic approaches can be used interchangeably; therefore, various predictive learning models are compared to identify the model that fits a particular trend with the best accuracy.
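The comparison itself can be sketched as a holdout evaluation of two candidate classifiers on the same split. The one-dimensional toy data and both "models" below are illustrative: a majority-class baseline versus a threshold rule placed at the midpoint of the class means:

```python
# Illustrative (value, label) pairs; train has 4 examples of class 0, 3 of class 1.
train = [(0.5, 0), (1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
test = [(2.5, 0), (8.5, 1), (1.5, 0), (7.5, 1)]

def majority_predict(train, x):
    """Baseline: always predict the most frequent training label."""
    labels = [y for _, y in train]
    return max(set(labels), key=labels.count)

def threshold_predict(train, x):
    """Midpoint between the two class means as a decision boundary."""
    m0 = sum(v for v, y in train if y == 0) / sum(1 for _, y in train if y == 0)
    m1 = sum(v for v, y in train if y == 1) / sum(1 for _, y in train if y == 1)
    return int(x > (m0 + m1) / 2)

def holdout_accuracy(predict, train, test):
    return sum(predict(train, x) == y for x, y in test) / len(test)

acc_majority = holdout_accuracy(majority_predict, train, test)
acc_threshold = holdout_accuracy(threshold_predict, train, test)
```

On this toy split the threshold rule beats the baseline; the same harness, with real models and cross-validation instead of a single holdout, is how the model comparison in this work is carried out.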