Clustering for Unsupervised Learning

Clustering algorithms have been widely studied in many scientific areas, such as data mining, knowledge discovery, bioinformatics and machine learning. More specifically, cluster analysis which is an essential technique for unsupervised learning, aims to find the underlying structure of a dataset following some given clustering criteria and specific properties of input data. We propose the seed-and-extension-based density peaks (SDP) algorithm which incorporates a new center selection strategy into the famous density peaks (DP) algorithm [Rodriguez and Laio, Science, 344(6191): 1492-1496, 2014]. In particular, SDP selects the centers that hold the features of their clusters while building a spanning forest, and meanwhile, constructs the output clusters in a seed-and-extension manner. SDP is more accurate than existing clustering approaches for a variety of types of datasets, including time-series data. In particular, we also build an Automated Machine Learning (AutoML) framework for clustering, which can smartly choose a proper algorithm and feature preprocessing steps for a new dataset at hand, and set their respective hyperparameters as well. We believe that SDP and the AutoML framework would be helpful to unsupervised learning as well as many real applications. (more details)

Online Algorithms & Incremental Learning

Machine learning aims at dealing with uncertainty and making predictions about the future scenarios in the unknown environment. The core idea of coping with such uncertainty, or decision making without knowing the information of future input is very similar to another well-studied field in theoretical algorithm design, i.e. online algorithms with competitive analysis. Specifically, an online algorithm receives an input sequence of items, one at a time. To process each input item one-by-one, an online algorithm must immediately generate an output or make a decision with no idea about the whole input from the beginning. More precisely, online algorithms are somehow more inclined to develop a machine learning model in which new training data is gradually input over time to continuously train the model and further extend the strength of the existing model. This is also similar to the so-called incremental learning model. Another similar direction is the online learning model, which is also a variant of traditional machine learning problems. Online learning algorithms mainly learn from a stream of past events and make a prediction for each element one by one in the stream. Then the algorithm receives immediate feedback on each prediction and uses this feedback to improve its accuracy of subsequent predictions. (more details)

Modern Route Planning

Modern technologies such as Global Positioning Systems (GPS) and mobile communication have contributed to the development of dynamic navigation planning based on real-time information. However, traffic conditions vary enormously and unpredictable accidents significantly affect planned routes, which increase the problem complexity, even though current navigation systems use information about road distances and speed limits to find the fastest routes. Therefore, online decision-making strategies play an important role in solving traffic congestion problems. We consider an old online route planning problem, called the Canadian Traveller Problem (CTP), which finds practical applications in designing dynamic navigation systems. We study several generalizations of the CTP and propose deterministic algorithms with theoretical competitive ratio. Furthermore, we present the first polynomial time randomized algorithm that surpasses the long-standing deterministic lower bound.

Recently, we also consider the electric vehicle (EV) routing problem that takes into consideration of possible battery charging or swapping operations. We develop efficient algorithms which can be implemented on an online EV routing map interface. The result demonstrates the effectiveness of the EV routes from both the theoretical and practical perspectives. Very recently, we extended route planning to sharing economy, i.e., the car-sharing problem (not only for cars, but also for other resources like bikes and shuttle-buses). For instance, there are several service stations in the city, in residential areas and downtown, at popular sightseeing spots, and so on. Customers can make a request with a pick-up time and place and a drop-off time and place. The decision for accepting or rejecting a request should be made in an online fashion, and the goal is to maximize the profit by accepting as many requests as possible. (more details)

Network Alignment for Recommendation

Nowadays online recommendation systems aim at offering a wide range of up-to-date information and providing personalized recommendations since people usually read news shortly. However, existing studies either recommend news from the same media platform or just bring multiple sources of information together without considering readers’ preferences. Sometimes the former may lead to the media bias problem or even the fake news spread.

Based on our previous network-alignment approach, we propose a novel cross-media news recommendation system, which can provide articles with diverse perspectives from different news media and preserve both document and topological information based on the inter- and intra-connections between news articles. Moreover, according to user browsing history, the recommendations made by the system can not only help readers understand their personalized interested news from a broader overview but mitigate the media bias problem. Empirical experiments with real users demonstrate the effectiveness of our system, and reveals the personalization and diversity of the recommendations.

On the other hand, in systems biology, a fundamental goal of biology is to understand the cell as a system of interacting components and especially, almost every biological process is mediated by a network of molecular interactions. In particular, there has been a considerable amount of research devoted to the discovery and exploration of interactions between proteins in the last decade. Since many cellular activities are a result of protein interactions, proteins often interact with other proteins to perform their functions, and form a complex biological system, i.e., a protein-protein interaction (PPI) network. This powerful way of representing and analyzing the vast corpus of PPI data describes the interaction relationship among proteins in a cell. Furthermore, knowledge about the topology of a PPI network in one organism can yield insights about not only the networks of similar organisms, but also the function of their components. Hence comparison between protein interaction networks is becoming central to systems biology. We have collaborated with the MIT team and developed global alignment algorithms for performing comparative analysis of multiple biological networks. (more details)

Selected Publication List:

(Full list)

1. Ming-Hao Tung, Yi-Ping Phoebe Chen, Chen-Yu Liu, Chung-Shou Liao. A Fast and More Accurate Seed-and-Extension Density-based Clustering Algorithm, IEEE Transactions on Knowledge and Data Engineering, published online, March 2022. DOI: 10.1109/TKDE.2022.3161117

2. Ya-Chun Liang, Chung-Shou Liao, Xinping Yi. Topological Interference Management with Adversarial Topology Perturbation: An Algorithmic Perspective, IEEE Transactions on Communications, published online, Oct. 2022. DOI: 10.1109/TCOMM.2022.3216632

3. Erik Demaine, Yamming Huang, Chung-Shou Liao, and Kunihiko Sadakane. Canadians Should Travel Randomly, in Proc. the 41st International Colloquium on Automata, Languages, and Programming (ICALP 2014), Copenhagen, Denmark. (The full version entitled Approximating the Canadian Traveller Problem with Online Randomization, is published in Algorithmica, Vol. 83, (2021) pp. 1524-1543.)

4. Chung-Shou Liao, Shang-Hung Lu, and Zuo-Jun Max Shen. The Electric Vehicle Touring Problem, Transportation Research Part B: Methodological, Vol. 86, (2016), pp. 163-180.

5. Chung-Shou Liao and D. T. Lee. Power domination in circular-arc graphs, Algorithmica, Vol. 65 No. 2 (2013) pp. 443-466.

6. Chung-Shou Liao, Kanghao Lu, Michael Baym, Rohit Singh, and Bonnie Berger. IsoRankN: Spectral methods for global alignment of multiple protein networks, in Proceedings of the 17th International Conference on Intelligent Systems for Molecular Biology (ISMB’09), Stockholm, Sweden (The full version is invited to be published in Bioinformatics, Vol 25 No. 12 (2009) pp. i253-i258).


Comments are closed.