A Bifactor Approximation Algorithm for Cloudlet Placement in Edge Computing

A bifactor approximation algorithm (ACP) that solves the heterogeneous cloudlet placement problem with guaranteed bounds on both latency and placement cost, while fully mapping user applications to appropriate cloudlets.

We aim to efficiently place cloudlets at specific locations in a region so as to serve the demands of all end (IoT) devices that require edge services. We model the region as a two-dimensional space (grid) in which cloudlets and devices are located. Devices can be at any point in the space. Cloudlets, on the other hand, can be placed only at a given set of candidate points within the grid, from which the devices can be best served.
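To make the geometric setting concrete, the following Java sketch represents devices and candidate points as coordinates in the grid and assigns each device to its nearest candidate point by Euclidean distance. It only illustrates the model under these assumptions: the class and method names are hypothetical (they are not the repository's Cloudlet, CandidatePoint, or EndDevice classes), and the sketch ignores capacities, costs, and latency bounds, so it is not ACP.

```java
import java.util.List;

/** Hypothetical sketch of the model: devices anywhere in the grid, cloudlets only at candidate points. */
final class GridModelSketch {

    /** A location in the two-dimensional region. */
    static final class Point {
        final double x;
        final double y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }

        double distanceTo(Point other) {
            return Math.hypot(x - other.x, y - other.y);
        }
    }

    /** Returns, for each device, the index of the nearest candidate point (Euclidean distance assumed). */
    static int[] nearestCandidate(List<Point> devices, List<Point> candidates) {
        int[] assignment = new int[devices.size()];
        for (int d = 0; d < devices.size(); d++) {
            int best = -1;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int c = 0; c < candidates.size(); c++) {
                double dist = devices.get(d).distanceTo(candidates.get(c));
                if (dist < bestDist) {
                    bestDist = dist;
                    best = c;
                }
            }
            assignment[d] = best;
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<Point> devices = List.of(new Point(0.5, 0.5), new Point(9.2, 8.7));
        List<Point> candidates = List.of(new Point(0, 0), new Point(10, 10));
        int[] a = nearestCandidate(devices, candidates);
        System.out.println(a[0] + " " + a[1]); // prints "0 1"
    }
}
```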

Publication Information

IEEE Transactions on Parallel and Distributed Systems

Authors
- Dixit Bhatta
- Lena Mashayekhy

Approaches Implemented

- IP and LP (solved using CPLEX library)
  - OCP Cost - Optimal Cost Placement
  - OCP Latency - Optimal Latency Placement
  - LP Cost - LP Cost Placement
- GACP - Genetic Algorithm Based Cloudlet Placement
- ACP - Approximate Cloudlet Placement (our approach)

Important Classes

- Core Classes: Cloudlet, CandidatePoint, EndDevice
  - Extended Classes (for implementing the ACP algorithm): NewCloudlet, NewCandidatePoint, NewEndDevice
- CPLEX Model: CplexCloudletPlacement, CplexLPCloudletPlacement
- Genetic Algorithm: GeneticCloudletPlacement
- Approximation Algorithm: ApproxLPRounding

Data Creation

The complete dataset used in the experiments is available in the datasets directory.
Datasets can be reproduced by running the classes in OpenDataToDataset:
- Pass an integer argument (1-5) to the getDevices() and getCandidates() methods to select one of the NYC boroughs and generate the corresponding base_device.csv and base_points.csv files.
- Generate the samples by running DatasetSampling. (Note: set the root and total_num variables to the root path and the number of rows in the base files. Also set outpath and filename according to the samples you want to create, i.e., devices.csv or points.csv.)
- For each sample, generate cloudlet.csv and cost.csv by running CalcCandidateCosts, passing the path of the directory containing the sub-sample to the setCloudletsAndCandidateCosts() method.
- Likewise, latencies.csv can be produced by setting the source directory in the main() method of CalcLatency and running it.
Please note that datasets produced by random sampling may lead to slightly different results. Using the provided dataset reproduces the exact results.
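If you regenerate samples yourself, fixing the random seed is one way to make a sample repeatable across runs. The sketch below is a hypothetical helper (SeededSampling.sample is not part of the repository, and DatasetSampling may work differently) that draws k distinct rows uniformly at random using a seeded java.util.Random. Pass it the data rows of a base file without the CSV header.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper for reproducible sampling; not the repository's DatasetSampling. */
final class SeededSampling {

    /** Returns k distinct rows chosen uniformly at random; the same seed always yields the same sample. */
    static List<String> sample(List<String> rows, int k, long seed) {
        if (k < 0 || k > rows.size()) {
            throw new IllegalArgumentException("k must be between 0 and " + rows.size());
        }
        List<String> copy = new ArrayList<>(rows);
        // Shuffling with a seeded Random is deterministic for a fixed seed and input order.
        Collections.shuffle(copy, new Random(seed));
        return new ArrayList<>(copy.subList(0, k));
    }

    public static void main(String[] args) {
        List<String> rows = List.of("r1", "r2", "r3", "r4", "r5");
        System.out.println(sample(rows, 3, 42L).equals(sample(rows, 3, 42L))); // prints "true"
    }
}
```

The trade-off is that a seeded sample is repeatable only for the same base file row order; reordering the base file changes the sample even with the same seed.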

Running the Code

Setup and Dependencies
- You can import the code into any IDE as a standard Java project, or build and run it from the command line in a local directory.
- The main dependency is cplex.jar, available in the external_lib directory. Add it to your project as an external JAR. A CPLEX installation is also required at run time. (The setup process differs between IDEs and the command line.)
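Before running MainRunner, it can help to confirm that cplex.jar is actually visible. The hypothetical check below looks up ilog.cplex.IloCplex, the main class of the CPLEX Java API, without initializing it, so no native library is loaded during the check. It also prints java.library.path, the system property the JVM searches for native libraries such as the one that comes with a CPLEX installation. The class name CplexClasspathCheck is illustrative only.

```java
/** Hypothetical check that cplex.jar is visible before running MainRunner. */
final class CplexClasspathCheck {

    /** True if the named class can be found; initialize=false means its static initializers do not run. */
    static boolean onClasspath(String className) {
        try {
            Class.forName(className, false, CplexClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // IloCplex is packaged in cplex.jar.
        System.out.println("cplex.jar on classpath: " + onClasspath("ilog.cplex.IloCplex"));
        // Directories the JVM searches for native libraries, including CPLEX's.
        System.out.println("java.library.path = " + System.getProperty("java.library.path"));
    }
}
```

Run it with the same classpath (and the same -Djava.library.path, if you set one) that you use for MainRunner.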
MainRunner
- The main method runs a different approach depending on the first argument (1-4) passed to the run() method, in the order below:
  1. OCP Cost
  2. OCP Latency
  3. ACP
  4. GACP
- Each approach is broken down into its own method for smooth running, so there is no need to change or update the code.
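The mapping from the first run() argument to the approaches can be summarized as a simple switch. This is a hypothetical sketch of that dispatch (ApproachDispatchSketch and approachFor are illustrative names, not MainRunner's actual code); it only returns the approach name instead of running it.

```java
/** Hypothetical sketch of dispatching on the run() argument; not the actual MainRunner code. */
final class ApproachDispatchSketch {

    /** Maps the first argument of run() to the approach listed above. */
    static String approachFor(int choice) {
        switch (choice) {
            case 1: return "OCP Cost";
            case 2: return "OCP Latency";
            case 3: return "ACP";
            case 4: return "GACP";
            default: throw new IllegalArgumentException("Expected 1-4, got " + choice);
        }
    }

    public static void main(String[] args) {
        int choice = args.length > 0 ? Integer.parseInt(args[0]) : 3;
        System.out.println("Running " + approachFor(choice));
    }
}
```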
Special runtime considerations
- OCP Latency and ACP can be run for all samples without any special changes to the code.
- OCP Cost and GACP need additional arguments to run across different samples because of machine limitations and the convergence behavior of the algorithms.
  - For larger instances, OCP Cost requires a node limit, because CPLEX may not converge to an optimal solution for a very long time. The code is already configured with these node limits for the OCP Cost run, so there is no need to modify it. [More details in next section]
  - For complex or larger instances, GACP requires a coverage threshold below 1.0, because it may not converge for a long time when full coverage is required. This value needs to be adjusted for each dataset. The code is already configured with these thresholds for the GACP run, so there is no need to modify it. [More details in next section]

Specific parameters for OCP Cost and GACP

OCP Cost Node Limit Value
- Staten Island: Not Needed
- Bronx: 75,000
- Queens: 70,000
- Brooklyn: 30,000
- Manhattan: 5,500
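The node limits above can be kept in a lookup table keyed by borough, with Staten Island absent because it needs no limit. This is a hypothetical helper, not the repository's configuration code. With the CPLEX Java API (in recent versions), such a limit would be applied through the MIP node-limit parameter, for example cplex.setParam(IloCplex.Param.MIP.Limits.Nodes, limit); that call is left out so the sketch compiles without cplex.jar.

```java
import java.util.Map;
import java.util.OptionalLong;

/** Node limits for OCP Cost from the list above; a hypothetical lookup, not the repository's code. */
final class OcpNodeLimits {

    // Staten Island is absent because it needs no node limit.
    private static final Map<String, Long> NODE_LIMITS = Map.of(
            "Bronx", 75_000L,
            "Queens", 70_000L,
            "Brooklyn", 30_000L,
            "Manhattan", 5_500L);

    /** Returns the node limit for a borough, or empty if none is needed. */
    static OptionalLong nodeLimit(String borough) {
        Long limit = NODE_LIMITS.get(borough);
        return limit == null ? OptionalLong.empty() : OptionalLong.of(limit);
    }

    public static void main(String[] args) {
        System.out.println(nodeLimit("Queens"));        // prints "OptionalLong[70000]"
        System.out.println(nodeLimit("Staten Island")); // prints "OptionalLong.empty"
    }
}
```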
GACP Coverage Threshold Value
- Staten Island: 0.90-1.0 (for samples 1-30, in order: [1.0, 1.0, 0.90, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 1.0, 0.95, 0.95, 0.95, 0.95, 1.0, 0.95, 1.0, 0.95, 0.95, 1.0, 0.95, 1.0, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95])
- Bronx: 0.92 (for all)
- Queens: 0.90 (for all)
- Brooklyn: 0.87 (for all)
- Manhattan: 0.87 (for all)
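Similarly, the GACP coverage thresholds can be looked up per borough and sample: the per-sample list for Staten Island and a single value for every other borough. The borough strings and the 1-based sample numbering are assumptions of this hypothetical sketch, not identifiers taken from the code.

```java
import java.util.Map;

/** GACP coverage thresholds from the list above; a hypothetical lookup, not the repository's code. */
final class GacpCoverageThresholds {

    // Staten Island thresholds for samples 1-30, in order.
    private static final double[] STATEN_ISLAND = {
        1.0, 1.0, 0.90, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 1.0,
        0.95, 0.95, 0.95, 0.95, 1.0, 0.95, 1.0, 0.95, 0.95, 1.0,
        0.95, 1.0, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95, 0.95};

    // Other boroughs use one threshold for all samples.
    private static final Map<String, Double> FIXED = Map.of(
            "Bronx", 0.92, "Queens", 0.90, "Brooklyn", 0.87, "Manhattan", 0.87);

    /** Returns the coverage threshold for a borough and a 1-based sample number (1-30). */
    static double threshold(String borough, int sample) {
        if (sample < 1 || sample > 30) {
            throw new IllegalArgumentException("sample must be in 1-30");
        }
        if (borough.equals("Staten Island")) {
            return STATEN_ISLAND[sample - 1];
        }
        Double t = FIXED.get(borough);
        if (t == null) {
            throw new IllegalArgumentException("unknown borough: " + borough);
        }
        return t;
    }

    public static void main(String[] args) {
        System.out.println(threshold("Staten Island", 3)); // prints "0.9"
        System.out.println(threshold("Bronx", 17));        // prints "0.92"
    }
}
```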

Interpreting the Output

In MainRunner, results_summary in the method for each approach contains the results. It prints multiple lines to the console or log file, where each row contains (in order): approach, cost, latency, total runtime, and other relevant results.
Tabulate the following results for all 30 samples of each of the 5 boroughs, and compare their mean values with those reported in the paper:
- Cost
- Latency
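Once the rows for one borough are collected, the comparison reduces to averaging cost and latency over its samples. The sketch below assumes, hypothetically, that each results_summary row is comma-separated with fields in the order listed above (approach, cost, latency, total runtime, ...); ResultMeans is an illustrative name, and the parsing should be adapted if the printed format differs.

```java
import java.util.List;

/** Hypothetical aggregation of results_summary rows, assuming "approach,cost,latency,runtime,..." rows. */
final class ResultMeans {

    /** Returns {mean cost, mean latency} over the given rows. */
    static double[] meanCostAndLatency(List<String> rows) {
        if (rows.isEmpty()) {
            throw new IllegalArgumentException("no rows");
        }
        double costSum = 0;
        double latencySum = 0;
        for (String row : rows) {
            String[] f = row.split(",");
            costSum += Double.parseDouble(f[1].trim());    // second field: cost
            latencySum += Double.parseDouble(f[2].trim()); // third field: latency
        }
        return new double[] {costSum / rows.size(), latencySum / rows.size()};
    }

    public static void main(String[] args) {
        // Dummy rows for illustration only, not experimental results.
        List<String> rows = List.of("ACP,100.0,2.0,5.1", "ACP,300.0,4.0,6.3");
        double[] m = meanCostAndLatency(rows);
        System.out.println(m[0] + " " + m[1]); // prints "200.0 3.0"
    }
}
```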
Additional results such as coverage and solution gap will be consistent if the cost and latency values are reproduced, so there is no need to check them separately.
Runtime depends on the machine on which the code is run; the specifications of the machine used in our experiments are given in the paper.
GACP is fed the LP cost and the number of cloudlets to speed up its runtime. To obtain GACP's total runtime, add the LP time from the corresponding ACP run.
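The GACP runtime adjustment is a per-sample sum: each GACP run's runtime plus the LP time reported by the ACP run on the same sample. A hypothetical sketch (GacpRuntime and totalRuntimes are illustrative names), with arrays indexed by sample and times in whatever unit the runs report:

```java
/** Hypothetical helper: GACP total runtime including the LP time taken from the matching ACP run. */
final class GacpRuntime {

    /** Adds per-sample ACP LP times to per-sample GACP runtimes; both arrays are indexed by sample. */
    static double[] totalRuntimes(double[] gacpRuntimes, double[] acpLpTimes) {
        if (gacpRuntimes.length != acpLpTimes.length) {
            throw new IllegalArgumentException("one LP time is needed per GACP run");
        }
        double[] totals = new double[gacpRuntimes.length];
        for (int i = 0; i < totals.length; i++) {
            totals[i] = gacpRuntimes[i] + acpLpTimes[i];
        }
        return totals;
    }

    public static void main(String[] args) {
        // Dummy values for illustration only.
        double[] totals = totalRuntimes(new double[] {12.5, 20.0}, new double[] {0.5, 1.0});
        System.out.println(totals[0] + " " + totals[1]); // prints "13.0 21.0"
    }
}
```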

Result plots

Results can be plotted using the Python (Seaborn) scripts available in the scripts directory.
Each file is self-explanatory about the output it plots. Some of them already contain the results data, so you can simply run them to visualize the results.
