Implementation of Multiple Transmitter Tracking Mechanism Based on DeepSORT for 2D MIMO Optical Camera Communications

Eun-bi Shin♦ and Yeong Min Jang°

Abstract: This paper presents an enhanced 2D MIMO Optical Camera Communication (OCC) system that integrates a DeepSORT-based multi-transmitter tracking mechanism to address real-time communication challenges in dynamic environments. The system employs YOLOv11 for accurate LED transmitter detection and a deep learning-based decoder to improve decoding efficiency, resulting in a lower Bit Error Rate (BER) than conventional methods. The DeepSORT algorithm enables the simultaneous tracking of up to 10 LED transmitters, reaching 99.6% multiple object tracking accuracy (MOTA) and 99.3% multiple object tracking precision (MOTP). Experimental results demonstrate a throughput of up to 15.360 kbps for a 16×16 LED matrix, which showcases the system’s scalability and suitability for IoT and smart city applications. In addition, the deep learning-based decoder improves the signal-to-noise ratio under challenging lighting conditions, which further enhances system robustness. These advancements highlight the system’s potential for stable, high-performance communication in dynamic and noisy environments and offer a promising solution for future optical wireless communication technologies.

Keywords: 2D MIMO, Optical Camera Communication (OCC), YOLOv11, DeepSORT, Multi-Transmitter Tracking

Ⅰ. Introduction

The evolution of 6G wireless communication promises to deliver seamless connectivity, ultra-low latency, and high data rates, enabling transformative applications in artificial intelligence (AI), autonomous systems, and immersive technologies. However, this progress also introduces significant challenges, including device compatibility problems and the demand for secure, reliable, and energy-efficient solutions.
Concurrently, the rapid expansion of the Internet of Things (IoT) has led to the proliferation of interconnected devices that exchange real-time data. While this connectivity drives innovation, it raises concerns about the potential health risks associated with prolonged exposure to radio frequency (RF) radiation. Studies have linked RF radiation to adverse effects on brain function and sleep quality, even within the limits of exposure regulations[1]. In response to these challenges, researchers have explored alternative communication technologies that offer greater safety and efficiency. Optical Wireless Communication (OWC), which utilizes visible light for data transmission, has emerged as a potential solution. Key OWC technologies, such as Optical Camera Communication (OCC), Visible Light Communication (VLC), Free-Space Optical (FSO), and Light Fidelity (Li-Fi), have shown significant benefits for specific applications. Among these, OCC is particularly notable for its low cost, high data capacity, and good visibility, which makes it well suited for intelligent transportation and IoT applications[2]. OCC systems generally use LEDs as transmitters and image sensors, such as global-shutter or rolling-shutter cameras, as receivers. In addition, AI has become an essential enabler for various OWC applications, primarily through computer vision, which aids in the detection of light sources from transmitters. Models such as Convolutional Neural Networks (CNNs), Region-Based CNNs (R-CNNs), and You Only Look Once (YOLO) have been successfully used for light-source detection and classification. However, the implementation of OCC systems still faces significant challenges, particularly the real-time tracking of multiple transmitters under dynamic conditions with optical disturbances[3].
To overcome these challenges, Multiple Object Tracking (MOT), an AI technique that integrates detection and object association across consecutive video frames, has been proposed. In a Multiple-Input Multiple-Output (MIMO) based OCC system, MOT plays a critical role in ensuring communication stability and addressing problems such as object movement, occlusion, overlapping, and variations in lighting conditions. Specifically, the 2D-MIMO configuration enables multiple simultaneous transmissions by arranging LED transmitters in a spatial grid, allowing the system to transmit more data in parallel. This structure improves spatial multiplexing and enhances robustness, making the OCC system more resilient to environmental disturbances and dynamic scenarios. This study enhances the 2D-MIMO-based OCC system by implementing the YOLOv11 algorithm for real-time LED transmitter detection and a deep learning-based decoder to improve decoding efficiency and reduce bit error rates. The system supports commercially available cameras and manages frame rate fluctuations with a tagging mechanism. Furthermore, a Deep Simple Online and Realtime Tracking (DeepSORT) based multi-transmitter tracking mechanism tracks up to ten LED transmitters in dynamic environments, which improves system reliability for IoT and smart city applications while ensuring robustness against signal overlap, obstructions, and fast transmitter movement.

Ⅱ. Related Research

As introduced in Section I, OWC offers a promising alternative to RF communication and provides immunity to electromagnetic interference (EMI)[4]. Among OWC techniques, OCC utilizes LEDs as transmitters and image sensors (e.g., global or rolling shutter cameras) as receivers. This configuration provides a low-cost, high-capacity transmission solution with good visibility. These benefits make it particularly suitable for intelligent transportation systems and IoT networks[5].
However, the implementation of OCC faces challenges such as environmental disturbances, variable lighting conditions, and the need for real-time tracking of multiple LED transmitters over large areas[6]. To overcome these challenges, MOT has been proposed as an enhancement for MIMO-based OCC systems. MOT improves communication stability by enhancing object detection and association, particularly in dynamic environments with challenging conditions like occlusion and overlapping transmitters. YOLO-based models have shown effectiveness in tracking LED transmitters under varying conditions[7]. However, as the number of tracked objects increases and their motion patterns become more complex, maintaining consistent identity associations becomes a significant challenge. Conventional models like YOLO and CNNs often struggle with this problem, especially when multiple transmitters overlap or become visually obstructed[8]. The DeepSORT algorithm addresses these limitations by incorporating appearance-based embeddings, which enable it to maintain object identity despite occlusion or visual obstruction[9]. This algorithm’s ability to handle dynamic environments (including fluctuating camera frame rates and real-time data) makes it especially suitable for OCC systems that use 2D MIMO technology. DeepSORT can track up to ten LED transmitters simultaneously, which is essential for smart city and IoT applications in which multiple devices communicate via optical signals. The algorithm’s resilience to signal overlap and rapid transmitter movement supports stable communication, even when the transmitters are in close proximity or in motion. Furthermore, the integration of deep learning-based decoders into OCC systems has been shown to improve decoding efficiency and reduce bit error rates, which supports robust transmission despite fluctuations in lighting and other environmental disturbances.
This integration makes OCC systems more reliable, especially in applications where high data transmission reliability is critical.

Ⅲ. Proposed Methods

3.1 System Design of 2D-MIMO OCC

The proposed 2D-MIMO OCC system is designed to overcome the challenges associated with real-time communication in IoT applications, such as environmental monitoring. The system uses a camera as the receiver and LEDs as transmitters, with On-Off Keying (OOK) modulation to encode data. Bit ‘1’ is represented by the presence of light, while bit ‘0’ is represented by the absence of light[10]. To ensure reliable data transmission without the need for complex hardware, error correction can be achieved using a Hamming (11/15) code, i.e., 11 data bits carried in a 15-bit codeword. The system architecture is enhanced with AI to facilitate both light-source tracking and data decoding. This approach enables real-time operation under dynamic and noisy conditions. The design is capable of processing data efficiently, even in environments with fluctuating lighting and visual disturbances, as illustrated in Figure 1. The proposed system utilizes a combination of hardware and software components optimized for real-time processing. The detailed specifications of the experimental setup are summarized in Table 1.

Table 1. Hardware and software environment.
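The Hamming (11/15) error correction mentioned in Section 3.1 can be illustrated with a minimal sketch. It assumes the classical layout with parity bits at codeword positions 1, 2, 4, and 8; the bit ordering actually used by the authors is not given in the text, and the function names `hamming_encode` and `hamming_decode` are hypothetical.

```python
from typing import List

# Assumption: classical Hamming(15,11) layout, parity bits at 1-indexed
# positions 1, 2, 4, 8; the remaining 11 positions carry data bits.
PARITY_POSITIONS = (1, 2, 4, 8)
DATA_POSITIONS = tuple(p for p in range(1, 16) if p not in PARITY_POSITIONS)


def hamming_encode(data: List[int]) -> List[int]:
    """Encode 11 data bits into a 15-bit codeword."""
    if len(data) != 11:
        raise ValueError("expected exactly 11 data bits")
    code = [0] * 16  # index 0 is unused so indices match bit positions
    for pos, bit in zip(DATA_POSITIONS, data):
        code[pos] = bit
    for p in PARITY_POSITIONS:
        parity = 0
        for pos in range(1, 16):
            # Parity bit p covers every position whose index has bit p set.
            if pos & p and pos != p:
                parity ^= code[pos]
        code[p] = parity
    return code[1:]


def hamming_decode(received: List[int]) -> List[int]:
    """Correct at most one flipped bit and return the 11 data bits."""
    if len(received) != 15:
        raise ValueError("expected exactly 15 received bits")
    code = [0] + list(received)
    syndrome = 0
    for pos in range(1, 16):
        if code[pos]:
            syndrome ^= pos  # XOR of the positions of all set bits
    if syndrome:
        code[syndrome] ^= 1  # a non-zero syndrome is the error position
    return [code[pos] for pos in DATA_POSITIONS]


payload = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1]
codeword = hamming_encode(payload)
corrupted = codeword.copy()
corrupted[6] ^= 1  # simulate one OOK bit misread by the camera
recovered = hamming_decode(corrupted)
```

With this layout any single bit error per 15-bit codeword is corrected, at the cost of four parity bits for every eleven data bits. Two or more errors in one codeword are not corrected by this sketch.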
3.2 Data Modulation and Decoding

OOK modulation is employed in the 2D-MIMO OCC system due to its simplicity and energy efficiency, both of which are essential for IoT applications. A deep learning-based decoder is used on the receiver side to enhance decoding efficiency in noisy environments and over long distances. The decoder is trained on a dataset of 7,000 samples, which covers diverse conditions such as varying transmission distances (up to 22 meters) and different indoor and outdoor settings. The training process employs a three-layer neural network structure to prevent overfitting. The decoder predicts the intensity threshold required to differentiate between bit ‘0’ and bit ‘1’ (see Figure 2). In addition to the practical implementation, the theoretical Bit Error Rate (BER) for the proposed OOK modulation under an Additive White Gaussian Noise (AWGN) channel is formulated to analytically evaluate the system performance. The probability of error is given by Equation (1):
where $E_b$ represents the bit energy and $\sigma^2$ is the noise variance of the system. Equation (1) provides a baseline for assessing the communication reliability under various noise conditions. The experimental evaluation and comparison with this theoretical BER model are discussed in the next section. The threshold-prediction method improves system performance by maintaining the signal quality, even in environments with high optical interference. The incorporation of deep learning enhances the system's robustness, as evidenced by the improved signal-to-noise ratio (SNR) in challenging conditions.

3.3 Data Synchronization

Ensuring synchronization between the camera's frame rate and the LED transmitters' transmission rate is a primary challenge in OCC systems. When the camera’s frame rate exceeds the LED transmitter transmission rate, oversampling can occur, which generates redundant data and increases the computational load. Conversely, undersampling may result in data loss if the camera frame rate is too low. To address this, the system uses Sequence Numbers (SN) to uniquely identify each data packet. This approach enables comparison algorithms to remove redundant data and restore lost packets. The transmitter-side arrangement of data, including the LED matrix zones and the distribution of bits within each data packet, is illustrated in Figure 3.

3.4 LED Transmitter Detection and Tracking

In the proposed system, LED transmitter detection and tracking are critical to maintain accurate communication in dynamic environments. CNN-based detectors, such as YOLO, are used to detect LED transmitters in real time. However, the main challenge lies in ensuring continuity of object tracking across frames, especially when dealing with occlusions, rapid movement, or changes in lighting conditions. The system employs the DeepSORT algorithm to address these challenges.
As an extension of the Simple Online and Realtime Tracking (SORT) algorithm[11], DeepSORT incorporates deep learning to develop an appearance-based association metric that significantly improves object tracking performance in dynamic and complex environments[12]. This metric helps DeepSORT to manage lost or occluded objects while preserving their identities across frames. A schematic of the DeepSORT mechanism is illustrated in Figure 4, which shows how object tracking is performed, starting from object detection via YOLO and followed by frame-to-frame comparison using Kalman filtering. The association process then applies several metrics, including Mahalanobis distance and Deep Appearance Descriptors, to ensure accurate multi-object tracking, even in challenging scenarios such as overlapping or fast-moving transmitters.

Ⅳ. Experiment and Results

4.1 Performance Evaluation of the 2D MIMO OCC System

The experiment was conducted using a Logitech BRIO Ultra HD Pro Webcam operating at 60 frames per second (FPS) to evaluate the performance of the 2D-MIMO system across different frame rates. LED transmitter detection was performed using YOLOv11; lost data packets are then identified and the received packets are combined based on their SN to ensure accurate decoding. The system configuration is shown in Figure 5. The experimental results show the transmitter's amplitude intensity at a distance of four meters with an exposure time of 300 μs; this intensity varies depending on the environment and the speed of the light source. The transmitter uses a 2D-MIMO encoder (an LED matrix controlled by an Arduino), while tracking and decoding are implemented using Python 3.11. Under a transmitter mobility of 3 m/s, the throughput reaches 15.360 kbps for a 16 × 16 LED matrix and 3.840 kbps for an 8 × 8 LED matrix, with throughput improvement achievable through matrix-size adjustments (see Table 2).

Table 2. Key parameters used in the proposed scheme.
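The two throughput figures reported in Section 4.1 are consistent with a simple raw-rate model in which every LED in the matrix carries one OOK bit per captured frame at 60 FPS. That model is an assumption made here for illustration: the text does not state the packet format, and the sketch ignores overhead such as sequence numbers and Hamming parity. The function name `raw_throughput_bps` is hypothetical.

```python
def raw_throughput_bps(rows: int, cols: int, frame_rate_fps: int,
                       bits_per_led_per_frame: int = 1) -> int:
    """Raw bit rate of an LED matrix, assuming each LED carries a fixed
    number of bits per captured camera frame (illustrative model only)."""
    return rows * cols * frame_rate_fps * bits_per_led_per_frame


print(raw_throughput_bps(16, 16, 60))  # 15360 bit/s = 15.360 kbps
print(raw_throughput_bps(8, 8, 60))    # 3840 bit/s  = 3.840 kbps
```

Under this model the throughput grows linearly with the number of LEDs, which matches the fourfold difference between the 8 × 8 and 16 × 16 results.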
The BER assessment was performed at different distances and exposure times. At a distance of ten meters and an exposure time of 100 μs, the deep learning method achieved a BER lower than the $10^{-2}$ obtained with conventional methods (see Figure 6). This result confirms the system’s performance improvement, especially in high-noise conditions and scenarios involving mobility. The application of Hamming (11/15) channel coding significantly reduced bit errors, which improves both range and transmission reliability. Moreover, the optimization of the exposure time is crucial to maintaining signal quality, considering its impact on bandwidth and signal noise. To further validate the system's generalization capability, the BER evaluation was systematically conducted under various conditions, including transmission distances ranging from 0 to 25 meters, dynamic transmitter movements, and optical disturbances. As shown in Figure 6, the deep learning-based decoder consistently achieved lower BER values than the conventional decoder in all evaluated scenarios. These results confirm that the proposed method maintains superior performance across diverse practical conditions.

Fig. 6. BER analysis of an 8×8 LED matrix at different distances, transmitter mobility of 3 m/s, and 100 μs exposure.

4.2 LED Transmitter Detection Result

YOLOv11, a recent upgrade of the YOLO model, integrates bounding-box regression and object classification into a single forward pass for efficient real-time detection. This version introduces several architectural enhancements, including the C3k2 block for improved feature extraction, the Spatial Pyramid Pooling Fast (SPPF) module for faster computation, and the Channel-Position Self-Attention (C2PSA) module to boost attention to critical spatial areas. YOLOv11 is available in five sizes to meet various computational needs.
Table 3 shows significant improvements in mAP, speed, and computational efficiency, with YOLOv11n providing the best balance between efficiency and low latency. These numbers make it ideal for LED transmitter detection systems in optical communication. Table 3. YOLOv11 performance comparison for LED transmitter detection.
Table 4 shows that YOLOv11n achieves a higher FPS than YOLOv10n when tested with 730 frames of video. This confirms that YOLOv11 is more efficient in terms of processing speed, which makes it more suitable for real-time applications in optical communication. Table 4. YOLOv10 vs YOLOv11 FPS comparison.
YOLOv11n was selected for our OCC system implementation due to its computational efficiency and low latency. The model was trained on an NVIDIA GeForce RTX 3050 for 30 epochs. This approach resulted in rapid convergence and high accuracy in detecting LED transmitters under varying lighting conditions and distances.

4.3 LED Transmitter-Tracking Results in the 2D-MIMO OCC

This section compares two main tracking algorithms, SORT and DeepSORT, that were tested within the 2D MIMO OCC system. The evaluation was performed using a dataset consisting of 600 frames, each containing ten dynamically moving LED transmitters (see Figure 7). This evaluation aimed to identify the most effective algorithm based on tracking metrics such as accuracy, speed, robustness to occlusions, and the ability to handle dynamic LED transmitter movement accurately. The dataset was designed to assess the tracking algorithm's capability to handle conditions such as varying distance, occlusions, and dynamic LED transmitter movement. The algorithms were evaluated based on several key metrics. Multiple Object Tracking Accuracy (MOTA) measures tracking accuracy by considering false positives $\left(FP_t\right)$, false negatives $\left(FN_t\right)$, and identity switches $\left(IDS_t\right)$ in each frame t. MOTA is calculated using Equation (2):

$$\mathrm{MOTA} = 1 - \frac{\sum_t \left(FN_t + FP_t + IDS_t\right)}{\sum_t GT_t} \quad (2)$$
Here, $GT_t$ represents the number of ground-truth objects in frame t. Multiple Object Tracking Precision (MOTP) measures the accuracy of object localization using Intersection over Union (IoU), and is calculated as the average dissimilarity between the predicted and ground truth bounding boxes across all matches, as shown in Equation (3):

$$\mathrm{MOTP} = \frac{\sum_{t,i} d_{t,i}}{\sum_t c_t} \quad (3)$$
where $d_{t,i}$ is the distance (e.g., 1−IoU) between matched ground truth and predicted bounding boxes for object i in frame t, and $c_t$ is the number of matches in frame t. Identity Switches (IDs) assess errors in assigning object identities, while MT (Mostly Tracked) and ML (Mostly Lost) evaluate the completeness of tracking and the algorithm's ability to handle object loss. Finally, FPS is a critical metric that measures the algorithm's processing speed, which is essential for real-time applications. The results of the evaluation are shown in Table 5, which compares the performance of SORT and DeepSORT with respect to tracking ten LED transmitters.

Table 5. Performance comparison between SORT and DeepSORT.
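The MOTA and MOTP definitions above can be sketched in a few lines of Python. The sketch assumes the standard CLEAR MOT forms (errors summed over frames and divided by the total ground-truth count for MOTA; matched distances averaged over all matches for MOTP). The `FrameStats` container and the sample numbers are hypothetical and are not taken from the paper's dataset.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class FrameStats:
    """Per-frame tracking counts (hypothetical container for illustration)."""
    gt: int    # GT_t: ground-truth objects in the frame
    fn: int    # FN_t: missed objects
    fp: int    # FP_t: spurious detections
    ids: int   # IDS_t: identity switches
    distances: List[float] = field(default_factory=list)  # d_{t,i} per match


def mota(frames: Sequence[FrameStats]) -> float:
    """MOTA = 1 - sum_t(FN_t + FP_t + IDS_t) / sum_t GT_t."""
    total_gt = sum(f.gt for f in frames)
    if total_gt == 0:
        raise ValueError("no ground-truth objects")
    errors = sum(f.fn + f.fp + f.ids for f in frames)
    return 1.0 - errors / total_gt


def motp(frames: Sequence[FrameStats]) -> float:
    """MOTP = sum_{t,i} d_{t,i} / sum_t c_t, with c_t the matches in frame t."""
    total_matches = sum(len(f.distances) for f in frames)
    if total_matches == 0:
        raise ValueError("no matched objects")
    return sum(sum(f.distances) for f in frames) / total_matches


# Two made-up frames with ten transmitters each; one miss in the second frame.
frames = [
    FrameStats(gt=10, fn=0, fp=0, ids=0, distances=[0.01] * 10),
    FrameStats(gt=10, fn=1, fp=0, ids=0, distances=[0.02] * 9),
]
print(f"MOTA = {mota(frames):.4f}")  # MOTA = 0.9500
print(f"MOTP = {motp(frames):.4f}")  # MOTP = 0.0147
```

When the distance is taken as 1−IoU, a lower MOTP value means tighter localization, and 1 minus this value is the mean IoU over all matches.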
DeepSORT demonstrates superior performance compared to SORT in terms of tracking accuracy (MOTA 99.6%) and precision (MOTP 99.3%). Although SORT was slightly faster (FPS 29.05), DeepSORT excelled in accuracy and robustness against occlusions and dynamic movement. The OCC system implemented in this study successfully tracked all ten LED transmitters at a distance of two meters despite the LED transmitters moving dynamically within the field of view. DeepSORT proved more reliable in maintaining object identity, even when facing temporary occlusions, and offers better resilience than SORT, especially in scenarios with fast movement or occlusions. Table 6 presents a comparison between DeepSORT and SORT in terms of average processing times, including preprocessing, inference, postprocessing, and total time per frame. The comparison shows that DeepSORT requires slightly more total time per frame (20.92 ms) than SORT (19.28 ms). Although DeepSORT is slightly slower in inference time, SORT requires more time in the preprocessing stage.

Table 6. Algorithm complexity comparison.
DeepSORT integrates bounding box association with a confidence score to enhance both tracking efficiency and accuracy, even under challenging conditions such as disturbances or varying lighting. The algorithm effectively tracks fast-moving LED transmitters without identity errors and is capable of handling partial occlusions, maintaining object IDs even when LED transmitters reappear after being blocked. The association process utilizes metrics like Mahalanobis distance and Deep Appearance Descriptors to provide more accurate and robust multi-object tracking, even in dynamic environments.

Ⅴ. Conclusion

This paper introduces a multi-transmitter tracking mechanism that utilizes the DeepSORT algorithm for 2D MIMO OCC systems to improve tracking accuracy and decoding efficiency in dynamic operating environments. Experimental results showed that combining YOLOv11 for LED transmitter detection with DeepSORT for multi-object tracking achieved 99.6% MOTA and 99.3% MOTP, which is significantly better than conventional methods. The BER was lower than that of conventional methods, indicating enhanced reliability and efficiency, even under fluctuating lighting. In addition, the throughput scaled with the LED matrix size and reached 15.360 kbps for a 16×16 matrix, which is suitable for IoT and smart-city applications. The deep learning-based decoder further improved the SNR and ensured robust performance across various environments, including scenarios up to 22 meters. These results highlight the system’s potential for stable communication. Future work will focus on optimizing tracking performance and system robustness in more complex scenarios with higher transmitter densities, by adopting more lightweight and efficient tracking methods or enhancing the current tracking mechanism.

Biography

Yeong Min Jang
1985 : B.S. degree, Kyungpook National University
1987 : M.S. degree, Kyungpook National University
1999 : Ph.D. degree, University of Massachusetts
2002-Present : Professor, School of Electrical Engineering, Kookmin University
<Research Interests> AI, OWC, FSO, OCC, Internet of energy, Sensor Fusion
[ORCID:0000-0002-9963-303X]

References
Cite this article

IEEE Style: E. Shin and Y. M. Jang, "Implementation of Multiple Transmitter Tracking Mechanism Based on DeepSORT for 2D MIMO Optical Camera Communications," The Journal of Korean Institute of Communications and Information Sciences, vol. 50, no. 5, pp. 803-811, 2025. DOI: 10.7840/kics.2025.50.5.803.