Spatial Structure-Oriented and Angle-Based Human Pose Estimation for Pose Classification

This study provides a detailed analysis of the performance of different pose classification models trained using data from a human pose estimation model. The approach considers both spatial structure-oriented techniques, which incorporate body part coordinates and their relative positions, and angle-based methods, which calculate the angles between joints. This combined spatial and angular data plays a crucial role in enhancing the precision of pose classification. While our primary investigation is based on a yoga pose dataset, the approach is applicable to other pose datasets. In summary, this research integrates Human Pose Estimation with machine learning for yoga pose classification. The outcomes promise not only to advance the field of pose classification but also to yield practical applications in exercise, fitness, and beyond. In particular, we aim to integrate the developed model into a project we developed titled "AI-Based Human Pose Detection Tool". The tool uses real-time video analysis to track users' movements during workouts, with the Blazepose model detecting key landmarks and assessing metrics. This enhances posture and form assessment, making the tool valuable for fitness enthusiasts.


Introduction
Human Pose Estimation in computer vision is a transformative technology decoding human body language, empowering machines to understand postures and movements. With applications in fitness, sports, medical diagnostics, and gaming, its core focus is characterizing key body part positions, bridging the physical-digital gap, and promising a revolution in perception and interaction.
The synergy between Human Pose Estimation (HPE) and classification techniques further amplifies this technology's significance. Our research centers on integrating the Blazepose model, an HPE model, with machine learning techniques for yoga pose classification. This nuanced approach to interpreting human movement is showcased through the exploration of five fundamental yoga poses: down dog, goddess, plank, tree, and warrior.
The effectiveness of our approach is evident in the utilization of data from human pose estimation models compared to the direct use of raw images. These models excel in abstracting vital pose information by focusing on key joint positions and relationships, streamlining feature extraction. Their robustness to variations in lighting, background, and environmental conditions enhances classification accuracy. The inherent dimensionality reduction, coupled with generalization capabilities and computational efficiency, solidifies their suitability for classification tasks. Moreover, the interpretability of these models provides clearer insights into the classification rationale, a feature often challenging to achieve with raw image data. In summary, our research demonstrates that incorporating data from human pose estimation models optimizes classification through abstraction, robust feature extraction, and enhanced interpretability, ultimately improving accuracy and efficiency in pose classification tasks.

IJFMR23069614, Volume 5, Issue 6, November-December 2023. Email: editor@ijfmr.com

Related Works
Our research drew from various papers, offering insights into human pose estimation and diverse dataset normalization techniques, with a focus on methods for improvement. Vivek Anand Thoutam et al. [1] introduced a yoga pose classification method using joint and key point angles. However, they did not address dataset normalization, a critical factor for data standardization and reducing variations in joint positions and angles.
Utkarsh Bahukhandi et al. [3] attained a notable 94% accuracy by employing joint coordinates to train logistic regression and SVM models. However, they did not include joint angles as training parameters and omitted dataset normalization, which are essential for ensuring data consistency and accurate model training. In contrast, Steven Chen et al. [2] introduced key point normalization as part of their work on constructing a pose trainer. This concept shed light on valuable data processing techniques, particularly normalization by reference points, which we have considered for implementation in our research.
Ashish Ohri et al. [5] introduced a pose correction method using Dynamic Time Warping and emphasized the effectiveness of the MediaPipe model, known for its exceptional accuracy, speed, and robustness in real-time human pose estimation. Rohit Srivatsa et al. [4] conducted a comparative analysis of various open-source pose estimation models. This comparative study proved instrumental in guiding our selection of the most suitable model for our specific use case.
Sen Qiao et al. [9] introduced a cost-effective system for grading human gestures using Openpose, focusing on a novel approach based on the spatial distance between joints, potentially improving model accuracy.
Anilkumar et al. [10] proposed a self-practice yoga monitoring system using angle conditions among key joints. Yet, manual angle input may be time-consuming. Incorporating joint coordinates alongside angles could provide a more comprehensive solution. As a part of our research, we have considered 2 datasets, one with only coordinates and the other with coordinates and joint angles, to evaluate the importance of the angle parameter.
Prof. Rupal More et al. [12] solely utilized Logistic Regression for yoga pose classification, but it may not always be the ideal choice. They did not explore potential benefits from model fine-tuning or alternative models. In our research, we have included 4 models for training and analyzed each model's performance. Chhaihuoy Long et al. [11] proposed a transfer learning method for pose classification. Unlike image-based classification, using an HPE model to extract joint coordinates and angles offers more accurate and robust data, improving classification accuracy and resilience to image-related challenges.
These research contributions informed our work on human pose estimation, dataset processing, model evaluation, and performance analysis, prompting us to explore more efficient and effective methods for data processing and model tuning.

Methodology
This research focuses on three core objectives:
1. Processing Yoga Pose Image Data: Dataset processing, normalization, and joint angle integration using Blazepose HPE model coordinates for improved classification.
2. Evaluation of Classification Models: Rigorous evaluation of machine learning models for yoga pose classification, focusing on accuracy and efficiency with body part coordinates and joint angles.
3. Analysis of Model Performance: In-depth analysis of classification models, providing insights into their strengths, limitations, and effectiveness in yoga pose classification.
The machine learning models at the heart of our research are Multilayer Perceptron (MLP), Random Forest Classifier, Support Vector Machine (SVM), and XGBoost.

Dataset
The "Yoga Poses Dataset" on Kaggle is a comprehensive resource for yoga pose classification. It features images of individuals performing various yoga poses, including "Downward Dog," "Goddess Pose," "Plank Pose," "Tree Pose," and "Warrior2 Pose." The dataset is well-curated, offering diverse images captured from different angles to ensure dataset completeness. Table 1 summarizes the dataset.

During feature extraction, each pose landmark returned by the pose estimation model is described by the following attributes:
1. x and y: The landmark coordinates, normalized to the range [0.0, 1.0] relative to the image's width and height, respectively.
2. z: The depth of the landmark, measured from the reference point of the hips' midpoint. A smaller z value means the landmark is closer to the camera, and the scale of z is roughly the same as that of x.
3. visibility: A value between 0.0 and 1.0 giving the probability that the landmark is observable in the image. A higher value implies a greater likelihood that the landmark is both present and unobstructed.

This process is executed for each image in the dataset. For each image, a dictionary is created that holds the attributes of the various body parts, giving a detailed representation of the pose landmarks. The label, which corresponds to the folder name and identifies the yoga pose depicted in the image, is also added to this dictionary. Beyond pose landmarks and labels, this stage calculates and includes a list of angles, which are detailed in the subsequent feature processing stage.

Images without pose landmarks are excluded from the dataset, while images with landmarks are collected into a list. Each list entry holds the pose landmarks and the corresponding label (folder name), so the stage ends with a list of these augmented dictionaries.
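The record-building step described above can be sketched as follows. This is a minimal illustration and not the implementation used in this work: the `Landmark` tuple stands in for whatever landmark object the pose estimator returns, and the names `build_record`, `build_dataset`, and the `Label` key placement are assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical stand-in for a pose landmark exposing x, y, z and visibility.
Landmark = namedtuple("Landmark", ["x", "y", "z", "visibility"])


def build_record(landmarks, names, label):
    """Return one dataset record, or None when no pose was detected."""
    if not landmarks:
        return None  # images without pose landmarks are excluded
    record = {
        name: {"x": lm.x, "y": lm.y, "z": lm.z, "visibility": lm.visibility}
        for name, lm in zip(names, landmarks)
    }
    record["Label"] = label  # folder name, i.e. the yoga pose
    return record


def build_dataset(detections, names):
    """detections: iterable of (landmarks, folder_name) pairs."""
    records = (build_record(lms, names, label) for lms, label in detections)
    return [r for r in records if r is not None]
```

In practice `names` would list all 33 body parts in the order the estimator reports them, and `detections` would be produced by running the estimator over every image in each pose folder.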

Feature Processing
Following feature extraction, the extracted pose landmarks are enriched by calculating six angles associated with specific body parts. Figure 3 shows the angles considered. These angles, namely 'left_arm_angle' (1), 'right_arm_angle' (2), 'left_shoulder_angle' (3), 'right_shoulder_angle' (4), 'left_knee_angle' (5), and 'right_knee_angle' (6), provide detailed insights into the posture and form of the yoga poses.
In the angle computation, atan2(dy, dx) is the two-argument arctangent function, which gives the angle formed by the vector (dx, dy) with respect to the positive x-axis. Subtracting the arctangents of the two line segments that meet at a joint gives the difference between their directions, which is the angle θ_rad in radians. The angles are calculated for each image in the dataset, offering geometric insight into the orientation of body parts. After this stage, the dataset contains both the landmark coordinates and the angles used for pose classification.
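A minimal sketch of the atan2-based angle computation described above, assuming each joint angle is measured at a vertex `b` between the segments towards two neighbouring landmarks `a` and `c` (for example shoulder, elbow, and wrist for an arm angle; this landmark choice is an assumption). The conversion to degrees and the folding into [0, 180] are also assumptions of this example, since the text only defines θ_rad.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at vertex b between segments b->a and b->c.

    Each point is an (x, y) pair.
    """
    theta_rad = (math.atan2(c[1] - b[1], c[0] - b[0])
                 - math.atan2(a[1] - b[1], a[0] - b[0]))
    theta_deg = abs(math.degrees(theta_rad))
    # Fold into [0, 180] so the result does not depend on point order.
    if theta_deg > 180.0:
        theta_deg = 360.0 - theta_deg
    return theta_deg
```

A fully extended limb yields an angle close to 180 degrees, and a limb bent at a right angle yields 90 degrees.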

3.4 Feature Normalization
Data normalization plays a critical role in preparing pose coordinates for effective classification. It is essential because it ensures that data is consistent and ready for analysis. In the context of pose estimation, normalization standardizes the data and focuses on relative body part positions.

Figure 5: Torso center
We employ a specific method known as "Normalization by reference point." In this method, the chosen reference point is the "torso center," which is located at the midpoint between the shoulder center and hip center. This reference point is pivotal because it effectively centers and scales the pose data based on a stable, central location within the body.
The advantages of using this method include:
1. Centering and Scaling: By centering the data on the torso center, we eliminate variations arising from the position or size of the subject, thereby making the data consistent and comparable.
2. Preserving Relative Relationships: This method is particularly useful in tasks where maintaining the relative relationships between body parts is essential. It ensures that the model can focus on how different body parts are positioned relative to the torso center.
Overall, feature normalization through the use of a reference point like the torso center optimizes the dataset for accurate pose classification. It mitigates potential biases introduced by variations in body size and posture, emphasizing the relative positioning of body parts. This prepares the data for the subsequent stages of the research, ensuring that the machine learning models operate on standardized and meaningful inputs. Figure 6 gives the normalization algorithm implemented. Its final steps read:
Store the normalized coordinates in `normalized_pose_data` using `body_part` as the key: { 'x': normalized_x, 'y': normalized_y, 'z': normalized_z, f"{body_part}_confidence": normalized_confidence }.
Update the original record with the `normalized_pose_data`.
End loop (For each record in data).
End Process.
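Normalization by the torso center can be sketched as below. This is an illustration under stated assumptions rather than the algorithm of Figure 6: the landmark names (`left_shoulder`, `right_shoulder`, `left_hip`, `right_hip`), the `visibility` input key, and the decision to carry visibility over unchanged as the confidence value are all assumed here. The sketch only subtracts the torso center and applies no rescaling.

```python
import copy


def midpoint(p, q):
    """Component-wise midpoint of two landmark dicts with x, y, z keys."""
    return {k: (p[k] + q[k]) / 2.0 for k in ("x", "y", "z")}


def normalize_by_torso_center(record, angle_list, label_key="Label"):
    """Return a copy of record with coordinates relative to the torso center."""
    skip = set(angle_list) | {label_key}
    pose_data = {k: v for k, v in record.items() if k not in skip}

    shoulder_center = midpoint(pose_data["left_shoulder"],
                               pose_data["right_shoulder"])
    hip_center = midpoint(pose_data["left_hip"], pose_data["right_hip"])
    torso_center = midpoint(shoulder_center, hip_center)

    normalized = copy.deepcopy(record)  # label and angles stay untouched
    for body_part, lm in pose_data.items():
        normalized[body_part] = {
            "x": lm["x"] - torso_center["x"],
            "y": lm["y"] - torso_center["y"],
            "z": lm["z"] - torso_center["z"],
            # Visibility is carried over unchanged in this sketch.
            f"{body_part}_confidence": lm["visibility"],
        }
    return normalized
```

Returning a new record instead of mutating the input keeps the raw landmarks available for other processing steps.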

Feature Expansion
This stage of feature expansion is crucial for enriching the dataset. It involves extending the number of columns for each body part, providing a more comprehensive view of the data. As an example, consider the 'nose' body part, which is expanded into four distinct columns: 'nosex,' 'nosey,' 'nosez,' and 'nose_confidence.' This expansion process is applied uniformly to all body parts.
In the previous stage, the dataset comprised 33 pose landmarks, 6 angles, and a label, totaling 40 columns. After feature expansion, the dataset includes 33 * 4 = 132 columns for pose landmarks (one set of four columns for each body part), 6 angle columns, and the label, for a total of 139 columns. This data is stored as a CSV file for ML model training.
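The expansion from 40 nested columns to 139 flat columns can be sketched as follows. The function name `expand_record` and the `Label` key are assumptions for this example; the column naming follows the 'nosex', 'nosey', 'nosez', 'nose_confidence' pattern given above.

```python
def expand_record(record, angle_list, label_key="Label"):
    """Flatten nested per-landmark dicts into one flat row."""
    row = {}
    for key, value in record.items():
        if key == label_key or key in angle_list:
            row[key] = value  # angles and label are already single columns
            continue
        for attr, attr_value in value.items():
            # 'x' becomes 'nosex'; keys that already carry the body-part
            # name (e.g. 'nose_confidence') are kept as they are.
            column = attr if attr.startswith(key) else f"{key}{attr}"
            row[column] = attr_value
    return row
```

With 33 body parts of four attributes each, six angles, and one label, every row has 33 * 4 + 6 + 1 = 139 entries, and the rows can be written to a CSV file with `csv.DictWriter` from the standard library.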

3.8 Next Steps
In this stage, the dataset is divided into training and testing data using Python libraries, ensuring a robust evaluation of model performance.

Phase 1 - Training ML Models on Expanded Data with Only Coordinates Columns (133 columns):
This initial step involves training machine learning models using the expanded dataset, which exclusively contains coordinate columns. This phase assesses the performance of models that rely solely on geometric information.

Phase 2 - Training ML Models on Expanded Data with Both Coordinates and Angles Columns (139 columns):
In the subsequent step, the training expands to include the dataset with both coordinate and angle columns. This comprehensive dataset equips models with additional information from the calculated angles, enabling a more nuanced evaluation of performance.
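The two training phases differ only in the columns passed to the models, which can be sketched as below. The helper names are hypothetical, and the 80/20 split ratio and fixed seed are assumptions, since the split proportions are not stated in the text. A library routine for splitting would normally be used; the pure-Python version here only illustrates the idea.

```python
import random


def select_columns(rows, angle_list, include_angles):
    """Phase 1 drops the angle columns; Phase 2 keeps every column."""
    if include_angles:
        return [dict(row) for row in rows]
    return [{k: v for k, v in row.items() if k not in angle_list}
            for row in rows]


def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and split them into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

Applying `select_columns` with `include_angles=False` to 139-column rows leaves 133 columns (132 coordinate columns plus the label), matching the Phase 1 dataset.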

Results
This section presents the performance of our machine learning models on the two datasets. The evaluation covers two phases:
Phase 1 - ML model performance on data with only coordinate columns
Phase 2 - ML model performance on data with both coordinate and angle columns
For each model, precision, recall, and F1 score are presented.

1. Precision: Precision is the ratio of true positive predictions to the total number of positive predictions. In the context of yoga pose classification, precision tells us how many of the images predicted as a given yoga pose actually show that pose.
2. Recall: Recall, often called sensitivity, is the ratio of true positive predictions to the total number of actual positive instances. In our scenario, recall indicates how many of the images of a given yoga pose were correctly predicted.
3. F1 Score: The F1 score is the harmonic mean of precision and recall. It provides a balanced assessment of a model's performance, considering both false positives and false negatives. A higher F1 score signifies a model's ability to achieve both high precision and high recall.
4. Accuracy: Accuracy measures the percentage of correctly predicted instances, considering both true positives and true negatives. While accuracy is a valuable overall metric, precision, recall, and F1 score are essential when dealing with imbalanced datasets or when different misclassification costs exist.
These metrics are essential for understanding how well the models classify the different yoga poses. Additionally, the accuracy of each model (based on performance on testing data) is provided as a quantitative assessment of overall performance. The precision, recall, and F1 score provide insights into the strengths and weaknesses of each machine learning model, facilitating a detailed discussion of their performance.
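The four metrics defined above can be computed per pose class as in the following sketch. The function name `per_class_metrics` is hypothetical; in practice a library routine would typically be used, and this pure-Python version only makes the definitions concrete.

```python
def per_class_metrics(y_true, y_pred):
    """Return ({label: (precision, recall, f1)}, accuracy)."""
    pairs = list(zip(y_true, y_pred))
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # Harmonic mean of precision and recall.
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = (precision, recall, f1)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return metrics, accuracy
```

For example, if one of two 'tree' images is misclassified as 'plank' while both 'plank' images are classified correctly, 'tree' has precision 1.0 and recall 0.5, whereas 'plank' has precision 2/3 and recall 1.0.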

Phase 1 Results Discussion
In Phase 1 Results, we assessed four machine learning models for yoga pose classification: Random Forest Classifier, Multilayer Perceptron (MLP), Support Vector Machine (SVM), and XGBoost, using accuracy, precision, recall, and F1-score.

Phase 2 Results
2. Multilayer Perceptron (MLP): Experienced an accuracy drop to 81.75% and struggled with angles, leading to lower recall rates, especially for 'tree' and 'warrior2,' necessitating further fine-tuning.
3. Support Vector Machine (SVM): Balanced accuracy at 89.82%, with solid precision and recall, notably for 'down dog' and 'plank,' even with angle data.
4. XGBoost: Impressed with 95.79% accuracy, strong precision, and robust recall for various poses, effectively utilizing both coordinate and angle data.
Performance variations arose from model characteristics and the nature of the data. Ensemble models excelled due to their strength in handling complex features, managing non-linearity, and resisting overfitting, enabling effective use of angle information. MLP's struggle with angles points to the need for architecture and hyperparameter optimization to leverage this data.
In summary, the Random Forest Classifier adapted well, MLP requires further optimization, SVM demonstrated robustness, and XGBoost excelled. The inclusion of angle data enhances pose classification potential, promising more accurate real-world applications. Further research should refine MLP's configuration and explore broader use cases for these models.

Future Works
There are several methods available to enhance results and improve the efficiency of the models. Some of these approaches are described below.

5.1 Normalization process using torso size
Normalizing pose data using torso size can be advantageous in pose estimation tasks where the pose data should be scale-invariant, or where the impact of variations in the subject's size or distance from the camera should be reduced. Its advantages include:
1. Scale-Invariance: This approach ensures that pose data becomes scale-invariant, facilitating comparisons across subjects of varying sizes and subject-to-camera distances.
2. Reduced Sensitivity to Distance: It mitigates the sensitivity of pose estimation to the subject's distance from the camera, resulting in more robust pose data.
3. Improved Generalization: Normalizing based on torso size enhances model generalization, preventing overfitting to a specific subject's size and shape. The focus shifts to relative body part positions rather than absolute distances.
4. Consistency: For comparing poses or tracking changes over time, torso size normalization provides more consistent and meaningful results.
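A sketch of how torso-size normalization could work, under the assumption that torso size is taken as the distance between the shoulder center and the hip center (the text does not define it, so this definition and the landmark names are hypothetical). Coordinates are first centered on the torso center and then divided by the torso size.

```python
import math


def center(p, q):
    """Midpoint of two (x, y) points."""
    return tuple((a + b) / 2.0 for a, b in zip(p, q))


def normalize_by_torso_size(points):
    """points: {name: (x, y)}; returns centered, torso-scaled coordinates."""
    shoulder_c = center(points["left_shoulder"], points["right_shoulder"])
    hip_c = center(points["left_hip"], points["right_hip"])
    torso_c = center(shoulder_c, hip_c)
    # Assumed definition: shoulder-center to hip-center distance.
    torso_size = math.dist(shoulder_c, hip_c)
    if torso_size == 0:
        raise ValueError("degenerate pose: torso size is zero")
    return {
        name: ((x - torso_c[0]) / torso_size, (y - torso_c[1]) / torso_size)
        for name, (x, y) in points.items()
    }
```

Under this definition, uniformly scaling or translating all landmarks of a pose leaves the normalized coordinates unchanged, which is the scale-invariance property listed above.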

5.3 Normalization to create embedding
1. This approach involves more complex normalization and embedding steps.
2. It may provide a more abstract representation of the pose, which could be beneficial for certain machine learning tasks.
3. It might be suitable when the data is to be fed into a neural network or other machine learning models.

Angle Normalization
In our ongoing research, we aim to enhance angle data consistency for yoga pose classification. We have calculated six angles from pose landmarks via a human pose estimation model. However, these angles can vary for the same yoga pose, potentially impacting classification accuracy.
To address this, we will implement angle normalization techniques (e.g., Min-Max Scaling, Z-Score Normalization, and Circular Statistics). These methods standardize angle ranges across instances of the same yoga pose, ensuring uniformity. Angle normalization is expected to improve model performance and yield more consistent and reliable yoga pose classification results.
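The three techniques named above can be illustrated with the following sketch. It is not an implementation from this work: the fixed [0, 180] degree range for min-max scaling and the use of the population standard deviation are assumptions made for the example.

```python
import math
import statistics


def min_max_scale(angles, lo=0.0, hi=180.0):
    """Map angles from [lo, hi] degrees to [0, 1]."""
    return [(a - lo) / (hi - lo) for a in angles]


def z_score(angles):
    """Standardize to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(angles)
    std = statistics.pstdev(angles)
    if std == 0:
        return [0.0 for _ in angles]
    return [(a - mean) / std for a in angles]


def circular_mean_deg(angles):
    """Mean direction of angles in degrees, robust to wrap-around at 360."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c)) % 360.0
```

The circular mean matters when angles can wrap around: the arithmetic mean of 350 and 10 degrees is 180, whereas their mean direction is 0 degrees.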

Conclusion
In summary, the performance differences among Random Forest Classifier, XGBoost, and Multilayer Perceptron (MLP) in yoga pose classification were notably influenced by the inclusion of 6 angles as features. Random Forest Classifier and XGBoost outperformed MLP in accuracy when these angles were integrated.
Key factors contributing to the superior performance of Random Forest Classifier and XGBoost with added angles include:
1. Feature Engineering and Interpretability: Both models excel in handling interpretable features, and the included angles convey meaningful relationships, aiding these models in leveraging the added information effectively.
These results emphasize the importance of selecting the right model for the specific task and underline the need for extensive experimentation and evaluation. Model architecture should align with dataset characteristics and the desired balance between interpretability, complexity, and generalization.

Acknowledgment
We extend our heartfelt gratitude to Assistant Professor Anitha M for her unwavering support and invaluable guidance throughout the course of this research. Her expertise and encouragement played a pivotal role in exploring various dimensions of this study.
Additionally, we would like to express our gratitude to the professors of the Computer Science Department at Dayanand Sagar College of Engineering. Their generous support and facilitation of access to department computer labs were instrumental in our exploration of various research topics.

Authors' Biography
• P. Charith, a student at Dayanand Sagar College of Engineering, specializes in AI/ML, Data science, and computer vision, holding a degree in computer science from the same institution. Their research primarily focuses on AI/ML applications.

Figure 1: The above figure illustrates the overview of the research process involved

3.2 Feature Extraction using Pose Estimation Model
During this stage, the Blazepose model, provided by MediaPipe, is used to extract 33 pose landmarks. Each landmark is defined by x and y coordinates normalized to the range [0.0, 1.0] relative to the image's width and height, a depth value z, and a visibility score.

Figure 2: The above diagram shows the body parts for which Blazepose provides pose landmarks

Figure 6: Normalization by Torso Center Algorithm
For each record in data do
Create a copy of the record as `pose_data`.
Remove the 'Label' and angles specified in `angle_list` from `pose_data`.

3. Overfitting and Hyperparameter Tuning: Deep learning models, such as MLP, are more prone to overfitting, especially with small or unregularized datasets. In contrast, ensemble methods are less susceptible to overfitting and often require less extensive hyperparameter tuning to perform well.
4. Data Distribution and Complexity: Model performance is significantly influenced by data distribution and problem complexity. Decision tree-based models (Random Forest) may be better suited for certain data distributions, while gradient boosting (XGBoost) may excel in others.
5. Data Inconsistency: High variation in angle ranges for the same yoga pose, such as the angle at the elbow ranging from 50 to 175 for the goddess pose, can introduce challenges in model generalization. Angle normalization techniques should be considered to address this issue effectively.

Figure 12: In the first image the angle at the elbow is around 40 degrees and for the second the angle is over 175 degrees

Table 1: Dataset used (train and test images per yoga pose)
Merging test and train images within each subfolder creates a unified dataset for machine learning model training and testing.

Table 11: Accuracy for all ML models
Experienced an accuracy drop to 81.75%, struggled with angles,
2. Non-Linearity and Complexity: Deep learning models like MLP require more complexity and data to capture non-linear relationships effectively. If data patterns are relatively simple, ensemble methods like Random Forest and XGBoost can excel.