Filtration System
The coin data filtering system, as illustrated in the figure below, begins by collecting data from cryptocurrency listings on the pump.fun platform. This data is then routed through a processing queue, where a specialized data handler processes it. Because different cryptocurrency projects produce data in diverse formats, a preliminary normalization step is essential: it transforms the raw data into a uniform format suitable for further analysis.
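The ingestion flow described above can be sketched as a queue feeding a normalization handler. This is a minimal illustration, not the actual implementation; the field names (`mint`, `symbol`, `market_cap`) are assumptions for demonstration, not the real pump.fun schema.

```python
import json
import queue

# Minimal sketch of the ingestion queue and normalization handler.
# Field names ("mint", "symbol", "market_cap") are illustrative
# assumptions, not the actual pump.fun listing schema.
raw_listings = queue.Queue()

def normalize(raw: str) -> dict:
    """Coerce one raw listing into a uniform record for later analysis."""
    data = json.loads(raw)
    return {
        "mint": str(data.get("mint", "")),
        "symbol": str(data.get("symbol", "")).upper(),
        "market_cap": float(data.get("market_cap", 0.0)),
    }

raw_listings.put('{"mint": "abc123", "symbol": "dog", "market_cap": "4200.5"}')
records = [normalize(raw_listings.get()) for _ in range(raw_listings.qsize())]
```

The key point is that every record leaving the handler has the same shape and types, regardless of what the originating project emitted.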
The core of the system is the detection function, which employs a Complement Naive Bayes (CNB) classifier to assess each coin's attributes against models built from historical data. Data whose characteristics align with those of previously successful coins is forwarded to the next stage; data showing anomalies or undesirable traits is flagged as suspicious, while data meeting specific investment criteria is classified as event data.
The subsequent filtering function applies the K-Nearest Neighbors (KNN) algorithm to conclusively categorize the remaining information as event data. Together, the detection and filtering functions allow the system to make informed decisions based on this analysis.
Steps Involved in Data Integrity and Distortion Removal:
Data Cleaning using Noise Reduction: Noise in data is irrelevant or corrupt information that can skew analysis results. To address this, the system uses the Empirical Mode Decomposition (EMD) technique, which adaptively decomposes a complex signal into Intrinsic Mode Functions (IMFs) and thereby isolates meaningful components from noise. The signal is further refined with the Hilbert-Huang Transform to ensure only significant data attributes are retained.
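To make the EMD idea concrete, here is a deliberately simplified, illustrative sift (not a production EMD implementation): it extracts the first IMF, which captures the fastest oscillation, and subtracts it to suppress high-frequency noise. The test signal and sift count are assumptions chosen for the demo.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def first_imf(x, n_sifts=10):
    """Simplified sift: extract the fastest-oscillating component (IMF 1)."""
    h = x.copy()
    t = np.arange(len(x))
    for _ in range(n_sifts):
        maxima = [i for i in range(1, len(h) - 1) if h[i - 1] < h[i] > h[i + 1]]
        minima = [i for i in range(1, len(h) - 1) if h[i - 1] > h[i] < h[i + 1]]
        if len(maxima) < 2 or len(minima) < 2:
            break
        # Cubic-spline envelopes through the extrema (endpoints anchor them).
        upper = CubicSpline([0] + maxima + [len(h) - 1],
                            np.r_[h[0], h[maxima], h[-1]])(t)
        lower = CubicSpline([0] + minima + [len(h) - 1],
                            np.r_[h[0], h[minima], h[-1]])(t)
        h = h - (upper + lower) / 2.0   # subtract the local mean trend
    return h

t = np.linspace(0, 1, 500)
clean = np.sin(2 * np.pi * 3 * t)                  # slow "real" component
noisy = clean + 0.2 * np.sin(2 * np.pi * 60 * t)   # fast noise component
denoised = noisy - first_imf(noisy)                # drop the noisy first IMF
```

Subtracting the first IMF leaves the slower, meaningful trend; a full EMD would repeat this extraction on the residue to obtain the remaining IMFs.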
Suspicious Data Detection: During data collection, some entries may contain erroneous information, termed suspicious data. The system uses the CNB classifier to separate these from regular entries effectively, which reduces server load by preventing the analysis of flawed data.
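A hedged sketch of such a detector, using scikit-learn's `ComplementNB`. The features and labels below are invented for illustration (e.g. per-listing counts of trades and wallet flags); the real system's feature set is not specified here.

```python
import numpy as np
from sklearn.naive_bayes import ComplementNB

# Toy training data: two count-valued features per listing.
# The numbers are assumptions chosen so the two groups separate cleanly.
X_train = np.array([
    [12, 0], [10, 1], [11, 0],   # regular entries
    [1, 6], [0, 7], [2, 5],      # suspicious entries
])
y_train = np.array([0, 0, 0, 1, 1, 1])  # 1 = suspicious

detector = ComplementNB().fit(X_train, y_train)
labels = detector.predict(np.array([[11, 1], [1, 6]]))
```

Records labeled suspicious can then be dropped before the heavier downstream analysis runs, which is where the server-load saving comes from.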
P(c|x) = P(x1|c) × P(x2|c) × … × P(xn|c) × P(c)
where:
- P(c|x) is the posterior probability of class c (target) given predictor x (attributes);
- P(c) is the prior probability of the class;
- P(xi|c) is the likelihood: the probability of predictor xi given class c;
- P(x) is the prior probability of the predictor. Since P(x) is the same for every class, it can be dropped when comparing posteriors, which is why it does not appear in the product above.
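A worked toy example of the product above, with made-up priors and likelihoods for two classes and two attributes (these numbers are assumptions, not values from the system):

```python
# Assumed (made-up) probabilities for two classes and two attributes.
priors = {"event": 0.6, "suspicious": 0.4}   # P(c)
likelihoods = {                              # [P(x1|c), P(x2|c)]
    "event":      [0.8, 0.7],
    "suspicious": [0.2, 0.3],
}

# Unnormalized posteriors: P(x1|c) * P(x2|c) * P(c)
scores = {c: likelihoods[c][0] * likelihoods[c][1] * priors[c] for c in priors}
evidence = sum(scores.values())                            # P(x)
posteriors = {c: s / evidence for c, s in scores.items()}  # P(c|x)
```

Here the "event" score is 0.8 × 0.7 × 0.6 = 0.336 against 0.024 for "suspicious", so after normalizing by P(x) = 0.36 the posteriors sum to 1 and "event" wins.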
Event Data Detection: Legitimate or event data is identified using the KNN algorithm, which compares incoming data against known samples. This method is particularly effective in recognizing valid data entries suitable for investment or trading.
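A sketch of this comparison with scikit-learn's `KNeighborsClassifier`: an incoming record takes the majority label of its k nearest known samples. The feature values are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Known samples: two feature values per record (assumed for illustration).
X_known = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],   # known event data
                    [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]])  # known non-event data
y_known = np.array([1, 1, 1, 0, 0, 0])                    # 1 = event data

# Each incoming record is labeled by the majority of its 3 nearest neighbors.
knn = KNeighborsClassifier(n_neighbors=3).fit(X_known, y_known)
predictions = knn.predict(np.array([[1.1, 1.0], [5.0, 4.9]]))
```

Using an odd k avoids ties in the two-class vote; in practice k would be tuned on held-out historical data.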
Evaluation metrics
The evaluation metrics are time and accuracy, described as follows: Time: the time spent building the model and making predictions. Accuracy: the ratio of correctly predicted observations to the total observations:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
where TP (True Positive) is the number of correct predictions that an occurrence is positive, TN (True Negative) is the number of correct predictions that an occurrence is negative, FP (False Positive) is the number of incorrect predictions that an occurrence is positive, and FN (False Negative) is the number of incorrect predictions that an occurrence is negative.
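The accuracy metric above reduces to a one-line computation; the confusion-matrix tallies below are made-up numbers for illustration.

```python
# Accuracy = correct predictions / all predictions.
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    return (tp + tn) / (tp + tn + fp + fn)

# Made-up tally: 85 correct out of 100 predictions.
acc = accuracy(tp=45, tn=40, fp=5, fn=10)
```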