ieeexplore.ieee.org/abstract/document/8603532

Preview meta tags from the ieeexplore.ieee.org website.


Search Engine Appearance

Google

https://ieeexplore.ieee.org/abstract/document/8603532

Analyzing Data Complexity Using Metafeatures for Classification Algorithm Selection

Supervised machine learning is defined as the task of seeking algorithms which, based on reasoning gained from historical data, produce a general hypothesis that can then be used to infer knowledge about new data in the future. Classification, an instance of supervised learning, is statistically defined as the problem of identifying to which particular subpopulation a new observation belongs. Given the large number of available classification algorithms, their combinations (e.g. ensembles), and possible parameter settings, finding the best method to analyze a new data set has become an ever more challenging task. Typically, finding the best classifier for a given data set involves empirically iterating through all candidate classifiers and choosing the one which provides the best classification accuracy. Clearly, this task is computationally very expensive, and the cost increases with each new candidate algorithm. The problem is compounded by the fact that these classification methods do not generalize adequately across data sets from different domains. For example, classifier performance obtained with medical imaging data may not hold for financial data when attempting a similar classification task with the same classifier. How, then, does one efficiently choose a classifier which will provide better classification performance than others? In this context, this study aims to streamline the task of algorithm selection for classification using the meta-learning framework. We propose a methodology to empirically analyze a set of measures of data complexity, known as metafeatures, and investigate their influence on the classification performance of several widely used classifiers. Doing so allows a map of a performance metric to be generated over the metafeature space of data sets. This map is partitioned into regions where some classifiers perform better than others. Once implemented, a new data set can be located in the metafeature continuum and the appropriate classifier can be chosen as the one that performs best in that region of the map. The problem of algorithm selection then reduces to merely calculating the metafeatures for the new data set.
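The selection procedure the abstract describes can be sketched in a few lines. This is a minimal illustration, not the paper's method: the metafeature names (`n_samples`, `n_features`, `class_entropy`), the prior-dataset values, and the nearest-neighbor lookup are all assumptions standing in for the paper's metafeature set and its partitioned performance map.

```python
import math

# Hypothetical prior data sets: a metafeature vector paired with the classifier
# that performed best on it. Values and classifier names are illustrative only.
PRIOR = [
    ({"n_samples": 150,   "n_features": 4,   "class_entropy": 1.58}, "svm"),
    ({"n_samples": 70000, "n_features": 784, "class_entropy": 3.32}, "random_forest"),
    ({"n_samples": 1000,  "n_features": 20,  "class_entropy": 0.99}, "naive_bayes"),
]

KEYS = ["n_samples", "n_features", "class_entropy"]

def distance(a, b):
    """Euclidean distance in log-scaled metafeature space (log1p tames
    the wide range of sample/feature counts)."""
    return math.sqrt(sum((math.log1p(a[k]) - math.log1p(b[k])) ** 2 for k in KEYS))

def select_classifier(metafeatures):
    """Locate the new data set in metafeature space and return the classifier
    that performed best on the nearest prior data set."""
    _, best = min(PRIOR, key=lambda entry: distance(entry[0], metafeatures))
    return best

new_dataset = {"n_samples": 200, "n_features": 5, "class_entropy": 1.5}
print(select_classifier(new_dataset))  # → svm
```

The key point the abstract makes is visible here: once the map (`PRIOR`) exists, selecting an algorithm for a new data set costs only the metafeature computation and a lookup, rather than training every candidate classifier.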



Bing

Analyzing Data Complexity Using Metafeatures for Classification Algorithm Selection

https://ieeexplore.ieee.org/abstract/document/8603532




DuckDuckGo

https://ieeexplore.ieee.org/abstract/document/8603532

Analyzing Data Complexity Using Metafeatures for Classification Algorithm Selection


  • General Meta Tags

    12
    • title
      Analyzing Data Complexity Using Metafeatures for Classification Algorithm Selection | IEEE Conference Publication | IEEE Xplore
    • google-site-verification
      qibYCgIKpiVF_VVjPYutgStwKn-0-KBB6Gw4Fc57FZg
    • Description
      Supervised machine learning is defined as the task of seeking algorithms which, based on reasoning gained from historical data, produce a general hypothesis, ca
    • Content-Type
      text/html; charset=utf-8
    • viewport
      width=device-width, initial-scale=1.0
  • Open Graph Meta Tags

    3
    • og:image
      https://ieeexplore.ieee.org/assets/img/ieee_logo_smedia_200X200.png
    • og:title
      Analyzing Data Complexity Using Metafeatures for Classification Algorithm Selection
    • og:description
      Supervised machine learning is defined as the task of seeking algorithms which, based on reasoning gained from historical data, produce a general hypothesis, ca
  • Twitter Meta Tags

    1
    • twitter:card
      summary
  • Link Tags

    9
    • canonical
      https://ieeexplore.ieee.org/abstract/document/8603532
    • icon
      /assets/img/favicon.ico
    • stylesheet
      https://ieeexplore.ieee.org/assets/css/osano-cookie-consent-xplore.css
    • stylesheet
      /assets/css/simplePassMeter.min.css?cv=20250812_00000
    • stylesheet
      /assets/dist/ng-new/styles.css?cv=20250812_00000
