
TNEGI//ETNI
(2010-10-30 21:06:56)
                                      Ligong Chen's Definition of Piecewise Regression and
                                                 The Basic Conceptual System of Statistics

Links to the articles on my (陈立功, Ligong Chern) three-segment regression analysis method:

                What is Piecewise regression?
         In statistics, piecewise regression analysis (PRA), or piecewise regression for short, is a method or analytical strategy within general regression analysis. It tries to find one or more random critical points (thresholds) on a segmented, continuably measured random variable, so that the whole random sample space is divided into two or more sub-spaces. A threshold model is then fitted to each sub-space, and the resulting set of randomly variable regression models describes and predicts the complex regression relationships over the whole measurable space. With the methods and techniques of PRA, we may follow or change the complex relationships in a random space in order to achieve a particular purpose. A complete strategy for general regression analysis should therefore consist of a fullwise regression analysis and a piecewise regression analysis [1]. Given this definition, it is not hard to see that PRA should sit at the top of the whole methodology of statistics [2].

         The regression models fitted in a piecewise regression analysis are sometimes called piecewise models, threshold models, or segmented models. The three terms share the same connotation; they are synonyms.
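
         The definition above can be illustrated with a minimal sketch. It is not the method of Ligong Chen's paper: it assumes a single threshold, straight-line models on both sides, and a plain grid search that keeps the split with the smallest total squared error. The data are simulated, and the names fit_line and fit_two_piece are hypothetical.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares line; returns (slope, intercept, sum of squared errors)."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return float(slope), float(intercept), float(residuals @ residuals)

def fit_two_piece(x, y, min_points=3):
    """Try every admissible split of the sorted sample and keep the one
    whose two separately fitted lines give the smallest total SSE."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best = None
    for i in range(min_points, len(x) - min_points + 1):
        left = fit_line(x[:i], y[:i])
        right = fit_line(x[i:], y[i:])
        total = left[2] + right[2]
        if best is None or total < best["sse"]:
            best = {
                # Threshold estimate: midpoint between the two neighbouring points.
                "threshold": float((x[i - 1] + x[i]) / 2),
                "left": left[:2],    # (slope, intercept) below the threshold
                "right": right[:2],  # (slope, intercept) above the threshold
                "sse": total,
            }
    return best

# Simulated sample (an assumption for illustration): the slope changes at x = 4.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 101)
y = np.where(x < 4, 1.0 + 2.0 * x, 9.0 - 0.5 * (x - 4)) + rng.normal(0, 0.2, x.size)

full = fit_line(x, y)         # the single "fullwise" model
pieces = fit_two_piece(x, y)  # the two-piece model with an estimated threshold
print("fullwise SSE:", full[2])
print("piecewise SSE:", pieces["sse"], "threshold:", pieces["threshold"])
```

         In this sketch the fullwise model gives one regression relationship for the whole sample, while the two-piece model gives a different relationship on each side of the estimated threshold.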

                The Basic Conceptual System of Statistics
         Several basic concepts in Ligong Chen's paper, and in the thinking behind it, need to be clarified, and some of them need to be corrected. Because of the limited space in the JSM proceedings, he had no chance to do so there. Without the concepts stated here, a reader may find it very difficult to understand the ideas and the method in his paper, since the connotations of some existing concepts have been adjusted and some new concepts have been introduced. So I would like to use this space to give his explanation.

         Individual: in the domain of epistemology, an individual is an independent existence, substance, entity, or object with all the known, knowable and unknown attributes by which it can be distinguished from all others. Everything that exists as the smallest unit in a specific scope can be called an individual. Every attribute of an individual should be certain rather than uncertain if it can be cognized or recognized, or when it enters into an observation made by a subject. In other words, it is itself rather than anything else, because all of its attributes are certain at the moment of cognition or recognition. Conversely, a subject has no way to know an individual whose attributes are uncertain in an observation; such an individual is immeasurable to the subject.
         Attribute: an attribute, denoted by A (in Kunstler Script font), is an abstraction of a characteristic of an individual. Such characteristics are usually either qualitative or quantitative, and with them we may define at least one group or category among the individuals; for example, an individual may have a name, gender, age, height and weight, etc. Every attribute is unique and expresses a specific meaning.
         Sub-attribute: a sub-attribute is an affiliated attribute defined under the name of an attribute, for example, Name={Aristotle, Bacon, Hegel}, Gender={male, female, abnormity} or Age={a value in the range [0, 140], e.g. 2, 35 or 86 years old}, etc., where {Aristotle, Bacon, Hegel}, {male, female, abnormity} and {2, 35 or 86 years old} are sub-attributes defined under the names Name, Gender and Age, respectively.

         Invariable attribute: an attribute is said to be invariable if (1) it is itself, or (2) no sub-attributes can be defined under its name, or (3) it is unnecessary to define the sub-attributes even if they exist. There is then no change or variability in the attribute during an observation or experiment, so it can be used to define a group or category clearly, for example, Gender=male, or Age>=18, or Gender=male and Age>=18, etc.

         Variable attribute: an attribute is said to be variable if at least two different sub-attributes can be defined under its name in an observation or experiment. Every sub-attribute is distinguishable and can be defined clearly, without confusion or conflict with the others. The concept of variable attribute is thus equivalent to the concept of random variable in the current system, for example, Gender=(male, female, abnormity), 0<=Age<140, etc.

         Discretely variable attribute (DVA): a variable attribute is said to be discretely variable if all the sub-attributes defined under its name are qualitative, for example, locations and schools, trees and lakes, diseases and treatments, etc.
         Continuously variable attribute (CVA): a variable attribute is said to be continuously variable if all the sub-attributes defined under its name are quantitative, for example, height and weight, speed and acceleration, volume and ratio, etc.
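
         The four attribute types defined above can be sketched as a small classifier. This is only an illustration under two assumptions that are not in the original text: numeric records stand for quantitative sub-attributes and text records stand for qualitative ones. The function name classify_attribute is hypothetical.

```python
from numbers import Number

def classify_attribute(values):
    """Classify an observed attribute by the sub-attributes recorded under its name."""
    sub_attributes = set(values)
    # Fewer than two distinct sub-attributes: nothing varies in this observation.
    if len(sub_attributes) < 2:
        return "invariable"
    # Assumption: numeric records are quantitative sub-attributes (bool excluded).
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in sub_attributes):
        return "continuously variable (CVA)"
    # Assumption: text records are qualitative sub-attributes.
    if all(isinstance(v, str) for v in sub_attributes):
        return "discretely variable (DVA)"
    return "variable (mixed records)"

print(classify_attribute(["male", "male", "male"]))    # one sub-attribute only
print(classify_attribute(["male", "female", "male"]))  # qualitative sub-attributes
print(classify_attribute([170.2, 165.0, 181.5]))       # quantitative sub-attributes
```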

         Population or Population space: a population, denoted by P (in Kunstler Script font), is a group or set of all the individuals that share the same invariable and variable attributes. The individuals in a population constitute a space, the population space. A population is usually considered to be infinite, since the number of individuals may be infinite, or so large that they cannot all be observed in one finite observation. A population may enter the scope of a specific observation or experiment carried out by a subject or a group of subjects.
         Scale space: a scale space, denoted by Ω, is a space constructed from all the possible sub-attributes or outcomes of a variable attribute in an observation or experiment, without duplicates or conflicts; a questionnaire for a statistical survey is an example. A scale space is thus a set of invariable and variable attributes, and the set cannot be empty. It is a tool for a statistical survey. The scale space here therefore corresponds to what current probability theory calls the "sample space". Clearly, however, a scale space should not be called a sample space, since it is only a measurement tool rather than the sample itself.
         Measure: a measure, denoted by M, is an action taken by at least one subject in order to obtain original records or cognitions of all the invariable and variable attributes of a certain number of individuals, defined and selected in an observation or experiment for a specific purpose. In Statistics in particular, any measure is a random measure, since any object that is measured is randomly obtained.
        Distribution: a distribution, denoted by D, is a result of a measuring action on a scale space.
         Sample: a sample, denoted by S, is the complete result for all individuals in a measuring action; it is thus a complete distribution over a scale space. It is a random subset of a population. There should be no independent sample without a scale space associated with it, and vice versa. In the domain of Statistics, a sample is often called a dataset. A sample should be obtained with a random mechanism to guarantee that it is representative of a population, since the individuals in a population are usually infinite. Thus, any sample in the domain of Statistics is a random sample. In Statistics, an individual in a sample is often called an observation, a random sample point, or a sample point for short. An individual, observation or sample point in a sample therefore cannot itself be called a "sample"; otherwise it may be confused with the sample itself, except in the case where only one individual is measured. In general, a sample as a whole is an individual in another scope of observation, in which it is different from the individuals it contains. It should have its own attributes, and every one of them should be certain, too, just as with any individual discussed above.
         Sample space: a sample space, which shares the symbol S with the sample, can be the sample itself or the dataset, since no sample should contain duplicates. Each sample point is thus an independent element, even when there is only one discrete variable with two or more categories and three or more observations in the sample. Put the other way round: if a sample itself cannot be called a sample space, what else can it be?
         Measurable space: a space is said to be measurable if everything in it can be measured on a scale space. Thus, a population is a measurable space since all individuals in it should be measurable on a scale space.
         Measured space: a space is said to be measured if everything in it has been measured on a scale space, regardless of whether the measure on an individual succeeds or fails. Thus, a sample is a measured space.
         Random mapping: a random mapping, denoted by M (in Kunstler Script font), is a random mechanism by which a sample or sample space is obtained from a measurable space or population through a scale space.
         Probability space: a probability space, denoted by P, is a sample space that is probabilized into 1. We cannot define a probability space over a population space or measurable space, since a scale space may not be complete for a population but is complete for a sample. In addition, a population space is usually unknown, so defining a probability space over it gives us an unknown space, and the definition is in vain. We cannot define a probability space over a scale space alone either, since the scale space is just a measurement tool rather than the real world that we try to know in statistics. However, a probability space should be defined over a scale space together with all the measured individuals in a sample space, since the sample space is a distribution over the scale space. Thus, only the sample space is a complete space that can be probabilized over the scale space. Of course, a complete space that is well defined in mathematics may also be probabilized into 1 as long as it satisfies specific conditions in terms of the existing knowledge system; examples are all the theoretical distributions, such as the normal distribution, standard normal distribution, t-distribution, F-distribution and Chi-square distribution. Therefore, how to probabilize a sample space belongs to the domain of Mathematics, especially the Theory of Probability.
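
         The relation between scale space, sample, distribution and probability space described above can be sketched in a few lines. This is one possible reading, assuming that "probabilized into 1" means dividing each count by the sample size so that the total mass is 1. The records are hypothetical, and the names distribution and probabilize are illustrative.

```python
from collections import Counter
from fractions import Fraction

# Scale space: all possible outcomes of the attribute (the measurement tool).
scale_space = ("male", "female", "abnormity")
# Sample: hypothetical measured records, one per individual.
sample = ["male", "female", "female", "male", "female"]

def distribution(sample, scale_space):
    """Count how the measured individuals fall on each point of the scale space."""
    counts = Counter(sample)
    outside = set(counts) - set(scale_space)
    if outside:
        raise ValueError(f"records outside the scale space: {outside}")
    return {outcome: counts.get(outcome, 0) for outcome in scale_space}

def probabilize(dist):
    """Divide each count by the sample size so the total mass is exactly 1."""
    n = sum(dist.values())
    return {outcome: Fraction(count, n) for outcome, count in dist.items()}

dist = distribution(sample, scale_space)
probs = probabilize(dist)
print(dist)
print(probs, "total:", sum(probs.values()))
```

         Exact fractions are used here so that the total is exactly 1 rather than a rounded floating-point value.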
         Continuity and Continuability of space: we cannot discuss continuity directly over a population space, but only over a sample space. There are two different concepts in this scope: the continuous space and the continuable space. A continuous space is not the same as a continuable space. A space is said to be continuous if all individuals in a sample are in a certain sub-sample or in the whole sample itself; for example, the records of 100 males' heights and the records of 100 females' heights can each be considered a continuous space. If we put them together, however, the records of the 200 people's heights will be considered a continuable space rather than a continuous space, since this mixed space may be an overlapped or a separated space of the two continuous spaces. Nevertheless, it can be measured in a continuous manner as a whole single space.
         Indivisibility and Divisibility of space: the divisibility of a space is easy to understand if the space is discrete. In philosophy, it is difficult to understand divisibility over a continuous space. However, once the concept of continuable space is introduced into the knowledge system, the matter becomes simple: since a continuable space is not the same as a continuous space, a continuable space may be divisible.
         Statistic: a statistic, denoted by s, is an attribute of a sample or sample space. It is a random point measure, since the sample is a random subset of a population. It is a real measurable function defined over a sample space, and thus over a probability space. What Statistics does is construct specific statistics to describe a sample space and so infer the relevant attributes of the population space, which are denoted by a specific term, parameters. Thus, a statistic is a random constant rather than a constant in the mathematical sense, which is constant in itself without a specific description. All records in a sample can be understood as random constants, too. A constant is said to be random only for a sample in the domain of Statistics. Therefore, we can say that a statistic is certain for the given sample itself but uncertain for the population. However, a sample statistic may differ in a sub-sample from that of the whole sample, since a sub-sample contributes less information to its own sub-sample statistic. For example, a single fullwise regression model provides one certain, invariable regression relationship over the whole sample space, whereas a piecewise regression model gives a set of different regression relationships in different threshold intervals. Thus, the whole sample space can be segmented into several pieces or segments.
         Parameter: a parameter, denoted by p, is an attribute of a population and is estimated and inferred with a relevant sampling statistic. It can be treated as an invariable attribute in a statistical estimate, since such a treatment does not matter to a population. However, we have to believe that it is variable in its own natural history.
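
         The idea of a statistic as a "random constant" can be sketched as follows. The sketch rests on assumptions that are not in the original text: a finite simulated list stands in for the population (which the text treats as usually infinite), and the sample mean is the chosen statistic. The names and numbers are hypothetical.

```python
import random
import statistics

random.seed(42)
# Finite stand-in for a population of heights (an assumption for illustration).
population = [random.gauss(170, 10) for _ in range(10_000)]

sample_a = random.sample(population, 50)  # one random sample
sample_b = random.sample(population, 50)  # another random sample from the same population
sub_sample = sample_a[:10]                # a sub-sample of the first sample

mean_a = statistics.mean(sample_a)
mean_b = statistics.mean(sample_b)
mean_sub = statistics.mean(sub_sample)

# Certain for the given sample: recomputing on the same records gives the same value.
print(statistics.mean(sample_a) == mean_a)
# Uncertain for the population: another random sample gives another value.
print(mean_a, mean_b)
# A sub-sample statistic may differ from the statistic of the whole sample.
print(mean_sub)
```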
         Random space or Random system: a random space or random system, denoted by R (in Kunstler Script font), is an abstract concept associated with all the concepts above, within the domain discussed here. It is a generalized concept rather than any one or several of the concrete concepts stated above. In other words, all of the above concepts together constitute a complete random space. A random space may contain a degree of certainty, owing to the invariable attributes that define a population, the random constants of the individuals in a sample, and all the statistics of the sample itself; our description of and inference about the population will therefore carry a degree of certainty as well. However, we must remember that the uncertainty of any sample with respect to its population is an absolute, essential property, so all descriptions of the population based on a sample are essentially random or uncertain.
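(Note: this conceptual system was put forward on 18 October 2010 in the Wikipedia entry on Piecewise regression analysis. Because it was suspected of being original research and might provoke a major academic controversy, the whole entry was deleted by Wikipedia administrators on the 27th of that month.)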
