智慧即财富

陈立功的文学城博客:驰纵骋横,谈今博古,飞花扬月,行文交友

新统计算法所源自的三个基本思想 (Three Basic Ideas Underlying the New Statistical Algorithms)

(2025-11-08 07:25:42)

关于“实事求是、一分为二、群众路线”的补充说明 (A Supplementary Note on "Seeking Truth from Facts, Dividing One into Two, and the Massiline")

【2025-11-05】

统计学是一门通用的科学认识方法论。作者在《哲学之于统计学》一书中所著的关于连续型随机变量的中心化位置(简称央位)的自加权期望算法以及关于加权期望分段回归分析中回归权重的算法等新计算法深刻地受到了“实事求是、一分为二、群众路线”这三个词语所蕴含的思想和方法的影响。可以说,作者将这些思想和方法内化在了这些新统计算法中。因此,以下是对该书所要展示的新统计思想和方法的一个重要补充和说明。这个说明被置于该书的目录之前。

Statistics is a general scientific methodology of cognition. The new methods developed in the author's book "Philosophy in Statistics", including the self-weighted expectation algorithm (the Cmean algorithm) for the centralized location (or center) of a continuous random variable and the algorithm for regressive weights in weighted-expectation piecewise regression analysis, were profoundly influenced by the ideas and methods embodied in three phrases: "Seeking Truth from Facts", "Dividing One into Two", and the "Massiline" (a word coined by the author by combining "mass" and "line"). In effect, the author has internalized these ideas and methods in the new statistical algorithms. The following is therefore an important supplement to, and explanation of, the new statistical ideas and methods presented in the book. This note is placed before the book's table of contents.

一、实事求是 (Seeking Truth from Facts)

作为一个词语,它在中国历史上的沿袭和演变是一个从古代成语到中国共产党核心思想的过程。它最早见于东汉时期班固(32~92)所著的《汉书·河间献王传》。他在文中赞扬汉景帝刘启的儿子刘德“修学好古,实事求是”,意为“以事实为根据寻求真理”。1914年,赵天麟校长为天津北洋大学题写了“实事求是”的校训,这被民国早期湖南公立工业专门学校的校长宾步程所继承,他也为1917年迁入岳麓书院办学的湖南工专题写了“实事求是”的牌匾作为校训。毛泽东在1916~1919年间曾寄居于岳麓书院。一般认为这是以后他引用该成语的渊源。

As a phrase, its transmission and evolution in Chinese history ran from an ancient idiom to a core idea of the Communist Party of China. It first appeared in the "Biography of King Xian of Hejian" in the Book of Han, written by Ban Gu (32~92) of the Eastern Han dynasty. In the text, he praised Liu De, a son of Emperor Jing of Han (Liu Qi), for being "studious and fond of antiquity, seeking truth from facts", that is, for seeking truth on the basis of facts. In 1914, President Zhao Tianlin inscribed "Seeking Truth from Facts" as the motto of Beiyang University in Tianjin. The motto was taken up by Bin Bucheng, principal of the Hunan Public Industrial College in the early Republic of China, who inscribed a plaque with the same words as the motto of the college, which moved into Yuelu Academy in 1917. Mao Zedong lodged at Yuelu Academy at times between 1916 and 1919, and this is generally considered the source of his later use of the idiom.

在已出版的毛泽东著作中寻找“实事求是”一词,最早见于1938年10月14日,毛泽东在六届六中全会上的报告中指出:“共产党员应是实事求是的模范,又是具有远见卓识的模范。因为只有实事求是,才能完成确定的任务;只有远见卓识,才能不失前进的方向。”1940年1月,毛泽东在《新民主主义论》中指出:“科学的态度是‘实事求是’,‘自以为是’……的态度是决不能解决问题的。”

In Mao Zedong's published works, the earliest appearance of the term "seeking truth from facts" dates to October 14, 1938, in his report to the Sixth Plenary Session of the Sixth Central Committee of the Communist Party of China. Mao stated: "Communist Party members should be models of seeking truth from facts, and also models of foresight and insight. Only by seeking truth from facts can we accomplish the assigned tasks; only with foresight and insight can we keep to the direction of our advance." In January 1940, in "On New Democracy", Mao Zedong pointed out: "The scientific attitude is 'seeking truth from facts', while the attitude of 'subjective assumption'... can never solve problems."

但是,直到1941年,他才在《改造我们的学习》一文中对“实事求是”做了简单且完整的解释:“实事”就是客观存在着的一切事物,“是”就是客观事物的内部联系,即规律性,“求”就是我们去研究。这一现代化解释超越了其古代的字面意思,令其拥有了“在实践基础上探求科学真理”的涵义。他在该文中还尖锐地批评了党内一些领导干部凭“想当然”解决问题的行为。

However, it was not until 1941 that he gave a simple yet complete explanation of "seeking truth from facts" in his article "Reform Our Study": "facts" are all the things that exist objectively; "truth" is the internal connections of objective things, that is, their regularity; and "seeking" means that we study them. This modern interpretation goes beyond the ancient literal meaning and gives the phrase the sense of "seeking scientific truth on the basis of practice". In the same article, he also sharply criticized some leading cadres within the Party for solving problems by "taking things for granted".

从毛泽东第一次在岳麓书院见到“实事求是”的牌匾到1941年对它做出解释,说明他对此进行过长期的思考和践行。他将这一成语与马列主义相结合所做的多次阐释以及他个人的实践,使其成为了中国共产党人所应有的核心哲学思想,是他们认识和改造世界的根本方法。其在实践中的具体体现是,尊重事物本来的面貌,遵循客观规律,避免主观想象和臆断;强调实践是检验真理的唯一标准,经验和实践是认识世界的基础;其目的是一切从实际出发,找到解决现实问题的方法,促进国家发展和人民福祉。因此,实事求是是一种与先验假定、主观主义、经验主义、教条主义和僵化思维等相对立的思维模式。

The span from Mao Zedong's first sight of the plaque inscribed "Seeking Truth from Facts" at Yuelu Academy to his interpretation of it in 1941 shows that he reflected on and practiced this principle over a long period. His repeated interpretations of the idiom in conjunction with Marxism-Leninism, together with his own practice, made it a core philosophical idea that Chinese Communists are expected to hold and their fundamental method for understanding and transforming the world. In practice, it means respecting things as they really are, following objective laws, and avoiding subjective imagination and conjecture; it emphasizes that practice is the sole criterion for testing truth and that experience and practice are the foundation for understanding the world; and its purpose is to proceed from reality in everything, find solutions to real problems, and promote national development and the well-being of the people. Seeking truth from facts is therefore a mode of thinking opposed to a priori assumptions, subjectivism, empiricism, dogmatism, and rigid thinking.

毛泽东在1939年12月撰写的文章《中国革命和中国共产党》中有一句话:“认清中国的国情,乃是认清一切革命问题的基本的根据。”(《毛泽东选集》第2卷,人民出版社1991年版,第633页)。同理,在统计学中我们可以说:“认清一个样本中所含有的基本信息,乃是认清一切统计问题的基础。”只有首先认清了问题,才有可能找到解决问题的正确方法,才有可能构建正确的统计算法。

In his article The Chinese Revolution and the Chinese Communist Party written in December 1939, Mao Zedong stated, "Understanding China's national conditions is the fundamental basis for understanding all revolutionary problems." (Selected Works of Mao Zedong, Vol. 2, People's Publishing House, 1991, p. 633). Similarly, in statistics, we can say: "Understanding the basic information contained in a sample is the foundation for understanding all statistical problems." Only by first understanding the problem can we find a correct method to solve it and construct a correct statistical algorithm.

本书在探索连续可变属性(即传统概念系统下的连续型随机变量)的凹-凸自权重算法时,首先考察的就是一个样本所带有的基本信息,即全部的样本点测量以及其中的最小和最大值,而这些样本“事实”构成了计算凹-凸自权重的全部基础。除此之外,无需引入任何关于总体分布形态的先验假定和其它外源性信息。即使是其算术均数或中位数等用来估计抽样分布的中心化位置的统计量也不被用于自权重的计算。这正是该权重被称为“自权重”的原因。最终,一个连续可变属性的抽样分布形态由其样本测量和凹-凸自权重在二维空间里得到描绘,而分布的中心化位置(简称央位或凸峰)可由其凸自加权均数估计。令人惊异的是,凹-凸自权重恰似每个样本点在其自身位置上的点密度,且凸自权重完美地量化了每个点在分布中对央位的集中趋势,而凹自权重则完美地量化了它们对央位的离散趋势。

In exploring the concave-convex self-weighting algorithm for continuously variable attributes (i.e., continuous random variables in the traditional conceptual system), this book first examines the basic information carried by a sample: all of the sample point measurements, together with their minimum and maximum. These sample "facts" form the entire basis for calculating the concave-convex self-weights. Beyond them, no prior assumptions about the shape of the population distribution and no other exogenous information need to be introduced. Even statistics used to estimate the central location of a sampling distribution, such as the arithmetic mean or the median, are not used in calculating the self-weights. This is precisely why the weight is called a "self-weight". In the end, the shape of the sampling distribution of a continuously variable attribute is depicted in a two-dimensional space by its sample measurements and concave-convex self-weights, and the central location of the distribution (the center, or convex peak) can be estimated by its convex self-weighted mean. Surprisingly, the concave-convex self-weights behave just like the point density of each sample point at its own location: the convex self-weights perfectly quantify the tendency of each point to concentrate toward the center, while the concave self-weights perfectly quantify its tendency to disperse away from the center.

二、一分为二 (Dividing One into Two)

虽然老子(约公元前571~470)在《道德经》第四十二章中有“道生一、一生二、二生三、三生万物”之说,但作为一个成语,“一分为二”被认为出自宋代邵雍(1012 ~1077)所著的《皇极经世绪言》卷七:“是故一分为二,二分为四”。

Although Lao Tzu (about 571~470 BC) stated in Chapter 42 of the Tao Te Ching, "The Tao gives birth to One, One gives birth to Two, Two gives birth to Three, Three gives birth to all things," the idiom "Dividing One into Two" is generally attributed to Shao Yong (1012~1077) in the Song Dynasty in his Huangji Jingshi Xuyan, Volume 7: "Therefore, One divides into two, and Two divides into four."

作为一个哲学概念,它指的是事物作为矛盾的统一体包含相互对立的两个方面,为此应全面看待事物的不同方面。常用于要求客观认识和分析问题的语境。

As a philosophical concept, it refers to the fact that things, as a unity of opposites, contain two mutually opposing aspects, thus requiring a comprehensive view of their different aspects. It is often used in contexts demanding objective understanding and analysis of problems.

对连续可变属性的凹-凸自权重的计算涉及到点对点的差异性和相似性,其结果则是将算术均数的算法中隐含的“每个样本点对分布央位的贡献相同”的等权重假定中的权重1分解为了凹自权重和凸自权重两个部分。于是,对于一个连续可变属性X的样本测量{xi}(i = 1, 2, …, n),我们总是会有ri + ci = 1,其中ri是xi的凹自权重,ci是其凸自权重。

The calculation of the concave-convex self-weights for a continuously variable attribute involves point-to-point differences and similarities. Its result is that the weight 1 in the equal-weight assumption implicit in the arithmetic mean (that "each sample point contributes equally to the center of the distribution") is decomposed into two parts: a concave self-weight and a convex self-weight. Therefore, for the sample measurements {xi} (i = 1, 2, …, n) of a continuously variable attribute X, we always have ri + ci = 1, where ri is the concave self-weight of xi and ci is its convex self-weight.
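The decomposition ri + ci = 1 can be illustrated with a small sketch. The book's actual self-weight formula is not given in this note, so the weight used below (the mean absolute difference between a point and all sample points, scaled by the sample range) is only a hypothetical stand-in chosen because it uses nothing but the sample points and their minimum and maximum. The function names `self_weights` and `convex_weighted_mean` are likewise assumptions made for illustration.

```python
def self_weights(xs):
    """Return (concave, convex) weight lists for a sample.

    Hypothetical stand-in, NOT the book's formula: the concave weight of a
    point is its mean absolute difference from all sample points divided by
    the sample range; the convex weight is the complement to 1.
    """
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")
    span = max(xs) - min(xs)
    if span == 0:
        # All points coincide: no dispersion at all, full concentration.
        return [0.0] * n, [1.0] * n
    concave = []
    for xi in xs:
        mean_diff = sum(abs(xi - xj) for xj in xs) / n
        concave.append(mean_diff / span)  # always within [0, 1)
    convex = [1.0 - r for r in concave]   # so r_i + c_i = 1 for every point
    return concave, convex


def convex_weighted_mean(xs):
    """Weighted mean of the sample using the convex weights above."""
    _, convex = self_weights(xs)
    return sum(c * x for c, x in zip(convex, xs)) / sum(convex)


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 10.0]
    r, c = self_weights(sample)
    for x, ri, ci in zip(sample, r, c):
        print(f"x={x:5.1f}  concave={ri:.4f}  convex={ci:.4f}  sum={ri + ci:.1f}")
    print("arithmetic mean:     ", sum(sample) / len(sample))
    print("convex-weighted mean:", convex_weighted_mean(sample))
```

Under this stand-in, a point far from the bulk of the sample receives a small convex weight, so the weighted mean is pulled less toward it than the arithmetic mean is. Whether the book's weights behave the same way on a given sample has to be checked against the book itself.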

同样地,在基于回归加权途径重建分段回归算法时,回归权重的计算也涉及到预测值的变异和残差的变异,而不是像传统上基于数值型最优化的分段回归那样仅仅使用了残差的变异。对预测值变异的无视意味着丢弃了一大半的样本信息,这在基于回归权重估计未知临界点的加权期望时将必然导致结果的偏差。

Similarly, when the piecewise regression algorithm is rebuilt on a regressive-weighting approach, the calculation of the regressive weights involves both the variation of the predicted values and the variation of the residuals, whereas traditional piecewise regression based on numerical optimization uses only the variation of the residuals. Ignoring the variation of the predicted values discards more than half of the sample information, which will inevitably bias the result when the weighted expectation of an unknown threshold is estimated from the regressive weights.

无论是凹-凸自加权还是回归加权,在使用样本信息时,其算法均应遵循“无信息损失和无信息冗余”这两个基本原则,这也是“一分为二”思想的体现。

Whether for concave-convex self-weighting or for regressive weighting, an algorithm that uses sample information should follow two basic principles, "no information loss" and "no information redundancy"; this too reflects the idea of "Dividing One into Two".

三、群众路线 (The Massiline)

这是一个近代汉语词汇,最早见于1922年中国共产党第二次代表大会通过的《组织章程决议案》:“党的一切运动都必须到广大群众里面去。” 1925年的中央扩大执委会决议案明确指出:“中国革命运动的将来命运,全看中国共产党会不会组织群众、引导群众。”

The "massiline" is a modern Chinese term. Its earliest source is the "Resolution on the Organizational Charter" adopted at the Second National Congress of the Communist Party of China in 1922: "All movements of the Party must go among the broad masses." The resolution of the Enlarged Executive Committee of the Central Committee in 1925 stated clearly: "The future fate of the Chinese revolutionary movement depends entirely on whether the Communist Party of China can organize and guide the masses."

1928年6月的中国共产党第六次全国代表大会也作出了“党的总路线是争取群众”的重要论断。同年11月,中国共产党的领导人李立三在向浙江地区革命领导人传达中央精神的一次工作谈话中,首次使用了“群众路线”这一概念:“在总的争取群众路线之下,需要尽最大的努力到下层群众中去。”

The Sixth National Congress of the Communist Party of China in June 1928 also made the important assertion that "the Party's general line is to win over the masses." In November of the same year, Li Lisan, a leader of the Communist Party of China, used the concept of the "massiline" for the first time in a work talk conveying the spirit of the Central Committee to revolutionary leaders in the Zhejiang region: "Under the general line of winning over the masses, we need to make the greatest effort to go among the masses at the lower levels."

1929年9月由陈毅起草、周恩来审定的《中央给红四军前委的指示信》三处提到“群众路线”,即筹款工作要“经过群众路线”,没收地主豪绅财产要“经过群众路线”,解决红军给养问题要“渐次做到由群众路线去找出路”。毛泽东在同年12月的古田会议决议中,第一次针对群众路线进行相关阐述:“党的工作要在党的讨论和决议之后,再经过群众路线去执行”。

In September 1929, the "Instructions from the Central Committee to the Front Committee of the Fourth Red Army," drafted by Chen Yi and reviewed by Zhou Enlai, mentioned the "massiline" in three places: fundraising should be carried out "through the massiline," the confiscation of landlords' and gentry's property should be carried out "through the massiline," and solving the Red Army's supply problem should be "gradually achieved by finding a way out through the massiline."

In the resolution of the Gutian Conference in December of the same year, Mao Zedong made his first elaboration on the massiline, stating that "the work of the Party must be implemented after the Party's discussion and resolution, and then through the massiline."

中国抗日战争时期,群众路线概念的内涵被毛泽东继续深化和发展。毛泽东在1943年6月为中央起草的《关于领导方法的若干问题》一文中指出:“在我党的一切实际工作中,凡属正确的领导,必须是从群众中来,到群众中去。这就是说,将群众的意见(分散的无系统的意见)集中起来(经过研究,化为集中的系统的意见),又到群众中去做宣传解释,化为群众的意见,使群众坚持下去,见之于行动,并在群众行动中考验这些意见是否正确。然后再从群众中集中起来,再到群众中坚持下去。如此无限循环,一次比一次地更正确、更生动、更丰富。这就是马克思主义的认识论。” (《毛泽东选集》第3卷第899页)

During the War of Resistance Against Japan, Mao Zedong continued to deepen and develop the concept of the massiline. In his article "Some Questions Concerning Methods of Leadership", drafted for the Central Committee in June 1943, Mao Zedong pointed out: "In all practical work of our Party, all correct leadership must come from the masses and go to the masses. This means that the opinions of the masses (scattered and unsystematic opinions) are gathered (and, after study, transformed into concentrated and systematic opinions), then disseminated and explained to the masses, becoming their own opinions, which are then upheld and put into action by the masses. The correctness of these opinions is then tested in the process of mass action. Then, these opinions are gathered again from the masses and upheld once more. This cycle repeats infinitely, becoming more correct, more vivid, and richer with each iteration. This is the Marxist theory of knowledge." (Selected Works of Mao Zedong, Vol. 3, p. 899)

1945年5月2日,毛泽东在延安出版的《解放日报》上发表了《论联合政府》,其中进一步阐述了群众路线:“我们共产党人区别于其他任何政党的又一个显著的标志,就是和最广大的人民群众取得最密切的联系。全心全意地为人民服务,一刻也不脱离群众;一切从人民的利益出发,而不是从个人或小集团的利益出发;向人民负责和向党的领导机关负责的一致性;这些就是我们的出发点。”

On May 2, 1945, Mao Zedong published "On Coalition Government" in the Yan'an-based Liberation Daily, in which he further elaborated on the massiline: "Another significant mark that distinguishes us Communists from any other political party is our closest ties with the broadest masses of the people. Serving the people wholeheartedly and never being separated from them for a moment; proceeding from the interests of the people in everything we do, not from the interests of individuals or small groups; the consistency between being responsible to the people and being responsible to the leading organs of the Party; these are our starting points."

在1945年4月23日至6月11日于延安召开的中国共产党第七次全国代表大会上,群众路线被确立为该党的根本政治路线和根本组织路线。

At the Seventh National Congress of the Communist Party of China, held in Yan'an from April 23 to June 11, 1945, the massiline was established as the Party's fundamental political and organizational line.

林彪1966年11月3日在天安门城楼上讲话(讲话稿经毛泽东审阅)说:“毛主席的路线,是让群众自己教育自己,自己解放自己的路线,是‘敢’字当头的路线,是敢于相信群众,敢于依靠群众,敢于放手发动群众的路线。”

In his speech on the Tiananmen rostrum on November 3, 1966 (the text of the speech was reviewed by Mao Zedong), Lin Biao said: "Chairman Mao's line is one that lets the masses educate themselves and liberate themselves; it is a line that puts 'daring' first; it is a line that dares to trust the masses, dares to rely on the masses, and dares to mobilize the masses boldly."

综上所述,对群众路线的解释可被简化为:相信群众,依靠群众;从群众中来,到群众中去;将分散和无系统的群众意见集中起来,经过研究和整合,形成行动方案,并在群众的行动中对方案进行检验和修正。如此有限循环和改进。

In summary, the interpretation of the massiline can be condensed as follows: trust the masses and rely on the masses; come from the masses and go back to the masses; gather the scattered and unsystematic opinions of the masses, study and integrate them into a plan of action, and test and revise the plan through the actions of the masses. The process is repeated in a finite cycle of improvement.

将这个思想应用到统计学中就是,在相信样本点,依靠样本点的基础上,将样本点分散的、无系统的点滴权重计算或测量出来并集中起来,再经过算法整合得到目标可变属性的加权期望,最后在抽样实践和统计计算中检验加权期望的正确性、稳定性和可靠性。

Applied to statistics, this idea means the following: on the basis of trusting and relying on the sample points, calculate or measure the scattered, unsystematic bits of weight carried by the individual sample points and gather them together; integrate them through an algorithm to obtain the weighted expectation of the target variable attribute; and finally test the correctness, stability, and reliability of that weighted expectation in sampling practice and statistical computation.

连续可变属性的凹-凸自权重的计算就是遵循了上述基本思想。这个算法的基本思想是,相信一个抽样分布的中心化位置是由样本中的所有点共同决定的,每个样本点因其自身所在的位置对央位的贡献存在个体差异。这一思想是对算术均数算法中等权重假定的逆向突破。也正是这一突破,迫使作者寻找如何计算出一个样本中具有差异化的个体贡献;反之,如果没有这一突破,便不会有凹-凸自权重算法的诞生。

The calculation of the concave-convex self-weights for continuously variable attributes follows the basic ideas above. The fundamental idea of this algorithm is the belief that the center of a sampling distribution is determined jointly by all points in the sample, and that each sample point's contribution to the center differs individually according to its own location. This idea is a breakthrough that runs counter to the equal-weight assumption in the arithmetic mean. It was precisely this breakthrough that compelled the author to look for a way to calculate the differentiated individual contributions within a sample; conversely, without it, the concave-convex self-weight algorithm would never have been born.

同样地,加权分段回归中回归权重的构建也遵循了群众路线的思维,即一个被分割属性上的未知临界点是由回归空间中所有样本点在所有可变属性中的表现共同决定的,每个样本点正是以自身所在的位置对未知临界点有一份可变的贡献。我们所要做的就是将这份贡献计算或测量出来,由此即可得到被分割属性上未知临界点的加权期望估计。作者拒绝临界点由具有最大个体贡献的样本点决定的所谓“最优化”观点,因为这一观点体现的是某种“个人英雄主义”的鲁莽。不仅如此,迭代搜索临界点的过程中,所谓的“优化算子”将输出一个关于其自身的完整分布,而它的极值位于该分布的边界上。同时,作为可变属性的目标参数也在迭代过程中发生随机变异,也会形成一个完整的分布。这两个分布都有各自的期望。由于优化算子或回归权重都分别与目标参数相互关联,它们分别与目标参数构成的联合分布也一定存在着唯一的联合的央位,而这个央位正是唯一可期望的参数估计。

Similarly, the construction of regressive weights in weighted piecewise regression follows the massiline way of thinking: an unknown threshold on a segmented attribute is determined jointly by the behavior of all sample points in the regression space across all variable attributes, and each sample point, by virtue of its own position, makes a variable contribution to the unknown threshold. Our task is to calculate or measure this contribution, which yields a weighted-expectation estimate of the unknown threshold on the segmented attribute. The author rejects the so-called "optimization" view that the threshold is determined by the sample point with the largest individual contribution, because this view reflects the recklessness of a kind of "individual heroism". Moreover, during the iterative search for the threshold, the so-called "optimizer" outputs a complete distribution of its own, and its extremum lies on the boundary of that distribution. At the same time, the target parameter, as a variable attribute, also varies randomly during the iteration and forms a complete distribution of its own. Each of the two distributions has its own expectation. Since the optimizer and the regressive weight are each correlated with the target parameter, the joint distribution that each forms with the target parameter must also have a unique joint center, and this center is the only parameter estimate that can be expected.
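The contrast drawn above, between letting one extreme candidate decide the threshold and taking a weighted expectation over all candidates, can be sketched as follows. The book's regressive weight is not specified in this note, so the weight used here (the share of the predicted values' variation in the sum of predicted and residual variation for a two-segment least-squares fit) is a hypothetical stand-in; the function names and the `min_points` parameter are also assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def candidate_weight(xs, ys, t):
    """Hypothetical weight of candidate threshold t (NOT the book's formula).

    Uses both the variation of the predicted values and the variation of
    the residuals of a two-segment fit split at t.
    """
    grand_mean = sum(ys) / len(ys)
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    ss_pred = 0.0
    ss_res = 0.0
    for segment in (left, right):
        a, b = fit_line([p[0] for p in segment], [p[1] for p in segment])
        for x, y in segment:
            pred = a + b * x
            ss_pred += (pred - grand_mean) ** 2
            ss_res += (y - pred) ** 2
    total = ss_pred + ss_res
    return ss_pred / total if total > 0 else 0.0


def threshold_estimates(xs, ys, min_points=2):
    """Return (weighted expectation, single best candidate) of the threshold.

    Assumes distinct x values; each segment keeps at least min_points points.
    """
    order = sorted(xs)
    candidates = order[min_points - 1 : len(order) - min_points]
    if not candidates:
        raise ValueError("not enough points for two segments")
    weights = [candidate_weight(xs, ys, t) for t in candidates]
    total = sum(weights)
    if total == 0:
        expected = sum(candidates) / len(candidates)
    else:
        expected = sum(w * t for w, t in zip(weights, candidates)) / total
    best_index = max(range(len(candidates)), key=weights.__getitem__)
    return expected, candidates[best_index]


if __name__ == "__main__":
    xs = [float(i) for i in range(11)]
    ys = [x if x <= 5 else x + 10 for x in xs]  # jump between x=5 and x=6
    expected, best = threshold_estimates(xs, ys)
    print("weighted expectation of threshold:", expected)
    print("single best candidate:            ", best)
```

The sketch only shows the mechanics of the two kinds of estimate: every candidate contributes to the weighted expectation in proportion to its weight, while the "best candidate" estimate discards all but one. It makes no claim about which estimate is closer to the truth for the book's own weights.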

 

中国共产党在取得全国政权后,一直通过社会实践、教育、宣传和媒体等各种机会和渠道向全社会传播这些思想和方法。在毛泽东之后的不同时期,它们也一直被用于思考和解决中国所面临的各种问题。因此,毫无疑问,这些思想和方法对作者产生了潜移默化的影响。因此,作者愿意在此建议将本文所介绍的凹-凸自权重算法和回归权重算法等称为“群众路线法”。

Since taking national power, the Communist Party of China has continually spread these ideas and methods throughout society by every available opportunity and channel, including social practice, education, propaganda, and the media. In the different periods after Mao Zedong, they have also been used to think about and solve the various problems facing China. There is therefore no doubt that these ideas and methods have had a subtle, formative influence on the author. For this reason, the author would like to suggest that the concave-convex self-weighting algorithm, the regressive weighting algorithm, and related methods introduced here be called the "massiline method".
