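Author: Philipp K. Janert
Publisher: Southeast University
Subtitle: Data Analysis with Open Source Tools
Publication date: 2011-5
Pages: 509
Price: 82.00 yuan
ISBN: 9787564126742

Synopsis

Collecting data is relatively easy, but turning raw information into something useful requires knowing how to extract precisely what you need. Through the in-depth treatment in Data Analysis with Open Source Tools (reprint edition), intermediate and experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You will learn how to look at data to discover what it contains, how to capture those ideas in conceptual models, and how to feed your understanding back into your organization through business plans, precise metrics reports, and other means.

You will work through each concept in the hands-on workshop at the end of every chapter. Above all, you will learn how to think about the data you want to obtain, instead of relying on tools to do the thinking for you.

- Use graphics to describe data with one, two, or a dozen or more variables
- Develop conceptual models using rough calculations together with scaling and probability arguments
- Mine data with computationally intensive methods such as simulation and clustering
- Make your conclusions easier to understand through reports, dashboards, and other metrics programs
- Understand financial calculations, including money...

About the Author

Philipp K. Janert currently provides consulting services in data analysis and mathematical modeling; he previously worked as a physicist and software engineer. He is the author of Gnuplot in Action: Understanding Data with Graphs (Manning) and has written for the O'Reilly Network, IBM developerWorks, and IEEE Software. He holds a Ph.D. in theoretical physics from the University of Washington.

Contents

Preface

1 Introduction
- Data Analysis
- What's in This Book
- What's with the Workshops?
- What's with the Math?
- What You'll Need
- What's Missing

Part I Graphics: Looking at Data

2 A Single Variable: Shape and Distribution
- Dot and Jitter Plots
- Histograms and Kernel Density Estimates
- The Cumulative Distribution Function
- Rank-Order Plots and Lift Charts
- Only When Appropriate: Summary Statistics and Box Plots
- Workshop: NumPy
- Further Reading

3 Two Variables: Establishing Relationships
- Scatter Plots
- Conquering Noise: Smoothing
- Logarithmic Plots
- Banking
- Linear Regression and All That
- Showing What's Important
- Graphical Analysis and Presentation Graphics
- Workshop: matplotlib
- Further Reading

4 Time as a Variable: Time-Series Analysis
- Examples
- The Task
- Smoothing
- Don't Overlook the Obvious!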
- The Correlation Function
- Optional: Filters and Convolutions
- Workshop: scipy.signal
- Further Reading

5 More Than Two Variables: Graphical Multivariate Analysis
- False-Color Plots
- A Lot at a Glance: Multiplots
- Composition Problems
- Novel Plot Types
- Interactive Explorations
- Workshop: Tools for Multivariate Graphics
- Further Reading

6 Intermezzo: A Data Analysis Session
- A Data Analysis Session
- Workshop: gnuplot
- Further Reading

Part II Analytics: Modeling Data

7 Guesstimation and the Back of the Envelope
- Principles of Guesstimation
- How Good Are Those Numbers?
- Optional: A Closer Look at Perturbation Theory and Error Propagation
- Workshop: The GNU Scientific Library (GSL)
- Further Reading

8 Models from Scaling Arguments
- Models
- Arguments from Scale
- Mean-Field Approximations
- Common Time-Evolution Scenarios
- Case Study: How Many Servers Are Best?
- Why Modeling?
- Workshop: Sage
- Further Reading

9 Arguments from Probability Models
- The Binomial Distribution and Bernoulli Trials
- The Gaussian Distribution and the Central Limit Theorem
- Power-Law Distributions and Non-Normal Statistics
- Other Distributions
- Optional: Case Study - Unique Visitors over Time
- Workshop: Power-Law Distributions
- Further Reading

10 What You Really Need to Know About Classical Statistics
- Genesis
- Statistics Defined
- Statistics Explained
- Controlled Experiments Versus Observational Studies
- Optional: Bayesian Statistics - The Other Point of View
- Workshop: R
- Further Reading

11 Intermezzo: Mythbusting - Bigfoot, Least Squares, and All That
- How to Average Averages
- The Standard Deviation
- Least Squares
- Further Reading

Part III Computation: Mining Data

12 Simulations
- A Warm-Up Question
- Monte Carlo Simulations
- Resampling Methods
- Workshop: Discrete Event Simulations with SimPy
- Further Reading

13 Finding Clusters
- What Constitutes a Cluster?
- Distance and Similarity Measures
- Clustering Methods
- Pre- and Postprocessing
- Other Thoughts
- A Special Case: Market Basket Analysis
- A Word of Warning
- Workshop: Pycluster and the C Clustering Library
- Further Reading

14 Seeing the Forest for the Trees: Finding Important Attributes
- Principal Component Analysis
- Visual Techniques
- Kohonen Maps
- Workshop: PCA with R
- Further Reading

15 Intermezzo: When More Is Different
- A Horror Story
- Some Suggestions
- What About Map/Reduce?
- Workshop: Generating Permutations
- Further Reading

Part IV Applications: Using Data

16 Reporting, Business Intelligence, and Dashboards
- Business Intelligence
- Corporate Metrics and Dashboards
- Data Quality Issues
- Workshop: Berkeley DB and SQLite
- Further Reading

17 Financial Calculations and Modeling
- The Time Value of Money
- Uncertainty in Planning and Opportunity Costs
- Cost Concepts and Depreciation
- Should You Care?
- Is This All That Matters?
- Workshop: The Newsvendor Problem
- Further Reading

18 Predictive Analytics
- Introduction
- Some Classification Terminology
- Algorithms for Classification
- The Process
- The Secret Sauce
- The Nature of Statistical Learning
- Workshop: Two Do-It-Yourself Classifiers
- Further Reading

19 Epilogue: Facts Are Not Reality

Appendix A: Programming Environments for Scientific Computation and Data Analysis
- Software Tools
- A Catalog of Scientific Software
- Writing Your Own
- Further Reading

Appendix B: Results from Calculus
- Common Functions
- Calculus
- Useful Tricks
- Notation and Basic Math
- Where to Go from Here
- Further Reading

Appendix C: Working with Data
- Sources for Data
- Cleaning and Conditioning
- Sampling
- Data File Formats
- The Care and Feeding of Your Data Zoo
- Skills
- Terminology
- Further Reading

Index
A very good book; I strongly recommend it.
This one goes back to a plain, down-to-earth style.
So it turns out there is cause and effect behind all of it.
Excellent!