Proc phreg output option



Lukas and I were trying to write a succinct comparison of the most popular packages typically used for data analysis. We used SAS in litigation consulting because we frequently had datasets in the GB range.

In this relatively narrow context, it makes a lot of sense to use SAS: I recall a Cournot Equilibrium-finding simulation that we wrote using the SAS macro language, which would be quite difficult in R, I think.

I thought that most people consider SAS to have the steepest learning curve, certainly steeper than R. Have you considered Octave in this regard? Very nice graphing capability, Matlab syntax and library functions, open source. Good point that the standard for judging large-dataset support is whether you can manipulate a big dataset the same way you manipulate a small one.

Justin — scipy certainly seems like it keeps improving. I remember once watching someone try to make a 3D plot; then he booted up Matlab and had one in less than a minute. Edward — I have seen it mentioned too. From glancing at its home page, it seems like a pretty small-time project. TS — yeah, I used Octave just once for something simple. Matlab is typically stronger in this vein, but R has made significant progress in more recent versions. Some benchmark results can be found online. In high-energy particle physics, ROOT is the package of choice.

And the stuff that went wrong was just plain weird. This is obviously oversimplified — but that is the point of a succinct comparison. Python actually really shines above the others for handling large datasets using memmap files or a distributed computing approach. I think you are basically operating at a disadvantage if you are using the other packages at this point.
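The memmap approach mentioned above can be sketched with numpy (assuming numpy is installed; the file path and array size here are toy values, not anyone's real dataset):

```python
import numpy as np
import tempfile, os

# Create a disk-backed array; the data lives in the file, not in RAM.
path = os.path.join(tempfile.mkdtemp(), "big.dat")
n = 1_000_000
mm = np.memmap(path, dtype=np.float64, mode="w+", shape=(n,))
mm[:] = np.arange(n, dtype=np.float64)   # written through to disk
mm.flush()

# Reopen read-only and compute on a slice without loading the whole file.
ro = np.memmap(path, dtype=np.float64, mode="r", shape=(n,))
total = ro[:1000].sum()
print(total)  # 0 + 1 + ... + 999 = 499500.0
```

The same pattern scales to files much larger than RAM, since the OS pages in only the slices you touch.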

The only other reason I can see to use them is if you have no choice, for example if you inherited a ton of legacy code within your organization. But one thing it does well is very consistent syntax. When I came back to it, I was able to pick it up right where I left off. Mathematica may not do everything, but what it does do, it does elegantly. Michael Driscoll — good point! Mike — ah yes, i remember looking at ROOT a long time ago and thinking it was impressive.

Pete — yeah, I whine about Python. John — very interesting. I think many R users have had the experience of quickly forgetting how to do basic things. From David Knowles, who did the comparison Mike Driscoll linked to: on top of being expensive, Matlab is a nightmare if you want to run a program on lots of nodes, because you need a license for every node! You should check out: Running full-fledged desktop MATLAB on a huge number of nodes is messy and of course very expensive, not to mention that a single user would take away several licenses for which other users will have to wait.

Another guy from MathWorks, their head of Matlab product management Scott Hirsch, contacted me about the language issue and was very kind and clarifying. The most interesting bits from his emails: "This is very helpful. Here are some quick notes on where we are today. I really do appreciate getting your perspective." Gaurav — it sure would be nice if I could see how much this parallel toolbox costs without having to register for a login!

Anyway, competition is good and discussions like this keep everyone thinking about how to improve their platforms. Also, keep in mind that performance is often a sticking point for people when it need not be. Sturla has some strong points here: I know a lot of people using them, and they allow fast, easy statistical analysis with beautiful graphs out of the box.

I think this is a very incomplete comparison. If you want to make a real comparison, it should be more complete than this wiki article. And to give a bit of personal feedback: I know 2 people using STATA (social science), 2 people using Excel (philosophy and economics), several using LabView (engineers), some using R (statistical science, astronomy), several using S-Lang (astronomy), and several using Python (astronomy); and by using Python, I mean that they are using the packages they need, which might be numpy, scipy, matplotlib, mayavi2, pymc, kapteyn, pyfits, pytables and many more.

And this is the main advantage of using a real language for data analysis. I also know several people who use IDL and ROOT (astronomy and physics). I have used IDL, ROOT, PDL, Excel (if you really want to count that in) and Python, and I like Python best. I think that you really have to distinguish between data analysis and data visualization. In astronomy this is often handled by completely different software. In your example the people used scipy, which does not offer a single visualization routine, so you cannot blame scipy for difficulties with 3D plots…

I can only say that we are improving; projects like pythonxy or enthought are really helpful for people who want something more integrated. There are fundamental problems with all those integrated solutions; not having basic data structures (hashmaps, trees, etc.) is one of them.

Making advanced UIs in Matlab is not easy either.


You can build your own solution with the Python stack. The C API is complete, and you can do things which are simply not possible with Matlab. You want to handle very big datasets? Things like Cython are also very powerful for people who need speed.

I believe those are partially consequences of not being integrated. Concerning the flaws you mentioned in scipy: there are some efforts to remove redundancy, but not all of it will disappear. The redundant names are confusing for a newcomer (they were for me), but they are pretty minor IMHO compared to other problems.

You forgot support and continuity. In my experience, SAS offers very good support and continuity. Others claim SPSS does too; I have no experience there. For one-off disposable projects, R has lots of advantages. For commercial systems, not so many. For anyone in this situation I unequivocally recommend:

Because given my situation there often are no canned routines. That means sooner or later (usually sooner) I will be programming. It is object oriented, has very forgiving run-time behavior, fast turnaround (no edit-compile-debug cycles, just edit and run), great built-in structures, good modularity, and very good libraries.

I agree that Python is not wart-free. The version compatibility can sometimes be frustrating. It will never be as fast as MATLAB for certain things (JIT compiling, etc.). Python plotting is certainly not up to Mathematica standards, although it is good. However, the Python community is very nice and very responsive.

And for all my desire not to spend time coding, I must admit I find Python programming fun to do. SciPy relaxes this requirement, and thus can leverage a lot of existing Fortran code.

Can any of these packages compute sparse SVDs like folks have used for Netflix (a K x 25K matrix with M partial entries)? Or do regressions with millions of items and hundreds of thousands of coefficients? I typically wind up writing my own code to do this kind of thing in LingPipe, as do lots of other folks.
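What those recommender systems compute is a truncated SVD. A toy sketch in Python with numpy — the matrix here is tiny and synthetic, and for genuinely sparse Netflix-scale data an iterative solver such as scipy.sparse.linalg.svds would be the realistic choice:

```python
import numpy as np

rng = np.random.default_rng(0)
# A small stand-in for a huge ratings matrix: exactly rank 2 by construction.
U = rng.normal(size=(200, 2))
V = rng.normal(size=(2, 50))
A = U @ V

# Full SVD is fine at this size; at Netflix scale you would use an
# iterative sparse solver (e.g. scipy.sparse.linalg.svds) instead.
u, s, vt = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = (u[:, :k] * s[:k]) @ vt[:k]   # best rank-k approximation

err = np.linalg.norm(A - A_k)
print(err)  # near zero, since A is exactly rank 2
```

The top-k factors (u, s, vt truncated to k columns) are exactly the low-rank user/item features the Netflix-style models work with.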

BUGS is even worse than R in terms of scaling, but I can write my own custom samplers that fly in some cases and easily scale. R itself tends to just wrap the real computing in layers of scripts to massage data and do error checking.


The real code is often Fortran, but more typically C. That must be the same for SciPy, given how relatively inefficient Python is at numerical computing. A problem I see with the way R and BUGS work is that they typically try to compile a declarative model. The other thing to consider with these things from a commercial perspective is licensing.

SciPy has a mixed bag of licenses that is also not redistribution friendly. Even there, the big ones like lmer are quite stable. Accelerated linear algebra routines written by people who know the processors inside and out will result in big wins, obviously. You can also license the Intel MKL separately and use it to compile NumPy (if I recall correctly, David Cournapeau, who commented above, was largely responsible for this capability, so bravo!).

During the last Google Summer of Code, the Cython team implemented a friendly interface between Cython and NumPy. This means that you can code your algorithms in Python, but still have the speed benefits of C. A number of posts above refer to plotting in 3D. We are always glad for suggestions on how to improve SciPy, so if you do try it out, please join the mailing list and tell us more about your experience.

You should probably add GenStat to your list; it is a UK package specialising in the biosciences. Nelder was the actual originator of GenStat. GenStat is also free for teaching world-wide and free for research to the developing world. I hope this helps. I do production work in SAS and mess around (test new stuff, experimental analyses) in R.

We have been using it for years and it is absolutely critical to our business model. SAS people find R obtuse with poor documentation, and the R people say the same about SAS (myself included). SAS seems a bit better with large data sets. I think rpy2 is a great addition to Python and works quite well. For some graphs I actually prefer matplotlib to R. It is developing a programming language that seems useful. One real disadvantage in my book is its ability to hold only one dataset at a time, as well as a limit on the data size.

Syntactically I find R more intuitive for modeling though that is the lineage I grew up with. The other major disadvantage of matlab is distribution of programs, since Matlab is expensive.

The same complaint for SAS, as well: R, SAS, SPSS, etc. In large-scale production, SAS is second to none. R is great for simulation, exploratory data analysis, and graphics. For example, I was able to quickly write, debug, and validate an unconventional ROC analysis based on a paper I read.

As another example, bootstrapping analyses are much easier in R than SAS. If I could afford to shell out the money for a non-educational license, I would just to have it around for the tasks it does really well, like symbolic manipulation.
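The kind of bootstrap the commenter means is a few lines in any vectorized language. A minimal percentile-bootstrap sketch in Python/numpy, on toy data rather than any real analysis:

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=5.0, scale=2.0, size=500)   # toy sample

# Percentile bootstrap for the mean: resample rows with replacement,
# recompute the statistic B times, then read off the 2.5/97.5 quantiles.
B = 2000
idx = rng.integers(0, len(data), size=(B, len(data)))
boot_means = data[idx].mean(axis=1)
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(round(lo, 2), round(hi, 2))  # interval brackets the sample mean
```

Swapping in a different statistic (median, a regression coefficient, an ROC area) only changes the one line that computes `boot_means`.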

SPSS has for several years been offering smooth integration with both Python and R. There are extensive APIs for both.

Check out the possibilities at http: See also my blog at insideout. You can even easily build SPSS Statistics dialog boxes and syntax for R and Python programs. DevCentral has a collection of tools to facilitate this. To me, the only reason for using SAS is its large-data ability. From day one, it trains its users to be third-rate programmers. The learning curve for SAS is actually very steep, particularly for a very logical person. The SAS company has existed for at least 30 years.

How can this basic programming rule still not be implemented after 30 years?! Also, if a variable is uninitialized, SAS will still let the code run. At one company I worked for, this simple, stupid SAS design flaw cost our project 3 weeks of delay (there was one uninitialized variable among 80k lines of log, all blue). A couple of PhDs on the project who used C and Matlab could not believe SAS would make such a stupid mistake. Yes, with great disbelief, it did!

My ranking is that Matlab and R are about the same; Matlab is better at plots most times, R is better at manipulating datasets. After taking cost into account, the answer is more obvious. SAS was not designed by a language maven, the way Pascal was.

SAS clearly has cruft that shows its growth over time. Sort of like the UNIX tools, S, and R, actually. And, really, what competent programmer would ever use a variable without initializing or testing it first? For reference, PDL (Perl Data Language) can be found at pdl.

Have you seen Resolver One? This means that numpy mostly works, and you can have one cell in the grid hold a complete dataset, then manipulate that dataset in bulk using spreadsheet-like formulae. Someone has also just built an extension that allows you to connect it to R, too. In theory, this means that you can get the best of all three (spreadsheet, numpy, and R) in your model, using the right tool for each job.

Being the resident MATLAB enthusiast in a house built on another tool, I will pitch in my two cents by suggesting another spectrum along which these tools lie: a tool like MATLAB, which provides some but not complete direct statistical support, is somewhere in the middle. The trade-off here, naturally, is the ability to customize analysis vs. convenience. Most of the users on this post are biased towards packages like R rather than packages like SAS, and I want to offer my perspective on the relative advantages and disadvantages of SAS relative to R.

I am primarily a SAS user (over 20 years) who has been using R as needed (a few years) to do things that SAS cannot do (like MARS splines), or cannot do as well (like exploratory data analysis and graphics), or requires expensive SAS products like Enterprise Miner to do (like decision trees, neural networks, etc.).

I have worked primarily for financial services (credit card) companies. I have read huge compressed files of online data (DoubleClick), having over 2 billion records, using SAS, to filter the data and keep only the records I needed. Each of the resulting SAS datasets was anywhere from 35 GB to 60 GB in size. As far as I know, no other statistical tool can process such large volumes of data programmatically.

First we had to be able to read in the data and understand it.

Sampling the data for modeling purposes came later. I would run the SAS program overnight, and it would generally take anywhere from 6 to 12 hours to complete, depending on the load on the server. In theory, any statistical software that works with records one at a time should be able to process such large volumes of data, and maybe the Python based tools can do this. I do not know as I have never used them.
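The record-at-a-time idea is easy to illustrate with Python's standard library. This sketch filters a compressed file in a single streaming pass, with synthetic rows standing in for the real click logs; memory use is constant no matter how large the file grows:

```python
import csv, gzip, tempfile, os

# Write a compressed file of fake records, then filter it one row at a
# time, constant memory regardless of file size, like a SAS data step.
path = os.path.join(tempfile.mkdtemp(), "records.csv.gz")
with gzip.open(path, "wt", newline="") as f:
    w = csv.writer(f)
    w.writerow(["user_id", "clicks"])
    for i in range(10_000):
        w.writerow([i, i % 7])

kept = 0
with gzip.open(path, "rt", newline="") as f:
    for row in csv.DictReader(f):   # streams; never loads the whole file
        if int(row["clicks"]) >= 5:
            kept += 1
print(kept)  # 2856 rows pass the filter
```

The filtered rows would normally be written out to a second file as they pass, which is exactly the read-filter-write pattern described for the DoubleClick job.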

But I do know that R, and even tools like WEKA, cannot process such volumes of data. Reading the data from a database using R can mitigate the large-data problems encountered in R (as does using packages like biglm), but SAS is the clear leader in handling large volumes of data. Languages like R, however, do not need anything like SAS macros: they can achieve the same results in one logically consistent programming language, and do more, like enabling R users to write their own functions.

The SAS equivalent of writing functions in R is to program a new proc in C and know how to integrate it with SAS: an extremely steep learning curve. SAS is more of a suite of products, many of them with inconsistent programming constructs (base SAS is totally different from SCL, formerly Screen Control Language but now SAS Component Language, and proc SQL and proc IML are different from data step programming).

For the business world this is generally adequate. I foresee increased use of R relative to SAS over time, as many statistics departments at universities have started teaching R (sometimes replacing SAS with R), and students graduating from these universities will be more conversant with R, or equally conversant with both SAS and R.

Many of these students entering the workforce will gravitate towards R, and to the extent the companies they work for do not mandate which statistical software to use, the use of R is bound to increase over time. The amount of data is also starting to grow, thanks to the internet, scanners used in grocery chains, etc. But not every organization works with such large datasets.

For someone who has started their career using SAS, SAS is more than adequate to solve all problems faced in the business world, and there may seem to be no real reason, or even justification, to learn packages like R or other statistical tools. To learn R, I have put in much personal time and effort; I do like R, and I foresee using it more frequently over time for exploratory data analysis, and in areas where I want to implement cutting-edge methodologies and am not hampered by large-data issues.

For those who do not currently use R, it would be wise to start doing so, as R is going to be more widely used over time. The number of R users has already reached critical mass, and since R is free, this is bound to increase the usage of R as the R community grows. Furthermore, the R Help Digest, and the incredibly talented R users that support it, is an invaluable aid to anyone interested in learning R.

Your commentary in the blog brenacon. I developed an algorithm using Excel. Some values are 16 to 20 decimal places, but Excel calculates using fewer significant digits, so an equation generating the correct value in cell E9 will give an incorrect value in any cell entered thereafter. My programming experience is limited (Fortran WATFIV, Pascal). I considered hiring a programmer from a local college or university. Your suggestion would be appreciated. I am a junior SAS user with only 3 years' experience.
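The Excel behavior described above is a consequence of IEEE-754 double precision, which carries only about 15-17 significant decimal digits, so 16-20 decimal-place inputs lose information on entry. A quick Python demonstration (the same doubles underlie Excel and most stats packages):

```python
# Doubles cannot represent most decimal fractions exactly.
x = 0.1 + 0.2
print(x)            # 0.30000000000000004
print(x == 0.3)     # False

# For exact decimal arithmetic, use an arbitrary-precision type instead.
from decimal import Decimal, getcontext
getcontext().prec = 25
d = Decimal("0.1") + Decimal("0.2")
print(d == Decimal("0.3"))  # True
```

Any language with a decimal or rational type sidesteps the problem; within Excel the usual workaround is rounding at a chosen significance rather than comparing raw values.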

I am a big fan of Stata over SAS for medium and small businesses. I use Stata a LOT for economics; it has all the most-used predictive methods (OLS, MLE, GLS, 2SLS, binary choice, etc.) built in. I think a model would have to be pretty esoteric not to be found in Stata.
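For comparison, the most basic of those estimators, OLS with an intercept, is a few lines in any matrix language. A Python/numpy sketch on synthetic data (the coefficients 2.0, 1.5, -0.7 are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# True model: y = 2.0 + 1.5*x1 - 0.7*x2 + noise
y = 2.0 + 1.5 * x1 - 0.7 * x2 + rng.normal(scale=0.1, size=n)

X = np.column_stack([np.ones(n), x1, x2])   # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta.round(2))  # close to [2.0, 1.5, -0.7]
```

What Stata adds on top of the raw solve is the surrounding machinery: standard errors, diagnostics, robust variance options, and so on, which is precisely why canned packages stay attractive.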

I ran Stata on a Linux server with 16GB of RAM and about 2TB of disk storage. I would not recommend using virtual memory for Stata.

That said, you can stick a lot of data in 16GB of RAM! If I pay attention to the variable sizes (keep textual ones out), I can get hundreds of millions of rows into memory. The GUI is probably the best feature. You do the math. At that time I felt Stata was the superior product. Finally, a direct Stata vs SAS comparison! I remember working with ops guys who would consider a server as good as dead once it started using swap. All programs that effectively use hard disks have custom code to control when to move data on and off the disk.
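The "you do the math" arithmetic is straightforward. A sketch with hypothetical variable sizes (6 numeric variables at 8 bytes each is an assumption for illustration, not anyone's actual dataset):

```python
# Back-of-the-envelope: how many rows fit in 16 GB of RAM?
ram_bytes = 16 * 1024**3
row_bytes = 6 * 8            # e.g. 6 numeric variables at 8 bytes each
rows = ram_bytes // row_bytes
print(rows)                  # roughly 358 million rows
```

Add one 50-byte string column and the row width nearly triples, which is why the commenter keeps textual variables out of memory.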

Disk seeks and reads are just too slow and cumbersome compared to RAM to have the OS try to automatically handle it. This would be my guess why SAS handles on-disk data so well — they put a lot of engineering work into supporting that feature. Same for SQL databases, data warehouses, and inverted text indexes.


Or the widespread popularity of Memcached among web engineers. R, Matlab, Stata and the rest were originally written for in-memory data and still work pretty much only in that setting. Tape, of course, is even further along the size-vs-latency spectrum than RAM or hard disk. Found linked from a comment on John D. Cook's blog. I feel so old. Been using SAS for many years. But what the hell is this R??????

Great comparison of SPSS, SAS, and Stata by Acock (a summary of his findings here: http:). SPSS also charges an annual license. It can read Excel files directly. It used to have nicer graphs and charts than Stata, but see below. I dislike how SPSS handles missing values. Unlike SPSS, Stata sells a one-time license. This may be an unfair generalization, but my personal observation is that SPSS users (within the social sciences, at least) tend to have less quantitative training than Stata users.

Probably highly correlated with the GUI vs. CLI orientations of the two packages (although each of them allows for both). I usually tell my Intro to Research Methods students (for the social sciences) that…

I got burned by this once, back in my first year of grad school. I began programming in SAS every day at a financial exchange. SAS also translates most any text value into data, and supports custom input and output formats. It is meant to simplify the processing of large amounts of data for non-programmers. SAS used to have incredible documentation and support, at incredibly reasonable prices.

Unfortunately, the new generation of programmers and product managers have lost their way, and I agree that SAS has been becoming a beast. EG, unfortunately, is written in .NET. I would have preferred it being written in Java so that the interface was more portable and supported a better threading model. One of the better features of SAS is that it is not a line-by-line interpreted programming language, and has been that way from the start.

Basically, a block of code is read, compiled, and then executed. This is why it is so efficient at processing huge amounts of data. Some place I read a complaint about SAS holding only one observation in memory at a time. In fact, SAS holds one or more blocks of observations (records) in memory at a time.
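The block-buffered model described here can be mimicked in a few lines of Python (a sketch of the idea, not SAS's actual buffering code):

```python
from itertools import islice

def blocks(iterable, size):
    """Yield lists of up to `size` records: SAS-style buffered reading."""
    it = iter(iterable)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block

records = range(10)          # stand-in for rows streamed from disk
sizes = [len(b) for b in blocks(records, 4)]
print(sizes)  # [4, 4, 2]
```

Tuning `size` is the analogue of configuring how many observations SAS keeps in memory: bigger blocks mean fewer I/O operations at the cost of more RAM per pass.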

The number held is easily configurable. Each observation can be randomly accessed, whether in memory or not. Originally SAS did not support custom functions, SAS wanted to write them for you. The most unfortunate thing about SAS currently is that it has such a long legacy on uniprocessor machines, that it is having difficulty getting going in the SMP world, being able to properly take advantage of multi-threading and multi-processing. I believe this is due to lack of proper technical vision and leadership.

On a side note: macros do have a place in programming. There have been a few times I wished Java supported macros (and not just assertions), out of my own laziness. I am a firm believer in the right tool for the job, and that not everything is a nail, so I need more than just a hammer.

But, I still also program in Java, C, ksh, VBScript, Perl, etc. One thing you have to consider is that using SciPy, you get all of the Python libraries for free. That includes the Apache Hadoop code, if you choose to use that. And as someone above pointed out, there is now parallel processing built right in in the most recent distributions (for MPI or whatever), but I have no personal knowledge of that.
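The built-in parallelism mentioned above can be sketched with Python's concurrent.futures. Threads are used below so the example runs anywhere; ProcessPoolExecutor exposes the same API and is what CPU-bound numeric work actually wants:

```python
from concurrent.futures import ThreadPoolExecutor

def chunk_sum(bounds):
    """Sum of squares over [lo, hi): one independent unit of work."""
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

# Split one big reduction into independent chunks and farm them out.
chunks = [(i, i + 25_000) for i in range(0, 100_000, 25_000)]
with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(chunk_sum, chunks))

print(total == sum(i * i for i in range(100_000)))  # True
```

The same split-map-reduce shape is what MPI and Hadoop generalize across machines rather than cores.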

If you can get other engineers using your analysis, without sending them a excel spreadsheet, or a. On high quality graphics in R, one should certainly check out the Cara withdraw akun instaforex. Many graphics can be output in hip formats like SVG. Project Gemini sneak preview I doubt this would make Excel the platform of choice for doing anything fancy with large datasets anyways, but I am intrigued.

SAS has remarkably good support for large files even on a modest machine. A few years ago I did a bunch of sims for my dissertation using it, and it worked happily away without so much as batting an eyelash on a crappy four-year-old Windows XP machine.

Also, procs like NLP (nonlinear optimization), NLMIXED, MIXED, and GLIMMIX are really great for various mixed-model applications; this is quite broad, as many common models can be cast in the mixed-model framework. NLMIXED in particular lets you fit some pretty interesting models that would otherwise require special coding.

Stata 11 looks to be even better, as it incorporates procedures such as Multiple Imputation easily. The sheer amount of time spent doing MI followed by logistic regression or whatever is irritating. Stata speeds that up. Also when you own Stata you own it all and the upgrade pricing is quite reasonable.

Tech support is also solid. SPSS has a few gems in its otherwise incomprehensible mass of utter bilge. I love the graphics. Matlab has made real strides as a programming language and has superb numerics in it (or did, at least according to the numerics people I know, including my numerical analysis professor).

However, the Statistics Toolbox is iffy in terms of what procedures it supports, though it might have been updated. Graphics are also good. But it is expensive.

Mathematica is nice for symbolic calculation. Matlab is good for linear algebra and related multivariate stats. I could never get any nice plotting out of it. R is powerful, but can be very awkward.

It can output jpeg, png, and pdf files, and make 3D plots and nice 2D plots as well. Two things put me off it: Mathematica is also able to link to R via a third-party add-on distributed by ScienceOps.

A post on R vs. Also, a discussion looking for solutions that are both fast to prototype and fast to execute. I do not understand why SAS is so much hailed here just because it handles large datasets. Matlab can talk to a MySQL server, and thus I do not see why SAS is needed in this case. I am working with hundreds of experiments, and while these are all extremely small, the issue is that Matlab does not ship with the database toolkit.

It makes it basically impossible to work with it unless you get the toolbox. This is true as well for the Curve Fitting Toolbox. I have used Stata and R but for my purposes I actually prefer and use Mathematica. Unsurprisingly nobody has discussed its use so I guess I will. I work in ecology and I use Mathematica almost exclusively for modeling.

For most people I would say using Mathematica for statistics is way more work than it is worth. Nevertheless, those who already use it for other things may find it is more than capable of performing almost any data analysis you can come up with using relatively little code. The addition of functionality targeted at statistics in versions 6 and 7 has made this use simpler, although the built-in ANOVA package is still awkward and poorly documented.

I am using MATLAB along with SPSS. Does anyone know how to connect SPSS with MATLAB? Or can we use any form of programming? You need software that solves the following equation. It turns out the IMF achieves that by letting every economist work with whatever they want. As a matter of fact, economists end up using Stata. Consider that most economics datasets are smaller than 1 GB. Stata MultiProcessor will work comfortably with up to 4 GB on the available machines.

But most importantly, importing, massaging and cleaning data with Stata is so unbelievably efficient that every time I have to use another program I feel like I am walking knee-deep in mud. The IMF has one copy of SAS that we use for big jobs, such as when I had many GB of data. One big difference between SAS and other programs is that SAS will try to keep working when something goes wrong. Occasionally I use Matlab or Gauss (yes, Gauss!).

No, that was not all; I forgot one thing. Stata can map data using a free user-written add-in (spmap), so you can save yourself the time of learning some brainy GIS package. Does anyone know whether R, SAS, SPSS or other programs can do it?

As far as I can see, the syntax and grammar of R are really stupid. Furthermore, they claim to be very professional and very good at financial data analysis. People in my workplace, an economic research trust, love STATA. Economists love STATA and they ask newcomers to use STATA as well. R is discouraged in my workplace with excuses like "it is for statisticians".

I keep using it and keep discovering new ways of using it. It is just awesome that using R makes me feel bounded by nothing. BTW, does anyone know how to use R to send emails on various OSes (Windows, Mac, Unix, Linux)?
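For what it's worth, in Python the equivalent is a few lines of the standard library. A sketch that builds a message without sending it (the addresses are placeholders, and actually sending requires a reachable SMTP server):

```python
from email.message import EmailMessage
import smtplib  # needed only when actually sending

msg = EmailMessage()
msg["From"] = "me@example.com"        # placeholder addresses
msg["To"] = "you@example.com"
msg["Subject"] = "Simulation finished"
msg.set_content("All runs completed without error.")

# To actually send (requires a reachable SMTP server):
# with smtplib.SMTP("localhost") as s:
#     s.send_message(msg)

print(msg["Subject"])  # Simulation finished
```

R offers similar capability through contributed packages that shell out to or speak SMTP; the pattern of "notify me when the long job finishes" is the same either way.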

I googled a bit and it is not very promising. Any plans to develop a package? What a beautiful world!! I use Matlab and STATA as well, but R completely owns me. Being a bad boy naturally, I have started to encourage newcomers to use R in my workplace. I think it is great to find open-source statistical software. I guess it requires a special mental framework to actually enjoy struggling through the command structure, but if I were 25 years younger… It really is a bugger to find that SPSS (or whatever they like to be called) and R come up with different parameter estimates on the same dataset (at least in the negative binomial model I compared).

Is there anyone out there with experience in comparing two or more of these packages on one and the same dataset? If you ask questions properly there, you will get answers. I would suggest a place to start. I would like to know R-PLUS. Does anyone know where I can get free training for R-PLUS?

And the Matlab culture seems to favor Perlesque obfuscation of code as a value. SAS creeps me out. There are over 2,000 R packages, and it takes a bit of analysis just to decide which of the four wavelet packages you want to use for your project (not just current features, but how well maintained the package appears to be, etc.). There are really three questions to answer here:

SAS seems to excel at data handling, both with large datasets and with wacked proprietary formats (how else can you read a huge text file and merge it with an Access database?). R is awesome because it is a fully featured language (things like named parameters, object orientation, typing, etc.), and because every new data analysis algorithm probably gets implemented in it first these days.

I rather like the graphics. However, it is a mess, with bad naming conventions that have evolved badly over time, conflicting types, etc. Matlab is awesome in its niche, which is NOT data analysis, but rather math modeling with scripts between 10 and a few hundred lines. This is a great compilation, thank you. After working as an econometrics analyst for a while, mainly using Stata, I can say the following about STATA: the big problem is that all together, programming STATA feels like persuading STATA. It is very inconsistent, sometimes very clumsy to handle, and has silly limitations, like string expressions limited to a couple hundred chars, as if it were the early 20th century.

So go with Stata for a little ad hoc statistics, but do not use it for more sophisticated stuff; in that case, learn R! People who use commercial portfolio optimizers are always surprised at how fast the Mathematica optimizations run, faster than their own optimizers. Based on my experience, I can say that Mathematica is great for numerical and ordinary computational tasks. And if you want to use a procedural programming approach, Mathematica supports that.

Regarding some of the other topics discussed above: Mathematica seems to be in an exponential phase of development currently. Stephen Wolfram outlines the development plans every year at the annual user conferences, and his expectations seem to be pretty much on target.

I admit to a major case of hero worship, but I think he is achieving this goal. I am amazed that nobody has mentioned JMP. It is essentially equivalent to SPSS or STATA in capabilities but far easier to use (certainly to teach or learn).

In the comparison I did not see FreeMat. This is an open-source tool that follows along the lines of MATLAB. It would be interesting to see how the community compares FreeMat to Matlab.

The two drawbacks I see: your scripts are not perfectly portable, in either direction, and it even lacked things like support for a code versioning system until recently.

The cost is also a major detractor. At the time decent Lisp implementations required much more resources than our target machines had. We therefore wrote a small scheme-like interpreter and implemented over that.

R is now being applied to much bigger problems than we ever anticipated and efficiency is a real issue. The reason for this is that while Lisp is great for programming it is not good for carrying out interactive data analysis.

That requires a mindset better expressed by standard math notation. We do plan to keep the syntax thin enough that it is possible to still work at the Lisp level. I believe that the use of Lisp syntax was partially responsible for why XLispStat failed to gain a large user community.

The payoff, we hope, will be much greater flexibility and a big boost in performance (we are working with SBCL, so we gain from compilation).

For some simple calculations we are seeing orders of magnitude increases in performance over R, and quite big gains over Python…. I need software that will generate multivariate and logistic regression, and survival curves.
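For the survival-curve item on that wish list, the basic estimator is simple enough to sketch by hand. Below is a minimal Kaplan-Meier product-limit estimate in plain Python; in practice one would use R's survival package or SAS's PROC LIFETEST, and this toy data is invented for the example:

```python
# Hand-rolled Kaplan-Meier product-limit estimator (illustrative sketch only;
# real analyses would use R's survival package or SAS PROC LIFETEST).
def kaplan_meier(times, events):
    """times: observed times; events: 1 = event occurred, 0 = censored.
    Returns a list of (time, survival probability) at each event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e == 1)
        removed = sum(1 for tt, e in data if tt == t)
        if deaths > 0:
            surv *= (at_risk - deaths) / at_risk   # product-limit update
            curve.append((t, surv))
        at_risk -= removed
        i += removed
    return curve

# Toy data: events at t=1 and t=3, censored observations at t=2 and t=4.
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 0]))  # [(1, 0.75), (3, 0.375)]
```

The censored observation at t=2 reduces the risk set without dropping the curve, which is exactly the point of the estimator.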

Visualization is very important. That might be the best option to start with. I am working on my undergraduate degree in statistics in the SAS direction, which has surprised people in the field I meet. The choice was somewhat arbitrary; I just wanted something applied to complement a pure mathematics degree.

This post has opened many …. Thanks for the great discussion. After you peel back all the layers and look at the solution that would require the least effort, the most power, and the greatest flexibility, why would anyone choose anything other than RPy first, with the language du jour that your employer uses second as a backup, and scrap the code war?

I mean I plan on learning Java, PHP and SAS as well, but that is really a personal choice.

So for me, it is a necessity. But the flip side is, wanting to make the leap into Financial Engineering after completing a doctorate in Engineering, RPy has also become a near necessity. Realistically, unless you just like coding, I have to say that what I have suggested makes the most sense for the average analysis pro.

And never as fast as a professor and a grad student writing an academic implementation of brand-new methodology. Both R and SAS are here to stay, and finding ways to make them work better with each other is in the best interests of our customers.

You can also exchange data between SAS and R as data sets or matrices. In the future, users will be able to interface with R through the IML procedure. I am also amazed how few people here have said anything about StatSoft Statistica. No package other than SAS can load unstructured data like the blog comments posted here, analyze it, and extract the sentiment (positive, negative, neutral) about each of the packages debated here with pretty decent precision — ever since SAS bought Teragram a few years ago.
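To make the sentiment claim concrete, here is a deliberately naive lexicon-based sketch in Python. Real systems such as SAS's Teragram-based text analytics are far more sophisticated; the word lists below are made up purely for illustration:

```python
# Naive lexicon-based sentiment scoring: a toy illustration of the kind of
# text analytics described above. The word lists are invented for the example.
POSITIVE = {"great", "powerful", "elegant", "fast", "free"}
NEGATIVE = {"expensive", "outdated", "weird", "slow", "horrible"}

def sentiment(text):
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("R is powerful and free"))         # positive
print(sentiment("SAS is expensive and outdated"))  # negative
```

Production systems add tokenization, negation handling, and trained models on top of this basic counting idea.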

A post above commented: In other words, it takes a mountain of paperwork to merely add one field to a table.

It just kills analysis. Generally DBAs are loath to install the latter as it is difficult to optimize in the sense of minimizing the drain on operational systems. On IBM mainframes the choice of languages to use is limited and by default this will usually be SAS.

Most large organisations have SAS, at least Base SAS, installed by default because the Merrill MXG capacity planning software uses it. Hence cost is sort of irrelevant. It then tends to be used for anything requiring processing of text files, even in production, and this often means processing text as text (e.g. JCL with date dependent parameters) rather than preparing data for loading into SAS datasets for statistical analysis.

I know nothing about R, but seeing a few code samples it struck me how it resembled APL, to which we were introduced in our stats course in college in the early 70s; not surprising, as both are matrix oriented.

Definitely not a programmer, and currently working in health research, where the choice locally is largely SAS or Stata, and I think this choice is generally driven by what package is used by the people you work with. Then I use SPSS and EpiInfo a little for teaching or giving to students who are doing one-off projects and so on. I highly recommend anyone interested give it a try (30 day demo). Advantages — easy to learn, powerful built-in functions, publication quality graphing (much easier than MATLAB), stats functions included, unbelievable developer and community support; Disadvantages — not as powerful or fast with matrices; Open source — no; Typical users — scientists, engineers.

We should include some structured programming languages, i.e. LISP and Haskell; it will be interesting to hear others' opinions on these languages.

I have used all of these programmes — and quite a few more — over the last 30 odd years.

What one uses tends to reflect personal history, intellectual communities, cost, etc but there are various points not highlighted in the discussion.

They have evolved into tools for analysing corporate databases, but they are very expensive for ordinary users, and dealing with academic licences is a pain. Increasingly the corporate focus means that they lag behind the state of the art in statistical methods, but there is no other choice when dealing with massive datasets — oh the days when such data sets had to be read from magnetic tapes!

In my view, its scripting language is not as bad as suggested by other comments and there is lots of support for, say, writing your own maximum likelihood routine or Monte Carlo analysis. R has a horrible learning curve but a very active research community, so it is useful for implementations of new statistical techniques not available in pre-packaged form. For many casual users what matters is the existence of a front-end — Rcmdr, GaussX, etc — that takes away the complexity of the underlying program.
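As an illustration of how little code the Monte Carlo side of this takes in any of these environments, here is a stdlib Python sketch estimating pi by simulation (the classic toy example, not any particular commenter's analysis):

```python
import random

# Minimal Monte Carlo sketch: estimate pi by sampling points in the unit
# square and counting how many fall inside the quarter circle.
random.seed(1)
n = 200_000
hits = sum(random.random() ** 2 + random.random() ** 2 <= 1 for _ in range(n))
print(4 * hits / n)  # close to 3.14
```

The same handful of lines translates almost directly into Stata's scripting language, R, or Matlab, which is why Monte Carlo work is rarely a reason by itself to pick one package over another.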

Excel should never be used for any kind of serious statistical analysis. Actually, Statistica, JMP, and similar menu-driven programs fall into the same category. In such cases their primary use will be the extraction and manipulation of data that is voluminous and frequently updated, but not for data analysis.

For anyone thinking about what to use, the key questions to consider are: Are you primarily concerned with data management or data analysis? If data management, then steer clear of matrix-oriented languages, which assume that your datasets are small(ish) and reasonably well organised. On the other hand, R or Matlab are essential if you want to analyse financial options using data extracted from Bloomberg.

Are your statistical needs routine — or, at least, standard within a research community? If so, go for a standard package with a convenient interface and easy learning curve, or the one most commonly used in your community. The vast majority of users can rely upon whatever is the standard package within their discipline or work environment — from econometrics to epidemiology — and they will get much better support if they stick with the standard choice.

How large an initial commitment of time and money do you expect to make? Are you a student or a professional researcher? Partly this is a matter of cost and partly a matter of the reproducibility of research results. Open source and other low cost programs are great for students, but if you are producing research for publication or repeated replication it is essential to have a chain of evidence.

R programs can be checked and reproduced for standard datasets, but even here there is a problem with documenting the ways in which more complex datasets have been manipulated. Personally, I use Stata much of the time, but ultimately the choice of package is less important than good practice in managing and analysing data. That is the one thing about the older packages: they force you to document how your data was constructed and analysed, which is as important as, or more important than, the statistical techniques used (unless you are purely interested in statistical methods).

The comments here have interesting comparisons between Stata and R for economic analysis.

I have only been using SAS for four years. The reason is simply that they needed a SAS user instead of an R user when I was hired here, at a research institute in a hospital. In fact, I had some experience with R in school, even better than my SAS at that time. At the beginning, I thought SAS was very weird, but now I am thinking R is weird. I know it is because I have learned SAS but not R.

Forgive my ignorance, but compared to SAS, R is not well-organized, from packages to documentation, unless you know it very well. Since R is free, R may not be able to get a chance to play a bigger market role before it is completely outdated.

I want to say, SAS is fast. Of course, all final results were output in a nicely formatted Word file without any manual work. The logic of SAS is elegant as well. I believe most of the ambiguity in SAS syntax results from our unfamiliarity and ignorance. As professional software, it is supposed to work in a very reliable way. Being an applied statistician, I appreciate some very delicate and well-written procedures in SAS, including proc logistic, genmod, phreg, nlmixed, optmodel, and mcmc.

I like how they are coded and documented internally. I realize that all these words could be exactly uttered by an experienced R user, SPSS user, Stata user….

Quite a few of the commentators have a background in an academic environment, as do I, where the requirements are different than on the business side. In research, most of the things done are new and are done once, whereas in commerce the matter is usually the automation of certain repeated tasks.

But anyway, the thing I wanted to bring into the discussion is that for lightweight analytics some database systems like PostgreSQL seem to provide built-in tools well comparable to Excel.

And at least certain commercial extensions of PG have a bit more advanced stuff, such as linear regression, built in. Most likely Oracle and some others have all this and even more, but I am not familiar with them. How are you doing with it?
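In fact, stock PostgreSQL ships least-squares aggregates such as regr_slope and regr_intercept. The computation they perform is ordinary least squares on (x, y) pairs, sketched here in plain Python for illustration:

```python
# Least-squares slope and intercept: the same computation as PostgreSQL's
# built-in regr_slope / regr_intercept aggregate functions.
def regr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Perfectly linear toy data y = 2x, so slope 2 and intercept 0.
print(regr([1, 2, 3, 4], [2, 4, 6, 8]))  # (2.0, 0.0)
```

In SQL this would be `SELECT regr_slope(y, x), regr_intercept(y, x) FROM t;`, which is exactly the "lightweight analytics in the database" idea the comment describes.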


R has more than enough power, particularly in the single user setting you are in. At the last ISBE behavioural ecology conference in Perth, the statistical symposium that followed the conference focused entirely on R, including to my memory some nice new routines for MCMC. If you asked me, I would say that R is alive and kicking in science. Well, at least in my corner of it. I am teaching a high school course in introductory statistics. I also want the students to use a relevant tool — meaning something that can be used into college and maybe beyond.

The three choices I am currently thinking about are: I am speechless that you are even putting R and Mathematica there for high school students.

It is the tool your students will be most likely to use in their future careers. In this world, not everyone will become a statistician…. Mathematica is a lot more than statistics, and can be very affordable for educational applications.

You have to teach it as though everybody is going to become a statistician. The goal of a good education is to show people the heights and inspire them to continue on their own, not to teach them what average bureaucrats do with their boring jobs. In that case NOBODY in that class will become a musician. Mathematica has huge numbers of teaching modules, as well as effective ways (notebooks) for the students to communicate their results.

Students can even create workbooks to teach a concept to their classmates by writing neat Mathematica widgetry that does not require programming knowledge. Second, R, being a real programming language, can help them develop logical thinking skills in addition to their stats skills.

Again, strange for me that no one mentions SAS OnDemand for Academics: SAS offers completely free access to its flagship products (Enterprise Guide, Enterprise Miner, Forecast Server) for academics, both professors and students. It can be used not only for teaching but also for research purposes, free of charge.

I have been using SAS for more time than I care to admit, but still only one digit when you write it in hexadecimal. SAS, in many ways, is not a real programming language. This frustrated me to no end when I started with it, and that was before the era of ODS Statistical Graphics, which makes producing graphics as easy as producing any other kind of tabular output. After all, if you are in a data analysis and reporting position and you have a real programming language, you also need incredible discipline to keep everybody rowing in the same direction.

SAS is designed to keep you from programming, by using pre-written procedures with lots-and-lots-o-options, as was pointed out earlier. Yet for a data analysis system, it has an extremely coherent way of carrying results from one step of the analysis to the next, and it is restrictive enough that people can generally pick up SAS quickly. Something is not outdated just because you say it is.

What features, or lack thereof, make it outdated? Remember again that SAS was never intended to be a full programming language in the sense that most professional programmers think of the term.

Therefore, it should not be compared with things like Python, or even with your greatest programming language ever invented. But it is misleading just to throw out a figure without stating what it is based upon. SAS and JMP prices are generally available at http: Prices for SAS and JMP are NOT available on the websites.

At this time, R and Python used together give the most power and possibilities. We need both at this time. Excel with VBA macros is a necessity at the lower stratum. I am a SAS user, so my comments only relate to SAS.

I think there is a lot of misinformation about SAS. For example, SAS has many tools that enable you to never have to use their programming language (JMP, Enterprise Guide, Enterprise Miner, etc.). Also, SAS has really focused a lot of their efforts on industry solutions. It is more of a point-and-click environment for the different types of analytics needed. Now, I might be biased because we are a SAS shop, but SAS is definitely not as archaic and unusable as some people make it sound.

I only have extensive experience with MATLAB, but it is very nice for the analyses that I typically do. Most of my projects last a few months at a time, and subsequent projects are usually too different to make much code reuse practical.

The MATLAB error messages and help files are excellent. As for the cost, a week or two shaved off of analysis SW development due to the good documentation and strong user community pretty much pays for the license. Here are the key features that I use: One day, I hope to become a real programmer with a pony tail and a deep-seated disdain for a handful of shameful programming practices, but until then, MATLAB will keep helping me get work done quickly. I use STATA for most of my statistical estimations, but now I need to do something that is not implemented yet in STATA.

So I plan to do it in MATLAB. One of the steps involves estimating a VAR(p) model. I can do it in STATA and get results. The results are similar, but statistically different! That is, the coefficients given by STATA are outside the confidence intervals provided by MATLAB.
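For what it's worth, the core of a VAR(p) fit is just per-equation least squares on lagged values, so discrepancies between packages usually trace to details such as trend terms or small-sample degrees-of-freedom corrections rather than the regression itself. A minimal numpy sketch of a VAR(1) on simulated data (everything here is invented for illustration):

```python
import numpy as np

# A VAR(1) is per-equation OLS of y_t on [1, y_{t-1}]. Package-to-package
# differences usually come from trend handling and df corrections on the
# standard errors, not from this core regression.
rng = np.random.default_rng(0)
A = np.array([[0.5, 0.1],
              [0.0, 0.4]])            # true coefficient matrix (assumed)
y = np.zeros((500, 2))
for t in range(1, 500):
    y[t] = A @ y[t - 1] + rng.normal(scale=0.1, size=2)

X = np.hstack([np.ones((499, 1)), y[:-1]])   # regressors: constant + one lag
B, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
A_hat = B[1:].T                              # recovered coefficient matrix
print(np.round(A_hat, 2))
```

Comparing a hand-rolled fit like this against both STATA and MATLAB output is a quick way to see which package's conventions differ from the plain OLS baseline.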

One place where I, as a non-statistician working in biology (ecology), got results fairly quickly and in a way I could actually understand was using the Resampling Stats add-in for Excel (written in VBA, I think).

Is there anything similar for Gnumeric or LibreOffice Calc? Just wanted to say that at least for these kinds of non-parametric tests, and for teaching, Excel might have a role to play.
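For readers without the add-in, the resampling idea itself fits in a few lines of plain Python. Here is a toy bootstrap confidence interval for a mean, with made-up data:

```python
import random

# Toy bootstrap, the kind of resampling the Excel add-in performs:
# estimate a 90% confidence interval for the mean by resampling the
# data with replacement many times.
random.seed(42)
data = [4, 8, 6, 5, 3, 7, 9, 5, 6, 4]

means = []
for _ in range(10_000):
    sample = [random.choice(data) for _ in data]  # resample with replacement
    means.append(sum(sample) / len(sample))

means.sort()
lo, hi = means[500], means[9499]   # approximate 5th and 95th percentiles
print(lo, hi)
```

The appeal the commenter describes holds here too: no distributional assumptions, just counting what happens under resampling.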

I have used R, Octave and SAS. While SAS has a lot of weaknesses (like a patchwork of language syntaxes), it can easily handle large data sets. Here is an example of where SAS is king: this was all messy data, i.e. I was able to quickly switch between SAS data steps and PROC SQL to get things working. SAS has the flexibility to force things together that I think is hard to find elsewhere. Finally I clicked run and went home for the weekend. If there is a problem with the code, SAS will email me.
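The data-step-plus-SQL workflow described above has a rough Python analogue using the stdlib sqlite3 module; the table and the messy input below are invented for the sketch:

```python
import sqlite3

# Rough analogue of mixing SAS data steps with PROC SQL: clean messy rows
# procedurally, then aggregate them with SQL, all in the Python stdlib.
raw = ["acme, 1200", "ACME ,300", "globex,  950"]  # messy text input (invented)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (firm TEXT, amount INTEGER)")
for line in raw:                                   # the "data step": clean and load
    firm, amount = (f.strip() for f in line.split(","))
    con.execute("INSERT INTO sales VALUES (?, ?)", (firm.lower(), int(amount)))

# The "PROC SQL" step: aggregate with plain SQL.
rows = con.execute(
    "SELECT firm, SUM(amount) FROM sales GROUP BY firm ORDER BY firm"
).fetchall()
print(rows)  # [('acme', 1500), ('globex', 950)]
```

It is not a substitute for SAS's scale, but it shows the same switch between row-by-row cleaning and set-based SQL in one script.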

Do that in R. So SAS is powerful; however, it is prohibitively expensive. Even my company (a government contractor) cannot afford it, so we use the clients' copies instead. The programming language sucks, I hate the development environment, and the list can go on.

I think the main advantage SAS has is its ability to handle big datasets, its hundreds of built-in functions and its SQL pass-through language for manipulating said large data and analyzing it in one place. Other, more open source solutions are eroding these advantages. Pros: flexible, handles lots of data in almost any format on almost any machine, well-built functions, macros that are easy to set up and deploy, PROC SQL!

Cons: confusing multi-thread processing, expensive, developer environment sucks. Are we talking about doing statistics or are we talking about data analysis? I think some people need to let go of their elitist attitudes and be a little more open-minded.

But times have changed recently. I am in a more dynamic environment. I have learned that I need a more flexible environment due to the following scenarios: What do I do then? Call my IT department? Unfortunately, scenarios like these call for a lower-level programming language. But who wants to go that route?! So I thought… But I discovered Python, and I am glad that I did.

It is a relatively simple language to learn. Python is truly a hidden gem. With Python coupled with Pandas and Matplotlib, I have the best of all worlds. I can choose the right tool for the right job, and yes, that includes using Excel too if I have to. If you have an inclination toward learning programming, I would definitely give Python a try and check out Pandas and Matplotlib.
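As a small taste of that pandas workflow (assuming pandas is installed; the column names and numbers here are invented for the example):

```python
import pandas as pd

# Load, group, and summarize in a few lines: the pandas workflow praised
# above. Column names and values are invented for the example.
df = pd.DataFrame({
    "region": ["east", "west", "east", "west", "east"],
    "sales":  [100, 250, 175, 90, 300],
})
summary = df.groupby("region")["sales"].agg(["sum", "mean"])
print(summary)
# summary.plot.bar() would chart this, given Matplotlib is installed.
```

The same group-and-summarize step is a pivot table in Excel and a PROC MEANS with a CLASS statement in SAS, which is the commenter's point about choosing the right tool per job.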

Otherwise, I would stick with domain specific languages like MATLAB, SAS, SPSS, Stata, etc. I am wondering if the Statistics Toolbox is going to be sufficient for the things I need to do with it, or whether I am going to have to take the plunge with R.

SAS is for rich firms; R is for ALL firms (if smart), because it is free. SAS is outdated; R is updated. SAS grows slowly, with annual updates (that are also not free); R grows exponentially, with several newly tested and validated packages, all FREE.

But SAS did two smart things in their product strategy: If a firm is smart, then they use the right tool for the job. Smart firms NEVER base decisions solely on whether something is free or not.

They use multiple criteria in decision making. SAS is not outdated. Usually people who make these statements have NO clue what SAS is, have never used it, or are biased. I do this because I use the right code for the job.

If you have the SAS server to deal with, yes, it will be much more complex to install. But standalone SAS Base and EG are a breeze to install. SAS Base and R are comparable; each requires knowing the SAS or R language and having some statistics background. You can run SAS as a standalone desktop version. R has only recently been able to handle the amounts of large data that SAS has been able to handle for years.

By the way, the support you get from SAS (the company) is hands down the best support you will ever get from any company. With R, you rely on yourself or the community, although you can get support from companies like Revolution that offer commercial R support.

Have you guys heard about IGOR Pro? I am by no means an expert user, and I am not sure if the following list holds true for most people. Anyway, here are the pros and cons in my personal opinion:

Pros: publication quality graphs; extremely easy to turn data analysis routines into GUIs; clear programming language; excellent documentation; runs on Mac and Win. This program is used extensively in an experiment I used to work for.

First, avoid SPSS and Stata at all cost. Use SAS only if you are absolutely forced to. No other reasons to. Excel is fine for extremely simple things. Want to add two columns of data that are already in Excel? Want to sum a column? Sure, do it in Excel. Excel is only useful in that it allows you to look at the data. If you add a new column or define a column based on a formula, then it is fairly useful, because these things persist.

Anyone can look at your spreadsheet and tell what you are up to. Excel is therefore useful for calculations that often occur in business, especially when the amount of data is small. Beyond that… move on. Do you need a real general purpose programming language with an enormous set of libraries to analyze your data? That is, is it important that your analysis is integrated into a larger system that needs to do a lot more than data analysis? If so, you want something like Python.
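For what it's worth, the spreadsheet column arithmetic mentioned above (summing a column, say) is also only a few lines of stdlib Python once the data is exported to CSV; the data here is inlined for the example:

```python
import csv
import io

# Excel-style column work without Excel: sum a "price" column from CSV text
# using only the stdlib. The data is inlined; normally you'd open a file.
csv_text = "item,price\napples,3.50\nbread,2.25\nmilk,1.75\n"
reader = csv.DictReader(io.StringIO(csv_text))
total = sum(float(row["price"]) for row in reader)
print(total)  # 7.5
```

The trade-off the commenter describes is real, though: the spreadsheet version is visible to anyone who opens the file, while the script version scales and integrates.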

Then ask yourself if you are willing to pay a lot of money to develop on a platform that is closed source and expensive and therefore has a small community. If you are not, then avoid matlab or IDL or similar big-ticket packages. You will want to avoid these anyway unless you are working closely with people who are tied to them.

Python is a powerful and fairly easy to learn general purpose programming language. If you only knew one language, Python would probably be the best one to know. The Python data analysis stack: R is also free. While it is true that you can program anything in R (or JavaScript or PHP), it is not really used as a general purpose programming language. It is used for data analysis and statistics, and it is very good for that.

In that sense it feels more like Matlab, IDL, Mathematica, etc. In short, programmers are going to prefer Python, and scientists, engineers and business folk will probably prefer R. R is just simpler to use. Not simple like Excel, but simpler than Python. Installing packages is faster. The platform is not changing rapidly.

There are more good books and free tutorials on getting started. There are more data analysis, machine learning and statistics packages for R. In short, R is just made to be usable by people who may not be real programmers. Yeah, you can program, and may have done so for years, but your focus is not really on software development. If you are a real developer, you are likely to find R (and certainly matlab) pretty awkward languages.

However, R has many flaws. It handles memory poorly. It is rather awkward and old fashioned in many respects. Integrating it into other programming frameworks is not easy. People do not often deploy R software to clients. R feels a bit old. So I think the choice is simple. Do you want to create and deploy real software, or just analyze data? If the first, go with Python. If you are just analyzing data, not deploying anything, and you are greatly attracted to the number of libraries already written for R, you might prefer to go that way.

In the longer run, I think R will be replaced with something else, perhaps Python, perhaps something else. But R still feels easier to work with and I think most people who are not excellent programmers will prefer it.

I will be glad to learn R if it is in demand. Actually, SPSS and Statistica are the most used programs for statistical analysis in the SMB sector. I see no reason to learn another needless stat package. There is a new kid on the block, Julia (julialang.org). I know it is at an infant stage, but how do its concept and design compare with the more mature big brothers: R, MatLab, SAS, SPSS or even Excel?


We use SPSS on large data-sets in production, with terabytes of data and billions of records. It works well and has proven very stable and cost effective running on large servers over the years.

The user base using the program in this way is small in comparison to SAS. The R and Python add-ins make SPSS a good choice if you have analysts who want to program in something else. SAS, historically in my mind, is less a statistical program than it is a career.

SAS is just shit. Is there an alternative, as Matlab is expensive? Please do give a comparative analysis if you have one. I think PDL pdl.

R is also better and no doubt soaring in popularity. I have found R to be the most robust and amenable tool, and I learned it well before the books were written on the subject, using the free manuals found via Google.

Debugging is easy in R if you have a good design philosophy and spend more time planning and less time coding. In my previous job I had a very basic filter in SAS and pitched it to the SAS consultant whom we were paying big bucks. Not surprisingly, he never got back to me. R frees the hamster from its wheel and leaves the SAS guys in the dust. With packages like swirl, etc., there really is no excuse for not being able to jump into R within 2 hours or less if you have a basic knowledge of object oriented programming.

My website is brenocon. AI and Social Science — Brendan O'Connor. Comparison of data analysis packages: R, Matlab, SciPy, Excel, SAS, SPSS, Stata. Posted on February 23. (Original post comparison table columns: Name, Advantages, Disadvantages, Open source?)

Two big divisions in the table: the more programming-oriented solutions are R, Matlab, and Python.


More analytic solutions are Excel, SAS, Stata, and SPSS. Why is there duplication between numpy and scipy (e.g.)? You can use SAGE or Enthought, but neither is standard yet. In terms of functionality and approach, SciPy is closest to Matlab, but it feels much less mature.

Python is clearly better on most counts. Everyone says SAS is very bad. Matlab is the best for developing new mathematical algorithms. Very popular in machine learning. SPSS and Stata are in the same category: Stata is a lot cheaper than SPSS, people usually seem to like it, and it seems popular for introductory courses. My impression is they get used by people who want the easiest way possible to do the sort of standard statistical analyses that are very orthodox in many academic disciplines.

ANOVA, multiple regressions, t- and chi-squared significance tests, etc. I know dozens of people under 30 doing statistical stuff and only one knows SAS. At that R meetup last week, Jim Porzak asked the audience if there were any recent grad students who had learned R in school. Many hands went up. Then he asked if SAS was even offered as an option. All hands went down. That is, ones that mostly have to stay on disk?

There are a few multi-machine data processing frameworks that are somewhat standard (e.g.). Or quite possibly something else. This was an interesting point at the R meetup. SAS people complain about poor graphing capabilities.

Matlab visualization support is controversial. Matplotlib follows the Matlab model, which is fine, but is uglier than either, IMO. Excel has a far, far larger user base than any of these other options. Most of the packages listed above run Fortran numeric libraries for the heavy lifting. Can anyone prove me wrong? The open-source machine learning toolkits I know of are Weka and Orange. I hear there are zillions of commercial ones too. Jerome Friedman, a big statistical learning guy, has an interesting complaint that they should focus more on traditional things like significance tests and experimental design.

Here is the article that inspired this rant. What do people think? This entry was posted in Best Posts.

That said, these are flaws, but they seem pretty minor to me.

I work for the parallel computing team at The MathWorks.

I will note that no one defended SAS.

StatSoft is the only major package with R integration… the best of both worlds.

In short, I keep both SAS and R around, and use both frequently. I used SPSS a long time ago, and have no interest in trying it again.

This integration is free with SPSS Base.

I used Matlab, R, Stata, SPSS and SAS over the years.

Get off of my lawn!

I now keep my gradebooks in Stata.
