Library Data.table On Mac

One of the easiest and most reliable ways of getting data into R is to use text files, in particular CSV (comma-separated values) files. The CSV file format uses commas to separate the different elements in a line, and each line of data is in its own line in the text file, which makes CSV files ideal for representing tabular data.


The additional benefit of CSV files is that almost any data application supports export of data to the CSV format. This is certainly the case for most spreadsheet applications, including Microsoft Excel and OpenOffice Calc.


In the following examples, assume that you have a CSV file stored in a convenient folder in your file system. To convert an Excel spreadsheet to CSV format, you need to choose File→Save As, which gives you the option to save your file in a variety of formats.

Keep in mind that a CSV file can represent only a single worksheet of a spreadsheet. Finally, be sure to use the topmost row of your worksheet (row 1) for the column headings.

In R, you use the read.csv() function to import data in CSV format. This function has a number of arguments, but the only essential argument is file, which specifies the location and filename. To read a file called elements.csv located at f: use read.csv() with file.path:
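As a minimal sketch of that call: the text's own file would be read with `read.csv(file.path("f:", "elements.csv"))`. Since that file is not available here, the runnable part below writes a tiny made-up CSV (the column names and values are assumptions for illustration only) to a temporary folder and reads it back.

```r
# The call described in the text would be:
#   elements <- read.csv(file.path("f:", "elements.csv"))

# Runnable stand-in: write a small made-up CSV to a temporary folder.
csv_path <- file.path(tempdir(), "elements.csv")
writeLines(c("Atomic number,Name,State at STP",
             "1,Hydrogen,Gas",
             "2,Helium,Gas"), csv_path)

# file is the only essential argument.
elements <- read.csv(csv_path)
str(elements)

# Spaces in the header row become periods in the column names.
names(elements)
```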

R imports the data into a data frame. As you can see, this example has ten observations of nine variables.

Notice that the default option (in versions of R before 4.0.0) is to convert character strings into factors. Thus, the columns Name, Block, State.At.STP, Occurrence, and Description all have been converted to factors. Also, notice that R converts spaces in the column names to periods (for example, in the column State.At.STP).

This default of converting strings to factors when you use read.csv() or read.table() can be a source of great confusion. (Since R 4.0.0 the default is stringsAsFactors=FALSE, so it affects only older versions of R.) You're often better off importing data that contains strings in such a way that the strings aren't converted to factors, but remain character vectors. To import data that contains strings, use the argument stringsAsFactors=FALSE to read.csv() or read.table():
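A short sketch of the stringsAsFactors argument, using a made-up two-column file written to a temporary folder (the file name and contents are assumptions, not the elements.csv file from the text):

```r
# Made-up CSV with string columns.
csv_path <- file.path(tempdir(), "elements_strings.csv")
writeLines(c("Name,Block", "Hydrogen,s", "Helium,s"), csv_path)

# Ask explicitly for character vectors rather than factors.
elements <- read.csv(csv_path, stringsAsFactors = FALSE)
class(elements$Name)  # "character"
```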

If you have a file in the EU (European Union) format (where commas are used as decimal separators and semicolons are used as field separators), you need to import it to R using the read.csv2() function.
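For illustration, here is a sketch of read.csv2() on a made-up file in that format (semicolons between fields, commas as decimal marks). The file name and values are assumptions.

```r
# Made-up EU-format file: ';' separates fields, ',' is the decimal mark.
eu_path <- file.path(tempdir(), "eu_format.csv")
writeLines(c("item;price", "apple;1,25", "pear;0,80"), eu_path)

# read.csv2() uses sep = ";" and dec = "," by default.
prices <- read.csv2(eu_path)
prices$price  # numeric: 1.25 0.80
```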

Manipulate R Data Frames Using SQL

The sqldf() function is typically passed a single argument which is an SQL select statement where the table names are ordinary R data frame names. sqldf() transparently sets up a database, imports the data frames into that database, performs the SQL select or other statement and returns the result using a heuristic to determine which class to assign to each column of the returned data frame. The sqldf() or read.csv.sql() functions can also be used to read filtered files into R even if the original files are larger than R itself can handle. 'RSQLite', 'RH2', 'RMySQL' and 'RPostgreSQL' backends are supported.

Readme

To write it, it took three months; to conceive it – three minutes; to collect the data in it – all my life. F. Scott Fitzgerald

Introduction

sqldf is an R package for running SQL statements on R data frames, optimized for convenience. The user simply specifies an SQL statement in R, using data frame names in place of table names. A database with appropriate table layouts/schema is then created automatically, the data frames are loaded into it, the specified SQL statement is performed, the result is read back into R and the database is deleted. All of this happens behind the scenes, so the database's existence is transparent to the user, who only specifies the SQL statement. Surprisingly this can at times be even faster than the corresponding pure R calculation (although the purpose of the project is convenience and not speed). This link suggests that for aggregations over highly granular columns sqldf is faster than another alternative tried. sqldf is free software published under the GNU General Public License that can be downloaded from CRAN.

sqldf supports (1) the SQLite backend database (by default), (2) the H2 java database, (3) the PostgreSQL database and (4) from sqldf 0.4-0 onwards, MySQL. SQLite, H2, MySQL and PostgreSQL are free software. SQLite and H2 are embedded serverless zero administration databases that are included right in the R driver packages, RSQLite and RH2, so that there is no separate installation for either one. A number of high profile projects use SQLite. H2 is a java database which contains a large collection of SQL functions and supports Date and other data types. It is the most popular database package among scala packages. PostgreSQL is a client/server database and unlike SQLite and H2 must be separately installed but it has a particularly powerful version of SQL, e.g. its window functions, so the extra installation work can be worth it. sqldf supports the RPostgreSQL driver in R. Like PostgreSQL, MySQL is a client/server database that must be installed independently so it's not as easy to install as SQLite or H2 but it's very popular and is widely used as the back end for web sites.

The information below mostly concerns the default SQLite database. The use of H2 with sqldf is discussed in FAQ#10 which discusses differences between using sqldf with SQLite and H2 and also shows how to modify the code in the Examples section to use sqldf/H2 rather than sqldf/SQLite. There is some information on using PostgreSQL with sqldf in FAQ#12 and an example in Example 17. Lag. The unit tests provide examples that can work with all five database drivers (covering four databases) supported by sqldf. They are run by loading whichever database is to be tested (SQLite is the default) and running: demo('sqldf-unitTests')


sqldf is an R package for running SQL statements on R data frames, optimized for convenience. sqldf works with the SQLite, H2, PostgreSQL or MySQL databases. SQLite has the fewest prerequisites to install. H2 is just as easy if you have Java installed and also supports Date class and a few additional functions. PostgreSQL notably supports Windowing functions providing the SQL analogue of the R ave function. MySQL is a particularly popular database that drives many web sites.

More information can be found from within R by installing and loading the sqldf package and then entering ?sqldf and ?read.csv.sql. A number of examples are on this page and more examples are accessible from within R in the examples section of the ?sqldf help page.


As seen from this example which uses the built in BOD data frame:

with sqldf the user is freed from having to do the following, all of which are automatically done:

  • database setup
  • writing the create table statement which defines each table
  • importing and exporting to and from the database
  • coercing of the returned columns to the appropriate class in common cases
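The BOD example referred to above is not reproduced on this page. A minimal sketch of what such a call can look like, assuming the sqldf package is installed (the particular query is an illustration, not necessarily the original one):

```r
library(sqldf)

# BOD is a small data frame that ships with R (columns Time and demand).
# No database setup, create table statement or import step is written here;
# sqldf handles all of that behind the scenes.
late <- sqldf("select * from BOD where Time > 4")
late
```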

It can be used for:

  • learning SQL if you know R
  • learning R if you know SQL
  • as an alternate syntax for data frame manipulation, particularly for purposes of speeding these up, since sqldf with SQLite as the underlying database is often faster than performing the same manipulations in straight R
  • reading portions of large files into R without reading the entire file (example 6b and example 13 below show two different ways and examples 6e, 6f below show how to read random portions of a file)

In the case of SQLite it consists of a thin layer over the RSQLite DBI interface to SQLite itself.

In the case of H2 it works on top of the RH2 DBI driver which in turn uses RJDBC and JDBC to interface to H2 itself.

In the case of PostgreSQL it works on top of the RPostgreSQL DBI driver.

There is also some untested code in sqldf for use with the MySQL database using the RMySQL DBI driver.

To get information on how to cite sqldf in papers, issue the R commands:

If you have not used R before and want to try sqldf with SQLite, google for single letter R, download R, install it on Windows, Mac or UNIX/Linux and then start R and at the R console enter this:
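The console commands themselves are missing from this page. A hedged sketch of a first session might look like the following; the query on the built-in iris data frame is only an illustration.

```r
# Install sqldf (and dependencies such as RSQLite) if it is not there yet.
if (!requireNamespace("sqldf", quietly = TRUE)) install.packages("sqldf")
library(sqldf)

# iris ships with R, so this runs without any further setup.
first_rows <- sqldf("select * from iris limit 5")
first_rows
```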

To try it with H2 rather than SQLite the process is similar. Ensure that you have the java runtime installed, install R as above and start R. From within R enter this, ensuring that the version of RH2 that you have is RH2 0.1-2.6 or later:

sqldf has been extensively tested with multiple architectures and database back ends but there are no guarantees.

Problem is that installer gives message that sqldf is not available

See https://stackoverflow.com/questions/27772756/sqldf-doesnt-install-on-ubuntu-14-04

Problem with no argument form of sqldf - sqldf()

The no argument form, i.e. sqldf(), is used for opening and closing a connection so that intermediate sqldf statements can all use the same connection. If you have forgotten whether the last sqldf() opened or closed the connection this code will close it if it is open and otherwise do nothing:
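The code is missing from this page. A minimal sketch, assuming that sqldf keeps the currently open connection in the "sqldf.connection" option and clears it when the connection is closed:

```r
library(sqldf)

# If a connection is open, the option is non-NULL and sqldf() closes it.
# If no connection is open, nothing happens.
if (!is.null(getOption("sqldf.connection"))) sqldf()
```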

Thanks to Chris Davis https://groups.google.com/d/msg/sqldf/-YAvaJnlRrY/7nF8tpBnrcAJ for pointing this out.

Problem involving tcltk

The most common problem is that the tcltk package and tcl/tk itself are missing. Historically these were bundled with the Windows version of R so Windows users should not experience any problems on this account. Since R version 3.0.0 Mac versions of R also have the tcltk package and Tcl/Tk itself bundled so if you are having a problem on the Mac you may only need to upgrade to the latest version of R. If upgrading to the latest version of R does not help then using this line will usually allow it to work even without the tcltk package and tcl/tk itself:
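The line in question does not appear here; it is presumably the gsubfn.engine option that this document names elsewhere. A minimal sketch, setting the option before the packages are loaded:

```r
# Tell gsubfn to use its pure R engine instead of tcltk.
# Set this before loading sqldf (for example in your .Rprofile).
options(gsubfn.engine = "R")

library(sqldf)
```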

Running the above options line before using sqldf, e.g. by putting it in your .Rprofile, is all that is needed to get sqldf to work without the tcltk package and tcl/tk itself in most cases; however, this does have the downside that it will use the R engine which is slower. An alternative is to rebuild R yourself as discussed here: https://permalink.gmane.org/gmane.comp.lang.r.fedora/235

If the above does not resolve the problem then read the more detailed discussion below.

A related problem is that your R installation is flawed or incomplete in some way and the main way to fix that is to fix your installation of R. This will not only affect sqldf but also many other R packages so information on installing them can also help here. In particular installation information for the Rcmdr package may be useful since it's likely that if you can install Rcmdr then you can also install sqldf.

  • sqldf uses the gsubfn R package which normally uses the tcltk R package which in turn uses tcl/tk itself. The tcltk package is a core component of R so a complete distribution of R should have tcltk capability. For this to happen tcl/tk must be present at the time R itself was built (the build process automatically excludes tcltk capability if it does not sense that tcl/tk is present at the time R itself is built) but it is possible to run gsubfn and therefore also sqldf without tcl/tk present at the time sqldf runs (although it will run slower if you do this). There are three possibilities: (1) tcltk capability absent. If this command from within R capabilities()[['tcltk']] is FALSE then your distribution of R was built without tcltk capability. In that case you must use a different distribution of R. All common distributions of R including the CRAN distribution for Windows and most distributions for Linux do have tcltk capability. Note that a given version of R may have been built with or without tcltk capability so simply checking which version of R you have won't tell you whether your distribution was built correctly. This situation mostly affects distributions of R built by the user or improperly built by others and then distributed. (2) tcl/tk missing on system (a) If your distribution of R was built with tcltk capability as described in the last point but you don't have tcl/tk itself on your system you can simply install tcl/tk yourself. In most cases this is actually quite easy to do; it's typically a one line apt-get on Linux. There is information about installing tcl/tk near the end of FAQ#5 or (b) if your distribution of R was built with tcltk capability as described in the first point but you don't have tcl/tk on your system and you don't want to bother to install it then issue the R command:

In that case gsubfn will use the slower R engine instead of the faster tcltk engine so you won't need tcl/tk installed on your system in the first place. Be sure you are using gsubfn 0.6-4 or later if you use this option since prior versions of gsubfn had a bug which could interfere with the use of this option. To check your version of gsubfn:

  • using an old version of R, sqldf or some other software. If that is the problem upgrade to the most recent versions on CRAN. Also be sure you are using the latest versions of other packages used by sqldf. If you are getting NAMESPACE errors then this is likely the problem. You can find the current version of R here and then install sqldf from within R using install.packages('sqldf'). If you already have the current version of R and have installed the packages you want then you can update your installed packages to the current version by entering this in R: update.packages(). In most cases all the mirrors are up to date but if that should fail to update to the most recent packages on CRAN then try using a more up to date mirror.

  • unexpected errors concerning H2, MySQL or PostgreSQL. sqldf automatically uses H2, MySQL or PostgreSQL if the R package RH2, RMySQL or RpgSQL is loaded, respectively. If none of them are loaded it uses sqlite. To force it to use sqlite even though one of those others is loaded (1) add the drv = 'SQLite' argument to each sqldf call or (2) issue the R command:

in which case all sqldf calls will use sqlite. See FAQ#7 for more info.

  • message about tcltk being missing or other tcltk problem. This is really the same problem discussed in the first point above. Upgrade to sqldf 0.4-5 or later. If it still persists then set this option: options(gsubfn.engine = 'R') which causes R code to be substituted for the tcl code or else just install the tcltk package. See FAQ#5 for more info. If you installed the tcltk package and it still has problems then remove the tcltk package and try these steps again.

  • error messages regarding a data frame that has a dot in its name. The dot is an SQL operator. Either quote the name appropriately or change the name of the data frame to one without a dot.

  • as recommended in the INSTALL file it's better to install sqldf using install.packages('sqldf') and not install.packages('sqldf', dep = TRUE) since the latter will try to pull in every R database driver package supported by sqldf which increases the likelihood of a problem with installation. It's unlikely that you need every database that sqldf supports so doing this is really asking for trouble. The recommended way does install sqlite automatically anyway and if you want any of the additional ones just install them separately.

  • Mac users. According to http://cran.us.r-project.org/bin/macosx/tools/ Tcl/Tk comes with R 3.0.0 and later but if you are using an earlier version of R look at this link.

1. How does sqldf handle classes and factors?

sqldf uses a heuristic to assign classes and factor levels to returned results. It checks each column name returned against the column names in the input data frames and if the output column name matches any input column name then it assigns the input class to the output. If two input data frames have the same column names then this automatic assignment is disabled if they differ in class. Also if method = 'raw' then the automatic class assignment is disabled. This also extends to factor levels so that if an output column corresponds to an input column that is of class 'factor' then the factor levels of the input column are assigned to the output column (again assuming that only one input column has the output column name). Also in the case of factors the levels of the output must appear among the levels of the input.

sqldf knows about Date, POSIXct and chron (dates, times) classes but not POSIXlt and other date and time classes.

Previously this section had an example of how the heuristic could go awry but improvements in the heuristic in sqldf 0.4-0 are such that that example now works as expected.

2. Why does sqldf seem to mangle certain variable names?

Starting with RSQLite 1.0.0 and sqldf 0.4-9 dots in column names are no longer translated to underscores.

If you are using an older version of these packages then note that since dot is an SQL operator the RSQLite driver package converts dots to underscores so that SQL statements can reference such columns unquoted.

Also note that certain names are SQL keywords. These can be found using this code:

Note that using such names can sometimes result in an error message such as:

which appears to suggest that there is no column but that is because it has a different name than expected. For an example of what happens:

3. Why does sqldf('select var(x) from DF') not work?

The SQL statement passed to sqldf must be a valid SQL statement understood by the database. The functions that are understood include simple SQLite functions and aggregate SQLite functions and functions in the RSQLite.extfuns package. Thus in this case in place of var(x) one could use variance(x) from the RSQLite.extfuns package. For SQLite functions see the lists of core functions, aggregate functions and date and time functions.

If each group is not too large we can use group_concat to return all group members and then later use apply in R to use R functions to aggregate results. For example, in the following we summarize the data using sqldf and then apply a function based on var:
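The original code is not reproduced here. The following sketch shows the idea under stated assumptions: DF is made-up data, and the column names members and variance are chosen for illustration.

```r
library(sqldf)

# Made-up data: eight values in two groups of four.
DF <- data.frame(a = 1:8, g = gl(2, 4))

# Let SQLite collect each group's members into one comma-separated string.
out <- sqldf("select g, group_concat(a) as members from DF group by g")

# Split the strings again in R and apply var() to each group.
out$variance <- sapply(strsplit(out$members, ","),
                       function(x) var(as.numeric(x)))
out
```

The variance is computed in R, so any R function could be substituted for var() in the last step.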

4. How does sqldf work with 'Date' class variables?

The H2 database has specific support for Date class variables so with H2 Date class variables work as expected:

In R, Date class dates are stored internally as the number of days since 1970-01-01, often referred to as the UNIX Epoch. (They are stored this way on non-UNIX platforms as well.) When the dates are transferred to SQLite they are stored as these numbers in SQLite. (sqldf has a heuristic that attempts to ascertain whether the column represents a Date but if it cannot ascertain this then it returns the numeric internal version.)

In SQLite this is what happens:

The examples below use RSQLite 0.11-0 (prior to that version they would return wrong answers). With RSQLite it will return the correct answer but Date class columns will be returned as numeric if sqldf's heuristic cannot automatically determine if they are to be of class 'Date'. If you give the output column the same name as an input column which has 'Date' class then it will correctly infer that the output is to be of class 'Date' as well.

Also note this code:

See date and time functions for more information. An example using times but not dates can be found here and some discussion on using POSIXct can be found here.

5. I get a message about the tcltk package being missing.

The sqldf package uses the gsubfn package for parsing and the gsubfn package optionally uses the tcltk R package which in turn uses the string processing language, tcl, internally.

If you are getting errors about the tcltk R package being missing or about tcl/tk itself being missing then:

Windows. This should not occur on Windows with the standard distributions of R. If it does you likely have a version of R that was built improperly and you will have to get a complete properly built version of R that was built to work with tcltk and tcl/tk and includes tcl/tk itself.

Mac. This should not occur on recent versions of R on Mac. If it does occur upgrade your R installation to a recent version. If you must use an older version of R on the Mac then get tcl/tk here: http://cran.us.r-project.org/bin/macosx/tools/

UNIX/Linux. If you don't already have tcl/tk itself on your system try installing it like this (thanks to Eric Iversion):

Also see this message by Rolf Turner: https://stat.ethz.ch/pipermail/r-help/2011-April/274424.html.

In some cases it may be possible to bypass the need for tcltk and tcl/tk altogether by running this command before you run sqldf:

In that case the gsubfn package will use alternate R code instead of tcltk (however, it will be slightly slower).

Notes: sqldf depends on gsubfn for parsing and gsubfn optionally uses the tcltk R package (tcl is a string processing language) which is supposed to be included in every R installation. The tcltk R package relies on tcl/tk itself which is included in all standard distributions of R on Windows and on recent Mac distributions of R. Many Linux distributions include tcl/tk itself right in the Linux distribution itself.

Also note that whatever build of R you are using must have had tcl/tk present at the time R was built (not just at the time it's used) or else the R build process will automatically turn off tcltk capability within R. If that is the case supplying tcltk and tcl/tk later won't help. You must use a build of R that has tcltk capability built in. (If R was built with tcltk capability then adding the tcltk package (if it's missing) and tcl/tk will work.)

6. Why are there problems when we use table names or column names that are the same except for case?

SQL is case insensitive so table names a and A are the same as far as SQLite is concerned. Note that in the example below it did produce a warning that something is wrong although that might not be the case in all situations.

7. Why are there messages about MySQL?

sqldf can use several different databases. The database is specified in the drv= argument to the sqldf function. If drv= is not specified then it uses the value of the 'sqldf.driver' global option to determine which database to use. If that is not specified either then if the RPostgreSQL, RMySQL or RH2 package is loaded (it checks in that order) it uses the associated database and otherwise uses SQLite. Thus if you do not specify the database and you have one of those packages loaded it will think you intended to use that database. If it's likely that you will have one of these packages loaded but you do not want to use that package with sqldf be sure to set the sqldf.driver option, e.g. options(sqldf.driver = 'SQLite').
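Both ways of pinning the database can be sketched as follows. The query on the built-in BOD data frame is only an illustration.

```r
library(sqldf)

# Option 1: set the global option once, e.g. at the top of a script.
options(sqldf.driver = "SQLite")

# Option 2: name the driver explicitly in an individual call.
n_rows <- sqldf("select count(*) as n from BOD", drv = "SQLite")
n_rows
```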

8. Why am I having problems with update?

Although data frames referenced in the SQL statement(s) passed to sqldf are automatically imported to SQLite, sqldf does not automatically export anything for safety reasons. Thus if you update a table using sqldf you must explicitly return it as shown in the examples below.
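The examples are missing from this page. A minimal sketch of the pattern, with a made-up data frame DF: the update and the select are passed together as a vector of statements, and the select reads the updated table back from the database. Depending on the RSQLite version, the update statement may produce a warning about using dbExecute(); the result is returned either way.

```r
library(sqldf)

# Made-up data frame.
DF <- data.frame(a = 1:3, b = c(3, 1, 2))

# Run the update, then explicitly fetch the updated table as main.DF.
updated <- sqldf(c("update DF set b = a where a = 3",
                   "select * from main.DF"))
updated

# DF in the R workspace is unchanged; only the returned copy is updated.
DF
```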

Note that in the select statement we referred to the table as main.DF (main is always the name of the sqlite database). If we had referred to the table as DF (without qualifying it as being in main) sqldf would have fetched DF from our R workspace rather than using the updated one in the sqlite database.

One other problem can arise if the data has factors. Here we would normally get the wrong result because we are asking it to add a value to column b that is not among the factor levels in b but by using method = 'raw' we can tell it not to automatically assign classes to the result.

Another way around this is to avoid the entire problem in the first place by not using a factor for b. If we had defined column b as character or numeric instead of factor then we would not have had to specify method = 'raw'.

9. How do I examine the layout that SQLite uses for a table? which tables are in the database? which databases are attached?

Try these approaches to get the indicated meta data:
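The original commands are not shown on this page. The sketch below uses standard SQLite pragmas and the sqlite_master table. It assumes, as in the update example, that a vector of statements is run in order on one temporary database and that only the last result is returned.

```r
library(sqldf)

# Layout that SQLite uses for the BOD table (one row per column).
layout <- sqldf("pragma table_info(BOD)")
layout

# Tables in the database: load BOD first, then list what is there.
tables <- sqldf(c("select * from BOD", "select * from sqlite_master"))
tables

# Databases attached to the connection.
dbs <- sqldf("pragma database_list")
dbs
```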

10. What are some of the differences between using SQLite and H2 with sqldf?

sqldf will use the H2 database instead of sqlite if the RH2 package is loaded. Features supported by H2 not supported by SQLite include Date class columns and certain functions such as VAR_SAMP, VAR_POP, STDDEV_SAMP, STDDEV_POP, various XML functions and CSVREAD.

Note that the examples below require RH2 0.1-2.6 or later.

Here are some commands. The meta commands here are specific to H2 (for SQLite's meta data commands see FAQ#9):

If RH2 is loaded then it will use H2 so if you wish to use SQLite anyway then either use the drv= argument to sqldf:

or set the following global option:

When using H2:

  • in H2 a column such as Sepal.Length is not converted to Sepal_Length (which older versions of RSQLite do) but remains as Sepal.Length. For example,

Also sqlite orders the result above even without the order clause and h2 translates 'Sepal Length' to Sepal.Length.


  • quoting rules in H2 are stricter than in SQLite. In H2, to quote an identifier use double quotes whereas to quote a constant use single quotes.

  • file objects are not supported. They are not really needed because H2 supports a CSVREAD function. Note that on Windows one can use the R notation ~ to refer to the home directory when specifying filenames if using SQLite but not with CSVREAD in H2.

  • currently the only SQL statements supported by sqldf when using H2 are select, show and call (whereas all are supported with SQLite).

  • H2 does not support the using clause in SQL select statements but does support on. Also it implicitly uses on rather than using in natural joins which means that selected and where condition variables that are merged in natural joins must be qualified in H2 but need not be in SQLite.

The examples in the Examples section are redone below using H2. Where H2 does not support the operation the SQLite code is given instead. Note that this section is a bit out of date and some of the items that it says are not supported actually are supported now.

11. Why am I having difficulty reading a data file using SQLite and sqldf?

SQLite is fussy about line endings. Note the eol argument to read.csv.sql can be used to specify line endings if they are different than the normal line endings on your platform, e.g.

eol can also be used as a component of the sqldf file.format argument.

12. How does one use sqldf with PostgreSQL?

Install 1. PostgreSQL, 2. the RPostgreSQL R package, 3. sqldf itself. RPostgreSQL and sqldf are ordinary R package installs.

Make sure that you have created an empty database, e.g. 'test'. The createdb program that comes with PostgreSQL can be used for that, e.g. from the console/shell create a database called test like this:

Here is an example using RPostgreSQL and after that we show an example using RpgSQL. The options statement shown below can be entered directly or alternately can be put in your .Rprofile. The values shown here are actually the defaults:

For another example using over and partition by see: this cumsum example

Also note that log and log10 in R correspond to ln and log, respectively, in PostgreSQL.

13. How does one deal with quoted fields in read.csv.sql?

read.csv.sql provides an interface to sqlite's csv reader. That reader is not very flexible (but is fast) and, in particular, it does not understand quoted fields but rather regards the quotes as part of the field itself. To read a file using read.csv.sql and remove all double quotes from it at the same time on Windows try this assuming you have Rtools installed and on your path (or the corresponding tr syntax on UNIX depending on your shell):

or equivalently:

Another program to look at is the csvfix program (this is a free external program, not an R program). For example suppose we have commas in two contexts: (1) as separators between fields and (2) within double quoted fields. To handle that case we can use csvfix to translate the separators to semicolons, stripping off the double quotes at the same time (assuming we have installed csvfix and we have put it in our path):

14. How does one read files where numeric NAs are represented as missing empty fields?

Translate the empty fields to some number that will represent NA and then fix it up on the R end.

Another program that can be used in filters is the free csvfix. For example, suppose that csvfix is on our path and that NA values are represented as NA in numeric fields. We would like to convert them to -999 and then later remove them.

Another way in which the input file can be malformed is that not every line has the same number of fields. In that case csvfix pad -n can be used to pad it out as in this example:

15. Why do certain calculations come out as integer rather than double?

SQLite/RSQLite, h2/RH2 and PostgreSQL all perform integer division on integers; however, RMySQL/MySQL performs real division.
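With the default SQLite backend the difference can be seen directly. In this small sketch the column names are chosen for illustration; writing one operand as a real number (7.0) is enough to get real division.

```r
library(sqldf)

# Integer operands give integer division; a real operand gives real division.
division <- sqldf("select 7 / 2 as int_result, 7.0 / 2 as real_result",
                  drv = "SQLite")
division  # int_result is 3, real_result is 3.5
```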

16. How can one read a file off the net or a csv file in a zip file?

Use read.csv.sql and specify the URL of the file:

Since files off the net could have any end of line be careful to specify it properly for the file of interest.

As an alternative one could use the filter argument. To use this wget (download, Windows) must be present on the system command path.

Here is an example of reading a zip file which contains a single file that is a csv:

In the line of code above it is assumed that 7z (download) is present and on the system command path. The example is for Windows. On UNIX use /dev/null in place of NUL.

If we had a .tar.gz file it could be done like this:

assuming that tar is available on our path. (Normally tar is available on Linux and on Windows it's available as part of the Rtools distribution on CRAN.)

Note that filter causes the filtered output to be stored in a temporary file and then read into sqlite. It does not actually read the data directly from the net into sqlite or directly from the zip or tar.gz file to sqlite.

Note: The examples in this section assume sqldf 0.4-4 or later.

These examples illustrate usage of both sqldf and SQLite. For sqldf with H2 see FAQ#10. For PostgreSQL see FAQ#12. Also the 'sqldf-unitTests' demo that comes with sqldf works under sqldf with SQLite, H2, PostgreSQL and MySQL. David L. Reiner has created some further examples here and Paul Shannon has examples here.

Example 1. Ordering and Limiting

Here is an example of sorting and limiting output from an SQL select statement on the iris data frame that comes with R. Note that although the iris dataset uses the name Sepal.Length older versions of the RSQLite driver convert that to Sepal_Length; however, newer versions do not. After installing sqldf in R, just type the first two lines into the R console (without the >):
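The console session is not reproduced on this page. A sketch of such a query, assuming a version of RSQLite (1.0.0 or later) that keeps the dot in Sepal.Length, so the column name has to be double quoted inside the SQL:

```r
library(sqldf)

# Three rows of iris with the largest sepal length.
top3 <- sqldf('select * from iris order by "Sepal.Length" desc limit 3')
top3
```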

Example 2. Averaging and Grouping

Here is an example which processes an SQL select statement whose functionality is similar to the R aggregate function.
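A sketch of a grouped average, again assuming RSQLite 1.0.0 or later. The alias mean_sepal_length is a name chosen for illustration, and the aggregate() call is shown only for comparison.

```r
library(sqldf)

# Average sepal length per species, computed in SQL.
by_species <- sqldf('select Species, avg("Sepal.Length") as mean_sepal_length
                     from iris
                     group by Species')
by_species

# Roughly the same summary in plain R.
aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
```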

Example 3. Nested Select

Here is a more complex example. For each Species, find the average Sepal Length among those rows where Sepal Length exceeds the average Sepal Length for that Species. Note the use of a subquery and explicit column naming:
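One way to write this is sketched below. It is not necessarily the original formulation: the subquery alias m and the column names species_avg and avg_above are chosen for illustration, and RSQLite 1.0.0 or later is assumed.

```r
library(sqldf)

# The subquery m holds one row per species with that species' average.
# The outer query keeps only rows above their species average and
# averages those rows again.
above_avg <- sqldf('
  select iris.Species as Species, avg("Sepal.Length") as avg_above
  from iris
  join (select Species, avg("Sepal.Length") as species_avg
        from iris
        group by Species) as m
    on iris.Species = m.Species
  where "Sepal.Length" > m.species_avg
  group by iris.Species')
above_avg
```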

Note that when this was written PostgreSQL was the only free database that supported window functions (similar to the ave function in R), which would allow a different formulation of the above. For more on using sqldf with PostgreSQL see FAQ#12

which in R corresponds to this R code (i.e. partition..over in PostgreSQL corresponds to ave in R):

Here is some sample data with the correlated subquery from this Wikipedia page:

Example 4. Join

The different types of joins are pictured in this image: i.imgur.com/1m55Wqo.jpg. (SQLite does not support right joins but the other databases sqldf supports do.) We define a new data frame, Abbr, join it with iris and perform the aggregation:

Although the above is probably the shortest way to write it in SQL, using natural join can be a bit dangerous since one must be very sure one knows precisely which column names are common to both tables. For example, had we included the row_names as a column in both tables (by specifying row.names = TRUE to sqldf) the natural join would not work as intended since the row_names columns would participate in the join. An alternate and safer way to write this would be with join and using:

~~~~ {.prettyprint}

sqldf('select Abbr, avg('Sepal.Length
~~~~


Functions in sqldf

Name            Description
sqldf           SQL select on data frames
read.csv.sql    Read File Filtered by SQL
sqldf-package   sqldf package overview

Details

imports chron, DBI
depends gsubfn (>= 0.6), proto, R (>= 3.1.0), RSQLite
suggests MASS, RH2, RMySQL, RPostgreSQL, svUnit, tcltk
Contributors G. Grothendieck
