A delimited text file, sometimes called a "csv" file or a flat file, is a very common method for encapsulating a dataset. Almost all systems or other file formats have a way to export data into delimited files, making it an easy and simple way to exchange files between systems. However, despite there being fairly strict standards in place for how a delimited file is to be created, there are differences from one system to the next that require options to be set and sometimes cause issues. This document will explain how to import a delimited text file and the various options available. Please see https://support.lityxiq.com/402593-Potential-Errors-Importing-Numeric-Fields-with-Non-Numeric-Data for other common issues of which you should be aware when importing delimited files.
Note that a delimited file is "just" a file, so it can be stored and retrieved from a variety of types of file locations. For example, delimited files can be placed on FTP sites, in Google Drive, in an Amazon S3 bucket, or in the LityxIQ File Manager (see https://support.lityxiq.com/374367-Using-the-File-Manager for more on the File Manager). Regardless of the location of the file you wish to import, the method and options for importing are the same.
Whether creating a new raw dataset (https://support.lityxiq.com/319229-Create-New-Dataset) or editing an existing one (https://support.lityxiq.com/261835-Define-the-Source-Settings-for-a-Raw-Dataset), the options you will use are on the Data Source tab.
Left Side of Data Source Tab
The left side of the Data Source tab contains settings related to locating the file itself.
Location - As mentioned above, the location of a delimited text file can be the LityxIQ Manager or a Data Connection you have setup. Select the location/connection where the file can be found.
Use Filename Template - In this discussion, we will use the setting "Standard Import - No Template". The other options available are described elsewhere, but in any case do not affect the other settings for the delimited text file.
Filename - When you click the Filename box, a file selection dialog will appear similar to that shown below. LityxIQ will make the connection to the chosen location and display the files and folders it finds. Navigate the files and folders to find the one you want to import. See the https://support.lityxiq.com/374367-Using-the-File-Manager document for more help on navigating the files and folders and selecting a file. The document is specific to the LityxIQ File Manager, but navigation is similar regardless of the location.
- When you select a file, LityxIQ will attempt to determine the optimal setting for other options in this dialog. For example, it will try to determine the type of file and the delimiter and then pre-populate those settings.
Compressed File - LityxIQ supports import of files that are not compressed, or are compressed in a variety of formats.
- Not Compressed - select this option if the file is not compressed (the default).
- Gz, Bz2, or Zip - select one of these options if the delimited text file is stored in these compressed formats.
- Special note - In the case of a zip file, the delimited text file will be embedded as one of potentially many files compressed within the zip file. In this case, additional options will appear.
- You must enter the name of the delimited file within the text file, including a path to that file if the zip contains folders. Note there is a special template that can be used when entering the name of the text file: [@basefilename]. This template code will be replaced with the zip filename minus the ".zip" extension of that name. In the example below, the zip file selected is "V00342etc.zip" and the Filename is entered as "[@basefilename].txt". LityxIQ will translate the filename it attempts to find as "V00342etc.txt". In other words, [@basefilename] is replaced by "V00342etc". This approach is helpful if you want to help automate an import process where the filename within the zip file is always the same as the zip file itself, except with a changed extension.
- If the zip file has been saved with a password, you enter that in the Password box.
- Note that you cannot build the dictionary from the dialog as you would do with a non-compressed file. Simply select to load the file and it will build the dictionary automatically.
Right Side of Data Source Tab
The right side of the Data Source tab relates to options specific to the formatting and structure or nuances of the file and how it is imported.
File Type - Select Delimited/CSV for a delimited text file.
Null String - Enter the string that represents how NULL values are represented in the file. Commonly, delimited files use one of three ways to specify NULL data, shown below. While LityxIQ allows you to specify any string as representing NULL values, many systems that create delimited files will denote them in a few possible ways. For example:
- An empty string (in LityxIQ, do not enter anything in this box)
- The string "NULL"
- The string "\N"
Delimiter - Enter the character that delineates (delimits) fields from one another. Common examples are the comma (,) character (called a CSV file) and the pipe ( | ) character. Another common delimiter is the tab character (such files are often referred to as TSV files). To specify the tab delimiter, enter the two characters Backslash-Lowercase t in the box (\t).
Use Escape Character - Set this checkbox if the file uses the backslash characeter (\) as an escape character. This becomes important in cases where the value within a string field in the delimited file includes the double quote character or the delimiter character. There are two commonly found situations:
- Leave this unchecked (the default) if your system exports data in "standard csv" format (which would "double up" embedded quote characters, and enclose the entire field in quotes if the delimiting character appears within the field).
- In some cases, the file might have been created in such a way that these characters are "escaped" with the backslash, meaning they are preceded in the file by a backslash. For this situation, check this box to have LityxIQ better recognize it.
- In either case, LityxIQ will do its best to resolve conflicts behind the scenes.
Mac Line Endings - Set this checkbox if the file uses mac line endings (the CR character only). This should only be necessary if the file was created on a Mac computer. If you are having issues and suspect that the file was created on a Mac, you can try using this checkbox, or check with the person who created the file. Even without this box checked, LityxIQ will attempt to determine automatically if it is a MAC file.
Header Row Number - Enter the row number of the header row (usually it's in Row 1 which is the default) which has the variable names. Enter 0 if the file has no header row, in which case the variable names will be created automatically using the naming convention "Column_n" with "n" being the column number.
Row Limit- Enter the number of rows to import. Use 0 to import all rows, the default. This is a way to test settings on a much larger file by only importing, say, the first 100000 rows before trying the entire file.
Trim Trailing Spaces - With this box checked, spaces at the end of strings that are imported are automatically removed being stored in the LityxIQ dataset.