Excel and CSV files - Are there different types of *.csv files?

I've created an *.csv file for importing into an application but the csv file won't import properly.  I have to put quotes around the import fields due to commas being a 'possible' part of an address like "123 E Main St., Apt. 100B".  Without commas the import performs properly.

Is this comment appropriate in this situation:

The problem seems to be that the application can’t import it in a standard txt/csv file. There has to be a code missing somewhere or an extra code somewhere, but I can’t see it. It works great if it reads it as an Excel csv file.

Are all *.csv files created equal or is there a 'special' *.csv from Excel?

Suggestions on a resolution...?

Thanks!

A CSV file is (by strict definition) a TXT file with comma separated values. In reality, there are many different characteres used as the delimiter of the values; a semi-colon (;), tab or a pipe (|) being a couple of more common.

The encapsulator is most commonly a full quote ("). A pair of these are used to wrap text fields that may or may not contain the character used as the delimiter. You have to be careful that a full quote does not appear within the wrapping quotes or the result is malformed garbage.

Other than that, a CSV is a CSV. Different programs will use different rules to import a CSV and you often have to tailor the CSV file to suit. Excel's import from text (Data tab, Get External Data, From Text) allows for various deliminator characters as well as different encapsulator characters. You can also specify text input on a particular field of information so a large number remains as a large number and is not converted to scientific notation.

6 people found this reply helpful

·

Was this reply helpful?

Sorry this didn't help.

Great! Thanks for your feedback.

How satisfied are you with this reply?

Thanks for your feedback, it helps us improve the site.

How satisfied are you with this reply?

Thanks for your feedback.

total1 wrote:

I've created an *.csv file for importing into an application but the csv file won't import properly.  I have to put quotes around the import fields due to commas being a 'possible' part of an address like "123 E Main St., Apt. 100B".  Without commas the import performs properly.
[....]
The problem seems to be that the application can’t import it in a standard txt/csv file. There has to be a code missing somewhere or an extra code somewhere, but I can’t see it. It works great if it reads it as an Excel csv file.

Are all *.csv files created equal or is there a 'special' *.csv from Excel?


There may be several factors in play here.

First, when you do Save As in Excel, you will see several "different" CSV options.  I have not experimented to see their differences.  I suspect (but don't know for sure) that the difference between CSV and CSV(Macintosh) is the line terminator (CR, LF, CR LF or LF CR).  And IIRC, there is only a subtle difference between CSV and CSV(MS-DOS) which probably does not apply to you.  (Do a Google search.)

Second, despite the name (comma-separated values), I believe the actual separator used depends the settings in the Region and Language Option control panel.  The exporting application and importing application must agree.

Finally, there really are no standards for the CSV file format.  There is an "RFC", an attempt to describe a standard.  But it is really post-facto to most implementations.  Moreover, such standards only specify the content of the file (export encoding); they do not impose standards on how to read (import interpretation) the file.

I suggest that you open the CSV file using Notepad to how your data is actually written to the file.

For your example, you should see literally "123 E Main St., Apt. 100B" with the double-quotes.

If you do not, the most likely explanation is that your R&LO settings specify a different list separator than comma.  Excel double-quotes strings only as-needed to avoid confusion with the list separator.

3 people found this reply helpful

·

Was this reply helpful?

Sorry this didn't help.

Great! Thanks for your feedback.

How satisfied are you with this reply?

Thanks for your feedback, it helps us improve the site.

How satisfied are you with this reply?

Thanks for your feedback.

I'll start by agreeing with the others - while CSV started off as a "standard" file format, it has many variants and these days rather than thinking of the CSV as representing "Comma Separated Values", I think of more as "Common Separator for Values".  And as noted the common separator can be almost anything someone wants to code into their file processing, but the most usual are the comma, [tab] character, semi-colon and the pipe character.  Then we get off into text delimiters (' or ") and the aforementioned terminating character(s) for a row/line of data.

 

In Excel, you've got 2 basic ways (with a variation of each for MS-DOS/Windows systems and another for Macs) to create one of these files.  The basic difference is that using Save AS with the CSV option results in a .csv file that uses commas for field separators and double-quotes for text delimiters.  You can also use Save As with the TXT option, which pretty much writes the same type of file, but uses the[Tab] character for the field separator instead of the comma.  This results in a file with a .txt filename extension.

With a 'standard' CSV file, you can open it in Excel simply by identifying it to Excel or double-clicking its filename in your file browser.  Excel automatically attempts to interpret it as a CSV file with commas for field separators and double-quote characters for text delimiters.

 

With a TXT file, you can use Excel's import data from Text file option:

[Data] tab, {Get External Data} group, "From Text" option.  This option gives you the ability to tell excel what the separator character is/are (you can identify more than 1), and what the text delimiter is.

 

A word about the text delimiter:  Normally they are only added to a field when that field contains the separator character itself - as in your address with a comma example.

 

If you have a field that already has double-quotes around it, then when you look at the created CSV or TXT file with Notepad, you'll find that it looks like this in the file:

"""Started with one double-quote at each end, saved with 3 double-quotes at each end!"""

 

ONE QUESTION: when your other application tries to import the file, is it giving you extra fields where you aren't expecting them, or just generating an error telling you that it cannot read the file?

I am free because I know that I alone am morally responsible for everything I do. R.A. Heinlein

Was this reply helpful?

Sorry this didn't help.

Great! Thanks for your feedback.

How satisfied are you with this reply?

Thanks for your feedback, it helps us improve the site.

How satisfied are you with this reply?

Thanks for your feedback.

Here's some stuff you might find helpful in some way:
Change the " text delimiter to a ' in Excel for use in saving .CSV files:

http://www.ozgrid.com/forum/showthread.php?t=134498

 

 

I am free because I know that I alone am morally responsible for everything I do. R.A. Heinlein

2 people found this reply helpful

·

Was this reply helpful?

Sorry this didn't help.

Great! Thanks for your feedback.

How satisfied are you with this reply?

Thanks for your feedback, it helps us improve the site.

How satisfied are you with this reply?

Thanks for your feedback.

This situation is perplexing to me and thanks for your response with some great information.

1) The application generates an error message with the full-quote/double-quote encapsulation of a field (like the address field mentioned at the start of the thread).

2) Looking at the data file in hex format this is what I see for the customer data:

1st row contains the data field names/labels such as Customer No, Account Name, Contact etc

2nd row contains actual customer data.

End of first row contains "0D 0A" <CR LF>

End of second row contains "0D 0D 0A" <CR CR LF>

This file generates the error message.

****************

This file imports successfully for the FIRST data row (second actual row) but fails on the SECOND data row (third actual row) with the first actual row being the field names.

1) End of actual first row contains "0D 0A"

2) End of actual second row contains: "0D"

3) End of actual third row contains: "0D 0D 0A"

Thoughts????

Was this reply helpful?

Sorry this didn't help.

Great! Thanks for your feedback.

How satisfied are you with this reply?

Thanks for your feedback, it helps us improve the site.

How satisfied are you with this reply?

Thanks for your feedback.

You could use your hex editor to replace <CRLF> with <LF>, repeating until there are no more replacements. If the CSV will not import, then replace <LF> with <CRLF> and try again.

There are a number of programs that can accomplish this operation including Notepad++ and even MS Word.

Was this reply helpful?

Sorry this didn't help.

Great! Thanks for your feedback.

How satisfied are you with this reply?

Thanks for your feedback, it helps us improve the site.

How satisfied are you with this reply?

Thanks for your feedback.

 
 

Question Info


Last updated April 18, 2024 Views 29,606 Applies to: