CDSoft / pp

PP - Generic preprocessor (with pandoc in mind) - macros, literate programming, diagrams, scripts...

Home Page: http://cdelord.fr/pp

Include CSV file as markdown table

duzyn opened this issue

It would be nice if pp could include a CSV file as a markdown table, like iA Writer does.

Markdown tables are painful to write manually. On the other hand, spreadsheet editors like Numbers or Excel make it easy to create tables with complex formulas. A long-time standard for exporting these tables is the comma separated value format (csv). You can compose a table in Excel, and then have all the calculations exported in a plain text csv. And now you can reference a csv file in iA Writer the same way as images:

Using the first prototype simulating said behavior, we knew this was the way forward. It’s hard to describe how transclusion feels other than it just feels right. Beginning with a simple, straightforward syntax kept reaping dividends.

All spreadsheet programs can also export tab separated values (TSV), and those are very easy to convert into Markdown pipe tables using a decent text editor that supports regular expressions, preferably in substitutions as well. The following Perl program "tsv2pipetable.pl" reads a TSV file on stdin, converts it into a pipe table (assuming that the first line is a header row) and writes the pipe table to stdout. You can use it as a filter in pp with:

!exec(perl tsv2pipetable.pl <filename.tsv)

#!/usr/bin/env perl

# tsv2pipetable.pl - convert a file with tab separated values to a Pandoc pipe table
#
# Usage:
#
#     perl tsv2pipetable.pl <filename.tsv >pipetable.md

use strict;
use warnings;
use 5.008005;
use open qw[ :utf8 :std ];

my $colfill = '-' x 5; # for the column spec row

# slurp all lines into an array
my @rows = <>;

# construct the column spec row

## make a copy of the first line
unshift @rows, $rows[0];

## replace column contents in the colspec row with five dashes each
$rows[1] =~ s/[^\t\n]+/$colfill/g;

# process each row

for ( @rows ) {
    ## escape existing pipes if any
    s/\|/\\|/g;
    ## replace tabs with pipes
    s/\t/|/g;
    ## insert a pipe at the start of each row
    s/^/|/;
}

# join the lines and print the pipetable to stdout
print STDOUT join "", @rows;

__END__

I hope this helps.
Just ask if something isn't clear!

That's very helpful. I forgot about the powerful !exec() macro.
I also found some other projects like these:

I'm using csv2md for my writings now with this macro:

\exec(csv2md 1.csv)

Good idea.
To avoid external dependencies I can add a macro \csv that takes the name of a CSV file.

For example:
\csv(file.csv)

Which table format would be best for the output file? Maybe grid tables, to allow multiline cells.

The column format could be deduced from the nature of each column (numbers are right-aligned, text is left-aligned). If the CSV file has no header, one can be given as a parameter.

\csv(file.csv)(field 1 | field 2 | ...) ==> add a header line
\csv(file.csv)() ==> headless table containing all the lines of the csv file
\csv(file.csv) ==> the first line of the csv file is used as the header line
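The alignment rule proposed above (numeric columns right-aligned, everything else left-aligned) can be sketched in a few lines. This is only an illustration in Python, not pp's actual Haskell implementation. The names is_number and column_alignments are made up for this example, and treating a cell as "a number" when float() accepts it is an assumption.

```python
def is_number(cell):
    """Assumption: a cell counts as numeric if Python's float() can parse it."""
    try:
        float(cell)
        return True
    except ValueError:
        return False


def column_alignments(rows):
    """Return 'right' or 'left' for each column of the data rows (header excluded).

    A column is right-aligned only if it has at least one non-empty cell
    and every non-empty cell is numeric.
    """
    ncols = max((len(row) for row in rows), default=0)
    alignments = []
    for i in range(ncols):
        cells = [row[i].strip() for row in rows if i < len(row) and row[i].strip()]
        numeric = bool(cells) and all(is_number(c) for c in cells)
        alignments.append("right" if numeric else "left")
    return alignments


data = [["香蕉", "Banana", "3.9"], ["苹果", "Apple", "4.5"]]
print(column_alignments(data))
```

Note that float() also accepts strings such as "nan" or "inf", so a stricter check may be wanted for real data.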

The format of the input file could be inferred from the most frequent character among some popular separators (',' ';' tab ...).
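A minimal sketch of that heuristic, in Python rather than pp's Haskell: count each candidate separator in the file contents and keep the most frequent one. The function name guess_separator and the exact candidate list are assumptions for illustration. The count ignores quoting, so a file with many commas inside quoted values could fool it.

```python
def guess_separator(text, candidates=(",", ";", "\t")):
    """Return the candidate separator that occurs most often in text.

    Ties (including an empty file) resolve to the earliest candidate,
    because max() returns the first maximal element.
    """
    counts = {sep: text.count(sep) for sep in candidates}
    return max(candidates, key=lambda sep: counts[sep])


sample = "name;price\nBanana;3,9\nApple;4,5\n"
print(repr(guess_separator(sample)))
```

In the sample, the decimal commas are outnumbered by the semicolons, so the semicolon is picked.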

I also found some other projects like these:

Just make sure that they support quoted values and escaped comma/quote characters!
The good thing with TSV is that tabs are simply illegal in values. Thus you are sure that all tabs are field separators and no quoting or escaping is needed.
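To make the quoting concern concrete, here is a small Python illustration (the sample line is invented). Splitting on commas breaks a quoted value that contains a comma, while a real CSV parser, here Python's standard csv module, handles both the embedded comma and the doubled quote.

```python
import csv
import io

# One CSV record with a comma inside a quoted value and an escaped ("" doubled) quote.
line = '"Apple, red","say ""hi""",4.5\n'

# Naive approach: split on every comma, ignoring quotes.
naive = line.rstrip("\n").split(",")

# Proper approach: let a CSV parser deal with quoting and escaping.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)   # four fragments, quotes still attached
print(parsed)  # three clean fields
```

With TSV, the equivalent of the naive split (splitting on tabs) is sufficient, which is the point made above.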

I have added the csv macro. See https://github.com/CDSoft/pp#csv-tables

It's based on a Haskell CSV parser which supports quoted values. The separator is assumed to be the most frequent candidate separator in the file. I guess it should work most of the time. If not, I'll add an optional parameter to define it.

Thanks @duzyn and @CDSoft ! This new macro is really useful.

pp converts the csv file to grid_tables, which require the columns to be lined up. Now there is a problem when converting CJK csv files, because pp can't line up CJK characters.

This is the original Chinese csv:

中文(Chinese),英文(English),价格(Price)
香蕉,Banana,3.9
苹果,Apple,4.5

And this is the converted table:

+-------------+-------------+-----------+
| 中文(Chinese) | 英文(English) | 价格(Price) |
+:============+:============+==========:+
| 香蕉          | Banana      |       3.9 |
+-------------+-------------+-----------+
| 苹果          | Apple       |       4.5 |
+-------------+-------------+-----------+

And this is the HTML that Pandoc produces from it:

<table style="width:58%;">
<colgroup>
<col style="width: 19%" />
<col style="width: 19%" />
<col style="width: 19%" />
</colgroup>
<thead>
<tr class="header">
<th style="text-align: left;">中文(Chinese</th>
<th style="text-align: left;">) | 英文(Eng</th>
<th style="text-align: right;">lish) | 价格(Price)</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;">香蕉</td>
<td style="text-align: left;"><div class="line-block">Banana</div></td>
<td style="text-align: right;"><div class="line-block">      3.9</div></td>
</tr>
<tr class="even">
<td style="text-align: left;">苹果</td>
<td style="text-align: left;"><div class="line-block">Apple</div></td>
<td style="text-align: right;"><div class="line-block">      4.5</div></td>
</tr>
</tbody>
</table>

@duzyn:

Now there is a problem when converting CJK csv files, because pp can't line up CJK characters.

You should probably consider using a pandoc filter to process pp's output file and clean up the table cell alignment:

You can write filters in almost any scripting language you're comfortable with, Haskell and Lua being supported natively by pandoc and not requiring third party tools/languages to be installed.

I'd probably go for a Lua filter, and just iterate through every column twice: a first pass to find the widest cell, and a second pass to add trailing spaces to every cell that is shorter than that maximum. Keeping track of Chinese characters, which should count as two columns each, shouldn't be hard if you know the Unicode ranges you're looking for.
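The two-pass idea can be sketched as follows. This is plain Python, not an actual Lua pandoc filter, and it only shows the width and padding logic. Instead of hard-coded Unicode ranges it relies on unicodedata.east_asian_width from the standard library, counting Wide ("W") and Fullwidth ("F") characters as two columns. That rule, and the names display_width and pad_columns, are assumptions for this example.

```python
import unicodedata


def display_width(text):
    """Approximate column width: Wide/Fullwidth characters count 2, combining marks 0, the rest 1."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def pad_columns(rows):
    """Pass 1: find the widest cell per column. Pass 2: right-pad shorter cells with spaces."""
    ncols = max((len(row) for row in rows), default=0)
    widths = [
        max(display_width(row[i]) for row in rows if i < len(row))
        for i in range(ncols)
    ]
    return [
        [cell + " " * (widths[i] - display_width(cell)) for i, cell in enumerate(row)]
        for row in rows
    ]


table = [["中文(Chinese)", "英文(English)"], ["香蕉", "Banana"], ["苹果", "Apple"]]
for row in pad_columns(table):
    print("| " + " | ".join(row) + " |")
```

With a monospaced font that renders CJK characters at double width, the pipes in the printed rows line up, because the padding is computed from display columns rather than from the number of characters.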

I wrote:

nnor \g2p vipyygv:v/|/d{jp:s/+/|/g

forgetting to escape the pipes! It should read:

nnor \g2p vip<esc>yygv:v/\|/d<cr>{jp:s/+/\|/g<cr>

pp could also generate pipe tables directly (at least in markdown).

pp 2.7.1 generates pipe tables (in markdown only).

Thanks Christophe!