Loading a csv file¶
We load a tab-separated data file using the load_table() function. The format is inferred from the filename suffix. Note that, despite the heading, the file in this example is a .tsv file, not a csv file.
from cogent3 import load_table
table = load_table("data/stats.tsv")
table
Locus | Region | Ratio |
---|---|---|
NP_003077 | Con | 2.5386 |
NP_004893 | Con | 121351.4264 |
NP_005079 | Con | 9516594.9789 |
NP_005500 | NonCon | 0.0000 |
NP_055852 | NonCon | 10933217.7090 |
5 rows x 3 columns
Note
The known filename suffixes for reading are .csv, .tsv, and .pkl or .pickle (Python’s pickle format).
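The suffix-based inference described above can be pictured as a simple lookup. The sketch below is an illustration only: guess_format, SUFFIX_TO_SEP and PICKLE_SUFFIXES are hypothetical names, and the mapping is assumed from the suffixes listed in the note, not taken from the cogent3 source.

```python
from pathlib import Path

# Hypothetical lookup tables mirroring the suffixes listed in the note.
# They are assumptions for illustration, not cogent3 internals.
SUFFIX_TO_SEP = {".csv": ",", ".tsv": "\t"}
PICKLE_SUFFIXES = {".pkl", ".pickle"}


def guess_format(path):
    """Return ("delimited", sep) or ("pickle", None) based on the suffix."""
    suffix = Path(path).suffix.lower()
    if suffix in SUFFIX_TO_SEP:
        return "delimited", SUFFIX_TO_SEP[suffix]
    if suffix in PICKLE_SUFFIXES:
        return "pickle", None
    raise ValueError(f"unrecognised suffix: {suffix!r}")


print(guess_format("data/stats.tsv"))
```

A file whose suffix is not in the lookup raises an error in this sketch, which is the situation where stating the delimiter explicitly becomes necessary.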
Note
If you invoke the static column types argument, i.e. ``load_table(…, static_column_types=True)``, and the column data are not static, those columns will be left as a string type.
Loading from a url¶
The cogent3 load functions support loading from a url. We load the above .tsv file directly from GitHub.
from cogent3 import load_table
table = load_table("https://raw.githubusercontent.com/cogent3/cogent3/develop/doc/data/stats.tsv")
Loading delimited specifying the format¶
Although unnecessary in this case, you can override the format inferred from the suffix by specifying the delimiter with the sep argument.
from cogent3 import load_table
table = load_table("data/stats.tsv", sep="\t")
table
Locus | Region | Ratio |
---|---|---|
NP_003077 | Con | 2.5386 |
NP_004893 | Con | 121351.4264 |
NP_005079 | Con | 9516594.9789 |
NP_005500 | NonCon | 0.0000 |
NP_055852 | NonCon | 10933217.7090 |
5 rows x 3 columns
Loading delimited data without a header line¶
To create a table from the following examples, specify your own header and use make_table().
Using load_delimited()¶
This is a standard parsing function. It does no filtering and does not convert elements to non-string types.
from cogent3.parse.table import load_delimited
header, rows, title, legend = load_delimited("data/CerebellumDukeDNaseSeq.pk", header=False, sep="\t")
rows[:4]
[['chr1',
'29214',
'29566',
'chr1.1',
'626',
'.',
'0.0724',
'3.9',
'-1',
'159'],
['chr1',
'89933',
'90118',
'chr1.2',
'511',
'.',
'0.0313',
'1.59',
'-1',
'94'],
['chr1',
'545979',
'546193',
'chr1.3',
'543',
'.',
'0.0428',
'2.23',
'-1',
'100'],
['chr1',
'713797',
'714639',
'chr1.4',
'1000',
'.',
'0.3215',
'16.0',
'-1',
'380']]
Using FilteringParser¶
from cogent3.parse.table import FilteringParser
reader = FilteringParser(with_header=False, sep="\t")
rows = list(reader("data/CerebellumDukeDNaseSeq.pk"))
rows[:4]
[['chr1',
'29214',
'29566',
'chr1.1',
'626',
'.',
'0.0724',
'3.9',
'-1',
'159'],
['chr1',
'89933',
'90118',
'chr1.2',
'511',
'.',
'0.0313',
'1.59',
'-1',
'94'],
['chr1',
'545979',
'546193',
'chr1.3',
'543',
'.',
'0.0428',
'2.23',
'-1',
'100'],
['chr1',
'713797',
'714639',
'chr1.4',
'1000',
'.',
'0.3215',
'16.0',
'-1',
'380']]
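Both parsers return bare rows, so the column names have to come from you. The following sketch pairs a header with the first two rows shown above and groups the values by column. The names col_0 to col_9 are placeholders chosen for illustration; they are not the real names of the columns in this file.

```python
# First two rows as returned by the parsers above (all values are strings)
rows = [
    ["chr1", "29214", "29566", "chr1.1", "626", ".", "0.0724", "3.9", "-1", "159"],
    ["chr1", "89933", "90118", "chr1.2", "511", ".", "0.0313", "1.59", "-1", "94"],
]

# Placeholder header: one name per column (hypothetical names)
header = [f"col_{i}" for i in range(len(rows[0]))]

# Every row must have exactly one value per header entry
assert all(len(row) == len(header) for row in rows)

# Group the values by column name
columns = {name: [row[i] for row in rows] for i, name in enumerate(header)}
print(columns["col_3"])
```

The header and rows can then be handed to make_table() via its header and data arguments.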
Selectively loading parts of a big file¶
Loading a set number of lines from a file¶
The limit argument specifies the number of lines to read.
from cogent3 import load_table
table = load_table("data/stats.tsv", limit=2)
table
Locus | Region | Ratio |
---|---|---|
NP_003077 | Con | 2.5386 |
NP_004893 | Con | 121351.4264 |
2 rows x 3 columns
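As a rough picture of what limiting the number of lines means, the following standard-library sketch reads a header and then stops after two data rows. It assumes, as the output above suggests, that the header line does not count towards the limit. It is an illustration, not how cogent3 implements limit.

```python
import csv
import io
from itertools import islice

# In-memory stand-in for a tab-separated file (values taken from the table above)
lines = [
    "Locus\tRegion\tRatio",
    "NP_003077\tCon\t2.5386",
    "NP_004893\tCon\t121351.4264",
    "NP_005079\tCon\t9516594.9789",
]
handle = io.StringIO("\n".join(lines))

reader = csv.reader(handle, delimiter="\t")
header = next(reader)            # consume the header line first
rows = list(islice(reader, 2))   # then stop after two data rows

print(header)
print(rows)
```

Because islice stops pulling from the reader once the limit is reached, the rest of the file is never parsed.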
Loading only some rows¶
If you only want a subset of the contents of a file, use the FilteringParser. It takes a callback function and keeps only the lines for which the callback returns True, skipping the rest. We illustrate this with stats.tsv, skipping any rows with "Ratio" > 10.
from cogent3.parse.table import FilteringParser
reader = FilteringParser(
lambda line: float(line[2]) <= 10, with_header=True, sep="\t"
)
table = load_table("data/stats.tsv", reader=reader, digits=1)
table
Locus | Region | Ratio |
---|---|---|
NP_003077 | Con | 2.5 |
NP_005500 | NonCon | 0.0 |
2 rows x 3 columns
You can also negate a condition, which is useful if the condition is complex. In this example, negating the condition keeps the rows for which Ratio > 10.
reader = FilteringParser(
lambda line: float(line[2]) <= 10, with_header=True, sep="\t", negate=True
)
table = load_table("data/stats.tsv", reader=reader, digits=1)
table
Locus | Region | Ratio |
---|---|---|
NP_004893 | Con | 121351.4 |
NP_005079 | Con | 9516595.0 |
NP_055852 | NonCon | 10933217.7 |
3 rows x 3 columns
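The effect of the callback and of negate can be mimicked in plain Python. This is a minimal sketch of the behaviour shown in the two outputs above, with filter_rows as a hypothetical helper; it makes no claim about how FilteringParser is written internally.

```python
def filter_rows(rows, keep, negate=False):
    """Yield rows where keep(row) is true, or where it is false if negate is set."""
    for row in rows:
        if bool(keep(row)) != negate:
            yield row


# Data rows of stats.tsv as strings, using the rounded values displayed above
rows = [
    ["NP_003077", "Con", "2.5386"],
    ["NP_004893", "Con", "121351.4264"],
    ["NP_005079", "Con", "9516594.9789"],
    ["NP_005500", "NonCon", "0.0000"],
    ["NP_055852", "NonCon", "10933217.7090"],
]


def is_small(line):
    """Same condition as the lambda used with FilteringParser above."""
    return float(line[2]) <= 10


kept = [row[0] for row in filter_rows(rows, is_small)]
kept_negated = [row[0] for row in filter_rows(rows, is_small, negate=True)]
print(kept)
print(kept_negated)
```

Writing the condition once and flipping it with negate avoids having to hand-invert a complicated boolean expression.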
Loading only some columns¶
Specify the columns by their names.
from cogent3.parse.table import FilteringParser
reader = FilteringParser(columns=["Locus", "Ratio"], with_header=True, sep="\t")
table = load_table("data/stats.tsv", reader=reader)
table
Locus | Ratio |
---|---|
NP_003077 | 2.5386 |
NP_004893 | 121351.4264 |
NP_005079 | 9516594.9789 |
NP_005500 | 0.0000 |
NP_055852 | 10933217.7090 |
5 rows x 2 columns
Or, by their index.
from cogent3.parse.table import FilteringParser
reader = FilteringParser(columns=[0, -1], with_header=True, sep="\t")
table = load_table("data/stats.tsv", reader=reader)
table
Locus | Ratio |
---|---|
NP_003077 | 2.5386 |
NP_004893 | 121351.4264 |
NP_005079 | 9516594.9789 |
NP_005500 | 0.0000 |
NP_055852 | 10933217.7090 |
5 rows x 2 columns
Note
The negate argument does not affect the columns evaluated.
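Selecting columns by name or by index amounts to the same lookup, which is why the two examples above give identical tables. The sketch below shows one way this could work; select_columns is a hypothetical helper, not part of cogent3.

```python
def select_columns(header, rows, columns):
    """Return (header, rows) restricted to columns given by name or by index."""
    # Translate names to positions; integers (including negatives) pass through
    indices = [header.index(c) if isinstance(c, str) else c for c in columns]
    new_header = [header[i] for i in indices]
    new_rows = [[row[i] for i in indices] for row in rows]
    return new_header, new_rows


header = ["Locus", "Region", "Ratio"]
rows = [
    ["NP_003077", "Con", "2.5386"],
    ["NP_004893", "Con", "121351.4264"],
]

by_name = select_columns(header, rows, ["Locus", "Ratio"])
by_index = select_columns(header, rows, [0, -1])
print(by_name == by_index)
```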
Load raw data as a list of lists of strings¶
We just use FilteringParser.
from cogent3.parse.table import FilteringParser
reader = FilteringParser(with_header=True, sep="\t")
data = list(reader("data/stats.tsv"))
We just display the first two lines.
data[:2]
[['Locus', 'Region', 'Ratio'], ['NP_003077', 'Con', '2.5386013224378985']]
Note
The individual elements are all str.
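Since every element comes back as a str, any numeric handling is up to you. A minimal sketch of one possible conversion follows; cast is a hypothetical helper that tries int, then float, and otherwise leaves the string unchanged.

```python
def cast(value):
    """Convert a string to int or float where possible, else return it unchanged."""
    for convert in (int, float):
        try:
            return convert(value)
        except ValueError:
            continue
    return value


# First data row from the output above
raw = ["NP_003077", "Con", "2.5386013224378985"]
typed = [cast(v) for v in raw]
print(typed)
```

Trying int before float keeps whole numbers as integers rather than turning them into floats.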
Make a table from header and rows¶
from cogent3 import make_table
header = ["A", "B", "C"]
rows = [range(3), range(3, 6), range(6, 9), range(9, 12)]
table = make_table(header=header, data=rows)
table
A | B | C |
---|---|---|
0 | 1 | 2 |
3 | 4 | 5 |
6 | 7 | 8 |
9 | 10 | 11 |
4 rows x 3 columns
Make a table from a dict¶
For a dict with keys as column headers.
from cogent3 import make_table
data = dict(A=[0, 3, 6], B=[1, 4, 7], C=[2, 5, 8])
table = make_table(data=data)
table
A | B | C |
---|---|---|
0 | 1 | 2 |
3 | 4 | 5 |
6 | 7 | 8 |
3 rows x 3 columns
Specify the column order when creating from a dict¶
table = make_table(header=["C", "A", "B"], data=data)
table
C | A | B |
---|---|---|
2 | 0 | 1 |
5 | 3 | 4 |
8 | 6 | 7 |
3 rows x 3 columns
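To see why the header controls the layout, the following plain-Python sketch turns the same dict into rows in the requested column order. It reproduces the row values shown above but is only an illustration of the idea, not of what make_table() does internally.

```python
data = dict(A=[0, 3, 6], B=[1, 4, 7], C=[2, 5, 8])
header = ["C", "A", "B"]

# Pull the columns out in header order, then transpose them into rows
ordered_columns = [data[name] for name in header]
rows = [list(row) for row in zip(*ordered_columns)]
print(rows)
```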
Create the table with an index¶
A Table can be indexed like a dict if you designate a column as the index (and that column has a unique value for every row).
table = load_table("data/stats.tsv", index_name="Locus")
table["NP_055852"]
Locus | Region | Ratio |
---|---|---|
NP_055852 | NonCon | 10933217.7090 |
1 rows x 3 columns
table["NP_055852", "Region"]
'NonCon'
Note
The index_name argument also applies when using make_table().
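The uniqueness requirement on the index column follows from the dict-like lookup: each key can point to only one row. A minimal sketch of that idea is shown below, with build_index as a hypothetical helper that is not part of cogent3.

```python
def build_index(header, rows, index_name):
    """Map each value of the index column to its row, rejecting duplicates."""
    position = header.index(index_name)
    index = {}
    for row in rows:
        key = row[position]
        if key in index:
            raise ValueError(f"duplicate value in index column: {key!r}")
        # Store the row as a dict so individual columns can be looked up by name
        index[key] = dict(zip(header, row))
    return index


header = ["Locus", "Region", "Ratio"]
rows = [
    ["NP_005500", "NonCon", 0.0],
    ["NP_055852", "NonCon", 10933217.7090],
]

index = build_index(header, rows, "Locus")
print(index["NP_055852"]["Region"])
```

Using "Region" as the index column would fail in this sketch because "NonCon" appears in more than one row.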
Create a table from a pandas.DataFrame¶
from pandas import DataFrame
from cogent3 import make_table
data = dict(a=[0, 3], b=["a", "c"])
df = DataFrame(data=data)
table = make_table(data_frame=df)
table
a | b |
---|---|
0 | a |
3 | c |
2 rows x 2 columns
Create a table from header and rows¶
from cogent3 import make_table
table = make_table(header=["a", "b"], data=[[0, "a"], [3, "c"]])
table
a | b |
---|---|
0 | a |
3 | c |
2 rows x 2 columns
Create a table from dict¶
make_table() is the utility function for creating Table objects from standard python objects.
from cogent3 import make_table
data = dict(a=[0, 3], b=["a", "c"])
table = make_table(data=data)
table
a | b |
---|---|
0 | a |
3 | c |
2 rows x 2 columns
Create a table from a 2D dict¶
from cogent3 import make_table
d2D = {
"edge.parent": {
"NineBande": "root",
"edge.1": "root",
"DogFaced": "root",
"Human": "edge.0",
},
"x": {
"NineBande": 1.0,
"edge.1": 1.0,
"DogFaced": 1.0,
"Human": 1.0,
},
"length": {
"NineBande": 4.0,
"edge.1": 4.0,
"DogFaced": 4.0,
"Human": 4.0,
},
}
table = make_table(
data=d2D,
)
table
edge.parent | x | length |
---|---|---|
root | 1.0000 | 4.0000 |
root | 1.0000 | 4.0000 |
root | 1.0000 | 4.0000 |
edge.0 | 1.0000 | 4.0000 |
4 rows x 3 columns
Create a table that has complex python objects as elements¶
from cogent3 import make_table
table = make_table(
header=["abcd", "data"],
data=[[range(1, 6), "0"], ["x", 5.0], ["y", None]],
missing_data="*",
digits=1,
)
table
abcd | data |
---|---|
range(1, 6) | 0 |
x | 5.0 |
y | None |
3 rows x 2 columns
Create an empty table¶
from cogent3 import make_table
table = make_table()
table
0 rows x 0 columns